Pricing

from $2.90 / 1,000 results

Youtube Text Scraper

Extract YouTube transcripts, subtitles, video metadata, hashtags, thumbnails, views, duration, and release dates from YouTube videos using either a search query or a list of direct YouTube URLs. It helps you turn YouTube search results and specific video links into structured JSON data.

Rating

5.0

(1)

Developer

Fabio Borsotti

Last modified

a day ago

YouTube Transcript Scraper Actor

Extract YouTube transcripts, subtitles, video metadata, hashtags, thumbnails, views, duration, and release dates from YouTube videos using either a search query or a list of direct YouTube URLs. This YouTube transcript scraper helps you turn YouTube search results and specific video links into structured JSON data for research, SEO, lead generation, monitoring, and content analysis.

What does YouTube Transcript Scraper do?

YouTube Transcript Scraper is an Apify Actor that can search YouTube videos by keyword, apply a time filter, collect video metadata, and retrieve the first available transcript based on your preferred language order. It can also process direct YouTube video URLs and include those videos in the final output.

This Actor is ideal if you want to:

build retrieval-augmented generation (RAG) datasets for AI applications
scrape YouTube transcripts
extract YouTube subtitles
collect YouTube video metadata
monitor YouTube search results
build datasets from YouTube videos
automate YouTube content research
extract transcripts from specific YouTube videos

The Actor uses:

pytubefix for YouTube search and metadata extraction
youtube-transcript-api for transcript and subtitle retrieval
Scrape.do as a proxy layer

Why use this YouTube scraper?

If you work with YouTube data, transcripts are one of the most valuable sources of structured content. Video transcripts help you analyze what creators actually say, not just what appears in titles and descriptions.

This Actor is useful because it supports two collection modes in the same run:

search by keyword using query
direct processing of specific YouTube videos using direct_url

This makes it practical both for broad monitoring and for targeted transcript extraction from known videos.

Common use cases

SEO research for YouTube videos and keywords
competitor monitoring on YouTube
AI training and text dataset preparation
content repurposing from video to text
lead generation from niche YouTube channels
trend monitoring by date range
transcript-based topic clustering
YouTube video catalog enrichment
transcript extraction from manually selected videos

What data does the Actor extract?

For each processed YouTube video, the Actor can return:

YouTube video ID
video title
video URL
channel name
channel URL
transcript text with timestamps
available subtitles metadata
video view count
video duration in seconds
release date
thumbnail URL
keywords / hashtags

Note: By default, the Actor extracts only the transcript and returns empty values for the metadata fields. To include video metadata and subtitles, set enableMetadata: true in the input. See the "Metadata Retrieval Strategy" section for details.

Why this Actor is useful

This YouTube Transcript Scraper is useful when you need structured YouTube data without building and maintaining your own scraping workflow. It is designed for users who want fast access to transcript-rich video data in a reusable JSON format.

Compared with a basic YouTube metadata scraper, this Actor focuses on transcript extraction and subtitle discovery, which makes it especially helpful for:

content intelligence
SEO workflows
machine learning pipelines
research automation
enrichment of video datasets

Input

Metadata Retrieval Strategy (`enableMetadata` flag)

By default, the Actor extracts only transcripts, which keeps runs fast and minimizes proxy credit consumption.

However, you can enable comprehensive metadata retrieval by setting enableMetadata: true:

enableMetadata: false (default): Fast, low-credit mode
- Extracts: transcript text + video URL
- Output fields for metadata are present but empty (None, [])
- Recommended for large-scale transcript extraction
enableMetadata: true: Full metadata mode
- Extracts: transcript + video metadata + available subtitles
- Metadata fields populated: channel name/URL, view count, duration, release date, thumbnail, hashtags
- Subtitles field includes full list of available subtitle languages
- Useful for comprehensive video data collection

When enableMetadata: true, the Actor emits a single apify-actor-metadata event per run to notify external systems.
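When you consume the dataset downstream, it can help to tell transcript-only items apart from items produced in full metadata mode. The following is a minimal sketch, not part of the Actor itself: the helper name has_metadata is hypothetical, and the field names are taken from the sample output item shown in the Output section.

```python
from typing import Any, Dict

# Metadata field names as they appear in the sample output item of this README.
METADATA_FIELDS = (
    "channel_name", "channel_url", "views_count",
    "duration_seconds", "release_date", "thumbnail_url",
    "hashtags", "subtitles",
)


def has_metadata(item: Dict[str, Any]) -> bool:
    """Return True if at least one metadata field holds a non-empty value.

    Hypothetical helper: it treats None, "" and [] as "empty", matching the
    README's description of transcript-only output (None, []).
    """
    return any(item.get(field) not in (None, "", []) for field in METADATA_FIELDS)


# Illustrative items, shaped like the documented output.
transcript_only = {
    "url": "https://www.youtube.com/watch?v=video_id",
    "text": "[00:00] transcript text...",
    "channel_name": None,
    "hashtags": [],
    "subtitles": [],
}
full = dict(transcript_only, channel_name="Channel name", views_count=123456)

print(has_metadata(transcript_only))  # False
print(has_metadata(full))             # True
```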

Example: Enabling metadata

{
  "query": "apify tutorial",
  "limit": 5,
  "langs": ["it", "en"],
  "enableMetadata": true,
  "file_output": "output.json"
}

Supported input fields

direct_url (array of strings, optional): list of direct YouTube video URLs to process. Supported formats include youtube.com/watch?v=..., youtu.be/..., and youtube.com/shorts/...
query (string, optional): YouTube search query. Optional if direct_url is provided
range (array of strings): time filter for YouTube search, one of hour, today, this_week, this_month, this_year
limit (integer): maximum number of videos to process from search results
langs (array of strings): preferred language order used when selecting transcripts (see the table below)
file_output (string): name of the JSON file saved in the key-value store
enableMetadata (boolean, optional): if true, fetches video metadata and subtitles; if false or omitted, extracts transcript only (default: false)

At least one of query and direct_url must be provided.
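If you build the input programmatically, a small pre-flight check can catch mistakes before a run is started. This is a sketch under stated assumptions: validate_input is a hypothetical helper, not something the Actor exposes, and the rule that limit must be at least 1 is an assumption, since the field list only says it is an integer.

```python
from typing import Any, Dict, List

# Allowed values for "range", as listed in the field descriptions above.
ALLOWED_RANGES = {"hour", "today", "this_week", "this_month", "this_year"}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    # Documented rule: at least one of query and direct_url is required.
    if not run_input.get("query") and not run_input.get("direct_url"):
        problems.append("provide at least one of 'query' or 'direct_url'")

    # Documented rule: range values come from a fixed set.
    for value in run_input.get("range", []):
        if value not in ALLOWED_RANGES:
            problems.append(f"unsupported range value: {value!r}")

    # Assumption: a usable limit is a positive integer (bool is excluded
    # explicitly because bool is a subclass of int in Python).
    limit = run_input.get("limit")
    if limit is not None and (
        isinstance(limit, bool) or not isinstance(limit, int) or limit < 1
    ):
        problems.append("'limit' must be a positive integer")

    if not isinstance(run_input.get("enableMetadata", False), bool):
        problems.append("'enableMetadata' must be a boolean")

    return problems


print(validate_input({"query": "apify tutorial", "limit": 5, "range": ["this_week"]}))
print(validate_input({"langs": ["it", "en"]}))
```

The first call prints an empty list; the second reports the missing query or direct_url.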

Supported languages (ISO 639-1)

Code	Language	Code	Language	Code	Language	Code	Language	Code	Language	Code	Language
aa	Afar	ab	Abkhaz	ae	Avestan	af	Afrikaans	ak	Akan	am	Amharic
ar	Arabic	as	Assamese	av	Avaric	ay	Aymara	az	Azerbaijani	ba	Bashkir
be	Belarusian	bg	Bulgarian	bi	Bislama	bm	Bambara	bn	Bengali	bo	Tibetan
br	Breton	bs	Bosnian	ca	Catalan	ce	Chechen	ch	Chamorro	co	Corsican
cr	Cree	cs	Czech	cu	Church Slavic	cv	Chuvash	cy	Welsh	da	Danish
de	German	dv	Divehi	dz	Dzongkha	ee	Ewe	el	Greek	en	English
eo	Esperanto	es	Spanish	et	Estonian	eu	Basque	fa	Persian	ff	Fulah
fi	Finnish	fj	Fijian	fo	Faroese	fr	French	fy	Western Frisian	ga	Irish
gd	Scottish Gaelic	gl	Galician	gn	Guarani	gu	Gujarati	gv	Manx	ha	Hausa
he	Hebrew	hi	Hindi	ho	Hiri Motu	hr	Croatian	ht	Haitian	hu	Hungarian
hy	Armenian	hz	Herero	ia	Interlingua	id	Indonesian	ie	Interlingue	ig	Igbo
ii	Sichuan Yi	ik	Inupiaq	io	Ido	is	Icelandic	it	Italian	iu	Inuktitut
ja	Japanese	jv	Javanese	ka	Georgian	kg	Kongo	ki	Kikuyu	kj	Kuanyama
kk	Kazakh	kl	Kalaallisut	km	Central Khmer	kn	Kannada	ko	Korean	kr	Kanuri
ks	Kashmiri	ku	Kurdish	kv	Komi	kw	Cornish	ky	Kirghiz	la	Latin
lb	Luxembourgish	lg	Ganda	li	Limburgan	ln	Lingala	lo	Lao	lt	Lithuanian
lu	Luba-Katanga	lv	Latvian	mg	Malagasy	mh	Marshallese	mi	Maori	mk	Macedonian
ml	Malayalam	mn	Mongolian	mr	Marathi	ms	Malay	mt	Maltese	my	Burmese
na	Nauru	nb	Norwegian Bokmål	nd	North Ndebele	ne	Nepali	ng	Ndonga	nl	Dutch
nn	Norwegian Nynorsk	no	Norwegian	nr	South Ndebele	nv	Navajo	ny	Chichewa	oc	Occitan
oj	Ojibwa	om	Oromo	or	Oriya	os	Ossetian	pa	Panjabi	pi	Pali
pl	Polish	ps	Pashto	pt	Portuguese	qu	Quechua	rm	Romansh	rn	Rundi
ro	Romanian	ru	Russian	rw	Kinyarwanda	sa	Sanskrit	sc	Sardinian	sd	Sindhi
se	Northern Sami	sg	Sango	sh	Serbo-Croatian	si	Sinhala	sk	Slovak	sl	Slovenian
sm	Samoan	sn	Shona	so	Somali	sq	Albanian	sr	Serbian	ss	Swati
st	Southern Sotho	su	Sundanese	sv	Swedish	sw	Swahili	ta	Tamil	te	Telugu
tg	Tajik	th	Thai	ti	Tigrinya	tk	Turkmen	tl	Tagalog	tn	Tswana
to	Tonga	tr	Turkish	ts	Tsonga	tt	Tatar	tw	Twi	ty	Tahitian
ug	Uighur	uk	Ukrainian	ur	Urdu	uz	Uzbek	ve	Venda	vi	Vietnamese
vo	Volapük	wa	Walloon	wo	Wolof	xh	Xhosa	yi	Yiddish	yo	Yoruba
za	Zhuang	zh	Chinese	zu	Zulu

Example input with direct URLs only

{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0"
  ],
  "langs": ["it", "en"],
  "file_output": "output.json"
}
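The three URL formats accepted by direct_url all carry the same video ID in different places. The sketch below shows one way to normalize them on the client side, for example to match input URLs against the id field in the output. The function extract_video_id is a hypothetical helper and says nothing about how the Actor parses URLs internally; it also assumes each URL includes the https:// scheme.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for the three documented URL formats, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path_parts = [part for part in parsed.path.split("/") if part]

    # youtu.be/<id>
    if host == "youtu.be" and path_parts:
        return path_parts[0]

    if host == "youtube.com":
        # youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # youtube.com/shorts/<id>
        if len(path_parts) >= 2 and path_parts[0] == "shorts":
            return path_parts[1]

    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/9bZkp7q19f0"))                 # 9bZkp7q19f0
```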

Example input with query and direct URLs

{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "query": "apify tutorial",
  "range": ["this_week"],
  "limit": 5,
  "langs": ["it", "en"],
  "file_output": "output.json"
}

Example input with direct URLs only and enableMetadata = true

{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0"
  ],
  "langs": ["it", "en"],
  "enableMetadata": true,
  "file_output": "output.json"
}

Example input with query and direct URLs and enableMetadata = true

{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "query": "apify tutorial",
  "range": ["this_week"],
  "limit": 5,
  "langs": ["it", "en"],
  "enableMetadata": true,
  "file_output": "output.json"
}

When both direct_url and query are provided, all videos listed in direct_url are processed and added to the output together with the search results. When enableMetadata is set to true, the Actor will fetch comprehensive metadata and available subtitles for each video and emit an apify-actor-metadata event upon completion.

Output

Each dataset item contains structured YouTube transcript and metadata fields like these:

{
  "id": "video_id",
  "title": "Video title",
  "url": "https://www.youtube.com/watch?v=video_id",
  "autor": "Channel name",
  "text": "[00:00] transcript text...",
  "channel_name": "Channel name",
  "channel_url": "https://www.youtube.com/@channel",
  "video_title": "Detailed video title",
  "video_url": "https://www.youtube.com/watch?v=video_id",
  "views_count": 123456,
  "duration_seconds": 542,
  "release_date": "2026-04-13T00:00:00",
  "thumbnail_url": "https://i.ytimg.com/vi/video_id/maxresdefault.jpg",
  "hashtags": ["apify", "youtube", "scraping"],
  "subtitles": [
    {
      "language_code": "en",
      "language": "English",
      "is_generated": true,
      "is_translatable": true
    }
  ]
}

The Actor also saves the complete output array into the Apify key-value store using the file name specified in file_output.
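The text field combines timestamps and transcript text in one string. If you need per-segment data, something like the following sketch may work. It assumes each segment is introduced by a bracketed [MM:SS] or [HH:MM:SS] marker, which is inferred only from the "[00:00] transcript text..." sample above; parse_transcript is a hypothetical helper, so check it against real output before relying on it.

```python
import re
from typing import List, Tuple

# Matches [MM:SS] or [HH:MM:SS]; bracketed non-timestamps such as [Music]
# are left untouched as part of the segment text.
TIMESTAMP = re.compile(r"\[(\d{1,2}(?::\d{2}){1,2})\]")


def parse_transcript(text: str) -> List[Tuple[int, str]]:
    """Split a transcript string into (start_seconds, segment_text) pairs."""
    # re.split with one capturing group yields:
    # [prefix, stamp1, body1, stamp2, body2, ...]
    parts = TIMESTAMP.split(text)
    segments = []
    for stamp, body in zip(parts[1::2], parts[2::2]):
        seconds = 0
        for piece in stamp.split(":"):
            seconds = seconds * 60 + int(piece)
        segments.append((seconds, body.strip()))
    return segments


sample = "[00:00] hello world [00:05] [Music] second line [01:02:03] end"
print(parse_transcript(sample))
# [(0, 'hello world'), (5, '[Music] second line'), (3723, 'end')]
```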

How the YouTube transcript extraction works

The Actor follows this strategy:

If direct_url is provided, process each direct YouTube URL and extract transcript and metadata.
If query is provided, search YouTube videos using the selected time filter.
For each video, try to find a manually created transcript in the languages you requested.
If no manual transcript is available, try an auto-generated transcript.
If no requested language is available, fall back to the first available transcript.
Save transcript text and structured metadata in the output dataset.

This makes the Actor practical for multilingual transcript scraping, targeted video extraction, and broad topic monitoring.
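The language fallback described in the steps above can be sketched in plain Python. This is an illustration of the selection order only, not the Actor's code and not a call into youtube-transcript-api: select_transcript is a hypothetical function, the track dictionaries reuse the language_code and is_generated keys from the subtitles field in the output, and checking manual tracks for every requested language before any auto-generated track is one reading of the steps.

```python
from typing import Dict, List, Optional


def select_transcript(available: List[Dict], langs: List[str]) -> Optional[Dict]:
    """Pick a transcript track following the documented fallback order."""
    # Pass 1: manually created tracks, in the requested language order.
    # Pass 2: auto-generated tracks, in the requested language order.
    for want_generated in (False, True):
        for lang in langs:
            for track in available:
                if (
                    track["language_code"] == lang
                    and track["is_generated"] == want_generated
                ):
                    return track
    # Final fallback: the first available track in any language, if there is one.
    return available[0] if available else None


tracks = [
    {"language_code": "en", "is_generated": True},
    {"language_code": "it", "is_generated": True},
    {"language_code": "en", "is_generated": False},
]

print(select_transcript(tracks, ["it", "en"]))      # manual "en" wins over generated "it"
print(select_transcript(tracks[:2], ["it", "en"]))  # no manual track: generated "it"
print(select_transcript(tracks, ["de"]))            # no match: first available track
```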

Who is this Actor for?

This Actor is a good fit for:

SEO specialists
marketers
data engineers
content analysts
AI and machine learning data teams
researchers
agencies monitoring YouTube niches
users who need transcripts from specific YouTube links

Limitations

The Actor works only with public YouTube videos that have transcripts enabled. Private or restricted videos are not supported.

Summary

If you need a YouTube transcript scraper for Apify that extracts subtitles, transcript text, hashtags, thumbnails, views, duration, and release date from either search results or direct video URLs, this Actor gives you a clean starting point with structured JSON output and proxy support.
