YouTube Transcript Scraper

Under maintenance

Get transcripts from YouTube videos and Shorts as plain text or structured timestamped segments. Results come with title, description, likes, channel details, and other metadata.

Pricing

Pay per usage

Developer

Embion

Last modified

2 months ago

▶️ Scrape transcripts and metadata from YouTube videos and Shorts

This Actor gets transcripts supplied by the video creator or generated by YouTube. It works in two modes: full text or structured segments with exact timestamps. Built for automation pipelines: stable output, reliable retries, structured error codes, and proxy support.

Features:

Extracts transcripts as plain text or timestamped segments
Includes title, description, keywords, category, duration, publish date, channel name, subscriber count, and more
Supports HTML and plain text transcript formats
Writes consistent structure even when transcripts are missing
Retries on failures, tracks error reasons, produces structured error items
Residential proxies for highest reliability (recommended setting)

If you want to see the exact output format, check the section "Successful item example".

📦 Output dataset

Each processed URL produces one dataset item with the following structure:

✅ Successful item example

Some long values in the examples below are truncated with an "...etc" marker to keep them readable. The real dataset output contains full, untruncated results.

Note that some videos have no transcript in the requested language. In that case the Actor still writes the item to the dataset, with the caption_text and captions_structured fields both set to null.

Example of output when with_timestamps is true (enabled). The captions_structured field is filled while the caption_text field is null:

{
    "url": "https://youtu.be/dqwpQarrDwk", // URL provided as input in 'start_urls'
    "id": "dqwpQarrDwk", // video ID
    "title": "1,000km Cable to the Stars - The Skyhook", // video title
    "channel_name": "Kurzgesagt – In a Nutshell", // name of the channel which posted the video
    "channel_id": "UCsXVk37bltHxD1rDPwtNM8Q", // ID of the channel which posted the video
    "channel_url": "http://www.youtube.com/@kurzgesagt", // URL of the channel which posted the video
    "channel_subscribers_text": "24.8M subscribers", // subscriber count as shown on video page
    "channel_subscribers": 24800000, // subscriber count parsed as number
    "category": "Education", // category of the video
    "duration": 420, // total duration of the video in seconds, 0 if livestream
    "view_count": 12705758, // total number of views from microdata
    "like_count": 409346, // total number of likes from microdata
    "published_date": "2019-11-17T13:30:03.000Z", // when the video was published from microdata
    "published_date_text": "Nov 17, 2019", // when the video was published as it appears on the website
    "keywords": [ // keywords of the video
        "Skyhook",
        "Spacetether",
        "Tether",
        "Space",
        "Spavetravel",
        ...etc
    ],
    "caption_lang": "en", // language code of captions
    "caption_generated": false, // true if captions are auto-generated
    "caption_text": null, // null when "with_timestamps" input is true
    "captions_structured": [ // filled when "with_timestamps" is true and captions (transcript) exist
        {
            "start_ms": "1200", // milliseconds from start of the video when the text appears, guaranteed to be non-null if object exists
            "end_ms": "2920", // milliseconds from start of the video when the text hides, guaranteed to be non-null if object exists
            "snippet": "Getting to space is hard." // actual text in subtitles, may be null
        },
        {
            "start_ms": "3080",
            "end_ms": "6580",
            "snippet": "Right now, it’s like going up on a mountain on a unicycle-"
        },
        ...etc
    ],
    "available_captions": [ // list of all captions YouTube declares for the video
        {
            "language_code": "sq", // guaranteed to be non-null if object exists
            "name": "Albanian", // may be null depending on what YouTube returns
            "generated": false // guaranteed to be non-null if object exists
        },
        {
            "language_code": "ar",
            "name": "Arabic",
            "generated": false
        },
        ...etc
    ],
    "unlisted": false, // video requires direct URL and is hidden from search
    "live": false, // video is a livestream
    "error_code": null, // null on success
    "description": "Sources: https://sites.google.com/view/sources-skyhooks/\n\nGet your 12,020 SPACE Calendar here: https://shop.kurzgesagt.org/\nWORLDWIDE SHIPPING IS AVAILABLE!\n\nGetting to space is incredibly hard, expensive and needs a lot of resources. \nA more efficient way to get there is a Skyhook (or Spacetether), an ever rotating cable with a counter weight, that catapults spaceships from earth orbit into the depths of space. \n\n\nOUR CHANNELS\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nGerman Channel: https://kgs.link/youtubeDE \nSpanish Channel: https://kgs.link/youtubeES \n\n\nHOW CAN YOU SUPPORT US?\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nThis is how we make our living and it would be a pleasure if you support us!\n\nGet Merch designed with ❤ from https://kgs.link/shop" ...etc // raw description as shown below video
}
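The timestamped segments shown above can be turned into a subtitle file on the client side. The following is a minimal sketch, assuming the item has been loaded into a Python dict and that start_ms and end_ms arrive as numeric strings, as in the example. The helper names ms_to_srt_time and segments_to_srt are hypothetical and not part of the Actor.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render captions_structured segments as SRT text, skipping null snippets."""
    blocks = []
    for segment in segments:
        snippet = segment.get("snippet")
        if snippet is None:  # the field notes say snippet may be null
            continue
        # The example shows start_ms/end_ms as strings, so convert defensively.
        start = ms_to_srt_time(int(segment["start_ms"]))
        end = ms_to_srt_time(int(segment["end_ms"]))
        blocks.append(f"{len(blocks) + 1}\n{start} --> {end}\n{snippet}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

Calling segments_to_srt(item["captions_structured"]) on an item like the one above yields numbered SRT cues. Check that captions_structured is not null before calling it.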

Example of output when with_timestamps is false (disabled). The caption_text field is filled while the captions_structured field is null:

{
    "url": "https://youtu.be/dqwpQarrDwk",
    "id": "dqwpQarrDwk",
    "title": "1,000km Cable to the Stars - The Skyhook",
    "channel_name": "Kurzgesagt – In a Nutshell",
    "channel_id": "UCsXVk37bltHxD1rDPwtNM8Q",
    "channel_url": "http://www.youtube.com/@kurzgesagt",
    "channel_subscribers_text": "24.8M subscribers",
    "channel_subscribers": 24800000,
    "category": "Education",
    "duration": 420,
    "view_count": 12705758,
    "like_count": 409346,
    "published_date": "2019-11-17T13:30:03.000Z",
    "published_date_text": "Nov 17, 2019",
    "keywords": [
        "Skyhook",
        "Spacetether",
        "Tether",
        "Space",
        "Spavetravel",
        ...etc
    ],
    "caption_lang": "en",
    "caption_generated": false,
    "caption_text": "Getting to space is hard. Right now, it’s like going up on a mountain on a unicycle- with a backpack full of explosives. Incredibly slow, you can’t transport a lot of stuff, and you might die. A rocket needs to reach a velocity about 40,000km an hour to escape from Earth. To get to that speed, rockets are mostly containers for fuel with a tiny tip of payload. This is bad if you want to go to other planets, because you need a lot of heavy stuff if you want to survive, and maybe even come back. So, is there a way to get to space with less fuel and more payload? A nice thing that solved most of our transport problems on Earth is what you call infrastructure. Whether it’s roads for cars, ports for ships, or rails for trains, we’ve made it easier to get to places. We can apply the same solution to space travel. Space infrastructure will make getting into orbit and out to the Moon, Mars, and beyond easier and cheaper." ...etc,
    "captions_structured": null,
    "available_captions": [
        {
            "language_code": "sq",
            "name": "Albanian",
            "generated": false
        },
        {
            "language_code": "ar",
            "name": "Arabic",
            "generated": false
        },
        ...etc
    ],
    "unlisted": false,
    "live": false,
    "error_code": null,
    "description": "Sources: https://sites.google.com/view/sources-skyhooks/\n\nGet your 12,020 SPACE Calendar here: https://shop.kurzgesagt.org/\nWORLDWIDE SHIPPING IS AVAILABLE!\n\nGetting to space is incredibly hard, expensive and needs a lot of resources. \nA more efficient way to get there is a Skyhook (or Spacetether), an ever rotating cable with a counter weight, that catapults spaceships from earth orbit into the depths of space. \n\n\nOUR CHANNELS\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nGerman Channel: https://kgs.link/youtubeDE \nSpanish Channel: https://kgs.link/youtubeES \n\n\nHOW CAN YOU SUPPORT US?\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nThis is how we make our living and it would be a pleasure if you support us!\n\nGet Merch designed with ❤ from https://kgs.link/shop" ...etc
}
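Downstream code has to handle three shapes of item: caption_text filled, captions_structured filled, or both null when no transcript exists. The sketch below shows one way to normalise them into a single string. It assumes items are plain Python dicts, and the function name transcript_text is hypothetical.

```python
from typing import Optional


def transcript_text(item: dict) -> Optional[str]:
    """Return the transcript of a dataset item as one string, or None if absent."""
    if item.get("error_code") is not None:
        return None  # error items carry only url and error_code
    if item.get("caption_text") is not None:
        return item["caption_text"]  # with_timestamps was false
    segments = item.get("captions_structured")
    if segments is None:
        return None  # no transcript in the requested language
    # with_timestamps was true: join the non-null snippets in order
    return " ".join(s["snippet"] for s in segments if s.get("snippet") is not None)
```

Returning None for both error items and transcript-less videos keeps the caller simple. Inspect error_code separately if you need to tell the two cases apart.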

❌ Error item example

If the Actor fails to get video information for a given URL, it writes an error record to the dataset. This lets your downstream automation verify that the Actor actually attempted the URL.

The Actor can access only publicly available videos, including unlisted ones. Errors can occur for any of the following reasons:

Anti-bot protection or suspicious traffic block
The URL is not a valid YouTube video page (homepage, channel, search, Shorts feed, etc.)
Video was unpublished, deleted, set to private, or expired
Video requires login due to age restriction or membership-only access
Video is blocked in the region of your proxy
Redirect or network issue prevented resolving the video URL
Transcripts exist but the selected track could not be fetched or parsed
Critical metadata (e.g., video ID, title, or caption manifest) was missing
Global or regional YouTube outages

Here is what the dataset record looks like when the Actor fails to fetch information about a specific video:

{
    "url": "https://example.com/",
    "error_code": "not_youtube"
}

{
    "url": "https://youtube.com/",
    "error_code": "invalid_page_type"
}

{
    "url": "https://www.youtube.com/watch?v=aAkMkVFwAoX",
    "error_code": "video_unavailable"
}

List of possible values written to the error_code field of the dataset:

Code	Meaning
`not_youtube`	Input link is not recognised as a valid YouTube video URL.
`resolve_error`	The link could not be resolved to a playable video (redirect or network issue).
`invalid_page_type`	The page type is unsupported (for example, experimental formats).
`transcript_fetch_error`	Transcript metadata could not be retrieved.
`transcript_selection_error`	Transcript metadata exists but the selected track failed to load.
`missing_critical_data`	Essential metadata was missing, preventing a complete record.
`video_info_fetch_error`	Video metadata retrieval returned an unexpected response.
`video_unavailable`	The video is blocked, private, removed, or otherwise unavailable.
`failed`	All retries were exhausted due to an unexpected error.
`null`	No error encountered.
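Because every input URL produces an item, a run can be summarised by grouping items on error_code. The following is a minimal sketch, assuming items are Python dicts. The set POSSIBLY_TRANSIENT is our own guess at which codes might succeed on a later run and is not documented behaviour of the Actor.

```python
from collections import Counter

# Assumption: these codes *may* be transient; the Actor's docs do not say so.
POSSIBLY_TRANSIENT = {
    "resolve_error",
    "transcript_fetch_error",
    "video_info_fetch_error",
    "failed",
}


def summarize(items: list):
    """Split dataset items into successes, error counts, and URLs to retry."""
    succeeded = [i for i in items if i.get("error_code") is None]
    error_counts = Counter(
        i["error_code"] for i in items if i.get("error_code") is not None
    )
    retry_urls = [i["url"] for i in items if i.get("error_code") in POSSIBLY_TRANSIENT]
    return succeeded, error_counts, retry_urls
```

The retry_urls list can be fed back as start_urls in a follow-up run. Codes such as not_youtube or video_unavailable are left out because resubmitting the same URL is unlikely to help.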

⚙️ Inputs

Field	Type	Description
`start_urls`	array of request objects	Each entry must include a `url` that points to a YouTube video or Shorts page. Optional HTTP method and headers are supported.
`caption_language`	string	Language code prioritised for transcripts (for example `en`, `es`, `de`).
`with_description`	boolean	Include the video description text in the output when `true`.
`with_timestamps`	boolean	Emit timestamped transcript segments when `true` in `captions_structured`; otherwise a single transcript string in `caption_text`.
`allow_generated_captions`	boolean	Fall back to auto-generated transcripts if creator-supplied transcripts are unavailable.
`caption_format`	string	Accepts `plain_text` or `html`, applied to transcript fields.
`concurrency`	integer	Maximum number of videos processed simultaneously. Tune to match your proxy capacity.
`max_retries`	integer	Number of retry attempts per video before an error item is written.
`proxy`	object	Standard Apify proxy configuration payload. We recommend enabling residential proxies for reliable results.

Example input

{
    "allow_generated_captions": true,
    "caption_format": "plain_text",
    "caption_language": "en",
    "concurrency": 10,
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    },
    "start_urls": [
        {
            "url": "https://youtu.be/dqwpQarrDwk"
        }
    ],
    "with_description": true,
    "with_timestamps": true,
    "max_retries": 5
}
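When inputs are generated by a script, a small builder can keep the field names consistent with the Inputs table. This is a sketch only: the function name build_input is hypothetical, and its default values are copied from the example input above rather than from any documented defaults of the Actor.

```python
ALLOWED_FORMATS = ("plain_text", "html")


def build_input(urls, caption_language="en", with_timestamps=True,
                caption_format="plain_text", concurrency=10, max_retries=5):
    """Build an input object using the field names from the Inputs table."""
    if not urls:
        raise ValueError("at least one URL is required")
    if caption_format not in ALLOWED_FORMATS:
        raise ValueError(f"caption_format must be one of {ALLOWED_FORMATS}")
    return {
        "start_urls": [{"url": url} for url in urls],
        "caption_language": caption_language,
        "with_description": True,
        "with_timestamps": with_timestamps,
        "allow_generated_captions": True,
        "caption_format": caption_format,
        "concurrency": concurrency,
        "max_retries": max_retries,
        # Residential proxies are the setting recommended in the Inputs table.
        "proxy": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
    }
```

Serialising the result with json.dumps gives a payload that can be pasted into the JSON input editor in Apify Console.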

💬 Language codes for transcripts

There is no complete list of possible language codes for the caption_language input field, because some codes may be internal and undocumented. The most reliable way to discover language codes is to scrape a few videos you are interested in and check the available_captions.language_code field in the resulting dataset.

However, the following list closely matches the set of language codes we have seen on YouTube:

Code	Language
ab	Abkhazian
aa	Afar
af	Afrikaans
sq	Albanian
ase	American Sign Language
am	Amharic
ar	Arabic
arc	Aramaic
hy	Armenian
as	Assamese
ay	Aymara
az	Azerbaijani
bn	Bangla
ba	Bashkir
eu	Basque
be	Belarusian
bh	Bihari
bi	Bislama
bs	Bosnian
br	Breton
bg	Bulgarian
yue	Cantonese
yue-HK	Cantonese (Hong Kong)
ca	Catalan
chr	Cherokee
zh	Chinese
zh-CN	Chinese (China)
zh-HK	Chinese (Hong Kong)
zh-Hans	Chinese (Simplified)
zh-SG	Chinese (Singapore)
zh-TW	Chinese (Taiwan)
zh-Hant	Chinese (Traditional)
cho	Choctaw
co	Corsican
hr	Croatian
cs	Czech
da	Danish
nl	Dutch
nl-BE	Dutch (Belgium)
nl-NL	Dutch (Netherlands)
dz	Dzongkha
en	English
en-CA	English (Canada)
en-IE	English (Ireland)
en-GB	English (United Kingdom)
en-US	English (United States)
eo	Esperanto
et	Estonian
fo	Faroese
fj	Fijian
fil	Filipino
fi	Finnish
fr	French
fr-BE	French (Belgium)
fr-CA	French (Canada)
fr-FR	French (France)
fr-CH	French (Switzerland)
gl	Galician
ka	Georgian
de	German
de-AT	German (Austria)
de-DE	German (Germany)
de-CH	German (Switzerland)
el	Greek
kl	Greenlandic (Kalaallisut)
gn	Guarani
gu	Gujarati
hak	Hakka Chinese
hak-TW	Hakka Chinese (Taiwan)
ha	Hausa
iw	Hebrew
hi	Hindi
hi-Latn	Hindi (Phonetic)
hu	Hungarian
is	Icelandic
ig	Igbo
id	Indonesian
ia	Interlingua
ie	Interlingue
iu	Inuktitut
ik	Inupiaq
ga	Irish
it	Italian
ja	Japanese
jv	Javanese
kn	Kannada
ks	Kashmiri
kk	Kazakh
km	Khmer
rw	Kinyarwanda
tlh	Klingon
ko	Korean
ku	Kurdish
ky	Kyrgyz
lo	Lao
la	Latin
lv	Latvian
ln	Lingala
lt	Lithuanian
lb	Luxembourgish
mk	Macedonian
mg	Malagasy
ms	Malay
ml	Malayalam
mt	Maltese
mi	Maori
mr	Marathi
mas	Masai
nan	Min Nan Chinese
nan-TW	Min Nan Chinese (Taiwan)
mo	Moldavian
mn	Mongolian
my	Myanmar (Burmese)
na	Nauru
nv	Navajo
ne	Nepali
no	Norwegian
oc	Occitan
or	Odia
om	Oromo
ps	Pashto
fa	Persian
fa-AF	Persian (Afghanistan)
fa-IR	Persian (Iran)
pl	Polish
pt	Portuguese
pt-BR	Portuguese (Brazil)
pt-PT	Portuguese (Portugal)
pa	Punjabi
qu	Quechua
ro	Romanian
rm	Romansh
rn	Rundi
ru	Russian
ru-Latn	Russian (Phonetic)
sm	Samoan
sg	Sango
sa	Sanskrit
gd	Scottish Gaelic
sr	Serbian
sr-Cyrl	Serbian (Cyrillic)
sr-Latn	Serbian (Latin)
sh	Serbo-Croatian
sdp	Sherdukpen
sn	Shona
sd	Sindhi
si	Sinhala
sk	Slovak
sl	Slovenian
so	Somali
st	Southern Sotho
es	Spanish
es-419	Spanish (Latin America)
es-MX	Spanish (Mexico)
es-ES	Spanish (Spain)
su	Sundanese
sw	Swahili
ss	Swati
sv	Swedish
tl	Tagalog
tg	Tajik
ta	Tamil
tt	Tatar
te	Telugu
th	Thai
bo	Tibetan
ti	Tigrinya
to	Tongan
ts	Tsonga
tn	Tswana
tr	Turkish
tk	Turkmen
tw	Twi
uk	Ukrainian
ur	Urdu
uz	Uzbek
vi	Vietnamese
vo	Volapük
cy	Welsh
fy	Western Frisian
wo	Wolof
xh	Xhosa
yi	Yiddish
yo	Yoruba
zu	Zulu

Source: https://gist.github.com/stpe/f0ef216bda12ffed8b939a455f0d4b65
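The discovery approach described in this section, reading available_captions from a scraped item, can be sketched as a client-side helper. It assumes items are Python dicts. The function name pick_language_code and its fallback rule (exact match first, then any regional variant of the same base language) are our own illustration and not the Actor's internal selection logic.

```python
from typing import Optional


def pick_language_code(item: dict, preferred: str,
                       allow_generated: bool = True) -> Optional[str]:
    """Choose a caption_language value from an item's available_captions."""
    tracks = item.get("available_captions") or []
    if not allow_generated:
        tracks = [t for t in tracks if not t["generated"]]
    codes = [t["language_code"] for t in tracks]
    if preferred in codes:
        return preferred  # exact match, e.g. "en"
    base = preferred.split("-")[0]
    for code in codes:
        if code.split("-")[0] == base:
            return code  # regional variant, e.g. "en-GB" for "en"
    return None
```

A None result means the video declares no track for that language, so a second run with a different caption_language would be needed.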

🚀 Running the Actor

Register or log in to Apify.
Open the Actor in Apify Console and configure your preferred input (inline or JSON).
Start the run and observe progress in the live log stream.
Download the dataset once the run finishes.
