YouTube Transcript Scraper
Pricing
$20.00/month + usage
Get transcripts from YouTube videos and Shorts as plain text or structured timestamped segments. Results come with title, description, likes, channel details, and other metadata.
Developer

Embion
▶️ Scrape transcripts and metadata from YouTube videos and Shorts
This Actor gets transcripts supplied by the video creator or generated by YouTube. It works in two modes: full text or structured segments with exact timestamps. Built for automation pipelines: stable output, reliable retries, structured error codes, and proxy support.
Features:
- Extracts transcripts as plain text or timestamped segments
- Includes title, description, keywords, category, duration, publish date, channel name, subscriber count, and more
- Supports HTML and plain text transcript formats
- Writes consistent structure even when transcripts are missing
- Retries on failures, tracks error reasons, produces structured error items
- Residential proxies for highest reliability (recommended setting)
If you want to see the exact output format, check the section "Successful item example".
📦 Output dataset
Each processed URL produces one dataset item with the following structure:
✅ Successful item example
Some long texts are truncated with "...etc" to keep the examples readable. The real dataset output contains the full values without truncation.
Note that some videos have no transcript in the requested language. In that case the Actor still writes an item to the dataset, with both caption_text and captions_structured set to null.
Example output when with_timestamps is true (enabled). The captions_structured field is filled while caption_text is null:
{
  "url": "https://youtu.be/dqwpQarrDwk", // URL provided as input in 'start_urls'
  "id": "dqwpQarrDwk", // video ID
  "title": "1,000km Cable to the Stars - The Skyhook", // video title
  "channel_name": "Kurzgesagt – In a Nutshell", // name of the channel which posted the video
  "channel_id": "UCsXVk37bltHxD1rDPwtNM8Q", // ID of the channel which posted the video
  "channel_url": "http://www.youtube.com/@kurzgesagt", // URL of the channel which posted the video
  "channel_subscribers_text": "24.8M subscribers", // subscriber count as shown on video page
  "channel_subscribers": 24800000, // subscriber count parsed as number
  "category": "Education", // category of the video
  "duration": 420, // total duration of the video in seconds, 0 if livestream
  "view_count": 12705758, // total number of views from microdata
  "like_count": 409346, // total number of likes from microdata
  "published_date": "2019-11-17T13:30:03.000Z", // when the video was published from microdata
  "published_date_text": "Nov 17, 2019", // when the video was published as it appears on the website
  "keywords": [ // keywords of the video
    "Skyhook",
    "Spacetether",
    "Tether",
    "Space",
    "Spavetravel",
    ...etc
  ],
  "caption_lang": "en", // language code of captions
  "caption_generated": false, // true if captions are auto-generated
  "caption_text": null, // null when "with_timestamps" input is true
  "captions_structured": [ // filled when "with_timestamps" is true and captions (transcript) exist
    {
      "start_ms": "1200", // milliseconds from start of the video when the text appears, guaranteed to be non-null if object exists
      "end_ms": "2920", // milliseconds from start of the video when the text hides, guaranteed to be non-null if object exists
      "snippet": "Getting to space is hard." // actual text in subtitles, may be null
    },
    {
      "start_ms": "3080",
      "end_ms": "6580",
      "snippet": "Right now, it’s like going up on a mountain on a unicycle-"
    },
    ...etc
  ],
  "available_captions": [ // list of all captions YouTube declares for the video
    {
      "language_code": "sq", // guaranteed to be non-null if object exists
      "name": "Albanian", // may be null depending on what YouTube returns
      "generated": false // guaranteed to be non-null if object exists
    },
    {
      "language_code": "ar",
      "name": "Arabic",
      "generated": false
    },
    ...etc
  ],
  "unlisted": false, // video requires direct URL and is hidden from search
  "live": false, // video is a livestream
  "error_code": null, // null on success
  "description": "Sources: https://sites.google.com/view/sources-skyhooks/\n\nGet your 12,020 SPACE Calendar here: https://shop.kurzgesagt.org/\nWORLDWIDE SHIPPING IS AVAILABLE!\n\nGetting to space is incredibly hard, expensive and needs a lot of resources. \nA more efficient way to get there is a Skyhook (or Spacetether), an ever rotating cable with a counter weight, that catapults spaceships from earth orbit into the depths of space. \n\n\nOUR CHANNELS\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nGerman Channel: https://kgs.link/youtubeDE \nSpanish Channel: https://kgs.link/youtubeES \n\n\nHOW CAN YOU SUPPORT US?\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nThis is how we make our living and it would be a pleasure if you support us!\n\nGet Merch designed with ❤ from https://kgs.link/shop" ...etc // raw description as shown below video
}
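As a rough illustration of how the timestamped segments might be consumed downstream, the sketch below turns captions_structured into readable timestamped lines. The helper names format_ms and segments_to_lines are hypothetical and not part of the Actor. The sketch assumes start_ms and end_ms arrive as numeric strings, as in the example above, and it skips segments whose snippet is null.

```python
def format_ms(ms: int) -> str:
    """Render a millisecond offset as HH:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def segments_to_lines(item: dict) -> list:
    """Return one '[start --> end] text' line per segment that has text."""
    lines = []
    # captions_structured is null when no transcript exists
    # or when with_timestamps was false, so fall back to an empty list.
    for segment in item.get("captions_structured") or []:
        snippet = segment.get("snippet")
        if snippet is None:  # the snippet itself may be null
            continue
        # Assumption: offsets are numeric strings, so convert before formatting.
        start = int(segment["start_ms"])
        end = int(segment["end_ms"])
        lines.append(f"[{format_ms(start)} --> {format_ms(end)}] {snippet}")
    return lines


example = {
    "captions_structured": [
        {"start_ms": "1200", "end_ms": "2920", "snippet": "Getting to space is hard."},
        {"start_ms": "3080", "end_ms": "6580", "snippet": None},
    ]
}
print("\n".join(segments_to_lines(example)))
# prints: [00:00:01.200 --> 00:00:02.920] Getting to space is hard.
```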
Example output when with_timestamps is false (disabled). The caption_text field is filled while captions_structured is null:
{
  "url": "https://youtu.be/dqwpQarrDwk",
  "id": "dqwpQarrDwk",
  "title": "1,000km Cable to the Stars - The Skyhook",
  "channel_name": "Kurzgesagt – In a Nutshell",
  "channel_id": "UCsXVk37bltHxD1rDPwtNM8Q",
  "channel_url": "http://www.youtube.com/@kurzgesagt",
  "channel_subscribers_text": "24.8M subscribers",
  "channel_subscribers": 24800000,
  "category": "Education",
  "duration": 420,
  "view_count": 12705758,
  "like_count": 409346,
  "published_date": "2019-11-17T13:30:03.000Z",
  "published_date_text": "Nov 17, 2019",
  "keywords": [
    "Skyhook",
    "Spacetether",
    "Tether",
    "Space",
    "Spavetravel",
    ...etc
  ],
  "caption_lang": "en",
  "caption_generated": false,
  "caption_text": "Getting to space is hard. Right now, it’s like going up on a mountain on a unicycle- with a backpack full of explosives. Incredibly slow, you can’t transport a lot of stuff, and you might die. A rocket needs to reach a velocity about 40,000km an hour to escape from Earth. To get to that speed, rockets are mostly containers for fuel with a tiny tip of payload. This is bad if you want to go to other planets, because you need a lot of heavy stuff if you want to survive, and maybe even come back. So, is there a way to get to space with less fuel and more payload? A nice thing that solved most of our transport problems on Earth is what you call infrastructure. Whether it’s roads for cars, ports for ships, or rails for trains, we’ve made it easier to get to places. We can apply the same solution to space travel. Space infrastructure will make getting into orbit and out to the Moon, Mars, and beyond easier and cheaper." ...etc,
  "captions_structured": null,
  "available_captions": [
    {
      "language_code": "sq",
      "name": "Albanian",
      "generated": false
    },
    {
      "language_code": "ar",
      "name": "Arabic",
      "generated": false
    },
    ...etc
  ],
  "unlisted": false,
  "live": false,
  "error_code": null,
  "description": "Sources: https://sites.google.com/view/sources-skyhooks/\n\nGet your 12,020 SPACE Calendar here: https://shop.kurzgesagt.org/\nWORLDWIDE SHIPPING IS AVAILABLE!\n\nGetting to space is incredibly hard, expensive and needs a lot of resources. \nA more efficient way to get there is a Skyhook (or Spacetether), an ever rotating cable with a counter weight, that catapults spaceships from earth orbit into the depths of space. \n\n\nOUR CHANNELS\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nGerman Channel: https://kgs.link/youtubeDE \nSpanish Channel: https://kgs.link/youtubeES \n\n\nHOW CAN YOU SUPPORT US?\n▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀\nThis is how we make our living and it would be a pleasure if you support us!\n\nGet Merch designed with ❤ from https://kgs.link/shop" ...etc
}
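Because only one of the two transcript fields is filled, depending on with_timestamps, downstream code may want a single accessor that works for both modes. The following is a minimal sketch, assuming items shaped like the examples above. The name transcript_text is hypothetical, and joining snippets with a single space is an assumption about how you want the text assembled.

```python
from typing import Optional


def transcript_text(item: dict) -> Optional[str]:
    """Return the transcript as one string, or None when the item has no transcript."""
    text = item.get("caption_text")
    if text is not None:
        # with_timestamps was false and a transcript exists
        return text
    segments = item.get("captions_structured")
    if segments is None:
        # Both fields are null: no transcript for the requested language,
        # or the item is an error record.
        return None
    # with_timestamps was true: join the segments, skipping null snippets
    return " ".join(s["snippet"] for s in segments if s.get("snippet"))


plain_item = {"caption_text": "Getting to space is hard.", "captions_structured": None}
timed_item = {
    "caption_text": None,
    "captions_structured": [
        {"start_ms": "1200", "end_ms": "2920", "snippet": "Getting to space is hard."},
        {"start_ms": "3080", "end_ms": "6580", "snippet": None},
        {"start_ms": "6600", "end_ms": "9000", "snippet": "Incredibly slow."},
    ],
}
missing_item = {"caption_text": None, "captions_structured": None}

print(transcript_text(plain_item))
print(transcript_text(timed_item))
print(transcript_text(missing_item))
```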
❌ Error item example
If the Actor fails to get video information for a given URL, it writes an error record to the dataset. This lets your downstream automation verify that the Actor actually attempted the URL.
The Actor can access only publicly available videos, including unlisted ones. Errors may happen for any of the following reasons:
- Anti-bot protection or suspicious traffic block
- The URL is not a valid YouTube video page (homepage, channel, search, Shorts feed, etc.)
- Video was unpublished, deleted, set to private, or expired
- Video requires login due to age restriction or membership-only access
- Video is blocked in the region of your proxy
- Redirect or network issue prevented resolving the video URL
- Transcripts exist but the selected track could not be fetched or parsed
- Critical metadata (e.g., video ID, title, or caption manifest) was missing
- Global or regional YouTube outages
Here is what the dataset record looks like when the Actor fails to fetch information about a specific video:
{"url": "https://example.com/","error_code": "not_youtube"}
{"url": "https://youtube.com/","error_code": "invalid_page_type"}
{"url": "https://www.youtube.com/watch?v=aAkMkVFwAoX","error_code": "video_unavailable"}
List of possible values written to error_code field of the dataset:
| Code | Meaning |
|---|---|
| not_youtube | Input link is not recognised as a valid YouTube video URL. |
| resolve_error | The link could not be resolved to a playable video (redirect or network issue). |
| invalid_page_type | The page type is unsupported (for example, experimental formats). |
| transcript_fetch_error | Transcript metadata could not be retrieved. |
| transcript_selection_error | Transcript metadata exists but the selected track failed to load. |
| missing_critical_data | Essential metadata was missing, preventing a complete record. |
| video_info_fetch_error | Video metadata retrieval returned an unexpected response. |
| video_unavailable | The video is blocked, private, removed, or otherwise unavailable. |
| failed | All retries were exhausted due to an unexpected error. |
| null | No error encountered. |
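Since every input URL yields exactly one item, a pipeline can separate successes from failures by checking error_code alone. The sketch below shows one possible way to do that and to count failures per code. The function names split_items and error_summary are hypothetical, and the sample items are reduced to the fields the sketch reads.

```python
from collections import Counter


def split_items(items):
    """Partition dataset items into (successful, failed) by error_code."""
    ok = [item for item in items if item.get("error_code") is None]
    failed = [item for item in items if item.get("error_code") is not None]
    return ok, failed


def error_summary(items):
    """Count how often each error code occurs among the failed items."""
    return Counter(
        item["error_code"] for item in items if item.get("error_code") is not None
    )


items = [
    {"url": "https://youtu.be/dqwpQarrDwk", "error_code": None},
    {"url": "https://example.com/", "error_code": "not_youtube"},
    {"url": "https://youtube.com/", "error_code": "invalid_page_type"},
    {"url": "https://www.youtube.com/watch?v=aAkMkVFwAoX", "error_code": "video_unavailable"},
]

ok, failed = split_items(items)
print(len(ok), "succeeded,", len(failed), "failed")
for code, count in sorted(error_summary(items).items()):
    print(code, count)
```

Which codes are worth re-submitting is a judgment call for your pipeline; the table above does not state which errors are transient.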
⚙️ Inputs
| Field | Type | Description |
|---|---|---|
| start_urls | array of request objects | Each entry must include a url that points to a YouTube video or Shorts page. Optional HTTP method and headers are supported. |
| caption_language | string | Language code prioritised for transcripts (for example en, es, de). |
| with_description | boolean | Include the video description text in the output when true. |
| with_timestamps | boolean | Emit timestamped transcript segments when true in captions_structured; otherwise a single transcript string in caption_text. |
| allow_generated_captions | boolean | Fall back to auto-generated transcripts if creator-supplied transcripts are unavailable. |
| caption_format | string | Accepts plain_text or html, applied to transcript fields. |
| concurrency | integer | Maximum number of videos processed simultaneously. Tune to match your proxy capacity. |
| max_retries | integer | Number of retry attempts per video before an error item is written. |
| proxy | object | Standard Apify proxy configuration payload. We recommend enabling residential proxies for reliable results. |
Example input
{
  "allow_generated_captions": true,
  "caption_format": "plain_text",
  "caption_language": "en",
  "concurrency": 10,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "start_urls": [
    {
      "url": "https://youtu.be/dqwpQarrDwk"
    }
  ],
  "with_description": true,
  "with_timestamps": true,
  "max_retries": 5
}
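When the input is generated by a script rather than typed by hand, a small builder can catch obvious mistakes before a run starts. The sketch below is one way to assemble the input from a list of URLs. The function build_run_input is hypothetical, its default values simply mirror the example input above and are not the Actor's documented defaults, and the range checks on concurrency and max_retries are assumptions.

```python
import json

# The inputs table lists these two values for caption_format.
ALLOWED_CAPTION_FORMATS = {"plain_text", "html"}


def build_run_input(
    urls,
    caption_language="en",
    with_timestamps=True,
    with_description=True,
    allow_generated_captions=True,
    caption_format="plain_text",
    concurrency=10,
    max_retries=5,
    use_residential_proxy=True,
):
    """Build an input dictionary shaped like the example input."""
    if not urls:
        raise ValueError("at least one URL is required")
    if caption_format not in ALLOWED_CAPTION_FORMATS:
        raise ValueError(f"caption_format must be one of {sorted(ALLOWED_CAPTION_FORMATS)}")
    # Assumed sanity limits; the README does not state the allowed ranges.
    if concurrency < 1 or max_retries < 0:
        raise ValueError("concurrency must be >= 1 and max_retries must be >= 0")

    proxy = {"useApifyProxy": True}
    if use_residential_proxy:
        proxy["apifyProxyGroups"] = ["RESIDENTIAL"]

    return {
        "start_urls": [{"url": url} for url in urls],
        "caption_language": caption_language,
        "with_timestamps": with_timestamps,
        "with_description": with_description,
        "allow_generated_captions": allow_generated_captions,
        "caption_format": caption_format,
        "concurrency": concurrency,
        "max_retries": max_retries,
        "proxy": proxy,
    }


run_input = build_run_input(["https://youtu.be/dqwpQarrDwk"])
print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the JSON input editor or passed as the run input through an Apify API client.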
💬 Language codes for transcripts
There is no complete list of possible language codes for the caption_language input field, because some codes may be internal and undocumented. The most reliable way to discover the codes is to scrape a few videos you are interested in and check the available_captions.language_code field in the resulting dataset.
That said, the following list closely matches the set of language codes we have seen on YouTube:
| Code | Language |
|---|---|
| ab | Abkhazian |
| aa | Afar |
| af | Afrikaans |
| sq | Albanian |
| ase | American Sign Language |
| am | Amharic |
| ar | Arabic |
| arc | Aramaic |
| hy | Armenian |
| as | Assamese |
| ay | Aymara |
| az | Azerbaijani |
| bn | Bangla |
| ba | Bashkir |
| eu | Basque |
| be | Belarusian |
| bh | Bihari |
| bi | Bislama |
| bs | Bosnian |
| br | Breton |
| bg | Bulgarian |
| yue | Cantonese |
| yue-HK | Cantonese (Hong Kong) |
| ca | Catalan |
| chr | Cherokee |
| zh | Chinese |
| zh-CN | Chinese (China) |
| zh-HK | Chinese (Hong Kong) |
| zh-Hans | Chinese (Simplified) |
| zh-SG | Chinese (Singapore) |
| zh-TW | Chinese (Taiwan) |
| zh-Hant | Chinese (Traditional) |
| cho | Choctaw |
| co | Corsican |
| hr | Croatian |
| cs | Czech |
| da | Danish |
| nl | Dutch |
| nl-BE | Dutch (Belgium) |
| nl-NL | Dutch (Netherlands) |
| dz | Dzongkha |
| en | English |
| en-CA | English (Canada) |
| en-IE | English (Ireland) |
| en-GB | English (United Kingdom) |
| en-US | English (United States) |
| eo | Esperanto |
| et | Estonian |
| fo | Faroese |
| fj | Fijian |
| fil | Filipino |
| fi | Finnish |
| fr | French |
| fr-BE | French (Belgium) |
| fr-CA | French (Canada) |
| fr-FR | French (France) |
| fr-CH | French (Switzerland) |
| gl | Galician |
| ka | Georgian |
| de | German |
| de-AT | German (Austria) |
| de-DE | German (Germany) |
| de-CH | German (Switzerland) |
| el | Greek |
| kl | Greenlandic (Kalaallisut) |
| gn | Guarani |
| gu | Gujarati |
| hak | Hakka Chinese |
| hak-TW | Hakka Chinese (Taiwan) |
| ha | Hausa |
| iw | Hebrew |
| hi | Hindi |
| hi-Latn | Hindi (Phonetic) |
| hu | Hungarian |
| is | Icelandic |
| ig | Igbo |
| id | Indonesian |
| ia | Interlingua |
| ie | Interlingue |
| iu | Inuktitut |
| ik | Inupiaq |
| ga | Irish |
| it | Italian |
| ja | Japanese |
| jv | Javanese |
| kn | Kannada |
| ks | Kashmiri |
| kk | Kazakh |
| km | Khmer |
| rw | Kinyarwanda |
| tlh | Klingon |
| ko | Korean |
| ku | Kurdish |
| ky | Kyrgyz |
| lo | Lao |
| la | Latin |
| lv | Latvian |
| ln | Lingala |
| lt | Lithuanian |
| lb | Luxembourgish |
| mk | Macedonian |
| mg | Malagasy |
| ms | Malay |
| ml | Malayalam |
| mt | Maltese |
| mi | Maori |
| mr | Marathi |
| mas | Masai |
| nan | Min Nan Chinese |
| nan-TW | Min Nan Chinese (Taiwan) |
| mo | Moldavian |
| mn | Mongolian |
| my | Myanmar (Burmese) |
| na | Nauru |
| nv | Navajo |
| ne | Nepali |
| no | Norwegian |
| oc | Occitan |
| or | Odia |
| om | Oromo |
| ps | Pashto |
| fa | Persian |
| fa-AF | Persian (Afghanistan) |
| fa-IR | Persian (Iran) |
| pl | Polish |
| pt | Portuguese |
| pt-BR | Portuguese (Brazil) |
| pt-PT | Portuguese (Portugal) |
| pa | Punjabi |
| qu | Quechua |
| ro | Romanian |
| rm | Romansh |
| rn | Rundi |
| ru | Russian |
| ru-Latn | Russian (Phonetic) |
| sm | Samoan |
| sg | Sango |
| sa | Sanskrit |
| gd | Scottish Gaelic |
| sr | Serbian |
| sr-Cyrl | Serbian (Cyrillic) |
| sr-Latn | Serbian (Latin) |
| sh | Serbo-Croatian |
| sdp | Sherdukpen |
| sn | Shona |
| sd | Sindhi |
| si | Sinhala |
| sk | Slovak |
| sl | Slovenian |
| so | Somali |
| st | Southern Sotho |
| es | Spanish |
| es-419 | Spanish (Latin America) |
| es-MX | Spanish (Mexico) |
| es-ES | Spanish (Spain) |
| su | Sundanese |
| sw | Swahili |
| ss | Swati |
| sv | Swedish |
| tl | Tagalog |
| tg | Tajik |
| ta | Tamil |
| tt | Tatar |
| te | Telugu |
| th | Thai |
| bo | Tibetan |
| ti | Tigrinya |
| to | Tongan |
| ts | Tsonga |
| tn | Tswana |
| tr | Turkish |
| tk | Turkmen |
| tw | Twi |
| uk | Ukrainian |
| ur | Urdu |
| uz | Uzbek |
| vi | Vietnamese |
| vo | Volapük |
| cy | Welsh |
| fy | Western Frisian |
| wo | Wolof |
| xh | Xhosa |
| yi | Yiddish |
| yo | Yoruba |
| zu | Zulu |
Source: https://gist.github.com/stpe/f0ef216bda12ffed8b939a455f0d4b65
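The discovery approach described at the start of this section can be sketched as a small helper that collects the codes from scraped items. The function name available_language_codes and its include_generated switch are hypothetical. The sketch assumes each item carries available_captions as shown in the output examples and tolerates error items that lack the field.

```python
def available_language_codes(items, include_generated=True):
    """Return the sorted, de-duplicated language codes declared across items."""
    codes = set()
    for item in items:
        # Error items carry no available_captions, so treat a missing list as empty.
        for track in item.get("available_captions") or []:
            if track["generated"] and not include_generated:
                continue  # optionally keep creator-supplied tracks only
            codes.add(track["language_code"])
    return sorted(codes)


items = [
    {
        "available_captions": [
            {"language_code": "sq", "name": "Albanian", "generated": False},
            {"language_code": "ar", "name": "Arabic", "generated": False},
            {"language_code": "en", "name": None, "generated": True},
        ]
    },
    {"url": "https://example.com/", "error_code": "not_youtube"},
]

print(available_language_codes(items))
print(available_language_codes(items, include_generated=False))
```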
🚀 Running the Actor
- Register or log in to Apify.
- Open the Actor in Apify Console and configure your preferred input (inline or JSON).
- Start the run and observe progress in the live log stream.
- Download the dataset once the run finishes.