
YouTube Subtitle & Transcript Scraper


Extract YouTube subtitles & transcripts from videos, Shorts, playlists, and channels. Output as JSON, SRT, VTT, or clean LLM-ready text. 100+ languages. Rich metadata: views, description, thumbnail. Multi-fallback engine for maximum reliability. Fair billing — failures are free.

Pricing

from $5.00 / 1,000 transcripts extracted




Extract subtitles and transcripts from any YouTube video — fast, reliable, and ready for AI pipelines.

Supports single videos, Shorts, playlists, and entire channels. Works with 100+ languages including auto-generated captions.

What you get

For each video, the scraper returns:

  • Full transcript text with timestamps
  • Rich video metadata — title, channel, description, view count, thumbnail, publish date
  • Language info — detected language, auto-generated flag, all available languages listed
  • Multiple output formats — pick what fits your workflow

Output formats

Format | Best for
JSON | Apps, databases, APIs — structured data with timestamps per segment
SRT | Video editors, media players — standard subtitle file format
VTT | Web players, HTML5 video — WebVTT subtitle format
Text | Search indexing, content analysis — plain text joined together
LLM | AI/ML pipelines, RAG, fine-tuning — clean text with annotations stripped

The LLM format automatically removes [Music], [Applause], speaker labels, and other non-speech annotations so you get pure spoken content ready for language models.
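To give a feel for that cleanup, here is a rough sketch of the kind of filtering the LLM format applies. This is our own illustration, not the actor's implementation, and the annotation patterns are assumptions:

import re

def strip_annotations(segments):
    # Illustrative only: drop bracketed cues like [Music] or [Applause]
    # and assumed speaker labels like "HOST:" at the start of a segment.
    cleaned = []
    for seg in segments:
        text = re.sub(r"\[[^\]]*\]", "", seg["text"])
        text = re.sub(r"^[A-Z][A-Z .'-]+:\s*", "", text)
        text = text.strip()
        if text:
            cleaned.append(text)
    return " ".join(cleaned)

print(strip_annotations([
    {"text": "[Music]"},
    {"text": "We're no strangers to love"},
]))
# -> "We're no strangers to love"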

Supported URL types

You can pass any of these as input:

  • https://www.youtube.com/watch?v=dQw4w9WgXcQ — standard video
  • https://youtu.be/dQw4w9WgXcQ — short link
  • https://www.youtube.com/shorts/dQw4w9WgXcQ — YouTube Shorts
  • https://www.youtube.com/playlist?list=PLxxxxx — full playlist
  • https://www.youtube.com/@channelname — all videos from a channel
  • dQw4w9WgXcQ — just the video ID

Mix and match in a single run — the scraper handles them all.
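For instance, a single run could combine several of the supported URL types in one input (using the placeholder playlist and channel URLs from the list above):

{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://www.youtube.com/shorts/dQw4w9WgXcQ",
    "https://www.youtube.com/playlist?list=PLxxxxx",
    "https://www.youtube.com/@channelname",
    "dQw4w9WgXcQ"
  ]
}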

Input options

Option | Default | Description
urls | | List of YouTube URLs or video IDs to process
outputFormat | json | Output format: json, srt, vtt, text, or llm
languages | ["en"] | Preferred languages in priority order (e.g. ["en", "ja", "de"])
includeAutoGenerated | true | Use YouTube's auto-generated captions when manual ones aren't available
maxVideos | 0 (unlimited) | Limit how many videos to process from playlists/channels
maxConcurrency | 3 | How many videos to process in parallel (1–10)
proxy | Apify Proxy | Proxy settings — residential proxies recommended

You can also use startUrls (the [{url: "..."}] format) instead of urls — both work.

Example input

{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/JGwWNGJdvx8"
  ],
  "outputFormat": "llm",
  "languages": ["en"],
  "maxConcurrency": 2
}
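
If you run the actor programmatically, the same input can be passed through the Apify API client. The sketch below uses the Python client; the actor ID is a placeholder, so substitute the real ID from the Apify Store page:

from apify_client import ApifyClient

ACTOR_ID = "username/youtube-subtitle-transcript-scraper"  # placeholder; use the real actor ID

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor(ACTOR_ID).call(run_input={
    "urls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtu.be/JGwWNGJdvx8",
    ],
    "outputFormat": "llm",
    "languages": ["en"],
    "maxConcurrency": 2,
})

# Each successfully processed video is one item in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["videoId"], item.get("wordCount"), item.get("error"))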

Example output

Each video produces one result in the dataset:

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
  "channelName": "Rick Astley",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "description": "The official video for \"Never Gonna Give You Up\" by Rick Astley...",
  "publishDate": "2009-10-25",
  "viewCount": 1761003712,
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/sddefault.jpg",
  "availableLanguages": ["en", "de-DE", "ja", "pt-BR", "es-419"],
  "language": "en",
  "languageName": "English",
  "isAutoGenerated": false,
  "duration": 213,
  "wordCount": 487,
  "segmentCount": 61,
  "text": "We're no strangers to love, you know the rules and so do I...",
  "segments": [
    { "text": "We're no strangers to love", "start": 18.64, "end": 21.88 },
    { "text": "You know the rules and so do I", "start": 22.64, "end": 26.96 }
  ],
  "extractedAt": "2026-04-10T07:00:00.000Z",
  "error": null
}

When using SRT or VTT format, the result includes an srt or vtt field with the formatted subtitle file content.
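As an illustration, assuming a finished run that used outputFormat: "srt", you could write those fields out to subtitle files like this (only the srt and videoId fields come from the output above; the rest is our own glue code):

from pathlib import Path

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
DATASET_ID = "YOUR_DATASET_ID"  # defaultDatasetId of a finished run

for item in client.dataset(DATASET_ID).iterate_items():
    if item.get("srt"):
        # One .srt file per video, named after its video ID.
        Path(f"{item['videoId']}.srt").write_text(item["srt"], encoding="utf-8")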

Recommendations

For best results:

  • Use residential proxies (the default) — they work much better with YouTube than datacenter proxies (see the example configuration after this list)
  • Start with maxConcurrency: 1 if you're processing many videos, then increase gradually
  • Set languages to your target language — the scraper picks the best available match
  • Use the LLM format if you're feeding transcripts into AI models — it strips all the noise
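
For the proxy option, the field follows Apify's standard proxy configuration object; assuming this actor uses that standard schema, a residential setup would typically look like this:

{
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}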

For large jobs:

  • Use playlists or channel URLs to batch-process videos in one run
  • Set maxVideos to limit playlist/channel scrapes during testing
  • The scraper handles failures gracefully — if one video fails, the rest still process. Failed videos show up in the results with an error field so you can retry them later

For AI/ML workflows:

  • The LLM output format gives you clean, annotation-free text optimized for context windows
  • JSON format preserves timestamps, which is useful for building time-aligned datasets
  • The segments array gives you natural sentence boundaries from the original captions (see the chunking sketch after this list)
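
As a sketch of that last point, here is one way to turn the segments array into time-aligned chunks for a retrieval pipeline. The chunking policy is our own; only the segment fields (text, start, end) come from the output documented above:

def chunk_segments(segments, max_chars=500):
    # Group consecutive caption segments into ~max_chars chunks, keeping the
    # start time of the first segment and the end time of the last one.
    chunks, current, length = [], [], 0

    def flush():
        if current:
            chunks.append({
                "text": " ".join(s["text"] for s in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })

    for seg in segments:
        if current and length + len(seg["text"]) > max_chars:
            flush()
            current, length = [], 0
        current.append(seg)
        length += len(seg["text"]) + 1
    flush()
    return chunks

# Example with the two segments from the output above:
print(chunk_segments([
    {"text": "We're no strangers to love", "start": 18.64, "end": 21.88},
    {"text": "You know the rules and so do I", "start": 22.64, "end": 26.96},
]))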

Fair billing

You're never charged for videos that fail to extract. You only pay for successful results.

Language support

The scraper supports all languages that YouTube captions are available in — over 100 languages. Set your preferred languages in priority order and the scraper will pick the best available match.

If manual captions aren't available in your language, YouTube's auto-generated captions are used as a fallback (unless you disable this with includeAutoGenerated: false).
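
For example, to prefer Japanese manual captions, fall back to English, and skip auto-generated tracks entirely, the input would look like this:

{
  "urls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "languages": ["ja", "en"],
  "includeAutoGenerated": false
}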

Error handling

The scraper is designed to be resilient:

  • If a video has no captions, it reports the error and moves on
  • If YouTube rate-limits a request, the scraper retries with backoff
  • If one extraction method fails, it automatically tries alternatives
  • Failed videos appear in the dataset with a descriptive error field — successful videos have error: null
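
Because failed videos stay in the dataset with their url and error fields, retrying them is a matter of collecting those URLs and starting a new run. A minimal sketch with the Python client, again using a placeholder actor ID:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
ACTOR_ID = "username/youtube-subtitle-transcript-scraper"  # placeholder; use the real actor ID
DATASET_ID = "YOUR_DATASET_ID"  # dataset of the run you want to retry from

# Collect the URLs of videos whose extraction failed.
failed_urls = [
    item["url"]
    for item in client.dataset(DATASET_ID).iterate_items()
    if item.get("error")
]

if failed_urls:
    # Start a new run with only the failed videos; failed extractions were not billed.
    client.actor(ACTOR_ID).call(run_input={"urls": failed_urls})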

Need help?

If you run into issues or have questions, open an issue on the Apify Store page.