YouTube Subtitle & Transcript Scraper
Pricing
from $5.00 / 1,000 extracted transcripts
Extract YouTube subtitles & transcripts from videos, Shorts, playlists, and channels. Output as JSON, SRT, VTT, or clean LLM-ready text. 100+ languages. Rich metadata: views, description, thumbnail. Multi-fallback engine for maximum reliability. Fair billing — failures are free.
Developer
Richard Feng
Extract subtitles and transcripts from any YouTube video — fast, reliable, and ready for AI pipelines.
Supports single videos, Shorts, playlists, and entire channels. Works with 100+ languages including auto-generated captions.
What you get
For each video, the scraper returns:
- Full transcript text with timestamps
- Rich video metadata — title, channel, description, view count, thumbnail, publish date
- Language info — detected language, auto-generated flag, all available languages listed
- Multiple output formats — pick what fits your workflow
Output formats
| Format | Best for |
|---|---|
| JSON | Apps, databases, APIs — structured data with timestamps per segment |
| SRT | Video editors, media players — standard subtitle file format |
| VTT | Web players, HTML5 video — WebVTT subtitle format |
| Text | Search indexing, content analysis — plain text joined together |
| LLM | AI/ML pipelines, RAG, fine-tuning — clean text with annotations stripped |
The LLM format automatically removes [Music], [Applause], speaker labels, and other non-speech annotations so you get pure spoken content ready for language models.
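To illustrate the kind of cleanup described above, here is a minimal sketch of annotation stripping. It is not the scraper's actual implementation: the regular expressions, the speaker-label conventions (">>" and "NAME:"), and the function names are assumptions made for this example.

```python
import re

# Bracketed cues such as [Music] or [Applause].
_ANNOTATION = re.compile(r"\[[^\]]*\]")
# Leading speaker markers: ">>" and/or an upper-case "NAME:" label.
# This is a heuristic and an assumption, not the scraper's documented rule.
_SPEAKER = re.compile(r"^(?:>>+\s*)?(?:[A-Z][A-Z .'-]{1,30}:\s*)?")


def clean_for_llm(text: str) -> str:
    """Remove bracketed annotations and a leading speaker label."""
    text = _ANNOTATION.sub(" ", text).strip()
    text = _SPEAKER.sub("", text, count=1)
    return re.sub(r"\s+", " ", text).strip()


def segments_to_llm_text(segments) -> str:
    """Clean each caption segment and join the non-empty results."""
    cleaned = (clean_for_llm(seg["text"]) for seg in segments)
    return " ".join(part for part in cleaned if part)


example = [
    {"text": "[Music]"},
    {"text": ">> We're no strangers to love"},
    {"text": "You know the rules [Applause] and so do I"},
]
print(segments_to_llm_text(example))
```

A label heuristic like this can misfire on text that legitimately starts with an upper-case word followed by a colon, so treat it as a starting point only.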
Supported URL types
You can pass any of these as input:
- https://www.youtube.com/watch?v=dQw4w9WgXcQ (standard video)
- https://youtu.be/dQw4w9WgXcQ (short link)
- https://www.youtube.com/shorts/dQw4w9WgXcQ (YouTube Shorts)
- https://www.youtube.com/playlist?list=PLxxxxx (full playlist)
- https://www.youtube.com/@channelname (all videos from a channel)
- dQw4w9WgXcQ (just the video ID)
Mix and match in a single run — the scraper handles them all.
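If you want to pre-check inputs before starting a run, the URL shapes listed above can be told apart with the standard library. This is a sketch under assumptions: the function name `classify_input` and the 11-character video ID pattern are choices made for this example, and the scraper's own parsing may differ.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are commonly 11 URL-safe characters (assumed here).
VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com"}


def classify_input(value: str) -> tuple[str, str]:
    """Return (kind, identifier); kind is 'video', 'playlist', or 'channel'."""
    value = value.strip()
    if VIDEO_ID.match(value):
        return ("video", value)

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower().removeprefix("www.")
    query = parse_qs(parsed.query)
    path = parsed.path

    if host == "youtu.be" and len(path) > 1:
        return ("video", path.lstrip("/"))
    if host in YOUTUBE_HOSTS:
        if path == "/watch" and "v" in query:
            return ("video", query["v"][0])
        if path.startswith("/shorts/"):
            return ("video", path.split("/")[2])
        if path == "/playlist" and "list" in query:
            return ("playlist", query["list"][0])
        if path.startswith("/@"):
            return ("channel", path[1:].split("/")[0])
    raise ValueError(f"Unrecognised YouTube input: {value!r}")
```

Rejecting unknown shapes early keeps typos out of a paid run.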
Input options
| Option | Default | Description |
|---|---|---|
| urls | — | List of YouTube URLs or video IDs to process |
| outputFormat | json | Output format: json, srt, vtt, text, or llm |
| languages | ["en"] | Preferred languages in priority order (e.g. ["en", "ja", "de"]) |
| includeAutoGenerated | true | Use YouTube's auto-generated captions when manual ones aren't available |
| maxVideos | 0 (unlimited) | Limit how many videos to process from playlists/channels |
| maxConcurrency | 3 | How many videos to process in parallel (1–10) |
| proxy | Apify Proxy | Proxy settings — residential proxies recommended |
You can also use startUrls (the [{url: "..."}] format) instead of urls — both work.
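The options in the table above can be assembled and checked in code before a run is started. The helper below is a sketch: `build_input` and its keyword names are hypothetical, and only the option names, defaults, and the 1 to 10 concurrency range come from the table. The proxy option is left out because its exact shape is not described here.

```python
import json

ALLOWED_FORMATS = {"json", "srt", "vtt", "text", "llm"}


def build_input(urls, output_format="json", languages=None,
                include_auto_generated=True, max_videos=0, max_concurrency=3):
    """Build a run input dict using the documented option names and defaults."""
    if not urls:
        raise ValueError("urls must contain at least one URL or video ID")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_concurrency <= 10:
        raise ValueError("maxConcurrency must be between 1 and 10")
    if max_videos < 0:
        raise ValueError("maxVideos must be 0 (unlimited) or a positive number")
    return {
        "urls": list(urls),
        "outputFormat": output_format,
        "languages": list(languages or ["en"]),
        "includeAutoGenerated": include_auto_generated,
        "maxVideos": max_videos,
        "maxConcurrency": max_concurrency,
    }


run_input = build_input(["dQw4w9WgXcQ"], output_format="llm", max_concurrency=2)
print(json.dumps(run_input, indent=2))
```

The resulting dict can be pasted into the input editor as JSON or passed as the run input through an Apify API client.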
Example input
{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/JGwWNGJdvx8"
  ],
  "outputFormat": "llm",
  "languages": ["en"],
  "maxConcurrency": 2
}
Example output
Each video produces one result in the dataset:
{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
  "channelName": "Rick Astley",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "description": "The official video for \"Never Gonna Give You Up\" by Rick Astley...",
  "publishDate": "2009-10-25",
  "viewCount": 1761003712,
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/sddefault.jpg",
  "availableLanguages": ["en", "de-DE", "ja", "pt-BR", "es-419"],
  "language": "en",
  "languageName": "English",
  "isAutoGenerated": false,
  "duration": 213,
  "wordCount": 487,
  "segmentCount": 61,
  "text": "We're no strangers to love, you know the rules and so do I...",
  "segments": [
    { "text": "We're no strangers to love", "start": 18.64, "end": 21.88 },
    { "text": "You know the rules and so do I", "start": 22.64, "end": 26.96 }
  ],
  "extractedAt": "2026-04-10T07:00:00.000Z",
  "error": null
}
When using SRT or VTT format, the result includes an srt or vtt field with the formatted subtitle file content.
Recommendations
For best results:
- Use residential proxies (the default) — they work much better with YouTube than datacenter proxies
- Start with maxConcurrency: 1 if you're processing many videos, then increase gradually
- Set languages to your target language — the scraper picks the best available match
- Use the LLM format if you're feeding transcripts into AI models — it strips all the noise
For large jobs:
- Use playlists or channel URLs to batch-process videos in one run
- Set maxVideos to limit playlist/channel scrapes during testing
- The scraper handles failures gracefully: if one video fails, the rest still process. Failed videos show up in the results with an error field so you can retry them later
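Collecting failed videos for a retry run might look like the sketch below. It assumes the dataset items have already been loaded as a list of dicts, and that failed items still carry a url or videoId field; the README does not confirm which fields a failed item contains, so check your own results first.

```python
def split_results(items):
    """Separate dataset items into successes (error is null) and failures."""
    succeeded = [item for item in items if item.get("error") is None]
    failed = [item for item in items if item.get("error") is not None]
    return succeeded, failed


def retry_inputs(failed_items):
    """Collect a URL or video ID per failed item (assumed fields) for a rerun."""
    inputs = []
    for item in failed_items:
        target = item.get("url") or item.get("videoId")
        if target:
            inputs.append(target)
    return inputs


items = [
    {"videoId": "dQw4w9WgXcQ", "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "error": None},
    {"videoId": "JGwWNGJdvx8", "url": "https://youtu.be/JGwWNGJdvx8", "error": "No captions available"},
]
succeeded, failed = split_results(items)
print(retry_inputs(failed))
```

The returned list can be used directly as the urls option of a follow-up run.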
For AI/ML workflows:
- The LLM output format gives you clean, annotation-free text optimized for context windows
- JSON format preserves timestamps, which is useful for building time-aligned datasets
- The segments array gives you natural sentence boundaries from the original captions
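As one way to build a time-aligned dataset from the JSON output, segments can be grouped into chunks of bounded duration while keeping their start and end times. This is an illustrative sketch: `chunk_segments` and the 30-second default are arbitrary choices for this example, not part of the scraper.

```python
def _merge(segments):
    """Combine consecutive segments into one chunk with its time range."""
    return {
        "start": segments[0]["start"],
        "end": segments[-1]["end"],
        "text": " ".join(seg["text"] for seg in segments),
    }


def chunk_segments(segments, max_seconds=30.0):
    """Group segments so each chunk spans at most max_seconds where possible."""
    chunks, current = [], []
    for seg in segments:
        # Close the current chunk if adding this segment would exceed the limit.
        if current and seg["end"] - current[0]["start"] > max_seconds:
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks


segments = [
    {"text": "We're no strangers to love", "start": 18.64, "end": 21.88},
    {"text": "You know the rules and so do I", "start": 22.64, "end": 26.96},
]
print(chunk_segments(segments, max_seconds=30.0))
```

A single segment longer than the limit is kept whole, so chunks never split a caption line.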
Fair billing
You're never charged for videos that fail to extract. You only pay for successful results.
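For a rough cost estimate, count only the results whose error field is null. The sketch below assumes the listed starting price of $5.00 per 1,000 extracted transcripts; your actual rate may differ, so treat the figure as an approximation.

```python
def estimate_cost(items, price_per_1000=5.00):
    """Estimate the charge in USD, counting only items with error set to null."""
    successes = sum(1 for item in items if item.get("error") is None)
    return successes * price_per_1000 / 1000


results = [
    {"videoId": "a", "error": None},
    {"videoId": "b", "error": None},
    {"videoId": "c", "error": "No captions available"},
]
print(f"${estimate_cost(results):.3f}")
```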
Language support
The scraper supports all languages that YouTube captions are available in — over 100 languages. Set your preferred languages in priority order and the scraper will pick the best available match.
If manual captions aren't available in your language, YouTube's auto-generated captions are used as a fallback (unless you disable this with includeAutoGenerated: false).
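The selection rule described above can be sketched as follows. The track shape (a dict with code and isAutoGenerated) and the base-language comparison, where "de" matches "de-DE", are assumptions for this example; the scraper's real matching rules are not documented here.

```python
def pick_track(preferred, tracks, include_auto_generated=True):
    """Return the first track matching the preferred languages, in order.

    Manual captions win over auto-generated ones for the same language.
    """
    for wanted in preferred:
        base = wanted.lower().split("-")[0]
        matches = [t for t in tracks if t["code"].lower().split("-")[0] == base]
        manual = [t for t in matches if not t["isAutoGenerated"]]
        if manual:
            return manual[0]
        # Only auto-generated tracks remain for this language.
        if include_auto_generated and matches:
            return matches[0]
    return None


tracks = [
    {"code": "en", "isAutoGenerated": True},
    {"code": "de-DE", "isAutoGenerated": False},
]
print(pick_track(["en", "de"], tracks))
print(pick_track(["en", "de"], tracks, include_auto_generated=False))
```

With auto-generated captions disabled, the second call skips the English track and falls through to the manual German one.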
Error handling
The scraper is designed to be resilient:
- If a video has no captions, it reports the error and moves on
- If YouTube rate-limits a request, the scraper retries with backoff
- If one extraction method fails, it automatically tries alternatives
- Failed videos appear in the dataset with a descriptive error field; successful videos have error: null
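The retry and fallback behaviour listed above follows a common pattern, sketched here in generic form. Nothing in this block is taken from the scraper's source: `RateLimited`, `with_backoff`, `try_methods`, the attempt count, and the delay values are all hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Raised by a fetch function when the server asks the client to slow down."""


def with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


def try_methods(methods, video_id):
    """Try each extraction method in order and return the first success."""
    errors = []
    for method in methods:
        try:
            return method(video_id)
        except Exception as exc:
            errors.append(f"{method.__name__}: {exc}")
    raise RuntimeError("all methods failed: " + "; ".join(errors))
```

Passing sleep as a parameter makes the backoff easy to test without waiting.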
Need help?
If you run into issues or have questions, open an issue on the Apify Store page.