YouTube Transcript Scraper - Bulk + Multi-language
Pricing
$5.00 / 1,000 transcripts fetched
Extract YouTube transcripts in bulk: any public video, manual + auto-generated captions, multi-language fallback. Outputs full text + segments with timestamps. HTTP-only, no API key. Pay $0.005/transcript.
Developer
dltik
YouTube Transcript Scraper — Bulk Extract Captions, Multi-language, with Timestamps
Extract YouTube transcripts in bulk from any public video on any channel: manual and auto-generated captions, multi-language fallback, and full text plus timestamped segments. HTTP-only via yt-dlp, with better anti-fingerprinting than youtube-transcript-api. No API key, no quota. $0.005 per transcript ($5 per 1,000).
⭐ Bookmark this YouTube Transcript Scraper — Apify ranks actors by bookmarks, so it directly helps the visibility of this scraper on the Apify Store.
What is the YouTube Transcript Scraper?
The YouTube Transcript Scraper is an Apify actor that downloads YouTube video transcripts (subtitles / closed captions) in bulk, quickly and cheaply. Drop in a list of YouTube URLs or video IDs and get back the full transcript text plus timestamped segments. Languages are requested in order of preference with fallback: for example, request French and, if no French captions exist, receive an auto-translated track instead of nothing.
Built on yt-dlp instead of the youtube-transcript-api Python library, so it has better session and cookie management, hits fewer HTTP 429 (Too Many Requests) errors, and works from datacenter (DC) IPs in most cases. If heavily protected videos still get blocked, add the Apify Residential proxy.
Use cases
- AI training data — pull transcripts of 1,000+ podcast videos to fine-tune an LLM.
- Content marketing research — extract transcripts of competitor YouTube videos and analyze tone, keywords, hooks.
- Multilingual subtitles — get auto-generated captions in your target language for video re-localization.
- Searchable video archives — turn a 500-video channel into a full-text searchable corpus.
- Podcast / interview pipelines — feed YouTube transcripts into Claude / GPT for summarization, highlight extraction, social-media clip generation.
- SEO research — find which keywords appear in top-ranking video transcripts.
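For the searchable-archive and SEO use cases, the segments array in each output item is already enough for a simple keyword search that points back to the exact moment in the video. The sketch below is not part of the actor: search_transcripts and Hit are hypothetical names, it assumes you have already downloaded the run's dataset items as a list of dicts shaped like the Output example on this page, and it builds jump-to-timestamp links with YouTube's t= URL parameter.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    video_id: str
    start: float
    text: str
    link: str


def search_transcripts(items: Iterable[dict], term: str) -> List[Hit]:
    """Hypothetical helper: every segment whose text contains `term` (case-insensitive)."""
    needle = term.casefold()
    hits: List[Hit] = []
    for item in items:
        for seg in item.get("segments", []):
            if needle in seg["text"].casefold():
                # YouTube's `t` parameter starts playback at a whole number of seconds.
                link = f"https://www.youtube.com/watch?v={item['video_id']}&t={int(seg['start'])}s"
                hits.append(Hit(item["video_id"], seg["start"], seg["text"], link))
    return hits


# One dataset item, trimmed to the fields the search needs.
items = [{
    "video_id": "dQw4w9WgXcQ",
    "segments": [
        {"start": 18.4, "duration": 3.6, "text": "We're no strangers to love"},
        {"start": 22.0, "duration": 3.2, "text": "You know the rules and so do I"},
    ],
}]

for hit in search_transcripts(items, "rules"):
    print(hit.link, hit.text)
```

Searching segments rather than transcript_text keeps the timestamp attached to every match, at the cost of missing phrases that straddle two segments.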
Input
{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw",
    "EZ8RFXNQK2g"
  ],
  "languages": ["en", "fr", "es"],
  "proxyConfig": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
The videos field accepts full URLs (youtube.com/watch?v=…, youtu.be/…, youtube.com/shorts/…, youtube.com/embed/…) or raw 11-character video IDs.
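If you assemble the videos list from mixed sources, it can help to normalize everything to bare IDs first, for example to deduplicate. A minimal sketch covering only the four URL shapes listed above plus raw IDs; extract_video_id is a hypothetical helper, not the actor's own parser, and it assumes IDs use YouTube's 11-character URL-safe alphabet.

```python
import re
from urllib.parse import parse_qs, urlparse

# 11 characters from YouTube's URL-safe ID alphabet (letters, digits, "-" and "_").
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Hypothetical helper: normalize a YouTube URL or raw ID to the 11-character video ID."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value  # already a raw video ID

    # urlparse only fills netloc when a scheme is present, so add one if missing.
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = parsed.netloc.lower().removeprefix("www.").removeprefix("m.")
    parts = parsed.path.strip("/").split("/")

    candidate = ""
    if host == "youtu.be":
        candidate = parts[0]  # youtu.be/<id>
    elif host == "youtube.com":
        if parts[0] == "watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]  # watch?v=<id>
        elif parts[0] in ("shorts", "embed") and len(parts) > 1:
            candidate = parts[1]  # shorts/<id> or embed/<id>

    if not VIDEO_ID.fullmatch(candidate):
        raise ValueError(f"not a recognizable YouTube video URL or ID: {value!r}")
    return candidate


for v in ["https://www.youtube.com/watch?v=dQw4w9WgXcQ", "https://youtu.be/jNQXAC9IVRw", "EZ8RFXNQK2g"]:
    print(extract_video_id(v))
```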
Output
{
  "video_id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "language": "en",
  "is_auto_generated": false,
  "transcript_text": "We're no strangers to love. You know the rules and so do I...",
  "segments": [
    { "start": 18.4, "duration": 3.6, "text": "We're no strangers to love" },
    { "start": 22.0, "duration": 3.2, "text": "You know the rules and so do I" }
  ],
  "char_count": 1842,
  "word_count": 311
}
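Each segment carries a start time and a duration in seconds, which is enough to write a standard subtitle file for the re-localization use case. A minimal sketch, assuming the segments array exactly as shown in the output above; segments_to_srt and srt_timestamp are hypothetical post-processing helpers rather than actor options, and each cue's end time is computed as start + duration.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render one output item's `segments` as SRT: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}\n")
    return "\n".join(blocks)


segments = [
    {"start": 18.4, "duration": 3.6, "text": "We're no strangers to love"},
    {"start": 22.0, "duration": 3.2, "text": "You know the rules and so do I"},
]
print(segments_to_srt(segments))
```

Rounding to whole milliseconds in srt_timestamp avoids float artifacts such as 18.4 * 1000 not being exactly 18400.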
Pricing
PAY_PER_EVENT: $0.005 per transcript fetched ($5 per 1,000). Failed fetches are not charged. Apify bills compute and (optional) residential proxy bandwidth on top, typically under $0.001 per transcript in HTTP-only mode.
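Because only successful fetches are charged, the actor fee is simply $0.005 times the number of transcripts returned. The sketch below adds platform usage as an explicit assumption: it treats the "typically under $0.001 per transcript" figure as an upper bound and applies it to every attempted video, on the assumption that failed fetches still consume compute. Decimal avoids floating-point rounding in money arithmetic; the function and constant names are illustrative.

```python
from decimal import Decimal

# Price per successful fetch, from the Pricing section of this page.
PRICE_PER_TRANSCRIPT = Decimal("0.005")
# ASSUMPTION: upper bound taken from "typically under $0.001 per transcript"; real usage varies.
PLATFORM_PER_ATTEMPT = Decimal("0.001")


def event_charges(succeeded: int) -> Decimal:
    """Actor fee in USD: failed fetches are not charged, so only successes count."""
    return PRICE_PER_TRANSCRIPT * succeeded


def estimated_total(succeeded: int, attempted: int) -> Decimal:
    """Rough upper bound in USD: actor fee plus (assumed) platform usage on every attempt."""
    if not 0 <= succeeded <= attempted:
        raise ValueError("need 0 <= succeeded <= attempted")
    return event_charges(succeeded) + PLATFORM_PER_ATTEMPT * attempted


print(event_charges(1_000))            # actor fee for 1,000 transcripts
print(estimated_total(9_800, 10_000))  # 10,000 videos attempted, 200 of them failed
```

For example, 10,000 attempted videos with 9,800 successes gives $49 in event charges plus at most about $10 of platform usage under these assumptions.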
FAQ — YouTube Transcript API alternatives
Why use this scraper vs youtube-transcript-api (Python lib)? That library breaks frequently when YouTube rotates its internal endpoints, and it has no anti-fingerprinting. This scraper uses yt-dlp, one of the most actively maintained YouTube clients (it powers many YouTube downloaders), so it avoids the breakages youtube-transcript-api users run into regularly.
Does it work without a proxy? Often, yes: yt-dlp's session management gets past most datacenter-IP blocks. For heavily protected videos (recently uploaded, high-view, or live-streamed), enable the Apify Residential proxy to keep the success rate above 95%.
Does it support YouTube Shorts and live streams? Yes — Shorts URLs (youtube.com/shorts/…) and live VOD pages are auto-detected.
Bulk extraction — how fast? Around 2-3 transcripts per second per actor instance. For 10K transcripts, run with 4-8 concurrent actor runs.
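Those figures give a quick back-of-the-envelope estimate: at 2 to 3 transcripts per second, 10,000 videos take roughly 55 to 85 minutes in a single run, and about 7 to 21 minutes split across 4 to 8 concurrent runs, assuming each run keeps its per-instance rate (YouTube rate limiting may prevent perfectly linear scaling). The hypothetical helpers below split the input list into one contiguous batch per run and compute that estimate; starting the runs themselves is left to the Apify Console or whichever client you use.

```python
from typing import List, Sequence


def split_into_runs(videos: Sequence[str], runs: int) -> List[List[str]]:
    """Split the input list into at most `runs` contiguous, near-equal batches."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    base, extra = divmod(len(videos), runs)
    batches, start = [], 0
    for i in range(runs):
        # The first `extra` batches take one additional video each.
        end = start + base + (1 if i < extra else 0)
        if end > start:  # skip empty batches when there are fewer videos than runs
            batches.append(list(videos[start:end]))
        start = end
    return batches


def estimated_minutes(total: int, runs: int, per_second: float) -> float:
    """Wall-clock minutes, ASSUMING every run sustains `per_second` independently."""
    return total / (runs * per_second) / 60


videos = [f"video-{i}" for i in range(10_000)]  # placeholder IDs
batches = split_into_runs(videos, 8)
print([len(b) for b in batches])
print(round(estimated_minutes(10_000, 8, 3), 1), "to", round(estimated_minutes(10_000, 4, 2), 1), "minutes")
```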
Multi-language fallback — how does it pick? It tries each language in your languages list in order; if none has a manual transcript, it falls back to auto-generated, then to auto-translated. The chosen language is in the language output field.
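One reading of that order: a manual track in any requested language wins first (in list order), then auto-generated tracks, then auto-translated ones. The sketch below encodes that reading with plain sets of language codes standing in for whatever track metadata the actor actually inspects; pick_track and its parameters are hypothetical, and the actor's real tie-breaking may differ.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_track(
    requested: Iterable[str],
    manual: Set[str],
    auto_generated: Set[str],
    auto_translatable: Set[str],
) -> Optional[Tuple[str, str]]:
    """Hypothetical fallback: return (language, kind) for the first available track, else None."""
    order = list(requested)
    tiers = (
        ("manual", manual),
        ("auto-generated", auto_generated),
        ("auto-translated", auto_translatable),
    )
    # Each tier is checked across the whole requested list before moving to the next tier.
    for kind, available in tiers:
        for lang in order:
            if lang in available:
                return lang, kind
    return None


# Requested: French, then Spanish. Only English has manual captions; Spanish has auto captions.
print(pick_track(["fr", "es"], manual={"en"}, auto_generated={"es"}, auto_translatable={"fr", "es"}))
# ('es', 'auto-generated')
```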
Related actors
- Substack Scraper — for written content + sentiment
- HackerNews MCP Server — tech-content analysis
- Pappers MCP Server — French B2B intelligence
License: MIT · Author: dltik