Pricing

Pay per event

YouTube Transcript Scraper & Bulk Downloader

Bulk YouTube transcript downloader and extractor: pull captions (manual or auto-generated) from a single video or a long list of videos, in any language the video offers. Returns the plain-text transcript plus timed segments, exportable to JSON or CSV. Failed requests are retried through rotating proxies so the captions land.

Developer

DevilScrapes

Last modified

a month ago

🎯 What this scrapes

YouTube ships closed captions for most videos. This Actor takes a list of video URLs or bare IDs, picks the best available caption track in the language you specify, downloads every cue, and writes one clean row per video. You get the full joined transcript text plus — if you want them — the per-cue segments with start time and duration. Channel name, video title, duration, and the full list of available languages all land in the same row.

We handle the parts that make bulk transcript extraction fragile at scale: rate-limit pushback, endpoint parameter drift, and IP blocking. Browser TLS impersonation plus residential proxy rotation means YouTube sees a real browser session rather than a Python script hitting its timedtext endpoint in a tight loop.

Captions are public metadata published by YouTube. This Actor fetches only what YouTube's own player loads for any viewer. It does not download video files, bypass region locks, or access private or unlisted content.

🔥 What we handle for you

🛡️ Browser fingerprint rotation — curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not a Python script.
🌐 Residential proxy rotation via Apify Proxy — fresh session ID and exit IP on every block or rate-limit response.
🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per video, Retry-After header honoured.
🧱 Rate-limit-aware pacing — when YouTube pushes back we slow down rather than accumulate bans across the run.
🧊 Clean, typed dataset rows — Pydantic-validated output, ISO-8601 timestamps, stable IDs. Export as JSON, CSV, or Excel straight from Apify Console.
💰 Pay-Per-Event pricing — you pay only for rows that land in your dataset. No data, no charge beyond the small run warm-up fee.
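The retry policy in the list above can be sketched as follows. This is a minimal illustration, not the Actor's source. Only the retryable status codes, the five-attempt limit, and honouring Retry-After come from the list. The one-second base delay, the 60-second cap, and the full-jitter strategy are assumptions.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Status codes and attempt limit taken from the list above.
RETRYABLE_STATUS = {408, 429} | set(range(500, 600))
MAX_ATTEMPTS = 5


def should_retry(status: int, attempts_made: int) -> bool:
    """Return True if a response with this status deserves another attempt."""
    return status in RETRYABLE_STATUS and attempts_made < MAX_ATTEMPTS


def retry_delay(
    attempts_made: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 60.0,   # assumed ceiling for the backoff window
    now: Optional[datetime] = None,
) -> float:
    """Seconds to wait before the next attempt."""
    if retry_after is not None:
        value = retry_after.strip()
        # Retry-After is either a whole number of seconds...
        if value.isdigit():
            return float(value)
        # ...or an HTTP date.
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now if now is not None else datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable header: exponential backoff with full jitter.
    window = min(cap, base * 2 ** (attempts_made - 1))
    return random.uniform(0.0, window)
```

A caller would send the request and, while `should_retry(status, attempts_made)` is true, sleep for `retry_delay(...)` seconds before trying again. Jitter spreads retries from parallel workers so they don't hit YouTube in lockstep.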

💡 Use cases

RAG corpus seeding: bulk-download transcripts of conference talks, lectures, or podcast episodes and feed them into a vector store or LLM context window.
Bulk YouTube transcript download for NLP: export hundreds of transcripts at once for sentiment analysis, topic modelling, or fine-tuning data prep.
Podcast show-notes automation: run each new YouTube-hosted episode through this Actor, then pass the transcript to an LLM to draft Markdown show notes.
YouTube subtitles for language learning: pull caption tracks in the target language across a set of videos for comprehension practice or graded reading corpora.
YouTube subtitles dataset construction: build a reproducible, version-controlled transcript dataset for ML benchmarking, search indexing, or attribution research.
YouTube transcripts for RAG pipelines: load transcripts into LangChain, LlamaIndex, or any retrieval-augmented generation stack with minimal preprocessing.

⚙️ How to use it

Click Try for free at the top of the Store listing.
Paste YouTube video URLs or bare video IDs into videoUrls — one per line, or as a JSON array. Shorts and youtu.be links both work.
Set language to the ISO-639-1 code you want (default en). The Actor picks a caption track in this order: manual in the requested language, auto-generated in the requested language, manual in any language, auto-generated in any language.
Click Start. Results stream into the run's dataset in real time.
Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify API.
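Step 2 accepts several input shapes. Here is a minimal sketch of how they could be normalised to the canonical watch URL used in the video_url output field. The Actor's actual parser isn't published, so the helper names and the exact set of recognised patterns (watch, youtu.be, and Shorts links, with or without a scheme) are assumptions.

```python
import re
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a bare ID or a YouTube link, else None."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # already a bare ID
    if "://" not in value:
        value = "https://" + value  # tolerate links pasted without a scheme
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate: Optional[str] = None
    if host in ("youtu.be", "www.youtu.be"):
        # https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        parts = parsed.path.split("/")
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) > 2 and parts[1] == "shorts":
            # https://www.youtube.com/shorts/<id>
            candidate = parts[2]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


def normalise_inputs(values: List[str]) -> List[str]:
    """Canonical watch URLs for every recognisable input, in order, without duplicates."""
    seen, urls = set(), []
    for raw in values:
        video_id = extract_video_id(raw)
        if video_id and video_id not in seen:
            seen.add(video_id)
            urls.append(f"https://www.youtube.com/watch?v={video_id}")
    return urls
```

Deduplicating on the ID rather than the raw string means a bare ID and a youtu.be link to the same video produce one row, not two.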

For large lists (hundreds of videos) leave proxyConfiguration on its default of useApifyProxy: true. On the Apify FREE plan this uses datacenter proxies; upgrading to a paid plan routes through residential IPs, which handle aggressive rate-limiting with a higher success rate.

📥 Input

Field	Type	Required	Default	Notes
`videoUrls`	`array`	yes	`["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]`	YouTube video URLs or bare video IDs. Shorts and youtu.be links are accepted.
`language`	`string`	no	`"en"`	ISO-639-1 language code. Track selection order: manual in requested language → auto in requested language → manual any → auto any.
`includeSegments`	`boolean`	no	`true`	When `true`, the `segments` array includes one entry per cue (text + start time + duration). The joined `transcript_text` field is always present regardless.
`concurrency`	`integer`	no	`4`	Number of videos processed in parallel. Lower this if you see elevated 429s on a shared datacenter proxy.
`proxyConfiguration`	`object`	no	`{"useApifyProxy": true}`	Proxy settings. YouTube rate-limits aggressive runs — residential routing is recommended for lists of 100+ videos.
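The fallback order in the language row can be sketched as four passes over the available tracks. This is a minimal sketch, not the Actor's code. The CaptionTrack type is hypothetical, and treating regional variants such as en-GB as a match for en is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class CaptionTrack:
    language: str            # e.g. "en", "en-GB", "de"
    is_auto_generated: bool


def pick_track(tracks: Sequence[CaptionTrack], language: str = "en") -> Optional[CaptionTrack]:
    """Apply: manual requested -> auto requested -> manual any -> auto any."""
    wanted = language.lower()

    def matches(track: CaptionTrack) -> bool:
        # Assumption: "en-GB" counts as a match for "en".
        return track.language.lower().split("-")[0] == wanted

    passes: List[Callable[[CaptionTrack], bool]] = [
        lambda t: matches(t) and not t.is_auto_generated,
        lambda t: matches(t) and t.is_auto_generated,
        lambda t: not t.is_auto_generated,
        lambda t: t.is_auto_generated,
    ]
    for accept in passes:
        for track in tracks:
            if accept(track):
                return track
    return None  # no captions at all: the video is skipped
```

Within each pass the first matching track wins, in the order the tracks are listed. That tie-break is also an assumption.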

Example input

{
  "videoUrls": [
    "dQw4w9WgXcQ",
    "https://www.youtube.com/watch?v=9bZkp7q19f0"
  ],
  "language": "en",
  "includeSegments": false,
  "concurrency": 3,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

📤 Output

One dataset row per input video that has captions. Videos with captions disabled are logged and skipped (see Limitations).

Field	Type	Notes
`video_id`	`string`	YouTube video ID (11 characters).
`video_url`	`string`	Canonical `youtube.com/watch?v=` URL.
`title`	`string \| null`	Video title parsed from the watch page.
`channel_name`	`string \| null`	Channel display name.
`channel_id`	`string \| null`	Channel ID.
`duration_seconds`	`integer \| null`	Video duration in seconds.
`language`	`string`	Caption track language code actually used.
`is_auto_generated`	`boolean`	`true` for YouTube-auto-generated tracks; `false` for manually uploaded captions.
`transcript_text`	`string`	Full transcript joined with newlines — ready to paste into an LLM prompt or search index.
`segments`	`array \| null`	Per-cue entries with `text`, `start`, and `duration` when `includeSegments` is `true`; `null` otherwise.
`available_languages`	`array`	All caption track language codes available on the video.
`scraped_at`	`string`	ISO-8601 timestamp of when this row was written.

Example output

{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Music Video)",
  "channel_name": "Rick Astley",
  "channel_id": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "duration_seconds": 213,
  "language": "en",
  "is_auto_generated": false,
  "transcript_text": "We're no strangers to love\nYou know the rules and so do I\n...",
  "segments": null,
  "available_languages": ["en", "es", "fr", "de"],
  "scraped_at": "2026-06-01T10:32:14Z"
}
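If you consume the exported dataset in Python, a typed view of a row makes the nullable fields explicit. This is a minimal sketch that mirrors the output table above. The dataclasses and load_rows are illustrative helpers. Treating start and duration as seconds is an assumption, since the table does not state units. load_rows assumes a JSON export from Apify Console, which is an array of rows.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class Segment:
    text: str
    start: float      # assumed to be seconds
    duration: float   # assumed to be seconds


@dataclass
class TranscriptRow:
    video_id: str
    video_url: str
    title: Optional[str]
    channel_name: Optional[str]
    channel_id: Optional[str]
    duration_seconds: Optional[int]
    language: str
    is_auto_generated: bool
    transcript_text: str
    segments: Optional[List[Segment]]  # None when includeSegments is false
    available_languages: List[str]
    scraped_at: str

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "TranscriptRow":
        # Keep only the documented fields; missing ones become None.
        values = {f.name: raw.get(f.name) for f in fields(cls)}
        if values["segments"] is not None:
            values["segments"] = [
                Segment(text=s["text"], start=float(s["start"]), duration=float(s["duration"]))
                for s in values["segments"]
            ]
        return cls(**values)

    def display_title(self) -> str:
        # title can be null when the watch-page metadata fetch fails.
        return self.title or self.video_id


def load_rows(path: str) -> List[TranscriptRow]:
    """Read a dataset exported from Apify Console as a JSON array."""
    with open(path, encoding="utf-8") as fh:
        return [TranscriptRow.from_dict(item) for item in json.load(fh)]
```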

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

Event	USD	What it is
`actor-start`	$0.005	One-off warm-up charge per run
`result`	$0.004	Per dataset row written

Example: 1 000 transcripts in a single run cost $0.005 + 1 000 × $0.004 ≈ $4.01. No subscription, no monthly minimum, no credit card required to start: Apify gives every new account $5 of free credit.

For very large bulk runs (10 000+ transcripts/month), the total cost scales linearly with the per-result charge: 10k ≈ $40, 100k ≈ $400. If your volume is that high, open an issue on the Actor's Issues tab and we can discuss a volume arrangement.
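The arithmetic above can be written as a small estimator. It assumes only the two events in the table fire: one actor-start per run and one result per row. Decimal avoids binary floating-point rounding on fractions of a cent.

```python
from decimal import Decimal

# Rates from the pricing table above (USD).
ACTOR_START_USD = Decimal("0.005")  # per run
RESULT_USD = Decimal("0.004")       # per dataset row written


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Expected charge for `rows` transcripts spread over `runs` runs."""
    return runs * ACTOR_START_USD + rows * RESULT_USD
```

`estimate_cost(1_000)` gives `Decimal('4.005')`. Splitting the same 1 000 rows across ten runs adds nine more actor-start events, for $4.05 in total, so batching videos into fewer runs is slightly cheaper.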

🚧 Limitations

Captions disabled by uploader — some creators turn off captions entirely. Those videos return no transcript row; the Actor logs the skip and moves on.
Rate-limiting on large batches — YouTube pushes back on high-concurrency runs from shared datacenter IPs. Use proxyConfiguration with residential routing and keep concurrency at 3–5 for lists of 500+ videos.
Live streams — live captions are usually unavailable until the broadcast ends and the VOD is processed. Re-run after the stream concludes.
Age-gated / sign-in-required videos — this Actor does not accept YouTube credentials and cannot retrieve captions from age-restricted content.
Parameter drift: YouTube occasionally rotates its internal timedtext endpoint parameters. When this happens, runs may return empty transcripts for affected videos until a fix ships. We monitor for this and ship a fix within 48 hours; check the Actor's CHANGELOG for the latest version.

❓ FAQ

What's the difference between this and the youtube-transcript-api Python library?

The OSS library is great for one-off scripts. This Actor wraps equivalent logic inside Apify's cloud infrastructure and adds proxy rotation, retries, concurrency control, structured output, and the ability to schedule recurring runs, with no server required. Use the library for local experiments; use this Actor when you need bulk YouTube transcript downloads at scale without managing infrastructure.

Can I access YouTube transcripts programmatically through an API?

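Yes. Every run's dataset is accessible via the Apify REST API. You can trigger a run, poll for completion, and fetch the results as JSON in separate calls. Alternatively, the synchronous run-sync-get-dataset-items endpoint starts a run and returns its dataset items in a single call. See Apify's API documentation for the full reference.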
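Here is a minimal sketch of the single-call route using only the Python standard library and Apify's run-sync-get-dataset-items endpoint. The Actor ID is a placeholder: copy the real username~actor-name ID from the Actor's page in Apify Console. The client-side timeout value is an assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"


def build_run_sync_request(actor_id: str, token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """POST request that starts a run and returns its dataset items in the response."""
    # Actor IDs in API paths use the form "username~actor-name".
    path = f"/acts/{urllib.parse.quote(actor_id, safe='~')}/run-sync-get-dataset-items"
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def fetch_transcripts(
    actor_id: str, token: str, run_input: Dict[str, Any], timeout: float = 330.0
) -> List[Dict[str, Any]]:
    """Run the Actor synchronously and return the dataset rows as dicts."""
    request = build_run_sync_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_transcripts("<username>~<actor-name>", token, {"videoUrls": ["dQw4w9WgXcQ"], "language": "en"})` returns a list of rows shaped like the example output. For long lists, the separate start, poll, and fetch calls are safer, because synchronous runs are subject to a time limit on Apify's side.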

Can I download YouTube subtitles in languages other than English?

Yes. Set language to any ISO-639-1 code (e.g. "es" for Spanish, "ja" for Japanese, "de" for German). The Actor will select the best matching track and fall back gracefully if the exact language is unavailable. The available_languages field in every output row lists what was actually on the video.

Does it extract auto-generated closed captions?

Auto-generated tracks are fully supported and labelled via the is_auto_generated field. They are used when no manually uploaded track exists in the requested language. Quality varies by video; auto-generated tracks on professionally produced content tend to be accurate.

What if no captions exist at all?

The Actor logs the video ID and skips it. We do not synthesise or transcribe audio — that's a different (much more expensive) problem.

Can I use this to feed YouTube transcripts into a RAG pipeline?

Yes; it's the use case we built this Actor for. The transcript_text field is clean joined text ready for chunking. The segments array gives you cue-level timestamps if you want to preserve position information for citation or retrieval. Both fields export as-is in JSON, so mapping each dataset row to a LangChain Document or LlamaIndex Node takes only a few lines.
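One way to use the segments array for citation-aware retrieval is to group consecutive cues into size-bounded chunks and keep each chunk's time range. This is a minimal sketch. It assumes segments is present (includeSegments set to true), that start and duration are in seconds, and that a `&t=<seconds>s` suffix is an acceptable deep link. The 1 000-character chunk size is an arbitrary default.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass
class Chunk:
    text: str
    start: float  # start of the first cue in the chunk (assumed seconds)
    end: float    # start + duration of the last cue


def chunk_segments(segments: Sequence[Dict[str, Any]], max_chars: int = 1000) -> List[Chunk]:
    """Group consecutive cues into newline-joined chunks of at most max_chars characters.

    A single cue longer than max_chars still becomes its own chunk.
    """
    chunks: List[Chunk] = []
    texts: List[str] = []
    start = end = 0.0
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty cues
        seg_start = float(seg["start"])
        # Joined length if this cue were appended: current texts + newlines + new text.
        projected = sum(len(t) for t in texts) + len(texts) + len(text)
        if texts and projected > max_chars:
            chunks.append(Chunk("\n".join(texts), start, end))
            texts = []
        if not texts:
            start = seg_start
        texts.append(text)
        end = seg_start + float(seg["duration"])
    if texts:
        chunks.append(Chunk("\n".join(texts), start, end))
    return chunks


def citation_url(video_url: str, seconds: float) -> str:
    """Link that opens the video at the chunk's start time."""
    return f"{video_url}&t={int(seconds)}s"
```

Each chunk's text, plus metadata such as video_id and start, can then go into a LangChain `Document(page_content=..., metadata=...)` or a LlamaIndex node.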

Why is title or channel_name empty?

If YouTube returns a consent interstitial or a 429 on the watch page during metadata fetch, we still deliver the transcript but leave page-scraped fields null. The transcript itself is retrieved from a separate endpoint and succeeds independently.

💬 Your feedback

Spotted a bug, hit a rate-limit pattern we aren't handling, or need a field added? Open an issue on the Actor's Issues tab in Apify Console. We read every report and ship fixes on a weekly cadence. For parameter-drift breakages, check the CHANGELOG first; a fix is usually already in the latest version.

YouTube Transcript Scraper

thescrappa/youtube-transcript-scraper

Extract YouTube transcript segments and full transcript text by video ID.

Scrappa

YouTube Transcript Scraper - Bulk + Multi-language

dltik/youtube-transcript-scraper

Extract YouTube transcripts in bulk: any public video, manual + auto-generated captions, multi-language fallback. Outputs full text + segments with timestamps. HTTP-only, no API key. Pay $0.005/transcript.

Walid

YouTube Transcript Scraper - Subtitles and Captions

openclawmara/youtube-transcript-scraper

Extract transcripts and subtitles from YouTube videos. Get auto-generated or manual captions in any language. Bulk extraction from video URLs, channels, or playlists. Output as plain text, timestamped segments, or SRT. Perfect for content repurposing, SEO, and video analysis.

OpenClaw Mara

Youtube Transcript Scraper

scrapebase/youtube-transcript-scraper

ScrapeBase

Youtube Transcript Scraper

scraperforge/youtube-transcript-scraper

ScraperForge

Youtube Transcript Scraper

scrapium/youtube-transcript-scraper

Scrapium

YouTube Transcript Scraper

elaborate_statue/youtube-transcript-scraper

Extract transcripts (captions) from YouTube videos with timestamps. Supports manual and auto-generated captions in 50+ languages. Outputs JSON, plain text, or SRT format.

Alex Kim

YouTube Transcript Extractor

startuphub/youtube-transcript

Get the full text transcript of any YouTube video by URL or ID, with title, channel, and duration.

StartupHub

YouTube Transcript Extractor

scrapemesh/youtube-transcript-scraper

ScrapeMesh

YouTube Transcript Scraper Goat

goat255/youtube-transcript-scraper

Extract transcripts and captions from public YouTube videos in bulk. Returns timed segments (start, duration, text), the language, whether captions are auto-generated, and an optional joined plain-text blob. Accepts watch URLs, youtu.be links, Shorts, or raw video ids. No login or cookies.