Youtube Transcript Scraper
Pricing
from $40.00 / 1,000 results
Extract transcripts and captions from YouTube videos with language selection support. Returns timestamped segments, full concatenated text, and basic video metadata.
Rating
5.0
(27)
Developer
Crawler Bros
Actor stats
Bookmarked: 28
Total users: 14
Monthly active users: 3
Last modified: 4 days ago
Extract transcripts and captions from any public YouTube video. Get timestamped segments, full plain-text transcripts, language metadata, and core video info — ready for AI pipelines, content analysis, summarization, translation, and research.
When a video has no captions at all (uploader disabled them, or it's music-only with no on-screen text), an optional Whisper AI fallback can transcribe the audio directly so you still get a usable transcript.
What you get
For each video the scraper returns:
| Field | Description |
|---|---|
| `video_id` | YouTube 11-character video ID |
| `title` | Video title |
| `channel_name` | Channel display name |
| `channel_id` | Channel ID (when available) |
| `duration_seconds` | Video duration in seconds (when available) |
| `views` | View count (when available) |
| `published_date` | Publish date in YYYY-MM-DD (when available) |
| `thumbnail` | Thumbnail URL |
| `transcript_language` | Language code of the extracted transcript (e.g. `en`, `es`, `ko`) |
| `transcript_language_name` | Full language name |
| `is_auto_generated` | `true` if the transcript is YouTube's auto-caption, `false` for manually uploaded captions or Whisper output |
| `transcript_source` | `library`, `innertube`, `playwright_dom`, or `whisper`; tells you which path produced the transcript |
| `language_probability` | Whisper's language-detection confidence (only set when `transcript_source=whisper`) |
| `available_languages` | Array of every transcript language available for the video |
| `segments` | Timestamped segments, each with `start`, `dur`, and `text` |
| `segment_count` | Number of segments returned |
| `full_text` | Complete transcript joined into a single string |
| `success` | `true` when a transcript was extracted, `false` otherwise |
| `error` | Reason text when `success=false` |
| `inputUrl` | The URL you submitted |
| `scrapedAt` | ISO 8601 UTC timestamp |
Empty fields are dropped from each record so the dataset stays clean.
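Because empty fields are omitted, downstream code should not assume every key is present. The sketch below illustrates one plausible reading of "empty": `None`, an empty string, or an empty list/dict, while `false` and `0` are kept because they carry meaning for fields such as `success` and `views`. The function name `drop_empty_fields` and this exact definition of "empty" are assumptions for illustration, not the actor's actual code.

```python
def is_empty(value) -> bool:
    """Assumed definition of 'empty': None, "", [] or {}. False and 0 are NOT empty."""
    if value is None:
        return True
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return True
    return False


def drop_empty_fields(record: dict) -> dict:
    """Return a copy of the record without empty fields (hypothetical helper)."""
    return {key: value for key, value in record.items() if not is_empty(value)}


raw = {
    "video_id": "dQw4w9WgXcQ",
    "channel_id": "",      # dropped: empty string
    "views": 0,            # kept: zero is a real value
    "success": False,      # kept: false is a real value
    "segments": [],        # dropped: empty list
    "error": None,         # dropped: None
}
cleaned = drop_empty_fields(raw)

# Read optional fields defensively, since they may be absent from a record.
channel_id = cleaned.get("channel_id", "unknown")
```

In practice this means reading optional fields with `record.get("views")` rather than `record["views"]`.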
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `videoUrls` | Array | required | YouTube watch URLs, youtu.be short links, Shorts URLs, embed URLs, or plain 11-char video IDs. |
| `language` | String | `""` | Preferred language code (`en`, `es`, `fr`, `de`, `ja`, `ko`, …). Empty = best available. |
| `includeAutoGenerated` | Boolean | `true` | Include YouTube auto-captions when manual ones aren't available. |
| `useWhisper` | Boolean | `false` | Fall back to local Whisper transcription when YouTube has no transcript. Adds ~30-180 s per video. |
| `whisperModel` | Enum | `base` | `tiny` (fastest), `base` (balanced), `small` (most accurate). Pick `small` for music or noisy audio. |
Supported URL formats
- https://www.youtube.com/watch?v=dQw4w9WgXcQ
- https://youtu.be/dQw4w9WgXcQ
- https://www.youtube.com/shorts/VIDEO_ID
- https://www.youtube.com/embed/VIDEO_ID
- Plain 11-char ID: dQw4w9WgXcQ
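If you want to validate or deduplicate inputs before a run, the formats above can be normalized to a bare video ID. The following is a minimal sketch using only the Python standard library; `extract_video_id` is a hypothetical helper and the actor's own parsing may accept more variants than this.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-char video ID for the formats listed above, else None."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # already a plain ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # https://www.youtube.com/shorts/<id> or /embed/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

Normalizing to IDs first makes it easy to drop duplicates, for example when the same video appears once as a watch URL and once as a youtu.be link.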
Example input — multiple videos, English preferred
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0",
    "https://www.youtube.com/shorts/abc123XYZ45"
  ],
  "language": "en",
  "includeAutoGenerated": true
}
Example input — Whisper fallback for transcripts-disabled videos
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=XqZsoesa55w"
  ],
  "useWhisper": true,
  "whisperModel": "small"
}
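When inputs are generated programmatically, it can help to build and validate them in one place. The sketch below assembles an input object using the parameter names and defaults from the Input table; `build_run_input` is a hypothetical helper, and the validation rules are assumptions derived from that table rather than the actor's own checks.

```python
import json

# Allowed values for whisperModel, as listed in the Input table.
ALLOWED_WHISPER_MODELS = {"tiny", "base", "small"}


def build_run_input(
    video_urls,
    language="",
    include_auto_generated=True,
    use_whisper=False,
    whisper_model="base",
):
    """Build an input dict with the documented defaults (hypothetical helper)."""
    urls = [url.strip() for url in video_urls if url and url.strip()]
    if not urls:
        raise ValueError("videoUrls is required and must contain at least one entry")
    if whisper_model not in ALLOWED_WHISPER_MODELS:
        raise ValueError(f"whisperModel must be one of {sorted(ALLOWED_WHISPER_MODELS)}")
    return {
        "videoUrls": urls,
        "language": language,
        "includeAutoGenerated": include_auto_generated,
        "useWhisper": use_whisper,
        "whisperModel": whisper_model,
    }


run_input = build_run_input(
    ["https://www.youtube.com/watch?v=XqZsoesa55w"],
    use_whisper=True,
    whisper_model="small",
)
run_input_json = json.dumps(run_input, indent=2)  # ready to paste or submit
```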
Example output
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "channel_name": "Rick Astley",
  "channel_id": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "duration_seconds": 213,
  "views": 1769190465,
  "published_date": "2009-10-24",
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg",
  "transcript_language": "en",
  "transcript_language_name": "English",
  "is_auto_generated": false,
  "transcript_source": "library",
  "available_languages": [
    { "code": "en", "name": "English", "is_auto_generated": false },
    { "code": "es-419", "name": "Spanish (Latin America)", "is_auto_generated": false }
  ],
  "segments": [
    { "start": "1.360", "dur": "1.680", "text": "[♪♪♪]" },
    { "start": "18.640", "dur": "3.240", "text": "We're no strangers to love" }
  ],
  "segment_count": 61,
  "full_text": "[♪♪♪] We're no strangers to love You know the rules and so do I ...",
  "success": true,
  "inputUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "scrapedAt": "2026-05-05T13:42:18Z"
}
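Note that `start` and `dur` arrive as strings holding seconds, so they need converting before any arithmetic. As an illustration for the subtitle use case, the sketch below turns `segments` into SubRip (SRT) text. The helper names are hypothetical, and it assumes every segment carries all three keys as in the example output.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Convert the actor's segments (string start/dur) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = float(seg["start"])       # strings such as "1.360"
        end = start + float(seg["dur"])   # end time = start + duration
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + ("\n" if blocks else "")


example_segments = [
    {"start": "1.360", "dur": "1.680", "text": "[♪♪♪]"},
    {"start": "18.640", "dur": "3.240", "text": "We're no strangers to love"},
]
srt_text = segments_to_srt(example_segments)
```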
Use cases
- AI training data — Build text corpora from YouTube content for LLM fine-tuning or RAG pipelines.
- Content repurposing — Turn long videos into blog posts, summaries, or social copy.
- Research and analysis — Pull spoken content from lectures, interviews, podcasts, and documentaries.
- Subtitles and accessibility — Retrieve captions for translation or accessibility workflows.
- SEO and keyword research — Analyse spoken keywords and topics across YouTube content.
- Competitive intelligence — Monitor what competitors say in their videos.
- Education — Extract transcripts from online courses for indexing or study notes.
- Sentiment analysis — Run sentiment or topic models against transcripts at scale.
FAQ
Which videos can I scrape?
Any public YouTube video that has either manually created or auto-generated captions. Private, deleted, members-only, and age-restricted videos cannot be scraped; those return success=false with a clear error.
What if a video has no captions at all?
Set useWhisper: true to download the audio and transcribe it locally with Whisper (faster-whisper). For clear speech, whisperModel=base is the sweet spot. For music, noisy audio, or short clips, use whisperModel=small for noticeably better accuracy.
What if my requested language isn't available?
The scraper first tries an exact match (en), then variants (en matches en-GB, en-US), then falls back to the best available transcript. The available_languages field always lists everything available.
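The matching order described above can be sketched as follows, operating on entries shaped like those in `available_languages`. This is an illustration, not the actor's implementation: `pick_language` is a hypothetical name, and the meaning of "best available" (here: prefer a manual track, otherwise the first one listed) is an assumption.

```python
from typing import Optional


def pick_language(requested: str, available: list) -> Optional[dict]:
    """Pick a track: exact match, then regional variant, then best available."""
    if not available:
        return None

    wanted = (requested or "").strip().lower()
    if wanted:
        # 1. Exact match, e.g. "en" == "en".
        for track in available:
            if track["code"].lower() == wanted:
                return track
        # 2. Variant match, e.g. "en" matches "en-GB" or "en-US".
        for track in available:
            if track["code"].lower().startswith(wanted + "-"):
                return track

    # 3. Fallback. ASSUMPTION: "best available" prefers manual captions,
    #    otherwise the first track listed.
    for track in available:
        if not track.get("is_auto_generated", False):
            return track
    return available[0]


tracks = [
    {"code": "en-GB", "name": "English (United Kingdom)", "is_auto_generated": True},
    {"code": "es-419", "name": "Spanish (Latin America)", "is_auto_generated": False},
]
```

With these tracks, a request for `en` resolves to `en-GB` through the variant step, while a request for `ko` falls through to the fallback step.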
What's the difference between transcript_source values?
- library: pulled from YouTube's published caption tracks (fastest, most reliable, real human captions for popular videos).
- innertube: pulled from YouTube's internal API when the library couldn't reach it.
- playwright_dom: extracted from the in-page transcript panel as a last resort.
- whisper: generated locally from the audio with Whisper AI when YouTube has no captions at all.
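The descriptions above suggest an ordered fallback chain. A generic version of that pattern might look like the sketch below, where each source is a callable that returns a list of segments or raises. The ordering is inferred from the wording above, and `fetch_with_fallback` and the `fetchers` mapping are hypothetical; none of this is taken from the actor's source.

```python
# Order inferred from the descriptions above (an assumption).
SOURCE_ORDER = ["library", "innertube", "playwright_dom", "whisper"]


def fetch_with_fallback(video_id: str, fetchers: dict, use_whisper: bool = False) -> dict:
    """Try each source in order; the first one that returns segments wins."""
    errors = {}
    for source in SOURCE_ORDER:
        if source == "whisper" and not use_whisper:
            continue  # Whisper is opt-in via useWhisper
        fetch = fetchers.get(source)
        if fetch is None:
            continue
        try:
            segments = fetch(video_id)
        except Exception as exc:  # a failing path must not abort the chain
            errors[source] = str(exc)
            continue
        if segments:
            return {"success": True, "transcript_source": source, "segments": segments}
    reason = "; ".join(f"{name}: {msg}" for name, msg in errors.items())
    return {"success": False, "error": reason or "no transcript found"}


def failing_fetcher(video_id):
    # Stub standing in for a source that cannot reach the transcript.
    raise RuntimeError("unreachable")


def working_fetcher(video_id):
    # Stub standing in for a source that succeeds.
    return [{"start": "0.000", "dur": "1.000", "text": "hello"}]


result = fetch_with_fallback(
    "dQw4w9WgXcQ",
    {"library": failing_fetcher, "innertube": working_fetcher},
)
```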
How accurate are auto-generated captions vs Whisper?
YouTube auto-captions are generally accurate for clear English speech. Whisper base is comparable for clean audio, and Whisper small typically beats YouTube on accents, multiple speakers, and noisy audio. Both struggle on music with non-vocal singing.
The Whisper output looks like garbage / repetitive text.
Whisper-tiny on music or near-silent clips can hallucinate repetitive phrases. The actor automatically detects this (low language-detection confidence + identical segment text) and returns success=false with an actionable error pointing you at a larger model. Re-run with whisperModel=small for music videos.
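A heuristic of this kind (low language-detection confidence combined with repeated segment text) could be sketched as below. The thresholds, the minimum segment count, and the name `looks_hallucinated` are all illustrative assumptions; the actor's real cut-offs are not documented here.

```python
from collections import Counter


def looks_hallucinated(
    segments,
    language_probability: float,
    min_probability: float = 0.5,   # assumed threshold
    min_repeat_ratio: float = 0.5,  # assumed threshold
    min_segments: int = 3,          # assumed: too few segments to judge
) -> bool:
    """Flag output where confidence is low AND one text dominates the segments."""
    texts = [seg["text"].strip().lower() for seg in segments if seg["text"].strip()]
    if len(texts) < min_segments:
        return False
    # Share of segments taken up by the single most frequent text.
    most_common_count = Counter(texts).most_common(1)[0][1]
    repeat_ratio = most_common_count / len(texts)
    return language_probability < min_probability and repeat_ratio >= min_repeat_ratio


repetitive = [{"text": "Thank you."} for _ in range(4)]
varied = [
    {"text": "We're no strangers to love"},
    {"text": "You know the rules"},
    {"text": "and so do I"},
]
```

Requiring both signals keeps a legitimately repetitive but confidently detected transcript, such as a song chorus, from being rejected.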
Can I scrape multiple videos in one run?
Yes, pass an array of URLs. Each video is processed sequentially and pushed as its own dataset row.
How current is the data?
Live: every run hits YouTube at request time. Schedule the actor for daily or hourly refreshes.
Limitations
- Private, members-only, age-restricted, and deleted videos cannot be scraped.
- Whisper transcription uses CPU, so it adds 30-180 s per video depending on length and model size.
- Whisper accuracy on heavy music or pure-instrumental audio is fundamentally limited regardless of model size.
- YouTube can change its caption infrastructure; the scraper has multiple fallback paths but a transient outage may still cause success=false for individual videos.
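Since individual videos can fail while the rest of a run succeeds, it is worth separating the two groups after each run. The sketch below splits dataset rows on `success` and builds a follow-up input from the `inputUrl` of the failed rows. Both helper names are hypothetical, and whether a retry is worthwhile depends on the `error` text: a private or deleted video will fail again.

```python
def split_results(items):
    """Separate dataset rows into (succeeded, failed) using the success field."""
    succeeded = [item for item in items if item.get("success")]
    failed = [item for item in items if not item.get("success")]
    return succeeded, failed


def build_retry_input(failed_items) -> dict:
    """Collect inputUrl values of failed rows, de-duplicated in original order."""
    urls = [item["inputUrl"] for item in failed_items if item.get("inputUrl")]
    return {"videoUrls": list(dict.fromkeys(urls))}


rows = [
    {"video_id": "dQw4w9WgXcQ", "success": True,
     "inputUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"},
    {"success": False, "error": "example failure reason",  # illustrative error text
     "inputUrl": "https://youtu.be/9bZkp7q19f0"},
    {"success": False, "error": "example failure reason",
     "inputUrl": "https://youtu.be/9bZkp7q19f0"},
]
succeeded, failed = split_results(rows)
retry_input = build_retry_input(failed)
```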