YouTube Transcript Scraper — AI Fallback for Missing Captions
Pricing
from $0.50 / 1,000 transcriptions
Extract YouTube transcripts with an AI-powered fallback for videos whose captions are unavailable. Enter a URL or search query and get clean, timestamped JSON with segments and word-level timings. Ideal for content repurposing, LLM training data, and video accessibility workflows.
Rating: 5.0 (1 rating)
Developer: Epic Scrapers
Maintained by Community
Actor stats: 2 bookmarked, 4 total users, 3 monthly active users
Last modified: 12 days ago
Extract YouTube video transcripts, captions, and subtitles at scale. Just provide one or more video URLs and get clean, timestamped JSON output — including segments and word-level timings. When YouTube captions are unavailable, the optional AI transcription fallback automatically downloads the audio and transcribes it using a speech-to-text model.
Perfect for content repurposing, LLM training data pipelines, video SEO analysis, accessibility workflows, research, and summarization bots.
Features
- Extract transcripts from any YouTube video URL
- Support multiple formats — JSON3 and VTT caption parsing with automatic deduplication
- Word-level timestamps — get every word with precise start and end times (when available)
- AI transcription fallback — when YouTube has no captions, automatically transcribe audio via AI
- Proxy support — uses residential proxies for reliable access
- Batch processing — pass multiple URLs in a single run
- Clean JSON output — transcript, segments, words, video metadata
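The JSON3 parsing and deduplication listed above can be illustrated with a small sketch. It is not the actor's code. It assumes the JSON3 layout commonly returned by YouTube (an events list whose entries carry tStartMs, dDurationMs, and segs with utf8 text and an optional tOffsetMs offset), and the function names parse_json3 and dedupe_rolling_lines are hypothetical. Word entries are only truly per-word when a track splits its text into one seg per word, which is typical of auto-generated captions; manual tracks usually yield one entry per caption line, which is one reason word-level timings are only available for some videos.

```python
import json
from typing import Any


def parse_json3(raw: str) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Flatten a JSON3 caption payload into (segments, words), times in seconds.

    Assumed layout (not a documented guarantee):
    {"events": [{"tStartMs": int, "dDurationMs": int,
                 "segs": [{"utf8": str, "tOffsetMs": int}, ...]}, ...]}
    """
    data = json.loads(raw)
    segments: list[dict[str, Any]] = []
    words: list[dict[str, Any]] = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs:
            continue  # events without "segs" carry no text
        start_ms = event.get("tStartMs", 0)
        end_ms = start_ms + event.get("dDurationMs", 0)
        text = "".join(s.get("utf8", "") for s in segs).strip()
        if not text:
            continue  # skip newline-only events
        segments.append({"start": start_ms / 1000, "end": end_ms / 1000, "text": text})
        # Each seg's tOffsetMs is relative to the event start; a word ends
        # where the next seg starts, or where the event ends.
        starts = [start_ms + s.get("tOffsetMs", 0) for s in segs]
        for i, seg in enumerate(segs):
            token = seg.get("utf8", "").strip()
            if not token:
                continue
            stop = starts[i + 1] if i + 1 < len(segs) else end_ms
            words.append({"text": token, "start": starts[i] / 1000, "end": stop / 1000})
    return segments, words


def dedupe_rolling_lines(cue_texts: list[str]) -> list[str]:
    """Drop a line when it repeats the line emitted just before it.

    Auto-generated VTT cues often re-show the previous line above the new
    one, so naive concatenation would duplicate text.
    """
    out: list[str] = []
    for cue in cue_texts:
        for line in cue.splitlines():
            line = line.strip()
            if line and (not out or out[-1] != line):
                out.append(line)
    return out
```

Keeping the two steps separate means the same deduplication can be applied to VTT cue text regardless of how the cues were fetched.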
Input
| Field | Type | Description |
|---|---|---|
| urlList | Array of strings | One or more YouTube video URLs |
| useAITranscription | Boolean | Enable the AI transcription fallback when captions are missing |
Output
Each video returns a JSON object with:
{
  "url": "https://www.youtube.com/watch?v=...",
  "videoId": "abc123",
  "title": "Video Title",
  "duration": 300,
  "language": "en",
  "transcript": "Full concatenated transcript text...",
  "segments": [
    {"start": 0.0, "end": 2.5, "text": "Welcome to this video"}
  ],
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.6}
  ]
}
When AI transcription is used, the output also includes "aiTranscription": true.
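For subtitle or repurposing workflows, the segments array maps directly onto the SubRip (SRT) format. The sketch below assumes start and end are in seconds, as the sample values suggest, though the page does not state the unit. The function names are hypothetical, and caption_source only reads the aiTranscription flag described above, treating its absence as false.

```python
def _srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def item_to_srt(item: dict) -> str:
    """Render the "segments" of one output item as SubRip (SRT) text."""
    blocks = []
    for index, seg in enumerate(item.get("segments", []), start=1):
        start = _srt_timestamp(seg["start"])
        end = _srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)  # blank line between cues


def caption_source(item: dict) -> str:
    """Report whether the text came from YouTube captions or the AI fallback."""
    return "ai" if item.get("aiTranscription", False) else "captions"
```

Recording caption_source alongside each file makes it easy to review AI-transcribed videos separately, since their accuracy may differ from creator-provided captions.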
Use Cases
- Content repurposing — turn videos into blog posts, articles, or social media snippets
- LLM training data — collect clean, timestamped text for fine-tuning or RAG pipelines
- Video SEO — extract captions for keyword analysis and search optimization
- Accessibility — generate text transcripts for hearing-impaired audiences
- Research — analyze spoken content across large video datasets
- Summarization — feed transcripts into AI tools for automatic video summaries
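For the LLM, RAG, and summarization use cases, transcripts usually need to be split into passages that fit a model's context. A minimal sketch, assuming the segments shape shown under Output: it groups consecutive segments up to a character budget (a simple stand-in for a token limit) and keeps each chunk's start and end time so retrieved passages can link back to a moment in the video. The function name and the default budget are illustrative choices, not part of the actor.

```python
def chunk_segments(segments: list[dict], max_chars: int = 1000) -> list[dict]:
    """Group consecutive segments into chunks of at most max_chars characters.

    Each chunk keeps the start time of its first segment and the end time of
    its last one. A single segment longer than max_chars becomes its own chunk.
    """
    chunks: list[dict] = []
    texts: list[str] = []
    start = end = 0.0
    length = 0  # length of " ".join(texts)
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        added = len(text) + (1 if texts else 0)  # +1 for the joining space
        if texts and length + added > max_chars:
            chunks.append({"start": start, "end": end, "text": " ".join(texts)})
            texts, length, added = [], 0, len(text)
        if not texts:
            start = seg["start"]
        texts.append(text)
        length += added
        end = seg["end"]
    if texts:
        chunks.append({"start": start, "end": end, "text": " ".join(texts)})
    return chunks
```

Splitting on segment boundaries avoids cutting a caption line in half, at the cost of chunks that are somewhat shorter than the budget.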
Example
Input:
{
  "urlList": ["https://www.youtube.com/watch?v=jNQXAC9IVRw"],
  "useAITranscription": false
}
Output includes the full transcript with timestamped segments and word-level timings (when available in the source captions).
FAQ
Does this work on any YouTube video?
Yes, provided the video has captions (manual or auto-generated). If captions are missing, set useAITranscription to true to enable the AI transcription fallback.
What languages are supported?
All languages that YouTube provides captions for — auto-generated or manual. English is selected by default.
How does the AI fallback work?
When enabled and no captions are found, the actor downloads the audio stream and transcribes it using a speech-to-text model. This adds processing time for each video that needs the fallback.
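The decision described in this answer can be summarized as a short control flow. This is a sketch, not the actor's implementation: fetch_captions and transcribe_audio are hypothetical callables standing in for caption retrieval and for the audio download plus speech-to-text step, and returning None when captions are missing and the fallback is disabled is an assumption about how such videos are reported.

```python
from typing import Callable, Optional

Segments = list[dict]


def transcribe_with_fallback(
    video_id: str,
    fetch_captions: Callable[[str], Optional[Segments]],
    transcribe_audio: Callable[[str], Segments],
    use_ai_transcription: bool,
) -> Optional[dict]:
    """Prefer YouTube captions; use speech-to-text only when allowed."""
    segments = fetch_captions(video_id)
    if segments:
        return {"videoId": video_id, "segments": segments}
    if not use_ai_transcription:
        return None  # no captions and fallback disabled: nothing to return
    # Hypothetical step: download the audio and run a speech-to-text model.
    segments = transcribe_audio(video_id)
    return {"videoId": video_id, "segments": segments, "aiTranscription": True}
```

Passing the two steps in as functions keeps the fallback rule easy to test without network access: the expensive transcription path runs only when captions come back empty and the flag is set.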
Can I use this at scale?
Yes. Pass multiple URLs in the urlList array. The actor processes each video sequentially with residential proxy support for reliability.
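Sequential processing means one slow or failing video should not sink a whole batch. The sketch below shows that pattern with per-video retries and exponential backoff. The retry count, the backoff schedule, and the process_one callable are assumptions for illustration; the page only states that videos are processed sequentially through residential proxies.

```python
import time
from typing import Any, Callable


def process_sequentially(
    video_urls: list[str],
    process_one: Callable[[str], Any],
    retries: int = 2,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[Any], dict[str, str]]:
    """Process URLs one at a time; a failing video does not stop the batch.

    Each URL gets up to retries + 1 attempts, waiting backoff_s, then
    2 * backoff_s, and so on between attempts. The last error is recorded.
    """
    results: list[Any] = []
    failures: dict[str, str] = {}
    for url in video_urls:
        for attempt in range(retries + 1):
            try:
                results.append(process_one(url))
                break
            except Exception as exc:  # sketch: catch everything and record it
                if attempt == retries:
                    failures[url] = str(exc)
                else:
                    sleep(backoff_s * 2 ** attempt)
    return results, failures
```

The sleep parameter exists so the backoff can be replaced in tests; in normal use it defaults to time.sleep.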