YouTube Transcript Scraper — AI Fallback for Missing Captions

Pricing

from $0.50 / 1,000 transcriptions

Extract YouTube transcripts with an AI-powered fallback when captions are unavailable. Enter one or more video URLs and get clean, timestamped JSON with segments and word-level timings. Ideal for content repurposing, LLM training data, and video accessibility workflows.

Rating

5.0

(1)

Developer

Epic Scrapers

Maintained by Community

Actor stats

Bookmarked: 2
Total users: 4
Monthly active users: 3
Last modified: 12 days ago

Extract YouTube video transcripts, captions, and subtitles at scale. Just provide one or more video URLs and get clean, timestamped JSON output — including segments and word-level timings. When YouTube captions are unavailable, the optional AI transcription fallback automatically downloads the audio and transcribes it using a speech-to-text model.

Perfect for content repurposing, LLM training data pipelines, video SEO analysis, accessibility workflows, research, and summarization bots.

Features

  • Extract transcripts from any YouTube video URL
  • Support multiple formats — JSON3 and VTT caption parsing with automatic deduplication
  • Word-level timestamps — get every word with precise start and end times (when available)
  • AI transcription fallback — when YouTube has no captions, automatically transcribe audio via AI
  • Proxy support — uses residential proxies for reliable access
  • Batch processing — pass multiple URLs in a single run
  • Clean JSON output — transcript, segments, words, video metadata
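
The deduplication mentioned above matters because auto-generated caption tracks often repeat the same line across consecutive cues. The actor's actual algorithm is not documented here, so the following is only a minimal sketch of one common approach: collapsing consecutive cues whose normalized text is identical. The function name dedupe_cues and the cue shape (start, end, text) are assumptions modeled on the output format of this actor.

```python
from typing import Dict, List


def dedupe_cues(cues: List[Dict]) -> List[Dict]:
    """Collapse consecutive cues with identical text into one cue.

    Hypothetical helper: illustrates the idea only, not the actor's code.
    """
    result: List[Dict] = []
    for cue in cues:
        # Normalize whitespace so "hello  world" matches "hello world".
        text = " ".join(cue["text"].split())
        if not text:
            continue  # skip empty cues
        if result and result[-1]["text"] == text:
            # Same text as the previous cue: extend its end time instead.
            result[-1]["end"] = max(result[-1]["end"], cue["end"])
            continue
        result.append({"start": cue["start"], "end": cue["end"], "text": text})
    return result
```

Merging the end time rather than dropping the repeated cue keeps the merged segment covering the full span in which the line was on screen.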

Input

Field               Type              Description
urlList             Array of strings  One or more YouTube video URLs
useAITranscription  Boolean           Enable AI fallback when captions are missing
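
When building the input programmatically, it can help to validate URLs and drop duplicates before starting a run. The sketch below uses only the Python standard library. The helper names extract_video_id and build_input are hypothetical, and the set of URL shapes it recognizes (watch, youtu.be, shorts, embed, live) is an assumption: the actor documentation only promises support for video URLs, so check other shapes against a real run.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            return parts[1] or None
    return None


def build_input(urls: Iterable[str], use_ai: bool = False) -> Dict:
    """Build the actor input, rejecting non-YouTube URLs and duplicates."""
    seen = set()
    url_list: List[str] = []
    for url in urls:
        video_id = extract_video_id(url)
        if video_id is None:
            raise ValueError(f"Not a recognized YouTube video URL: {url!r}")
        if video_id not in seen:  # keep the first URL seen for each video
            seen.add(video_id)
            url_list.append(url.strip())
    return {"urlList": url_list, "useAITranscription": use_ai}
```

Deduplicating by video ID rather than by raw URL avoids submitting the same video twice when it appears under both a watch link and a short link.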

Output

Each video returns a JSON object with:

{
  "url": "https://www.youtube.com/watch?v=...",
  "videoId": "abc123",
  "title": "Video Title",
  "duration": 300,
  "language": "en",
  "transcript": "Full concatenated transcript text...",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Welcome to this video"
    }
  ],
  "words": [
    {
      "text": "Welcome",
      "start": 0.0,
      "end": 0.6
    }
  ]
}

When AI transcription is used, the output also includes "aiTranscription": true.
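
Because each segment carries start and end values, the output converts easily into subtitle formats. The following sketch renders the segments array as SubRip (SRT) text. It assumes that start and end are in seconds, as the sample values suggest; the function names are illustrative and not part of the actor.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

For the sample segment above, segments_to_srt produces a single block numbered 1 that runs from 00:00:00,000 to 00:00:02,500.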

Use Cases

  • Content repurposing — turn videos into blog posts, articles, or social media snippets
  • LLM training data — collect clean, timestamped text for fine-tuning or RAG pipelines
  • Video SEO — extract captions for keyword analysis and search optimization
  • Accessibility — generate text transcripts for hearing-impaired audiences
  • Research — analyze spoken content across large video datasets
  • Summarization — feed transcripts into AI tools for automatic video summaries
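
For the RAG and summarization use cases above, transcripts usually need to be split into chunks that keep their timestamps, so that a retrieved passage can link back to the right moment in the video. Here is a minimal sketch, assuming the segments format shown in the Output section. The function name chunk_segments and the max_chars default are arbitrary choices for illustration.

```python
from typing import Dict, List, Optional


def chunk_segments(segments: List[Dict], max_chars: int = 500) -> List[Dict]:
    """Greedily merge consecutive segments into chunks of at most max_chars.

    Each chunk keeps the start of its first segment and the end of its last.
    A single segment longer than max_chars becomes its own oversized chunk.
    """
    chunks: List[Dict] = []
    current: Optional[Dict] = None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        fits = current is not None and len(current["text"]) + 1 + len(text) <= max_chars
        if fits:
            current["text"] += " " + text
            current["end"] = seg["end"]
        else:
            if current is not None:
                chunks.append(current)
            current = {"start": seg["start"], "end": seg["end"], "text": text}
    if current is not None:
        chunks.append(current)
    return chunks
```

Splitting on segment boundaries instead of fixed character offsets keeps every chunk aligned with a real timestamp range.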

Example

Input:

{
  "urlList": ["https://www.youtube.com/watch?v=jNQXAC9IVRw"],
  "useAITranscription": false
}

Output includes the full transcript with timestamped segments and word-level timings (when available in the source captions).

FAQ

Does this work on any YouTube video?
Yes, as long as the video has captions (manual or auto-generated). If captions are missing, set useAITranscription to true to enable the AI transcription fallback.

What languages are supported?
All languages that YouTube provides captions for — auto-generated or manual. English is selected by default.

How does the AI fallback work?
When enabled and no captions are found, the actor downloads the audio stream and transcribes it using a speech-to-text model. This adds a small processing delay per video.
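
Since speech-to-text output may differ in quality from YouTube's own captions, it can be useful to separate the two kinds of results after a run. The sketch below splits result items by the aiTranscription flag. It assumes that the flag is simply absent when captions were used, which matches the sample output but is not stated explicitly; split_by_source is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def split_by_source(items: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (caption_items, ai_items) based on the aiTranscription flag."""
    caption_items: List[Dict] = []
    ai_items: List[Dict] = []
    for item in items:
        # Assumption: a missing flag means the transcript came from captions.
        if item.get("aiTranscription") is True:
            ai_items.append(item)
        else:
            caption_items.append(item)
    return caption_items, ai_items
```

The AI-transcribed items can then be routed to a manual review step or handled with lower confidence downstream.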

Can I use this at scale?
Yes. Pass multiple URLs in the urlList array. The actor processes each video sequentially with residential proxy support for reliability.