YouTube Transcript Scraper + Whisper AI Fallback
Pricing
from $0.70 / 1,000 transcripts extracted
Extract YouTube transcripts from any video — even without captions. Whisper AI fallback, LLM-ready output, SRT/VTT export. No API key. $0.001/video.
Developer
CodePoetry
YouTube Transcript Scraper — Captions + AI Speech-to-Text
Extract transcripts from any YouTube video — even when captions don't exist.
Most transcript tools stop working when a video has no captions. This actor doesn't. It pulls native captions when YouTube has them, and transcribes the audio with built-in speech-to-text AI when it doesn't. No external API key required.
Give it a single video, a full playlist, or an entire channel. Get transcripts in JSON, plain text, SRT, VTT, or an LLM-ready format — ready to download or feed into a pipeline. Built for bulk runs: concurrent processing, pay-per-result pricing, and no wasted resources on requests that don't need them.
New to Apify? Every new account gets $5 in free credits — no credit card needed. That's enough to transcribe an entire YouTube channel (~4,900 native transcripts).
How to scrape YouTube transcripts
- Click Try for free on this actor's page.
- Paste one or more YouTube URLs into the YouTube URLs field:
  - Individual videos (`youtube.com/watch?v=...`)
  - Playlists (`youtube.com/playlist?list=...`)
  - Channels (`youtube.com/@channelname`)
- Choose your Output Formats. Not sure? Start with Plain Text — it's the words as one block of text.
- Set Caption Languages if you need something other than English (default: `en`). Use two-letter codes: `es` = Spanish, `fr` = French, `de` = German.
- Videos without captions are automatically transcribed by the built-in AI model. Set a Max AI Minutes cap to control spend (default: 30 minutes).
- Click Start. A single video with captions finishes in under 30 seconds. A 100-video playlist typically finishes in 2–3 minutes.
- Download results as JSON, CSV, or Excel — or consume them via the Apify API.
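When calling the actor via the API instead of the UI, the steps above translate into a run input along these lines (illustrative values; field names follow the integration examples later on this page, so verify them against the actor's input schema):

```json
{
  "startUrls": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
    { "url": "https://www.youtube.com/@channelname" }
  ],
  "languages": ["en", "es"],
  "outputFormats": ["text", "json"],
  "maxResults": 100,
  "maxAiMinutes": 30
}
```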
A single video costs ~$0.006. No subscription, no commitment — pay only for what you use.
How it works
Step 1 — Expand. Paste one or more URLs — single videos, playlists, or channel URLs. The actor resolves them into individual video URLs automatically.
Step 2 — Extract. For each video, the actor checks for native captions (manual or auto-generated) in your requested languages. If captions exist, they are fetched and formatted immediately — no audio download needed.
Step 3 — Transcribe (when needed). If no captions are found, the actor automatically downloads the audio and transcribes it using a bundled faster-whisper model running on Apify's compute — no external transcription API needed. The output has the same structure as native caption output. Use Max AI minutes per run and Skip AI for long videos to control AI spend.
One failed video never stops the batch. Every item in the output dataset has an error_code field so you can filter results programmatically.
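A post-run filter on that field takes only a few lines. This is a sketch: the `items` list stands in for rows fetched from the run's dataset, with invented sample values.

```python
# Sketch: split dataset items into successes and failures by error_code.
# `items` stands in for client.dataset(...).iterate_items() results.
items = [
    {"metadata": {"id": "a1"}, "transcript_text": "hello world"},
    {"metadata": {"id": "b2"}, "error_code": "LANGUAGE_NOT_FOUND",
     "available_languages": ["de", "fr"]},
]

ok = [i for i in items if "error_code" not in i]
failed = [i for i in items if "error_code" in i]

for item in failed:
    # LANGUAGE_NOT_FOUND items also list which caption languages exist
    print(item["metadata"]["id"], item["error_code"],
          item.get("available_languages", []))
```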
What you get
Every output item contains full video metadata and your transcript in the formats you requested.
Video metadata
| Field | Type | Description |
|---|---|---|
| `metadata.id` | string | YouTube video ID |
| `metadata.title` | string | Video title |
| `metadata.url` | string | Canonical watch URL |
| `metadata.channel` | string | Channel display name |
| `metadata.channel_id` | string | Channel ID (UC-prefixed) |
| `metadata.channel_url` | string | Channel URL |
| `metadata.description` | string | Full video description |
| `metadata.duration` | integer | Duration in seconds |
| `metadata.view_count` | integer | Total views |
| `metadata.like_count` | integer | Total likes |
| `metadata.upload_date` | string | Upload date (YYYYMMDD) |
| `metadata.thumbnail` | string | Highest-resolution thumbnail URL |
| `metadata.tags` | array | Creator-set tags |
| `metadata.categories` | array | YouTube categories |
Transcript fields
| Field | Type | Description |
|---|---|---|
| `language` | string | Language code of the transcript (e.g. `en`, `zh-TW`) |
| `is_auto_generated` | boolean | `true` if YouTube auto-generated the captions |
| `is_ai_generated` | boolean | `true` if transcribed by the built-in AI model |
| `transcript_json` | array | Timestamped segments `[{start, end, text}]`. When `wordLevel: true`, each segment also has a `words` array: `[{start, text}]` for native captions, `[{start, end, text}]` for AI transcriptions. |
| `transcript_text` | string | Plain text transcript |
| `transcript_llm` | string | Text with `[Music]`, `(laughter)`, and filler tokens stripped — ready for AI pipelines |
| `transcript_srt` | string | SRT subtitle format. Always present for AI-transcribed items even if not in `outputFormats`. |
| `transcript_vtt` | string | WebVTT format |
| `language_probability` | number | AI model's confidence in the detected language (0–1). AI transcription only. |
| `language_was_forced` | boolean | `true` when `forceWhisperLanguage` was set. AI transcription only. |
| `ai_duration_charged_min` | integer | Minutes of AI time charged for this video. AI transcription only. |
| `ai_speech_duration_sec` | number | Actual speech duration detected by the model in seconds (informational). AI transcription only. |
| `available_languages` | array | Caption language codes YouTube provides on this video. Only present on `NO_CAPTIONS_AVAILABLE` and `LANGUAGE_NOT_FOUND` error items — use them to refine your `languages` input. |
| `error_code` | string | Structured error code when extraction failed. See Error codes for the full reference table. |
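Working with the documented `transcript_json` shape is straightforward. A minimal sketch, using invented sample segment values:

```python
# Sketch: working with the documented transcript_json shape
# [{start, end, text}, ...] — sample values are invented for illustration.
segments = [
    {"start": 18.5, "end": 21.0, "text": "We're no strangers to love"},
    {"start": 21.0, "end": 24.5, "text": "You know the rules and so do I"},
]

# Total spoken time covered by the segments, in seconds
total_sec = sum(s["end"] - s["start"] for s in segments)

# Flatten segments into plain text (what transcript_text contains)
full_text = " ".join(s["text"] for s in segments)

print(round(total_sec, 1))  # 6.0
print(full_text)
```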
Use cases
Here are ten real workflows built on this actor — from quick one-off summaries to scheduled competitive intelligence pipelines.
1. Claude Desktop / Claude.ai MCP integration
Connect this actor as an MCP server so Claude Desktop, Claude.ai Projects, Cursor, or any other MCP-compatible AI client can fetch a transcript just by being handed a YouTube URL. Ask Claude to "summarise this video" or "extract the key points from this lecture" — no copy-pasting required.
Recommended settings: outputFormats: ["llm"]
2. YouTube Shorts — per-word karaoke captions
YouTube Shorts often have auto-generated captions. Enable wordLevel: true to get per-word start times from the transcript_json field. Feed the result into a caption editor (CapCut, DaVinci Resolve, Adobe Premiere) to produce word-by-word highlighted captions — the "karaoke" style popular on short-form video.
Recommended settings: wordLevel: true, outputFormats: ["json", "srt"], subType: "auto"
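Converting a `wordLevel` segment into per-word cues is a small transformation. A sketch with invented sample data: native-caption words carry only a `start` time (per the field table above), so each word's end time is inferred from the next word's start.

```python
# Sketch: turn one wordLevel segment into per-word caption cues.
# Native-caption words have only a start time, so each word's end
# is taken from the next word's start (or the segment end).
segment = {
    "start": 0.0, "end": 2.0, "text": "never gonna give",
    "words": [{"start": 0.0, "text": "never"},
              {"start": 0.7, "text": "gonna"},
              {"start": 1.3, "text": "give"}],
}

cues = []
words = segment["words"]
for i, w in enumerate(words):
    end = words[i + 1]["start"] if i + 1 < len(words) else segment["end"]
    cues.append((w["start"], end, w["text"]))

print(cues)
```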
3. Build a searchable knowledge base (RAG)
Bulk-extract every video from a company channel, educational YouTube account, or podcast series. Store the transcript_llm text in a vector database (Pinecone, Weaviate, pgvector) indexed by metadata.id and metadata.title. Use it as a retrieval-augmented generation (RAG) corpus so your chatbot can answer questions grounded in the exact video content.
Recommended settings: outputFormats: ["llm"], maxResults: 500. AI fallback is always active — videos without captions are transcribed automatically.
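Before embedding, long transcripts are usually split into overlapping chunks. A minimal sketch (the `chunk` helper and its sizes are illustrative, not part of the actor; tune them for your embedder):

```python
# Sketch: chunk transcript_llm text into overlapping windows for a
# vector store. Sizes are arbitrary illustration values.
def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

transcript = ("word " * 50).strip()  # stand-in for item["transcript_llm"]
pieces = chunk(transcript)
print(len(pieces), len(pieces[0]))
```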
4. NLP / sentiment analysis pipeline
Extract transcripts from a brand's channel, a competitor's channel, or a set of product-review videos. Pipe transcript_text into an NLP pipeline (spaCy, HuggingFace Transformers, OpenAI) for sentiment scoring, named entity extraction, topic modeling, or keyword frequency. Useful for brand monitoring and competitive intelligence.
Recommended settings: outputFormats: ["text", "llm"], subType: "both"
5. LLM training data collection
Curate domain-specific transcripts from niche YouTube channels (medical lectures, legal explainers, coding tutorials, scientific talks) to build fine-tuning datasets. The transcript_llm format strips filler tokens cleanly. Use metadata.tags and metadata.categories to filter and label the data.
Recommended settings: outputFormats: ["llm"], maxAiMinutes cap per run to control cost.
6. SEO content repurposing
Turn a library of tutorials or vlogs into written content. Pass the transcript_llm field to an LLM prompt asking it to rewrite the transcript as a blog post, Twitter/X thread, newsletter section, or LinkedIn article. Combine with metadata.title, metadata.tags, and metadata.description for context.
Recommended settings: outputFormats: ["llm"], languages: ["en"]
7. Podcast / lecture transcription (no captions)
Podcasters who upload to YouTube and educators who post lecture recordings rarely add manual captions. The actor automatically transcribes them with faster-whisper. Use forceWhisperLanguage if you know the channel's language to skip the auto-detection window and reduce cost.
Recommended settings: forceWhisperLanguage: "en", skipAiFallbackIfLongerThan: 120 to skip anything over 2 hours.
8. Accessibility and caption quality audit
Compare YouTube's auto-generated captions (subType: "auto", is_auto_generated: true) against an AI transcription of the same video. Differences surface errors in the auto-generated track. Useful for accessibility compliance reviews or for creators who want to improve their caption quality before publishing.
Recommended settings: Two runs — one with subType: "auto" only, one with subType: "manual" to force AI fallback (since no manual captions exist, the actor will auto-transcribe).
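One simple way to score the two runs against each other is a word-level similarity ratio. A sketch using Python's standard `difflib`; the transcript strings are invented samples:

```python
import difflib

# Sketch: score agreement between YouTube's auto captions and an AI
# transcript of the same video. Strings are invented samples.
auto_text = "the quick brown fox jump over the lazy dog"
ai_text = "the quick brown fox jumps over the lazy dog"

# Compare word sequences, not characters, so a single wrong word
# counts once regardless of its length.
ratio = difflib.SequenceMatcher(None, auto_text.split(), ai_text.split()).ratio()
print(f"word-level agreement: {ratio:.0%}")
```

Low-agreement videos are the ones worth a manual caption review.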
9. Academic research and citation analysis
Download a researcher's full lecture series, a conference talk archive, or all videos from an academic YouTube channel. Index the transcripts by speaker, date (metadata.upload_date), and topic. Use to find when specific terminology first appeared, how arguments evolved over time, or to build a citation graph for a literature review.
Recommended settings: outputFormats: ["json", "text"], maxResults: 1000, languages set to the channel's primary language.
10. Competitive intelligence monitoring (scheduled runs)
Schedule the actor to run weekly on a competitor's channel URL. Set maxResults: 5 to pull only the latest videos. Use an Apify webhook to POST the new transcripts to Slack, a CRM, or an internal dashboard. Get an automatic digest of every new product announcement, feature mention, or pricing discussion your competitor publishes on YouTube.
Recommended settings: maxResults: 5, outputFormats: ["llm"], paired with an Apify schedule and webhook.
Output examples
Native caption output
    {
      "metadata": {
        "id": "dQw4w9WgXcQ",
        "title": "Rick Astley - Never Gonna Give You Up",
        "channel": "Rick Astley",
        "duration": 213,
        "view_count": 1757728410,
        "upload_date": "20091025"
      },
      "language": "en",
      "is_auto_generated": false,
      "is_ai_generated": false,
      "transcript_json": [
        { "start": 18.5, "end": 21.0, "text": "We're no strangers to love" },
        { "start": 21.0, "end": 24.5, "text": "You know the rules and so do I" }
      ],
      "transcript_text": "We're no strangers to love You know the rules and so do I ...",
      "transcript_llm": "We're no strangers to love You know the rules and so do I ..."
    }
AI transcription output
When a video has no captions, AI transcription runs automatically:
    {
      "metadata": { "title": "...", "duration": 240 },
      "is_ai_generated": true,
      "language": "en",
      "language_probability": 0.9987,
      "ai_duration_charged_min": 4,
      "transcript_json": [
        { "start": 0.0, "end": 3.2, "text": "Welcome to today's episode." }
      ],
      "transcript_text": "Welcome to today's episode. ...",
      "transcript_llm": "Welcome to today's episode. ..."
    }
Error item
    {
      "url": "https://www.youtube.com/watch?v=...",
      "metadata": { "title": "...", "duration": 720 },
      "error": "No subtitles found in requested languages.",
      "error_code": "LANGUAGE_NOT_FOUND"
    }
Error codes
Error items are never billed — Actor.charge() is only called on successful transcript results. The table below groups errors by cause so you know whether the issue is in your input or something outside your control.
Input errors — caused by the URLs or settings you provided:
| Error code | Meaning | What to do |
|---|---|---|
| `AGE_RESTRICTED` | YouTube requires sign-in / age verification to access this video. | Remove the URL — cannot be bypassed. |
| `PRIVATE_OR_UNAVAILABLE` | The video is private, deleted, or blocked in the runner's region. | Remove the URL or check if the video is public. |
| `LIVE_VIDEO` | Live streams have no static captions to extract. | Wait until the stream ends, then retry. |
| `LANGUAGE_NOT_FOUND` | Captions exist but not in the requested language. `available_languages` shows what's available. | Change your `languages` input. |
Budget / limit errors — the video could be transcribed, but a budget gate prevented it:
| Error code | Meaning | What to do |
|---|---|---|
| `NO_CAPTIONS_AVAILABLE` | The video has zero caption tracks. AI fallback is attempted if budget allows. | Ensure AI fallback is not blocked by the limits below. |
| `AI_MINUTES_LIMIT_REACHED` | The `maxAiMinutes` budget for this run is exhausted. | Increase `maxAiMinutes` and retry. |
| `AI_FALLBACK_SKIPPED_TOO_LONG` | The video exceeds the `skipAiFallbackIfLongerThan` duration limit. | Increase or remove the limit. |
| `SPENDING_LIMIT_REACHED` | The Apify account spending limit was hit — no further AI charges possible. | Adjust your Apify billing settings. |
Infrastructure / actor errors — not caused by your input; no charge is made:
| Error code | Meaning | What to do |
|---|---|---|
| `BOT_DETECTION` | YouTube challenged the request. The actor retried through proxy tiers automatically. | Usually self-resolving. Switch proxy group if persistent. |
| `EXTRACTION_ERROR` | Generic yt-dlp failure — the video may be temporarily unavailable on YouTube's side. | Retry later. |
| `AI_TRANSCRIPTION_FAILED` | The Whisper model or audio download failed for this video. | Check run logs; retry. |
| `UNEXPECTED_ERROR` | An unhandled exception in the actor code. The video gets an error item; other videos continue. | Open an issue if persistent. |
Pricing
This actor uses Pay-Per-Event pricing — you pay for results, not compute time or monthly fees.
In plain terms: a single native transcript costs $0.001. There is also a $0.005 one-time startup fee per run. Scraping one video costs around $0.006 total. From the second video onwards, this actor is cheaper than competitors charging $0.005 flat per transcript.
How much does a run cost?
| Videos | This actor | Typical competitor ($0.005/transcript) | You save |
|---|---|---|---|
| 1 | $0.006 | $0.005 | — |
| 2 | $0.007 | $0.010 | 30% |
| 10 | $0.015 | $0.050 | 70% |
| 100 | $0.105 | $0.500 | 79% |
| 1,000 | $1.005 | $5.000 | 80% |
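The table above reduces to a one-line formula: $0.005 startup per run plus $0.001 per native transcript. A quick sanity-check in Python:

```python
# Cost model from the pricing section: $0.005 startup fee per run
# plus $0.001 per native transcript (Free-plan rates).
def run_cost(videos: int) -> float:
    return round(0.005 + 0.001 * videos, 3)

# Matches the table above:
print(run_cost(1))     # 0.006
print(run_cost(100))   # 0.105
print(run_cost(1000))  # 1.005
```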
Native transcript pricing
| Plan | Per transcript | 10 videos | 100 videos | 1,000 videos |
|---|---|---|---|---|
| Free | $0.001 | $0.015 | $0.105 | $1.005 |
| Bronze ($49/mo) | $0.0009 | $0.014 | $0.095 | $0.905 |
| Silver ($199/mo) | $0.0008 | $0.013 | $0.085 | $0.805 |
| Gold ($999/mo) | $0.0007 | $0.012 | $0.075 | $0.705 |
AI transcription pricing (when captions are unavailable)
AI is only charged for videos that actually need it — native captions are always checked first. Billed minutes are based on the published video duration (rounded up to the nearest minute, minimum 1 minute per video), not on detected speech length. The ai_speech_duration_sec field in the output is informational.
| Plan | Per AI minute | 10-min video | 60-min video |
|---|---|---|---|
| Free | $0.012 | $0.12 | $0.72 |
| Bronze | $0.011 | $0.11 | $0.66 |
| Silver | $0.010 | $0.10 | $0.60 |
| Gold | $0.009 | $0.09 | $0.54 |
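The billing rule described above (published duration rounded up to whole minutes, minimum 1 minute) can be sketched as:

```python
import math

# Billing rule from the section above: charge the published video
# duration, rounded up to whole minutes, minimum 1 minute per video.
def billed_minutes(duration_sec: int) -> int:
    return max(1, math.ceil(duration_sec / 60))

def ai_cost(duration_sec: int, per_minute: float = 0.012) -> float:
    # per_minute defaults to the Free-plan rate from the table above
    return round(billed_minutes(duration_sec) * per_minute, 3)

print(ai_cost(600))   # 10-minute video on Free: 0.12
print(ai_cost(3600))  # 60-minute video on Free: 0.72
```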
Real-world examples
| Task | Videos | AI? | Estimated cost (Free plan) |
|---|---|---|---|
| Single video | 1 | No | ~$0.006 |
| YouTube playlist (20 videos) | 20 | No | ~$0.025 |
| Channel analysis (100 videos) | 100 | No | ~$0.105 |
| Podcast batch (20 × 45 min, no captions) | 20 | Yes — 900 AI min | ~$10.81 |
| Research corpus (500 videos, 20% no captions, 10 min avg) | 500 | Mixed | ~$12.41 |
On the free $5 credit: approximately 4,900 native transcripts (enough for an entire YouTube channel), or around 400 minutes of AI transcription.
The prices above are Pay-Per-Event charges only and do not include proxy costs. The default datacenter proxy costs nothing on clean runs — it is only used as a fallback when YouTube challenges a request. If the datacenter tier is also challenged, the actor auto-escalates to residential (~$0.40/GB), though this is rare. See Proxy configuration for details.
Built for bulk runs
Every part of this actor is designed to keep costs and resource use as low as possible, especially at scale:
- Pay per result, not per run. The $0.005 startup fee is charged once regardless of batch size — so a 1,000-video run costs nearly the same overhead as a 10-video run.
- No proxy cost on clean runs. Every request goes direct first. The proxy is only used as a silent fallback if YouTube challenges a specific request — and that happens rarely. Most runs pay $0 in proxy fees.
- AI model loaded on demand. The transcription model is only initialised when a video actually needs AI transcription. Runs that rely entirely on native captions start faster and use less memory.
- Concurrent processing. Up to 5 videos are processed in parallel, significantly reducing wall-clock time for large playlists or channels.
- Built-in spend controls. Max AI minutes per run and Skip AI for long videos let you set hard caps on AI spend before a run starts — no surprises from unexpectedly long videos.
- One failed video never slows the batch. Errors are logged and skipped immediately; the rest of the batch continues at full speed.
How it compares
| Feature | This actor | Typical alternatives |
|---|---|---|
| Transcribes videos with no captions | Yes — built-in AI, no external API key | No — returns an error |
| LLM-optimised output (filler stripped) | Yes — transcript_llm field | No |
| Spend safeguards (AI minute cap, skip long videos) | Yes | No |
| Native transcript price | $0.001 per transcript | Up to $0.005 — 5× more |
| No monthly subscription | Yes — pay only for what you run | Flat monthly fee |
| Batch: playlists and channels | Yes | Most |
| Output formats | JSON, Text, SRT, VTT, LLM | Usually JSON only |
| Word-level timestamps | Yes | Rare |
| YouTube Data API key required | No | No |
| Automatic access challenge bypass | Yes — retries via proxy when needed, direct otherwise | Varies |
| MCP-compatible (Claude Desktop, Cursor, etc.) | Yes — via Apify MCP integration | Rare |
Who uses it
AI and LLM developers
Feed transcripts into RAG pipelines, summarisation chains, or fine-tuning datasets. The transcript_llm field strips [Music], (laughter), and other filler tokens that bloat context windows. Compatible with LangChain, LlamaIndex, and other Python AI frameworks.
Content creators and marketers
Turn any YouTube video into a blog post or newsletter draft without manual transcription. Extract pull quotes from interviews. Run an entire channel archive in one batch.
SEO professionals and researchers
Extract keyword data from video transcripts at scale. Build text content from videos to rank alongside YouTube results on Google. Analyse a competitor's spoken messaging for topic and positioning gaps.
Data scientists and academics
Build NLP corpora from lectures, conference talks, and documentary interviews. Process multilingual transcripts for cross-language analysis. Run large dataset collection jobs overnight via the API.
Developers building MCP-integrated AI tools
Connect this actor as an MCP server so Claude Desktop, Claude.ai Projects, Cursor, or any MCP-compatible client can fetch and process YouTube transcripts in a single tool call. No copy-pasting, no API wiring — just hand the model a URL.
Integration examples
Python — Apify client
Get your API token from the Apify Console under Settings → Integrations. Keep it secret — treat it like a password.
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "languages": ["en"],
        "outputFormats": ["json", "llm"],
    })

    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["metadata"]["title"])
        print(item["transcript_text"][:200])
JavaScript / Node.js
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    const run = await client.actor('codepoetry/youtube-transcript-ai-scraper').call({
        startUrls: [{ url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' }],
        languages: ['en'],
        outputFormats: ['json', 'llm'],
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach(item => console.log(item.metadata.title, item.transcript_text.slice(0, 200)));
LangChain / RAG pipeline
    from apify_client import ApifyClient
    from langchain.docstore.document import Document

    client = ApifyClient("YOUR_API_TOKEN")

    run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "outputFormats": ["llm"],
        "maxAiMinutes": 60,
    })

    docs = [
        Document(
            page_content=item["transcript_llm"],
            metadata={"source": item["metadata"]["url"], "title": item["metadata"]["title"]},
        )
        for item in client.dataset(run["defaultDatasetId"]).iterate_items()
        if "transcript_llm" in item
    ]
    # docs is ready for any LangChain vector store or retriever
Run on a schedule or trigger a webhook
To run this actor on a schedule or receive a webhook notification when a run finishes, use the Schedules and Integrations tabs on the actor's page in the Apify Console. See the Apify scheduling docs and webhook docs for setup instructions.
Advanced options
All options can be set in the Input form or passed as JSON when calling via the API.
| Option | UI label | Default | When to use |
|---|---|---|---|
| `maxResults` | Max videos | 10 | Cap how many videos are fetched from a playlist or channel. Single video URLs ignore this. |
| `languages` | Caption languages | `["en"]` | Preferred caption languages in order of priority. First match on the video is used. Codes: `en` English, `es` Spanish, `fr` French, `de` German. |
| `subType` | Caption source | `"both"` | `"manual"` = human captions only · `"auto"` = auto-generated only · `"both"` = prefer manual, fall back to auto |
| `outputFormats` | Output formats | `json`, `text`, `llm` | Which transcript formats to write to the dataset. |
| `wordLevel` | Word-level timestamps | `false` | Add per-word timestamps to JSON segments. Not available for manual captions. |
| `maxAiMinutes` | Max AI minutes | 30 | Hard cap on AI transcription minutes per run. Set to 0 for unlimited. Recommended when processing unknown playlists. |
| `skipAiFallbackIfLongerThan` | Skip AI for videos longer than | 0 (off) | Skip AI for videos exceeding N minutes. Avoids unexpected costs from long videos. |
| `forceWhisperLanguage` | AI transcription language | auto-detect | Force AI to a specific language (ISO code, e.g. `"es"`). Skips 30-second detection window, saves ~20% per video. |
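Putting several of these options together, a cost-capped bulk run input might look like this (illustrative values):

```json
{
  "startUrls": [{ "url": "https://www.youtube.com/@channelname" }],
  "maxResults": 500,
  "languages": ["en"],
  "outputFormats": ["llm"],
  "maxAiMinutes": 60,
  "skipAiFallbackIfLongerThan": 120,
  "forceWhisperLanguage": "en"
}
```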
Proxy configuration
Proxy is always active and fully automatic — no configuration needed. Every request goes direct first, and the proxy is only used if YouTube challenges the request. This costs nothing on a clean run.
How it works
Occasionally YouTube asks automated requests to verify they are not bots. When this happens the actor automatically escalates through progressively stronger proxy tiers until the request succeeds:
- Direct request (no proxy) — used first for every video. Zero cost.
- Datacenter proxy — fast and free on most plans. Handles the vast majority of challenges.
- Residential proxy — highest trust with YouTube. Used only if the datacenter tier is also challenged.
The escalation is fully automatic — you do not need to configure anything. If all tiers are exhausted, the affected video is marked with a BOT_DETECTION error code and the actor continues with the remaining videos.
Proxy costs
| Type | Cost | Notes |
|---|---|---|
| Datacenter (Apify) | Free on most plans | Default first tier. Zero bandwidth consumed on clean runs. |
| Residential (Apify) | ~$0.40 / GB | Auto-escalation tier. Only consumed if datacenter proxy is also challenged — rare. |
Proxy costs are billed from your Apify account balance as a separate line item, alongside this actor's Pay-Per-Event charges. On a typical run with no bot challenges: $0 proxy cost. If datacenter retry is needed: approximately 0.5 MB per affected video. Residential is only consumed if datacenter also fails — this is rare and keeps costs minimal even in bulk runs.
Limitations
All non-recoverable failures produce a dataset item with an error_code field. See the Error codes table for the full reference.
Constraints:
- No translation — the actor returns the original spoken language only.
- YouTube may rate-limit very large batches (100+ videos). The automatic proxy escalation handles most cases transparently.
- `maxResults` default of 10 is intentionally conservative — increase it for large playlists or full channel archives.
Memory
This actor runs with a fixed 4 GB allocation. No configuration needed — the same setting works for both native caption extraction (lightweight) and AI transcription (which loads the faster-whisper speech model into memory).
Frequently asked questions
How much does one video cost?
A single video with captions costs approximately $0.006 on the Free plan — $0.001 for the transcript plus a $0.005 one-time startup fee per run. A second video in the same run adds just $0.001. AI transcription adds $0.012 per minute of audio on the Free plan.
What happens if a video has no captions?
The actor automatically downloads the audio and transcribes it using the built-in AI model — the output has the same structure as a native caption result, with is_ai_generated: true. If the maxAiMinutes cap is reached, remaining caption-free videos receive an AI_MINUTES_LIMIT_REACHED error item and the run continues. The available_languages field lists the caption language codes YouTube does provide on the video.
Does it work for playlists and channels?
Yes. Paste a playlist or channel URL and the actor expands it into individual videos automatically. Use maxResults to cap how many are fetched. If one video is private, age-restricted, or unavailable, it gets an error item while the rest continue.
What languages are supported?
Native captions: Any language YouTube provides captions for — typically 100+ languages for auto-generated captions. Pass multiple language codes (e.g. ["en", "es"]) to fall back automatically when your first choice is unavailable.
AI transcription: 99 languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, and Hindi.
What output formats are available?
- JSON — timestamped segments as an array
- JSON — timestamped segments as an array `[{start, end, text}, ...]`
- Text — plain text joined from all segments
- LLM — text with `[Music]`, `(laughter)`, and other filler tokens stripped, ready for AI pipelines
- SRT — standard subtitle format for video players and editing software
- VTT — WebVTT format for HTML5 `<video>` elements
Multiple formats can be requested in a single run.
Can I set a spending limit?
Yes. Max AI minutes per run caps total AI-transcribed minutes per run (default: 30). Skip AI for long videos skips videos exceeding a duration threshold automatically. Set the AI minutes cap to 0 for unlimited AI transcription.
How accurate is AI transcription?
Accurate for clear speech in widely spoken languages. Accuracy degrades with heavy accents, domain-specific jargon, or poor audio quality. The language_probability field indicates the model's confidence in the detected language. For quality-critical work, treat AI transcripts as a first draft and review them.
What is a YouTube transcript scraper?
A YouTube transcript scraper extracts the spoken text from YouTube videos. This actor retrieves captions when YouTube provides them, or generates a transcript from the audio when captions are unavailable.
Does this translate transcripts?
No. The actor returns the original spoken language. Use a separate translation service for translation.
Does it work for YouTube Shorts?
Yes. Shorts use the same caption infrastructure as regular videos.
Do I need a YouTube Data API key?
No. The actor accesses publicly available caption data without any YouTube API credentials.
How does this compare to the YouTube Data API?
The YouTube Data API v3 does not provide transcript data. It requires a Google Cloud project, OAuth credentials, and per-day quotas. This actor requires none of that.
How does this compare to the youtube-transcript-api Python library?
The youtube-transcript-api library is fine for a handful of videos in your own Python script. This actor adds cloud infrastructure, batch processing across playlists and channels, AI transcription for caption-free videos, multiple output formats, scheduling, and Apify platform integrations (webhooks, REST API, n8n, Make, Zapier).
What do the run log messages mean?
Open the Log tab on any completed run to see what the actor did. Here are the messages you may encounter:
| Message | What it means | Action needed? |
|---|---|---|
| `Processing video: https://...` | Normal progress — one line per video | None |
| `Expanding URL: https://...` | Resolving a playlist or channel to individual videos | None |
| `Total unique videos to process: N` | How many videos were found after deduplication | None |
| `Loading AI transcription model...` | AI model is loading — only happens once per run | None |
| `AI transcription model ready.` | Model loaded, ready to transcribe | None |
| `AI transcription language: set to 'en'` | Language was set by your `forceWhisperLanguage` input | None |
| `AI transcription language: auto-detecting` | Language will be detected from the audio | None |
| `Downloading audio for AI transcription: https://...` | Audio is being downloaded for a caption-free video — normal progress | None |
| `Running AI transcription...` | AI model is actively processing the audio | None |
| `No subtitles found. Running AI fallback for ... (N min estimated)` | AI transcription is starting for a caption-free video | None |
| `AI transcription complete — language: en (confidence: 99%)` | AI transcription finished for one video | None |
| `YouTube access challenge for ... — retrying via proxy tier 1/2...` | YouTube challenged the request; escalating through proxy tiers | None — handled automatically |
| `YouTube access challenge for ... — no proxy tiers available` | Same challenge but no proxy could be created | Check Apify proxy service status |
| `Subtitle fetch failed for ... — retrying via proxy tier 1/2...` | Subtitle download failed; escalating through proxy tiers | None — handled automatically |
| `Subtitle fetch failed for ... (lang): HTTP 429` | Subtitle download failed after all retries (rate-limited) | Try again later or reduce batch size. The actor continues with other videos. |
| `YouTube access challenge on audio download ... — retrying via proxy tier 1/2...` | Audio download challenged; escalating through proxy tiers | None — handled automatically |
| `Audio download failed ... after exhausting all N proxy tiers.` | All proxy tiers failed for audio download | Try again later. The video gets an `AI_TRANSCRIPTION_FAILED` item. |
| `Skipping AI fallback for ...: needs N min but only Y remain` | The `maxAiMinutes` cap was reached for this run | Raise `maxAiMinutes` if you want to transcribe more |
| `Apify spending limit reached. No further AI charges will be made.` | Your Apify account spending limit was hit | Check your Apify billing settings |
| `Audio download failed for ...: <error>` | Could not download the audio for AI transcription | Check the error detail; that video gets an `AI_TRANSCRIPTION_FAILED` item in the dataset |
| `AI fallback failed for ...: <error>` | AI transcription error for this video | Check the error detail; the video gets an error item in the dataset |
| `Unhandled error for ...: <error>` | Unexpected failure — the video gets an `UNEXPECTED_ERROR` item | Open an issue if this happens repeatedly |
Error items written to the dataset always have an error_code field — use that for programmatic filtering rather than parsing log text.
Is it legal to extract YouTube transcripts?
YouTube's Terms of Service prohibit automated scraping, and you are responsible for complying with their Terms and applicable law in your jurisdiction. This actor accesses only publicly available caption data — the same data visible when you click "Open transcript" in the YouTube player. It does not bypass any authentication, access private content, or collect personal user data. See Apify's web scraping legality guide for a broader overview.
Language Reference
YouTube caption languages (130+ codes)
Use these codes in the Caption languages (`languages`) input. Regional variants such as `zh-TW` and `zh-CN` are also accepted where YouTube differentiates them.
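A run input requesting Spanish captions with an English fallback might look like this sketch. The `languages` and `maxAiMinutes` keys match the input names documented on this page; the URL field name (`youtubeUrls`) is an assumption for illustration:

```python
# Sketch of a run input for this actor. "languages" and "maxAiMinutes"
# are documented input names; "youtubeUrls" is an assumed key name.
run_input = {
    "youtubeUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "languages": ["es", "en"],  # requested caption languages (two-letter codes)
    "maxAiMinutes": 30,         # default AI transcription cap per the docs above
}
```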
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| af | Afrikaans | ak | Akan | sq | Albanian |
| am | Amharic | ar | Arabic | hy | Armenian |
| as | Assamese | ay | Aymara | az | Azerbaijani |
| bn | Bangla | eu | Basque | be | Belarusian |
| bho | Bhojpuri | bs | Bosnian | bg | Bulgarian |
| my | Burmese | ca | Catalan | ceb | Cebuano |
| zh | Chinese | zh-CN | Chinese (China) | zh-HK | Chinese (Hong Kong) |
| zh-SG | Chinese (Singapore) | zh-TW | Chinese (Taiwan) | zh-Hans | Chinese (Simplified) |
| zh-Hant | Chinese (Traditional) | co | Corsican | hr | Croatian |
| cs | Czech | da | Danish | dv | Divehi |
| nl | Dutch | en | English | en-US | English (United States) |
| eo | Esperanto | et | Estonian | ee | Ewe |
| fil | Filipino | fi | Finnish | fr | French |
| gl | Galician | lg | Ganda | ka | Georgian |
| de | German | el | Greek | gn | Guarani |
| gu | Gujarati | ht | Haitian Creole | ha | Hausa |
| haw | Hawaiian | iw | Hebrew | hi | Hindi |
| hmn | Hmong | hu | Hungarian | is | Icelandic |
| ig | Igbo | id | Indonesian | ga | Irish |
| it | Italian | ja | Japanese | jv | Javanese |
| kn | Kannada | kk | Kazakh | km | Khmer |
| rw | Kinyarwanda | ko | Korean | kri | Krio |
| ku | Kurdish | ky | Kyrgyz | lo | Lao |
| la | Latin | lv | Latvian | ln | Lingala |
| lt | Lithuanian | lb | Luxembourgish | mk | Macedonian |
| mg | Malagasy | ms | Malay | ml | Malayalam |
| mt | Maltese | mi | Māori | mr | Marathi |
| mn | Mongolian | ne | Nepali | nso | Northern Sotho |
| no | Norwegian | ny | Nyanja | or | Odia |
| om | Oromo | ps | Pashto | fa | Persian |
| pl | Polish | pt | Portuguese | pa | Punjabi |
| qu | Quechua | ro | Romanian | ru | Russian |
| sm | Samoan | sa | Sanskrit | gd | Scottish Gaelic |
| sr | Serbian | sn | Shona | sd | Sindhi |
| si | Sinhala | sk | Slovak | sl | Slovenian |
| so | Somali | st | Southern Sotho | es | Spanish |
| su | Sundanese | sw | Swahili | sv | Swedish |
| tg | Tajik | ta | Tamil | tt | Tatar |
| te | Telugu | th | Thai | ti | Tigrinya |
| ts | Tsonga | tr | Turkish | tk | Turkmen |
| uk | Ukrainian | ur | Urdu | ug | Uyghur |
| uz | Uzbek | vi | Vietnamese | cy | Welsh |
| fy | Western Frisian | xh | Xhosa | yi | Yiddish |
| yo | Yoruba | zu | Zulu | | | |
Not all codes will have captions on every video. When a requested code is not available, the actor returns a `LANGUAGE_NOT_FOUND` or `NO_CAPTIONS_AVAILABLE` error item with an `available_languages` field listing the codes that are actually present on that video.
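That `available_languages` field makes it straightforward to pick a fallback and re-run the affected videos. A minimal sketch, assuming the error-item shape described above:

```python
def pick_fallback_language(item, preferred=("en", "es")):
    """Given a LANGUAGE_NOT_FOUND / NO_CAPTIONS_AVAILABLE error item,
    choose a fallback code from its available_languages list.
    Returns None when the item is not a language error or when
    no captions exist at all."""
    if item.get("error_code") not in ("LANGUAGE_NOT_FOUND", "NO_CAPTIONS_AVAILABLE"):
        return None
    available = item.get("available_languages") or []
    # Prefer the caller's ordered list, then fall back to whatever exists.
    for code in preferred:
        if code in available:
            return code
    return available[0] if available else None
```

You could feed the chosen code back into a second run's `languages` input for just those URLs.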
AI transcription languages (99 codes)
Use one of these codes in the AI transcription language (`forceWhisperLanguage`) input to skip auto-detection. If a language is not in this list, the AI model cannot transcribe it — use auto-detect instead.
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| af | Afrikaans | am | Amharic | ar | Arabic |
| as | Assamese | az | Azerbaijani | ba | Bashkir |
| be | Belarusian | bg | Bulgarian | bn | Bengali |
| bo | Tibetan | br | Breton | bs | Bosnian |
| ca | Catalan | cs | Czech | cy | Welsh |
| da | Danish | de | German | el | Greek |
| en | English | es | Spanish | et | Estonian |
| eu | Basque | fa | Persian | fi | Finnish |
| fo | Faroese | fr | French | gl | Galician |
| gu | Gujarati | ha | Hausa | haw | Hawaiian |
| he | Hebrew | hi | Hindi | hr | Croatian |
| ht | Haitian Creole | hu | Hungarian | hy | Armenian |
| id | Indonesian | is | Icelandic | it | Italian |
| ja | Japanese | jw | Javanese | ka | Georgian |
| kk | Kazakh | km | Khmer | kn | Kannada |
| ko | Korean | la | Latin | lb | Luxembourgish |
| ln | Lingala | lo | Lao | lt | Lithuanian |
| lv | Latvian | mg | Malagasy | mi | Māori |
| mk | Macedonian | ml | Malayalam | mn | Mongolian |
| mr | Marathi | ms | Malay | mt | Maltese |
| my | Burmese | ne | Nepali | nl | Dutch |
| nn | Nynorsk | no | Norwegian | oc | Occitan |
| pa | Punjabi | pl | Polish | ps | Pashto |
| pt | Portuguese | ro | Romanian | ru | Russian |
| sa | Sanskrit | sd | Sindhi | si | Sinhala |
| sk | Slovak | sl | Slovenian | sn | Shona |
| so | Somali | sq | Albanian | sr | Serbian |
| su | Sundanese | sv | Swedish | sw | Swahili |
| ta | Tamil | te | Telugu | tg | Tajik |
| th | Thai | tl | Filipino | tr | Turkish |
| tt | Tatar | uk | Ukrainian | ur | Urdu |
| uz | Uzbek | vi | Vietnamese | yi | Yiddish |
| yo | Yoruba | yue | Cantonese | zh | Chinese |
The AI model supports 99 languages. Accuracy varies by language — it is highest for widely spoken languages (English, Spanish, French, German, etc.) and may degrade for low-resource languages. The `language_probability` field in the output indicates the model's confidence in the detected or forced language.
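If you are feeding AI-transcribed items into a downstream pipeline, `language_probability` is a convenient quality gate. A minimal sketch, assuming the field appears only on AI-transcribed items:

```python
def needs_review(item, threshold=0.8):
    """Flag AI-transcribed items whose language confidence is low.
    Items from native captions carry no language_probability
    (an assumption based on the docs above), so they pass through."""
    prob = item.get("language_probability")
    return prob is not None and prob < threshold
```

The 0.8 cutoff is arbitrary; pick a threshold based on how much manual review you can afford.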
About this actor
This actor runs on the Apify platform. AI transcription uses faster-whisper (MIT license), bundled into the Docker image so there is no model download delay on first run.
Found a bug or have a feature request? Use the Issues tab on this actor's page.