YouTube Transcript Scraper + Whisper AI Fallback

Pricing: from $0.70 / 1,000 transcripts extracted
Extract YouTube transcripts from any video — even without captions. Whisper AI fallback, LLM-ready output, SRT/VTT export. No API key. $0.001/video.

Developer: CodePoetry (Maintained by Community)

YouTube Transcript Scraper — Captions + AI Speech-to-Text

Extract transcripts from any YouTube video — even when captions don't exist.

Most transcript tools stop working when a video has no captions. This actor doesn't. It pulls native captions when YouTube has them, and transcribes the audio with built-in speech-to-text AI when it doesn't. No external API key required.

Give it a single video, a full playlist, or an entire channel. Get transcripts in JSON, plain text, SRT, VTT, or an LLM-ready format — ready to download or feed into a pipeline. Built for bulk runs: concurrent processing, pay-per-result pricing, and no wasted resources on requests that don't need them.

New to Apify? Every new account gets $5 in free credits — no credit card needed. That's enough to transcribe an entire YouTube channel (~4,900 native transcripts).


How to scrape YouTube transcripts

  1. Click Try for free on this actor's page.
  2. Paste one or more YouTube URLs into the YouTube URLs field:
    • Individual videos (youtube.com/watch?v=...)
    • Playlists (youtube.com/playlist?list=...)
    • Channels (youtube.com/@channelname)
  3. Choose your Output Formats. Not sure? Start with Plain Text — it's the words as one block of text.
  4. Set Caption Languages if you need something other than English (default: en). Use two-letter codes: es = Spanish, fr = French, de = German.
  5. Videos without captions are automatically transcribed by the built-in AI model. Set a Max AI Minutes cap to control spend (default: 30 minutes).
  6. Click Start. A single video with captions finishes in under 30 seconds. A 100-video playlist typically finishes in 2–3 minutes.
  7. Download results as JSON, CSV, or Excel — or consume them via the Apify API.

A single video costs ~$0.006. No subscription, no commitment — pay only for what you use.


How it works

Step 1 — Expand. Paste one or more URLs — single videos, playlists, or channel URLs. The actor resolves them into individual video URLs automatically.

Step 2 — Extract. For each video, the actor checks for native captions (manual or auto-generated) in your requested languages. If captions exist, they are fetched and formatted immediately — no audio download needed.

Step 3 — Transcribe (when needed). If no captions are found, the actor automatically downloads the audio and transcribes it using a bundled faster-whisper model running on Apify's compute — no external transcription API needed. The output has the same structure as native caption output. Use Max AI minutes per run and Skip AI for long videos to control AI spend.

One failed video never stops the batch. Every item in the output dataset has an error_code field so you can filter results programmatically.
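Because error items carry an error_code field and successful items do not, a run's dataset can be partitioned in a few lines. A minimal sketch — the sample items are illustrative, not real output:

```python
# Split a run's dataset items into successes and failures by error_code.
# Field names follow this actor's documented output schema.

def split_results(items):
    """Return (transcripts, errors) from a list of dataset items."""
    transcripts = [it for it in items if "error_code" not in it]
    errors = [it for it in items if "error_code" in it]
    return transcripts, errors

items = [
    {"metadata": {"id": "abc"}, "transcript_text": "hello world"},
    {"url": "https://www.youtube.com/watch?v=xyz", "error_code": "LANGUAGE_NOT_FOUND"},
]
ok, failed = split_results(items)
print(len(ok), len(failed))  # 1 1
```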


What you get

Every output item contains full video metadata and your transcript in the formats you requested.

Video metadata

Field | Type | Description
metadata.id | string | YouTube video ID
metadata.title | string | Video title
metadata.url | string | Canonical watch URL
metadata.channel | string | Channel display name
metadata.channel_id | string | Channel ID (UC-prefixed)
metadata.channel_url | string | Channel URL
metadata.description | string | Full video description
metadata.duration | integer | Duration in seconds
metadata.view_count | integer | Total views
metadata.like_count | integer | Total likes
metadata.upload_date | string | Upload date (YYYYMMDD)
metadata.thumbnail | string | Highest-resolution thumbnail URL
metadata.tags | array | Creator-set tags
metadata.categories | array | YouTube categories

Transcript fields

Field | Type | Description
language | string | Language code of the transcript (e.g. en, zh-TW)
is_auto_generated | boolean | true if YouTube auto-generated the captions
is_ai_generated | boolean | true if transcribed by the built-in AI model
transcript_json | array | Timestamped segments [{start, end, text}]. When wordLevel: true, each segment also has a words array: [{start, text}] for native captions, [{start, end, text}] for AI transcriptions.
transcript_text | string | Plain text transcript
transcript_llm | string | Text with [Music], (laughter), and filler tokens stripped — ready for AI pipelines
transcript_srt | string | SRT subtitle format. Always present for AI-transcribed items even if not in outputFormats.
transcript_vtt | string | WebVTT format
language_probability | number | AI model's confidence in the detected language (0–1). AI transcription only.
language_was_forced | boolean | true when forceWhisperLanguage was set. AI transcription only.
ai_duration_charged_min | integer | Minutes of AI time charged for this video. AI transcription only.
ai_speech_duration_sec | number | Actual speech duration detected by the model, in seconds (informational). AI transcription only.
available_languages | array | Caption language codes YouTube provides on this video. Only present on NO_CAPTIONS_AVAILABLE and LANGUAGE_NOT_FOUND error items — use them to refine your languages input.
error_code | string | Structured error code when extraction failed. See Error codes for the full reference table.
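The transcript_json segments are plain {start, end, text} objects, so post-processing needs no special tooling. A small sketch that renders segments as timestamped lines — the sample data mirrors the native caption example in the Output examples section:

```python
# Render transcript_json segments as one timestamped line per segment.

def format_segments(segments):
    lines = []
    for seg in segments:
        # zero-padded start time, one decimal place (e.g. 0018.5)
        lines.append(f"[{seg['start']:06.1f}] {seg['text']}")
    return "\n".join(lines)

segments = [
    {"start": 18.5, "end": 21.0, "text": "We're no strangers to love"},
    {"start": 21.0, "end": 24.5, "text": "You know the rules and so do I"},
]
print(format_segments(segments))
```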

Use cases

Here are ten real workflows built on this actor — from quick one-off summaries to scheduled competitive intelligence pipelines.

1. Claude Desktop / Claude.ai MCP integration

Connect this actor as an MCP server so Claude Desktop, Claude.ai Projects, Cursor, or any other MCP-compatible AI client can fetch a transcript just by being handed a YouTube URL. Ask Claude to "summarise this video" or "extract the key points from this lecture" — no copy-pasting required.

Recommended settings: outputFormats: ["llm"]


2. YouTube Shorts — per-word karaoke captions

YouTube Shorts often have auto-generated captions. Enable wordLevel: true to get per-word start times from the transcript_json field. Feed the result into a caption editor (CapCut, DaVinci Resolve, Adobe Premiere) to produce word-by-word highlighted captions — the "karaoke" style popular on short-form video.

Recommended settings: wordLevel: true, outputFormats: ["json", "srt"], subType: "auto"
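Turning the words array into karaoke cues takes one pass. A sketch, assuming native-caption word entries that carry only a start time (per the transcript fields table), so each word's end is inferred from the next word's start or the segment end:

```python
# Derive per-word display cues from a wordLevel transcript_json segment.
# Native caption words have only a start time; a word "ends" when the
# next word begins (or when the segment ends).

def word_cues(segment):
    words = segment["words"]
    cues = []
    for i, w in enumerate(words):
        end = words[i + 1]["start"] if i + 1 < len(words) else segment["end"]
        cues.append({"start": w["start"], "end": end, "text": w["text"]})
    return cues

seg = {"start": 0.0, "end": 2.0,
       "words": [{"start": 0.0, "text": "Never"}, {"start": 0.6, "text": "gonna"}]}
print(word_cues(seg))  # [{'start': 0.0, 'end': 0.6, 'text': 'Never'}, {'start': 0.6, 'end': 2.0, 'text': 'gonna'}]
```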


3. Build a searchable knowledge base (RAG)

Bulk-extract every video from a company channel, educational YouTube account, or podcast series. Store the transcript_llm text in a vector database (Pinecone, Weaviate, pgvector) indexed by metadata.id and metadata.title. Use it as a retrieval-augmented generation (RAG) corpus so your chatbot can answer questions grounded in the exact video content.

Recommended settings: outputFormats: ["llm"], maxResults: 500. AI fallback is always active — videos without captions are transcribed automatically.


4. NLP / sentiment analysis pipeline

Extract transcripts from a brand's channel, a competitor's channel, or a set of product-review videos. Pipe transcript_text into an NLP pipeline (spaCy, HuggingFace Transformers, OpenAI) for sentiment scoring, named entity extraction, topic modeling, or keyword frequency. Useful for brand monitoring and competitive intelligence.

Recommended settings: outputFormats: ["text", "llm"], subType: "both"


5. LLM training data collection

Curate domain-specific transcripts from niche YouTube channels (medical lectures, legal explainers, coding tutorials, scientific talks) to build fine-tuning datasets. The transcript_llm format strips filler tokens cleanly. Use metadata.tags and metadata.categories to filter and label the data.

Recommended settings: outputFormats: ["llm"], maxAiMinutes cap per run to control cost.


6. SEO content repurposing

Turn a library of tutorials or vlogs into written content. Pass the transcript_llm field to an LLM prompt asking it to rewrite the transcript as a blog post, Twitter/X thread, newsletter section, or LinkedIn article. Combine with metadata.title, metadata.tags, and metadata.description for context.

Recommended settings: outputFormats: ["llm"], languages: ["en"]


7. Podcast / lecture transcription (no captions)

Podcasters who upload to YouTube and educators who post lecture recordings rarely add manual captions. The actor automatically transcribes them with faster-whisper. Use forceWhisperLanguage if you know the channel's language to skip the auto-detection window and reduce cost.

Recommended settings: forceWhisperLanguage: "en", skipAiFallbackIfLongerThan: 120 to skip anything over 2 hours.


8. Accessibility and caption quality audit

Compare YouTube's auto-generated captions (subType: "auto", is_auto_generated: true) against an AI transcription of the same video. Differences surface errors in the auto-generated track. Useful for accessibility compliance reviews or for creators who want to improve their caption quality before publishing.

Recommended settings: Two runs — one with subType: "auto" only, one with subType: "manual" to force AI fallback (since no manual captions exist, the actor will auto-transcribe).


9. Academic research and citation analysis

Download a researcher's full lecture series, a conference talk archive, or all videos from an academic YouTube channel. Index the transcripts by speaker, date (metadata.upload_date), and topic. Use to find when specific terminology first appeared, how arguments evolved over time, or to build a citation graph for a literature review.

Recommended settings: outputFormats: ["json", "text"], maxResults: 1000, languages set to the channel's primary language.


10. Competitive intelligence monitoring (scheduled runs)

Schedule the actor to run weekly on a competitor's channel URL. Set maxResults: 5 to pull only the latest videos. Use an Apify webhook to POST the new transcripts to Slack, a CRM, or an internal dashboard. Get an automatic digest of every new product announcement, feature mention, or pricing discussion your competitor publishes on YouTube.

Recommended settings: maxResults: 5, outputFormats: ["llm"], paired with an Apify schedule and webhook.


Output examples

Native caption output

{
  "metadata": {
    "id": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "channel": "Rick Astley",
    "duration": 213,
    "view_count": 1757728410,
    "upload_date": "20091025"
  },
  "language": "en",
  "is_auto_generated": false,
  "is_ai_generated": false,
  "transcript_json": [
    { "start": 18.5, "end": 21.0, "text": "We're no strangers to love" },
    { "start": 21.0, "end": 24.5, "text": "You know the rules and so do I" }
  ],
  "transcript_text": "We're no strangers to love You know the rules and so do I ...",
  "transcript_llm": "We're no strangers to love You know the rules and so do I ..."
}

AI transcription output

When a video has no captions, AI transcription runs automatically:

{
  "metadata": { "title": "...", "duration": 240 },
  "is_ai_generated": true,
  "language": "en",
  "language_probability": 0.9987,
  "ai_duration_charged_min": 4,
  "transcript_json": [
    { "start": 0.0, "end": 3.2, "text": "Welcome to today's episode." }
  ],
  "transcript_text": "Welcome to today's episode. ...",
  "transcript_llm": "Welcome to today's episode. ..."
}

Error item

{
  "url": "https://www.youtube.com/watch?v=...",
  "metadata": { "title": "...", "duration": 720 },
  "error": "No subtitles found in requested languages.",
  "error_code": "LANGUAGE_NOT_FOUND"
}

Error codes

Error items are never billed: Actor.charge() is only called on successful transcript results. The tables below group errors by cause so you know whether the issue is in your input or something outside your control.

Input errors — caused by the URLs or settings you provided:

Error code | Meaning | What to do
AGE_RESTRICTED | YouTube requires sign-in / age verification to access this video. | Remove the URL — cannot be bypassed.
PRIVATE_OR_UNAVAILABLE | The video is private, deleted, or blocked in the runner's region. | Remove the URL or check if the video is public.
LIVE_VIDEO | Live streams have no static captions to extract. | Wait until the stream ends, then retry.
LANGUAGE_NOT_FOUND | Captions exist but not in the requested language. available_languages shows what's available. | Change your languages input.

Budget / limit errors — the video could be transcribed, but a budget gate prevented it:

Error code | Meaning | What to do
NO_CAPTIONS_AVAILABLE | The video has zero caption tracks. AI fallback is attempted if budget allows. | Ensure AI fallback is not blocked by the limits below.
AI_MINUTES_LIMIT_REACHED | The maxAiMinutes budget for this run is exhausted. | Increase maxAiMinutes and retry.
AI_FALLBACK_SKIPPED_TOO_LONG | The video exceeds the skipAiFallbackIfLongerThan duration limit. | Increase or remove the limit.
SPENDING_LIMIT_REACHED | The Apify account spending limit was hit — no further AI charges possible. | Adjust your Apify billing settings.

Infrastructure / actor errors — not caused by your input; no charge is made:

Error code | Meaning | What to do
BOT_DETECTION | YouTube challenged the request. The actor retried through proxy tiers automatically. | Usually self-resolving. Switch proxy group if persistent.
EXTRACTION_ERROR | Generic yt-dlp failure — the video may be temporarily unavailable on YouTube's side. | Retry later.
AI_TRANSCRIPTION_FAILED | The Whisper model or audio download failed for this video. | Check run logs; retry.
UNEXPECTED_ERROR | An unhandled exception in the actor code. The video gets an error item; other videos continue. | Open an issue if persistent.

Pricing

This actor uses Pay-Per-Event pricing — you pay for results, not compute time or monthly fees.

In plain terms: a single native transcript costs $0.001. There is also a $0.005 one-time startup fee per run. Scraping one video costs around $0.006 total. From the second video onwards, this actor is cheaper than competitors charging $0.005 flat per transcript.
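The run arithmetic is easy to sanity-check. A quick sketch reproducing the Free-plan figures quoted above ($0.005 startup per run plus $0.001 per native transcript):

```python
# Back-of-envelope cost for native transcripts on the Free plan.
# Rates taken from this README's pricing section.

def run_cost(videos, per_transcript=0.001, startup=0.005):
    """Total Pay-Per-Event cost for one run with `videos` native transcripts."""
    return startup + videos * per_transcript

for n in (1, 2, 10, 100, 1000):
    print(n, f"${run_cost(n):.3f}")
```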

How much does a run cost?

Videos | This actor | Typical competitor ($0.005/transcript) | You save
1 | $0.006 | $0.005 | n/a
2 | $0.007 | $0.010 | 30%
10 | $0.015 | $0.050 | 70%
100 | $0.105 | $0.500 | 79%
1,000 | $1.005 | $5.000 | 80%

Native transcript pricing

Plan | Per transcript | 10 videos | 100 videos | 1,000 videos
Free | $0.001 | $0.015 | $0.105 | $1.005
Bronze ($49/mo) | $0.0009 | $0.014 | $0.095 | $0.905
Silver ($199/mo) | $0.0008 | $0.013 | $0.085 | $0.805
Gold ($999/mo) | $0.0007 | $0.012 | $0.075 | $0.705

AI transcription pricing (when captions are unavailable)

AI is only charged for videos that actually need it — native captions are always checked first. Billed minutes are based on the published video duration (rounded up to the nearest minute, minimum 1 minute per video), not on detected speech length. The ai_speech_duration_sec field in the output is informational.
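The billing rule above (published duration, rounded up to the whole minute, minimum one minute) is a one-liner. A sketch of the arithmetic at the Free-plan rate:

```python
import math

# AI minutes are billed on published video duration, rounded UP to the
# nearest whole minute, minimum 1 minute per video (Free plan: $0.012/min).

def ai_cost(duration_sec, per_minute=0.012):
    """Return (billed_minutes, cost_usd) for one AI-transcribed video."""
    minutes = max(1, math.ceil(duration_sec / 60))
    return minutes, round(minutes * per_minute, 3)

print(ai_cost(2700))  # 45-minute podcast -> (45, 0.54)
print(ai_cost(30))    # 30-second clip still bills 1 minute -> (1, 0.012)
```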

Plan | Per AI minute | 10-min video | 60-min video
Free | $0.012 | $0.12 | $0.72
Bronze | $0.011 | $0.11 | $0.66
Silver | $0.010 | $0.10 | $0.60
Gold | $0.009 | $0.09 | $0.54

Real-world examples

Task | Videos | AI? | Estimated cost (Free plan)
Single video | 1 | No | ~$0.006
YouTube playlist (20 videos) | 20 | No | ~$0.025
Channel analysis (100 videos) | 100 | No | ~$0.105
Podcast batch (20 × 45 min, no captions) | 20 | Yes — 900 AI min | ~$10.81
Research corpus (500 videos, 20% no captions, 10 min avg) | 500 | Mixed | ~$12.41

On the free $5 credit: approximately 4,900 native transcripts (enough for an entire YouTube channel), or around 400 minutes of AI transcription.

The prices above are Pay-Per-Event charges only and do not include proxy costs. The default datacenter proxy costs nothing on clean runs — it is only used as a fallback when YouTube challenges a request. If the datacenter tier is also challenged, the actor auto-escalates to residential (~$0.40/GB), though this is rare. See Proxy configuration for details.

Built for bulk runs

Every part of this actor is designed to keep costs and resource use as low as possible, especially at scale:

  • Pay per result, not per run. The $0.005 startup fee is charged once regardless of batch size — so a 1,000-video run costs nearly the same overhead as a 10-video run.
  • No proxy cost on clean runs. Every request goes direct first. The proxy is only used as a silent fallback if YouTube challenges a specific request — and that happens rarely. Most runs pay $0 in proxy fees.
  • AI model loaded on demand. The transcription model is only initialised when a video actually needs AI transcription. Runs that rely entirely on native captions start faster and use less memory.
  • Concurrent processing. Up to 5 videos are processed in parallel, significantly reducing wall-clock time for large playlists or channels.
  • Built-in spend controls. Max AI minutes per run and Skip AI for long videos let you set hard caps on AI spend before a run starts — no surprises from unexpectedly long videos.
  • One failed video never slows the batch. Errors are logged and skipped immediately; the rest of the batch continues at full speed.

How it compares

Feature | This actor | Typical alternatives
Transcribes videos with no captions | Yes — built-in AI, no external API key | No — returns an error
LLM-optimised output (filler stripped) | Yes — transcript_llm field | No
Spend safeguards (AI minute cap, skip long videos) | Yes | No
Native transcript price | $0.001 per transcript | Up to $0.005 — 5× more
No monthly subscription | Yes — pay only for what you run | Flat monthly fee
Batch: playlists and channels | Yes | Most
Output formats | JSON, Text, SRT, VTT, LLM | Usually JSON only
Word-level timestamps | Yes | Rare
YouTube Data API key required | No | No
Automatic access challenge bypass | Yes — retries via proxy when needed, direct otherwise | Varies
MCP-compatible (Claude Desktop, Cursor, etc.) | Yes — via Apify MCP integration | Rare

Who uses it

AI and LLM developers

Feed transcripts into RAG pipelines, summarisation chains, or fine-tuning datasets. The transcript_llm field strips [Music], (laughter), and other filler tokens that bloat context windows. Compatible with LangChain, LlamaIndex, and other Python AI frameworks.

Content creators and marketers

Turn any YouTube video into a blog post or newsletter draft without manual transcription. Extract pull quotes from interviews. Run an entire channel archive in one batch.

SEO professionals and researchers

Extract keyword data from video transcripts at scale. Build text content from videos to rank alongside YouTube results on Google. Analyse a competitor's spoken messaging for topic and positioning gaps.

Data scientists and academics

Build NLP corpora from lectures, conference talks, and documentary interviews. Process multilingual transcripts for cross-language analysis. Run large dataset collection jobs overnight via the API.

Developers building MCP-integrated AI tools

Connect this actor as an MCP server so Claude Desktop, Claude.ai Projects, Cursor, or any MCP-compatible client can fetch and process YouTube transcripts in a single tool call. No copy-pasting, no API wiring — just hand the model a URL.


Integration examples

Python — Apify client

Get your API token from the Apify Console under Settings → Integrations. Keep it secret — treat it like a password.

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "languages": ["en"],
        "outputFormats": ["json", "llm"],
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["metadata"]["title"])
    print(item["transcript_text"][:200])

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('codepoetry/youtube-transcript-ai-scraper').call({
    startUrls: [{ url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' }],
    languages: ['en'],
    outputFormats: ['json', 'llm'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => console.log(item.metadata.title, item.transcript_text.slice(0, 200)));

LangChain / RAG pipeline

from apify_client import ApifyClient
from langchain.docstore.document import Document

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "outputFormats": ["llm"],
        "maxAiMinutes": 60,
    }
)

docs = [
    Document(
        page_content=item["transcript_llm"],
        metadata={"source": item["metadata"]["url"], "title": item["metadata"]["title"]},
    )
    for item in client.dataset(run["defaultDatasetId"]).iterate_items()
    if "transcript_llm" in item
]
# docs is ready for any LangChain vector store or retriever

Run on a schedule or trigger a webhook

To run this actor on a schedule or receive a webhook notification when a run finishes, use the Schedules and Integrations tabs on the actor's page in the Apify Console. See the Apify scheduling docs and webhook docs for setup instructions.


Advanced options

All options can be set in the Input form or passed as JSON when calling via the API.

Option | UI label | Default | When to use
maxResults | Max videos | 10 | Cap how many videos are fetched from a playlist or channel. Single video URLs ignore this.
languages | Caption languages | ["en"] | Preferred caption languages in order of priority. First match on the video is used. Codes: en English, es Spanish, fr French, de German.
subType | Caption source | "both" | "manual" = human captions only · "auto" = auto-generated only · "both" = prefer manual, fall back to auto
outputFormats | Output formats | json, text, llm | Which transcript formats to write to the dataset.
wordLevel | Word-level timestamps | false | Add per-word timestamps to JSON segments. Not available for manual captions.
maxAiMinutes | Max AI minutes | 30 | Hard cap on AI transcription minutes per run. Set to 0 for unlimited. Recommended when processing unknown playlists.
skipAiFallbackIfLongerThan | Skip AI for videos longer than | 0 (off) | Skip AI for videos exceeding N minutes. Avoids unexpected costs from long videos.
forceWhisperLanguage | AI transcription language | auto-detect | Force AI to a specific language (ISO code, e.g. "es"). Skips the 30-second detection window, saves ~20% per video.
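When calling via the API, the options above combine into a single JSON input object. A representative (not exhaustive) sketch — the playlist URL is a placeholder:

```json
{
  "startUrls": [{ "url": "https://www.youtube.com/playlist?list=..." }],
  "maxResults": 100,
  "languages": ["en", "es"],
  "subType": "both",
  "outputFormats": ["json", "llm", "srt"],
  "wordLevel": false,
  "maxAiMinutes": 60,
  "skipAiFallbackIfLongerThan": 120,
  "forceWhisperLanguage": "en"
}
```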

Proxy configuration

Proxy is always active and fully automatic — no configuration needed. Every request goes direct first, and the proxy is only used if YouTube challenges the request. This costs nothing on a clean run.

How it works

Occasionally YouTube asks automated requests to verify they are not bots. When this happens the actor automatically escalates through progressively stronger proxy tiers until the request succeeds:

  1. Direct request (no proxy) — used first for every video. Zero cost.
  2. Datacenter proxy — fast and free on most plans. Handles the vast majority of challenges.
  3. Residential proxy — highest trust with YouTube. Used only if the datacenter tier is also challenged.

The escalation is fully automatic — you do not need to configure anything. If all tiers are exhausted, the affected video is marked with a BOT_DETECTION error code and the actor continues with the remaining videos.

Proxy costs

Type | Cost | Notes
Datacenter (Apify) | Free on most plans | Default first tier. Zero bandwidth consumed on clean runs.
Residential (Apify) | ~$0.40 / GB | Auto-escalation tier. Only consumed if datacenter proxy is also challenged — rare.

Proxy costs are billed from your Apify account balance as a separate line item, alongside this actor's Pay-Per-Event charges. On a typical run with no bot challenges: $0 proxy cost. If datacenter retry is needed: approximately 0.5 MB per affected video. Residential is only consumed if datacenter also fails — this is rare and keeps costs minimal even in bulk runs.


Limitations

All non-recoverable failures produce a dataset item with an error_code field. See the Error codes table for the full reference.

Constraints:

  • No translation — the actor returns the original spoken language only.
  • YouTube may rate-limit very large batches (100+ videos). The automatic proxy escalation handles most cases transparently.
  • maxResults default of 10 is intentionally conservative — increase it for large playlists or full channel archives.

Memory

This actor runs with a fixed 4 GB allocation. No configuration needed — the same setting works for both native caption extraction (lightweight) and AI transcription (which loads the faster-whisper speech model into memory).


Frequently asked questions

How much does one video cost?

A single video with captions costs approximately $0.006 on the Free plan — $0.001 for the transcript plus a $0.005 one-time startup fee per run. A second video in the same run adds just $0.001. AI transcription adds $0.012 per minute of audio on the Free plan.

What happens if a video has no captions?

The actor automatically downloads the audio and transcribes it using the built-in AI model — the output has the same structure as a native caption result, with is_ai_generated: true. If the maxAiMinutes cap is reached, remaining caption-free videos receive an AI_MINUTES_LIMIT_REACHED error item and the run continues. The available_languages field lists the caption language codes YouTube does provide on the video.

Does it work for playlists and channels?

Yes. Paste a playlist or channel URL and the actor expands it into individual videos automatically. Use maxResults to cap how many are fetched. If one video is private, age-restricted, or unavailable, it gets an error item while the rest continue.

What languages are supported?

Native captions: Any language YouTube provides captions for — typically 100+ languages for auto-generated captions. Pass multiple language codes (e.g. ["en", "es"]) to fall back automatically when your first choice is unavailable.

AI transcription: 99 languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, and Hindi.

What output formats are available?

  • JSON — timestamped segments as an array [{start, end, text}, ...]
  • Text — plain text joined from all segments
  • LLM — text with [Music], (laughter), and other filler tokens stripped, ready for AI pipelines
  • SRT — standard subtitle format for video players and editing software
  • VTT — WebVTT format for HTML5 <video> elements

Multiple formats can be requested in a single run.
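Because each format is just a string field on the dataset item, exporting subtitle files is a short loop. A sketch using the documented metadata.id and transcript_srt fields — the sample item is illustrative:

```python
from pathlib import Path

# Save each item's SRT track to disk, named by video ID.
# Items with an error (or without a requested SRT) are skipped.

def save_srt(items, out_dir="subtitles"):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for item in items:
        srt = item.get("transcript_srt")
        if not srt:
            continue  # error item, or SRT was not in outputFormats
        path = out / f"{item['metadata']['id']}.srt"
        path.write_text(srt, encoding="utf-8")
        written.append(path.name)
    return written

items = [{
    "metadata": {"id": "dQw4w9WgXcQ"},
    "transcript_srt": "1\n00:00:18,500 --> 00:00:21,000\nWe're no strangers to love\n",
}]
print(save_srt(items))  # ['dQw4w9WgXcQ.srt']
```

The same pattern works for transcript_vtt — swap the field name and the .vtt extension.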

Can I set a spending limit?

Yes. Max AI minutes per run caps total AI-transcribed minutes per run (default: 30). Skip AI for long videos skips videos exceeding a duration threshold automatically. Set the AI minutes cap to 0 for unlimited AI transcription.

How accurate is AI transcription?

Accurate for clear speech in widely spoken languages. Accuracy degrades with heavy accents, domain-specific jargon, or poor audio quality. The language_probability field indicates the model's confidence in the detected language. For quality-critical work, treat AI transcripts as a first draft and review them.

What is a YouTube transcript scraper?

A YouTube transcript scraper extracts the spoken text from YouTube videos. This actor retrieves captions when YouTube provides them, or generates a transcript from the audio when captions are unavailable.

Does this translate transcripts?

No. The actor returns the original spoken language. Use a separate translation service for translation.

Does it work for YouTube Shorts?

Yes. Shorts use the same caption infrastructure as regular videos.

Do I need a YouTube Data API key?

No. The actor accesses publicly available caption data without any YouTube API credentials.

How does this compare to the YouTube Data API?

The YouTube Data API v3 does not provide transcript data. It requires a Google Cloud project, OAuth credentials, and per-day quotas. This actor requires none of that.

How does this compare to the youtube-transcript-api Python library?

The youtube-transcript-api library is fine for a handful of videos in your own Python script. This actor adds cloud infrastructure, batch processing across playlists and channels, AI transcription for caption-free videos, multiple output formats, scheduling, and Apify platform integrations (webhooks, REST API, n8n, Make, Zapier).

What do the run log messages mean?

Open the Log tab on any completed run to see what the actor did. Here are the messages you may encounter:

Message | What it means | Action needed?
Processing video: https://... | Normal progress — one line per video | None
Expanding URL: https://... | Resolving a playlist or channel to individual videos | None
Total unique videos to process: N | How many videos were found after deduplication | None
Loading AI transcription model... | AI model is loading — only happens once per run | None
AI transcription model ready. | Model loaded, ready to transcribe | None
AI transcription language: set to 'en' | Language was set by your forceWhisperLanguage input | None
AI transcription language: auto-detecting | Language will be detected from the audio | None
Downloading audio for AI transcription: https://... | Audio is being downloaded for a caption-free video — normal progress | None
Running AI transcription... | AI model is actively processing the audio | None
No subtitles found. Running AI fallback for ... (N min estimated) | AI transcription is starting for a caption-free video | None
AI transcription complete — language: en (confidence: 99%) | AI transcription finished for one video | None
YouTube access challenge for ... — retrying via proxy tier 1/2... | YouTube challenged the request; escalating through proxy tiers | None — handled automatically
YouTube access challenge for ... — no proxy tiers available | Same challenge but no proxy could be created | Check Apify proxy service status
Subtitle fetch failed for ... — retrying via proxy tier 1/2... | Subtitle download failed; escalating through proxy tiers | None — handled automatically
Subtitle fetch failed for ... (lang): HTTP 429 | Subtitle download failed after all retries (rate-limited) | Try again later or reduce batch size. The actor continues with other videos.
YouTube access challenge on audio download ... — retrying via proxy tier 1/2... | Audio download challenged; escalating through proxy tiers | None — handled automatically
Audio download failed ... after exhausting all N proxy tiers. | All proxy tiers failed for audio download | Try again later. The video gets an AI_TRANSCRIPTION_FAILED item.
Skipping AI fallback for ...: needs N min but only Y remain | The maxAiMinutes cap was reached for this run | Raise maxAiMinutes if you want to transcribe more
Apify spending limit reached. No further AI charges will be made. | Your Apify account spending limit was hit | Check your Apify billing settings
Audio download failed for ...: <error> | Could not download the audio for AI transcription | Check the error detail; that video gets an AI_TRANSCRIPTION_FAILED item in the dataset
AI fallback failed for ...: <error> | AI transcription error for this video | Check the error detail; the video gets an error item in the dataset
Unhandled error for ...: <error> | Unexpected failure — the video gets an UNEXPECTED_ERROR item | Open an issue if this happens repeatedly

Error items written to the dataset always have an error_code field — use that for programmatic filtering rather than parsing log text.

YouTube's Terms of Service prohibit automated scraping, and you are responsible for complying with their Terms and applicable law in your jurisdiction. This actor accesses only publicly available caption data — the same data visible when you click "Open transcript" in the YouTube player. It does not bypass any authentication, access private content, or collect personal user data. See Apify's web scraping legality guide for a broader overview.


Language Reference

YouTube caption languages (130+ codes)

Use these codes in the Caption languages (languages) input. Regional variants such as zh-TW and zh-CN are also accepted where YouTube differentiates them.

af Afrikaans · ak Akan · sq Albanian
am Amharic · ar Arabic · hy Armenian
as Assamese · ay Aymara · az Azerbaijani
bn Bangla · eu Basque · be Belarusian
bho Bhojpuri · bs Bosnian · bg Bulgarian
my Burmese · ca Catalan · ceb Cebuano
zh Chinese · zh-CN Chinese (China) · zh-HK Chinese (Hong Kong)
zh-SG Chinese (Singapore) · zh-TW Chinese (Taiwan) · zh-Hans Chinese (Simplified)
zh-Hant Chinese (Traditional) · co Corsican · hr Croatian
cs Czech · da Danish · dv Divehi
nl Dutch · en English · en-US English (United States)
eo Esperanto · et Estonian · ee Ewe
fil Filipino · fi Finnish · fr French
gl Galician · lg Ganda · ka Georgian
de German · el Greek · gn Guarani
gu Gujarati · ht Haitian Creole · ha Hausa
haw Hawaiian · iw Hebrew · hi Hindi
hmn Hmong · hu Hungarian · is Icelandic
ig Igbo · id Indonesian · ga Irish
it Italian · ja Japanese · jv Javanese
kn Kannada · kk Kazakh · km Khmer
rw Kinyarwanda · ko Korean · kri Krio
ku Kurdish · ky Kyrgyz · lo Lao
la Latin · lv Latvian · ln Lingala
lt Lithuanian · lb Luxembourgish · mk Macedonian
mg Malagasy · ms Malay · ml Malayalam
mt Maltese · mi Māori · mr Marathi
mn Mongolian · ne Nepali · nso Northern Sotho
no Norwegian · ny Nyanja · or Odia
om Oromo · ps Pashto · fa Persian
pl Polish · pt Portuguese · pa Punjabi
qu Quechua · ro Romanian · ru Russian
sm Samoan · sa Sanskrit · gd Scottish Gaelic
sr Serbian · sn Shona · sd Sindhi
si Sinhala · sk Slovak · sl Slovenian
so Somali · st Southern Sotho · es Spanish
su Sundanese · sw Swahili · sv Swedish
tg Tajik · ta Tamil · tt Tatar
te Telugu · th Thai · ti Tigrinya
ts Tsonga · tr Turkish · tk Turkmen
uk Ukrainian · ur Urdu · ug Uyghur
uz Uzbek · vi Vietnamese · cy Welsh
fy Western Frisian · xh Xhosa · yi Yiddish
yo Yoruba · zu Zulu

Not all codes will have captions on every video. When a requested code is not available, the actor returns a LANGUAGE_NOT_FOUND or NO_CAPTIONS_AVAILABLE error item with an available_languages field listing the codes that are actually present on that video.
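A practical pattern: harvest available_languages from the failed items of one run to widen the languages input for a retry. A minimal sketch — the sample item is illustrative:

```python
# Build a broader `languages` input from LANGUAGE_NOT_FOUND /
# NO_CAPTIONS_AVAILABLE error items of a previous run.

def refine_languages(items, requested):
    found = set()
    for item in items:
        if item.get("error_code") in ("LANGUAGE_NOT_FOUND", "NO_CAPTIONS_AVAILABLE"):
            found.update(item.get("available_languages", []))
    # keep the original preference order, then append newly discovered codes
    return requested + sorted(found - set(requested))

items = [{"error_code": "LANGUAGE_NOT_FOUND", "available_languages": ["de", "fr"]}]
print(refine_languages(items, ["en"]))  # ['en', 'de', 'fr']
```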

AI transcription languages (99 codes)

Use one of these codes in the AI transcription language (forceWhisperLanguage) input to skip auto-detection. If a language is not in this list, the AI model cannot transcribe it — use auto-detect instead.

af Afrikaans · am Amharic · ar Arabic
as Assamese · az Azerbaijani · ba Bashkir
be Belarusian · bg Bulgarian · bn Bengali
bo Tibetan · br Breton · bs Bosnian
ca Catalan · cs Czech · cy Welsh
da Danish · de German · el Greek
en English · es Spanish · et Estonian
eu Basque · fa Persian · fi Finnish
fo Faroese · fr French · gl Galician
gu Gujarati · ha Hausa · haw Hawaiian
he Hebrew · hi Hindi · hr Croatian
ht Haitian Creole · hu Hungarian · hy Armenian
id Indonesian · is Icelandic · it Italian
ja Japanese · jw Javanese · ka Georgian
kk Kazakh · km Khmer · kn Kannada
ko Korean · la Latin · lb Luxembourgish
ln Lingala · lo Lao · lt Lithuanian
lv Latvian · mg Malagasy · mi Māori
mk Macedonian · ml Malayalam · mn Mongolian
mr Marathi · ms Malay · mt Maltese
my Burmese · ne Nepali · nl Dutch
nn Nynorsk · no Norwegian · oc Occitan
pa Punjabi · pl Polish · ps Pashto
pt Portuguese · ro Romanian · ru Russian
sa Sanskrit · sd Sindhi · si Sinhala
sk Slovak · sl Slovenian · sn Shona
so Somali · sq Albanian · sr Serbian
su Sundanese · sv Swedish · sw Swahili
ta Tamil · te Telugu · tg Tajik
th Thai · tl Filipino · tr Turkish
tt Tatar · uk Ukrainian · ur Urdu
uz Uzbek · vi Vietnamese · yi Yiddish
yo Yoruba · yue Cantonese · zh Chinese

The AI model supports 99 languages. Accuracy varies by language — it is highest for widely spoken languages (English, Spanish, French, German, etc.) and may degrade for low-resource languages. The language_probability field in the output indicates the model's confidence in the detected or forced language.


About this actor

This actor runs on the Apify platform. AI transcription uses faster-whisper (MIT license), bundled into the Docker image so there is no model download delay on first run.

Found a bug or have a feature request? Use the Issues tab on this actor's page.