YouTube Transcript Scraper + Whisper AI Fallback

Extract YouTube transcripts from any video — even without captions. Whisper AI fallback, LLM-ready output, SRT/VTT export. No API key. $0.001/video.

Pricing: from $0.70 / 1,000 transcripts extracted
Developer: CodePoetry (maintained by the community)
YouTube Transcript Scraper — Captions + AI Speech-to-Text

Extract transcripts from any YouTube video — even when captions don't exist.

Most transcript tools fail on videos with no captions. This one transcribes the audio instead — no external API key required.

Give it a single video, a full playlist, or an entire channel. Get transcripts in JSON, plain text, SRT, VTT, or an LLM-ready format — ready to download or feed into a pipeline. Processes up to 5 videos simultaneously. You pay per transcript, not per minute of server time.

Works as an MCP tool with Claude Desktop, Cursor, and any MCP-compatible client.


Quick start

  1. Click Try for free on this actor's page.

  2. Paste one or more YouTube URLs into the YouTube URLs field. Supported formats:

    • Single video: youtube.com/watch?v=...
    • Playlist: youtube.com/playlist?list=...
    • Channel: youtube.com/@channelname

    For playlists and channels the default is 10 videos — increase Max videos before starting if you want more.

  3. Choose your Output Formats. Not sure? Start with Plain Text — it's the words as one block of text. Set Caption Languages if you need something other than English (default: en).

  4. AI transcription is off by default — the lowest-cost setting. Videos without native captions are skipped unless you turn on Enable AI transcription in the input form. When on, memory adjusts to 4 GB automatically and AI processes at roughly 8× real-time (a 30-minute podcast takes about 4 minutes).

    If you turned AI on: set a Max AI Minutes cap before running an unknown playlist or channel (default: 30 minutes). Native captions are checked first — AI only runs on videos that have none. Raise the cap or set it to 0 (unlimited) after estimating cost in the Pricing section.

  5. Click Start. A single video with captions finishes in under 30 seconds. When the run completes, download results from the Dataset tab as JSON, CSV, or Excel — or access them via the API.

See the Pricing page for the full rate card.


How it works

Step 1: Expand. Paste one or more URLs — single videos, playlists, or channel URLs. The actor resolves them into individual video URLs automatically.

Step 2: Extract. For each video, the actor checks for native captions (manual or auto-generated) in your requested languages. If captions exist, they are fetched and formatted immediately — no audio download needed.

Step 3: Transcribe (when needed). If no captions are found and AI transcription is enabled, the actor downloads the audio and transcribes it using a bundled faster-whisper model running on Apify's compute — no external API needed. The output has the same structure as native caption output. Use Max AI minutes per run and Skip AI for long videos to control AI spend.

Failed videos are logged and skipped; the rest of the batch continues. Every failed item in the output dataset has an error_code field so you can filter results programmatically.
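A minimal client-side sketch of that filtering (partition_results is an illustrative helper, not part of the actor):

```python
def partition_results(items):
    """Split dataset items into successful transcripts and error items.
    Only failed items carry an error_code field."""
    ok = [item for item in items if "error_code" not in item]
    failed = [item for item in items if "error_code" in item]
    return ok, failed

# Two mock dataset items, shaped like the documented output:
items = [
    {"metadata": {"id": "dQw4w9WgXcQ"}, "transcript_text": "We're no strangers to love ..."},
    {"url": "https://www.youtube.com/watch?v=xyz", "error_code": "AGE_RESTRICTED"},
]
ok, failed = partition_results(items)
print(len(ok), failed[0]["error_code"])  # → 1 AGE_RESTRICTED
```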


What you get

Each output item includes video metadata and your transcript in the formats you requested.

Video metadata

| Field | Type | Description |
| --- | --- | --- |
| metadata.id | string | YouTube video ID |
| metadata.title | string | Video title |
| metadata.url | string | Canonical watch URL |
| metadata.channel | string | Channel display name |
| metadata.channel_id | string | Channel ID (UC-prefixed) |
| metadata.channel_url | string | Channel URL |
| metadata.description | string | Full video description |
| metadata.duration | integer | Duration in seconds |
| metadata.view_count | integer | Total views |
| metadata.like_count | integer | Total likes |
| metadata.upload_date | string | Upload date (YYYYMMDD) |
| metadata.thumbnail | string | Highest-resolution thumbnail URL |
| metadata.tags | array | Creator-set tags |
| metadata.categories | array | YouTube categories |

Transcript fields

| Field | Type | Description |
| --- | --- | --- |
| language | string | Language code of the transcript (e.g. en, zh-TW) |
| is_auto_generated | boolean | true if YouTube auto-generated the captions |
| is_ai_generated | boolean | true if transcribed by the built-in AI model |
| transcript_json | array | Timestamped segments [{start, end, text}]. When wordLevel: true, each segment also has a words array [{start, end, text}] — end is estimated for native captions, exact for AI transcriptions. |
| transcript_text | string | Plain text transcript |
| transcript_llm | string | Text with [Music], (laughter), and filler tokens stripped — ready for AI pipelines |
| transcript_srt | string | SRT subtitle format. Present when srt is in outputFormats. |
| transcript_vtt | string | WebVTT format |
| language_probability | number | AI model's confidence in the detected language (0–1). AI transcription only. |
| language_was_forced | boolean | true when forceWhisperLanguage was set. AI transcription only. |
| ai_duration_charged_min | integer | Minutes of AI time charged for this video. AI transcription only. |
| ai_speech_duration_sec | number | Actual speech duration detected by the model, in seconds (informational). AI transcription only. |
| available_languages | array | Caption language codes YouTube provides on this video. Only present on NO_CAPTIONS_AVAILABLE and LANGUAGE_NOT_FOUND error items — use them to refine your languages input. |
| error_code | string | Structured error code when extraction failed. See Error codes for the full reference table. |
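As an illustration of consuming transcript_json, the nested segments can be flattened into one (start, end, text) list; flatten_words is a hypothetical helper, not part of the actor's output:

```python
def flatten_words(transcript_json):
    """Collect per-word timestamps from segments. Falls back to whole
    segments when no words array is present (wordLevel off)."""
    words = []
    for seg in transcript_json:
        for w in seg.get("words", [seg]):
            words.append((w["start"], w["end"], w["text"]))
    return words

segments = [
    {"start": 0.0, "end": 1.2, "text": "hello there",
     "words": [{"start": 0.0, "end": 0.5, "text": "hello"},
               {"start": 0.5, "end": 1.2, "text": "there"}]},
]
print(flatten_words(segments))  # → [(0.0, 0.5, 'hello'), (0.5, 1.2, 'there')]
```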

Use cases

1. Claude Desktop / Claude.ai MCP integration

Connect this actor as an MCP server so Claude Desktop, Claude.ai Projects, Cursor, or any other MCP-compatible AI client can fetch a transcript just by being handed a YouTube URL. Ask Claude to "summarise this video" or "extract the key points from this lecture" — no copy-pasting required.

Recommended settings: outputFormats: ["llm"]

2. YouTube Shorts — per-word karaoke captions

YouTube Shorts often have auto-generated captions. Enable wordLevel: true to get per-word start times from the transcript_json field. Feed the result into a caption editor (CapCut, DaVinci Resolve, Adobe Premiere) to produce word-by-word highlighted captions — the "karaoke" style popular on short-form video.

Recommended settings: wordLevel: true, outputFormats: ["json", "srt"], subType: "auto"
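To sketch that conversion: word-level segments can be turned into one SRT cue per word before importing into an editor. The helper names below are mine; the segment shape matches the documented transcript_json field.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(transcript_json):
    """Emit one SRT cue per word, the 'karaoke' style caption editors expect."""
    cues = []
    idx = 1
    for seg in transcript_json:
        for w in seg.get("words", []):
            cues.append(f"{idx}\n{srt_time(w['start'])} --> {srt_time(w['end'])}\n{w['text']}\n")
            idx += 1
    return "\n".join(cues)

print(srt_time(18.5))  # → 00:00:18,500
```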

3. Build a searchable knowledge base (RAG)

Bulk-extract every video from a company channel, educational YouTube account, or podcast series. Store the transcript_llm text in a vector database (Pinecone, Weaviate, pgvector) indexed by metadata.id and metadata.title. Use it as a retrieval-augmented generation (RAG) corpus so your chatbot can answer questions grounded in the exact video content.

Recommended settings: outputFormats: ["llm"], maxResults: 500, enableAiFallback: true. Enable AI transcription so caption-free videos in the channel are transcribed automatically.

4. NLP / sentiment analysis pipeline

Extract transcripts from a brand's channel, a competitor's channel, or a set of product-review videos. Pipe transcript_text into an NLP pipeline (spaCy, HuggingFace Transformers, OpenAI) for sentiment scoring, named entity extraction, topic modeling, or keyword frequency. Useful for brand monitoring and competitive intelligence.

Recommended settings: outputFormats: ["text", "llm"], subType: "both"

5. LLM training data collection

Curate domain-specific transcripts from niche YouTube channels (medical lectures, legal explainers, coding tutorials, scientific talks) to build fine-tuning datasets. The transcript_llm format strips filler tokens cleanly. Use metadata.tags and metadata.categories to filter and label the data.

Recommended settings: outputFormats: ["llm"], maxAiMinutes cap per run to control cost.

6. SEO content repurposing

Turn a library of tutorials or vlogs into written content. Pass the transcript_llm field to an LLM prompt asking it to rewrite the transcript as a blog post, Twitter/X thread, newsletter section, or LinkedIn article. Combine with metadata.title, metadata.tags, and metadata.description for context.

Recommended settings: outputFormats: ["llm"], languages: ["en"]

7. Podcast / lecture transcription (no captions)

Podcasters who upload to YouTube and educators who post lecture recordings rarely add manual captions. Enable AI transcription and the actor transcribes them with faster-whisper. Use forceWhisperLanguage if you know the channel's language to skip the auto-detection window and reduce cost.

Recommended settings: enableAiFallback: true, forceWhisperLanguage: "en", skipAiFallbackIfLongerThan: 120 to skip anything over 2 hours.

8. Accessibility and caption quality audit

Compare YouTube's auto-generated captions (subType: "auto", is_auto_generated: true) against an AI transcription of the same video. Differences surface errors in the auto-generated track. Useful for accessibility compliance reviews or for creators who want to improve their caption quality before publishing.

Recommended settings: Two runs — one with subType: "auto" only, one with subType: "manual" and enableAiFallback: true to force AI fallback (since no manual captions exist, the actor falls back to AI transcription).

9. Academic research and citation analysis

Download a researcher's full lecture series, a conference talk archive, or all videos from an academic YouTube channel. Index the transcripts by speaker, date (metadata.upload_date), and topic. Use to find when specific terminology first appeared, how arguments evolved over time, or to build a citation graph for a literature review.

Recommended settings: outputFormats: ["json", "text"], maxResults: 1000, languages set to the channel's primary language.

10. Competitive intelligence monitoring (scheduled runs)

Schedule the actor to run weekly on a competitor's channel URL. Set maxResults: 5 to pull only the latest videos. Use an Apify webhook to POST the new transcripts to Slack, a CRM, or an internal dashboard. Automatically surface every new product announcement, feature mention, or pricing discussion your competitor publishes.

Recommended settings: maxResults: 5, outputFormats: ["llm"], paired with an Apify schedule and webhook.


Output examples

Native caption output

```json
{
  "metadata": {
    "id": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "channel": "Rick Astley",
    "duration": 213,
    "view_count": 1757728410,
    "upload_date": "20091025"
  },
  "language": "en",
  "is_auto_generated": false,
  "is_ai_generated": false,
  "transcript_json": [
    { "start": 18.5, "end": 21.0, "text": "We're no strangers to love" },
    { "start": 21.0, "end": 24.5, "text": "You know the rules and so do I" },
    { "start": 24.5, "end": 28.0, "text": "A full commitment's what I'm thinking of" },
    { "start": 28.0, "end": 31.5, "text": "You wouldn't get this from any other guy" }
  ],
  "transcript_text": "We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of You wouldn't get this from any other guy ...",
  "transcript_llm": "We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of You wouldn't get this from any other guy ...",
  "transcript_srt": "1\n00:00:18,500 --> 00:00:21,000\nWe're no strangers to love\n\n2\n00:00:21,000 --> 00:00:24,500\nYou know the rules and so do I\n"
}
```

AI transcription output

When a video has no captions and enableAiFallback is on, AI transcription runs:

```json
{
  "metadata": {
    "title": "Deep Work Podcast - Episode 12",
    "channel": "Cal Newport",
    "duration": 1847,
    "upload_date": "20240315"
  },
  "is_ai_generated": true,
  "language": "en",
  "language_probability": 0.9991,
  "language_was_forced": false,
  "ai_duration_charged_min": 31,
  "transcript_json": [
    { "start": 0.0, "end": 4.1, "text": "Welcome back to Deep Work. I'm Cal Newport." },
    { "start": 4.1, "end": 9.3, "text": "Today we're talking about why single-tasking is a competitive advantage in 2024." },
    { "start": 9.3, "end": 14.8, "text": "The research here is pretty clear, and I think most people are leaving a lot on the table." }
  ],
  "transcript_text": "Welcome back to Deep Work. I'm Cal Newport. Today we're talking about why single-tasking is a competitive advantage in 2024. ...",
  "transcript_srt": "1\n00:00:00,000 --> 00:00:04,100\nWelcome back to Deep Work. I'm Cal Newport.\n\n2\n00:00:04,100 --> 00:00:09,300\nToday we're talking about why single-tasking\nis a competitive advantage in 2024.\n"
}
```

Error item

```json
{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "metadata": {
    "title": "Rick Astley - Never Gonna Give You Up",
    "duration": 213
  },
  "error": "No subtitles found in requested languages.",
  "error_code": "LANGUAGE_NOT_FOUND",
  "available_languages": ["en", "es", "fr", "de", "pt", "ja"]
}
```

Error codes

Error items are never billed — you are only charged for successful transcripts. The table below groups errors by cause so you know whether the issue is in your input or something outside your control.

Input errors — caused by the URLs or settings you provided:

| Error code | Meaning | What to do |
| --- | --- | --- |
| AGE_RESTRICTED | YouTube requires sign-in / age verification to access this video. | Remove the URL — cannot be bypassed. |
| PRIVATE_OR_UNAVAILABLE | The video is private, deleted, or blocked in the runner's region. | Remove the URL or check if the video is public. |
| LIVE_VIDEO | Live streams have no static captions to extract. | Wait until the stream ends, then retry. |
| LANGUAGE_NOT_FOUND | Captions exist but not in the requested language. available_languages shows what's available. | Change your languages input. |

Budget / limit errors — the video could be transcribed, but a budget gate prevented it:

| Error code | Meaning | What to do |
| --- | --- | --- |
| NO_CAPTIONS_AVAILABLE | The video has no captions and AI transcription is turned off. | Turn on Enable AI transcription in the input form and re-run, or set enableAiFallback: true via API. |
| AI_MINUTES_LIMIT_REACHED | The maxAiMinutes budget for this run is exhausted. | Increase maxAiMinutes and retry. |
| AI_FALLBACK_SKIPPED_TOO_LONG | The video exceeds the skipAiFallbackIfLongerThan duration limit. | Increase or remove the limit. |
| SPENDING_LIMIT_REACHED | The Apify account spending limit was hit — no further AI charges possible. | Adjust your Apify billing settings. |

Infrastructure / actor errors — not caused by your input; no charge is made:

| Error code | Meaning | What to do |
| --- | --- | --- |
| BOT_DETECTION | YouTube challenged the request. The actor retried through proxy tiers automatically. | Usually self-resolving. Switch proxy group if persistent. |
| EXTRACTION_ERROR | Generic yt-dlp failure — the video may be temporarily unavailable on YouTube's side. | Retry later. |
| AI_TRANSCRIPTION_FAILED | The Whisper model or audio download failed for this video. | Check run logs; retry. |
| UNEXPECTED_ERROR | An unhandled exception in the actor code. The video gets an error item; other videos continue. | Open an issue if persistent. |
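One way to act on these groups programmatically. The triage helper below simply mirrors the tables above; it is an illustrative sketch, not an official API:

```python
# Illustrative grouping of the documented error codes by suggested handling.
FIX_INPUT = {"AGE_RESTRICTED", "PRIVATE_OR_UNAVAILABLE", "LIVE_VIDEO", "LANGUAGE_NOT_FOUND"}
RAISE_BUDGET = {"NO_CAPTIONS_AVAILABLE", "AI_MINUTES_LIMIT_REACHED",
                "AI_FALLBACK_SKIPPED_TOO_LONG", "SPENDING_LIMIT_REACHED"}
RETRY_LATER = {"BOT_DETECTION", "EXTRACTION_ERROR", "AI_TRANSCRIPTION_FAILED", "UNEXPECTED_ERROR"}

def triage(item):
    """Return a suggested action for one dataset item."""
    code = item.get("error_code")
    if code is None:
        return "ok"
    if code in FIX_INPUT:
        return "fix-input"
    if code in RAISE_BUDGET:
        return "raise-budget"
    return "retry-later"

print(triage({"error_code": "AI_MINUTES_LIMIT_REACHED"}))  # → raise-budget
```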

Pricing

Pay-per-result — you are charged per transcript extracted, not per minute of server time. Each run has a one-time startup fee, then a flat per-transcript charge. AI transcription adds a per-minute charge on top, and only runs on videos that have no native captions.

For the full rate card by subscription plan, see the Pricing page.

Proxy costs are separate from transcript charges. The default datacenter proxy costs nothing on clean runs — it is only used as a fallback when YouTube challenges a request. If the datacenter tier is also challenged, the actor auto-escalates to residential (~$0.40/GB), though this is rare. See Proxy configuration for details.

Built for bulk runs

A few things keep costs low at scale:

  • One small startup fee, then pay per transcript. Each run has a fixed startup fee ($0.0025 with AI off, $0.010 with AI on). After that, you pay $0.001 per transcript — the same whether the run has 10 videos or 1,000.
  • No proxy cost on clean runs. Every request goes direct first. The proxy is only used as a silent fallback if YouTube challenges a specific request — and that happens rarely. Most runs pay $0 in proxy fees.
  • AI model only initialized when needed. Within an AI-enabled run, the transcription model is loaded into memory only when the first caption-free video is encountered. If every video in your batch has native captions, the model never occupies RAM.
  • Concurrent processing. Up to 5 videos are processed in parallel, reducing total run time for large playlists or channels.
  • Built-in spend controls. Max AI minutes per run and Skip AI for long videos let you set hard caps on AI spend before a run starts.
  • Failed videos never stall the run. Each failed video gets a dataset error item; the rest of the batch continues without interruption.
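The fee structure above reduces to simple arithmetic. A rough estimator, using the Free-plan rates quoted in this README (your plan's rate card may differ, so check the Pricing page):

```python
def estimate_cost(n_videos, ai_minutes=0.0):
    """Rough run cost in USD at the Free-plan rates quoted in this README:
    $0.0025 startup (AI off) or $0.010 (AI on), plus $0.001 per transcript,
    plus $0.012 per AI-transcribed minute. Assumes AI is enabled only when
    ai_minutes > 0."""
    startup = 0.010 if ai_minutes > 0 else 0.0025
    return startup + 0.001 * n_videos + 0.012 * ai_minutes

print(f"${estimate_cost(1):.4f}")                  # one captioned video → $0.0035
print(f"${estimate_cost(1000):.4f}")               # 1,000 captioned videos in one run
print(f"${estimate_cost(10, ai_minutes=30):.3f}")  # 10 videos incl. 30 AI minutes
```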

How it compares

| Feature | This actor | Typical alternatives |
| --- | --- | --- |
| Transcribes videos with no captions | Yes — built-in AI, no external API key | No — returns an error |
| LLM-optimized output (filler stripped) | Yes — transcript_llm field | No |
| Spend safeguards (AI minute cap, skip long videos) | Yes | No |
| Native transcript price | $0.001 per transcript | Up to $0.005 — 5× more |
| No monthly subscription | Yes — pay only for what you run | Flat monthly fee |
| Batch: playlists and channels | Yes | Most |
| Output formats | JSON, Text, SRT, VTT, LLM | Usually JSON only |
| Word-level timestamps | Yes | Rare |
| YouTube Data API key required | No | No |
| Automatic access challenge bypass | Yes — retries via proxy when needed, direct otherwise | Varies |
| MCP-compatible (Claude Desktop, Cursor, etc.) | Yes — via Apify MCP integration | Rare |

Try it free → first result in under 30 seconds.


Who uses it

AI and LLM developers

Every output item includes a ready-to-use transcript_llm field — filler tokens stripped, clean text, no post-processing. Batch a whole channel overnight via the API and wake up to a dataset ready for your retrieval pipeline or fine-tuning job.

Content creators and marketers

Turn any YouTube video into a blog post or newsletter draft without manual transcription. Extract pull quotes from interviews. Run an entire channel archive in one batch.

SEO professionals and researchers

Extract keyword data from video transcripts at scale. Build text content from video transcripts for search. Analyse a competitor's spoken messaging for topic and positioning gaps.

Data scientists and academics

Build NLP corpora from lectures, conference talks, and documentary interviews. Process multilingual transcripts for cross-language analysis. Run large dataset collection jobs overnight via the API.

Developers building MCP-integrated AI tools

Connect this actor as an MCP server so Claude Desktop, Claude.ai Projects, Cursor, or any MCP-compatible client can fetch and process YouTube transcripts in a single tool call. No copy-pasting, no API wiring — just hand the model a URL.


Integration examples

Python — Apify client

Get your API token from the Apify Console under Settings → Integrations. Keep it secret — treat it like a password.

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "languages": ["en"],
        "outputFormats": ["json", "llm"],
    }
)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["metadata"]["title"])
    print(item["transcript_text"][:200])
```

JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('codepoetry/youtube-transcript-ai-scraper').call({
    startUrls: [{ url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' }],
    languages: ['en'],
    outputFormats: ['json', 'llm'],
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => console.log(item.metadata.title, item.transcript_text.slice(0, 200)));
```

LangChain / RAG pipeline

```python
from apify_client import ApifyClient
from langchain.docstore.document import Document

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "outputFormats": ["llm"],
        "maxAiMinutes": 60,
    }
)
docs = [
    Document(
        page_content=item["transcript_llm"],
        metadata={"source": item["metadata"]["url"], "title": item["metadata"]["title"]},
    )
    for item in client.dataset(run["defaultDatasetId"]).iterate_items()
    if "transcript_llm" in item
]
# docs is ready for any LangChain vector store or retriever
```

Run on a schedule or trigger a webhook

To run this actor on a schedule or receive a webhook notification when a run finishes, use the Schedules and Integrations tabs on the actor's page in the Apify Console. See the Apify scheduling docs and webhook docs for setup instructions.
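On the receiving end, a webhook handler only needs the run's dataset ID from the POSTed payload. The sketch below assumes Apify's default payload template, where the run object is nested under resource; verify the field names against the webhook docs.

```python
import json

def dataset_id_from_webhook(body: bytes) -> str:
    """Extract the finished run's dataset ID from an Apify webhook payload.
    Assumes the default payload template (run object nested under `resource`)."""
    payload = json.loads(body)
    return payload["resource"]["defaultDatasetId"]

# Example payload, trimmed to the fields used here:
sample = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run123", "defaultDatasetId": "ds456"},
}).encode()
print(dataset_id_from_webhook(sample))  # → ds456
```

With the dataset ID in hand, fetch the new transcripts via the dataset API as in the Python example above.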


Advanced options

All options can be set in the Input form or passed as JSON when calling via the API.

| Option | UI label | Default | When to use |
| --- | --- | --- | --- |
| enableAiFallback | Enable AI transcription | false | When off (default), videos with no native captions are skipped. Enable to fall back to AI transcription for those videos — memory raises to 4 GB automatically. |
| maxResults | Max videos per playlist/channel | 10 | Cap how many videos are fetched from a playlist or channel. Single video URLs ignore this. |
| languages | Caption languages | ["en"] | Preferred caption languages in order of priority. First match on the video is used. Pick from the dropdown (130+ languages) or pass any ISO 639-1 code via the API. |
| subType | Caption source | "both" | "manual" = human captions only · "auto" = auto-generated only · "both" = prefer manual, fall back to auto |
| outputFormats | Output formats | json, text, llm | Which transcript formats to write to the dataset. |
| wordLevel | Word-level timestamps | false | Add per-word timestamps to JSON segments. Safe to enable for any video — has no effect on manual captions. |
| maxAiMinutes | Max AI minutes | 30 | Hard cap on AI transcription minutes per run. Set to 0 for unlimited. Recommended when processing unknown playlists. |
| skipAiFallbackIfLongerThan | Skip AI for long videos (minutes) | 0 (off) | Skip AI for videos exceeding N minutes. Avoids unexpected costs from long videos. |
| forceWhisperLanguage | AI transcription language | auto-detect | Force AI to a specific language. Pick from the dropdown (99 supported languages) or pass any code via the API. Skips the 30-second detection window, saving ~20% per video. |
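Put together, a cost-capped AI-enabled channel run might use an input like this (all values are illustrative; field names as in the table above):

```json
{
  "startUrls": [{ "url": "https://www.youtube.com/@channelname" }],
  "maxResults": 50,
  "languages": ["en", "es"],
  "subType": "both",
  "outputFormats": ["json", "llm"],
  "wordLevel": false,
  "enableAiFallback": true,
  "maxAiMinutes": 60,
  "skipAiFallbackIfLongerThan": 120,
  "forceWhisperLanguage": "en"
}
```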

Proxy configuration

Proxy handling is fully automatic — no configuration needed. Every request goes direct first, and the proxy is only used if YouTube challenges the request. This costs nothing on a clean run.

How it works

Occasionally YouTube challenges automated requests. When this happens the actor automatically escalates through progressively stronger proxy tiers until the request succeeds:

  1. Direct request (no proxy) — used first for every video. Zero cost.
  2. Datacenter proxy — fast and free on most plans. Handles the vast majority of challenges.
  3. Residential proxy — highest trust with YouTube. Used only if the datacenter tier is also challenged.

The escalation is fully automatic — you do not need to configure anything. If all tiers are exhausted, the affected video is marked with a BOT_DETECTION error code and the actor continues with the remaining videos.

Proxy costs

| Type | Cost | Notes |
| --- | --- | --- |
| Datacenter (Apify) | Free on most plans | Default first tier. Zero bandwidth consumed on clean runs. |
| Residential (Apify) | ~$0.40 / GB | Auto-escalation tier. Only consumed if the datacenter proxy is also challenged — rare. |

Proxy costs are billed from your Apify account balance as a separate line item, alongside this actor's Pay-Per-Event charges. On a typical run with no bot challenges: $0 proxy cost. If datacenter retry is needed: approximately 0.5 MB per affected video. Residential is only consumed if datacenter also fails — this is rare and keeps costs minimal even in bulk runs.


Limitations

All non-recoverable failures produce a dataset item with an error_code field. See the Error codes table for the full reference.

Constraints:

  • No translation — the actor returns the original spoken language only.
  • YouTube may rate-limit very large batches (100+ videos). The automatic proxy escalation handles most cases transparently.
  • maxResults default of 10 is intentionally conservative — increase it for large playlists or full channel archives.

Memory

Memory is set automatically based on whether Enable AI transcription is on or off — no manual configuration needed for most runs.

  • AI off (default): 1 GB allocated automatically. Native caption runs use under 400 MB — the lowest-cost configuration.
  • AI on: 4 GB allocated automatically. Required for the Whisper model (~1.5 GB) plus concurrent AI jobs.

If you manually override memory below 2 GB while AI is on, the run fails immediately with a clear message explaining what to change.

AI transcription speed: processes at roughly 8× real-time. A 30-minute podcast takes about 4 minutes; a 2-hour lecture takes about 15 minutes.

How to override memory manually (API users)

If you need to override the automatic setting — for example, forcing 1 GB on an AI-enabled run where you are certain all videos have captions:

  1. Open the Actor in the Apify Console.
  2. Click the Input tab, then click the ⚙ Settings button (top right of the input form).
  3. Find the Memory field and enter your preferred value (e.g. 1024 for 1 GB).
  4. Click Save — the setting is saved with your input and used on every future run.

If you run via API, pass "memoryMbytes": 1024 in the run options alongside your runInput. Setting below 2048 while enableAiFallback: true causes an immediate run failure.


Frequently asked questions

How much does one video cost?

A single video with captions costs approximately $0.0035 on the Free plan with AI off (1 GB memory) — $0.001 for the transcript plus a $0.0025 startup fee per run. With AI on (4 GB), the startup fee is $0.010, making the first video $0.011. A second video in the same run adds just $0.001 regardless of memory. AI transcription adds $0.012 per minute of audio on the Free plan.

What happens if a video has no captions?

By default the video is skipped with a NO_CAPTIONS_AVAILABLE error. Enable AI transcription (enableAiFallback: true) and the actor downloads the audio and transcribes it — output has the same structure as native captions, with is_ai_generated: true. If the maxAiMinutes cap is reached, remaining caption-free videos receive an AI_MINUTES_LIMIT_REACHED error and the run continues. The available_languages field lists caption codes YouTube does provide on that video.

Does it work for playlists and channels?

Yes. Paste a playlist or channel URL and the actor expands it into individual videos automatically. Use maxResults to cap how many are fetched. If one video is private, age-restricted, or unavailable, it gets an error item while the rest continue.

What languages are supported?

Native captions: Any language YouTube provides captions for — typically 100+ languages for auto-generated captions. Pass multiple language codes (e.g. ["en", "es"]) to fall back automatically when your first choice is unavailable.

AI transcription: 99 languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, and Hindi.

What output formats are available?

  • JSON — timestamped segments as an array [{start, end, text}, ...]
  • Text — plain text joined from all segments
  • LLM — text with [Music], (laughter), and other filler tokens stripped, ready for AI pipelines
  • SRT — standard subtitle format for video players and editing software
  • VTT — WebVTT format for HTML5 <video> elements

Multiple formats can be requested in a single run.

Can I set a spending limit?

Yes. Max AI minutes per run caps total AI-transcribed minutes per run (default: 30). Skip AI for long videos skips videos exceeding a duration threshold automatically. Set the AI minutes cap to 0 for unlimited AI transcription.

How accurate is AI transcription?

Accurate for clear speech in widely spoken languages. Accuracy degrades with heavy accents, domain-specific jargon, or poor audio quality. The language_probability field indicates the model's confidence in the detected language. For quality-critical work, treat AI transcripts as a first draft and review them.
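A simple review gate on that confidence value might look like this (needs_review and the 0.9 threshold are illustrative choices, not part of the actor):

```python
def needs_review(item, threshold=0.9):
    """Flag AI-generated transcripts whose detected-language confidence
    falls below a threshold. Field names follow the documented output schema."""
    if not item.get("is_ai_generated"):
        return False  # native captions: no confidence score to check
    return item.get("language_probability", 0.0) < threshold

print(needs_review({"is_ai_generated": True, "language_probability": 0.62}))  # → True
print(needs_review({"is_ai_generated": False}))                               # → False
```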

What is a YouTube transcript scraper?

A YouTube transcript scraper extracts the spoken text from YouTube videos — converting a YouTube video to text without any manual work. This actor retrieves captions when YouTube provides them, or generates a transcript from the audio when captions are unavailable.

Can I use this to convert a YouTube video to text?

Yes — that is exactly what it does. Paste the video URL, click Start, and the actor returns the spoken words as plain text (or JSON, SRT, VTT, or LLM-ready format). For videos with no captions, enable AI transcription to generate the text from the audio.

Does this translate transcripts?

No. The actor returns the original spoken language. Use a separate translation service for translation.

Does it work for YouTube Shorts?

Yes. Shorts use the same caption infrastructure as regular videos.

Do I need a YouTube Data API key?

No. The actor accesses publicly available caption data without any YouTube API credentials.

How does this compare to the YouTube Data API?

The YouTube Data API v3 does not provide transcript data. It requires a Google Cloud project, OAuth credentials, and per-day quotas. This actor requires none of that.

How does this compare to the youtube-transcript-api Python library?

The youtube-transcript-api library is fine for a handful of videos in your own Python script. This actor adds cloud infrastructure, batch processing across playlists and channels, AI transcription for caption-free videos, multiple output formats, scheduling, and Apify platform integrations (webhooks, REST API, n8n, Make, Zapier).

What do the run log messages mean?

Open the Log tab on any completed run to see what the actor did. Here are the messages you may encounter:

Normal progress — no action needed:

| Message | What it means |
| --- | --- |
| Starting — memory: 4 GB · AI transcription: enabled · max 30 min/run · languages: en | Run configuration summary at startup |
| Found N videos to process. | How many videos were found and will be processed |
| Fetching video list from: https://... | Expanding a playlist or channel into individual video URLs |
| Processing: https://... | Processing has started for one video |
| ✓ Saved: "Video Title" (en, auto-generated captions) | Transcript saved successfully — native captions found |
| Loading AI transcription model... | AI model is loading; this happens once per run, when the first caption-free video is encountered |
| AI transcription model ready. | Model loaded; subsequent AI jobs reuse it with no delay |
| AI transcription: up to N job(s) at a time (N GB memory) | How many AI jobs run in parallel — 1 below 4 GB, 2 at 4 GB |
| AI transcription language: forced to 'en' | Language set by your forceWhisperLanguage input |
| AI transcription language: auto-detecting from audio | Language will be detected from the first 30 seconds of audio |
| No captions found — starting AI transcription for "..." | No native captions found; AI transcription is starting for this video |
| Downloading audio: https://... | Downloading audio for a caption-free video |
| Transcribing audio (~N min at 8x real-time)... | AI model is processing the audio |
| ✓ AI transcription complete — language: en (confidence: 99%) | AI transcription saved successfully |
| ✓ Done: N/M transcripts saved · X AI-min used | Run completed — see the Dataset tab for results |

Handled automatically — no action needed:

| Message | What it means |
| --- | --- |
| YouTube rate-limited this request — switching to proxy (attempt N/M)... | YouTube blocked a request; the actor is retrying via proxy automatically |
| Caption download blocked for '...' — retrying via proxy (attempt N/M)... | Caption download was blocked; retrying via proxy automatically |
| YouTube rate-limited the audio download — retrying via proxy (attempt N/M)... | Audio download was blocked; retrying via proxy automatically |
| No videos found at ... (possible rate-limit) — retrying via proxy... | Playlist/channel expansion was blocked; retrying via proxy |

Warnings — action may be needed:

| Message | What it means | What to do |
| --- | --- | --- |
| No videos found. Check your input URLs and try again. | No videos were found from any of the provided URLs | Check that playlists and channels are public; verify the URLs are correct |
| Skipping AI for "Title": needs N min but only Y min of AI budget remain... | The maxAiMinutes cap was nearly exhausted | Increase Max AI Minutes in actor settings and re-run |
| YouTube blocked access to ... and no proxy is configured. | YouTube blocked the request and proxy could not be set up | Check Apify proxy service status; retry later |

Errors — check the dataset error_code field for the affected video:

| Message | What it means | What to do |
| --- | --- | --- |
| Audio download failed for ... after trying N proxy tier(s)... | All retries exhausted for audio download | Try re-running later; reduce batch size if persistent |
| AI transcription failed for "Title": <error> | Whisper model error for this video | Check the error detail; try re-running the affected video |
| Apify spending limit reached — no further charges possible... | Account spending limit was hit mid-run | Go to Apify Console → Settings → Billing to adjust your limit |
| Unexpected error processing ...: <error> | Unhandled exception — video gets UNEXPECTED_ERROR in the dataset | Open an issue on the actor's Issues tab if this recurs |

Every error item in the dataset has an error_code field — filter by that rather than parsing log text. See Error codes for the full reference table.

YouTube's Terms of Service prohibit automated scraping, and you are responsible for complying with their Terms and applicable law in your jurisdiction. This actor accesses only publicly available caption data — the same data visible when you click "Open transcript" in the YouTube player. It does not bypass any authentication, access private content, or collect personal user data. See Apify's web scraping legality guide for a broader overview.


Language reference

Supported languages

Native captions: Any language YouTube provides — 130+ codes. Use the Caption languages dropdown to pick from the full list, or pass any ISO 639-1 code directly via the API.

AI transcription: 99 languages. Use the AI transcription language dropdown to pick from all supported codes, or see the full list in the faster-whisper tokenizer. Leave blank to auto-detect.

When a requested language is not available on a video, the actor returns a LANGUAGE_NOT_FOUND or NO_CAPTIONS_AVAILABLE error item with an available_languages field listing the codes that are actually present.


Get started

Click Try for free at the top of this page. The Rick Astley demo URL is pre-filled — run it in one click to see the full output structure.

Questions or bugs? Use the Issues tab on this actor's page — response time is typically within 24 hours.


About this actor

This actor runs on the Apify platform. AI transcription uses faster-whisper (MIT license), bundled into the Docker image so there is no model download delay on first run.