YouTube Transcriber

Pricing

from $5.00 / 1,000 caption transcripts

Transcribe YouTube videos. Captions when available, OpenAI Whisper fallback (BYOK) for the rest. No YouTube account needed.

Developer

Arnas

Maintained by Community

Transcribes YouTube videos from captions when available, with the OpenAI Whisper API as a fallback. You bring your own OpenAI key and pay OpenAI directly for Whisper compute; the actor charges only for the transcript event itself. v1 accepts individual video URLs only (no playlists or channels).

What does YouTube Transcriber do?

YouTube Transcriber extracts the spoken content of a YouTube video as text. When the video has captions in your requested language, it grabs them directly (cheap and fast). When it doesn't, it downloads the smallest available audio format and sends it to OpenAI's Whisper API for transcription using your own OpenAI key. Output is plain text or structured JSON with timestamps. No YouTube account needed.

Built on yt-dlp (the most reliable YouTube extraction tool in 2026) plus ffmpeg. The actor wraps every subprocess call in strict SSRF and shell-injection defenses.
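
One way to picture this kind of defense is the minimal sketch below. It accepts only https URLs on a small host allowlist and builds the yt-dlp arguments as an array, so nothing passes through a shell. The function names, the allowlist, and the chosen flags are assumptions for illustration; the actor's actual implementation is not shown on this page.

```javascript
// Hypothetical allowlist; the actor's real list is not documented here.
const ALLOWED_HOSTS = new Set([
  'www.youtube.com',
  'youtube.com',
  'm.youtube.com',
  'youtu.be',
]);

// Reject anything that is not an https URL on an allowlisted host (SSRF guard).
function assertYouTubeUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    throw new Error('not a valid URL');
  }
  if (url.protocol !== 'https:') throw new Error('only https URLs are allowed');
  if (!ALLOWED_HOSTS.has(url.hostname)) throw new Error('host not allowlisted');
  return url.toString();
}

// Build an argument array for yt-dlp. Handing an array to
// child_process.execFile (rather than a shell command string) means
// characters such as ";" or "$(...)" are never interpreted by a shell.
// The trailing "--" marks the end of options, so the URL cannot be
// mistaken for a flag.
function buildAudioDownloadArgs(input, outputPath) {
  const safeUrl = assertYouTubeUrl(input);
  return ['--no-playlist', '-f', 'worstaudio', '-o', outputPath, '--', safeUrl];
}
```

The resulting array would be passed as the second argument of `execFile('yt-dlp', args)`. A request for an internal address such as a cloud metadata endpoint is rejected before any subprocess starts.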

Why use YouTube Transcriber?

  • Cheapest captions price on Apify Store — $0.0005 per transcript on the captions path, matching the captions-only price leader
  • BYOK Whisper at zero markup — when Whisper fallback fires you pay OpenAI directly (~$0.006/min). The actor charges $0.05 for the path (vs codepoetry's bundled $0.012/min × N min, ~5-6× cheaper for typical video lengths)
  • Predictable cost ceiling: maxWhisperMinutesPerRun caps your OpenAI bill per run
  • Audio always fits Whisper's limit — yt-dlp + ffmpeg picks smallest-format audio under 24 MB; configurable maxDurationMinutes (default 18)
  • In-product visibility — every video produces a record (success or skip with reason), so you can see why something was skipped without scrolling logs
  • No silent leaks — your OpenAI key is isSecret: true in the input form, never logged, sanitized from any error message before output

Who is this for?

  • Researchers — pull transcripts of academic talks, interviews, podcasts at scale
  • AI/ML engineers — feed real human speech into pipelines, fine-tune models on real conversations
  • Journalists — transcribe source video evidence quickly
  • Content marketers — repurpose video content as text for SEO
  • Power users with an OpenAI account — if you already have an OpenAI key, this actor is the cheapest way to get Whisper-quality transcripts for arbitrary YouTube videos

How to use YouTube Transcriber

  1. Open the actor input page
  2. Paste YouTube video URLs into Video URLs (one per line). Bare 11-char video IDs also work.
  3. Set Preferred caption language (default en)
  4. Choose Transcript method: auto (captions → Whisper), captions (captions only, skip if missing), or whisper (Whisper only, ignore CC)
  5. (Optional) Paste your OpenAI API key — only needed when a video lacks captions in your preferred language and you want Whisper to fill the gap. Captions-only workflows work without a key.
  6. Pick Output format: text or json
  7. Click Start
  8. Download results from the Dataset tab as JSON, CSV, Excel, etc.
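
If you prepare the Video URLs list programmatically, it can help to normalize entries first. The sketch below reduces common YouTube URL shapes and bare 11-character IDs to a video ID. The helper names and the set of recognized URL shapes are assumptions for illustration; the actor's own parser may accept more or fewer formats.

```javascript
// YouTube video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Returns the video ID, or null if the input is not recognized.
function extractVideoId(input) {
  const value = input.trim();
  if (VIDEO_ID.test(value)) return value; // bare ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither an ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate = null;
  if (host === 'youtu.be') {
    candidate = url.pathname.slice(1).split('/')[0];
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      candidate = url.searchParams.get('v');
    } else {
      // /shorts/<id>, /embed/<id>, /live/<id>
      const match = url.pathname.match(/^\/(shorts|embed|live)\/([^/]+)/);
      candidate = match ? match[2] : null;
    }
  }
  return candidate && VIDEO_ID.test(candidate) ? candidate : null;
}

// One URL or ID per line, as in the Video URLs field.
function parseVideoInput(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map(extractVideoId);
}
```

Entries that come back as null can be dropped or reviewed before the run starts, which avoids paying the run-start event for input that cannot be transcribed.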

Example input

{
"videoUrls": [
"https://www.youtube.com/watch?v=jNQXAC9IVRw",
"https://www.youtube.com/watch?v=dQw4w9WgXcQ"
],
"preferredLanguage": "en",
"transcriptMethod": "auto",
"openaiApiKey": "sk-YOUR_KEY_HERE",
"outputFormat": "text",
"includeTimestamps": false,
"maxDurationMinutes": 18,
"maxWhisperMinutesPerRun": 60
}

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| videoUrls | string[] | none (required) | YouTube URLs (any standard format) or bare 11-char video IDs |
| preferredLanguage | string | "en" | BCP-47 code. Falls through to Whisper if not available. |
| transcriptMethod | enum | "auto" | auto, captions, or whisper |
| openaiApiKey | secret string | none (optional) | Your OpenAI API key. Required only when transcriptMethod=whisper. In auto mode, missing key means videos without captions are skipped (with reason no-openai-key-no-fallback) instead of failing the run. |
| whisperModel | enum | "whisper-1" | Only whisper-1 supports verbose_json segment timestamps |
| outputFormat | enum | "text" | text or json |
| includeTimestamps | boolean | false | When text, prefix each segment with [HH:MM:SS] |
| maxDurationMinutes | integer | 18 | Skip videos longer than this. Default keeps audio under Whisper's 25 MB limit. |
| maxWhisperMinutesPerRun | integer | 60 | Bounds your OpenAI bill per run. 0 = unlimited. |
| proxyConfiguration | object | RESIDENTIAL | YouTube blocks datacenter IPs in 2026; RESIDENTIAL recommended |
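
The interaction between transcriptMethod and openaiApiKey can be sketched as a small decision function. This is an illustration of the documented behavior, not the actor's source: the function name and return shape are hypothetical, and throwing when whisper mode has no key is an assumption, since the page only says the key is required in that mode.

```javascript
// Decide which path a video takes.
// Returns { path: 'captions' | 'whisper' } or { skipReason: string }.
function chooseTranscriptPath({ transcriptMethod = 'auto', hasCaptions, hasOpenAiKey }) {
  switch (transcriptMethod) {
    case 'captions':
      // Captions only: skip the video when none exist.
      return hasCaptions ? { path: 'captions' } : { skipReason: 'no-captions' };
    case 'whisper':
      // Assumption: a missing key is treated as an input error here.
      if (!hasOpenAiKey) {
        throw new Error('openaiApiKey is required when transcriptMethod=whisper');
      }
      return { path: 'whisper' };
    case 'auto':
      // Captions first, Whisper only as a fallback.
      if (hasCaptions) return { path: 'captions' };
      return hasOpenAiKey
        ? { path: 'whisper' }
        : { skipReason: 'no-openai-key-no-fallback' };
    default:
      throw new Error(`unknown transcriptMethod: ${transcriptMethod}`);
  }
}
```

In auto mode without a key, a captionless video yields a skip record rather than a failed run, which matches the description of openaiApiKey above.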

Output examples

Text format (success)

{
"videoId": "jNQXAC9IVRw",
"videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
"title": "Me at the zoo",
"channelTitle": "jawed",
"publishedAt": "2005-04-23T00:00:00.000Z",
"durationSeconds": 19,
"language": "en",
"transcriptMethod": "captions",
"outputFormat": "text",
"transcript": "All right, so here we are, in front of the elephants, the cool thing about these guys is that they have really, really, really long trunks, and that's cool. And that's pretty much all there is to say.",
"skipReason": null,
"scrapedAt": "2026-04-19T20:35:00.000Z"
}

JSON format (success, with segment timestamps)

{
"videoId": "jNQXAC9IVRw",
"videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
"transcriptMethod": "captions",
"outputFormat": "json",
"transcript": [
{ "start": 0.36, "end": 4.32, "text": "All right, so here we are, in front of the elephants" },
{ "start": 4.32, "end": 8.5, "text": "the cool thing about these guys is that they have really" },
{ "start": 8.5, "end": 14.2, "text": "really really long trunks and that's cool" }
],
"skipReason": null
}
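
If you fetch JSON output but also want readable text, the segment array can be flattened client-side. The sketch below mirrors the [HH:MM:SS] prefix described for includeTimestamps. The helper names are hypothetical, and the separators (newline with timestamps, space without) are assumptions, since the actor's exact text layout is not specified here.

```javascript
// Format a number of seconds as HH:MM:SS (fractions are truncated).
function toClock(seconds) {
  const total = Math.floor(seconds);
  const hh = String(Math.floor(total / 3600)).padStart(2, '0');
  const mm = String(Math.floor((total % 3600) / 60)).padStart(2, '0');
  const ss = String(total % 60).padStart(2, '0');
  return `${hh}:${mm}:${ss}`;
}

// Turn [{ start, end, text }, ...] into a single string.
function segmentsToText(segments, includeTimestamps = false) {
  return segments
    .map((s) => (includeTimestamps ? `[${toClock(s.start)}] ${s.text}` : s.text))
    .join(includeTimestamps ? '\n' : ' ');
}
```

Calling `segmentsToText(record.transcript, true)` on a JSON-format record produces one timestamped line per segment.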

Skip record (visibility into why a video wasn't transcribed)

{
"videoId": "OPf0YbXqDm0",
"title": "Mark Ronson - Uptown Funk",
"transcript": [],
"skipReason": "no-captions",
"outputFormat": "json",
"transcriptMethod": "captions",
"language": ""
}

Skip reasons you may see: video-unavailable, over-duration-cap, live-stream, no-captions, no-openai-key-no-fallback, whisper-budget-exceeded, audio-download-failed, audio-exceeds-whisper-limit, whisper-api-error:401-invalid-key, whisper-api-error:402-insufficient-quota, whisper-api-error:429-rate-limit, whisper-api-error:5xx, whisper-api-error:network.
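
Because every video produces a record, a run can be summarized directly from the dataset items. The following sketch counts successes and groups skips by reason; it assumes only that successful records carry skipReason: null, as in the examples above. The function name is hypothetical.

```javascript
// Count transcribed vs skipped records and tally each skip reason.
function summarizeRecords(items) {
  const summary = { transcribed: 0, skipped: 0, bySkipReason: {} };
  for (const item of items) {
    if (item.skipReason == null) {
      summary.transcribed += 1;
    } else {
      summary.skipped += 1;
      summary.bySkipReason[item.skipReason] =
        (summary.bySkipReason[item.skipReason] || 0) + 1;
    }
  }
  return summary;
}
```

A sudden rise in audio-download-failed or whisper-api-error:429-rate-limit counts is a useful signal to check the proxy configuration or slow down the batch.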

Pricing

This actor uses pay-per-event pricing — you pay only for what runs.

| Event | Price |
| --- | --- |
| apify-actor-start (run start, first 5s of compute included) | $0.003 |
| transcript-captions (one charge per captioned video transcribed) | $0.0005 |
| transcript-whisper (one charge per video routed to Whisper fallback) | $0.05 |

The Whisper-path price covers our proxy bandwidth + audio download + Apify compute. You additionally pay OpenAI ~$0.006/min for the actual Whisper API call (billed to your OpenAI account, not us).

Real-world examples

| Run | Apify side | Your OpenAI side | Total |
| --- | --- | --- | --- |
| 1 captioned video | $0.003 + $0.0005 = $0.0035 | $0 | $0.0035 |
| 1 video, no captions, 5 min | $0.003 + $0.05 = $0.053 | 5 × $0.006 = $0.030 | $0.083 |
| 1 video, no captions, 18 min (default cap) | $0.003 + $0.05 = $0.053 | 18 × $0.006 = $0.108 | $0.161 |
| 10 captioned videos | $0.003 + 10 × $0.0005 = $0.008 | $0 | $0.008 |
| 10 videos, all need Whisper, avg 10 min | $0.003 + 10 × $0.05 = $0.503 | 100 × $0.006 = $0.600 | $1.103 |
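
The arithmetic behind these rows can be reproduced with a small estimator. This is a sketch using the event prices listed above and the approximate $0.006/min OpenAI rate; the function and field names are hypothetical, and actual OpenAI billing may round differently.

```javascript
// Prices taken from the pricing table; the OpenAI rate is approximate.
const PRICES = {
  actorStart: 0.003,
  captions: 0.0005,
  whisper: 0.05,
  openAiPerMinute: 0.006,
};

// Estimate the cost of one run, split into Apify and OpenAI portions.
function estimateRunCost({ captionedVideos = 0, whisperVideos = 0, whisperMinutes = 0 }) {
  const apify =
    PRICES.actorStart +
    captionedVideos * PRICES.captions +
    whisperVideos * PRICES.whisper;
  const openai = whisperMinutes * PRICES.openAiPerMinute;
  // Round to four decimals to hide floating-point noise.
  const round = (x) => Math.round(x * 1e4) / 1e4;
  return { apify: round(apify), openai: round(openai), total: round(apify + openai) };
}
```

For example, `estimateRunCost({ whisperVideos: 10, whisperMinutes: 100 })` corresponds to the last row of the table.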

How to scrape YouTube transcripts at scale

  1. Set transcriptMethod=captions for the cheapest path — most popular videos have captions
  2. For videos without captions, set transcriptMethod=auto with a real OpenAI key
  3. Use maxWhisperMinutesPerRun to cap your OpenAI exposure per run (default 60 min = ~$0.36)
  4. Schedule recurring runs via Apify scheduler for monitoring channels / playlists (process URLs in batches)
  5. Pipe results to Google Sheets, BigQuery, Slack via Apify integrations
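
To see how a per-run cap such as maxWhisperMinutesPerRun bounds spending, consider the sketch below. It tracks minutes already sent to Whisper and refuses a video that would exceed the cap, with 0 meaning unlimited. This is an illustration only: the names are hypothetical, and whether the actor counts fractional minutes or rounds up is not documented here.

```javascript
// Track Whisper minutes used within a single run.
function createWhisperBudget(maxWhisperMinutesPerRun) {
  let usedMinutes = 0;
  return {
    // Reserve the video's minutes if they fit; return false otherwise
    // (the caller would then record "whisper-budget-exceeded").
    tryReserve(durationSeconds) {
      const minutes = durationSeconds / 60;
      const unlimited = maxWhisperMinutesPerRun === 0;
      if (!unlimited && usedMinutes + minutes > maxWhisperMinutesPerRun) {
        return false;
      }
      usedMinutes += minutes;
      return true;
    },
    get usedMinutes() {
      return usedMinutes;
    },
  };
}
```

With the default cap of 60 minutes, the worst-case OpenAI charge for a run is about 60 × $0.006 = $0.36, regardless of how many URLs are submitted.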

Anti-bot resilience

  • yt-dlp 2026.03.17 pinned in the Docker image. The 2026 YouTube anti-bot environment (PoToken, SABR signature ciphers) is handled by yt-dlp's mature extractor stack — validated at 87% audio-download success rate against a representative video sample on RESIDENTIAL proxy with no PoToken plugin
  • RESIDENTIAL proxy default — YouTube reliably blocks datacenter IPs in 2026
  • Real-Chrome User-Agent sent on subprocess calls
  • Per-run summary log lets you detect when audio-download success rate degrades

Security and credential handling

  • openaiApiKey is isSecret: true — masked in the Console input form and at rest
  • Key is sent only to api.openai.com over HTTPS
  • Errors from OpenAI are sanitized: sk-* patterns are masked before any log line, dataset record, or thrown error
  • Audio files are written only to OS temp dir (never to the actor's Apify storage), deleted immediately after the Whisper call (try/finally), and best-effort cleaned on actor abort
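
The key-masking step can be illustrated with a small redaction helper. This is a sketch under stated assumptions: the regular expression and function names are hypothetical, and the actor's actual pattern is not published on this page.

```javascript
// Matches OpenAI-style keys such as "sk-..." and "sk-proj-...".
// The leading \b avoids matching inside ordinary words like "risk-management".
const OPENAI_KEY_PATTERN = /\bsk-[A-Za-z0-9_-]{8,}/g;

// Replace anything that looks like a key with a fixed mask.
function redactSecrets(message) {
  return String(message).replace(OPENAI_KEY_PATTERN, 'sk-***');
}

// Wrap an error (or plain string) so its message is safe to log or store.
function toSafeError(error) {
  const message = error && error.message ? error.message : error;
  return new Error(redactSecrets(message));
}
```

Applying `toSafeError` at the single point where OpenAI errors enter the actor means log lines, dataset records, and rethrown errors all see the masked message.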

API usage

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/youtube-transcriber').call({
    videoUrls: ['https://www.youtube.com/watch?v=jNQXAC9IVRw'],
    openaiApiKey: 'sk-YOUR_OPENAI_KEY', // optional; only used for the Whisper fallback
    transcriptMethod: 'auto',
    outputFormat: 'text',
});

// Read the transcript records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

FAQ

Why do I need an OpenAI key? You only need one for the Whisper fallback. Whisper transcription quality is best in class and we don't bundle the cost, so you pay OpenAI directly. If a video has captions in your preferred language, the key isn't used. To use the actor without an OpenAI account, set transcriptMethod=captions and leave the key empty; Whisper is then skipped entirely.

Why is maxDurationMinutes capped at 18 by default? OpenAI Whisper's hard limit is 25 MB per request. 18 minutes of audio at typical bitrates is comfortably under. If you raise the cap, you may hit audio-exceeds-whisper-limit skips on high-bitrate music videos.
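
The relationship between duration, bitrate, and the 25 MB limit is simple arithmetic: size in bytes ≈ bitrate (bits/s) × duration (s) / 8. The sketch below works this through. The bitrates used in the note are illustrative assumptions rather than measured values, and treating 25 MB as 25 × 1024 × 1024 bytes is also an assumption.

```javascript
// Assumed interpretation of Whisper's 25 MB request limit.
const WHISPER_LIMIT_BYTES = 25 * 1024 * 1024;

// Approximate audio size for a given duration and constant bitrate.
function estimateAudioBytes(durationMinutes, bitrateKbps) {
  return (durationMinutes * 60 * bitrateKbps * 1000) / 8;
}

// Longest whole number of minutes that still fits under the limit.
function maxMinutesUnderLimit(bitrateKbps, limitBytes = WHISPER_LIMIT_BYTES) {
  return Math.floor((limitBytes * 8) / (bitrateKbps * 1000 * 60));
}
```

At an assumed 128 kbps, 18 minutes is about 17.3 million bytes, well under the limit, and roughly 27 minutes would still fit. At an assumed 256 kbps, 18 minutes is about 34.6 million bytes, which is the situation that produces audio-exceeds-whisper-limit.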

Can I scrape playlists or channels? Not in v1 — single video URLs only. Workaround: use Apify's other YouTube actors to extract video URLs from a playlist/channel, then pipe them into this one.

What about livestreams? Active livestreams are skipped with reason live-stream. Concluded livestream archives may work but are not explicitly tested.

What about private/age-restricted videos? They produce video-unavailable skip records. The actor never tries to authenticate.

The actor returned a whisper-api-error:402-insufficient-quota skip. What now? Your OpenAI account is out of credit. Top up at platform.openai.com. The actor-side cost is unaffected.

Can I get SRT/VTT subtitle files? Not in v1. JSON output gives you per-segment timestamps that you can convert client-side.
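
A client-side conversion from the JSON segments to SRT can be as short as the sketch below. It assumes segments shaped like { start, end, text } with times in seconds, as in the JSON output example; the function names are hypothetical.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const hh = String(Math.floor(ms / 3600000)).padStart(2, '0');
  const mm = String(Math.floor((ms % 3600000) / 60000)).padStart(2, '0');
  const ss = String(Math.floor((ms % 60000) / 1000)).padStart(2, '0');
  const mmm = String(ms % 1000).padStart(3, '0');
  return `${hh}:${mm}:${ss},${mmm}`;
}

// Build an SRT document: numbered cues separated by blank lines.
function segmentsToSrt(segments) {
  return segments
    .map((s, i) => `${i + 1}\n${toSrtTime(s.start)} --> ${toSrtTime(s.end)}\n${s.text}\n`)
    .join('\n');
}
```

WebVTT differs mainly in its "WEBVTT" header line and in using a period instead of a comma before the milliseconds, so the same approach adapts with small changes.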

This actor accesses publicly available YouTube content. Scraping public data is generally permissible per hiQ Labs v. LinkedIn (2022). The actor does not log in, bypass age-gates, or download from private/restricted videos. For commercial uses, consult your own legal counsel — this is not legal advice. GDPR: video metadata may include creator names (public usernames); aggregate anonymized analysis is generally safe.