YouTube Transcriber
Pricing
from $5.00 / 1,000 caption transcripts
Transcribe YouTube videos. Captions when available, OpenAI Whisper fallback (BYOK) for the rest. No YouTube account needed.
Developer
Arnas
Transcribe YouTube videos via captions when available, with the OpenAI Whisper API as a fallback. You bring your own OpenAI key and pay OpenAI directly for Whisper compute; the actor charges only per transcript, plus a small run-start fee. Only single video URLs are supported in v1.
What does YouTube Transcriber do?
YouTube Transcriber extracts the spoken content of a YouTube video as text. When the video has captions in your requested language, it grabs them directly (cheap and fast). When it doesn't, it downloads the smallest available audio format and sends it to OpenAI's Whisper API for transcription using your own OpenAI key. Output is plain text or structured JSON with timestamps. No YouTube account needed.
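The captions-first routing described above can be pictured as a small decision function. This is a minimal TypeScript sketch, not the actor's source: `chooseRoute` and `Route` are hypothetical names, and it assumes the only inputs are the selected `transcriptMethod`, whether captions exist in `preferredLanguage`, and whether an OpenAI key was supplied. Other checks (duration cap, live streams, availability) are left out.

```typescript
// Hypothetical sketch of the captions -> Whisper routing; not the actor's code.
type TranscriptMethod = "auto" | "captions" | "whisper";

type Route =
  | { kind: "captions" }
  | { kind: "whisper" }
  | { kind: "skip"; reason: string };

function chooseRoute(
  method: TranscriptMethod,
  hasCaptionsInLanguage: boolean,
  hasOpenAiKey: boolean,
): Route {
  switch (method) {
    case "captions":
      // Captions only: Whisper is never called.
      return hasCaptionsInLanguage
        ? { kind: "captions" }
        : { kind: "skip", reason: "no-captions" };
    case "whisper":
      // Whisper only: captions are ignored and a key is mandatory.
      if (!hasOpenAiKey) {
        throw new Error("openaiApiKey is required when transcriptMethod=whisper");
      }
      return { kind: "whisper" };
    case "auto":
      // Prefer cheap captions; fall back to Whisper only when a key was supplied.
      if (hasCaptionsInLanguage) return { kind: "captions" };
      return hasOpenAiKey
        ? { kind: "whisper" }
        : { kind: "skip", reason: "no-openai-key-no-fallback" };
    default: {
      const unreachable: never = method;
      throw new Error(`unknown transcriptMethod: ${String(unreachable)}`);
    }
  }
}
```

In `auto` mode a missing key degrades to a per-video skip record rather than a failed run, which matches the behavior described for `openaiApiKey` in the input table.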
Built on yt-dlp (the most reliable YouTube extraction tool in 2026) plus ffmpeg. The actor wraps every subprocess call in strict defenses against SSRF and shell injection.
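One common way to harden subprocess calls like these is to accept only a validated 11-character video ID, rebuild a canonical URL from it, and pass arguments as an array (never through a shell). The sketch below assumes that approach; `extractVideoId`, `buildAudioArgs`, and the host allowlist are hypothetical, and the yt-dlp options shown (`--no-playlist`, `-f worstaudio`, `-o`) are standard flags, not the actor's confirmed command line.

```typescript
// Sketch: validate user input before it ever reaches a subprocess.
// Hypothetical helper names; not the actor's actual implementation.
const VIDEO_ID_RE = /^[A-Za-z0-9_-]{11}$/;
const ALLOWED_HOSTS = new Set([
  "youtube.com",
  "www.youtube.com",
  "m.youtube.com",
  "music.youtube.com",
  "youtu.be",
]);

// Returns the 11-char video ID, or null if the input is not a recognized YouTube reference.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID_RE.test(trimmed)) return trimmed; // bare video ID

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // not a URL at all
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return null;
  const host = url.hostname.toLowerCase();
  if (!ALLOWED_HOSTS.has(host)) return null; // refuse arbitrary hosts (SSRF)

  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (url.pathname === "/watch") {
    candidate = url.searchParams.get("v");
  } else {
    const match = url.pathname.match(/^\/(?:shorts|embed|live)\/([^/]+)/);
    candidate = match?.[1] ?? null;
  }
  return candidate !== null && VIDEO_ID_RE.test(candidate) ? candidate : null;
}

// Builds an argv array for yt-dlp. The URL is rebuilt from the validated ID,
// so nothing user-supplied is passed through verbatim.
function buildAudioArgs(videoId: string, outputTemplate: string): string[] {
  if (!VIDEO_ID_RE.test(videoId)) throw new Error("invalid video ID");
  return [
    "--no-playlist",
    "-f", "worstaudio", // lowest-quality audio-only stream, usually the smallest
    "-o", outputTemplate,
    "--", // end of options: the URL can never be parsed as a flag
    `https://www.youtube.com/watch?v=${videoId}`,
  ];
}
```

The resulting array would be handed to something like Node's `execFile("yt-dlp", args)`, which does not spawn a shell, so shell metacharacters in user input are never interpreted.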
Why use YouTube Transcriber?
- Cheapest captions price on Apify Store: $0.0005 per transcript on the captions path, tied with the captions-only price leader
- BYOK Whisper at zero markup: when the Whisper fallback fires, you pay OpenAI directly (~$0.006/min). The actor charges a flat $0.05 per video on that path regardless of length (vs codepoetry's bundled $0.012/min × N min, e.g. $0.12 for a 10-minute video)
- Predictable cost ceiling: `maxWhisperMinutesPerRun` caps your OpenAI bill per run
- Audio sized for Whisper's limit: yt-dlp + ffmpeg pick the smallest audio format under 24 MB, and `maxDurationMinutes` (default 18) is configurable
- In-product visibility: every video produces a record (success, or skip with a reason), so you can see why something was skipped without scrolling through logs
- No silent leaks: your OpenAI key is `isSecret: true` in the input form, never logged, and sanitized from any error message before output
Who is this for?
- Researchers — pull transcripts of academic talks, interviews, podcasts at scale
- AI/ML engineers — feed real human speech into pipelines, fine-tune models on real conversations
- Journalists — transcribe source video evidence quickly
- Content marketers — repurpose video content as text for SEO
- Power users with an OpenAI account — if you already have an OpenAI key, this actor is the cheapest way to get Whisper-quality transcripts for arbitrary YouTube videos
How to use YouTube Transcriber
- Open the actor input page
- Paste YouTube video URLs into Video URLs (one per line). Bare 11-char video IDs also work.
- Set Preferred caption language (default `en`)
- Choose Transcript method: `auto` (captions → Whisper), `captions` (captions only, skip if missing), or `whisper` (Whisper only, ignore captions)
- (Optional) Paste your OpenAI API key. It is only needed when a video lacks captions in your preferred language and you want Whisper to fill the gap; captions-only workflows work without a key.
- Pick Output format: `text` or `json`
- Click Start
- Download results from the Dataset tab as JSON, CSV, Excel, etc.
Example input
```json
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=jNQXAC9IVRw",
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "preferredLanguage": "en",
  "transcriptMethod": "auto",
  "openaiApiKey": "sk-YOUR_KEY_HERE",
  "outputFormat": "text",
  "includeTimestamps": false,
  "maxDurationMinutes": 18,
  "maxWhisperMinutesPerRun": 60
}
```
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `videoUrls` | string[] | (required) | YouTube URLs (any standard format) or bare 11-char video IDs |
| `preferredLanguage` | string | `"en"` | BCP-47 code. In `auto` mode, falls back to Whisper if captions in this language aren't available. |
| `transcriptMethod` | enum | `"auto"` | `auto`, `captions`, or `whisper` |
| `openaiApiKey` | secret string | (optional) | Your OpenAI API key. Required only when `transcriptMethod=whisper`. In `auto` mode, a missing key means videos without captions are skipped (reason `no-openai-key-no-fallback`) instead of failing the run. |
| `whisperModel` | enum | `"whisper-1"` | Only `whisper-1` supports `verbose_json` segment timestamps |
| `outputFormat` | enum | `"text"` | `text` or `json` |
| `includeTimestamps` | boolean | `false` | When `outputFormat` is `text`, prefix each segment with `[HH:MM:SS]` |
| `maxDurationMinutes` | integer | `18` | Skip videos longer than this. The default keeps audio under Whisper's 25 MB limit at typical bitrates. |
| `maxWhisperMinutesPerRun` | integer | `60` | Caps total Whisper minutes, and therefore your OpenAI bill, per run. `0` = unlimited. |
| `proxyConfiguration` | object | RESIDENTIAL | YouTube blocks datacenter IPs in 2026; RESIDENTIAL recommended |
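For programmatic callers, the table maps naturally onto a typed input object. The sketch below is hypothetical (`TranscriberInput` and `resolveInput` are illustrative names): it applies the documented defaults and the one hard rule from the table, that a key is required for `transcriptMethod=whisper`. The `proxyConfiguration` shape assumes Apify's standard proxy input fields (`useApifyProxy`, `apifyProxyGroups`).

```typescript
// Hypothetical typed view of the input table, with its documented defaults.
interface ProxyConfiguration {
  useApifyProxy?: boolean;
  apifyProxyGroups?: string[];
}

interface TranscriberInput {
  videoUrls: string[];
  preferredLanguage: string;
  transcriptMethod: "auto" | "captions" | "whisper";
  openaiApiKey?: string;
  whisperModel: string;
  outputFormat: "text" | "json";
  includeTimestamps: boolean;
  maxDurationMinutes: number;
  maxWhisperMinutesPerRun: number; // 0 = unlimited
  proxyConfiguration: ProxyConfiguration;
}

// Fills in defaults, then enforces the documented requirements.
// JSON input never contains `undefined`, so a plain spread is enough here.
function resolveInput(
  raw: Partial<TranscriberInput> & { videoUrls: string[] },
): TranscriberInput {
  const input: TranscriberInput = {
    preferredLanguage: "en",
    transcriptMethod: "auto",
    whisperModel: "whisper-1",
    outputFormat: "text",
    includeTimestamps: false,
    maxDurationMinutes: 18,
    maxWhisperMinutesPerRun: 60,
    proxyConfiguration: { useApifyProxy: true, apifyProxyGroups: ["RESIDENTIAL"] },
    ...raw,
  };
  if (input.videoUrls.length === 0) throw new Error("videoUrls is required");
  if (input.transcriptMethod === "whisper" && !input.openaiApiKey) {
    throw new Error("openaiApiKey is required when transcriptMethod=whisper");
  }
  return input;
}
```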
Output examples
Text format (success)
```json
{
  "videoId": "jNQXAC9IVRw",
  "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "title": "Me at the zoo",
  "channelTitle": "jawed",
  "publishedAt": "2005-04-23T00:00:00.000Z",
  "durationSeconds": 19,
  "language": "en",
  "transcriptMethod": "captions",
  "outputFormat": "text",
  "transcript": "All right, so here we are, in front of the elephants, the cool thing about these guys is that they have really, really, really long trunks, and that's cool. And that's pretty much all there is to say.",
  "skipReason": null,
  "scrapedAt": "2026-04-19T20:35:00.000Z"
}
```
JSON format (success, with segment timestamps)
```json
{
  "videoId": "jNQXAC9IVRw",
  "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "transcriptMethod": "captions",
  "outputFormat": "json",
  "transcript": [
    { "start": 0.36, "end": 4.32, "text": "All right, so here we are, in front of the elephants" },
    { "start": 4.32, "end": 8.5, "text": "the cool thing about these guys is that they have really" },
    { "start": 8.5, "end": 14.2, "text": "really really long trunks and that's cool" }
  ],
  "skipReason": null
}
```
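With `outputFormat: "text"` and `includeTimestamps: true`, each segment is prefixed with `[HH:MM:SS]`. A sketch of that rendering from segments like the ones above; flooring to whole seconds and joining with newlines are assumptions, since the exact text layout isn't documented, and `renderTimestampedText` is a hypothetical name.

```typescript
// Hypothetical rendering of JSON segments into the includeTimestamps text style.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Formats whole seconds as HH:MM:SS (fractional seconds are floored).
function formatHms(seconds: number): string {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

function renderTimestampedText(segments: Segment[]): string {
  return segments.map((seg) => `[${formatHms(seg.start)}] ${seg.text}`).join("\n");
}
```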
Skip record (visibility into why a video wasn't transcribed)
```json
{
  "videoId": "OPf0YbXqDm0",
  "title": "Mark Ronson - Uptown Funk",
  "transcript": [],
  "skipReason": "no-captions",
  "outputFormat": "json",
  "transcriptMethod": "captions",
  "language": ""
}
```
Skip reasons you may see: video-unavailable, over-duration-cap, live-stream, no-captions, no-openai-key-no-fallback, whisper-budget-exceeded, audio-download-failed, audio-exceeds-whisper-limit, whisper-api-error:401-invalid-key, whisper-api-error:402-insufficient-quota, whisper-api-error:429-rate-limit, whisper-api-error:5xx, whisper-api-error:network.
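If you consume these records programmatically, it helps to know how the `whisper-api-error:*` reasons could line up with HTTP responses. The mapping below is a hypothetical sketch, not the actor's code. One detail it accounts for: OpenAI's error documentation reports exhausted credit as HTTP 429 with error code `insufficient_quota`, so the code is checked before a 429 is treated as a plain rate limit.

```typescript
// Hypothetical mapping from a failed Whisper API call to the documented skip reasons.
// `status` is the HTTP status (undefined when no response arrived, e.g. DNS failure or timeout);
// `code` is the `error.code` string from OpenAI's JSON error body, if any.
function whisperSkipReason(status: number | undefined, code?: string): string | null {
  if (status === undefined) return "whisper-api-error:network";
  // Exhausted credit can arrive as 429 + "insufficient_quota", so check the code first.
  if (code === "insufficient_quota" || status === 402) {
    return "whisper-api-error:402-insufficient-quota";
  }
  if (status === 401) return "whisper-api-error:401-invalid-key";
  if (status === 429) return "whisper-api-error:429-rate-limit";
  if (status >= 500 && status <= 599) return "whisper-api-error:5xx";
  return null; // status not covered by the documented list
}
```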
Pricing
This actor uses pay-per-event pricing — you pay only for what runs.
| Event | Price |
|---|---|
| `apify-actor-start` (run start, first 5 s of compute included) | $0.003 |
| `transcript-captions` (one charge per captioned video transcribed) | $0.0005 |
| `transcript-whisper` (one charge per video routed to the Whisper fallback) | $0.05 |
The Whisper-path price covers our proxy bandwidth + audio download + Apify compute. You additionally pay OpenAI ~$0.006/min for the actual Whisper API call (billed to your OpenAI account, not us).
Real-world examples
| Run | Apify side | Your OpenAI side | Total |
|---|---|---|---|
| 1 captioned video | $0.003 + $0.0005 = $0.0035 | $0 | $0.0035 |
| 1 video, no captions, 5 min | $0.003 + $0.05 = $0.053 | 5 × $0.006 = $0.030 | $0.083 |
| 1 video, no captions, 18 min (default cap) | $0.003 + $0.05 = $0.053 | 18 × $0.006 = $0.108 | $0.161 |
| 10 captioned videos | $0.003 + 10 × $0.0005 = $0.008 | $0 | $0.008 |
| 10 videos, all need Whisper, avg 10 min | $0.003 + 10 × $0.05 = $0.503 | 100 × $0.006 = $0.600 | $1.103 |
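The table's arithmetic can be reproduced with a small estimator. This sketch assumes exactly one `apify-actor-start` charge per run (as the examples do) and uses the approximate ~$0.006/min OpenAI rate; `estimateRunCost` is a hypothetical helper, not part of the actor.

```typescript
// Prices from the pay-per-event table (USD). The OpenAI rate is approximate.
const ACTOR_START_USD = 0.003;
const CAPTIONS_TRANSCRIPT_USD = 0.0005;
const WHISPER_TRANSCRIPT_USD = 0.05;
const OPENAI_WHISPER_USD_PER_MIN = 0.006;

interface CostEstimate {
  apifyUsd: number;
  openaiUsd: number;
  totalUsd: number;
}

// captionedVideos: number of videos served from captions.
// whisperMinutes: audio length (minutes) of each video routed to Whisper.
// Assumes a single actor-start charge per run.
function estimateRunCost(captionedVideos: number, whisperMinutes: number[]): CostEstimate {
  const apifyUsd =
    ACTOR_START_USD +
    captionedVideos * CAPTIONS_TRANSCRIPT_USD +
    whisperMinutes.length * WHISPER_TRANSCRIPT_USD;
  const totalMinutes = whisperMinutes.reduce((sum, m) => sum + m, 0);
  const openaiUsd = totalMinutes * OPENAI_WHISPER_USD_PER_MIN;
  return { apifyUsd, openaiUsd, totalUsd: apifyUsd + openaiUsd };
}
```

For example, `estimateRunCost(0, new Array<number>(10).fill(10))` gives $0.503 on the Apify side plus $0.600 on the OpenAI side, matching the last row's $1.103 up to floating-point rounding.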
How to scrape YouTube transcripts at scale
- Set `transcriptMethod=captions` for the cheapest path; most popular videos have captions
- For videos without captions, set `transcriptMethod=auto` with a real OpenAI key
- Use `maxWhisperMinutesPerRun` to cap your OpenAI exposure per run (default 60 min ≈ $0.36)
- Schedule recurring runs via the Apify scheduler to monitor channels or playlists (process URLs in batches)
- Pipe results to Google Sheets, BigQuery, Slack via Apify integrations
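`maxWhisperMinutesPerRun` acts as a running budget across the videos in one run. How the actor counts minutes isn't documented; this hypothetical `WhisperBudget` class assumes each video's full duration is reserved before it is sent to Whisper, and that a video which would overflow the cap becomes a `whisper-budget-exceeded` skip instead.

```typescript
// Hypothetical per-run guard for maxWhisperMinutesPerRun (0 = unlimited).
class WhisperBudget {
  private readonly limitSeconds: number; // Infinity when the cap is 0
  private usedSeconds = 0;

  constructor(maxMinutes: number) {
    this.limitSeconds = maxMinutes === 0 ? Infinity : maxMinutes * 60;
  }

  // Reserves the video's duration if it fits. On false, the caller would emit a
  // "whisper-budget-exceeded" skip record instead of calling Whisper.
  tryReserve(durationSeconds: number): boolean {
    if (this.usedSeconds + durationSeconds > this.limitSeconds) return false;
    this.usedSeconds += durationSeconds;
    return true;
  }

  get usedMinutes(): number {
    return this.usedSeconds / 60;
  }
}
```

With the default cap of 60, three 18-minute videos fit (54 minutes), a fourth is skipped, and a later 6-minute video still fits exactly.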
Anti-bot resilience
- yt-dlp 2026.03.17 is pinned in the Docker image. The 2026 YouTube anti-bot environment (PoToken, SABR streaming, signature ciphers) is handled by yt-dlp's mature extractor stack, validated at an 87% audio-download success rate against a representative video sample on RESIDENTIAL proxy with no PoToken plugin
- RESIDENTIAL proxy default — YouTube reliably blocks datacenter IPs in 2026
- Real-Chrome User-Agent sent on subprocess calls
- Per-run summary log lets you detect when audio-download success rate degrades
Security and credential handling
- `openaiApiKey` is `isSecret: true`: masked in the Console input form and at rest
- The key is sent only to `api.openai.com` over HTTPS
- Errors from OpenAI are sanitized: `sk-*` patterns are masked before any log line, dataset record, or thrown error
- Audio files are written only to the OS temp dir (never to the actor's Apify storage), deleted immediately after the Whisper call (try/finally), and cleaned up on a best-effort basis if the actor is aborted
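The redaction step can be sketched as a string filter run over every message before it is logged, stored, or rethrown. The regex and the `maskSecrets` name are assumptions for illustration; the pattern matches `sk-` followed by at least eight key characters, which covers both classic and `sk-proj-` style keys, and any exact key values passed in are masked first.

```typescript
// Sketch of key redaction; pattern and helper name are assumptions, not the actor's code.
const OPENAI_KEY_RE = /sk-[A-Za-z0-9_-]{8,}/g;

function maskSecrets(text: string, knownSecrets: string[] = []): string {
  let out = text;
  // First remove exact occurrences of keys we know about (covers unusual formats).
  for (const secret of knownSecrets) {
    if (secret.length > 0) out = out.split(secret).join("***");
  }
  // Then mask anything else that looks like an OpenAI key.
  return out.replace(OPENAI_KEY_RE, "sk-***");
}
```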
API usage
```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/youtube-transcriber').call({
    videoUrls: ['https://www.youtube.com/watch?v=jNQXAC9IVRw'],
    openaiApiKey: 'sk-YOUR_OPENAI_KEY', // only needed for the Whisper fallback
    transcriptMethod: 'auto',
    outputFormat: 'text',
});

// Results are stored in the run's default dataset (the Dataset tab in Console).
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
FAQ
Why do I need an OpenAI key? Whisper transcription quality is best in class, and we don't bundle its cost: you pay OpenAI directly. If a video has captions, the key isn't used. Captions-only mode skips Whisper entirely, so you can use the actor without an OpenAI account by setting transcriptMethod=captions and leaving the key empty.
Why is maxDurationMinutes capped at 18 by default? OpenAI Whisper's hard limit is 25 MB per request. 18 minutes of audio at typical bitrates is comfortably under. If you raise the cap, you may hit audio-exceeds-whisper-limit skips on high-bitrate music videos.
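The size arithmetic: at 128 kbit/s, 18 minutes of audio is 128,000 / 8 × 1,080 s = 17,280,000 bytes (about 17.3 MB), well under 25 MB. Below is a sketch of picking the smallest audio-only format that fits, assuming yt-dlp's per-format metadata fields (`vcodec` is `"none"` for audio-only, `filesize`, `filesize_approx`, and `abr` in kbit/s) and treating the 24 MB target as 24 MiB; `pickAudioFormat` is a hypothetical helper, not the actor's code.

```typescript
// Subset of the per-format fields yt-dlp reports in its JSON metadata.
interface FormatInfo {
  format_id: string;
  vcodec?: string; // "none" for audio-only formats
  filesize?: number | null; // exact size in bytes, when known
  filesize_approx?: number | null;
  abr?: number | null; // average audio bitrate in kbit/s
}

const WHISPER_TARGET_BYTES = 24 * 1024 * 1024; // margin under the 25 MB upload limit

// Exact size if known, else approximate size, else bitrate x duration.
function estimateBytes(f: FormatInfo, durationSeconds: number): number | null {
  if (f.filesize) return f.filesize;
  if (f.filesize_approx) return f.filesize_approx;
  if (f.abr) return ((f.abr * 1000) / 8) * durationSeconds; // kbit/s -> bytes
  return null;
}

// Picks the smallest audio-only format whose (estimated) size fits the limit.
function pickAudioFormat(
  formats: FormatInfo[],
  durationSeconds: number,
  limitBytes: number = WHISPER_TARGET_BYTES,
): FormatInfo | null {
  let best: FormatInfo | null = null;
  let bestSize = Infinity;
  for (const f of formats) {
    if (f.vcodec !== "none") continue; // skip formats that carry video
    const size = estimateBytes(f, durationSeconds);
    if (size === null || size > limitBytes) continue;
    if (size < bestSize) {
      best = f;
      bestSize = size;
    }
  }
  return best; // null: nothing fits, e.g. an audio-exceeds-whisper-limit skip
}
```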
Can I scrape playlists or channels? Not in v1 — single video URLs only. Workaround: use Apify's other YouTube actors to extract video URLs from a playlist/channel, then pipe them into this one.
What about livestreams? Active livestreams are skipped with reason live-stream. Concluded livestream archives may work but are not explicitly tested.
What about private/age-restricted videos? They produce video-unavailable skip records. The actor never tries to authenticate.
The actor returned a whisper-api-error:402-insufficient-quota skip — what now? Your OpenAI account is out of credit. Top up at platform.openai.com — the actor side cost is unaffected.
Can I get SRT/VTT subtitle files? Not in v1. JSON output gives you per-segment timestamps that you can convert client-side.
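Converting the JSON segments to SRT takes only a few lines. This sketch assumes the segment shape shown in the JSON output example (`start` and `end` in seconds, plus `text`); `toSrt` is a hypothetical client-side helper.

```typescript
// Client-side conversion of JSON transcript segments to SubRip (SRT).
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// SRT timestamps are HH:MM:SS,mmm (comma before the milliseconds).
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Each cue: sequence number, time range, text, then a blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text}\n`)
    .join("\n");
}
```

For WebVTT, the same loop works with a `WEBVTT` header line and a period instead of the comma before the milliseconds.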
Legal
This actor accesses publicly available YouTube content. In hiQ Labs v. LinkedIn (9th Cir. 2022), the court held that scraping publicly accessible data likely does not violate the U.S. Computer Fraud and Abuse Act. The actor does not log in, bypass age-gates, or download from private/restricted videos. For commercial uses, consult your own legal counsel; this is not legal advice. GDPR: video metadata may include creator names (public usernames); aggregate anonymized analysis is generally safe.