Pricing

from $5.00 / 1,000 caption transcripts

YouTube Transcriber

Transcribe YouTube videos. Captions when available, OpenAI Whisper fallback (BYOK) for the rest. No YouTube account needed.

Developer

Arnas

What does YouTube Transcriber do?

YouTube Transcriber extracts the spoken content of a YouTube video as text. When the video has captions in your requested language, it grabs them directly (cheap and fast). When it doesn't, it downloads the smallest available audio format and sends it to OpenAI's Whisper API for transcription using your own OpenAI key. Output is plain text or structured JSON with timestamps. No YouTube account needed.

Built on yt-dlp (the most reliable YouTube extraction tool in 2026) plus ffmpeg. The actor wraps its subprocess calls in strict SSRF and shell-injection defenses.

Why use YouTube Transcriber?

Cheapest captions price on Apify Store — $0.0005 per transcript on the captions path, matching the captions-only price leader
BYOK Whisper at zero markup — when Whisper fallback fires you pay OpenAI directly (~$0.006/min). The actor charges $0.05 for the path (vs codepoetry's bundled $0.012/min × N min, ~5-6× cheaper for typical video lengths)
Predictable cost ceiling — maxWhisperMinutesPerRun caps your OpenAI bill per run
Audio always fits Whisper's limit — yt-dlp + ffmpeg picks smallest-format audio under 24 MB; configurable maxDurationMinutes (default 18)
In-product visibility — every video produces a record (success or skip with reason), so you can see why something was skipped without scrolling logs
No silent leaks — your OpenAI key is isSecret: true in the input form, never logged, sanitized from any error message before output

Who is this for?

Researchers — pull transcripts of academic talks, interviews, podcasts at scale
AI/ML engineers — feed real human speech into pipelines, fine-tune models on real conversations
Journalists — transcribe source video evidence quickly
Content marketers — repurpose video content as text for SEO
Power users with an OpenAI account — if you already have an OpenAI key, this actor is the cheapest way to get Whisper-quality transcripts for arbitrary YouTube videos

How to use YouTube Transcriber

Open the actor input page
Paste YouTube video URLs into Video URLs (one per line). Bare 11-char video IDs also work.
Set Preferred caption language (default en)
Choose Transcript method: auto (captions → Whisper), captions (captions only, skip if missing), or whisper (Whisper only, ignore CC)
(Optional) Paste your OpenAI API key — only needed when a video lacks captions in your preferred language and you want Whisper to fill the gap. Captions-only workflows work without a key.
Pick Output format: text or json
Click Start
Download results from the Dataset tab as JSON, CSV, Excel, etc.
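
The Video URLs field accepts both full URLs and bare 11-character IDs. As a rough illustration of what that normalization can look like on the client side, here is a minimal sketch. The function name `extractVideoId` and the set of URL shapes it handles are assumptions made for this example, not the actor's actual parser.

```javascript
// Hypothetical helper: normalizes a YouTube URL or bare ID to an 11-char video ID.
// The accepted URL shapes below are an assumption, not the actor's real rules.
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input) {
    const value = input.trim();
    // Bare 11-character IDs pass through unchanged.
    if (ID_PATTERN.test(value)) return value;

    let url;
    try {
        url = new URL(value);
    } catch {
        return null; // Not a URL and not a bare ID.
    }

    const host = url.hostname.replace(/^(www\.|m\.)/, '');
    let candidate = null;
    if (host === 'youtu.be') {
        // Short links: https://youtu.be/<id>
        candidate = url.pathname.slice(1).split('/')[0];
    } else if (host === 'youtube.com') {
        if (url.pathname === '/watch') {
            candidate = url.searchParams.get('v');
        } else {
            // Paths such as /shorts/<id>, /embed/<id>, /live/<id>
            const match = url.pathname.match(/^\/(shorts|embed|live)\/([^/]+)/);
            if (match) candidate = match[2];
        }
    }
    return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```

Running inputs through a helper like this before starting a run makes it easy to deduplicate videos that were pasted in different URL formats.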

Example input

{
    "videoUrls": [
        "https://www.youtube.com/watch?v=jNQXAC9IVRw",
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ],
    "preferredLanguage": "en",
    "transcriptMethod": "auto",
    "openaiApiKey": "sk-YOUR_KEY_HERE",
    "outputFormat": "text",
    "includeTimestamps": false,
    "maxDurationMinutes": 18,
    "maxWhisperMinutesPerRun": 60
}

Input parameters

Parameter	Type	Default	Description
`videoUrls`	string[]	— (required)	YouTube URLs (any standard format) or bare 11-char video IDs
`preferredLanguage`	string	`"en"`	BCP-47 language code. In `auto` mode, a video without captions in this language falls through to Whisper.
`transcriptMethod`	enum	`"auto"`	`auto`, `captions`, or `whisper`
`openaiApiKey`	secret string	— (optional)	Your OpenAI API key. Required only when `transcriptMethod=whisper`. In `auto` mode, missing key means videos without captions are skipped (with reason `no-openai-key-no-fallback`) instead of failing the run.
`whisperModel`	enum	`"whisper-1"`	Only `whisper-1` supports `verbose_json` segment timestamps
`outputFormat`	enum	`"text"`	`text` or `json`
`includeTimestamps`	boolean	`false`	When `outputFormat` is `text`, prefix each segment with `[HH:MM:SS]`
`maxDurationMinutes`	integer	`18`	Skip videos longer than this. Default keeps audio under Whisper's 25 MB limit.
`maxWhisperMinutesPerRun`	integer	`60`	Bounds your OpenAI bill per run. `0` = unlimited.
`proxyConfiguration`	object	`RESIDENTIAL`	YouTube blocks datacenter IPs in 2026; RESIDENTIAL recommended
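
The interaction between `transcriptMethod`, caption availability, and `openaiApiKey` described in the table can be summarized as a small decision function. This is a sketch of the documented behavior, not the actor's source. The function name `choosePath` is hypothetical, and throwing an error in `whisper` mode without a key is an assumption, since the table only says the key is required there.

```javascript
// Hypothetical sketch of the routing rules described in the parameter table.
function choosePath({ transcriptMethod, hasCaptions, hasOpenAiKey }) {
    if (transcriptMethod === 'captions') {
        // Captions only: skip the video when captions are missing.
        return hasCaptions ? { path: 'captions' } : { skipReason: 'no-captions' };
    }
    if (transcriptMethod === 'whisper') {
        // Whisper only, captions ignored. Behavior without a key is an assumption.
        if (!hasOpenAiKey) throw new Error('openaiApiKey is required when transcriptMethod=whisper');
        return { path: 'whisper' };
    }
    // auto: captions first, Whisper as the fallback.
    if (hasCaptions) return { path: 'captions' };
    return hasOpenAiKey
        ? { path: 'whisper' }
        : { skipReason: 'no-openai-key-no-fallback' };
}
```

For example, `choosePath({ transcriptMethod: 'auto', hasCaptions: false, hasOpenAiKey: false })` yields a skip record reason rather than a failed run, which matches the `openaiApiKey` row above.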

Output examples

Text format (success)

{
    "videoId": "jNQXAC9IVRw",
    "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
    "title": "Me at the zoo",
    "channelTitle": "jawed",
    "publishedAt": "2005-04-23T00:00:00.000Z",
    "durationSeconds": 19,
    "language": "en",
    "transcriptMethod": "captions",
    "outputFormat": "text",
    "transcript": "All right, so here we are, in front of the elephants, the cool thing about these guys is that they have really, really, really long trunks, and that's cool. And that's pretty much all there is to say.",
    "skipReason": null,
    "scrapedAt": "2026-04-19T20:35:00.000Z"
}

JSON format (success, with segment timestamps)

{
    "videoId": "jNQXAC9IVRw",
    "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
    "transcriptMethod": "captions",
    "outputFormat": "json",
    "transcript": [
        { "start": 0.36, "end": 4.32, "text": "All right, so here we are, in front of the elephants" },
        { "start": 4.32, "end": 8.5, "text": "the cool thing about these guys is that they have really" },
        { "start": 8.5, "end": 14.2, "text": "really really long trunks and that's cool" }
    ],
    "skipReason": null
}
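
If you collect the JSON format but later want timestamped plain text, the segments can be flattened client-side. The sketch below prefixes each segment with `[HH:MM:SS]`, as the `includeTimestamps` option is described. The helper names and the one-segment-per-line layout are assumptions, and the actor's own text output may differ in detail.

```javascript
// Hypothetical helpers: turn JSON segments into "[HH:MM:SS] text" lines.
function formatTimestamp(seconds) {
    const total = Math.floor(seconds);
    const pad = (n) => String(n).padStart(2, '0');
    const h = Math.floor(total / 3600);
    const m = Math.floor((total % 3600) / 60);
    const s = total % 60;
    return `[${pad(h)}:${pad(m)}:${pad(s)}]`;
}

function segmentsToText(segments) {
    // One line per segment (layout is an assumption).
    return segments
        .map((segment) => `${formatTimestamp(segment.start)} ${segment.text}`)
        .join('\n');
}
```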

Skip record (visibility into why a video wasn't transcribed)

{
    "videoId": "OPf0YbXqDm0",
    "title": "Mark Ronson - Uptown Funk",
    "transcript": [],
    "skipReason": "no-captions",
    "outputFormat": "json",
    "transcriptMethod": "captions",
    "language": ""
}

Skip reasons you may see: video-unavailable, over-duration-cap, live-stream, no-captions, no-openai-key-no-fallback, whisper-budget-exceeded, audio-download-failed, audio-exceeds-whisper-limit, whisper-api-error:401-invalid-key, whisper-api-error:402-insufficient-quota, whisper-api-error:429-rate-limit, whisper-api-error:5xx, whisper-api-error:network.
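
Because every video produces a record, a run can be summarized by counting `skipReason` values across the dataset items. The following is a small sketch. `summarizeRun` is a hypothetical helper that assumes each item carries a `skipReason` field which is `null` on success, as in the examples above.

```javascript
// Hypothetical helper: count successes and group skips by reason.
function summarizeRun(items) {
    const summary = { transcribed: 0, skipped: {} };
    for (const item of items) {
        if (item.skipReason == null) {
            summary.transcribed += 1;
        } else {
            summary.skipped[item.skipReason] = (summary.skipped[item.skipReason] || 0) + 1;
        }
    }
    return summary;
}
```

A rising count of `audio-download-failed` or `whisper-api-error:429-rate-limit` across runs is a quick signal that proxy settings or OpenAI rate limits need attention.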

Pricing

This actor uses pay-per-event pricing — you pay only for what runs.

Event	Price
`apify-actor-start` (run start, first 5s of compute included)	$0.003
`transcript-captions` (one charge per captioned video transcribed)	$0.0005
`transcript-whisper` (one charge per video routed to Whisper fallback)	$0.05

The Whisper-path price covers our proxy bandwidth + audio download + Apify compute. You additionally pay OpenAI ~$0.006/min for the actual Whisper API call (billed to your OpenAI account, not us).

Real-world examples

Run	Apify side	Your OpenAI side	Total
1 captioned video	$0.003 + $0.0005 = $0.0035	$0	$0.0035
1 video, no captions, 5 min	$0.003 + $0.05 = $0.053	5 × $0.006 = $0.030	$0.083
1 video, no captions, 18 min (default cap)	$0.003 + $0.05 = $0.053	18 × $0.006 = $0.108	$0.161
10 captioned videos	$0.003 + 10 × $0.0005 = $0.008	$0	$0.008
10 videos, all need Whisper, avg 10 min (with `maxWhisperMinutesPerRun` raised to at least 100)	$0.003 + 10 × $0.05 = $0.503	100 × $0.006 = $0.600	$1.103
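
The rows above all follow the same arithmetic, which can be written as a small estimator. This is a sketch using the event prices from the pricing table. The OpenAI rate of $0.006/min is the approximate figure quoted in this document and may change, and the function name `estimateRunCost` is an assumption for illustration.

```javascript
// Prices taken from the pricing table; the OpenAI rate is approximate.
const PRICES = {
    actorStart: 0.003,
    captionsTranscript: 0.0005,
    whisperTranscript: 0.05,
    openAiPerMinute: 0.006,
};

// Hypothetical estimator for a single run.
function estimateRunCost({ captionedVideos = 0, whisperVideos = 0, whisperMinutes = 0 }) {
    const apify = PRICES.actorStart
        + captionedVideos * PRICES.captionsTranscript
        + whisperVideos * PRICES.whisperTranscript;
    const openAi = whisperMinutes * PRICES.openAiPerMinute;
    return { apify, openAi, total: apify + openAi };
}
```

For instance, `estimateRunCost({ whisperVideos: 10, whisperMinutes: 100 })` reproduces the last row: about $0.503 on the Apify side, $0.600 on the OpenAI side, and $1.103 in total.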

How to scrape YouTube transcripts at scale

Set transcriptMethod=captions for the cheapest path — most popular videos have captions
For videos without captions, set transcriptMethod=auto with a real OpenAI key
Use maxWhisperMinutesPerRun to cap your OpenAI exposure per run (default 60 min = ~$0.36)
Schedule recurring runs via Apify scheduler for monitoring channels / playlists (process URLs in batches)
Pipe results to Google Sheets, BigQuery, Slack via Apify integrations
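
To see how a per-run Whisper budget plays out over a batch, the sketch below simulates the cap against a list of video durations. The accounting shown here (checking each video's full duration against the remaining budget before transcribing) is an assumption, and the actor may count minutes differently. `applyWhisperBudget` is a hypothetical name.

```javascript
// Hypothetical simulation of maxWhisperMinutesPerRun. 0 means unlimited.
function applyWhisperBudget(durationsMinutes, maxWhisperMinutesPerRun) {
    const unlimited = maxWhisperMinutesPerRun === 0;
    let usedMinutes = 0;
    return durationsMinutes.map((minutes) => {
        if (!unlimited && usedMinutes + minutes > maxWhisperMinutesPerRun) {
            // Over budget: the video gets a skip record instead of a Whisper call.
            return { minutes, skipReason: 'whisper-budget-exceeded' };
        }
        usedMinutes += minutes;
        return { minutes, skipReason: null };
    });
}
```

Under these assumptions, four 18-minute videos against the default cap of 60 leave the fourth one skipped, so larger Whisper batches need a higher cap or several runs.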

Anti-bot resilience

yt-dlp 2026.03.17 pinned in the Docker image. The 2026 YouTube anti-bot environment (PoToken, SABR signature ciphers) is handled by yt-dlp's mature extractor stack — validated at 87% audio-download success rate against a representative video sample on RESIDENTIAL proxy with no PoToken plugin
RESIDENTIAL proxy default — YouTube reliably blocks datacenter IPs in 2026
Real-Chrome User-Agent sent on subprocess calls
Per-run summary log lets you detect when audio-download success rate degrades

Security and credential handling

openaiApiKey is isSecret: true — masked in the Console input form and at rest
Key is sent only to api.openai.com over HTTPS
Errors from OpenAI are sanitized: sk-* patterns are masked before any log line, dataset record, or thrown error
Audio files are written only to OS temp dir (never to the actor's Apify storage), deleted immediately after the Whisper call (try/finally), and best-effort cleaned on actor abort
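
As an illustration of the sanitization step, the sketch below masks anything that looks like an OpenAI key before a message is logged or stored. The exact pattern the actor uses is not documented here, so the regular expression and the `maskOpenAiKeys` name are assumptions.

```javascript
// Hypothetical sanitizer: masks sk-* style tokens in a message.
// The character class is an assumption about what a key may contain.
const OPENAI_KEY_PATTERN = /sk-[A-Za-z0-9_-]+/g;

function maskOpenAiKeys(message) {
    return String(message).replace(OPENAI_KEY_PATTERN, 'sk-***');
}
```

Applying the same function to log lines, dataset records, and rethrown error messages keeps the masking consistent across every output channel.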

API usage

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('YOUR_USERNAME/youtube-transcriber').call({
    videoUrls: ['https://www.youtube.com/watch?v=jNQXAC9IVRw'],
    openaiApiKey: 'sk-YOUR_OPENAI_KEY',
    transcriptMethod: 'auto',
    outputFormat: 'text',
});

// Results are read from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

FAQ

Why do I need an OpenAI key? Only for the Whisper fallback. Whisper transcription quality is best in class and we don't bundle the cost, so you pay OpenAI directly. If a video has captions, the key isn't used. Captions-only mode skips Whisper entirely, so you can use the actor without an OpenAI account by setting transcriptMethod=captions and leaving the key field empty.

Why is maxDurationMinutes capped at 18 by default? OpenAI Whisper's hard limit is 25 MB per request. 18 minutes of audio at typical bitrates is comfortably under. If you raise the cap, you may hit audio-exceeds-whisper-limit skips on high-bitrate music videos.
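
The relationship between duration, bitrate, and file size can be checked with a quick calculation. This sketch assumes constant-bitrate audio and 1 MB = 1,000,000 bytes, which is the conservative reading of the limit. The function name and the sample bitrates are illustrative only.

```javascript
// Rough size estimate for constant-bitrate audio (assumption: 1 MB = 1,000,000 bytes).
function estimateAudioSizeMb(durationMinutes, bitrateKbps) {
    const bytesPerSecond = (bitrateKbps * 1000) / 8;
    return (bytesPerSecond * durationMinutes * 60) / 1_000_000;
}

const WHISPER_LIMIT_MB = 25;

// 18 minutes at 128 kbps is about 17.3 MB, under the limit.
const typical = estimateAudioSizeMb(18, 128);
// 18 minutes at 256 kbps is about 34.6 MB, over the limit.
const highBitrate = estimateAudioSizeMb(18, 256);
```

This shows why raising the duration cap mostly affects high-bitrate sources: doubling either the bitrate or the duration doubles the estimated size.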

Can I scrape playlists or channels? Not in v1 — single video URLs only. Workaround: use Apify's other YouTube actors to extract video URLs from a playlist/channel, then pipe them into this one.

What about livestreams? Active livestreams are skipped with reason live-stream. Concluded livestream archives may work but are not explicitly tested.

What about private/age-restricted videos? They produce video-unavailable skip records. The actor never tries to authenticate.

The actor returned a whisper-api-error:402-insufficient-quota skip — what now? Your OpenAI account is out of credit. Top up at platform.openai.com — the actor side cost is unaffected.

Can I get SRT/VTT subtitle files? Not in v1. JSON output gives you per-segment timestamps that you can convert client-side.
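
A client-side conversion from the JSON segments to SRT can be quite short. The sketch below assumes each segment has `start` and `end` in seconds plus `text`, as in the JSON output example. The helper names are hypothetical.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
    const totalMs = Math.round(seconds * 1000);
    const pad = (n, width = 2) => String(n).padStart(width, '0');
    const ms = totalMs % 1000;
    const totalSeconds = Math.floor(totalMs / 1000);
    const s = totalSeconds % 60;
    const m = Math.floor(totalSeconds / 60) % 60;
    const h = Math.floor(totalSeconds / 3600);
    return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build SRT cues: index, time range, text, separated by blank lines.
function segmentsToSrt(segments) {
    return segments
        .map((segment, index) => (
            `${index + 1}\n${toSrtTime(segment.start)} --> ${toSrtTime(segment.end)}\n${segment.text}\n`
        ))
        .join('\n');
}
```

WebVTT differs mainly in its `WEBVTT` header line and the use of a period instead of a comma before the milliseconds, so the same approach carries over with small changes.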

Legal

This actor accesses publicly available YouTube content. Scraping public data is generally permissible per hiQ Labs v. LinkedIn (2022). The actor does not log in, bypass age-gates, or download from private/restricted videos. For commercial uses, consult your own legal counsel — this is not legal advice. GDPR: video metadata may include creator names (public usernames); aggregate anonymized analysis is generally safe.
