Video Subtitle & Caption Extractor
Pricing
Pay per event + usage
Extract subtitles, captions, and AI transcripts from any video URL across 1000+ platforms (YouTube, Vimeo, TikTok, Instagram, X/Twitter, Facebook, Twitch, TED, Bilibili). Native captions first, Whisper AI fallback when none. JSON, SRT, VTT, text, or LLM-ready markdown.
Developer
Khadin Akbar
Video Subtitle & Caption Extractor — 1000+ Sites + Whisper AI
Extract subtitles, captions, and AI transcripts from any video URL across 1000+ platforms — YouTube, Vimeo, TikTok, Instagram, X (Twitter), Facebook, Twitch, Dailymotion, TED, Bilibili, SoundCloud, Reddit and more. Native captions when available (cheap), Whisper AI fallback when not (accurate). Five output formats: timestamped JSON, SRT, VTT, plain text, and LLM-ready Markdown. No yt-dlp install, no FFmpeg setup, no API juggling — paste a URL and get a transcript.
What does Video Subtitle & Caption Extractor do?
This Actor turns any public video URL into a clean, timestamped transcript. It tries the cheap path first — native human-uploaded captions, then YouTube auto-generated captions — and only falls back to OpenAI Whisper for the small share of videos where no captions exist. You stay in full control of cost: cap video length, disable Whisper entirely, or bring your own OpenAI key to bypass the per-minute charge.
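The precedence described above can be sketched as a small decision function. This is an illustration only, not the Actor's actual code: the function name and arguments are hypothetical, and it assumes the `maxDurationMinutes` limit is checked before any caption lookup.

```python
from typing import Optional


def pick_transcript_source(
    has_manual: bool,
    has_auto: bool,
    duration_minutes: float,
    include_auto_captions: bool = True,
    use_whisper_fallback: bool = True,
    max_duration_minutes: int = 120,
) -> Optional[str]:
    """Return 'manual', 'auto', 'whisper', or None when the video is skipped."""
    # Assumption: over-long videos are skipped regardless of caption availability.
    if duration_minutes > max_duration_minutes:
        return None
    # Cheapest path first: human-uploaded captions.
    if has_manual:
        return "manual"
    # Next: auto-generated captions, unless the caller opted out.
    if has_auto and include_auto_captions:
        return "auto"
    # Last resort: Whisper transcription, if enabled.
    if use_whisper_fallback:
        return "whisper"
    return None
```

With the defaults, a 10-minute video that has only auto-generated captions resolves to `"auto"`. The same video resolves to `"whisper"` once `include_auto_captions` is false.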
Under the hood it uses yt-dlp, the same tool that powers most professional video pipelines, so it handles 1000+ sites including YouTube, Vimeo, Twitch (including VODs), TikTok, Instagram, Facebook, X, Dailymotion, TED, Bilibili, BBC iPlayer, SoundCloud, Reddit, and dozens of regional platforms. The Apify platform handles proxy rotation, retries, scheduling, and storage. You just call the API or click Run.
Try it: paste a YouTube URL into the Input tab and hit Start. You'll get back a JSON record with title, channel, duration, full timestamped transcript, plus an SRT file in the key-value store.
Why use Video Subtitle & Caption Extractor?
- One tool, every video site — stop maintaining 5 different scrapers for YouTube, TikTok, Instagram, Vimeo, and friends.
- Smart pricing — pay $0.005 per video that has captions; only pay the $0.02/min Whisper rate when there's no other option.
- 5 output formats — JSON for AI pipelines, SRT/VTT for video editing, text for search/embeddings, Markdown for LLM context windows.
- MCP-ready — designed to be called by Claude, GPT, and other agents. The tool description and output shape are built for LLM consumption.
- Translation built in — set translateTo: "es" and YouTube's auto-translation kicks in for any video with captions.
- Apify residential proxies — geo-locked or rate-limited videos work out of the box.
- No infrastructure — no FFmpeg install, no yt-dlp updates to chase, no OpenAI account required (BYOK is optional).
Common use cases:
- AI training data — build transcript corpora at scale across mixed video sources.
- Content repurposing — turn podcasts, talks, and interviews into blog posts, X threads, and LinkedIn articles.
- Accessibility — generate WCAG-compliant SRT/VTT files for your video library.
- Competitor / market research — extract messaging from competitor ads, demos, and webinars.
- SEO — pull full transcripts of YouTube videos to repurpose into search-indexed long-form pages.
- Localization — translate captions to ship videos in multiple languages without re-recording.
How to use Video Subtitle & Caption Extractor
- Open the Actor and click Try for free.
- Paste one or more video URLs into the Video URLs field. Any public URL works — YouTube, Vimeo, TikTok, Instagram, X, etc.
- (Optional) Pick your output format — JSON for AI/programmatic use, SRT for video editors, Markdown for LLM context.
- (Optional) Add an OpenAI API key to bypass the platform Whisper charge. You then pay the platform only the flat $0.005 per video and pay OpenAI directly for transcription.
- Click Save & Start.
- When the run finishes, open the Output tab to view results, or grab the Dataset URL from the Output schema to pull JSON/CSV/Excel via API.
That's it. The Actor handles language selection, proxy rotation, retries, and Whisper fallback automatically.
Input
| Field | Type | Description |
|---|---|---|
| videoUrls (required) | array of {url} | Public video URLs. Any platform supported by yt-dlp (1000+). |
| preferredLanguages | array of strings | ISO 639-1 codes in priority order. Default ["en"]. Use "auto" for any. |
| outputFormat | enum | json (default), srt, vtt, text, markdown. |
| useWhisperFallback | boolean | Transcribe with Whisper when no captions exist. Default true. |
| openaiApiKey | string (secret) | BYOK: bypass platform Whisper, pay OpenAI directly. |
| translateTo | string | ISO 639-1 code to auto-translate captions (YouTube only). |
| includeAutoCaptions | boolean | Accept auto-generated captions. Default true. |
| maxDurationMinutes | integer | Skip videos longer than this. Default 120. |
| includeMetadata | boolean | Include title, channel, duration, etc. Default true. |
| proxyConfiguration | object | Defaults to Apify residential proxies. |
Example JSON input:
{
  "videoUrls": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
    { "url": "https://www.tiktok.com/@scout2015/video/6718335390845095173" },
    { "url": "https://vimeo.com/76979871" }
  ],
  "preferredLanguages": ["en", "es"],
  "outputFormat": "json",
  "useWhisperFallback": true,
  "maxDurationMinutes": 60
}
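For programmatic use, the same input can be posted to the Apify API. The sketch below only builds the HTTP request with the standard library and does not send it. The actor ID `USERNAME~ACTOR-NAME` and the token are placeholders, since this page does not state the exact ID. The `run-sync-get-dataset-items` endpoint is the general Apify route for running an Actor and returning its dataset items in one call.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, video_urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a synchronous run."""
    actor_input = {
        "videoUrls": [{"url": u} for u in video_urls],
        "preferredLanguages": ["en", "es"],
        "outputFormat": "json",
        "useWhisperFallback": True,
        "maxDurationMinutes": 60,
    }
    endpoint = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholders: replace with the real actor ID and your own API token.
request = build_run_request(
    "USERNAME~ACTOR-NAME",
    "APIFY_TOKEN_PLACEHOLDER",
    ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
)
# To execute the run:
#   with urllib.request.urlopen(request) as resp:
#       records = json.load(resp)
```

Synchronous runs suit short batches. For long Whisper jobs, starting an asynchronous run and reading the dataset afterwards is likely the safer pattern.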
Output
Each video produces one dataset record. Download as JSON, CSV, Excel, or HTML from the Output tab, or stream via the Apify API.
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "platform": "youtube",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channel": "Rick Astley",
  "channelUrl": "https://www.youtube.com/@RickAstleyYT",
  "durationSeconds": 213,
  "viewCount": 1700000000,
  "likeCount": 18000000,
  "uploadDate": "20091025",
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
  "transcript": [
    { "start": 18.96, "end": 22.96, "text": "We're no strangers to love" },
    { "start": 22.96, "end": 26.96, "text": "You know the rules and so do I" }
  ],
  "transcriptFormat": "json",
  "transcriptSource": "manual",
  "language": "en",
  "isAutoGenerated": false,
  "segmentCount": 87,
  "characterCount": 2104,
  "whisperMinutesCharged": 0,
  "srtKey": "youtube-dQw4w9WgXcQ.srt",
  "vttKey": "youtube-dQw4w9WgXcQ.vtt"
}
Raw .srt and .vtt files for each video are also saved to the default key-value store. Fetch them at KEY_VALUE_STORE_URL/records/{srtKey} (or {vttKey}).
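If you only kept the JSON record, the `transcript` segments can be turned back into SRT locally. The following is a minimal sketch that assumes each segment has the `start`, `end`, and `text` keys shown in the sample record. The function names are illustrative and not part of the Actor.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render {start, end, text} segments as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{time_range}\n{seg['text']}")
    # Cues are separated by a blank line; the file ends with a newline.
    return "\n\n".join(cues) + "\n"


sample = [
    {"start": 18.96, "end": 22.96, "text": "We're no strangers to love"},
    {"start": 22.96, "end": 26.96, "text": "You know the rules and so do I"},
]
srt_text = segments_to_srt(sample)
```

WebVTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.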
Data table
| Field | Description |
|---|---|
| videoUrl | Original URL processed. |
| platform | Inferred platform (youtube, vimeo, tiktok, instagram, etc.). |
| title, channel, channelUrl | Video metadata. |
| durationSeconds | Length in seconds. |
| viewCount, likeCount, uploadDate, thumbnail, description | Engagement + presentation metadata. |
| transcript | Array of {start, end, text} (json) or formatted string (srt/vtt/text/markdown). |
| transcriptFormat | Echoes the input outputFormat. |
| transcriptSource | manual (human captions), auto (auto-captions), translated, or whisper. |
| language | ISO 639-1 code of the returned transcript. |
| isAutoGenerated | True if captions were auto-generated or Whisper-transcribed. |
| segmentCount, characterCount | Quick sizing for downstream pipelines. |
| whisperMinutesCharged | Minutes that contributed to the bill. 0 when captions were used or BYOK key supplied. |
| srtKey, vttKey | Keys in the key-value store for raw subtitle files. |
| error | Populated when extraction failed (with a hint to fix). |
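A downstream pipeline will usually want to separate failed records from usable ones and see how many transcripts came from each source. The sketch below does that over a list of dataset records. It assumes failed records still carry `videoUrl` alongside `error`, which this page does not state explicitly, and the helper name is hypothetical.

```python
from collections import Counter


def summarize_run(records: list[dict]) -> dict:
    """Group records by transcript source and collect the URLs that failed."""
    failed_urls = [r.get("videoUrl") for r in records if r.get("error")]
    ok = [r for r in records if not r.get("error")]
    return {
        "by_source": dict(Counter(r.get("transcriptSource") for r in ok)),
        "whisper_minutes": sum(r.get("whisperMinutesCharged", 0) for r in ok),
        "failed_urls": failed_urls,
    }


# Made-up records for illustration only.
records = [
    {"videoUrl": "https://example.com/a", "transcriptSource": "manual", "whisperMinutesCharged": 0},
    {"videoUrl": "https://example.com/b", "transcriptSource": "whisper", "whisperMinutesCharged": 3},
    {"videoUrl": "https://example.com/c", "error": "Video is private"},
]
summary = summarize_run(records)
```

The `failed_urls` list can be fed back into a second run, for example with a different proxy country for geo-locked content.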
Pricing — How much does it cost to extract video subtitles?
Pay-per-event, limited to three line items so cost is predictable:
| Event | Price | When charged |
|---|---|---|
| Actor start | $0.001 / GB RAM | Once per run start. |
| Transcript extracted | $0.005 | Per video where a transcript was returned (any source). |
| Whisper minute | $0.02 | Per minute when platform Whisper is used. Not charged with BYOK key. |
Typical run costs:
- 1 YouTube video with captions: ~$0.006
- 1 TikTok video without captions, 30s, platform Whisper: ~$0.026
- 10 mixed YouTube/Vimeo videos, all with captions: ~$0.05
- 100 podcast episodes, 30 min each, BYOK Whisper: $0.50 (you also pay OpenAI ~$18 directly)
Bring your own OpenAI key to drop platform Whisper charges to zero — you'll pay OpenAI directly at ~$0.006/min.
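The typical run costs above follow from the three events in the pricing table. The estimator below reproduces them under two assumptions that this page does not confirm: Whisper time is billed in whole minutes, rounded up per video, and the Actor start event is charged for 1 GB unless you pass a different `memory_gb`.

```python
import math

ACTOR_START_PER_GB = 0.001   # charged once per run
TRANSCRIPT_EVENT = 0.005     # charged per video with a transcript
WHISPER_PER_MINUTE = 0.02    # platform Whisper only, not charged with BYOK


def estimate_run_cost(
    video_count: int,
    whisper_seconds_per_video: float = 0.0,
    byok: bool = False,
    memory_gb: int = 1,
) -> float:
    """Estimate the platform charge in USD for one run."""
    start = ACTOR_START_PER_GB * memory_gb
    transcripts = TRANSCRIPT_EVENT * video_count
    whisper = 0.0
    if whisper_seconds_per_video > 0 and not byok:
        # Assumption: each video is billed in whole minutes, rounded up.
        billed_minutes = math.ceil(whisper_seconds_per_video / 60)
        whisper = WHISPER_PER_MINUTE * billed_minutes * video_count
    return round(start + transcripts + whisper, 4)
```

Under these assumptions, `estimate_run_cost(1)` gives 0.006 and `estimate_run_cost(1, whisper_seconds_per_video=30)` gives 0.026, matching the first two examples. The 100-episode BYOK case gives 0.501, and the separate OpenAI bill is 100 × 30 min × $0.006 ≈ $18.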
Tips & advanced options
- Cap cost on long videos — keep maxDurationMinutes at the default 120, or lower it to 30 for tight budgets when running mixed-length batches of URLs.
- Reduce noise — set includeAutoCaptions: false to require human-reviewed captions only (lower coverage, higher quality).
- Translation — translateTo: "es" works on any YouTube video with captions, native or auto. Whisper itself does not translate, so for non-YouTube sources transcribe with Whisper (OpenAI whisper-1) and run a separate translation step on the result.
- Memory — default is 2 GB. Bump to 4 GB if you're feeding very long videos through Whisper in parallel.
- Datacenter proxy — switch from residential to datacenter on cheap, uncontested sites (TED, Vimeo public) to drop proxy costs.
- MCP usage — call this Actor as apify--video-subtitle-extractor in Apify MCP. Works out of the box with Claude Desktop, Cursor, and any Streamable HTTP MCP client.
FAQ, disclaimers, and support
Which platforms are supported? Anything yt-dlp supports — see the yt-dlp supported sites list. 1000+ sites including all major social and video platforms.
What's the difference between this and the YouTube-only transcript actors? Multi-platform support, smart caption-first pricing, MCP-ready descriptions, 5 output formats, and Whisper fallback all in one place. Single-platform actors (khadinakbar/youtube-transcript-extractor for YouTube only) remain cheaper if you exclusively need one platform.
What happens if a video is private or geo-locked? The Actor returns an error field with a message explaining the issue. Try a different proxyConfiguration.countryCode for geo-locked content.
Can I scrape playlists or channels? Not directly in v0.1 — pass individual video URLs. Pair with khadinakbar/youtube-shorts-scraper or similar for URL discovery, then feed URLs here.
Will Whisper work on every site? Yes, whenever yt-dlp can reach the audio. The one limit is size: Whisper caps uploads at 25 MB, so very long or high-bitrate sources may be skipped. Reduce the duration or accept lossier audio formats.
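To judge whether a source is likely to hit the 25 MB cap, audio size can be approximated as bitrate times duration. The sketch below assumes constant-bitrate audio and treats 25 MB as 25 × 1024 × 1024 bytes. The bitrates the Actor actually extracts are not documented here, so the figures are only indicative.

```python
WHISPER_LIMIT_BYTES = 25 * 1024 * 1024  # assumption: MB read as MiB


def audio_size_bytes(duration_seconds: float, bitrate_kbps: float) -> float:
    """Approximate size of constant-bitrate audio."""
    return duration_seconds * bitrate_kbps * 1000 / 8


def fits_whisper_limit(duration_seconds: float, bitrate_kbps: float) -> bool:
    """True if the estimated audio size is within the upload cap."""
    return audio_size_bytes(duration_seconds, bitrate_kbps) <= WHISPER_LIMIT_BYTES


def max_minutes_at(bitrate_kbps: float) -> float:
    """Longest duration, in minutes, that stays under the cap at this bitrate."""
    return WHISPER_LIMIT_BYTES * 8 / (bitrate_kbps * 1000) / 60
```

By this estimate a 30-minute episode at 128 kbps is about 28.8 MB and exceeds the cap, while the same episode at 64 kbps is about 14.4 MB and fits.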
Legal disclaimer: This Actor extracts publicly accessible subtitle data. It is your responsibility to comply with each platform's Terms of Service and your local copyright laws. Do not extract content you do not have rights to use. The Actor does not bypass paywalls, age gates, or login walls.
Found a bug or need a feature? Open an issue on the Actor's Issues tab — bugs are usually fixed within a few days. Custom builds (private platforms, bulk channel ingestion, real-time webhooks) available on request.
Related Actors in this portfolio
- youtube-transcript-extractor — YouTube-only transcripts at $0.005/transcript, no Whisper.
- youtube-shorts-scraper — Discover YouTube Shorts URLs to feed into this extractor.
- tiktok-video-comments-scraper — Pair with this to get full TikTok engagement data + transcript.
- instagram-reels-scraper — Discover Reel URLs at scale, then transcribe with this Actor.
- meta-ad-library-scraper — Pull competitor video ads, then run them through here for messaging analysis.