Video Subtitle & Caption Extractor

Pricing

Pay per event + usage

Extract subtitles, captions, and AI transcripts from any video URL across 1000+ platforms (YouTube, Vimeo, TikTok, Instagram, X/Twitter, Facebook, Twitch, TED, Bilibili). Native captions first, Whisper AI fallback when none. JSON, SRT, VTT, text, or LLM-ready markdown.

Developer

Khadin Akbar

Maintained by Community

Video Subtitle & Caption Extractor — 1000+ Sites + Whisper AI

Extract subtitles, captions, and AI transcripts from any video URL across 1000+ platforms — YouTube, Vimeo, TikTok, Instagram, X (Twitter), Facebook, Twitch, Dailymotion, TED, Bilibili, SoundCloud, Reddit and more. Native captions when available (cheap), Whisper AI fallback when not (accurate). Five output formats: timestamped JSON, SRT, VTT, plain text, and LLM-ready Markdown. No yt-dlp install, no FFmpeg setup, no API juggling — paste a URL and get a transcript.

What does Video Subtitle & Caption Extractor do?

This Actor turns any public video URL into a clean, timestamped transcript. It tries the cheap path first — native human-uploaded captions, then YouTube auto-generated captions — and only falls back to OpenAI Whisper for the small share of videos where no captions exist. You stay in full control of cost: cap video length, disable Whisper entirely, or bring your own OpenAI key to bypass the per-minute charge.
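
The caption-first order described above can be sketched as a small decision function. This is an illustration only, not the Actor's actual code: the function name and the two availability flags are hypothetical, while the option names mirror the includeAutoCaptions and useWhisperFallback input fields and the return values mirror the transcriptSource output field.

```python
from typing import Optional


def choose_transcript_source(
    has_manual: bool,
    has_auto: bool,
    include_auto_captions: bool = True,
    use_whisper_fallback: bool = True,
) -> Optional[str]:
    """Pick the cheapest available source, or None if nothing is allowed.

    Hypothetical helper: the two has_* flags stand in for whatever
    caption discovery reports for a given video.
    """
    if has_manual:
        return "manual"  # human-uploaded captions always win
    if has_auto and include_auto_captions:
        return "auto"  # auto-generated captions, if the caller accepts them
    if use_whisper_fallback:
        return "whisper"  # billed per minute unless a BYOK key is supplied
    return None  # no captions and Whisper disabled


print(choose_transcript_source(has_manual=False, has_auto=True))
```

With both options left at their defaults, Whisper is reached only when neither caption type exists, which is what keeps the per-minute charge rare.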

Under the hood it uses yt-dlp, a widely used open-source video downloader, so it handles 1000+ sites including YouTube, Vimeo, Twitch (including VODs), TikTok, Instagram, Facebook, X, Dailymotion, TED, Bilibili, BBC iPlayer, SoundCloud, Reddit, and dozens of regional platforms. The Apify platform handles proxy rotation, retries, scheduling, and storage; you just call the API or click Run.

Try it: paste a YouTube URL into the Input tab and hit Start. You'll get back a JSON record with title, channel, duration, full timestamped transcript, plus an SRT file in the key-value store.

Why use Video Subtitle & Caption Extractor?

  • One tool, every video site: stop maintaining five separate scrapers for YouTube, TikTok, Instagram, Vimeo, and friends.
  • Smart pricing: pay $0.005 per transcript; the additional $0.02/min Whisper rate applies only when a video has no captions.
  • 5 output formats: JSON for AI pipelines, SRT/VTT for video editing, text for search/embeddings, Markdown for LLM context windows.
  • MCP-ready: designed to be called by Claude, GPT, and other agents. The tool description and output shape are built for LLM consumption.
  • Translation built in: set translateTo: "es" and YouTube's auto-translation kicks in for any YouTube video with captions.
  • Apify residential proxies: most geo-locked or rate-limited videos work out of the box.
  • No infrastructure: no FFmpeg install, no yt-dlp updates to chase, no OpenAI account required (BYOK is optional).

Common use cases:

  • AI training data — build transcript corpora at scale across mixed video sources.
  • Content repurposing — turn podcasts, talks, and interviews into blog posts, X threads, and LinkedIn articles.
  • Accessibility — generate WCAG-compliant SRT/VTT files for your video library.
  • Competitor / market research — extract messaging from competitor ads, demos, and webinars.
  • SEO — pull full transcripts of YouTube videos to repurpose into search-indexed long-form pages.
  • Localization — translate captions to ship videos in multiple languages without re-recording.

How to use Video Subtitle & Caption Extractor

  1. Open the Actor and click Try for free.
  2. Paste one or more video URLs into the Video URLs field. Any public URL works — YouTube, Vimeo, TikTok, Instagram, X, etc.
  3. (Optional) Pick your output format — JSON for AI/programmatic use, SRT for video editors, Markdown for LLM context.
  4. (Optional) Add an OpenAI API key to bypass the platform Whisper charge. You then pay only the flat $0.005 per video here and pay OpenAI directly for transcription.
  5. Click Save & Start.
  6. When the run finishes, open the Output tab to view results, or grab the Dataset URL from the Output schema to pull JSON/CSV/Excel via API.

That's it. The Actor handles language selection, proxy rotation, retries, and Whisper fallback automatically.

Input

| Field | Type | Description |
| --- | --- | --- |
| videoUrls (required) | array of {url} | Public video URLs. Any platform supported by yt-dlp (1000+). |
| preferredLanguages | array of strings | ISO 639-1 codes in priority order. Default ["en"]. Use "auto" to accept any language. |
| outputFormat | enum | json (default), srt, vtt, text, markdown. |
| useWhisperFallback | boolean | Transcribe with Whisper when no captions exist. Default true. |
| openaiApiKey | string (secret) | BYOK: bypass platform Whisper, pay OpenAI directly. |
| translateTo | string | ISO 639-1 code to auto-translate captions (YouTube only). |
| includeAutoCaptions | boolean | Accept auto-generated captions. Default true. |
| maxDurationMinutes | integer | Skip videos longer than this. Default 120. |
| includeMetadata | boolean | Include title, channel, duration, etc. Default true. |
| proxyConfiguration | object | Defaults to Apify residential proxies. |

Example JSON input:

{
  "videoUrls": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
    { "url": "https://www.tiktok.com/@scout2015/video/6718335390845095173" },
    { "url": "https://vimeo.com/76979871" }
  ],
  "preferredLanguages": ["en", "es"],
  "outputFormat": "json",
  "useWhisperFallback": true,
  "maxDurationMinutes": 60
}
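
An input like this can also be submitted over the Apify REST API instead of the Input tab. The sketch below uses only the Python standard library and the API's run-sync-get-dataset-items endpoint. The Actor ID is a placeholder because this page does not state the exact ID, and the function names are illustrative. Synchronous runs are time-limited, so long Whisper jobs may need the asynchronous run endpoints instead.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs an Actor and returns its dataset items."""
    # The API addresses "username/actor-name" as "username~actor-name".
    path_id = urllib.parse.quote(actor_id.replace("/", "~"), safe="~")
    return urllib.request.Request(
        f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and parse the JSON array of dataset records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder ID and token: replace both before running.
    demo = build_run_request(
        "<username>/<actor-name>",
        "<APIFY_TOKEN>",
        {"videoUrls": [{"url": "https://vimeo.com/76979871"}], "outputFormat": "json"},
    )
    print(demo.get_method(), demo.full_url)
```

Splitting request construction from sending keeps the URL and payload easy to inspect before any credits are spent.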

Output

Each video produces one dataset record. Download as JSON, CSV, Excel, or HTML from the Output tab, or stream via the Apify API.

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "platform": "youtube",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channel": "Rick Astley",
  "channelUrl": "https://www.youtube.com/@RickAstleyYT",
  "durationSeconds": 213,
  "viewCount": 1700000000,
  "likeCount": 18000000,
  "uploadDate": "20091025",
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
  "transcript": [
    { "start": 18.96, "end": 22.96, "text": "We're no strangers to love" },
    { "start": 22.96, "end": 26.96, "text": "You know the rules and so do I" }
  ],
  "transcriptFormat": "json",
  "transcriptSource": "manual",
  "language": "en",
  "isAutoGenerated": false,
  "segmentCount": 87,
  "characterCount": 2104,
  "whisperMinutesCharged": 0,
  "srtKey": "youtube-dQw4w9WgXcQ.srt",
  "vttKey": "youtube-dQw4w9WgXcQ.vtt"
}

Raw .srt and .vtt files for each video are also saved to the default key-value store; fetch them at KEY_VALUE_STORE_URL/records/{srtKey}.
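
If you already hold a JSON-format record and want subtitles without another request, the {start, end, text} segments map directly onto SRT cues. The following is a minimal sketch under that assumption. The function names are illustrative, and the output is not guaranteed to match the Actor's own .srt file byte for byte.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render transcript segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(cues) + ("\n" if cues else "")


if __name__ == "__main__":
    sample = [
        {"start": 18.96, "end": 22.96, "text": "We're no strangers to love"},
        {"start": 22.96, "end": 26.96, "text": "You know the rules and so do I"},
    ]
    print(segments_to_srt(sample))
```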

Data table

| Field | Description |
| --- | --- |
| videoUrl | Original URL processed. |
| platform | Inferred platform (youtube, vimeo, tiktok, instagram, etc.). |
| title, channel, channelUrl | Video metadata. |
| durationSeconds | Length in seconds. |
| viewCount, likeCount, uploadDate, thumbnail, description | Engagement + presentation metadata. |
| transcript | Array of {start, end, text} (json) or formatted string (srt/vtt/text/markdown). |
| transcriptFormat | Echoes the input outputFormat. |
| transcriptSource | manual (human captions), auto (auto-captions), translated, or whisper. |
| language | ISO 639-1 code of the returned transcript. |
| isAutoGenerated | True if captions were auto-generated or Whisper-transcribed. |
| segmentCount, characterCount | Quick sizing for downstream pipelines. |
| whisperMinutesCharged | Minutes that contributed to the bill. 0 when captions were used or a BYOK key was supplied. |
| srtKey, vttKey | Keys in the key-value store for raw subtitle files. |
| error | Populated when extraction failed (with a hint to fix). |
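
Because transcript is an array for json output and a string for every other format, and failed videos carry an error field, downstream code usually wants to normalize records first. A minimal sketch of that step follows. The helper names are illustrative and the field names are taken from the table above.

```python
def split_records(records: list) -> tuple:
    """Separate successful records from those with a populated error field."""
    ok, failed = [], []
    for record in records:
        (failed if record.get("error") else ok).append(record)
    return ok, failed


def transcript_as_text(record: dict) -> str:
    """Return the transcript as one string regardless of output format."""
    transcript = record.get("transcript")
    if isinstance(transcript, list):
        # json format: join the text of each {start, end, text} segment
        return " ".join(seg["text"].strip() for seg in transcript)
    # srt / vtt / text / markdown: already a formatted string (or missing)
    return transcript or ""


if __name__ == "__main__":
    records = [
        {"videoUrl": "https://example.com/a", "transcript": [
            {"start": 0.0, "end": 1.5, "text": "Hello"},
            {"start": 1.5, "end": 3.0, "text": "world"},
        ]},
        {"videoUrl": "https://example.com/b", "error": "Video is private"},
    ]
    ok, failed = split_records(records)
    print(len(ok), len(failed), transcript_as_text(ok[0]))
```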

Pricing — How much does it cost to extract video subtitles?

Pay-per-event, capped at three line items so cost is predictable:

| Event | Price | When charged |
| --- | --- | --- |
| Actor start | $0.001 / GB RAM | Once per run start. |
| Transcript extracted | $0.005 | Per video where a transcript was returned (any source). |
| Whisper minute | $0.02 | Per minute when platform Whisper is used. Not charged with BYOK key. |

Typical run costs:

  • 1 YouTube video with captions: ~$0.006
  • 1 TikTok video without captions, 30s, platform Whisper: ~$0.026
  • 10 mixed YouTube/Vimeo videos, all with captions: ~$0.05
  • 100 podcast episodes, 30 min each, BYOK Whisper: $0.50 (you also pay OpenAI ~$18 directly)

Bring your own OpenAI key to drop platform Whisper charges to zero — you'll pay OpenAI directly at ~$0.006/min.
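
The three line items can be turned into a rough estimator. The sketch below rests on two assumptions that this page does not state outright: Whisper minutes are rounded up to whole minutes (which is what the ~$0.026 TikTok example implies), and the start fee is computed for 1 GB of memory (which matches the listed examples; at the 2 GB default it would be $0.002). The function and constant names are illustrative.

```python
import math

START_FEE_PER_GB = 0.001     # Actor start, per GB of RAM
TRANSCRIPT_FEE = 0.005       # per video that returns a transcript
WHISPER_FEE_PER_MIN = 0.02   # platform Whisper only; not charged with BYOK


def estimate_run_cost(videos: list, memory_gb: float = 1.0) -> float:
    """Estimate platform charges in USD for one run.

    videos: list of (duration_seconds, needs_whisper, byok) tuples.
    Assumes every video returns a transcript and that Whisper minutes
    are rounded up; OpenAI's own BYOK charges are not included.
    """
    total = START_FEE_PER_GB * memory_gb
    for duration_seconds, needs_whisper, byok in videos:
        total += TRANSCRIPT_FEE
        if needs_whisper and not byok:
            total += WHISPER_FEE_PER_MIN * math.ceil(duration_seconds / 60)
    return round(total, 4)


if __name__ == "__main__":
    print(estimate_run_cost([(213, False, False)]))       # one captioned video
    print(estimate_run_cost([(30, True, False)]))         # 30 s clip, platform Whisper
    print(estimate_run_cost([(1800, True, True)] * 100))  # 100 episodes, BYOK
```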

Tips & advanced options

  • Cap cost on long videos: keep maxDurationMinutes at the default 120, or lower it to 30 for tight budgets when running mixed-length batches.
  • Reduce noise: set includeAutoCaptions: false to accept human-uploaded captions only (lower coverage, higher quality).
  • Translation: translateTo: "es" works on any YouTube video with captions, native or auto. The Whisper fallback transcribes without translating, so for non-YouTube sources run a separate translation step on the whisper-1 transcript.
  • Memory: default is 2 GB. Bump to 4 GB if you're feeding very long videos through Whisper in parallel.
  • Datacenter proxy: switch from residential to datacenter on cheap, uncontested sites (TED, Vimeo public) to drop proxy costs.
  • MCP usage: call this Actor as apify--video-subtitle-extractor in Apify MCP. Works out of the box with Claude Desktop, Cursor, and any Streamable HTTP MCP client.
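
The cost-related tips above can be combined into one conservative input. This is a sketch with an illustrative helper name, using only fields from the Input table. The chosen defaults are one possible budget profile, not a recommendation from the Actor's author.

```python
def budget_input(
    urls: list,
    max_minutes: int = 30,
    human_captions_only: bool = False,
    allow_whisper: bool = True,
) -> dict:
    """Build an Actor input that caps video length and optional paid paths."""
    if max_minutes < 1:
        raise ValueError("max_minutes must be at least 1")
    return {
        "videoUrls": [{"url": url} for url in urls],
        "maxDurationMinutes": max_minutes,            # skip anything longer
        "includeAutoCaptions": not human_captions_only,
        "useWhisperFallback": allow_whisper,          # False removes per-minute charges
    }


if __name__ == "__main__":
    print(budget_input(["https://vimeo.com/76979871"], allow_whisper=False))
```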

FAQ, disclaimers, and support

Which platforms are supported? Anything yt-dlp supports — see the yt-dlp supported sites list. 1000+ sites including all major social and video platforms.

What's the difference between this and the YouTube-only transcript actors? Multi-platform support, smart caption-first pricing, MCP-ready descriptions, 5 output formats, and Whisper fallback all in one place. Single-platform actors (such as khadinakbar/youtube-transcript-extractor for YouTube) remain cheaper if you only need one platform.

What happens if a video is private or geo-locked? The Actor returns an error field with a message explaining the issue. Try a different proxyConfiguration.countryCode for geo-locked content.

Can I scrape playlists or channels? Not directly in v0.1 — pass individual video URLs. Pair with khadinakbar/youtube-shorts-scraper or similar for URL discovery, then feed URLs here.

Will Whisper work on every site? Generally yes: if yt-dlp can reach the audio, Whisper can transcribe it. The exception is Whisper's 25 MB file size cap, which means very long or high-bitrate sources may be skipped. Reduce duration or accept lossier audio formats.

Legal disclaimer: This Actor extracts publicly accessible subtitle data. It is your responsibility to comply with each platform's Terms of Service and your local copyright laws. Do not extract content you do not have rights to use. The Actor does not bypass paywalls, age gates, or login walls.

Found a bug or need a feature? Open an issue on the Actor's Issues tab — bugs are usually fixed within a few days. Custom builds (private platforms, bulk channel ingestion, real-time webhooks) available on request.