Pricing

from $5.00 / 1,000 transcripts extracted

Video Subtitle & Caption Extractor

Extract subtitles, captions, and AI transcripts from any video URL across 1000+ platforms (YouTube, Vimeo, TikTok, Instagram, X/Twitter, Facebook, Twitch, TED, Bilibili). Native captions are tried first, with Whisper AI fallback when none are available. Output as JSON, SRT, VTT, text, or LLM-ready markdown. MCP/API-ready.

Developer

Khadin Akbar

Video Subtitle & Caption Extractor — 1000+ Sites + Whisper AI

Extract subtitles, captions, and AI transcripts from public video URLs across 1000+ platforms, including YouTube, Vimeo, TikTok, Instagram, X, Facebook, Twitch, TED, Bilibili, SoundCloud, and Reddit. Each accepted URL produces one dataset record with the original URL, inferred platform, transcript content, and useful metadata such as title, channel, duration, language, transcript source, segment count, character count, and subtitle file keys when available. This Actor is usable through Apify and Apify MCP, and it is designed as a focused standalone workflow for one-record-per-video processing.

Best fit and connected workflows

This Actor fits workflows that start with a known public video URL and need structured transcript output. It routes naturally into:

transcript indexing for search, embeddings, and knowledge bases
content repurposing from talks, interviews, webinars, and podcasts
accessibility file generation with SRT or VTT
multilingual caption review for YouTube videos with captions
AI agents that need one transcript record per input URL through Apify MCP
metadata plus transcript extraction in one dataset row

Practical scenario

Maya, a content analyst, has three public video URLs from YouTube and Vimeo. She wants the title, channel, duration, transcript source, language, and transcript text for each one. She runs this Actor with outputFormat: "json" and includeMetadata: true. In the dataset, she reviews the transcript segments, sees whether each record came from manual captions, auto captions, translated captions, or Whisper fallback, and then decides which videos to turn into an internal summary and which ones to archive with subtitle files.

Input

Field	Type	Description
`videoUrls` (required)	array of objects	Public video URLs to extract subtitles from. Each item is one video URL. For playlists or channels, expand to individual video URLs first.
`preferredLanguages`	array of strings	ISO 639-1 codes in priority order. The first available match is returned. Use `auto` to accept any language.
`outputFormat`	string	Transcript format in each result: `json`, `srt`, `vtt`, `text`, or `markdown`.
`useWhisperFallback`	boolean	When native captions are unavailable, transcribe audio with Whisper AI.
`whisperModel`	string	Whisper model used for fallback transcription.
`openaiApiKey`	string (secret)	Optional BYOK OpenAI API key for Whisper fallback billing through your OpenAI account.
`translateTo`	string	Optional ISO 639-1 code for YouTube caption translation.
`includeAutoCaptions`	boolean	Accept YouTube auto-generated captions.
`maxDurationMinutes`	integer	Maximum video length, in minutes, sent to Whisper. Native captions from longer videos are still extracted.
`includeMetadata`	boolean	Include title, channel, duration, view count, upload date, and thumbnail in each result.
`proxyConfiguration`	object	Optional proxy settings.

Focused JSON input example

{
  "videoUrls": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  ],
  "preferredLanguages": [
    "en",
    "es"
  ],
  "outputFormat": "json",
  "useWhisperFallback": true,
  "whisperModel": "whisper-1",
  "openaiApiKey": "",
  "translateTo": "",
  "includeAutoCaptions": true,
  "maxDurationMinutes": 120,
  "includeMetadata": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
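When the URLs come from a plain list of strings, a small helper can wrap them into the object form that `videoUrls` expects. The following is a minimal sketch, not part of the Actor or of apify-client: `buildInput` and `ALLOWED_FORMATS` are hypothetical names, and the validation only mirrors the input table above.

```javascript
// Hypothetical helper for illustration; not part of the Actor or apify-client.
// The allowed formats are copied from the input table above.
const ALLOWED_FORMATS = ["json", "srt", "vtt", "text", "markdown"];

function buildInput(urls, options = {}) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error("videoUrls is required and must contain at least one URL");
  }
  if (options.outputFormat !== undefined && !ALLOWED_FORMATS.includes(options.outputFormat)) {
    throw new Error(`Unsupported outputFormat: ${options.outputFormat}`);
  }
  const videoUrls = urls.map((url) => {
    new URL(url); // throws a TypeError if the string is not a valid absolute URL
    return { url }; // each item is one video URL, wrapped in an object
  });
  // Optional fields are passed through untouched; no Actor defaults are assumed here.
  return { ...options, videoUrls };
}

const input = buildInput(
  ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  { preferredLanguages: ["en", "es"], outputFormat: "json" }
);
console.log(JSON.stringify(input, null, 2));
```

Validating locally catches malformed URLs and unsupported formats before a run is started.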

Output

Each successfully extracted video produces one validated dataset record. Raw SRT and VTT files are also saved to the default key-value store. The output schema also exposes terminal run records (outcomes such as COMPLETE, PARTIAL, CONFIG_ERROR, or UPSTREAM_FAILED) for run summaries and downstream automation.

Field	Type	Description
`videoUrl`	string	Original URL processed.
`platform`	string	Platform inferred from the URL.
`title`	string or null	Video title from platform metadata.
`channel`	string or null	Uploader or channel name.
`channelUrl`	string or null	URL of the uploader or channel page.
`durationSeconds`	number or null	Video duration in seconds.
`viewCount`	number or null	View count at extraction time.
`likeCount`	number or null	Like count at extraction time.
`uploadDate`	string or null	Upload date in `YYYYMMDD` format.
`thumbnail`	string or null	Thumbnail image URL.
`description`	string or null	Video description, truncated to 1000 characters.
`transcript`	array or string	Transcript in the requested format. JSON returns `{start, end, text}` segments.
`transcriptFormat`	string or null	Format used for the transcript field.
`transcriptSource`	string or null	Source of the transcript: `manual`, `auto`, `translated`, or `whisper`.
`language`	string or null	ISO 639-1 code of the returned transcript.
`isAutoGenerated`	boolean or null	Indicates whether captions were auto-generated or Whisper-transcribed.
`segmentCount`	number or null	Number of timestamped segments in the transcript.
`characterCount`	number or null	Total transcript character count.
`whisperMinutesCharged`	number or null	Whisper minutes billed for the record.
`srtKey`	string or null	Key for the saved SRT file in the default key-value store.
`vttKey`	string or null	Key for the saved VTT file in the default key-value store.
`error`	string or null	Issue detail recorded for a specific URL.

Illustrative output record

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "platform": "youtube",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channel": "Rick Astley",
  "channelUrl": "https://www.youtube.com/@RickAstleyYT",
  "durationSeconds": 213,
  "viewCount": 1700000000,
  "likeCount": 18000000,
  "uploadDate": "20091025",
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
  "transcript": [
    {
      "start": 18.96,
      "end": 22.96,
      "text": "We're no strangers to love"
    },
    {
      "start": 22.96,
      "end": 26.96,
      "text": "You know the rules and so do I"
    }
  ],
  "transcriptFormat": "json",
  "transcriptSource": "manual",
  "language": "en",
  "isAutoGenerated": false,
  "segmentCount": 87,
  "characterCount": 2104,
  "whisperMinutesCharged": 0,
  "srtKey": "youtube-dQw4w9WgXcQ.srt",
  "vttKey": "youtube-dQw4w9WgXcQ.vtt"
}
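With `outputFormat: "json"`, the `transcript` field is an array of `{start, end, text}` segments. The sketch below shows one way to flatten those segments into plain text or into SRT cues on the client side. `formatSrtTime`, `segmentsToText`, and `segmentsToSrt` are hypothetical helper names, and the sketch assumes `start` and `end` are seconds, as in the illustrative record.

```javascript
// Hypothetical helpers for illustration; they assume start/end are in seconds.
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(totalSec / 3600);
  const m = Math.floor(totalSec / 60) % 60;
  const s = totalSec % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`; // SRT uses a comma before milliseconds
}

// Join segment texts into a single string, e.g. for search indexing or embeddings.
function segmentsToText(segments) {
  return segments.map((seg) => seg.text.trim()).join(" ");
}

// Build numbered SRT cues separated by blank lines.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) => `${i + 1}\n${formatSrtTime(seg.start)} --> ${formatSrtTime(seg.end)}\n${seg.text}\n`)
    .join("\n");
}

const segments = [
  { start: 18.96, end: 22.96, text: "We're no strangers to love" },
  { start: 22.96, end: 26.96, text: "You know the rules and so do I" },
];
console.log(segmentsToText(segments));
console.log(segmentsToSrt(segments));
```

The Actor already saves raw SRT and VTT files to the key-value store, so a conversion like this is mainly useful when only the dataset records are at hand.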

How it works

The Actor processes each supplied video URL using yt-dlp-supported site coverage and tries native captions first. When multiple preferred languages are listed, it uses the first available match from the ordered list. When captions are available, it returns a validated transcript record and saves raw SRT and VTT files to the default key-value store, even when the video is longer than maxDurationMinutes. That cap applies only when no native captions are usable and Whisper fallback would incur per-minute transcription cost. The run also persists terminal outcome records such as COMPLETE, PARTIAL, CONFIG_ERROR, or UPSTREAM_FAILED, and stores per-URL diagnostics in FAILURES rather than in the result dataset.
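The selection rules described here can be sketched as two small functions. This is an illustration of the documented behavior, not the Actor's source code: `pickCaptionLanguage` and `shouldUseWhisper` are hypothetical names, and treating `auto` as "take the first available language" is an assumption.

```javascript
// Illustrative sketch only; the Actor's real implementation is not shown in this document.

// Return the first preferred language that is available, or null if none match.
// Assumption: "auto" accepts whichever language is listed first as available.
function pickCaptionLanguage(preferred, available) {
  for (const code of preferred) {
    if (code === "auto") return available.length > 0 ? available[0] : null;
    if (available.includes(code)) return code;
  }
  return null;
}

// Whisper is only considered when there are no usable native captions,
// fallback is enabled, and the video fits under the duration cap.
function shouldUseWhisper({ hasNativeCaptions, useWhisperFallback, durationSeconds, maxDurationMinutes }) {
  if (hasNativeCaptions) return false; // native captions are never blocked by the cap
  if (!useWhisperFallback) return false;
  return durationSeconds <= maxDurationMinutes * 60;
}

console.log(pickCaptionLanguage(["en", "es"], ["es", "fr"]));
console.log(shouldUseWhisper({
  hasNativeCaptions: false,
  useWhisperFallback: true,
  durationSeconds: 213,
  maxDurationMinutes: 120,
}));
```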

Pricing

This Actor uses Apify Pay per event plus Apify platform usage. Open the live Pricing tab on the Actor page for the current charges and platform usage details.

Example in words: if one execution extracts transcripts for ten videos and each video returns one transcript, the run charges ten Transcript extracted events, plus one Actor Start event, plus any Whisper minutes used by videos that needed fallback transcription.

If you supply your own OpenAI API key, Whisper billing follows the BYOK path described in the live configuration, while the Actor still charges per returned transcript event.
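To make the worked example concrete, the sketch below counts billable events from a run's dataset records. It deliberately counts events rather than money, since current prices live on the Pricing tab. `summarizeBillableEvents` is a hypothetical helper, and it assumes that a record with a transcript and no `error` corresponds to one Transcript extracted event.

```javascript
// Hypothetical helper: counts events only; look up current prices on the live Pricing tab.
// Assumption: one record with a transcript and no error equals one "Transcript extracted" event.
function summarizeBillableEvents(records) {
  const extracted = records.filter((r) => r.transcript != null && !r.error);
  const whisperMinutes = extracted.reduce(
    (sum, r) => sum + (r.whisperMinutesCharged ?? 0),
    0
  );
  return {
    actorStarts: 1, // one Actor Start event per run
    transcriptsExtracted: extracted.length,
    whisperMinutes,
  };
}

// Ten videos, two of which needed Whisper fallback (4 and 6 minutes).
const records = Array.from({ length: 10 }, (_, i) => ({
  videoUrl: `https://example.com/video/${i}`,
  transcript: "sample transcript text",
  whisperMinutesCharged: i === 0 ? 4 : i === 1 ? 6 : 0,
}));
console.log(summarizeBillableEvents(records));
```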

Use with AI agents (MCP)

This Actor is available as an Apify Actor usable through Apify MCP. The exact Actor identity is khadinakbar/video-subtitle-extractor.

Tool description: extract subtitles, captions, and AI transcripts from a supplied public video URL, then return one structured dataset record per video with transcript text, metadata, transcript source, and subtitle file references when available.

Example prompt: Extract transcripts from these public video URLs and return one dataset row per URL. Prefer native captions first, then Whisper fallback when needed. Include transcript source, language, and subtitle file keys in the output.

Output interpretation:

transcript is the primary content field.
transcriptSource shows whether the record came from manual captions, auto captions, translated captions, or Whisper.
segmentCount and characterCount help estimate transcript size.
srtKey and vttKey point to subtitle files in the default key-value store.
error is available when a URL produces a record with an issue detail.
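An agent or script can use these fields to triage records before reuse. The following sketch groups records by `transcriptSource` and sets aside anything with an `error` or an unrecognized source. `groupByTranscriptSource` and the `needsReview` bucket are hypothetical names introduced for illustration.

```javascript
// Hypothetical triage helper; the source names come from the output table above.
const SOURCES = ["manual", "auto", "translated", "whisper"];

function groupByTranscriptSource(records) {
  const groups = { manual: [], auto: [], translated: [], whisper: [], needsReview: [] };
  for (const record of records) {
    if (record.error || !SOURCES.includes(record.transcriptSource)) {
      groups.needsReview.push(record); // issue detail present, or source missing/unknown
    } else {
      groups[record.transcriptSource].push(record);
    }
  }
  return groups;
}

const groups = groupByTranscriptSource([
  { videoUrl: "https://example.com/a", transcriptSource: "manual", characterCount: 2104 },
  { videoUrl: "https://example.com/b", transcriptSource: "whisper", characterCount: 900 },
  { videoUrl: "https://example.com/c", transcriptSource: null, error: "No captions found" },
]);
console.log(Object.fromEntries(Object.entries(groups).map(([k, v]) => [k, v.length])));
```

Auto-generated and Whisper transcripts usually deserve a closer read than manual captions, so keeping them in separate buckets makes that review step explicit.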

Provenance and scope:

The dataset contains one record per processed video URL.
The transcript record keeps the original videoUrl and inferred platform.
Raw subtitle files are stored separately in the key-value store.
The output schema also exposes run summary records for terminal outcome handling.

Pagination and cost guidance:

For multiple videos, pass each video as its own URL.
The primary event is Transcript extracted.
Whisper-minute billing applies only when platform-funded Whisper transcription is used.
Check the live Pricing tab for the current Pay per event and platform usage details before large runs.

JavaScript API example

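// Requires the apify-client package and an APIFY_TOKEN environment variable.
const { ApifyClient } = require("apify-client");

const client = new ApifyClient({
  token: process.env.APIFY_TOKEN,
});

(async () => {
  // Start the Actor and wait for the run to finish.
  const run = await client.actor("khadinakbar/video-subtitle-extractor").call({
    videoUrls: [
      { url: "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
    ],
    preferredLanguages: ["en"],
    outputFormat: "json",
    useWhisperFallback: true,
    includeMetadata: true
  });

  // Read the transcript records from the run's default dataset.
  const dataset = client.dataset(run.defaultDatasetId);
  const { items } = await dataset.listItems();

  console.log("First record:", items[0]);
})().catch((err) => {
  // Report failures instead of leaving an unhandled promise rejection.
  console.error(err);
  process.exitCode = 1;
});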
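The `srtKey` and `vttKey` fields in each record point to files in the run's default key-value store. A minimal sketch of reading one of those files with apify-client follows. `fetchSubtitleFile` is a hypothetical helper; it takes the client as a parameter so it can be exercised with a stub, and it assumes the stored value arrives as either a string or a Buffer depending on the file's content type.

```javascript
// Hypothetical helper; pass in an ApifyClient instance, the finished run, and one dataset record.
async function fetchSubtitleFile(client, run, record, kind = "srt") {
  const key = kind === "vtt" ? record.vttKey : record.srtKey;
  if (!key) return null; // keys are null when no subtitle file was saved

  const store = client.keyValueStore(run.defaultKeyValueStoreId);
  const entry = await store.getRecord(key); // resolves to undefined if the key is missing
  if (!entry) return null;

  // Assumption: the value is a string for text content types and a Buffer otherwise.
  return Buffer.isBuffer(entry.value) ? entry.value.toString("utf8") : entry.value;
}

// Usage with a stub client, so the sketch runs without network access or a token.
const stubClient = {
  keyValueStore: () => ({
    getRecord: async (key) =>
      key === "youtube-dQw4w9WgXcQ.srt"
        ? { key, value: "1\n00:00:18,960 --> 00:00:22,960\nWe're no strangers to love\n" }
        : undefined,
  }),
};

fetchSubtitleFile(
  stubClient,
  { defaultKeyValueStoreId: "stub-store" },
  { srtKey: "youtube-dQw4w9WgXcQ.srt", vttKey: null }
).then((srt) => console.log(srt));
```

In a real run, pass the `client` and `run` objects from the example above in place of the stub.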

Best results and outcome guidance

Use one public URL per item. Put preferred caption languages in priority order so the Actor can return the first matching transcript. Keep outputFormat: "json" when you need segment-level timestamps for AI processing, and switch to srt or vtt when you need subtitle files for editing tools. Leave includeMetadata on when you want title and channel context alongside the transcript. Use useWhisperFallback: true when you want coverage for videos without native captions, and set maxDurationMinutes to control Whisper cost for longer videos.

Design note

The live output contract centers on a single transcript record per video URL, with videoUrl and platform required in every dataset item. This keeps the dataset shape predictable for batch processing and agent workflows.

FAQ

Can I pass a playlist or channel URL?

This Actor is designed around individual public video URLs. For playlist or channel workflows, expand the collection into per-video URLs first, then send those URLs here.

Which transcript sources can appear in the dataset?

The dataset contract includes manual, auto, translated, and whisper as transcript sources.

What file formats are saved outside the dataset?

Raw SRT and VTT files are saved to the default key-value store, and the output schema exposes links for subtitle files and run summary records.

Can I use it with Apify MCP?

Yes. This Actor is built as an Apify Actor usable through Apify MCP, and the exact Actor identity is khadinakbar/video-subtitle-extractor.

How does it handle multiple caption languages?

Use preferredLanguages as a priority list. The Actor falls through the ordered list and returns the first available match.

Responsible use

Use this Actor only with public video URLs and in ways that respect the source platform's terms, applicable rights, and local law. Review the transcript source and metadata before reusing content, especially when captions are auto-generated or when Whisper fallback is used.
