Twitch VOD Transcript API – AI Video to Text avatar

Twitch VOD Transcript API – AI Video to Text

Pricing

from $3.80 / 1,000 transcription minutes

Go to Apify Store
Twitch VOD Transcript API – AI Video to Text

Twitch VOD Transcript API – AI Video to Text

Get transcripts and subtitles (SRT) from Twitch VODs and clips with AI speech-to-text. Timestamped segments + game and streamer metadata. No API key needed.

Pricing

from $3.80 / 1,000 transcription minutes

Rating

0.0

(0)

Developer

Tony

Tony

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

2 days ago

Last modified

Share

Turn any Twitch VOD or clip into text: full AI transcripts, timestamped segments, and SRT subtitles via Whisper large-v3. Built for content repurposers, SEO agencies, AI/RAG pipelines, and researchers who need spoken-word text from Twitch streams.

Paste one or more VOD/clip URLs, get back timestamped transcripts with game and streamer metadata — no Twitch API key, no audio handling, no GPU.

Why a dedicated Twitch transcript actor?

Generic "1000+ platform" video transcribers treat Twitch as an afterthought — and charge $1.25–$4.50 per video. This actor is built only for Twitch, which buys you things generalists can't do:

  • Twitch-native metadata on every record: game/category, streamer, stream date, thumbnail
  • VOD-absolute timestamps — segments align to the VOD timeline, so they work directly in clip tools and ?t= deep links
  • Section windows (startMinute/endMinute) — transcribe the 13 minutes you need from an 8-hour stream and pay cents, not a flat per-video fee
  • Per-minute pricing — a 10-minute clip costs ~$0.05 and a 3-hour VOD ~$0.73, vs. $1.25–$7.50 flat on multi-platform actors

What it does / doesn't do

  • ✅ Public VODs and clips, batches of URLs in one run
  • ✅ Transcribe just a section (startMinute/endMinute) — pay only for the window
  • ✅ 90+ languages, auto-detected
  • ⛔ Sub-only VODs (requires auth we don't support)
  • ⛔ Live streams (VODs/clips only)
  • ⛔ Speaker labels — Whisper doesn't diarize; multi-speaker streams come back as one voice

Memory: run this actor with at least 2 GB (2048 MB) of memory — the default. Audio extraction uses ffmpeg, which is killed by the OS at lower memory and fails the run. The run form already defaults to 2 GB; don't lower it.

How to get a Twitch VOD transcript

{
"videoUrls": [
"https://www.twitch.tv/videos/2788523979",
"https://www.twitch.tv/tarik/clip/DeafObliviousGazelle..."
],
"startMinute": 0,
"maxDurationMinutes": 240,
"language": "auto"
}
FieldRequiredDefaultNotes
videoUrlsyesOne or more VOD (/videos/<id>) or clip URLs. One record each; failures are skipped without stopping the batch
startMinuteno0Begin transcription here. Timestamps stay absolute to the VOD
endMinutenoend of videoStop here. Combine with startMinute to transcribe only the section you need
maxDurationMinutesno240Safety cap per video. If a video exceeds it, you get a partial transcript marked truncated: true and a log warning. Raise to 600 for very long streams
languagenoautoISO 639-1 hint (en, es, tr, ...). Leave on auto unless certain
proxyConfigurationnodatacenterApify Proxy settings

Twitch video to text — the output

One dataset record per video:

{
"id": "twitch_vod_2788523979",
"source_url": "https://www.twitch.tv/videos/2788523979",
"platform": "twitch",
"type": "vod",
"title": "Casey Muratori Teaches Me Game Programming",
"streamer": "ThePrimeagen",
"game": "Software and Game Development",
"published_at": "2026-06-04T17:01:12+00:00",
"thumbnail": "https://static-cdn.jtvnw.net/cf_vods/...",
"duration_seconds": 8390,
"window_start_seconds": 1800,
"window_end_seconds": 1920,
"transcribed_seconds": 120.0,
"truncated": false,
"language": "en",
"transcript_text": "Full plain text transcript...",
"segments": [
{ "start": 1800.0, "end": 1804.2, "text": "And so then I had an AI summarize what the Lua code did." }
],
"scraped_at": "2026-06-07T12:00:00Z"
}

segments timestamps are absolute to the VOD timeline (not the window), so they're valid for clipping and deep links (?t=1h30m0s). language is ISO 639-1. id is stable per video for deduplication across runs.

Pricing & billing transparency

Pay-per-event:

EventPriceCovers
result$0.005Per successfully processed video — metadata + audio extraction
transcription_minute$0.004/min ($0.005 on the free plan)AI speech-to-text per transcribed minute (sub-minute rounds up)

Examples (paid plans): a 10-minute clip costs $0.045; a 3-hour VOD costs $0.725; a 13-minute window of that VOD costs $0.057.

If a video fails, you are charged nothing for it. Charges fire only after a transcript record is pushed to the dataset — you pay for minutes actually transcribed, never for failed extraction.

How long does a run take?

Transcription runs ~60x real-time. Rule of thumb end to end: a 10-minute clip ≈ 30s; a 3-hour VOD ≈ 3–5 minutes.

Twitch subtitles: convert segments to SRT captions

def to_srt(segments):
def ts(s):
h, rem = divmod(s, 3600); m, sec = divmod(rem, 60)
return f"{int(h):02}:{int(m):02}:{int(sec):02},{int(sec % 1 * 1000):03}"
return "\n".join(
f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}\n"
for i, seg in enumerate(segments, 1)
)

(Subtract window_start_seconds from each timestamp first if you want subtitles relative to your extracted section.)

FAQ

How do I download a transcript of a Twitch VOD? Paste the VOD URL (https://www.twitch.tv/videos/<id>) into videoUrls and run the actor. The transcript lands in the dataset as plain text plus timestamped segments, exportable as JSON or CSV — or convert to SRT subtitles with the recipe above.

Can I transcribe a Twitch clip to text? Yes — clip URLs work the same way and cost ~$0.05 for a typical clip.

Does it work on live streams? No — VODs and clips only.

Does it work on sub-only VODs? No, public VODs only.

How accurate is it? Whisper large-v3; strong on commentary, may degrade with loud in-game music. Timestamped segments make spot-checking easy.

What languages? 90+, auto-detected. Set the language hint only when you're sure.

What happens to videos that fail in a batch? They're skipped, logged, and not charged; the rest of the batch continues. The run only fails if every video fails.