Twitch VOD Transcript API – AI Video to Text
Pricing
from $3.80 / 1,000 transcription minutes
Twitch VOD Transcript API – AI Video to Text
Get transcripts and subtitles (SRT) from Twitch VODs and clips with AI speech-to-text. Timestamped segments + game and streamer metadata. No API key needed.
Pricing
from $3.80 / 1,000 transcription minutes
Rating
0.0
(0)
Developer
Tony
Maintained by CommunityActor stats
0
Bookmarked
2
Total users
1
Monthly active users
2 days ago
Last modified
Categories
Share
Turn any Twitch VOD or clip into text: full AI transcripts, timestamped segments, and SRT subtitles via Whisper large-v3. Built for content repurposers, SEO agencies, AI/RAG pipelines, and researchers who need spoken-word text from Twitch streams.
Paste one or more VOD/clip URLs, get back timestamped transcripts with game and streamer metadata — no Twitch API key, no audio handling, no GPU.
Why a dedicated Twitch transcript actor?
Generic "1000+ platform" video transcribers treat Twitch as an afterthought — and charge $1.25–$4.50 per video. This actor is built only for Twitch, which buys you things generalists can't do:
- Twitch-native metadata on every record: game/category, streamer, stream date, thumbnail
- VOD-absolute timestamps — segments align to the VOD timeline, so they work directly in clip tools and
?t=deep links - Section windows (
startMinute/endMinute) — transcribe the 13 minutes you need from an 8-hour stream and pay cents, not a flat per-video fee - Per-minute pricing — a 10-minute clip costs ~$0.05 and a 3-hour VOD ~$0.73, vs. $1.25–$7.50 flat on multi-platform actors
What it does / doesn't do
- ✅ Public VODs and clips, batches of URLs in one run
- ✅ Transcribe just a section (
startMinute/endMinute) — pay only for the window - ✅ 90+ languages, auto-detected
- ⛔ Sub-only VODs (requires auth we don't support)
- ⛔ Live streams (VODs/clips only)
- ⛔ Speaker labels — Whisper doesn't diarize; multi-speaker streams come back as one voice
Memory: run this actor with at least 2 GB (2048 MB) of memory — the default. Audio extraction uses ffmpeg, which is killed by the OS at lower memory and fails the run. The run form already defaults to 2 GB; don't lower it.
How to get a Twitch VOD transcript
{"videoUrls": ["https://www.twitch.tv/videos/2788523979","https://www.twitch.tv/tarik/clip/DeafObliviousGazelle..."],"startMinute": 0,"maxDurationMinutes": 240,"language": "auto"}
| Field | Required | Default | Notes |
|---|---|---|---|
videoUrls | yes | — | One or more VOD (/videos/<id>) or clip URLs. One record each; failures are skipped without stopping the batch |
startMinute | no | 0 | Begin transcription here. Timestamps stay absolute to the VOD |
endMinute | no | end of video | Stop here. Combine with startMinute to transcribe only the section you need |
maxDurationMinutes | no | 240 | Safety cap per video. If a video exceeds it, you get a partial transcript marked truncated: true and a log warning. Raise to 600 for very long streams |
language | no | auto | ISO 639-1 hint (en, es, tr, ...). Leave on auto unless certain |
proxyConfiguration | no | datacenter | Apify Proxy settings |
Twitch video to text — the output
One dataset record per video:
{"id": "twitch_vod_2788523979","source_url": "https://www.twitch.tv/videos/2788523979","platform": "twitch","type": "vod","title": "Casey Muratori Teaches Me Game Programming","streamer": "ThePrimeagen","game": "Software and Game Development","published_at": "2026-06-04T17:01:12+00:00","thumbnail": "https://static-cdn.jtvnw.net/cf_vods/...","duration_seconds": 8390,"window_start_seconds": 1800,"window_end_seconds": 1920,"transcribed_seconds": 120.0,"truncated": false,"language": "en","transcript_text": "Full plain text transcript...","segments": [{ "start": 1800.0, "end": 1804.2, "text": "And so then I had an AI summarize what the Lua code did." }],"scraped_at": "2026-06-07T12:00:00Z"}
segments timestamps are absolute to the VOD timeline (not the window), so they're valid for clipping and deep links (?t=1h30m0s). language is ISO 639-1. id is stable per video for deduplication across runs.
Pricing & billing transparency
Pay-per-event:
| Event | Price | Covers |
|---|---|---|
result | $0.005 | Per successfully processed video — metadata + audio extraction |
transcription_minute | $0.004/min ($0.005 on the free plan) | AI speech-to-text per transcribed minute (sub-minute rounds up) |
Examples (paid plans): a 10-minute clip costs $0.045; a 3-hour VOD costs $0.725; a 13-minute window of that VOD costs $0.057.
If a video fails, you are charged nothing for it. Charges fire only after a transcript record is pushed to the dataset — you pay for minutes actually transcribed, never for failed extraction.
How long does a run take?
Transcription runs ~60x real-time. Rule of thumb end to end: a 10-minute clip ≈ 30s; a 3-hour VOD ≈ 3–5 minutes.
Twitch subtitles: convert segments to SRT captions
def to_srt(segments):def ts(s):h, rem = divmod(s, 3600); m, sec = divmod(rem, 60)return f"{int(h):02}:{int(m):02}:{int(sec):02},{int(sec % 1 * 1000):03}"return "\n".join(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}\n"for i, seg in enumerate(segments, 1))
(Subtract window_start_seconds from each timestamp first if you want subtitles relative to your extracted section.)
FAQ
How do I download a transcript of a Twitch VOD? Paste the VOD URL (https://www.twitch.tv/videos/<id>) into videoUrls and run the actor. The transcript lands in the dataset as plain text plus timestamped segments, exportable as JSON or CSV — or convert to SRT subtitles with the recipe above.
Can I transcribe a Twitch clip to text? Yes — clip URLs work the same way and cost ~$0.05 for a typical clip.
Does it work on live streams? No — VODs and clips only.
Does it work on sub-only VODs? No, public VODs only.
How accurate is it? Whisper large-v3; strong on commentary, may degrade with loud in-game music. Timestamped segments make spot-checking easy.
What languages? 90+, auto-detected. Set the language hint only when you're sure.
What happens to videos that fail in a batch? They're skipped, logged, and not charged; the rest of the batch continues. The run only fails if every video fails.