Video & Audio Transcriber — Word-Level + SRT/VTT
Pricing
from $20.00 / 1,000 transcribed minutes
Transcribe any video or audio URL into accurate text with word-level and segment timestamps, plus ready-to-use SRT, VTT, and TXT files. Auto-detects language. For captions, subtitles, search & repurposing. Bring your own OpenAI API key.
Developer
Dami's Studio
Video & Audio Transcriber
Give it a public video or audio URL and it returns accurate text with segment and word-level timestamps, plus ready-to-use SRT, VTT, and TXT files. It detects the spoken language automatically. Built for people who need captions, searchable transcripts, or source text to repurpose into clips, articles, or show notes.
How it works
The actor downloads your media, extracts the audio track with ffmpeg, and sends it to OpenAI's Whisper using your own API key. The timestamps and subtitle files are built directly from the model's segment and word data, so timing lines up with the actual speech.
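The extraction and transcription steps above could be sketched roughly as follows. This is an illustration, not the actor's actual code: the helper names, the mono 16 kHz audio settings, and the choice to omit `language` when it is set to auto are all assumptions.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and keeps the audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # downloaded media file
        "-vn",           # discard the video stream
        "-ac", "1",      # mono (assumed setting, not taken from the actor)
        "-ar", "16000",  # 16 kHz sample rate (assumed setting)
        dst,
    ]


def extract_audio(src: str, dst: str) -> Path:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)


def transcription_params(model: str = "whisper-1", language: str = "auto",
                         word_timestamps: bool = True) -> dict:
    """Parameters for a verbose_json transcription with timestamp data."""
    granularities = ["segment"] + (["word"] if word_timestamps else [])
    params = {
        "model": model,
        "response_format": "verbose_json",
        "timestamp_granularities": granularities,
    }
    if language != "auto":
        # Assumption: leaving the language out lets the model detect it.
        params["language"] = language
    return params
```

With the official `openai` Python package, these parameters would typically be passed as keyword arguments to `client.audio.transcriptions.create(file=..., **params)`.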
Input
| Field | Required | Notes |
|---|---|---|
| mediaUrl | yes | Public URL to a video or audio file (mp4, mov, mp3, wav, m4a, webm, and similar). |
| language | no | ISO code of the spoken language, or auto to detect it. Defaults to auto. |
| wordTimestamps | no | Return per-word start/end times. Useful for karaoke-style captions. On by default. |
| outputFormats | no | Which files to generate: any of srt, vtt, txt. Defaults to srt and vtt. |
| openaiApiKey | yes | Your OpenAI (Whisper) key. Kept private and used only for this run. |
There are two advanced fields if you need them: model (defaults to whisper-1) and baseUrl for an OpenAI-compatible endpoint.
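To make the defaults concrete, here is a minimal sketch of how an input object could be checked and filled in before a run. The function name and error messages are hypothetical; only the field names and defaults come from the table above.

```python
ALLOWED_FORMATS = {"srt", "vtt", "txt"}


def normalize_input(raw: dict) -> dict:
    """Check required fields and apply the documented defaults."""
    for field in ("mediaUrl", "openaiApiKey"):
        if not raw.get(field):
            raise ValueError(f"missing required field: {field}")

    formats = raw.get("outputFormats", ["srt", "vtt"])
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")

    return {
        "mediaUrl": raw["mediaUrl"],
        "openaiApiKey": raw["openaiApiKey"],
        "language": raw.get("language", "auto"),
        "wordTimestamps": raw.get("wordTimestamps", True),
        "outputFormats": list(formats),
        "model": raw.get("model", "whisper-1"),  # advanced field
        "baseUrl": raw.get("baseUrl"),           # advanced field, None if unset
    }
```

Passing only `mediaUrl` and `openaiApiKey` yields a run with automatic language detection, word timestamps on, and SRT plus VTT output.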
Output
One dataset record per run. It includes the detected language, the full text, segments with start/end times, and words when word timestamps are enabled, along with wordCount, segmentCount, and durationSeconds. Each requested subtitle file is saved to the key-value store and referenced by srtKey/srtUrl, vttKey/vttUrl, and txtKey/txtUrl.
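If you want to rebuild or restyle subtitles from the segments in the dataset record, the conversion is simple. The sketch below assumes each segment is a dict with `start` and `end` in seconds and a `text` string, which matches the description above but is not a confirmed schema.

```python
def format_timestamp(seconds: float, sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT, sep='.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Numbered cues separated by blank lines, comma before milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments: list) -> str:
    """WEBVTT header, then cues with a dot before milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], sep=".")
        end = format_timestamp(seg["end"], sep=".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)
```

The only differences between the two formats here are the `WEBVTT` header, the cue numbering, and the millisecond separator.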
Example
{
  "mediaUrl": "https://example.com/podcast.mp3",
  "language": "auto",
  "wordTimestamps": true,
  "outputFormats": ["srt", "vtt", "txt"],
  "openaiApiKey": "sk-..."
}
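One way to send this input programmatically is through the Apify REST API's synchronous run endpoint, which returns the dataset items when the run finishes. The sketch below uses only the standard library. The actor ID is a placeholder, since this page does not state it, and the token is your own Apify API token.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request; the response body is a JSON array of dataset records."""
    with urllib.request.urlopen(build_run_request(actor_id, token, run_input)) as resp:
        return json.load(resp)


# Hypothetical usage; replace the placeholder actor ID and token with real values:
# records = run_actor("<username>~<actor-name>", "<APIFY_TOKEN>", {
#     "mediaUrl": "https://example.com/podcast.mp3",
#     "openaiApiKey": "sk-...",
# })
# print(records[0]["text"])
```

Synchronous endpoints have a time limit, so for long recordings it is probably safer to start the run asynchronously and fetch the dataset once it finishes.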
Pricing
$0.04 per minute of audio, pay per result, no subscription. You bring your own OpenAI key, so Whisper usage is billed by OpenAI separately.
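For a rough budget, the actor fee can be estimated from the audio duration. This sketch assumes fractional minutes are billed proportionally, which the page does not confirm, and it leaves out the separate OpenAI charge.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_MINUTE = Decimal("0.04")  # actor fee quoted above, in USD


def estimate_actor_cost(duration_seconds: float) -> Decimal:
    """Estimate the actor fee in USD, rounded to the nearest cent."""
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_actor_cost(5400))  # a 90-minute recording → 3.60
```

If billing rounds up to whole minutes instead, the estimate will be slightly low for short files.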
Notes
The mediaUrl has to be directly downloadable. Pages that require a login or that stream the media through an embedded player won't work, so point it at the raw file. Long files take longer and cost more, since billing is per minute of audio.
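Before starting a run, you can check whether a URL points at a raw file by looking at its Content-Type header. This is only a heuristic sketch with hypothetical helper names. Some servers reject HEAD requests or label media as `application/octet-stream`, so a passing or failing check is not conclusive.

```python
from __future__ import annotations

import urllib.request


def looks_like_media(content_type: str | None) -> bool:
    """True for audio/*, video/*, or generic binary; False for HTML and the like."""
    if not content_type:
        return False
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith(("audio/", "video/")) or mime == "application/octet-stream"


def check_media_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and inspect the Content-Type of the response."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return looks_like_media(resp.headers.get("Content-Type"))
```

A `text/html` response usually means the URL is a player or landing page rather than the file itself.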