
Audio & Video to Text
Pricing
Pay per event

Transcribes video and audio files into plain text and subtitle formats (TXT, SRT, VTT, TSV, JSON) using OpenAI's Whisper model. Supports preloaded tiny, base, and small models.
🎬 Video and Audio to Text Transcription
🧠 Overview
This script is designed for the Apify platform and uses OpenAI Whisper to transcribe audio or video files (e.g., MP4 files, including YouTube videos you have downloaded beforehand) into plain text and subtitle formats (SRT, VTT, etc.).
📥 Input
Parameters
- model: (string) Whisper model to use. Available options:
  - tiny ✅ (pre-installed)
  - base ✅ (pre-installed)
  - small ✅ (pre-installed)
  - medium (requires download)
  - large (requires download)
  - turbo (requires download)
  ✅ Note: The tiny, base, and small models are already downloaded in the Docker image for faster and offline-ready processing.
- source_url: (string) Direct URL to the video/audio file (e.g., an MP4 file hosted online).
  ⚠️ YouTube links are not supported directly. You must download the video first.
Example Input
{
  "model": "tiny",
  "source_url": "https://raw.githubusercontent.com/donjuanMime/audio_to_text/main/video.mp4"
}
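The input rules above can be checked before a run is started. The following is a minimal sketch, not the actor's own validation code: the function name `validate_input`, the `needs_download` flag, and the list of YouTube host names are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Model names taken from the parameter list above.
PREINSTALLED_MODELS = {"tiny", "base", "small"}
DOWNLOADABLE_MODELS = {"medium", "large", "turbo"}


def validate_input(actor_input: dict) -> dict:
    """Check an input object against the documented rules (hypothetical helper)."""
    model = actor_input.get("model")
    source_url = actor_input.get("source_url") or ""

    if model not in PREINSTALLED_MODELS | DOWNLOADABLE_MODELS:
        raise ValueError(f"Unknown model: {model!r}")

    parsed = urlparse(source_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("source_url must be a direct http(s) URL to a media file")

    # Assumed host list: YouTube page links are rejected because the actor
    # expects a direct file URL.
    host = (parsed.hostname or "").lower()
    if host in ("youtube.com", "youtu.be") or host.endswith(".youtube.com"):
        raise ValueError("YouTube links are not supported; download the video first")

    return {
        "model": model,
        "source_url": source_url,
        # True when the model is not bundled in the Docker image.
        "needs_download": model in DOWNLOADABLE_MODELS,
    }
```

Calling `validate_input` with the example input returns the same two fields plus `needs_download: False`, since tiny is pre-installed.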
📤 Output
The output is a JSON array with one object, which includes multiple transcription formats:
- json: Full Whisper output with segments, tokens, and metadata.
- srt: SubRip subtitle format.
- tsv: Tab-separated values (start, end, text).
- txt: Plain text transcription.
- vtt: WebVTT subtitle format.
Example Output (excerpt)
[
  {
    "json": "{ ... Whisper segment data ... }",
    "srt": "1\n00:00:00,000 --> 00:00:01,120\nWhat's your favorite drink?\n...",
    "tsv": "start\tend\ttext\n0\t1120\tWhat's your favorite drink?\n...",
    "txt": "What's your favorite drink?\nMy favorite drink is apple juice...\n",
    "vtt": "WEBVTT\n\n00:00.000 --> 00:01.120\nWhat's your favorite drink?\n..."
  }
]
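To work with the result programmatically, the tsv field is probably the easiest one to parse. Below is a minimal sketch under two assumptions: the helper name `parse_tsv` is hypothetical, and the start and end columns are treated as milliseconds, which is inferred from the excerpt (1120 lines up with 00:00:01,120 in the srt field).

```python
import csv
import io


def parse_tsv(tsv_text: str) -> list[dict]:
    """Turn the 'tsv' output field into a list of timed segments."""
    reader = csv.DictReader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # keep quotes and apostrophes in the text as-is
    )
    return [
        {
            # Assumption: start/end are integer milliseconds.
            "start_s": int(row["start"]) / 1000,
            "end_s": int(row["end"]) / 1000,
            "text": row["text"],
        }
        for row in reader
    ]


# First row of the example output above.
sample = "start\tend\ttext\n0\t1120\tWhat's your favorite drink?\n"
segments = parse_tsv(sample)
print(segments[0]["end_s"], segments[0]["text"])
```

Note that in the excerpt the json field is itself a string, so it would need a separate `json.loads` call before its segments can be read.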
🛠️ How to Use
- Go to your Apify dashboard and create a new actor or task.
- Paste this script into the actor’s source.
- Provide the input in the required JSON format (see above).
- Run the actor. It downloads the media file, processes it with Whisper, and returns the transcription in multiple formats.
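The download-then-transcribe flow described in the last step can be sketched as follows. This is an illustration only, not the actor's actual source: the function names `local_name`, `download`, and `transcribe` and the fallback file name are assumptions, and it presumes the openai-whisper package and ffmpeg are installed. Only `whisper.load_model` and `model.transcribe` are calls from the Whisper Python package itself.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def local_name(source_url: str) -> str:
    """Derive a local file name from the URL path (hypothetical helper)."""
    name = os.path.basename(urlparse(source_url).path)
    return name or "media.bin"  # assumed fallback when the path has no file name


def download(source_url: str, target_dir: str) -> str:
    """Fetch the media file into target_dir and return its local path."""
    path = os.path.join(target_dir, local_name(source_url))
    urllib.request.urlretrieve(source_url, path)
    return path


def transcribe(source_url: str, model_name: str = "tiny") -> dict:
    """Download the file, run Whisper on it, and return text plus segments."""
    # Imported lazily so the helpers above work without Whisper installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    with tempfile.TemporaryDirectory() as tmp:
        media_path = download(source_url, tmp)
        model = whisper.load_model(model_name)
        result = model.transcribe(media_path)
    return {"txt": result["text"], "segments": result["segments"]}
```

The sketch stops at the raw Whisper result; producing the srt, tsv, and vtt strings shown in the output section would be an additional formatting step.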
⚠️ Disclaimer
This script is provided "as is", without warranties of any kind. Use it at your own risk. Ensure compliance with:
- YouTube’s Terms of Service (if downloading/transcribing from YouTube).
- Local and international copyright laws.