Audio & Video to Text

Pricing

Pay per event

Developed by

Donjuan

Maintained by Community

Transcribes video and audio files into plain text and subtitle formats (TXT, SRT, VTT, TSV, JSON) using OpenAI's Whisper model. Supports preloaded tiny, base, and small models.


🎬 Video and Audio to Text Transcription

🧠 Overview

This script runs on the Apify platform and uses OpenAI Whisper to transcribe audio or video files (e.g., MP4 files, including videos you have already downloaded from YouTube) into plain text and subtitle formats (TXT, SRT, VTT, TSV, JSON).


📥 Input

Parameters

  • model: (string) — Whisper model to use. Available options:
    • tiny (pre-installed)
    • base (pre-installed)
    • small (pre-installed)
    • medium (requires download)
    • large (requires download)
    • turbo (requires download)

Note: Models tiny, base, and small are already downloaded in the Docker image for faster and offline-ready processing.

  • source_url: (string) — Direct URL to the video/audio file (e.g., an MP4 file hosted online).
    ⚠️ YouTube links are not supported directly. You must download the video first.
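The two parameters above can be checked before starting a run. The following is a minimal sketch, not part of the actor: the function name `validate_input` and the list of YouTube hostnames are assumptions made for illustration, while the model names come from the list above.

```python
from urllib.parse import urlparse

# Model names taken from the parameter list above.
PREINSTALLED_MODELS = {"tiny", "base", "small"}
DOWNLOADED_MODELS = {"medium", "large", "turbo"}

# Hostnames treated as YouTube here (illustrative assumption, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []

    model = actor_input.get("model")
    if model not in PREINSTALLED_MODELS | DOWNLOADED_MODELS:
        problems.append(f"unknown model: {model!r}")

    url = actor_input.get("source_url")
    parsed = urlparse(url if isinstance(url, str) else "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("source_url must be an http(s) URL")
    elif parsed.hostname in YOUTUBE_HOSTS:
        problems.append("YouTube links are not supported; download the video first")

    return problems
```

A check like this only inspects the URL; it cannot confirm that the link actually points at a media file.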

Example Input

{
  "model": "tiny",
  "source_url": "https://raw.githubusercontent.com/donjuanMime/audio_to_text/main/video.mp4"
}

📤 Output

The output is a JSON array with one object, which includes multiple transcription formats:

  • json: Full Whisper output with segments, tokens, and metadata, serialized as a JSON string.
  • srt: SubRip subtitle format.
  • tsv: Tab-separated values with the columns start and end (both in milliseconds) and text.
  • txt: Plain text transcription.
  • vtt: WebVTT subtitle format.
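To illustrate how the subtitle and table formats relate to Whisper's segment data, here is a small sketch that renders a list of segments as SRT and TSV. It is not Whisper's own writer code; the helper names are hypothetical, and the only assumption about the data is that each segment carries `start` and `end` in seconds plus a `text` field.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used by SRT."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render numbered SRT cues, separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_tsv(segments: list[dict]) -> str:
    """Render a TSV table with start and end expressed in milliseconds."""
    rows = ["start\tend\ttext"]
    for seg in segments:
        start_ms = round(seg["start"] * 1000)
        end_ms = round(seg["end"] * 1000)
        rows.append(f"{start_ms}\t{end_ms}\t{seg['text'].strip()}")
    return "\n".join(rows) + "\n"
```

For a single segment running from 0.0 s to 1.12 s, both helpers reproduce the first cue and row shown in the example output.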

Example Output (excerpt)

[
  {
    "json": "{ ... Whisper segment data ... }",
    "srt": "1\n00:00:00,000 --> 00:00:01,120\nWhat's your favorite drink?\n...",
    "tsv": "start\tend\ttext\n0\t1120\tWhat's your favorite drink?\n...",
    "txt": "What's your favorite drink?\nMy favorite drink is apple juice...\n",
    "vtt": "WEBVTT\n\n00:00.000 --> 00:01.120\nWhat's your favorite drink?\n..."
  }
]
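Once a run has finished, the single output object can be unpacked with the standard library alone. The sketch below assumes the output shape shown above (string values keyed by format name); the helper names `decode_whisper_json`, `parse_tsv`, and `save_formats` are hypothetical.

```python
import csv
import io
import json
from pathlib import Path

FORMATS = ("json", "srt", "tsv", "txt", "vtt")


def decode_whisper_json(item: dict):
    """Decode the `json` field, which the example shows as a JSON string."""
    raw = item["json"]
    return json.loads(raw) if isinstance(raw, str) else raw


def parse_tsv(tsv_text: str) -> list[dict]:
    """Turn the `tsv` field into rows with integer millisecond offsets."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        {"start_ms": int(row["start"]), "end_ms": int(row["end"]), "text": row["text"]}
        for row in reader
    ]


def save_formats(item: dict, out_dir: str, stem: str = "transcript") -> list[Path]:
    """Write each available format to <out_dir>/<stem>.<ext>."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for ext in FORMATS:
        if ext in item:
            path = out / f"{stem}.{ext}"
            path.write_text(item[ext], encoding="utf-8")
            written.append(path)
    return written
```

Quoting is disabled in `parse_tsv` so that apostrophes and quotation marks in the transcript are kept as plain text.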

🛠️ How to Use

  1. Go to your Apify dashboard and create a new actor or task.
  2. Paste this script into the actor’s source.
  3. Provide the input in the required JSON format (see above).
  4. Run the actor. It downloads the media file, transcribes it with Whisper, and returns the transcription in multiple formats.
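The download, transcribe, and return steps in item 4 can be approximated locally. The actor's source is not shown here, so this is only a sketch under assumptions: it presumes the `whisper` command-line tool from the openai-whisper package and ffmpeg are installed, and the function names and the `input_media` file name are made up for illustration.

```python
import subprocess
import urllib.request
from pathlib import Path

FORMATS = ("json", "srt", "tsv", "txt", "vtt")


def build_whisper_command(media_path, model: str, out_dir) -> list[str]:
    """Build a whisper CLI call that writes every output format to out_dir."""
    return [
        "whisper", str(media_path),
        "--model", model,
        "--output_dir", str(out_dir),
        "--output_format", "all",
    ]


def transcribe(source_url: str, model: str, work_dir: str) -> dict:
    """Download the media file, run Whisper on it, and collect the outputs."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)

    # Step 1: fetch the file from the direct URL.
    media_path = work / "input_media"
    urllib.request.urlretrieve(source_url, media_path)

    # Step 2: run Whisper; check=True raises if the command fails.
    subprocess.run(build_whisper_command(media_path, model, work), check=True)

    # Step 3: whisper names its outputs after the input file's base name.
    return {
        ext: (work / f"{media_path.stem}.{ext}").read_text(encoding="utf-8")
        for ext in FORMATS
    }
```

Calling `transcribe(url, "tiny", "work")` would return a dictionary shaped like the single object in the example output, although the actor itself may differ in its details.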

⚠️ Disclaimer

This script is provided "as is", without warranties of any kind. Use it at your own risk. Ensure compliance with:

  • YouTube’s Terms of Service (if downloading/transcribing from YouTube).
  • Local and international copyright laws.
