# Video & Audio Transcriber — Word-Level + SRT/VTT (`dami_studio/video-audio-transcriber`) Actor

Transcribe any video or audio URL into accurate text with word-level and segment timestamps, plus ready-to-use SRT, VTT, and TXT files. Auto-detects language. For captions, subtitles, search & repurposing. Bring your own OpenAI API key.

- **URL**: https://apify.com/dami\_studio/video-audio-transcriber.md
- **Developed by:** [Dami's Studio](https://apify.com/dami_studio) (community)
- **Categories:** AI, Videos, Automation
- **Stats:** 3 total users, 0 monthly users, 28.6% runs succeeded, 0 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $20.00 / 1,000 transcribed minutes

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Video & Audio Transcriber

Give it a public video or audio URL and it returns accurate text with segment and word-level timestamps, plus ready-to-use SRT, VTT, and TXT files. It detects the spoken language automatically. Built for people who need captions, searchable transcripts, or source text to repurpose into clips, articles, or show notes.

### How it works

The Actor downloads your media, extracts the audio track with ffmpeg, and sends it to OpenAI's Whisper using your own API key. The timestamps and subtitle files come straight from the model's segment and word data, so timing lines up with the actual speech.
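The audio-extraction step can be pictured with a minimal sketch. The Actor's real ffmpeg invocation is not documented here, so the function name `build_ffmpeg_extract_command`, the file names, and the chosen sample rate and bitrate are all assumptions made for illustration.

```python
def build_ffmpeg_extract_command(source: str, target: str) -> list[str]:
    """Return an ffmpeg argument list that extracts a compact mono audio track.

    Hypothetical helper: the flags are standard ffmpeg options, but the values
    are illustrative and not taken from the Actor's source code.
    """
    return [
        "ffmpeg",
        "-y",            # overwrite the target file if it already exists
        "-i", source,    # input media (video or audio)
        "-vn",           # drop any video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        "-b:a", "64k",   # modest bitrate keeps the upload small
        target,
    ]


command = build_ffmpeg_extract_command("input.mp4", "audio.mp3")
print(" ".join(command))
```

To run it locally, pass the list to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed.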

### Input

| Field | Required | Notes |
|-------|----------|-------|
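| `mediaUrl` | yes | Public URL to a video or audio file (mp4, mov, mp3, wav, m4a, webm, and similar). Use this for a single file. |
| `mediaUrls` | no | Array of public video/audio URLs to transcribe in one run. Use this instead of `mediaUrl` for a batch. |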
| `language` | no | ISO code of the spoken language, or `auto` to detect it. Defaults to `auto`. |
| `wordTimestamps` | no | Return per-word start/end times. Useful for karaoke-style captions. On by default. |
| `outputFormats` | no | Which files to generate: any of `srt`, `vtt`, `txt`. Defaults to `srt` and `vtt`. |
| `openaiApiKey` | yes | Your OpenAI (Whisper) key. Kept private and used only for this run. |

There are two advanced fields if you need them: `model` (defaults to `whisper-1`) and `baseUrl` for an OpenAI-compatible endpoint.
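As a rough illustration of the fields above, the sketch below assembles a single-file input object and rejects obviously invalid values before a run is started. `build_run_input` and `ALLOWED_FORMATS` are hypothetical names; the checks mirror the table and are not the Actor's own validation.

```python
ALLOWED_FORMATS = {"srt", "vtt", "txt"}


def build_run_input(media_url, openai_api_key, language="auto",
                    word_timestamps=True, output_formats=("srt", "vtt")):
    """Build an input dict for a single-file run (hypothetical helper)."""
    if not media_url:
        raise ValueError("mediaUrl is required")
    if not openai_api_key:
        raise ValueError("openaiApiKey is required")
    # Only the three formats listed in the input table are accepted here.
    unknown = sorted(set(output_formats) - ALLOWED_FORMATS)
    if unknown:
        raise ValueError(f"unsupported output formats: {unknown}")
    return {
        "mediaUrl": media_url,
        "language": language,
        "wordTimestamps": word_timestamps,
        "outputFormats": list(output_formats),
        "openaiApiKey": openai_api_key,
    }


run_input = build_run_input("https://example.com/podcast.mp3", "sk-...")
print(run_input["outputFormats"])
```

The defaults follow the table: `auto` language detection, word timestamps on, and `srt` plus `vtt` output.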

### Output

One dataset record per transcribed URL, so a single-file run produces one record. It includes the detected `language`, the full `text`, `segments` with start/end times, and `words` when word timestamps are enabled, along with `wordCount`, `segmentCount`, and `durationSeconds`. Each requested subtitle file is saved to the key-value store and referenced by `srtKey`/`srtUrl`, `vttKey`/`vttUrl`, and `txtKey`/`txtUrl`.
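If you want to rebuild captions yourself from the `segments` array, for example after correcting the text, the conversion to SRT is short. This is a sketch under an assumption: each segment is taken to be an object with `start` and `end` in seconds and a `text` string, as in Whisper's verbose output. The README does not spell out the exact keys, so check a real record first.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn segment dicts (assumed keys: start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


example_segments = [
    {"start": 0.0, "end": 2.4, "text": " Hello there."},
    {"start": 2.4, "end": 5.0, "text": " Welcome back."},
]
print(segments_to_srt(example_segments))
```

For normal use the generated file referenced by `srtKey`/`srtUrl` is the simpler choice; the sketch is only useful when you need to post-process the segments.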

### Example

```json
{
  "mediaUrl": "https://example.com/podcast.mp3",
  "language": "auto",
  "wordTimestamps": true,
  "outputFormats": ["srt", "vtt", "txt"],
  "openaiApiKey": "sk-..."
}
````

### Pricing

$0.04 per minute of audio, pay per result, no subscription. You bring your own OpenAI key, so Whisper usage is billed by OpenAI separately.
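A quick way to budget a run is to multiply the audio length by the per-minute rate. The sketch below assumes fractional minutes are billed proportionally, which this README does not confirm, and it leaves out the separate OpenAI charge. `estimate_actor_cost` is a hypothetical helper, and the rate is a parameter so you can plug in whatever price the store listing currently shows.

```python
from decimal import Decimal


def estimate_actor_cost(duration_seconds, price_per_minute="0.04"):
    """Estimate the Actor fee in USD for one file (excludes OpenAI usage).

    Assumption: billing is proportional to audio length, with no rounding
    up to whole minutes.
    """
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * Decimal(price_per_minute)).quantize(Decimal("0.01"))


# A 90-minute recording at $0.04 per minute.
print(estimate_actor_cost(90 * 60))
```

The `durationSeconds` field of a finished run's dataset record can be fed back into the same function to compare the estimate with the actual audio length.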

### Notes

The `mediaUrl` has to be directly downloadable. Pages that require login or stream behind a player won't work, so point it at the raw file. Long files take longer and cost more since billing is per minute of audio.
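A cheap pre-flight check can catch page URLs before you pay for a failed run. The sketch below is only a heuristic based on the file extension: a direct link without an extension may still work, and a link with one may still require login. `looks_like_direct_media_url` and the extension set are assumptions drawn from the formats listed in the input table.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats named in the input table; the Actor may accept others as well.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav", ".m4a", ".webm"}


def looks_like_direct_media_url(url: str) -> bool:
    """Return True if the URL is http(s) and its path ends in a media extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Query strings are ignored because only the path is inspected.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in MEDIA_EXTENSIONS


print(looks_like_direct_media_url("https://example.com/podcast.mp3"))
print(looks_like_direct_media_url("https://example.com/watch?v=abc"))
```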

# Actor input Schema

## `mediaUrl` (type: `string`):

Public URL to a video or audio file (mp4, mov, mp3, wav, m4a, webm). Use this for a single file, or mediaUrls for a batch.

## `mediaUrls` (type: `array`):

Transcribe several files in one run — one dataset row per URL. Each item is a public video/audio URL.

## `language` (type: `string`):

Spoken language ISO code, or 'auto' to detect.

## `wordTimestamps` (type: `boolean`):

Return per-word start/end times (great for karaoke captions).

## `outputFormats` (type: `array`):

Which subtitle/text files to also produce: srt, vtt, txt.

## `openaiApiKey` (type: `string`):

Your OpenAI (Whisper) key. Kept private.

## `model` (type: `string`):

Transcription model. Default whisper-1.

## `baseUrl` (type: `string`):

OpenAI-compatible base URL. Default https://api.openai.com/v1.

## Actor input object example

```json
{
  "mediaUrl": "https://example.com/podcast.mp3",
  "language": "auto",
  "wordTimestamps": true,
  "outputFormats": [
    "srt",
    "vtt",
    "txt"
  ],
  "openaiApiKey": "sk-...",
  "model": "whisper-1"
}
```

# Actor output Schema

## `results` (type: `string`):

Result rows / metadata are stored in the default dataset (one row per item).

## `files` (type: `string`):

Generated media/files (video, audio, images, captions) are stored in the default key-value store.
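When you only have a record key such as `srtKey` from a dataset row, the file can be fetched through the Apify API's key-value store record endpoint. The sketch below builds those URLs without making any request. `record_url` and `subtitle_record_urls` are hypothetical helpers, the store ID is a placeholder, and depending on the store's access settings the request may also need your Apify API token.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, key: str) -> str:
    """Build the API URL of one key-value store record."""
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(key, safe='')}"
    )


def subtitle_record_urls(item: dict, store_id: str) -> dict:
    """Map each subtitle format present in a dataset item to its record URL."""
    urls = {}
    for fmt in ("srt", "vtt", "txt"):
        key = item.get(f"{fmt}Key")  # e.g. srtKey, vttKey, txtKey
        if key:
            urls[fmt] = record_url(store_id, key)
    return urls


# Placeholder values for illustration only.
item = {"srtKey": "podcast.srt", "vttKey": "podcast.vtt"}
print(subtitle_record_urls(item, "STORE_ID"))
```

In a real run the store ID comes from the run object's `defaultKeyValueStoreId`, and the `srtUrl`/`vttUrl`/`txtUrl` fields in the dataset record are usually the more direct route.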

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
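const input = {
    // Both fields below are required; replace the placeholders with your own values
    "mediaUrl": "https://example.com/podcast.mp3",
    "openaiApiKey": "<YOUR_OPENAI_API_KEY>",
    "outputFormats": [
        "srt",
        "vtt",
        "txt"
    ]
};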

// Run the Actor and wait for it to finish
const run = await client.actor("dami_studio/video-audio-transcriber").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
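# mediaUrl and openaiApiKey are required; replace the placeholders with your own values
run_input = {
    "mediaUrl": "https://example.com/podcast.mp3",
    "openaiApiKey": "<YOUR_OPENAI_API_KEY>",
    "outputFormats": [
        "srt",
        "vtt",
        "txt",
    ],
}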

# Run the Actor and wait for it to finish
run = client.actor("dami_studio/video-audio-transcriber").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mediaUrl": "https://example.com/podcast.mp3",
  "openaiApiKey": "<YOUR_OPENAI_API_KEY>",
  "outputFormats": [
    "srt",
    "vtt",
    "txt"
  ]
}' |
apify call dami_studio/video-audio-transcriber --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dami_studio/video-audio-transcriber",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Video & Audio Transcriber — Word-Level + SRT/VTT",
        "description": "Transcribe any video or audio URL into accurate text with word-level and segment timestamps, plus ready-to-use SRT, VTT, and TXT files. Auto-detects language. For captions, subtitles, search & repurposing. Bring your own OpenAI API key.",
        "version": "0.1",
        "x-build-id": "baN1dIXZvznPQrd1B"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dami_studio~video-audio-transcriber/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dami_studio-video-audio-transcriber",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dami_studio~video-audio-transcriber/runs": {
            "post": {
                "operationId": "runs-sync-dami_studio-video-audio-transcriber",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dami_studio~video-audio-transcriber/run-sync": {
            "post": {
                "operationId": "run-sync-dami_studio-video-audio-transcriber",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "mediaUrl": {
                        "title": "Media URL",
                        "type": "string",
                        "description": "Public URL to a video or audio file (mp4, mov, mp3, wav, m4a, webm). Use this for a single file, or mediaUrls for a batch."
                    },
                    "mediaUrls": {
                        "title": "Media URLs (batch)",
                        "type": "array",
                        "description": "Transcribe several files in one run — one dataset row per URL. Each item is a public video/audio URL.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "Language",
                        "type": "string",
                        "description": "Spoken language ISO code, or 'auto' to detect.",
                        "default": "auto"
                    },
                    "wordTimestamps": {
                        "title": "Include word timestamps",
                        "type": "boolean",
                        "description": "Return per-word start/end times (great for karaoke captions).",
                        "default": true
                    },
                    "outputFormats": {
                        "title": "Output files",
                        "type": "array",
                        "description": "Which subtitle/text files to also produce: srt, vtt, txt.",
                        "default": [
                            "srt",
                            "vtt"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API key (BYO)",
                        "type": "string",
                        "description": "Your OpenAI (Whisper) key. Kept private."
                    },
                    "model": {
                        "title": "Model (advanced)",
                        "type": "string",
                        "description": "Transcription model. Default whisper-1.",
                        "default": "whisper-1"
                    },
                    "baseUrl": {
                        "title": "API base URL (advanced)",
                        "type": "string",
                        "description": "OpenAI-compatible base URL. Default https://api.openai.com/v1."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
