Pricing

from $100.00 / 1,000 results

Speech-to-Text Converter

Speech-to-Text Converter is an Apify Actor that transcribes audio and video files to text. It runs serverlessly on Apify and lets you choose between three engines: local Whisper, Google Speech, or the OpenAI Whisper API.

Rating

0.0

(0)

Developer

Jamshaid Arif

Last modified

10 days ago

🎙️ Speech-to-Text Converter — Apify Actor

Serverless multi-engine speech-to-text transcription on Apify.

🛠️ Engines

Engine	Cost	Internet	Best For
Whisper Local	Free	No	Long files, best accuracy, 45+ languages
Google Speech	Free	Yes	Quick short clips, real-time results
Whisper API	Paid	Yes	Fast cloud processing, large files

📁 Supported Formats

Audio: WAV, MP3, FLAC, OGG, M4A, AAC, WMA, OPUS
Video: MP4, MKV, AVI, MOV, WEBM, FLV (audio auto-extracted)
Output: TXT, SRT subtitles, WebVTT, JSON
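
As a rough illustration of the format lists above, a client could check a file's extension before submitting it. This is only a sketch: the helper name `classify_media` is hypothetical, and it looks at the URL's extension only, which may differ from how the Actor itself detects media types.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions copied from the Supported Formats list above.
AUDIO_EXTENSIONS = {"wav", "mp3", "flac", "ogg", "m4a", "aac", "wma", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mkv", "avi", "mov", "webm", "flv"}


def classify_media(url: str) -> str:
    """Return "audio", "video", or "unsupported" from the URL's file extension.

    Hypothetical client-side helper; query strings are ignored.
    """
    path = urlparse(url).path
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

For example, `classify_media("https://example.com/audio.mp3")` returns `"audio"`.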

⚡ Quick Start

Via Apify Console

Select an Engine (Whisper Local recommended)
Paste the Input File URL (direct download link)
Choose Language and Output Format
Click Start

Via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=<TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt"
  }'
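
The same request can be sketched in Python using only the standard library. The helper name `build_run_request` is hypothetical, and the Actor ID and token are still values you must supply. The function only builds the request; nothing is sent until `urlopen` is called.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    query = urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same input as the curl example above.
EXAMPLE_INPUT = {
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt",
}

# To start a run (requires network access and real credentials):
# request = build_run_request("<ACTOR_ID>", "<TOKEN>", EXAMPLE_INPUT)
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```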

📥 Input

Parameter	Type	Default	Description
`engine`	string	`whisper_local`	`whisper_local`, `google`, or `whisper_api`
`input_file_url`	string	—	Direct URL to media file
`language`	string	`en`	Language code or empty for auto-detect
`whisper_model`	string	`small`	`tiny`, `base`, `small`, `medium`, `large`
`openai_api_key`	string	—	OpenAI API key (used by `whisper_api` only)
`output_format`	string	`txt`	`txt`, `srt`, `vtt`, `json`
`max_file_size_mb`	int	`500`	Maximum file size, in MB
`google_chunk_seconds`	int	`55`	Chunk length, in seconds, for the Google engine
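
The table above can be mirrored in a small client-side check that fills in defaults and rejects unknown values before a run is started. This is a sketch, not the Actor's own validation: the function name `prepare_input` is hypothetical, and it assumes that `input_file_url` is required and that `openai_api_key` is required whenever `engine` is `whisper_api`.

```python
# Defaults and allowed values copied from the Input table above.
DEFAULTS = {
    "engine": "whisper_local",
    "language": "en",
    "whisper_model": "small",
    "output_format": "txt",
    "max_file_size_mb": 500,
    "google_chunk_seconds": 55,
}
ALLOWED = {
    "engine": {"whisper_local", "google", "whisper_api"},
    "whisper_model": {"tiny", "base", "small", "medium", "large"},
    "output_format": {"txt", "srt", "vtt", "json"},
}


def prepare_input(user_input: dict) -> dict:
    """Merge user input over the defaults and validate enumerated fields."""
    merged = {**DEFAULTS, **user_input}
    # Assumption: the Actor cannot run without a media URL.
    if not merged.get("input_file_url"):
        raise ValueError("input_file_url is required")
    for field, allowed in ALLOWED.items():
        if merged[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    # Assumption: the paid engine needs a key supplied by the caller.
    if merged["engine"] == "whisper_api" and not merged.get("openai_api_key"):
        raise ValueError("openai_api_key is required for the whisper_api engine")
    return merged
```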

📤 Output

Dataset contains metadata:

{
    "status": "success",
    "engine": "whisper_local (small)",
    "language_detected": "en",
    "input_file": "interview.mp3",
    "output_url": "https://api.apify.com/.../interview_transcription.srt",
    "duration_audio_sec": 342.5,
    "processing_time_sec": 28.3,
    "character_count": 4521,
    "segment_count": 87,
    "text_preview": "First 500 characters of the transcription...",
    "message": "Transcribed with Whisper 'small' on CPU."
}
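
One way to use this record is to compare audio length with processing time. The sketch below parses a subset of the sample record above and computes how many seconds of audio were handled per second of processing; the helper name `realtime_factor` is hypothetical.

```python
import json

# Subset of the sample dataset record shown above.
SAMPLE_RECORD = json.loads("""
{
    "status": "success",
    "engine": "whisper_local (small)",
    "duration_audio_sec": 342.5,
    "processing_time_sec": 28.3,
    "character_count": 4521,
    "segment_count": 87
}
""")


def realtime_factor(record: dict) -> float:
    """Seconds of audio transcribed per second of processing time."""
    return record["duration_audio_sec"] / record["processing_time_sec"]


if SAMPLE_RECORD["status"] == "success":
    print(f"{realtime_factor(SAMPLE_RECORD):.1f}x faster than real time")
```

For the sample values this works out to roughly 12 times faster than real time.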

Key-Value Store contains:

interview_transcription.txt/srt/vtt/json — the output file (downloadable via output_url)
FULL_RESULT — complete JSON with full text + all segments
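
Records in a key-value store can also be fetched through the Apify API by store ID and record key. The sketch below only builds the URL; the helper name `record_url` is hypothetical, and the store ID is a value you supply (typically the run's `defaultKeyValueStoreId`). Depending on the store's access settings, an API token may also be needed.

```python
from urllib.parse import quote


def record_url(store_id: str, key: str = "FULL_RESULT") -> str:
    """Build the Apify API URL for one record in a key-value store."""
    return (
        "https://api.apify.com/v2/key-value-stores/"
        f"{quote(store_id, safe='')}/records/{quote(key, safe='')}"
    )


# Hypothetical store ID used for illustration only.
print(record_url("abc123"))
print(record_url("abc123", "interview_transcription.srt"))
```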

🌍 Languages (45+)

English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Urdu, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Hebrew, Bengali, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati, Punjabi, Swahili, Afrikaans, Filipino.
