Speech-to-Text Converter

Pricing

from $10.00 / 1,000 results

Speech-to-Text Converter is an Apify Actor that transcribes audio and video files to text. It runs serverless on Apify and lets you choose between several transcription engines.

Developer

Jamshaid Arif

Maintained by Community

🎙️ Speech-to-Text Converter — Apify Actor

Serverless multi-engine speech-to-text transcription on Apify.

🛠️ Engines

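| Engine        | Cost | Internet | Best For                                 |
|---------------|------|----------|------------------------------------------|
| Whisper Local | Free | No       | Long files, best accuracy, 45+ languages |
| Google Speech | Free | Yes      | Quick short clips, real-time results     |
| Whisper API   | Paid | Yes      | Fast cloud processing, large files       |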

📁 Supported Formats

Audio: WAV, MP3, FLAC, OGG, M4A, AAC, WMA, OPUS
Video: MP4, MKV, AVI, MOV, WEBM, FLV (audio auto-extracted)
Output: TXT, SRT subtitles, WebVTT, JSON
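
The SRT and WebVTT outputs listed above differ mainly in their timestamp notation: SRT writes `HH:MM:SS,mmm`, while WebVTT writes `HH:MM:SS.mmm`. The sketch below illustrates that difference. It is not the Actor's own code, and the `(start, end, text)` segment shape is an assumption made for this example.

```python
def format_timestamp(seconds: float, decimal_sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues.

    The tuple layout is hypothetical; the Actor's real segment
    structure is not documented in this README.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{text.strip()}")
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 2.5, "Hello there."), (2.5, 5.0, "Welcome back.")]))
```

Passing `decimal_sep="."` to `format_timestamp` gives the WebVTT form of the same timestamp.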

⚡ Quick Start

Via Apify Console

  1. Select an Engine (Whisper Local recommended)
  2. Paste the Input File URL (direct download link)
  3. Choose Language and Output Format
  4. Click Start

Via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=<TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt"
  }'
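
The same request can be issued from Python using only the standard library. This is a minimal sketch that mirrors the curl call above. The function names `build_run_request` and `start_run` are made up for this example, and `<ACTOR_ID>` and `<TOKEN>` remain placeholders you must supply.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_id}/runs?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(actor_id: str, token: str, run_input: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_run_request(
        "<ACTOR_ID>",
        "<TOKEN>",
        {
            "engine": "whisper_local",
            "input_file_url": "https://example.com/audio.mp3",
            "language": "en",
            "whisper_model": "small",
            "output_format": "srt",
        },
    )
    print(req.get_method(), req.full_url)
```

Separating request construction from sending makes it possible to inspect the URL and body before any network traffic occurs.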

📥 Input

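| Parameter            | Type   | Default       | Description                                |
|----------------------|--------|---------------|--------------------------------------------|
| engine               | string | whisper_local | whisper_local, google, or whisper_api      |
| input_file_url       | string |               | Direct URL to media file                   |
| language             | string | en            | Language code or empty for auto-detect     |
| whisper_model        | string | small         | tiny, base, small, medium, large           |
| openai_api_key       | string |               | OpenAI key (whisper_api only)              |
| output_format        | string | txt           | txt, srt, vtt, json                        |
| max_file_size_mb     | int    | 500           | Max file size limit, in MB                 |
| google_chunk_seconds | int    | 55            | Chunk size for Google engine, in seconds   |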
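
Checking an input on the client before starting a run can catch typos early. The sketch below encodes the defaults and allowed values from the table above. The helper `build_input` is hypothetical, and the rule that `whisper_api` requires `openai_api_key` is an assumption inferred from the parameter description; the Actor's own validation may differ.

```python
# Allowed values and defaults copied from the input table.
ALLOWED = {
    "engine": {"whisper_local", "google", "whisper_api"},
    "whisper_model": {"tiny", "base", "small", "medium", "large"},
    "output_format": {"txt", "srt", "vtt", "json"},
}
DEFAULTS = {
    "engine": "whisper_local",
    "language": "en",
    "whisper_model": "small",
    "output_format": "txt",
    "max_file_size_mb": 500,
    "google_chunk_seconds": 55,
}
OPTIONAL_KEYS = {"openai_api_key"}


def build_input(input_file_url: str, **overrides) -> dict:
    """Merge overrides into the documented defaults and validate the result."""
    if not input_file_url:
        raise ValueError("input_file_url is required")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {**DEFAULTS, **overrides, "input_file_url": input_file_url}
    for key, allowed in ALLOWED.items():
        if run_input[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    # Assumption: the paid engine cannot work without a key.
    if run_input["engine"] == "whisper_api" and not run_input.get("openai_api_key"):
        raise ValueError("openai_api_key is required for whisper_api")
    return run_input


if __name__ == "__main__":
    print(build_input("https://example.com/audio.mp3", output_format="srt"))
```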

📤 Output

Dataset contains metadata:

{
  "status": "success",
  "engine": "whisper_local (small)",
  "language_detected": "en",
  "input_file": "interview.mp3",
  "output_url": "https://api.apify.com/.../interview_transcription.srt",
  "duration_audio_sec": 342.5,
  "processing_time_sec": 28.3,
  "character_count": 4521,
  "segment_count": 87,
  "text_preview": "First 500 characters of the transcription...",
  "message": "Transcribed with Whisper 'small' on CPU."
}
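
The numeric fields in this record are enough to derive a few useful figures, such as how much faster than real time the run was. The sketch below works on the sample values shown above. The `summarize` helper and the derived field names are made up for this example and are not part of the Actor's output.

```python
import json

# Numeric fields copied from the sample dataset item.
SAMPLE_ITEM = json.loads("""
{
  "status": "success",
  "duration_audio_sec": 342.5,
  "processing_time_sec": 28.3,
  "character_count": 4521,
  "segment_count": 87
}
""")


def summarize(item: dict) -> dict:
    """Derive throughput figures from one metadata record."""
    if item.get("status") != "success":
        raise ValueError(f"run did not succeed: {item.get('status')!r}")
    duration = item["duration_audio_sec"]
    segments = item["segment_count"]
    return {
        # Seconds of audio transcribed per second of processing.
        "realtime_factor": duration / item["processing_time_sec"],
        "avg_segment_sec": duration / segments,
        "chars_per_segment": item["character_count"] / segments,
    }


if __name__ == "__main__":
    summary = summarize(SAMPLE_ITEM)
    print(f"{summary['realtime_factor']:.1f}x real time")  # prints "12.1x real time"
```

The sketch does not guard against a zero `segment_count` or `processing_time_sec`; add checks if such records can occur.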

Key-Value Store contains:

  • interview_transcription.txt/srt/vtt/json — the output file (downloadable via output_url)
  • FULL_RESULT — complete JSON with full text + all segments
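
To read `FULL_RESULT` programmatically, one option is the Apify API's key-value store record endpoint. The sketch below builds that URL and fetches the record. The store ID is assumed to come from the run object (its `defaultKeyValueStoreId` field), the helper names are made up for this example, and the internal structure of `FULL_RESULT` is not documented here, so the record is returned as decoded JSON without further interpretation.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, key: str = "FULL_RESULT", token: str = "") -> str:
    """Build the URL of one record in a key-value store."""
    url = f"{API_BASE}/key-value-stores/{store_id}/records/{urllib.parse.quote(key)}"
    if token:
        url += "?" + urllib.parse.urlencode({"token": token})
    return url


def fetch_full_result(store_id: str, token: str = "") -> dict:
    """Download and decode the FULL_RESULT record (performs a network call)."""
    with urllib.request.urlopen(record_url(store_id, token=token), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # "<STORE_ID>" is a placeholder for the run's key-value store ID.
    print(record_url("<STORE_ID>"))
```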

🌍 Languages (45+)

English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Urdu, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Hebrew, Bengali, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati, Punjabi, Swahili, Afrikaans, Filipino.