Speech-to-Text Converter
Pricing: from $10.00 / 1,000 results
Speech-to-Text Converter is an Apify Actor that transcribes audio and video files to text. It runs serverless on Apify and lets you choose between several transcription engines.
Developer: Jamshaid Arif
🎙️ Speech-to-Text Converter — Apify Actor
Serverless multi-engine speech-to-text transcription on Apify.
🛠️ Engines
| Engine | Cost | Internet | Best For |
|---|---|---|---|
| Whisper Local | Free | No | Long files, best accuracy, 45+ languages |
| Google Speech | Free | Yes | Quick short clips, real-time results |
| Whisper API | Paid | Yes | Fast cloud processing, large files |
📁 Supported Formats
Audio: WAV, MP3, FLAC, OGG, M4A, AAC, WMA, OPUS
Video: MP4, MKV, AVI, MOV, WEBM, FLV (audio auto-extracted)
Output: TXT, SRT subtitles, WebVTT, JSON
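
Before starting a run, a caller might check that a file's extension is among the formats listed above. The following is a minimal sketch of such a pre-flight check. The helper name `media_kind` is hypothetical, and the Actor itself may detect formats differently (for example, by inspecting file contents rather than the URL).

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions copied from the format lists above, lowercased.
AUDIO_EXTENSIONS = {"wav", "mp3", "flac", "ogg", "m4a", "aac", "wma", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mkv", "avi", "mov", "webm", "flv"}


def media_kind(url: str) -> Optional[str]:
    """Return "audio", "video", or None based on the URL path's extension.

    Hypothetical helper: it only looks at the file name in the URL path,
    so query strings are ignored and extensionless links return None.
    """
    path = urlparse(url).path
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A `None` result does not necessarily mean the Actor would reject the file; it only means the URL gives no recognizable extension.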
⚡ Quick Start
Via Apify Console
- Select an Engine (Whisper Local recommended)
- Paste the Input File URL (direct download link)
- Choose Language and Output Format
- Click Start
Via API
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=<TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt"
  }'
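
The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl call above; the function name `build_run_request` is hypothetical, and `<ACTOR_ID>` and `<TOKEN>` remain placeholders that must be replaced before sending.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    url = f"{API_BASE}/acts/{actor_id}/runs?token={token}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same input as in the curl example above.
actor_input = {
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt",
}

request = build_run_request("<ACTOR_ID>", "<TOKEN>", actor_input)

# To actually start the run, fill in the placeholders and uncomment:
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before spending credits on a run.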
📥 Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | whisper_local | whisper_local, google, or whisper_api |
| input_file_url | string | — | Direct URL to the media file |
| language | string | en | Language code, or empty for auto-detect |
| whisper_model | string | small | tiny, base, small, medium, or large |
| openai_api_key | string | — | OpenAI API key (whisper_api only) |
| output_format | string | txt | txt, srt, vtt, or json |
| max_file_size_mb | int | 500 | Maximum input file size, in MB |
| google_chunk_seconds | int | 55 | Chunk length in seconds for the Google engine |
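
The table above can be turned into a small client-side check that fills in defaults and rejects unknown values before a run is started. This is a sketch, not the Actor's own validation: the function name `prepare_input` is hypothetical, and it assumes that `input_file_url` is required and that `openai_api_key` must be present when `engine` is `whisper_api`.

```python
# Allowed values and defaults, copied from the input table above.
ENGINES = {"whisper_local", "google", "whisper_api"}
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}
OUTPUT_FORMATS = {"txt", "srt", "vtt", "json"}

DEFAULTS = {
    "engine": "whisper_local",
    "language": "en",
    "whisper_model": "small",
    "output_format": "txt",
    "max_file_size_mb": 500,
    "google_chunk_seconds": 55,
}


def prepare_input(user_input: dict) -> dict:
    """Merge user values over the documented defaults and check them."""
    merged = {**DEFAULTS, **user_input}

    # Assumption: the table lists no default, so the URL is treated as required.
    if not merged.get("input_file_url"):
        raise ValueError("input_file_url is required")
    if merged["engine"] not in ENGINES:
        raise ValueError(f"unknown engine: {merged['engine']!r}")
    if merged["whisper_model"] not in WHISPER_MODELS:
        raise ValueError(f"unknown whisper_model: {merged['whisper_model']!r}")
    if merged["output_format"] not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {merged['output_format']!r}")
    # Assumption: the key is mandatory for the paid Whisper API engine.
    if merged["engine"] == "whisper_api" and not merged.get("openai_api_key"):
        raise ValueError("openai_api_key is required for the whisper_api engine")
    return merged
```

For example, `prepare_input({"input_file_url": "https://example.com/audio.mp3"})` returns the full input with `engine` set to `whisper_local` and `output_format` set to `txt`.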
📤 Output
Dataset contains metadata:
{
  "status": "success",
  "engine": "whisper_local (small)",
  "language_detected": "en",
  "input_file": "interview.mp3",
  "output_url": "https://api.apify.com/.../interview_transcription.srt",
  "duration_audio_sec": 342.5,
  "processing_time_sec": 28.3,
  "character_count": 4521,
  "segment_count": 87,
  "text_preview": "First 500 characters of the transcription...",
  "message": "Transcribed with Whisper 'small' on CPU."
}
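
A consumer of the dataset might condense each metadata item into a one-line summary. The sketch below uses a subset of the fields from the example above and derives how much faster than real time the run was (audio duration divided by processing time). The function name `summarize` is hypothetical, and treating every status other than `success` as a failure is an assumption, since the source only shows the success case.

```python
# Subset of the example dataset item shown above.
item = {
    "status": "success",
    "input_file": "interview.mp3",
    "duration_audio_sec": 342.5,
    "processing_time_sec": 28.3,
    "character_count": 4521,
    "segment_count": 87,
    "message": "Transcribed with Whisper 'small' on CPU.",
}


def summarize(item: dict) -> str:
    """Return a one-line summary of a dataset item."""
    # Assumption: any status other than "success" means the run failed.
    if item.get("status") != "success":
        return f"Run failed: {item.get('message', 'no message')}"
    speedup = item["duration_audio_sec"] / item["processing_time_sec"]
    return (
        f"{item['input_file']}: {item['segment_count']} segments, "
        f"{item['character_count']} characters, "
        f"{speedup:.1f}x real time"
    )


print(summarize(item))
```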
Key-Value Store contains:
- interview_transcription.txt/srt/vtt/json: the output file (downloadable via output_url)
- FULL_RESULT: complete JSON with the full text and all segments
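
Records in a key-value store can also be addressed directly through the Apify API's key-value store record endpoint. The sketch below only builds the record URL. The helper name `record_url` is hypothetical, and `STORE_ID` is a placeholder for the ID of the run's key-value store; the structure of the `FULL_RESULT` JSON is not documented here, so the example does not assume any field names.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, key: str) -> str:
    """Build the Apify API URL for one record in a key-value store."""
    # Percent-encode both parts so unusual characters cannot break the path.
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(key, safe='')}"
    )


# "STORE_ID" is a placeholder, not a real store ID.
print(record_url("STORE_ID", "FULL_RESULT"))
```

A store that is not publicly readable also needs authentication, for example the same `token` query parameter used when starting the run.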
🌍 Languages (45+)
English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Urdu, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Hebrew, Bengali, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati, Punjabi, Swahili, Afrikaans, Filipino.