Deprecated

Pricing

Pay per event

Sonartext Speech To Text

SonarText Speech to Text Transcription Service

Rating

0.0

(0)

Developer

Kyle

Last modified: 4 months ago

Sonartext STT Actor - Professional Speech-to-Text Transcription

Transform audio content into high-quality text transcriptions with enterprise-grade accuracy using state-of-the-art AI models. Perfect for content creators, researchers, businesses, and developers who need reliable speech-to-text conversion.

🚀 Key Features

🎯 State-of-the-Art AI: Powered by Whisper large-v3-turbo for industry-leading accuracy
👥 Speaker Diarization: Automatically identify and separate different speakers
📁 Multiple Input Sources: Support for direct uploads, YouTube videos, Twitter audio, Google Drive, AWS S3, and URLs
⚡ Fast Processing: Optimized for speed without compromising quality
💰 Cost Control: Built-in cost estimation and limits to manage expenses
📊 Flexible Output: Multiple formats including JSON, plain text, SRT, and VTT
🔐 Enterprise Security: Secure API integration with proper error handling
📈 Usage Analytics: Detailed processing statistics and performance metrics

🎯 Use Cases

Content Creation & Media

Podcast Transcription: Convert episodes to searchable text for show notes and SEO
Video Subtitles: Generate accurate captions for YouTube, social media, and streaming
Interview Processing: Transcribe interviews with automatic speaker identification
Course Materials: Create accessible text versions of educational audio content

Business & Research

Meeting Transcription: Document important discussions with speaker attribution
Research Analysis: Convert audio interviews and focus groups to analyzable text
Call Center Analytics: Process customer service calls for quality assurance
Legal Documentation: Transcribe depositions and hearings with high accuracy

Accessibility & Compliance

ADA Compliance: Provide text alternatives for audio content
Multi-language Support: Transcribe content in multiple languages
Hearing Accessibility: Make audio content accessible to deaf and hard-of-hearing users

📥 Input Configuration

Simple Input Fields

No complex JSON required! Just fill out the form fields:

Input Method: Choose how to provide your audio (Upload File, URL, YouTube, etc.)
Audio File/URL: Provide the file or URL based on your chosen method
Language: Select language or leave blank for auto-detection
Timestamps: Choose timestamp detail level
Speaker Diarization: Enable to identify different speakers
Response Format: JSON, Plain Text, SRT, or VTT subtitles

Example Inputs

File Upload

Input Method: Upload File
Audio File: [Choose your file...]
Language: English
Response Format: JSON

YouTube Video

Input Method: YouTube Video
YouTube URL: https://youtube.com/watch?v=abc123
Language: Auto-detect
Timestamps: Segment-level
Response Format: SRT Subtitles

URL Processing

Input Method: Direct URL
File URL: https://example.com/audio.mp3
Speaker Diarization: Yes
Min Speakers: 2
Max Speakers: 4

Input Source Options

Input Method	Description	What You Provide
Upload File	Upload audio/video from your computer	Click "Choose file" and select your audio/video file
Direct URL	Audio/video file from any web URL	Enter the direct URL to the file
YouTube Video	Extract audio from YouTube video	YouTube video URL (e.g. youtube.com/watch?v=...)
Twitter/X Video	Extract audio from Twitter/X post	Twitter post URL with video
Google Drive	File shared via Google Drive	Public Google Drive file link
AWS S3	File stored in S3 bucket	S3 URL or presigned URL

Configuration Options

Setting	Options	Description
Language	Auto-detect, English, Spanish, French, etc.	Language of the audio content
Timestamps	None, Segment-level, Word-level, Both	Detail level of timestamps
Speaker Diarization	On/Off	Identify different speakers
Min/Max Speakers	1-20	Expected number of speakers (if diarization enabled)
Response Format	JSON, Plain Text, SRT, VTT	Output format for results
Cost Limit	1-10000 cents	Optional spending limit for safety
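
For API-driven runs, the settings above can be assembled and range-checked before a job is submitted. The sketch below is illustrative only: apart from maxCostCents (named in the Cost Management tips), every field name and format identifier is a hypothetical stand-in, since the Actor's input schema is not reproduced here. The limits (1-20 speakers, 1-10000 cents) come from the table above.

```python
from typing import Any, Dict, Optional

# Hypothetical identifiers for the four documented response formats.
ALLOWED_FORMATS = {"json", "text", "srt", "vtt"}


def build_run_input(
    file_url: str,
    response_format: str = "json",
    diarization: bool = False,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
    max_cost_cents: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an input dict, enforcing the ranges from the Configuration Options table."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response format: {response_format!r}")

    # Field names below are assumptions, not the Actor's confirmed schema.
    run_input: Dict[str, Any] = {
        "inputMethod": "url",
        "fileUrl": file_url,
        "responseFormat": response_format,
        "speakerDiarization": diarization,
    }

    if diarization:
        low = 1 if min_speakers is None else min_speakers
        high = 20 if max_speakers is None else max_speakers
        if not 1 <= low <= high <= 20:
            raise ValueError("speaker counts must satisfy 1 <= min <= max <= 20")
        if min_speakers is not None:
            run_input["minSpeakers"] = min_speakers
        if max_speakers is not None:
            run_input["maxSpeakers"] = max_speakers

    if max_cost_cents is not None:
        if not 1 <= max_cost_cents <= 10000:
            raise ValueError("cost limit must be between 1 and 10000 cents")
        run_input["maxCostCents"] = max_cost_cents

    return run_input
```

Validating locally avoids paying the Actor start fee for a run that would be rejected anyway.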

📤 Output Structure

JSON Format (Default)

{
  "success": true,
  "result": {
    "transcription": "Hello, welcome to our podcast. Today we're discussing AI innovations.",
    "speakers": [
      {
        "speaker": "Speaker 1",
        "start": 0.0,
        "end": 3.2,
        "text": "Hello, welcome to our podcast."
      },
      {
        "speaker": "Speaker 2", 
        "start": 3.5,
        "end": 7.8,
        "text": "Today we're discussing AI innovations."
      }
    ],
    "metadata": {
      "duration": 45.6,
      "wordCount": 124,
      "confidence": 0.94,
      "language": "en"
    }
  },
  "usage": {
    "costCents": 23,
    "processingTime": 12.5,
    "audioMinutes": 0.76
  }
}
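
A minimal sketch of reading this structure in Python, assuming the payload has exactly the shape shown above. The sample is trimmed to the fields the code uses, and the helper name speaker_lines is made up for illustration.

```python
import json

# Trimmed copy of the sample output above.
SAMPLE = """
{
  "success": true,
  "result": {
    "transcription": "Hello, welcome to our podcast. Today we're discussing AI innovations.",
    "speakers": [
      {"speaker": "Speaker 1", "start": 0.0, "end": 3.2,
       "text": "Hello, welcome to our podcast."},
      {"speaker": "Speaker 2", "start": 3.5, "end": 7.8,
       "text": "Today we're discussing AI innovations."}
    ]
  }
}
"""


def speaker_lines(payload: dict) -> list:
    """Return one readable line per diarized segment."""
    if not payload.get("success"):
        raise ValueError("payload does not contain a successful transcription")
    return [
        f'[{seg["start"]:.1f}-{seg["end"]:.1f}] {seg["speaker"]}: {seg["text"]}'
        for seg in payload["result"]["speakers"]
    ]


for line in speaker_lines(json.loads(SAMPLE)):
    print(line)
```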

SRT Format Example

1
00:00:00,000 --> 00:00:03,200
Hello, welcome to our podcast.

2
00:00:03,500 --> 00:00:07,800
Today we're discussing AI innovations.
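
If only the JSON output is at hand, the same subtitles can be derived from the speaker segments. The sketch below assumes segments carry start, end, and text fields as in the JSON example; the function names are illustrative and this is not the service's own converter.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}"
        )
    return "\n\n".join(cues) + "\n"


segments = [
    {"start": 0.0, "end": 3.2, "text": "Hello, welcome to our podcast."},
    {"start": 3.5, "end": 7.8, "text": "Today we're discussing AI innovations."},
]
print(to_srt(segments))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the last digit.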

Error Handling

{
  "success": false,
  "error": {
    "code": "AUDIO_TOO_LONG",
    "message": "Audio duration exceeds maximum limit",
    "details": {
      "maxDuration": 3600,
      "actualDuration": 4200
    }
  }
}
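
One way to surface such errors in client code is to turn them into exceptions. This is a sketch under the assumption that failures always follow the shape above. The exception class and helper names are hypothetical, and the unit of maxDuration and actualDuration is not stated in the example, so the message leaves it unspecified.

```python
class TranscriptionError(Exception):
    """Raised when a response has "success": false."""

    def __init__(self, code: str, message: str, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}


def unwrap(payload: dict) -> dict:
    """Return the result of a successful response, or raise TranscriptionError."""
    if payload.get("success"):
        return payload["result"]
    err = payload.get("error", {})
    raise TranscriptionError(
        err.get("code", "UNKNOWN"),
        err.get("message", "no message provided"),
        err.get("details"),
    )


def describe(exc: TranscriptionError) -> str:
    """Produce an actionable hint for known error codes."""
    if exc.code == "AUDIO_TOO_LONG":
        over = exc.details["actualDuration"] - exc.details["maxDuration"]
        return f"Audio exceeds the limit by {over}; trim or split the file."
    return str(exc)


failure = {
    "success": False,
    "error": {
        "code": "AUDIO_TOO_LONG",
        "message": "Audio duration exceeds maximum limit",
        "details": {"maxDuration": 3600, "actualDuration": 4200},
    },
}

try:
    unwrap(failure)
except TranscriptionError as exc:
    print(describe(exc))
```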

🚀 Getting Started

1. Basic Usage

1. Choose "Direct URL" as Input Method
2. Enter: https://example.com/meeting-recording.mp3
3. Enable Speaker Diarization: Yes
4. Click "Run"

2. File Upload Usage

1. Choose "Upload File" as Input Method  
2. Click "Choose file" and select your audio
3. Set Language to "Auto-detect"
4. Choose Response Format: "JSON"
5. Click "Run"

3. Run the Actor

Execute the actor and retrieve your transcription from the dataset.
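
For programmatic use, a run can be started and its dataset read with the apify-client Python package. The sketch below is hedged: the Actor ID and input are placeholders (neither is given in this document), and because the Actor is marked deprecated a real run may no longer be possible. The helper takes any client object exposing actor(...).call(...) and dataset(...).iterate_items(), which matches the ApifyClient interface.

```python
from typing import Any, Dict, List


def run_and_fetch(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start an Actor run, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    # Each finished run exposes the ID of its default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage with the real client (requires `pip install apify-client` and an API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_and_fetch(client, "<ACTOR_ID>", {"fileUrl": "https://example.com/audio.mp3"})
#   The "fileUrl" key is a hypothetical input field name.
```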

💡 Pro Tips

Optimize Audio Quality

Clear Audio: Use high-quality recordings with minimal background noise
Proper Levels: Ensure consistent audio levels across speakers
Format: MP3, WAV, and FLAC work best

Speaker Diarization Best Practices

Enable for multi-speaker content (meetings, interviews, podcasts)
Works best with 2-10 distinct speakers
Clearer speaker separation improves accuracy

Cost Management

Set the Cost Limit (maxCostCents) to cap spending per job
Use the cost estimation feature before processing large files
Consider audio length: usage is billed per minute of audio processed

YouTube Processing

Supports both public and unlisted videos
Automatically extracts best quality audio
Handles various video lengths efficiently

💰 Pricing

This actor uses Sonartext's professional STT service with transparent, usage-based pricing:

Event-Based Pricing: $0.01 base fee per Actor start
Usage Fees: $0.004 per minute of audio processed
Volume Discounts: Available for high-usage customers
Cost Control: Set maximum spending limits per job
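
The listed rates allow a quick estimate before a run. The sketch below applies only the two published figures ($0.01 per start, $0.004 per audio minute). It assumes fractional minutes are billed proportionally and ignores volume discounts, neither of which is confirmed here. The function names are illustrative.

```python
from decimal import Decimal

START_FEE_USD = Decimal("0.01")    # base fee per Actor start
PER_MINUTE_USD = Decimal("0.004")  # fee per minute of audio processed


def estimate_cost_usd(audio_minutes) -> Decimal:
    """Estimate the cost of one run in US dollars."""
    minutes = Decimal(str(audio_minutes))
    if minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    return START_FEE_USD + PER_MINUTE_USD * minutes


def within_limit(audio_minutes, max_cost_cents: int) -> bool:
    """Check an estimate against a cost limit expressed in cents."""
    return estimate_cost_usd(audio_minutes) * 100 <= max_cost_cents


print(estimate_cost_usd(60))    # one hour of audio
print(within_limit(60, 25))     # fits a 25-cent limit?
```

Decimal is used instead of float so that small per-minute amounts add up without binary rounding error.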

🔒 Security & Privacy

API Security: All requests use secure HTTPS with API key authentication
Data Privacy: Audio files are processed securely and not stored permanently
Compliance: Built for enterprise use with privacy-first architecture
Rate Limiting: Respectful API usage with built-in rate limiting

⚠️ Legal Considerations

Content Rights: Ensure you have proper rights to transcribe the audio content
Privacy Compliance: Follow applicable privacy laws (GDPR, CCPA, etc.) when processing personal data
Platform Terms: Respect YouTube's Terms of Service and other platform policies
Copyright: Be mindful of copyrighted content when processing media files

🛠️ Advanced Features

Batch Processing

Process multiple files efficiently by chaining multiple actor runs or using the batch processing capabilities.

Integration Options

APIs: Direct REST API integration available
Webhooks: Real-time processing notifications
Zapier: No-code workflow integration
Custom Solutions: Enterprise integrations available

Supported Audio Formats

Audio: MP3, WAV, FLAC, AAC, OGG, M4A
Video: MP4, AVI, MOV, MKV (audio extracted)
Containers: Support for most common audio/video containers

📞 Support & Documentation

Help & Support

Documentation: Full API Documentation
Support: Contact support through the Apify platform
Community: Join our Discord for community support
Enterprise: Dedicated support for business customers

Troubleshooting

Rate Limits: Wait and retry if you hit API limits
Large Files: Consider breaking very large files into segments
Quality Issues: Check audio quality and format compatibility
Costs: Monitor usage through the actor's cost reporting features
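
The "wait and retry" advice for rate limits is commonly implemented as exponential backoff. The following is a generic sketch, not the Actor's built-in behaviour: the RateLimited exception is a hypothetical stand-in for whatever error your client raises when a limit is hit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error signalling that an API rate limit was hit."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, doubling the wait after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The sleep function is injectable so the retry schedule can be tested without real waiting.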

Updates & Changelog

This actor is actively maintained with regular updates for:

New AI model versions
Additional output formats
Performance improvements
Enhanced error handling

📊 Actor Metrics

Response Time: < 30 seconds for typical audio files
Accuracy: 95%+ for clear audio with proper formatting
Supported Languages: 50+
File Size Limit: Up to 500MB per file
Duration Limit: Up to 6 hours per audio file

Developed by: Sonartext
Version: 1.0.0
Last Updated: September 2025
License: MIT

For technical issues or feature requests, please use the Apify platform's support system or visit our documentation.

Universal Speech to Text Transcriber

tictechid/vanzi-universal-transcriber

Transcribe audio from videos stored on Google Drive, Dropbox, GitHub raw, OneDrive, Box, iCloud, AWS S3, GCS, Azure Blob, and Backblaze B2. Convert share links to direct downloads for fast, accurate transcripts with timestamps and easy API integration.

TicTech

5.0

(1)

Speech to Text Converter (Transcript / Captcha)

saswave/speech-to-text-converter

Transform audio recordings into text. Get transcriptions of audio files from sales or customer success teams. Get CAPTCHA text from audio CAPTCHA challenges. The speech-to-text converter helps you analyse recordings, build KPIs from them, and bypass CAPTCHAs.

SASWAVE

Speech To Text

vivid_astronaut/speech-to-text

Convert speech to text with high accuracy using Azure AI. Supports 100+ languages, speaker detection, and timestamps. Perfect for transcription, subtitles, and voice-to-text applications.

Fabio Suizu

Video Transcriber: Instagram, X, Facebook, TikTok

invideoiq/video-transcriber

Retrieves transcripts from online video content on multiple platforms (Instagram, X, ...) using speech-to-text models. It delivers outputs in JSON and LLM-ready formats, making it ideal for analytics and AI-based applications. Perfect for research and building intelligent conversational agents.

InVideoIQ

281

4.9

(5)

Video to Text Transcription

aizen0/video-to-text-transcription

Convert video speech to text in bulk. Supports only Twitter/Instagram, auto-detects languages, and handles large files automatically. Uses OpenAI Whisper for high accuracy.

Aizen

Instagram Content Intelligence Pro

sian.agency/instagram-content-intelligence-pro

Revolutionary AI system that delivers comprehensive speech-to-text transcription combined with premium data analytics. Pay only for successful results - no processing fees, no setup costs.

SIÁN OÜ

5.0

(1)

🏁 TikTok Video Transcriber & Downloader +12 Languages

ingeniela/tiktok-video-transcriber

Download TikTok videos without watermark & get AI transcriptions with timestamps. Extract subtitles, captions & keywords. Multi-language speech-to-text converter. Direct download links included.

Ingeniela

Instagram To Text

cheapget/instagram-to-text

AI-powered video transcription and translation. Convert video speech to text with timestamped subtitles in 100+ languages

CheapGET

5.0

(1)

Twilio API Actor

alizarin_refrigerator-owner/twilio-api-actor

Access Twilio communication data including calls, SMS/MMS, recordings, transcriptions & usage analytics. Call Logs: detailed call history. SMS/MMS: message history & sending. Recordings: call recordings. Transcriptions: speech-to-text. Account: phone numbers. Billing: usage data. Lookup: phone number validation.

The Howlers

Hugging Face Audio AI

alizarin_refrigerator-owner/hugging-face-audio-ai

Audio w/ Hugging Face models: speech recognition, text-to-speech & audio analysis. Speech-to-Text: transcribe audio. Text-to-Speech: generate natural speech. Audio Classification: classify sounds. Voice Activity Detection: detect speech. Speaker Diarization: identify speakers. Music Generation: create music.