Sonartext Speech To Text
SonarText Speech to Text Transcription Service
Sonartext STT Actor - Professional Speech-to-Text Transcription
Transform audio content into high-quality text transcriptions with enterprise-grade accuracy using state-of-the-art AI models. Perfect for content creators, researchers, businesses, and developers who need reliable speech-to-text conversion.
Key Features
- State-of-the-Art AI: Powered by Whisper-v3-large-turbo for industry-leading accuracy
- Speaker Diarization: Automatically identify and separate different speakers
- Multiple Input Sources: Support for direct uploads, YouTube videos, Twitter audio, Google Drive, AWS S3, and URLs
- Fast Processing: Optimized for speed without compromising quality
- Cost Control: Built-in cost estimation and limits to manage expenses
- Flexible Output: Multiple formats including JSON, plain text, SRT, and VTT
- Enterprise Security: Secure API integration with proper error handling
- Usage Analytics: Detailed processing statistics and performance metrics
Use Cases
Content Creation & Media
- Podcast Transcription: Convert episodes to searchable text for show notes and SEO
- Video Subtitles: Generate accurate captions for YouTube, social media, and streaming
- Interview Processing: Transcribe interviews with automatic speaker identification
- Course Materials: Create accessible text versions of educational audio content
Business & Research
- Meeting Transcription: Document important discussions with speaker attribution
- Research Analysis: Convert audio interviews and focus groups to analyzable text
- Call Center Analytics: Process customer service calls for quality assurance
- Legal Documentation: Transcribe depositions and hearings with high accuracy
Accessibility & Compliance
- ADA Compliance: Provide text alternatives for audio content
- Multi-language Support: Transcribe content in multiple languages
- Hearing Accessibility: Make audio content accessible to deaf and hard-of-hearing users
Input Configuration
Required Parameters
    {
      "audioSource": {
        "method": "url",
        "fileUrl": "https://example.com/audio.mp3"
      }
    }
Complete Configuration Example
    {
      "audioSource": {
        "method": "youtube",
        "fileUrl": "https://youtube.com/watch?v=example",
        "youTubeOptions": {
          "audioQuality": "highest"
        }
      },
      "transcriptionOptions": {
        "outputFormat": "json",
        "speakerDiarization": true,
        "wordTimestamps": true,
        "maxCostCents": 500
      },
      "outputOptions": {
        "saveFiles": true,
        "includeRawResponse": false
      }
    }
Input Source Options
Method | Description | Example |
---|---|---|
url | Direct HTTP/HTTPS URL or local file | https://cdn.example.com/audio.mp3 |
upload | File upload via buffer | Used with file upload interface |
youtube | YouTube video URL | https://youtube.com/watch?v=dQw4w9WgXcQ |
twitter | Twitter/X post with audio/video | https://twitter.com/user/status/123456 |
gdrive | Google Drive file (public) | https://drive.google.com/file/d/FILE_ID |
s3 | AWS S3 object | s3://bucket-name/path/to/file.mp3 |
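When sources arrive as plain strings, the `method` value can often be inferred from the URL itself. The sketch below is a hypothetical client-side helper, not part of the Actor: `infer_method` and `build_audio_source` are illustrative names, the host lists are assumptions, and this README only shows `fileUrl` used with the `url` and `youtube` methods, so reusing that field for the other methods is a guess.

```python
from urllib.parse import urlparse

# Host lists are assumptions made for illustration; extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "x.com", "www.x.com"}


def infer_method(source: str) -> str:
    """Guess the audioSource method for a source string (hypothetical helper)."""
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()
    if parsed.scheme == "s3":
        return "s3"
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host in TWITTER_HOSTS:
        return "twitter"
    if host == "drive.google.com":
        return "gdrive"
    # Anything else is treated as a direct HTTP/HTTPS URL.
    return "url"


def build_audio_source(source: str) -> dict:
    """Build an audioSource object; assumes every method accepts fileUrl."""
    return {"method": infer_method(source), "fileUrl": source}
```

The `upload` method is left out because it takes a file buffer rather than a URL.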
Advanced Options
    {
      "transcriptionOptions": {
        "outputFormat": "srt",        // json, text, srt, vtt
        "speakerDiarization": true,   // Identify different speakers
        "wordTimestamps": true,       // Include word-level timing
        "maxCostCents": 1000          // Maximum cost limit (in cents)
      },
      "outputOptions": {
        "saveFiles": true,            // Save files to Apify dataset
        "includeRawResponse": true    // Include full API response
      }
    }
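Checking options before starting a run can catch typos before any fee is charged. The following is a minimal sketch of such a check, not the Actor's own validation: `validate_transcription_options` is a hypothetical name, the allowed formats are taken from the comment in the example above, and treating `maxCostCents` as a positive integer is an assumption.

```python
ALLOWED_FORMATS = {"json", "text", "srt", "vtt"}  # from the options listed above


def validate_transcription_options(options: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    fmt = options.get("outputFormat", "json")  # JSON is the documented default
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")

    for key in ("speakerDiarization", "wordTimestamps"):
        if key in options and not isinstance(options[key], bool):
            problems.append(f"{key} must be true or false")

    cap = options.get("maxCostCents")
    # Assumption: the cap is a positive whole number of cents.
    if cap is not None and (isinstance(cap, bool) or not isinstance(cap, int) or cap <= 0):
        problems.append("maxCostCents must be a positive integer")

    return problems
```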
Output Structure
JSON Format (Default)
    {
      "success": true,
      "result": {
        "transcription": "Hello, welcome to our podcast. Today we're discussing AI innovations.",
        "speakers": [
          {
            "speaker": "Speaker 1",
            "start": 0.0,
            "end": 3.2,
            "text": "Hello, welcome to our podcast."
          },
          {
            "speaker": "Speaker 2",
            "start": 3.5,
            "end": 7.8,
            "text": "Today we're discussing AI innovations."
          }
        ],
        "metadata": {
          "duration": 45.6,
          "wordCount": 124,
          "confidence": 0.94,
          "language": "en"
        }
      },
      "usage": {
        "costCents": 23,
        "processingTime": 12.5,
        "audioMinutes": 0.76
      }
    }
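A short sketch of reading the JSON output shown above may help when turning speaker segments into a readable transcript. It assumes the field names match the sample exactly (`result.speakers` with `speaker`, `start`, `end`, `text`); `format_speaker_turns` is an illustrative name, not part of the Actor.

```python
def format_speaker_turns(output: dict) -> list[str]:
    """Render each diarized segment as '[start - end] Speaker: text'."""
    lines = []
    # 'speakers' may be absent when diarization is disabled (assumption).
    for seg in output["result"].get("speakers", []):
        lines.append(
            f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['speaker']}: {seg['text']}"
        )
    return lines


sample = {
    "success": True,
    "result": {
        "speakers": [
            {"speaker": "Speaker 1", "start": 0.0, "end": 3.2,
             "text": "Hello, welcome to our podcast."},
            {"speaker": "Speaker 2", "start": 3.5, "end": 7.8,
             "text": "Today we're discussing AI innovations."},
        ]
    },
}

for line in format_speaker_turns(sample):
    print(line)
```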
SRT Format Example
    1
    00:00:00,000 --> 00:00:03,200
    Hello, welcome to our podcast.

    2
    00:00:03,500 --> 00:00:07,800
    Today we're discussing AI innovations.
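If only the JSON output is at hand, equivalent SRT cues can be rebuilt locally from the timed segments. This is a sketch under the assumption that each segment carries `start`, `end` (in seconds) and `text`, as in the JSON sample; it is not necessarily how the Actor produces its own SRT files, and the function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT HH:MM:SS,mmm timestamp form."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"
```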
Error Handling
    {
      "success": false,
      "error": {
        "code": "AUDIO_TOO_LONG",
        "message": "Audio duration exceeds maximum limit",
        "details": {
          "maxDuration": 3600,
          "actualDuration": 4200
        }
      }
    }
Getting Started
1. Basic Usage
    {
      "audioSource": {
        "method": "url",
        "fileUrl": "https://example.com/meeting-recording.mp3"
      },
      "transcriptionOptions": {
        "speakerDiarization": true
      }
    }
2. Run the Actor
Execute the actor and retrieve your transcription from the dataset.
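For programmatic runs, the two steps above could look roughly like this with the official `apify-client` Python package. Treat it as a sketch: `ACTOR_ID` is a placeholder (the real Actor ID is not given in this README), and it assumes each dataset item has the output structure shown earlier.

```python
# Placeholder: replace with the Actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(file_url: str, diarize: bool = True) -> dict:
    """Build the basic input from step 1 for a direct URL source."""
    return {
        "audioSource": {"method": "url", "fileUrl": file_url},
        "transcriptionOptions": {"speakerDiarization": diarize},
    }


def run_transcription(file_url: str, token: str) -> list[dict]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(file_url))
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_transcription("https://example.com/meeting-recording.mp3", token)` with an Apify API token would then return the stored items as a list of dictionaries.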
Pro Tips
Optimize Audio Quality
- Clear Audio: Use high-quality recordings with minimal background noise
- Proper Levels: Ensure consistent audio levels across speakers
- Format: MP3, WAV, and FLAC formats work best
Speaker Diarization Best Practices
- Enable for multi-speaker content (meetings, interviews, podcasts)
- Works best with 2-10 distinct speakers
- Clearer speaker separation improves accuracy
Cost Management
- Set `maxCostCents` to control expenses
- Use the cost estimation feature before processing large files
- Consider audio length: usage fees are charged per minute of audio processed
YouTube Processing
- Supports both public and unlisted videos
- Automatically extracts best quality audio
- Handles various video lengths efficiently
Pricing
This actor uses Sonartext's professional STT service with transparent, usage-based pricing:
- Event Based Pricing: $0.01 base fee per Actor Start
- Usage Fees: Pay $0.004 per minute of audio processed
- Volume Discounts: Available for high-usage customers
- Cost Control: Set maximum spending limits per job
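The listed rates make a rough pre-run estimate straightforward: one start fee plus a per-minute charge. The sketch below works in cents and assumes audio is billed pro rata with no rounding and no volume discount, which this README does not confirm; the function names are illustrative and unrelated to the Actor's built-in cost estimation.

```python
START_FEE_CENTS = 1.0    # $0.01 per Actor start
PER_MINUTE_CENTS = 0.4   # $0.004 per minute of audio


def estimate_cost_cents(audio_minutes: float) -> float:
    """Estimated cost of one run, in cents (no rounding, no discounts)."""
    return START_FEE_CENTS + PER_MINUTE_CENTS * audio_minutes


def fits_budget(audio_minutes: float, max_cost_cents: float) -> bool:
    """True if the estimate stays within a maxCostCents-style cap."""
    return estimate_cost_cents(audio_minutes) <= max_cost_cents


def max_minutes_within(max_cost_cents: float) -> float:
    """Longest audio (in minutes) whose estimate fits under the cap."""
    if max_cost_cents <= START_FEE_CENTS:
        return 0.0
    return (max_cost_cents - START_FEE_CENTS) / PER_MINUTE_CENTS
```

Under these assumptions a 60-minute recording comes to about 25 cents (1 + 0.4 × 60), so a `maxCostCents` of 500 leaves ample headroom for a single file of that length.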
Security & Privacy
- API Security: All requests use secure HTTPS with API key authentication
- Data Privacy: Audio files are processed securely and not stored permanently
- Compliance: Built for enterprise use with privacy-first architecture
- Rate Limiting: Respectful API usage with built-in rate limiting
Legal Considerations
- Content Rights: Ensure you have proper rights to transcribe the audio content
- Privacy Compliance: Follow applicable privacy laws (GDPR, CCPA, etc.) when processing personal data
- Platform Terms: Respect YouTube's Terms of Service and other platform policies
- Copyright: Be mindful of copyrighted content when processing media files
Advanced Features
Batch Processing
Process multiple files efficiently by chaining multiple actor runs or using the batch processing capabilities.
Integration Options
- APIs: Direct REST API integration available
- Webhooks: Real-time processing notifications
- Zapier: No-code workflow integration
- Custom Solutions: Enterprise integrations available
Supported Audio Formats
- Audio: MP3, WAV, FLAC, AAC, OGG, M4A
- Video: MP4, AVI, MOV, MKV (audio extracted)
- Containers: Support for most common audio/video containers
Support & Documentation
Help & Support
- Documentation: Full API Documentation
- Support: Contact support through the Apify platform
- Community: Join our Discord for community support
- Enterprise: Dedicated support for business customers
Troubleshooting
- Rate Limits: Wait and retry if you hit API limits
- Large Files: Consider breaking very large files into segments
- Quality Issues: Check audio quality and format compatibility
- Costs: Monitor usage through the actor's cost reporting features
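The "wait and retry" advice for rate limits is commonly implemented as exponential backoff. The following is a generic sketch, not the Actor's own retry logic: how a rate-limit failure surfaces is not documented here, so the caller supplies an `is_retryable` predicate, and the attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Any, Callable


def retry_with_backoff(
    func: Callable[[], Any],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call func, retrying retryable failures with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the waiting can be replaced in tests or swapped for a scheduler-friendly delay.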
Updates & Changelog
This actor is actively maintained with regular updates for:
- New AI model versions
- Additional output formats
- Performance improvements
- Enhanced error handling
Actor Metrics
- Response Time: < 30 seconds for typical audio files
- Accuracy: 95%+ for clear audio with proper formatting
- Supported Languages: 50+
- File Size Limit: Up to 500MB per file
- Duration Limit: Up to 6 hours per audio file
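A local pre-flight check against the size and duration limits above can avoid paying a start fee for a file that will be rejected. This sketch assumes "500MB" means 500 × 10^6 bytes (the stricter reading) and uses the 6-hour figure from this list; note that the error example earlier shows a `maxDuration` of 3600 seconds, so the limit actually enforced may differ. `preflight_problems` is a hypothetical helper.

```python
MAX_FILE_BYTES = 500 * 1000 * 1000   # assumption: decimal megabytes
MAX_DURATION_SECONDS = 6 * 60 * 60   # 6 hours, per the metrics list


def preflight_problems(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return the documented limits a file would exceed; empty list means OK."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes / 1_000_000:.1f} MB, limit is {MAX_FILE_BYTES // 1_000_000} MB"
        )
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"audio is {duration_seconds / 3600:.2f} h, limit is {MAX_DURATION_SECONDS // 3600} h"
        )
    return problems
```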
Developed by: Sonartext
Version: 1.0.0
Last Updated: September 2025
License: MIT
For technical issues or feature requests, please use the Apify platform's support system or visit our documentation.