Sonartext Speech To Text
Pricing
Pay per event
Under maintenance
SonarText Speech to Text Transcription Service
Last modified
16 days ago
Sonartext STT Actor - Professional Speech-to-Text Transcription
Transform audio content into high-quality text transcriptions with enterprise-grade accuracy using state-of-the-art AI models. Perfect for content creators, researchers, businesses, and developers who need reliable speech-to-text conversion.
Key Features
- State-of-the-Art AI: Powered by Whisper large-v3-turbo for high transcription accuracy
- Speaker Diarization: Automatically identify and separate different speakers
- Multiple Input Sources: Support for direct uploads, YouTube videos, Twitter audio, Google Drive, AWS S3, and URLs
- Fast Processing: Optimized for speed without compromising quality
- Cost Control: Built-in cost estimation and limits to manage expenses
- Flexible Output: Multiple formats including JSON, plain text, SRT, and VTT
- Enterprise Security: Secure API integration with proper error handling
- Usage Analytics: Detailed processing statistics and performance metrics
Use Cases
Content Creation & Media
- Podcast Transcription: Convert episodes to searchable text for show notes and SEO
- Video Subtitles: Generate accurate captions for YouTube, social media, and streaming
- Interview Processing: Transcribe interviews with automatic speaker identification
- Course Materials: Create accessible text versions of educational audio content
Business & Research
- Meeting Transcription: Document important discussions with speaker attribution
- Research Analysis: Convert audio interviews and focus groups to analyzable text
- Call Center Analytics: Process customer service calls for quality assurance
- Legal Documentation: Transcribe depositions and hearings with high accuracy
Accessibility & Compliance
- ADA Compliance: Provide text alternatives for audio content
- Multi-language Support: Transcribe content in multiple languages
- Hearing Accessibility: Make audio content accessible to deaf and hard-of-hearing users
Input Configuration
Simple Input Fields
No complex JSON required! Just fill out the form fields:
- Input Method: Choose how to provide your audio (Upload File, URL, YouTube, etc.)
- Audio File/URL: Provide the file or URL based on your chosen method
- Language: Select language or leave blank for auto-detection
- Timestamps: Choose timestamp detail level
- Speaker Diarization: Enable to identify different speakers
- Response Format: JSON, Plain Text, SRT, or VTT subtitles
Example Inputs
File Upload
Input Method: Upload File
Audio File: [Choose your file...]
Language: English
Response Format: JSON
YouTube Video
Input Method: YouTube Video
YouTube URL: https://youtube.com/watch?v=abc123
Language: Auto-detect
Timestamps: Segment-level
Response Format: SRT Subtitles
URL Processing
Input Method: Direct URL
File URL: https://example.com/audio.mp3
Speaker Diarization: Yes
Min Speakers: 2
Max Speakers: 4
Input Source Options
| Input Method | Description | What You Provide |
|---|---|---|
| Upload File | Upload audio/video from your computer | Click "Choose file" and select your audio/video file |
| Direct URL | Audio/video file from any web URL | Enter the direct URL to the file |
| YouTube Video | Extract audio from YouTube video | YouTube video URL (e.g. youtube.com/watch?v=...) |
| Twitter/X Video | Extract audio from Twitter/X post | Twitter post URL with video |
| Google Drive | File shared via Google Drive | Public Google Drive file link |
| AWS S3 | File stored in S3 bucket | S3 URL or presigned URL |
Configuration Options
| Setting | Options | Description |
|---|---|---|
| Language | Auto-detect, English, Spanish, French, etc. | Language of the audio content |
| Timestamps | None, Segment-level, Word-level, Both | Detail level of timestamps |
| Speaker Diarization | On/Off | Identify different speakers |
| Min/Max Speakers | 1-20 | Expected number of speakers (if diarization enabled) |
| Response Format | JSON, Plain Text, SRT, VTT | Output format for results |
| Cost Limit | 1-10000 cents | Optional spending limit for safety |
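The settings in the table above can also be assembled programmatically. The following is a minimal sketch, assuming a JSON input whose keys mirror the form fields. Apart from maxCostCents, which this README mentions, every key name here (inputMethod, fileUrl, speakerDiarization, minSpeakers, maxSpeakers, responseFormat) is hypothetical, as are the lowercase option values, so check the Actor's input schema before relying on them.

```python
from typing import Any, Optional

# Option values below are illustrative spellings of the documented choices.
TIMESTAMP_LEVELS = {"none", "segment", "word", "both"}
RESPONSE_FORMATS = {"json", "text", "srt", "vtt"}


def build_run_input(
    file_url: str,
    language: Optional[str] = None,
    timestamps: str = "segment",
    diarization: bool = False,
    min_speakers: int = 1,
    max_speakers: int = 20,
    response_format: str = "json",
    max_cost_cents: Optional[int] = None,
) -> dict[str, Any]:
    """Validate settings against the documented ranges and build an input dict."""
    if timestamps not in TIMESTAMP_LEVELS:
        raise ValueError(f"timestamps must be one of {sorted(TIMESTAMP_LEVELS)}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")
    if max_cost_cents is not None and not 1 <= max_cost_cents <= 10000:
        raise ValueError("max_cost_cents must be between 1 and 10000")

    # Key names are assumptions, not the Actor's confirmed schema.
    run_input: dict[str, Any] = {
        "inputMethod": "url",
        "fileUrl": file_url,
        "timestamps": timestamps,
        "speakerDiarization": diarization,
        "responseFormat": response_format,
    }
    if language:
        # Leaving the key out stands in for "Auto-detect".
        run_input["language"] = language
    if diarization:
        if not 1 <= min_speakers <= max_speakers <= 20:
            raise ValueError("speaker counts must satisfy 1 <= min <= max <= 20")
        run_input["minSpeakers"] = min_speakers
        run_input["maxSpeakers"] = max_speakers
    if max_cost_cents is not None:
        run_input["maxCostCents"] = max_cost_cents
    return run_input
```

Validating locally catches out-of-range values (for example a cost limit of 0) before a run is started and billed.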
Output Structure
JSON Format (Default)
{
  "success": true,
  "result": {
    "transcription": "Hello, welcome to our podcast. Today we're discussing AI innovations.",
    "speakers": [
      {
        "speaker": "Speaker 1",
        "start": 0.0,
        "end": 3.2,
        "text": "Hello, welcome to our podcast."
      },
      {
        "speaker": "Speaker 2",
        "start": 3.5,
        "end": 7.8,
        "text": "Today we're discussing AI innovations."
      }
    ],
    "metadata": {
      "duration": 45.6,
      "wordCount": 124,
      "confidence": 0.94,
      "language": "en"
    }
  },
  "usage": {
    "costCents": 23,
    "processingTime": 12.5,
    "audioMinutes": 0.76
  }
}
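As an illustration of consuming this structure, here is a small sketch that turns the speakers array into readable transcript lines. It relies only on the field names shown in the sample above; the helper name format_transcript is made up for this example, and real output may carry additional or different fields.

```python
def format_transcript(output: dict) -> list[str]:
    """Render each speaker segment as '[start-end] Speaker: text'."""
    if not output.get("success"):
        raise ValueError("output does not describe a successful run")
    lines = []
    for segment in output["result"].get("speakers", []):
        start = segment["start"]
        end = segment["end"]
        lines.append(f"[{start:.1f}-{end:.1f}] {segment['speaker']}: {segment['text']}")
    return lines


# Trimmed copy of the sample output shown above.
sample_output = {
    "success": True,
    "result": {
        "transcription": "Hello, welcome to our podcast. Today we're discussing AI innovations.",
        "speakers": [
            {"speaker": "Speaker 1", "start": 0.0, "end": 3.2,
             "text": "Hello, welcome to our podcast."},
            {"speaker": "Speaker 2", "start": 3.5, "end": 7.8,
             "text": "Today we're discussing AI innovations."},
        ],
    },
}

for line in format_transcript(sample_output):
    print(line)
```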
SRT Format Example
1
00:00:00,000 --> 00:00:03,200
Hello, welcome to our podcast.

2
00:00:03,500 --> 00:00:07,800
Today we're discussing AI innovations.
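If you requested JSON but later need subtitles, the segment timings can be converted locally. This sketch assumes segments shaped like the speakers entries in the JSON sample (start and end in seconds, plus text); the function names are illustrative and not part of the Actor.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        timing = f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}"
        cues.append("\n".join([str(index), timing, segment["text"]]))
    return "\n\n".join(cues) + "\n"


segments = [
    {"start": 0.0, "end": 3.2, "text": "Hello, welcome to our podcast."},
    {"start": 3.5, "end": 7.8, "text": "Today we're discussing AI innovations."},
]
print(segments_to_srt(segments))
```

Working in integer milliseconds avoids rounding drift when splitting a float into hours, minutes, and seconds.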
Error Handling
{
  "success": false,
  "error": {
    "code": "AUDIO_TOO_LONG",
    "message": "Audio duration exceeds maximum limit",
    "details": {
      "maxDuration": 3600,
      "actualDuration": 4200
    }
  }
}
Getting Started
1. Basic Usage
1. Choose "Direct URL" as Input Method
2. Enter: https://example.com/meeting-recording.mp3
3. Enable Speaker Diarization: Yes
4. Click "Run"
2. File Upload Usage
1. Choose "Upload File" as Input Method
2. Click "Choose file" and select your audio
3. Set Language to "Auto-detect"
4. Choose Response Format: "JSON"
5. Click "Run"
3. Run the Actor
Execute the actor and retrieve your transcription from the dataset.
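For scripted runs, the same step can be sketched with the Apify Python client (the apify-client package). The helper below takes an already constructed client so it stays independent of credentials; the Actor ID and input keys in the commented usage are placeholders, not values confirmed by this README.

```python
from typing import Any


def run_and_fetch(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start the Actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage sketch (requires `pip install apify-client` and a valid API token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_and_fetch(
#       client,
#       "<username>/<actor-name>",   # placeholder: copy the ID from the Actor page
#       {"fileUrl": "https://example.com/meeting-recording.mp3"},  # hypothetical key
#   )
#   print(items[0])
```

Passing the client in as a parameter also makes the helper easy to exercise with a stub in tests, without network access.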
Pro Tips
Optimize Audio Quality
- Clear Audio: Use high-quality recordings with minimal background noise
- Proper Levels: Ensure consistent audio levels across speakers
- Format: MP3, WAV, and FLAC formats work best
Speaker Diarization Best Practices
- Enable for multi-speaker content (meetings, interviews, podcasts)
- Works best with 2-10 distinct speakers
- Clearer speaker separation improves accuracy
Cost Management
- Set maxCostCents to control expenses
- Use the cost estimation feature before processing large files
- Consider audio length: usage is billed per minute of audio
YouTube Processing
- Supports both public and unlisted videos
- Automatically extracts best quality audio
- Handles various video lengths efficiently
Pricing
This actor uses Sonartext's professional STT service with transparent, usage-based pricing:
- Event-Based Pricing: $0.01 base fee per Actor start
- Usage Fees: $0.004 per minute of audio processed
- Volume Discounts: Available for high-usage customers
- Cost Control: Set maximum spending limits per job
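The two list prices above combine into a simple estimate. This is a rough sketch only: it assumes fractional minutes are billed proportionally, with no rounding up and no volume discount, which this README does not confirm. The function names are illustrative.

```python
from decimal import Decimal

START_FEE_USD = Decimal("0.01")    # base fee per Actor start
PER_MINUTE_USD = Decimal("0.004")  # fee per minute of audio processed


def estimate_cost_usd(audio_minutes: float, runs: int = 1) -> Decimal:
    """Estimate the total in USD for `runs` starts and `audio_minutes` of audio."""
    return START_FEE_USD * runs + PER_MINUTE_USD * Decimal(str(audio_minutes))


def within_cost_limit(audio_minutes: float, max_cost_cents: int) -> bool:
    """Check whether a single run's estimate fits under a limit given in cents."""
    return estimate_cost_usd(audio_minutes) * 100 <= max_cost_cents


print(estimate_cost_usd(60))      # one run over a 60-minute recording
print(within_cost_limit(60, 25))
```

Under these assumptions, a single 60-minute recording comes to $0.01 + 60 × $0.004 = $0.25.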
Security & Privacy
- API Security: All requests use secure HTTPS with API key authentication
- Data Privacy: Audio files are processed securely and not stored permanently
- Compliance: Built for enterprise use with privacy-first architecture
- Rate Limiting: Respectful API usage with built-in rate limiting
Legal Considerations
- Content Rights: Ensure you have proper rights to transcribe the audio content
- Privacy Compliance: Follow applicable privacy laws (GDPR, CCPA, etc.) when processing personal data
- Platform Terms: Respect YouTube's Terms of Service and other platform policies
- Copyright: Be mindful of copyrighted content when processing media files
Advanced Features
Batch Processing
Process multiple files efficiently by chaining multiple actor runs or using the batch processing capabilities.
Integration Options
- APIs: Direct REST API integration available
- Webhooks: Real-time processing notifications
- Zapier: No-code workflow integration
- Custom Solutions: Enterprise integrations available
Supported Audio Formats
- Audio: MP3, WAV, FLAC, AAC, OGG, M4A
- Video: MP4, AVI, MOV, MKV (audio extracted)
- Containers: Support for most common audio/video containers
Support & Documentation
Help & Support
- Documentation: Full API Documentation
- Support: Contact support through the Apify platform
- Community: Join our Discord for community support
- Enterprise: Dedicated support for business customers
Troubleshooting
- Rate Limits: Wait and retry if you hit API limits
- Large Files: Consider breaking very large files into segments
- Quality Issues: Check audio quality and format compatibility
- Costs: Monitor usage through the actor's cost reporting features
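The "wait and retry" advice for rate limits is commonly implemented as exponential backoff with jitter. The sketch below is generic: RateLimitError is a stand-in exception defined here, since this README does not say how the service signals a rate limit, so map it to whatever error your client actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for whatever error signals a rate limit in your client."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on RateLimitError with growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be tested without real waiting.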
Updates & Changelog
This actor is actively maintained with regular updates for:
- New AI model versions
- Additional output formats
- Performance improvements
- Enhanced error handling
Actor Metrics
- Response Time: < 30 seconds for typical audio files
- Accuracy: 95%+ for clear audio with proper formatting
- Supported Languages: 50+
- File Size Limit: Up to 500MB per file
- Duration Limit: Up to 6 hours per audio file
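The size and duration limits above, together with the supported formats listed earlier, can be checked before a file is submitted. This is a minimal local sketch: it assumes "500MB" means decimal megabytes (the stricter reading) and that the format is identified by file extension, neither of which this README confirms.

```python
from pathlib import PurePosixPath

MAX_FILE_BYTES = 500 * 1000 * 1000   # assumption: 500MB read as decimal megabytes
MAX_DURATION_SECONDS = 6 * 60 * 60   # 6 hours
SUPPORTED_EXTENSIONS = {
    "mp3", "wav", "flac", "aac", "ogg", "m4a",  # audio
    "mp4", "avi", "mov", "mkv",                 # video (audio extracted)
}


def preflight_problems(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return human-readable problems; an empty list means the file looks acceptable."""
    problems = []
    extension = PurePosixPath(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: '{extension}'")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"audio is {duration_seconds} s; limit is {MAX_DURATION_SECONDS}")
    return problems


print(preflight_problems("meeting-recording.mp3", 42_000_000, 1800))
```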
Developed by: Sonartext
Version: 1.0.0
Last Updated: September 2025
License: MIT
For technical issues or feature requests, please use the Apify platform's support system or visit our documentation.
