Instagram Youtube Transcripts With Speaker Labels
Pricing
Pay per usage
Generate transcripts with speaker diarization from Instagram Reels & YouTube videos. Automatically identifies speakers, outputs SRT/VTT subtitles, timestamps & full text. Perfect for podcasts, interviews & meetings. Bulk processing supported.
Developer: Transcript Downloader
Transcript Downloader - Transcripts with Speaker Labels
Generate transcripts with automatic speaker diarization (speaker labels) from previously downloaded Instagram or YouTube audio using the Transcript Downloader API. Perfect for interviews, podcasts, meetings, and multi-speaker content.
API Documentation
For complete API reference, endpoint details, and advanced usage examples, visit our official documentation:
Transcript Downloader API Documentation
Get Your API Key • API Pricing
Prerequisites
This actor requires a transcript_speaker_id from a previously downloaded audio file.
You must first use one of these actors to download audio and obtain the ID:
- Instagram Audio Scraper - For Instagram reels and posts
- YouTube Audio Scraper - For YouTube videos
The transcript_speaker_id is included in the audio download response.
Features
- Speaker diarization - Automatically identifies and labels different speakers
- Multiple output formats - Full JSON, plain text, SRT, or VTT subtitles
- Timestamps included - Each segment includes start time and duration
- Language detection - Automatically detects the spoken language
- Speaker count - Reports the number of unique speakers detected
- Bulk processing - Process multiple transcripts in a single run
- Optional file storage - Save SRT/VTT files to Apify key-value store
- Polling logic with automatic retries
- Progress tracking and run logs
- Secure API token-based authentication
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| transcriptSpeakerIds | array | Yes | - | List of transcript_speaker_id values from audio download responses |
| apiToken | string | Yes | - | Bearer token for Transcript Downloader API |
| outputFormat | string | No | full | Output format: full, text_only, srt, or vtt |
| maxWaitTime | number | No | 10 | Maximum time to wait for transcription, in minutes (range: 1-15) |
| pollingInterval | number | No | 30 | Interval between status polls, in seconds (range: 30-300) |
Example Input
{
  "transcriptSpeakerIds": [
    "01KB21QX05P6B4JA7FJHTM7AWE",
    "01KB22YZ06Q7C5KB8GLIUN8BWF"
  ],
  "apiToken": "your-api-token",
  "outputFormat": "full",
  "maxWaitTime": 10,
  "pollingInterval": 30
}
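The constraints in the parameter table can be checked before a run is started. The following is a minimal Python sketch, not part of the actor itself: the function name `build_actor_input` is hypothetical, and the validation rules are taken only from the types, defaults, and ranges listed in the table above.

```python
from typing import Any

# Output formats listed in the Input Parameters table.
ALLOWED_FORMATS = {"full", "text_only", "srt", "vtt"}


def build_actor_input(
    transcript_speaker_ids: list[str],
    api_token: str,
    output_format: str = "full",
    max_wait_time: int = 10,
    polling_interval: int = 30,
) -> dict[str, Any]:
    """Build an input dict for the actor (hypothetical helper).

    Raises ValueError when a value falls outside the documented ranges.
    """
    if not transcript_speaker_ids:
        raise ValueError("transcriptSpeakerIds must contain at least one ID")
    if not api_token:
        raise ValueError("apiToken is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_wait_time <= 15:
        raise ValueError("maxWaitTime must be between 1 and 15 minutes")
    if not 30 <= polling_interval <= 300:
        raise ValueError("pollingInterval must be between 30 and 300 seconds")
    return {
        "transcriptSpeakerIds": list(transcript_speaker_ids),
        "apiToken": api_token,
        "outputFormat": output_format,
        "maxWaitTime": max_wait_time,
        "pollingInterval": polling_interval,
    }


actor_input = build_actor_input(["01KB21QX05P6B4JA7FJHTM7AWE"], "your-api-token")
```

Validating locally like this avoids spending a run on input the actor would presumably reject anyway.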
Output Format
Each transcript_speaker_id generates an output record with metadata and processing info:
Full JSON (default)
Complete transcript with all metadata:
{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "success",
  "mediaId": "ABC123xyz",
  "language": "en",
  "duration": 30.0,
  "speakerCount": 2,
  "cost": "0.030",
  "format": "full",
  "segments": [
    {"text": "Hello everyone, welcome to the show.", "start": 0.0, "duration": 2.5, "speaker": "Speaker 1"},
    {"text": "Thanks for having me.", "start": 2.5, "duration": 1.8, "speaker": "Speaker 2"}
  ],
  "fullText": "Speaker 1: Hello everyone, welcome to the show.\nSpeaker 2: Thanks for having me."
}
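Because every segment carries a `speaker` label and a `duration`, the full JSON record lends itself to simple post-processing. As an illustration, the sketch below totals speaking time per speaker. The helper name `speaking_time_by_speaker` is hypothetical, and the sample record is trimmed from the example above.

```python
from collections import defaultdict


def speaking_time_by_speaker(record: dict) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals: dict[str, float] = defaultdict(float)
    for segment in record.get("segments", []):
        totals[segment["speaker"]] += segment["duration"]
    return dict(totals)


# Sample record trimmed from the documented "full" output.
sample_record = {
    "status": "success",
    "segments": [
        {"text": "Hello everyone, welcome to the show.",
         "start": 0.0, "duration": 2.5, "speaker": "Speaker 1"},
        {"text": "Thanks for having me.",
         "start": 2.5, "duration": 1.8, "speaker": "Speaker 2"},
    ],
}

talk_time = speaking_time_by_speaker(sample_record)
```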
Plain Text (text_only)
Readable transcript grouped by speaker:
{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "success",
  "format": "text_only",
  "content": "Speaker 1: Hello everyone, welcome to the show.\n\nSpeaker 2: Thanks for having me. It's great to be here."
}
SRT Format (srt)
Standard subtitle format with speaker labels:
1
00:00:00,000 --> 00:00:02,500
[Speaker 1] Hello everyone, welcome to the show.

2
00:00:02,500 --> 00:00:04,300
[Speaker 2] Thanks for having me.
VTT Format (vtt)
WebVTT subtitle format with voice tags:
WEBVTT

1
00:00:00.000 --> 00:00:02.500
<v Speaker 1>Hello everyone, welcome to the show.

2
00:00:02.500 --> 00:00:04.300
<v Speaker 2>Thanks for having me.
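The actor produces SRT and VTT output itself, but it can help to see how the `start` and `duration` values of the full JSON segments relate to subtitle cue timings. The following Python sketch is an independent illustration under that assumption, not the actor's own code. The function names are hypothetical, and the cue layout mirrors the examples above.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues with [Speaker N] prefixes."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["start"] + seg["duration"], ",")
        cues.append(f"{index}\n{start} --> {end}\n[{seg['speaker']}] {seg['text']}")
    return "\n\n".join(cues) + "\n"


def segments_to_vtt(segments: list[dict]) -> str:
    """Render segments as WebVTT cues using <v Speaker> voice tags."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["start"] + seg["duration"], ".")
        cues.append(f"{index}\n{start} --> {end}\n<v {seg['speaker']}>{seg['text']}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


segments = [
    {"text": "Hello everyone, welcome to the show.",
     "start": 0.0, "duration": 2.5, "speaker": "Speaker 1"},
    {"text": "Thanks for having me.",
     "start": 2.5, "duration": 1.8, "speaker": "Speaker 2"},
]
srt_text = segments_to_srt(segments)
vtt_text = segments_to_vtt(segments)
```

Each cue ends at `start + duration`, which is why the second cue in the examples runs from 2.5 s to 4.3 s.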
Special Response Types
No Speech Detected
When audio contains no recognizable speech:
{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "no_speech",
  "message": "No speech detected in audio",
  "mediaId": "ABC123xyz",
  "duration": 0,
  "cost": "0.030"
}
Failed Response
{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "failed",
  "error": "Invalid transcript_speaker_id or audio file not found"
}
How to Use
- Get your API token from Transcript Downloader
- Run the Instagram Audio Scraper or YouTube Audio Scraper actor first
- Copy the transcript_speaker_id from the audio download response
- Add the ID(s) to this actor's input
- Run the actor and access results in the dataset or key-value store
Example Workflow
Step 1: Run Instagram Audio Scraper
↓
Response includes: "transcript_speaker_id": "01KB21QX05P6B4JA7FJHTM7AWE"
↓
Step 2: Run this actor with that ID
↓
Get transcript with speaker labels
Error Handling
This actor handles the following API error responses:
| Status Code | Description |
|---|---|
| 400 | Audio processing failed; verify audio was downloaded successfully |
| 401 | Insufficient credits or invalid token; check credits and API token |
| 403 | Invalid API key; check or regenerate key |
| 404 | Invalid ID or audio file not found; verify transcript_speaker_id |
| 429 | Too many requests; reduce polling frequency |
| 503 | Transcript Downloader API under maintenance |
Failed items are captured in the dataset with detailed error information.
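When post-processing failures outside the actor, the status-code table can be carried around as a small lookup. The sketch below is illustrative only. The hint text comes from the table above, but the choice of which codes count as retryable (429 and 503) is an assumption of this example, not documented behavior, and `describe_error` is a hypothetical name.

```python
# Hints copied from the status-code table.
ERROR_HINTS = {
    400: "Audio processing failed; verify audio was downloaded successfully",
    401: "Insufficient credits or invalid token; check credits and API token",
    403: "Invalid API key; check or regenerate key",
    404: "Invalid ID or audio file not found; verify transcript_speaker_id",
    429: "Too many requests; reduce polling frequency",
    503: "Transcript Downloader API under maintenance",
}

# Assumption: throttling and maintenance are transient, so a later retry may succeed.
RETRYABLE_CODES = {429, 503}


def describe_error(status_code: int) -> tuple[str, bool]:
    """Return (hint, is_retryable) for an HTTP status code."""
    hint = ERROR_HINTS.get(status_code, "Unexpected status code; check the run logs")
    return hint, status_code in RETRYABLE_CODES


hint, retryable = describe_error(429)
```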
Rate Limiting
- Max 75 requests per minute
- Keep the polling interval at 30 seconds or more to avoid throttling
- Default polling interval of 30 seconds is recommended
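To see how `maxWaitTime` and `pollingInterval` interact with the rate limit, consider a generic polling loop. This is a hedged sketch of the general technique, not the actor's implementation. The `check_status` callable and the `"processing"` and `"timeout"` status strings are hypothetical; only `success`, `no_speech`, and `failed` appear in the documented output.

```python
import time
from typing import Callable


def poll_until_done(
    check_status: Callable[[], str],
    max_wait_minutes: int = 10,
    polling_interval_seconds: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check_status until it stops returning 'processing' or time runs out."""
    max_polls = (max_wait_minutes * 60) // polling_interval_seconds
    for attempt in range(max_polls):
        status = check_status()
        if status != "processing":  # hypothetical "still working" status
            return status
        if attempt < max_polls - 1:
            sleep(polling_interval_seconds)
    return "timeout"  # hypothetical marker for an exhausted wait budget


# Simulated run: two pending checks, then success. No real waiting happens
# because a recording stub replaces time.sleep.
responses = iter(["processing", "processing", "success"])
waits: list[float] = []
result = poll_until_done(lambda: next(responses), sleep=waits.append)
```

With the defaults (10 minutes, 30 seconds), such a loop makes at most 20 status checks per transcript, roughly 2 per minute, which stays well under the 75 requests per minute limit when transcripts are processed one at a time.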
Processing Time & Performance
- Estimated processing time per transcript:
  - Short audio (< 1 minute): ~30-60 seconds
  - Medium audio (1-5 minutes): ~1-3 minutes
  - Long audio (5-15 minutes): ~3-8 minutes
  - Very long audio (15+ minutes): ~8-15 minutes
- Batch processing: Sequential processing with 30s polling interval
- First-time vs cached: First transcription takes longer; subsequent requests may be faster if cached
Best Practices
- Ensure audio download is complete before requesting transcript
- Use appropriate polling intervals (30s recommended)
- Keep your apiToken secret (never log it)
- Monitor for no_speech status on music-only content
- Use srt or vtt format for video subtitles
- Use text_only for readable documents
- Monitor output for incomplete or failed transcriptions
- SRT/VTT files are automatically saved to key-value store
Pricing & Billing
The Transcript Downloader API used by this actor requires a valid API token. API usage is billed separately:
- Transcription with speaker labels: ~$0.03 per transcript
- Cost displayed: Exact cost shown in each response
Very cost-effective for speaker-labeled transcription. View full details and subscription plans on our pricing page.
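Since each response reports its own `cost` as a string such as "0.030", the spend for a batch can be tallied from the output records. The following is a small sketch assuming records shaped like the examples in this README. The helper name `total_cost` is hypothetical, and `Decimal` is used so that currency values are not subject to float rounding.

```python
from decimal import Decimal


def total_cost(records: list[dict]) -> Decimal:
    """Sum the 'cost' field over records that report one."""
    return sum(
        (Decimal(record["cost"]) for record in records if "cost" in record),
        Decimal("0"),
    )


records = [
    {"status": "success", "cost": "0.030"},
    {"status": "no_speech", "cost": "0.030"},
    {"status": "failed", "error": "Invalid transcript_speaker_id or audio file not found"},
]
batch_cost = total_cost(records)
```

Records without a `cost` field, like the failed example, are simply skipped.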
Use Cases
- Podcast transcription - Multi-host shows with speaker identification
- Interview processing - Separate interviewer and interviewee
- Meeting notes - Identify who said what
- Video subtitles - Generate SRT/VTT files with speaker labels
- Content analysis - Analyze speaking patterns and participation
- Accessibility - Create accessible transcripts for hearing-impaired audiences
- Content repurposing - Convert audio content to written format
- Research - Analyze conversations and dialogues
Integration with Other Actors
This actor works with the Transcript Downloader suite:
- Instagram Audio Scraper → Download audio, get transcript_speaker_id
- YouTube Audio Scraper → Download audio, get transcript_speaker_id
- Transcripts with Speaker Labels (this actor) → Generate diarized transcript
Complete Workflow:
Instagram/YouTube URL → Audio Scraper → transcript_speaker_id → This Actor → Transcript with Speakers
Monitoring & Analytics
Track performance and usage with Apify tools:
- Run history
- Success/failure rates
- Storage and resource usage
- Output file availability
Example completion log:
Transcript with Speaker Labels Actor completed {
  totalProcessed: 10,
  successful: 8,
  noSpeech: 1,
  failed: 1,
  successRate: '80.0%'
}
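The same summary can be recomputed from the dataset records by counting their `status` values. The sketch below assumes the success rate is successful records divided by total records, which matches the 8 of 10 in the example log but is not confirmed elsewhere in this README. The function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Count records by status and derive a success rate string."""
    counts = Counter(record.get("status") for record in records)
    total = len(records)
    successful = counts["success"]
    # Assumption: success rate = successful / total, formatted to one decimal.
    rate = f"{successful * 100 / total:.1f}%" if total else "0.0%"
    return {
        "totalProcessed": total,
        "successful": successful,
        "noSpeech": counts["no_speech"],
        "failed": counts["failed"],
        "successRate": rate,
    }


records = (
    [{"status": "success"}] * 8
    + [{"status": "no_speech"}]
    + [{"status": "failed"}]
)
summary = summarize(records)
```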
Support
Need help? Visit Transcript Downloader Support. We respond within 24 business hours.
For technical issues with this actor, check the run logs for detailed error messages.
License
This actor is provided under the ISC License.
Made with ❤️ by Transcript Downloader | Website | API Dashboard