Instagram Youtube Transcripts With Speaker Labels

Pricing: Pay per usage

Generate transcripts with speaker diarization from Instagram Reels & YouTube videos. Automatically identifies speakers, outputs SRT/VTT subtitles, timestamps & full text. Perfect for podcasts, interviews & meetings. Bulk processing supported.

Developer: Transcript Downloader (maintained by Community)


🎙️ Transcript Downloader - Transcripts with Speaker Labels

Generate transcripts with automatic speaker diarization (speaker labels) from previously downloaded Instagram or YouTube audio using the Transcript Downloader API. Perfect for interviews, podcasts, meetings, and multi-speaker content.


📚 API Documentation

For complete API reference, endpoint details, and advanced usage examples, visit our official documentation:

Transcript Downloader API Documentation

Get Your API Key • API Pricing


⚠️ Prerequisites

This actor requires a transcript_speaker_id from a previously downloaded audio file.

You must first use one of these actors to download audio and obtain the ID:

  • Instagram Audio Scraper - For Instagram reels and posts
  • YouTube Audio Scraper - For YouTube videos

The transcript_speaker_id is included in the audio download response.


✨ Features

  • 🎯 Speaker diarization - Automatically identifies and labels different speakers
  • 📝 Multiple output formats - Full JSON, plain text, SRT, or VTT subtitles
  • ⏱️ Timestamps included - Each segment includes start time and duration
  • 🌐 Language detection - Automatically detects the spoken language
  • 📊 Speaker count - Reports the number of unique speakers detected
  • 🔄 Bulk processing - Process multiple transcripts in a single run
  • 💾 Optional file storage - Save SRT/VTT files to Apify key-value store
  • 🕒 Polling logic with automatic retries
  • 🧠 Progress tracking and run logs
  • 🔐 Secure API token-based authentication

📧 Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| transcriptSpeakerIds | array | ✅ Yes | - | List of transcript_speaker_id values from audio download responses |
| apiToken | string | ✅ Yes | - | Bearer token for the Transcript Downloader API |
| outputFormat | string | No | full | Output format: full, text_only, srt, or vtt |
| maxWaitTime | number | No | 10 | Maximum time to wait for transcription, in minutes (range: 1–15) |
| pollingInterval | number | No | 30 | Interval between status polls, in seconds (range: 30–300) |

📥 Example Input

{
  "transcriptSpeakerIds": [
    "01KB21QX05P6B4JA7FJHTM7AWE",
    "01KB22YZ06Q7C5KB8GLIUN8BWF"
  ],
  "apiToken": "your-api-token",
  "outputFormat": "full",
  "maxWaitTime": 10,
  "pollingInterval": 30
}
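
The documented ranges can be checked locally before a run is started. The following is a minimal sketch that relies only on the parameter list above; `build_run_input` is a hypothetical helper name and is not part of the actor or the Transcript Downloader API.

```python
from typing import Any

# Values taken from the outputFormat row of the input parameters.
ALLOWED_FORMATS = {"full", "text_only", "srt", "vtt"}


def build_run_input(
    transcript_speaker_ids: list[str],
    api_token: str,
    output_format: str = "full",
    max_wait_time: int = 10,
    polling_interval: int = 30,
) -> dict[str, Any]:
    """Return an actor input dict, enforcing the documented ranges."""
    if not transcript_speaker_ids:
        raise ValueError("transcriptSpeakerIds must contain at least one ID")
    if not api_token:
        raise ValueError("apiToken is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_wait_time <= 15:
        raise ValueError("maxWaitTime must be between 1 and 15 minutes")
    if not 30 <= polling_interval <= 300:
        raise ValueError("pollingInterval must be between 30 and 300 seconds")
    return {
        "transcriptSpeakerIds": list(transcript_speaker_ids),
        "apiToken": api_token,
        "outputFormat": output_format,
        "maxWaitTime": max_wait_time,
        "pollingInterval": polling_interval,
    }
```

Calling `build_run_input(["01KB21QX05P6B4JA7FJHTM7AWE"], "your-api-token")` yields a dict with the documented defaults filled in.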

📤 Output Format

Each transcript_speaker_id generates an output record with metadata and processing info:

Full JSON (default)

Complete transcript with all metadata:

{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "success",
  "mediaId": "ABC123xyz",
  "language": "en",
  "duration": 30.0,
  "speakerCount": 2,
  "cost": "0.030",
  "format": "full",
  "segments": [
    {
      "text": "Hello everyone, welcome to the show.",
      "start": 0.0,
      "duration": 2.5,
      "speaker": "Speaker 1"
    },
    {
      "text": "Thanks for having me.",
      "start": 2.5,
      "duration": 1.8,
      "speaker": "Speaker 2"
    }
  ],
  "fullText": "Speaker 1: Hello everyone, welcome to the show.\nSpeaker 2: Thanks for having me."
}
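
Because every segment carries a speaker label and a duration, per-speaker talk time can be derived from a full record. This is a small sketch that assumes only the field names shown in the example above; `speaking_time_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict


def speaking_time_by_speaker(record: dict) -> dict[str, float]:
    """Sum segment durations (in seconds) per speaker label."""
    totals: dict[str, float] = defaultdict(float)
    for segment in record.get("segments", []):
        totals[segment["speaker"]] += segment["duration"]
    return dict(totals)


# Segments copied from the example record.
record = {
    "segments": [
        {"text": "Hello everyone, welcome to the show.", "start": 0.0,
         "duration": 2.5, "speaker": "Speaker 1"},
        {"text": "Thanks for having me.", "start": 2.5,
         "duration": 1.8, "speaker": "Speaker 2"},
    ]
}
print(speaking_time_by_speaker(record))
```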

Plain Text (text_only)

Readable transcript grouped by speaker:

{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "success",
  "format": "text_only",
  "content": "Speaker 1: Hello everyone, welcome to the show.\n\nSpeaker 2: Thanks for having me. It's great to be here."
}

SRT Format (srt)

Standard subtitle format with speaker labels:

1
00:00:00,000 --> 00:00:02,500
[Speaker 1] Hello everyone, welcome to the show.

2
00:00:02,500 --> 00:00:04,300
[Speaker 2] Thanks for having me.
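
The SRT cue times follow from each segment's start and duration. The sketch below shows one way to reproduce the sample from the segments of a full record; it illustrates the mapping and is not the actor's own converter, and both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        # The cue ends at start + duration.
        end = srt_timestamp(seg["start"] + seg["duration"])
        cues.append(f"{index}\n{start} --> {end}\n[{seg['speaker']}] {seg['text']}")
    return "\n\n".join(cues) + "\n"
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in sums such as 2.5 + 1.8.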

VTT Format (vtt)

WebVTT subtitle format with voice tags:

WEBVTT

1
00:00:00.000 --> 00:00:02.500
<v Speaker 1>Hello everyone, welcome to the show.

2
00:00:02.500 --> 00:00:04.300
<v Speaker 2>Thanks for having me.

📊 Special Response Types

No Speech Detected

When audio contains no recognizable speech:

{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "no_speech",
  "message": "No speech detected in audio",
  "mediaId": "ABC123xyz",
  "duration": 0,
  "cost": "0.030"
}

Failed Response

{
  "transcriptSpeakerId": "01KB21QX05P6B4JA7FJHTM7AWE",
  "status": "failed",
  "error": "Invalid transcript_speaker_id or audio file not found"
}
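
Code that consumes the dataset needs to branch on status before reading any text. The sketch below covers only the record shapes documented here (full, text_only, no_speech, and failed); the record layout for srt and vtt runs is not shown in this README, so those fall through to an error. `transcript_text` is a hypothetical helper.

```python
def transcript_text(record: dict) -> str:
    """Return the transcript text of a record, or "" when no speech was found.

    Raises RuntimeError for failed records and ValueError for unknown shapes.
    """
    status = record.get("status")
    if status == "failed":
        raise RuntimeError(record.get("error", "unknown error"))
    if status == "no_speech":
        return ""
    if status == "success":
        # "full" records carry fullText; "text_only" records carry content.
        if "fullText" in record:
            return record["fullText"]
        if "content" in record:
            return record["content"]
    raise ValueError(f"unrecognized record: status={status!r}")
```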

🚀 How to Use

  1. Get your API token from Transcript Downloader
  2. Run the Instagram Audio Scraper or YouTube Audio Scraper actor first
  3. Copy the transcript_speaker_id from the audio download response
  4. Add the ID(s) to this actor's input
  5. Run the actor and access results in the dataset or key-value store

Example Workflow

Step 1: Run Instagram Audio Scraper
↓
Response includes: "transcript_speaker_id": "01KB21QX05P6B4JA7FJHTM7AWE"
↓
Step 2: Run this actor with that ID
↓
Get transcript with speaker labels
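
When several audio downloads feed one run, the IDs can be gathered programmatically. This sketch assumes each audio scraper result is a dict exposing `transcript_speaker_id` as a top-level key, as in the response line above; the nesting in real responses may differ, and `collect_speaker_ids` is a hypothetical helper.

```python
def collect_speaker_ids(audio_items: list[dict]) -> list[str]:
    """Collect unique transcript_speaker_id values, preserving first-seen order."""
    seen: set[str] = set()
    ids: list[str] = []
    for item in audio_items:
        speaker_id = item.get("transcript_speaker_id")
        # Skip items without an ID (for example, failed downloads) and duplicates.
        if speaker_id and speaker_id not in seen:
            seen.add(speaker_id)
            ids.append(speaker_id)
    return ids
```

The returned list can be used directly as the `transcriptSpeakerIds` input value.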

❌ Error Handling

The actor handles the following API status codes:

| Status Code | Description |
| --- | --- |
| 400 | Audio processing failed; verify the audio was downloaded successfully |
| 401 | Insufficient credits or invalid token; check credits and API token |
| 403 | Invalid API key; check or regenerate the key |
| 404 | Invalid ID or audio file not found; verify transcript_speaker_id |
| 429 | Too many requests; reduce polling frequency |
| 503 | Transcript Downloader API under maintenance |

Failed items are captured in the dataset with detailed error information.
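
A client calling the API directly could use the same codes to decide whether a retry is worthwhile. In the sketch below, treating 429 and 503 as retryable is an assumption made for illustration; the README does not state which codes the actor itself retries. `describe_status` is a hypothetical helper.

```python
# Assumption: throttling (429) and maintenance (503) are transient; the rest are not.
RETRYABLE = {429, 503}

STATUS_HINTS = {
    400: "Audio processing failed; verify the audio was downloaded successfully",
    401: "Insufficient credits or invalid token; check credits and API token",
    403: "Invalid API key; check or regenerate the key",
    404: "Invalid ID or audio file not found; verify transcript_speaker_id",
    429: "Too many requests; reduce polling frequency",
    503: "Transcript Downloader API under maintenance",
}


def describe_status(code: int) -> tuple[bool, str]:
    """Return (should_retry, hint) for an HTTP status code."""
    hint = STATUS_HINTS.get(code, f"Unexpected status code {code}")
    return code in RETRYABLE, hint
```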


⚠️ Rate Limiting

  • 🔄 Max 75 requests per minute
  • ⏱️ Keep the polling interval at 30 seconds or more to avoid throttling
  • 📊 Default polling interval of 30 seconds is recommended
⏱️ Processing Time & Performance

  • 📊 Estimated processing time per transcript:
    • Short audio (< 1 minute): ~30-60 seconds
    • Medium audio (1-5 minutes): ~1-3 minutes
    • Long audio (5-15 minutes): ~3-8 minutes
    • Very long audio (15+ minutes): ~8-15 minutes
  • 🔄 Batch processing: Transcripts are processed sequentially, polling every 30 seconds by default
  • ⚡ First-time vs cached: First transcription takes longer; subsequent requests may be faster if cached


💡 Best Practices

  • ✅ Ensure audio download is complete before requesting transcript
  • ⏳ Use appropriate polling intervals (30s recommended)
  • 🔐 Keep your apiToken secret (never log it)
  • 📊 Monitor for no_speech status on music-only content
  • 🎯 Use srt or vtt format for video subtitles
  • 📝 Use text_only for readable documents
  • 🧠 Monitor output for incomplete or failed transcriptions
  • 🗂️ SRT/VTT files are automatically saved to key-value store

💰 Pricing & Billing

The Transcript Downloader API used by this actor requires a valid API token. API usage is billed separately:

  • Transcription with speaker labels: ~$0.03 per transcript
  • Cost displayed: Exact cost shown in each response

📊 Very cost-effective for speaker-labeled transcription. View full details and subscription plans on our pricing page.
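
Since each response reports its cost as a string, the API spend for a run can be totaled from the dataset. This is a minimal sketch, assuming the `cost` field appears as in the examples in this README (present on success and no_speech records, absent on failed ones); `total_cost` is a hypothetical helper.

```python
from decimal import Decimal


def total_cost(records: list[dict]) -> Decimal:
    """Sum the "cost" strings reported in dataset records."""
    # Records without a cost field (such as the failed example) are skipped.
    return sum((Decimal(r["cost"]) for r in records if "cost" in r), Decimal("0"))
```

`Decimal` is used instead of `float` so that amounts such as 0.030 add up without binary rounding error.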


🎯 Use Cases

  • 🎙️ Podcast transcription - Multi-host shows with speaker identification
  • 📹 Interview processing - Separate interviewer and interviewee
  • 📋 Meeting notes - Identify who said what
  • 📺 Video subtitles - Generate SRT/VTT files with speaker labels
  • 📊 Content analysis - Analyze speaking patterns and participation
  • ♿ Accessibility - Create accessible transcripts for people who are deaf or hard of hearing
  • 📝 Content repurposing - Convert audio content to written format
  • 🔍 Research - Analyze conversations and dialogues

🔄 Integration with Other Actors

This actor works with the Transcript Downloader suite:

  1. Instagram Audio Scraper → Download audio, get transcript_speaker_id
  2. YouTube Audio Scraper → Download audio, get transcript_speaker_id
  3. Transcripts with Speaker Labels (this actor) → Generate diarized transcript

Complete Workflow:

Instagram/YouTube URL → Audio Scraper → transcript_speaker_id → This Actor → Transcript with Speakers

📈 Monitoring & Analytics

Track performance and usage with Apify tools:

  • Run history
  • Success/failure rates
  • Storage and resource usage
  • Output file availability

Example completion log:

Transcript with Speaker Labels Actor completed {
  totalProcessed: 10,
  successful: 8,
  noSpeech: 1,
  failed: 1,
  successRate: '80.0%'
}
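
The same figures can be recomputed from the dataset records. The sketch below assumes the success rate counts only `success` records, which matches the log above (8 of 10 is 80.0%, with the no_speech item not counted as a success); `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(records: list[dict]) -> dict:
    """Count records by status and compute the success rate."""
    counts = Counter(r.get("status") for r in records)
    total = len(records)
    rate = (counts["success"] / total * 100) if total else 0.0
    return {
        "totalProcessed": total,
        "successful": counts["success"],
        "noSpeech": counts["no_speech"],
        "failed": counts["failed"],
        "successRate": f"{rate:.1f}%",
    }
```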

🙋 Support

Need help? Visit Transcript Downloader Support. We respond within 24 business hours.

For technical issues with this actor, check the run logs for detailed error messages.


📄 License

This actor is provided under the ISC License.


Made with ❤️ by Transcript Downloader | Website | API Dashboard