
YouTube Transcript Extractor Pro
Pricing
$10.00/month + usage

Transform YouTube videos into searchable, accessible text with YouTube Transcript Extractor Pro. The actor routes requests through Apify residential proxies and retries with proxy rotation, so transcript extraction is designed to keep working even when YouTube blocks cloud IP addresses.
YouTube Video Transcriber v2 - Apify Actor
Extract the transcript of a single YouTube video, with support for multiple languages, output formats, and automatic translation. This Apify actor works around YouTube's IP blocking by routing requests through residential proxies.
Features
Core Functionality
- Single Video Processing: Accepts a single YouTube URL or video ID
- Language Support: Retrieve transcripts in preferred languages with priority ordering
- Format Options: Export transcripts in JSON, plain text, SRT, or WebVTT formats
- Manual vs Generated: Choose between manually created or auto-generated subtitles
Advanced Capabilities
- IP Blocking Protection: Uses Apify residential proxies to bypass YouTube's IP restrictions
- Smart Retry Logic: Automatic retries with proxy rotation and increasing delays with random jitter
- Session Management: A fresh proxy session for each attempt to improve success rates
- Robust Metadata Extraction: Multi-strategy extraction of video metadata (upload date, views, etc.)
- Error Handling: Comprehensive error reporting and graceful failure handling
Output Options
- JSON: Structured data with timestamps and metadata
- Plain Text: Clean text format for easy reading
- SRT: Standard subtitle format for video players
- WebVTT: Web-compatible subtitle format
- Formatting Preservation: Option to keep HTML formatting (bold, italics)
Input Configuration
Required Parameters
- Video URL or ID: Single YouTube video to process
  - Supports: https://youtu.be/VIDEO_ID, https://www.youtube.com/watch?v=VIDEO_ID, or just VIDEO_ID
Optional Parameters
- Languages: Priority-ordered list of language codes (default: ["en"])
- Output Format: Choose from json, text, srt, or webvtt
- Preserve Formatting: Keep HTML tags like <i> and <b>
- Exclude Generated: Skip auto-generated transcripts
- Exclude Manual: Skip manually created transcripts
- Proxy Configuration: Enable residential proxies (recommended)
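The sketch below shows one way the language priority list and the two exclude flags could combine to choose a transcript. TranscriptInfo and pick_transcript are hypothetical names. The sketch also assumes that a manual track wins over a generated one in the same language. As far as we know, that is how the find_transcript method of youtube-transcript-api (the library the actor is built on) orders its search, but the actor's exact behaviour is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscriptInfo:
    """Hypothetical description of one transcript track offered for a video."""
    language_code: str
    is_generated: bool


def pick_transcript(
    available: list[TranscriptInfo],
    languages: list[str],
    exclude_generated: bool = False,
    exclude_manual: bool = False,
) -> Optional[TranscriptInfo]:
    """Return the first track matching `languages` in priority order, or None."""
    candidates = [
        t for t in available
        if not (exclude_generated and t.is_generated)
        and not (exclude_manual and not t.is_generated)
    ]
    for code in languages:  # most preferred language first
        matches = [t for t in candidates if t.language_code == code]
        if matches:
            # False sorts before True, so a manual track beats a generated one.
            return min(matches, key=lambda t: t.is_generated)
    return None  # corresponds to the "No Transcript Found" error
```

Excluding both kinds of transcript leaves no candidates, so the sketch simply returns None in that case.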
Example Usage
Basic Usage
{"videoUrl": "https://youtu.be/dQw4w9WgXcQ","languages": ["en"],"outputFormat": "json"}
Advanced Configuration
{
  "videoUrl": "https://youtu.be/dQw4w9WgXcQ",
  "languages": ["en", "es", "fr"],
  "outputFormat": "srt",
  "preserveFormatting": true,
  "excludeGenerated": false,
  "proxyConfiguration": {"useApifyProxy": true}
}
Output Format
Successful Response
{
  "videoInput": "https://youtu.be/dQw4w9WgXcQ",
  "videoId": "dQw4w9WgXcQ",
  "status": "success",
  "language": "English",
  "languageCode": "en",
  "isGenerated": false,
  "outputFormat": "json",
  "transcript": [{"text": "Hello there", "start": 0.0, "duration": 1.54}],
  "metadata": {"snippetCount": 150, "preserveFormatting": false, "totalDuration": 212.5},
  "upload_date": "2024-01-15T10:30:00-08:00",
  "views": "1234567",
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "textSummary": "Hello there...",
  "error": null
}
Error Response
{
  "videoInput": "https://youtu.be/invalid",
  "videoId": "invalid",
  "status": "error",
  "language": null,
  "languageCode": null,
  "isGenerated": null,
  "outputFormat": "json",
  "transcript": null,
  "metadata": null,
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "videoUrl": "https://www.youtube.com/watch?v=invalid",
  "textSummary": null,
  "error": "Video is unavailable"
}
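Downstream code has to branch on the status field, because an error record carries null in place of the transcript. The minimal consumer sketch below assumes the record shapes shown in the two responses. It further assumes that non-JSON output formats deliver the transcript as a single string, which the examples do not confirm. transcript_text is a hypothetical helper name.

```python
from typing import Optional


def transcript_text(record: dict) -> Optional[str]:
    """Return plain transcript text from one output record, or None on error."""
    if record.get("status") != "success":
        return None  # the reason is in record["error"], e.g. "Video is unavailable"
    transcript = record.get("transcript")
    if isinstance(transcript, list):
        # outputFormat "json": a list of {"text", "start", "duration"} snippets
        return " ".join(snippet["text"] for snippet in transcript)
    # Assumption: text/srt/webvtt outputs arrive as one preformatted string.
    return transcript
```

When iterating over the run's dataset items, a None result tells you to log record["videoId"] together with record["error"] instead of the text.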
Proxy Configuration
Why Use Proxies?
YouTube actively blocks requests from cloud providers and high-traffic IPs. This actor uses Apify's residential proxy network to:
- Avoid IP blocking: Rotate through residential IP addresses
- Increase success rate: Residential IPs appear as regular users
- Handle scale: Process multiple videos reliably
Proxy Types
- Residential Proxies (Recommended): Real home/office IP addresses with highest success rate
- Cost: More expensive than datacenter proxies but essential for YouTube scraping
Configuration
{"proxyConfiguration": {"useApifyProxy": true}}
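If you call YouTube yourself outside the actor, the same residential routing can be expressed as a proxy URL. The sketch below assumes Apify Proxy's connection format as we understand it: username "groups-RESIDENTIAL", optionally extended with ",session-<id>", password from the Apify Console, and host proxy.apify.com on port 8000. Verify these against Apify's proxy documentation before relying on them. The helper names are hypothetical.

```python
import secrets
from typing import Optional
from urllib.parse import quote

# Assumed Apify Proxy endpoint; check the current Apify docs.
PROXY_HOST = "proxy.apify.com"
PROXY_PORT = 8000


def new_session_id() -> str:
    """Random session ID; a new ID typically yields a different residential IP."""
    return secrets.token_hex(8)


def residential_proxy_url(password: str, session_id: Optional[str] = None) -> str:
    """Build an HTTP proxy URL that routes through the RESIDENTIAL group."""
    username = "groups-RESIDENTIAL"
    if session_id:
        # Reusing a session ID keeps the same IP; changing it rotates the IP.
        username += f",session-{session_id}"
    return f"http://{username}:{quote(password, safe='')}@{PROXY_HOST}:{PROXY_PORT}"
```

With the requests library you would pass the URL as proxies={"http": url, "https": url}. Generating a new session ID per attempt matches the "fresh sessions for each retry attempt" behaviour described under Retry Logic.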
Supported Languages
This actor supports the 42 language codes in the complete list below. YouTube itself offers auto-generated captions and automatic translation for many languages.
Most Popular Languages (Top 20)
Language | Code | Notes | Regional Variants |
---|---|---|---|
English | en | 39% of content | en-US, en-GB, en-CA, en-AU |
Spanish | es | 11.8% of content | es-ES, es-MX, es-AR |
Hindi | hi | 610M+ users | - |
Portuguese | pt | Growing market | pt-BR, pt-PT |
French | fr | Global reach | fr-FR, fr-CA |
German | de | High engagement | - |
Japanese | ja | High-value audience | - |
Russian | ru | Popular for subtitles | - |
Korean | ko | K-pop, entertainment | - |
Arabic | ar | Growing market | - |
Chinese | zh | Multiple variants | zh-CN, zh-TW, zh-HK |
Italian | it | European market | - |
Turkish | tr | Regional importance | - |
Dutch | nl | High engagement | - |
Polish | pl | Growing European market | - |
Indonesian | id | Southeast Asia growth | - |
Thai | th | Southeast Asia growth | - |
Vietnamese | vi | Southeast Asia growth | - |
Bengali | bn | Large population | - |
Tamil | ta | Growing in India | - |
Complete Language List
All 42 supported languages: ar, bn, bg, cs, da, nl, en, fr, fa, fil, fi, ka, de, el, gu, he, hi, hu, id, it, ja, kn, ko, lv, lt, ml, mr, no, pl, pt, pa, ro, ru, sk, es, sv, ta, te, th, tr, uk, vi
Language Configuration Examples
Basic Language Priority
{"languages": ["en", "es", "fr"],"translateTo": "de"}
Regional Variants
{"languages": ["en-US", "en-GB", "en"],"translateTo": "pt-BR"}
Multi-Market Approach
{"languages": ["hi", "en", "es", "pt", "fr", "de", "ja", "ko"]}
Best Practices
- Always include English (en) as a fallback: it is the most common language
- Use priority order: list the most preferred languages first
- Consider your audience: check YouTube Analytics for viewer languages
- Regional variants: use pt-BR for Brazil, zh-CN for China, etc.
- Multiple fallbacks: include 3-5 language options for better coverage
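The practices above can be turned into a languages list mechanically. In the sketch below, build_language_priority is a hypothetical helper, not part of the actor. It keeps your explicit choices in order, then adds the base language of each regional variant (pt-BR becomes pt), then appends English as the last resort, dropping duplicates.

```python
def build_language_priority(preferred: list[str], fallback: str = "en") -> list[str]:
    """Preferred codes, then their base languages, then the fallback; no duplicates."""
    ordered: list[str] = []

    def add(code: str) -> None:
        if code not in ordered:
            ordered.append(code)

    for code in preferred:   # 1. explicit preferences, most preferred first
        add(code)
    for code in preferred:   # 2. base language of each regional variant (pt-BR -> pt)
        add(code.split("-")[0])
    add(fallback)            # 3. English by default, as the final fallback
    return ordered
```

For example, build_language_priority(["pt-BR", "es"]) gives ["pt-BR", "es", "pt", "en"]. Base languages go after all explicit choices, so a deliberate second choice (es) still outranks a generic pt.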
Supported URL Formats
The actor automatically extracts video IDs from various YouTube URL formats:
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://youtube.com/watch?v=VIDEO_ID
https://m.youtube.com/watch?v=VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
https://www.youtube.com/v/VIDEO_ID
- Raw video IDs: VIDEO_ID
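The minimal sketch below shows how the formats above might be normalized to a video ID using only the standard library. It assumes YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_". Unlike the actor, which returns an error record such as videoId "invalid", the hypothetical extract_video_id helper returns None for input it cannot parse.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: IDs are exactly 11 URL-safe base64 characters.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a URL or raw ID, or None if none is found."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # already a raw video ID
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host in YOUTUBE_HOSTS and parsed.path.startswith(("/embed/", "/v/")):
        candidate = parsed.path.split("/")[2]  # "/embed/ID" -> ["", "embed", "ID"]
    else:
        return None
    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None
```

Extra query parameters such as a timestamp (&t=42s) are ignored, because only the v parameter is read.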
Error Handling
Common Error Types
- Transcripts Disabled: Video has no available transcripts
- Video Unavailable: Video is private, deleted, or restricted
- No Transcript Found: No transcript in requested languages
- IP Blocked: Usually resolved automatically by retrying through a different residential proxy
Retry Logic
- 5 retry attempts with residential proxy rotation
- Progressive delays with random jitter (3s, 5s, 7s, 9s)
- Fresh sessions for each retry attempt
- Smart error detection to avoid unnecessary retries
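The retry policy above can be sketched as a loop around a fetch function. The sketch reads "5 retry attempts" together with the four listed delays as five attempts in total; that reading is an assumption. The jitter of up to 1 second per wait is also an assumed value. fetch_with_retries and PermanentError are hypothetical names. fetch stands for whatever performs one request through a proxy session.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Delays listed above; five attempts leave four waits between them.
DEFAULT_DELAYS = (3.0, 5.0, 7.0, 9.0)


class PermanentError(Exception):
    """Hypothetical marker for failures retrying cannot fix
    (e.g. "Video is unavailable" or transcripts disabled)."""


def fetch_with_retries(
    fetch: Callable[[str], T],
    new_session: Callable[[], str],
    delays: Sequence[float] = DEFAULT_DELAYS,
    max_jitter: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(session_id) until it succeeds, using a fresh session each time."""
    last_error: Optional[Exception] = None
    for attempt in range(len(delays) + 1):
        try:
            # A new session ID per attempt asks the proxy for a new IP.
            return fetch(new_session())
        except PermanentError:
            raise  # smart error detection: the video itself is the problem
        except Exception as exc:  # anything else (e.g. an IP block) counts as transient
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt] + random.uniform(0, max_jitter))
    raise RuntimeError(f"gave up after {len(delays) + 1} attempts") from last_error
```

In the worst case this sketch waits 3 + 5 + 7 + 9 = 24 seconds, plus at most 4 seconds of jitter, before its last attempt. That is in line with the "up to 30 seconds" quoted below for heavily blocked videos. Passing sleep as a parameter lets tests run without actually waiting.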
Performance & Costs
Processing Speed
- Typical run: about 2-5 seconds per video
- With retries: Up to 30 seconds for heavily blocked videos
- Batch processing: Each run handles one video; process several videos by starting multiple runs in parallel
Cost Considerations
- Residential proxies: Higher cost but necessary for reliability
- Request volume: Costs scale with number of videos processed
- Retry attempts: Failed attempts still consume proxy bandwidth
Use Cases
Content Analysis
- Extract video content for analysis and research
- Create searchable databases of video content
- Generate automated summaries and insights
Accessibility
- Export existing captions, including auto-generated ones, as SRT or WebVTT subtitle files
- Translate content for international audiences
- Generate text versions for screen readers
Education & Training
- Create study materials from educational videos
- Generate transcripts for online courses
- Build searchable knowledge bases
Media & Publishing
- Create blog posts from video content
- Generate quotes and excerpts
- Translate content for global audiences
Technical Details
Built With
- Python 3.11: Modern Python runtime
- YouTube Transcript API: Core transcript extraction
- Apify SDK: Platform integration and proxy management
- Requests: HTTP client with session management
- BeautifulSoup4: HTML parsing for metadata extraction
Architecture
- Modular design: Separate concerns for maintainability
- Multi-strategy extraction: 4-tier fallback system for robust metadata extraction
- Error resilience: Comprehensive error handling and recovery
- Scalable: One video per run, with larger volumes handled by running the actor many times
- Monitoring: Detailed logging for debugging and optimization
Metadata Extraction Strategies
- Primary: Extract from the ytInitialPlayerResponse JavaScript object
- Fallback 1: Extract from the ytInitialData JavaScript object
- Fallback 2: Extract from JSON-LD and microdata structured data
- Fallback 3: Extract from the YouTube oEmbed API
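The tiered strategy above amounts to trying extractors in order and keeping the first non-empty result. The sketch below implements only the first two tiers, against a page's HTML string. The field locations it reads are assumptions about YouTube's page structure, which changes without notice: microformat.playerMicroformatRenderer.uploadDate and videoDetails.viewCount in the player response, and uploadDate and interactionCount in JSON-LD. All function names are hypothetical.

```python
import json
import re
from typing import Callable, Optional

Metadata = dict  # {"upload_date": str | None, "views": str | None}


def _embedded_object(html: str, var_name: str) -> Optional[dict]:
    """Decode the JSON object literal assigned to `var_name` in an inline script."""
    match = re.search(re.escape(var_name) + r"\s*=\s*\{", html)
    if not match:
        return None
    try:
        # raw_decode stops at the end of the object, ignoring the trailing ";</script>".
        obj, _ = json.JSONDecoder().raw_decode(html[match.end() - 1:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def from_player_response(html: str) -> Optional[Metadata]:
    data = _embedded_object(html, "ytInitialPlayerResponse")
    if not data:
        return None
    micro = data.get("microformat", {}).get("playerMicroformatRenderer", {})
    meta = {
        "upload_date": micro.get("uploadDate"),
        "views": data.get("videoDetails", {}).get("viewCount"),
    }
    return meta if any(meta.values()) else None


def from_json_ld(html: str) -> Optional[Metadata]:
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    for block in re.findall(pattern, html, re.DOTALL):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("uploadDate"):
            return {"upload_date": data["uploadDate"], "views": data.get("interactionCount")}
    return None


# Tried in order; the ytInitialData and oEmbed tiers are omitted from this sketch.
STRATEGIES: list[Callable[[str], Optional[Metadata]]] = [from_player_response, from_json_ld]


def extract_metadata(html: str) -> Metadata:
    for strategy in STRATEGIES:
        result = strategy(html)
        if result:
            return result
    return {"upload_date": None, "views": None}
```

Each tier returns None rather than raising when it finds nothing. That lets the loop fall through to the next tier, and the final default keeps upload_date and views present (as null) in the output record.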
Limitations
YouTube Restrictions
- Private videos: Cannot access private videos
- Age-restricted content: May require additional authentication
- Geographic restrictions: Some videos blocked in certain regions
- No transcripts: Some videos genuinely have no available transcripts
Technical Limits
- Rate limiting: YouTube may throttle high-volume requests
- Proxy costs: Residential proxies required for reliability
- Processing time: Complex retry logic may increase execution time
Support
For issues, feature requests, or questions about this Apify actor, please refer to the Apify platform documentation or contact support through the Apify Console.
Note: This actor is designed for legitimate use cases such as accessibility, research, and content analysis. Please respect YouTube's Terms of Service and copyright laws when using extracted transcripts.