YouTube Transcript Scraper (Multiple Language)
Pricing: $20.00/month + usage
A powerful actor that extracts transcripts/captions from YouTube videos with built-in translation support for 100+ languages.
Rating: 0.0 (0)
Developer: Deepanshu Sharma (Maintained by Community)
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 5 days ago
YouTube Transcript Scraper
Features
- Extract YouTube Transcripts: Get captions/subtitles from any YouTube video
- Multi-Language Translation: Translate transcripts to 100+ languages using free Google Translate
- Batch Processing: Process multiple videos in a single run
- Smart Caption Selection: Automatically finds the best available captions
- Multiple Output Formats: Get results in JSON or plain text format
- Proxy Support: Built-in Apify proxy support to avoid IP blocking
- Fast Translation: Optimized batch translation for speed (5-10x faster than individual translation)
- Progress Tracking: Clean progress indicators to monitor translation status
Input
The actor accepts the following input parameters:
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| videos | Array of strings | List of YouTube video URLs or video IDs |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| translate_to | String | "" (none) | Target language code for translation (e.g., "en", "es", "hi"). Leave empty to keep the original language |
| output_format | String | "json" | Output format: "json" (structured) or "txt" (plain text) |
| proxyConfiguration | Object | Residential proxy enabled | Proxy settings for YouTube requests |
| delay_seconds | Integer | 2 | Delay in seconds between processing videos (0-60) |
Usage
Input Example
{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw",
    "dQw4w9WgXcQ"
  ],
  "translate_to": "en",
  "output_format": "json",
  "delay_seconds": 2
}
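The parameter tables above can be turned into a small input builder. This is a minimal sketch, not the actor's own validation code: the function name `build_run_input` and its error messages are hypothetical, and only the parameter names, defaults, and the 0-60 range for `delay_seconds` come from the tables.

```python
from typing import Any, Dict, List, Optional

ALLOWED_FORMATS = {"json", "txt"}


def build_run_input(
    videos: List[str],
    translate_to: str = "",
    output_format: str = "json",
    delay_seconds: int = 2,
    proxy_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an input dictionary using the documented parameter names.

    Hypothetical helper: the checks mirror the documented constraints,
    but the actor may validate its input differently.
    """
    if not videos:
        raise ValueError("videos must contain at least one URL or video ID")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError("output_format must be 'json' or 'txt'")
    if not 0 <= delay_seconds <= 60:
        raise ValueError("delay_seconds must be between 0 and 60")

    run_input: Dict[str, Any] = {
        "videos": list(videos),
        "translate_to": translate_to,
        "output_format": output_format,
        "delay_seconds": delay_seconds,
    }
    # Only send proxy settings when the caller overrides the default.
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input
```

The resulting dictionary is what you would submit as the actor's run input, for example through the Apify Console or an API client. The actor's ID is not stated on this page, so it is left out here.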
Output Example (JSON Format)
{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": [
    {"text": "We're no strangers to love", "start": 0.0, "duration": 3.5},
    {"text": "You know the rules and so do I", "start": 3.5, "duration": 4.2}
  ],
  "output_format": "json",
  "metadata": {
    "available_languages": [
      {"language": "English", "code": "en", "type": "auto-generated"}
    ],
    "selected_language": "en",
    "translated_to": "es",
    "translation_attempted": true,
    "translation_success": true,
    "translation_method": "Google Translate (deep-translator)"
  },
  "status": "success"
}
Output Example (Text Format)
{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": "We're no strangers to love\nYou know the rules and so do I\nA full commitment's what I'm thinking of...",
  "output_format": "txt",
  "metadata": { ... },
  "status": "success"
}
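The two output examples suggest that the text format is the JSON segments joined with newlines. If you request JSON but also want the plain-text form, a sketch like the following should reproduce it. The helper name `segments_to_text` is hypothetical, and the newline joining is inferred from the examples rather than confirmed by the actor's code.

```python
from typing import Dict, List


def segments_to_text(segments: List[Dict]) -> str:
    """Join the 'text' field of each transcript segment with newlines."""
    return "\n".join(segment["text"] for segment in segments)


segments = [
    {"text": "We're no strangers to love", "start": 0.0, "duration": 3.5},
    {"text": "You know the rules and so do I", "start": 3.5, "duration": 4.2},
]
plain_text = segments_to_text(segments)
```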
Supported Languages
Translation is available for 100+ languages. See the full language list in the input schema.
How It Works
- Video Processing: The actor extracts video IDs from URLs or accepts direct video IDs
- Caption Discovery: Searches for available captions/transcripts in the video
- Caption Retrieval: Fetches the best available caption (auto-generated or manual)
- Translation (if enabled): Translates captions using Google Translate
  - Uses batch translation for speed (20 segments per batch)
  - Reports progress at 10%, 25%, 50%, 75%, 90%, and 100%
  - Falls back to individual translation if a batch fails
- Output Formatting: Returns data in JSON or text format
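The first step above, accepting either a URL or a bare video ID, can be sketched as follows. This is an illustration under stated assumptions, not the actor's parser: the function name `extract_video_id` is hypothetical, the set of URL shapes handled here is a guess beyond the two shown in the input example, and the 11-character ID pattern reflects the IDs YouTube currently issues.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are currently 11 characters from this alphabet.
_VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a YouTube URL or bare ID, else None."""
    value = value.strip()
    if _VIDEO_ID_RE.match(value):
        return value  # already a bare video ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-based links such as /embed/<id> or /shorts/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"embed", "shorts", "v"}:
                candidate = parts[1]

    if candidate and _VIDEO_ID_RE.match(candidate):
        return candidate
    return None
```

Returning `None` instead of raising lets a batch run skip a malformed entry and continue with the remaining videos.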
Performance
- Without Translation: ~1-2 seconds per video
- With Translation:
  - Small videos (50-100 segments): ~5-10 seconds
  - Medium videos (500-1000 segments): ~30-60 seconds
  - Large videos (2000+ segments): ~2-3 minutes
Optimization: Batch translation makes it 5-10x faster than translating individual segments!
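The batch-with-fallback pattern described above can be sketched like this. It is a minimal illustration, not the actor's implementation: the function `translate_segments` and its callback parameters are hypothetical, and only the batch size of 20 and the per-segment fallback come from this page. The translator is passed in as two callables so the sketch runs without network access.

```python
from typing import Callable, List, Optional, Sequence

BATCH_SIZE = 20  # segments per batch, as stated in "How It Works"


def translate_segments(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    translate_one: Callable[[str], str],
    batch_size: int = BATCH_SIZE,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[str]:
    """Translate texts in batches, retrying a failed batch item by item."""
    translated: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        try:
            result = list(translate_batch(batch))
            if len(result) != len(batch):
                raise ValueError("batch result length mismatch")
        except Exception:
            # Fallback: translate each segment of this batch individually.
            result = [translate_one(text) for text in batch]
        translated.extend(result)
        if on_progress is not None:
            on_progress(len(translated), len(texts))
    return translated
```

Only the failing batch is retried segment by segment, so one bad batch does not discard the work already done. With the deep-translator package named in the output metadata, the `translate` method of `GoogleTranslator` could serve as `translate_one`; how the actor builds its batch call is not documented here.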
Proxy Configuration
Recommended Settings (Default)
{
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"],
  "apifyProxyCountry": "US"
}
Why use proxies?
- YouTube may block IP addresses making too many requests
- Residential proxies are recommended to avoid detection
- The actor rotates proxies between videos automatically
Use Cases
- Content Analysis: Analyze video content at scale
- Accessibility: Turn existing captions into readable or translated transcripts (videos without any captions are not supported)
- Translation: Translate video content to reach global audiences
- Research: Extract data from educational or documentary videos
- SEO: Generate text content from video for search optimization
- Subtitles: Create subtitle files for videos
- Data Mining: Extract information from video tutorials or courses
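For the subtitles use case, the JSON output already carries the timing needed for a subtitle file. The sketch below converts segments with `text`, `start`, and `duration` fields (as in the JSON output example) into SubRip (SRT) text. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical, and the actor itself does not advertise SRT output.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT cues from segments with 'text', 'start', and 'duration'."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = segment["start"]
        end = start + segment["duration"]
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{segment['text']}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


srt_text = segments_to_srt([
    {"text": "We're no strangers to love", "start": 0.0, "duration": 3.5},
    {"text": "You know the rules and so do I", "start": 3.5, "duration": 4.2},
])
```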
Limitations
- Caption Availability: Videos must have captions/subtitles available (auto-generated or manual)
- Translation Quality: Uses Google Translate - quality varies by language pair
- Rate Limiting: Free Google Translate may have rate limits for heavy usage
- Video Access: Cannot access private or age-restricted videos
- Disabled Captions: Some videos have captions disabled by the creator
Error Handling
The actor handles various error cases:
| Error | Reason | Solution |
|---|---|---|
| "Transcripts disabled" | Video creator disabled captions | Try another video |
| "Video unavailable" | Video is private/deleted | Check video URL |
| "No transcripts available" | No captions exist for this video | YouTube may add auto-captions later |
| "Translation failed" | Translation service error | Original captions will be returned |
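When post-processing a run, it can help to separate failed videos from successful ones and to flag videos whose translation fell back to the original captions. The sketch below relies on the `status` and `metadata` fields shown in the JSON output example. It assumes that failed items carry a `status` other than "success", which this page does not confirm, and the function name `summarize_results` is hypothetical.

```python
from typing import Dict, List, Tuple


def summarize_results(items: List[Dict]) -> Tuple[List[Dict], List[Dict], List[str]]:
    """Split items into successes and failures, and list untranslated video IDs."""
    succeeded: List[Dict] = []
    failed: List[Dict] = []
    untranslated_ids: List[str] = []
    for item in items:
        # Assumption: any status other than "success" marks a failed video.
        if item.get("status") != "success":
            failed.append(item)
            continue
        succeeded.append(item)
        metadata = item.get("metadata") or {}
        # Translation was requested but failed: original captions were returned.
        if metadata.get("translation_attempted") and not metadata.get("translation_success"):
            untranslated_ids.append(item["video_id"])
    return succeeded, failed, untranslated_ids
```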
Tips for Best Results
- Use Residential Proxies: Prevents IP blocking from YouTube
- Add Delay Between Videos: Set delay_seconds to 2-5 for better reliability
- Batch Processing: Process multiple videos in one run for efficiency
- Check Metadata: The output includes info about available languages and translation status
- JSON Format: Use JSON format if you need timestamps and structured data
- Text Format: Use text format for simple transcript reading