
YouTube Transcript Scraper - Multi-Language
Pricing: $50.00 / 1,000 results

Extract YouTube video transcripts in multiple languages. Get time-stamped or text-only subtitles. Includes translation support and residential proxy for reliable access. Built by Rush.
Last modified: 7 days ago
YouTube Transcript Scraper - Apify Actor
A robust Apify Actor that extracts transcripts/subtitles from YouTube videos in multiple languages. Built with comprehensive error handling and proxy rotation to ensure reliable transcript extraction.
Important Notes
- Output is generated for every video, whether or not subtitles exist (videos without subtitles still consume 2 result operations)
- Before running: check whether your target videos have subtitles available, to avoid wasting operations
- Enable Auto-Generated Transcripts: Turn on the "Include Auto-Generated Transcripts" option for videos without manual subtitles
- Videos without any subtitles will return an error in the output but still count toward your usage
Features
- Multi-language Support: Extract the available subtitles/transcripts in multiple languages
- Flexible Output: Choose between time-stamped or text-only format
- Smart Proxy Management: Built-in residential proxy with automatic rotation on rate limits
- Batch Processing: Process multiple videos in one run
- Retry Logic: Automatic retries with exponential backoff for failed requests
- Detailed Statistics: Track success rates, retries, and proxy rotations
- Language Mapping: Language code matching based on ISO 639-1 codes plus the region and script variants YouTube uses (e.g., zh, zh-TW, zh-Hans); see Language Codes below
Input
The Actor accepts the following input:
{
  "video_ids": ["VIDEO_ID_1", "VIDEO_ID_2"],
  "text_only": false,
  "languages": ["en", "zh-TW", "es"],
  "use_proxy": true,
  "include_generated": true,
  "include_translation": true,
  "fetch_all": false
}
Input Parameters
- video_ids (required): Array of YouTube video IDs or URLs to scrape
- text_only: If true, returns only text without timestamps (default: false)
- languages: Array of language codes to fetch. If empty, fetches all available languages
- use_proxy: Use Apify residential proxy to reduce rate limiting (default: true)
- include_generated: Include auto-generated transcripts (default: true)
- include_translation: Attempt to translate into requested languages that are not available (default: true)
- fetch_all: Fetch all available transcripts regardless of the languages parameter (default: false)
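The Actor accepts either bare IDs or URLs in video_ids. Because every video in the list produces a billed result, it can be worth normalizing and de-duplicating the list before starting a run. The following is a minimal client-side sketch in Python, not part of the Actor: the helper names (to_video_id, build_input) are hypothetical, the URL forms handled are assumptions about common YouTube links, and the empty-list default for languages is assumed because the list above does not state one.

```python
from urllib.parse import parse_qs, urlparse

# Defaults as documented in the Input Parameters list above. The "languages"
# default is an assumption: an empty list means "all available languages".
DEFAULTS = {
    "text_only": False,
    "languages": [],
    "use_proxy": True,
    "include_generated": True,
    "include_translation": True,
    "fetch_all": False,
}


def to_video_id(value: str) -> str:
    """Return a bare video ID from an ID or a common YouTube URL (hypothetical helper)."""
    value = value.strip()
    if "://" not in value:
        return value  # assume it is already a bare video ID
    parsed = urlparse(value)
    if parsed.netloc.lower().endswith("youtu.be"):
        return parsed.path.lstrip("/").split("/")[0]  # https://youtu.be/<id>
    if parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [""])[0]  # .../watch?v=<id>
    if parsed.path.startswith(("/shorts/", "/embed/")):
        return parsed.path.split("/")[2]  # .../shorts/<id> or .../embed/<id>
    raise ValueError(f"Unrecognized YouTube URL: {value}")


def build_input(videos, **overrides) -> dict:
    """Merge overrides into the documented defaults and normalize video_ids."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown input parameter(s): {sorted(unknown)}")
    # dict.fromkeys drops duplicate IDs while keeping first-seen order.
    ids = list(dict.fromkeys(to_video_id(v) for v in videos))
    if not ids:
        raise ValueError("video_ids is required and must not be empty")
    run_input = {"video_ids": ids, **DEFAULTS, **overrides}
    run_input["languages"] = list(run_input["languages"])  # don't share the default list
    return run_input
```

With this sketch, a watch URL, a youtu.be link, and a bare ID for the same video collapse into a single entry, so that video is only billed once.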
Language Codes
YouTube uses ISO 639-1 language codes, with extended formats for some languages. Common examples:
- en: English
- zh: Chinese (auto-selects based on region)
- zh-Hans: Chinese (Simplified script)
- zh-Hant: Chinese (Traditional script)
- zh-CN: Chinese (China/Simplified)
- zh-TW: Chinese (Taiwan/Traditional)
- zh-HK: Chinese (Hong Kong/Traditional)
- es: Spanish
- fr: French
- de: German
- ja: Japanese
- ko: Korean
Note: For Chinese subtitles, YouTube may use different codes (zh, zh-Hans, zh-Hant, zh-CN, zh-TW) depending on how the uploader configured them.
Language code references:
- ISO 639-1 codes: https://www.loc.gov/standards/iso639-2/php/code_list.php
- Searchable list with native names: https://localizely.com/iso-639-1-list/
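Because the same language can appear under several codes (for example zh-TW, zh-Hant, or plain zh), a caller reading results may want to pick the closest available code. The sketch below is a hypothetical client-side fallback order: exact match, then an assumed table of Chinese script/region equivalents, then the base language, then any variant of it. It illustrates the idea only; it is not the Actor's own matching logic, which this documentation does not describe.

```python
from typing import Dict, List, Optional

# Assumed equivalence table for Chinese script/region codes (illustrative only).
CHINESE_EQUIVALENTS: Dict[str, List[str]] = {
    "zh-tw": ["zh-hant", "zh-hk"],
    "zh-hk": ["zh-hant", "zh-tw"],
    "zh-hant": ["zh-tw", "zh-hk"],
    "zh-cn": ["zh-hans"],
    "zh-hans": ["zh-cn"],
}


def match_language(requested: str, available: List[str]) -> Optional[str]:
    """Return the best available code for a requested one, or None (hypothetical)."""
    by_lower = {code.lower(): code for code in available}
    req = requested.lower()
    # 1. Exact match, ignoring case ("zh-TW" matches "zh-tw").
    if req in by_lower:
        return by_lower[req]
    # 2. Same script/region under a different code (assumed table above).
    for alt in CHINESE_EQUIVALENTS.get(req, []):
        if alt in by_lower:
            return by_lower[alt]
    # 3. The base language ("zh-TW" -> "zh", "es-419" -> "es").
    base = req.split("-")[0]
    if base in by_lower:
        return by_lower[base]
    # 4. Last resort: any other variant of the same base language.
    for lowered, code in by_lower.items():
        if lowered.split("-")[0] == base:
            return code
    return None
```

Step 4 can return Simplified text for a Traditional request; if that is unacceptable, drop it and rely on include_translation instead.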
Output
The Actor produces output in the following structure:
- Individual Video Outputs: One output for each video containing the subtitle/transcript content
- Execution Summary Report: A final output with the overall execution statistics and results
For time-stamped transcripts:
{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcripts": {
    "en": {
      "language": "en",
      "language_name": "English",
      "is_generated": false,
      "is_translatable": true,
      "transcript": [
        {
          "index": 0,
          "text": "Never gonna give you up",
          "start": "00:00:00.000",
          "end": "00:00:03.000",
          "start_seconds": 0.0,
          "end_seconds": 3.0,
          "duration": 3.0,
          "word_count": 5,
          "char_count": 23
        }
      ],
      "stats": {
        "total_segments": 87,
        "total_words": 435,
        "total_characters": 2340,
        "average_segment_duration": 3.2,
        "total_duration": 278.4
      }
    }
  },
  "languages_count": 1,
  "successful_languages": ["en"],
  "failed_languages": [],
  "text_only": false,
  "metadata": {
    "fetched_at": "2025-08-02T12:34:56.789Z",
    "proxy_used": true,
    "retries": 0
  }
}
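The stats block can be cross-checked against the segment list. In the example, 278.4 / 87 = 3.2, which suggests average_segment_duration is total_duration divided by total_segments. The sketch below recomputes the fields under that reading. That total_duration is the sum of segment durations, that word and character counts come from each segment's text, and that values are rounded to one decimal place are all assumptions, not documented behavior.

```python
def parse_timestamp(ts: str) -> float:
    """Convert an "HH:MM:SS.mmm" string (the start/end format) to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def transcript_stats(segments: list) -> dict:
    """Recompute a stats block from time-stamped segments (assumed formulas)."""
    count = len(segments)
    total_duration = sum(seg["duration"] for seg in segments)
    return {
        "total_segments": count,
        "total_words": sum(len(seg["text"].split()) for seg in segments),
        "total_characters": sum(len(seg["text"]) for seg in segments),
        "average_segment_duration": round(total_duration / count, 1) if count else 0.0,
        "total_duration": round(total_duration, 1),
    }
```

For the single segment shown, parse_timestamp(seg["end"]) equals end_seconds (3.0), and the recomputed counts match word_count (5) and char_count (23).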
For text-only transcripts:
{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcripts": {
    "en": {
      "language": "en",
      "language_name": "English",
      "is_generated": false,
      "is_translatable": true,
      "transcript": [
        "Never gonna give you up",
        "Never gonna let you down",
        "Never gonna run around and desert you"
      ]
    }
  },
  "languages_count": 1,
  "text_only": true
}
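For accessibility use cases, time-stamped output can be turned into a standard subtitle file. The following sketch writes SubRip (SRT) text for one language of a single video result, using the start_seconds, end_seconds, and text fields from the time-stamped example. The function names are hypothetical, and the Actor itself does not produce SRT.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(video_result: dict, language: str) -> str:
    """Build SRT text for one language of a time-stamped video result."""
    if video_result.get("text_only"):
        raise ValueError("SRT needs time-stamped output (text_only: false)")
    segments = video_result["transcripts"][language]["transcript"]
    blocks = []
    for number, seg in enumerate(segments, start=1):
        start = srt_time(seg["start_seconds"])
        end = srt_time(seg["end_seconds"])
        # Each cue: sequence number, time range, text; cues are separated by a blank line.
        blocks.append(f"{number}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)
```

Text-only results have no timing information, so the sketch rejects them rather than inventing timestamps.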
Example Usage
Basic usage - single video:
{"video_ids": ["dQw4w9WgXcQ"],"text_only": false,"languages": ["en"],"include_generated": true}
Multiple videos with multiple languages:
{"video_ids": ["VIDEO_ID_1", "VIDEO_ID_2", "VIDEO_ID_3"],"text_only": false,"languages": ["en", "zh-TW", "es"],"use_proxy": true,"include_generated": true}
Text-only mode for all available languages:
{"video_ids": ["VIDEO_ID"],"text_only": true,"languages": [],"use_proxy": true}
Use Cases
- Content Analysis: Extract video transcripts for sentiment analysis or content categorization
- Multi-language Content: Get transcripts in multiple languages where available
- Accessibility: Create text versions of video content
- Research: Analyze speech patterns, word frequency, or content themes
- SEO: Extract video content for search engine optimization
Limitations
- Only works with videos that have subtitles available (manual or auto-generated)
- Auto-generated subtitles may have lower accuracy but are often the only option
- Output is generated for all videos, even those without subtitles (will contain error information)
- Some videos may have region restrictions
- YouTube enforces rate limits on transcript API requests
Error Handling
The Actor handles various error scenarios with automatic recovery:
- Invalid video IDs: Validated and reported with clear error messages
- Videos without subtitles: Reported with error code TRANSCRIPTS_DISABLED or NO_TRANSCRIPTS_AVAILABLE
- Rate limiting (429 errors): Automatic proxy rotation and retry with exponential backoff
- Language not available: Attempts translation if include_translation is enabled
- Network errors: Retries up to 3 times with increasing delays
Failed videos will include detailed error information while successful videos continue processing.
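The retry behavior described above follows a common pattern: retry a transient failure a bounded number of times, doubling the wait each time. The Actor's actual implementation is not published here, so the following is a generic Python sketch of exponential backoff with jitter, assuming 3 retries and a 1-second base delay. RetryableError and with_backoff are hypothetical names, and the sleep parameter exists only so the delays can be observed in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (e.g. HTTP 429 or a timeout)."""


def with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s with the defaults
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    raise AssertionError("unreachable")
```

The jitter keeps many parallel requests from retrying in lockstep, which matters when the failures are rate limits.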
Error Response Example:
{"video_id": "INVALID_ID","video_url": "https://www.youtube.com/watch?v=INVALID_ID","error": "Video unavailable","code": "VIDEO_UNAVAILABLE"}
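Because failed videos appear in the same output as successful ones, a consumer of the dataset typically needs to separate them. Based only on the example shapes in this document, the sketch below assumes that the summary item is marked with "type": "summary", that failed videos carry an "error" key with a "code", and that everything else is a successful video result. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def split_results(items: List[dict]) -> Tuple[List[dict], List[dict], Optional[dict]]:
    """Separate successful videos, failed videos, and the summary item."""
    successes, failures, summary = [], [], None
    for item in items:
        if item.get("type") == "summary":
            summary = item
        elif "error" in item:
            failures.append(item)
        else:
            successes.append(item)
    return successes, failures, summary


def failure_codes(failures: List[dict]) -> Counter:
    """Count failed videos by error code; a missing code is treated as UNKNOWN_ERROR."""
    return Counter(item.get("code", "UNKNOWN_ERROR") for item in failures)
```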
Summary Statistics
At the end of each run, the Actor provides a comprehensive summary with error tracking:
{
  "type": "summary",
  "total_videos": 10,
  "successful": 8,
  "failed": 2,
  "scraper_stats": {
    "total_videos": 10,
    "successful": 8,
    "failed": 2,
    "retries": 3,
    "proxy_rotations": 1
  },
  "error_counts": {
    "VIDEO_UNAVAILABLE": 0,
    "TRANSCRIPTS_DISABLED": 1,
    "NO_TRANSCRIPTS_AVAILABLE": 0,
    "RATE_LIMITED": 0,
    "COULD_NOT_RETRIEVE": 0,
    "XML_PARSE_ERROR": 2,
    "TRANSLATION_FAILED": 0,
    "NO_TRANSCRIPT_FOUND": 1,
    "NETWORK_ERROR": 0,
    "PROXY_ERROR": 0,
    "UNKNOWN_ERROR": 0,
    "RETRY_SUCCESS": 1,
    "EMPTY_RESPONSE": 2
  },
  "error_descriptions": {
    "VIDEO_UNAVAILABLE": "Video is not available (deleted, private, or region-blocked)",
    "TRANSCRIPTS_DISABLED": "Subtitles are disabled for this video",
    "NO_TRANSCRIPTS_AVAILABLE": "No transcript files available",
    "RATE_LIMITED": "YouTube rate limiting (too many requests)",
    "COULD_NOT_RETRIEVE": "Failed to retrieve transcript content",
    "XML_PARSE_ERROR": "Empty or invalid response from YouTube (often proxy-related)",
    "TRANSLATION_FAILED": "Failed to translate to requested language",
    "NO_TRANSCRIPT_FOUND": "Transcript not found in requested language",
    "NETWORK_ERROR": "Network connection or timeout error",
    "PROXY_ERROR": "Proxy configuration or connection failed",
    "UNKNOWN_ERROR": "Unexpected error occurred",
    "RETRY_SUCCESS": "Successfully retrieved after retry",
    "EMPTY_RESPONSE": "YouTube returned empty response (0 bytes)"
  }
}
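A quick way to read the summary is to compute the success rate and list only the non-zero error counts alongside their descriptions. Below is a minimal sketch using the field names from the example above (describe_summary is a hypothetical helper). Note that RETRY_SUCCESS counts recoveries rather than failures, so it appears in the list but should not be read as an error.

```python
def describe_summary(summary: dict) -> list:
    """Render a run summary as human-readable lines (hypothetical helper)."""
    total = summary["total_videos"]
    ok = summary["successful"]
    rate = 100.0 * ok / total if total else 0.0
    lines = [f"{ok}/{total} videos succeeded ({rate:.0f}%)"]
    descriptions = summary.get("error_descriptions", {})
    counts = summary.get("error_counts", {})
    # Highest counts first, ties broken alphabetically; zero counts are skipped.
    for code, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count:
            lines.append(f"{code}: {count} ({descriptions.get(code, 'no description')})")
    return lines
```

Applied to the example summary, the first line reads "8/10 videos succeeded (80%)".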
Error Count Tracking
The Actor tracks detailed error statistics to help diagnose issues:
- XML_PARSE_ERROR: Most common when YouTube returns empty responses through the proxy
- EMPTY_RESPONSE: Indicates YouTube is blocking the request (often proxy-related)
- RETRY_SUCCESS: Shows how many requests succeeded after retrying
- RATE_LIMITED: Tracks when YouTube enforces rate limiting
- TRANSLATION_FAILED: Counts failed translation attempts
Each video result also includes an error summary in its metadata:
{
  "metadata": {
    "fetched_at": "2025-08-02T12:34:56.789Z",
    "proxy_used": true,
    "retries": 0,
    "error_summary": {
      "xml_parse_errors": 2,
      "empty_responses": 2,
      "retry_successes": 1,
      "translation_failures": 0
    }
  }
}
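When the run summary shows many XML_PARSE_ERROR or EMPTY_RESPONSE entries, the per-video error_summary can help locate the affected videos. The sketch below sums those fields across video results and flags videos with empty responses or XML parse errors, which the descriptions above associate with proxy problems. The field names come from the example; the helper name and the flagging rule are illustrative assumptions.

```python
from collections import Counter


def aggregate_error_summaries(results: list):
    """Sum per-video error_summary counters and flag likely proxy-related videos."""
    totals = Counter()
    suspects = []
    for result in results:
        summary = result.get("metadata", {}).get("error_summary", {})
        totals.update(summary)  # adds each counter to the running total
        if summary.get("empty_responses", 0) or summary.get("xml_parse_errors", 0):
            suspects.append(result.get("video_id"))
    return totals, suspects
```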
Development
To run locally:
$ apify run
To run the deployed Actor on the Apify platform:
$ apify call
To deploy to Apify:
$ apify push
Troubleshooting
- All videos failing with "Transcripts are disabled":
  - Ensure use_proxy is set to true
  - Check whether the videos actually have subtitles available
- Rate limiting errors:
  - The Actor handles these automatically with proxy rotation
  - For persistent issues, process fewer videos per run
- Missing languages:
  - Enable include_translation to get translations
  - Check which languages are available first by running with an empty languages array
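After a run with an empty languages array (or fetch_all set to true), each video's transcripts object lists what was found. The sketch below splits those codes into manual and auto-generated ones using the is_generated flag from the output format. It is a hypothetical helper, useful for deciding whether include_generated or include_translation is needed for a given video.

```python
def available_languages(video_result: dict) -> dict:
    """Group a video's transcript codes into manual and auto-generated lists."""
    manual, generated = [], []
    for code, entry in video_result.get("transcripts", {}).items():
        (generated if entry.get("is_generated") else manual).append(code)
    return {"manual": sorted(manual), "generated": sorted(generated)}
```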
Important Note
This Actor was developed without testing all edge cases. If you encounter any errors or inaccuracies in the documentation, please provide feedback. Thank you!