YouTube Transcript Scraper - Multi-Language

Pricing

$50.00 / 1,000 results

Developed by

Futurize Rush

Maintained by Community

Extract YouTube video transcripts in multiple languages. Get time-stamped or text-only subtitles. Includes translation support and residential proxy for reliable access. Built by Rush.

Last modified

7 days ago

YouTube Transcript Scraper - Apify Actor

A robust Apify Actor that extracts transcripts/subtitles from YouTube videos in multiple languages. Built with comprehensive error handling and proxy rotation to ensure reliable transcript extraction.

Important Notes

  • Every video produces an output item, whether or not it has subtitles. A video without any subtitles returns an error in its output and still consumes 2 result operations.
  • Before running, check that your target videos have subtitles available to avoid wasting operations.
  • For videos without manual subtitles, turn on the "Include Auto-Generated Transcripts" option.
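
Because failed videos are billed too, it can help to estimate the cost of a batch up front. The sketch below combines the listed price ($50.00 per 1,000 results) with the 2 result operations per video mentioned above. Treating every video as exactly 2 results is an assumption for illustration (the final summary item, for instance, is not accounted for), and `estimate_cost` is a hypothetical helper, not part of the Actor.

```python
from decimal import Decimal

# Price taken from the listing on this page: $50.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("50.00")


def estimate_cost(video_count: int, results_per_video: int = 2) -> Decimal:
    """Rough USD estimate for a run.

    results_per_video=2 is an assumption based on the note that a video
    consumes 2 result operations; adjust it if your billing differs.
    """
    if video_count < 0 or results_per_video < 0:
        raise ValueError("counts must be non-negative")
    results = video_count * results_per_video
    cost = PRICE_PER_1000_RESULTS * results / 1000
    return cost.quantize(Decimal("0.01"))
```

Under these assumptions, `estimate_cost(100)` comes to 10.00 USD whether or not the 100 videos have subtitles.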

Features

  • 🌍 Multi-language Support: Extract the available transcripts/subtitles in multiple languages
  • ⏱️ Flexible Output: Choose between time-stamped or text-only format
  • 🛡️ Smart Proxy Management: Built-in residential proxy with automatic rotation on rate limits
  • 📊 Batch Processing: Process multiple videos efficiently in one run
  • 🔁 Retry Logic: Automatic retries with exponential backoff for failed requests
  • 📈 Detailed Statistics: Track success rates, retries, and proxy rotations
  • 🌐 Language Mapping: Intelligent language code matching using YouTube's language codes, which are ISO 639-1 two-letter codes optionally extended with a region or script subtag (e.g., zh, zh-TW, zh-Hans)

Input

The Actor accepts the following input:

{
  "video_ids": ["VIDEO_ID_1", "VIDEO_ID_2"],
  "text_only": false,
  "languages": ["en", "zh-TW", "es"],
  "use_proxy": true,
  "include_generated": true,
  "include_translation": true,
  "fetch_all": false
}

Input Parameters

  • video_ids (required): Array of YouTube video IDs or URLs to scrape
  • text_only: If true, returns only text without timestamps (default: false)
  • languages: Array of language codes to fetch. If empty, fetches all available languages
  • use_proxy: Use Apify residential proxy to avoid rate limits (default: true)
  • include_generated: Include auto-generated transcripts (default: true)
  • include_translation: Attempt to translate missing languages (default: true)
  • fetch_all: Fetch all available transcripts regardless of languages parameter (default: false)
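
Since video_ids accepts both IDs and URLs, it can be convenient to normalize everything to plain IDs before a run, for example to deduplicate a list. The following is a minimal sketch, not the Actor's own parsing logic: `extract_video_id` and `build_input` are hypothetical helpers, and the 11-character ID pattern and the set of recognized URL shapes are assumptions.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

# Assumption: a video ID is 11 characters from the URL-safe base64 alphabet.
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a bare ID or a common YouTube URL, else None."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    elif host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    return candidate if _ID_RE.match(candidate) else None


def build_input(values: Iterable[str], languages: Iterable[str] = ("en",)) -> dict:
    """Build an input object using the documented defaults."""
    ids: List[str] = []
    for value in values:
        video_id = extract_video_id(value)
        if video_id and video_id not in ids:  # drop invalid and duplicate entries
            ids.append(video_id)
    return {
        "video_ids": ids,
        "text_only": False,
        "languages": list(languages),
        "use_proxy": True,
        "include_generated": True,
        "include_translation": True,
        "fetch_all": False,
    }
```

Dropping unparseable entries before the run avoids spending result operations on inputs that cannot succeed.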

Language Codes

YouTube uses ISO 639-1 two-letter language codes, extended with a region or script subtag for some languages. Common examples:

  • en - English
  • zh - Chinese (auto-selects based on region)
  • zh-Hans - Chinese (Simplified script)
  • zh-Hant - Chinese (Traditional script)
  • zh-CN - Chinese (China/Simplified)
  • zh-TW - Chinese (Taiwan/Traditional)
  • zh-HK - Chinese (Hong Kong/Traditional)
  • es - Spanish
  • fr - French
  • de - German
  • ja - Japanese
  • ko - Korean

Note: For Chinese subtitles, YouTube may use different codes (zh, zh-Hans, zh-Hant, zh-CN, zh-TW) depending on how the uploader configured them.
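
Because the code for a Chinese track depends on the uploader, one option is to request several candidate codes and then take the first one that actually came back. This is a sketch under stated assumptions: the `ZH_FALLBACKS` ordering and the `pick_language` helper are hypothetical, and the grouping into Simplified and Traditional simply follows the list above.

```python
from typing import Dict, List, Optional

# Hypothetical preference order per script, built from the codes listed above.
ZH_FALLBACKS: Dict[str, List[str]] = {
    "simplified": ["zh-Hans", "zh-CN", "zh"],
    "traditional": ["zh-Hant", "zh-TW", "zh-HK", "zh"],
}


def pick_language(transcripts: Dict[str, dict], candidates: List[str]) -> Optional[str]:
    """Return the first candidate code present in a result's "transcripts" object."""
    for code in candidates:
        if code in transcripts:
            return code
    return None
```

For example, pass `ZH_FALLBACKS["traditional"]` as the languages input, then call `pick_language(result["transcripts"], ZH_FALLBACKS["traditional"])` on each video result.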

Output

The Actor produces output in the following structure:

  1. Individual Video Outputs: One output for each video containing the subtitle/transcript content
  2. Execution Summary Report: A final output with the overall execution statistics and results

For time-stamped transcripts:

{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcripts": {
    "en": {
      "language": "en",
      "language_name": "English",
      "is_generated": false,
      "is_translatable": true,
      "transcript": [
        {
          "index": 0,
          "text": "Never gonna give you up",
          "start": "00:00:00.000",
          "end": "00:00:03.000",
          "start_seconds": 0.0,
          "end_seconds": 3.0,
          "duration": 3.0,
          "word_count": 5,
          "char_count": 23
        }
      ],
      "stats": {
        "total_segments": 87,
        "total_words": 435,
        "total_characters": 2340,
        "average_segment_duration": 3.2,
        "total_duration": 278.4
      }
    }
  },
  "languages_count": 1,
  "successful_languages": ["en"],
  "failed_languages": [],
  "text_only": false,
  "metadata": {
    "fetched_at": "2025-08-02T12:34:56.789Z",
    "proxy_used": true,
    "retries": 0
  }
}
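
The time-stamped segments above map closely onto the SubRip (.srt) subtitle format, which numbers cues from 1 and uses a comma before the milliseconds. A minimal conversion sketch follows; it assumes every segment has the "start", "end", and "text" fields shown in the example, and `to_srt` is a hypothetical helper rather than something the Actor provides.

```python
from typing import Dict, List


def to_srt(segments: List[Dict]) -> str:
    """Convert time-stamped transcript segments into SubRip (.srt) text."""
    blocks = []
    for number, segment in enumerate(segments, start=1):
        # SRT timestamps use "," instead of "." before the milliseconds.
        start = segment["start"].replace(".", ",")
        end = segment["end"].replace(".", ",")
        blocks.append(f"{number}\n{start} --> {end}\n{segment['text']}")
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Usage: `to_srt(result["transcripts"]["en"]["transcript"])` for a result fetched with text_only set to false.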

For text-only transcripts:

{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcripts": {
    "en": {
      "language": "en",
      "language_name": "English",
      "is_generated": false,
      "is_translatable": true,
      "transcript": [
        "Never gonna give you up",
        "Never gonna let you down",
        "Never gonna run around and desert you"
      ]
    }
  },
  "languages_count": 1,
  "text_only": true
}
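
Downstream code often just needs one string per language. The sketch below flattens a result into plain text and accepts both shapes shown above (a list of strings in text-only mode, a list of segment objects otherwise). `transcript_text` is a hypothetical helper, and joining segments with a single space is an arbitrary choice.

```python
from typing import Optional


def transcript_text(result: dict, language: str) -> Optional[str]:
    """Join one language's transcript into a single string, or None if absent."""
    entry = result.get("transcripts", {}).get(language)
    if entry is None:
        return None
    lines = []
    for item in entry["transcript"]:
        # Text-only mode yields strings; time-stamped mode yields objects with "text".
        text = item if isinstance(item, str) else item["text"]
        if text.strip():
            lines.append(text.strip())
    return " ".join(lines)
```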

Example Usage

Basic usage - single video:

{
  "video_ids": ["dQw4w9WgXcQ"],
  "text_only": false,
  "languages": ["en"],
  "include_generated": true
}

Multiple videos with multiple languages:

{
  "video_ids": ["VIDEO_ID_1", "VIDEO_ID_2", "VIDEO_ID_3"],
  "text_only": false,
  "languages": ["en", "zh-TW", "es"],
  "use_proxy": true,
  "include_generated": true
}

Text-only mode for all available languages:

{
  "video_ids": ["VIDEO_ID"],
  "text_only": true,
  "languages": [],
  "use_proxy": true
}

Use Cases

  1. Content Analysis: Extract video transcripts for sentiment analysis or content categorization
  2. Multi-language Content: Get transcripts in multiple languages where available
  3. Accessibility: Create text versions of video content
  4. Research: Analyze speech patterns, word frequency, or content themes
  5. SEO: Extract video content for search engine optimization

Limitations

  • Only works with videos that have subtitles available (manual or auto-generated)
  • Auto-generated subtitles may have lower accuracy but are often the only option
  • Output is generated for all videos, even those without subtitles (will contain error information)
  • Some videos may have region restrictions
  • YouTube enforces rate limits on transcript API requests

Error Handling

The Actor handles various error scenarios with automatic recovery:

  • Invalid video IDs: Validated and reported with clear error messages
  • Videos without subtitles: Error code TRANSCRIPTS_DISABLED or NO_TRANSCRIPTS_AVAILABLE
  • Rate limiting (429 errors): Automatic proxy rotation and retry with exponential backoff
  • Language not available: Attempts translation if include_translation is enabled
  • Network errors: Retries up to 3 times with increasing delays

Failed videos will include detailed error information while successful videos continue processing.
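
For readers unfamiliar with the pattern, retrying with exponential backoff can be sketched as follows. This illustrates the general technique and is not the Actor's actual implementation: the limit of 3 retries matches the description above, while the 1-second base delay and the doubling factor are assumptions. The `sleep` parameter is injectable so the function can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed value, not documented by the Actor
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on any exception with doubling delays."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
            attempt += 1
```

With the defaults, an operation that keeps failing is attempted four times in total (one initial call plus three retries) before the last exception is re-raised.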

Error Response Example:

{
  "video_id": "INVALID_ID",
  "video_url": "https://www.youtube.com/watch?v=INVALID_ID",
  "error": "Video unavailable",
  "code": "VIDEO_UNAVAILABLE"
}
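
Since successful and failed videos arrive in the same output, a consumer usually separates them first. The sketch below assumes that failed items carry a top-level "error" key as in the example above, and that the run summary is the item with "type" set to "summary"; `split_results` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def split_results(items: List[Dict]) -> Tuple[List[Dict], List[Dict], Optional[Dict]]:
    """Separate output items into (successful, failed, summary)."""
    successful: List[Dict] = []
    failed: List[Dict] = []
    summary: Optional[Dict] = None
    for item in items:
        if item.get("type") == "summary":
            summary = item
        elif "error" in item:
            failed.append(item)
        else:
            successful.append(item)
    return successful, failed, summary
```

The "code" field of each failed item can then drive follow-up actions, such as retrying only the videos that failed with a transient error.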

Summary Statistics

At the end of each run, the Actor provides a comprehensive summary with error tracking:

{
  "type": "summary",
  "total_videos": 10,
  "successful": 8,
  "failed": 2,
  "scraper_stats": {
    "total_videos": 10,
    "successful": 8,
    "failed": 2,
    "retries": 3,
    "proxy_rotations": 1
  },
  "error_counts": {
    "VIDEO_UNAVAILABLE": 0,
    "TRANSCRIPTS_DISABLED": 1,
    "NO_TRANSCRIPTS_AVAILABLE": 0,
    "RATE_LIMITED": 0,
    "COULD_NOT_RETRIEVE": 0,
    "XML_PARSE_ERROR": 2,
    "TRANSLATION_FAILED": 0,
    "NO_TRANSCRIPT_FOUND": 1,
    "NETWORK_ERROR": 0,
    "PROXY_ERROR": 0,
    "UNKNOWN_ERROR": 0,
    "RETRY_SUCCESS": 1,
    "EMPTY_RESPONSE": 2
  },
  "error_descriptions": {
    "VIDEO_UNAVAILABLE": "Video is not available (deleted, private, or region-blocked)",
    "TRANSCRIPTS_DISABLED": "Subtitles are disabled for this video",
    "NO_TRANSCRIPTS_AVAILABLE": "No transcript files available",
    "RATE_LIMITED": "YouTube rate limiting (too many requests)",
    "COULD_NOT_RETRIEVE": "Failed to retrieve transcript content",
    "XML_PARSE_ERROR": "Empty or invalid response from YouTube (often proxy-related)",
    "TRANSLATION_FAILED": "Failed to translate to requested language",
    "NO_TRANSCRIPT_FOUND": "Transcript not found in requested language",
    "NETWORK_ERROR": "Network connection or timeout error",
    "PROXY_ERROR": "Proxy configuration or connection failed",
    "UNKNOWN_ERROR": "Unexpected error occurred",
    "RETRY_SUCCESS": "Successfully retrieved after retry",
    "EMPTY_RESPONSE": "YouTube returned empty response (0 bytes)"
  }
}
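
A summary like the one above can be condensed into a success rate and a ranked list of the counters that actually fired. This is a sketch that assumes the field names shown in the example; `summarize` is a hypothetical helper. Note that RETRY_SUCCESS appears under error_counts even though it records recoveries, so it will show up in the ranked list as well.

```python
from typing import Dict, List, Tuple


def summarize(summary: Dict) -> Tuple[float, List[Tuple[str, int, str]]]:
    """Return (success_rate, [(code, count, description), ...]) for non-zero counters."""
    total = summary["total_videos"]
    success_rate = summary["successful"] / total if total else 0.0
    counts = summary.get("error_counts", {})
    descriptions = summary.get("error_descriptions", {})
    # Highest count first; ties broken alphabetically by code for stable output.
    nonzero = sorted(
        ((code, count) for code, count in counts.items() if count > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return success_rate, [(code, count, descriptions.get(code, "")) for code, count in nonzero]
```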

Error Count Tracking

The Actor tracks detailed error statistics to help diagnose issues:

  • XML_PARSE_ERROR: Most common when YouTube returns empty responses through the proxy
  • EMPTY_RESPONSE: Indicates YouTube is blocking the request (often proxy-related)
  • RETRY_SUCCESS: Shows how many requests succeeded after retrying
  • RATE_LIMITED: Tracks when YouTube enforces rate limiting
  • TRANSLATION_FAILED: Counts failed translation attempts

Each video result also includes an error summary in its metadata:

{
  "metadata": {
    "fetched_at": "2025-08-02T12:34:56.789Z",
    "proxy_used": true,
    "retries": 0,
    "error_summary": {
      "xml_parse_errors": 2,
      "empty_responses": 2,
      "retry_successes": 1,
      "translation_failures": 0
    }
  }
}

Development

To run locally:

$ apify run

To deploy to Apify:

$ apify push

Troubleshooting

  1. All videos failing with "Transcripts are disabled":

    • Ensure use_proxy is set to true
    • Check if videos actually have subtitles available
  2. Rate limiting errors:

    • The Actor automatically handles these with proxy rotation
    • For persistent issues, reduce batch size
  3. Missing languages:

    • Enable include_translation to get translations
    • Check available languages first by fetching without specific language codes

Important Note

This Actor has not been tested against all edge cases. If you encounter errors, or inaccuracies in this documentation, please provide feedback. Thank you!