YouTube Transcript Extractor Pro

Pricing

$10.00/month + usage

Developed by

Akash Kumar Naik

Maintained by Community

Transform any YouTube video into searchable, accessible text with YouTube Transcript Extractor Pro. Designed with advanced anti-block technology and residential proxy networks, this powerful Apify actor ensures consistent and reliable transcript extraction.

Total users: 5
Monthly users: 5
Runs succeeded: 71%
Last modified: 14 hours ago

YouTube Video Transcriber v2 - Apify Actor

Extract the transcript of a single YouTube video, with support for multiple languages, output formats, and automatic translation. This Apify actor works around YouTube's IP blocking by routing requests through residential proxies.

Features

🎯 Core Functionality

  • Single Video Processing: Accepts a single YouTube URL or video ID
  • Language Support: Retrieve transcripts in preferred languages with priority ordering
  • Format Options: Export transcripts in JSON, plain text, SRT, or WebVTT formats
  • Manual vs Generated: Choose between manually created and auto-generated subtitles

πŸ›‘οΈ Advanced Capabilities

  • IP Blocking Protection: Uses Apify residential proxies to bypass YouTube's IP restrictions
  • Smart Retry Logic: Automatic retry with proxy rotation and progressively longer delays between attempts
  • Session Management: Optimized session handling for better success rates
  • Robust Metadata Extraction: Multi-strategy extraction of video metadata (upload date, views, etc.)
  • Error Handling: Comprehensive error reporting and graceful failure handling

📊 Output Options

  • JSON: Structured data with timestamps and metadata
  • Plain Text: Clean text format for easy reading
  • SRT: Standard subtitle format for video players
  • WebVTT: Web-compatible subtitle format
  • Formatting Preservation: Option to keep HTML formatting (bold, italics)
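To illustrate how the JSON snippet structure relates to the subtitle formats listed above, here is a minimal sketch that converts snippets with `text`, `start`, and `duration` fields into SRT and WebVTT. The function names are hypothetical and this is not the actor's own converter; it only shows the standard timestamp layout of each format.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Render seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(snippets: list) -> str:
    """Build numbered SRT cues from snippets with text/start/duration keys."""
    blocks = []
    for index, snippet in enumerate(snippets, start=1):
        start = snippet["start"]
        end = start + snippet["duration"]
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{snippet['text']}"
        )
    return "\n\n".join(blocks) + "\n"


def to_webvtt(snippets: list) -> str:
    """Build a WebVTT document: a WEBVTT header, then cues with '.' millis."""
    cues = []
    for snippet in snippets:
        start = snippet["start"]
        end = start + snippet["duration"]
        cues.append(
            f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n"
            f"{snippet['text']}"
        )
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

The only structural differences between the two outputs are the `WEBVTT` header, the cue numbering in SRT, and the millisecond separator.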

Input Configuration

Required Parameters

  • Video URL or ID: Single YouTube video to process
    • Supports: https://youtu.be/VIDEO_ID, https://www.youtube.com/watch?v=VIDEO_ID, or just VIDEO_ID

Optional Parameters

  • Languages: Priority-ordered list of language codes (default: ["en"])
  • Output Format: Choose from json, text, srt, or webvtt
  • Preserve Formatting: Keep HTML tags like <i> and <b>
  • Exclude Generated: Skip auto-generated transcripts
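  • Exclude Manual: Skip manually created transcripts
  • Translate To: Target language code for automatic translation (the translateTo key shown under Language Configuration Examples)
  • Proxy Configuration: Enable residential proxies (recommended)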

Example Usage

Basic Usage

{
  "videoUrl": "https://youtu.be/dQw4w9WgXcQ",
  "languages": ["en"],
  "outputFormat": "json"
}

Advanced Configuration

{
  "videoUrl": "https://youtu.be/dQw4w9WgXcQ",
  "languages": ["en", "es", "fr"],
  "outputFormat": "srt",
  "preserveFormatting": true,
  "excludeGenerated": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
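
The configurations above can also be assembled and sanity-checked in code before a run is started. The sketch below is an illustration only: `build_actor_input` is a hypothetical helper, the `excludeManual` key name is an assumption (it does not appear in the examples), and every default other than `["en"]` for languages is a guess rather than documented behaviour.

```python
from typing import Optional

# Output formats named in this document.
ALLOWED_FORMATS = {"json", "text", "srt", "webvtt"}


def build_actor_input(
    video: str,
    languages: Optional[list] = None,
    output_format: str = "json",
    preserve_formatting: bool = False,
    exclude_generated: bool = False,
    exclude_manual: bool = False,
    use_proxy: bool = True,
) -> dict:
    """Return an input dict shaped like the examples in this document."""
    if not video or not video.strip():
        raise ValueError("a YouTube URL or video ID is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if exclude_generated and exclude_manual:
        # Assumption: excluding both kinds would leave nothing to return.
        raise ValueError("cannot exclude both generated and manual transcripts")
    return {
        "videoUrl": video.strip(),
        "languages": list(languages) if languages else ["en"],
        "outputFormat": output_format,
        "preserveFormatting": preserve_formatting,
        "excludeGenerated": exclude_generated,
        "excludeManual": exclude_manual,  # assumed key name
        "proxyConfiguration": {"useApifyProxy": use_proxy},
    }
```

Validating locally avoids paying for a run that would fail on a mistyped format name.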

Output Format

Successful Response

{
  "videoInput": "https://youtu.be/dQw4w9WgXcQ",
  "videoId": "dQw4w9WgXcQ",
  "status": "success",
  "language": "English",
  "languageCode": "en",
  "isGenerated": false,
  "outputFormat": "json",
  "transcript": [
    {
      "text": "Hello there",
      "start": 0.0,
      "duration": 1.54
    }
  ],
  "metadata": {
    "snippetCount": 150,
    "preserveFormatting": false,
    "totalDuration": 212.5
  },
  "upload_date": "2024-01-15T10:30:00-08:00",
  "views": "1234567",
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "textSummary": "Hello there...",
  "error": null
}

Error Response

{
  "videoInput": "https://youtu.be/invalid",
  "videoId": "invalid",
  "status": "error",
  "language": null,
  "languageCode": null,
  "isGenerated": null,
  "outputFormat": "json",
  "transcript": null,
  "metadata": null,
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "videoUrl": "https://www.youtube.com/watch?v=invalid",
  "textSummary": null,
  "error": "Video is unavailable"
}
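
Because both response shapes share the same keys, a consumer can branch on `status` before touching `transcript`. The following is a minimal sketch of that check. `transcript_as_text` is a hypothetical helper, and the assumption that non-JSON formats deliver `transcript` as a single string is not confirmed by this document.

```python
def transcript_as_text(item: dict) -> str:
    """Return the transcript of one result item as plain text.

    Raises RuntimeError when the item reports status "error".
    """
    if item.get("status") != "success":
        raise RuntimeError(
            f"Transcript failed for {item.get('videoId')}: {item.get('error')}"
        )
    transcript = item.get("transcript")
    if isinstance(transcript, str):
        # Assumption: text, srt, and webvtt outputs arrive as one string.
        return transcript
    # JSON output: a list of snippets with text/start/duration keys.
    return " ".join(snippet["text"] for snippet in transcript or [])
```

Checking `status` first matters because `transcript`, `metadata`, and `textSummary` are all `null` in the error response.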

Proxy Configuration

Why Use Proxies?

YouTube actively blocks requests from cloud providers and high-traffic IPs. This actor uses Apify's residential proxy network to:

  • Avoid IP blocking: Rotate through residential IP addresses
  • Increase success rate: Residential IPs appear as regular users
  • Handle scale: Process multiple videos reliably

Proxy Types

  • Residential Proxies (Recommended): Real home/office IP addresses with highest success rate
  • Cost: More expensive than datacenter proxies but essential for YouTube scraping

Configuration

{
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Supported Languages

YouTube supports transcripts in 42+ languages with automatic generation and translation capabilities.

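| Language   | Code | Usage                   | Regional Variants          |
|------------|------|-------------------------|----------------------------|
| English    | en   | 39% of content          | en-US, en-GB, en-CA, en-AU |
| Spanish    | es   | 11.8% of content        | es-ES, es-MX, es-AR        |
| Hindi      | hi   | 610M+ users             | -                          |
| Portuguese | pt   | Growing market          | pt-BR, pt-PT               |
| French     | fr   | Global reach            | fr-FR, fr-CA               |
| German     | de   | High engagement         | -                          |
| Japanese   | ja   | High-value audience     | -                          |
| Russian    | ru   | Popular for subtitles   | -                          |
| Korean     | ko   | K-pop, entertainment    | -                          |
| Arabic     | ar   | Growing market          | -                          |
| Chinese    | zh   | Multiple variants       | zh-CN, zh-TW, zh-HK        |
| Italian    | it   | European market         | -                          |
| Turkish    | tr   | Regional importance     | -                          |
| Dutch      | nl   | High engagement         | -                          |
| Polish     | pl   | Growing European market | -                          |
| Indonesian | id   | Southeast Asia growth   | -                          |
| Thai       | th   | Southeast Asia growth   | -                          |
| Vietnamese | vi   | Southeast Asia growth   | -                          |
| Bengali    | bn   | Large population        | -                          |
| Tamil      | ta   | Growing in India        | -                          |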

Complete Language List

All 42 supported languages: ar, bn, bg, cs, da, nl, en, fr, fa, fil, fi, ka, de, el, gu, he, hi, hu, id, it, ja, kn, ko, lv, lt, ml, mr, no, pl, pt, pa, ro, ru, sk, es, sv, ta, te, th, tr, uk, vi

Language Configuration Examples

Basic Language Priority

{
  "languages": ["en", "es", "fr"],
  "translateTo": "de"
}

Regional Variants

{
  "languages": ["en-US", "en-GB", "en"],
  "translateTo": "pt-BR"
}

Multi-Market Approach

{
  "languages": ["hi", "en", "es", "pt", "fr", "de", "ja", "ko"]
}

Best Practices

  • Always include English (en) as fallback - most common language
  • Use priority order - list most preferred languages first
  • Consider your audience - check YouTube Analytics for viewer languages
  • Regional variants - use pt-BR for Brazil, zh-CN for China, etc.
  • Multiple fallbacks - include 3-5 language options for better coverage
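
To make the priority-order idea concrete, the sketch below resolves a preference list against the languages a video actually offers. It is an illustration of the general technique, not the actor's matching logic: `pick_language` is a hypothetical helper, and falling back from a regional variant such as `en-US` to its base code `en` is an assumption.

```python
from typing import Optional


def pick_language(preferred: list, available: list) -> Optional[str]:
    """Return the first available code that satisfies the priority list.

    Each preferred code is tried exactly, then by its base language
    (for example "en-US" falls back to "en"). Matching ignores case.
    """
    by_lower = {code.lower(): code for code in available}
    for wanted in preferred:
        key = wanted.lower()
        if key in by_lower:
            return by_lower[key]
        base = key.split("-")[0]
        if base in by_lower:
            return by_lower[base]
    return None
```

For example, `pick_language(["en-US", "en-GB", "en"], ["es", "en"])` settles on `"en"`, while a list with no overlap returns `None`, which corresponds to the "No Transcript Found" case.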

Supported URL Formats

The actor automatically extracts video IDs from various YouTube URL formats:

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://youtube.com/watch?v=VIDEO_ID
  • https://m.youtube.com/watch?v=VIDEO_ID
  • https://www.youtube.com/embed/VIDEO_ID
  • https://www.youtube.com/v/VIDEO_ID
  • Raw video IDs: VIDEO_ID
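
A minimal sketch of ID extraction for the formats listed above, using only the Python standard library. `extract_video_id` is a hypothetical helper, and the 11-character check is an assumption about YouTube IDs; the actor itself may be more lenient, since the error example in this document echoes back the ID `invalid`.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a URL or raw ID, or None if none is found."""
    value = value.strip()
    if VIDEO_ID_RE.match(value):
        return value  # already a raw ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # /embed/VIDEO_ID and /v/VIDEO_ID
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("embed", "v"):
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None
```

Parsing with `urlparse` rather than one large regular expression keeps extra query parameters, such as a timestamp on a short link, from interfering with the match.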

Error Handling

Common Error Types

  • Transcripts Disabled: Video has no available transcripts
  • Video Unavailable: Video is private, deleted, or restricted
  • No Transcript Found: No transcript in requested languages
  • IP Blocked: YouTube rejected the request's IP address; the actor retries automatically with proxy rotation

Retry Logic

  • 5 retry attempts with residential proxy rotation
  • Progressive delays with random jitter (base delays of 3s, 5s, 7s, and 9s between the five attempts)
  • Fresh sessions for each retry attempt
  • Smart error detection to avoid unnecessary retries
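
The retry behaviour described above can be sketched as follows. This is a hedged illustration, not the actor's source: `fetch_with_retries`, `PermanentError`, and the one-second jitter range are assumptions, and `fetch` and `new_session` stand in for whatever performs the request and creates a fresh proxy session.

```python
import random
import time

# Base delays between the five attempts, as listed above.
BASE_DELAYS = (3.0, 5.0, 7.0, 9.0)


class PermanentError(Exception):
    """Hypothetical marker for errors that retrying cannot fix,
    such as disabled transcripts or an unavailable video."""


def fetch_with_retries(fetch, new_session, max_attempts=5,
                       sleep=time.sleep, jitter=1.0):
    """Call fetch(session) until it succeeds or attempts run out."""
    last_error = None
    for attempt in range(max_attempts):
        session = new_session()  # fresh session for every attempt
        try:
            return fetch(session)
        except PermanentError:
            raise  # smart error detection: do not retry hopeless cases
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                base = BASE_DELAYS[min(attempt, len(BASE_DELAYS) - 1)]
                sleep(base + random.uniform(0, jitter))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting, and the jitter spreads retries out so that parallel runs do not hit YouTube in lockstep.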

Performance & Costs

Processing Speed

  • Single video: ~2-5 seconds
  • With retries: Up to 30 seconds for heavily blocked videos
  • Batch processing: Each run handles one video, so multiple videos are processed by starting runs in parallel

Cost Considerations

  • Residential proxies: Higher cost but necessary for reliability
  • Request volume: Costs scale with number of videos processed
  • Retry attempts: Failed attempts still consume proxy bandwidth

Use Cases

Content Analysis

  • Extract video content for analysis and research
  • Create searchable databases of video content
  • Generate automated summaries and insights

Accessibility

  • Create subtitles for videos without captions
  • Translate content for international audiences
  • Generate text versions for screen readers

Education & Training

  • Create study materials from educational videos
  • Generate transcripts for online courses
  • Build searchable knowledge bases

Media & Publishing

  • Create blog posts from video content
  • Generate quotes and excerpts
  • Translate content for global audiences

Technical Details

Built With

  • Python 3.11: Modern Python runtime
  • YouTube Transcript API: Core transcript extraction
  • Apify SDK: Platform integration and proxy management
  • Requests: HTTP client with session management
  • BeautifulSoup4: HTML parsing for metadata extraction

Architecture

  • Modular design: Separate concerns for maintainability
  • Multi-strategy extraction: 4-tier fallback system for robust metadata extraction
  • Error resilience: Comprehensive error handling and recovery
  • Scalable: Handles single videos to large batches
  • Monitoring: Detailed logging for debugging and optimization

Metadata Extraction Strategies

  1. Primary: Extract from ytInitialPlayerResponse JavaScript object
  2. Fallback 1: Extract from ytInitialData JavaScript object
  3. Fallback 2: Extract from JSON-LD and microdata structured data
  4. Fallback 3: Extract from YouTube oEmbed API
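
The tiered fallback above can be sketched as an ordered list of strategies, where the first one that yields data wins. The code below is an illustration under stated assumptions, not the actor's implementation: it uses the standard library instead of BeautifulSoup, covers only two of the four tiers, and the field paths (`videoDetails.viewCount`, `uploadDate`) and the `ytInitialPlayerResponse = ` marker are assumptions about the page layout.

```python
import json
import re
from typing import Optional

JSON_LD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', re.DOTALL
)


def from_player_response(html: str) -> Optional[dict]:
    """Tier 1: decode the JSON object assigned to ytInitialPlayerResponse."""
    marker = "ytInitialPlayerResponse = "
    pos = html.find(marker)
    if pos == -1:
        return None
    # raw_decode parses one JSON value and ignores the script text after it.
    data, _ = json.JSONDecoder().raw_decode(html, pos + len(marker))
    views = data.get("videoDetails", {}).get("viewCount")
    return {"views": views} if views else None


def from_json_ld(html: str) -> Optional[dict]:
    """Later tier: read uploadDate from an embedded JSON-LD block."""
    match = JSON_LD_RE.search(html)
    if not match:
        return None
    upload_date = json.loads(match.group(1)).get("uploadDate")
    return {"upload_date": upload_date} if upload_date else None


def extract_metadata(html: str, strategies: list) -> dict:
    """Try each (name, strategy) pair in order; return the first hit."""
    for name, strategy in strategies:
        try:
            result = strategy(html)
        except Exception:
            continue  # a broken tier must not block the next one
        if result:
            return {**result, "source": name}
    return {}
```

Recording which tier produced the data (the `source` key here) makes it easier to spot when YouTube changes its page structure and the primary strategy stops working.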

Limitations

YouTube Restrictions

  • Private videos: Cannot access private or unlisted videos
  • Age-restricted content: May require additional authentication
  • Geographic restrictions: Some videos blocked in certain regions
  • No transcripts: Some videos genuinely have no available transcripts

Technical Limits

  • Rate limiting: YouTube may throttle high-volume requests
  • Proxy costs: Residential proxies required for reliability
  • Processing time: Complex retry logic may increase execution time

Support

For issues, feature requests, or questions about this Apify actor, please refer to the Apify platform documentation or contact support through the Apify Console.


Note: This actor is designed for legitimate use cases such as accessibility, research, and content analysis. Please respect YouTube's Terms of Service and copyright laws when using extracted transcripts.