YouTube Transcript Scraper (Multiple Language)

Pricing

$20.00/month + usage

Developer

Deepanshu Sharma

Maintained by Community

Last modified

5 days ago

YouTube Transcript Scraper

A powerful actor that extracts transcripts/captions from YouTube videos with built-in translation support for 100+ languages.

🌟 Features

  • Extract YouTube Transcripts: Get captions/subtitles from any YouTube video
  • Multi-Language Translation: Translate transcripts to 100+ languages using free Google Translate
  • Batch Processing: Process multiple videos in a single run
  • Smart Caption Selection: Automatically finds the best available captions
  • Multiple Output Formats: Get results in JSON or plain text format
  • Proxy Support: Built-in Apify proxy support to avoid IP blocking
  • Fast Translation: Optimized batch translation for speed (5-10x faster than individual translation)
  • Progress Tracking: Clean progress indicators to monitor translation status

📋 Input

The actor accepts the following input parameters:

Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| videos | Array of strings | List of YouTube video URLs or video IDs |

Optional Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| translate_to | String | "" (none) | Target language code for translation (e.g., "en", "es", "hi"). Leave empty to keep the original language |
| output_format | String | "json" | Output format: "json" (structured) or "txt" (plain text) |
| proxyConfiguration | Object | Residential proxy enabled | Proxy settings for YouTube requests |
| delay_seconds | Integer | 2 | Delay in seconds between processing videos (0-60) |

🚀 Usage

Input Example

{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw",
    "dQw4w9WgXcQ"
  ],
  "translate_to": "en",
  "output_format": "json",
  "delay_seconds": 2
}
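
If you assemble the input programmatically, a small helper can catch invalid values before a run starts. The following is a minimal sketch, not part of the actor: the function name `build_actor_input` is hypothetical, and the checks only mirror the parameter tables above (allowed output formats, the 0-60 range for `delay_seconds`).

```python
ALLOWED_FORMATS = {"json", "txt"}


def build_actor_input(videos, translate_to="", output_format="json", delay_seconds=2):
    """Build an input object using the parameter names documented above.

    Hypothetical helper: the validation rules are taken from the parameter
    tables, not from the actor's source code.
    """
    if not videos:
        raise ValueError("videos must contain at least one URL or video ID")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0 <= delay_seconds <= 60:
        raise ValueError("delay_seconds must be between 0 and 60")
    return {
        "videos": list(videos),
        "translate_to": translate_to,  # empty string keeps the original language
        "output_format": output_format,
        "delay_seconds": delay_seconds,
    }
```

Calling `build_actor_input(["dQw4w9WgXcQ"], translate_to="es")` returns a dictionary that can be serialized with `json.dumps` and used as the run input.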

Output Example (JSON Format)

{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": [
    {
      "text": "We're no strangers to love",
      "start": 0.0,
      "duration": 3.5
    },
    {
      "text": "You know the rules and so do I",
      "start": 3.5,
      "duration": 4.2
    }
  ],
  "output_format": "json",
  "metadata": {
    "available_languages": [
      {
        "language": "English",
        "code": "en",
        "type": "auto-generated"
      }
    ],
    "selected_language": "en",
    "translated_to": "es",
    "translation_attempted": true,
    "translation_success": true,
    "translation_method": "Google Translate (deep-translator)"
  },
  "status": "success"
}
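
The metadata block tells you whether the returned text is the original caption track or a translation. Here is a short sketch of how a consumer might summarize it. The function name `describe_translation` is hypothetical, and it assumes every result item carries the metadata fields shown in the example above.

```python
def describe_translation(item):
    """Summarize the translation status of one result item.

    Assumes the metadata keys from the JSON example: selected_language,
    translated_to, translation_attempted, translation_success.
    """
    meta = item.get("metadata", {})
    source = meta.get("selected_language")
    target = meta.get("translated_to")
    if not meta.get("translation_attempted"):
        return f"original captions ({source})"
    if meta.get("translation_success"):
        return f"translated {source} -> {target}"
    # A failed translation falls back to the original captions.
    return f"translation to {target} failed; original captions ({source})"
```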

Output Example (Text Format)

{
  "video_id": "dQw4w9WgXcQ",
  "video_url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "transcript": "We're no strangers to love\nYou know the rules and so do I\nA full commitment's what I'm thinking of...",
  "output_format": "txt",
  "metadata": { ... },
  "status": "success"
}
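
Because `transcript` is a list of segments in JSON format and a single string in text format, downstream code may want one accessor that handles both. A minimal sketch follows. The helper name `transcript_as_text` is hypothetical, and joining segments with newlines is an assumption based on the text example above.

```python
def transcript_as_text(item):
    """Return the transcript of a result item as one newline-separated string."""
    transcript = item["transcript"]
    if isinstance(transcript, str):
        # "txt" output: the transcript is already plain text.
        return transcript
    # "json" output: a list of {"text", "start", "duration"} segments.
    return "\n".join(segment["text"] for segment in transcript)
```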

Translation supports 100+ languages; the full list of language codes is in the input schema.

🔧 How It Works

  1. Video Processing: The actor extracts video IDs from URLs or accepts direct video IDs
  2. Caption Discovery: Searches for available captions/transcripts in the video
  3. Caption Retrieval: Fetches the best available caption (auto-generated or manual)
  4. Translation (if enabled): Translates captions using free Google Translate (reported in the output metadata as "Google Translate (deep-translator)")
    • Uses batch translation for speed (20 segments per batch)
    • Progress tracking at 10%, 25%, 50%, 75%, 90%, 100%
    • Fallback to individual translation if batch fails
  5. Output Formatting: Returns data in JSON or text format
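
Step 1 can be illustrated with a short sketch. This is not the actor's code: the function name `extract_video_id`, the set of recognized URL shapes, and the assumption that video IDs are 11 characters from `A-Z`, `a-z`, `0-9`, `_` and `-` are all illustrative.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 URL-safe characters.
VIDEO_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value):
    """Return the video ID from a URL or a bare ID, or None if none is found."""
    value = value.strip()
    if VIDEO_ID_PATTERN.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]

    if candidate and VIDEO_ID_PATTERN.fullmatch(candidate):
        return candidate
    return None
```

All three entries in the input example resolve this way: the `watch?v=` URL and the bare ID both yield `dQw4w9WgXcQ`, and the `youtu.be` link yields `jNQXAC9IVRw`.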

⚡ Performance

  • Without Translation: ~1-2 seconds per video
  • With Translation:
    • Small videos (50-100 segments): ~5-10 seconds
    • Medium videos (500-1000 segments): ~30-60 seconds
    • Large videos (2000+ segments): ~2-3 minutes

Optimization: Batch translation is 5-10x faster than translating segments individually!
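
At 20 segments per batch, a 2000-segment video needs 100 batch calls instead of 2000 individual ones. The batch-with-fallback pattern described under How It Works can be sketched as below. This is an illustration under stated assumptions, not the actor's implementation: `translate_segments` is a hypothetical name, and the two translator callables are supplied by the caller rather than tied to any particular library.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_segments(texts, translate_batch, translate_one, batch_size=20):
    """Translate texts in batches, retrying a failed batch one segment at a time.

    translate_batch: callable taking a list of strings, returning a list of strings.
    translate_one:   callable taking one string, returning one string.
    Both callables are assumptions of this sketch.
    """
    translated = []
    for batch in chunked(texts, batch_size):
        try:
            result = list(translate_batch(batch))
            if len(result) != len(batch):
                raise ValueError("batch result length mismatch")
        except Exception:
            # Fallback: translate this batch segment by segment.
            result = [translate_one(text) for text in batch]
        translated.extend(result)
    return translated
```

Passing the translator in as callables keeps the sketch testable offline. In practice they could be, for example, the batch and single-text methods of whichever translation client you use.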

🛡️ Proxy Configuration

{
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"],
"apifyProxyCountry": "US"
}

Why use proxies?

  • YouTube may block IP addresses making too many requests
  • Residential proxies are recommended to avoid detection
  • The actor rotates proxies between videos automatically

📊 Use Cases

  • Content Analysis: Analyze video content at scale
  • Accessibility: Turn a video's existing captions into readable text transcripts
  • Translation: Translate video content to reach global audiences
  • Research: Extract data from educational or documentary videos
  • SEO: Generate text content from video for search optimization
  • Subtitles: Build subtitle files from the timestamped JSON output
  • Data Mining: Extract information from video tutorials or courses
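
For the subtitle use case, the timestamped JSON segments map naturally onto the SubRip (.srt) layout. The actor itself returns JSON or plain text, so this conversion is a post-processing sketch. The function names are hypothetical, and each cue's end time is assumed to be `start + duration`.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Convert {"text", "start", "duration"} segments into SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = segment["start"]
        end = start + segment["duration"]  # assumption: a cue ends at start + duration
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{segment['text']}\n"
        )
    return "\n".join(blocks)
```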

⚠️ Limitations

  1. Caption Availability: Videos must have captions/subtitles available (auto-generated or manual)
  2. Translation Quality: Uses Google Translate - quality varies by language pair
  3. Rate Limiting: Free Google Translate may have rate limits for heavy usage
  4. Video Access: Cannot access private or age-restricted videos
  5. Disabled Captions: Some videos have captions disabled by the creator

🐛 Error Handling

The actor handles various error cases:

| Error | Reason | Solution |
|-------|--------|----------|
| "Transcripts disabled" | Video creator disabled captions | Try another video |
| "Video unavailable" | Video is private/deleted | Check video URL |
| "No transcripts available" | No captions exist for this video | YouTube may add auto-captions later |
| "Translation failed" | Translation service error | Original captions will be returned |

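When post-processing many results, it can help to map an error message to the suggested action from the table above. The sketch below does only that. The name `suggest_action` is hypothetical, and where the error text appears in a result item is not documented here, so the function takes the message string directly.

```python
# Error phrases and suggested actions, copied from the table above.
ERROR_HINTS = {
    "Transcripts disabled": "Try another video",
    "Video unavailable": "Check video URL",
    "No transcripts available": "YouTube may add auto-captions later",
    "Translation failed": "Original captions will be returned",
}


def suggest_action(message):
    """Return the suggested action for an error message, or None if unknown."""
    lowered = message.lower()
    for phrase, hint in ERROR_HINTS.items():
        if phrase.lower() in lowered:
            return hint
    return None
```
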
💡 Tips for Best Results

  1. Use Residential Proxies: Prevents IP blocking from YouTube
  2. Add Delay Between Videos: Set delay_seconds to 2-5 for better reliability
  3. Batch Processing: Process multiple videos in one run for efficiency
  4. Check Metadata: The output includes info about available languages and translation status
  5. JSON Format: Use JSON format if you need timestamps and structured data
  6. Text Format: Use text format for simple transcript reading