YouTube Transcript Scraper - Any Language

Pricing

from $10.00 / 1,000 results

Extract YouTube video transcripts and subtitles in any available language. Get timestamped text segments, full transcript text, and available language list. Perfect for content analysis, AI training data, and accessibility.

Developer

Donny

Maintained by Community

YouTube Transcript Scraper

Extract YouTube video transcripts and subtitles in any available language. This actor fetches caption data directly from YouTube's InnerTube API, parses timed text XML, and delivers clean, structured transcript data with timestamps for every segment.
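
As a rough illustration of the parsing step, the sketch below turns a timed text XML document into segment dictionaries. It assumes the classic caption layout, in which each line is a `<text start="..." dur="...">` element and HTML entities are escaped twice; the actor's real parser and the exact payload it receives are not documented here, so `parse_timed_text` and `SAMPLE` are hypothetical names used only for this example.

```python
import html
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]

# Hypothetical sample in the assumed <transcript><text .../></transcript> layout.
SAMPLE = (
    '<transcript>'
    '<text start="0.5" dur="2.1">Hello &amp;#39;world&amp;#39;</text>'
    '<text start="2.6" dur="1.4">second line</text>'
    '</transcript>'
)


def parse_timed_text(xml_text: str) -> List[Segment]:
    """Parse timed text XML into a list of {start, duration, text} segments."""
    root = ET.fromstring(xml_text)
    segments: List[Segment] = []
    for node in root.iter("text"):
        # The XML parser resolves the first escaping layer; html.unescape
        # handles entities such as &#39; that remain in the text afterwards.
        raw = "".join(node.itertext())
        text = html.unescape(raw).replace("\n", " ").strip()
        if not text:
            continue  # skip empty caption lines
        segments.append({
            "start": float(node.get("start", "0")),
            "duration": float(node.get("dur", "0")),
            "text": text,
        })
    return segments


if __name__ == "__main__":
    for segment in parse_timed_text(SAMPLE):
        print(segment)
```

The resulting dictionaries use the same start, duration, and text keys that the actor reports for each segment.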

Overview

The YouTube Transcript Scraper is designed for researchers, content creators, AI engineers, and accessibility professionals who need to extract spoken content from YouTube videos at scale. It processes video URLs, identifies available caption tracks, and extracts the full transcript text along with precise timing information for each segment. Whether you need English subtitles, auto-generated captions, or manually uploaded translations, the actor handles each case through the same small set of input fields.

Features

  • Extract transcripts in any available language from YouTube videos
  • Get timestamped segments with start time, duration, and text for each line
  • Detect whether captions are auto-generated or manually uploaded
  • List all available languages for each video
  • Support for youtube.com/watch, youtu.be short links, and YouTube Shorts URLs
  • Word count calculation for content analysis
  • Graceful handling of videos without captions
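
The three supported URL shapes all carry the video ID in a different place. The following is a minimal sketch of how such URLs could be normalized to a video ID using only the Python standard library; `extract_video_id` is a hypothetical helper for illustration and is not the actor's own code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch, youtu.be, or Shorts URL, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path_parts = [part for part in parsed.path.split("/") if part]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return path_parts[0] if path_parts else None

    if host in ("youtube.com", "m.youtube.com"):
        # Watch pages: https://www.youtube.com/watch?v=<id>
        if path_parts == ["watch"]:
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Shorts: https://www.youtube.com/shorts/<id>
        if len(path_parts) >= 2 and path_parts[0] == "shorts":
            return path_parts[1]

    return None


if __name__ == "__main__":
    for candidate in (
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10",
        "https://youtu.be/dQw4w9WgXcQ",
        "https://www.youtube.com/shorts/abc123XYZ_-",
    ):
        print(candidate, "->", extract_video_id(candidate))
```

URLs without a scheme (for example `youtu.be/<id>`) return None in this sketch, because `urlparse` does not detect a hostname for them.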

Input Configuration

Field               Type     Default      Description
urls                array    (required)   List of YouTube video URLs to process
language            string   "en"         Preferred language code (e.g., en, es, fr, de, ja)
maxResults          integer  100          Maximum number of transcripts to extract
proxyConfiguration  object   Apify Proxy  Proxy settings for avoiding rate limits
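
For reference, an input object built from the fields above might look like the following sketch. It assumes that `urls` takes plain URL strings and that `proxyConfiguration` follows the common Apify shape with a `useApifyProxy` flag; neither detail is confirmed by this listing, and `build_run_input` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def build_run_input(
    urls: List[str],
    language: str = "en",
    max_results: int = 100,
    use_apify_proxy: bool = True,
) -> Dict[str, Any]:
    """Assemble an actor input dict using the documented defaults."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    return {
        "urls": list(urls),
        "language": language,
        "maxResults": max_results,
        # Assumption: standard Apify proxy object; adjust to the actor's schema.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    run_input = build_run_input(
        ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
        language="es",
        max_results=10,
    )
    print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the actor's input editor or passed as the run input from an API client.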

Output Format

Each result in the dataset contains the following fields:

  • videoId - The YouTube video ID
  • url - Full YouTube watch URL
  • title - Video title
  • channelName - Name of the uploading channel
  • language - Language code of the extracted transcript
  • languageName - Human-readable language name
  • isAutoGenerated - Boolean indicating auto-generated captions
  • availableLanguages - Array of all available caption languages
  • transcript - Full concatenated transcript text
  • segments - Array of objects with start, duration, and text fields
  • wordCount - Total word count of the transcript
  • durationSeconds - Video duration in seconds
  • scrapedAt - ISO timestamp of extraction
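
Because every segment carries its own start and duration, a result can be converted into other caption formats downstream. The sketch below shows one possible conversion of the segments array into SubRip (SRT) text, plus a simple whitespace-based word count. Both helpers are hypothetical, and the actor's own wordCount may be computed differently.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Render segments (start, duration, text) as numbered SRT cues."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = float(segment["start"])
        end = start + float(segment["duration"])
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{segment['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


def word_count(segments: List[Dict[str, Any]]) -> int:
    """Count whitespace-separated words across all segment texts."""
    return sum(len(str(segment["text"]).split()) for segment in segments)


if __name__ == "__main__":
    example = [
        {"start": 0.0, "duration": 1.5, "text": "Hi there"},
        {"start": 1.5, "duration": 2.0, "text": "welcome back"},
    ]
    print(segments_to_srt(example))
    print("words:", word_count(example))
```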

Use Cases

This actor is ideal for a wide range of applications. Content researchers can analyze spoken content across thousands of videos to identify trends and topics. AI and machine learning engineers can gather training data for natural language processing models, speech recognition systems, and large language model fine-tuning. SEO professionals can extract transcript text for keyword analysis and content optimization. Accessibility teams can verify and improve caption quality across video libraries. Journalists and fact-checkers can search through video content to find specific statements or claims. Educators can create study materials and searchable archives from lecture videos.

This actor works well as part of a YouTube data pipeline. For example, use the YouTube Search Scraper to discover videos by keyword, then feed those URLs into this transcript scraper for full-text extraction. It can also be paired with other YouTube actors from the quick_kirigami suite for channel analytics, comment extraction, and metadata collection.

Pricing and Performance

The actor processes approximately 1,000 transcripts per dollar of Apify platform credits. Processing speed depends on video availability and caption complexity, but typical throughput is 5-10 transcripts per second. Memory usage is minimal since transcripts are processed one at a time and pushed to the dataset incrementally. For best results with large batches, configure proxy rotation to avoid YouTube rate limiting.