YouTube Transcript Scraper - Any Language
Pricing: from $10.00 / 1,000 results
Extract YouTube video transcripts and subtitles in any available language. Get timestamped text segments, full transcript text, and available language list. Perfect for content analysis, AI training data, and accessibility.
YouTube Transcript Scraper
Extract YouTube video transcripts and subtitles in any available language. This actor fetches caption data directly from YouTube's InnerTube API, parses timed text XML, and delivers clean, structured transcript data with timestamps for every segment.
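The actor's own XML parser is not published. As a rough illustration of the "parses timed text XML" step, the sketch below assumes the classic timedtext layout: a transcript root element holding one text element per caption, with start and dur attributes in seconds. It converts that into the start/duration/text segments listed under Output Format. The function name parse_timed_text is hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def parse_timed_text(xml_text: str) -> list[dict]:
    """Turn timedtext XML into a list of {start, duration, text} segments.

    Assumes the classic layout: a <transcript> root holding one
    <text start="seconds" dur="seconds">caption</text> element per segment.
    """
    root = ET.fromstring(xml_text)
    segments = []
    for node in root.iter("text"):
        # Caption text is often entity-escaped twice (e.g. "&amp;#39;"), so the
        # XML parser leaves "&#39;" behind; html.unescape resolves that layer.
        text = html.unescape(node.text or "").replace("\n", " ").strip()
        if not text:
            continue  # skip empty cues
        segments.append(
            {
                "start": float(node.get("start", "0")),
                "duration": float(node.get("dur", "0")),
                "text": text,
            }
        )
    return segments
```

Other caption formats (for example millisecond-based variants) would need a different attribute mapping. This sketch only covers the seconds-based layout described above.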
Overview
The YouTube Transcript Scraper is designed for researchers, content creators, AI engineers, and accessibility professionals who need to extract spoken content from YouTube videos at scale. It processes video URLs, identifies the available caption tracks, and extracts the full transcript text along with precise timing information for each segment. Whether you need English subtitles, auto-generated captions, or manually uploaded translations, the actor handles all of them with a short, simple configuration.
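Identifying caption tracks comes down to choosing one track per video. The sketch below shows one plausible selection rule: prefer a manually uploaded track in the requested language, fall back to an auto-generated one, and return None when the language is missing. It assumes track entries shaped like InnerTube's captionTracks items (languageCode, name, and kind == "asr" for speech-recognition captions). The rule itself and the function names are hypothetical, not the actor's documented behavior.

```python
from typing import Optional


def is_auto_generated(track: dict) -> bool:
    # Assumption: InnerTube's captionTracks entries mark speech-recognition
    # (auto-generated) captions with kind == "asr".
    return track.get("kind") == "asr"


def pick_caption_track(tracks: list[dict], preferred: str = "en") -> Optional[dict]:
    """Return the best track in the preferred language, or None if there is none.

    Manually uploaded captions win over auto-generated ones.
    """
    matching = [t for t in tracks if t.get("languageCode") == preferred]
    manual = [t for t in matching if not is_auto_generated(t)]
    if manual:
        return manual[0]
    return matching[0] if matching else None


def available_languages(tracks: list[dict]) -> list[str]:
    # Unique language codes, in the order the tracks are listed.
    return list(dict.fromkeys(t["languageCode"] for t in tracks if "languageCode" in t))
```

Returning None instead of silently switching languages keeps the caller in control. It can report the available languages and let the user rerun with a different language code.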
Features
- Extract transcripts in any available language from YouTube videos
- Get timestamped segments with start time, duration, and text for each line
- Detect whether captions are auto-generated or manually uploaded
- List all available languages for each video
- Support for youtube.com/watch, youtu.be short links, and YouTube Shorts URLs
- Word count calculation for content analysis
- Graceful handling of videos without captions
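The three supported URL shapes all reduce to the same video ID. A minimal sketch of that normalization follows. It assumes the usual 11-character ID alphabet [A-Za-z0-9_-], and extract_video_id and watch_url are illustrative names, not part of the actor.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch, youtu.be, or Shorts URL, else None."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    candidate = None
    if host == "youtu.be":
        # https://youtu.be/<id>?si=...
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>&t=42s
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith("/shorts/"):
            # https://www.youtube.com/shorts/<id>
            candidate = parsed.path.split("/")[2]
    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


def watch_url(video_id: str) -> str:
    # Canonical form, matching the "Full YouTube watch URL" output field.
    return f"https://www.youtube.com/watch?v={video_id}"
```

Unrecognized hosts or malformed IDs yield None. A batch job can then skip or log those URLs instead of failing.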
Input Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| urls | array | (required) | List of YouTube video URLs to process |
| language | string | "en" | Preferred language code (e.g., en, es, fr, de, ja) |
| maxResults | integer | 100 | Maximum number of transcripts to extract |
| proxyConfiguration | object | Apify Proxy | Proxy settings for avoiding rate limits |
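The sketch below assembles an input object from the fields in the table and applies their defaults. Two parts are assumptions: urls is taken to be a list of plain URL strings, and proxyConfiguration uses the common Apify shape {"useApifyProxy": true}. The helper name build_run_input is hypothetical.

```python
import json


def build_run_input(urls, language="en", max_results=100, use_apify_proxy=True) -> dict:
    """Assemble an input object using the field names from the table above."""
    urls = list(urls)
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    if max_results < 1:
        raise ValueError("maxResults must be a positive integer")
    return {
        "urls": urls,  # assumption: plain URL strings
        "language": language,
        "maxResults": max_results,
        # Assumption: the common Apify proxy input shape.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    print(json.dumps(build_run_input(["https://youtu.be/dQw4w9WgXcQ"], language="es"), indent=2))
```

With the apify-client package, a dict like this would be passed as run_input to an actor's call() method. The exact actor ID is not stated in this README.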
Output Format
Each result in the dataset contains the following fields:
- videoId - The YouTube video ID
- url - Full YouTube watch URL
- title - Video title
- channelName - Name of the uploading channel
- language - Language code of the extracted transcript
- languageName - Human-readable language name
- isAutoGenerated - Boolean indicating whether the captions are auto-generated
- availableLanguages - Array of all available caption languages
- transcript - Full concatenated transcript text
- segments - Array of objects with start, duration, and text fields
- wordCount - Total word count of the transcript
- durationSeconds - Video duration in seconds
- scrapedAt - ISO 8601 timestamp of extraction
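The derived fields follow from the segments. transcript is the segment texts joined together, and wordCount is computed from that string. The sketch below builds the transcript-related part of a record under two assumptions: words are whitespace-separated tokens, and availableLanguages holds plain language codes. Metadata fields such as title and channelName are omitted, and make_record is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import TypedDict


class Segment(TypedDict):
    start: float     # seconds from the start of the video
    duration: float  # seconds the caption is shown
    text: str


def make_record(video_id: str, language: str, is_auto: bool,
                available: list[str], segments: list[Segment]) -> dict:
    """Build the transcript-derived part of an output record.

    title, channelName, languageName, and durationSeconds come from video
    metadata and are left out of this sketch.
    """
    texts = [s["text"].strip() for s in segments if s["text"].strip()]
    transcript = " ".join(texts)
    return {
        "videoId": video_id,
        "url": f"https://www.youtube.com/watch?v={video_id}",
        "language": language,
        "isAutoGenerated": is_auto,
        "availableLanguages": available,  # assumption: plain language codes
        "transcript": transcript,
        "segments": segments,
        # Assumption: words are whitespace-separated tokens.
        "wordCount": len(transcript.split()),
        "scrapedAt": datetime.now(timezone.utc).isoformat(),
    }
```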
Use Cases
This actor is ideal for a wide range of applications. Content researchers can analyze spoken content across thousands of videos to identify trends and topics. AI and machine learning engineers can gather training data for natural language processing models, speech recognition systems, and large language model fine-tuning. SEO professionals can extract transcript text for keyword analysis and content optimization. Accessibility teams can verify and improve caption quality across video libraries. Journalists and fact-checkers can search through video content to find specific statements or claims. Educators can create study materials and searchable archives from lecture videos.
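For the fact-checking use case, the segment timestamps let you jump straight to a statement. The sketch below scans the segments field of one result for a phrase and builds a watch URL with a t= offset for each hit. It assumes the segment shape from Output Format, and find_mentions is a hypothetical helper.

```python
def find_mentions(video_id: str, segments: list[dict], phrase: str) -> list[tuple[int, str, str]]:
    """Return (second, text, deep link) for each segment containing phrase."""
    needle = phrase.casefold()  # case-insensitive match
    hits = []
    for seg in segments:
        if needle in seg["text"].casefold():
            second = int(seg["start"])  # truncate to whole seconds for the link
            link = f"https://www.youtube.com/watch?v={video_id}&t={second}s"
            hits.append((second, seg["text"], link))
    return hits
```

Matching is per segment, so a phrase split across two consecutive captions is missed. Searching the full transcript field catches those cases but loses the exact timestamp.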
Integrations and Related Actors
This actor works well as part of a YouTube data pipeline. Combine it with other quick_kirigami YouTube actors for comprehensive video intelligence: use the YouTube Search Scraper to discover videos by keyword, then feed those URLs into this transcript scraper for full-text extraction. Other actors in the quick_kirigami suite cover channel analytics, comment extraction, and metadata collection.
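Chaining search to transcripts mostly means reshaping one dataset into the next actor's input. The sketch below deduplicates URLs from search-result items into a transcript-scraper input. The name of the URL field in the search scraper's output is not documented here, so url_field defaults to a guessed "url". The helper name is also hypothetical.

```python
def search_items_to_transcript_input(items, language="en", max_results=100, url_field="url") -> dict:
    """Collect unique video URLs from search results into a transcript-scraper input."""
    seen = set()
    urls = []
    for item in items:
        url = item.get(url_field)  # assumption: search items expose the video URL here
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return {"urls": urls[:max_results], "language": language, "maxResults": max_results}
```

With apify-client, the items could come from iterating the search run's default dataset. The returned dict would then be passed as run_input when calling this actor.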
Pricing and Performance
The actor processes approximately 1,000 transcripts per dollar of Apify platform credits. Processing speed depends on video availability and caption complexity, but typical throughput is 5-10 transcripts per second. Memory usage is minimal since transcripts are processed one at a time and pushed to the dataset incrementally. For best results with large batches, configure proxy rotation to avoid YouTube rate limiting.
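The one-at-a-time, push-as-you-go pattern keeps memory flat because only the current record is held. A minimal sketch follows, with fetch and push as hypothetical stand-ins for the actor's real caption fetching and dataset writes. Videos without captions are counted rather than treated as errors, and maxResults caps the number of transcripts pushed.

```python
from typing import Callable, Iterable, Optional


def run_batch(urls: Iterable[str],
              fetch: Callable[[str], Optional[dict]],
              push: Callable[[dict], None],
              max_results: int = 100) -> dict:
    """Process URLs one at a time, pushing each record as soon as it is ready."""
    stats = {"pushed": 0, "no_captions": 0, "failed": 0}
    for url in urls:
        if stats["pushed"] >= max_results:
            break  # maxResults counts extracted transcripts
        try:
            record = fetch(url)
        except Exception:
            # One bad video (network error, removed video) should not stop the batch.
            stats["failed"] += 1
            continue
        if record is None:
            stats["no_captions"] += 1  # video has no caption tracks
            continue
        push(record)  # e.g. append to the dataset immediately
        stats["pushed"] += 1
    return stats
```

At the stated 5-10 transcripts per second, a 10,000-URL batch would take roughly 1,000-2,000 seconds (about 17-33 minutes), before any slowdown from rate limiting.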
