Deprecated

Pricing

from $3.50 / 1,000 results

YouTube Transcript Scraper — 100+ Languages & Auto-Translation

Extract transcripts from YouTube videos. Batch URLs, channel mode, 100+ languages, auto-translation, SRT/WebVTT export. $2.00/1K, which is 60% to 80% cheaper than the competitors compared below.

Developer

XiaoZhi DataTools

Last modified: a month ago

YouTube Transcript Scraper — Multi-Language + Translation + Rich Metadata + RAG Chunking

Extract transcripts from YouTube videos with multi-language support, auto-translation, 4 output formats (timestamped, text, SRT, WebVTT), 25+ metadata fields, playlist support, RAG chunking, and concurrent processing.

Why Choose This Scraper?

| Feature | This Scraper | pintostudio (17K) | starvibe (5K) | karamelo (6.4K) |
| --- | --- | --- | --- | --- |
| Price | $2.00/1K | $10.00/1K | $5.00/1K | $5.00/1K |
| Batch URLs | ✅ | ❌ Single | ❌ Single | ✅ |
| Channel mode | ✅ | ❌ | ✅ | ❌ |
| Playlist mode | ✅ | ❌ | ❌ | ❌ |
| Language select | ✅ 100+ | ✅ 80+ | ✅ | ❌ |
| Auto-translate | ✅ | ❌ | ❌ | ❌ |
| Output formats | 4 (text/timestamps/SRT/WebVTT) | 1 | 1 | 5 |
| Metadata | ✅ 25+ fields | ❌ | ✅ 20+ fields | ✅ 14 fields |
| RAG chunking | ✅ | ❌ | ❌ | ❌ |
| Concurrency | ✅ parallel | ❌ | ❌ | ❌ |
| Available langs | ✅ List all | ❌ | ✅ | ❌ |
| Geo-restriction | ✅ | ❌ | ❌ | ❌ |
| Channel details | ✅ subs/desc/joined | ❌ | ✅ | ❌ |
| Error tracking | ✅ per-video status | ❌ | ❌ | ❌ |

3 Modes

1. 🔗 URL Mode

Extract transcripts from specific video URLs (batch supported, one per line).
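
As a rough illustration of how a one-per-line batch could be reduced to video IDs before fetching, here is a minimal Python sketch. The function names `extract_video_id` and `parse_url_batch` are hypothetical, and the set of URL shapes handled (watch, Shorts, embed, live, youtu.be) is an assumption; the Actor's own parsing is not documented here.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower().removeprefix("www.").removeprefix("m.")
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]
    return candidate or None


def parse_url_batch(text: str) -> List[str]:
    """Parse a one-URL-per-line block into unique video IDs, keeping order."""
    ids: List[str] = []
    for line in text.splitlines():
        video_id = extract_video_id(line) if line.strip() else None
        if video_id and video_id not in ids:
            ids.append(video_id)
    return ids
```

Lines that are blank, lack a scheme, or point to another host are skipped rather than raising, which suits a batch where one bad line should not stop the rest.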

2. 📺 Channel Mode

Get transcripts from a channel's most recent videos (up to maxVideos, default 20). Use since and until to filter by date.

3. 📋 Playlist Mode

Extract transcripts from all videos in a YouTube playlist. Provide one or more playlist URLs (one per line). Use maxVideos to limit how many videos are processed per playlist.

4 Output Formats

timestamped — [{text, start, duration}] JSON array
text — Plain text, no timestamps
srt — Standard subtitle format (for video players)
webvtt — Web subtitle format (for HTML5 video)

RAG / AI Chunking

For RAG (Retrieval-Augmented Generation) and AI pipelines, you can split transcripts into overlapping word chunks:

chunkSize — Number of words per chunk (0 = disabled, 100–5000). Set to e.g. 200 for typical RAG use.
chunkOverlap — Overlapping words between consecutive chunks (default: 50). Provides context continuity.

When enabled, each chunk is output as a separate dataset item with:

chunk_index — Position of this chunk (0-based)
total_chunks — Total number of chunks for this video
chunk_text — The chunk content
chunk_start_char — Character offset where chunk starts in full_text
chunk_end_char — Character offset where chunk ends in full_text
word_count — Word count for this chunk

The original full transcript item is always preserved alongside the chunk items.
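
The chunk fields above can be reproduced with a short sliding-window sketch. This is an illustration under stated assumptions, not the Actor's implementation: words are taken to be whitespace-separated runs, `chunk_end_char` is treated as exclusive, and the function name `chunk_transcript` is hypothetical. The 100 to 5000 range check is left out so the function can be tried on short strings.

```python
import re


def chunk_transcript(full_text: str, chunk_size: int, chunk_overlap: int = 50):
    """Split full_text into overlapping word chunks with character offsets."""
    if chunk_size <= 0:
        return []  # 0 means chunking is disabled
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # (start, end) character span of every whitespace-separated word.
    spans = [m.span() for m in re.finditer(r"\S+", full_text)]
    step = chunk_size - chunk_overlap

    chunks = []
    for first in range(0, len(spans), step):
        window = spans[first:first + chunk_size]
        start_char, end_char = window[0][0], window[-1][1]
        chunks.append({
            "chunk_index": len(chunks),
            "chunk_text": full_text[start_char:end_char],
            "chunk_start_char": start_char,
            "chunk_end_char": end_char,  # exclusive (assumption)
            "word_count": len(window),
        })
        if first + chunk_size >= len(spans):
            break  # this window already reached the last word

    for chunk in chunks:
        chunk["total_chunks"] = len(chunks)
    return chunks
```

With chunkSize 200 and chunkOverlap 50, each chunk in this scheme starts 150 words after the previous one, and the last chunk may be shorter than 200 words.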

Concurrency

Process multiple videos in parallel for faster runs:

maxConcurrency — Number of parallel transcript fetches (1–20, default: 5)
A semaphore caps the number of in-flight transcript fetches at maxConcurrency to respect rate limits
Failed transcript fetches are retried automatically (3 attempts, with a 2-second delay between them)
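
The pattern described here, a semaphore plus a bounded retry loop, can be sketched with `asyncio` as follows. The `fetch` coroutine, the function names, and the result keys other than `_status` are hypothetical stand-ins; the sketch also assumes that "3 times" means three attempts in total.

```python
import asyncio


async def fetch_with_retry(fetch, video_id, semaphore, attempts=3, delay=2.0):
    """Run fetch(video_id) under the semaphore, retrying on any exception."""
    async with semaphore:
        last_error = None
        for attempt in range(1, attempts + 1):
            try:
                transcript = await fetch(video_id)
                return {"video_id": video_id, "_status": "success",
                        "transcript": transcript}
            except Exception as exc:  # broad on purpose: any failure is retried
                last_error = exc
                if attempt < attempts:
                    await asyncio.sleep(delay)
        return {"video_id": video_id, "_status": "transcript_failed",
                "error": str(last_error)}


async def fetch_all(fetch, video_ids, max_concurrency=5, attempts=3, delay=2.0):
    """Fetch all videos with at most max_concurrency fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [
        fetch_with_retry(fetch, video_id, semaphore, attempts, delay)
        for video_id in video_ids
    ]
    # gather() returns results in the same order as video_ids.
    return await asyncio.gather(*tasks)
```

In this sketch a retrying video keeps its semaphore slot while it waits, so retries never push the number of concurrent requests above the limit. A failure is reported as a result item instead of raising, which keeps the other videos' results intact.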

Rich Metadata (25+ fields)

When includeMetadata=true, each video includes:

Video Info

title, description, duration, duration_formatted
view_count, like_count, comment_count
published_at (ISO date), category, keywords (tags array)
thumbnail (high quality), is_live
has_captions, caption_languages
word_count — Number of words in the full transcript

Channel Info

channel_name, channel_id, channel_url
channel_subscribers (e.g. "1.2M subscribers")
channel_description
channel_joined_date

Geo-Restriction

is_restricted — boolean flag
restriction_reason — reason if restricted
available_countries — list of country codes where video is available
blocked_countries — list of blocked country codes

Internal Fields

_metadata_level — "full" (with proxy) or "basic" (no proxy)
_status — "success", "transcript_failed", "chunk", or "skipped"
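
Because chunk items and full transcript items share one dataset, a consumer will usually want to tally `_status` values before using the results. A minimal sketch follows; the `video_id` key and the function name are assumptions, since the field that identifies a video is not listed above.

```python
from collections import Counter


def summarize_items(items):
    """Count dataset items per _status and list videos that need attention."""
    counts = Counter(item.get("_status", "unknown") for item in items)
    needs_attention = [
        item.get("video_id")  # hypothetical identifier field
        for item in items
        if item.get("_status") in ("transcript_failed", "skipped")
    ]
    return {"counts": dict(counts), "needs_attention": needs_attention}
```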

Key Features

🌐 100+ languages — Select any transcript language
🔄 Auto-translate — Translate transcripts to any language
📺 Channel mode — Batch extract from channels
📋 Playlist mode — Extract from YouTube playlists
📋 List languages — Show all available transcripts per video
🎬 Shorts support — Works with YouTube Shorts too
📊 Rich metadata — 25+ fields with proxy, basic info without
🔍 Auto-detect — Automatically picks the best available language
🌍 Geo-restriction — Detect region-blocked videos
🏷️ Keywords/tags — Extract video tags from YouTube
🔒 Proxy support — Auto Apify residential proxy or custom proxy
✅ Error tracking — Per-video success/failure status in output
📝 Word count — Every output item includes word count
🧩 RAG chunking — Split transcripts into overlapping chunks for AI pipelines
⚡ Concurrency — Parallel processing with configurable concurrency
🔁 Auto-retry — Failed transcript fetches retried up to 3 times

Proxy & Metadata Levels

Metadata richness depends on proxy availability:

| Level | Proxy | Fields Available |
| --- | --- | --- |
| full | Apify Residential / Custom | All 25+ fields (views, likes, comments, geo-restriction, channel details) |
| basic | None | Title, channel name, thumbnail only |

Recommendation: Use Apify residential proxy (automatic, no config needed) for full metadata. Without proxy, YouTube may return limited data from cloud IPs.

Input Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `mode` | select | URL, Channel, or Playlist |
| `urls` | textarea | Video URLs (one per line) |
| `channels` | textarea | Channel handles (one per line) |
| `playlists` | textarea | Playlist URLs (one per line) |
| `maxVideos` | number | Max videos per channel/playlist (default: 20) |
| `maxConcurrency` | number | Parallel processing count (default: 5, max: 20) |
| `language` | text | Transcript language code (empty = auto-detect) |
| `translateTo` | text | Translate to this language (empty = no translation) |
| `outputFormat` | select | timestamped/text/srt/webvtt |
| `chunkSize` | number | RAG chunk size in words (0 = disabled, default: 0) |
| `chunkOverlap` | number | RAG chunk overlap in words (default: 50) |
| `includeMetadata` | checkbox | Enable rich metadata (25+ fields) |
| `listAvailableLanguages` | checkbox | Show all available languages |
| `since` | date | Only videos after this date |
| `until` | date | Only videos before this date |
| `proxyUrl` | text | Custom proxy URL (empty = auto Apify proxy) |
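
Several of these parameters have documented ranges, so an input can be sanity-checked before a run. The sketch below is one way to do that, not the Actor's validation logic. It assumes the lowercase mode values `url`, `channel`, and `playlist`, and assumes `url` and `timestamped` as defaults for `mode` and `outputFormat`; none of these are stated above, and `check_input` is a hypothetical name.

```python
MODE_FIELD = {"url": "urls", "channel": "channels", "playlist": "playlists"}
FORMATS = {"timestamped", "text", "srt", "webvtt"}


def check_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []

    mode = run_input.get("mode", "url")  # default is an assumption
    if mode not in MODE_FIELD:
        errors.append(f"mode must be one of {sorted(MODE_FIELD)}")
    elif not str(run_input.get(MODE_FIELD[mode], "")).strip():
        errors.append(f"{MODE_FIELD[mode]} is required when mode is {mode!r}")

    fmt = run_input.get("outputFormat", "timestamped")  # default is an assumption
    if fmt not in FORMATS:
        errors.append(f"outputFormat must be one of {sorted(FORMATS)}")

    concurrency = run_input.get("maxConcurrency", 5)
    if not 1 <= concurrency <= 20:
        errors.append("maxConcurrency must be between 1 and 20")

    chunk_size = run_input.get("chunkSize", 0)
    if chunk_size != 0 and not 100 <= chunk_size <= 5000:
        errors.append("chunkSize must be 0 (disabled) or between 100 and 5000")

    return errors
```

Collecting every problem in one pass, rather than stopping at the first, lets a caller fix an input in a single round trip.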

Examples

# Get English transcript with full metadata
urls: https://youtube.com/watch?v=dQw4w9WgXcQ
language: en
includeMetadata: true

# Get Japanese transcript and translate to English
urls: https://youtube.com/watch?v=dQw4w9WgXcQ
language: ja
translateTo: en

# Get SRT subtitles for the 10 most recent videos from a channel
mode: channel
channels: MrBeast
outputFormat: srt
maxVideos: 10

# Extract transcripts from a YouTube playlist
mode: playlist
playlists: https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf
maxVideos: 50

# List all available languages
urls: https://youtube.com/watch?v=dQw4w9WgXcQ
listAvailableLanguages: true

# Batch extract with metadata (views, likes, geo-restriction)
urls: https://youtube.com/watch?v=abc123
      https://youtube.com/watch?v=def456
includeMetadata: true

# RAG chunking — split transcript into 200-word chunks with 50-word overlap
urls: https://youtube.com/watch?v=dQw4w9WgXcQ
chunkSize: 200
chunkOverlap: 50

# High-concurrency batch run
mode: channel
channels: MrBeast
maxVideos: 100
maxConcurrency: 10

Pricing

$2.00 per 1,000 transcripts: 80% cheaper than pintostudio ($10/1K) and 60% cheaper than starvibe and karamelo ($5/1K)
Default run (20 videos) costs $0.04
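
The arithmetic behind these figures is simple enough to check directly. The helper names below are hypothetical; the prices are the ones quoted in this README's comparison table.

```python
def run_cost(transcripts: int, price_per_1k: float = 2.00) -> float:
    """Cost in dollars of a run at a per-1,000-transcripts price."""
    return transcripts * price_per_1k / 1000


def percent_cheaper(price: float, other_price: float) -> float:
    """How much cheaper `price` is than `other_price`, in percent."""
    return (1 - price / other_price) * 100


print(f"${run_cost(20):.2f}")                  # $0.04 for the default 20 videos
print(f"{percent_cheaper(2.00, 10.00):.0f}%")  # 80% versus a $10/1K actor
print(f"{percent_cheaper(2.00, 5.00):.0f}%")   # 60% versus a $5/1K actor
```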

Technical Details

Built on youtube-transcript-api v1.2.4
Rich metadata via ytInitialPlayerResponse + ytInitialData extraction
Supports auto-generated and manual captions
Translation uses YouTube's built-in translation API
Channel videos discovered via scrapetube
Playlist videos discovered via scrapetube.get_playlist()
Automatic Apify residential proxy for full metadata
Concurrent processing with asyncio.Semaphore (configurable parallelism)
Automatic retry (3 attempts, 2-second delay) for transcript fetch failures
Per-video error tracking and partial result recovery

Limitations

Some videos may not have transcripts available
Auto-translation quality varies by language
Full metadata (views, likes, geo-restriction) requires a proxy when running from cloud IPs
Channel mode gets recent videos only (not full archive)
Geo-restriction detection may not cover all cases
