YouTube Transcript Scraper API | Captions & Subtitles
Pricing
from $1.00 / 1,000 results
Extract YouTube transcripts, captions, subtitles, and timestamped transcript segments from YouTube URLs, Shorts, or video IDs. No YouTube API key required. Residential proxy enabled. Use from Apify Console or API.
Developer: Inus Grobler
Maintained by Community
YouTube Transcript Scraper API | Extract YouTube Captions, Subtitles, and Transcript Segments
Extract YouTube transcripts, captions, subtitles, and timestamped text segments from YouTube video URLs, Shorts URLs, or video IDs. This Apify Actor is built for SEO research, content repurposing, podcast and webinar analysis, AI training workflows, lead research, media monitoring, and any workflow that needs clean YouTube transcript data without using the official YouTube Data API.
The Actor runs with residential proxy routing on Apify and uses multiple transcript sources to improve reliability when YouTube blocks direct requests.
Actor ID
Use this Actor ID in Apify API integrations:
thescrapelab/Apify-YouTube-Transcript-Scraper-2-0
What This YouTube Transcript Scraper Does
- Extracts timestamped transcript segments from YouTube videos.
- Supports YouTube watch URLs, Shorts URLs, and raw video IDs.
- Prioritizes your preferred transcript languages, such as `en` or `en-us`.
- Returns one dataset item per transcript segment for easy filtering and downstream processing.
- Works as a YouTube transcript API for Python, JavaScript, no-code automations, and data pipelines.
- Automatically retries transient transcript-source failures.
- Uses Apify residential proxy routing by default in cloud runs.
- Does not require a YouTube API key.
Common Use Cases
- Build datasets from YouTube captions and subtitles.
- Summarize YouTube videos with LLMs.
- Analyze competitor videos, podcasts, webinars, and creator content.
- Extract quotes and timestamps for content repurposing.
- Feed transcript segments into search, vector databases, or RAG pipelines.
- Monitor YouTube content for brand, product, or keyword mentions.
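For search, vector database, or RAG use cases, adjacent segment rows are often merged into larger passages that keep a start timestamp. The sketch below is a hypothetical helper, not part of the Actor: it assumes rows from a single video shaped like the dataset items described in the Output section (`video_id`, `piece_index`, `piece_start`, `text`) and groups them into passages of roughly `max_words` words. Each passage gets a watch URL with a `t=` offset, which assumes YouTube's `t` query parameter (in seconds) for deep links.

```python
from typing import Any, Optional


def deep_link(video_id: str, seconds: Optional[float]) -> str:
    """Watch URL starting at a whole-second offset (assumes YouTube's t= parameter)."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    if seconds is not None:
        url += f"&t={int(seconds)}s"
    return url


def make_passage(video_id: str, words: list[str], start: Optional[float]) -> dict[str, Any]:
    """Hypothetical passage record: joined text, start time, and deep link."""
    return {"text": " ".join(words), "start": start, "url": deep_link(video_id, start)}


def chunk_segments(rows: list[dict[str, Any]], max_words: int = 200) -> list[dict[str, Any]]:
    """Merge consecutive segments of ONE video into passages of roughly max_words words.

    Segments are never split, so a single long segment can exceed max_words.
    """
    passages: list[dict[str, Any]] = []
    words: list[str] = []
    start: Optional[float] = None
    video_id = rows[0]["video_id"] if rows else ""

    for row in sorted(rows, key=lambda r: r["piece_index"]):
        seg_words = row.get("text", "").split()
        # Close the current passage if this segment would push it over the budget.
        if words and len(words) + len(seg_words) > max_words:
            passages.append(make_passage(video_id, words, start))
            words, start = [], None
        # The first segment of a passage supplies its start time (may be None).
        if not words:
            start = row.get("piece_start")
        words.extend(seg_words)

    if words:
        passages.append(make_passage(video_id, words, start))
    return passages
```

Keeping segment boundaries intact means every passage still maps back to a real caption start time, which is what makes the deep link useful for quoting.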
Quick Start
- Open the Actor in Apify Console.
- Add one or more YouTube URLs or video IDs to `videoUrls`.
- Choose preferred languages, for example `["en", "en-us"]`.
- Run the Actor.
- Download transcript segments from the dataset or read them through the Apify API.
Input
Provide one or more YouTube video targets.
| Field | Type | Required | Description |
|---|---|---|---|
| `videoUrls` | array | Yes | YouTube watch URLs, Shorts URLs, or direct video IDs. |
| `preferredLanguages` | array | No | Language codes to prioritize, for example `["en", "en-us"]`. |
| `maxChars` | integer | No | Maximum transcript length per video, in characters, after normalization. |
| `fetchVideoMeta` | boolean | No | Include the available video title and channel metadata. |
| `youtubeCookies` | string | No | Optional cookies for stricter bot-check cases. Most runs do not need this. |
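The Actor accepts watch URLs, Shorts URLs, and bare IDs in `videoUrls`. If you want to de-duplicate or sanity-check a list before starting a paid run, a client-side helper like the one below can normalize each target to a video ID. This is a hypothetical pre-processing sketch, not the Actor's own parser; it assumes the usual `watch?v=`, `/shorts/`, and `youtu.be/` URL shapes and the common 11-character ID alphabet (`A-Z`, `a-z`, `0-9`, `_`, `-`).

```python
import re
from typing import Any, Optional
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def to_video_id(target: str) -> Optional[str]:
    """Return the video ID for a watch URL, Shorts URL, youtu.be link, or bare ID."""
    target = target.strip()
    if VIDEO_ID_RE.fullmatch(target):
        return target
    parsed = urlparse(target)
    host = parsed.netloc.lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


def build_run_input(targets: list[str], languages: list[str]) -> dict[str, Any]:
    """De-duplicate targets and build a run input from the documented fields."""
    ids: list[str] = []
    for target in targets:
        video_id = to_video_id(target)
        if video_id is None:
            raise ValueError(f"Unrecognized YouTube target: {target!r}")
        if video_id not in ids:
            ids.append(video_id)
    return {"videoUrls": ids, "preferredLanguages": languages}
```

Passing bare IDs is one of the input forms the table lists, so the normalized list can be used directly as `videoUrls`.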
Example input:
```json
{
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "preferredLanguages": ["en", "en-us"],
  "maxChars": 50000,
  "fetchVideoMeta": false,
  "youtubeCookies": ""
}
```
Output
Dataset rows are transcript segments, not one row per video. A single YouTube video can produce many result rows because each caption or subtitle segment is returned separately.
Common output fields:
| Field | Description |
|---|---|
| `video_id` | YouTube video ID. |
| `url` | Canonical YouTube watch URL. |
| `title` | Video title, when `fetchVideoMeta` is enabled and the title is available. |
| `channel_name` | Channel name, when `fetchVideoMeta` is enabled and the name is available. |
| `status` | `found` or `missing`. |
| `language` | Transcript language code. |
| `source` | Transcript source used by the Actor. |
| `piece_index` | Transcript segment number. |
| `piece_count` | Total segment count for the video. |
| `piece_start` | Segment start time in seconds, when available. |
| `piece_dur` | Segment duration in seconds, when available. |
| `text` | Transcript segment text. |
| `word_count` | Word count for the segment. |
| `transcript_word_count` | Word count for the full transcript. |
| `error` | Error code when the transcript is missing; empty when `status` is `found`. |
Example dataset item:
```json
{
  "video_id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "status": "found",
  "language": "en",
  "source": "youtubei_player_api",
  "record_type": "transcript_piece",
  "piece_index": 1,
  "piece_count": 61,
  "piece_start": 1.36,
  "piece_dur": 1.68,
  "text": "[...]",
  "word_count": 1,
  "transcript_word_count": 487,
  "error": ""
}
```
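Because each row is one segment, a common first step is to stitch rows back into one transcript per video. The sketch below assumes rows shaped like the example item above (`video_id`, `status`, `piece_index`, `text`), such as the `items` list returned by the Python client. It skips rows whose `status` is not `found`, orders segments by `piece_index`, and joins segment texts with single spaces; the function name and the whitespace choice are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any


def assemble_transcripts(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Join segment rows into one transcript string per video, in piece_index order.

    Rows whose status is not "found" (for example "missing") are skipped.
    """
    pieces: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for row in rows:
        if row.get("status") != "found":
            continue
        pieces[row["video_id"]].append((row["piece_index"], row.get("text", "")))
    return {
        video_id: " ".join(text for _, text in sorted(parts, key=lambda p: p[0]))
        for video_id, parts in pieces.items()
    }
```

Sorting on `piece_index` rather than trusting row order keeps the result correct even if rows are read back in a different order.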
The key-value store record OUTPUT contains run metadata, totals, warnings, transcript-source timings, and billing details.
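To read that summary programmatically, the Apify Python client can fetch the `OUTPUT` record from the run's default key-value store. The sketch below assumes `client` is an `ApifyClient` and `run` is the dict returned by `.call()`, which includes `defaultKeyValueStoreId`. Because the record's internal field names are not documented here, it returns the stored value as-is instead of guessing at keys.

```python
from typing import Any, Optional


def read_run_summary(client: Any, run: dict[str, Any]) -> Optional[Any]:
    """Fetch the OUTPUT record from a finished run's default key-value store.

    `client` is assumed to be an apify_client.ApifyClient; `run` is the dict from .call().
    Returns the record's value, or None if no OUTPUT record exists.
    """
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("OUTPUT")
    return record["value"] if record is not None else None
```

Checking the warnings in this summary after a small test batch is a cheap way to spot videos that need different languages or cookies before scaling up.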
Pricing
Current pricing is:
- $0.009 per successful video transcript event.
- $0.001 per dataset result row.
- Actor start pricing may also apply, according to the Apify pricing panel.
Because output is segment-based, longer videos can produce more result rows than short videos. For cost-sensitive workflows, keep input batches focused and test with a small number of videos first.
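As a rough worked example using the two per-event prices listed above, a video that yields 61 segment rows (the `piece_count` in the example dataset item) would cost about $0.009 + 61 × $0.001 = $0.070, before any Actor start charge. The helper below applies the same arithmetic to a planned batch. It assumes every video returns a transcript, and the segment counts you pass in are your own estimates; rows for missing videos and Actor start charges are not counted.

```python
from decimal import Decimal

# Prices as listed in the Pricing section. Actor start charges are not included.
PRICE_PER_TRANSCRIPT = Decimal("0.009")
PRICE_PER_ROW = Decimal("0.001")


def estimate_cost(segments_per_video: list[int]) -> Decimal:
    """Estimate event charges for videos that all return a transcript.

    Assumption: each video is charged one transcript event plus one charge per segment row.
    """
    total = Decimal("0")
    for segments in segments_per_video:
        total += PRICE_PER_TRANSCRIPT + segments * PRICE_PER_ROW
    return total


print(estimate_cost([61]))  # 0.070
```

`Decimal` is used instead of floats so that small per-row prices add up exactly.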
API Integration
You can use this Actor as a YouTube transcript API from any application that can call the Apify API. The API returns a run object with a defaultDatasetId; use that dataset ID to read transcript segment rows.
Python API Example
Install the Apify API client:
```bash
pip install apify-client
```
Run the Actor and read transcript segments from the default dataset:
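```python
import os

from apify_client import ApifyClient

# Authenticate with the API token stored in the APIFY_TOKEN environment variable.
client = ApifyClient(os.environ["APIFY_TOKEN"])

run_input = {
    "videoUrls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    ],
    "preferredLanguages": ["en", "en-us"],
    "maxChars": 50000,
    "fetchVideoMeta": False,
}

# Start the Actor and wait until the run finishes.
run = client.actor("thescrapelab/Apify-YouTube-Transcript-Scraper-2-0").call(run_input=run_input)

# Every dataset row is one transcript segment, not one video.
dataset_id = run["defaultDatasetId"]
items = client.dataset(dataset_id).list_items(clean=True).items

for item in items[:5]:
    # piece_start is only present when timing data is available,
    # and rows with status "missing" carry no segment text.
    print(item["video_id"], item.get("piece_start"), item.get("text", ""))
```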
REST API Example
Start a run with the Apify REST API:
```bash
curl "https://api.apify.com/v2/acts/thescrapelab~Apify-YouTube-Transcript-Scraper-2-0/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "preferredLanguages": ["en", "en-us"],
    "maxChars": 50000,
    "fetchVideoMeta": false
  }'
```
After the run finishes, read transcript rows from the run's default dataset:
```bash
curl "https://api.apify.com/v2/datasets/YOUR_DATASET_ID/items?clean=true&token=$APIFY_TOKEN"
```
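The start-run request returns as soon as the run is created, so the dataset may still be filling when you first look at it. The standard-library sketch below assumes the Apify API v2 convention of wrapping the run object in a top-level `data` field; it reads `status` and `defaultDatasetId` from a start-run or run-detail response body and builds the items URL only once the run has `SUCCEEDED`. Polling the run until it finishes is left out, and the function name is illustrative.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(run_response_body: str, token: str) -> Optional[str]:
    """Return the dataset items URL for a succeeded run, or None otherwise.

    Assumes a response body like {"data": {"status": ..., "defaultDatasetId": ...}}.
    """
    run = json.loads(run_response_body)["data"]
    if run["status"] != "SUCCEEDED":
        # Still running, or finished without success: do not read the dataset yet.
        return None
    query = urlencode({"clean": "true", "token": token})
    return f"{API_BASE}/datasets/{run['defaultDatasetId']}/items?{query}"
```

Gating on the run status avoids treating a partially written dataset as the complete set of transcript rows.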
Reliability Notes
- Transcript availability depends on whether captions or subtitles are available for the video.
- Some private, removed, region-restricted, age-restricted, or bot-check-protected videos may not return transcripts.
- The Actor does not use the official YouTube Data API.
- Residential proxy routing is always enabled for Apify cloud runs because direct YouTube requests are frequently blocked.
- Request timeout, concurrency, proxy country, proxy pool size, and heavy fallback behavior are managed internally for reliability.
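Per the Output table, a video without an available transcript is reported with `status` set to `missing` and an `error` code. The sketch below, assuming rows shaped like the example dataset item, groups those video IDs by error code so you can decide which ones are worth retrying, for example with `youtubeCookies` for bot-check cases. The actual error code values are not listed here, so the grouping key is whatever string the Actor returns; `"unknown"` is a hypothetical fallback for empty codes.

```python
from collections import defaultdict
from typing import Any


def missing_by_error(rows: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each error code to the de-duplicated video IDs reported as missing."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        if row.get("status") != "missing":
            continue
        # Hypothetical fallback label when the error field is empty.
        code = row.get("error") or "unknown"
        if row["video_id"] not in grouped[code]:
            grouped[code].append(row["video_id"])
    return dict(grouped)


def retry_input(video_ids: list[str], cookies: str = "") -> dict[str, Any]:
    """Build a follow-up run input; youtubeCookies is only set when provided."""
    run_input: dict[str, Any] = {"videoUrls": video_ids}
    if cookies:
        run_input["youtubeCookies"] = cookies
    return run_input
```

Retrying only the affected IDs keeps follow-up runs small, which matters because every result row is billed.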
Best Practices
- Start with a small test batch before running large lists.
- Use `preferredLanguages` to prioritize the transcript language you need.
- Keep `fetchVideoMeta` disabled for the fastest transcript-only extraction.
- Use `youtubeCookies` only when you know the target videos require authenticated or stricter bot-check handling.
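To follow the small-batch advice above with a long list, you can split targets into fixed-size batches and run the first one as a test. The helpers below are a hypothetical sketch; the batch size of 25 is an arbitrary example, not a limit documented for this Actor, and the run input uses only fields from the Input table.

```python
from typing import Any, Iterator


def batches(items: list[str], size: int = 25) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_inputs(video_urls: list[str], size: int = 25) -> list[dict[str, Any]]:
    """One run input per batch, with metadata disabled for transcript-only runs."""
    return [
        {"videoUrls": batch, "preferredLanguages": ["en", "en-us"], "fetchVideoMeta": False}
        for batch in batches(video_urls, size)
    ]
```

A typical flow is to run the first element of `batch_inputs(...)`, check the dataset rows and the `OUTPUT` record, and only then submit the remaining batches.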