YouTube Transcript Scraper API | Captions & Subtitles
Pricing
from $9.00 / 1,000 successful video transcripts
Extract YouTube transcripts, captions, subtitles, and timestamped transcript segments from YouTube URLs, Shorts, or video IDs. No YouTube API key required. Residential proxy enabled. Use from Apify Console or API.
Developer
Inus Grobler
Maintained by Community
YouTube Transcript Scraper API
At a glance: this Actor extracts timestamped YouTube transcripts and captions. Input: YouTube watch URLs, Shorts URLs, youtu.be links, or raw video IDs. Output: transcript segment rows plus an OUTPUT summary record. Typical uses: SEO, RAG, and content repurposing. Limitations, troubleshooting, and pricing notes are covered below.
Extract timestamped YouTube transcripts, captions, subtitles, and transcript segments from YouTube video URLs, Shorts URLs, or video IDs. This Actor is for SEO research, AI summaries, content repurposing, media monitoring, RAG pipelines, and teams that need YouTube transcript data without using the official YouTube Data API.
The Actor returns clean segment-level rows with timestamps when captions expose timing data. It uses multiple transcript sources, Apify proxy routing on cloud runs, streaming dataset writes, and per-video retries to improve reliability while keeping memory and compute cost low.
Main Use Cases
- Build YouTube caption and subtitle datasets.
- Summarize YouTube videos with LLMs.
- Analyze podcasts, webinars, interviews, courses, and creator content.
- Extract quotes, timestamps, and text segments for articles or clips.
- Feed transcript segments into search, vector databases, or RAG workflows.
- Monitor YouTube videos for brand, product, topic, or keyword mentions.
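For the search and RAG use cases, individual caption segments are often too short to embed on their own. The following is a minimal sketch, not part of the Actor, that merges consecutive segments of one video into larger chunks. The function names `chunk_segments` and `_close` and the `max_words` parameter are hypothetical; the row fields (`video_id`, `piece_index`, `piece_start`, `text`) are the ones documented under What Data You Get.

```python
from typing import Iterable


def _close(rows: list) -> dict:
    """Collapse a run of segment rows into a single chunk record."""
    return {
        "video_id": rows[0]["video_id"],
        "start": rows[0].get("piece_start"),  # may be None when timing data is unavailable
        "text": " ".join(r["text"] for r in rows),
    }


def chunk_segments(rows: Iterable[dict], max_words: int = 200) -> list:
    """Merge consecutive segments of ONE video into chunks of at most ~max_words words.

    Assumes every row belongs to the same video and has status "found".
    A single segment longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for row in sorted(rows, key=lambda r: r["piece_index"]):
        words = len(row["text"].split())
        if current and count + words > max_words:
            chunks.append(_close(current))
            current, count = [], 0
        current.append(row)
        count += words
    if current:
        chunks.append(_close(current))
    return chunks
```

Each chunk keeps the start time of its first segment, so a search hit can still be traced back to a position in the video.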
What Data You Get
Each dataset item is one transcript segment. A single video can produce many rows.
| Field | Description |
|---|---|
| `video_id` | YouTube video ID. |
| `url` | Canonical YouTube watch URL. |
| `title` | Video title when metadata is enabled and available. |
| `channel_name` | Channel name when metadata is enabled and available. |
| `status` | `found` or `missing`. |
| `language` | Transcript language code. |
| `source` | Transcript source that returned the result. |
| `piece_index` | Segment number within the transcript. |
| `piece_count` | Total segment count for the video. |
| `piece_start` | Segment start time in seconds, when available. |
| `piece_dur` | Segment duration in seconds, when available. |
| `text` | Transcript segment text. |
| `word_count` | Word count for this segment. |
| `transcript_word_count` | Word count for the full transcript. |
| `error` | Error code when a transcript is missing. |
The key-value store record OUTPUT contains totals, warnings, source timing summaries, billing details, and whether streaming output was enabled.
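Because `status` is either `found` or `missing`, a downstream script will usually want to separate usable segments from failures before processing. A minimal sketch, assuming rows shaped like the table above; `split_by_status` is a hypothetical helper, and the error code in the sample data is invented for illustration because the actual codes are not listed here.

```python
from collections import Counter


def split_by_status(rows):
    """Return (found_rows, missing_rows, error_counts) for a list of dataset rows."""
    found = [r for r in rows if r.get("status") == "found"]
    missing = [r for r in rows if r.get("status") != "found"]
    # Rows without an error code are counted under "unknown".
    error_counts = Counter(r.get("error") or "unknown" for r in missing)
    return found, missing, error_counts


# Sample rows; "no_captions" is a made-up error code, not a documented value.
sample = [
    {"video_id": "9bZkp7q19f0", "status": "found", "text": "hello", "error": ""},
    {"video_id": "aaaaaaaaaaa", "status": "missing", "text": "", "error": "no_captions"},
]
found, missing, errors = split_by_status(sample)
print(len(found), len(missing), dict(errors))
```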
Input
| Field | Required | Description |
|---|---|---|
| `videoUrls` | Yes | YouTube watch URLs, Shorts URLs, youtu.be links, or raw video IDs. |
| `preferredLanguages` | No | Language codes to prioritize, such as `en`, `en-us`, or `es`. |
| `maxChars` | No | Maximum normalized transcript characters per video. |
| `fetchVideoMeta` | No | Adds available title and channel metadata. Leave off for lowest cost. |
| `youtubeCookies` | No | Optional cookie string or Netscape cookie file content for stricter bot-check cases. Most runs do not need this. |
Example input:
{
  "videoUrls": ["https://www.youtube.com/watch?v=9bZkp7q19f0"],
  "preferredLanguages": ["en", "en-us"],
  "maxChars": 1000,
  "fetchVideoMeta": false
}
Example Output
{
  "video_id": "9bZkp7q19f0",
  "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
  "status": "found",
  "language": "en",
  "source": "youtubei_player_api",
  "record_type": "transcript_piece",
  "piece_index": 2,
  "piece_count": 61,
  "piece_start": 18.64,
  "piece_dur": 3.24,
  "text": "Transcript segment text appears here.",
  "word_count": 5,
  "transcript_word_count": 487,
  "error": ""
}
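When quoting a segment in an article or clip list, it helps to show a readable timestamp and a link that opens the video at that point. The sketch below is illustrative only: `format_timestamp` and `timestamped_url` are hypothetical helpers, and the `t=<seconds>s` query parameter is standard YouTube watch-URL behavior rather than something this Actor produces.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for videos longer than an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timestamped_url(row: dict) -> str:
    """Append a start offset to the row's canonical watch URL when timing is available."""
    start = row.get("piece_start")
    if start is None:
        return row["url"]
    # The canonical URL already contains "?v=...", so "&" is the right separator.
    return f"{row['url']}&t={int(start)}s"


row = {"url": "https://www.youtube.com/watch?v=9bZkp7q19f0", "piece_start": 18.64}
print(format_timestamp(row["piece_start"]))  # 0:18
print(timestamped_url(row))  # https://www.youtube.com/watch?v=9bZkp7q19f0&t=18s
```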
How To Run On Apify
- Open the Actor in Apify Console.
- Add one or more YouTube URLs or video IDs.
- Keep `fetchVideoMeta` off for transcript-only runs.
- Run the Actor.
- Open the Dataset tab to preview, filter, or export transcript rows.
Exporting Results
Download results from the Dataset tab as JSON, JSONL, CSV, Excel, XML, or RSS. API users can read rows from the run's defaultDatasetId.
import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/Apify-YouTube-Transcript-Scraper-2-0").call(
    run_input={
        "videoUrls": ["https://www.youtube.com/watch?v=9bZkp7q19f0"],
        "preferredLanguages": ["en", "en-us"],
        "maxChars": 1000,
        "fetchVideoMeta": False,
    }
)

# Read the transcript segment rows from the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items(clean=True).items
for item in items[:5]:
    print(item["video_id"], item["piece_start"], item["text"])
Limits And Caveats
- Transcripts depend on caption/subtitle availability for each video.
- Private, removed, region-restricted, age-restricted, live, or bot-check-protected videos may return missing rows.
- Metadata fetching adds extra requests and can increase runtime and proxy cost.
- Duplicate input videos are not re-scraped, but duplicate rows are still returned for compatibility.
- The Actor does not use the official YouTube Data API.
Pricing
Recommended pricing is pay per event: charge once per successful video transcript. Dataset row pricing should be set to zero in Apify Console because one useful video result can produce many transcript segment rows.
Measured cost-optimized setting:
- Memory: `128 MB`
- Timeout: `120-240` seconds, depending on batch size
- Processing mode: bounded internal parallel processing with streaming dataset writes
- Recommended event: `video-transcript`
- Recommended event price: `$0.009` per successful video transcript
Current Store pricing may still show dataset item pricing if it has not been updated in Apify Console. For predictable client costs, use the successful-video event as the primary billable unit.
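As a rough worked example of the per-event arithmetic, 1,000 successful transcripts at $0.009 each come to $9.00, which matches the listed "from $9.00 / 1,000" price. The sketch below covers only the per-video event charge; `estimate_cost` is a hypothetical helper, and any other platform usage charges are not modeled.

```python
from decimal import Decimal

# Recommended event price per successful video transcript, taken from the settings above.
EVENT_PRICE_USD = Decimal("0.009")


def estimate_cost(successful_videos: int, price: Decimal = EVENT_PRICE_USD) -> Decimal:
    """Estimate the event charge in USD, rounded to cents."""
    return (price * successful_videos).quantize(Decimal("0.01"))


print(estimate_cost(1000))  # 9.00
print(estimate_cost(250))   # 2.25
```

Videos that return `missing` rows are not counted here, since the event is charged per successful transcript.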
Troubleshooting
| Problem | What to try |
|---|---|
| No transcript found | Confirm the video has captions or subtitles available. |
| Run is slow | Keep fetchVideoMeta disabled and split very large video lists into smaller batches. |
| Some videos are missing | Try preferred language fallbacks such as ["en", "en-us", "es"]. |
| Bot-check errors | Provide youtubeCookies only if the target videos require authenticated access. |
| Unexpected duplicate rows | Remove duplicate video URLs from input if you only want one copy of each transcript. |
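To avoid duplicate rows, the input list can be deduplicated before the run, including cases where the same video appears as a watch URL, a Shorts URL, a youtu.be link, and a raw ID. This is a client-side sketch under stated assumptions: `extract_video_id` and `dedupe_inputs` are hypothetical helpers, the 11-character ID pattern is a widely observed convention rather than a guarantee, and the Actor's own normalization may differ.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a watch/Shorts/youtu.be URL or raw ID, else None."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            candidate = parsed.path.split("/")[2]
    return candidate if candidate and _ID_RE.match(candidate) else None


def dedupe_inputs(values):
    """Keep the first occurrence of each video; unparseable inputs are compared as-is."""
    seen, unique = set(), []
    for value in values:
        key = extract_video_id(value) or value
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique
```

The result of `dedupe_inputs` can be passed directly as `videoUrls`.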
FAQ
Can this Actor scrape YouTube transcripts without a YouTube API key?
Yes. It extracts captions and transcript data from public web transcript sources and does not require a YouTube Data API key.
Does it work with YouTube Shorts?
Yes. You can provide Shorts URLs, watch URLs, youtu.be links, or raw video IDs.
Why does one video return many dataset rows?
The Actor returns timestamped transcript segments. This makes filtering, search, and downstream AI processing easier than a single large text blob.
Can I get one row per video?
The primary dataset is segment-based for compatibility. Use video_id to group rows into a full transcript in your downstream workflow.
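The grouping step can be sketched as follows. This is a minimal example rather than an official utility: `build_full_transcripts` is a hypothetical name, and it assumes rows carry the `video_id`, `status`, `piece_index`, and `text` fields described earlier. Keying by `piece_index` also collapses duplicate rows that appear when the same video was submitted more than once.

```python
from collections import defaultdict


def build_full_transcripts(rows):
    """Return {video_id: full transcript text} built from segment rows."""
    pieces = defaultdict(dict)
    for row in rows:
        if row.get("status") != "found":
            continue  # skip videos without a transcript
        # Later duplicates of the same segment overwrite earlier ones.
        pieces[row["video_id"]][row["piece_index"]] = row["text"]
    return {
        video_id: " ".join(text for _, text in sorted(by_index.items()))
        for video_id, by_index in pieces.items()
    }
```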
What is the best low-cost setting?
Use 128 MB and keep fetchVideoMeta off unless title and channel metadata are required.