YouTube Transcript Scraper API | Captions & Subtitles

Pricing

from $9.00 / 1,000 successful video transcripts

Extract YouTube transcripts, captions, subtitles, and timestamped transcript segments from YouTube URLs, Shorts, or video IDs. No YouTube API key required. Residential proxy enabled. Use from Apify Console or API.

Developer

Inus Grobler

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 16
  • Monthly active users: 2
  • Last modified: a day ago

YouTube Transcript Scraper API

At a glance:

  • What it does: extracts timestamped YouTube transcripts and captions.
  • Input: YouTube watch, Shorts, or youtu.be URLs, or raw video IDs.
  • Output: transcript segment rows and an OUTPUT summary record.
  • Use cases: SEO, RAG, and content repurposing.
  • Limitations, troubleshooting, and pricing/cost notes are covered below.

Extract timestamped YouTube transcripts, captions, subtitles, and transcript segments from YouTube video URLs, Shorts URLs, or video IDs. This Actor is for SEO research, AI summaries, content repurposing, media monitoring, RAG pipelines, and teams that need YouTube transcript data without using the official YouTube Data API.

The Actor returns clean segment-level rows with timestamps when captions expose timing data. It uses multiple transcript sources, Apify proxy routing on cloud runs, streaming dataset writes, and per-video retries to improve reliability while keeping memory and compute cost low.

Main Use Cases

  • Build YouTube caption and subtitle datasets.
  • Summarize YouTube videos with LLMs.
  • Analyze podcasts, webinars, interviews, courses, and creator content.
  • Extract quotes, timestamps, and text segments for articles or clips.
  • Feed transcript segments into search, vector databases, or RAG workflows.
  • Monitor YouTube videos for brand, product, topic, or keyword mentions.
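
For the search and RAG use cases above, individual caption segments are often too short to embed one by one. The sketch below merges consecutive segments of a video into larger chunks. It relies only on the row fields documented on this page; the `chunk_segments` name, the `max_words` parameter, and the shape of the returned chunk dictionaries are illustrative assumptions, not part of the Actor.

```python
def chunk_segments(rows, max_words=200):
    """Merge consecutive transcript rows of each video into chunks of at most
    max_words words (a single oversized segment still becomes its own chunk)."""
    # Keep only rows that actually carry transcript text.
    usable = [r for r in rows if r.get("status") == "found" and r.get("text")]
    # Order rows by video, then by position within the transcript.
    usable.sort(key=lambda r: (r["video_id"], r.get("piece_index", 0)))

    chunks = []
    current = None
    for row in usable:
        text = row["text"].strip()
        words = len(text.split())
        same_video = current is not None and current["video_id"] == row["video_id"]
        if same_video and current["word_count"] + words <= max_words:
            # Extend the open chunk.
            current["text"] += " " + text
            current["word_count"] += words
        else:
            # Close the open chunk (if any) and start a new one.
            if current is not None:
                chunks.append(current)
            current = {
                "video_id": row["video_id"],
                "start": row.get("piece_start"),  # may be None without timing data
                "text": text,
                "word_count": words,
            }
    if current is not None:
        chunks.append(current)
    return chunks
```

Each chunk keeps the start time of its first segment, so a search hit can still be traced back to a position in the video.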

What Data You Get

Each dataset item is one transcript segment. A single video can produce many rows.

| Field | Description |
| --- | --- |
| video_id | YouTube video ID. |
| url | Canonical YouTube watch URL. |
| title | Video title when metadata is enabled and available. |
| channel_name | Channel name when metadata is enabled and available. |
| status | found or missing. |
| language | Transcript language code. |
| source | Transcript source that returned the result. |
| piece_index | Segment number within the transcript. |
| piece_count | Total segment count for the video. |
| piece_start | Segment start time in seconds, when available. |
| piece_dur | Segment duration in seconds, when available. |
| text | Transcript segment text. |
| word_count | Word count for this segment. |
| transcript_word_count | Word count for the full transcript. |
| error | Error code when a transcript is missing. |

The key-value store record OUTPUT contains totals, warnings, source timing summaries, billing details, and whether streaming output was enabled.

Input

| Field | Required | Description |
| --- | --- | --- |
| videoUrls | Yes | YouTube watch URLs, Shorts URLs, youtu.be links, or raw video IDs. |
| preferredLanguages | No | Language codes to prioritize, such as en, en-us, or es. |
| maxChars | No | Maximum normalized transcript characters per video. |
| fetchVideoMeta | No | Adds available title and channel metadata. Leave off for lowest cost. |
| youtubeCookies | No | Optional cookie string or Netscape cookie file content for stricter bot-check cases. Most runs do not need this. |

Example input:

{
    "videoUrls": [
        "https://www.youtube.com/watch?v=9bZkp7q19f0"
    ],
    "preferredLanguages": ["en", "en-us"],
    "maxChars": 1000,
    "fetchVideoMeta": false
}

Example Output

{
    "video_id": "9bZkp7q19f0",
    "url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
    "status": "found",
    "language": "en",
    "source": "youtubei_player_api",
    "record_type": "transcript_piece",
    "piece_index": 2,
    "piece_count": 61,
    "piece_start": 18.64,
    "piece_dur": 3.24,
    "text": "Transcript segment text appears here.",
    "word_count": 5,
    "transcript_word_count": 487,
    "error": ""
}

How To Run On Apify

  1. Open the Actor in Apify Console.
  2. Add one or more YouTube URLs or video IDs.
  3. Keep fetchVideoMeta off for transcript-only runs.
  4. Run the Actor.
  5. Open the Dataset tab to preview, filter, or export transcript rows.

Exporting Results

Download results from the Dataset tab as JSON, JSONL, CSV, Excel, XML, or RSS. API users can read rows from the run's defaultDatasetId.

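import os

from apify_client import ApifyClient

# Read the Apify API token from the environment.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/Apify-YouTube-Transcript-Scraper-2-0").call(
    run_input={
        "videoUrls": ["https://www.youtube.com/watch?v=9bZkp7q19f0"],
        "preferredLanguages": ["en", "en-us"],
        "maxChars": 1000,
        "fetchVideoMeta": False,
    }
)

# Read the transcript rows from the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items(clean=True).items

# Print the first five segments; piece_start is only present when timing data exists.
for item in items[:5]:
    print(item["video_id"], item.get("piece_start"), item["text"])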

Limits And Caveats

  • Transcripts depend on caption/subtitle availability for each video.
  • Private, removed, region-restricted, age-restricted, live, or bot-check-protected videos may return missing rows.
  • Metadata fetching adds extra requests and can increase runtime and proxy cost.
  • Duplicate input videos are not re-scraped, but duplicate rows are still returned for compatibility.
  • The Actor does not use the official YouTube Data API.
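
Because unavailable videos come back as rows with status set to missing, it can help to separate them from successful results before further processing. The following is a minimal sketch that uses only the documented status, video_id, and error fields; the `summarize_rows` helper is a hypothetical name, and the error code in the usage example is a placeholder, since this page does not list the Actor's actual error codes.

```python
from collections import Counter


def summarize_rows(rows):
    """Split dataset rows into found video IDs, missing video IDs with their
    error codes, and a count of videos per error code."""
    found = {r["video_id"] for r in rows if r.get("status") == "found"}
    # One entry per missing video; fall back to "unknown" if no code is set.
    missing = {
        r["video_id"]: (r.get("error") or "unknown")
        for r in rows
        if r.get("status") == "missing"
    }
    error_counts = Counter(missing.values())
    return found, missing, error_counts


# Usage with made-up rows ("example_error_code" is a placeholder value).
rows = [
    {"video_id": "aaaaaaaaaaa", "status": "found", "text": "hello", "error": ""},
    {"video_id": "bbbbbbbbbbb", "status": "missing", "error": "example_error_code"},
]
found, missing, error_counts = summarize_rows(rows)
```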

Pricing

Recommended pricing is pay per event: charge once per successful video transcript. Dataset row pricing should be set to zero in Apify Console because one useful video result can produce many transcript segment rows.

Measured cost-optimized setting:

  • Memory: 128 MB
  • Timeout: 120-240 seconds, depending on batch size
  • Processing mode: bounded internal parallel processing with streaming dataset writes
  • Recommended event: video-transcript
  • Recommended event price: $0.009 per successful video transcript

Current Store pricing may still show dataset item pricing if it has not been updated in Apify Console. For predictable client costs, use the successful-video event as the primary billable unit.
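
As a rough illustration of the per-event model, the event charge scales linearly with the number of successful video transcripts. The sketch below assumes the recommended price of $0.009 per event stated above and covers only the event charge itself; `estimate_event_cost` is a hypothetical helper, not part of the Actor or the Apify client.

```python
from decimal import Decimal

# Recommended event price from this page, in USD per successful transcript.
PRICE_PER_TRANSCRIPT = Decimal("0.009")


def estimate_event_cost(successful_videos: int) -> Decimal:
    """Return the estimated event charge in USD for a number of successful videos."""
    if successful_videos < 0:
        raise ValueError("successful_videos must be non-negative")
    return PRICE_PER_TRANSCRIPT * successful_videos


# 1,000 successful transcripts at $0.009 each is $9.00.
print(estimate_event_cost(1000))  # 9.000
```

Videos that return missing rows are not counted here, since the event is charged per successful transcript.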

Troubleshooting

| Problem | What to try |
| --- | --- |
| No transcript found | Confirm the video has captions or subtitles available. |
| Run is slow | Keep fetchVideoMeta disabled and split very large video lists into smaller batches. |
| Some videos are missing | Try preferred language fallbacks such as ["en", "en-us", "es"]. |
| Bot-check errors | Provide youtubeCookies only if the target videos require authenticated access. |
| Unexpected duplicate rows | Remove duplicate video URLs from input if you only want one copy of each transcript. |
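
To remove duplicate inputs before a run, the different URL forms first have to be reduced to a common video ID. The sketch below is one way to do that on the client side. It assumes video IDs are 11 characters from the set A-Z, a-z, 0-9, underscore, and hyphen, which is a common convention rather than something this page guarantees, and it handles only the watch, Shorts, and youtu.be forms mentioned here. Both function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: a video ID is exactly 11 URL-safe characters.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a watch, Shorts, or youtu.be URL or a raw ID."""
    value = value.strip()
    if VIDEO_ID_RE.match(value):
        return value
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            candidate = parsed.path.split("/")[2]
    return candidate if candidate and VIDEO_ID_RE.match(candidate) else None


def dedupe_inputs(values):
    """Keep the first occurrence of each video, preserving input order."""
    seen, unique = set(), []
    for value in values:
        # Fall back to the raw string when no ID can be extracted.
        key = extract_video_id(value) or value.strip()
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique
```

Passing the result of `dedupe_inputs` as videoUrls avoids the duplicate rows described in the last table row.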

FAQ

Can this Actor scrape YouTube transcripts without a YouTube API key?

Yes. It extracts captions and transcript data from public web transcript sources and does not require a YouTube Data API key.

Does it work with YouTube Shorts?

Yes. You can provide Shorts URLs, watch URLs, youtu.be links, or raw video IDs.

Why does one video return many dataset rows?

The Actor returns timestamped transcript segments. This makes filtering, search, and downstream AI processing easier than a single large text blob.
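
As an illustration of segment-level filtering, the sketch below finds segments that mention a keyword and builds a link that opens the video near that moment. It assumes the url field is a watch URL that already contains a query string, as in the example output, so a t parameter can be appended with &. The `find_mentions` helper and its return shape are hypothetical.

```python
def find_mentions(rows, keyword):
    """Return segments whose text contains keyword (case-insensitive),
    each with a link that jumps to the segment start when timing exists."""
    needle = keyword.casefold()
    hits = []
    for row in rows:
        text = row.get("text") or ""
        if row.get("status") != "found" or needle not in text.casefold():
            continue
        start = row.get("piece_start")
        link = row["url"]
        # Only add a time offset when a numeric start time is available.
        if isinstance(start, (int, float)):
            link = f"{link}&t={int(start)}s"
        hits.append(
            {
                "video_id": row["video_id"],
                "piece_start": start,
                "text": text,
                "link": link,
            }
        )
    return hits
```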

Can I get one row per video?

The primary dataset is segment-based for compatibility. Use video_id to group rows into a full transcript in your downstream workflow.
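
A minimal sketch of that grouping step is shown below. It orders segments by piece_index and keeps only the first row seen for each video_id and piece_index pair, so duplicate input videos do not repeat text. It assumes every found row carries a piece_index; `build_full_transcripts` is a hypothetical helper name.

```python
from collections import defaultdict


def build_full_transcripts(rows):
    """Return {video_id: full transcript text} built from segment rows."""
    pieces = defaultdict(dict)
    for row in rows:
        if row.get("status") != "found" or not row.get("text"):
            continue
        # setdefault keeps the first copy if the same segment appears twice.
        pieces[row["video_id"]].setdefault(row.get("piece_index", 0), row["text"].strip())
    return {
        video_id: " ".join(text for _, text in sorted(by_index.items()))
        for video_id, by_index in pieces.items()
    }
```

The rows passed in can be the items list returned by the dataset client in the export example.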

What is the best low-cost setting?

Use 128 MB and keep fetchVideoMeta off unless title and channel metadata are required.