YouTube Transcript Scraper API | Captions & Subtitles

Pricing: from $1.00 / 1,000 results

Extract YouTube transcripts, captions, subtitles, and timestamped transcript segments from YouTube URLs, Shorts, or video IDs. No YouTube API key required. Residential proxy enabled. Use from Apify Console or API.

Rating: 0.0 (0 ratings)

Developer: Inus Grobler (Maintained by Community)

Actor stats: 1 bookmarked, 12 total users, 2 monthly active users, last modified 6 days ago

YouTube Transcript Scraper API | Extract YouTube Captions, Subtitles, and Transcript Segments

Extract YouTube transcripts, captions, subtitles, and timestamped text segments from YouTube video URLs, Shorts URLs, or video IDs. This Apify Actor is built for SEO research, content repurposing, podcast and webinar analysis, AI training workflows, lead research, media monitoring, and any workflow that needs clean YouTube transcript data without using the official YouTube Data API.

The Actor runs with residential proxy routing on Apify and uses multiple transcript sources to improve reliability when YouTube blocks direct requests.

Actor ID

Use this Actor ID in Apify API integrations. In REST API URLs, write the / as ~ (thescrapelab~Apify-YouTube-Transcript-Scraper-2-0):

thescrapelab/Apify-YouTube-Transcript-Scraper-2-0

What This YouTube Transcript Scraper Does

  • Extracts timestamped transcript segments from YouTube videos.
  • Supports YouTube watch URLs, Shorts URLs, and raw video IDs.
  • Prioritizes your preferred transcript languages, such as en or en-us.
  • Returns one dataset item per transcript segment for easy filtering and downstream processing.
  • Works as a YouTube transcript API for Python, JavaScript, no-code automations, and data pipelines.
  • Automatically retries transient transcript-source failures.
  • Uses Apify residential proxy routing by default in cloud runs.
  • Does not require a YouTube API key.

Common Use Cases

  • Build datasets from YouTube captions and subtitles.
  • Summarize YouTube videos with LLMs.
  • Analyze competitor videos, podcasts, webinars, and creator content.
  • Extract quotes and timestamps for content repurposing.
  • Feed transcript segments into search, vector databases, or RAG pipelines.
  • Monitor YouTube content for brand, product, or keyword mentions.
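For brand, product, or keyword monitoring, segment rows can be turned into timestamped links with a few lines of code. The sketch below uses a hypothetical helper (find_mentions is not part of the Actor). It assumes rows shaped like this Actor's dataset items (video_id, status, piece_start, text) and relies on YouTube's t= URL parameter to jump to each segment. The sample rows are made up.

```python
# Sketch: find keyword mentions in transcript segment rows and build
# timestamped YouTube links. find_mentions is a hypothetical helper;
# rows are assumed to look like this Actor's dataset items.
from typing import Iterable, List


def find_mentions(rows: Iterable[dict], keyword: str) -> List[dict]:
    """Return one hit per segment whose text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for row in rows:
        # Skip rows for videos where no transcript was found.
        if row.get("status") != "found":
            continue
        text = row.get("text") or ""
        if needle not in text.lower():
            continue
        start = row.get("piece_start")
        link = f"https://www.youtube.com/watch?v={row['video_id']}"
        if start is not None:
            # YouTube's t= parameter jumps to an offset in whole seconds.
            link += f"&t={int(start)}s"
        hits.append({"video_id": row["video_id"], "start": start, "text": text, "link": link})
    return hits


rows = [  # made-up sample rows
    {"video_id": "dQw4w9WgXcQ", "status": "found", "piece_start": 12.4, "text": "Our product launch"},
    {"video_id": "dQw4w9WgXcQ", "status": "found", "piece_start": 20.0, "text": "unrelated"},
]
for hit in find_mentions(rows, "product"):
    print(hit["link"], hit["text"])
```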

Quick Start

  1. Open the Actor in Apify Console.
  2. Add one or more YouTube URLs or video IDs to videoUrls.
  3. Choose preferred languages, for example ["en", "en-us"].
  4. Run the Actor.
  5. Download transcript segments from the dataset or read them through the Apify API.

Input

Provide one or more YouTube video targets.

| Field | Type | Required | Description |
|---|---|---|---|
| videoUrls | array | Yes | YouTube watch URLs, Shorts URLs, or direct video IDs. |
| preferredLanguages | array | No | Language codes to prioritize, for example ["en", "en-us"]. |
| maxChars | integer | No | Maximum transcript length per video, in characters, after normalization. |
| fetchVideoMeta | boolean | No | Include available video title and channel metadata. |
| youtubeCookies | string | No | Optional cookies for stricter bot-check cases. Most runs do not need this. |

Example input:

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "preferredLanguages": ["en", "en-us"],
  "maxChars": 50000,
  "fetchVideoMeta": false,
  "youtubeCookies": ""
}
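If your target list mixes watch URLs, Shorts URLs, and raw IDs, you can pre-check it on your side before starting a run. This is a minimal sketch with a hypothetical extract_video_id helper. It assumes video IDs are 11 characters of letters, digits, - and _, which is the common YouTube format but not something this Actor documents. The Actor still parses videoUrls itself.

```python
# Hypothetical pre-check helper, not part of the Actor: normalize videoUrls
# targets (watch URLs, Shorts URLs, raw IDs) to video IDs before a run.
# Assumes the common 11-character ID format; the Actor does its own parsing.
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(target: str) -> Optional[str]:
    """Return the video ID for a watch URL, Shorts URL, or raw ID, else None."""
    target = target.strip()
    if VIDEO_ID.fullmatch(target):
        return target  # already a raw video ID
    parsed = urlparse(target)
    host = parsed.hostname or ""  # lowercased, port stripped
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return None
    if parsed.path == "/watch":
        # Watch URLs carry the ID in the v= query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif parsed.path.startswith("/shorts/"):
        # Shorts URLs carry the ID as the path segment after /shorts/.
        candidate = parsed.path.split("/")[2]
    else:
        return None
    return candidate if VIDEO_ID.fullmatch(candidate) else None


targets = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://www.youtube.com/shorts/dQw4w9WgXcQ",
    "dQw4w9WgXcQ",
    "https://example.com/watch?v=dQw4w9WgXcQ",
]
print([extract_video_id(t) for t in targets])
# ['dQw4w9WgXcQ', 'dQw4w9WgXcQ', 'dQw4w9WgXcQ', None]
```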

Output

Dataset rows are transcript segments, not one row per video. A single YouTube video can produce many result rows because each caption or subtitle segment is returned separately.

Common output fields:

| Field | Description |
|---|---|
| video_id | YouTube video ID. |
| url | Canonical YouTube watch URL. |
| title | Video title, when metadata is enabled and available. |
| channel_name | Channel name, when metadata is enabled and available. |
| status | found or missing. |
| language | Transcript language code. |
| source | Transcript source used by the Actor. |
| record_type | Row type; transcript segment rows use transcript_piece. |
| piece_index | Transcript segment number. |
| piece_count | Total segment count for the video. |
| piece_start | Segment start time in seconds, when available. |
| piece_dur | Segment duration in seconds, when available. |
| text | Transcript segment text. |
| word_count | Word count for the segment. |
| transcript_word_count | Word count for the full transcript. |
| error | Error code when the transcript is missing; empty when status is found. |

Example dataset item:

{
  "video_id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "status": "found",
  "language": "en",
  "source": "youtubei_player_api",
  "record_type": "transcript_piece",
  "piece_index": 1,
  "piece_count": 61,
  "piece_start": 1.36,
  "piece_dur": 1.68,
  "text": "[...]",
  "word_count": 1,
  "transcript_word_count": 487,
  "error": ""
}
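Because each row is one segment, rebuilding a full transcript per video means grouping rows by video_id and ordering them by piece_index. A minimal sketch, assuming the field names from the output table above; join_transcripts is a hypothetical helper and the sample rows are made up.

```python
# Sketch: rebuild one transcript string per video from segment rows.
# Assumes the documented fields video_id, status, piece_index and text.
from collections import defaultdict
from typing import Dict, Iterable, List


def join_transcripts(rows: Iterable[dict]) -> Dict[str, str]:
    by_video: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        if row.get("status") == "found":
            by_video[row["video_id"]].append(row)
    transcripts = {}
    for video_id, pieces in by_video.items():
        # Order by segment number in case rows arrive out of order.
        pieces.sort(key=lambda piece: piece["piece_index"])
        transcripts[video_id] = " ".join(
            piece["text"].strip() for piece in pieces if piece.get("text")
        )
    return transcripts


rows = [  # made-up sample rows
    {"video_id": "aaaaaaaaaaa", "status": "found", "piece_index": 2, "text": "world"},
    {"video_id": "aaaaaaaaaaa", "status": "found", "piece_index": 1, "text": "hello"},
    {"video_id": "bbbbbbbbbbb", "status": "missing"},
]
print(join_transcripts(rows))  # {'aaaaaaaaaaa': 'hello world'}
```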

The key-value store record OUTPUT contains run metadata, totals, warnings, transcript-source timings, and billing details.
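To read that record from Python, a sketch along these lines should work with the apify-client package. It assumes OUTPUT lives in the run's default key-value store (defaultKeyValueStoreId on the run object) and that client and run are created as in the Python API Example under API Integration. fetch_run_summary is a hypothetical helper, and the record's value is returned unparsed because its internal fields are not documented here.

```python
# Sketch: fetch the OUTPUT record after a run. Assumes `client` is an
# apify_client.ApifyClient and `run` is the dict returned by .call(), and
# that OUTPUT is stored in the run's default key-value store.
def fetch_run_summary(client, run):
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("OUTPUT")  # None if the record does not exist
    # The record's fields (totals, warnings, timings, billing) are returned as-is.
    return record["value"] if record else None


# Usage, after the Python API Example has run:
# summary = fetch_run_summary(client, run)
```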

Pricing

Current pricing is:

  • $0.009 per successful video transcript event.
  • $0.001 per dataset result row.
  • Actor start pricing may also apply according to the Apify pricing panel.

Because output is segment-based, longer videos produce more result rows, and therefore cost more, than short videos. For cost-sensitive workflows, test with a small number of videos first and submit only the videos you actually need.
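A rough budget check follows from the two per-event prices above. This sketch assumes every submitted video returns a transcript and ignores any Actor start charge; the 61-segment figure is the piece_count from the example dataset item, not a typical value. estimate_cost is a hypothetical helper.

```python
# Rough cost estimate from the per-event prices listed above.
# Excludes any Actor start charge; segment counts vary per video.
from decimal import Decimal
from typing import List

PRICE_PER_TRANSCRIPT = Decimal("0.009")  # per successful video transcript
PRICE_PER_ROW = Decimal("0.001")         # per dataset result row


def estimate_cost(segments_per_video: List[int]) -> Decimal:
    """Estimate USD cost, assuming every video returns a transcript."""
    transcripts = len(segments_per_video)
    rows = sum(segments_per_video)
    return PRICE_PER_TRANSCRIPT * transcripts + PRICE_PER_ROW * rows


print(estimate_cost([61]))  # 0.070
```

Decimal avoids floating-point rounding noise when adding many small per-row charges.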

API Integration

You can use this Actor as a YouTube transcript API from any application that can call the Apify API. The API returns a run object with a defaultDatasetId; use that dataset ID to read transcript segment rows.

Python API Example

Install the Apify API client:

pip install apify-client

Run the Actor and read transcript segments from the default dataset:

import os
from apify_client import ApifyClient

# Authenticate with your Apify API token, read from an environment variable.
client = ApifyClient(os.environ["APIFY_TOKEN"])

run_input = {
    "videoUrls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    ],
    "preferredLanguages": ["en", "en-us"],
    "maxChars": 50000,
    "fetchVideoMeta": False,
}

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/Apify-YouTube-Transcript-Scraper-2-0").call(
    run_input=run_input
)

# Each dataset item is one transcript segment, not one video.
dataset_id = run["defaultDatasetId"]
items = client.dataset(dataset_id).list_items(clean=True).items
for item in items[:5]:
    # piece_start and text may be absent on rows for missing transcripts.
    print(item["video_id"], item.get("piece_start"), item.get("text"))

REST API Example

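Start a run with the Apify REST API. The request returns right away with a JSON object whose data field holds the run, including data.defaultDatasetId: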

curl "https://api.apify.com/v2/acts/thescrapelab~Apify-YouTube-Transcript-Scraper-2-0/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "preferredLanguages": ["en", "en-us"],
    "maxChars": 50000,
    "fetchVideoMeta": false
  }'

After the run finishes (status SUCCEEDED), read transcript rows from the run's default dataset, replacing YOUR_DATASET_ID with the defaultDatasetId from the run object:

curl "https://api.apify.com/v2/datasets/YOUR_DATASET_ID/items?clean=true&token=$APIFY_TOKEN"
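The Python client accepts the Actor ID with a /, but REST API paths write it with ~, as in the curl example above. A small sketch that builds both URLs used in this section; run_actor_url and dataset_items_url are hypothetical helper names.

```python
# Sketch: build the two REST URLs used above from the Actor ID and dataset ID.
# In API paths the "/" in "username/actor-name" is written as "~".
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def run_actor_url(actor_id: str, token: str) -> str:
    """URL for POST-ing run input to start a run."""
    path_id = actor_id.replace("/", "~")
    return f"{API_BASE}/acts/{path_id}/runs?{urlencode({'token': token})}"


def dataset_items_url(dataset_id: str, token: str) -> str:
    """URL for reading clean dataset items once the run has finished."""
    query = urlencode({"clean": "true", "token": token})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


print(run_actor_url("thescrapelab/Apify-YouTube-Transcript-Scraper-2-0", "MY_TOKEN"))
```

Apify also accepts the token in an Authorization: Bearer header, which keeps it out of URLs that may end up in logs.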

Reliability Notes

  • Transcript availability depends on whether captions or subtitles are available for the video.
  • Some private, removed, region-restricted, age-restricted, or bot-check-protected videos may not return transcripts.
  • The Actor does not use the official YouTube Data API.
  • Residential proxy routing is always enabled for Apify cloud runs because direct YouTube requests are frequently blocked.
  • Request timeout, concurrency, proxy country, proxy pool size, and heavy fallback behavior are managed internally for reliability.
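When some targets come back without a transcript, it helps to list them separately. This sketch assumes missing videos appear as rows with status missing and an error code, as the output table describes; the actual error code values are not listed in this README, so they are collected as-is. summarize_status is a hypothetical helper and the sample rows are made up.

```python
# Sketch: split results into videos with and without transcripts.
# Assumes each row has video_id and status ("found" or "missing");
# error codes are kept as-is because their values are not documented here.
from typing import Dict, Iterable, Set, Tuple


def summarize_status(rows: Iterable[dict]) -> Tuple[Set[str], Dict[str, str]]:
    found: Set[str] = set()
    missing: Dict[str, str] = {}
    for row in rows:
        status = row.get("status")
        if status == "found":
            found.add(row["video_id"])  # many segment rows, one video ID
        elif status == "missing":
            missing[row["video_id"]] = row.get("error", "")
    return found, missing


rows = [  # made-up sample rows
    {"video_id": "aaaaaaaaaaa", "status": "found", "text": "..."},
    {"video_id": "aaaaaaaaaaa", "status": "found", "text": "..."},
    {"video_id": "bbbbbbbbbbb", "status": "missing", "error": "example_error"},
]
found, missing = summarize_status(rows)
print(sorted(found), missing)
```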

Best Practices

  • Start with a small test batch before running large lists.
  • Use preferredLanguages to prioritize the transcript language you need.
  • Keep fetchVideoMeta disabled for fastest transcript-only extraction.
  • Use youtubeCookies only when you know the target videos require authenticated or stricter bot-check handling.
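To start with a small test batch, a helper that de-duplicates targets and splits them into fixed-size batches is enough. In this sketch, batches and the batch size of 5 are arbitrary choices, and the targets are placeholders rather than real video IDs. Exact-string de-duplication will not catch the same video given once as a URL and once as an ID.

```python
# Sketch: de-duplicate targets and split them into small batches, so one
# test batch can run before the rest. Batch size is an arbitrary example.
from typing import Iterator, List


def batches(targets: List[str], size: int = 10) -> Iterator[List[str]]:
    """Yield unique targets in their original order, size at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(targets))  # drops exact duplicates only
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


targets = [f"placeholder-{n}" for n in range(25)]  # not real video IDs
first, *rest = batches(targets, size=5)
print(len(first), len(rest))  # 5 4

# Each batch would become one run input, for example:
# run_input = {"videoUrls": first, "preferredLanguages": ["en", "en-us"]}
```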