YouTube Transcript Scraper-m2

Extracts full transcripts from YouTube videos using Crawlee. Provide a video URL or ID.

Pricing

Pay per usage

Developer

Mahir Sutar

Maintained by Community

YouTube Transcript Scraper

Extracts transcripts from YouTube videos using Crawlee + Playwright.
Works locally and deploys directly to Apify with zero changes.

How it works

The scraper tries three strategies in order and stops at the first one that succeeds:

  1. ytInitialPlayerResponse: parses the JS blob YouTube embeds in every page, extracts the caption track URL, and fetches it directly (fastest, no UI interaction needed)
  2. Network intercept: opens the transcript panel and captures the /api/timedtext response as it fires
  3. DOM scraping: reads the rendered transcript panel segments as a last resort
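
As an illustration of the first strategy, here is a minimal sketch of pulling the embedded player response out of raw page HTML. It is not the Actor's actual source. The function names `extractPlayerResponse` and `listCaptionTracks` are hypothetical, and the field path `captions.playerCaptionsTracklistRenderer.captionTracks` is an assumption about YouTube's undocumented page data, which may change without notice.

```javascript
// Sketch only: find "ytInitialPlayerResponse = {...}" in the HTML and parse the object.
// Assumption: the blob is valid JSON and starts at the first "{" after the marker.
function extractPlayerResponse(html) {
  const markerIdx = html.indexOf("ytInitialPlayerResponse");
  if (markerIdx === -1) return null;
  const start = html.indexOf("{", markerIdx);
  if (start === -1) return null;

  // Walk forward to the matching closing brace, ignoring braces inside strings.
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(start, i + 1));
        } catch {
          return null; // blob was not valid JSON
        }
      }
    }
  }
  return null; // unbalanced braces
}

// Assumed (undocumented) location of the caption track list.
function listCaptionTracks(playerResponse) {
  const tracks =
    playerResponse?.captions?.playerCaptionsTracklistRenderer?.captionTracks;
  return Array.isArray(tracks) ? tracks : [];
}
```

Each returned track is expected to carry a `baseUrl` that can be fetched directly. When the list is empty, the waterfall would move on to the network-intercept strategy.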

The scraper uses no third-party transcript libraries, so it avoids the proxy issues and IP blocks commonly associated with such packages (for example, the ones published on PyPI).


Local usage

1. Install

npm install
npx playwright install chromium # one-time browser download

2. Run

# Full URL
YT_VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ" npm start
# Short URL
YT_VIDEO_URL="https://youtu.be/dQw4w9WgXcQ" npm start
# Bare video ID
YT_VIDEO_ID="dQw4w9WgXcQ" npm start

Output is printed to the console and saved to ./storage/datasets/default/.


Apify deployment

Option A — Apify CLI

npm install -g apify-cli
apify login
apify push

Then run the Actor from the Apify Console with input:

{ "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }

Option B — Apify Console UI

  1. Create a new Actor → choose "Empty project"
  2. Upload this folder (or connect your GitHub repo)
  3. Set the Dockerfile path to Dockerfile
  4. Build and run

Output schema

Each run saves one JSON object to the dataset:

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "language": "en",
  "source": "json3",
  "segmentCount": 142,
  "segments": [
    { "start": "0:00", "startMs": 0, "duration": "0:03", "durationMs": 3000, "text": "We're no strangers to love" },
    ...
  ],
  "fullText": "We're no strangers to love ...",
  "fetchedAt": "2024-01-01T00:00:00.000Z"
}
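
To show how the `segments`, `segmentCount`, and `fullText` fields relate to each other, here is a hedged sketch that builds them from a caption payload. It assumes the `json3` source has the shape `{ events: [{ tStartMs, dDurationMs, segs: [{ utf8 }] }] }`, which is an assumption about YouTube's undocumented format. The helper names, and the `h:mm:ss` format for timestamps past one hour, are illustrative rather than taken from the Actor.

```javascript
// Format milliseconds as "m:ss", or "h:mm:ss" once the value reaches one hour.
function formatTimestamp(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return hours > 0
    ? `${hours}:${String(minutes).padStart(2, "0")}:${seconds}`
    : `${minutes}:${seconds}`;
}

// Convert assumed json3 events into segments shaped like the output schema.
function json3ToSegments(json3) {
  const events = Array.isArray(json3?.events) ? json3.events : [];
  const segments = [];
  for (const event of events) {
    if (!Array.isArray(event.segs)) continue; // skip events without text
    const text = event.segs
      .map((seg) => seg.utf8 ?? "")
      .join("")
      .replace(/\s+/g, " ")
      .trim();
    if (!text) continue; // skip whitespace-only events such as bare newlines
    const startMs = event.tStartMs ?? 0;
    const durationMs = event.dDurationMs ?? 0;
    segments.push({
      start: formatTimestamp(startMs),
      startMs,
      duration: formatTimestamp(durationMs),
      durationMs,
      text,
    });
  }
  return segments;
}

// Derive the aggregate fields from the segment list.
function buildTranscriptFields(json3) {
  const segments = json3ToSegments(json3);
  return {
    segmentCount: segments.length,
    segments,
    fullText: segments.map((s) => s.text).join(" "),
  };
}
```

Under these assumptions, `segmentCount` always equals `segments.length`, and `fullText` is the segment texts joined with single spaces.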

Supported input formats

| Format | Example |
| --- | --- |
| Full watch URL | https://www.youtube.com/watch?v=dQw4w9WgXcQ |
| Short URL | https://youtu.be/dQw4w9WgXcQ |
| Shorts URL | https://www.youtube.com/shorts/dQw4w9WgXcQ |
| Embed URL | https://www.youtube.com/embed/dQw4w9WgXcQ |
| Bare ID | dQw4w9WgXcQ |
| With timestamp | https://youtu.be/dQw4w9WgXcQ?t=42 |
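
The formats listed above can all be reduced to a video ID with a small normalizer. The following is a sketch of one way to do it, not the Actor's own parser. The function name `extractVideoId` is hypothetical, and the 11-character ID pattern is an assumption based on how YouTube IDs look in practice.

```javascript
// Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Return the video ID for any supported input format, or null if none matches.
function extractVideoId(input) {
  const value = String(input).trim();
  if (VIDEO_ID_PATTERN.test(value)) return value; // bare ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate = null;
  if (host === "youtu.be") {
    // Short URL: the ID is the first path segment; query params like ?t=42 are ignored.
    candidate = url.pathname.split("/")[1];
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else {
      const [, kind, id] = url.pathname.split("/");
      if (kind === "shorts" || kind === "embed") candidate = id;
    }
  }
  return candidate && VIDEO_ID_PATTERN.test(candidate) ? candidate : null;
}
```

Using the built-in `URL` class instead of one large regular expression keeps query parameters such as `?t=42` from leaking into the ID.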

Troubleshooting

No transcript found — The video may have captions disabled or only auto-generated captions in a non-English language. The scraper prefers English tracks but will fall back to the first available track.
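
The track preference described above can be sketched as follows. This is an illustration under assumptions, not the Actor's code. `pickCaptionTrack` is a hypothetical name, and each track is assumed to expose a `languageCode` string such as `en` or `en-GB`.

```javascript
// Prefer a track in the requested language, otherwise fall back to the first track.
// Returns null when there are no tracks at all (the "No transcript found" case).
function pickCaptionTrack(tracks, preferredLanguage = "en") {
  if (!Array.isArray(tracks) || tracks.length === 0) return null;

  // Match the exact code ("en") or a regional variant ("en-GB").
  const isPreferred = (track) =>
    typeof track.languageCode === "string" &&
    (track.languageCode === preferredLanguage ||
      track.languageCode.startsWith(preferredLanguage + "-"));

  return tracks.find(isPreferred) ?? tracks[0];
}
```

With this logic, a video that only has non-English captions still yields a transcript, and the `language` field in the output would reflect the track actually chosen.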

Playwright browser not found — Run npx playwright install chromium.

Rate limiting — Add a delay between runs or use Apify's built-in proxy pool when deploying.