YouTube Transcript Scraper-m2
Pricing: Pay per usage
Extracts full transcripts from YouTube videos using Crawlee. Provide a video URL or ID.
Rating: 0.0 (0 ratings)
Developer: Mahir Sutar
Actor stats: 0 bookmarked, 1 total user, 0 monthly active users
Last modified: 8 days ago
YouTube Transcript Scraper
Extracts transcripts from YouTube videos using Crawlee + Playwright.
Works locally and deploys directly to Apify with zero changes.
How it works
Three-strategy waterfall. The scraper tries the following strategies in order and stops at the first one that succeeds:
- `ytInitialPlayerResponse`: parses the JavaScript object YouTube embeds in every watch page, extracts the caption track URL, and fetches it directly (fastest; no UI interaction needed)
- Network intercept: opens the transcript panel and captures the `/api/timedtext` response as it fires
- DOM scraping: reads the rendered transcript panel segments as a last resort
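The waterfall described above can be sketched as a small generic runner. This is a hypothetical TypeScript illustration, not the Actor's source: each strategy is modeled as an async function that resolves to a result or `null`, and the names `Strategy` and `runWaterfall` are invented for the example.

```typescript
// Hypothetical shape of one extraction strategy: it resolves to a result on
// success, resolves to null when it found nothing, and may also throw.
type Strategy<T> = {
  name: string;
  run: () => Promise<T | null>;
};

type WaterfallResult<T> = {
  strategy: string;   // name of the strategy that succeeded
  value: T;
  failures: string[]; // "name: reason" for each strategy tried before it
};

// Try each strategy in order and stop at the first one that returns a value.
// Later strategies are never started. Returns null if every strategy
// returned null or threw.
async function runWaterfall<T>(strategies: Strategy<T>[]): Promise<WaterfallResult<T> | null> {
  const failures: string[] = [];
  for (const { name, run } of strategies) {
    try {
      const value = await run();
      if (value !== null) {
        return { strategy: name, value, failures };
      }
      failures.push(`${name}: no result`);
    } catch (err) {
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return null;
}
```

Keeping the failure reasons alongside the winning strategy makes it easier to see, in run logs, which fallback actually produced the transcript.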
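No third-party transcript libraries are used, so the scraper does not inherit the proxy issues and IP blocks associated with standalone transcript packages (such as those published on PyPI). YouTube can still rate-limit the browser itself; see Troubleshooting.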
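Strategy 1 needs only string handling and `JSON.parse`, which fits the no-third-party-libraries approach. The sketch below is a minimal TypeScript illustration, not the Actor's actual code. It assumes the page assigns `ytInitialPlayerResponse = {...}` as valid JSON and that caption tracks sit under `captions.playerCaptionsTracklistRenderer.captionTracks`; both are observed, undocumented YouTube internals that can change. `extractPlayerResponse` and `getCaptionTracks` are hypothetical names.

```typescript
// Locate `ytInitialPlayerResponse = {...}` in the watch-page HTML and parse it.
// Assumption: the object literal is valid JSON (observed, not documented).
function extractPlayerResponse(html: string): unknown {
  const marker = html.search(/ytInitialPlayerResponse\s*=\s*\{/);
  if (marker === -1) return null;
  const start = html.indexOf("{", marker);

  // Walk forward counting braces, skipping over string literals so that a
  // "{" or "}" inside a title or caption text does not end the scan early.
  let depth = 0;
  let inString = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(start, i + 1));
        } catch {
          return null; // balanced braces, but not valid JSON
        }
      }
    }
  }
  return null; // object never closed
}

// Assumed location and shape of the caption track list in the parsed object.
type CaptionTrack = { baseUrl: string; languageCode: string };

function getCaptionTracks(playerResponse: any): CaptionTrack[] {
  return playerResponse?.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? [];
}
```

Each track's `baseUrl` can then be fetched with a plain HTTP request; no browser interaction is needed for this strategy.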
Local usage
1. Install
```bash
npm install
npx playwright install chromium  # one-time browser download
```
2. Run
```bash
# Full URL
YT_VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ" npm start

# Short URL
YT_VIDEO_URL="https://youtu.be/dQw4w9WgXcQ" npm start

# Bare video ID
YT_VIDEO_ID="dQw4w9WgXcQ" npm start
```
Output is printed to the console and saved to ./storage/datasets/default/.
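To inspect results after a local run, the stored items can be read back with Node's `fs` module. This is a minimal sketch that assumes Crawlee's local storage layout of one zero-padded JSON file per item (for example `000000001.json`) in that directory; `readLocalDataset` is a hypothetical helper, not part of Crawlee's API (inside a crawler, Crawlee's own `Dataset` class is the supported way to access stored data).

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Read all items from a local dataset directory, in insertion order.
// Assumption: each item is its own file with a zero-padded numeric name;
// any other files (such as metadata) are skipped by the name filter.
function readLocalDataset(dir = "./storage/datasets/default"): unknown[] {
  return readdirSync(dir)
    .filter((name) => /^\d+\.json$/.test(name))
    .sort() // equal-length zero-padded names sort in numeric order
    .map((name) => JSON.parse(readFileSync(join(dir, name), "utf8")));
}
```

After a single `npm start`, calling `readLocalDataset()` should return an array holding one object shaped like the output schema.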
Apify deployment
Option A — Apify CLI
```bash
npm install -g apify-cli
apify login
apify push
```
Then run the Actor from the Apify Console with input:
```json
{ "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }
```
Option B — Apify Console UI
- Create a new Actor → choose "Empty project"
- Upload this folder (or connect your GitHub repo)
- Set the Dockerfile path to `Dockerfile`
- Build and run
Output schema
Each run saves one JSON object to the dataset:
```json
{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "language": "en",
  "source": "json3",
  "segmentCount": 142,
  "segments": [
    { "start": "0:00", "startMs": 0, "duration": "0:03", "durationMs": 3000, "text": "We're no strangers to love" },
    ...
  ],
  "fullText": "We're no strangers to love ...",
  "fetchedAt": "2024-01-01T00:00:00.000Z"
}
```
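In the schema, `start` and `duration` read as `m:ss` renderings of `startMs` and `durationMs`, and `"source": "json3"` suggests the captions were fetched in YouTube's json3 format. The sketch below converts json3-style events into that segment shape. It assumes json3 events carry `tStartMs`, `dDurationMs`, and `segs[].utf8`, an undocumented format observed in practice; the `h:mm:ss` form for long videos, the whitespace normalization, and all function names are illustrative choices, not taken from the Actor.

```typescript
// Assumed (undocumented) shape of one json3 caption event.
interface Json3Event {
  tStartMs?: number;
  dDurationMs?: number;
  segs?: { utf8?: string }[];
}

// One entry of the "segments" array in the output schema.
interface Segment {
  start: string;
  startMs: number;
  duration: string;
  durationMs: number;
  text: string;
}

const pad2 = (n: number): string => (n < 10 ? "0" : "") + n;

// Format milliseconds as m:ss, or h:mm:ss from one hour upward.
function formatTimestamp(ms: number): string {
  const total = Math.floor(ms / 1000);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  return h > 0 ? `${h}:${pad2(m)}:${pad2(s)}` : `${m}:${pad2(s)}`;
}

// Convert json3 events into output segments, skipping events whose text is
// empty after trimming (for example, line-break-only events).
function toSegments(events: Json3Event[]): Segment[] {
  const segments: Segment[] = [];
  for (const ev of events) {
    const raw = (ev.segs ?? []).map((s) => s.utf8 ?? "").join("");
    const text = raw.replace(/\s+/g, " ").trim();
    if (!text) continue;
    const startMs = ev.tStartMs ?? 0;
    const durationMs = ev.dDurationMs ?? 0;
    segments.push({
      start: formatTimestamp(startMs),
      startMs,
      duration: formatTimestamp(durationMs),
      durationMs,
      text,
    });
  }
  return segments;
}

// fullText: the segment texts joined with single spaces.
function toFullText(segments: Segment[]): string {
  return segments.map((s) => s.text).join(" ");
}
```

`segmentCount` would then simply be `segments.length`.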
Supported input formats
| Format | Example |
|---|---|
| Full watch URL | https://www.youtube.com/watch?v=dQw4w9WgXcQ |
| Short URL | https://youtu.be/dQw4w9WgXcQ |
| Shorts URL | https://www.youtube.com/shorts/dQw4w9WgXcQ |
| Embed URL | https://www.youtube.com/embed/dQw4w9WgXcQ |
| Bare ID | dQw4w9WgXcQ |
| With timestamp | https://youtu.be/dQw4w9WgXcQ?t=42 |
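Normalizing all of these formats to a bare ID can be sketched with the WHATWG `URL` class built into Node. This sketch assumes video IDs are 11 characters drawn from `A-Z`, `a-z`, `0-9`, `_`, and `-`, which matches observed IDs but is not a documented guarantee; `extractVideoId` is a hypothetical name, and accepting `m.youtube.com` is an extra beyond the table.

```typescript
// Observed (not officially documented) video ID format.
const ID_RE = /^[A-Za-z0-9_-]{11}$/;

// Return the video ID for any format in the table above, or null.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (ID_RE.test(trimmed)) return trimmed; // bare ID

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  const parts = url.pathname.split("/").filter(Boolean);
  let candidate: string | null = null;

  if (host === "youtu.be") {
    candidate = parts[0] ?? null; // youtu.be/<id>, query such as ?t=42 ignored
  } else if (host === "youtube.com") {
    if (parts[0] === "watch") candidate = url.searchParams.get("v");
    else if (parts[0] === "shorts" || parts[0] === "embed") candidate = parts[1] ?? null;
  }
  return candidate !== null && ID_RE.test(candidate) ? candidate : null;
}
```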
Troubleshooting
No transcript found: the video may have captions disabled, or it may offer only auto-generated captions in a non-English language. The scraper prefers English tracks but falls back to the first available track.
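The track preference described above can be sketched as follows. It assumes each track exposes a `languageCode` such as `en` or `en-GB` (a field name from YouTube's observed player response, not confirmed from this Actor's code), and treats any `en-*` variant as English, which is an illustrative choice. `pickCaptionTrack` is a hypothetical name; returning `null` for an empty list corresponds to the "No transcript found" case.

```typescript
// Minimal caption-track shape (field names assumed).
interface CaptionTrack {
  baseUrl: string;
  languageCode: string; // e.g. "en", "en-GB", "de"
}

// Prefer an English track; otherwise fall back to the first available one.
// Returns null when the video exposes no caption tracks at all.
function pickCaptionTrack(tracks: CaptionTrack[]): CaptionTrack | null {
  const english = tracks.find(
    (t) => t.languageCode === "en" || t.languageCode.startsWith("en-"),
  );
  return english ?? tracks[0] ?? null;
}
```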
Playwright browser not found: run `npx playwright install chromium`.
Rate limiting: add a delay between runs, or use Apify's built-in proxy pool when running on the Apify platform.
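Spacing out requests can be sketched as a sequential loop with a pause between items. `scrapeSequentially` and its `scrapeOne` callback are hypothetical stand-ins for whatever performs one transcript fetch, and the 5-second default is an arbitrary starting point, not a known YouTube limit.

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Process video IDs one at a time, pausing between them to lower the request
// rate. No pause is added before the first item or after the last.
async function scrapeSequentially<T>(
  videoIds: string[],
  scrapeOne: (id: string) => Promise<T>,
  delayMs = 5000,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < videoIds.length; i++) {
    if (i > 0) await sleep(delayMs);
    results.push(await scrapeOne(videoIds[i]));
  }
  return results;
}
```

Running videos one at a time trades throughput for a lower request rate; proxy rotation on Apify addresses the same problem by spreading requests across IPs instead.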