🌟 YouTube Video Transcript Scraper
Accurate, timestamped transcripts for full-length YouTube videos — chapters, subtitles, and multi-language support.
📖 Overview
This Actor extracts clean, timestamped transcripts from full-length YouTube videos (standard watch URLs, youtu.be, and /watch?v= formats). It's designed for longer content: works with multiple caption tracks, handles chapters, and produces export-ready captions (SRT/VTT) alongside structured JSON suitable for analytics and LLM pipelines.
💡 Why Full-Video Focus?
- Full videos often contain chapters, multiple speakers, and longer dialogues — transcripts must preserve timing and structure.
- Supports official captions when available and high-quality ASR fallbacks when not.
- Produces SRT/VTT, plain text, and structured JSON for downstream processing.
🔧 Key Features
- ✅ Multi-format URL normalization (`/watch?v=`, `youtu.be`).
- ✅ Prefer official caption tracks; fall back to ASR extraction when captions are missing.
- ✅ Preserve chapters and video metadata (title, duration, thumbnails).
- ✅ Export as JSON, plain text, SRT, and VTT.
- ✅ Optional speaker diarization and language detection.
- ✅ Configurable chunking for very long videos and resume/retry support.
- ✅ Proxy-compatible and production-ready for large-scale jobs.
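The URL normalization mentioned above can be sketched in a few lines. This is an illustrative example only, not the Actor's actual code; `extract_video_id` is a hypothetical helper name:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str):
    """Normalize watch, youtu.be, and /embed/ links to a bare video ID (illustrative sketch)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.endswith("youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the v= query parameter
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):
            return parsed.path.split("/")[2]
    return None
```

All three URL shapes resolve to the same ID, so downstream steps can deduplicate videos regardless of which link form the user pasted.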
⚡ Quick Start — Console
- Open the Actor on Apify Console.
- Paste one or more YouTube video URLs into the input (watch links or youtu.be links accepted).
- Click Run — results appear in the Dataset and Files (SRT/VTT) tabs.
⚙️ Quick Start — CLI & Python
CLI
```shell
apify call neuro-scraper/youtube-transcript-fetcher --input ./videos_input.json
```
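The `videos_input.json` file passed to `--input` might look like the following (a sketch using the field names from the input table below, not an official template):

```json
{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=abcd1234"}
  ],
  "workers": 5,
  "exportFormats": ["json", "srt", "vtt"]
}
```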
Python (apify-client)
```python
from apify_client import ApifyClient

client = ApifyClient('<APIFY_TOKEN>')

run = client.actor('neuro-scraper/youtube-transcript-fetcher').call(run_input={
    "startUrls": [{"url": "https://www.youtube.com/watch?v=EXAMPLE"}],
    "workers": 3,
    "exportFormats": ["json", "srt", "vtt"]
})

for item in client.dataset(run['defaultDatasetId']).list_items()['items']:
    print(item['Transcript']['plain_text'][:400])
```
📝 Inputs (Video-focused)
| Name | Type | Required | Default | Example | Notes |
|---|---|---|---|---|---|
| `startUrls` | array | Yes | `[]` | `[{"url":"https://www.youtube.com/watch?v=abcd1234"}]` | List of YouTube video URLs |
| `workers` | integer | Optional | `5` | `10` | Max concurrent fetches |
| `exportFormats` | array | Optional | `["json"]` | `["json","srt","vtt"]` | Output formats to generate |
| `speakerDiarization` | boolean | Optional | `false` | `true` | Enable speaker detection (best-effort) |
| `language` | string | Optional | `null` | `"en"` | Force output language (ISO code) |
| `proxyConfiguration` | object | Optional | `{}` | `{"useApifyProxy": true}` | Proxy settings |
Example input (Console JSON):
```json
{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=abcd1234"},
    {"url": "https://youtu.be/abcd1234"}
  ],
  "workers": 5,
  "exportFormats": ["json", "srt", "vtt"],
  "speakerDiarization": true,
  "proxyConfiguration": {"useApifyProxy": true}
}
```
📄 Outputs
Each Dataset item contains rich metadata and multiple transcript representations. Example:
```json
{
  "inputUrl": "https://www.youtube.com/watch?v=abcd1234",
  "fetchedAt": "2025-11-04T10:00:00Z",
  "success": true,
  "video": {
    "title": "Example Video",
    "duration": 3720,
    "chapters": [
      {"title": "Intro", "start": 0},
      {"title": "Main topic", "start": 60}
    ]
  },
  "Transcript": {
    "plain_text": "Full transcript text...",
    "with_timestamps": [
      {"text": "Hello and welcome to the show.", "start": 0.2, "end": 4.5},
      {"text": "Today we'll talk about...", "start": 5.0, "end": 9.3}
    ],
    "speaker_segments": [
      {"speaker": "Speaker 1", "start": 0.2, "end": 4.5, "text": "Hello and welcome to the show."}
    ]
  },
  "files": {
    "srt": "runs/<runId>/files/abcd1234.srt",
    "vtt": "runs/<runId>/files/abcd1234.vtt"
  }
}
```
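As one downstream example, the `with_timestamps` segments can be rendered as SRT cues in a few lines (a minimal sketch assuming the segment shape shown above; real SRT files from the Actor come from the Files tab):

```python
def to_srt(segments):
    """Render [{'text', 'start', 'end'}] segments as an SRT string (illustrative)."""
    def ts(seconds):
        # SRT timestamps use HH:MM:SS,mmm
        ms = round(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"
    cues = [f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}\n"
            for i, seg in enumerate(segments, 1)]
    return "\n".join(cues)
```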
Notes: Files (SRT/VTT) are attached to the run and accessible from the Files tab for easy download.
🔑 Environment Variables
- `APIFY_TOKEN` — required for authentication.
- `HTTP_PROXY`, `HTTPS_PROXY` — optional custom proxies.
- `APIFY_PROXY_PASSWORD` — use with Apify Proxy.
Store credentials securely as secrets — never in plaintext.
▶️ How to Run (short checklist)
- Open Apify Console → Actors → YouTube Transcript Fetcher.
- Provide video URLs (watch or youtu.be), set desired export formats, and toggle options.
- Run and inspect Dataset and Files tabs for JSON/SRT/VTT outputs.
🛠 Logs & Troubleshooting
- No transcript available — the video may lack captions, or the audio quality may be too poor for ASR.
- Partial transcripts — long videos may be chunked; check run logs for retry or chunk status.
- Timeouts / failures — lower `workers` or increase timeouts; enable proxy if region-restricted.
Monitor real-time logs in the Console Run Log panel for detailed error messages.
⏱ Scheduling & Webhooks
- Schedule daily or weekly runs for channel-level ingestion.
- Use Webhooks to push transcript files or Dataset updates to downstream systems (storage, search index, or ML pipelines).
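A webhook consumer typically filters on the event type and pulls the run's dataset ID from the payload. The sketch below assumes the general shape of Apify's webhook payload (`eventType`, `resource.defaultDatasetId`); verify the exact fields against the current Apify webhook documentation:

```python
def dataset_id_from_webhook(payload: dict):
    """Return the run's dataset ID when a webhook reports a successful run, else None.
    Field names follow Apify's webhook payload shape (an assumption; check the docs)."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return payload.get("resource", {}).get("defaultDatasetId")
```

A downstream service can call this in its webhook handler and then fetch the dataset items via the Apify API to feed a search index or ML pipeline.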
🔟 Changelog
- 1.0.0 — 2025-11-04: Initial release — full-video support.
📝 Notes & TODO
- TODO: Add example of chapter-aware summarization pipeline.
- TODO: Improve speaker diarization accuracy with optional external ASR.
✅ Final note
This README is designed for researchers, media teams, and engineers who need robust, exportable transcripts from full-length YouTube videos — suitable for analytics, captioning, and training data generation.