Youtube Video Transcript Scraper

A powerful YouTube Video Transcript Scraper that instantly pulls clean, accurate captions from any video — perfect for creators, researchers, and AI workflows. Fast, reliable, and built to save you time.

Pricing: $5.99/month + usage

Developer: Neuro Scraper (Maintained by Community)

Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified a day ago

🌟 YouTube Video Transcript Scraper

Accurate, timestamped transcripts for full-length YouTube videos — chapters, subtitles, and multi-language support.


📖 Overview

This Actor extracts clean, timestamped transcripts from full-length YouTube videos (standard /watch?v= URLs and youtu.be short links). It is designed for longer content: it works with multiple caption tracks, handles chapters, and produces export-ready captions (SRT/VTT) alongside structured JSON suitable for analytics and LLM pipelines.


💡 Why Full-Video Focus?

  • Full videos often contain chapters, multiple speakers, and longer dialogues — transcripts must preserve timing and structure.
  • Supports official captions when available and high-quality ASR fallbacks when not.
  • Produces SRT/VTT, plain text, and structured JSON for downstream processing.

🔧 Key Features

  • ✅ Multi-format URL normalization (/watch?v=, youtu.be); see the sketch after this list.
  • ✅ Prefers official caption tracks; falls back to ASR extraction when captions are missing.
  • ✅ Preserve chapters and video metadata (title, duration, thumbnails).
  • ✅ Export as JSON, plain text, SRT, and VTT.
  • ✅ Optional speaker diarization and language detection.
  • ✅ Configurable chunking for very long videos and resume/retry support.
  • ✅ Proxy-compatible and production-ready for large-scale jobs.
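As a rough illustration of the URL normalization mentioned above, the hypothetical helper below (not part of the Actor's code) reduces both supported link formats to a bare video ID:

from urllib.parse import urlparse, parse_qs

def video_id_from_url(url: str) -> str | None:
    # Hypothetical helper, for illustration only: extracts the video ID from
    # /watch?v= URLs and youtu.be short links.
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None

print(video_id_from_url("https://youtu.be/abcd1234"))                 # abcd1234
print(video_id_from_url("https://www.youtube.com/watch?v=abcd1234"))  # abcd1234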

⚡ Quick Start — Console

  1. Open the Actor on Apify Console.
  2. Paste one or more YouTube video URLs into the input (watch links or youtu.be links accepted).
  3. Click Run — results appear in the Dataset and Files (SRT/VTT) tabs.

⚙️ Quick Start — CLI & Python

CLI

$ apify call neuro-scraper/youtube-transcript-fetcher --input-file=./videos_input.json
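
Here, videos_input.json is simply the Actor input as JSON. A minimal example, using only fields from the input schema documented below, might look like:

{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=abcd1234"}
  ],
  "exportFormats": ["json", "srt"]
}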

Python (apify-client)

# Requires: pip install apify-client
from apify_client import ApifyClient

client = ApifyClient('<APIFY_TOKEN>')
run = client.actor('neuro-scraper/youtube-transcript-fetcher').call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=EXAMPLE"}],
        "workers": 3,
        "exportFormats": ["json", "srt", "vtt"]
    }
)
for item in client.dataset(run['defaultDatasetId']).list_items().items:
    print(item['Transcript']['plain_text'][:400])

📝 Inputs (Video-focused)

| Name | Type | Required | Default | Example | Notes |
|------|------|----------|---------|---------|-------|
| startUrls | array | Yes | [] | [{"url":"https://www.youtube.com/watch?v=abcd1234"}] | List of YouTube video URLs |
| workers | integer | Optional | 5 | 10 | Max concurrent fetches |
| exportFormats | array | Optional | ["json"] | ["json","srt","vtt"] | Output formats to generate |
| speakerDiarization | boolean | Optional | false | true | Enable speaker detection (best-effort) |
| language | string | Optional | null | "en" | Force output language (ISO code) |
| proxyConfiguration | object | Optional | {} | {"useApifyProxy": true} | Proxy settings |

Example input (Console JSON):

{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=abcd1234"},
    {"url": "https://youtu.be/abcd1234"}
  ],
  "workers": 5,
  "exportFormats": ["json", "srt", "vtt"],
  "speakerDiarization": true,
  "proxyConfiguration": {"useApifyProxy": true}
}

📄 Outputs

Each Dataset item contains rich metadata and multiple transcript representations. Example:

{
  "inputUrl": "https://www.youtube.com/watch?v=abcd1234",
  "fetchedAt": "2025-11-04T10:00:00Z",
  "success": true,
  "video": {
    "title": "Example Video",
    "duration": 3720,
    "chapters": [
      {"title": "Intro", "start": 0},
      {"title": "Main topic", "start": 60}
    ]
  },
  "Transcript": {
    "plain_text": "Full transcript text...",
    "with_timestamps": [
      {"text": "Hello and welcome to the show.", "start": 0.2, "end": 4.5},
      {"text": "Today we'll talk about...", "start": 5.0, "end": 9.3}
    ],
    "speaker_segments": [
      {"speaker": "Speaker 1", "start": 0.2, "end": 4.5, "text": "Hello and welcome to the show."}
    ]
  },
  "files": {
    "srt": "runs/<runId>/files/abcd1234.srt",
    "vtt": "runs/<runId>/files/abcd1234.vtt"
  }
}

Notes: Files (SRT/VTT) are attached to the run and accessible from the Files tab for easy download.
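
If you prefer to build captions yourself from the JSON output, the with_timestamps segments map directly onto SRT cues. A minimal sketch, assuming a dataset item shaped like the example above:

def to_srt_timestamp(seconds: float) -> str:
    # Format seconds as an SRT timestamp: HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    # segments: the "with_timestamps" array from a dataset item
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n{seg['text']}\n")
    return "\n".join(cues)

# Example: srt_text = segments_to_srt(item["Transcript"]["with_timestamps"])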


🔑 Environment Variables

  • APIFY_TOKEN — required for authentication.
  • HTTP_PROXY, HTTPS_PROXY — optional custom proxies.
  • APIFY_PROXY_PASSWORD — use with Apify Proxy.

Store credentials securely as secrets — never in plaintext.
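
For example, when running locally you can export APIFY_TOKEN in your shell and read it in Python instead of hard-coding it:

import os
from apify_client import ApifyClient

# The token comes from the environment (or an Apify secret), never from source code.
client = ApifyClient(os.environ["APIFY_TOKEN"])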


▶️ How to Run (short checklist)

  1. Open Apify Console → Actors → YouTube Transcript Fetcher.
  2. Provide video URLs (watch or youtu.be), set desired export formats, and toggle options.
  3. Run and inspect Dataset and Files tabs for JSON/SRT/VTT outputs.

🛠 Logs & Troubleshooting

  • No transcript available — video may lack captions and audio quality may be too poor for ASR.
  • Partial transcripts — long videos may be chunked; check run logs for retry or chunk status.
  • Timeouts / failures — lower workers or increase timeouts; enable proxy if region-restricted.

Monitor real-time logs in the Console Run Log panel for detailed error messages.
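
For instance, a run that keeps timing out or hitting region restrictions could be retried with fewer concurrent workers and Apify Proxy enabled, using only the documented input fields:

{
  "startUrls": [{"url": "https://www.youtube.com/watch?v=abcd1234"}],
  "workers": 2,
  "proxyConfiguration": {"useApifyProxy": true}
}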


⏱ Scheduling & Webhooks

  • Schedule daily or weekly runs for channel-level ingestion.
  • Use Webhooks to push transcript files or Dataset updates to downstream systems (storage, search index, or ML pipelines); see the sketch below.
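
A webhook can also be created programmatically with apify-client. The sketch below is only an assumption-laden example: the endpoint URL and Actor ID are placeholders, and depending on your client version the event type may need to be a WebhookEventType enum member rather than a string.

from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

# Notify your own endpoint whenever a run of this Actor succeeds.
client.webhooks().create(
    event_types=["ACTOR.RUN.SUCCEEDED"],
    request_url="https://example.com/apify-webhook",
    actor_id="<ACTOR_ID>",
)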

🔟 Changelog

  • 1.0.0 — 2025-11-04: Initial release — full-video support.

📝 Notes & TODO

  • TODO: Add example of chapter-aware summarization pipeline.
  • TODO: Improve speaker diarization accuracy with optional external ASR.

✅ Final note

This README is designed for researchers, media teams, and engineers who need robust, exportable transcripts from full-length YouTube videos — suitable for analytics, captioning, and training data generation.