Youtube Transcript Scraper

Pricing

$9.00/month + usage

⚡ Meet the Ultimate YouTube Transcript Hunter! ⚡ This Apify Actor dives deep into YouTube 🎥, extracts every word 🧠, and even revives lost subtitles like a digital sorcerer 🪄. Fast. Smart. Unstoppable. Ready to fuel your next data breakthrough 🚀🤖🔥

Rating: 0.0 (0)

Developer: Neuro Scraper (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 16 days ago

🌟 YouTube Transcript Fetcher Actor

Instantly extract accurate YouTube transcripts: fast, secure, and production-ready.


📖 Summary

This actor automatically fetches YouTube video transcripts (including Shorts), returning clean, timestamped text data. It uses a dual-source strategy to ensure transcripts are delivered even when standard captions are unavailable.

Key benefits:

  • ⚡ Get transcripts instantly from multiple YouTube URLs.
  • 🔁 Smart fallback ensures reliable results.
  • 🧠 Normalizes Shorts and youtu.be links automatically.
  • 🔒 Privacy-safe, proxy-compatible, and production-ready.
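
The link normalization mentioned above could look roughly like the sketch below. This is not the Actor's actual code: `extract_video_id` and `normalize_url` are hypothetical helpers, and the set of URL shapes handled (watch, youtu.be, Shorts, embed) is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path_parts = [part for part in parsed.path.split("/") if part]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return path_parts[0] if path_parts else None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        if len(path_parts) >= 2 and path_parts[0] in ("shorts", "embed"):
            # Shorts and embeds: https://www.youtube.com/shorts/<id>
            return path_parts[1]
    return None


def normalize_url(url: str) -> Optional[str]:
    """Map any recognized URL shape to the canonical watch URL."""
    video_id = extract_video_id(url)
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None
```

Running every input URL through one canonical form before fetching also makes it easy to drop duplicates that differ only in tracking parameters such as `si`.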

💡 Use Cases

  • 📰 Generate blog summaries or subtitles from videos.
  • 🎓 Extract transcripts for research or educational analysis.
  • 📊 Analyze large-scale YouTube datasets for content insights.
  • 🧾 Auto-generate closed captions for your platform.
  • 🧠 Power AI models or LLM pipelines with real spoken text.

⚡ Quick Start (Console, One Click)

  1. Open the Actor on Apify Console.
  2. Paste YouTube video URLs into the Input field.
  3. Click Run. Results appear in the run's Dataset.

⚙️ Quick Start (CLI + API)

CLI:

$ apify call neuro-scraper/youtube-transcript-fetcher --input-file ./input.example.json

Python (apify-client):

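from apify_client import ApifyClient

# Replace <APIFY_TOKEN> with your API token.
client = ApifyClient('<APIFY_TOKEN>')

# Start the Actor and wait for the run to finish.
run = client.actor('neuro-scraper/youtube-transcript-fetcher').call(
    run_input={"startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}]}
)

# Each dataset item describes one video; join its segment texts into plain text.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(' '.join(segment['text'] for segment in item['segments']))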

📝 Inputs

| 🔑 Name | 📝 Type | ❓ Required | ⚙️ Default | 📌 Example | 🧠 Notes |
|---|---|---|---|---|---|
| startUrls | array | ✅ Yes | [] | [{"url": "https://www.youtube.com/watch?v=abcd1234"}] | List of YouTube video URLs |
| workers | integer | ⚙️ Optional | 5 | 10 | Max concurrent fetches |
| proxyConfiguration | object | ⚙️ Optional | {} | {"useApifyProxy": true} | Proxy settings if needed |

Example input (Console JSON):

{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"},
    {"url": "https://youtu.be/example123"}
  ],
  "workers": 5,
  "proxyConfiguration": {"useApifyProxy": true}
}
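
When the URL list comes from another system, the same input object can be assembled in code. The sketch below is illustrative only: `build_input` is a hypothetical helper, and its validation rules (non-empty URL list, positive `workers`) are assumptions rather than checks the Actor is known to perform.

```python
from typing import Any, Dict, Iterable


def build_input(
    urls: Iterable[str],
    workers: int = 5,
    use_apify_proxy: bool = True,
) -> Dict[str, Any]:
    """Assemble an input object with the fields listed in the Inputs table."""
    # Skip empty or whitespace-only entries before wrapping each URL.
    start_urls = [{"url": url.strip()} for url in urls if url and url.strip()]
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    if not isinstance(workers, int) or workers < 1:
        raise ValueError("workers must be a positive integer")
    return {
        "startUrls": start_urls,
        "workers": workers,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

The returned dictionary can be passed as `run_input` to the Python client or dumped to a JSON file for the CLI.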

📄 Outputs

Each item in the Dataset contains:

{
  "video_id": "AWBsoArakNY",
  "title": "Who Has The Fastest Reaction Time?",
  "url": "https://youtube.com/AWBsoArakNY?si=8ThAJzdEEA1PnZRk",
  "lang": "en",
  "format": "vtt",
  "segments": [
    {
      "start": 3.919,
      "end": 5.99,
      "text": "I [screaming] WON. I WON. OH, I SHOULD",
      "duration": 2.071,
      "duration_seconds": 2,
      "duration_milliseconds": 71,
      "duration_seconds_with_ms": "2.071",
      "duration_minutes": 0.03451666666666667,
      "start_ts": "00:00:03.919",
      "end_ts": "00:00:05.990",
      "display": "Transcripts:\nStart: 00:00:03.919 End: 00:00:05.990\nDuration: 2.071 seconds (0.034517 minutes)\n\nNo transcript added for this duration."
    }
  ]
}

Results are stored in the default Dataset for easy export (JSON, CSV, Excel).
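
For downstream use, the `segments` array can be flattened into plain text or converted to subtitle cues. The sketch below assumes every item follows the sample shape shown above (`text`, `start_ts`, `end_ts` present on each segment); `to_plain_text` and `to_srt` are hypothetical helper names, not part of the Actor.

```python
from typing import Any, Dict, List


def to_plain_text(item: Dict[str, Any]) -> str:
    """Join the text of all segments into a single string."""
    return " ".join(seg["text"].strip() for seg in item.get("segments", []))


def to_srt(item: Dict[str, Any]) -> str:
    """Render the segments as SubRip (SRT) cues from start_ts and end_ts."""
    cues: List[str] = []
    for index, seg in enumerate(item.get("segments", []), start=1):
        # SRT separates milliseconds with a comma: 00:00:03,919
        start = seg["start_ts"].replace(".", ",")
        end = seg["end_ts"].replace(".", ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

An item with an empty or missing `segments` array yields an empty string from both helpers, which is a simple way to spot videos that returned no transcript.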


🔑 Environment Variables

| Variable | Description |
|---|---|
| APIFY_TOKEN | Required for authentication |
| HTTP_PROXY, HTTPS_PROXY | Optional custom proxies |
| APIFY_PROXY_PASSWORD | Use with Apify Proxy |

Store all credentials securely as secrets, not plaintext.
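
In client scripts, one way to follow this advice is to read the token from the environment instead of hard-coding it. A minimal sketch, where `get_apify_token` is a hypothetical helper:

```python
import os
from typing import Mapping


def get_apify_token(env: Mapping[str, str] = os.environ) -> str:
    """Read APIFY_TOKEN from the environment and fail loudly if it is missing."""
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set")
    return token
```

The result can then be passed to the client, for example `ApifyClient(get_apify_token())`.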


▢️ How to Run

  1. Open Apify Console.
  2. Navigate to Actors → YouTube Transcript Fetcher.
  3. Paste input JSON or fill the input form.
  4. Click Run.
  5. View results in the Dataset tab.

⏰ Scheduling & Webhooks

  • Schedule periodic runs (e.g., daily or hourly) from the Schedule tab.
  • Configure Webhooks to trigger a custom workflow or send notifications on completion.
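
On the receiving side, a webhook handler typically needs to work out which dataset to read. The sketch below assumes Apify's default webhook payload, in which `eventType` carries values such as `ACTOR.RUN.SUCCEEDED` and `resource` holds the run object; a custom payload template would change these fields. `dataset_id_from_webhook` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def dataset_id_from_webhook(payload: Dict[str, Any]) -> Optional[str]:
    """Return the default dataset ID for a succeeded run, otherwise None."""
    # Assumption: default payload template with eventType and resource fields.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can be handed to `client.dataset(...)` to pull the transcripts once the run completes.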

🕾 Logs & Troubleshooting

  • Monitor real-time logs in the Console Run Log panel.

  • Common issues:

    • ❌ No transcript available: Video may lack captions.
    • ⚠️ Timeout errors: Increase workers or adjust proxy settings.

🔒 Permissions & Storage

  • Uses Dataset for storing transcript results.
  • Uses RequestQueue internally for managing URL processing.
  • Fully privacy-safe: no personal data stored or shared.

🔟 Changelog

| Version | Date | Notes |
|---|---|---|
| 1.0.0 | 2025-11-04 | Initial release, stable and production-ready |

🖌 Notes / TODOs

  • TODO: Confirm output schema for advanced use-cases.
  • TODO: Add demo GIF of console run for better UX.

🌍 Proxy Configuration

Enable Apify Proxy directly in the Console for easy network routing.

Custom proxy example:

{
  "proxyConfiguration": {
    "proxyUrls": ["http://<PROXY_USER>:<PASS>@<HOST>:<PORT>"]
  }
}

Or use environment variables:

# Quote the value so the shell does not treat < and > as redirections.
export HTTP_PROXY="http://<PROXY_USER>:<PASS>@<HOST>:<PORT>"
export HTTPS_PROXY="http://<PROXY_USER>:<PASS>@<HOST>:<PORT>"

Best practice: Store proxy credentials as secrets.

TODO: Consider proxy rotation for large-scale scraping.
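
Proxy passwords often contain characters such as `@` or `:` that break a URL when pasted in verbatim. A small sketch of building the proxy URL safely, assuming credentials are supplied separately (for example from secrets); `build_proxy_url` is a hypothetical helper:

```python
from urllib.parse import quote


def build_proxy_url(
    user: str,
    password: str,
    host: str,
    port: int,
    scheme: str = "http",
) -> str:
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" forces reserved characters such as "@", ":" and "/" to be encoded.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"{scheme}://{encoded_user}:{encoded_password}@{host}:{port}"
```

The resulting string can be used as an entry in `proxyUrls` or as the value of `HTTP_PROXY` and `HTTPS_PROXY`.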


🤔 Inferred from main.py

  • Fetches data from external YouTube transcript APIs.
  • Supports fallback transcript extraction.
  • Uses proxy handling and retry logic for stability.
  • Exports formatted text with timestamps.
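
The retry logic mentioned above is not documented in detail. A common pattern for this kind of fetch is exponential backoff with jitter, sketched below as an illustration rather than as the Actor's implementation; `retry_with_backoff` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^attempt seconds plus a small random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.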

✅ Why this Actor

YouTube Transcript Fetcher is built for professionals who need transcripts fast, reliably, and at scale. It is aimed at analysts, educators, and developers.

Run this Actor on Apify Console and get transcripts in seconds.