Youtube Transcript Scraper

Pricing

$9.00/month + usage

⚡ Meet the Ultimate YouTube Transcript Hunter! ⚡ This Apify Actor dives deep into YouTube 🎥, extracts every word 🧠, and even revives lost subtitles like a digital sorcerer 🪄. Fast. Smart. Unstoppable. Ready to fuel your next data breakthrough 🚀🤖🔥

Developer

Neuro Scraper

Maintained by Community

Last modified

5 months ago

🌟 YouTube Transcript Fetcher Actor

Instantly extract accurate YouTube transcripts: fast, secure, and production-ready.


📖 Summary

This actor automatically fetches YouTube video transcripts (including Shorts), returning clean, timestamped text data. It uses a dual-source strategy to ensure transcripts are delivered even when standard captions are unavailable.

Key benefits:

  • ⚡ Get transcripts instantly from multiple YouTube URLs.
  • Smart fallback ensures reliable results.
  • 🧠 Normalizes Shorts and youtu.be links automatically.
  • 🔒 Privacy-safe, proxy-compatible, and production-ready.
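
To illustrate what normalizing Shorts and youtu.be links can look like, here is a minimal sketch. It is not the Actor's own code: the function names `extract_video_id` and `normalize_url` are hypothetical, and the set of URL shapes handled is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a few common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Strip common subdomain prefixes so www./m. variants compare equal.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    parts = [p for p in parsed.path.split("/") if p]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parts[0] if parts else None
    if host == "youtube.com":
        if parsed.path == "/watch":
            # Standard watch URLs carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            return parts[1]
    return None


def normalize_url(url: str) -> Optional[str]:
    """Rewrite any recognized URL to the canonical watch form (hypothetical helper)."""
    video_id = extract_video_id(url)
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None
```

Running URLs through a helper like this before submitting them can also be used to de-duplicate a list that mixes link styles for the same video.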

💡 Use Cases

  • 📰 Generate blog summaries or subtitles from videos.
  • 🎓 Extract transcripts for research or educational analysis.
  • 📊 Analyze large-scale YouTube datasets for content insights.
  • 🧾 Auto-generate closed captions for your platform.
  • 🧠 Power AI models or LLM pipelines with real spoken text.

⚡ Quick Start (Console, One Click)

  1. Open the Actor on Apify Console.
  2. Paste YouTube video URLs into the Input field.
  3. Click Run. Results appear in the run's default Dataset.

⚙️ Quick Start (CLI + API)

CLI:

$ apify call neuro-scraper/youtube-transcript-fetcher --input ./input.example.json

Python (apify-client):

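from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient('<APIFY_TOKEN>')

# Start the Actor and wait for the run to finish.
run = client.actor('neuro-scraper/youtube-transcript-fetcher').call(
    run_input={"startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}]}
)

# Print each video's transcript text; field names follow the Outputs section.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(' '.join(seg['text'] for seg in item.get('segments', [])))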

Inputs

| 🔑 Name | Type | ❓ Required | ⚙️ Default | 📌 Example | 🧠 Notes |
| --- | --- | --- | --- | --- | --- |
| startUrls | array | ✅ Yes | [] | [{"url": "https://www.youtube.com/watch?v=abcd1234"}] | List of YouTube video URLs |
| workers | integer | ⚙️ Optional | 5 | 10 | Max concurrent fetches |
| proxyConfiguration | object | ⚙️ Optional | {} | {"useApifyProxy": true} | Proxy settings if needed |

Example input (Console JSON):

{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"},
    {"url": "https://youtu.be/example123"}
  ],
  "workers": 5,
  "proxyConfiguration": {"useApifyProxy": true}
}
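
If the URL list comes from another system, the input can be generated rather than written by hand. The sketch below builds the same structure as the example input; `build_input` is a hypothetical helper, and only the three fields from the Inputs table are assumed.

```python
import json


def build_input(urls, workers=5, use_apify_proxy=True):
    """Build an input object with the fields listed in the Inputs table."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": [{"url": u} for u in urls],
        "workers": workers,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    payload = build_input([
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtu.be/example123",
    ])
    # Print the JSON; redirect it to a file to use it as CLI input.
    print(json.dumps(payload, indent=2))
```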

📄 Outputs

Each item in the Dataset contains:

{
  "video_id": "AWBsoArakNY",
  "title": "Who Has The Fastest Reaction Time?",
  "url": "https://youtube.com/AWBsoArakNY?si=8ThAJzdEEA1PnZRk",
  "lang": "en",
  "format": "vtt",
  "segments": [
    {
      "start": 3.919,
      "end": 5.99,
      "text": "I [screaming] WON. I WON. OH, I SHOULD",
      "duration": 2.071,
      "duration_seconds": 2,
      "duration_milliseconds": 71,
      "duration_seconds_with_ms": "2.071",
      "duration_minutes": 0.03451666666666667,
      "start_ts": "00:00:03.919",
      "end_ts": "00:00:05.990",
      "display": "Transcripts:\nStart: 00:00:03.919 End: 00:00:05.990\nDuration: 2.071 seconds (0.034517 minutes)\n\nNo transcript added for this duration."
    }
  ]
}

Results are stored in the default Dataset for easy export (JSON, CSV, Excel).
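
As a rough illustration of working with the segment fields shown above, the following sketch turns a `segments` list into WebVTT text. It assumes only the `start`, `end`, and `text` fields from the sample item; `format_ts` and `segments_to_vtt` are hypothetical helpers, not part of the Actor.

```python
def format_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, the shape of start_ts and end_ts above."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments) -> str:
    """Render segments as a WebVTT document: one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_ts(seg['start'])} --> {format_ts(seg['end'])}")
        lines.append(seg["text"])
        lines.append("")  # A blank line ends each cue.
    return "\n".join(lines)
```

For the sample segment, `format_ts(3.919)` gives `00:00:03.919`, which matches the `start_ts` value in the item.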


🔑 Environment Variables

| Variable | Description |
| --- | --- |
| APIFY_TOKEN | Required for authentication |
| HTTP_PROXY, HTTPS_PROXY | Optional custom proxies |
| APIFY_PROXY_PASSWORD | Use with Apify Proxy |

Store all credentials securely as secrets, not plaintext.
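
One way to keep the token out of source code is to read it from the environment at startup and fail early if it is missing. This is a minimal sketch; `load_apify_token` is a hypothetical helper name.

```python
import os


def load_apify_token(env=None) -> str:
    """Return APIFY_TOKEN from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set")
    return token
```

The returned value can then be passed to `ApifyClient(...)` in place of a hard-coded string.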


▶️ How to Run

  1. Open Apify Console.
  2. Navigate to Actors → YouTube Transcript Fetcher.
  3. Paste input JSON or fill the input form.
  4. Click Run.
  5. View results in the Dataset tab.

⏰ Scheduling & Webhooks

  • Schedule periodic runs (e.g., daily or hourly) from the Schedule tab.
  • Configure Webhooks to trigger a custom workflow or send notifications on completion.
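
On the receiving side of a webhook, the handler usually needs the Dataset ID of the finished run. The sketch below assumes the webhook uses Apify's default payload template, where `eventType` names the event and `resource` holds the run object; a customized payload template would need different keys. `dataset_id_from_webhook` is a hypothetical helper.

```python
from typing import Optional


def dataset_id_from_webhook(payload: dict) -> Optional[str]:
    """Return the run's default Dataset ID for a succeeded run, else None."""
    # Ignore failures, timeouts, and any other event types.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    run = payload.get("resource") or {}
    return run.get("defaultDatasetId")
```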

🕾 Logs & Troubleshooting

  • Monitor real-time logs in the Console Run Log panel.

  • Common issues:

    • โŒ No transcript available: Video may lack captions.
    • โš ๏ธ Timeout errors: Increase workers or adjust proxy settings.

🔒 Permissions & Storage

  • Uses Dataset for storing transcript results.
  • Uses RequestQueue internally for managing URL processing.
  • Fully privacy-safe: no personal data stored or shared.

🔟 Changelog

| Version | Date | Notes |
| --- | --- | --- |
| 1.0.0 | 2025-11-04 | Initial release, stable and production-ready |

🖌 Notes / TODOs

  • TODO: Confirm output schema for advanced use-cases.
  • TODO: Add demo GIF of console run for better UX.

Proxy Configuration

Enable Apify Proxy directly in the Console for easy network routing.

Custom proxy example:

{
  "proxyConfiguration": {
    "proxyUrls": ["http://<PROXY_USER:PASS@HOST:PORT>"]
  }
}

Or use environment variables:

export HTTP_PROXY=http://<PROXY_USER:PASS@HOST:PORT>
export HTTPS_PROXY=http://<PROXY_USER:PASS@HOST:PORT>

Best practice: Store proxy credentials as secrets.

TODO: Consider proxy rotation for large-scale scraping.
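
When proxy credentials are pulled from a secret store, special characters in the username or password need to be percent-encoded before they go into the proxy URL. The sketch below builds the custom proxy input shown above; `build_proxy_config` is a hypothetical helper, and the host and port in the test values are placeholders.

```python
from urllib.parse import quote


def build_proxy_config(user: str, password: str, host: str, port: int) -> dict:
    """Build a proxyConfiguration object with URL-safe credentials."""
    # safe="" also encodes characters such as "@", ":" and "/".
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return {
        "proxyConfiguration": {
            "proxyUrls": [f"http://{credentials}@{host}:{port}"]
        }
    }
```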


🤔 Inferred from main.py

  • Fetches data from external YouTube transcript APIs.
  • Supports fallback transcript extraction.
  • Uses proxy handling and retry logic for stability.
  • Exports formatted text with timestamps.
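
The combination of fallback extraction and retry logic listed above can be sketched generically as follows. This is an illustration of the pattern only, not the contents of main.py: `fetch_with_fallback` and the idea of passing sources as callables are assumptions made for the example.

```python
import time


def fetch_with_fallback(video_id, sources, retries=2, delay=0.5):
    """Try each source in order, retrying errors with exponential backoff.

    `sources` is a list of callables taking a video ID and returning a
    transcript (any truthy value) or an empty result.
    """
    last_error = None
    for source in sources:
        for attempt in range(retries + 1):
            try:
                result = source(video_id)
            except Exception as exc:
                last_error = exc
                if attempt < retries:
                    # Back off before retrying the same source.
                    time.sleep(delay * (2 ** attempt))
                continue
            if result:
                return result
            break  # Empty result: retrying will not help, try the next source.
    if last_error is not None:
        raise RuntimeError(f"all sources failed for {video_id}") from last_error
    return None
```

Returning `None` when every source answers but none has a transcript keeps the "video has no captions" case distinct from network failures, which raise.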

✅ Why this Actor

YouTube Transcript Fetcher is built for professionals who need transcripts quickly, reliably, and at scale. It is well suited to analysts, educators, and developers.

Run this Actor on Apify Console to get transcripts in seconds.