🎬 YouTube Transcript Scraper
🚀 Extract full transcripts from any YouTube video in seconds. Timestamped segments, SRT export, and 17 metadata fields (title, channel, views, likes, upload date, tags, thumbnail) per video. No API key, no registration, no YouTube Data API quota.
🕒 Last updated: 2026-04-24 · 📊 17 fields per video · 🌐 100+ languages · ⚡ 30 videos in parallel · 📜 SRT + plain text output
The YouTube Transcript Scraper turns any YouTube URL into a structured record with the full transcript, segment timestamps, and 17 metadata fields. It handles human-authored captions and auto-generated ones across 100+ languages. Each record ships with the plain-text transcript, an SRT file for subtitle overlay, and a segment array for timestamp-precise search.
Metadata covers title, channel ID, channel name, channel URL, description, duration, view count, like count, comment count, upload date, tags, categories, thumbnail URL, and the list of all available subtitle languages. Concurrent extraction keeps 30 videos processing in parallel, so a queue of 100 clips finishes in a couple of minutes. Residential proxy is required because YouTube has cracked down hard on datacenter IPs.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| AI app developers, researchers, content creators, language learners, accessibility engineers, journalists | RAG video indexing, LLM summarization, captions datasets, language learning, accessibility tools |
📋 What the YouTube Transcript Scraper does
Six transcript workflows in a single run:
- 📝 Full transcript. Timestamped segments with start, duration, and text per line.
- 💬 Plain text transcript. Flat string ready for LLM ingestion.
- 🎞️ SRT export. Standards-compliant subtitle file for video apps.
- 🌐 Language picker. Choose your preferred caption language with fallback to defaults.
- 🎬 Video metadata. Title, channel info, views, likes, comments, upload date, tags, categories.
- 🌍 Available languages. Full list of manual and auto-generated caption languages per video.
Each record also includes the thumbnail URL and an isAutoGenerated flag so you can filter out auto captions when you need human-quality transcripts.
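If you want to rebuild the SRT or plain-text forms yourself from the segment array, here is a minimal Python sketch. The `start`/`duration`/`text` keys mirror the `transcript` field in the output schema; the helper names are hypothetical, not part of the Actor:

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments: list) -> str:
    """Render timestamped segments as an SRT subtitle file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = to_srt_timestamp(seg["start"])
        end = to_srt_timestamp(seg["start"] + seg["duration"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(blocks) + "\n"

def segments_to_plain_text(segments: list) -> str:
    """Flatten segments into one string for LLM ingestion."""
    return " ".join(seg["text"].strip() for seg in segments)
```

The Actor already ships both forms per record; this sketch is only useful when you post-process the raw segment array (e.g. re-segmenting or trimming before export).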
💡 Why it matters: video is the largest untapped dataset in the world. Transcripts make it searchable, summarizable, and indexable. DIY transcript fetchers break every time YouTube changes its internal endpoints. This Actor uses yt-dlp under the hood, which is actively maintained.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough of transcript-powered video search.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| `startUrls` | array of URLs | required if no `videoIds` | YouTube video URLs (`youtube.com/watch?v=`, `youtu.be/`, Shorts). |
| `videoIds` | array of strings | required if no `startUrls` | Raw YouTube video IDs (11 chars). |
| `language` | string | `""` | Preferred ISO language code (`en`, `es`, `fr`). |
| `includeAutoGenerated` | boolean | `true` | Fall back to auto-generated captions when no manual ones exist. |
| `maxItems` | integer | `10` | Videos processed. Free plan caps at 10, paid plan at 1,000,000. |
| `proxyConfiguration` | object | `RESIDENTIAL` | Residential proxy required. |
Example: transcribe a TED talk.

```json
{
  "startUrls": [{ "url": "https://www.youtube.com/watch?v=UyyjU8fzEYU" }],
  "language": "en",
  "includeAutoGenerated": true,
  "maxItems": 1,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Example: batch transcribe a playlist of videos.

```json
{
  "videoIds": ["dQw4w9WgXcQ", "kJQP7kiw5Fk", "9bZkp7q19f0"],
  "language": "en",
  "maxItems": 100
}
```
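When you assemble inputs programmatically (for example before calling the Actor through the Apify API), a small helper can build the same payload shape as the examples above. The function name is hypothetical; the keys match the input table:

```python
def build_run_input(urls=None, video_ids=None, language="",
                    include_auto=True, max_items=10):
    """Build a run input dict matching this Actor's input schema."""
    if not urls and not video_ids:
        raise ValueError("Provide startUrls or videoIds")
    run_input = {
        "language": language,
        "includeAutoGenerated": include_auto,
        "maxItems": max_items,
        # Residential proxy is required for reliable transcript fetching.
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    if urls:
        run_input["startUrls"] = [{"url": u} for u in urls]
    if video_ids:
        run_input["videoIds"] = list(video_ids)
    return run_input
```

Pass the resulting dict as the run input when starting the Actor via the Apify client or REST API.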
⚠️ Good to Know: YouTube now blocks datacenter IPs for transcript fetching. Apify residential proxy is included on paid plans and is strongly recommended. Videos without captions return a record with an `error` field explaining "No subtitles available."
📊 Output
Each record pairs the 17 metadata fields with the transcript outputs and bookkeeping fields listed below. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 videoId | string | "UyyjU8fzEYU" |
| 🔗 url | string | "https://www.youtube.com/watch?v=UyyjU8fzEYU" |
| 🏷️ title | string \| null | "My stroke of insight" |
| 🆔 channelId | string \| null | "UCAuUUnT6oDeKwE6v1NGQxug" |
| 🔗 channelUrl | string \| null | "https://www.youtube.com/channel/..." |
| 🧑 channelName | string \| null | "TED" |
| 📝 description | string \| null | "Neuroanatomist Jill Bolte Taylor..." |
| ⏱️ durationSeconds | number \| null | 1141 |
| 👁️ viewCount | number \| null | 8688914 |
| 👍 likeCount | number \| null | 122000 |
| 💬 commentCount | number \| null | 4800 |
| 📅 uploadDate | string \| null | "2008-03-13" |
| 🏷️ tags | string[] | ["TED Talk", "brain", "science"] |
| 🗂️ categories | string[] | ["Science & Technology"] |
| 🌐 language | string \| null | "en" |
| 🌍 availableSubtitleLanguages | string[] | ["en", "es", "fr"] |
| 🤖 availableAutoCaptionLanguages | string[] | ["en"] |
| 🤖 isAutoGenerated | boolean | false |
| 📜 transcript | array | [{"start": 12.3, "duration": 4.2, "text": "..."}] |
| 💬 transcriptPlainText | string | "I grew up to study the brain..." |
| 🎞️ transcriptSrt | string | "1\n00:00:12,300 --> ...\n..." |
| 🔢 wordCount | number | 2703 |
| 🖼️ thumbnailUrl | string \| null | "https://i.ytimg.com/vi/.../maxresdefault.jpg" |
| 🕒 scrapedAt | ISO 8601 | "2026-04-21T12:00:00.000Z" |
| ❗ error | string \| null | "No subtitles available" on failure |
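The `transcript` array is what makes records timestamp-searchable. As a minimal sketch (hypothetical helper name, field names from the schema above), finding every moment a keyword is spoken looks like this:

```python
def search_transcript(record: dict, keyword: str) -> list:
    """Return (start_seconds, text) for each segment mentioning the keyword."""
    needle = keyword.lower()
    return [
        (seg["start"], seg["text"])
        for seg in (record.get("transcript") or [])  # tolerate failed records
        if needle in seg["text"].lower()
    ]
```

Each hit's `start` value can be turned into a deep link by appending `&t=<seconds>` to the video URL.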
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 📜 | Full transcript + SRT. Three output formats: segments, plain text, subtitle file. |
| 🌐 | 100+ languages. Manual captions and auto-generated captions supported. |
| 📊 | 17 metadata fields. Title, channel, views, likes, comments, tags, upload date. |
| ⚡ | Concurrent. 30 videos processing in parallel on a single run. |
| 🔁 | Actively maintained. Uses yt-dlp under the hood, which tracks YouTube's changes. |
| 🚫 | No YouTube Data API quota. Unlimited captions without Google Cloud project. |
| 🔌 | Integrations. Drops into RAG pipelines, language-learning apps, and subtitle tools. |
📊 Every transcript is a searchable index point. Indexing video at scale unlocks insights, summaries, and accessibility features that would be impossible to build manually.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ YouTube Transcript Scraper (this Actor) | $5 free credit, then pay-per-use | Any public video | Live per run | language, auto/manual, list | ⚡ 2 min |
| YouTube Data API | Free (quota) | Metadata only | Real-time | Strict quota | ⏳ Variable |
| DIY yt-dlp scripts | Free | Whatever you code | Your schedule | Whatever you build | 🐢 Days |
| Paid transcription APIs | $0.04+/min | Any audio | Real-time | Custom filters | ⏳ Hours |
Pick this Actor when you want reliable YouTube transcripts without quota limits or custom infrastructure.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the YouTube Transcript Scraper page on the Apify Store.
- 🎯 Add video URLs. Paste URLs or video IDs and pick a preferred language.
- 🚀 Run it. Click Start and let the Actor transcribe.
- 📥 Download. Grab your dataset as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded transcripts: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating YouTube Transcript Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily transcription of a channel's latest uploads keeps a RAG index current.
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
The Actor wraps yt-dlp, which fetches metadata and subtitle files from YouTube. Transcripts are parsed into structured segments, then flattened into plain text and SRT formats. Each run processes up to 30 videos in parallel.
📏 How accurate are the transcripts?
Human-authored captions are highly accurate. Auto-generated captions depend on the audio quality and language; English auto captions are typically 85-95% accurate.
🌐 Which languages are supported?
Every language for which YouTube publishes captions or auto-captions (100+ languages). Pass any ISO code to `language`, or leave it empty to use the video's default.
🔁 Why do I need residential proxy?
YouTube now challenges datacenter IPs with "Sign in to confirm you're not a bot" when fetching metadata or subtitles. Residential proxy is included on paid Apify plans and bypasses this cleanly.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to transcribe new uploads on any cron interval.
⚖️ Is it legal?
Transcript extraction from publicly available videos is generally fine for research, indexing, and AI use. Commercial redistribution of transcripts may require rights clearance from the video owner.
💼 Can I use this commercially?
Yes for internal search, RAG, and summarization. Redistribution of full transcripts requires respecting copyright and YouTube's terms of service.
💳 Do I need a paid Apify plan to use this Actor?
The free plan covers testing (10 videos per run). A paid plan lifts the limit AND gives you residential proxy access, which is required for reliable YouTube transcript fetching.
🔁 What happens if a run fails?
Apify retries transient errors. Per-video failures (no captions, geo-blocked, private) are logged in the error field. Partial datasets are preserved.
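Since per-video failures land in the same dataset with the `error` field set, a tiny post-processing step (hypothetical helper name) separates usable transcripts from records you may want to retry or log:

```python
def split_results(records: list) -> tuple:
    """Split dataset records into successes and per-video failures."""
    ok = [r for r in records if not r.get("error")]
    failed = [r for r in records if r.get("error")]
    return ok, failed
```

Feed `ok` into your pipeline and inspect `failed` for videos that were private, geo-blocked, or had no captions.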
🎞️ Can I download the video file?
This Actor focuses on transcripts and metadata. For video files, use a dedicated YouTube Video Downloader actor.
📺 Does it work on shorts, live streams, and age-restricted videos?
YouTube Shorts work. Live streams and age-restricted videos are not supported (age-restricted requires sign-in; live streams have no final transcript until the stream ends).
🆘 What if I need help?
Our team is available through the Apify platform and the Tally form below.
🔌 Integrate with any app
YouTube Transcript Scraper connects to any cloud service via Apify integrations:
- Make - Auto-transcribe new uploads
- Zapier - Push transcripts to Notion or Airtable
- Slack - Share TL;DRs in team channels
- Airbyte - Pipe transcripts into your warehouse
- GitHub - Trigger runs from commits
- Google Drive - Save transcripts to Docs or Sheets
You can also use webhooks to push transcripts into vector databases, RAG stacks, or subtitle tools.
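For RAG stacks, the `transcriptPlainText` field usually needs chunking before embedding. A simple word-window chunker with overlap (a sketch, not part of the Actor; the window sizes are arbitrary defaults) looks like this:

```python
def chunk_transcript(text: str, chunk_words: int = 200, overlap: int = 40) -> list:
    """Split a plain-text transcript into overlapping word windows."""
    if overlap >= chunk_words:
        raise ValueError("overlap must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_words]))
        if i + chunk_words >= len(words):
            break  # last window already covers the tail
    return chunks
```

Each chunk then goes to your embedding model alongside the record's `videoId` and `url` so search hits can link back to the source video.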
🔗 Recommended Actors
- 📺 YouTube Channel Scraper - Channel metadata and video lists
- 🎞️ YouTube Shorts Scraper - Shorts data and metadata
- 💬 YouTube Comments Scraper - Comments and replies
- 🏷️ YouTube Hashtag Scraper - Videos matching a hashtag
- 🤖 RAG Web Browser - Search the web with LLM-ready output
💡 Pro Tip: browse the complete ParseForge collection for more video and audio tools.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with Google, YouTube, or Alphabet. It accesses only publicly available video metadata and caption tracks. Respect YouTube's terms of service and copyright when using transcripts commercially.