Youtube Transcript Scraper

Pull transcripts from any YouTube video at scale! Extract full subtitles with timestamps in SRT and plain text, plus titles, channels, descriptions, view counts, upload dates, tags, and thumbnails. Perfect for content research, SEO, summarization, and video analytics. Start extracting today!

Pricing: Pay per event

Developer: ParseForge (maintained by Community)

🎬 YouTube Transcript Scraper

🚀 Extract full transcripts from any YouTube video in seconds. Timestamped segments, SRT export, and 17 metadata fields (title, channel, views, likes, upload date, tags, thumbnail) per video. No API key, no registration, no YouTube Data API quota.

🕒 Last updated: 2026-04-24 · 📊 17 fields per video · 🌐 100+ languages · ⚡ 30 videos in parallel · 📜 SRT + plain text output

The YouTube Transcript Scraper turns any YouTube URL into a structured record with the full transcript, segment timestamps, and 17 metadata fields. It handles human-authored captions and auto-generated ones across 100+ languages. Each record ships with the plain-text transcript, an SRT file for subtitle overlay, and a segment array for timestamp-precise search.

Metadata covers title, channel ID, channel name, channel URL, description, duration, view count, like count, comment count, upload date, tags, categories, thumbnail URL, and the list of all available subtitle languages. Concurrent extraction keeps 30 videos processing in parallel, so a queue of 100 clips finishes in a couple of minutes. Residential proxy is required because YouTube has cracked down hard on datacenter IPs.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| AI app developers, researchers, content creators, language learners, accessibility engineers, journalists | RAG video indexing, LLM summarization, captions datasets, language learning, accessibility tools |

📋 What the YouTube Transcript Scraper does

Six transcript workflows in a single run:

  • 📝 Full transcript. Timestamped segments with start, duration, and text per line.
  • 💬 Plain text transcript. Flat string ready for LLM ingestion.
  • 🎞️ SRT export. Standards-compliant subtitle file for video apps.
  • 🌐 Language picker. Choose your preferred caption language with fallback to defaults.
  • 🎬 Video metadata. Title, channel info, views, likes, comments, upload date, tags, categories.
  • 🌍 Available languages. Full list of manual and auto-generated caption languages per video.

Each record also includes the thumbnail URL and an isAutoGenerated flag so you can filter out auto captions when you need human-quality transcripts.
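As a small illustration of the isAutoGenerated filter, the sketch below keeps only records whose transcript came from manual captions. It assumes records are plain dictionaries shaped like the output schema; the video IDs in the sample data are made up for the example.

```python
from typing import Iterable


def human_captioned(records: Iterable[dict]) -> list:
    """Keep records that have no error and were not auto-generated."""
    return [
        record
        for record in records
        if not record.get("error") and record.get("isAutoGenerated") is False
    ]


# Hypothetical sample records (IDs are placeholders, not real videos).
records = [
    {"videoId": "aaaaaaaaaaa", "isAutoGenerated": False, "error": None},
    {"videoId": "bbbbbbbbbbb", "isAutoGenerated": True, "error": None},
    {"videoId": "ccccccccccc", "isAutoGenerated": False, "error": "No subtitles available"},
]

kept = human_captioned(records)
print([record["videoId"] for record in kept])
```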

💡 Why it matters: video is the largest untapped dataset in the world. Transcripts make it searchable, summarizable, and indexable. DIY transcript fetchers break every time YouTube changes their API. This Actor uses yt-dlp under the hood, which is actively maintained.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough of transcript-powered video search.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| startUrls | array of URLs | required if no videoIds | YouTube video URLs (youtube.com/watch?v=, youtu.be/, shorts). |
| videoIds | array of strings | required if no startUrls | Raw YouTube video IDs (11 chars). |
| language | string | "" | Preferred ISO language code (en, es, fr). |
| includeAutoGenerated | boolean | true | Fall back to auto-generated captions when no manual ones exist. |
| maxItems | integer | 10 | Videos processed. Free plan caps at 10, paid plan at 1,000,000. |
| proxyConfiguration | object | RESIDENTIAL | Residential proxy required. |
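If you collect links from mixed sources, it can help to normalize them to video IDs before building the input. The sketch below is one way to do that for the three URL shapes listed above. The helper name and the assumed ID alphabet (letters, digits, "_" and "-") are mine and are not taken from the Actor.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from A-Z, a-z, 0-9, "_" and "-".
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a raw ID or a supported URL, else None."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    host = host.removeprefix("www.").removeprefix("m.")
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            candidate = parsed.path.split("/")[2]
    if candidate and _ID_RE.match(candidate):
        return candidate
    return None


print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))
```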

Example: transcribe a TED talk.

{
  "startUrls": [
    { "url": "https://www.youtube.com/watch?v=UyyjU8fzEYU" }
  ],
  "language": "en",
  "includeAutoGenerated": true,
  "maxItems": 1,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Example: batch transcribe a list of videos by ID.

{
  "videoIds": [
    "dQw4w9WgXcQ",
    "kJQP7kiw5Fk",
    "9bZkp7q19f0"
  ],
  "language": "en",
  "maxItems": 100
}

⚠️ Good to Know: YouTube now blocks datacenter IPs for transcript fetching. Apify residential proxy is included on paid plans and is strongly recommended. Videos without captions return a record with an error field explaining "No subtitles available."


📊 Output

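Each record contains the fields in the schema below: video metadata, the three transcript formats, and bookkeeping fields such as scrapedAt and error. Download the dataset as CSV, Excel, JSON, or XML.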

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 videoId | string | "UyyjU8fzEYU" |
| 🔗 url | string | "https://www.youtube.com/watch?v=UyyjU8fzEYU" |
| 🏷️ title | string or null | "My stroke of insight..." |
| 🆔 channelId | string or null | "UCAuUUnT6oDeKwE6v1NGQxug" |
| 🔗 channelUrl | string or null | "https://www.youtube.com/channel/..." |
| 🧑 channelName | string or null | "TED" |
| 📝 description | string or null | "Neuroanatomist Jill Bolte Taylor..." |
| ⏱️ durationSeconds | number or null | 1141 |
| 👁️ viewCount | number or null | 8688914 |
| 👍 likeCount | number or null | 122000 |
| 💬 commentCount | number or null | 4800 |
| 📅 uploadDate | string or null | "2008-03-13" |
| 🏷️ tags | string[] | ["TED Talk", "brain", "science"] |
| 🗂️ categories | string[] | ["Science & Technology"] |
| 🌐 language | string or null | "en" |
| 🌍 availableSubtitleLanguages | string[] | ["en", "es", "fr"] |
| 🤖 availableAutoCaptionLanguages | string[] | ["en"] |
| 🤖 isAutoGenerated | boolean | false |
| 📜 transcript | array | [{"start": 12.3, "duration": 4.2, "text": "..."}] |
| 💬 transcriptPlainText | string | "I grew up to study the brain..." |
| 🎞️ transcriptSrt | string | "1\n00:00:12,300 --> ...\n..." |
| 🔢 wordCount | number | 2703 |
| 🖼️ thumbnailUrl | string or null | "https://i.ytimg.com/vi/.../maxresdefault.jpg" |
| 🕒 scrapedAt | string (ISO 8601) | "2026-04-21T12:00:00.000Z" |
| error | string or null | "No subtitles available" on failure |
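To show how the transcript array relates to the transcriptSrt string, here is a minimal sketch that renders segments as SRT cues. It assumes start and duration are in seconds, as the example values suggest. It is an illustration only and not the Actor's own conversion code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render transcript segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = segment["start"]
        end = start + segment["duration"]
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{segment['text']}"
        )
    return "\n\n".join(cues) + "\n"


print(segments_to_srt([{"start": 12.3, "duration": 4.2, "text": "Hello"}]))
```

For the segment in the schema example (start 12.3, duration 4.2), the cue runs from 00:00:12,300 to 00:00:16,500.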


✨ Why choose this Actor

Capabilities:

  • 📜 Full transcript + SRT. Three output formats: segments, plain text, subtitle file.
  • 🌐 100+ languages. Manual captions and auto-generated captions supported.
  • 📊 17 metadata fields. Title, channel, views, likes, comments, tags, upload date.
  • ⚡ Concurrent. 30 videos processing in parallel on a single run.
  • 🔁 Actively maintained. Uses yt-dlp under the hood, which tracks YouTube's changes.
  • 🚫 No YouTube Data API quota. Unlimited captions without a Google Cloud project.
  • 🔌 Integrations. Drops into RAG pipelines, language-learning apps, and subtitle tools.

📊 Every transcript is a searchable index point. Indexing video at scale unlocks insights, summaries, and accessibility features that would be impossible to build manually.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ YouTube Transcript Scraper (this Actor) | $5 free credit, then pay-per-use | Any public video | Live per run | language, auto/manual, list | ⚡ 2 min |
| YouTube Data API | Free (quota) | Metadata only | Real-time | Strict quota | ⏳ Variable |
| DIY yt-dlp scripts | Free | Whatever you code | Your schedule | Whatever you build | 🐢 Days |
| Paid transcription APIs | $0.04+/min | Any audio | Real-time | Custom filters | ⏳ Hours |

Pick this Actor when you want reliable YouTube transcripts without quota limits or custom infrastructure.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the YouTube Transcript Scraper page on the Apify Store.
  3. 🎯 Add video URLs. Paste URLs or video IDs and pick a preferred language.
  4. 🚀 Run it. Click Start and let the Actor transcribe.
  5. 📥 Download. Grab your dataset as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded transcripts: 3-5 minutes. No coding required.


💼 Business use cases

🧠 AI & RAG

  • Index videos in a searchable knowledge base
  • Feed transcripts to GPT, Claude, or Gemini for summaries
  • Build video-aware chatbots
  • Generate research datasets for LLMs
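
For RAG indexing, transcripts are usually split into chunks that keep their timestamps so search hits can link back to the right moment in the video. The sketch below groups consecutive segments into chunks of bounded length. The function names and the 500-character default are illustrative choices and not part of the Actor.

```python
def _finish(segments: list) -> dict:
    """Merge a run of segments into one chunk with start and end times."""
    last = segments[-1]
    return {
        "start": segments[0]["start"],
        "end": last["start"] + last["duration"],
        "text": " ".join(segment["text"] for segment in segments),
    }


def chunk_segments(segments: list, max_chars: int = 500) -> list:
    """Group consecutive transcript segments into chunks of about max_chars."""
    chunks = []
    current = []
    length = 0
    for segment in segments:
        text = segment["text"].strip()
        # Start a new chunk when adding this segment would exceed the budget.
        if current and length + len(text) + 1 > max_chars:
            chunks.append(_finish(current))
            current, length = [], 0
        current.append({**segment, "text": text})
        length += len(text) + 1
    if current:
        chunks.append(_finish(current))
    return chunks


demo_segments = [
    {"start": 0.0, "duration": 2.0, "text": "alpha beta"},
    {"start": 2.0, "duration": 2.0, "text": "gamma delta"},
    {"start": 4.0, "duration": 1.0, "text": "epsilon"},
]
demo_chunks = chunk_segments(demo_segments, max_chars=25)
print(demo_chunks)
```

Each chunk can then be embedded and stored together with its videoId and start time.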

🎓 Education & Learning

  • Side-by-side bilingual transcripts
  • Study notes from lecture videos
  • Language-learning flashcards from music
  • Accessibility captions for students

📰 Media & Journalism

  • Extract quotes from interview videos
  • Fact-check statements at scale
  • Build transcripts for podcast archives
  • Monitor public-figure statements

🛠️ Developer Tooling

  • SRT files for video players
  • Transcripts for video search engines
  • Subtitle generation for app content
  • Dataset assembly for speech models

🔌 Automating YouTube Transcript Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
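
A minimal Python sketch of a programmatic run is shown below. The input keys come from the Input table, while ACTOR_ID is a placeholder I made up (copy the real ID from the Actor page). The client calls (ApifyClient, actor(...).call(run_input=...), dataset(...).iterate_items()) follow the apify-client package as I understand it, so check them against the current client documentation.

```python
# Hypothetical Actor ID; replace it with the ID shown on the Actor's page.
ACTOR_ID = "parseforge/youtube-transcript-scraper"


def build_run_input(video_ids, language="en"):
    """Build an input object using the fields from the Input table."""
    ids = list(video_ids)
    return {
        "videoIds": ids,
        "language": language,
        "includeAutoGenerated": True,
        "maxItems": len(ids),
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


def run_actor(run_input, token):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["dQw4w9WgXcQ", "kJQP7kiw5Fk"])
print(run_input)
```

Calling run_actor(run_input, token) with your own API token would then return the records described in the Output section.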

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily transcription of a channel's latest uploads keeps a RAG index current.

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input


❓ Frequently Asked Questions

🧩 How does it work?

The Actor wraps yt-dlp, which fetches metadata and subtitle files from YouTube. Transcripts are parsed into structured segments, then flattened into plain text and SRT formats. Each run processes up to 30 videos in parallel.
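
The "up to 30 in parallel" behavior can be pictured as bounded concurrency. The sketch below uses an asyncio semaphore with a stand-in worker. It illustrates the general pattern only and is not the Actor's source code; fake_fetch and the generated IDs are placeholders.

```python
import asyncio


async def process_all(video_ids, worker, limit=30):
    """Run worker(video_id) for every ID with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(video_id):
        async with semaphore:
            return await worker(video_id)

    return await asyncio.gather(*(guarded(video_id) for video_id in video_ids))


# Track how many workers run at once so the limit can be observed.
stats = {"active": 0, "peak": 0}


async def fake_fetch(video_id):
    """Stand-in for a real transcript fetch; it only sleeps briefly."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return {"videoId": video_id}


queue = [f"video{i:06d}" for i in range(100)]
results = asyncio.run(process_all(queue, fake_fetch))
print(len(results), "processed, peak concurrency", stats["peak"])
```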

📏 How accurate are the transcripts?

Human-authored captions are highly accurate. Auto-generated captions depend on the audio quality and language; English auto captions are typically 85-95% accurate.

🌐 Which languages are supported?

Every language for which YouTube publishes captions or auto-captions (100+ languages). Pass any ISO code in the language input, or leave it empty to use the video's default.
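
The exact selection order the Actor applies is not documented here, so the sketch below is only one plausible reading of "preferred language with fallback". It tries the preferred language in manual captions, then in auto captions, then falls back to the first available track. The function name and the fallback order are assumptions.

```python
def pick_caption_track(manual, auto, language="", include_auto=True):
    """Return (language_code, is_auto_generated), or None if nothing fits.

    Assumed order: preferred manual, preferred auto, first manual, first auto.
    """
    if language and language in manual:
        return language, False
    if language and include_auto and language in auto:
        return language, True
    if manual:
        return manual[0], False
    if include_auto and auto:
        return auto[0], True
    return None


# Manual captions exist in Spanish only; English is available as auto captions.
print(pick_caption_track(["es"], ["en"], language="en"))
print(pick_caption_track(["es"], ["en"], language="en", include_auto=False))
```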

🔁 Why do I need a residential proxy?

YouTube now challenges datacenter IPs with "Sign in to confirm you're not a bot" when fetching metadata or subtitles. Residential proxy is included on paid Apify plans and bypasses this cleanly.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to transcribe new uploads on any cron interval.

⚖️ Is it legal to extract YouTube transcripts?

Transcript extraction from publicly available videos is generally fine for research, indexing, and AI use. Commercial redistribution of transcripts may require rights clearance from the video owner.

💼 Can I use this commercially?

Yes for internal search, RAG, and summarization. Redistribution of full transcripts requires respecting copyright and YouTube's terms of service.

💳 Do I need a paid Apify plan to use this Actor?

The free plan covers testing (10 videos per run). A paid plan lifts the limit AND gives you residential proxy access, which is required for reliable YouTube transcript fetching.

🔁 What happens if a run fails?

Apify retries transient errors. Per-video failures (no captions, geo-blocked, private) are logged in the error field. Partial datasets are preserved.
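
Because per-video failures arrive as ordinary records with the error field set, a downstream script can separate them from successful records and decide what to retry. A minimal sketch, assuming records are dictionaries shaped like the output schema (the sample IDs are placeholders):

```python
def split_results(records):
    """Return (successful_records, {videoId: error message}) for a dataset."""
    succeeded = []
    failed = {}
    for record in records:
        if record.get("error"):
            failed[record["videoId"]] = record["error"]
        else:
            succeeded.append(record)
    return succeeded, failed


# Hypothetical dataset with one success and one failure.
dataset = [
    {"videoId": "aaaaaaaaaaa", "error": None, "wordCount": 2703},
    {"videoId": "bbbbbbbbbbb", "error": "No subtitles available"},
]
succeeded, failed = split_results(dataset)
print(len(succeeded), "ok;", failed)
```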

🎞️ Can I download the video file?

This Actor focuses on transcripts and metadata. For video files, use a dedicated YouTube Video Downloader actor.

📺 Does it work on shorts, live streams, and age-restricted videos?

YouTube Shorts work. Live streams and age-restricted videos are not supported (age-restricted requires sign-in; live streams have no final transcript until the stream ends).

🆘 What if I need help?

Our team is available through the Apify platform and the Tally form below.


🔌 Integrate with any app

YouTube Transcript Scraper connects to any cloud service via Apify integrations:

  • Make - Auto-transcribe new uploads
  • Zapier - Push transcripts to Notion or Airtable
  • Slack - Share TL;DRs in team channels
  • Airbyte - Pipe transcripts into your warehouse
  • GitHub - Trigger runs from commits
  • Google Drive - Save transcripts to Docs or Sheets

You can also use webhooks to push transcripts into vector databases, RAG stacks, or subtitle tools.
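
As a rough sketch of the receiving side of a webhook, the function below pulls the dataset ID out of a webhook body so a downstream job can fetch the transcripts. It assumes Apify's default payload template, with an eventType field and a resource object carrying defaultDatasetId. Confirm the exact shape in the Apify webhook documentation, since custom payload templates will differ. The dataset ID in the sample is a placeholder.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's dataset ID for successful runs, else None."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Hypothetical webhook body using the assumed default payload shape.
sample_body = json.dumps(
    {
        "eventType": "ACTOR.RUN.SUCCEEDED",
        "resource": {"defaultDatasetId": "example-dataset-id"},
    }
)
print(dataset_id_from_webhook(sample_body))
```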


💡 Pro Tip: browse the complete ParseForge collection for more video and audio tools.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with Google, YouTube, or Alphabet. It accesses only publicly available video metadata and caption tracks. Respect YouTube's terms of service and copyright when using transcripts commercially.