
Youtube Transcript Scraper

Pricing

$19.99/month + usage


✨ YouTube Transcript Scraper to extract video transcripts quickly and at scale. Collect captions, timestamps, and spoken content with accuracy. Ideal for research, SEO, and content analysis. Features: ⚡ fast extraction • 📊 clean output • 🔍 detailed insights • 🌍 scalable automation


Rating: 0.0 (0 ratings)

Developer: ScrapeLabs (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 18 days ago


Youtube Transcript Scraper

The Youtube Transcript Scraper is a fast, scalable YouTube transcript extractor that converts public captions into structured data for analysis and reuse. It removes the tedium of manual transcription: paste one or more video URLs and it returns clean transcripts as plain text or timestamped captions. Built for marketers, developers, data analysts, and researchers, this YouTube transcript downloader helps you export YouTube captions at scale and power SEO, content repurposing, and NLP workflows.

What data / output can you get?

Below are the exact fields this actor saves to the Apify dataset for each processed URL. The structure supports both timestamped captions and plain-text transcripts.

| Data type | Description | Example value |
| --- | --- | --- |
| id | YouTube video ID extracted from the input URL | "dQw4w9WgXcQ" |
| url | Canonical watch URL built from the video ID | "https://www.youtube.com/watch?v=dQw4w9WgXcQ" |
| input | The original input string you provided | "https://youtu.be/dQw4w9WgXcQ" |
| transcripts | Array of transcripts by language (filtered by settings) | [ { "language": "English (auto-generated)", "content": [...] } ] |
| transcripts[].language | The caption track language label returned by YouTube | "English (auto-generated)" |
| transcripts[].content (timestamp) | When outputFormat = "timestamp": array of caption entries | [ { "startMs": 520, "endMs": 2120, "startTime": "0:00", "text": "We're no strangers to love" } ] |
| transcripts[].content[].startMs | Caption start time in milliseconds | 520 |
| transcripts[].content[].endMs | Caption end time in milliseconds | 2120 |
| transcripts[].content[].startTime | Human-readable start time (MM:SS) | "0:00" |
| transcripts[].content[].text | Caption text for the time slice | "We're no strangers to love" |
| transcripts[].content (text) | When outputFormat = "text": the full transcript as a single string | "We're no strangers to love You know the rules and so do I ..." |

Notes:

  • Multiple languages can be returned for a single video, depending on your includeEnglishAG and includeNonEnglish settings.
  • Results stream live to the Apify dataset as each URL finishes, so you can download partial results mid-run.
  • You can export results from the dataset in JSON, CSV, or Excel for downstream use.

Key features

  • ⚙️ Flexible transcript formats (timestamp or text) — Choose between detailed timestamped captions (startMs, endMs, startTime, text) or a single plain-text transcript with outputFormat = "timestamp" or "text".
  • 🌐 Language-aware filtering — Control inclusion of English auto-generated captions and non-English transcripts via includeEnglishAG and includeNonEnglish to fine-tune your dataset.
  • 📦 Batch URL processing — Supply multiple YouTube URLs in urls and process them in a single run for a bulk YouTube closed captions scraper workflow.
  • 📤 Live dataset streaming — Each completed URL is appended immediately to the dataset, enabling faster feedback loops and incremental exports.
  • 🛡️ Smart proxy defaults — Uses the Apify RESIDENTIAL proxy by default to reduce YouTube IP blocks; you can customize proxyConfiguration or let it auto-configure.
  • 🧰 Developer-friendly & automation-ready — Works seamlessly with the Apify API and Python runtime, making it easy to integrate as a YouTube transcript API component in ETL pipelines and automation tools.
  • 🔎 Reliable extraction backbone — Powered by youtube-transcript-api under the hood for consistent YouTube transcript parsing, including support to download auto-generated YouTube subtitles when allowed by your settings.

How to use Youtube Transcript Scraper - step by step

  1. Create or log in to your Apify account.
  2. Open the “youtube-transcript-scraper” actor in Apify.
  3. Add input URLs:
    • Paste one or more YouTube video links into urls (supports both youtube.com/watch?v=... and youtu.be/...).
  4. Configure language and format:
    • includeEnglishAG: include or exclude English auto-generated captions.
    • includeNonEnglish: include or exclude non-English transcripts.
    • outputFormat: choose "timestamp" for detailed caption entries or "text" for a single plain-text transcript.
  5. Set proxy (optional):
    • Leave proxyConfiguration empty to use Apify RESIDENTIAL by default, or supply your own proxy settings.
  6. Start the run:
    • The actor fetches transcripts per URL and streams each result to the dataset as soon as it completes.
  7. Download your results:
    • Open the run’s Dataset tab to export JSON, CSV, or Excel and feed the data into your SEO, research, or analytics tools.

Pro tip: Use the "timestamp" format when you need precise caption timings for AI/LLM pipelines, video indexing, or searchable archives, and switch to "text" for quick content repurposing and summaries.
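If you prefer to drive these steps from code, the same run can be started over the Apify HTTP API. Below is a minimal standard-library sketch using Apify's documented run-sync-get-dataset-items endpoint, which starts an actor and returns its dataset items when the run finishes (suitable for short runs); the actor ID and API token are placeholders you must supply yourself.

```python
import json
import urllib.request

def build_run_input(urls, include_english_ag=True,
                    include_non_english=False, output_format="text"):
    # Assemble the actor's run input using the fields documented on this page.
    return {
        "urls": urls,
        "includeEnglishAG": include_english_ag,
        "includeNonEnglish": include_non_english,
        "outputFormat": output_format,
    }

def run_and_fetch_items(actor_id, token, run_input):
    # POST the input to Apify's run-sync-get-dataset-items endpoint and
    # return the parsed dataset items (requires a valid token and actor ID).
    url = (f"https://api.apify.com/v2/acts/{actor_id}"
           f"/run-sync-get-dataset-items?token={token}")
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `run_and_fetch_items("<username>~youtube-transcript-scraper", token, build_run_input(["https://youtu.be/dQw4w9WgXcQ"]))`, where the actor ID is whatever the store page shows for your account.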

Use cases

| Use case | Description |
| --- | --- |
| SEO teams — export YouTube captions for content repurposing | Extract transcripts to create blog posts, meta descriptions, and keyword-rich articles using a scalable YouTube subtitle to text converter. |
| Research & academia — analyze video content at scale | Collect clean text for topic modeling, sentiment analysis, and qualitative studies with a dependable YouTube transcript parser. |
| Accessibility — enhance caption availability | Retrieve public captions to improve accessibility documentation and review, using a compliant YouTube caption downloader workflow. |
| Developers — build a transcripts API pipeline | Integrate the dataset into your backend via the Apify API as a practical YouTube transcript API component for search and chatbots. |
| Social & marketing — snippet extraction for campaigns | Quickly find quotes and highlights from videos with a bulk YouTube transcript downloader and export ready-to-use snippets. |
| Data analysts — structure video speech for NLP | Use timestamped captions to align text with audio/video timelines for ML feature engineering and downstream analytics. |
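For the snippet-extraction and analytics use cases, a short post-processing sketch that scans timestamped caption entries for a keyword; the field names (startTime, text) follow the output schema documented on this page, but the function itself is illustrative, not part of the actor.

```python
def find_mentions(entries, keyword):
    """Return (startTime, text) pairs for caption entries that mention
    the keyword, case-insensitively. `entries` is the timestamped
    transcripts[].content array from the actor's output."""
    kw = keyword.lower()
    return [(e["startTime"], e["text"])
            for e in entries
            if kw in e["text"].lower()]
```

This pairs naturally with the "timestamp" output format, since each match comes back with the moment in the video where it was spoken.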

Why choose Youtube Transcript Scraper?

The Youtube Transcript Scraper focuses on precision, automation, and reliability for extracting public YouTube captions into structured, analysis-ready data.

  • ✅ Accurate, structured output — Get clean transcripts with consistent keys and optional timestamps for dependable downstream use.
  • 🌍 Multilingual support & filters — Include English auto-generated or non-English tracks on demand to fit your language strategy.
  • 🚀 Built for scale — Process multiple URLs at once and stream results live to the dataset for faster iteration.
  • 💻 Developer-ready — Integrate via the Apify platform and Python runtime to automate transcript extraction in your apps and pipelines.
  • 🛡️ Robust proxy defaults — Uses Apify RESIDENTIAL proxies by default to mitigate IP blocking and improve run stability.
  • 🔄 Better than extensions — Avoid brittle browser plugins; run a production-ready YouTube transcript extractor with repeatable outputs and infrastructure support.
  • 💸 Cost-effective automation — Operate reliably within Apify’s usage-based environment and scale as your workloads grow.

In short: a production-ready YouTube closed captions scraper that outperforms fragile alternatives with cleaner data, more control, and automation built in.

Is it legal to scrape YouTube transcripts?

Yes — when done responsibly. This actor automates access to public caption tracks available on YouTube and does not access private or authenticated content.

Guidelines for responsible use:

  • Review and comply with YouTube’s Terms of Service and your local regulations.
  • Use only publicly available transcripts and respect content owners’ rights for redistribution or commercial reuse.
  • Avoid scraping private, paywalled, or region-restricted videos.
  • For edge cases or commercial redistribution, consult your legal team to ensure compliance with applicable laws (e.g., GDPR/CCPA) and platform terms.

Input parameters & output format

Example JSON input

{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/5NV6Rdv1a3I"
  ],
  "includeEnglishAG": true,
  "includeNonEnglish": false,
  "outputFormat": "timestamp",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input parameter reference

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | [] | One or more YouTube video URLs to process. Each completed URL is appended immediately to the dataset. |
| includeEnglishAG | boolean | No | true | Whether to include English auto-generated transcripts. |
| includeNonEnglish | boolean | No | false | Whether to include non-English transcripts. |
| outputFormat | string ("timestamp" or "text") | No | "text" | Format of transcript output: "timestamp" returns detailed timestamps, "text" returns plain text. |
| proxyConfiguration | object | No | {} | Proxy configuration. Defaults to the Apify RESIDENTIAL proxy group to reduce YouTube IP blocking; if omitted, the actor attempts to use Apify Proxy automatically. |

Notes:

  • Runtime behavior: if includeNonEnglish isn’t provided, the current implementation treats it as true internally; use the explicit value you prefer to avoid surprises.
  • Provide valid video URLs (youtube.com/watch?v=... or youtu.be/...) — both are supported.
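For reference, extracting the video ID from both supported URL shapes can be sketched with the standard library; this is an illustrative approximation of the behavior described above, not the actor's actual parsing code.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the video ID from a youtube.com/watch?v=... or
    youtu.be/... URL, or None if neither form matches."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if "youtube.com" in parsed.netloc:
        # Watch links carry the ID in the ?v= query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```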

Example JSON output (timestamp format)

{
  "id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "input": "https://youtu.be/dQw4w9WgXcQ",
  "transcripts": [
    {
      "language": "English (auto-generated)",
      "content": [
        {
          "startMs": 520,
          "endMs": 2120,
          "startTime": "0:00",
          "text": "We're no strangers to love"
        },
        {
          "startMs": 2120,
          "endMs": 3640,
          "startTime": "0:02",
          "text": "You know the rules and so do I"
        }
      ]
    }
  ]
}

When outputFormat = "text", the content becomes a single string:

{
  "id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "input": "https://youtu.be/dQw4w9WgXcQ",
  "transcripts": [
    {
      "language": "English (auto-generated)",
      "content": "We're no strangers to love You know the rules and so do I ..."
    }
  ]
}

Field behavior:

  • transcripts may be an empty array if a video has no available transcripts or if filters exclude all tracks.
  • startMs/endMs/startTime/text fields appear only when outputFormat = "timestamp".
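The timestamped entries are easy to post-process. As one example, here is a small sketch (not part of the actor's output) that converts them to SubRip (SRT) cue text, assuming the startMs/endMs/text fields shown above:

```python
def ms_to_srt(total_ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def to_srt(entries):
    """Render timestamped caption entries as numbered SRT cues."""
    blocks = []
    for i, e in enumerate(entries, start=1):
        cue = (f"{i}\n"
               f"{ms_to_srt(e['startMs'])} --> {ms_to_srt(e['endMs'])}\n"
               f"{e['text']}")
        blocks.append(cue)
    return "\n\n".join(blocks)
```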

FAQ

Can it extract auto-generated YouTube captions?

Yes. Set includeEnglishAG to true to include English auto-generated tracks. If set to false, English auto-generated captions will be excluded.

Does it support non-English transcripts?

Yes. Set includeNonEnglish to true to include non-English transcripts. When false, only English tracks are considered (subject to availability and filters).
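To illustrate how the two flags interact, here is a hypothetical filter over YouTube-style language labels. The actor's internal logic is not published; this only mirrors the behavior described in these two answers.

```python
def keep_track(language_label, include_english_ag=True,
               include_non_english=False):
    """Decide whether a caption track passes the documented filters,
    given a label like "English" or "English (auto-generated)"."""
    is_english = language_label.startswith("English")
    is_auto = "auto-generated" in language_label
    if is_english:
        # Manual English tracks always pass; auto-generated ones
        # follow the includeEnglishAG setting.
        return include_english_ag if is_auto else True
    # All non-English tracks follow the includeNonEnglish setting.
    return include_non_english
```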

Does it work with both youtube.com and youtu.be links?

Yes. The actor extracts the video ID from both youtube.com/watch?v=... and youtu.be/... links, making it easy to get a YouTube transcript online from any standard video URL.

How do I download plain text vs timestamped captions?

Choose the outputFormat input: use "text" for a single plain-text transcript or "timestamp" to export YouTube captions with startMs/endMs and startTime for each segment.

How many videos can I process in one run?

You can provide multiple URLs in the urls array for bulk processing. Practical limits depend on your Apify plan and runtime limits. Results stream to the dataset as each URL completes.

Do I need to configure a proxy?

It’s optional. If not provided, the actor attempts to use the Apify RESIDENTIAL proxy by default to reduce IP blocks. You can customize proxyConfiguration if needed.

Does it return video metadata like title or views?

No. This actor focuses on transcripts only and outputs id, url, input, and transcripts. It operates as a focused YouTube transcript extractor rather than a full video metadata scraper.

Can I integrate this with my app or workflow?

Yes. Use the Apify API to pull dataset items into your systems. It’s great for building a YouTube transcript API pipeline, automations, and exports to CSV/JSON for further processing.

Final thoughts

The Youtube Transcript Scraper is built to convert public YouTube captions into clean, structured text for analysis and reuse. With flexible formats, language-aware filters, bulk URL support, and smart proxy defaults, it’s ideal for marketers, developers, data analysts, and researchers. Use the Apify API to automate your YouTube transcript downloader pipeline or export datasets for SEO and analytics. Start extracting smarter, scalable insights from video content today.