
Youtube Transcript Scraper

Pricing

$19.99/month + usage


✨ YouTube Transcript Scraper to extract video transcripts quickly and at scale. Collect captions, timestamps, and spoken content with accuracy. Ideal for research, SEO, and content analysis. Features: ⚡ fast extraction • 📊 clean output • 🔍 detailed insights • 🌍 scalable automation


Rating: 0.0 (0 ratings)

Developer: ScrapeLabs (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 18 days ago


Youtube Transcript Scraper

The Youtube Transcript Scraper is a fast, scalable YouTube transcript extractor that converts public captions into structured data for analysis and reuse. It removes the tedium of manual transcription: paste one or more video URLs and it returns clean transcripts as plain text or timestamped captions. Built for marketers, developers, data analysts, and researchers, this YouTube transcript downloader helps you export YouTube captions at scale and power SEO, content repurposing, and NLP workflows.

What data / output can you get?

Below are the exact fields this actor saves to the Apify dataset for each processed URL. The structure supports both timestamped captions and plain-text transcripts.

| Data type | Description | Example value |
| --- | --- | --- |
| id | YouTube video ID extracted from the input URL | "dQw4w9WgXcQ" |
| url | Canonical watch URL built from the video ID | "https://www.youtube.com/watch?v=dQw4w9WgXcQ" |
| input | The original input string you provided | "https://youtu.be/dQw4w9WgXcQ" |
| transcripts | Array of transcripts by language (filtered by settings) | [ { "language": "English (auto-generated)", "content": [...] } ] |
| transcripts[].language | The caption track language label returned by YouTube | "English (auto-generated)" |
| transcripts[].content (timestamp) | When outputFormat = "timestamp": array of caption entries | [ { "startMs": 520, "endMs": 2120, "startTime": "0:00", "text": "We're no strangers to love" } ] |
| transcripts[].content[].startMs | Caption start time in milliseconds | 520 |
| transcripts[].content[].endMs | Caption end time in milliseconds | 2120 |
| transcripts[].content[].startTime | Human-readable start time (MM:SS) | "0:00" |
| transcripts[].content[].text | Caption text for the time slice | "We're no strangers to love" |
| transcripts[].content (text) | When outputFormat = "text": the full transcript as a single string | "We're no strangers to love You know the rules and so do I ..." |

Notes:

  • Multiple languages can be returned for a single video, depending on your includeEnglishAG and includeNonEnglish settings.
  • Results stream live to the Apify dataset as each URL finishes, so you can download partial results mid-run.
  • You can export results from the dataset in JSON, CSV, or Excel for downstream use.

Key features

  • ⚙️ Flexible transcript formats (timestamp or text) — Choose between detailed timestamped captions (startMs, endMs, startTime, text) or a single plain-text transcript with outputFormat = "timestamp" or "text".
  • 🌐 Language-aware filtering — Control inclusion of English auto-generated captions and non-English transcripts via includeEnglishAG and includeNonEnglish to fine-tune your dataset.
  • 📦 Batch URL processing — Supply multiple YouTube URLs in urls and process them in a single run for a bulk YouTube closed captions scraper workflow.
  • 📤 Live dataset streaming — Each completed URL is appended immediately to the dataset, enabling faster feedback loops and incremental exports.
  • 🛡️ Smart proxy defaults — Uses the Apify RESIDENTIAL proxy by default to reduce YouTube IP blocks; you can customize proxyConfiguration or let it auto-configure.
  • 🧰 Developer-friendly & automation-ready — Works seamlessly with the Apify API and Python runtime, making it easy to integrate as a YouTube transcript API component in ETL pipelines and automation tools.
  • 🔎 Reliable extraction backbone — Powered by youtube-transcript-api under the hood for consistent YouTube transcript parsing, including support to download auto-generated YouTube subtitles when allowed by your settings.

How to use Youtube Transcript Scraper - step by step

  1. Create or log in to your Apify account.
  2. Open the “youtube-transcript-scraper” actor in Apify.
  3. Add input URLs:
    • Paste one or more YouTube video links into urls (supports both youtube.com/watch?v=... and youtu.be/...).
  4. Configure language and format:
    • includeEnglishAG: include or exclude English auto-generated captions.
    • includeNonEnglish: include or exclude non-English transcripts.
    • outputFormat: choose "timestamp" for detailed caption entries or "text" for a single plain-text transcript.
  5. Set proxy (optional):
    • Leave proxyConfiguration empty to use Apify RESIDENTIAL by default, or supply your own proxy settings.
  6. Start the run:
    • The actor fetches transcripts per URL and streams each result to the dataset as soon as it completes.
  7. Download your results:
    • Open the run’s Dataset tab to export JSON, CSV, or Excel and feed the data into your SEO, research, or analytics tools.

Pro tip: Use the "timestamp" format when you need precise caption timings for AI/LLM pipelines, video indexing, or searchable archives, and switch to "text" for quick content repurposing and summaries.
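If you prefer to drive these steps from code, the same run can be started over the Apify HTTP API. Below is a minimal standard-library sketch using Apify's documented run-sync-get-dataset-items endpoint, which starts an actor and returns its dataset items when the run finishes (suitable for short runs); the actor ID and API token are placeholders you must supply yourself.

```python
import json
import urllib.request

def build_run_input(urls, include_english_ag=True,
                    include_non_english=False, output_format="text"):
    # Assemble the actor's run input using the fields documented on this page.
    return {
        "urls": urls,
        "includeEnglishAG": include_english_ag,
        "includeNonEnglish": include_non_english,
        "outputFormat": output_format,
    }

def run_and_fetch_items(actor_id, token, run_input):
    # POST the input to Apify's run-sync-get-dataset-items endpoint and
    # return the parsed dataset items (requires a valid token and actor ID).
    url = (f"https://api.apify.com/v2/acts/{actor_id}"
           f"/run-sync-get-dataset-items?token={token}")
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `run_and_fetch_items("<username>~youtube-transcript-scraper", token, build_run_input(["https://youtu.be/dQw4w9WgXcQ"]))`, where the actor ID is whatever the store page shows for your account.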

Use cases

| Use case | Description |
| --- | --- |
| SEO teams — export YouTube captions for content repurposing | Extract transcripts to create blog posts, meta descriptions, and keyword-rich articles using a scalable YouTube subtitle to text converter. |
| Research & academia — analyze video content at scale | Collect clean text for topic modeling, sentiment analysis, and qualitative studies with a dependable YouTube transcript parser. |
| Accessibility — enhance caption availability | Retrieve public captions to improve accessibility documentation and review, using a compliant YouTube caption downloader workflow. |
| Developers — build a transcripts API pipeline | Integrate the dataset into your backend via the Apify API as a practical YouTube transcript API component for search and chatbots. |
| Social & marketing — snippet extraction for campaigns | Quickly find quotes and highlights from videos with a bulk YouTube transcript downloader and export ready-to-use snippets. |
| Data analysts — structure video speech for NLP | Use timestamped captions to align text with audio/video timelines for ML feature engineering and downstream analytics. |
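For the snippet-extraction and analytics use cases, a short post-processing sketch that scans timestamped caption entries for a keyword; the field names (startTime, text) follow the output schema documented on this page, but the function itself is illustrative, not part of the actor.

```python
def find_mentions(entries, keyword):
    """Return (startTime, text) pairs for caption entries that mention
    the keyword, case-insensitively. `entries` is the timestamped
    transcripts[].content array from the actor's output."""
    kw = keyword.lower()
    return [(e["startTime"], e["text"])
            for e in entries
            if kw in e["text"].lower()]
```

This pairs naturally with the "timestamp" output format, since each match comes back with the moment in the video where it was spoken.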

Why choose Youtube Transcript Scraper?

The Youtube Transcript Scraper focuses on precision, automation, and reliability for extracting public YouTube captions into structured, analysis-ready data.

  • ✅ Accurate, structured output — Get clean transcripts with consistent keys and optional timestamps for dependable downstream use.
  • 🌍 Multilingual support & filters — Include English auto-generated or non-English tracks on demand to fit your language strategy.
  • 🚀 Built for scale — Process multiple URLs at once and stream results live to the dataset for faster iteration.
  • 💻 Developer-ready — Integrate via the Apify platform and Python runtime to automate transcript extraction in your apps and pipelines.
  • 🛡️ Robust proxy defaults — Uses Apify RESIDENTIAL proxies by default to mitigate IP blocking and improve run stability.
  • 🔄 Better than extensions — Avoid brittle browser plugins; run a production-ready YouTube transcript extractor with repeatable outputs and infrastructure support.
  • 💸 Cost-effective automation — Operate reliably within Apify’s usage-based environment and scale as your workloads grow.

In short: a production-ready YouTube closed captions scraper that outperforms fragile alternatives with cleaner data, more control, and automation built in.

Is it legal to scrape YouTube transcripts?

Yes — when done responsibly. This actor automates access to public caption tracks available on YouTube and does not access private or authenticated content.

Guidelines for responsible use:

  • Review and comply with YouTube’s Terms of Service and your local regulations.
  • Use only publicly available transcripts and respect content owners’ rights for redistribution or commercial reuse.
  • Avoid scraping private, paywalled, or region-restricted videos.
  • For edge cases or commercial redistribution, consult your legal team to ensure compliance with applicable laws (e.g., GDPR/CCPA) and platform terms.

Input parameters & output format

Example JSON input

{
  "urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/5NV6Rdv1a3I"
  ],
  "includeEnglishAG": true,
  "includeNonEnglish": false,
  "outputFormat": "timestamp",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input parameter reference

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | [] | One or more YouTube video URLs to process. Each completed URL is appended immediately to the dataset. |
| includeEnglishAG | boolean | No | true | Whether to include English auto-generated transcripts. |
| includeNonEnglish | boolean | No | false | Whether to include non-English transcripts. |
| outputFormat | string ("timestamp" or "text") | No | "text" | Format of transcript output: "timestamp" returns detailed timestamps, "text" returns plain text. |
| proxyConfiguration | object | No | {} | Proxy configuration. Defaults to the Apify RESIDENTIAL proxy group to reduce YouTube IP blocking; if omitted, the actor attempts to use Apify Proxy automatically. |

Notes:

  • Runtime behavior: if includeNonEnglish isn’t provided, the current implementation treats it as true internally; use the explicit value you prefer to avoid surprises.
  • Provide valid video URLs (youtube.com/watch?v=... or youtu.be/...) — both are supported.
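For reference, extracting the video ID from both supported URL shapes can be sketched with the standard library; this is an illustrative approximation of the behavior described above, not the actor's actual parsing code.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the video ID from a youtube.com/watch?v=... or
    youtu.be/... URL, or None if neither form matches."""
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if "youtube.com" in parsed.netloc:
        # Watch links carry the ID in the ?v= query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```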

Example JSON output (timestamp format)

{
  "id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "input": "https://youtu.be/dQw4w9WgXcQ",
  "transcripts": [
    {
      "language": "English (auto-generated)",
      "content": [
        {
          "startMs": 520,
          "endMs": 2120,
          "startTime": "0:00",
          "text": "We're no strangers to love"
        },
        {
          "startMs": 2120,
          "endMs": 3640,
          "startTime": "0:02",
          "text": "You know the rules and so do I"
        }
      ]
    }
  ]
}

When outputFormat = "text", the content becomes a single string:

{
  "id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "input": "https://youtu.be/dQw4w9WgXcQ",
  "transcripts": [
    {
      "language": "English (auto-generated)",
      "content": "We're no strangers to love You know the rules and so do I ..."
    }
  ]
}

Field behavior:

  • transcripts may be an empty array if a video has no available transcripts or if filters exclude all tracks.
  • startMs/endMs/startTime/text fields appear only when outputFormat = "timestamp".
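The timestamped entries are easy to post-process. As one example, here is a small sketch (not part of the actor's output) that converts them to SubRip (SRT) cue text, assuming the startMs/endMs/text fields shown above:

```python
def ms_to_srt(total_ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def to_srt(entries):
    """Render timestamped caption entries as numbered SRT cues."""
    blocks = []
    for i, e in enumerate(entries, start=1):
        cue = (f"{i}\n"
               f"{ms_to_srt(e['startMs'])} --> {ms_to_srt(e['endMs'])}\n"
               f"{e['text']}")
        blocks.append(cue)
    return "\n\n".join(blocks)
```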

FAQ

Can it extract auto-generated YouTube captions?

Yes. Set includeEnglishAG to true to include English auto-generated tracks. If set to false, English auto-generated captions will be excluded.

Does it support non-English transcripts?

Yes. Set includeNonEnglish to true to include non-English transcripts. When false, only English tracks are considered (subject to availability and filters).
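To illustrate how the two flags interact, here is a hypothetical filter over YouTube-style language labels. The actor's internal logic is not published; this only mirrors the behavior described in these two answers.

```python
def keep_track(language_label, include_english_ag=True,
               include_non_english=False):
    """Decide whether a caption track passes the documented filters,
    given a label like "English" or "English (auto-generated)"."""
    is_english = language_label.startswith("English")
    is_auto = "auto-generated" in language_label
    if is_english:
        # Manual English tracks always pass; auto-generated ones
        # follow the includeEnglishAG setting.
        return include_english_ag if is_auto else True
    # All non-English tracks follow the includeNonEnglish setting.
    return include_non_english
```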

Does it work with both youtube.com and youtu.be links?

Yes. The actor extracts the video ID from both youtube.com/watch?v=... and youtu.be/... links, making it easy to get a YouTube transcript online from any standard video URL.

How do I download plain text vs timestamped captions?

Choose the outputFormat input: use "text" for a single plain-text transcript or "timestamp" to export YouTube captions with startMs/endMs and startTime for each segment.

How many videos can I process in one run?

You can provide multiple URLs in the urls array for bulk processing. Practical limits depend on your Apify plan and runtime limits. Results stream to the dataset as each URL completes.

Do I need to configure a proxy?

It’s optional. If not provided, the actor attempts to use the Apify RESIDENTIAL proxy by default to reduce IP blocks. You can customize proxyConfiguration if needed.

Does it return video metadata like title or views?

No. This actor focuses on transcripts only and outputs id, url, input, and transcripts. It operates as a focused YouTube transcript extractor rather than a full video metadata scraper.

Can I integrate this with my app or workflow?

Yes. Use the Apify API to pull dataset items into your systems. It’s great for building a YouTube transcript API pipeline, automations, and exports to CSV/JSON for further processing.

Final thoughts

The Youtube Transcript Scraper is built to convert public YouTube captions into clean, structured text for analysis and reuse. With flexible formats, language-aware filters, bulk URL support, and smart proxy defaults, it’s ideal for marketers, developers, data analysts, and researchers. Use the Apify API to automate your YouTube transcript downloader pipeline or export datasets for SEO and analytics. Start extracting smarter, scalable insights from video content today.