Youtube Transcript Scraper

Pricing: $19.99/month + usage
🎬 YouTube Transcript Scraper (youtube-transcript-scraper) pulls clean video transcripts/captions with timestamps, multi-language, and batch export (JSON/CSV). 🔎 Ideal for SEO, keyword research, summaries, accessibility, and content repurposing. ⚡ Fast, reliable, API-ready.

Developer: ScrapeEngine (Maintained by Community)
Youtube Transcript Scraper

Youtube Transcript Scraper is a fast, reliable YouTube transcript extractor that turns public captions into clean, structured data — no copy-paste, no extensions. It solves manual transcription by letting you download YouTube transcripts as plain text or detailed timestamped captions you can export to JSON/CSV. Built for marketers, developers, data analysts, and researchers, this YouTube transcript scraper tool scales from single videos to bulk lists with proxy-backed reliability so you can get YouTube video transcripts at speed and feed them into your workflows.

What data / output can you get?

The actor streams results to your Apify dataset as each URL finishes. Here are the exact fields you’ll see in the output, with examples.

| Data type | Description | Example value |
| --- | --- | --- |
| id | YouTube video ID extracted from the input URL | 4KbrxIpQgkM |
| url | Canonical video URL composed from the ID | https://www.youtube.com/watch?v=4KbrxIpQgkM |
| input | The original input URL you provided | https://youtu.be/4KbrxIpQgkM |
| transcripts | Array of transcript variants by language | […] |
| transcripts[].language | Human-readable language label from YouTube | English |
| transcripts[].content (text) | Full transcript merged into a single string when outputFormat="text" | Hello everyone and welcome… |
| transcripts[].content[] (timestamp) | Array of caption segments when outputFormat="timestamp" | […] |
| transcripts[].content[].startMs | Segment start time in milliseconds | 1250 |
| transcripts[].content[].endMs | Segment end time in milliseconds | 4280 |
| transcripts[].content[].startTime | Segment start time in mm:ss | 0:01 |
| transcripts[].content[].text | Caption text for the segment | Welcome to the channel. |

Notes:

  • Set outputFormat to "text" for one merged string per language, or "timestamp" for structured caption timing.
  • Export results from the Apify dataset in JSON or CSV for downstream analysis, SEO, accessibility, or automation.
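To see how the two output shapes relate, the timestamped segments can be collapsed into the plain-text form yourself. This is an illustrative post-processing sketch, not the actor's internal code; it assumes the dataset item shape documented above:

```python
def merge_segments(item):
    """Collapse a "timestamp"-format dataset item into one plain string
    per language, mirroring the "text" output shape."""
    merged = []
    for variant in item["transcripts"]:
        text = " ".join(seg["text"] for seg in variant["content"])
        merged.append({"language": variant["language"], "content": text})
    return {**item, "transcripts": merged}

item = {
    "id": "dQw4w9WgXcQ",
    "transcripts": [{
        "language": "English",
        "content": [
            {"startMs": 0, "endMs": 2140, "startTime": "0:00",
             "text": "We're no strangers to love"},
            {"startMs": 2140, "endMs": 4280, "startTime": "0:02",
             "text": "You know the rules and so do I"},
        ],
    }],
}
print(merge_segments(item)["transcripts"][0]["content"])
# → We're no strangers to love You know the rules and so do I
```

This lets you request "timestamp" once and derive the merged text locally, rather than running the actor twice.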

Key features

  • ⚡ Fast, flexible transcript output: Choose outputFormat="text" for a single merged transcript or "timestamp" for detailed time‑coded captions.
  • 🗣️ Language filters you control: Toggle includeEnglishAG to include English auto‑generated captions and includeNonEnglish to include non‑English transcripts.
  • 📦 Batch processing at scale: Provide multiple YouTube URLs and save each completed item immediately — perfect for a bulk YouTube transcript downloader workflow.
  • 🔒 Proxy‑backed reliability: Uses Apify RESIDENTIAL proxy by default (when enabled) to reduce IP blocks and stabilize large runs.
  • 💻 Developer‑friendly & API‑ready: Access via the Apify API; outputs are clean JSON for pipelines, integrations, and automation.
  • 🔄 Dataset‑first workflow: Every video’s result is pushed as soon as it finishes so you can stream, monitor, and export during long jobs.

How to use Youtube Transcript Scraper - step by step

  1. Sign in to Apify and open the Youtube Transcript Scraper actor.
  2. Paste one or more video links into urls (accepts both youtube.com/watch?v=… and youtu.be/… formats).
  3. Choose your outputFormat:
    • text for a single merged transcript per language.
    • timestamp for per‑segment timing with startMs, endMs, startTime, text.
  4. Set language filters as needed:
    • includeEnglishAG to include English auto‑generated captions.
    • includeNonEnglish to include non‑English transcripts.
  5. (Optional) Configure proxyConfiguration. When enabled, the actor uses the Apify RESIDENTIAL proxy group by default to mitigate blocking.
  6. Start the run. Each processed URL is appended to the dataset immediately as a separate item.
  7. Export the dataset as JSON or CSV and feed it into your analytics, SEO, or automation pipeline.
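Step 2 accepts both URL shapes; as a stand-alone illustration (not the actor's internal code), here is how they map to the id and canonical url fields shown in the output table:

```python
from urllib.parse import urlparse, parse_qs

def video_id(url):
    """Extract the video ID from either accepted URL form."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # short form: the ID is the path itself
        return parsed.path.lstrip("/")
    # long form: youtube.com/watch?v=<ID>
    return parse_qs(parsed.query)["v"][0]

def canonical_url(url):
    """Compose the canonical watch URL from the extracted ID."""
    return f"https://www.youtube.com/watch?v={video_id(url)}"

print(video_id("https://youtu.be/4KbrxIpQgkM"))                 # 4KbrxIpQgkM
print(video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```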

Pro tip: Orchestrate runs via the Apify API and pipe results to Make, n8n, or Python for an automated YouTube transcript API workflow.

Use cases

| Use case | Description |
| --- | --- |
| SEO + content repurposing | Convert YouTube subtitles to text to create blogs, social captions, and keyword-rich summaries. |
| Research & academia | Scrape YouTube transcripts for topic modeling, text mining, and qualitative analysis at scale. |
| Accessibility workflows | Download auto-generated YouTube captions and official subtitles to improve accessibility or QA. |
| Marketing & social teams | Extract quotes and highlights to accelerate campaign assets and video summaries. |
| Developer pipelines (API) | Use the structured JSON output to power chatbots, search indexes, or RAG systems. |
| Competitive & trend analysis | Bulk-download transcripts for channels or lists to analyze messaging and themes. |

Why choose Youtube Transcript Scraper?

This production‑ready YouTube caption downloader focuses on precision, scale, and automation — not brittle, manual alternatives.

  • 🎯 Accurate, structured output: Choose simple text or rich timestamps with startMs/endMs/startTime/text.
  • 🌍 Multilingual control: Include English auto‑generated and/or non‑English transcripts when available.
  • 📈 Built for scale: Process many URLs per run and stream results to your dataset as they complete.
  • 💻 Developer access: Clean JSON fits directly into APIs, Python scripts, and automation tools.
  • 🛡️ Reliable vs. extensions: Apify runtime with optional RESIDENTIAL proxy reduces friction compared to browser add‑ons.
  • 💸 Export‑ready: Download your results in JSON/CSV without extra formatting.

Is it legal to scrape YouTube transcripts?

Yes — when done responsibly. This actor automates access to transcripts and captions available on YouTube. Use it for analysis, accessibility, or internal research in line with platform terms and applicable laws.

Guidelines:

  • Scrape only public videos and captions you’re allowed to use.
  • Respect YouTube’s Terms of Service and local regulations (e.g., GDPR/CCPA where applicable).
  • Don’t redistribute transcripts commercially without rights from content owners.
  • Do not attempt to access private or restricted content.
  • Consult your legal team for edge cases or commercial redistribution.

Input parameters & output format

Example JSON input

{
  "urls": [
    "https://www.youtube.com/watch?v=4KbrxIpQgkM",
    "https://youtu.be/dQw4w9WgXcQ"
  ],
  "includeEnglishAG": true,
  "includeNonEnglish": false,
  "outputFormat": "text",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Parameter reference

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | [] | One or more YouTube video URLs to process. Each completed URL is appended immediately to the dataset. |
| includeEnglishAG | boolean | No | true | Whether to include English auto-generated transcripts. |
| includeNonEnglish | boolean | No | false | Whether to include non-English transcripts. |
| outputFormat | string (enum: "timestamp", "text") | No | "text" | Format of transcript output: "timestamp" returns detailed timestamps; "text" returns plain text. |
| proxyConfiguration | object | No | {} | Proxy configuration. Uses the Apify RESIDENTIAL proxy group by default to mitigate YouTube IP blocking; if not configured, the actor attempts to use Apify Proxy automatically. |

Notes on defaults and filtering:

  • UI defaults come from the input schema above.
  • If you omit fields in a raw API call, runtime fallbacks apply in code: includeEnglishAG defaults to false, includeNonEnglish defaults to true, and outputFormat defaults to "text". To avoid surprises, explicitly set these flags.
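To make the divergence between UI defaults and raw-API fallbacks concrete, a small helper that applies the documented runtime fallbacks to a partial input could look like this (the helper name and structure are illustrative, not part of the actor):

```python
# Runtime fallbacks applied in code when a raw API call omits a field,
# per the notes above (these differ from the UI defaults).
RUNTIME_FALLBACKS = {
    "includeEnglishAG": False,
    "includeNonEnglish": True,
    "outputFormat": "text",
}

def resolve_run_input(raw_input):
    """Fill in the documented runtime fallbacks for any omitted fields;
    explicitly set fields always win."""
    return {**RUNTIME_FALLBACKS, **raw_input}

print(resolve_run_input({"urls": ["https://youtu.be/4KbrxIpQgkM"]}))
```

Setting all three flags explicitly in every call, as the note recommends, makes this fallback logic irrelevant.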

Example JSON output (outputFormat="text")

{
  "id": "4KbrxIpQgkM",
  "url": "https://www.youtube.com/watch?v=4KbrxIpQgkM",
  "input": "https://youtu.be/4KbrxIpQgkM",
  "transcripts": [
    {
      "language": "English",
      "content": "Hello everyone and welcome to our video. Today we will cover..."
    }
  ]
}

Example JSON output (outputFormat="timestamp")

{
  "id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "input": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "transcripts": [
    {
      "language": "English",
      "content": [
        {
          "startMs": 0,
          "endMs": 2140,
          "startTime": "0:00",
          "text": "We're no strangers to love"
        },
        {
          "startMs": 2140,
          "endMs": 4280,
          "startTime": "0:02",
          "text": "You know the rules and so do I"
        }
      ]
    }
  ]
}

Output field notes:

  • transcripts[].content is either a string (when "text") or an array of segments with startMs, endMs, startTime, text (when "timestamp").
  • The actor pushes one dataset item per input URL as soon as it completes.

FAQ

Do I need a YouTube transcript Chrome extension to use this?

No. This runs on Apify’s infrastructure and exposes a dataset/API, so you can get YouTube video transcripts without installing any browser extension.

Can I scrape YouTube transcripts in bulk?

Yes. Add multiple video links to urls and the actor will process each, saving results to the dataset as they finish — ideal for a bulk YouTube transcript downloader flow.

Does it support auto-generated captions?

Yes. Set includeEnglishAG to true to include English auto‑generated captions. You can also enable includeNonEnglish to include non‑English transcripts when available.

Can I extract subtitles as plain text or with timestamps?

Yes. Set outputFormat to "text" for one merged transcript per language, or "timestamp" for detailed, per‑segment timings.

Can I export SRT files?

Not directly. The actor outputs either plain text or timestamped segments in JSON. You can convert the timestamped JSON to SRT in a post‑processing step if needed.
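Because the timestamped output carries startMs and endMs, that post-processing step is short. This sketch assumes the segment shape shown in the example output above:

```python
def ms_to_srt(ms):
    """Format milliseconds as an SRT timecode: HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, millis = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{millis:03}"

def segments_to_srt(segments):
    """Turn "timestamp"-format caption segments into an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{ms_to_srt(seg['startMs'])} --> {ms_to_srt(seg['endMs'])}\n"
            f"{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"

segments = [
    {"startMs": 0, "endMs": 2140, "text": "We're no strangers to love"},
    {"startMs": 2140, "endMs": 4280, "text": "You know the rules and so do I"},
]
print(segments_to_srt(segments))
```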

Is there a YouTube transcript API for developers?

You can run this actor via the Apify API and receive structured JSON/CSV outputs, making it a practical YouTube transcript API alternative for programmatic pipelines.

Will it work without proxies?

It can run without a proxy, but the actor is designed to use the Apify RESIDENTIAL proxy by default (when enabled) to reduce YouTube IP blocking, especially for large jobs.

How are results exported?

All results are pushed to an Apify dataset. From there, you can download in JSON or CSV and integrate into analytics or automation workflows.

Does it handle non-English transcripts?

Yes. Enable includeNonEnglish to include non‑English transcripts when YouTube provides them.

How does this differ from a YouTube transcript Chrome extension?

It’s infrastructure‑backed and API‑ready. You can automate, run in bulk, and export clean JSON/CSV without a browser, making it more robust than extension‑based tools.

Final thoughts

Youtube Transcript Scraper is built to turn YouTube captions into clean, structured data for analysis and reuse. With configurable language filters, plain text or timestamped outputs, batch processing, and proxy‑backed reliability, it’s ideal for marketers, developers, analysts, and researchers. Use the Apify API to automate at scale, export to JSON/CSV, and plug results into your content, accessibility, or AI workflows. Start extracting smarter transcripts today.