YouTube Transcript Scraper

Pricing: $10.00/month + usage

Extract YouTube transcripts + rich metadata (views, likes, comments, duration, tags) in seconds. Choose from 5 formats: Markdown, TXT, JSON, SRT, CSV. Supports 45 languages, auto-generated captions, Shorts, and proxies. No API key needed.

Rating: 0.0 (0 reviews)
Developer: ML Data Solutions (Maintained by Community)
Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 5 days ago

YouTube Transcript Scraper — Fast, Stealth & Multi-Language

Extract YouTube transcripts in seconds. This Apify Actor retrieves subtitles and full video metadata from any YouTube video or Shorts link, with stealth browser impersonation to avoid detection, multi-language support, and five ready-to-use output formats you can select per run.

What does YouTube Transcript Scraper do?

YouTube Transcript Scraper extracts the full subtitles of any YouTube video and delivers them in your choice of formats: Markdown, plain text, JSON, SRT, and CSV — all accessible via clickable links directly in the Output tab. It always saves a metadata.json file with complete video metadata and a fetch log. It fetches rich video metadata (title, channel, views, duration, category, tags, and more) via the YouTube Data API v3, and handles both standard videos and YouTube Shorts.

Why use YouTube Transcript Scraper?

Whether you are building an AI application, training an LLM, conducting market research, or repurposing video content as text — you need a fast and reliable way to extract spoken content from videos. The YouTube Transcript Scraper provides:

  • ⚡ Extreme speed — short run time thanks to optimised Python logic; one summary row per video means near-instant dataset writes
  • 🕵️ Stealth technology — uses curl_cffi to impersonate real Chrome browsers (131–142), bypassing YouTube's bot detection
  • 🌍 Multi-language support — detects all available subtitle languages, lets you pick your preferred one, and falls back gracefully if it's not available
  • 📋 Rich metadata — title, channel, publish date, duration, views, likes, comments, category, and tags via YouTube Data API v3
  • 📄 Selectable output formats — choose only the formats you need (Markdown, plain text, JSON, SRT, CSV) to save run time; all linked in the Output tab
  • 🔁 Auto-generated caption support — retrieves both manual and auto-generated (AI) subtitles
  • 🔒 Proxy ready — supports Apify Residential and Datacenter proxies with automatic rotation and up to 3 retry attempts with exponential backoff

What data can you extract from YouTube?

Summary dataset (one row per video)

| Field | Description | Example |
|---|---|---|
| videoId | YouTube video ID | dQw4w9WgXcQ |
| videoUrl | Full URL as provided | https://youtube.com/watch?v=... |
| videoTitle | Official video title | "How to Build an AI Agent" |
| channel | Channel / author name | "TechWithMark" |
| publishDate | Upload date (YYYY-MM-DD) | 2024-01-15 |
| videoDuration | Video length (H:MM:SS or MM:SS) | 1:02:03 |
| viewCount | Total view count | 1234567 |
| likeCount | Like count (if public) | 50000 |
| commentCount | Comment count (if enabled) | 1234 |
| category | YouTube category name | Education |
| tags | Comma-separated tags (up to 10) | "python, ai, tutorial" |
| targetLanguage | Language code you requested | en |
| activeLanguage | Language actually used | "English" |
| activeLanguageCode | Language code actually used | en |
| isAutoGenerated | Whether subtitles are auto-generated | false |
| availableLanguages | All available languages for re-runs | "en (English), ru (Russian)" |
| segmentCount | Number of subtitle segments | 342 |
| status | Run result: success or error | success |
| transcript_md | Link to transcript.md (if selected) | URL |
| transcript_txt | Link to transcript.txt (if selected) | URL |
| transcript_json | Link to transcript.json (if selected) | URL |
| transcript_srt | Link to transcript.srt (if selected) | URL |
| transcript_csv | Link to transcript.csv (if selected) | URL |
| metadata_json | Link to metadata.json (always present) | URL |

Key-Value Store files

Every run saves a metadata.json file and a languages.json file. Transcript files are saved only for the formats you selected via the Output Formats checkboxes:

| File | Format | Metadata | Duration | Saved |
|---|---|---|---|---|
| metadata.json | JSON | Full metadata + fetch log | — | Always |
| languages.json | JSON | — | — | Always |
| transcript.md | Markdown | Full header | ✅ at end of each line | Optional |
| transcript.txt | Plain text | Full header | ✅ at end of each line | Optional |
| transcript.json | JSON | Separate metadata object | ✅ field | Optional |
| transcript.srt | SRT subtitles | None (standard format) | ✅ encoded as end time | Optional |
| transcript.csv | CSV | None | ✅ column | Optional |

How to scrape YouTube transcripts

  1. Go to the Input tab of this Actor.
  2. Paste the YouTube video link into the Video URL field. Standard, Shorts, embed, and youtu.be links are all supported.
  3. (Optional) Select a Language from the dropdown (default: English). If the selected language isn't available for the video, the Actor will automatically use the first available alternative and notify you.
  4. (Optional) Check the Output Formats you need. Markdown is selected by default. Uncheck formats you don't need to reduce run time. At least one format must remain selected.
  5. (Optional) Configure a Proxy. Residential proxies are pre-selected by default and give the best reliability. Datacenter proxies are cheaper but may be blocked by YouTube.
  6. Click Start and wait 7–10 seconds.
  7. Open the Output tab — clickable download links for all saved files are shown directly there. No need to navigate to Storage manually.
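The steps above can also be scripted. Below is a minimal sketch using the Apify Python client (`pip install apify-client`); the token and Actor ID are placeholders, and `build_run_input` is a hypothetical helper that mirrors the Input fields documented further down:

```python
def build_run_input(video_url: str, language_code: str = "en",
                    formats: tuple = ("md",)) -> dict:
    """Build a run input matching this Actor's documented Input fields."""
    run_input = {"videoUrl": video_url, "languageCode": language_code}
    for fmt in ("md", "txt", "json", "srt", "csv"):
        # Maps "md" -> "includeMd", "srt" -> "includeSrt", etc.
        run_input[f"include{fmt.capitalize()}"] = fmt in formats
    return run_input

def run_actor(token: str, actor_id: str, video_url: str) -> list[dict]:
    """Trigger one run and return the summary dataset rows.
    Requires `pip install apify-client`; token and actor_id are your own."""
    from apify_client import ApifyClient
    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=build_run_input(video_url))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Each returned row is the summary record described in the Output section, including clickable file links for the formats you enabled.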

Pricing — how much does it cost to scrape YouTube transcripts?

YouTube Transcript Scraper is one of the most cost-efficient tools on the Apify Store. Because each run completes in under 10 seconds, it consumes minimal Compute Units (CU).

  • Average run time: ~8 seconds
  • Estimated cost: Hundreds of videos can be processed on the Apify Free plan
  • Scalability: Ideal for bulk pipelines (1,000+ videos) — the fast architecture keeps costs very low at scale

Input

| Field | Type | Required | Description |
|---|---|---|---|
| videoUrl | String | ✅ Yes | Full YouTube URL. Validated before the run. Supports youtube.com/watch, youtu.be, /shorts/, /embed/ |
| languageCode | String | No | Subtitle language, selected from a dropdown (45 languages). Defaults to English (en) |
| includeMd | Boolean | No | Save transcript.md. Default: true |
| includeTxt | Boolean | No | Save transcript.txt. Default: false |
| includeJson | Boolean | No | Save transcript.json. Default: false |
| includeSrt | Boolean | No | Save transcript.srt. Default: false |
| includeCsv | Boolean | No | Save transcript.csv. Default: false |
| proxyConfig | Object | No | Apify Proxy configuration. Residential proxy is pre-selected by default for best reliability |

Supported URL formats:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
youtube.com/watch?v=VIDEO_ID (without https:// — also accepted)

Output

Dataset — Summary (one row per video)

The dataset contains a single summary row per run with full metadata and clickable links for each saved file:

{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "videoTitle": "How to Build an AI Agent",
  "channel": "TechWithMark",
  "publishDate": "2024-01-15",
  "videoDuration": "1:02:03",
  "viewCount": 1234567,
  "likeCount": 50000,
  "commentCount": 1234,
  "category": "Education",
  "tags": "python, ai, tutorial",
  "targetLanguage": "en",
  "activeLanguage": "English",
  "activeLanguageCode": "en",
  "isAutoGenerated": false,
  "availableLanguages": "en (English), ru (Russian (auto-generated))",
  "segmentCount": 342,
  "status": "success",
  "transcript_md": "https://api.apify.com/v2/key-value-stores/.../records/transcript.md",
  "metadata_json": "https://api.apify.com/v2/key-value-stores/.../records/metadata.json"
}

File link fields (transcript_md, transcript_txt, etc.) are only present in the row for formats that were actually saved.

Metadata (metadata.json)

Always saved on every run. Contains the full video metadata and a log of each fetch attempt (proxy type, exit IP, result):

{
  "Title": "How to Build an AI Agent",
  "Channel": "TechWithMark",
  "Video_id": "dQw4w9WgXcQ",
  "Video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "Published_date": "2024-01-15",
  "Duration": "1:02:03",
  "View_count": 1234567,
  "Like_count": 50000,
  "Comment_count": 1234,
  "Category": "Education",
  "Tags": ["python", "ai", "tutorial"],
  "Target_language": "en",
  "Active_language": "English (en)",
  "Is_auto_generated": false,
  "Available_for_next_run": "en (English), ru (Russian (auto-generated))",
  "Segment_count": 342,
  "fetch_log": {
    "success": true,
    "attempts": [
      { "attempt": "1/3", "proxy": "RESIDENTIAL", "exit_ip": "197.119.27.127", "status": "success" }
    ]
  }
}

Transcript Markdown (transcript.md)

Human-readable report with full metadata header. Duration appears at the end of each line for clean reading:

# How to Build an AI Agent
**Channel:** TechWithMark
**Published:** 2024-01-15
**Duration:** 1:02:03
**Views:** 1,234,567 | **Likes:** 50,000 | **Comments:** 1,234
**Category:** Education
**Tags:** python, ai, tutorial
## Transcript: dQw4w9WgXcQ
**Target Language:** en
**Active Language:** English (en)
**Available for the next run:** en (English), ru (Russian (auto-generated))
-----
**[00:00]** In this video, we are going to explore... *(4.1s)*
**[00:04]** ...the new features of the YouTube API. *(3.8s)*

This file is perfect for pasting directly into ChatGPT, Claude, Notion, or any LLM workflow.

Transcript Plain Text (transcript.txt)

Same metadata header as Markdown but without formatting symbols — optimised for LLMs, NLP pipelines, and translation tools:

# How to Build an AI Agent
Channel: TechWithMark
Published: 2024-01-15
Duration: 1:02:03
Views: 1,234,567 | Likes: 50,000 | Comments: 1,234
Category: Education
Tags: python, ai, tutorial
Transcript: dQw4w9WgXcQ
Target Language: en
Active Language: English (en)
Available for the next run: en (English), ru (Russian (auto-generated))
-----
[00:00] In this video, we are going to explore... (4.1s)
[00:04] ...the new features of the YouTube API. (3.8s)

Transcript JSON (transcript.json)

Structured output with a metadata object (all video fields) and a transcript array:

{
  "metadata": {
    "Title": "How to Build an AI Agent",
    "Channel": "TechWithMark",
    "Video_id": "dQw4w9WgXcQ",
    "Published_date": "2024-01-15",
    "Duration": "1:02:03",
    "View_count": 1234567,
    "Like_count": 50000,
    "Comment_count": 1234,
    "Category": "Education",
    "Tags": ["python", "ai", "tutorial"],
    "Target_language": "en",
    "Active_language": "English (en)",
    "Is_auto_generated": false,
    "Available_for_next_run": "en (English), ru (Russian (auto-generated))"
  },
  "transcript": [
    { "Timestamp": "00:00", "Subtitle": "In this video, we are going to explore...", "Duration (sec)": 4.1 },
    { "Timestamp": "00:04", "Subtitle": "...the new features of the YouTube API.", "Duration (sec)": 3.8 }
  ]
}
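The JSON segments carry everything needed to rebuild SRT cues. A minimal sketch of that conversion, assuming the field names shown in the sample above (`Timestamp`, `Subtitle`, `Duration (sec)`):

```python
def to_srt(segments: list[dict]) -> str:
    """Convert transcript.json segments into SRT-formatted text."""
    def fmt(seconds: float) -> str:
        # SRT timestamps are HH:MM:SS,mmm
        ms = round(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def start_seconds(ts: str) -> float:
        # Accept "MM:SS" or "H:MM:SS" timestamps
        secs = 0
        for part in ts.split(":"):
            secs = secs * 60 + int(part)
        return float(secs)

    cues = []
    for i, seg in enumerate(segments, start=1):
        start = start_seconds(seg["Timestamp"])
        end = start + seg["Duration (sec)"]
        cues.append(f"{i}\n{fmt(start)} --> {fmt(end)}\n{seg['Subtitle']}")
    return "\n\n".join(cues) + "\n"
```

Running this over the two sample segments reproduces the cue timings shown in the SRT example below it.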

Transcript SRT (transcript.srt)

Standard subtitle file format — no metadata. Use directly in video editors (Premiere Pro, DaVinci Resolve) or media players:

1
00:00:00,000 --> 00:00:04,100
In this video, we are going to explore...

2
00:00:04,000 --> 00:00:07,800
...the new features of the YouTube API.

Transcript CSV (transcript.csv)

No metadata, only transcript data. Open directly in Excel or Google Sheets (UTF-8 BOM encoded for correct display):

Timestamp,Subtitle,Duration (sec)
00:00,"In this video, we are going to explore...",4.1
00:04,"...the new features of the YouTube API.",3.8
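Because the CSV is written with a UTF-8 BOM, decode it with `utf-8-sig` before parsing; a small sketch using Python's standard `csv` module:

```python
import csv
import io

def read_transcript_csv(raw_bytes: bytes) -> list[dict]:
    """Parse transcript.csv bytes into a list of row dicts.
    "utf-8-sig" strips the BOM that Excel-friendly files carry."""
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))
```

Each row dict is keyed by the header names (`Timestamp`, `Subtitle`, `Duration (sec)`).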

Languages (languages.json)

Always saved. Lists all available subtitle languages for the video, including an is_generated flag:

[
  { "code": "en", "name": "English", "is_generated": false },
  { "code": "ru", "name": "Russian", "is_generated": true }
]

FAQ & Troubleshooting

This Actor only extracts publicly available subtitle data that YouTube itself exposes. It does not access private accounts, personal user data, or any information behind a login. For more context, see Apify's guide on the legality of web scraping.

What if a video has no subtitles?

If the video creator has disabled transcripts, or the video is too new for YouTube to have processed captions yet, the Actor returns a clear "Subtitles not found for this video." status message and writes an error record to the dataset.

Can I get auto-generated captions?

Yes. The Actor retrieves all available transcript types, both manual and auto-generated. You can tell which type a language is from the isAutoGenerated field in the dataset row or the Is_auto_generated field in transcript.json.

My preferred language wasn't found — what happens?

If the language code you specified is not available for that video, the Actor will automatically fall back to the first available language, log a warning, and continue. The transcript header will clearly state which language was actually used and list all available alternatives for a re-run.
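This fallback is easy to mirror client-side against the languages.json file. A sketch (entries shaped like the languages.json sample above):

```python
def pick_language(available: list[dict], preferred: str) -> dict:
    """Return the entry matching the preferred code if present,
    otherwise fall back to the first available language."""
    for lang in available:
        if lang["code"] == preferred:
            return lang
    return available[0]
```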

Can I run this for multiple videos at once?

The current version processes one video per run. For bulk processing, use Apify's scheduling or API to trigger multiple runs in parallel. The fast run time (8 seconds) makes parallel processing very cost-efficient.

Which proxy type should I use — Residential or Datacenter?

Residential proxies (default) use real home IP addresses, making requests look like they come from regular users. YouTube rarely blocks these. This is the recommended option for reliable results.

Datacenter proxies are significantly cheaper and work fine for many videos, but YouTube's bot detection can identify and block datacenter IP ranges — especially during high-volume runs or when scraping popular/restricted videos. If you get a CouldNotRetrieveTranscript error, switching to Residential proxies is usually the fix.

What happens when YouTube blocks a request?

The Actor automatically retries up to 3 times with a new proxy IP and fresh session on each attempt (exponential backoff: 3 s, 6 s, 12 s). Each retry uses a different exit IP and a different Chrome browser profile to minimise the chance of repeated blocks. Full details of every attempt (proxy type, exit IP, and result) are saved in metadata.json under the fetch_log key for later analysis.
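The retry policy described above can be sketched as a small wrapper (an illustration, not the Actor's internal code; the caller is assumed to rotate proxy and browser profile per attempt):

```python
import time

def with_retries(fetch, retries: int = 3, base: float = 3.0, sleep=time.sleep):
    """Run `fetch(attempt)` up to `retries` times, doubling the pause
    after each failure (3 s, 6 s, 12 s schedule: base * 2**attempt)."""
    last_err = None
    for attempt in range(retries):
        try:
            return fetch(attempt)  # caller rotates proxy/profile per attempt
        except Exception as err:
            last_err = err
            if attempt < retries - 1:
                sleep(base * (2 ** attempt))
    raise last_err
```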

Why are likeCount or commentCount sometimes missing?

YouTube channel owners can hide their like counts, and some videos have comments disabled. In those cases the YouTube Data API does not return those fields, and the Actor leaves them as null in the output.

How do I reduce run time?

Uncheck output formats you don't need in the Output Formats section of the Input tab. Each format adds time proportional to the transcript file size. Markdown is the fastest and most useful for most workflows; SRT and JSON are also lightweight. CSV with BOM encoding takes slightly longer for very long transcripts.

Support

If you encounter a bug, have a feature request, or need a custom solution based on this Actor, please open an issue in the Issues tab. Feedback is always welcome.

Resources