YouTube Transcript Scraper
Extract YouTube transcripts + rich metadata (views, likes, comments, duration, tags) in seconds. Choose from 5 formats: Markdown, TXT, JSON, SRT, CSV. Supports 45 languages, auto-generated captions, Shorts, and proxies. No API key needed.
Pricing: $10.00/month + usage
Developer: ML Data Solutions
YouTube Transcript Scraper — Fast, Stealth & Multi-Language
Extract YouTube transcripts in seconds. This Apify Actor retrieves subtitles and full video metadata from any YouTube video or Shorts link — with stealth browser impersonation to avoid detection, multi-language support, and five ready-to-use output formats you can select per run.
What does YouTube Transcript Scraper do?
YouTube Transcript Scraper extracts the full subtitles of any YouTube video and delivers them in your choice of formats: Markdown, plain text, JSON, SRT, and CSV — all accessible via clickable links directly in the Output tab. It always saves a metadata.json file with complete video metadata and a fetch log. It fetches rich video metadata (title, channel, views, duration, category, tags, and more) via the YouTube Data API v3, and handles both standard videos and YouTube Shorts.
Why use YouTube Transcript Scraper?
Whether you are building an AI application, training an LLM, conducting market research, or repurposing video content as text — you need a fast and reliable way to extract spoken content from videos. The YouTube Transcript Scraper provides:
- ⚡ Extreme speed — short run time thanks to optimised Python logic; one summary row per video means near-instant dataset writes
- 🕵️ Stealth technology — uses `curl_cffi` to impersonate real Chrome browsers (131–142), bypassing YouTube's bot detection
- 🌍 Multi-language support — detects all available subtitle languages, lets you pick your preferred one, and falls back gracefully if it's not available
- 📋 Rich metadata — title, channel, publish date, duration, views, likes, comments, category, and tags via YouTube Data API v3
- 📄 Selectable output formats — choose only the formats you need (Markdown, plain text, JSON, SRT, CSV) to save run time; all linked in the Output tab
- 🔁 Auto-generated caption support — retrieves both manual and auto-generated (AI) subtitles
- 🔒 Proxy ready — supports Apify Residential and Datacenter proxies with automatic rotation and up to 3 retry attempts with exponential backoff
What data can you extract from YouTube?
Summary dataset (one row per video)
| Field | Description | Example |
|---|---|---|
| videoId | YouTube video ID | dQw4w9WgXcQ |
| videoUrl | Full URL as provided | https://youtube.com/watch?v=... |
| videoTitle | Official video title | "How to Build an AI Agent" |
| channel | Channel / author name | "TechWithMark" |
| publishDate | Upload date (YYYY-MM-DD) | 2024-01-15 |
| videoDuration | Video length (H:MM:SS or MM:SS) | 1:02:03 |
| viewCount | Total view count | 1234567 |
| likeCount | Like count (if public) | 50000 |
| commentCount | Comment count (if enabled) | 1234 |
| category | YouTube category name | Education |
| tags | Comma-separated tags (up to 10) | "python, ai, tutorial" |
| targetLanguage | Language code you requested | en |
| activeLanguage | Language actually used | "English" |
| activeLanguageCode | Language code actually used | en |
| isAutoGenerated | Whether subtitles are auto-generated | false |
| availableLanguages | All available languages for re-runs | "en (English), ru (Russian)" |
| segmentCount | Number of subtitle segments | 342 |
| status | Run result: success or error | success |
| transcript_md | Link to transcript.md (if selected) | URL |
| transcript_txt | Link to transcript.txt (if selected) | URL |
| transcript_json | Link to transcript.json (if selected) | URL |
| transcript_srt | Link to transcript.srt (if selected) | URL |
| transcript_csv | Link to transcript.csv (if selected) | URL |
| metadata_json | Link to metadata.json (always present) | URL |
Key-Value Store files
Every run saves a metadata.json file and a languages.json file. Transcript files are saved only for the formats you selected in the Output Formats checkboxes:
| File | Format | Metadata | Duration | Always saved |
|---|---|---|---|---|
| metadata.json | JSON | Full metadata + fetch log | — | ✅ |
| languages.json | JSON | — | — | ✅ |
| transcript.md | Markdown | Full header | ✅ at end of each line | Optional |
| transcript.txt | Plain text | Full header | ✅ at end of each line | Optional |
| transcript.json | JSON | Separate metadata object | ✅ field | Optional |
| transcript.srt | SRT subtitles | None (standard format) | ✅ encoded as end time | Optional |
| transcript.csv | CSV | None | ✅ column | Optional |
How to scrape YouTube transcripts
- Go to the Input tab of this Actor.
- Paste the YouTube video link into the Video URL field. Standard, Shorts, embed, and youtu.be links are all supported.
- (Optional) Select a Language from the dropdown (default: English). If the selected language isn't available for the video, the Actor will automatically use the first available alternative and notify you.
- (Optional) Check the Output Formats you need. Markdown is selected by default. Uncheck formats you don't need to reduce run time. At least one format must remain selected.
- (Optional) Configure a Proxy. Residential proxies are pre-selected by default and give the best reliability. Datacenter proxies are cheaper but may be blocked by YouTube.
- Click Start and wait 7–10 seconds.
- Open the Output tab — clickable download links for all saved files are shown directly there. No need to navigate to Storage manually.
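The same run can be started programmatically via the Apify API instead of the UI; the request body mirrors the Input fields documented below. A hypothetical run input (all values are placeholders):

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "languageCode": "en",
  "includeMd": true,
  "includeSrt": false,
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```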
Pricing — how much does it cost to scrape YouTube transcripts?
YouTube Transcript Scraper is one of the most cost-efficient tools on the Apify Store. Because each run completes in under 10 seconds, it consumes minimal Compute Units (CU).
- Average run time: ~8 seconds
- Estimated cost: Hundreds of videos can be processed on the Apify Free plan
- Scalability: Ideal for bulk pipelines (1,000+ videos) — the fast architecture keeps costs very low at scale
Input
| Field | Type | Required | Description |
|---|---|---|---|
| videoUrl | String | ✅ Yes | Full YouTube URL. Validated before run. Supports youtube.com/watch, youtu.be, /shorts/, /embed/ |
| languageCode | String | No | Subtitle language, selected from a dropdown (45 languages). Defaults to English (en) |
| includeMd | Boolean | No | Save transcript.md. Default: true |
| includeTxt | Boolean | No | Save transcript.txt. Default: false |
| includeJson | Boolean | No | Save transcript.json. Default: false |
| includeSrt | Boolean | No | Save transcript.srt. Default: false |
| includeCsv | Boolean | No | Save transcript.csv. Default: false |
| proxyConfig | Object | No | Apify Proxy configuration. Residential proxy is pre-selected by default for best reliability |
Supported URL formats:
- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://www.youtube.com/shorts/VIDEO_ID`
- `https://www.youtube.com/embed/VIDEO_ID`
- `youtube.com/watch?v=VIDEO_ID` (without `https://` — also accepted)
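All supported URL shapes carry the same 11-character video ID, so a single pattern can extract it. An illustrative validator (not the Actor's actual one):

```python
import re
from typing import Optional

# One pattern covering watch, youtu.be, /shorts/ and /embed/ URLs.
_VIDEO_ID_RE = re.compile(
    r"(?:youtube\.com/(?:watch\?v=|shorts/|embed/)|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)

def extract_video_id(url: str) -> Optional[str]:
    """Pull the 11-character video ID out of any supported URL shape."""
    match = _VIDEO_ID_RE.search(url)
    return match.group(1) if match else None

print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```

Because `search` is used rather than `match`, the scheme-less `youtube.com/watch?v=...` form is accepted too.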
Output
Dataset — Summary (one row per video)
The dataset contains a single summary row per run with full metadata and clickable links for each saved file:
```json
{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "videoTitle": "How to Build an AI Agent",
  "channel": "TechWithMark",
  "publishDate": "2024-01-15",
  "videoDuration": "1:02:03",
  "viewCount": 1234567,
  "likeCount": 50000,
  "commentCount": 1234,
  "category": "Education",
  "tags": "python, ai, tutorial",
  "targetLanguage": "en",
  "activeLanguage": "English",
  "activeLanguageCode": "en",
  "isAutoGenerated": false,
  "availableLanguages": "en (English), ru (Russian (auto-generated))",
  "segmentCount": 342,
  "status": "success",
  "transcript_md": "https://api.apify.com/v2/key-value-stores/.../records/transcript.md",
  "metadata_json": "https://api.apify.com/v2/key-value-stores/.../records/metadata.json"
}
```
File link fields (`transcript_md`, `transcript_txt`, etc.) are only present in the row for formats that were actually saved.
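Because the link fields appear only when the corresponding format was saved, downstream code should probe for them rather than index them directly. A minimal sketch, using the field names from the summary table (the URL is a placeholder):

```python
# Link fields that may or may not be present in a dataset row,
# depending on the Output Formats selected for the run.
TRANSCRIPT_LINK_FIELDS = (
    "transcript_md", "transcript_txt", "transcript_json",
    "transcript_srt", "transcript_csv",
)

def saved_transcript_links(row: dict) -> dict:
    """Return only the transcript links actually present in a dataset row."""
    return {f: row[f] for f in TRANSCRIPT_LINK_FIELDS if row.get(f)}

row = {"videoId": "dQw4w9WgXcQ", "status": "success",
       "transcript_md": "https://api.apify.com/v2/key-value-stores/.../records/transcript.md"}
print(saved_transcript_links(row))
```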
Metadata (metadata.json)
Always saved on every run. Contains the full video metadata and a log of each fetch attempt (proxy type, exit IP, result):
```json
{
  "Title": "How to Build an AI Agent",
  "Channel": "TechWithMark",
  "Video_id": "dQw4w9WgXcQ",
  "Video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "Published_date": "2024-01-15",
  "Duration": "1:02:03",
  "View_count": 1234567,
  "Like_count": 50000,
  "Comment_count": 1234,
  "Category": "Education",
  "Tags": ["python", "ai", "tutorial"],
  "Target_language": "en",
  "Active_language": "English (en)",
  "Is_auto_generated": false,
  "Available_for_next_run": "en (English), ru (Russian (auto-generated))",
  "Segment_count": 342,
  "fetch_log": {
    "success": true,
    "attempts": [
      { "attempt": "1/3", "proxy": "RESIDENTIAL", "exit_ip": "197.119.27.127", "status": "success" }
    ]
  }
}
```
Transcript Markdown (transcript.md)
Human-readable report with full metadata header. Duration appears at the end of each line for clean reading:
```markdown
# How to Build an AI Agent

**Channel:** TechWithMark
**Published:** 2024-01-15
**Duration:** 1:02:03
**Views:** 1,234,567 | **Likes:** 50,000 | **Comments:** 1,234
**Category:** Education
**Tags:** python, ai, tutorial

## Transcript: dQw4w9WgXcQ

**Target Language:** en
**Active Language:** English (en)
**Available for the next run:** en (English), ru (Russian (auto-generated))

-----

**[00:00]** In this video, we are going to explore... *(4.1s)*
**[00:04]** ...the new features of the YouTube API. *(3.8s)*
```
This file is perfect for pasting directly into ChatGPT, Claude, Notion, or any LLM workflow.
Transcript Plain Text (transcript.txt)
Same metadata header as Markdown but without formatting symbols — optimised for LLMs, NLP pipelines, and translation tools:
```text
# How to Build an AI Agent
Channel: TechWithMark
Published: 2024-01-15
Duration: 1:02:03
Views: 1,234,567 | Likes: 50,000 | Comments: 1,234
Category: Education
Tags: python, ai, tutorial

Transcript: dQw4w9WgXcQ
Target Language: en
Active Language: English (en)
Available for the next run: en (English), ru (Russian (auto-generated))

-----

[00:00] In this video, we are going to explore... (4.1s)
[00:04] ...the new features of the YouTube API. (3.8s)
```
Transcript JSON (transcript.json)
Structured output with a metadata object (all video fields) and a transcript array:
```json
{
  "metadata": {
    "Title": "How to Build an AI Agent",
    "Channel": "TechWithMark",
    "Video_id": "dQw4w9WgXcQ",
    "Published_date": "2024-01-15",
    "Duration": "1:02:03",
    "View_count": 1234567,
    "Like_count": 50000,
    "Comment_count": 1234,
    "Category": "Education",
    "Tags": ["python", "ai", "tutorial"],
    "Target_language": "en",
    "Active_language": "English (en)",
    "Is_auto_generated": false,
    "Available_for_next_run": "en (English), ru (Russian (auto-generated))"
  },
  "transcript": [
    { "Timestamp": "00:00", "Subtitle": "In this video, we are going to explore...", "Duration (sec)": 4.1 },
    { "Timestamp": "00:04", "Subtitle": "...the new features of the YouTube API.", "Duration (sec)": 3.8 }
  ]
}
```
Transcript SRT (transcript.srt)
Standard subtitle file format — no metadata. Use directly in video editors (Premiere Pro, DaVinci Resolve) or media players:
```text
1
00:00:00,000 --> 00:00:04,100
In this video, we are going to explore...

2
00:00:04,000 --> 00:00:07,800
...the new features of the YouTube API.
```
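For pipelines that fetched only the JSON format but later need SRT, the cue timing can be reconstructed from each segment's start time and duration. An illustrative converter (not part of the Actor):

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segment_to_srt(index: int, start: float, duration: float, text: str) -> str:
    """Render one subtitle segment as an SRT cue block."""
    return f"{index}\n{to_srt_time(start)} --> {to_srt_time(start + duration)}\n{text}\n"

print(segment_to_srt(1, 0.0, 4.1, "In this video, we are going to explore..."))
```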
Transcript CSV (transcript.csv)
No metadata, only transcript data. Open directly in Excel or Google Sheets (UTF-8 BOM encoded for correct display):
```text
Timestamp,Subtitle,Duration (sec)
00:00,"In this video, we are going to explore...",4.1
00:04,"...the new features of the YouTube API.",3.8
```
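The BOM matters when reading the file programmatically: decoding with `utf-8-sig` strips it so the first header parses as `Timestamp` rather than `\ufeffTimestamp`. A sketch using simulated file bytes:

```python
import csv
import io

# Simulated transcript.csv bytes, including the UTF-8 BOM (\xef\xbb\xbf)
# the Actor writes for correct display in Excel.
raw = (
    b'\xef\xbb\xbfTimestamp,Subtitle,Duration (sec)\r\n'
    b'00:00,"In this video, we are going to explore...",4.1\r\n'
    b'00:04,"...the new features of the YouTube API.",3.8\r\n'
)

# "utf-8-sig" strips the BOM; plain "utf-8" would leave it glued
# to the first column name.
rows = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))
print(rows[0]["Timestamp"], rows[0]["Duration (sec)"])
```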
Languages (languages.json)
Always saved. Lists all available subtitle languages for the video, including an is_generated flag:
```json
[
  { "code": "en", "name": "English", "is_generated": false },
  { "code": "ru", "name": "Russian", "is_generated": true }
]
```
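A downstream script can mirror the Actor's fallback behaviour against this structure: prefer the requested code, otherwise take the first available language. An illustrative sketch:

```python
def pick_language(available: list, target: str) -> dict:
    """Prefer the requested language code; otherwise fall back to the
    first available entry, as the Actor does."""
    for lang in available:
        if lang["code"] == target:
            return lang
    return available[0]

available = [
    {"code": "en", "name": "English", "is_generated": False},
    {"code": "ru", "name": "Russian", "is_generated": True},
]
print(pick_language(available, "de"))  # "de" unavailable, falls back to English
```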
FAQ & Troubleshooting
Is it legal to scrape YouTube transcripts?
This Actor only extracts publicly available subtitle data that YouTube itself exposes. It does not access private accounts, personal user data, or any information behind a login. For more context, see Apify's guide on the legality of web scraping.
What if a video has no subtitles?
If the video creator has disabled transcripts, or the video is too new to have been processed by YouTube yet, the Actor returns a clear "Subtitles not found for this video." status message and writes an error record to the dataset.
Can I get auto-generated captions?
Yes. The Actor retrieves all available transcript types — both manual and auto-generated. You can identify which type each language is by checking the isAutoGenerated field in the dataset row or the Is_auto_generated field in transcript.json.
My preferred language wasn't found — what happens?
If the language code you specified is not available for that video, the Actor will automatically fall back to the first available language, log a warning, and continue. The transcript header will clearly state which language was actually used and list all available alternatives for a re-run.
Can I run this for multiple videos at once?
The current version processes one video per run. For bulk processing, use Apify's scheduling or API to trigger multiple runs in parallel. The fast run time (~8 seconds) makes parallel processing very cost-efficient.
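Since runs are independent, the fan-out can be sketched with a thread pool. In this illustration `run_for_video` is a placeholder for the real Apify API call (e.g. via the apify-client package), and the URLs are made up:

```python
from concurrent.futures import ThreadPoolExecutor

def run_for_video(video_url: str) -> str:
    # Placeholder: a real pipeline would start an Actor run here with
    # run_input={"videoUrl": video_url} and return the run's dataset ID.
    return f"started run for {video_url}"

urls = [
    "https://youtu.be/VIDEO_ID_1",  # hypothetical video links
    "https://youtu.be/VIDEO_ID_2",
]

# Each run is independent and finishes in roughly 8 seconds,
# so dispatching them concurrently scales almost linearly.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_for_video, urls))
print(results)
```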
Which proxy type should I use — Residential or Datacenter?
Residential proxies (default) use real home IP addresses, making requests look like they come from regular users. YouTube rarely blocks these. This is the recommended option for reliable results.
Datacenter proxies are significantly cheaper and work fine for many videos, but YouTube's bot detection can identify and block datacenter IP ranges — especially during high-volume runs or when scraping popular/restricted videos. If you get a CouldNotRetrieveTranscript error, switching to Residential proxies is usually the fix.
What happens when YouTube blocks a request?
The Actor automatically retries up to 3 times with a new proxy IP and fresh session on each attempt (exponential backoff: 3 s, 6 s, 12 s). Each retry uses a different exit IP and a different Chrome browser profile to minimise the chance of repeated blocks. Full details of every attempt (proxy type, exit IP, and result) are saved in metadata.json under the fetch_log key for later analysis.
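The delay schedule described above follows a simple doubling rule. An illustrative reconstruction (not the Actor's actual code):

```python
MAX_ATTEMPTS = 3
BASE_DELAY = 3  # seconds; doubles each retry

def backoff_delays(attempts: int = MAX_ATTEMPTS, base: int = BASE_DELAY) -> list:
    """Exponential backoff schedule: base * 2^i for each attempt."""
    return [base * (2 ** i) for i in range(attempts)]

print(backoff_delays())  # [3, 6, 12]
```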
Why are likeCount or commentCount sometimes missing?
YouTube channel owners can hide their like counts, and some videos have comments disabled. In those cases the YouTube Data API does not return those fields, and the Actor leaves them as null in the output.
How do I reduce run time?
Uncheck output formats you don't need in the Output Formats section of the Input tab. Each format adds time proportional to the transcript file size. Markdown is the fastest and most useful for most workflows; SRT and JSON are also lightweight. CSV with BOM encoding takes slightly longer for very long transcripts.
Support
If you encounter a bug, have a feature request, or need a custom solution based on this Actor, please open an issue in the Issues tab. Feedback is always welcome.