YouTube Transcript Extractor — AI-Ready Subtitles
Status: Under maintenance
Pricing: Pay per usage
Extracts subtitles/transcripts from YouTube videos. Input a video URL or ID and get clean text output with metadata. Ideal for AI training data collection, content analysis, and LLM training pipelines.
Developer: 陈俊杰 (maintained by Community)
Extract clean subtitle/transcript text from any YouTube video with subtitles. Designed for AI training data pipelines, content analysis, and LLM training.
Features
- 🎯 Input a YouTube URL or bare video ID
- 🌐 Supports manual and auto-generated captions
- 🌍 Multi-language: specify any ISO 639-1 language code (default: `en`)
- ⏱ Optional `[MM:SS]` timestamps in output
- 🧹 Clean, joined-transcript format
- 📊 Rich metadata: video_id, duration, word count, language
- 🛡️ Robust error handling with descriptive error messages
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `video_url` | string | ✅ | (none) | YouTube URL (any format) or bare video ID |
| `language` | string | ❌ | `en` | ISO 639-1 language code |
| `include_timestamps` | bool | ❌ | `false` | Add `[MM:SS]` before each subtitle line |
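To make the defaults in the table above concrete, here is a minimal Python sketch of how an input object could be normalized before a run. The function name `normalize_input` is hypothetical and is not taken from the actor's source; the sketch only assumes the three fields and defaults listed in the table, and it does not check that the language code is a valid ISO 639-1 value.

```python
def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and basic type checks to an input dict.

    Hypothetical helper: field names and defaults come from the input table,
    everything else is an illustrative assumption.
    """
    video_url = raw.get("video_url")
    if not isinstance(video_url, str) or not video_url.strip():
        raise ValueError("video_url is required and must be a non-empty string")

    # Fall back to the documented default when the field is missing or empty.
    language = raw.get("language") or "en"
    if not isinstance(language, str):
        raise ValueError("language must be a string")

    include_timestamps = raw.get("include_timestamps", False)
    if not isinstance(include_timestamps, bool):
        raise ValueError("include_timestamps must be a boolean")

    return {
        "video_url": video_url.strip(),
        "language": language.strip().lower(),
        "include_timestamps": include_timestamps,
    }
```

With this sketch, an input containing only `video_url` comes back with `language` set to `en` and `include_timestamps` set to `False`.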
Output (one item per run)
| Field | Type | Description |
|---|---|---|
| `video_id` | string | 11-char YouTube video ID |
| `title` | string | Video title (if retrievable) |
| `duration` | int | Approximate duration in seconds |
| `language` | string | Language code of the transcript |
| `transcript_type` | string | `"manual"` or `"auto-generated"` |
| `transcript` | string | Full clean text of the subtitles |
| `word_count` | int | Word count of the transcript |
| `url` | string | Full YouTube URL |
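The following Python sketch shows one way an output item with these fields could be assembled from raw caption segments. It is not the actor's actual implementation. It assumes each segment is a dict with `text`, `start`, and `duration` keys, that the approximate duration is the end time of the last caption, and that timestamps are excluded from the word count. The names `format_timestamp` and `build_item` are hypothetical.

```python
from typing import Optional


def format_timestamp(seconds: float) -> str:
    """Render a start time as [MM:SS]; minutes keep counting past 59."""
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"


def build_item(video_id: str, segments: list, language: str = "en",
               transcript_type: str = "manual",
               include_timestamps: bool = False,
               title: Optional[str] = None) -> dict:
    """Assemble an output item from caption segments (assumed shape, see above)."""
    texts, lines, end = [], [], 0.0
    for seg in segments:
        # Collapse newlines and repeated spaces inside a caption.
        text = " ".join(seg["text"].split())
        # Assumption: duration is approximated by the latest caption end time.
        end = max(end, seg["start"] + seg["duration"])
        if not text:
            continue
        texts.append(text)
        lines.append(f"{format_timestamp(seg['start'])} {text}")

    # One line per caption with timestamps, otherwise a single joined string.
    transcript = "\n".join(lines) if include_timestamps else " ".join(texts)
    return {
        "video_id": video_id,
        "title": title,
        "duration": int(end),
        "language": language,
        "transcript_type": transcript_type,
        "transcript": transcript,
        # Assumption: timestamps do not count as words.
        "word_count": len(" ".join(texts).split()),
        "url": f"https://www.youtube.com/watch?v={video_id}",
    }
```

Counting words on the caption text alone keeps `word_count` identical whether or not `include_timestamps` is set, which seems the less surprising choice for downstream dataset statistics.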
Supported URL formats
- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://www.youtube.com/embed/VIDEO_ID`
- `https://www.youtube.com/shorts/VIDEO_ID`
- Bare `VIDEO_ID` (11 characters)
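As an illustration of how these formats can be reduced to a single video ID, here is a minimal Python sketch using only the standard library. The function name `extract_video_id` is hypothetical, and the assumption that an ID consists of 11 letters, digits, hyphens, or underscores is mine; the actor's own parsing may differ.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed ID alphabet: letters, digits, "-" and "_", exactly 11 characters.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a supported URL or a bare ID, else raise ValueError."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]

    candidate = None
    if host == "youtu.be" and segments:
        candidate = segments[0]                       # youtu.be/VIDEO_ID
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in ("embed", "shorts"):
            candidate = segments[1]                   # /embed/ID or /shorts/ID

    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"Unrecognised YouTube URL or video ID: {value!r}")
```

Extra query parameters such as `?t=10` are ignored because only the path segment or the `v` parameter is inspected.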
Use Cases
- AI/LLM Training Data — collect natural language text from millions of YouTube videos
- Content Analysis — analyze video content at scale for SEO, research, or moderation
- Accessibility — extract captions for further processing or translation
- Dataset Building — build large text corpora from video subtitles
Built with youtube_transcript_api ❤️