YouTube Transcript Extractor — AI-Ready Subtitles

Under maintenance

Pricing: Pay per usage

Extracts subtitles (transcripts) from YouTube videos. Give it a video URL or ID and it returns clean text plus metadata. Intended for AI and LLM training-data pipelines and for content analysis.

Rating: 0.0 (0 ratings)

Developer: 陈俊杰
Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 9 days ago

Extract clean transcript text from any YouTube video that has subtitles, whether uploaded by the creator or auto-generated. Designed for AI and LLM training-data pipelines and for content analysis.

Features

  • 🎯 Input a YouTube URL or bare video ID
  • 🌐 Supports manual and auto-generated captions
  • 🌍 Multi-language — specify any ISO 639-1 language code (default: en)
  • ⏱ Optional [MM:SS] timestamps in output
  • 🧹 Clean output: subtitle lines joined into a single transcript
  • 📊 Metadata: video ID, title, approximate duration, word count, language, and transcript type
  • 🛡️ Robust error handling with descriptive error messages
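A minimal sketch of how the "clean, joined" text and the optional [MM:SS] prefixes might be produced. The actor's own code is not shown on this page, so the Snippet type and both function names are hypothetical; the sketch assumes each subtitle line carries its start offset in seconds. Minutes are deliberately not rolled over into hours, so a cue at one hour renders as [60:00].

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """One subtitle cue: its text and start offset in seconds (hypothetical shape)."""
    text: str
    start: float


def format_timestamp(seconds: float) -> str:
    """Render an offset as [MM:SS]; minutes are not wrapped into hours."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def join_transcript(snippets: list[Snippet], include_timestamps: bool = False) -> str:
    """Join cues into clean text, optionally prefixing each cue with [MM:SS]."""
    lines = []
    for snip in snippets:
        # Collapse newlines and runs of whitespace inside a single cue.
        text = " ".join(snip.text.split())
        if not text:
            continue  # drop cues that are empty after cleaning
        if include_timestamps:
            lines.append(f"{format_timestamp(snip.start)} {text}")
        else:
            lines.append(text)
    # Timestamped output keeps one cue per line; plain output is one paragraph.
    return "\n".join(lines) if include_timestamps else " ".join(lines)
```

For example, two cues starting at 0.0 s and 75.4 s come out as "[00:00] ..." and "[01:15] ..." when timestamps are on.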

Input

  • video_url (string, required): YouTube URL in any supported format, or a bare video ID
  • language (string, optional, default: en): ISO 639-1 language code of the transcript to fetch
  • include_timestamps (bool, optional, default: false): add [MM:SS] before each subtitle line
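The three input fields can be checked before any request is made. The sketch below is a hypothetical validator, not the actor's actual input handling: ActorInput and from_dict are illustrative names, and it accepts only bare two-letter lowercase codes for language, which is a stricter reading of "ISO 639-1" than the actor may apply.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: a bare two-letter code such as "en" or "de".
_ISO_639_1 = re.compile(r"[a-z]{2}")


@dataclass
class ActorInput:
    """Mirrors the documented input fields; defaults match the list above."""

    video_url: str
    language: str = "en"
    include_timestamps: bool = False

    @classmethod
    def from_dict(cls, raw: dict) -> "ActorInput":
        """Validate a raw input object and fill in the documented defaults."""
        url = raw.get("video_url")
        if not isinstance(url, str) or not url.strip():
            raise ValueError("video_url is required: pass a YouTube URL or an 11-character video ID")

        language = raw.get("language") or "en"  # missing, None, or "" falls back to "en"
        if not isinstance(language, str):
            raise ValueError("language must be a string")
        language = language.strip().lower()
        if not _ISO_639_1.fullmatch(language):
            raise ValueError(f"language must be an ISO 639-1 code such as 'en', got {language!r}")

        include_timestamps = raw.get("include_timestamps", False)
        if not isinstance(include_timestamps, bool):
            raise ValueError("include_timestamps must be true or false")

        return cls(url.strip(), language, include_timestamps)
```

Raising ValueError with a field-specific message is one way to get the "descriptive error messages" the features list mentions.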

Output (one item per run)

  • video_id (string): 11-character YouTube video ID
  • title (string): video title, if retrievable
  • duration (int): approximate duration in seconds
  • language (string): language code of the transcript
  • transcript_type (string): "manual" or "auto-generated"
  • transcript (string): full clean text of the subtitles
  • word_count (int): number of words in the transcript
  • url (string): full YouTube URL
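A sketch of how one output item with these fields could be assembled. It assumes each cue is a dict with text, start and duration keys (in seconds), which is the shape of youtube_transcript_api's raw transcript data; build_output_item is a hypothetical helper. Two further assumptions are labeled in the comments: the "approximate duration" is taken as the end of the last cue, and the transcript is built without timestamps.

```python
from typing import Optional


def build_output_item(
    video_id: str,
    snippets: list[dict],
    language: str,
    is_generated: bool,
    title: Optional[str] = None,
) -> dict:
    """Assemble one dataset item with the output fields listed above."""
    # Clean each cue (collapse whitespace and newlines), then join the non-empty ones.
    texts = [" ".join(s["text"].split()) for s in snippets]
    transcript = " ".join(t for t in texts if t)
    # Assumption: the end of the last cue stands in for the video length.
    duration = int(max((s["start"] + s["duration"] for s in snippets), default=0))
    return {
        "video_id": video_id,
        "title": title,  # None when the title could not be retrieved
        "duration": duration,
        "language": language,
        "transcript_type": "auto-generated" if is_generated else "manual",
        "transcript": transcript,
        "word_count": len(transcript.split()),
        "url": f"https://www.youtube.com/watch?v={video_id}",
    }
```

Because the duration comes from the subtitles rather than the video itself, it undercounts when the last caption ends before the video does, which is consistent with the table calling it approximate.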

Supported URL formats

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/embed/VIDEO_ID
  • https://www.youtube.com/shorts/VIDEO_ID
  • Bare VIDEO_ID (11 characters)
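A minimal sketch of turning any of the formats above into the 11-character ID, using only the standard library. extract_video_id is a hypothetical name, and the character set [A-Za-z0-9_-] reflects the IDs YouTube uses in practice rather than a published specification. It also tolerates a missing scheme, extra query parameters such as &t=42s, and the m.youtube.com host.

```python
import re
from urllib.parse import parse_qs, urlparse

# In practice, YouTube video IDs are 11 characters from letters, digits, "-" and "_".
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the 11-character video ID from a supported URL or a bare ID."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value  # already a bare ID

    # urlparse only fills in the hostname when a scheme is present.
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").removeprefix("www.").removeprefix("m.")
    parts = [p for p in parsed.path.split("/") if p]

    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]  # youtu.be/VIDEO_ID
    elif host == "youtube.com":
        if parts == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]  # /watch?v=VIDEO_ID
        elif len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            candidate = parts[1]  # /embed/VIDEO_ID or /shorts/VIDEO_ID

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    raise ValueError(f"No 11-character YouTube video ID found in {value!r}")
```

str.removeprefix requires Python 3.9 or later.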

Use Cases

  • AI/LLM Training Data: collect natural-language text from many YouTube videos, one video per run
  • Content Analysis: analyze video content at scale for SEO, research, or moderation
  • Accessibility: extract captions for further processing or translation
  • Dataset Building: build large text corpora from video subtitles
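Since each run produces a single item, a corpus is built by collecting items from many runs. A minimal sketch, assuming the items are already available as Python dicts (for example, exported from each run's dataset); write_jsonl is a hypothetical helper, and JSON Lines is just one common corpus format. It skips repeated video IDs and empty transcripts.

```python
import json
from typing import Iterable


def write_jsonl(items: Iterable[dict], path: str) -> int:
    """Write one item per line; skip duplicate video_ids and empty transcripts."""
    seen: set = set()
    written = 0
    with open(path, "w", encoding="utf-8") as fh:
        for item in items:
            vid = item.get("video_id")
            if not item.get("transcript") or vid in seen:
                continue
            seen.add(vid)
            # ensure_ascii=False keeps non-English transcripts readable in the file.
            fh.write(json.dumps(item, ensure_ascii=False) + "\n")
            written += 1
    return written
```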

Built with youtube_transcript_api ❤️