YouTube Transcript & Metadata Extractor (LLM-ready)

Pricing: $2.00/month + usage

Extract full YouTube transcripts and video metadata in one run. Each result includes the complete transcript as LLM-ready full text, timestamped segments, and engagement stats (views, likes), ready for AI pipelines, automation, and content analysis. Fast, clean, and production-ready.

Rating: 5.0 (1 review)

Developer: Joca

Maintained by Community

Actor stats: bookmarked 0 times, 3 total users, 1 monthly active user, last modified 5 days ago

YouTube Transcript & Metadata Extractor (LLM-ready) 🔥🎬

Extract video transcripts (subtitles/captions) and detailed metadata from YouTube URLs.

Reads a list of YouTube video URLs or IDs from the input 📥 and writes one structured record per video to the dataset 📦.

Common Use Cases 💡

  • Feed full transcripts into LLMs (summaries, embeddings, RAG)
  • Analyze YouTube content at scale
  • Build SEO or content research pipelines
  • Power n8n / Make / Zapier workflows

Features 🚀✨

  • 🎞️ Accepts full YouTube video URLs or bare video IDs
  • 🎯 Processes a single video or a batch of videos in one run
  • 📝 Fetches transcripts/captions when available
  • 📊 Outputs full text, metadata (views, likes, author), and raw transcript segments

Input Configuration 🛠️

Provide input as a JSON object:

  • startUrls (array): List of YouTube video URLs to process, each given as an object with a url field.
  • videoIds (array): List of YouTube video IDs to process (for example, "BthfXVCRWEQ").
  • lang (string, optional): Preferred language for transcripts (e.g., "en", "es").

Example Input

{
  "startUrls": [
    { "url": "https://www.youtube.com/watch?v=BthfXVCRWEQ" }
  ],
  "lang": "en"
}
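You may collect videos as a mix of full URLs, short youtu.be links and bare IDs. In that case it can help to normalize everything to IDs before starting a run. The Python sketch below relies only on the documented videoIds and lang fields. extract_video_id and build_run_input are hypothetical helper names, not part of the Actor. The sketch recognizes only youtube.com/watch and youtu.be URL shapes, which may be narrower than what the Actor itself accepts.

```python
import json
from typing import Dict, List, Optional, Union
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the ID from a youtube.com/watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if (host == "youtube.com" or host.endswith(".youtube.com")) and parsed.path == "/watch":
        # Standard watch links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def build_run_input(items: List[str], lang: str = "en") -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: turn mixed URLs/IDs into a deduplicated videoIds input."""
    video_ids: List[str] = []
    for item in items:
        # Anything with a scheme is treated as a URL; everything else as a bare ID.
        video_id = extract_video_id(item) if "://" in item else item.strip()
        if not video_id:
            raise ValueError(f"Unrecognized YouTube URL or ID: {item!r}")
        if video_id not in video_ids:
            video_ids.append(video_id)
    return {"videoIds": video_ids, "lang": lang}


if __name__ == "__main__":
    items = [
        "https://www.youtube.com/watch?v=BthfXVCRWEQ",
        "https://youtu.be/BthfXVCRWEQ",  # same video as above, so it is dropped
    ]
    print(json.dumps(build_run_input(items)))
    # {"videoIds": ["BthfXVCRWEQ"], "lang": "en"}
```

Deduplicating on the ID means a video given as two different links is processed only once.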

Output 📦

For each video, the Actor pushes a result to the dataset containing:

  • video_id: The ID of the video
  • url: The input URL
  • title: Video title
  • description: Video description
  • views: View count
  • likes: Like count
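  • author: Channel information (name, channelId, url)
  • duration: Video length
  • is_family_safe: Whether the video is marked as family-safe
  • full_text: The complete transcript joined into a single string, ready to use as LLM context
  • segments: Array of transcript segments, each with start and end times and its text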
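To use these fields from your own code, you can start the Actor with Apify's official Python client (apify-client) and read the run's default dataset. The sketch below is illustrative, not the author's integration. ACTOR_ID is a placeholder for this Actor's ID as shown in Apify Console, and fetch_video_results and summarize are hypothetical helpers. The client is passed in as a parameter so the helpers can be tried without network access.

```python
from typing import Any, Dict, List

# Placeholder, not a real ID: copy this Actor's ID ("username/actor-name") from Apify Console.
ACTOR_ID = "<ACTOR_ID>"


def make_client(token: str) -> Any:
    """Create an Apify API client (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # lazy import: the helpers below don't need it
    return ApifyClient(token)


def fetch_video_results(
    client: Any, run_input: Dict[str, Any], actor_id: str = ACTOR_ID
) -> List[Dict[str, Any]]:
    """Hypothetical helper: run the Actor, wait for it, return one dataset item per video."""
    run = client.actor(actor_id).call(run_input=run_input)  # blocks until the run finishes
    if run is None or run.get("status") != "SUCCEEDED":
        status = run.get("status") if run else None
        raise RuntimeError(f"Actor run did not succeed (status: {status})")
    # Per the Output section, results are pushed to the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def summarize(item: Dict[str, Any]) -> str:
    """Hypothetical helper: one-line summary from the documented output fields."""
    author = (item.get("author") or {}).get("name", "unknown channel")
    word_count = len((item.get("full_text") or "").split())
    return (
        f"{item['video_id']}: {item.get('title', '')} by {author} "
        f"({item.get('views', 0):,} views, {word_count} transcript words)"
    )
```

For a real run, call fetch_video_results(make_client("<YOUR_APIFY_TOKEN>"), {"videoIds": ["BthfXVCRWEQ"], "lang": "en"}) and pass each returned item to summarize or to your own pipeline.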

Example Output

{
  "video_id": "BthfXVCRWEQ",
  "url": "https://www.youtube.com/watch?v=BthfXVCRWEQ",
  "title": "Hatching the World's Biggest Egg!",
  "description": "I'm hatching 3 eggs to see which one makes the best pet.",
  "views": 15000000,
  "likes": 500000,
  "author": {
    "name": "Mark Rober",
    "channelId": "UC...",
    "url": "..."
  },
  "duration": 600,
  "is_family_safe": true,
  "full_text": "Here is the world's biggest egg... and here is one of the smallest. I'm hatching 3 to see which one makes the best pet. ...",
  "segments": [
    {
      "start": 0.12,
      "end": 5.2,
      "text": "Here is the world's biggest egg...\nand here is one of the smallest."
    },
    {
      "start": 5.72,
      "end": 9.6,
      "text": "I'm hatching 3 to see which one makes the\nbest pet."
    },
    ...
  ]
}
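For RAG pipelines it is often more useful to embed short, time-anchored chunks than one large full_text string. The sketch below shows one way to do that, assuming a result shaped like the example output. chunk_segments and max_chars are illustrative names, not Actor features. The sketch merges consecutive segments until a character budget is reached and flattens line breaks inside segment text. Joined this way, the example's two segments reproduce the beginning of its full_text. Each chunk also gets a link that starts playback at that point, using YouTube's standard t= URL parameter.

```python
from typing import Any, Dict, List


def _clean(text: str) -> str:
    # Segment text can contain line breaks (see the example); collapse all whitespace runs.
    return " ".join(text.split())


def _make_chunk(video_id: str, segs: List[Dict[str, Any]]) -> Dict[str, Any]:
    start = segs[0]["start"]
    return {
        "text": " ".join(_clean(s["text"]) for s in segs),
        "start": start,
        "end": segs[-1]["end"],
        # YouTube's t=<seconds>s parameter starts playback near the chunk's first segment.
        "url": f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s",
    }


def chunk_segments(result: Dict[str, Any], max_chars: int = 1000) -> List[Dict[str, Any]]:
    """Hypothetical helper: merge consecutive segments into chunks of at most max_chars
    characters (a single segment longer than that becomes a chunk of its own)."""
    chunks: List[Dict[str, Any]] = []
    current: List[Dict[str, Any]] = []
    length = 0  # length of the text the current chunk would have if emitted now
    for seg in result.get("segments", []):
        seg_len = len(_clean(seg["text"]))
        added = seg_len + 1 if current else seg_len  # +1 for the joining space
        if current and length + added > max_chars:
            # Adding this segment would overflow: emit the current chunk and start fresh.
            chunks.append(_make_chunk(result["video_id"], current))
            current, length, added = [], 0, seg_len
        current.append(seg)
        length += added
    if current:
        chunks.append(_make_chunk(result["video_id"], current))
    return chunks
```

Each chunk's text can be embedded, with its url stored as the source link, so an answer can point back to the exact moment in the video.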