Lightning YouTube Scraper (Transcript & Metadata)

Pricing

from $1.50 / 1,000 results

Extract full transcripts (subtitles) and metadata from YouTube videos instantly without opening a browser. Perfect for AI, LLMs, and content summarization.

Rating

0.0

(0)

Developer

Tan Yegen

Maintained by Community

Actor stats

0 bookmarked, 2 total users, 1 monthly active user, last modified 5 days ago

⚡ Lightning YouTube Scraper (Transcript & Metadata)

🤖 Copy to your AI assistant

Copy this block into ChatGPT, Claude, Cursor, or any LLM to start using this actor.

tyegen/universal-youtube-transcript-extractor on Apify. Call: ApifyClient("TOKEN").actor("tyegen/universal-youtube-transcript-extractor").call(run_input={"startUrls": ["URL_HERE"]}), then client.dataset(run["defaultDatasetId"]).list_items().items for results.
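
The call described above can be wrapped in a small helper. This is a minimal sketch, assuming the `apify-client` Python package is installed and that an Apify token is stored in an `APIFY_TOKEN` environment variable (that variable name, and the helper names `fetch_transcripts` and `main`, are illustrative and not part of the actor).

```python
from typing import Any, Iterable

ACTOR_ID = "tyegen/universal-youtube-transcript-extractor"


def fetch_transcripts(client: Any, urls: Iterable[str]) -> list:
    """Run the actor once for the given video URLs and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run_input = {"startUrls": list(urls)}
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        # Some client versions return None when the run could not be awaited.
        raise RuntimeError("Actor run did not return run details")
    return client.dataset(run["defaultDatasetId"]).list_items().items


def main() -> None:
    """Example entry point; requires network access and a valid token."""
    import os

    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed variable name
    urls = ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]
    for item in fetch_transcripts(client, urls):
        transcript = item.get("transcript") or ""
        print(item.get("title"), "-", len(transcript), "characters")
```

Calling `main()` performs a real run and is billed per result, so it is left uncalled here.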

Unlock the textual data of YouTube at high speed. Extract full transcripts (subtitles) and rich metadata from public, captioned YouTube videos without the overhead of browser automation, without battling official API quota limits, and at near-zero Compute Unit cost.

🚀 The Game-Changing Technology (How it Works)

Most YouTube scrapers on the market rely on heavy, resource-intensive browser automation tools like Playwright or Puppeteer. They physically open a browser, load the heavy YouTube interface, scroll down to force elements to render, and scrape the DOM. This is slow, prone to breaking, and expensive. Alternatively, they use the official YouTube Data API, which requires API keys and imposes strict, costly quota limits.

This actor takes a lighter route: it targets the internal ytInitialPlayerResponse JSON embedded directly within the raw HTML payload of a YouTube video page. It then requests the subtitle XML files directly from Google's caption servers.
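
To make the idea concrete, here is a rough sketch of pulling that embedded JSON out of a page's HTML. It is not the actor's actual code. The helper names are hypothetical, and the `captions` → `playerCaptionsTracklistRenderer` → `captionTracks` path is an assumption based on how the page has commonly been structured, which YouTube can change at any time.

```python
import json
from typing import Optional

MARKER = "ytInitialPlayerResponse"


def extract_player_response(page_html: str) -> Optional[dict]:
    """Return the JSON object assigned to ytInitialPlayerResponse, or None."""
    pos = page_html.find(MARKER)
    while pos != -1:
        brace = page_html.find("{", pos)
        if brace == -1:
            return None
        # Accept only an assignment of the form "ytInitialPlayerResponse = {".
        if page_html[pos + len(MARKER):brace].strip() == "=":
            try:
                # raw_decode parses one JSON value and ignores trailing script text.
                obj, _end = json.JSONDecoder().raw_decode(page_html, brace)
                return obj
            except json.JSONDecodeError:
                pass
        pos = page_html.find(MARKER, pos + len(MARKER))
    return None


def caption_tracks(player_response: dict) -> list:
    """List caption track entries (assumed JSON path, see the note above)."""
    captions = player_response.get("captions") or {}
    renderer = captions.get("playerCaptionsTracklistRenderer") or {}
    return renderer.get("captionTracks") or []
```

Using a JSON decoder rather than a regular expression avoids cutting the object short when a title or description contains braces or semicolons.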

✨ Unbeatable Features

  • Ultra-Lightning Speed: Extracts a 2-hour podcast transcript in about 1 second.
  • No API Keys Needed: Completely bypasses the YouTube Data API limitations. Zero setup required.
  • Incredibly Cost-Effective: Uses pure, lightweight HTTP requests. It costs a fraction of a cent per video in Apify Compute Units.
  • Clean, Formatted Output: Automatically decodes messy XML entities (like &amp; or &#39;) and merges timestamped captions into a beautiful, readable, and continuous text block.
  • Rich Metadata Included: Alongside the transcript, it fetches the video title, author, view count, length in seconds, and high-res thumbnail URL.
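
The entity decoding and caption merging described in the list above might look roughly like this. It is a sketch only: it assumes the caption file is a `<transcript>` element containing `<text start="..." dur="...">` children, which is a commonly seen layout but is not confirmed by this page, and `merge_captions` is a hypothetical name.

```python
import html
import re
import xml.etree.ElementTree as ET


def merge_captions(xml_text: str) -> str:
    """Join timestamped caption lines into one continuous, decoded text block."""
    root = ET.fromstring(xml_text)
    parts = []
    for node in root.iter("text"):
        raw = "".join(node.itertext())
        # The XML parser decodes one level of entities; html.unescape handles
        # text that was escaped twice (for example &amp;#39; for an apostrophe).
        cleaned = re.sub(r"\s+", " ", html.unescape(raw)).strip()
        if cleaned:
            parts.append(cleaned)
    return " ".join(parts)
```

The timestamps in `start` and `dur` are ignored here because the goal is a single readable block; keep them if you need to link text back to positions in the video.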

🎯 Ideal Use Cases & Target Audience

  • AI & LLM Training (RAG Pipelines): Feed thousands of hours of rich, conversational podcast transcripts into your Retrieval-Augmented Generation pipelines or fine-tuning datasets.
  • Content Summarization Agents: Build automated workflows that grab video texts instantly and pass them to ChatGPT, Claude, or Gemini for rapid summarization and key-takeaway extraction.
  • Competitor & SEO Analysis: Extract text from competitor videos to analyze their spoken keywords, hooks, content structure, and pacing.
  • Content Repurposing: Instantly convert your own YouTube videos into blog posts, newsletters, or Twitter threads.
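
For the RAG and summarization use cases above, a long transcript usually has to be split before it is embedded or sent to a model. The following is one simple way to do that, using overlapping word windows. The function name and the default sizes are illustrative choices, not something the actor provides.

```python
def chunk_transcript(text: str, max_words: int = 200, overlap: int = 20) -> list:
    """Split a transcript into word windows that overlap by `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated tokens.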

💰 Pricing & ROI

Pay-Per-Result: Only $1.50 per 1,000 videos. You get the full metadata PLUS the entire video transcript for a price no competitor can match. Your compute costs will remain near zero.
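
As a quick budgeting aid, the per-result price can be turned into an estimate. This sketch assumes the $1.50 per 1,000 results rate quoted above applies to your plan (the listing says "from $1.50", so the actual rate may differ) and it ignores any platform usage charges.

```python
from decimal import Decimal


def estimate_cost(num_videos: int, price_per_1000: Decimal = Decimal("1.50")) -> Decimal:
    """Estimated pay-per-result cost in USD for `num_videos` results."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    cost = Decimal(num_videos) * price_per_1000 / Decimal(1000)
    return cost.quantize(Decimal("0.0001"))
```

For example, `estimate_cost(25000)` evaluates to 37.5000, i.e. $37.50 for 25,000 videos at the quoted rate.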


📥 Input Configuration

| Field | Type | Description |
| --- | --- | --- |
| startUrls | Array | A list of YouTube video URLs (e.g., https://www.youtube.com/watch?v=dQw4w9WgXcQ). |
| proxyConfiguration | Object | Standard Apify Datacenter proxies work flawlessly for this hidden API approach. |
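
An input object for these two fields could be assembled as sketched below. The helper names are hypothetical. The `{"useApifyProxy": True}` shape is the conventional Apify proxy input and is assumed, not confirmed, to be what this actor expects. The sketch only accepts the `watch?v=` URL form shown in the table, since the page does not say whether other URL forms are supported.

```python
import re
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def video_id_from_url(url: str) -> Optional[str]:
    """Return the 11-character video ID of a youtube.com/watch URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        return None
    if parsed.path != "/watch":
        return None
    candidate = parse_qs(parsed.query).get("v", [""])[0]
    return candidate if VIDEO_ID_RE.match(candidate) else None


def build_run_input(urls: Iterable[str], use_apify_proxy: bool = True) -> dict:
    """Keep only recognizable video URLs and attach the proxy settings."""
    run_input = {"startUrls": [u for u in urls if video_id_from_url(u)]}
    if use_apify_proxy:
        # Assumed shape: the standard Apify proxy configuration object.
        run_input["proxyConfiguration"] = {"useApifyProxy": True}
    return run_input
```

Filtering before the run avoids spending a request on strings that are clearly not video URLs.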

📤 Output Schema

For each video URL, the actor will produce a clean JSON object containing the metadata and transcript.

| Field | Type | Description |
| --- | --- | --- |
| url | String | The original YouTube video URL. |
| videoId | String | The unique 11-character YouTube video ID. |
| title | String | The title of the video. |
| author | String | The name of the channel/creator. |
| views | Number | Total view count. |
| lengthSeconds | Number | Duration of the video in seconds. |
| thumbnail | String | URL to the highest resolution thumbnail available. |
| transcriptLanguage | String | The detected language of the transcript (e.g., "English"). |
| transcript | String | The full, cleaned text of the video's subtitles. |
| scrapedAt | String | ISO timestamp of when the extraction occurred. |

💡 Output Example

{
  "url": "https://www.youtube.com/watch?v=M98G...",
  "videoId": "M98G...",
  "title": "Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI",
  "author": "Lex Fridman",
  "views": 5400231,
  "lengthSeconds": 8700,
  "thumbnail": "https://i.ytimg.com/vi/M98G.../maxresdefault.jpg",
  "transcriptLanguage": "English",
  "transcript": "Hello and welcome to the Lex Fridman podcast. Today my guest is Sam Altman. We discuss the future of artificial intelligence... (thousands of words of clean text)",
  "scrapedAt": "2026-04-30T18:00:00.000Z"
}
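
If you prefer typed objects over raw dictionaries, an item like the one above can be loaded into a small record class. This is a sketch built only from the fields listed in the output schema. The class and its `duration_hms` helper are illustrative, and it assumes `transcript` and `transcriptLanguage` may be missing or null.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VideoRecord:
    url: str
    video_id: str
    title: str
    author: str
    views: int
    length_seconds: int
    thumbnail: str
    transcript_language: Optional[str]
    transcript: Optional[str]
    scraped_at: datetime

    @classmethod
    def from_item(cls, item: dict) -> "VideoRecord":
        """Build a record from one dataset item (camelCase keys)."""
        return cls(
            url=item["url"],
            video_id=item["videoId"],
            title=item["title"],
            author=item["author"],
            views=int(item["views"]),
            length_seconds=int(item["lengthSeconds"]),
            thumbnail=item["thumbnail"],
            transcript_language=item.get("transcriptLanguage"),
            transcript=item.get("transcript"),
            # Older Python versions do not accept a trailing "Z" in fromisoformat.
            scraped_at=datetime.fromisoformat(item["scrapedAt"].replace("Z", "+00:00")),
        )

    @property
    def duration_hms(self) -> str:
        """Video length formatted as H:MM:SS."""
        hours, rest = divmod(self.length_seconds, 3600)
        minutes, seconds = divmod(rest, 60)
        return f"{hours}:{minutes:02d}:{seconds:02d}"
```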

⚠️ Limitations & Good to Know

  • No Captions Available: If a video has neither auto-generated nor manual captions, the transcript field will be null.
  • Private/Age-Restricted Videos: Videos that require user login or age verification cannot be scraped by this actor.
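
Because of these limitations, it helps to sort results before passing them downstream. The sketch below separates usable items, items with a null transcript, and requested URLs that produced no item at all. It assumes that videos which cannot be scraped simply yield no dataset item and that the `url` field echoes the input URL unchanged. Neither behavior is stated on this page, and `summarize_run` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def summarize_run(requested_urls: Iterable[str], items: Iterable[dict]) -> Dict[str, List]:
    """Group results into usable items, caption-less items, and missing URLs."""
    ok, no_captions, seen = [], [], set()
    for item in items:
        seen.add(item.get("url"))
        if item.get("transcript"):
            ok.append(item)
        else:
            no_captions.append(item)  # transcript is null or empty
    # Assumption: a URL with no matching item was not scraped at all.
    missing = [u for u in requested_urls if u not in seen]
    return {"ok": ok, "no_captions": no_captions, "missing": missing}
```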