# Lightning YouTube Scraper (Transcript & Metadata) (`tyegen/universal-youtube-transcript-extractor`) Actor

Extract full transcripts (subtitles) and metadata from YouTube videos instantly without opening a browser. Perfect for AI, LLMs, and content summarization.

- **URL**: https://apify.com/tyegen/universal-youtube-transcript-extractor.md
- **Developed by:** [Tan Yegen](https://apify.com/tyegen) (community)
- **Categories:** SEO tools, Social media, Videos
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.50 / 1,000 results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## ⚡ Lightning YouTube Scraper (Transcript & Metadata)

### 🤖 Copy to your AI assistant
Copy this block into ChatGPT, Claude, Cursor, or any LLM to start using this Actor.

```text
tyegen/universal-youtube-transcript-extractor on Apify. Call: ApifyClient("TOKEN").actor("tyegen/universal-youtube-transcript-extractor").call(run_input={"startUrls": [{"url": "URL_HERE"}]}), then client.dataset(run["defaultDatasetId"]).list_items().items for results.
````

Unlock the textual data of YouTube at high speed. Extract full transcripts (subtitles) and rich metadata from any YouTube video without the overhead of browser automation, without official API quota limits, and at close to zero Compute Unit cost.

### 🚀 The Game-Changing Technology (How it Works)

Most YouTube scrapers on the market rely on heavy, resource-intensive browser automation tools like Playwright or Puppeteer. They physically open a browser, load the heavy YouTube interface, scroll down to force elements to render, and scrape the DOM. This is slow, prone to breaking, and expensive. Alternatively, they use the official YouTube Data API, which requires API keys and imposes strict, costly quota limits.

**This Actor takes a lighter route:**
It reads the internal `ytInitialPlayerResponse` JSON embedded in the raw HTML of a YouTube video page, then requests the subtitle XML files directly from Google's backend caption servers.
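
To make the first step concrete, here is a minimal sketch of pulling an embedded `ytInitialPlayerResponse` object out of raw HTML. It is an illustration, not this Actor's source code: the sample HTML is a made-up, heavily trimmed stand-in, and the `captions.playerCaptionsTracklistRenderer.captionTracks` path is an assumption about the payload layout that may change at any time.

```python
import json
import re

# Hypothetical, heavily trimmed stand-in for the HTML of a watch page.
SAMPLE_HTML = """
<html><body><script>
var ytInitialPlayerResponse = {"videoDetails": {"videoId": "abc123", "title": "Demo {video}"},
"captions": {"playerCaptionsTracklistRenderer": {"captionTracks":
[{"baseUrl": "https://example.invalid/timedtext?v=abc123", "name": {"simpleText": "English"}}]}}};
var somethingElse = {};
</script></body></html>
"""


def extract_player_response(html: str) -> dict:
    """Find the ytInitialPlayerResponse assignment and decode the JSON object after it."""
    match = re.search(r"ytInitialPlayerResponse\s*=\s*", html)
    if match is None:
        raise ValueError("ytInitialPlayerResponse not found in HTML")
    # raw_decode parses exactly one JSON value and ignores whatever follows it,
    # so braces inside strings or later scripts do not confuse the extraction.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


def caption_tracks(player_response: dict) -> list:
    """Return the caption track list, or an empty list when the video has none."""
    return (
        player_response.get("captions", {})
        .get("playerCaptionsTracklistRenderer", {})
        .get("captionTracks", [])
    )


player = extract_player_response(SAMPLE_HTML)
tracks = caption_tracks(player)
print(player["videoDetails"]["title"], tracks[0]["baseUrl"])
```

Using `raw_decode` instead of a greedy regular expression avoids having to guess where the JSON object ends.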

#### ✨ Unbeatable Features

- **Ultra-Lightning Speed:** Extracts a 2-hour long podcast transcript in exactly **1 second**.
- **No API Keys Needed:** Completely bypasses the YouTube Data API limitations. Zero setup required.
- **Incredibly Cost-Effective:** Uses pure, lightweight HTTP requests. It costs a fraction of a cent per video in Apify Compute Units.
- **Clean, Formatted Output:** Automatically decodes messy XML entities (like `&amp;` or `&#39;`) and merges timestamped captions into a beautiful, readable, and continuous text block.
- **Rich Metadata Included:** Alongside the transcript, it fetches the video title, author, view count, length in seconds, and high-res thumbnail URL.
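
The entity decoding and caption merging mentioned in the list above could look roughly like the sketch below. The XML sample is invented, and its `<transcript>`/`<text>` layout with double-encoded entities is an assumption about the caption format, not something this README documents.

```python
import html
import xml.etree.ElementTree as ET

# Made-up caption payload; the element names and the double encoding are assumptions.
SAMPLE_XML = (
    '<transcript>'
    '<text start="0.0" dur="1.5">Hello &amp;amp; welcome</text>'
    '<text start="1.5" dur="2.0">it&amp;#39;s  a\ndemo</text>'
    '</transcript>'
)


def captions_to_text(xml_payload: str) -> str:
    """Merge timestamped caption nodes into one continuous, readable string."""
    root = ET.fromstring(xml_payload)
    parts = []
    for node in root.iter("text"):
        raw = "".join(node.itertext())
        # The XML parser resolves one level of escaping; html.unescape
        # handles entities that are still encoded inside the text.
        decoded = html.unescape(raw)
        # Collapse newlines and repeated spaces left over from caption layout.
        cleaned = " ".join(decoded.split())
        if cleaned:
            parts.append(cleaned)
    return " ".join(parts)


print(captions_to_text(SAMPLE_XML))
```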

### 🎯 Ideal Use Cases & Target Audience

- **AI & LLM Training (RAG Pipelines):** Feed thousands of hours of rich, conversational podcast transcripts into your Retrieval-Augmented Generation pipelines or fine-tuning datasets.
- **Content Summarization Agents:** Build automated workflows that grab video texts instantly and pass them to ChatGPT, Claude, or Gemini for rapid summarization and key-takeaway extraction.
- **Competitor & SEO Analysis:** Extract text from competitor videos to analyze their spoken keywords, hooks, content structure, and pacing.
- **Content Repurposing:** Instantly convert your own YouTube videos into blog posts, newsletters, or Twitter threads.
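
For the RAG and summarization use cases, long transcripts usually need to be split before they are embedded or sent to a model. The following is one possible sketch of word-based chunking with overlap; `chunk_words` and its default sizes are hypothetical choices, not part of this Actor.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g", size=3, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text.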

### 💰 Pricing & ROI

**Pay-Per-Result:** Only **$1.50 per 1,000 videos**.
You get the full metadata PLUS the entire video transcript for a price no competitor can match. Your compute costs will remain near zero.
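
As a rough budgeting aid, the per-result price can be turned into an estimate. This sketch assumes one charged result per video and no other charged events, which this README does not spell out; `estimated_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# Price taken from the pricing line above: $1.50 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("1.50")


def estimated_cost_usd(video_count: int) -> Decimal:
    """Estimate the pay-per-result charge, assuming one result per video."""
    if video_count < 0:
        raise ValueError("video_count must not be negative")
    cost = PRICE_PER_1000_RESULTS * video_count / 1000
    return cost.quantize(Decimal("0.0001"))


for count in (1, 250, 1000):
    print(count, estimated_cost_usd(count))
```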

***

### 📥 Input Configuration

| Field | Type | Description |
| ----- | ---- | ----------- |
| `startUrls` | Array | A list of objects, each with a `url` field holding a YouTube video URL (e.g., `https://www.youtube.com/watch?v=dQw4w9WgXcQ`). |
| `proxyConfiguration` | Object | Standard Apify Datacenter proxies work flawlessly for this hidden API approach. |
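
The input example in this README passes each URL as an object with a `url` key. If you start from a plain list of URL strings, a small helper such as the hypothetical `build_run_input` below can produce that shape; it is a sketch, and the de-duplication step is an optional extra.

```python
from typing import Iterable


def build_run_input(video_urls: Iterable[str], use_apify_proxy: bool = True) -> dict:
    """Turn plain URL strings into the input object shape shown in this README."""
    seen = set()
    start_urls = []
    for url in video_urls:
        url = url.strip()
        # Skip blanks and repeated URLs so the same video is not requested twice.
        if url and url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    if not start_urls:
        raise ValueError("at least one video URL is required")
    return {
        "startUrls": start_urls,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


print(build_run_input(["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]))
```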

***

### 📤 Output Schema

For each video URL, the Actor produces a clean JSON object containing the metadata and transcript.

| Field | Type | Description |
| ----- | ---- | ----------- |
| `url` | String | The original YouTube video URL. |
| `videoId` | String | The unique 11-character YouTube video ID. |
| `title` | String | The title of the video. |
| `author` | String | The name of the channel/creator. |
| `views` | Number | Total view count. |
| `lengthSeconds` | Number | Duration of the video in seconds. |
| `thumbnail` | String | URL to the highest resolution thumbnail available. |
| `transcriptLanguage` | String | The detected language of the transcript (e.g., "English"). |
| `transcript` | String | The full, cleaned text of the video's subtitles. |
| `scrapedAt` | String | ISO timestamp of when the extraction occurred. |
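
If you need to match output records back to your own data, deriving the 11-character `videoId` from a URL on the client side can help. The sketch below handles only `watch?v=` and `youtu.be/` links; how the Actor itself parses URLs is not documented here, so treat `video_id_from_url` as a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Video IDs are 11 characters drawn from letters, digits, "_" and "-".
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")


def video_id_from_url(url: str) -> Optional[str]:
    """Return the video ID from a watch or short link, or None if it cannot be found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [None])[0]
    if candidate and _ID_PATTERN.match(candidate):
        return candidate
    return None


print(video_id_from_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```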

#### 💡 Output Example

```json
{
  "url": "https://www.youtube.com/watch?v=M98G...",
  "videoId": "M98G...",
  "title": "Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI",
  "author": "Lex Fridman",
  "views": 5400231,
  "lengthSeconds": 8700,
  "thumbnail": "https://i.ytimg.com/vi/M98G.../maxresdefault.jpg",
  "transcriptLanguage": "English",
  "transcript": "Hello and welcome to the Lex Fridman podcast. Today my guest is Sam Altman. We discuss the future of artificial intelligence... (thousands of words of clean text)",
  "scrapedAt": "2026-04-30T18:00:00.000Z"
}
```
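
On the consuming side, mapping each dataset item onto a typed record makes downstream code easier to check. The sketch below mirrors the fields in the output schema; `VideoRecord` and `record_from_item` are hypothetical names, and the sample values are copied from the example above with its elided ID left as-is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoRecord:
    url: str
    video_id: str
    title: str
    author: str
    views: int
    length_seconds: int
    thumbnail: str
    transcript_language: Optional[str]
    transcript: Optional[str]  # None when the video has no captions
    scraped_at: str


def record_from_item(item: dict) -> VideoRecord:
    """Convert one dataset item (a plain dict) into a VideoRecord."""
    return VideoRecord(
        url=item["url"],
        video_id=item["videoId"],
        title=item["title"],
        author=item["author"],
        views=int(item["views"]),
        length_seconds=int(item["lengthSeconds"]),
        thumbnail=item["thumbnail"],
        transcript_language=item.get("transcriptLanguage"),
        transcript=item.get("transcript"),
        scraped_at=item["scrapedAt"],
    )


sample_item = {
    "url": "https://www.youtube.com/watch?v=M98G...",
    "videoId": "M98G...",
    "title": "Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI",
    "author": "Lex Fridman",
    "views": 5400231,
    "lengthSeconds": 8700,
    "thumbnail": "https://i.ytimg.com/vi/M98G.../maxresdefault.jpg",
    "transcriptLanguage": "English",
    "transcript": "Hello and welcome to the Lex Fridman podcast.",
    "scrapedAt": "2026-04-30T18:00:00.000Z",
}

record = record_from_item(sample_item)
print(record.title, record.length_seconds // 60, "minutes")
```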

### ⚠️ Limitations & Good to Know

- **No Captions Available:** If a video has neither auto-generated nor manual captions, the `transcript` field is `null`.
- **Private/Age-Restricted Videos:** Videos that require user login or age verification cannot be scraped by this Actor.
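
Because `transcript` can be `null`, downstream code should separate usable records from the rest before summarizing or indexing. A minimal sketch, with `split_by_transcript` as a hypothetical helper operating on plain dataset items:

```python
from typing import Iterable, List, Tuple


def split_by_transcript(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate items that carry transcript text from those that do not."""
    with_text, without_text = [], []
    for item in items:
        # A missing key, None, and an empty string are all treated as "no transcript".
        if item.get("transcript"):
            with_text.append(item)
        else:
            without_text.append(item)
    return with_text, without_text


items = [
    {"videoId": "a", "transcript": "Some spoken words."},
    {"videoId": "b", "transcript": None},
]
usable, skipped = split_by_transcript(items)
print(len(usable), "usable,", len(skipped), "without captions")
```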

# Actor input Schema

## `startUrls` (type: `array`):

Add one or more YouTube video URLs.

## `proxyConfiguration` (type: `object`):

Datacenter proxies work perfectly for this hidden API.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("tyegen/universal-youtube-transcript-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("tyegen/universal-youtube-transcript-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call tyegen/universal-youtube-transcript-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=tyegen/universal-youtube-transcript-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Lightning YouTube Scraper (Transcript & Metadata)",
        "description": "Extract full transcripts (subtitles) and metadata from YouTube videos instantly without opening a browser. Perfect for AI, LLMs, and content summarization.",
        "version": "1.0",
        "x-build-id": "WDL6VmmTMrM9PwMyV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/tyegen~universal-youtube-transcript-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-tyegen-universal-youtube-transcript-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/tyegen~universal-youtube-transcript-extractor/runs": {
            "post": {
                "operationId": "runs-sync-tyegen-universal-youtube-transcript-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/tyegen~universal-youtube-transcript-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-tyegen-universal-youtube-transcript-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "YouTube Video URLs",
                        "type": "array",
                        "description": "Add one or more YouTube video URLs.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Datacenter proxies work perfectly for this hidden API."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
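
For projects that call the REST API without a client library, the `run-sync-get-dataset-items` path from the specification above can be assembled with the standard library alone. This is a sketch that only builds the request; `build_sync_request` is a hypothetical helper, and sending the request with `urllib.request.urlopen` is left to the caller.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "tyegen~universal-youtube-transcript-extractor"


def build_sync_request(token: str, run_input: dict) -> Request:
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request(
    "<YOUR_API_TOKEN>",
    {"startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}]},
)
print(request.get_method(), request.full_url)
# To execute: pass `request` to urllib.request.urlopen and parse the JSON response body.
```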
