# YouTube Transcript Scraper (`api-ninja/youtube-transcript-scraper`) Actor

🎬 Turn YouTube videos into clean transcript data in seconds.

- **URL**: https://apify.com/api-ninja/youtube-transcript-scraper.md
- **Developed by:** [API ninja](https://apify.com/api-ninja) (community)
- **Categories:** AI, Developer tools, Automation
- **Stats:** 3 total users, 2 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
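
As a rough illustration of the pay-per-event arithmetic, the sketch below estimates a charge from a result count. It assumes the listed rate of $2.00 per 1,000 results applies uniformly and that each dataset item counts as one result; actual event prices and discounts depend on your plan. `PRICE_PER_1000_USD` and `estimate_cost_usd` are illustrative names, not part of any Apify API.

```python
from decimal import Decimal

# Assumption: the listed "from $2.00 / 1,000 results" rate; your plan's rate may differ.
PRICE_PER_1000_USD = Decimal("2.00")

def estimate_cost_usd(result_count: int, price_per_1000: Decimal = PRICE_PER_1000_USD) -> Decimal:
    """Estimate the charge for a number of results at a flat per-1,000 rate."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return (price_per_1000 * result_count / 1000).quantize(Decimal("0.0001"))

print(estimate_cost_usd(250))  # prints 0.5000
```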

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

### What does YouTube Video Transcript do?

**YouTube Video Transcript** turns **YouTube video URLs or video IDs into structured transcript data** in a single run. Paste one or more video links from [YouTube](https://www.youtube.com/) or raw IDs, optionally choose a transcript language, and the Actor will return transcript lines, selected language details, and available language options in a clean dataset.

This Actor is built for the **Apify platform**, so you get more than just transcript extraction. You can run it manually, automate it via API, schedule recurring jobs, monitor logs, and plug the output into other workflows and integrations. It is designed to be simple for first-time users but reliable enough for repeated production runs.

#### What can this Actor do?

- 🎬 Accept full YouTube video URLs or plain video IDs.
- 🧠 Extract transcript data in a structured JSON format.
- 🌍 Request a specific transcript language when available.
- ⚡ Process multiple videos in one run.
- 🔁 Retry temporary failures automatically.
- 📦 Save all results to an Apify dataset for export or downstream automation.

### Why use YouTube Video Transcript?

If you need transcript data quickly, this Actor removes the repetitive setup work. You do not need to normalize video links, process items one by one, or build your own retry logic before you can start collecting usable output.

Common use cases:

- Content research and topic analysis
- LLM, RAG, and summarization pipelines
- Subtitle and transcript availability checks
- Internal knowledge base enrichment
- Monitoring transcripts across many videos over time

### How to get YouTube video transcripts

1. Open the Actor in Apify Console.
2. Paste one or more YouTube video URLs or raw video IDs into the `urls` field.
3. Optionally set the `language` field with a two-letter language code like `en` or `es`.
4. Start the run.
5. Open the dataset after the run finishes to review or download the results.

The Actor is suitable for both quick one-off runs and recurring automated jobs. Because it runs on Apify, you can connect it to schedules, webhooks, integrations, and API-based workflows without changing the input format.

### Input

YouTube Video Transcript has a small input schema designed for fast first runs. Use the **Input** tab in Apify Console for full field details.

```json
{
  "urls": [
    "https://www.youtube.com/watch?v=_ZW5o1VegRI",
    "_ZW5o1VegRI"
  ],
  "language": "en"
}
````

#### Input fields

- `urls` - Required. A list of YouTube video URLs or raw video IDs.
- `language` - Optional. A two-letter transcript language code such as `en`, `es`, `fr`, or `de`.
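
A small helper can catch malformed input before a run is started. The sketch below assumes the two constraints visible in this Actor's input schema: a required `urls` list and a `language` value matching `^[a-z]{2}$`. `build_input` is a hypothetical helper, not part of the Apify client.

```python
import re
from typing import Optional

LANGUAGE_PATTERN = re.compile(r"^[a-z]{2}$")  # pattern taken from the Actor's input schema

def build_input(urls: list[str], language: Optional[str] = None) -> dict:
    """Build an input object for the Actor, rejecting obviously invalid values."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("urls must contain at least one video URL or ID")
    actor_input: dict = {"urls": cleaned}
    if language is not None:
        if not LANGUAGE_PATTERN.fullmatch(language):
            raise ValueError(f"language must be a two-letter lowercase code, got {language!r}")
        actor_input["language"] = language
    return actor_input

print(build_input(["https://www.youtube.com/watch?v=_ZW5o1VegRI", "_ZW5o1VegRI"], "en"))
```

Validating locally keeps schema errors out of the run log and avoids starting a run that would be rejected anyway.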

### Output

The Actor returns one dataset item per input value. Successful items include the transcript itself plus metadata about the selected language and other language options. If a video cannot be processed, the Actor still stores a failure record so you can easily see which inputs need attention.

You can download the dataset extracted by YouTube Video Transcript in various formats such as JSON, HTML, CSV, or Excel.

```json
[
  {
    "originalInput": "https://www.youtube.com/watch?v=_AbFXuGDRTs",
    "videoId": "_AbFXuGDRTs",
    "requestedLanguage": "en",
    "success": true,
    "transcriptCount": 42,
    "id": "_AbFXuGDRTs",
    "transcript": [
      {
        "text": "Hello and welcome",
        "start": 0.12,
        "duration": 1.84
      }
    ],
    "selected": {
      "language": "English"
    },
    "languageMenu": [
      {
        "language": "English",
        "languageCode": "en"
      }
    ]
  }
]
```
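
Once an item is retrieved, its `transcript` array can be flattened for reading or for feeding into a summarization pipeline. The sketch below assumes `start` is an offset in seconds, as the sample values suggest; `format_timestamp` and `format_transcript` are illustrative helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as H:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

def format_transcript(item: dict) -> str:
    """Return one '[timestamp] text' line per transcript entry."""
    lines = item.get("transcript") or []
    return "\n".join(f"[{format_timestamp(line['start'])}] {line['text']}" for line in lines)

sample = {"transcript": [{"text": "Hello and welcome", "start": 0.12, "duration": 1.84}]}
print(format_transcript(sample))  # [0:00:00] Hello and welcome
```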

### Data table

| Field | Type | Description |
| --- | --- | --- |
| `originalInput` | string | Original input value from the run |
| `videoId` | string | Normalized YouTube video ID |
| `requestedLanguage` | string or null | Requested transcript language |
| `success` | boolean | Whether transcript retrieval succeeded |
| `transcriptCount` | number | Number of transcript lines returned |
| `error` | string or null | Error message for failed items |
| `transcript` | array | Transcript lines with timing data |
| `selected` | object or null | Selected transcript/language details |
| `languageMenu` | array | Other language options returned for the video |
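
For typed downstream code, the fields in the table can be mirrored in a small data class. This is a sketch under the assumption that failed items may omit transcript-related fields, so every lookup falls back to a default; `TranscriptItem` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TranscriptItem:
    """Typed view of one dataset item, mirroring the fields in the table."""
    original_input: str
    video_id: Optional[str]
    success: bool
    requested_language: Optional[str] = None
    transcript_count: int = 0
    error: Optional[str] = None
    transcript: list = field(default_factory=list)
    selected: Optional[dict] = None
    language_menu: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "TranscriptItem":
        # Use .get() throughout: missing keys are tolerated (assumption about failure records).
        return cls(
            original_input=raw.get("originalInput", ""),
            video_id=raw.get("videoId"),
            success=bool(raw.get("success")),
            requested_language=raw.get("requestedLanguage"),
            transcript_count=raw.get("transcriptCount") or 0,
            error=raw.get("error"),
            transcript=raw.get("transcript") or [],
            selected=raw.get("selected"),
            language_menu=raw.get("languageMenu") or [],
        )

item = TranscriptItem.from_dict(
    {"originalInput": "_AbFXuGDRTs", "videoId": "_AbFXuGDRTs", "success": True, "transcriptCount": 42}
)
print(item.video_id, item.success, item.transcript_count)
```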

### Pricing / Cost estimation

How much does it cost to extract YouTube transcripts?

This Actor is lightweight because it does not run a browser. That makes it a good fit for cost-efficient transcript extraction at scale. Your final usage cost depends mostly on how many videos you process per run and how often you run the Actor.

To keep runs efficient:

- Start with a small sample first.
- Only request a language when you need one.
- Avoid sending duplicate videos unless you intentionally want duplicate output.
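
One way to avoid paying for duplicates is to reduce each input to a video ID before the run and drop repeats. The Actor's own normalization rules are not documented here, so the sketch below is an assumption: it handles `watch?v=` URLs, `youtu.be` short links, and bare IDs, and passes anything else through unchanged.

```python
from urllib.parse import urlparse, parse_qs

def video_key(value: str) -> str:
    """Best-effort video ID for deduplication (assumed rules, not the Actor's)."""
    value = value.strip()
    if "://" not in value:
        return value  # treat as a raw video ID
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or value
    if host.endswith("youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    return value  # unknown shape: keep as-is so nothing is silently dropped

def dedupe_inputs(values: list[str]) -> list[str]:
    """Keep the first occurrence of each video, preserving input order."""
    seen: set[str] = set()
    unique = []
    for value in values:
        key = video_key(value)
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique

print(dedupe_inputs(["https://www.youtube.com/watch?v=_ZW5o1VegRI", "_ZW5o1VegRI"]))
```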

### Tips or Advanced options

- Use raw video IDs if you already have them in a spreadsheet or database.
- Batch multiple videos into one run instead of launching many tiny runs.
- Review failed items in the dataset instead of guessing which inputs need a rerun.
- Use Apify scheduling and API access if you need recurring transcript collection.
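
Failed items carry the `originalInput` that produced them, so a follow-up run can be built directly from the dataset. A minimal sketch, assuming items shaped like the output example earlier in this README; `build_rerun_input` is a hypothetical helper and the error text is made up.

```python
from typing import Optional

def build_rerun_input(items: list[dict], language: Optional[str] = None) -> Optional[dict]:
    """Collect the inputs of unsuccessful items into a new Actor input."""
    failed_inputs = [
        item["originalInput"]
        for item in items
        if not item.get("success") and item.get("originalInput")
    ]
    if not failed_inputs:
        return None  # nothing to rerun
    rerun: dict = {"urls": failed_inputs}
    if language:
        rerun["language"] = language
    return rerun

items = [
    {"originalInput": "gN07gbipMoY", "success": True},
    {"originalInput": "_ZW5o1VegRI", "success": False, "error": "example error"},
]
print(build_rerun_input(items, language="en"))
```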

### FAQ, disclaimers, and support

#### Does this Actor download the video or audio?

No. It only returns transcript-related data for the supplied YouTube videos.

#### What if a transcript is unavailable?

The Actor retries temporary failures automatically and records failed items in the dataset with `success: false` and an error message.

#### Is it legal to extract YouTube transcript data?

You are responsible for using this Actor in compliance with YouTube terms and applicable laws. If your output contains personal data, make sure you have a legitimate reason to process it and consult legal counsel if needed.

#### Where can I get help or request improvements?

Use the Issues tab on the Actor page for feedback, bug reports, and feature requests. If you need custom output fields or workflow changes, this Actor can be extended further.

# Actor input Schema

## `urls` (type: `array`):

Add one or more YouTube video URLs or raw video IDs. The Actor automatically extracts the video ID from full links.

## `language` (type: `string`):

Optional. Use a two-letter language code such as `en`, `es`, `fr`, or `de`. Leave empty to use the default transcript language available for the video.

## Actor input object example

```json
{
  "urls": [
    "https://www.youtube.com/watch?v=gN07gbipMoY",
    "gN07gbipMoY"
  ]
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.youtube.com/watch?v=gN07gbipMoY",
        "gN07gbipMoY"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("api-ninja/youtube-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [
        "https://www.youtube.com/watch?v=gN07gbipMoY",
        "gN07gbipMoY",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("api-ninja/youtube-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.youtube.com/watch?v=gN07gbipMoY",
    "gN07gbipMoY"
  ]
}' |
apify call api-ninja/youtube-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=api-ninja/youtube-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper",
        "description": "🎬 Turn YouTube videos into clean transcript data in seconds.",
        "version": "0.0",
        "x-build-id": "OmpKJ4LjL9m5bXETW"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/api-ninja~youtube-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-api-ninja-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/api-ninja~youtube-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-api-ninja-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/api-ninja~youtube-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-api-ninja-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "🔗 Video URLs or IDs",
                        "type": "array",
                        "description": "Add one or more YouTube video URLs or raw video IDs. The Actor automatically extracts the video ID from full links.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "🗣️ Transcript language",
                        "pattern": "^[a-z]{2}$",
                        "type": "string",
                        "description": "Optional. Use a two-letter language code such as <code>en</code>, <code>es</code>, <code>fr</code>, or <code>de</code>. Leave empty to use the default transcript language available for the video."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
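
Where no official client is available, the `run-sync-get-dataset-items` path from the specification can be called directly. The sketch below uses only the Python standard library and separates building the request from sending it. It assumes the endpoint returns the dataset items as a JSON array by default, and the 300-second timeout is an arbitrary choice; the token is passed as the `token` query parameter, as the specification describes.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/api-ninja~youtube-transcript-scraper/run-sync-get-dataset-items"

def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request described by the OpenAPI specification."""
    url = f"{BASE_URL}{ACTOR_PATH}?{urllib.parse.urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_sync(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(build_request(token, actor_input), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_request("<YOUR_API_TOKEN>", {"urls": ["gN07gbipMoY"]})
print(request.get_method(), request.full_url)
# To execute the run: items = run_sync("<YOUR_API_TOKEN>", {"urls": ["gN07gbipMoY"]})
```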
