# YouTube Transcript Pro (`datavoid/youtube-transcript-pro`) Actor

Extract clean YouTube transcripts, closed captions, and rich video metadata into structured data. Perfect for AI pipelines, SEO, and content research. Cheap and no API key required!

- **URL**: https://apify.com/datavoid/youtube-transcript-pro.md
- **Developed by:** [Danny](https://apify.com/datavoid) (community)
- **Categories:** Automation, News, Videos
- **Stats:** 2 total users, 1 monthly user, 100.0% of runs succeeded
- **User rating**: No ratings yet

## Pricing

$1.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Pro

Extract YouTube video transcripts, subtitles, captions, and video metadata in a clean Apify dataset. YouTube Transcript Pro is built for researchers, content teams, SEO workflows, AI pipelines, and anyone who needs YouTube spoken content as structured text at a competitive price.

Provide a YouTube video URL, choose the transcript language, and the Actor returns timestamped transcript segments, a plain text transcript, available caption languages, and useful video metadata such as title, uploader, views, duration, thumbnails, and publication date.

Simple pricing: $1.00 per 1,000 results.

### Key Features

- Extract complete YouTube transcripts from public caption tracks
- Get both timestamped transcript segments and one plain text transcript
- Choose a preferred transcript language from a readable dropdown, while the Actor receives clean codes such as `en`, `de`, `es`, or `pt-BR`
- Automatically falls back to an available caption track when the requested language is not available
- Supports creator-provided captions and auto-generated captions when YouTube exposes them
- Collects video metadata including title, description, uploader details, views, duration, thumbnails, and publish date
- Works with common YouTube URL formats, including watch URLs, youtu.be links, Shorts, embedded URLs, and youtube-nocookie URLs
- Saves results as structured JSON that can be exported from Apify as JSON, CSV, Excel, XML, RSS, or HTML
- No YouTube API key required

### Use Cases

- Content research: turn videos into text for review, analysis, and note-taking
- SEO and marketing: extract spoken keywords, topics, and competitor video metadata
- AI and automation: feed transcripts into summarizers, RAG systems, chatbots, or classification workflows
- Journalism and media monitoring: quickly search and reference spoken content
- Accessibility: create text versions of video content where captions are available
- Data analysis: build structured datasets from YouTube video captions
- Content repurposing: convert video transcripts into briefs, blog drafts, newsletters, or social posts

### How to Use

1. Open the Actor on Apify.
2. Paste a YouTube video URL into `youtube_url`.
3. Optionally choose a preferred transcript language from the dropdown.
4. Click Start.
5. Open the dataset and export the results in your preferred format.

### Input

Provide a YouTube video URL. The language field is optional.

| Field | Type | Description | Required |
| --- | --- | --- | --- |
| `youtube_url` | String | Full YouTube video URL. Supports regular videos, Shorts, youtu.be links, embedded URLs, live replay URLs, and youtube-nocookie URLs. | Yes |
| `language` | String | Optional preferred transcript language. The Apify UI shows readable language names and sends the language code to the Actor. Defaults to `en`. You can also type a custom YouTube/BCP-47 code if needed. | No |

#### Example Input

```json
{
  "youtube_url": "https://www.youtube.com/watch?v=CMNry4PE93Y",
  "language": "en"
}
````
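Before starting a run, it can help to check the URL locally. The sketch below reuses the `youtube_url` pattern published in the OpenAPI specification at the end of this document. The helper name `build_input` is hypothetical and not part of the Actor, and the pattern only checks the URL shape, not whether the video exists.

```python
import re

# Pattern copied from the `youtube_url` property of the Actor's input schema.
YOUTUBE_URL_PATTERN = re.compile(
    r"^https?://([^/]+\.)?(youtube\.com|youtube-nocookie\.com|youtu\.be)/.+$"
)


def build_input(youtube_url: str, language: str = "en") -> dict:
    """Return an Actor input dict, raising ValueError for URLs the schema would reject."""
    if not YOUTUBE_URL_PATTERN.match(youtube_url):
        raise ValueError(f"Not a supported YouTube URL: {youtube_url!r}")
    return {"youtube_url": youtube_url, "language": language}


print(build_input("https://youtu.be/CMNry4PE93Y"))
```

Failing early on a malformed URL avoids starting a run that would likely end with an `invalid_input` error item.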

### Output

The Actor stores one dataset item per processed video. A successful result includes transcript data, video metadata, language information, and status fields.

| Field | Type | Description |
| --- | --- | --- |
| `video_id` | String | YouTube video ID |
| `url` | String | Normalized YouTube video URL |
| `title` | String | Video title |
| `description` | String | Video description |
| `channel_id` | String | YouTube channel ID |
| `channel_name` | String | Channel name |
| `channel_username` | String | Channel handle or username when available |
| `channel_thumbnail` | String | Channel thumbnail URL when available |
| `subscriber_count` | Number | Subscriber count when visible |
| `view_count` | Number | Video view count |
| `like_count` | Number | Like count when visible |
| `comment_count` | Number | Comment count when visible |
| `duration_seconds` | Number | Video duration in seconds |
| `published_at` | String | Video publication date when available |
| `timestamp` | Number | Publication date as a Unix timestamp when available |
| `thumbnail` | String | Video thumbnail URL |
| `transcript` | Array | Timestamped transcript segments with `text`, `start`, `end`, and `duration` |
| `transcript_text` | String | Full transcript as a single plain text field |
| `language` | String | Language code used for the returned transcript |
| `available_languages` | Array | Caption languages found for the video |
| `selected_language` | String | Human-readable caption language selected by the Actor |
| `is_auto_generated` | Boolean | Whether the selected transcript is auto-generated |
| `geo_restrict` | String or null | Geographic restriction information when available |
| `status` | String | `success` or `error` |
| `message` | String | Human-readable status message |
| `error_code` | String | Error code, included only for error results |

Some YouTube metadata fields can be hidden, disabled, or unavailable on the public video page. When that happens, the Actor keeps the output shape stable by returning an empty string, `0`, or `null` depending on the field.

#### Sample Output

```json
{
  "channel_id": "UCPlpJXGoNKFYAGgqse4mQ2g",
  "channel_name": "CaptJax458",
  "channel_thumbnail": "https://yt3.ggpht.com/ytc/AIdro_mBVSntnsFTVjkUyckIXNURw8ap8E9xPAdPpx6O_vwnrg=s800-c-k-c0x00ffffff-no-rj",
  "channel_username": "CaptJax458",
  "subscriber_count": 20000,
  "comment_count": 0,
  "duration_seconds": 18,
  "language": "en",
  "like_count": 530179,
  "timestamp": 1181549064,
  "title": "Zombie Kid Likes Turtles",
  "transcript": [
    {
      "duration": 4.641,
      "end": 5.04,
      "start": 0.399,
      "text": "back here live at the waterfront village"
    },
    {
      "duration": 4.159,
      "end": 6.879,
      "start": 2.72,
      "text": "with my friend the zombie jonathan"
    }
  ],
  "transcript_text": "back here live at the waterfront village ...",
  "published_at": "2007-06-11T08:04:24Z",
  "url": "https://www.youtube.com/watch?v=CMNry4PE93Y",
  "video_id": "CMNry4PE93Y",
  "view_count": 76586974,
  "geo_restrict": null,
  "status": "success",
  "message": "Successfully fetched the transcript for the video with ID 'CMNry4PE93Y'",
  "available_languages": ["English (auto-generated)"],
  "selected_language": "English (auto-generated)",
  "is_auto_generated": true,
  "description": "",
  "thumbnail": "https://i.ytimg.com/vi/CMNry4PE93Y/hqdefault.jpg"
}
```
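As an illustration of how the `transcript` array can be consumed, the following sketch turns the segments into timestamped lines. The function names are hypothetical; only the `start` and `text` keys come from the output shown above.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS, or H:MM:SS from one hour onward."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(item: dict) -> str:
    """Join the `transcript` segments of one dataset item into timestamped lines."""
    lines = []
    for segment in item.get("transcript", []):
        lines.append(f"[{format_timestamp(segment['start'])}] {segment['text']}")
    return "\n".join(lines)


# Two segments taken from the sample output above.
item = {
    "transcript": [
        {"start": 0.399, "end": 5.04, "duration": 4.641,
         "text": "back here live at the waterfront village"},
        {"start": 2.72, "end": 6.879, "duration": 4.159,
         "text": "with my friend the zombie jonathan"},
    ]
}
print(render_transcript(item))
```

Note that the sample segments overlap in time (the second starts before the first ends), so code that needs non-overlapping windows should rely on `start` rather than on `end`.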

### Error Output

If the video cannot be processed, the Actor still writes a dataset item with `status: "error"` so your workflow can handle failures cleanly.

```json
{
  "url": "https://www.youtube.com/watch?v=example",
  "language": "en",
  "status": "error",
  "error_code": "transcript_not_available",
  "message": "No transcript languages are available for this video."
}
```

Common error codes include:

- `invalid_input`: the input URL or settings are not valid
- `not_found`: the video is unavailable or cannot be found
- `transcript_disabled`: captions are disabled for the video
- `transcript_not_available`: YouTube does not expose a transcript for the video
- `language_not_available`: the requested transcript language is not available
- `rate_limited`: YouTube temporarily limited transcript requests
- `api_error`: the transcript request failed unexpectedly
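Because failures arrive as ordinary dataset items, a workflow can sort results by `status` and `error_code`. The sketch below is one possible approach. Which codes are worth retrying is an assumption made here for illustration; this README does not state it.

```python
# Assumption: these codes describe transient failures worth retrying.
# The README does not say which codes are retryable; adjust as needed.
RETRYABLE_CODES = {"rate_limited", "api_error"}


def triage(items: list) -> dict:
    """Split dataset items into successes, retryable errors, and permanent errors."""
    buckets = {"ok": [], "retry": [], "failed": []}
    for item in items:
        if item.get("status") == "success":
            buckets["ok"].append(item)
        elif item.get("error_code") in RETRYABLE_CODES:
            buckets["retry"].append(item)
        else:
            buckets["failed"].append(item)
    return buckets


items = [
    {"status": "success", "video_id": "CMNry4PE93Y"},
    {"status": "error", "error_code": "rate_limited"},
    {"status": "error", "error_code": "transcript_disabled"},
]
print({name: len(bucket) for name, bucket in triage(items).items()})
```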

### Pricing

YouTube Transcript Pro uses simple pay-per-result pricing:

- $1.00 per 1,000 results
- $0.001 per processed video result
- One saved dataset item counts as one result
- Each run processes one YouTube video URL

Example: processing 1,000 videos costs $1.00. Processing 10,000 videos costs $10.00.
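The arithmetic above can be sketched as a small helper. It assumes each video yields exactly one dataset item; `estimate_cost` is a hypothetical name, and `Decimal` is used only to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

# $1.00 per 1,000 results, i.e. $0.001 per result, as listed above.
PRICE_PER_RESULT = Decimal("0.001")


def estimate_cost(video_count: int) -> Decimal:
    """Return the expected charge in USD for a number of processed videos."""
    if video_count < 0:
        raise ValueError("video_count must be non-negative")
    return PRICE_PER_RESULT * video_count


print(estimate_cost(1_000))
print(estimate_cost(10_000))
```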

### Troubleshooting

#### Invalid URL

Use a public YouTube video URL, such as:

- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://www.youtube.com/shorts/VIDEO_ID`
- `https://www.youtube-nocookie.com/embed/VIDEO_ID`
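For local bookkeeping (for example, deduplicating URLs before running the Actor), the video ID can be pulled out of the four URL shapes listed above. This is a client-side sketch only: it does not reproduce the Actor's own normalization, and other formats such as live replay URLs are not handled here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def _is_host(host: str, domain: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return host == domain or host.endswith("." + domain)


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for the URL shapes listed above, or None if unrecognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    parts = [part for part in parsed.path.split("/") if part]

    if _is_host(host, "youtu.be"):
        return parts[0] if parts else None
    if _is_host(host, "youtube.com") or _is_host(host, "youtube-nocookie.com"):
        if parts and parts[0] == "watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(parts) >= 2 and parts[0] in {"shorts", "embed"}:
            return parts[1]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=CMNry4PE93Y"))
```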

#### No Transcript Returned

Not every YouTube video has captions. Check that the video has subtitles or auto-generated captions available on YouTube. Private, deleted, restricted, or caption-disabled videos may return an error result.

#### Requested Language Not Returned

The Actor tries to use your requested language first. If it is not available, it selects another available caption track. Check `available_languages`, `selected_language`, and `is_auto_generated` in the output to see exactly what was returned.
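A minimal sketch of that check follows. It assumes the `language` field of a result uses the same code format as the request (for example `en` or `pt-BR`); the helper name `check_language` is hypothetical.

```python
def check_language(item: dict, requested: str) -> dict:
    """Summarize which caption track a successful result actually used."""
    returned = item.get("language") or ""
    return {
        "requested": requested,
        "returned": returned,
        # Assumption: request and result use the same code format, compared case-insensitively.
        "fallback_used": returned.lower() != requested.lower(),
        "selected_language": item.get("selected_language"),
        "auto_generated": bool(item.get("is_auto_generated")),
    }


# Fields taken from the sample output, checked against a request for German.
item = {
    "status": "success",
    "language": "en",
    "selected_language": "English (auto-generated)",
    "is_auto_generated": True,
}
print(check_language(item, "de"))
```

Apply this only to items with `status` equal to `success`, since error items do not carry a returned transcript.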

#### Missing Likes, Comments, or Subscriber Counts

YouTube does not always expose every metric publicly. Some creators hide counts, comments can be disabled, and YouTube page data can vary by region or video type. Missing numeric fields are returned as `0`.
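Since a hidden metric and a genuine zero both appear as `0`, downstream analysis may prefer to treat zeros as unknown. The sketch below shows one such convention. It is a design choice made for illustration, not Actor behavior, and it will also mask counts that really are zero.

```python
# Count fields the README describes as possibly hidden or unavailable.
COUNT_FIELDS = ("subscriber_count", "like_count", "comment_count")


def mask_missing_counts(item: dict) -> dict:
    """Return a copy of the item where zero-valued counts become None."""
    cleaned = dict(item)
    for field in COUNT_FIELDS:
        if cleaned.get(field) == 0:
            cleaned[field] = None
    return cleaned


# Values from the sample output: comments are reported as 0.
item = {"subscriber_count": 20000, "like_count": 530179, "comment_count": 0}
print(mask_missing_counts(item))
```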

### FAQ

#### Does it work with YouTube Shorts?

Yes, if YouTube exposes captions for the Short.

#### Does it work with auto-generated captions?

Yes. The output includes `is_auto_generated` so you can tell whether the selected transcript came from YouTube auto-captions.

#### Do I need a YouTube API key?

No. The Actor reads public caption tracks and public video metadata.

#### What can I export?

You can export the dataset from Apify in formats such as JSON, CSV, Excel, XML, RSS, and HTML.

# Actor input Schema

## `youtube_url` (type: `string`):

Full YouTube video URL. Supports regular videos, Shorts, youtu.be links, embedded URLs, live replay URLs, and youtube-nocookie URLs.

## `language` (type: `string`):

Optional preferred transcript language. Choose a readable language option from the dropdown, or type a custom YouTube/BCP-47 language code if needed. The Actor receives the code, for example en or pt-BR.

## Actor input object example

```json
{
  "youtube_url": "https://www.youtube.com/watch?v=CMNry4PE93Y",
  "language": "en"
}
```

# Actor output Schema

## `results` (type: `string`):

Dataset items produced by this run.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "youtube_url": "https://www.youtube.com/watch?v=CMNry4PE93Y",
    "language": "en"
};

// Run the Actor and wait for it to finish
const run = await client.actor("datavoid/youtube-transcript-pro").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "youtube_url": "https://www.youtube.com/watch?v=CMNry4PE93Y",
    "language": "en",
}

# Run the Actor and wait for it to finish
run = client.actor("datavoid/youtube-transcript-pro").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "youtube_url": "https://www.youtube.com/watch?v=CMNry4PE93Y",
  "language": "en"
}' |
apify call datavoid/youtube-transcript-pro --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=datavoid/youtube-transcript-pro",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Pro",
        "description": "Extract clean YouTube transcripts, closed captions, and rich video metadata into structured data. Perfect for AI pipelines, SEO, and content research. Cheap and no API key required!",
        "version": "0.2",
        "x-build-id": "Ip300gFd37J6t9d98"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/datavoid~youtube-transcript-pro/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-datavoid-youtube-transcript-pro",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/datavoid~youtube-transcript-pro/runs": {
            "post": {
                "operationId": "runs-sync-datavoid-youtube-transcript-pro",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/datavoid~youtube-transcript-pro/run-sync": {
            "post": {
                "operationId": "run-sync-datavoid-youtube-transcript-pro",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "youtube_url"
                ],
                "properties": {
                    "youtube_url": {
                        "title": "Video URL",
                        "pattern": "^https?://([^/]+\\.)?(youtube\\.com|youtube-nocookie\\.com|youtu\\.be)/.+$",
                        "type": "string",
                        "description": "Full YouTube video URL. Supports regular videos, Shorts, youtu.be links, embedded URLs, live replay URLs, and youtube-nocookie URLs."
                    },
                    "language": {
                        "title": "Transcript Language",
                        "type": "string",
                        "description": "Optional preferred transcript language. Choose a readable language option from the dropdown, or type a custom YouTube/BCP-47 language code if needed. The Actor receives the code, for example en or pt-BR.",
                        "default": "en"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
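For projects without an Apify client, the `run-sync-get-dataset-items` path in the specification above can be called directly. The sketch below only builds the request with the Python standard library and does not send it; `build_request` is a hypothetical helper name, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/datavoid~youtube-transcript-pro/run-sync-get-dataset-items"


def build_request(token: str, youtube_url: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps({"youtube_url": youtube_url, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("<YOUR_API_TOKEN>", "https://www.youtube.com/watch?v=CMNry4PE93Y")
print(request.get_method(), request.full_url)

# Sending it requires network access and a real token, for example:
# with urllib.request.urlopen(request) as response:
#     items = json.loads(response.read())
```

Keeping request construction separate from sending makes the call easy to inspect and test without spending credits.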
