# YouTube Transcript Extractor (`junipr/youtube-transcript-extractor`) Actor

Extract YouTube transcripts in text, SRT, VTT, or JSON. Auto-generated and manual captions in any language. Video metadata: title, channel, views, duration. Innertube API with residential proxy. Batch videos.

- **URL**: https://apify.com/junipr/youtube-transcript-extractor.md
- **Developed by:** [junipr](https://apify.com/junipr) (community)
- **Categories:** Videos, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $3.25 / 1,000 transcripts extracted

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Extractor

Extract transcripts, captions, and subtitles from any YouTube video. Get full text, timed segments with timestamps, video metadata, channel info, and word counts. Supports multiple languages, auto-generated captions, and SRT/VTT/JSON/plain text output formats. Batch process up to 500 videos per run.

### Features

- Extract transcripts from any public YouTube video with captions enabled
- Multiple output formats: plain text, SRT subtitles, WebVTT subtitles, or raw JSON segments
- Timed segments with start time and duration for each text block
- Full video metadata: title, channel name, channel URL, duration, view count, publish date
- Multi-language support with automatic fallback to available languages
- Auto-generated caption support — falls back to YouTube's auto-captions when manual captions are unavailable
- Batch processing — extract transcripts from up to 500 videos in a single run
- Word count per video for content analysis
- Accepts all YouTube URL formats: standard, short (youtu.be), embed, and shorts
- Zero-config — works out of the box with sensible defaults
- Pay-per-event pricing — only pay for transcripts successfully extracted

### Proxy Requirements

This Actor requires residential proxies because YouTube blocks datacenter IP addresses.

- **Paid Apify plan users** ($49+/month): Works automatically with the default residential proxy configuration. No setup needed.
- **Free plan users**: Provide your own residential proxy URL in the Proxy Configuration input field. Free Apify plans only include datacenter proxies, which YouTube will block.
- Without a residential proxy, the Actor exits with an error message explaining what to do.

### Input

All fields are optional. The Actor runs with the defaults below and requires no configuration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `urls` | string[] | Example video | List of YouTube video URLs to extract transcripts from |
| `language` | string | `"en"` | Preferred transcript language code (e.g., en, es, fr, de, ja) |
| `includeTimestamps` | boolean | `true` | Include timed segments with start time and duration |
| `includeAutoGenerated` | boolean | `true` | Fall back to auto-generated captions if manual captions are unavailable |
| `outputFormat` | string | `"text"` | Output format: text, srt, vtt, or json |
| `maxVideos` | integer | `50` | Maximum videos to process (1-500) |
| `proxyConfiguration` | object | Apify residential | Proxy settings. Defaults to Apify residential proxy (requires paid plan) |
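
As a sketch of how the table above maps to a run input, the helper below assembles the input object and checks the documented constraints on the client side: `outputFormat` must be one of the four listed values and `maxVideos` must fall within 1-500. `build_input` is a hypothetical name, not part of any Apify library, and the Actor still performs its own validation. `proxyConfiguration` is left out on the assumption that the documented default then applies.

```python
ALLOWED_FORMATS = {"text", "srt", "vtt", "json"}


def build_input(urls, language="en", output_format="text", max_videos=50,
                include_timestamps=True, include_auto_generated=True):
    """Assemble an Actor input dict from the documented fields (hypothetical helper)."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_videos <= 500:
        raise ValueError("maxVideos must be between 1 and 500")
    return {
        "urls": list(urls),
        "language": language,
        "includeTimestamps": include_timestamps,
        "includeAutoGenerated": include_auto_generated,
        "outputFormat": output_format,
        "maxVideos": max_videos,
    }
```

The returned dict can be passed as `run_input` to the Python client call shown in the API section.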

### Output

Each dataset item represents one video and is stored in the run's default dataset.

| Field | Type | Description |
|-------|------|-------------|
| `videoUrl` | string | Canonical YouTube URL |
| `videoId` | string | YouTube video ID |
| `title` | string or null | Video title |
| `channelName` | string or null | Channel name |
| `channelUrl` | string or null | Channel URL |
| `duration` | string or null | Video duration in ISO 8601 format (e.g., PT5M30S) |
| `viewCount` | number or null | Total view count |
| `publishedAt` | string or null | Publish date |
| `language` | string or null | Language code of the extracted transcript |
| `isAutoGenerated` | boolean or null | Whether the transcript is auto-generated by YouTube |
| `transcript` | string or null | Full transcript text in the requested format |
| `segments` | array or null | Timed segments with text, start, and duration (when includeTimestamps is true) |
| `wordCount` | number or null | Total word count of the transcript |
| `error` | string or null | Error message if extraction failed for this video |
| `scrapedAt` | string | ISO 8601 timestamp of extraction |
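
Because `duration` is an ISO 8601 duration string rather than a number, it usually needs converting before sorting or filtering. The sketch below assumes only day, hour, minute, and whole-second components appear (as in `PT5M30S`); `duration_to_seconds` is a hypothetical helper, and fractional seconds or week/month/year designators are not handled.

```python
import re

# Matches durations such as PT5M30S, PT1H, or P1DT2H (days/hours/minutes/seconds only).
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration to whole seconds.

    Returns None for a null duration; raises ValueError on unsupported input.
    """
    if duration is None:
        return None
    match = _DURATION_RE.match(duration)
    if not match or duration in ("P", "PT"):
        raise ValueError(f"Unsupported duration: {duration!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])
```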

#### Output Example

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "videoId": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channelName": "Rick Astley",
  "channelUrl": "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
  "duration": "PT3M33S",
  "viewCount": 1500000000,
  "publishedAt": "2009-10-25",
  "language": "en",
  "isAutoGenerated": true,
  "transcript": "We're no strangers to love You know the rules and so do I...",
  "segments": [
    { "text": "We're no strangers to love", "start": 18.0, "duration": 3.5 },
    { "text": "You know the rules and so do I", "start": 21.5, "duration": 3.0 }
  ],
  "wordCount": 287,
  "error": null,
  "scrapedAt": "2025-06-01T12:00:00.000Z"
}
````

### Use Cases

- **Content repurposing** — convert video content into blog posts, articles, or social media copy
- **SEO and keyword research** — analyze transcript text for keywords and topic coverage
- **Accessibility** — generate text versions of video content for hearing-impaired users
- **Research and analysis** — extract and analyze spoken content from educational or news videos
- **Subtitle generation** — export SRT or VTT files for use in video editors or players
- **AI and NLP pipelines** — feed transcript text into summarization, sentiment analysis, or embedding models
- **Content monitoring** — track what competitors or influencers are saying in their videos

### Integrations

Connect this Actor with other tools in your workflow:

- **Apify API** — trigger runs programmatically and retrieve results via REST API
- **Webhooks** — get notified when extraction completes
- **Scheduling** — set up recurring runs to monitor new video transcripts
- **Apify integrations** — connect to Google Sheets, Slack, Zapier, Make, and more

### Pricing

This Actor uses Pay-Per-Event (PPE) pricing: **$3.20 per 1,000 transcripts extracted** ($0.0032 per event).

Pricing includes all platform compute costs, with no hidden fees.
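
To budget a batch, multiply the per-event price by the number of transcripts you expect to extract successfully. The sketch below uses the $0.0032 figure quoted in this section as a default; the current rate on the Store listing may differ, so pass it explicitly through `price` if needed. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-event price quoted in this section; treat it as an assumption and override if needed.
PRICE_PER_TRANSCRIPT_USD = Decimal("0.0032")


def estimate_cost(transcripts, price=PRICE_PER_TRANSCRIPT_USD):
    """Estimate the charge in USD for a number of successfully extracted transcripts."""
    if transcripts < 0:
        raise ValueError("transcripts must be non-negative")
    return price * transcripts
```

`Decimal` is used instead of `float` so that small per-event amounts add up without rounding drift.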

### FAQ

#### What YouTube URL formats are supported?

The Actor accepts standard watch URLs (youtube.com/watch?v=), short URLs (youtu.be/), embed URLs (youtube.com/embed/), shorts URLs (youtube.com/shorts/), and bare video IDs.
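
If you want to deduplicate or pre-validate a URL list before a run, the formats above can be reduced to a video ID locally. The sketch below is not the Actor's own parser: `extract_video_id` is a hypothetical helper, and it assumes video IDs are 11 characters drawn from letters, digits, `-`, and `_`.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value):
    """Return the video ID from a supported URL or bare ID, or None if unrecognized."""
    value = value.strip()
    if _ID_RE.match(value):
        return value  # bare video ID
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    candidate = None
    if host == "youtu.be" and segments:
        candidate = segments[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if segments[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in ("embed", "shorts"):
            candidate = segments[1]
    return candidate if candidate and _ID_RE.match(candidate) else None
```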

#### What if a video has no captions?

The Actor records an error message in that video's `error` field and continues processing the remaining videos. You are not charged for videos without available captions.
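
Since failed videos still produce a dataset item, downstream code should check the `error` field before using `transcript`. A minimal sketch, with `split_results` as a hypothetical helper that works on any iterable of result dicts:

```python
def split_results(items):
    """Partition dataset items into (succeeded, failed) based on the `error` field."""
    succeeded, failed = [], []
    for item in items:
        # A null or missing `error` is treated as success.
        (failed if item.get("error") else succeeded).append(item)
    return succeeded, failed
```

With the Python client, `items` could be the iterator returned by `client.dataset(run["defaultDatasetId"]).iterate_items()`.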

#### Why does this Actor need a residential proxy?

YouTube actively blocks requests from datacenter IP addresses. Residential proxies route requests through real ISP connections, which YouTube does not block. Without a residential proxy, requests will fail with 403 errors.

#### Can I use this on Apify's free plan?

Yes, but you need to provide your own residential proxy URL in the Proxy Configuration input. The free Apify plan only includes datacenter proxies, which YouTube blocks. The Actor exits with an error message if no residential proxy is available.
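
The two proxy setups can be expressed as alternative `proxyConfiguration` values. The Apify residential form is taken from the input example in this document; the custom-URL form assumes this Actor follows the standard Apify proxy input shape (`useApifyProxy: false` plus a `proxyUrls` list), which this README does not spell out. `proxy_configuration` is a hypothetical helper.

```python
def proxy_configuration(custom_proxy_url=None):
    """Return a proxyConfiguration dict for the Actor input.

    With no argument, use Apify residential proxies (paid plans).
    With a URL, use your own proxy (assumed standard `proxyUrls` field).
    """
    if custom_proxy_url:
        return {"useApifyProxy": False, "proxyUrls": [custom_proxy_url]}
    return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
```

For example, `proxy_configuration("http://user:password@proxy.example.com:8000")` builds the free-plan variant; the URL here is a placeholder, not a real endpoint.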

# Actor input Schema

## `urls` (type: `array`):

List of YouTube video URLs to extract transcripts from. Supports standard (youtube.com/watch?v=), short (youtu.be/), and embed (youtube.com/embed/) URL formats.

## `language` (type: `string`):

Preferred transcript language code (e.g., 'en', 'es', 'fr', 'de', 'ja'). Falls back to available language if preferred is not found.

## `includeTimestamps` (type: `boolean`):

Include timed segments with start time and duration for each text segment.

## `includeAutoGenerated` (type: `boolean`):

Fall back to auto-generated captions if manual captions are not available.
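
The `language` and `includeAutoGenerated` options interact: one prefers a language, the other controls whether auto-generated tracks are eligible at all. The sketch below shows one plausible selection order (preferred language before other languages, manual before auto-generated within each). The Actor's actual precedence is not documented here, so treat the ordering, the track dict shape, and the `pick_track` name as assumptions.

```python
def pick_track(tracks, preferred="en", include_auto_generated=True):
    """Choose a caption track from dicts with 'language' and 'isAutoGenerated' keys.

    Returns None when no eligible track exists.
    """
    candidates = [
        t for t in tracks
        if include_auto_generated or not t["isAutoGenerated"]
    ]
    if not candidates:
        return None
    # Lower sort key wins: preferred language first, then manual before auto-generated.
    return min(candidates, key=lambda t: (t["language"] != preferred, t["isAutoGenerated"]))
```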

## `outputFormat` (type: `string`):

Format for the transcript text. 'text' = plain text, 'srt' = SubRip subtitle format, 'vtt' = WebVTT subtitle format, 'json' = raw JSON segments.

## `maxVideos` (type: `integer`):

Maximum number of videos to process from the URL list.

## `proxyConfiguration` (type: `object`):

Residential proxy is recommended — YouTube blocks datacenter IPs. Defaults to Apify residential proxy (requires paid Apify plan). Free-plan users can provide their own residential proxy URL.

## Actor input object example

```json
{
  "urls": [
    "https://www.youtube.com/watch?v=UF8uR6Z6KLc"
  ],
  "language": "en",
  "includeTimestamps": true,
  "includeAutoGenerated": true,
  "outputFormat": "text",
  "maxVideos": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# Actor output Schema

## `results` (type: `string`):

YouTube video transcripts with segments, timestamps, metadata, channel info, and word counts.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("junipr/youtube-transcript-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("junipr/youtube-transcript-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call junipr/youtube-transcript-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=junipr/youtube-transcript-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Extractor",
        "description": "Extract YouTube transcripts in text, SRT, VTT, or JSON. Auto-generated and manual captions in any language. Video metadata: title, channel, views, duration. Innertube API with residential proxy. Batch videos.",
        "version": "1.0",
        "x-build-id": "bjOcybhePCapEQTaB"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/junipr~youtube-transcript-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-junipr-youtube-transcript-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/junipr~youtube-transcript-extractor/runs": {
            "post": {
                "operationId": "runs-sync-junipr-youtube-transcript-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/junipr~youtube-transcript-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-junipr-youtube-transcript-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "YouTube Video URLs",
                        "type": "array",
                        "description": "List of YouTube video URLs to extract transcripts from. Supports standard (youtube.com/watch?v=), short (youtu.be/), and embed (youtube.com/embed/) URL formats.",
                        "items": {
                            "type": "string"
                        },
                        "default": [
                            "https://www.youtube.com/watch?v=UF8uR6Z6KLc"
                        ]
                    },
                    "language": {
                        "title": "Preferred Language",
                        "type": "string",
                        "description": "Preferred transcript language code (e.g., 'en', 'es', 'fr', 'de', 'ja'). Falls back to available language if preferred is not found.",
                        "default": "en"
                    },
                    "includeTimestamps": {
                        "title": "Include Timestamps",
                        "type": "boolean",
                        "description": "Include timed segments with start time and duration for each text segment.",
                        "default": true
                    },
                    "includeAutoGenerated": {
                        "title": "Include Auto-Generated Captions",
                        "type": "boolean",
                        "description": "Fall back to auto-generated captions if manual captions are not available.",
                        "default": true
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "text",
                            "srt",
                            "vtt",
                            "json"
                        ],
                        "type": "string",
                        "description": "Format for the transcript text. 'text' = plain text, 'srt' = SubRip subtitle format, 'vtt' = WebVTT subtitle format, 'json' = raw JSON segments.",
                        "default": "text"
                    },
                    "maxVideos": {
                        "title": "Max Videos",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of videos to process from the URL list.",
                        "default": 50
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Residential proxy is recommended — YouTube blocks datacenter IPs. Defaults to Apify residential proxy (requires paid Apify plan). Free-plan users can provide their own residential proxy URL.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
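
Where no Apify client is available, the `run-sync-get-dataset-items` path from the specification above can be called directly over HTTP. The sketch below uses only the Python standard library; the URL, the `token` query parameter, and the JSON request body come from the specification, while the function names and the 300-second timeout are arbitrary choices for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/junipr~youtube-transcript-extractor/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build the POST request for the synchronous run endpoint."""
    url = f"{BASE_URL}{ACTOR_PATH}?token={quote(token, safe='')}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def run_sync(token, run_input, timeout=300):
    """Send the request and return the dataset items as parsed JSON."""
    with urlopen(build_request(token, run_input), timeout=timeout) as response:
        return json.load(response)
```

Calling `run_sync("<YOUR_API_TOKEN>", {"urls": ["https://www.youtube.com/watch?v=UF8uR6Z6KLc"]})` starts a run and blocks until the items are returned, so it suits small batches; for longer runs the asynchronous `/runs` path in the specification is the alternative.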
