# YouTube Transcript Scraper (`taroyamada/youtube-transcript-bulk-api`) Actor

Extract YouTube captions, timestamps, SRT, VTT, and plain text from public videos in bulk without browser automation.

- **URL**: https://apify.com/taroyamada/youtube-transcript-bulk-api.md
- **Developed by:** [太郎 山田](https://apify.com/taroyamada) (community)
- **Categories:** Videos, AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Bulk API

Extract transcripts from public YouTube videos in bulk. The Actor is built for AI pipelines, content repurposing, subtitle export, research, and searchable video archives.

### What It Does

You provide YouTube video URLs or direct video IDs. The Actor fetches the public YouTube watch page, reads the available caption tracks, selects the best matching language, downloads the timed transcript XML, and returns one dataset row per video.
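The track-selection step can be sketched roughly as follows. This is an illustration, not the Actor's actual code: the `CaptionTrack` shape and the `select_track` helper are hypothetical, and the priority order (manual track in the preferred language, then an auto-generated track in that language, then the first usable track) is an assumption inferred from the input descriptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CaptionTrack:
    """Hypothetical shape for one caption track found on a watch page."""
    language: str
    is_auto_generated: bool


def select_track(
    tracks: Sequence[CaptionTrack],
    language: str = "en",
    include_auto_generated: bool = True,
) -> Optional[CaptionTrack]:
    """Pick a track: manual match, then auto-generated match, then first usable."""
    # Drop auto-generated tracks entirely when the caller disallows them.
    usable = [t for t in tracks if include_auto_generated or not t.is_auto_generated]
    # First pass looks for a manual track, second pass for an auto-generated one.
    for want_auto in (False, True):
        for track in usable:
            if track.language == language and track.is_auto_generated == want_auto:
                return track
    # Assumed fallback: the first track that passed the filter, if any.
    return usable[0] if usable else None
```

Returning `None` when nothing is usable leaves it to the caller to emit an error row for that video.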

The launch implementation is HTTP-first and does not use browser automation. That keeps Apify hosting cost low and makes the pricing predictable.

### Input

- `videoUrls`: YouTube watch, Shorts, embed, live, or youtu.be URLs.
- `videoIds`: Direct 11-character YouTube video IDs.
- `language`: Preferred caption language such as `en` or `ja`.
- `includeAutoGenerated`: Allows auto-generated captions when manual captions are not available.
- `translationLanguage`: Optional YouTube transcript translation target.
- `outputFormat`: `json`, `text`, `srt`, or `vtt`.
- `maxVideos`: Maximum number of videos to process in one run.
- `timeoutMs`: Request timeout in milliseconds.
- `dryRun`: Validate input and emit preview rows without fetching YouTube.
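To show how the accepted URL forms map to a single video ID, here is a minimal sketch. The `extract_video_id` helper is hypothetical and is not taken from the Actor's source; it assumes IDs consist of 11 characters from `A-Z`, `a-z`, `0-9`, `_`, and `-`, and it covers only the URL shapes listed above.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed ID alphabet: 11 URL-safe base64 characters.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
# Path prefixes that carry the ID as the next path segment.
PATH_PREFIXES = ("shorts", "embed", "live")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a URL or bare ID, else None."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # already a bare video ID
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be":
        candidate = parts[0] if parts else None
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in PATH_PREFIXES:
            candidate = parts[1]
    # Reject anything that does not look like a video ID.
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

Normalizing to IDs before a run also makes it easy to deduplicate inputs that point at the same video through different URL forms.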

### Output

Each video produces one row:

- `videoId`
- `videoUrl`
- `status`
- `language`
- `sourceLanguage`
- `isAutoGenerated`
- `segmentCount`
- `fullText`
- `segments`
- `formattedTranscript`
- `errorCode`
- `errorMessage`
- `scrapedAt`

Unavailable captions, deleted videos, private videos, and request failures are returned as error rows instead of failing the full run. This follows Apify pay-per-event (PPE) best practice because the Actor still performed work for that input.
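Because failures arrive as ordinary dataset rows, downstream code has to separate them from transcripts itself. A minimal sketch, assuming that an error row carries a non-empty `errorCode` (the exact `status` values are not documented here, so the hypothetical `split_rows` helper does not inspect them):

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_rows(items: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate transcript rows from error rows.

    Assumption: an error row has a non-empty `errorCode`; rows without
    one are treated as successful transcripts.
    """
    ok: List[Row] = []
    failed: List[Row] = []
    for row in items:
        (failed if row.get("errorCode") else ok).append(row)
    return ok, failed
```

The `items` argument can be the list returned by the dataset calls shown in the API section, so failed video IDs can be logged or retried without re-running the successful ones.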

### Pricing

Recommended PPE launch target:

- `apify-actor-start`: keep Apify default `$0.00005`.
- `apify-default-dataset-item`: `$0.0025` per transcript row.
- Optional future enriched/translated event: `$0.008` per enriched row.

The current cost model assumes HTTP requests, no browser, and no residential proxy. Publication should remain blocked if live cost probes show that residential proxy is required.
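For a rough sense of what a run costs under these prices, the arithmetic can be sketched as below. The `estimate_run_cost` helper is hypothetical, and it assumes a single `apify-actor-start` event per run; error rows are dataset items too, so they should be included in the row count.

```python
from decimal import Decimal

# Prices taken from the list above.
ACTOR_START_USD = Decimal("0.00005")  # `apify-actor-start`, assumed once per run
DATASET_ITEM_USD = Decimal("0.0025")  # `apify-default-dataset-item`, per row


def estimate_run_cost(rows: int) -> Decimal:
    """Estimate the charge in USD for a run that emits `rows` dataset rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return ACTOR_START_USD + DATASET_ITEM_USD * rows


print(estimate_run_cost(1000))  # prints 2.50005
```

`Decimal` is used instead of `float` so that sub-cent prices add up exactly.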

### Limits

- Only public videos with public caption tracks are supported.
- Age-restricted, private, deleted, or captionless videos return an error row.
- YouTube may change its watch page payload shape. A canary check should run daily against a known captioned video.
- Channel and playlist expansion is intentionally not part of v1. Add it only after transcript extraction shows a 30-day revenue signal.

### Local Run

```bash
npm test
npm start
````

The default `input.json` uses `dryRun: true` so local startup does not depend on live YouTube access.

# Actor input Schema

## `videoUrls` (type: `array`):

YouTube watch, Shorts, embed, or youtu.be URLs.

## `videoIds` (type: `array`):

Direct 11-character YouTube video IDs.

## `language` (type: `string`):

Preferred caption language code, e.g. en, ja, es. Falls back to the first available track.

## `includeAutoGenerated` (type: `boolean`):

Allow YouTube auto-generated captions when manual captions are unavailable.

## `translationLanguage` (type: `string`):

Optional YouTube translation target language code. Leave blank to return the original caption track.

## `outputFormat` (type: `string`):

Extra formatted transcript field to include with each row.
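As an illustration of what the `srt` format involves, the sketch below renders timed segments as SubRip cues. It is not the Actor's implementation: the segment keys `start`, `duration` (both in seconds), and `text` are assumptions, and both helper names are hypothetical.

```python
from typing import Any, Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Mapping[str, Any]]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = float(seg["start"])
        end = start + float(seg["duration"])  # assumed: duration in seconds
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}\n"
        )
    return "\n".join(cues)
```

WebVTT output would differ mainly in starting with a `WEBVTT` header line and using `.` rather than `,` before the milliseconds.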

## `maxVideos` (type: `integer`):

Maximum number of videos to process in one run.

## `timeoutMs` (type: `integer`):

Request timeout in milliseconds.

## `dryRun` (type: `boolean`):

Validate inputs and emit preview rows without fetching YouTube.

## Actor input object example

```json
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "videoIds": [],
  "language": "en",
  "includeAutoGenerated": true,
  "translationLanguage": "",
  "outputFormat": "json",
  "maxVideos": 100,
  "timeoutMs": 15000,
  "dryRun": false
}
```
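When the input object is assembled in code, it can help to check values against the schema bounds before starting a paid run. A minimal sketch follows; the `build_input` helper is hypothetical, while the limits it enforces (`maxVideos` 1 to 1000, `timeoutMs` 3000 to 60000, and the four `outputFormat` values) and the defaults are taken from the OpenAPI specification further down.

```python
from typing import Any, Dict, List, Optional

OUTPUT_FORMATS = {"json", "text", "srt", "vtt"}


def build_input(
    video_urls: Optional[List[str]] = None,
    video_ids: Optional[List[str]] = None,
    language: str = "en",
    include_auto_generated: bool = True,
    translation_language: str = "",
    output_format: str = "json",
    max_videos: int = 100,
    timeout_ms: int = 15000,
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Build an Actor input object, rejecting values outside the schema bounds."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")
    if not 1 <= max_videos <= 1000:
        raise ValueError("maxVideos must be between 1 and 1000")
    if not 3000 <= timeout_ms <= 60000:
        raise ValueError("timeoutMs must be between 3000 and 60000")
    return {
        "videoUrls": list(video_urls or []),
        "videoIds": list(video_ids or []),
        "language": language,
        "includeAutoGenerated": include_auto_generated,
        "translationLanguage": translation_language,
        "outputFormat": output_format,
        "maxVideos": max_videos,
        "timeoutMs": timeout_ms,
        "dryRun": dry_run,
    }
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API section; setting `dry_run=True` first is a cheap way to confirm the input is accepted.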

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videoUrls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ],
    "videoIds": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/youtube-transcript-bulk-api").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "videoIds": [],
}

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/youtube-transcript-bulk-api").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "videoIds": []
}' |
apify call taroyamada/youtube-transcript-bulk-api --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/youtube-transcript-bulk-api",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper",
        "description": "Extract YouTube captions, timestamps, SRT, VTT, and plain text from public videos in bulk without browser automation.",
        "version": "0.1",
        "x-build-id": "l8UwsSVw6ZC3BW4SS"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~youtube-transcript-bulk-api/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-youtube-transcript-bulk-api",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~youtube-transcript-bulk-api/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-youtube-transcript-bulk-api",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~youtube-transcript-bulk-api/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-youtube-transcript-bulk-api",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "videoUrls": {
                        "title": "YouTube Video URLs",
                        "type": "array",
                        "description": "YouTube watch, Shorts, embed, or youtu.be URLs.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "videoIds": {
                        "title": "Video IDs",
                        "type": "array",
                        "description": "Direct 11-character YouTube video IDs.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "Preferred Caption Language",
                        "type": "string",
                        "description": "Preferred caption language code, e.g. en, ja, es. Falls back to the first available track.",
                        "default": "en"
                    },
                    "includeAutoGenerated": {
                        "title": "Include Auto-generated Captions",
                        "type": "boolean",
                        "description": "Allow YouTube auto-generated captions when manual captions are unavailable.",
                        "default": true
                    },
                    "translationLanguage": {
                        "title": "Translation Language",
                        "type": "string",
                        "description": "Optional YouTube translation target language code. Leave blank to return the original caption track.",
                        "default": ""
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "json",
                            "text",
                            "srt",
                            "vtt"
                        ],
                        "type": "string",
                        "description": "Extra formatted transcript field to include with each row.",
                        "default": "json"
                    },
                    "maxVideos": {
                        "title": "Max Videos",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of videos to process in one run.",
                        "default": 100
                    },
                    "timeoutMs": {
                        "title": "HTTP Timeout",
                        "minimum": 3000,
                        "maximum": 60000,
                        "type": "integer",
                        "description": "Request timeout in milliseconds.",
                        "default": 15000
                    },
                    "dryRun": {
                        "title": "Dry Run",
                        "type": "boolean",
                        "description": "Validate inputs and emit preview rows without fetching YouTube.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
