# YouTube Transcript Scraper - Captions & Auto Subtitles (`nominated_tupelo/yt-transcript-scraper`) Actor

Extract transcripts and subtitles from any YouTube video. Supports auto-generated captions, manual subtitles, multiple languages, and batch video processing. No API key required.

- **URL**: https://apify.com/nominated\_tupelo/yt-transcript-scraper.md
- **Developed by:** [kade](https://apify.com/nominated_tupelo) (community)
- **Categories:** AI, Videos, Developer tools
- **Stats:** 2 total users, 1 monthly users, 66.7% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Scraper — Captions, Subtitles & Auto-Generated Text

Extract the full transcript from any YouTube video in seconds. Supports **auto-generated captions**, **manual subtitles**, **multiple languages**, and **batch processing** of hundreds of videos in one run. No API key. No browser. No login required.

Built for AI/LLM workflows, content research, accessibility tooling, and data science pipelines.

### Why use YouTube Transcript Scraper?

- **AI & LLM ready** — outputs clean full-text transcripts perfect for feeding into ChatGPT, Claude, RAG pipelines, and vector databases
- **Auto-generated captions** — works even when no manual subtitles exist (covers 99% of videos)
- **Multi-language** — specify language priority: `["en", "es", "fr"]` and the actor tries each in order
- **Timestamped segments** — get `{text, start, duration}` arrays for subtitle editors, clipping tools, search indexing
- **Batch mode** — process 1 to 1,000+ videos in a single run
- **Zero auth** — uses YouTube's public caption API, no login or API key needed

### Use cases

- **LLM context extraction** — pull transcript of any video to summarize, translate, or Q&A with an LLM
- **Content research** — analyze what top YouTubers say about a topic across hundreds of videos
- **SEO & keyword mining** — extract spoken content from video for transcription-based SEO
- **Accessibility** — generate caption data for videos missing subtitles
- **Competitive intelligence** — monitor what competitors say in video content
- **Training data** — build speech/NLP datasets from high-quality human-narrated video

### How to use

1. Open the **Input** tab
2. Paste one or more YouTube URLs or video IDs into **Video URLs or IDs**
3. Set your preferred **language** (default: `en`)
4. Choose **output format**: full text (for LLMs), timestamped segments (for editors), or both
5. Click **Start** — transcripts appear in the **Output** tab in seconds

### Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| `videoUrls` | string[] | — | YouTube URLs or video IDs (e.g. `https://youtube.com/watch?v=dQw4w9WgXcQ` or `dQw4w9WgXcQ`) |
| `languages` | string[] | `["en"]` | Language priority list. Tries each in order. Falls back to any available transcript. |
| `includeAutoGenerated` | boolean | `true` | Include YouTube's auto-generated captions if no manual transcript exists |
| `outputFormat` | string | `"both"` | `"full_text"` (single string), `"timestamped"` (segments array), or `"both"` |
| `maxVideos` | integer | `0` | Cap how many videos to process (0 = unlimited) |
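
The `languages` fallback described in the table can be pictured with a small sketch. This is only an illustration of "try each code in order, then fall back to any available transcript"; the function name `pick_language` and the `available` list are hypothetical and do not reflect the Actor's internal code.

```python
from typing import Optional, Sequence


def pick_language(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first preferred code that is available, else any available one."""
    for code in preferred:
        if code in available:
            return code
    # Fallback: no preferred language matched, so take whatever exists.
    return available[0] if available else None


print(pick_language(["en", "es"], ["fr", "es"]))  # es
print(pick_language(["en"], ["ja"]))              # ja
print(pick_language(["en"], []))                  # None
```

How `includeAutoGenerated` interacts with this ordering is not specified in this README, so the sketch leaves it out.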

### Example input

```json
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/jNQXAC9IVRw"
  ],
  "languages": ["en", "en-US"],
  "includeAutoGenerated": true,
  "outputFormat": "both"
}
````
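
If you build the input in code, a small helper can catch typos before a run starts. The following is a minimal sketch: the parameter names and defaults come from the Input table, while the helper `build_input` and its validation rules are assumptions made for illustration.

```python
import json

# Allowed values taken from the Input table above.
ALLOWED_FORMATS = {"full_text", "timestamped", "both"}


def build_input(video_urls, languages=("en",), include_auto_generated=True,
                output_format="both", max_videos=0):
    """Build an input dict for the Actor, rejecting obviously invalid values."""
    if not video_urls:
        raise ValueError("video_urls must contain at least one URL or video ID")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if max_videos < 0:
        raise ValueError("max_videos must be 0 (unlimited) or a positive integer")
    return {
        "videoUrls": list(video_urls),
        "languages": list(languages),
        "includeAutoGenerated": include_auto_generated,
        "outputFormat": output_format,
        "maxVideos": max_videos,
    }


# Serialize the input, e.g. to paste into the Input tab's JSON editor.
print(json.dumps(build_input(["dQw4w9WgXcQ"], languages=["en", "en-US"]), indent=2))
```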

### Output

Each processed video produces one output item:

```json
{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "success": true,
  "language": "en",
  "isAutoGenerated": false,
  "segmentCount": 61,
  "transcript": "We're no strangers to love. You know the rules and so do I...",
  "segments": [
    { "text": "We're no strangers to love.", "start": 18.64, "duration": 3.24 },
    { "text": "You know the rules and so do I.", "start": 22.64, "duration": 4.32 }
  ]
}
```

When a transcript cannot be fetched, `success` is `false` and `error` explains why (transcripts disabled, private video, no captions available, etc.).
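
Because failed videos still produce an output item, downstream code usually needs to separate the two cases. A minimal sketch, assuming items shaped like the example above; the sample data and the helper name `split_results` are illustrative only.

```python
def split_results(items):
    """Split output items into (successful, failed) lists using the `success` flag."""
    ok = [item for item in items if item.get("success")]
    failed = [item for item in items if not item.get("success")]
    return ok, failed


# Illustrative sample items, not real run output.
sample_items = [
    {"videoId": "dQw4w9WgXcQ", "success": True, "language": "en",
     "transcript": "We're no strangers to love."},
    {"videoId": "xxxxxxxxxxx", "success": False, "error": "Transcripts are disabled"},
]

ok, failed = split_results(sample_items)
for item in failed:
    print(f"{item['videoId']}: {item.get('error')}")
```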

### Output fields

| Field | Type | Description |
|---|---|---|
| `videoId` | string | YouTube video ID |
| `videoUrl` | string | Full YouTube URL |
| `success` | boolean | Whether the transcript was fetched successfully |
| `error` | string \| null | Error message if failed |
| `language` | string | Language code of the returned transcript (e.g. `en`, `es`) |
| `isAutoGenerated` | boolean | Whether this is an auto-generated caption or a manual transcript |
| `segmentCount` | integer | Number of caption segments |
| `transcript` | string | Full text joined from all segments (outputFormat: full\_text or both) |
| `segments` | array | Timestamped caption segments (outputFormat: timestamped or both) |
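
If a run used `outputFormat: "timestamped"` and you later need plain text, the segments can be joined client-side. This sketch assumes a single space as the separator, which matches the example output above but is not confirmed as the Actor's exact behavior; `join_segments` is a hypothetical helper.

```python
def join_segments(segments):
    """Join segment texts into one string, skipping empty segments."""
    texts = (seg["text"].strip() for seg in segments)
    return " ".join(text for text in texts if text)


segments = [
    {"text": "We're no strangers to love.", "start": 18.64, "duration": 3.24},
    {"text": "You know the rules and so do I.", "start": 22.64, "duration": 4.32},
]
print(join_segments(segments))
```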

### Supported URL formats

All of these work:

- `https://www.youtube.com/watch?v=dQw4w9WgXcQ`
- `https://youtu.be/dQw4w9WgXcQ`
- `https://www.youtube.com/shorts/dQw4w9WgXcQ`
- `https://www.youtube.com/embed/dQw4w9WgXcQ`
- `dQw4w9WgXcQ` (bare video ID)
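
To deduplicate a list before submitting it, you can normalize these formats to a video ID yourself. The sketch below handles only the formats listed above and assumes IDs are 11 characters from `A-Z`, `a-z`, `0-9`, `_`, and `-`. It is not the Actor's own parser, and `extract_video_id` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 URL-safe characters.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for a supported URL format or bare ID, else None."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value  # bare video ID
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]
    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None


urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "dQw4w9WgXcQ",
]
print({extract_video_id(u) for u in urls})  # {'dQw4w9WgXcQ'}
```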

### Pricing

This Actor uses the **Pay Per Event** model: you pay per video processed, not per run.

- **10 videos**: ~$0.01
- **100 videos**: ~$0.10
- **1,000 videos**: ~$1.00

Transcripts are fetched via YouTube's caption API — no proxies or browsers needed, so runs are fast and cheap.
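
The figures above work out to roughly $0.001 per video. A rough budgeting sketch under that assumption follows; the rate is inferred from the approximate prices listed here, not taken from an official price list, so treat the result as an estimate.

```python
# Assumption: ~$0.001 per video, inferred from "10 videos: ~$0.01" above.
APPROX_USD_PER_VIDEO = 0.001


def estimate_cost_usd(video_count: int) -> float:
    """Return an approximate cost in USD for processing `video_count` videos."""
    if video_count < 0:
        raise ValueError("video_count must not be negative")
    return round(video_count * APPROX_USD_PER_VIDEO, 4)


for count in (10, 100, 1000):
    print(count, "videos ->", estimate_cost_usd(count), "USD (approx.)")
```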

### Tips

- **For LLMs**: use `outputFormat: "full_text"` — clean text with no timestamps, ready to paste into any AI prompt
- **For subtitle editing**: use `outputFormat: "timestamped"` — each segment has `start` and `duration` in seconds
- **Multi-language**: set `languages: ["en", "auto"]`. Because `auto` is not a real language code, it triggers the fallback to auto-generated captions
- **Batch processing**: dump a list of 100 video IDs and run once — much cheaper than 100 separate runs
- **Private/unavailable videos**: these return `success: false` with a descriptive error, and the run still completes cleanly

### FAQ

**Does this work without a YouTube account?** Yes. No login or API key required.

**Does it work with auto-generated captions (videos with no manual subtitles)?** Yes — set `includeAutoGenerated: true` (default).

**What if the video has no captions at all?** The Actor returns `success: false` with `error: "Transcripts are disabled"` and continues to the next video.

**Can I get transcripts in Spanish, French, Japanese, etc.?** Yes — set `languages: ["es"]` or whatever language you need. If no Spanish transcript exists it falls back through your priority list or to any available transcript.

**Is this against YouTube's Terms of Service?** This Actor uses YouTube's public caption API endpoints. Transcripts are publicly accessible and indexed by search engines. Always ensure your use case complies with applicable data protection laws.

**Something not working?** Open an issue on the Issues tab with the video ID and error message.

# Actor input Schema

## `videoUrls` (type: `array`):

YouTube video URLs or video IDs to extract transcripts from. Accepts full URLs (https://youtube.com/watch?v=...) or plain video IDs (dQw4w9WgXcQ).

## `languages` (type: `array`):

Language codes to prefer, in priority order (e.g. \['en', 'es', 'fr']). If not found, falls back to any available transcript.

## `includeAutoGenerated` (type: `boolean`):

Whether to include YouTube's auto-generated captions when manual transcripts are unavailable.

## `outputFormat` (type: `string`):

How to format the transcript output.

## `maxVideos` (type: `integer`):

Maximum number of videos to process (0 = no limit). Useful for large channel exports.

## Actor input object example

```json
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "languages": [
    "en"
  ],
  "includeAutoGenerated": true,
  "outputFormat": "both",
  "maxVideos": 0
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videoUrls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("nominated_tupelo/yt-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"] }

# Run the Actor and wait for it to finish
run = client.actor("nominated_tupelo/yt-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
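
In a larger project it can help to wrap the calls above in one function that returns transcripts and errors separately. The following is a sketch, assuming the output fields documented in the README (`success`, `videoId`, `transcript`, `error`); `fetch_transcripts` is a hypothetical helper, and the client is passed in so the function can be exercised with a stub.

```python
ACTOR_ID = "nominated_tupelo/yt-transcript-scraper"


def fetch_transcripts(client, video_urls, languages=("en",)):
    """Run the Actor and return ({videoId: transcript}, {videoId: error})."""
    run_input = {
        "videoUrls": list(video_urls),
        "languages": list(languages),
        "outputFormat": "full_text",
    }
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")

    transcripts, errors = {}, {}
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        if item.get("success"):
            transcripts[item["videoId"]] = item.get("transcript", "")
        else:
            errors[item["videoId"]] = item.get("error")
    return transcripts, errors
```

To use it, create the client as in the example above (`client = ApifyClient("<YOUR_API_TOKEN>")`) and call `fetch_transcripts(client, ["dQw4w9WgXcQ"])`.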

## CLI example

```bash
echo '{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ]
}' |
apify call nominated_tupelo/yt-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=nominated_tupelo/yt-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper - Captions & Auto Subtitles",
        "description": "Extract transcripts and subtitles from any YouTube video. Supports auto-generated captions, manual subtitles, multiple languages, and batch video processing. No API key required.",
        "version": "1.0",
        "x-build-id": "gAriPRCVRlmniUkLb"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/nominated_tupelo~yt-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-nominated_tupelo-yt-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/nominated_tupelo~yt-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-nominated_tupelo-yt-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/nominated_tupelo~yt-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-nominated_tupelo-yt-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "videoUrls": {
                        "title": "Video URLs or IDs",
                        "type": "array",
                        "description": "YouTube video URLs or video IDs to extract transcripts from. Accepts full URLs (https://youtube.com/watch?v=...) or plain video IDs (dQw4w9WgXcQ).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "languages": {
                        "title": "Preferred Languages",
                        "type": "array",
                        "description": "Language codes to prefer, in priority order (e.g. ['en', 'es', 'fr']). If not found, falls back to any available transcript.",
                        "default": [
                            "en"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeAutoGenerated": {
                        "title": "Include Auto-Generated Captions",
                        "type": "boolean",
                        "description": "Whether to include YouTube's auto-generated captions when manual transcripts are unavailable.",
                        "default": true
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "full_text",
                            "timestamped",
                            "both"
                        ],
                        "type": "string",
                        "description": "How to format the transcript output.",
                        "default": "both"
                    },
                    "maxVideos": {
                        "title": "Max Videos",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of videos to process (0 = no limit). Useful for large channel exports.",
                        "default": 0
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
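
Without a client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called with the Python standard library. A minimal sketch that only builds the request: the URL, `token` query parameter, and JSON body follow the spec, while the helper name `build_sync_request` is an assumption.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "nominated_tupelo~yt-transcript-scraper"


def build_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<YOUR_API_TOKEN>", {"videoUrls": ["dQw4w9WgXcQ"]})
print(request.get_method(), request.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(request)` and decoding the response. The spec documents the `200` response only as "OK", so check the body's shape before relying on it.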
