# YouTube Transcript Scraper Goat (`goat255/youtube-transcript-scraper`) Actor

Extract transcripts and captions from public YouTube videos in bulk. Returns timed segments (start, duration, text), the language, whether captions are auto-generated, and an optional joined plain-text blob. Accepts watch URLs, youtu.be links, Shorts, or raw video ids. No login or cookies.

- **URL**: https://apify.com/goat255/youtube-transcript-scraper.md
- **Developed by:** [Goutam Soni](https://apify.com/goat255) (community)
- **Categories:** Social media, Videos, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage it consumes, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Scraper

Extract transcripts and captions from public YouTube videos in bulk. Give it a list of videos and get back clean transcript text for each one. No login, no cookies, no browser automation.

### What it does

For every video you provide, the Actor returns:

- The transcript text, as ordered timed segments where available (each with a start time, duration, and text).
- A single joined plain-text version of the whole transcript (optional).
- The language the transcript was served in.
- Whether the captions are auto-generated or human-written.
- A flag telling you whether the result includes per-segment timing.
- A segment count for quick scanning.

It handles long videos correctly. A multi-hour video returns its complete transcript (thousands of segments), not just the first chunk. A tiered fetch keeps results coming even when one path is temporarily unavailable.

### Input

| Field | Type | Description |
|---|---|---|
| `videos` | array | List of YouTube videos. Accepts watch URLs, short `youtu.be` links, Shorts URLs, or raw 11-character video ids. Mix formats freely. |
| `preferredLanguages` | array | Ordered list of language codes to prefer (for example `en`, `es`, `de`, `ja`). The first available match wins. Default `["en"]`. |
| `includeTimestamps` | boolean | Include `start` and `duration` per segment. Default `true`. |
| `combineToPlainText` | boolean | Also emit a single `fullText` field with the whole transcript joined into one string. Default `false`. |
| `concurrency` | integer | Number of videos processed in parallel. Default `5`. |
| `proxyConfiguration` | object | Optional proxy. Transcripts work cookielessly without one for typical volumes. |

#### Example input

```json
{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "dQw4w9WgXcQ"
  ],
  "preferredLanguages": ["en", "es"],
  "includeTimestamps": true,
  "combineToPlainText": false
}
````
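The example input passes the same video in three formats. A minimal client-side sketch for normalizing and deduplicating such a list before submitting it is shown below. The helper names `extract_video_id` and `dedupe_videos` are hypothetical, and the URL patterns handled here are an assumption about common YouTube link shapes, not a description of how the Actor parses its input.

```python
import re
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# A raw YouTube video id is 11 characters from this alphabet.
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character id from a watch URL, youtu.be link,
    Shorts URL, or raw id; return None if nothing recognizable is found."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            candidate = parsed.path.split("/")[2]
    return candidate if candidate and _ID_RE.match(candidate) else None


def dedupe_videos(values: List[str]) -> List[str]:
    """Keep the first occurrence of each video, preserving input order.
    Unrecognized entries are kept as-is and compared by their raw string."""
    seen = set()
    unique = []
    for value in values:
        key = extract_video_id(value) or value
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique
```

Deduplicating up front is optional; it only avoids spending platform usage on repeated inputs.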

### Output

One dataset item per video. Fields are returned in a clean, predictable order.

```json
{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Example Video Title",
  "language": "en",
  "isAutoGenerated": false,
  "isTimed": true,
  "segmentCount": 61,
  "segments": [
    { "start": 18.64, "duration": 3.24, "text": "We're no strangers to love" },
    { "start": 22.64, "duration": 4.32, "text": "You know the rules and so do I" }
  ]
}
```

When `combineToPlainText` is enabled, each row also includes a `fullText` field with the entire transcript joined into one string.
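Timed segments map naturally onto subtitle formats. The sketch below converts a row's `segments` array into SRT text. It assumes `start` and `duration` are seconds, as the input schema describes; the function names `format_timestamp` and `segments_to_srt` are hypothetical and not part of the Actor.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text from segments that carry start, duration, and text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["start"] + seg["duration"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)
```

This only applies to rows where `isTimed` is `true` and `includeTimestamps` was left enabled.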

#### Timed vs plain-text results

The `isTimed` flag tells you what kind of result you got:

- `isTimed: true` means the transcript came back with per-segment timing. The `segments` array has `start` and `duration` for each line.
- `isTimed: false` means only the transcript text was available. The full text is returned in `fullText` (and as a single text block in `segments`), with no per-line timing.

The row shape is identical either way, so you can process every row the same.
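Because the row shape is the same in both cases, one helper can return the transcript text regardless of `isTimed`. This is a minimal sketch: `row_text` is a hypothetical name, and joining segment texts with a single space is an assumption, not necessarily how the Actor builds `fullText`.

```python
from typing import Dict


def row_text(row: Dict) -> str:
    """Return the transcript text for a row, timed or not.

    Prefers fullText when present; otherwise joins the segment texts.
    """
    full_text = row.get("fullText")
    if full_text:
        return full_text
    return " ".join(seg["text"] for seg in row.get("segments", []))
```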

#### Videos without a transcript

If a video has no captions available, the Actor returns a clean row with an `error` field set to `no_transcript_available` and an empty `segments` array, rather than failing the run. This keeps a bulk run intact even when a few inputs lack captions.
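When post-processing a bulk run, it can help to separate rows with transcripts from rows that carry an `error`. A minimal sketch follows; `split_rows` is a hypothetical helper, and it treats any non-empty `error` value as a failure.

```python
from typing import Dict, List, Tuple


def split_rows(items: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition dataset items into (rows with a transcript, rows with an error)."""
    ok, failed = [], []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed
```

The failed list can then be logged or retried later with different `preferredLanguages`.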

### Use cases

- Summarize or analyze video content with your own tools.
- Build searchable archives of a channel's spoken content.
- Feed transcripts into translation or repurposing workflows.
- Power research and content analysis across many videos at once.

### Notes

- Works on public videos that have captions (manual or auto-generated).
- Private, age-restricted, or members-only videos cannot be transcribed.
- Language selection prefers a human-written transcript over an auto-generated one in the same language.
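The language preference rule can be illustrated with a short sketch. This is not the Actor's implementation: the track shape (`language`, `isAutoGenerated`) and the function name `pick_track` are assumptions made for illustration, and the fallback when no preferred language matches is assumed to favor manual tracks.

```python
from typing import Dict, List, Optional


def pick_track(tracks: List[Dict], preferred: List[str]) -> Optional[Dict]:
    """Pick a caption track: the first preferred language that exists wins,
    and within a language a manual track beats an auto-generated one."""
    for lang in preferred:
        matches = [t for t in tracks if t["language"] == lang]
        if matches:
            # False sorts before True, so manual tracks come first.
            return min(matches, key=lambda t: t["isAutoGenerated"])
    if not tracks:
        return None
    # Assumed fallback: any manual track, else the first auto-generated one.
    return min(tracks, key=lambda t: t["isAutoGenerated"])
```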

### FAQ

#### Do I need a YouTube API key or login?

No. This Actor extracts public transcripts without any API key, login, or cookies.

#### Which videos work?

Any public YouTube video that has captions or auto-generated transcripts available.

#### What happens if a video has no transcript?

The Actor reports that cleanly for that video and continues with the rest of your list.

#### Can I get transcripts in a specific language?

Yes. Set `preferredLanguages` to an ordered list of language codes. When a video provides multiple caption tracks, the first available match is returned.

#### Can I get timestamps or one plain-text block?

Both. Use `includeTimestamps` for per-line timing, or `combineToPlainText` for a single clean transcript string.

### Related Actors

Part of the scraper suite by goat255:

- [YouTube Channel Scraper](https://apify.com/goat255/youtube-channel-scraper) - a channel's videos with metadata.
- [YouTube Comments Scraper](https://apify.com/goat255/youtube-comments-scraper) - comments from any video.

# Actor input Schema

## `videos` (type: `array`):

List of YouTube videos. Mix formats freely: a watch URL (https://www.youtube.com/watch?v=dQw4w9WgXcQ), a short link (https://youtu.be/dQw4w9WgXcQ), a Shorts URL, or a raw 11-character video id (dQw4w9WgXcQ).

## `preferredLanguages` (type: `array`):

Ordered list of language codes to prefer (e.g. en, es, de, ja, pt). The first available match wins. A manual transcript is preferred over an auto-generated one in the same language. If none match, the best available track is used.

## `includeTimestamps` (type: `boolean`):

When true, each segment includes its start time and duration in seconds. When false, segments contain text only.

## `combineToPlainText` (type: `boolean`):

When true, also emit a single fullText field with the whole transcript joined into one string. Handy for feeding into summarizers or search.

## `concurrency` (type: `integer`):

Maximum number of videos processed in parallel.

## `proxyConfiguration` (type: `object`):

Optional proxy. Transcripts are fetched cookielessly and work without a proxy for typical volumes. Add a proxy if you scrape at high volume from one IP.

## Actor input object example

```json
{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "preferredLanguages": [
    "en"
  ],
  "includeTimestamps": true,
  "combineToPlainText": false,
  "concurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
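A small builder can catch invalid input before a run is started. The sketch below mirrors the documented defaults and the `concurrency` bounds (1 to 20) from the OpenAPI specification; the function name `build_input` and its parameter names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_input(
    videos: List[str],
    preferred_languages: Sequence[str] = ("en",),
    include_timestamps: bool = True,
    combine_to_plain_text: bool = False,
    concurrency: int = 5,
) -> Dict:
    """Build an Actor input object, validating the documented constraints."""
    if not videos:
        raise ValueError("videos is required and must not be empty")
    if not 1 <= concurrency <= 20:
        raise ValueError("concurrency must be between 1 and 20")
    return {
        "videos": list(videos),
        "preferredLanguages": list(preferred_languages),
        "includeTimestamps": include_timestamps,
        "combineToPlainText": combine_to_plain_text,
        "concurrency": concurrency,
    }
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API section.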

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videos": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ],
    "preferredLanguages": [
        "en"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("goat255/youtube-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "videos": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "preferredLanguages": ["en"],
}

# Run the Actor and wait for it to finish
run = client.actor("goat255/youtube-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videos": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "preferredLanguages": [
    "en"
  ]
}' |
apify call goat255/youtube-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=goat255/youtube-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper Goat",
        "description": "Extract transcripts and captions from public YouTube videos in bulk. Returns timed segments (start, duration, text), the language, whether captions are auto-generated, and an optional joined plain-text blob. Accepts watch URLs, youtu.be links, Shorts, or raw video ids. No login or cookies.",
        "version": "0.0",
        "x-build-id": "hjunBaUShtKuOHVFT"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/goat255~youtube-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-goat255-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/goat255~youtube-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-goat255-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/goat255~youtube-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-goat255-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "videos"
                ],
                "properties": {
                    "videos": {
                        "title": "YouTube Videos (URLs or IDs)",
                        "type": "array",
                        "description": "List of YouTube videos. Mix formats freely: a watch URL (https://www.youtube.com/watch?v=dQw4w9WgXcQ), a short link (https://youtu.be/dQw4w9WgXcQ), a Shorts URL, or a raw 11-character video id (dQw4w9WgXcQ).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "preferredLanguages": {
                        "title": "Preferred Languages",
                        "type": "array",
                        "description": "Ordered list of language codes to prefer (e.g. en, es, de, ja, pt). The first available match wins. A manual transcript is preferred over an auto-generated one in the same language. If none match, the best available track is used.",
                        "default": [
                            "en"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeTimestamps": {
                        "title": "Include timestamps",
                        "type": "boolean",
                        "description": "When true, each segment includes its start time and duration in seconds. When false, segments contain text only.",
                        "default": true
                    },
                    "combineToPlainText": {
                        "title": "Add joined plain text",
                        "type": "boolean",
                        "description": "When true, also emit a single fullText field with the whole transcript joined into one string. Handy for feeding into summarizers or search.",
                        "default": false
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of videos processed in parallel.",
                        "default": 5
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional proxy. Transcripts are fetched cookielessly and work without a proxy for typical volumes. Add a proxy if you scrape at high volume from one IP.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
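For projects that call the REST API directly, the `run-sync-get-dataset-items` endpoint from the specification above can be reached with the Python standard library alone. This is a minimal sketch: the helper names and the 300-second timeout are assumptions, and it assumes the response body is a JSON array of dataset items.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "goat255~youtube-transcript-scraper"


def build_run_sync_url(token: str) -> str:
    """Build the endpoint URL, passing the token as a query parameter."""
    query = urlencode({"token": token})
    return f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"


def run_sync(token: str, run_input: Dict, timeout: float = 300.0) -> List[Dict]:
    """POST the input, wait for the run to finish, and return the dataset items."""
    request = Request(
        build_run_sync_url(token),
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `run_sync("<YOUR_API_TOKEN>", {"videos": ["dQw4w9WgXcQ"]})` would block until the run completes, so long batches may be better served by the asynchronous `/runs` endpoint.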
