# YouTube Transcript Scraper (`happy_b/youtube-transcript-scraper`) Actor

Extract YouTube video transcripts with timestamps, word counts, and full video metadata.

- **URL:** https://apify.com/happy_b/youtube-transcript-scraper.md
- **Developed by:** [Happy B](https://apify.com/happy_b) (community)
- **Categories:** Social media, Videos
- **Stats:** 3 total users, 2 monthly users, 75.0% runs succeeded, 2 bookmarks
- **User rating:** 5.00 out of 5 stars

## Pricing

Pay per usage

This Actor is billed by platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### What is YouTube Transcript Scraper?

YouTube Transcript Scraper extracts complete transcripts from any public YouTube video — with **timestamped segments, word counts, and full video metadata** in one flat row.

Every transcript comes with exact view counts, ISO 8601 dates, tags, and categories. No post-processing needed.

#### Data points extracted per video

| Field | Description | Example |
|-------|-------------|---------|
| `transcriptText` | Full plain text transcript | `Apache Spark is an open-source data analytics engine...` |
| `transcriptSegments` | Timestamped segments (JSON) | `[{"start":0.32,"duration":4.08,"text":"Apache Spark..."}]` |
| `transcriptLanguage` | Actual language returned | `en` |
| `transcriptWordCount` | Word count of full text | `1847` |
| `transcriptAvailable` | Whether captions exist | `true` |
| `videoId` | YouTube video ID | `dQw4w9WgXcQ` |
| `title` | Video title | `Rick Astley - Never Gonna Give You Up` |
| `publishedAt` | ISO 8601 upload date | `2009-10-25T06:57:33Z` |
| `viewCount` | Exact view count | `1500000000` |
| `likeCount` | Exact like count | `15000000` |
| `commentCount` | Exact comment count | `3200000` |
| `duration` | ISO 8601 duration | `PT3M33S` |
| `durationSeconds` | Duration in seconds | `213` |
| `tags` | Video tags | `rick astley,never gonna give you up` |
| `categoryId` | YouTube category ID | `10` |
| `categoryName` | Human-readable category | `Music` |
| `thumbnailUrl` | Video thumbnail | `https://i.ytimg.com/vi/.../maxresdefault.jpg` |
| `channelName` | Channel name | `Rick Astley` |
| `channelId` | Channel ID | `UCuAXFkgsw1L7xaCfnd5JJOw` |
| `scrapeTimestamp` | ISO 8601 time the row was scraped | `2026-04-02T12:00:00Z` |

**20 fields per video. Transcript + metadata in one row.**
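
The `duration` and `durationSeconds` fields carry the same value in two forms. As a quick sanity check, the sketch below converts an ISO 8601 duration such as `PT3M33S` into seconds. The helper name `iso8601_duration_to_seconds` is hypothetical, it only handles day, hour, minute, and second designators, and it is not the Actor's own implementation.

```python
import re

# Matches durations like "PT3M33S", "PT1H2M", or "P1DT2H".
# Years, months, and weeks are deliberately not handled in this sketch.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(duration: str) -> int:
    """Convert an ISO 8601 duration string to a whole number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None or duration in ("P", "PT"):
        raise ValueError(f"Unsupported ISO 8601 duration: {duration!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(iso8601_duration_to_seconds("PT3M33S"))  # 213, matching durationSeconds in the table
```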

### Why use this scraper?

#### Our Actor vs Top YouTube Transcript Actors

| Feature | Us | Pinto Studio | StarVibe | Karamelo |
|---------|:--:|:------------:|:--------:|:--------:|
| Bulk video URLs | ✅ | ❌ 1 only | ❌ 1 only | ✅ |
| Timestamped segments | ✅ | ✅ | ✅ | ✅ |
| Plain text output | ✅ | ❌ | ✅ | ✅ |
| Word count | ✅ | ❌ | ❌ | ❌ |
| Language selection | ✅ | ✅ | ✅ | ❌ |
| `viewCount` exact integer | ✅ | ❌ | ❌ abbreviated | ❌ abbreviated |
| `likeCount` exact integer | ✅ | ❌ | ❌ | ❌ |
| `publishedAt` ISO 8601 | ✅ | ❌ | ✅ | ✅ |
| `tags` | ✅ | ❌ | ❌ | ✅ keywords |
| `categoryId` + `categoryName` | ✅ | ❌ | ❌ | ❌ |
| `durationSeconds` integer | ✅ | ❌ | ❌ | ❌ |

- **One row, full picture** — transcript + video metadata in one flat CSV row. No second API call needed.
- **Incremental delivery** — Results appear in your dataset as each video is processed.
- **From $5.00 per 1,000 transcripts** — Volume discounts down to $3.00 on the Business plan.

### Use cases

- **Content repurposing** — Turn video content into blog posts, newsletters, and social media. The full text is ready for editing, and the word count tells you the article length.
- **AI/ML training data** — Feed structured transcripts with metadata into classification, embedding, or fine-tuning pipelines. Tags and categories provide free labels.
- **SEO optimization** — Extract keyword-rich transcript text to create written content that boosts organic search rankings.
- **Academic research** — Build corpora for communication studies, discourse analysis, and media research. Exact timestamps enable precise citation.
- **Accessibility** — Generate subtitle files from timestamped segments for videos that lack proper captions.
- **Competitive analysis** — Analyze what competitors talk about, how long their content is, and which topics get the most engagement.

### How much does it cost?

Each video counts as one item, whether or not a transcript is available.

| Plan | Price per 1,000 items |
|------|----------------------|
| Free | $5.00 |
| Starter | $4.00 |
| Scale | $3.50 |
| Business | $3.00 |

| Scenario | Items | Cost (Free plan) |
|----------|-------|------|
| 5 videos | 5 | $0.025 |
| 50 videos | 50 | $0.25 |
| 500 videos | 500 | $2.50 |
| 1,000 videos | 1,000 | $5.00 |

Apify also charges a small compute-unit (CU) cost for the Actor's runtime, typically under $0.01 for most runs.
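
The per-item arithmetic behind these tables can be sketched as follows. The prices are copied from the plan table above; the function name `estimate_cost` is hypothetical, and the result excludes the platform compute cost.

```python
# Per-1,000-item prices in USD, copied from the plan table above.
PRICE_PER_1000 = {
    "Free": 5.00,
    "Starter": 4.00,
    "Scale": 3.50,
    "Business": 3.00,
}


def estimate_cost(items: int, plan: str = "Free") -> float:
    """Estimate the per-item charge in USD; platform compute cost is not included."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return round(items * PRICE_PER_1000[plan] / 1000, 4)


for plan in PRICE_PER_1000:
    print(f"{plan}: ${estimate_cost(500, plan):.2f} for 500 videos")
```

Because every video counts as one item, `items` is the number of input videos, not the number of transcripts actually found.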

### Input

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `videoUrls` | string[] | *required* | YouTube video URLs or bare video IDs. Supports youtube.com/watch, youtu.be, /shorts/, /embed/, /live/ formats. |
| `language` | string | `en` | Preferred transcript language (ISO 639-1). Falls back to auto-generated captions if manual captions are not available. |
| `includeTimestamps` | boolean | `true` | Include timestamped segments in `transcriptSegments`. Disable for plain text only. |
| `includeVideoMetadata` | boolean | `true` | Attach video metadata (title, views, likes, tags, category) to each row. |

### Output example

Each item in the dataset is a single video:

```json
{
  "transcriptText": "We're no strangers to love You know the rules and so do I...",
  "transcriptSegments": "[{\"start\":0.0,\"duration\":3.12,\"text\":\"We're no strangers to love\"},{\"start\":3.12,\"duration\":4.56,\"text\":\"You know the rules and so do I\"}]",
  "transcriptLanguage": "en",
  "transcriptWordCount": 254,
  "transcriptAvailable": true,
  "videoId": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Music Video)",
  "publishedAt": "2009-10-25T06:57:33Z",
  "viewCount": 1500000000,
  "likeCount": 15000000,
  "commentCount": 3200000,
  "duration": "PT3M33S",
  "durationSeconds": 213,
  "tags": "rick astley,never gonna give you up,official music video",
  "categoryId": 10,
  "categoryName": "Music",
  "thumbnailUrl": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
  "channelName": "Rick Astley",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "scrapeTimestamp": "2026-04-02T12:00:00Z"
}
````

When `transcriptAvailable` is `false`, the video has no captions — `transcriptText` and `transcriptSegments` will be empty, but video metadata is still populated.
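
In the example above, `transcriptSegments` is a JSON-encoded string rather than a nested array, so it needs to be decoded before use. The sketch below shows one way to do that while handling the no-captions case. The helper `parse_segments` is hypothetical, and treating an empty value as "no segments" when `includeTimestamps` is disabled is an assumption.

```python
import json
from typing import Any, Dict, List


def parse_segments(item: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the item's timestamped segments as a list, or [] when none are present."""
    if not item.get("transcriptAvailable"):
        return []  # no captions: transcriptText and transcriptSegments are empty
    raw = item.get("transcriptSegments")
    if not raw:
        return []  # assumption: empty when the run disabled includeTimestamps
    # The output example shows a JSON-encoded string; accept a list too, defensively.
    return json.loads(raw) if isinstance(raw, str) else list(raw)


item = {
    "transcriptAvailable": True,
    "transcriptSegments": "[{\"start\":0.0,\"duration\":3.12,\"text\":\"We're no strangers to love\"}]",
}
for seg in parse_segments(item):
    print(f"{seg['start']:.2f}s  {seg['text']}")
```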

Download your results as **JSON**, **CSV**, **Excel**, **XML**, or **HTML** from the dataset tab, or access them via the [Apify API](https://docs.apify.com/api/v2#/reference/datasets).
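
For scripted exports, the dataset items endpoint of the Apify API takes a `format` query parameter. The sketch below only builds the request URL; `dataset_export_url` is a hypothetical helper, the dataset ID is a placeholder, and authentication with your API token is left out.

```python
from urllib.parse import urlencode

# Maps the export names listed above to the API's `format` values.
EXPORT_FORMATS = {
    "JSON": "json",
    "CSV": "csv",
    "Excel": "xlsx",
    "XML": "xml",
    "HTML": "html",
}


def dataset_export_url(dataset_id: str, export: str = "CSV") -> str:
    """Build the dataset items URL for the given export format."""
    query = urlencode({"format": EXPORT_FORMATS[export]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# "<DATASET_ID>" is a placeholder; use the run's defaultDatasetId.
print(dataset_export_url("<DATASET_ID>", "CSV"))
```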

### Good to know

**Not all videos have transcripts.** Videos without captions (manual or auto-generated) will return `transcriptAvailable: false`. The video metadata is still extracted.

**Language fallback.** If the requested language isn't available, the scraper falls back to auto-generated captions in the closest available language.

**Bulk processing.** Videos are processed in batches for efficiency. Large runs (500+ videos) may take a few minutes.

### Integrations

Connect this Actor to your workflow with [Apify integrations](https://docs.apify.com/platform/integrations):

- **Make (Integromat)** — trigger workflows when new data is available
- **Zapier** — push transcripts to Google Sheets, Slack, or databases
- **GitHub** — store results in repositories
- **Google Drive** — export directly to spreadsheets
- **Webhooks** — notify your API when the run completes
- **Apify API** — programmatic access for custom pipelines

### Legal and personal data

This Actor extracts publicly available data from YouTube. You should ensure your use of the extracted data complies with YouTube's [Terms of Service](https://www.youtube.com/t/terms), applicable data protection laws (GDPR, CCPA), and your jurisdiction's regulations regarding web scraping and data processing.

Transcripts are publicly visible on YouTube when captions are enabled.

### Support

Found a bug or have a feature request? Open an issue on the [Issues tab](https://apify.com/happy_b/youtube-transcript-scraper/issues/open) or contact us through Apify messaging.

# Actor input Schema

## `videoUrls` (type: `array`):

YouTube video URLs or video IDs to extract transcripts from. Supports all URL formats.

## `language` (type: `string`):

Preferred transcript language (ISO 639-1 code). Falls back to auto-generated captions if manual captions are not available.

## `includeTimestamps` (type: `boolean`):

Include segment-level timestamps in the `transcriptSegments` field. Disable for plain text only.

## `includeVideoMetadata` (type: `boolean`):

Attach video metadata (title, views, likes, duration, tags, category) to each row.

## Actor input object example

```json
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "language": "en",
  "includeTimestamps": true,
  "includeVideoMetadata": true
}
```

# Actor output Schema

## `transcripts` (type: `string`):

No description

## `metadata` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videoUrls": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("happy_b/youtube-transcript-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"] }

# Run the Actor and wait for it to finish
run = client.actor("happy_b/youtube-transcript-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ]
}' |
apify call happy_b/youtube-transcript-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=happy_b/youtube-transcript-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper",
        "description": "Extract YouTube video transcripts with timestamps, word counts, and full video metadata.",
        "version": "0.1",
        "x-build-id": "kt6LHkbe4FFUdb5yo"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/happy_b~youtube-transcript-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-happy_b-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/happy_b~youtube-transcript-scraper/runs": {
            "post": {
                "operationId": "runs-sync-happy_b-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/happy_b~youtube-transcript-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-happy_b-youtube-transcript-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "videoUrls"
                ],
                "properties": {
                    "videoUrls": {
                        "title": "Video URLs",
                        "type": "array",
                        "description": "YouTube video URLs or video IDs to extract transcripts from. Supports all URL formats.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "Language",
                        "type": "string",
                        "description": "Preferred transcript language (ISO 639-1 code). Falls back to auto-generated captions if manual not available.",
                        "default": "en"
                    },
                    "includeTimestamps": {
                        "title": "Include Timestamps",
                        "type": "boolean",
                        "description": "Include segment-level timestamps in transcriptSegments field. Disable for plain text only.",
                        "default": true
                    },
                    "includeVideoMetadata": {
                        "title": "Include Video Metadata",
                        "type": "boolean",
                        "description": "Attach video metadata (title, views, likes, duration, tags, category) to each row.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
