# Actor Youtube Transcript (`foudhil/actor-youtube-transcript`) Actor

Extract transcripts from any YouTube video, with no API key needed. Supports batch processing, parallel fetching, auto-retry with residential proxies, and multi-language captions. Output is LLM-ready Markdown, built for RAG pipelines, LangChain, LlamaIndex, and AI automation workflows.

- **URL**: https://apify.com/foudhil/actor-youtube-transcript.md
- **Developed by:** [Foudhil Riahi](https://apify.com/foudhil) (community)
- **Categories:** AI, Developer tools, Integrations
- **Stats:** 2 total users, 0 monthly users, 25.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use, and you only pay for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Transcript Extractor — RAG & AI Ready

Extract clean, structured transcripts from any YouTube video in seconds.
No YouTube API key required. Batch-ready. Built for AI pipelines that need to index video content at scale.

---

### Why developers choose this Actor

| Feature | This Actor |
|---|---|
| API key required | ❌ None needed |
| Batch processing | ✅ Hundreds of videos per run |
| Parallel fetching | ✅ Up to 10 concurrent videos |
| Timestamp support | ✅ Optional `[MM:SS]` per line |
| Multi-language | ✅ Any language, priority order |
| Auto-retry on block | ✅ 3 attempts, fresh proxy each time |
| LLM-ready output | ✅ Clean paragraphs, no post-processing |
| Cloud IP bypass | ✅ Residential proxy routing built-in |

---

### Input

| Field | Type | Required | Description |
|---|---|---|---|
| `videoUrl` | string | one of the two | Single YouTube URL or video ID |
| `videoUrls` | array | one of the two | List of URLs/IDs for batch mode |
| `includeTimestamps` | boolean | no | Prefix each line with `[MM:SS]` (default: false) |
| `languages` | array | no | Language priority list (default: `["en","en-US","en-GB"]`) |
| `maxConcurrency` | integer | no | Parallel videos, 1–10 (default: 3) |
| `proxyConfiguration` | object | recommended | Use Apify Residential proxies to bypass YouTube cloud IP blocks |

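As a rough illustration of how the fields in the table fit together, the sketch below builds an input object in Python. The helper name `build_input` and its parameters are hypothetical; only the field names, defaults, and the 1 to 10 range come from the table above.

```python
from typing import Any, Dict, List, Optional


def build_input(
    videos: List[str],
    include_timestamps: bool = False,
    languages: Optional[List[str]] = None,
    max_concurrency: int = 3,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an Actor input from the documented fields."""
    if not videos:
        raise ValueError("Provide at least one YouTube URL or video ID")

    actor_input: Dict[str, Any] = {
        "includeTimestamps": include_timestamps,
        "languages": languages or ["en", "en-US", "en-GB"],
        # Keep the value inside the documented 1-10 range.
        "maxConcurrency": max(1, min(10, max_concurrency)),
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # A single video goes in `videoUrl`; several go in `videoUrls`.
    if len(videos) == 1:
        actor_input["videoUrl"] = videos[0]
    else:
        actor_input["videoUrls"] = list(videos)
    return actor_input
```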
**Supported URL formats:**
````
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/shorts/VIDEO_ID
https://www.youtube.com/embed/VIDEO_ID
VIDEO_ID                                   ← bare 11-character ID also works
````
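If you want to normalize URLs on your side before sending them, the following is a minimal sketch that covers only the formats listed above. It is not the Actor's own parsing logic; the function name `extract_video_id` and the assumption that IDs use only letters, digits, `_`, and `-` are illustrative.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: a video ID is exactly 11 characters from this alphabet.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID for the URL formats listed above, else None."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value  # bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```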

---

### Output

Each video produces one result object.

**Successful transcript:**
```json
{
  "videoId": "jNQXAC9IVRw",
  "youtubeUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "language": "en",
  "wordCount": 39,
  "durationMinutes": 0.3,
  "segmentCount": 9,
  "transcript": "All right, so here we are in front of the elephants...",
  "status": "success"
}
````

**Failed video (disabled captions, private, etc.):**

```json
{
  "videoId": "abc123",
  "youtubeUrl": "https://www.youtube.com/watch?v=abc123",
  "status": "error",
  "error": "Transcripts are disabled for this video"
}
```

| Field | Description |
|---|---|
| `videoId` | 11-character YouTube video ID |
| `youtubeUrl` | Full YouTube URL |
| `language` | Language code of the fetched transcript |
| `wordCount` | Total word count of the transcript |
| `durationMinutes` | Video duration in minutes |
| `segmentCount` | Number of caption segments |
| `transcript` | Clean Markdown text, ready for LLM ingestion |
| `status` | `success` or `error` |
| `error` | Error description (only present on failure) |

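Because every result carries a `status` field, a batch can be split into usable transcripts and failures in one pass. The sketch below assumes items shaped like the two JSON examples above; the function name `split_results` is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Dict[str, Any]


def split_results(items: Iterable[Item]) -> Tuple[List[Item], Dict[str, str]]:
    """Separate successful items from failures, keyed by video ID."""
    transcripts: List[Item] = []
    failures: Dict[str, str] = {}
    for item in items:
        if item.get("status") == "success":
            transcripts.append(item)
        else:
            # `error` is only present on failed items.
            failures[item.get("videoId", "unknown")] = item.get("error", "unknown error")
    return transcripts, failures


# Sample items copied from the output examples above.
sample = [
    {"videoId": "jNQXAC9IVRw", "wordCount": 39, "status": "success"},
    {"videoId": "abc123", "status": "error",
     "error": "Transcripts are disabled for this video"},
]
ok, failed = split_results(sample)
print(len(ok), failed)
```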
***

### Example inputs

**Single video:**

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=jNQXAC9IVRw",
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
```

**Batch of videos with timestamps:**

```json
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=VIDEO_1",
    "https://www.youtube.com/watch?v=VIDEO_2",
    "https://www.youtube.com/watch?v=VIDEO_3"
  ],
  "includeTimestamps": true,
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
```

**Non-English content:**

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=VIDEO_ID",
  "languages": ["fr", "fr-FR", "en"],
  "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
```

***

### Use cases

- **RAG knowledge bases** — ingest entire YouTube channels into vector databases (LangChain, LlamaIndex, Pinecone, Weaviate)
- **AI video summarization** — feed transcripts to GPT-4 / Claude for summaries, key points, action items
- **Content repurposing pipelines** — convert video content to blog posts, newsletters, social media threads
- **Podcast transcription** — extract transcripts from YouTube-hosted podcast episodes
- **Competitive intelligence** — monitor competitor product demos, webinars, conference talks
- **Educational tools** — index courses and lectures for search and Q&A
- **Multilingual pipelines** — extract captions in the original language for translation workflows

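For the RAG use case, transcripts usually need to be split into chunks before embedding. The sketch below is one simple, framework-free way to do that with overlapping word windows; the chunk size, overlap, and function name `chunk_words` are illustrative assumptions, not something the Actor provides.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word windows of `chunk_size` that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```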
***

### Code examples

#### Python (Apify client)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("foudhil/actor-youtube-transcript").call(
    run_input={
        "videoUrls": [
            "https://www.youtube.com/watch?v=VIDEO_1",
            "https://www.youtube.com/watch?v=VIDEO_2",
        ],
        "maxConcurrency": 5,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item["status"] == "success":
        print(f"{item['videoId']}: {item['wordCount']} words")
        print(item["transcript"][:200])
```

#### LangChain integration

```python
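from langchain_community.utilities import ApifyWrapper
from langchain_core.documents import Document

# ApifyWrapper reads the API token from the APIFY_API_TOKEN environment variable
apify = ApifyWrapper()

loader = apify.call_actor(
    actor_id="foudhil/actor-youtube-transcript",
    run_input={
        "videoUrls": ["https://www.youtube.com/watch?v=VIDEO_ID"],
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    },
    # Map each dataset item to a LangChain Document (failed items yield empty text)
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("transcript", ""),
        metadata={"source": item.get("youtubeUrl", "")},
    ),
)

docs = loader.load()
# docs is now ready for your RAG pipeline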
```

#### n8n / Make automation

1. Add an **Apify** node
2. Actor ID: `foudhil/actor-youtube-transcript`
3. Input: `{ "videoUrl": "{{ $json.youtubeUrl }}", "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } }`
4. Connect output to your vector database, Google Sheets, or email node
5. Schedule daily — done

***

### Pricing

**Pay Per Event** — you only pay for successful transcripts.

| Volume | Price per transcript |
|---|---|
| Any | $0.05 per successful extraction |

Failed videos (disabled captions, private videos, no captions available) are **not charged**.

**Cost example:** 1,000 transcripts = $50. At a typical video length of 30 minutes,
that's 30,000 minutes (500 hours) of transcribed content for $50.

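The arithmetic behind the cost example can be checked with a few lines of Python. This sketch takes the $0.05 price from the table above and treats the 30-minute average length as an assumption; the function name `estimate_cost` is hypothetical.

```python
from typing import Dict

PRICE_PER_TRANSCRIPT_CENTS = 5  # $0.05 per successful extraction, from the table above


def estimate_cost(successful: int, avg_minutes: float = 30.0) -> Dict[str, float]:
    """Estimate cost and transcribed volume; failed videos are not counted."""
    total_minutes = successful * avg_minutes
    return {
        "cost_usd": successful * PRICE_PER_TRANSCRIPT_CENTS / 100,
        "minutes": total_minutes,
        "hours": total_minutes / 60,
    }


print(estimate_cost(1000))
```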
***

### Notes & limitations

- Videos must have captions available (manually added or auto-generated by YouTube)
- Private and age-restricted videos cannot be transcribed
- YouTube blocks cloud IP ranges — residential proxy configuration is required for reliable operation (pre-filled in the input)
- Auto-generated captions may contain minor transcription errors, especially for technical terms

***

### Keywords

youtube transcript extractor, youtube transcript api, youtube to text, youtube captions download,
video transcript, rag youtube, llm video content, youtube to markdown, ai video pipeline,
langchain youtube, llamaindex video, youtube transcript python, video indexing ai,
batch youtube transcript, youtube captions api

# Actor input Schema

## `videoUrl` (type: `string`):

Single YouTube video URL or video ID

## `videoUrls` (type: `array`):

List of YouTube video URLs or IDs for batch processing

## `includeTimestamps` (type: `boolean`):

Prefix each line with `[MM:SS]` timestamps

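When timestamps are enabled, the prefixed lines can be turned back into structured segments. The sketch below assumes each line looks exactly like `[MM:SS] text`; how the Actor formats videos longer than an hour is not documented here, so the pattern accepts any number of minute digits as a guess. The name `parse_timestamped` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed line format: "[MM:SS] caption text"
LINE_RE = re.compile(r"^\[(\d+):(\d{2})\]\s*(.*)$")


def parse_timestamped(transcript: str) -> List[Tuple[int, str]]:
    """Return (offset_in_seconds, text) pairs for lines matching the assumed format."""
    segments: List[Tuple[int, str]] = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        minutes, seconds, text = match.groups()
        segments.append((int(minutes) * 60 + int(seconds), text))
    return segments
```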
## `languages` (type: `array`):

Language codes to try in order. First match wins. Supports manual and auto-generated captions.

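The "first match wins" rule can be pictured as a simple ordered lookup. This is only an illustration of the described behavior, not the Actor's code; it assumes exact matching of language codes, and the function name `pick_language` is hypothetical.

```python
from typing import Iterable, List, Optional


def pick_language(priority: List[str], available: Iterable[str]) -> Optional[str]:
    """Return the first code in `priority` that is available, else None."""
    available_codes = set(available)
    for code in priority:
        if code in available_codes:
            return code
    return None
```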
## `maxConcurrency` (type: `integer`):

How many videos to process in parallel. Higher = faster batch runs. Capped at 10.

## `proxyConfiguration` (type: `object`):

Required for reliable operation. YouTube blocks cloud/datacenter IPs — residential proxies bypass this.

## Actor input object example

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "dQw4w9WgXcQ"
  ],
  "includeTimestamps": false,
  "languages": [
    "en",
    "en-US",
    "en-GB"
  ],
  "maxConcurrency": 3,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
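const input = {
    // Example video from the input schema; replace with your own URL or ID
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};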

// Run the Actor and wait for it to finish
const run = await client.actor("foudhil/actor-youtube-transcript").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
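run_input = {
    # Example video from the input schema; replace with your own URL or ID
    "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}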

# Run the Actor and wait for it to finish
run = client.actor("foudhil/actor-youtube-transcript").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call foudhil/actor-youtube-transcript --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=foudhil/actor-youtube-transcript",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Actor Youtube Transcript",
        "description": "Extract  transcripts from any YouTube video — no API key needed. Supports batch processing, parallel fetching, auto-retry with residential proxies, and multi-language captions. Output is LLM-ready Markdown, built for RAG pipelines, LangChain, LlamaIndex, and AI automation workflows.",
        "version": "0.0",
        "x-build-id": "cE73FW2epz0z1eYiv"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/foudhil~actor-youtube-transcript/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-foudhil-actor-youtube-transcript",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/foudhil~actor-youtube-transcript/runs": {
            "post": {
                "operationId": "runs-sync-foudhil-actor-youtube-transcript",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/foudhil~actor-youtube-transcript/run-sync": {
            "post": {
                "operationId": "run-sync-foudhil-actor-youtube-transcript",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "videoUrl": {
                        "title": "Video URL",
                        "type": "string",
                        "description": "Single YouTube video URL or video ID"
                    },
                    "videoUrls": {
                        "title": "Video URLs (batch)",
                        "type": "array",
                        "description": "List of YouTube video URLs or IDs for batch processing"
                    },
                    "includeTimestamps": {
                        "title": "Include Timestamps",
                        "type": "boolean",
                        "description": "Prefix each line with [MM:SS] timestamps",
                        "default": false
                    },
                    "languages": {
                        "title": "Language Priority",
                        "type": "array",
                        "description": "Language codes to try in order. First match wins. Supports manual and auto-generated captions.",
                        "default": [
                            "en",
                            "en-US",
                            "en-GB"
                        ]
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrent Videos",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "How many videos to process in parallel. Higher = faster batch runs. Capped at 10.",
                        "default": 3
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Required for reliable operation. YouTube blocks cloud/datacenter IPs — residential proxies bypass this."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
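For projects without an Apify client, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below uses only the Python standard library and builds the request separately from sending it; the helper names `build_request` and `fetch_items` are hypothetical, and the response is assumed to be a JSON array of dataset items, which the specification itself does not spell out.

```python
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "foudhil~actor-youtube-transcript"


def build_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request described by the OpenAPI path and `token` parameter."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(request: urllib.request.Request) -> List[Dict[str, Any]]:
    """Send the request and decode the body (assumed to be a JSON array)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To run it, pass your token and an input object, for example `fetch_items(build_request("<YOUR_API_TOKEN>", {"videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}))`.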
