# YouTube Transcript Scraper + Whisper AI Fallback (`codepoetry/youtube-transcript-ai-scraper`) Actor

Extract YouTube transcripts from any video — even without captions. Whisper AI fallback, LLM-ready output, SRT/VTT export. No API key. $0.001/video.

- **URL**: https://apify.com/codepoetry/youtube-transcript-ai-scraper.md
- **Developed by:** [CodePoetry](https://apify.com/codepoetry) (community)
- **Categories:** AI, Social media, Videos
- **Stats:** 15 total users, 5 monthly users, 71.4% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $0.70 per 1,000 transcripts extracted

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### YouTube Transcript Scraper — Captions + AI Speech-to-Text

**Extract transcripts from any YouTube video — even when captions don't exist.**

Most transcript tools stop working when a video has no captions. This actor doesn't. It pulls native captions when YouTube has them, and transcribes the audio with built-in speech-to-text AI when it doesn't. No external API key required.

Give it a single video, a full playlist, or an entire channel. Get transcripts in JSON, plain text, SRT, VTT, or an LLM-ready format — ready to download or feed into a pipeline. Built for bulk runs: concurrent processing, pay-per-result pricing, and no wasted resources on requests that don't need them.

> New to Apify? Every new account gets **$5 in free credits** — no credit card needed. That's enough to transcribe an entire YouTube channel (~4,900 native transcripts).

---

### How to scrape YouTube transcripts

1. Click **Try for free** on this actor's page.
2. Paste one or more YouTube URLs into the **YouTube URLs** field:
   - Individual videos (`youtube.com/watch?v=...`)
   - Playlists (`youtube.com/playlist?list=...`)
   - Channels (`youtube.com/@channelname`)
3. Choose your **Output Formats**. Not sure? Start with **Plain Text** — it's the words as one block of text.
4. Set **Caption Languages** if you need something other than English (default: `en`). Use two-letter codes: `es` = Spanish, `fr` = French, `de` = German.
5. Videos without captions are automatically transcribed by the built-in AI model. Set a **Max AI Minutes** cap to control spend (default: 30 minutes).
6. Click **Start**. A single video with captions finishes in under 30 seconds. A 100-video playlist typically finishes in 2–3 minutes.
7. Download results as JSON, CSV, or Excel — or consume them via the [Apify API](#python--apify-client).

A single video costs ~$0.006. No subscription, no commitment — pay only for what you use.

---

### How it works

**Step 1 — Expand**
Paste one or more URLs — single videos, playlists, or channel URLs. The actor resolves them into individual video URLs automatically.

**Step 2 — Extract**
For each video, the actor checks for native captions (manual or auto-generated) in your requested languages. If captions exist, they are fetched and formatted immediately — no audio download needed.

**Step 3 — Transcribe (when needed)**
If no captions are found, the actor automatically downloads the audio and transcribes it using a bundled [faster-whisper](https://github.com/SYSTRAN/faster-whisper) model running on Apify's compute — no external transcription API needed. The output has the same structure as native caption output. Use **Max AI minutes per run** and **Skip AI for long videos** to control AI spend.

One failed video never stops the batch. Every item in the output dataset has an `error_code` field so you can filter results programmatically.
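Since error items land in the same dataset as successful results, a post-run script can partition them on `error_code`. The helper below is an illustrative sketch over sample items shaped like this actor's output, not real run data:

```python
# Illustrative sample items mimicking this actor's dataset schema:
# successful items carry transcript fields, failed items carry `error_code`.
items = [
    {"metadata": {"id": "abc"}, "transcript_text": "Hello world"},
    {"metadata": {"id": "def"}, "error_code": "LANGUAGE_NOT_FOUND"},
]

def partition_results(items):
    """Split dataset items into successes and failures by `error_code`."""
    ok = [item for item in items if "error_code" not in item]
    failed = [item for item in items if "error_code" in item]
    return ok, failed

ok, failed = partition_results(items)
```

In a real pipeline, `items` would come from `client.dataset(run["defaultDatasetId"]).iterate_items()` as shown in the integration examples further down.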

---

### What you get

Every output item contains full video metadata and your transcript in the formats you requested.

#### Video metadata

| Field | Type | Description |
|---|---|---|
| `metadata.id` | string | YouTube video ID |
| `metadata.title` | string | Video title |
| `metadata.url` | string | Canonical watch URL |
| `metadata.channel` | string | Channel display name |
| `metadata.channel_id` | string | Channel ID (UC-prefixed) |
| `metadata.channel_url` | string | Channel URL |
| `metadata.description` | string | Full video description |
| `metadata.duration` | integer | Duration in seconds |
| `metadata.view_count` | integer | Total views |
| `metadata.like_count` | integer | Total likes |
| `metadata.upload_date` | string | Upload date (YYYYMMDD) |
| `metadata.thumbnail` | string | Highest-resolution thumbnail URL |
| `metadata.tags` | array | Creator-set tags |
| `metadata.categories` | array | YouTube categories |

#### Transcript fields

| Field | Type | Description |
|---|---|---|
| `language` | string | Language code of the transcript (e.g. `en`, `zh-TW`) |
| `is_auto_generated` | boolean | `true` if YouTube auto-generated the captions |
| `is_ai_generated` | boolean | `true` if transcribed by the built-in AI model |
| `transcript_json` | array | Timestamped segments `[{start, end, text}]`. When `wordLevel: true`, each segment also has a `words` array: `[{start, text}]` for native captions, `[{start, end, text}]` for AI transcriptions. |
| `transcript_text` | string | Plain text transcript |
| `transcript_llm` | string | Text with `[Music]`, `(laughter)`, and filler tokens stripped — ready for AI pipelines |
| `transcript_srt` | string | SRT subtitle format. Always present for AI-transcribed items even if not in `outputFormats`. |
| `transcript_vtt` | string | WebVTT format |
| `language_probability` | number | AI model's confidence in the detected language (0–1). AI transcription only. |
| `language_was_forced` | boolean | `true` when `forceWhisperLanguage` was set. AI transcription only. |
| `ai_duration_charged_min` | integer | Minutes of AI time charged for this video. AI transcription only. |
| `ai_speech_duration_sec` | number | Actual speech duration detected by the model in seconds (informational). AI transcription only. |
| `available_languages` | array | Caption language codes YouTube provides on this video. Only present on `NO_CAPTIONS_AVAILABLE` and `LANGUAGE_NOT_FOUND` error items — use them to refine your `languages` input. |
| `error_code` | string | Structured error code when extraction failed. See [Error codes](#error-item) for the full reference table. |
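If you only requested the `json` format, the segments in `transcript_json` are straightforward to render into other formats yourself. A minimal, illustrative SRT renderer (the actor's own `transcript_srt` field already does this for you):

```python
def segments_to_srt(segments):
    """Render transcript_json-style segments [{start, end, text}] as SRT."""
    def ts(seconds):
        # SRT timestamps use HH:MM:SS,mmm
        ms = round(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}\n"
        for i, seg in enumerate(segments, 1)
    ]
    return "\n".join(blocks)
```

For example, a segment starting at 18.5 s renders as `00:00:18,500 --> 00:00:21,000`.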

---

### Use cases

Here are ten real workflows built on this actor — from quick one-off summaries to scheduled competitive intelligence pipelines.

#### 1. Claude Desktop / Claude.ai MCP integration

Connect this actor as an [MCP server](https://apify.com/integrations/mcp) so Claude Desktop, Claude.ai Projects, Cursor, or any other MCP-compatible AI client can fetch a transcript just by being handed a YouTube URL. Ask Claude to "summarise this video" or "extract the key points from this lecture" — no copy-pasting required.

**Recommended settings:** `outputFormats: ["llm"]`

---

#### 2. YouTube Shorts — per-word karaoke captions

YouTube Shorts often have auto-generated captions. Enable `wordLevel: true` to get per-word start times from the `transcript_json` field. Feed the result into a caption editor (CapCut, DaVinci Resolve, Adobe Premiere) to produce word-by-word highlighted captions — the "karaoke" style popular on short-form video.

**Recommended settings:** `wordLevel: true`, `outputFormats: ["json", "srt"]`, `subType: "auto"`
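As a sketch of what `wordLevel` output enables, the hypothetical helper below flattens per-word timestamps from `transcript_json` into a `(start, word)` list that a caption-editor script could consume:

```python
def flatten_words(segments):
    """Flatten word-level timestamps from transcript_json segments.

    Per the field reference: native captions give each word only a `start`,
    while AI transcriptions also include an `end`. Only `start` is used here.
    """
    return [
        (word["start"], word["text"])
        for seg in segments
        for word in seg.get("words", [])
    ]

# Illustrative segment shape, following the transcript_json description:
segments = [
    {"start": 0.0, "end": 1.2, "text": "never gonna",
     "words": [{"start": 0.0, "text": "never"}, {"start": 0.6, "text": "gonna"}]},
]
```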

---

#### 3. Build a searchable knowledge base (RAG)

Bulk-extract every video from a company channel, educational YouTube account, or podcast series. Store the `transcript_llm` text in a vector database (Pinecone, Weaviate, pgvector) indexed by `metadata.id` and `metadata.title`. Use it as a retrieval-augmented generation (RAG) corpus so your chatbot can answer questions grounded in the exact video content.

**Recommended settings:** `outputFormats: ["llm"]`, `maxResults: 500`. AI fallback is always active — videos without captions are transcribed automatically.

---

#### 4. NLP / sentiment analysis pipeline

Extract transcripts from a brand's channel, a competitor's channel, or a set of product-review videos. Pipe `transcript_text` into an NLP pipeline (spaCy, HuggingFace Transformers, OpenAI) for sentiment scoring, named entity extraction, topic modeling, or keyword frequency. Useful for brand monitoring and competitive intelligence.

**Recommended settings:** `outputFormats: ["text", "llm"]`, `subType: "both"`
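For a quick keyword-frequency pass that needs no ML model, a few lines of standard-library Python over `transcript_text` go a long way. The helper below is illustrative; swap in spaCy or a transformer pipeline for real sentiment or entity work:

```python
from collections import Counter
import re

def top_keywords(transcript_text, n=5,
                 stopwords=frozenset({"the", "a", "and", "to", "of", "is"})):
    """Count the most frequent non-stopword tokens in a transcript."""
    tokens = re.findall(r"[a-z']+", transcript_text.lower())
    return Counter(t for t in tokens if t not in stopwords).most_common(n)
```

Run it per video and pivot on `metadata.channel` to compare messaging across channels.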

---

#### 5. LLM training data collection

Curate domain-specific transcripts from niche YouTube channels (medical lectures, legal explainers, coding tutorials, scientific talks) to build fine-tuning datasets. The `transcript_llm` format strips filler tokens cleanly. Use `metadata.tags` and `metadata.categories` to filter and label the data.

**Recommended settings:** `outputFormats: ["llm"]`, `maxAiMinutes` cap per run to control cost.

---

#### 6. SEO content repurposing

Turn a library of tutorials or vlogs into written content. Pass the `transcript_llm` field to an LLM prompt asking it to rewrite the transcript as a blog post, Twitter/X thread, newsletter section, or LinkedIn article. Combine with `metadata.title`, `metadata.tags`, and `metadata.description` for context.

**Recommended settings:** `outputFormats: ["llm"]`, `languages: ["en"]`

---

#### 7. Podcast / lecture transcription (no captions)

Podcasters who upload to YouTube and educators who post lecture recordings rarely add manual captions. The actor automatically transcribes them with faster-whisper. Use `forceWhisperLanguage` if you know the channel's language to skip the auto-detection window and reduce cost.

**Recommended settings:** `forceWhisperLanguage: "en"`, `skipAiFallbackIfLongerThan: 120` to skip anything over 2 hours.

---

#### 8. Accessibility and caption quality audit

Compare YouTube's auto-generated captions (`subType: "auto"`, `is_auto_generated: true`) against an AI transcription of the same video. Differences surface errors in the auto-generated track. Useful for accessibility compliance reviews or for creators who want to improve their caption quality before publishing.

**Recommended settings:** Two runs — one with `subType: "auto"` only, one with `subType: "manual"` to force AI fallback (since no manual captions exist, the actor will auto-transcribe).

---

#### 9. Academic research and citation analysis

Download a researcher's full lecture series, a conference talk archive, or all videos from an academic YouTube channel. Index the transcripts by speaker, date (`metadata.upload_date`), and topic. Use to find when specific terminology first appeared, how arguments evolved over time, or to build a citation graph for a literature review.

**Recommended settings:** `outputFormats: ["json", "text"]`, `maxResults: 1000`, `languages` set to the channel's primary language.

---

#### 10. Competitive intelligence monitoring (scheduled runs)

Schedule the actor to run weekly on a competitor's channel URL. Set `maxResults: 5` to pull only the latest videos. Use an Apify webhook to POST the new transcripts to Slack, a CRM, or an internal dashboard. Get an automatic digest of every new product announcement, feature mention, or pricing discussion your competitor publishes on YouTube.

**Recommended settings:** `maxResults: 5`, `outputFormats: ["llm"]`, paired with an Apify schedule and webhook.
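A minimal sketch of the digest step, assuming items shaped like this actor's dataset output (titles and URLs below are placeholders, not real data):

```python
def format_digest(items, max_chars=200):
    """Render a run's successful transcripts as a plain-text digest,
    one entry per video, suitable for posting to Slack or email."""
    entries = []
    for item in items:
        if "error_code" in item:
            continue  # failed videos are skipped, not reported
        meta = item["metadata"]
        snippet = item.get("transcript_llm", "")[:max_chars]
        entries.append(f"{meta['title']} ({meta['url']})\n{snippet}")
    return "\n\n".join(entries)

# Illustrative items:
items = [
    {"metadata": {"title": "Q3 Launch Event", "url": "https://youtu.be/..."},
     "transcript_llm": "Today we are announcing three new products."},
    {"metadata": {"title": "Private clip", "url": "https://youtu.be/..."},
     "error_code": "PRIVATE_OR_UNAVAILABLE"},
]
digest = format_digest(items)
```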

---

### Output examples

#### Native caption output

```json
{
  "metadata": {
    "id": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "channel": "Rick Astley",
    "duration": 213,
    "view_count": 1757728410,
    "upload_date": "20091025"
  },
  "language": "en",
  "is_auto_generated": false,
  "is_ai_generated": false,
  "transcript_json": [
    { "start": 18.5, "end": 21.0, "text": "We're no strangers to love" },
    { "start": 21.0, "end": 24.5, "text": "You know the rules and so do I" }
  ],
  "transcript_text": "We're no strangers to love You know the rules and so do I ...",
  "transcript_llm": "We're no strangers to love You know the rules and so do I ..."
}
````

#### AI transcription output

When a video has no captions, AI transcription runs automatically:

```json
{
  "metadata": { "title": "...", "duration": 240 },
  "is_ai_generated": true,
  "language": "en",
  "language_probability": 0.9987,
  "ai_duration_charged_min": 4,
  "transcript_json": [
    { "start": 0.0, "end": 3.2, "text": "Welcome to today's episode." }
  ],
  "transcript_text": "Welcome to today's episode. ...",
  "transcript_llm": "Welcome to today's episode. ..."
}
```

#### Error item

```json
{
  "url": "https://www.youtube.com/watch?v=...",
  "metadata": { "title": "...", "duration": 720 },
  "error": "No subtitles found in requested languages.",
  "error_code": "LANGUAGE_NOT_FOUND"
}
```

**Error codes:**

Error items are **never billed** — `Actor.charge()` is only called on successful transcript results. The table below groups errors by cause so you know whether the issue is in your input or something outside your control.

**Input errors** — caused by the URLs or settings you provided:

| Error code | Meaning | What to do |
|---|---|---|
| `AGE_RESTRICTED` | YouTube requires sign-in / age verification to access this video. | Remove the URL — cannot be bypassed. |
| `PRIVATE_OR_UNAVAILABLE` | The video is private, deleted, or blocked in the runner's region. | Remove the URL or check if the video is public. |
| `LIVE_VIDEO` | Live streams have no static captions to extract. | Wait until the stream ends, then retry. |
| `LANGUAGE_NOT_FOUND` | Captions exist but not in the requested language. `available_languages` shows what's available. | Change your `languages` input. |

**Budget / limit errors** — the video could be transcribed, but a budget gate prevented it:

| Error code | Meaning | What to do |
|---|---|---|
| `NO_CAPTIONS_AVAILABLE` | The video has zero caption tracks. AI fallback is attempted if budget allows. | Ensure AI fallback is not blocked by the limits below. |
| `AI_MINUTES_LIMIT_REACHED` | The `maxAiMinutes` budget for this run is exhausted. | Increase `maxAiMinutes` and retry. |
| `AI_FALLBACK_SKIPPED_TOO_LONG` | The video exceeds the `skipAiFallbackIfLongerThan` duration limit. | Increase or remove the limit. |
| `SPENDING_LIMIT_REACHED` | The Apify account spending limit was hit — no further AI charges possible. | Adjust your Apify billing settings. |

**Infrastructure / actor errors** — not caused by your input; no charge is made:

| Error code | Meaning | What to do |
|---|---|---|
| `BOT_DETECTION` | YouTube challenged the request. The actor retried through proxy tiers automatically. | Usually self-resolving. Switch proxy group if persistent. |
| `EXTRACTION_ERROR` | Generic yt-dlp failure — the video may be temporarily unavailable on YouTube's side. | Retry later. |
| `AI_TRANSCRIPTION_FAILED` | The Whisper model or audio download failed for this video. | Check run logs; retry. |
| `UNEXPECTED_ERROR` | An unhandled exception in the actor code. The video gets an error item; other videos continue. | Open an issue if persistent. |

***

### Pricing

This actor uses **Pay-Per-Event** pricing — you pay for results, not compute time or monthly fees.

**In plain terms:** a single native transcript costs $0.001. There is also a $0.005 one-time startup fee per run. Scraping one video costs around **$0.006 total**. From the second video onwards, this actor is cheaper than competitors charging $0.005 flat per transcript.

#### How much does a run cost?

| Videos | This actor | Typical competitor ($0.005/transcript) | You save |
|---|---|---|---|
| 1 | $0.006 | $0.005 | — |
| 2 | $0.007 | $0.010 | 30% |
| 10 | $0.015 | $0.050 | 70% |
| 100 | $0.105 | $0.500 | 79% |
| 1,000 | $1.005 | $5.000 | 80% |

#### Native transcript pricing

| Plan | Per transcript | 10 videos | 100 videos | 1,000 videos |
|---|---|---|---|---|
| Free | $0.001 | $0.015 | $0.105 | $1.005 |
| Bronze ($49/mo) | $0.0009 | $0.014 | $0.095 | $0.905 |
| Silver ($199/mo) | $0.0008 | $0.013 | $0.085 | $0.805 |
| Gold ($999/mo) | $0.0007 | $0.012 | $0.075 | $0.705 |

#### AI transcription pricing (when captions are unavailable)

AI is only charged for videos that actually need it — native captions are always checked first. Billed minutes are based on the published video duration (rounded up to the nearest minute, minimum 1 minute per video), not on detected speech length. The `ai_speech_duration_sec` field in the output is informational.

| Plan | Per AI minute | 10-min video | 60-min video |
|---|---|---|---|
| Free | $0.012 | $0.12 | $0.72 |
| Bronze | $0.011 | $0.11 | $0.66 |
| Silver | $0.010 | $0.10 | $0.60 |
| Gold | $0.009 | $0.09 | $0.54 |
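The billed-minute rule (published duration rounded up to the next minute, minimum one minute) is easy to reproduce when estimating a run's cost up front. The function below is an illustrative sketch for budgeting, not the actor's actual billing code:

```python
import math

def billed_ai_minutes(duration_sec):
    """AI minutes billed for one caption-free video:
    published duration rounded up, with a 1-minute minimum."""
    return max(1, math.ceil(duration_sec / 60))

def ai_cost(duration_sec, per_minute=0.012):
    """Estimated AI charge for one video; default is the Free-plan rate."""
    return billed_ai_minutes(duration_sec) * per_minute
```

A 240-second video bills 4 minutes, matching the `ai_duration_charged_min: 4` in the AI transcription output example earlier, and a 10-minute video costs $0.12 on the Free plan, matching the table above.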

#### Real-world examples

| Task | Videos | AI? | Estimated cost (Free plan) |
|---|---|---|---|
| Single video | 1 | No | ~$0.006 |
| YouTube playlist (20 videos) | 20 | No | ~$0.025 |
| Channel analysis (100 videos) | 100 | No | ~$0.105 |
| Podcast batch (20 × 45 min, no captions) | 20 | Yes — 900 AI min | ~$10.81 |
| Research corpus (500 videos, 20% no captions, 10 min avg) | 500 | Mixed | ~$12.41 |

On the free $5 credit: approximately **4,900 native transcripts** (enough for an entire YouTube channel), or around **400 minutes of AI transcription**.

> The prices above are Pay-Per-Event charges only and do not include proxy costs. The default datacenter proxy costs nothing on clean runs — it is only used as a fallback when YouTube challenges a request. If the datacenter tier is also challenged, the actor auto-escalates to residential (~$0.40/GB), though this is rare. See [Proxy configuration](#proxy-configuration) for details.

#### Built for bulk runs

Every part of this actor is designed to keep costs and resource use as low as possible, especially at scale:

- **Pay per result, not per run.** The $0.005 startup fee is charged once regardless of batch size — so a 1,000-video run costs nearly the same overhead as a 10-video run.
- **No proxy cost on clean runs.** Every request goes direct first. The proxy is only used as a silent fallback if YouTube challenges a specific request — and that happens rarely. Most runs pay $0 in proxy fees.
- **AI model loaded on demand.** The transcription model is only initialised when a video actually needs AI transcription. Runs that rely entirely on native captions start faster and use less memory.
- **Concurrent processing.** Up to 5 videos are processed in parallel, significantly reducing wall-clock time for large playlists or channels.
- **Built-in spend controls.** **Max AI minutes per run** and **Skip AI for long videos** let you set hard caps on AI spend before a run starts — no surprises from unexpectedly long videos.
- **One failed video never slows the batch.** Errors are logged and skipped immediately; the rest of the batch continues at full speed.

***

### How it compares

| Feature | This actor | Typical alternatives |
|---|---|---|
| Transcribes videos with no captions | Yes — built-in AI, no external API key | No — returns an error |
| LLM-optimised output (filler stripped) | Yes — `transcript_llm` field | No |
| Spend safeguards (AI minute cap, skip long videos) | Yes | No |
| Native transcript price | $0.001 per transcript | Up to $0.005 — 5× more |
| No monthly subscription | Yes — pay only for what you run | Flat monthly fee |
| Batch: playlists and channels | Yes | Most |
| Output formats | JSON, Text, SRT, VTT, LLM | Usually JSON only |
| Word-level timestamps | Yes | Rare |
| YouTube Data API key required | No | No |
| Automatic access challenge bypass | Yes — retries via proxy when needed, direct otherwise | Varies |
| MCP-compatible (Claude Desktop, Cursor, etc.) | Yes — via Apify MCP integration | Rare |

***

### Who uses it

#### AI and LLM developers

Feed transcripts into RAG pipelines, summarisation chains, or fine-tuning datasets. The `transcript_llm` field strips `[Music]`, `(laughter)`, and other filler tokens that bloat context windows. Compatible with LangChain, LlamaIndex, and other Python AI frameworks.

#### Content creators and marketers

Turn any YouTube video into a blog post or newsletter draft without manual transcription. Extract pull quotes from interviews. Run an entire channel archive in one batch.

#### SEO professionals and researchers

Extract keyword data from video transcripts at scale. Build text content from videos to rank alongside YouTube results on Google. Analyse a competitor's spoken messaging for topic and positioning gaps.

#### Data scientists and academics

Build NLP corpora from lectures, conference talks, and documentary interviews. Process multilingual transcripts for cross-language analysis. Run large dataset collection jobs overnight via the API.

#### Developers building MCP-integrated AI tools

Connect this actor as an [MCP server](https://apify.com/integrations/mcp) so Claude Desktop, Claude.ai Projects, Cursor, or any MCP-compatible client can fetch and process YouTube transcripts in a single tool call. No copy-pasting, no API wiring — just hand the model a URL.

***

### Integration examples

#### Python — Apify client

> Get your API token from the Apify Console under **Settings → Integrations**. Keep it secret — treat it like a password.

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "languages": ["en"],
        "outputFormats": ["json", "llm"],
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["metadata"]["title"])
    print(item["transcript_text"][:200])
```

#### JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('codepoetry/youtube-transcript-ai-scraper').call({
    startUrls: [{ url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' }],
    languages: ['en'],
    outputFormats: ['json', 'llm'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => console.log(item.metadata.title, item.transcript_text.slice(0, 200)));
```

#### LangChain / RAG pipeline

```python
from apify_client import ApifyClient
from langchain.docstore.document import Document

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}],
        "outputFormats": ["llm"],
        "maxAiMinutes": 60,
    }
)

docs = [
    Document(
        page_content=item["transcript_llm"],
        metadata={"source": item["metadata"]["url"], "title": item["metadata"]["title"]},
    )
    for item in client.dataset(run["defaultDatasetId"]).iterate_items()
    if "transcript_llm" in item
]
# docs is ready for any LangChain vector store or retriever
```

#### Run on a schedule or trigger a webhook

To run this actor on a schedule or receive a webhook notification when a run finishes, use the **Schedules** and **Integrations** tabs on the actor's page in the Apify Console. See the [Apify scheduling docs](https://docs.apify.com/platform/schedules) and [webhook docs](https://docs.apify.com/platform/integrations/webhooks) for setup instructions.

***

### Advanced options

All options can be set in the Input form or passed as JSON when calling via the API.

| Option | UI label | Default | When to use |
|---|---|---|---|
| `maxResults` | Max videos | `10` | Cap how many videos are fetched from a playlist or channel. Single video URLs ignore this. |
| `languages` | Caption languages | `["en"]` | Preferred caption languages in order of priority. First match on the video is used. Codes: `en` English, `es` Spanish, `fr` French, `de` German. |
| `subType` | Caption source | `"both"` | `"manual"` = human captions only · `"auto"` = auto-generated only · `"both"` = prefer manual, fall back to auto |
| `outputFormats` | Output formats | json, text, llm | Which transcript formats to write to the dataset. |
| `wordLevel` | Word-level timestamps | `false` | Add per-word timestamps to JSON segments. Not available for manual captions. |
| `maxAiMinutes` | Max AI minutes | `30` | Hard cap on AI transcription minutes per run. Set to `0` for unlimited. Recommended when processing unknown playlists. |
| `skipAiFallbackIfLongerThan` | Skip AI for videos longer than | `0` (off) | Skip AI for videos exceeding N minutes. Avoids unexpected costs from long videos. |
| `forceWhisperLanguage` | AI transcription language | auto-detect | Force AI to a specific language (ISO code, e.g. `"es"`). Skips 30-second detection window, saves ~20% per video. |
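Putting several of these options together, a cost-guarded API input for a large playlist might look like this (all values are illustrative):

```json
{
  "startUrls": [{ "url": "https://www.youtube.com/playlist?list=..." }],
  "maxResults": 100,
  "languages": ["en", "es"],
  "subType": "both",
  "outputFormats": ["json", "llm"],
  "wordLevel": false,
  "maxAiMinutes": 60,
  "skipAiFallbackIfLongerThan": 90
}
```

This caps the run at 100 videos and 60 AI-transcribed minutes, and skips AI fallback for anything longer than 90 minutes.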

***

### Proxy configuration

Proxy is always active and fully automatic — no configuration needed. Every request goes direct first, and the proxy is only used if YouTube challenges the request. **This costs nothing on a clean run.**

#### How it works

Occasionally YouTube asks automated requests to verify they are not bots. When this happens the actor automatically escalates through progressively stronger proxy tiers until the request succeeds:

1. **Direct request** (no proxy) — used first for every video. Zero cost.
2. **Datacenter proxy** — fast and free on most plans. Handles the vast majority of challenges.
3. **Residential proxy** — highest trust with YouTube. Used only if the datacenter tier is also challenged.

The escalation is fully automatic — you do not need to configure anything. If all tiers are exhausted, the affected video is marked with a `BOT_DETECTION` error code and the actor continues with the remaining videos.

#### Proxy costs

| Type | Cost | Notes |
|---|---|---|
| Datacenter (Apify) | Free on most plans | Default first tier. Zero bandwidth consumed on clean runs. |
| Residential (Apify) | ~$0.40 / GB | Auto-escalation tier. Only consumed if datacenter proxy is also challenged — rare. |

Proxy costs are billed from your Apify account balance as a separate line item, alongside this actor's Pay-Per-Event charges. On a typical run with no bot challenges: **$0 proxy cost**. If datacenter retry is needed: approximately 0.5 MB per affected video. Residential is only consumed if datacenter also fails — this is rare and keeps costs minimal even in bulk runs.

***

### Limitations

All non-recoverable failures produce a dataset item with an `error_code` field. See the [Error codes table](#error-item) for the full reference.

**Constraints:**

- No translation — the actor returns the original spoken language only.
- YouTube may rate-limit very large batches (100+ videos). The automatic proxy escalation handles most cases transparently.
- `maxResults` default of 10 is intentionally conservative — increase it for large playlists or full channel archives.

***

### Memory

This actor runs with a fixed **4 GB** allocation. No configuration needed — the same setting works for both native caption extraction (lightweight) and AI transcription (which loads the faster-whisper speech model into memory).

***

### Frequently asked questions

#### How much does one video cost?

A single video with captions costs approximately **$0.006** on the Free plan — $0.001 for the transcript plus a $0.005 one-time startup fee per run. A second video in the same run adds just $0.001. AI transcription adds $0.012 per minute of audio on the Free plan.

#### What happens if a video has no captions?

The actor automatically downloads the audio and transcribes it using the built-in AI model — the output has the same structure as a native caption result, with `is_ai_generated: true`. If the `maxAiMinutes` cap is reached, remaining caption-free videos receive an `AI_MINUTES_LIMIT_REACHED` error item and the run continues. The `available_languages` field lists the caption language codes YouTube does provide on the video.

#### Does it work for playlists and channels?

Yes. Paste a playlist or channel URL and the actor expands it into individual videos automatically. Use `maxResults` to cap how many are fetched. If one video is private, age-restricted, or unavailable, it gets an error item while the rest continue.

#### What languages are supported?

**Native captions:** Any language YouTube provides captions for — typically 100+ languages for auto-generated captions. Pass multiple language codes (e.g. `["en", "es"]`) to fall back automatically when your first choice is unavailable.

**AI transcription:** 99 languages, including English, Spanish, French, German, Portuguese, Japanese, Chinese, Arabic, and Hindi.

#### What output formats are available?

- **JSON** — timestamped segments as an array `[{start, end, text}, ...]`
- **Text** — plain text joined from all segments
- **LLM** — text with `[Music]`, `(laughter)`, and other filler tokens stripped, ready for AI pipelines
- **SRT** — standard subtitle format for video players and editing software
- **VTT** — WebVTT format for HTML5 `<video>` elements

Multiple formats can be requested in a single run.

#### Can I set a spending limit?

Yes. **Max AI minutes per run** caps total AI-transcribed minutes per run (default: 30). **Skip AI for long videos** skips videos exceeding a duration threshold automatically. Set the AI minutes cap to 0 for unlimited AI transcription.

#### How accurate is AI transcription?

Accurate for clear speech in widely spoken languages. Accuracy degrades with heavy accents, domain-specific jargon, or poor audio quality. The `language_probability` field indicates the model's confidence in the detected language. For quality-critical work, treat AI transcripts as a first draft and review them.

#### What is a YouTube transcript scraper?

A YouTube transcript scraper extracts the spoken text from YouTube videos. This actor retrieves captions when YouTube provides them, or generates a transcript from the audio when captions are unavailable.

#### Does this translate transcripts?

No. The actor returns the original spoken language. Use a separate translation service for translation.

#### Does it work for YouTube Shorts?

Yes. Shorts use the same caption infrastructure as regular videos.

#### Do I need a YouTube Data API key?

No. The Actor accesses publicly available caption data without any YouTube API credentials.

#### How does this compare to the YouTube Data API?

The YouTube Data API v3 does not provide transcript data at all. It also requires a Google Cloud project, OAuth credentials, and per-day quotas. This Actor requires none of that.

#### How does this compare to the `youtube-transcript-api` Python library?

The [`youtube-transcript-api`](https://github.com/jdepoix/youtube-transcript-api) library is fine for a handful of videos in your own Python script. This Actor adds cloud infrastructure, batch processing across playlists and channels, AI transcription for caption-free videos, multiple output formats, scheduling, and Apify platform integrations (webhooks, REST API, n8n, Make, Zapier).

#### What do the run log messages mean?

Open the **Log** tab on any completed run to see what the Actor did. Here are the messages you may encounter:

| Message | What it means | Action needed? |
|---|---|---|
| `Processing video: https://...` | Normal progress — one line per video | None |
| `Expanding URL: https://...` | Resolving a playlist or channel to individual videos | None |
| `Total unique videos to process: N` | How many videos were found after deduplication | None |
| `Loading AI transcription model...` | AI model is loading — only happens once per run | None |
| `AI transcription model ready.` | Model loaded, ready to transcribe | None |
| `AI transcription language: set to 'en'` | Language was set by your `forceWhisperLanguage` input | None |
| `AI transcription language: auto-detecting` | Language will be detected from the audio | None |
| `Downloading audio for AI transcription: https://...` | Audio is being downloaded for a caption-free video — normal progress | None |
| `Running AI transcription...` | AI model is actively processing the audio | None |
| `No subtitles found. Running AI fallback for ... (N min estimated)` | AI transcription is starting for a caption-free video | None |
| `AI transcription complete — language: en (confidence: 99%)` | AI transcription finished for one video | None |
| `YouTube access challenge for ... — retrying via proxy tier 1/2...` | YouTube challenged the request; escalating through proxy tiers | None — handled automatically |
| `YouTube access challenge for ... — no proxy tiers available` | Same challenge but no proxy could be created | Check Apify proxy service status |
| `Subtitle fetch failed for ... — retrying via proxy tier 1/2...` | Subtitle download failed; escalating through proxy tiers | None — handled automatically |
| `Subtitle fetch failed for ... (lang): HTTP 429` | Subtitle download failed after all retries (rate-limited) | Try again later or reduce batch size. The Actor continues with other videos. |
| `YouTube access challenge on audio download ... — retrying via proxy tier 1/2...` | Audio download challenged; escalating through proxy tiers | None — handled automatically |
| `Audio download failed ... after exhausting all N proxy tiers.` | All proxy tiers failed for audio download | Try again later. The video gets an `AI_TRANSCRIPTION_FAILED` item. |
| `Skipping AI fallback for ...: needs N min but only Y remain` | The `maxAiMinutes` cap was reached for this run | Raise `maxAiMinutes` if you want to transcribe more |
| `Apify spending limit reached. No further AI charges will be made.` | Your Apify account spending limit was hit | Check your Apify billing settings |
| `Audio download failed for ...: <error>` | Could not download the audio for AI transcription | Check the error detail; that video gets an `AI_TRANSCRIPTION_FAILED` item in the dataset |
| `AI fallback failed for ...: <error>` | AI transcription error for this video | Check the error detail; the video gets an error item in the dataset |
| `Unhandled error for ...: <error>` | Unexpected failure — the video gets an `UNEXPECTED_ERROR` item | Open an issue if this happens repeatedly |

Error items written to the dataset always have an `error_code` field — use that for programmatic filtering rather than parsing log text.
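A minimal sketch of that filtering over fetched dataset items (the `url` and `text` fields here are illustrative; only `error_code` on error items is guaranteed by the Actor):

```python
def split_results(items):
    """Separate successful transcripts from error items using the error_code field."""
    ok, failed = [], []
    for item in items:
        (failed if "error_code" in item else ok).append(item)
    return ok, failed

items = [
    {"url": "https://youtu.be/abc", "text": "Hello world"},
    {"url": "https://youtu.be/def", "error_code": "NO_CAPTIONS_AVAILABLE"},
]
ok, failed = split_results(items)
print(len(ok), len(failed))       # 1 1
print(failed[0]["error_code"])    # NO_CAPTIONS_AVAILABLE
```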

#### Is it legal to extract YouTube transcripts?

YouTube's Terms of Service prohibit automated scraping, and you are responsible for complying with their Terms and applicable law in your jurisdiction. This Actor accesses only publicly available caption data — the same data visible when you click "Open transcript" in the YouTube player. It does not bypass any authentication, access private content, or collect personal user data. See [Apify's web scraping legality guide](https://apify.com/blog/is-web-scraping-legal) for a broader overview.

***

### Language Reference

#### YouTube caption languages (130+ codes)

Use these codes in the **Caption languages** (`languages`) input. Regional variants such as `zh-TW` and `zh-CN` are also accepted where YouTube differentiates them.

| Code | Language | Code | Language | Code | Language |
|------|----------|------|----------|------|----------|
| `af` | Afrikaans | `ak` | Akan | `sq` | Albanian |
| `am` | Amharic | `ar` | Arabic | `hy` | Armenian |
| `as` | Assamese | `ay` | Aymara | `az` | Azerbaijani |
| `bn` | Bangla | `eu` | Basque | `be` | Belarusian |
| `bho` | Bhojpuri | `bs` | Bosnian | `bg` | Bulgarian |
| `my` | Burmese | `ca` | Catalan | `ceb` | Cebuano |
| `zh` | Chinese | `zh-CN` | Chinese (China) | `zh-HK` | Chinese (Hong Kong) |
| `zh-SG` | Chinese (Singapore) | `zh-TW` | Chinese (Taiwan) | `zh-Hans` | Chinese (Simplified) |
| `zh-Hant` | Chinese (Traditional) | `co` | Corsican | `hr` | Croatian |
| `cs` | Czech | `da` | Danish | `dv` | Divehi |
| `nl` | Dutch | `en` | English | `en-US` | English (United States) |
| `eo` | Esperanto | `et` | Estonian | `ee` | Ewe |
| `fil` | Filipino | `fi` | Finnish | `fr` | French |
| `gl` | Galician | `lg` | Ganda | `ka` | Georgian |
| `de` | German | `el` | Greek | `gn` | Guarani |
| `gu` | Gujarati | `ht` | Haitian Creole | `ha` | Hausa |
| `haw` | Hawaiian | `iw` | Hebrew | `hi` | Hindi |
| `hmn` | Hmong | `hu` | Hungarian | `is` | Icelandic |
| `ig` | Igbo | `id` | Indonesian | `ga` | Irish |
| `it` | Italian | `ja` | Japanese | `jv` | Javanese |
| `kn` | Kannada | `kk` | Kazakh | `km` | Khmer |
| `rw` | Kinyarwanda | `ko` | Korean | `kri` | Krio |
| `ku` | Kurdish | `ky` | Kyrgyz | `lo` | Lao |
| `la` | Latin | `lv` | Latvian | `ln` | Lingala |
| `lt` | Lithuanian | `lb` | Luxembourgish | `mk` | Macedonian |
| `mg` | Malagasy | `ms` | Malay | `ml` | Malayalam |
| `mt` | Maltese | `mi` | Māori | `mr` | Marathi |
| `mn` | Mongolian | `ne` | Nepali | `nso` | Northern Sotho |
| `no` | Norwegian | `ny` | Nyanja | `or` | Odia |
| `om` | Oromo | `ps` | Pashto | `fa` | Persian |
| `pl` | Polish | `pt` | Portuguese | `pa` | Punjabi |
| `qu` | Quechua | `ro` | Romanian | `ru` | Russian |
| `sm` | Samoan | `sa` | Sanskrit | `gd` | Scottish Gaelic |
| `sr` | Serbian | `sn` | Shona | `sd` | Sindhi |
| `si` | Sinhala | `sk` | Slovak | `sl` | Slovenian |
| `so` | Somali | `st` | Southern Sotho | `es` | Spanish |
| `su` | Sundanese | `sw` | Swahili | `sv` | Swedish |
| `tg` | Tajik | `ta` | Tamil | `tt` | Tatar |
| `te` | Telugu | `th` | Thai | `ti` | Tigrinya |
| `ts` | Tsonga | `tr` | Turkish | `tk` | Turkmen |
| `uk` | Ukrainian | `ur` | Urdu | `ug` | Uyghur |
| `uz` | Uzbek | `vi` | Vietnamese | `cy` | Welsh |
| `fy` | Western Frisian | `xh` | Xhosa | `yi` | Yiddish |
| `yo` | Yoruba | `zu` | Zulu | | |

> Not all codes will have captions on every video. When a requested code is not available, the Actor returns a `LANGUAGE_NOT_FOUND` or `NO_CAPTIONS_AVAILABLE` error item with an `available_languages` field listing the codes that are actually present on that video.
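One way to recover from a `LANGUAGE_NOT_FOUND` item is to build a retry input from its `available_languages` field. This is a hedged sketch: the `url` field on the error item is an assumption, and the actual retry call (re-running the Actor with the new input) is omitted:

```python
def retry_input_for(error_item, preferred=("en", "es")):
    """Build a new run input that requests a language actually present on the video."""
    available = error_item.get("available_languages", [])
    # Prefer one of our preferred codes if present, else take whatever exists.
    languages = [code for code in preferred if code in available] or available[:1]
    if not languages:
        return None  # nothing to retry with
    return {
        "startUrls": [{"url": error_item["url"]}],
        "languages": languages,
    }

error_item = {
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "error_code": "LANGUAGE_NOT_FOUND",
    "available_languages": ["de", "fr"],
}
print(retry_input_for(error_item))
```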

#### AI transcription languages (99 codes)

Use one of these codes in the **AI transcription language** (`forceWhisperLanguage`) input to skip auto-detection. If a language is not in this list, the AI model cannot transcribe it — use auto-detect instead.

| Code | Language | Code | Language | Code | Language |
|------|----------|------|----------|------|----------|
| `af` | Afrikaans | `am` | Amharic | `ar` | Arabic |
| `as` | Assamese | `az` | Azerbaijani | `ba` | Bashkir |
| `be` | Belarusian | `bg` | Bulgarian | `bn` | Bengali |
| `bo` | Tibetan | `br` | Breton | `bs` | Bosnian |
| `ca` | Catalan | `cs` | Czech | `cy` | Welsh |
| `da` | Danish | `de` | German | `el` | Greek |
| `en` | English | `es` | Spanish | `et` | Estonian |
| `eu` | Basque | `fa` | Persian | `fi` | Finnish |
| `fo` | Faroese | `fr` | French | `gl` | Galician |
| `gu` | Gujarati | `ha` | Hausa | `haw` | Hawaiian |
| `he` | Hebrew | `hi` | Hindi | `hr` | Croatian |
| `ht` | Haitian Creole | `hu` | Hungarian | `hy` | Armenian |
| `id` | Indonesian | `is` | Icelandic | `it` | Italian |
| `ja` | Japanese | `jw` | Javanese | `ka` | Georgian |
| `kk` | Kazakh | `km` | Khmer | `kn` | Kannada |
| `ko` | Korean | `la` | Latin | `lb` | Luxembourgish |
| `ln` | Lingala | `lo` | Lao | `lt` | Lithuanian |
| `lv` | Latvian | `mg` | Malagasy | `mi` | Māori |
| `mk` | Macedonian | `ml` | Malayalam | `mn` | Mongolian |
| `mr` | Marathi | `ms` | Malay | `mt` | Maltese |
| `my` | Burmese | `ne` | Nepali | `nl` | Dutch |
| `nn` | Nynorsk | `no` | Norwegian | `oc` | Occitan |
| `pa` | Punjabi | `pl` | Polish | `ps` | Pashto |
| `pt` | Portuguese | `ro` | Romanian | `ru` | Russian |
| `sa` | Sanskrit | `sd` | Sindhi | `si` | Sinhala |
| `sk` | Slovak | `sl` | Slovenian | `sn` | Shona |
| `so` | Somali | `sq` | Albanian | `sr` | Serbian |
| `su` | Sundanese | `sv` | Swedish | `sw` | Swahili |
| `ta` | Tamil | `te` | Telugu | `tg` | Tajik |
| `th` | Thai | `tl` | Filipino | `tr` | Turkish |
| `tt` | Tatar | `uk` | Ukrainian | `ur` | Urdu |
| `uz` | Uzbek | `vi` | Vietnamese | `yi` | Yiddish |
| `yo` | Yoruba | `yue` | Cantonese | `zh` | Chinese |

> The AI model supports 99 languages. Accuracy varies by language — it is highest for widely spoken languages (English, Spanish, French, German, etc.) and may degrade for low-resource languages. The `language_probability` field in the output indicates the model's confidence in the detected or forced language.

***

### About this Actor

This Actor runs on the Apify platform. AI transcription uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (MIT license), bundled into the Docker image so there is no model download delay on first run.

Found a bug or have a feature request? Use the **Issues** tab on this Actor's page.

# Actor input Schema

## `startUrls` (type: `array`):

YouTube video, playlist, or channel URLs to process. Supports watch URLs, short youtu.be links, playlist URLs, and channel URLs. The prefilled URL is a working demo — click Start to try it.

## `maxResults` (type: `integer`):

Maximum number of videos to process from a playlist or channel. Single video URLs ignore this setting. For channels, videos are returned newest first.

## `languages` (type: `array`):

Caption languages to try, in order of preference. The first match on the video is used. Add multiple codes to fall back automatically. Common: en, es, fr, de, pt, ja, zh, ko, ar, hi. See the README for all 130+ supported codes.

## `subType` (type: `string`):

Which type of YouTube captions to use. Manual = human-written or uploaded by the creator. Auto = YouTube's machine-generated captions. Both (prefer manual) tries manual first, then auto. Most videos only have auto-generated captions.

## `outputFormats` (type: `array`):

Choose which transcript formats to include. Plain Text is simplest — one block of words. JSON includes timestamps for each segment. LLM-Ready strips \[Music], (laughter), and other filler — ideal for AI pipelines. SRT and VTT are standard subtitle formats for video editors and players.

## `wordLevel` (type: `boolean`):

Add per-word start and end times inside each JSON segment. Only available for auto-generated captions and AI transcriptions — not manual captions.

## `forceWhisperLanguage` (type: `string`):

Force the AI model to a specific language, skipping auto-detection (~20% faster). Use when you know the channel's language. Leave empty to auto-detect. Not the same as Caption Languages — that selects YouTube's existing captions. See the README for all 99 supported language codes.

## `maxAiMinutes` (type: `integer`):

Hard cap on total AI transcription minutes for this run. When the cap is reached, remaining videos without captions are skipped with an error — the run continues for other videos. Set to 0 for unlimited. Recommended: set a cap when processing an unknown playlist or channel to avoid unexpected costs.

## `skipAiFallbackIfLongerThan` (type: `integer`):

Skip AI transcription for any video longer than this many minutes. Useful when a channel mixes short clips with long recordings and you only want the shorter ones. Set to 0 to disable.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  ],
  "maxResults": 10,
  "languages": [
    "en"
  ],
  "subType": "both",
  "outputFormats": [
    "json",
    "text",
    "llm"
  ],
  "wordLevel": false,
  "maxAiMinutes": 30,
  "skipAiFallbackIfLongerThan": 0
}
```

# Actor output Schema

## `dataset` (type: `string`):

One item per processed video, containing transcript text (in requested formats), video metadata, and an error\_code field if extraction failed.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("codepoetry/youtube-transcript-ai-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" }] }

# Run the Actor and wait for it to finish
run = client.actor("codepoetry/youtube-transcript-ai-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  ]
}' |
apify call codepoetry/youtube-transcript-ai-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=codepoetry/youtube-transcript-ai-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Transcript Scraper + Whisper AI Fallback",
        "description": "Extract YouTube transcripts from any video — even without captions. Whisper AI fallback, LLM-ready output, SRT/VTT export. No API key. $0.001/video.",
        "version": "1.0",
        "x-build-id": "8ef9DZXchSY32Li9K"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/codepoetry~youtube-transcript-ai-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-codepoetry-youtube-transcript-ai-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/codepoetry~youtube-transcript-ai-scraper/runs": {
            "post": {
                "operationId": "runs-sync-codepoetry-youtube-transcript-ai-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/codepoetry~youtube-transcript-ai-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-codepoetry-youtube-transcript-ai-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "YouTube URLs",
                        "type": "array",
                        "description": "YouTube video, playlist, or channel URLs to process. Supports watch URLs, short youtu.be links, playlist URLs, and channel URLs. The prefilled URL is a working demo — click Start to try it.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxResults": {
                        "title": "Max videos per playlist/channel",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of videos to process from a playlist or channel. Single video URLs ignore this setting. For channels, videos are returned newest first.",
                        "default": 10
                    },
                    "languages": {
                        "title": "Caption languages",
                        "type": "array",
                        "description": "Caption languages to try, in order of preference. The first match on the video is used. Add multiple codes to fall back automatically. Common: en, es, fr, de, pt, ja, zh, ko, ar, hi. See the README for all 130+ supported codes.",
                        "default": [
                            "en"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "subType": {
                        "title": "Caption source",
                        "enum": [
                            "manual",
                            "auto",
                            "both"
                        ],
                        "type": "string",
                        "description": "Which type of YouTube captions to use. Manual = human-written or uploaded by the creator. Auto = YouTube's machine-generated captions. Both (prefer manual) tries manual first, then auto. Most videos only have auto-generated captions.",
                        "default": "both"
                    },
                    "outputFormats": {
                        "title": "Output formats",
                        "type": "array",
                        "description": "Choose which transcript formats to include. Plain Text is simplest — one block of words. JSON includes timestamps for each segment. LLM-Ready strips [Music], (laughter), and other filler — ideal for AI pipelines. SRT and VTT are standard subtitle formats for video editors and players.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "json",
                                "text",
                                "llm",
                                "srt",
                                "vtt"
                            ],
                            "enumTitles": [
                                "JSON (timestamped segments)",
                                "Plain Text",
                                "LLM-Ready (filler stripped)",
                                "SRT (subtitle file)",
                                "VTT (WebVTT)"
                            ]
                        },
                        "default": [
                            "json",
                            "text",
                            "llm"
                        ]
                    },
                    "wordLevel": {
                        "title": "Word-level timestamps",
                        "type": "boolean",
                        "description": "Add per-word start and end times inside each JSON segment. Only available for auto-generated captions and AI transcriptions — not manual captions.",
                        "default": false
                    },
                    "forceWhisperLanguage": {
                        "title": "AI transcription language (optional)",
                        "type": "string",
                        "description": "Force the AI model to a specific language, skipping auto-detection (~20% faster). Use when you know the channel's language. Leave empty to auto-detect. Not the same as Caption Languages — that selects YouTube's existing captions. See the README for all 99 supported language codes."
                    },
                    "maxAiMinutes": {
                        "title": "Max AI minutes per run",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Hard cap on total AI transcription minutes for this run. When the cap is reached, remaining videos without captions are skipped with an error — the run continues for other videos. Set to 0 for unlimited. Recommended: set a cap when processing an unknown playlist or channel to avoid unexpected costs.",
                        "default": 30
                    },
                    "skipAiFallbackIfLongerThan": {
                        "title": "Skip AI for long videos (minutes)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Skip AI transcription for any video longer than this many minutes. Useful when a channel mixes short clips with long recordings and you only want the shorter ones. Set to 0 to disable.",
                        "default": 0
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
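
The run object returned by the API includes the `usage`, `usageUsd`, and `usageTotalUsd` fields described in the schema above. As a minimal sketch, the per-event USD breakdown in `usageUsd` should add up to `usageTotalUsd`; the sample record below is hypothetical, using only the schema's own example values:

```python
def total_usage_usd(run: dict) -> float:
    """Sum the per-event USD charges reported in run['usageUsd']."""
    return sum(run.get("usageUsd", {}).values())

# Hypothetical run record shaped like the schema's examples above.
sample_run = {
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
    },
}

# Cross-check the reported total against the breakdown.
assert abs(total_usage_usd(sample_run) - sample_run["usageTotalUsd"]) < 1e-9
```

In production you would fetch the run record with the official client (e.g. `ApifyClient(token).run(run_id).get()`) instead of building the dict by hand.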
