# YouTube Media & Transcript Extractor Pro (`1byte/youtube-media-extractor`) Actor

The most robust, high-speed, and feature-rich YouTube scraper on the Apify platform. Designed for AI researchers, data scientists, and content automation workflows, this Actor extracts everything from raw 4K video and high-fidelity audio to structured transcripts and deep metadata.

- **URL**: https://apify.com/1byte/youtube-media-extractor.md
- **Developed by:** [A J](https://apify.com/1byte) (community)
- **Categories:** AI, Social media, Videos
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🎥 Premium YouTube Data & Media Extractor

The most robust, high-speed, and feature-rich YouTube scraper on the Apify platform. Designed for AI researchers, data scientists, and content automation workflows, this Actor extracts everything from raw 4K video and high-fidelity audio to structured transcripts and deep metadata.

---

### 🌟 Why this Extractor?

Market-leading scrapers often struggle with YouTube's evolving bot detection or offer thin metadata. Our **Premium Extractor** is built on a high-concurrency architecture with built-in bot bypass and smart proxy routing, ensuring you get the data you need at scale.

#### Key Features:
- ⚡ **High Concurrency**: Process hundreds of videos in minutes using optimized async workers.
- 🤖 **Bot-Bypass Pro**: Integrated with modern PO-Token providers to handle YouTube's latest security layers.
- 💸 **Smart Proxy Fallback**: Intelligently detects blocks and only uses expensive proxies when strictly necessary, saving you up to 80% on compute costs.
- 📝 **Dual Transcript Extraction**: Pulls full text transcripts directly into the Dataset AND uploads raw SRT/VTT files to the Key-Value Store.
- 🔗 **Direct Streaming URLs**: Extract signed, time-limited streaming URLs for instant playback without the overhead of downloading files.
- 🎞️ **High Quality Formats**: Supports up to 1080p MP4 video and 192 kbps MP3 audio extraction.

---

### 🛠️ How to Use

1. **Input**: Provide a list of YouTube URLs (Videos, Shorts, or Playlists).
2. **Select Mode**: Choose between `video_mp4`, `audio_mp3`, `transcript_and_metadata_only`, or `direct_signed_urls`.
3. **Set Limits**: Use `maxItems` and `maxPlaylistItems` to control your budget and run duration.
4. **Proxy (Optional)**: Enable `useSmartFallback` to automatically route around throttled IPs.
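
The steps above map onto a single input object. Here is a minimal sketch, assuming the field names and `mode` values listed in the input schema further down. `build_input` is a hypothetical helper for illustration, not part of the Actor or the Apify client:

```python
# Allowed values copied from the Actor's input schema.
ALLOWED_MODES = {
    "video_mp4",
    "audio_mp3",
    "transcript_and_metadata_only",
    "direct_signed_urls",
}


def build_input(urls, mode="video_mp4", max_items=50, max_playlist_items=5,
                use_smart_fallback=True):
    """Hypothetical helper: assemble an Actor input dict from the steps above."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if max_items < 1 or max_playlist_items < 1:
        raise ValueError("limits must be at least 1")
    return {
        "startUrls": [{"url": u} for u in urls],
        "mode": mode,
        "maxItems": max_items,
        "maxPlaylistItems": max_playlist_items,
        "useSmartFallback": use_smart_fallback,
    }
```

Validating `mode` locally catches a mistyped value before a run is started.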

---

### 📊 Rich JSON Output Example

Each result in your dataset includes comprehensive metadata tailored for NLP and analysis:

```json
{
  "sourceUrl": "https://www.youtube.com/watch?v=aqz-KE-bpKQ",
  "title": "Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film",
  "channelName": "Blender",
  "viewCount": 25489632,
  "duration": 596,
  "status": "success",
  "mode": "video_mp4",
  "downloadUrl": "https://api.apify.com/v2/key-value-stores/example-store-id/records/aqz-KE-bpKQ.mp4",
  "transcriptText": "[Music]\nHello world, this is a transcript example...",
  "transcriptDownloadUrl": "https://api.apify.com/v2/key-value-stores/example-store-id/records/aqz-KE-bpKQ_transcript.en.vtt",
  "metadata": {
    "id": "aqz-KE-bpKQ",
    "uploadDate": "20100528",
    "isLive": false
  }
}
````
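
To illustrate how a record shaped like the one above might be consumed, here is a small sketch. It assumes that `uploadDate` is a `YYYYMMDD` string, that `duration` is in seconds, and that usable items carry `"status": "success"`, as in the example. `summarize_item` is a hypothetical helper:

```python
from datetime import datetime


def summarize_item(item):
    """Hypothetical helper: pull a few analysis-friendly fields from one record."""
    if item.get("status") != "success":
        return None  # skip failed or partial records
    metadata = item.get("metadata") or {}
    raw_date = metadata.get("uploadDate")
    # Assumed format: YYYYMMDD, e.g. "20100528".
    upload = datetime.strptime(raw_date, "%Y%m%d").date() if raw_date else None
    transcript = item.get("transcriptText") or ""
    return {
        "id": metadata.get("id"),
        "title": item.get("title"),
        "uploadDate": upload,
        "durationMinutes": round(item.get("duration", 0) / 60, 1),
        "transcriptWords": len(transcript.split()),
    }
```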

---

### 💰 Cost Estimation

This Actor is highly optimized for performance:

- **Metadata Only**: ~0.01 CU per 100 items.
- **Transcripts**: ~0.05 CU per 100 items.
- **Media Download**: Depends on file size and proxy usage. Typically ~0.2 CU per GB.
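
As a worked example, the sketch below turns the figures above into a rough estimate. The rates are the approximate values from this list, not guarantees, and `estimate_compute_units` is a hypothetical helper:

```python
# Approximate rates taken from the list above.
CU_PER_100_METADATA = 0.01
CU_PER_100_TRANSCRIPTS = 0.05
CU_PER_GB_MEDIA = 0.2


def estimate_compute_units(metadata_items=0, transcript_items=0, media_gb=0.0):
    """Hypothetical helper: rough CU estimate from the approximate rates."""
    return (
        metadata_items / 100 * CU_PER_100_METADATA
        + transcript_items / 100 * CU_PER_100_TRANSCRIPTS
        + media_gb * CU_PER_GB_MEDIA
    )


# 1,000 transcripts plus 5 GB of media: 0.5 + 1.0 = roughly 1.5 CU
estimate = estimate_compute_units(transcript_items=1000, media_gb=5)
```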

---

### ⚖️ License & Disclaimer

This tool is for personal and research use. Please respect YouTube's Terms of Service and only scrape content you have permission to access. We do not host or store any media content on our servers.

---

### 📬 Support

Need a custom feature or high-volume enterprise support? Visit our [Issues](https://github.com/your-repo/issues) tab or contact the developer via the Apify Console.

# Actor input Schema

## `startUrls` (type: `array`):

YouTube video or playlist URLs to extract media and data from.

## `mode` (type: `string`):

Select the type of media or data you want to extract.

## `quality` (type: `string`):

Select the preferred video quality.

## `proxyConfiguration` (type: `object`):

Highly recommended for high-volume extraction to avoid getting blocked by YouTube.

## `useSmartFallback` (type: `boolean`):

Attempt download without proxy first to save costs. Auto-retry with proxy if YouTube blocks the request.
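
The behaviour described above can be pictured as a two-step retry. The sketch below only illustrates that idea and is not the Actor's actual code: `fetch`, `BlockedError`, and `fetch_with_fallback` are hypothetical names, and `fetch` stands in for whatever performs the download.

```python
class BlockedError(Exception):
    """Hypothetical signal that YouTube rejected a request."""


def fetch_with_fallback(fetch, url, proxy_url):
    """Try a direct request first; retry through the proxy only if blocked.

    Returns a (result, used_proxy) tuple.
    """
    try:
        return fetch(url, proxy=None), False
    except BlockedError:
        return fetch(url, proxy=proxy_url), True
```

Only the blocked requests pay for proxy traffic, which is where the cost saving comes from.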

## `maxPlaylistItems` (type: `integer`):

Maximum number of videos to extract from each playlist URL provided.

## `maxItems` (type: `integer`):

Maximum number of total videos to extract across the entire run.
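
One plausible reading of how `maxPlaylistItems` and `maxItems` combine is that each playlist is capped first and the run total is capped second. The sketch below assumes exactly that. The Actor's real behaviour may differ, and `planned_video_count` is a hypothetical helper:

```python
def planned_video_count(playlist_sizes, single_videos=0,
                        max_playlist_items=5, max_items=50):
    """Hypothetical helper: upper bound on videos processed in one run."""
    # Cap each playlist individually, then cap the overall total.
    from_playlists = sum(min(size, max_playlist_items) for size in playlist_sizes)
    return min(from_playlists + single_videos, max_items)
```

Under this assumption, three playlists of 120, 3, and 40 videos plus two single videos would yield 5 + 3 + 5 + 2 = 15 items with the default limits.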

## `cookiesBase64` (type: `string`):

Optional: Base64 encoded cookies file for accessing age-restricted or private videos.
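
A sketch of preparing this value, assuming the Actor expects the standard Base64 encoding of a Netscape-format `cookies.txt` file (the exact expected encoding is not stated here). `encode_cookies` is a hypothetical helper:

```python
import base64


def encode_cookies(cookie_file_bytes):
    """Hypothetical helper: Base64-encode raw cookie file bytes as ASCII text."""
    return base64.b64encode(cookie_file_bytes).decode("ascii")


# Usage: read the exported cookie file in binary mode, then encode it.
# with open("cookies.txt", "rb") as fh:
#     run_input["cookiesBase64"] = encode_cookies(fh.read())
```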

## `cookieText` (type: `string`):

Optional: Raw Netscape cookie format text. Useful for bypassing 'Sign in to confirm you're not a bot'.

## `useProxy` (type: `boolean`):

If enabled, every request will go through Apify Proxy from the start. Recommended if you're frequently blocked.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=aqz-KE-bpKQ"
    }
  ],
  "mode": "video_mp4",
  "quality": "best",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "useSmartFallback": true,
  "maxPlaylistItems": 5,
  "maxItems": 50,
  "useProxy": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.youtube.com/watch?v=aqz-KE-bpKQ"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("1byte/youtube-media-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://www.youtube.com/watch?v=aqz-KE-bpKQ" }] }

# Run the Actor and wait for it to finish
run = client.actor("1byte/youtube-media-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.youtube.com/watch?v=aqz-KE-bpKQ"
    }
  ]
}' |
apify call 1byte/youtube-media-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=1byte/youtube-media-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Media & Transcript Extractor Pro",
        "description": "The most robust, high-speed, and feature-rich YouTube scraper on the Apify platform. Designed for AI researchers, data scientists, and content automation workflows, this Actor extracts everything from raw 4K video and high-fidelity audio to structured transcripts and deep metadata.",
        "version": "1.0",
        "x-build-id": "gsOWlOp2wP2CJOiog"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/1byte~youtube-media-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-1byte-youtube-media-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/1byte~youtube-media-extractor/runs": {
            "post": {
                "operationId": "runs-sync-1byte-youtube-media-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/1byte~youtube-media-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-1byte-youtube-media-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "YouTube video or playlist URLs to extract media and data from.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "mode": {
                        "title": "Extraction Mode",
                        "enum": [
                            "video_mp4",
                            "audio_mp3",
                            "transcript_and_metadata_only",
                            "direct_signed_urls"
                        ],
                        "type": "string",
                        "description": "Select the type of media or data you want to extract.",
                        "default": "video_mp4"
                    },
                    "quality": {
                        "title": "Preferred Quality",
                        "enum": [
                            "best",
                            "1080p",
                            "720p",
                            "worst"
                        ],
                        "type": "string",
                        "description": "Select the preferred video quality.",
                        "default": "best"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Highly recommended for high-volume extraction to avoid getting blocked by YouTube.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "useSmartFallback": {
                        "title": "Use Smart Proxy Fallback",
                        "type": "boolean",
                        "description": "Attempt download without proxy first to save costs. Auto-retry with proxy if YouTube blocks the request.",
                        "default": true
                    },
                    "maxPlaylistItems": {
                        "title": "Max Items per Playlist",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of videos to extract from each playlist URL provided.",
                        "default": 5
                    },
                    "maxItems": {
                        "title": "Max Total Items",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of total videos to extract across the entire run.",
                        "default": 50
                    },
                    "cookiesBase64": {
                        "title": "Cookies (Base64)",
                        "type": "string",
                        "description": "Optional: Base64 encoded cookies file for accessing age-restricted or private videos."
                    },
                    "cookieText": {
                        "title": "Cookies (Text)",
                        "type": "string",
                        "description": "Optional: Raw Netscape cookie format text. Useful for bypassing 'Sign in to confirm you're not a bot'."
                    },
                    "useProxy": {
                        "title": "Always Use Proxy",
                        "type": "boolean",
                        "description": "If enabled, every request will go through Apify Proxy from the start. Recommended if you're frequently blocked.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
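
For projects without an Apify client, the `run-sync-get-dataset-items` path in the specification above can be called directly. The sketch below only builds the request with the Python standard library and does not send it. `build_sync_request` is a hypothetical helper, and the token is passed as the `token` query parameter, as the specification describes:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/1byte~youtube-media-extractor/run-sync-get-dataset-items"


def build_sync_request(token, run_input):
    """Hypothetical helper: build (but do not send) the POST request."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it would look like this (requires a valid token and network access):
# with urllib.request.urlopen(build_sync_request(token, run_input)) as resp:
#     items = json.load(resp)
```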
