YouTube All-in-One Downloader & Scraper

Pricing: from $100.00 / 1,000 video downloads

Download YouTube videos, Shorts, playlists, and channels as MP4. Up to 10 concurrent downloads with no browser needed. Extract comments, captions, and rich metadata. Metadata-only mode for fast, cheap research. Quality selection with automatic fallback. From $0.10/video.

Developer: Juyeop Park (Maintained by Community)

YouTube Data Pipeline for AI/ML Teams

YouTube transcripts, metadata, and downloads in one API call -- built for AI pipelines.

$0.005/transcript | $0.005/metadata | $0.10/video download | No monthly fee | No quota limits | LLM-ready output mode

Extract YouTube transcripts for RAG, embeddings, and fine-tuning. Get structured metadata for research and analytics. Download video files for content analysis. All through a fast, concurrent API with no browser required.

Why AI/ML Teams Use This

  • LLM-ready transcripts at $0.005/video -- Clean plaintext transcripts ready to feed into OpenAI, Anthropic, or any embedding model. No parsing needed.
  • No YouTube API quota limits -- YouTube Data API caps you at 10,000 units/day and doesn't even provide transcripts. This actor has no daily quota.
  • Structured JSON output -- Every field is typed and consistent. Drop results directly into vector DBs (Pinecone, Weaviate, Chroma) or data warehouses.
  • Batch processing at scale -- Process playlists, channels, or search results. Up to 10 concurrent extractions. Feed entire YouTube channels into your training pipeline.

Cost Comparison

1,000 YouTube transcripts for your RAG pipeline:

| Approach | Cost | Effort |
| --- | --- | --- |
| YouTube Data API | Can't get transcripts | 10K units/day quota limit |
| youtube-transcript-api + your server | $5-15/month (server costs) | Setup, maintenance, IP bans |
| This Actor (LLM-ready mode) | $5.00 total | Zero infrastructure, zero maintenance |

Pricing Tiers

| Tier | Price | Best For |
| --- | --- | --- |
| Transcript / Metadata extraction (outputFormat: "llm_ready" or downloadVideo: false) | $0.005/video | RAG, embeddings, LLM training, research, analytics |
| Video download | $0.10/video | Archiving, content analysis, multimodal AI |

Plus Apify platform fee (~$0.25-0.50/1,000 videos for compute). Proxy enabled by default -- RESIDENTIAL proxy recommended ($5-6/GB data transfer). Apify Free plan includes $5/month in platform credits.
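The arithmetic behind these numbers is straightforward. A quick estimator (a sketch using the per-event rates above; the proxy rate and your data transfer volume are assumptions to measure for your own runs):

```python
def estimate_cost(videos: int, download: bool = False,
                  proxy_gb: float = 0.0, proxy_rate: float = 5.5) -> float:
    """Rough run cost in USD: actor fee plus proxy data transfer.

    Rates: $0.005/video for transcript/metadata, $0.10/video for downloads,
    ~$5.50/GB assumed for RESIDENTIAL proxy traffic.
    """
    per_video = 0.10 if download else 0.005
    return videos * per_video + proxy_gb * proxy_rate

# 1,000 LLM-ready transcripts with negligible proxy traffic
print(estimate_cost(1000))                              # 5.0
# 100 video downloads pulling ~2 GB through the proxy
print(estimate_cost(100, download=True, proxy_gb=2))    # 21.0
```

Platform compute fees are excluded here; they depend on run duration and memory settings.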

Quick Start

  1. Click "Try for free" on this page
  2. Paste YouTube URLs into startUrls
  3. Click Start -- structured JSON output appears in seconds
{
  "startUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "outputFormat": "llm_ready"
}

That's it. The output contains clean transcript text, metadata, and everything you need to feed into your AI pipeline.

LLM-Ready Output Mode

Set outputFormat: "llm_ready" to get transcripts optimized for AI/ML workflows. This mode automatically enables transcript extraction, disables video download, and charges $0.005/video (same as metadata extraction).

What you get:

{
  "videoId": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "channelName": "Rick Astley",
  "channelUrl": "http://www.youtube.com/@RickAstleyYT",
  "transcript": "We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of...",
  "wordCount": 427,
  "language": "English",
  "languageCode": "en",
  "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "duration": "3:33",
  "durationSeconds": 213,
  "viewCount": 1751798914,
  "uploadDate": "Oct 25, 2009",
  "category": "Music",
  "tags": [],
  "description": "The official video for 'Never Gonna Give You Up' by Rick Astley..."
}

The transcript field contains the full transcript as clean plaintext -- ready to chunk and embed. The wordCount field gives a rough basis for estimating token counts when planning your chunking strategy.
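Because the transcript arrives as one plaintext string, a simple word-window chunker prepares it for embedding. A minimal sketch (the chunk size and overlap are assumptions to tune for your embedding model's context window):

```python
def chunk_transcript(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split a transcript into overlapping word-window chunks for embedding."""
    words = text.split()
    if len(words) <= max_words:
        return [text] if words else []
    step = max_words - overlap
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words) - overlap, step)]

# A 700-word transcript yields 3 chunks of 300/300/200 words with 50-word overlap
chunks = chunk_transcript("word " * 700)
print(len(chunks), [len(c.split()) for c in chunks])
```

Overlapping windows preserve context that would otherwise be cut at chunk boundaries, which generally improves retrieval quality in RAG pipelines.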

Integration Examples

Python + OpenAI Embeddings

from apify_client import ApifyClient
import openai

client = ApifyClient("YOUR_API_TOKEN")

# Extract transcripts for RAG
run = client.actor("jy-labs/youtube-all-in-one-downloader-scraper").call(run_input={
    "startUrls": ["https://www.youtube.com/watch?v=VIDEO_ID"],
    "outputFormat": "llm_ready",
    "proxyConfiguration": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    transcript = item["transcript"]
    if transcript:
        # Feed to OpenAI embeddings
        embedding = openai.embeddings.create(
            model="text-embedding-3-small",
            input=transcript,
        )
        # Store in your vector DB (Pinecone, Weaviate, Chroma, etc.)
        print(f"Embedded: {item['title']} ({item['wordCount']} words)")

Python -- Batch Transcripts for Fine-Tuning

from apify_client import ApifyClient
import json

client = ApifyClient("YOUR_API_TOKEN")

# Extract transcripts from an entire playlist
run = client.actor("jy-labs/youtube-all-in-one-downloader-scraper").call(run_input={
    "startUrls": ["https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"],
    "outputFormat": "llm_ready",
    "maxVideos": 100,
    "captionLanguage": "en",
    "proxyConfiguration": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
})

# Save as JSONL for fine-tuning
with open("training_data.jsonl", "w") as f:
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        if item.get("transcript"):
            f.write(json.dumps({
                "text": item["transcript"],
                "metadata": {
                    "title": item["title"],
                    "channel": item["channelName"],
                    "video_id": item["videoId"],
                    "word_count": item["wordCount"],
                },
            }) + "\n")

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('jy-labs/youtube-all-in-one-downloader-scraper').call({
  startUrls: ['https://www.youtube.com/watch?v=VIDEO_ID'],
  outputFormat: 'llm_ready',
  proxyConfiguration: { useApifyProxy: true, apifyProxyGroups: ['RESIDENTIAL'] },
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
  console.log(`${item.title}: ${item.wordCount} words`);
  console.log(`Transcript: ${item.transcript.substring(0, 100)}...`);
  // Feed item.transcript to your embedding pipeline...
}

cURL (REST API)

# Start a run
curl "https://api.apify.com/v2/acts/jy-labs~youtube-all-in-one-downloader-scraper/runs?token=YOUR_API_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "outputFormat": "llm_ready",
    "proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
  }'

# Fetch results (after run completes)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN"

vs YouTube Data API

| Feature | This Actor | YouTube Data API |
| --- | --- | --- |
| Daily quota | No limit | 10,000 units/day |
| Transcripts | Built-in | Not available |
| Comments | Built-in with replies | Available (costs quota) |
| Video download | Built-in | Not available |
| Auth required | No | Yes (OAuth/API key) |
| Output format | Structured JSON, LLM-ready mode | Nested JSON with pagination tokens |
| Setup time | ~1 minute | ~30 minutes (GCP project + OAuth) |

vs Competing Apify Actors (Streamers)

| Feature | This Actor | Streamers ($30/mo actors) |
| --- | --- | --- |
| Pricing model | Pay per video, no monthly fee | $30/month subscription + usage |
| Transcript extraction | $0.005/video | Not available as separate tier |
| Metadata only | $0.005/video | Not available (pay full price) |
| LLM-ready output | Built-in | Not available |
| Video download | $0.10/video | Requires $30/mo subscription |
| SRT/VTT file download | Built-in | Not available |
| Chapter extraction | Built-in | Not available |
| Channel info (subscribers) | Built-in | Not available |
| Comment replies | Built-in | Not available |
| Thumbnail download | Built-in | Not available |
| Webhook notification | Built-in | Not available |
| YouTube search | Built-in | Not available |
| Smart error classification | 5 categories + adaptive retry | Basic |

Features Overview

Data Extraction

  • Transcripts/captions -- Full plaintext transcripts in 100+ languages. LLM-ready mode returns clean text optimized for chunking and embedding.
  • Caption file download (SRT/VTT) -- Download subtitle files in industry-standard SRT or VTT format for video annotation or training data.
  • Rich metadata -- Title, description, channel, views, likes, duration, upload date, tags, thumbnail URL, chapters, category, and more.
  • Comments with replies -- Top comments with author, likes, timestamps, and full reply threads. Up to 500 comments per video.
  • Channel info -- Subscriber count, description, banner URL, total video count, and join date.
  • Related videos -- Up to 20 related video suggestions plus video category.
  • Chapter markers -- Auto-extracted chapter titles and timestamps for content segmentation.

Processing

  • Concurrent processing -- Up to 10 videos in parallel for fast batch extraction.
  • Playlist and channel expansion -- Automatically discovers and processes all videos from playlists and channels.
  • YouTube search -- Search by keywords with searchQueries to collect videos without manual URL hunting.
  • Quality selection -- Choose from highest, 1080p, 720p, 480p, 360p, or audio-only (M4A).
  • Automatic quality fallback -- If requested resolution is unavailable, selects the nearest lower quality.
  • Retry with exponential backoff -- Automatic retries with proxy rotation for reliable large-batch runs.
  • Smart error classification -- Five error categories (fatal, retryable, rate_limited, geo_blocked, age_restricted) with adaptive retry logic.

Output and Integration

  • LLM-ready mode -- Set outputFormat: "llm_ready" for transcripts optimized for AI pipelines at $0.005/video.
  • Metadata-only mode -- Set downloadVideo: false for fast, cheap metadata extraction at $0.005/video.
  • Configurable output -- Set extractMetadata: false for lightweight output (sourceUrl + downloadUrl only).
  • Thumbnail download -- Highest-quality thumbnail image stored in Key-Value Store.
  • Custom filename template -- Name files using placeholders: {videoId}, {title}, {quality}, {channelName}, {date}, {type}.
  • Webhook notification -- POST notification to any URL when processing completes.
  • Full REST API -- Call programmatically from Python, Node.js, cURL, or any HTTP client.

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | string[] | (optional) | YouTube URLs to process. Supports videos, Shorts, playlists, and channels. At least one of startUrls or searchQueries is required. |
| outputFormat | string | "default" | Output mode: "default" or "llm_ready". LLM-ready mode auto-enables transcripts, disables video download, and charges $0.005/video. |
| quality | string | "highest" | Video quality: highest, 1080p, 720p, 480p, 360p, or audio_only |
| downloadVideo | boolean | true | Download the video file. Set to false for metadata-only mode ($0.005/video). |
| maxConcurrency | integer | 4 | Parallel processing (1-10). Higher values = faster but more memory. |
| maxRequestRetries | integer | 3 | Retry attempts per video before marking as failed (0-10). |
| includeFailedVideos | boolean | false | Include failed videos in output with error details for debugging. |
| extractMetadata | boolean | true | Include full metadata. Auto-enabled when captions, comments, or downloadVideo: false is set. Set to false for lightweight output. |
| extractCaptions | boolean | false | Extract captions/subtitles as full transcript text. |
| captionLanguage | string | "en" | Preferred caption language code (e.g., en, ko, ja, es). Falls back to first available. |
| extractComments | boolean | false | Extract top comments (author, text, likes, timestamp). |
| maxComments | integer | 100 | Max comments per video (1-500). |
| maxVideos | integer | 100 | Max videos from playlists/channels (1-500). |
| proxyConfiguration | object | Apify Proxy ON | Proxy settings. Enabled by default. RESIDENTIAL recommended for reliability. Proxy incurs data transfer costs (~$5-6/GB). |
| searchQueries | string[] | (optional) | Search YouTube by keywords. Found videos are added to the processing queue. |
| downloadThumbnail | boolean | false | Download highest-quality thumbnail to Key-Value Store. |
| downloadCaptions | boolean | false | Download caption file to Key-Value Store. Use captionFormat for SRT or VTT. |
| captionFormat | string | "srt" | Caption file format: srt (SubRip) or vtt (WebVTT). Only when downloadCaptions is true. |
| extractReplies | boolean | false | Fetch full reply threads for each comment. Requires extractComments: true. |
| extractChannelInfo | boolean | false | Extract channel details: subscriber count, description, banner, video count, join date. |
| extractRelatedVideos | boolean | false | Extract up to 20 related videos and video category. |
| filenameTemplate | string | "{videoId}_{type}" | Custom filename with placeholders: {videoId}, {title}, {quality}, {channelName}, {date}, {type}. |
| webhookUrl | string | (optional) | URL to receive POST notification when the run completes. |
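As an illustration of how filenameTemplate placeholders could be expanded (a sketch only, not the actor's actual implementation; the sanitization rules here are assumptions):

```python
import re

def render_filename(template: str, **fields: str) -> str:
    """Substitute {placeholder} fields; unknown placeholders are left intact.

    Characters that are unsafe in filenames are replaced with underscores
    (an assumed sanitization rule for this sketch).
    """
    def sub(match: re.Match) -> str:
        return str(fields.get(match.group(1), match.group(0)))
    name = re.sub(r"\{(\w+)\}", sub, template)
    return re.sub(r'[\\/:*?"<>|]', "_", name)

print(render_filename("{channelName}_{title}_{quality}",
                      channelName="Rick Astley",
                      title="Never Gonna Give You Up",
                      quality="720p"))
# Rick Astley_Never Gonna Give You Up_720p
```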

Output Schema

LLM-Ready Output (outputFormat: "llm_ready")

Optimized for AI pipelines. Auto-enables transcript extraction and disables video download. Returns a flat structure with clean transcript text, word count, and essential metadata only.

{
  "videoId": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "channelName": "Rick Astley",
  "channelUrl": "http://www.youtube.com/@RickAstleyYT",
  "transcript": "We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of...",
  "wordCount": 427,
  "language": "English",
  "languageCode": "en",
  "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "duration": "3:33",
  "durationSeconds": 213,
  "viewCount": 1751798914,
  "uploadDate": "Oct 25, 2009",
  "category": "Music",
  "tags": [],
  "description": "The official video for 'Never Gonna Give You Up' by Rick Astley..."
}

Rich Metadata Output (default)

Full details including download URL, comments, channel info, and related videos when enabled.

{
  "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "downloadUrl": "https://api.apify.com/v2/key-value-stores/STORE_ID/records/dQw4w9WgXcQ_video",
  "videoId": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "description": "The official video for 'Never Gonna Give You Up' by Rick Astley...",
  "channelName": "Rick Astley",
  "channelUrl": "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
  "viewCount": 1500000000,
  "likeCount": 16000000,
  "duration": "3:33",
  "durationSeconds": 213,
  "uploadDate": "Oct 25, 2009",
  "thumbnailUrl": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
  "thumbnailDownloadUrl": "https://api.apify.com/v2/key-value-stores/STORE_ID/records/dQw4w9WgXcQ_thumbnail",
  "quality": "720p",
  "fileSize": "11.28 MB",
  "category": "Music",
  "chapters": [
    { "title": "Intro", "startTime": "0:00", "startTimeSeconds": 0 }
  ],
  "captions": [
    {
      "language": "English",
      "languageCode": "en",
      "text": "We're no strangers to love You know the rules and so do I..."
    }
  ],
  "captionFileUrl": "https://api.apify.com/v2/key-value-stores/STORE_ID/records/dQw4w9WgXcQ_captions.srt",
  "comments": [
    {
      "author": "YouTube User",
      "authorChannelUrl": "https://www.youtube.com/channel/UC...",
      "text": "This song is timeless!",
      "likes": 42000,
      "publishedTime": "2 years ago",
      "replyCount": 150,
      "replies": [
        {
          "author": "Another User",
          "text": "Agreed, classic forever!",
          "likes": 1200,
          "publishedTime": "1 year ago"
        }
      ]
    }
  ],
  "channelInfo": {
    "channelId": "@RickAstleyYT",
    "subscriberCount": "4.47M subscribers",
    "description": "Official YouTube channel of Rick Astley...",
    "bannerUrl": "https://yt3.googleusercontent.com/...",
    "videoCount": "402 videos",
    "joinedDate": "Joined Feb 2, 2015"
  },
  "relatedVideos": [
    {
      "videoId": "yPYZpwSpKmA",
      "title": "Rick Astley - Together Forever",
      "channelName": "Rick Astley",
      "viewCount": "198M views",
      "duration": "3:24"
    }
  ],
  "tags": ["rick astley", "never gonna give you up", "official video"],
  "isLive": false,
  "isShort": false,
  "playlistIndex": 1,
  "playlistTitle": "My Playlist"
}

Note: playlistIndex and playlistTitle appear only for videos from playlists. isShort is true when the input URL uses the /shorts/ format.

Lightweight Output (extractMetadata: false)

{
  "sourceUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "downloadUrl": "https://api.apify.com/v2/key-value-stores/STORE_ID/records/dQw4w9WgXcQ_video"
}

Failed Video Output (includeFailedVideos: true)

{
  "sourceUrl": "https://www.youtube.com/watch?v=INVALID_ID",
  "downloadUrl": null,
  "error": "This video is unavailable",
  "status": "failed"
}

Pricing Details

This actor uses pay-per-event pricing. No monthly subscription. Because it uses the InnerTube API directly (no browser), it runs faster and cheaper than browser-based alternatives.

| Event | Price | When |
| --- | --- | --- |
| Transcript / Metadata extraction | $0.005/video | outputFormat: "llm_ready" or downloadVideo: false |
| Video download | $0.10/video | Default (with video file) |

Estimated costs by volume:

| Usage | Actor Fee | Best For |
| --- | --- | --- |
| 1,000 transcripts (LLM-ready) | ~$5.00 | RAG pipelines, embeddings |
| 10,000 transcripts (LLM-ready) | ~$50.00 | Large-scale training data |
| 1,000 metadata extractions | ~$5.00 | Research, analytics |
| 100 video downloads | ~$10.00 | Content analysis, archiving |
| 500 video downloads | ~$50.00 | Large-scale archiving |

Actor fees are listed above. Apify platform costs (compute time, proxy data transfer) are billed separately. RESIDENTIAL proxy recommended at ~$5-6/GB. Apify Free plan includes $5/month in platform credits.

Cost optimization tips:

  • Use outputFormat: "llm_ready" for transcript-only workloads -- same price as metadata mode ($0.005/video) but with optimized flat output for AI pipelines.
  • Use audio_only quality for faster, cheaper video downloads when you only need the audio track.
  • Use downloadVideo: false when you only need metadata, captions, and comments.
  • Disable proxy to reduce data transfer costs (may cause blocks on large batches).

AI/ML Use Cases

RAG (Retrieval-Augmented Generation)

Extract transcripts from YouTube videos and store as embeddings in a vector database. When users ask questions, retrieve relevant transcript chunks and feed them to your LLM for grounded, accurate answers. Use outputFormat: "llm_ready" for clean transcript text at $0.005/video.

Training Data Collection

Build fine-tuning datasets from educational YouTube channels. Extract transcripts from entire playlists or channels, pair with metadata (title, channel, tags), and export as JSONL for model training. Use searchQueries to find domain-specific content automatically.

Content Monitoring and Competitive Intelligence

Track competitor YouTube channels with scheduled runs. Extract metadata (views, likes, comments) to monitor content performance over time. Use metadata-only mode at $0.005/video to minimize costs. Set up webhookUrl for automated alerts.

Sentiment Analysis

Extract comments (with replies) from YouTube videos for sentiment analysis. Use metadata-only mode with extractComments: true and extractReplies: true. Feed comment text into your NLP pipeline for brand monitoring or market research.
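Using the comment structure shown in the Output Schema, flattening threads into one row per comment is the usual first step before a sentiment model. A sketch over the documented fields (the row shape is an assumption to adapt to your pipeline):

```python
def flatten_comments(comments: list[dict]) -> list[dict]:
    """One row per top-level comment or reply, tagged with thread position."""
    rows = []
    for comment in comments:
        rows.append({"text": comment["text"], "author": comment["author"],
                     "likes": comment["likes"], "is_reply": False})
        for reply in comment.get("replies", []):
            rows.append({"text": reply["text"], "author": reply["author"],
                         "likes": reply["likes"], "is_reply": True})
    return rows

# Shape mirrors the actor's documented comment output
sample = [{"author": "YouTube User", "text": "This song is timeless!", "likes": 42000,
           "replies": [{"author": "Another User", "text": "Agreed, classic forever!",
                        "likes": 1200}]}]
print(flatten_comments(sample))  # two rows: the comment and its reply
```

Keeping the is_reply flag lets you weight top-level comments and replies differently when aggregating sentiment.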

Video Understanding (Multimodal AI)

Download video files alongside transcripts and metadata for multimodal AI research. Combine transcript text with video frames for tasks like video summarization, scene classification, or visual question answering.

Knowledge Base Construction

Build searchable knowledge bases from educational YouTube content. Extract transcripts with chapter markers to create structured, segmented documents. Chapters provide natural topic boundaries for better document chunking.

Example Inputs

LLM-Ready Transcripts (Cheapest)

{
  "startUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "outputFormat": "llm_ready",
  "captionLanguage": "en",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Batch Transcripts from Playlist

{
  "startUrls": ["https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf"],
  "outputFormat": "llm_ready",
  "maxVideos": 100,
  "maxConcurrency": 6,
  "captionLanguage": "en",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Research Dataset (Metadata + Comments)

{
  "startUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "downloadVideo": false,
  "extractComments": true,
  "extractReplies": true,
  "extractCaptions": true,
  "extractChannelInfo": true,
  "maxComments": 200,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Search and Extract

{
  "searchQueries": ["machine learning tutorial", "transformer architecture explained"],
  "outputFormat": "llm_ready",
  "maxVideos": 20,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Full Extraction (All Features)

{
  "startUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "quality": "1080p",
  "extractMetadata": true,
  "extractCaptions": true,
  "captionLanguage": "en",
  "downloadCaptions": true,
  "captionFormat": "srt",
  "downloadThumbnail": true,
  "extractComments": true,
  "maxComments": 100,
  "extractReplies": true,
  "extractChannelInfo": true,
  "extractRelatedVideos": true,
  "filenameTemplate": "{channelName}_{title}_{quality}",
  "webhookUrl": "https://your-server.com/webhook/youtube",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Supported URL Formats

| URL Type | Example |
| --- | --- |
| Standard video | https://www.youtube.com/watch?v=VIDEO_ID |
| Short URL | https://youtu.be/VIDEO_ID |
| Shorts | https://www.youtube.com/shorts/VIDEO_ID |
| Embed | https://www.youtube.com/embed/VIDEO_ID |
| Playlist | https://www.youtube.com/playlist?list=PLAYLIST_ID |
| Channel (handle) | https://www.youtube.com/@ChannelHandle |
| Channel (ID) | https://www.youtube.com/channel/CHANNEL_ID |
| Channel (custom URL) | https://www.youtube.com/c/ChannelName |

Mix and match URL types in a single run -- pass playlists, channels, and individual videos together.
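If you want to pre-sort mixed inputs yourself (for example, to route playlists and single videos into separate runs), a rough classifier over these URL patterns could look like this (a sketch; the actor's own URL parsing may differ):

```python
import re

URL_PATTERNS = [
    ("shorts",   r"youtube\.com/shorts/"),
    ("playlist", r"youtube\.com/playlist\?list="),
    ("channel",  r"youtube\.com/(@|channel/|c/)"),
    ("video",    r"youtube\.com/(watch\?v=|embed/)|youtu\.be/"),
]

def classify_url(url: str) -> str:
    """Return the URL type from the supported formats, or 'unknown'."""
    for kind, pattern in URL_PATTERNS:
        if re.search(pattern, url):
            return kind
    return "unknown"

print(classify_url("https://youtu.be/dQw4w9WgXcQ"))           # video
print(classify_url("https://www.youtube.com/@RickAstleyYT"))  # channel
```

Pattern order matters: shorts and playlist URLs are checked before the generic video patterns so they are not misclassified.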

Integrations -- Zapier, Make, n8n

Zapier

  1. Add a Webhooks by Zapier action (POST request)
  2. URL: https://api.apify.com/v2/acts/jy-labs~youtube-all-in-one-downloader-scraper/runs?token=YOUR_TOKEN
  3. Payload: JSON with your input (startUrls, outputFormat, etc.)
  4. Use webhookUrl to trigger the next Zapier step when the run completes

Make (formerly Integromat)

  1. Add an HTTP > Make a request module
  2. POST to: https://api.apify.com/v2/acts/jy-labs~youtube-all-in-one-downloader-scraper/runs?token=YOUR_TOKEN
  3. Body: JSON with actor input
  4. Use Make's Apify module to poll for completion and fetch dataset items

n8n

  1. Add an HTTP Request node (POST) with actor input
  2. Use webhookUrl pointing to an n8n webhook trigger for completion notification
  3. Add a second HTTP Request node to GET dataset items

Apify Scheduler

Built-in scheduling -- set a cron expression on the Schedules tab to run the actor automatically. Combine with webhookUrl to send results to Slack, email, or any endpoint.

FAQ

What is LLM-ready mode?

Set outputFormat: "llm_ready" and the actor automatically enables transcript extraction, disables video download, and charges $0.005/video (same as metadata extraction). The output includes clean plaintext transcripts ready to chunk, embed, or feed into any LLM. No extra configuration needed.

Do I need a proxy?

Apify Proxy is enabled by default. For best results, select the RESIDENTIAL proxy group. YouTube aggressively rate-limits automated requests, so keeping the proxy enabled is recommended. Note: Proxy usage incurs data transfer costs (~$5-6/GB for RESIDENTIAL). You can disable it to reduce costs at the risk of blocks.

What quality options are available?

highest (best available), 1080p, 720p, 480p, 360p, and audio_only. Automatic fallback to nearest lower resolution if requested quality is unavailable.

Can I get transcripts in other languages?

Yes. Set captionLanguage to any ISO language code (e.g., "ko", "ja", "es", "de"). The actor extracts captions in 100+ languages. If the preferred language is unavailable, it falls back to the first available.

Can I download age-restricted or private videos?

No. Age-restricted and private videos require YouTube authentication and are not supported. Only publicly available videos can be processed.

What is the maximum batch size?

Up to 500 videos per run (via maxVideos). For larger batches, schedule multiple runs or chain them via the Apify API.
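Chaining runs for a larger URL list can be as simple as slicing it into maxVideos-sized batches and launching one run per batch. A sketch (the actor ID and input shape follow the Integration Examples above; the client call is commented out because it needs a real API token):

```python
def batch(urls: list[str], size: int = 500) -> list[list[str]]:
    """Slice a URL list into maxVideos-sized batches for chained runs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

urls = [f"https://www.youtube.com/watch?v=VIDEO_{i}" for i in range(1200)]
print([len(b) for b in batch(urls)])  # [500, 500, 200]

# One run per batch (sketch; requires a real token):
# from apify_client import ApifyClient
# client = ApifyClient("YOUR_API_TOKEN")
# for group in batch(urls):
#     client.actor("jy-labs/youtube-all-in-one-downloader-scraper").call(
#         run_input={"startUrls": group, "outputFormat": "llm_ready"})
```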

How long are files stored?

Downloaded files are stored in Apify Key-Value Store. Free plan: 7 days retention. Paid plans: longer retention. Export files before they expire.

Can I get audio only?

Yes. Set quality: "audio_only" to extract the audio track as an M4A file (AAC audio in MP4 container). Supported by all major players. Useful for podcast archiving, music extraction, or audio-based NLP.

Can I get SRT/VTT subtitle files?

Yes. Set downloadCaptions: true and captionFormat: "srt" or "vtt". The file is stored in Key-Value Store and the output includes captionFileUrl.

Is this suitable for production use?

Yes. The actor includes retry logic with exponential backoff, proxy rotation, concurrent processing, structured error handling, and webhook notifications. It is designed for automated pipelines and scheduled runs.

Can I search YouTube and extract at the same time?

Yes. Use searchQueries to search by keywords. The actor fetches search results and processes found videos alongside any startUrls. Useful for building training datasets on specific topics.

What format are downloaded videos in?

Videos: MP4 (.mp4). Audio-only: M4A (.m4a, AAC in MP4 container). Both are universally supported.

Limitations

  • Age-restricted videos require YouTube authentication and are not supported
  • Private and unlisted videos accessible only to the uploader cannot be processed
  • Live streams currently broadcasting cannot be downloaded (completed live streams work)
  • DRM-protected content (YouTube Premium originals) cannot be downloaded
  • File size limits per quality tier: 360p=100MB, 480p=150MB, 720p=250MB, 1080p=400MB, highest=500MB, audio=50MB
  • Very long videos (>2 hours) may require more memory; lower maxConcurrency for these
  • YouTube rate limiting may affect large batches without proxy -- always use proxy for production
  • Geographically restricted videos may fail depending on proxy location

Technology

  • youtubei.js -- YouTube InnerTube API client (no browser required)
  • Apify SDK -- Actor framework with dataset, key-value store, and proxy management
  • p-limit -- Concurrent download management
  • TypeScript -- Type-safe ESM implementation

Changelog

See the Changelog tab for version history and updates.

Support

If you encounter issues or have feature requests, open an issue on the Apify Store page. For custom integrations or enterprise use cases, reach out through the Apify platform.
