YouTube Scraper

Extract videos, comments & transcripts from any YouTube channel, playlist or search. Supports @handles, Shorts filter, date range & min views. Full transcripts with timestamps — perfect for AI/LLM datasets. No API key needed.

Pricing: from $15.00 / 1,000 video scrapes

Rating: 0.0 (0 reviews)

Developer: Yuliia Kulakova (Maintained by Community)

Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 20 days ago

YouTube Scraper — Videos, Comments & Transcripts

Extract videos, comments, and full transcripts from YouTube — no API key required. Works with channels, playlists, search queries, and direct video URLs. Built for AI/LLM datasets, market research, competitor analysis, and content intelligence.


💰 Pricing

Pay only for what you extract — three separate billing events:

| What | Cost |
| --- | --- |
| 📹 Videos | $15 per 1,000 |
| 💬 Comments | $8 per 1,000 |
| 📝 Transcripts | $5 per transcript |

A small one-time actor-start fee applies per run. You are never charged for videos that are filtered out or skipped.
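As a rough guide, the per-event charges can be estimated with a small helper. This is a sketch using only the rates listed above; it excludes the one-time actor-start fee, which you should check on the actor's pricing tab:

```python
def estimate_cost(videos: int, comments: int = 0, transcripts: int = 0) -> float:
    """Estimate per-event charges in USD at the rates listed above.

    Excludes the one-time actor-start fee, which varies -- check the
    actor's pricing tab for the current amount.
    """
    return round(
        videos / 1000 * 15.00      # $15 per 1,000 videos
        + comments / 1000 * 8.00   # $8 per 1,000 comments
        + transcripts * 5.00,      # $5 per transcript
        2,
    )

# 1,000 videos + 5,000 comments + 10 transcripts -> 15 + 40 + 50
print(estimate_cost(1000, comments=5000, transcripts=10))
```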


✨ Key Features

📝 Full Transcripts with Timestamps

Extracts complete captions — both manual and auto-generated — broken into timestamped segments. Each segment has a start time (seconds), duration, and text. The fullText field provides the entire transcript as a single string. Perfect for building AI training datasets, video summarization pipelines, search indexing, and subtitle analysis.

🔄 Four Input Types

  • Channel URLs including @handle format — scrapes the full video library from a channel
  • Playlist URLs — both user-created playlists (PL...) and YouTube Mix/Radio playlists (RD...)
  • Search queries — finds videos matching one or more keyword queries
  • Direct video URLs — pinpoint specific videos by URL or video ID

📊 Rich Video Metadata

Every video record includes: title, description, channel info, view count, like count, tags, category, duration in both human-readable and seconds format, publish date, thumbnail URL, and whether the video is a YouTube Short.

💬 Comments with Replies

Captures top-level comments and their replies in a single flat dataset. Each comment record includes the author name, channel ID, like count, reply count, pin status, and whether it is a reply (with parentCommentId for threading).

🔍 Powerful Filters

Filter videos before scraping details to save cost and time:

  • Minimum views — skip low-traffic videos
  • Date range — only videos published after a specific date
  • Video type — regular videos only, Shorts only, or all
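Conceptually, the filters act as a predicate evaluated per video before any detail scraping. The sketch below mirrors that logic for illustration only; it is not the actor's source code:

```python
def passes_filters(video: dict, min_views: int = 0, date_from: str = "",
                   video_type: str = "all") -> bool:
    """Illustrative mirror of the actor's pre-scrape filters."""
    if video["viewCount"] < min_views:
        return False
    # ISO YYYY-MM-DD strings compare correctly as plain strings
    if date_from and video["publishedAt"] < date_from:
        return False
    if video_type == "videos" and video["isShort"]:
        return False
    if video_type == "shorts" and not video["isShort"]:
        return False
    return True

videos = [
    {"viewCount": 500, "publishedAt": "2026-02-01", "isShort": False},
    {"viewCount": 90000, "publishedAt": "2025-12-31", "isShort": False},
    {"viewCount": 120000, "publishedAt": "2026-03-10", "isShort": True},
]
kept = [v for v in videos
        if passes_filters(v, min_views=1000, date_from="2026-01-01")]
# only the third video passes both the view-count and date checks
```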

🚀 Quick Start

Option 1 — Channel URL

Paste a channel URL to scrape all videos from that channel.

https://www.youtube.com/@MrBeast
https://www.youtube.com/channel/UCX6OQ3DkcsbYNE6H8uQQuVA

Option 2 — Playlist

Paste a playlist URL. Works with standard playlists and YouTube Radio/Mix playlists.

https://www.youtube.com/playlist?list=PLxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
https://www.youtube.com/watch?v=dQw4w9WgXcQ&list=RDdQw4w9WgXcQ

Option 3 — Search Queries

Provide one or more search terms as a list. The scraper returns the top results for each query.

["python tutorial", "machine learning 2024", "startup advice"]

Option 4 — Direct Video URL

Provide one or more video URLs to scrape specific videos.

https://www.youtube.com/watch?v=dQw4w9WgXcQ
https://youtu.be/dQw4w9WgXcQ

⚙️ Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | | YouTube URLs: video, channel, playlist, or search results page |
| searchQueries | array | | Keywords to search for (each query runs independently) |
| maxVideos | integer | 50 | Maximum number of videos to process per run |
| scrapeComments | boolean | false | Extract comments for each video |
| maxCommentsPerVideo | integer | 100 | Maximum comments to extract per video |
| scrapeTranscripts | boolean | false | Extract full transcripts (captions) with timestamps |
| filterVideoType | string | all | Filter by type: all, videos (excludes Shorts), or shorts |
| filterByMinViews | integer | 0 | Skip videos with fewer than this many views |
| filterByDateFrom | string | | Only include videos published on or after this date (YYYY-MM-DD) |
| filterByLanguage | string | en | Preferred transcript language code (e.g. en, es, de) |

📦 Output Format

All results are saved to the Apify dataset. Three record types are mixed in a single dataset and can be filtered by the type field.
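For downstream processing, the mixed records can be split back out by the type field after export. A minimal sketch over an exported items list:

```python
from collections import defaultdict

def split_by_type(items: list) -> dict:
    """Group mixed dataset records into video/comment/transcript buckets."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item["type"]].append(item)
    return dict(buckets)

# Illustrative records in the shape this actor emits
items = [
    {"type": "video", "videoId": "abc"},
    {"type": "comment", "commentId": "c1", "videoId": "abc"},
    {"type": "transcript", "videoId": "abc"},
    {"type": "comment", "commentId": "c2", "videoId": "abc"},
]
buckets = split_by_type(items)
# buckets["comment"] holds 2 records; "video" and "transcript" hold 1 each
```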


Video Record (type: "video")

One record per video with full metadata.

{
  "type": "video",
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "description": "The official video for \"Never Gonna Give You Up\" by Rick Astley...",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "channelName": "Rick Astley",
  "channelUrl": "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
  "publishedAt": "2009-10-25",
  "duration": "3:33",
  "durationSeconds": 213,
  "viewCount": 1760000000,
  "likeCount": 18000000,
  "commentCount": 0,
  "thumbnailUrl": "https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
  "tags": ["rick astley", "never gonna give you up", "80s music", "rickroll"],
  "category": "Music",
  "isShort": false,
  "scrapedAt": "2026-04-09T07:32:21.574Z"
}

Field reference:

| Field | Type | Description |
| --- | --- | --- |
| videoId | string | YouTube video ID |
| url | string | Full watch URL |
| title | string | Video title |
| description | string | Full video description |
| channelId | string | YouTube channel ID (UC...) |
| channelName | string | Channel display name |
| channelUrl | string | Channel URL |
| publishedAt | string | Publish date (ISO format when available, human-readable otherwise) |
| duration | string | Duration in H:MM:SS or M:SS format |
| durationSeconds | integer | Duration in seconds |
| viewCount | integer | Total view count |
| likeCount | integer | Total like count |
| commentCount | integer | Comment count (may be 0 if not returned by the API) |
| thumbnailUrl | string | Max-resolution thumbnail URL |
| tags | array | Video tags set by the uploader |
| category | string | YouTube category (Music, Education, etc.) |
| isShort | boolean | True if video duration is 60 seconds or less |
| scrapedAt | string | ISO timestamp of when the record was created |

Comment Record (type: "comment")

One record per comment or reply. Top-level comments and replies are saved as flat records — use isReply and parentCommentId to reconstruct threads.
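As an example, threads can be rebuilt from the flat records using isReply and parentCommentId. This is a sketch of our own (not part of the actor); replies whose parent was not scraped are kept as roots here:

```python
def build_threads(comments: list) -> list:
    """Nest reply records under their top-level parents (illustrative helper)."""
    by_id = {c["commentId"]: {**c, "replies": []} for c in comments}
    roots = []
    for c in by_id.values():
        parent = by_id.get(c["parentCommentId"]) if c["isReply"] else None
        if parent is not None:
            parent["replies"].append(c)
        else:
            # top-level comment, or a reply whose parent was not scraped
            roots.append(c)
    return roots

flat = [
    {"commentId": "a", "isReply": False, "parentCommentId": None, "text": "top"},
    {"commentId": "b", "isReply": True, "parentCommentId": "a", "text": "reply"},
]
threads = build_threads(flat)
# one root thread with the reply nested under it
```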

{
  "type": "comment",
  "commentId": "UgxHPK_QyTdcuyvuX7B4AaABAg",
  "videoId": "dQw4w9WgXcQ",
  "videoTitle": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "text": "I came here voluntarily. Nobody rickrolled me.",
  "authorName": "@username",
  "authorChannelId": "UCxxxxxxxxxxxxxxxxxxxxxxxxx",
  "likeCount": 134000,
  "replyCount": 463,
  "isReply": false,
  "parentCommentId": null,
  "publishedAt": "6 years ago",
  "isPinned": false,
  "scrapedAt": "2026-04-09T07:32:24.880Z"
}

Field reference:

| Field | Type | Description |
| --- | --- | --- |
| commentId | string | Unique YouTube comment ID |
| videoId | string | Parent video ID |
| videoTitle | string | Parent video title |
| text | string | Full comment text |
| authorName | string | Author's @handle |
| authorChannelId | string | Author's channel ID |
| likeCount | integer | Number of likes on the comment |
| replyCount | integer | Number of replies (top-level comments only) |
| isReply | boolean | True if this is a reply to another comment |
| parentCommentId | string | ID of the parent comment if isReply is true |
| publishedAt | string | Relative publish time as shown on YouTube |
| isPinned | boolean | True if the comment is pinned by the channel |

Transcript Record (type: "transcript")

One record per video. Contains the full transcript broken into timed segments plus a concatenated fullText field.

{
  "type": "transcript",
  "videoId": "dQw4w9WgXcQ",
  "videoTitle": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "language": "en",
  "languageName": "English (auto-generated)",
  "isAutoGenerated": true,
  "segments": [
    { "start": 18.8, "duration": 7.16, "text": "We're no strangers to love." },
    { "start": 25.96, "duration": 4.32, "text": "You know the rules and so do I." },
    { "start": 30.28, "duration": 9.28, "text": "You wouldn't get this from any other guy." }
  ],
  "fullText": "We're no strangers to love. You know the rules and so do I...",
  "wordCount": 291,
  "scrapedAt": "2026-04-09T07:32:26.316Z"
}

Field reference:

| Field | Type | Description |
| --- | --- | --- |
| language | string | ISO 639-1 language code (e.g. en) |
| languageName | string | Human-readable language name |
| isAutoGenerated | boolean | True if auto-generated captions, false if manually created |
| segments | array | Array of timed segments with start, duration, and text |
| fullText | string | All segment text concatenated into one string |
| wordCount | integer | Total word count of the transcript |
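The timed segments map directly onto subtitle formats. As one example, a small helper of our own (not part of the actor's output) can render them as SubRip (SRT) cues:

```python
def to_srt(segments: list) -> str:
    """Render transcript segments as SubRip (SRT) cues."""
    def stamp(seconds: float) -> str:
        # SRT timestamps are HH:MM:SS,mmm
        ms = round(seconds * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    cues = []
    for i, seg in enumerate(segments, start=1):
        end = seg["start"] + seg["duration"]
        cues.append(f"{i}\n{stamp(seg['start'])} --> {stamp(end)}\n{seg['text']}\n")
    return "\n".join(cues)

srt = to_srt([{"start": 18.8, "duration": 7.16, "text": "We're no strangers to love."}])
```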

🔍 Use Case Examples

Build an LLM training dataset from a podcast channel

{
  "startUrls": [{ "url": "https://www.youtube.com/@lexfridman" }],
  "maxVideos": 200,
  "scrapeTranscripts": true,
  "filterByLanguage": "en",
  "filterVideoType": "videos"
}

Competitor sentiment analysis via comments

{
  "searchQueries": ["notion app review", "notion vs obsidian 2024"],
  "maxVideos": 50,
  "scrapeComments": true,
  "maxCommentsPerVideo": 300,
  "filterByMinViews": 10000
}

Scrape only YouTube Shorts from a channel

{
  "startUrls": [{ "url": "https://www.youtube.com/@MrBeast" }],
  "maxVideos": 200,
  "filterVideoType": "shorts"
}

Monitor new uploads (weekly incremental run)

{
  "startUrls": [{ "url": "https://www.youtube.com/@ycombinator" }],
  "maxVideos": 20,
  "filterByDateFrom": "2026-04-01"
}

Find recent high-view videos on a topic

{
  "searchQueries": ["AI tools 2026", "ChatGPT tutorial"],
  "maxVideos": 30,
  "filterByMinViews": 500000,
  "filterByDateFrom": "2026-01-01",
  "scrapeComments": true,
  "maxCommentsPerVideo": 100
}

Scrape a specific playlist with transcripts

{
  "startUrls": [{ "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ&list=RDdQw4w9WgXcQ" }],
  "maxVideos": 20,
  "scrapeTranscripts": true
}

📊 Who Uses This

| Use Case | Who | What They Get |
| --- | --- | --- |
| LLM / AI training data | AI & ML teams | Timestamped transcripts from thousands of videos in structured JSON |
| Competitive research | Product teams | What real users say about competitor products in comments |
| SEO & content strategy | Marketers | Top-performing titles, tags, view/engagement metrics |
| Influencer analytics | Agencies | Channel video performance, engagement trends over time |
| Shorts trend analysis | Creators | Trending Shorts topics, view/engagement ratios |
| Academic research | Researchers | Comment corpora for NLP, sentiment analysis, discourse studies |
| Market research | Analysts | Consumer opinions from product reviews and unboxing videos |
| Podcast transcription | Journalists & writers | Full text of long-form interviews and podcast episodes |
| Education data | EdTech companies | Course content from educational channels in structured format |

💡 Pro Tips

1. Transcripts for AI training: Use scrapeTranscripts: true with a focused channel. Set filterVideoType: "videos" to skip Shorts, which rarely have quality transcripts. Prefer channels that publish manual (not auto-generated) subtitles for higher accuracy.

2. Trending Shorts discovery: Combine filterVideoType: "shorts" + filterByDateFrom (last 7–14 days) + filterByMinViews: 50000 to surface only breakout Shorts from the past week.

3. Comment sentiment at scale: Run with scrapeComments: true and maxCommentsPerVideo: 500 on product review videos. Filter by filterByMinViews: 5000 to ensure the comments represent real audience volume.

4. Incremental monitoring: Schedule this actor weekly. Set filterByDateFrom to the previous Monday. This way you only process newly published videos and never duplicate data.
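The "previous Monday" cutoff from tip 4 can be computed at schedule time. A minimal sketch (the run input mirrors the monitoring example above):

```python
from datetime import date, timedelta

def previous_monday(today: date) -> str:
    """Most recent Monday (today itself if today is Monday), as YYYY-MM-DD."""
    return (today - timedelta(days=today.weekday())).isoformat()

run_input = {
    "startUrls": [{"url": "https://www.youtube.com/@ycombinator"}],
    "maxVideos": 20,
    "filterByDateFrom": previous_monday(date.today()),
}
```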

5. Multi-query keyword research: Pass 5–10 related search queries in searchQueries. Analyze the tags and title fields across results to discover the vocabulary your target audience uses.
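The vocabulary analysis in tip 5 is a few lines with collections.Counter over the video records (the records below are illustrative):

```python
from collections import Counter

# Illustrative video records (type == "video") from a multi-query run
videos = [
    {"tags": ["python", "tutorial"], "title": "Python Tutorial for Beginners"},
    {"tags": ["python", "fastapi"], "title": "FastAPI Crash Course"},
]
tag_counts = Counter(tag for v in videos for tag in v.get("tags", []))
print(tag_counts.most_common(3))  # "python" appears in both videos, so it ranks first
```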

6. Combining sources: You can mix startUrls and searchQueries in a single run. Results from all sources are deduplicated — each video is processed only once even if it appears in multiple sources.


⚠️ Limits & Notes

  • No API key required — the scraper uses YouTube's internal API, the same one used by the YouTube website and mobile apps.
  • Transcript availability — not all videos have captions. If no transcript is found, the video record is still saved, but no transcript record is created.
  • Tags and category — some videos, particularly older ones or those on music label channels, do not return tags or category via the API. These fields will be empty arrays or empty strings in that case.
  • publishedAt format — for most videos this is an ISO date. For some videos the YouTube API returns a relative string like "3 years ago" or "Premiered Dec 27, 2022" — these are preserved as-is.
  • Comments — the scraper fetches comments sorted by Top Comments. Replies are included when paginating through comment threads.
  • Rate limiting — the scraper includes automatic retry logic and pacing to handle YouTube's rate limits gracefully.

❓ FAQ

Q: Do I need a YouTube API key? No. The scraper uses YouTube's internal Innertube API — the same one used by the YouTube website and mobile apps — so no API key or developer account is required.

Q: How many videos can I scrape per run? There is no hard limit beyond maxVideos. You can set it to 1,000 or more. Keep in mind that each video requires a few API calls, so large runs take time. For channels or playlists with thousands of videos, consider running multiple incremental jobs with filterByDateFrom.

Q: Why is commentCount always 0 in the video record? YouTube's Innertube API does not reliably return the comment count in the metadata response. The actual comments are fetched separately when scrapeComments: true. The commentCount field is kept in the schema for compatibility but will typically be 0.

Q: Why do some videos have an empty description or missing publishedAt? Certain videos — especially on music label or VEVO channels — are restricted by YouTube's internal API and return limited metadata. The scraper makes a best-effort attempt to fill all fields using multiple API endpoints, but some fields may remain empty for these videos.

Q: What happens if a video has no captions? If scrapeTranscripts: true and no captions are available, no transcript record is created for that video. The video record is still saved. The scrapeTranscripts billing event only fires when a transcript is successfully retrieved.

Q: Can I scrape private or age-restricted videos? No. The scraper only accesses publicly available content — the same content visible to any logged-out user. Private, unlisted, or age-restricted videos are not accessible.

Q: How does billing work exactly? You are charged three separate per-event fees:

  • $15 per 1,000 videos processed (billed per video, after filters pass)
  • $8 per 1,000 comments extracted
  • $5 per transcript successfully retrieved

A small one-time actor-start fee also applies. Videos filtered out by filterByMinViews, filterByDateFrom, or filterVideoType are not billed.

Q: Can I run this on a schedule? Yes. Use the Apify scheduler to run the actor weekly or daily. Set filterByDateFrom to the start of the period you want to capture. This avoids re-processing already-scraped videos and minimizes cost.

Q: How do I get transcripts in a specific language? Set filterByLanguage to the ISO 639-1 code (e.g. "en", "es", "de"). If a transcript in that language is not available, the scraper will fall back to any available language. To strictly require a language, filter the output by the language field in the transcript record.

Q: Are YouTube Shorts supported? Yes. Use filterVideoType: "shorts" to scrape only Shorts, filterVideoType: "videos" to exclude them, or filterVideoType: "all" (default) to include both.


This scraper accesses publicly available data on YouTube — the same data visible to any user without logging in. Use it for legitimate research, content analysis, and data science purposes.

Always comply with YouTube's Terms of Service and applicable data-protection laws. Do not use scraped data to build spam systems, harass individuals, or circumvent YouTube's monetization systems.


🛠️ Technical Notes

  • Built on the Apify SDK with pay-per-event billing (Actor.charge())
  • Uses YouTube's Innertube API (the same internal API used by youtube.com) — no official Data API quota required
  • Transcript fetching uses the Android Innertube client to access caption tracks without authentication tokens
  • Comment pagination follows the 2024–2026 YouTube comment format using commentEntityPayload from frameworkUpdates.entityBatchUpdate.mutations
  • Residential proxy is used automatically for all requests to ensure reliable access