YouTube Transcript Scraper & Captions Scraper
YouTube transcript scraper and captions scraper for RAG datasets, AI training, content research, creator analytics, and video monitoring. Scrape videos, channels, comments, timestamped transcript segments, and AI chapters without a YouTube API key. PPE, x402-ready, Skyfire bundle.

Pricing: from $0.003 per video scraped
Developer: Nick (maintained by the community)
Use this YouTube transcript scraper and captions scraper to turn video URLs, channel pages, or search terms into caption exports and RAG-ready video records without a YouTube Data API key. Paste video URLs and get transcript text, timestamped segments, metadata, comments, and optional AI chapters back as structured JSON. PPE pricing is x402-ready, with a $5 Skyfire bundle for agent-paid bulk runs.
Best first run
```json
{
  "mode": "transcript",
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "transcriptLanguages": ["en"]
}
```
Use this actor for YouTube transcript scraping, YouTube captions scraping, RAG datasets, caption exports, content research, creator analytics, and video monitoring. Add comments, channel analysis, and AI chapters after confirming transcript availability for your target videos.
What you get back
- One video row in the Videos Dataset with normalized `url`, `video_id`, `title`, `channel`, views, likes, duration, publish date, thumbnail, tags, and engagement rate.
- One RAG-ready row in the Transcripts Dataset with `transcript_text`, `transcript_segments`, `transcript_language`, `transcript_kind`, and `transcript_available`.
- Good starter run: transcript mode, one URL, comments off, AI off. Add comments, channel analysis, or AI chapters after the transcript path works for your target content.
Generate YouTube transcripts, AI chapter markers, and channel analytics without an API key or browser - a pay-per-video alternative to vidIQ ($39-$415/mo), TubeBuddy ($9-$50/mo), and manual transcription services ($1.00-$1.50/audio-min via Rev). At $0.003 per video, $0.005 per transcript, and $0.01 per chapter set, a 100-video transcript + chapter audit costs about $1.80 instead of a monthly SaaS seat, and a 60-minute podcast transcript/chapter pass costs cents instead of manual transcription rates. Lightweight HTTP requests parse ytInitialData directly - significantly faster and cheaper than browser-based scrapers.
Whether you are a marketing agency benchmarking influencer channels, a content creator optimizing your upload strategy, or a brand manager evaluating sponsorship opportunities, this actor delivers structured YouTube data plus transcript-aware AI content intelligence.
What it does
This actor scrapes publicly accessible YouTube data across four modes:
- Transcript mode - fastest first-run path for RAG and search pipelines. Paste video URLs and get transcript text plus timestamped segments, with comments and AI disabled for speed.
- Channel mode - analyze any YouTube channel's recent or most popular videos. Retrieves channel metadata (subscriber count, total views, verification status, country, handle) plus per-video data (title, view count, like count, comment count, duration, tags, engagement rate, and Shorts detection). Optionally downloads full transcripts per video.
- Search mode - run a keyword search on YouTube and retrieve the top matching videos with full metadata. Useful for discovering which creators dominate a niche or tracking what content is trending around a topic.
- Video mode - fetch detailed data for specific video URLs you supply. Returns full descriptions, tags, likes, comments, and optional transcript - all in one structured output item per video.
On top of raw scraping, the actor optionally:
- Downloads full video subtitles (human-uploaded captions preferred, auto-generated ASR fallback) in your chosen language.
- Auto-segments each transcript into 3-8 AI-generated chapters with start/end timestamps, a short title, and a one-sentence summary - replacing manual chapter authoring for podcasters, long-form creators, and clip agencies.
- Runs an AI-powered channel analysis that synthesizes video metrics, engagement patterns, and transcript content themes into a strategic report with upload consistency scores, audience engagement scores, and growth recommendations.
No YouTube Data API key is required. No browser is launched. All data is fetched via direct HTTP requests, parsed from YouTube's server-rendered ytInitialData JSON payload.
Features
- No browser required - uses direct HTTP requests to fetch YouTube pages, making it significantly faster and cheaper than browser-based scrapers
- 4 scraping modes - transcript export, channel analysis, video search, or single video details to match your research workflow
- Full channel metadata - subscriber count, total views, video count, channel description, verification status, country, and handle
- Rich video data - title, view count, like count, comment count, duration, tags, publish date, engagement rate, and Shorts detection
- Engagement rate calculation - automatically computes engagement rate as (likes + comments) / views for every video
- Comment extraction - scrape top comments with author name, comment text, like count, and publish date
- Video transcripts - download full video subtitles via YouTube's timedtext endpoint (prefers human-uploaded, falls back to auto-generated ASR). Includes both plain-text and per-segment timestamped output. Language preference is configurable
- AI chapter auto-segmentation - feed transcripts to an LLM to synthesize 3-8 chapter markers per video with `start`, `end`, `title`, and `summary`. Replaces manual chapter markers for podcasters and long-form creators, and produces searchable timestamps for clip extraction, content search, and video navigation. Only charged when chapters are successfully generated
- Sort flexibility - sort channel videos by most popular, newest, or oldest to focus your analysis
- Transcript-aware AI analysis - when transcripts are enabled alongside AI analysis, the report extracts themes, key phrases, and voice-style from actual spoken content instead of only titles
- AI channel analysis - optional LLM-powered insights covering content strategy, upload consistency scoring, audience engagement scoring, top-performing themes, and growth recommendations
- Multi-LLM support - choose OpenRouter (recommended - 300+ models), Anthropic (Claude), Google AI (Gemini), OpenAI (GPT), or Ollama (self-hosted) for AI analysis
- Pay-per-event pricing - x402-ready PPE charges for videos, transcripts, comments, chapters, and AI analysis, plus a $5 Skyfire bundle for agent-paid bulk extraction
Use Cases
- Marketing agencies - research competitor channels, benchmark engagement rates across your client's vertical, and build data-driven content strategies. Compare multiple channels side-by-side with AI-generated positioning insights.
- Content creators - analyze your own channel performance to identify top-performing content themes. Understand which video formats, topics, and lengths drive the most engagement. Optimize upload timing and strategy.
- Brand managers - evaluate influencer channels for sponsorship fit. Track engagement rates, audience sentiment through comments, and content consistency before committing marketing budgets.
- Influencer marketing platforms - build and maintain influencer databases with up-to-date metrics. Score channels by engagement quality, not just subscriber count. Detect fake engagement through metric analysis.
- Competitive intelligence teams - monitor competitor YouTube channels for new content, messaging changes, product announcements, and audience reactions. Track engagement trends over time.
- PR and communications professionals - monitor brand mentions and sentiment across YouTube comments. Track how product launches and announcements are received by video audiences.
- Academic researchers - collect structured YouTube data for media studies, audience behavior research, content virality analysis, and platform ecosystem studies. Transcript mode produces a searchable corpus of spoken content across channels for qualitative and NLP analysis.
- AI training and content analysts - use transcript mode to build captioned video datasets for fine-tuning, RAG pipelines, or semantic search. Combine with AI analysis for theme tagging and voice-style classification at scale.
- SEO and content strategists - analyze video tags, titles, and descriptions to understand keyword strategies. Identify content gaps and high-engagement topics in your niche.
- Podcasters and long-form creators - auto-generate chapter markers for every episode when you forgot (or didn't want) to write them manually. Upload the chapters to YouTube's description, surface them in your podcast RSS, or drive a "jump to section" UI on your show page.
- Video clip agencies and editors - use AI chapter output to slice long-form content into topic-coherent clips for shorts, reels, or social distribution without watching the full video first.
Input
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `transcript` | Scraping mode: `transcript`, `video`, `channel`, or `search` |
| `channelUrls` | array | -- | YouTube channel URLs (required for channel mode) |
| `searchQuery` | string | -- | Search query (required for search mode) |
| `videoUrls` | array | -- | Video URLs (required for transcript and video modes) |
| `maxVideos` | integer | 10 | Maximum videos per channel or search (1-100) |
| `includeComments` | boolean | false | Scrape top comments for each video |
| `maxCommentsPerVideo` | integer | 10 | Comments per video (1-50) |
| `includeTranscripts` | boolean | false | Download video subtitles (human-uploaded preferred, ASR fallback) |
| `transcriptLanguages` | array | `["en"]` | Preferred transcript language codes (first-match priority) |
| `generateChapters` | boolean | false | Auto-segment each transcript into 3-8 AI-generated chapters (requires `includeTranscripts` + LLM key) |
| `sortVideosBy` | string | `popular` | Sort channel videos: popular, newest, oldest |
| `enableAiAnalysis` | boolean | false | Enable AI channel analysis |
| `llmProvider` | string | `openrouter` | AI provider: openrouter, anthropic, google, openai, or ollama |
| `llmModel` | string | -- | Override default model (leave empty for recommended default) |
| `openrouterApiKey` | string | -- | OpenRouter API key (required if using OpenRouter) |
| `anthropicApiKey` | string | -- | Anthropic API key (required if using Anthropic) |
| `googleApiKey` | string | -- | Google AI (Gemini) API key (required if using Google) |
| `openaiApiKey` | string | -- | OpenAI API key (required if using OpenAI) |
| `ollamaBaseUrl` | string | `http://localhost:11434` | Ollama API base URL (for self-hosted LLM) |
| `proxyConfiguration` | object | `{useApifyProxy: true, apifyProxyGroups: [RESIDENTIAL]}` | Proxy settings (RESIDENTIAL strongly recommended) |
Hidden API/CLI aliases are also accepted for agent workflows: channelUrl for one channel URL; videoUrl, url, urls, or links for video URLs; query, q, search, keyword, or searchTerm for searchQuery; and maxItems for maxVideos.
Pricing
This actor uses Apify's pay-per-event pricing model. You only pay for what you scrape. The individual events are x402-ready; the Skyfire route uses the $5 bulk bundle because Skyfire enforces a minimum charge per actor invocation.
| Event | Price | Description |
|---|---|---|
| `video-scraped` | $0.003 | Charged per video extracted |
| `transcript-scraped` | $0.005 | Charged per transcript successfully downloaded (only when `includeTranscripts` is enabled) |
| `comments-scraped` | $0.002 | Charged per video whose comments were successfully scraped (only when `includeComments` is enabled and comments were returned) |
| `chapter-generated` | $0.01 | Charged per video chapter set successfully generated (only when `generateChapters` is enabled and chapters were produced) |
| `ai-analysis-completed` | $0.05 | Charged per AI channel analysis report |
Skyfire bulk bundle (AI-agent payment rail)
A skyfire-bundle-500-videos event ships at $5.00 per 500 videos for AI agents paying via the Skyfire JWT rail. Effective rate: $0.01/video - a 3.3x premium over the raw video-scraped baseline ($0.003) - but the bundle covers the full extractor stack (transcript + AI chapters + AI channel analysis) under one prepaid call, displacing manual Rev.com transcription at $90/hr. Skyfire requires a $5 minimum charge per actor invocation, so the bundle is the canonical agent-payment-rail-compatible option. Pay-as-you-go users via Apify's standard PPE rail still get the cheaper individual-event pricing.
Cost Examples
| Scenario | Videos | Transcripts | Chapters | AI Analysis | Total Cost |
|---|---|---|---|---|---|
| Quick channel check | 10 | No | No | No | $0.03 |
| Channel deep dive | 20 | No | No | Yes | $0.11 |
| Channel deep dive + transcripts | 20 | Yes (20) | No | Yes | $0.21 |
| Podcast chapter generation | 20 | Yes (20) | Yes (20) | No | $0.36 |
| Full content intelligence pass | 20 | Yes (20) | Yes (20) | Yes | $0.41 |
| Multi-channel comparison | 50 | No | No | Yes | $0.20 |
| Multi-channel + transcripts | 50 | Yes (50) | No | Yes | $0.45 |
| Large content audit | 100 | No | No | Yes | $0.35 |
vs. commercial alternatives: vidIQ Pro charges $49+/mo and TubeBuddy $19+/mo for YouTube analytics, while the YouTube Data API imposes strict rate limits and requires authentication. This actor uses pay-per-event with no subscription: $0.003/video and zero monthly fees.
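The per-event prices above can be turned into a quick budget check before launching a run. A minimal sketch (the helper name and signature are illustrative, not part of the actor; prices are copied from the event table, held as integer milli-dollars to avoid float drift):

```python
# Per-event prices from the pricing table, in milli-dollars ($0.001 units)
PRICES_MILLS = {
    "video": 3,         # video-scraped: $0.003
    "transcript": 5,    # transcript-scraped: $0.005
    "comments": 2,      # comments-scraped: $0.002 (per video with comments)
    "chapters": 10,     # chapter-generated: $0.01
    "ai_analysis": 50,  # ai-analysis-completed: $0.05 (per report)
}

def estimate_cost(videos=0, transcripts=0, comment_videos=0,
                  chapter_sets=0, ai_reports=0) -> float:
    """Estimate run cost in dollars from expected event counts."""
    mills = (
        videos * PRICES_MILLS["video"]
        + transcripts * PRICES_MILLS["transcript"]
        + comment_videos * PRICES_MILLS["comments"]
        + chapter_sets * PRICES_MILLS["chapters"]
        + ai_reports * PRICES_MILLS["ai_analysis"]
    )
    return mills / 1000

# "Quick channel check" from the table: 10 videos, nothing else
print(estimate_cost(videos=10))  # 0.03
# "Full content intelligence pass": 20 videos + 20 transcripts + 20 chapter sets + 1 AI report
print(estimate_cost(videos=20, transcripts=20, chapter_sets=20, ai_reports=1))  # 0.41
```

The two printed values match the "Quick channel check" and "Full content intelligence pass" rows in the cost-examples table.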
Typical Runtime
Because this actor uses HTTP requests instead of a browser, it runs significantly faster than browser-based YouTube scrapers:
- 10 videos without comments: ~30-60 seconds
- 20 videos without comments: ~1-2 minutes
- 20 videos with comments: ~2-4 minutes
- 50 videos with comments: ~5-8 minutes
- AI analysis adds ~15-30 seconds to any run
Output
The actor writes to multiple datasets so RAG pipelines and analyst workflows can consume the clean table they need:
- Videos Dataset - default video and channel records with metadata, metrics, optional embedded transcripts, comments, and chapters.
- Transcripts Dataset - one transcript row per video with `transcript_text` and timestamped `transcript_segments`.
- Comments Dataset - one row per scraped comment with video context attached.
- AI Analysis Dataset - optional LLM channel, search, or content-analysis records.
- Diagnostics Dataset - invalid input, blocked request, target-error, and no-result records.
Channel Mode Output
In channel mode, each result contains channel metadata, video list, and optional AI analysis:
```json
{
  "channel": {
    "channel_id": "UCBcRF18a7Qf58cCRy5xuWwQ",
    "channel_name": "MKBHD",
    "handle": "@mkbhd",
    "subscriber_count": 19200000,
    "total_videos": 20,
    "total_views": 450000000,
    "description": "...",
    "verified": true,
    "country": "US"
  },
  "videos": [
    {
      "video_id": "abc123",
      "title": "Video Title",
      "view_count": 5000000,
      "like_count": 200000,
      "comment_count": 15000,
      "duration_seconds": 720,
      "engagement_rate": 4.3,
      "is_short": false,
      "tags": ["tech", "review"],
      "comments": [
        {
          "comment_id": "Ugy...AaABAg",
          "author": "User",
          "author_channel_id": "UCxxx",
          "text": "Great video!",
          "like_count": 500,
          "likes": 500,
          "published_date": "2 weeks ago",
          "url": "https://www.youtube.com/watch?v=abc123&lc=Ugy...AaABAg"
        }
      ]
    }
  ],
  "ai_analysis": {
    "channel_overview": "...",
    "upload_consistency_score": 8,
    "audience_engagement_score": 9,
    "top_performing_themes": ["tech reviews", "smartphones"],
    "recommendations": ["..."]
  },
  "scraped_at": "2026-04-10T10:30:00Z"
}
```
Search and Video Mode
In search and video mode, each result is a flat video object with the following fields:
```json
{
  "videoId": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
  "description": "The official video for \"Never Gonna Give You Up\" by Rick Astley...",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "channelTitle": "Rick Astley",
  "viewCount": 1500000000,
  "likeCount": 16000000,
  "commentCount": 2100000,
  "publishedAt": "2009-10-25T06:57:33Z",
  "duration_seconds": 212,
  "engagement_rate": 1.21,
  "is_short": false,
  "tags": ["rick astley", "never gonna give you up", "pop"],
  "transcript": [
    {"start": 0.0, "duration": 3.5, "text": "We're no strangers to love..."},
    {"start": 3.5, "duration": 4.0, "text": "You know the rules and so do I..."}
  ],
  "chapters": [
    {"start": 0, "end": 43.0, "title": "Opening verse", "summary": "Artist introduces the emotional stakes of commitment."},
    {"start": 43.0, "end": 212.0, "title": "Chorus and bridge", "summary": "Repeated declaration of unconditional loyalty."}
  ]
}
```
The transcript field is an array of timed segments (only present when includeTranscripts is enabled). The chapters array is only present when generateChapters is enabled and the LLM successfully segmented the transcript.
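Because each result is a flat object, downstream filtering needs no post-processing. A hypothetical sketch (field names taken from the example output above; the function name and thresholds are illustrative) that keeps long-form videos with above-threshold engagement:

```python
def pick_sponsorship_candidates(items: list[dict],
                                min_engagement: float = 2.0,
                                min_duration: int = 120) -> list[dict]:
    """Filter flat video records: drop Shorts, keep high-engagement long-form videos."""
    return [
        v for v in items
        if not v.get("is_short", False)
        and v.get("engagement_rate", 0) >= min_engagement
        and v.get("duration_seconds", 0) >= min_duration
    ]

# Sample records shaped like the search/video mode output
items = [
    {"videoId": "a", "is_short": True,  "engagement_rate": 5.0, "duration_seconds": 45},
    {"videoId": "b", "is_short": False, "engagement_rate": 1.2, "duration_seconds": 212},
    {"videoId": "c", "is_short": False, "engagement_rate": 4.3, "duration_seconds": 720},
]
print([v["videoId"] for v in pick_sponsorship_candidates(items)])  # ['c']
```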
Transcript Output
In transcript mode, transcript retrieval is enabled automatically and comments/AI are skipped for speed. When includeTranscripts is enabled in other modes, each video record also includes these fields, and the same data is written as a clean row in the Transcripts Dataset:
```json
{
  "transcript_available": true,
  "transcript_language": "en",
  "transcript_kind": "",
  "transcript_text": "Welcome back to the channel. Today we're looking at...",
  "transcript_segments": [
    {"start": 0.0, "duration": 4.12, "text": "Welcome back to the channel."},
    {"start": 4.12, "duration": 3.8, "text": "Today we're looking at..."}
  ]
}
```
- `transcript_kind` is `""` for human-uploaded captions and `"asr"` for auto-generated (machine-transcribed) captions. The scraper prefers uploaded over ASR and matches your preferred-language list first.
- `transcript_segments` preserves timestamps for use cases like chaptering, search-within-video, or clipping.
- `transcript_text` is the flat concatenation for full-text search or LLM input.
- When no captions are available on a video, `transcript_available` is `false` and the other fields are empty.
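The segment structure maps directly onto RAG chunking. A minimal sketch (an assumed helper, not part of the actor) that merges `transcript_segments` into time-windowed chunks while preserving each chunk's start offset for citation:

```python
def chunk_segments(segments: list[dict], window_seconds: float = 60.0) -> list[dict]:
    """Merge timed transcript segments into ~window_seconds chunks, keeping the start offset."""
    chunks, current, chunk_start = [], [], 0.0
    for seg in segments:
        # Flush the current chunk once the window is exceeded
        if current and seg["start"] - chunk_start >= window_seconds:
            chunks.append({"start": chunk_start, "text": " ".join(current)})
            current = []
        if not current:
            chunk_start = seg["start"]
        current.append(seg["text"])
    if current:
        chunks.append({"start": chunk_start, "text": " ".join(current)})
    return chunks

# Segments shaped like the transcript output above
segments = [
    {"start": 0.0, "duration": 4.12, "text": "Welcome back to the channel."},
    {"start": 4.12, "duration": 3.8, "text": "Today we're looking at..."},
    {"start": 65.0, "duration": 3.0, "text": "First, the hardware."},
]
print(chunk_segments(segments, window_seconds=60.0))
```

Each chunk's `start` can be embedded as metadata so retrieved passages link back to `&t=<seconds>s` URLs.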
AI Chapter Output
When generateChapters is enabled (and an LLM key is set and the transcript was retrieved), each video record gains a chapters array:
```json
{
  "chapters": [
    {"start": 0, "end": 42.5, "title": "Intro and new studio setup", "summary": "Creator welcomes viewers and introduces the revamped studio."},
    {"start": 42.5, "end": 310.0, "title": "iPhone 16 Pro camera test", "summary": "Walk-through of the new 48MP main sensor with outdoor comparison shots."},
    {"start": 310.0, "end": 720.0, "title": "Battery life and benchmarks", "summary": "Real-world battery test plus synthetic CPU/GPU benchmarks vs. last year's model."}
  ]
}
```
- Timestamps are in seconds (float), so they map directly to YouTube's URL fragment format (`&t=310s`).
- Typical output is 3-8 chapters per video - the LLM decides the optimal count based on topic shifts in the transcript.
- Chapters are chronological, non-overlapping, and cover the whole video from 0 to its duration.
- If the video has no transcript (`transcript_available: false`), no chapters are generated and no charge is emitted.
- On rare LLM parse failures, the `chapters` field is simply absent and no charge is emitted - your metrics and transcripts are unaffected.
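Since timestamps are plain seconds, converting a `chapters` array into the timestamp lines YouTube expects in a video description takes only a few lines. A hypothetical formatting helper (using the example chapter data above):

```python
def chapters_to_description(chapters: list[dict]) -> str:
    """Render chapter start times as YouTube-description timestamp lines (M:SS or H:MM:SS)."""
    lines = []
    for ch in chapters:
        total = int(ch["start"])
        h, m, s = total // 3600, (total % 3600) // 60, total % 60
        stamp = f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
        lines.append(f"{stamp} {ch['title']}")
    return "\n".join(lines)

chapters = [
    {"start": 0, "end": 42.5, "title": "Intro and new studio setup"},
    {"start": 42.5, "end": 310.0, "title": "iPhone 16 Pro camera test"},
    {"start": 310.0, "end": 720.0, "title": "Battery life and benchmarks"},
]
print(chapters_to_description(chapters))
# 0:00 Intro and new studio setup
# 0:42 iPhone 16 Pro camera test
# 5:10 Battery life and benchmarks
```

The first chapter always starts at 0, which satisfies YouTube's requirement that description chapters begin at 0:00.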
Quick Start
Example: Auto-chapter a long-form podcast (highest-value mode)
Pulls the timed transcript, generates AI chapter timestamps with summaries, and runs sentiment analysis. Best for podcasts, interviews, lectures (videos >10 min).
```json
{
  "mode": "video",
  "videoUrls": ["https://www.youtube.com/watch?v=YOUR_PODCAST_VIDEO_ID"],
  "includeTranscripts": true,
  "generateChapters": true,
  "enableAiAnalysis": true,
  "llmProvider": "openrouter"
}
```
Per video: $0.003 (video) + $0.005 (transcript) + $0.01 (chapters) + $0.05 (AI) = ~$0.068/video. 30-min weekly podcast season (10 episodes): $0.68.
Example: Quick Channel Check
The simplest way to get started - analyze a YouTube channel's top videos:
```json
{
  "mode": "channel",
  "channelUrls": ["https://www.youtube.com/@mkbhd"],
  "maxVideos": 10
}
```
This scrapes the 10 most popular videos from MKBHD's channel with full metrics.
Example: Scrape Latest Channel Videos with Transcripts
Fetch the 15 newest uploads from a channel and download the English transcript for each video - useful for content monitoring pipelines that need the spoken text, not just titles.
```json
{
  "mode": "channel",
  "channelUrls": ["https://www.youtube.com/@lexfridman"],
  "maxVideos": 15,
  "sortVideosBy": "newest",
  "includeTranscripts": true,
  "transcriptLanguages": ["en"]
}
```
Each video item includes transcript_text (flat string) and transcript_segments (timestamped array) alongside standard metadata such as view count, like count, and duration.
Example: Search YouTube for a Keyword
Search for a topic and retrieve the top 20 matching videos with full metadata - ideal for market research or tracking which creators dominate a niche.
```json
{
  "mode": "search",
  "searchQuery": "electric vehicle review 2026",
  "maxVideos": 20
}
```
Returns a list of 20 video objects, each containing title, channel name, view count, like count, engagement rate, duration, and publish date.
Example: Weekly Competitor Channel Analysis
Analyze a competitor's 50 most recent videos and generate an AI report on their content strategy - schedule this weekly to track messaging shifts and upload cadence over time.
```json
{
  "mode": "channel",
  "channelUrls": ["https://www.youtube.com/@mkbhd"],
  "maxVideos": 50,
  "sortVideosBy": "newest",
  "enableAiAnalysis": true,
  "llmProvider": "openrouter",
  "openrouterApiKey": "sk-or-..."
}
```
Produces a structured AI report covering upload consistency score, audience engagement score, top content themes, and growth recommendations alongside full video metrics for all 50 videos.
Tips for Best Results
- Sort by popular for content strategy. Analyzing a channel's most popular videos reveals what content themes and formats resonate most with their audience.
- Sort by newest for competitive monitoring. Track what competitors are publishing right now and how their recent content performs.
- Use comments for sentiment analysis. Top comments provide qualitative audience feedback that complements quantitative engagement metrics.
- Enable AI analysis for strategic insights. The AI report synthesizes video metrics, engagement patterns, and content themes into actionable content strategy recommendations.
- Combine transcripts with AI analysis. When both are enabled, the AI report adds transcript-derived themes, key phrases, and creator voice-style - surfacing what the creator actually talks about, not just title keywords. This is the most actionable config for sponsorship research, brand-fit evaluation, and content gap analysis.
- Use `generateChapters` for long-form content. Auto-chaptering shines on podcasts, interviews, tutorials, and explainers where topic shifts matter. For short videos (<2 min) or single-topic content, chapters add little value - skip the feature and save the $0.01 per video.
- Compare engagement rates, not just views. A channel with 100K views and 5% engagement is often more valuable for sponsorships than one with 1M views and 0.5% engagement.
- Schedule weekly runs for ongoing monitoring. Track how channels evolve their content strategy and audience engagement over time.
MCP Quickstart - call this actor from Claude / Cursor / ChatGPT
Open Apify's hosted MCP configurator at mcp.apify.com, or install the Apify MCP server in your AI agent of choice:
```shell
# Claude Code
claude mcp add apify -- npx -y @apify/actors-mcp-server --token YOUR_APIFY_TOKEN
```

Claude Desktop / Cursor (add to `mcp.json`):

```json
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/actors-mcp-server", "--token", "YOUR_APIFY_TOKEN"]
    }
  }
}
```
Then prompt the agent:
"Use the harvestlab/youtube-scraper actor on Apify to fetch the 30 most recent videos from the @LexClips channel with timed transcripts and AI-generated chapters. Push the results back as JSON."
Through Apify MCP, the agent will discover the actor's dataset_schema.json, generate the right input, run it, and pipe the typed output back into your conversation.
Troubleshooting
No transcripts returned for a video
YouTube restricts transcripts on some videos (auto-captions disabled by the creator, music-industry content, age-gated, member-only, or live streams in progress). The video item is returned with transcript_available: false and empty transcript_text / transcript_segments - and the transcript-scraped event is not charged in this case. Workaround: set includeTranscripts: false to skip the transcript fetch entirely for faster runs, or target channels known to have captions enabled.
403/429 errors or empty results
This actor does not use the YouTube Data API - it scrapes YouTube's public web pages (ytInitialData JSON) directly. 403 / 429 responses come from YouTube rate-limiting your exit IP, not from an API quota. Datacenter proxies are flagged most aggressively. Add proxyConfiguration with useApifyProxy: true and apifyProxyGroups: ["RESIDENTIAL"] to rotate through residential exits. If you still see 429s, reduce maxVideos per run and split large jobs across multiple scheduled runs to spread the load over time.
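For reference, a complete input with the recommended residential proxy block might look like this (the video URL is a placeholder):

```json
{
  "mode": "transcript",
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```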
AI analysis fails with "API key missing"
Set openrouterApiKey (or your chosen provider's key) in the actor input, or pass it via the matching env var (OPENROUTER_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_KEY). AI analysis is optional - leave enableAiAnalysis: false (the default) to skip it without losing other data. Ollama uses ollamaBaseUrl and needs a reachable local server instead.
Channel scraping returns fewer videos than expected
YouTube paginates channel videos in batches of ~30. The actor follows continuation tokens up to maxVideos (max 100). For channels with fewer total uploads than maxVideos, you simply get all available videos - this is expected, not a failure. Set sortVideosBy to newest or oldest if you need a specific slice instead of the popularity ranking. Very large channels (10,000+ videos) hit rate limits faster - combine residential proxies with multiple smaller scheduled runs.
Videos returning with no or incomplete metadata
Private, age-restricted, or member-only videos expose limited data even when visible in search or channel listings. Some videos surface tags only via YouTube's InnerTube API rather than the page payload - the actor falls back to InnerTube automatically with per-attempt IP rotation, but a residential proxy group materially improves coverage. If a specific video consistently returns empty fields, it is likely restricted at the source and cannot be fully scraped regardless of proxy choice.
Transcript not available for this video
The video may have auto-captions disabled, or the language you requested is not available. Check the transcriptLanguages parameter (note: plural) - if you set ["en"] but the channel primarily publishes in another language, add fallback codes (e.g. ["en", "es", "fr"]). The first matching track in priority order wins; if none match, the first available track is returned. Music videos, live streams in progress, and member-only content routinely have no accessible captions; the actor returns transcript_available: false for these and does not charge the transcript-scraped event.
AI chapters are missing even though transcripts succeeded
Chapter generation requires includeTranscripts: true, generateChapters: true, and a working LLM key for the chosen llmProvider. If the LLM call fails or returns invalid JSON, the chapters field is silently omitted and no chapter-generated charge is emitted - your video metrics and transcripts are unaffected. Check the run log for chapter generation failed warnings, then verify your API key and credit balance with the provider.
Rate limiting or quota exceeded errors
YouTube applies rate limits at both the IP and session level. Datacenter IPs hit these limits faster than residential ones. Add a proxyConfiguration block with useApifyProxy: true and apifyProxyGroups: ["RESIDENTIAL"] to route requests through rotating residential exits. If you are already using residential proxies and still seeing 429 responses, reduce maxVideos per run and split large jobs across multiple scheduled runs to spread the load over time.
Known Limitations
- YouTube may rate-limit requests for large scraping jobs; the actor retries up to 3 times per page
- Comment scraping retrieves top comments only (sorted by YouTube's relevance algorithm), not all comments
- Subscriber counts and view counts use YouTube's abbreviated format (e.g., "1.2M") which provides approximate values
- YouTube Shorts metrics may be less complete than long-form video metrics
- YouTube's page structure may change; the actor handles multiple layout variations but temporary disruptions are possible
- AI analysis quality depends on the chosen model and the number of videos analyzed (15+ recommended)
- Maximum 100 videos per channel per run
- Private or age-restricted videos cannot be scraped
- Tags coverage is partial on datacenter proxies. YouTube selectively strips the `keywords` field from video pages returned to IPs it flags as suspicious. The actor falls back to YouTube's InnerTube API (with per-attempt IP rotation) to recover tags, but some videos may still return empty `tags: []`. For maximum tag coverage, use a residential proxy group. Note that many large channels (e.g. MrBeast) genuinely set no tags on their videos - an empty `tags` array in those cases is the correct result, not a scraping failure.
- Transcripts are not available on every video. Private, age-restricted, member-only, and music-industry videos typically disable captions. Music videos often have ASR-only captions that transcribe background lyrics. Live streams usually have no captions while live and limited ASR after the stream ends. The actor returns `transcript_available: false` and empty strings/arrays when no captions are accessible - this is not a failure.
- AI chapters depend on transcript quality. When the transcript is an ASR auto-generation, segmentation follows whatever the ASR heard - filler-heavy, stream-of-consciousness videos may produce fewer or less precise chapters than tightly scripted content. Very short videos (<2 minutes) may collapse into a single chapter, which is suppressed (minimum 3 chapters). On rare LLM parse failures the `chapters` field is absent and no charge is emitted.
Frequently Asked Questions
What transcripts does this return?
YouTube videos typically have one or both of: human-uploaded captions (manually created by the uploader, highest accuracy) and auto-generated ASR captions (machine transcribed, available on most public videos in many languages). The scraper returns the best match for your transcriptLanguages preference, preferring uploaded over ASR. The transcript_kind field tells you which was returned. Paid/private captions and closed captions locked behind DRM are not accessible.
How do AI chapters work?
When generateChapters is enabled, the actor passes each video's timestamped transcript to your chosen LLM with a strict JSON schema prompt. The model identifies topic-shift boundaries and returns 3-8 chapters with start / end seconds, a concise title, and a 1-sentence summary. This is essentially automatic chapter markers for videos where the creator didn't write them - useful for podcasters publishing to YouTube, agencies clipping long-form content, or anyone who wants searchable timestamps without manual authoring. Requires both includeTranscripts=true and an LLM API key; you are charged $0.01 per video only when chapters are successfully produced.
Why is this scraper faster and cheaper than others?
This actor uses direct HTTP requests (httpx) to fetch YouTube pages instead of launching a browser. It parses YouTube's server-rendered ytInitialData JSON payload, which contains all the structured data needed. No Playwright or Chromium overhead means faster runs and lower compute costs.
Can I scrape multiple channels in one run?
Yes. In channel mode, provide multiple URLs in the channelUrls array. Each channel will be scraped sequentially with up to maxVideos videos per channel. AI analysis is generated per channel.
How is engagement rate calculated?
Engagement rate is calculated as (like count + comment count) / view count, expressed as a percentage. This metric normalizes for audience size, allowing fair comparison between channels of different scales.
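In code form, the formula looks like this (an illustrative helper, checked against the numbers in the channel-mode example output):

```python
def engagement_rate(like_count: int, comment_count: int, view_count: int) -> float:
    """(likes + comments) / views, as a percentage rounded to one decimal place."""
    if view_count == 0:
        return 0.0
    return round((like_count + comment_count) / view_count * 100, 1)

# Channel-mode example: 200,000 likes + 15,000 comments on 5,000,000 views
print(engagement_rate(200_000, 15_000, 5_000_000))  # 4.3
```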
Does this scraper detect YouTube Shorts?
Yes. Each video includes an is_short boolean field that identifies whether the content is a YouTube Short. This allows you to filter or analyze Shorts separately from long-form content.
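A small sketch of the filtering this enables, assuming video rows carry the `is_short` boolean described above:

```python
def partition_shorts(videos: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split video rows into (shorts, long_form) using the is_short flag."""
    shorts = [v for v in videos if v.get("is_short")]
    long_form = [v for v in videos if not v.get("is_short")]
    return shorts, long_form

videos = [
    {"video_id": "s1", "is_short": True},
    {"video_id": "v1", "is_short": False},
    {"video_id": "v2", "is_short": False},
]
shorts, long_form = partition_shorts(videos)
```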
Can I search for videos by keyword?
Yes. Use search mode with a searchQuery parameter. For example, search for "product review 2026" to find recent review content. Search mode returns videos from across YouTube, not limited to specific channels.
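A hedged input sketch for search mode; `maxVideos` as a results cap is an assumption carried over from channel mode, so confirm it against the actor's input schema:

```python
# Search mode: keyword results from across YouTube, not limited to channels.
search_input = {
    "mode": "search",
    "searchQuery": "product review 2026",
    "maxVideos": 50,
    "includeTranscripts": False,
}
# Then: client.actor("harvestlab/youtube-scraper").call(run_input=search_input)
```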
How do I track channel performance over time?
Schedule regular runs on Apify (weekly or monthly) for the same channels. Over time, you build a dataset showing subscriber growth, engagement trends, content theme shifts, and upload frequency changes.
Use with AI agents (LangChain & LangGraph)
Outputs from this actor are agent-ready: video metadata, timed transcripts, and AI chapters are returned as structured JSON, so you can plug a run directly into a LangChain Tool or a LangGraph StateGraph node without post-processing.
LangChain - wrap the actor as a Tool
```python
from apify_client import ApifyClient
from langchain.tools import Tool

client = ApifyClient("YOUR_APIFY_TOKEN")

def youtube_scraper(channel_handle: str) -> list[dict]:
    run = client.actor("harvestlab/youtube-scraper").call(run_input={
        "mode": "channel",
        "channelUrls": [f"https://www.youtube.com/{channel_handle}"],
        "maxVideos": 20,
        "includeTranscripts": True,
    })
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

youtube_tool = Tool(
    name="youtube_scraper",
    description="Fetch YouTube channel videos + transcripts. Input: a channel handle like '@mkbhd'.",
    func=youtube_scraper,
)

# Agent calls it with a plain string input:
# youtube_tool.invoke("@mkbhd")
```
LangGraph - call the actor inside a StateGraph node
```python
from typing import TypedDict
from apify_client import ApifyClient
from langgraph.graph import StateGraph, END

client = ApifyClient("YOUR_APIFY_TOKEN")

class State(TypedDict):
    channelHandle: str
    transcripts: list[dict]

def fetch_youtube(state: State) -> State:
    run = client.actor("harvestlab/youtube-scraper").call(run_input={
        "mode": "channel",
        "channelUrls": [f"https://www.youtube.com/{state['channelHandle']}"],
        "maxVideos": 10,
        "includeTranscripts": True,
    })
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return {**state, "transcripts": items}

graph = StateGraph(State)
graph.add_node("fetch_youtube", fetch_youtube)
graph.set_entry_point("fetch_youtube")
graph.add_edge("fetch_youtube", END)
app = graph.compile()
```
See also apify/actor-templates/js-langchain and js-langgraph-agent for full template scaffolds in JavaScript.
Scheduling and webhooks
Schedule daily or weekly YouTube runs in Apify Console to keep a live feed of channel uploads or keyword results. Wire a webhookUrl in n8n or Make to push each new video into a Notion content calendar, Slack editorial alert, or CMS queue the moment a run completes.
Legal and Compliance
This actor scrapes publicly available data. By using this actor, you agree to the following:
- Your responsibility: You are solely responsible for ensuring your use complies with all applicable laws, regulations, and the target website's terms of service. This includes but is not limited to GDPR (EU), CCPA (California), and other data protection laws in your jurisdiction.
- No legal advice: This actor does not constitute legal advice. Consult a qualified attorney if you have questions about the legality of your specific use case.
- Intended use: This actor is designed for legitimate business purposes such as market research, competitive analysis, and academic research using publicly accessible data.
- Data handling: You are responsible for how you store, process, and share any data collected. Ensure you have a lawful basis for processing any personal data under applicable privacy laws.
- Rate limiting: This actor implements polite crawling practices including request delays and retry backoff to minimize impact on target servers.
- No warranty: This actor is provided "as is" without warranty. Data accuracy depends on the target website's content and structure.
- YouTube data: YouTube's terms of service restrict automated data collection. Consider using the official YouTube Data API for production use cases. This actor is intended for analytics and research purposes.
- Personal data notice: Channel and video data may include creator names, profile images, and commenter usernames. Under GDPR and similar regulations, this constitutes personal data subject to data protection requirements. Ensure you have a lawful basis for processing. Do not use extracted data for unsolicited contact or harassment.
Related Actors
- Google News Monitor - Pair YouTube creator coverage with Google News article tracking for cross-channel media monitoring on the same topic, brand, or industry event.
- Reddit Scraper - Capture community discussion of the videos you're tracking; cross-reference YouTube engagement with Reddit thread sentiment for a fuller audience-reaction picture.
- ProductHunt Scraper - Track creator-economy product launches and creator tools alongside YouTube channel analytics for end-to-end creator and launch intelligence.