YouTube Metadata Scraper & Analyzer
Pricing
from $200.00 / 1,000 videos processed
Extract YouTube video and live stream metadata, views, likes, comments, channel details, captions, and daily snapshots. Analyze performance over time with production-ready output views and optional comment sentiment analysis.
Developer

Marielise
2 days ago
Last modified
YouTube Video Metadata
Extract structured metadata from YouTube videos, Shorts, premieres, and ended live streams without downloading the media file. The Actor is built around the same yt-dlp + Cloudflare WARP + PO-token stack as the existing downloader actor, with residential proxies used only as a last-resort fallback.
What It Scrapes
- Channel details: channel name, channel ID, handle, verified status, follower count, URLs
- Video details: title, description, duration, category, tags, media type, embed/playability fields
- Engagement: views, likes, comments, concurrent viewers when present
- Publication: upload date, release date, timestamps, availability
- Live-specific info: `live_status`, `is_live`, `was_live`, live chat replay presence
- Captions: subtitle languages, automatic caption languages, optional full caption-track maps
- Technical info: selected format, codecs, resolution, fps, approximate filesize
- Comments: YouTube comments are fetched by default with a capped limit
- Comment sentiment: optional integrated AI summary using the actor environment's Gemini setup
- Format inventory: counts by muxed/video-only/audio-only/storyboard, plus optional per-format summaries
- Daily snapshot: a clean view/like/comment snapshot for scheduled tracking
- Metric quality: confidence, warnings, previous-snapshot deltas, and fallback diagnostics
Input
    {
      "url": "https://www.youtube.com/live/q3yHzSmemN4",
      "saveRawToKeyValueStore": true,
      "rawStoreName": "youtube-metadata-raw",
      "historyDatasetName": "youtube-video-history"
    }
Input Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | Yes | - | Single YouTube video or live-stream URL |
| `urls` | array | No | - | Optional batch input for API users who want multiple URLs |
| `proxyCountry` | string | No | - | Optional country for residential fallback |
| `useResidentialProxyFallback` | boolean | No | true | Only used if WARP/direct extraction fails |
| `maxComments` | integer | No | 50 | Maximum YouTube comments to fetch per video. Comments are fetched by default. |
| `analyzeCommentSentiment` | boolean | No | false | Optional AI sentiment analysis over fetched comments using the actor's integrated Gemini setup and server-side `GOOGLE_API_KEY` |
| `includeFormats` | boolean | No | false | Include summarized entries for every format variant |
| `includeThumbnails` | boolean | No | false | Include all thumbnail objects |
| `includeCaptionTracks` | boolean | No | false | Include the full subtitle and auto-caption track maps |
| `includeRawMetadata` | boolean | No | false | Embed the raw yt-dlp JSON in the dataset output |
| `saveRawToKeyValueStore` | boolean | No | true | Save full raw JSON to key-value store keys |
| `rawStoreName` | string | No | default store | Named store for persistent raw snapshots across runs |
| `historyDatasetName` | string | No | - | Named dataset to append daily metric snapshots across runs |
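As a rough illustration of how the defaults in the table combine with user overrides, the sketch below builds a run input in Python. The helper name `build_run_input` and the accepted-host list are assumptions made for this example; they are not part of the Actor.

```python
from urllib.parse import urlparse

# Defaults copied from the Input Fields table; fields with no default are left out.
DEFAULTS = {
    "useResidentialProxyFallback": True,
    "maxComments": 50,
    "analyzeCommentSentiment": False,
    "includeFormats": False,
    "includeThumbnails": False,
    "includeCaptionTracks": False,
    "includeRawMetadata": False,
    "saveRawToKeyValueStore": True,
}

# Optional fields from the table that have no default value.
OPTIONAL_FIELDS = {"urls", "proxyCountry", "rawStoreName", "historyDatasetName"}

# Assumption: illustrative host list, not the Actor's real URL validation.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def build_run_input(url: str, **overrides) -> dict:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    host = (urlparse(url).hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        raise ValueError(f"not a YouTube URL: {url!r}")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {"url": url, **DEFAULTS, **overrides}
```

Calling `build_run_input("https://www.youtube.com/live/q3yHzSmemN4", historyDatasetName="youtube-video-history")` yields a dictionary that can be serialized to JSON and used as the run input.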
Daily Tracking
For scheduled runs, set:
- `historyDatasetName` to append one slim daily record per video across runs
- `rawStoreName` to keep `RAW-LATEST-{videoId}` and `RAW-{videoId}-{YYYY-MM-DD}` keys in a persistent store
That gives you both:
- a compact time series for views/likes/comments
- a full raw metadata snapshot for each day
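The two key patterns above can be sketched as a small Python helper. The function name `raw_snapshot_keys` is hypothetical; only the key formats come from this README.

```python
from datetime import date


def raw_snapshot_keys(video_id: str, scrape_date: date) -> tuple[str, str]:
    """Hypothetical helper: build the two key-value store keys for one video."""
    # Overwritten on every run, so it always points at the newest snapshot.
    latest_key = f"RAW-LATEST-{video_id}"
    # One key per calendar day; date.isoformat() gives YYYY-MM-DD.
    dated_key = f"RAW-{video_id}-{scrape_date.isoformat()}"
    return latest_key, dated_key
```

With one run per day, the dated key produces one raw snapshot per video per day, while the latest key gives the next run a fixed place to look up the previous values.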
Required Secrets
- `GOOGLE_API_KEY`: only needed when `analyzeCommentSentiment=true`
- `GEMINI_COMMENTS_MODEL`: optional override, defaults to `gemini-2.5-flash`
Users do not pass these in actor input. They belong in the actor environment.
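A minimal sketch of how an actor could resolve these two environment variables, assuming a hypothetical helper named `resolve_sentiment_config`. The returned dictionary shape is invented for illustration; only the variable names and the default model come from this README.

```python
from typing import Mapping, Optional

DEFAULT_MODEL = "gemini-2.5-flash"


def resolve_sentiment_config(
    analyze_comment_sentiment: bool, env: Mapping[str, str]
) -> Optional[dict]:
    """Hypothetical helper: read sentiment settings from the environment."""
    if not analyze_comment_sentiment:
        # No key is needed when sentiment analysis is off.
        return None
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError(
            "GOOGLE_API_KEY must be set in the actor environment "
            "when analyzeCommentSentiment=true"
        )
    # An empty override falls back to the documented default model.
    model = env.get("GEMINI_COMMENTS_MODEL") or DEFAULT_MODEL
    return {"apiKey": api_key, "model": model}
```

In a real run the second argument would be `os.environ`; passing a plain mapping keeps the function easy to test.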
Monetization
Publish this actor with Pay per event + usage enabled. That is the production-safe setup because it passes compute and proxy usage through to the user and avoids uncovered residential-proxy losses.
Configure these exact PPE events in Apify:
| Event name | Title | Price | Description |
|---|---|---|---|
| `video-processed` | Video processed | $0.30 | Charged once for each successfully processed YouTube video or live stream URL, including metadata, engagement stats, comments, and daily snapshot output. |
| `comment-sentiment-analyzed` | Comment sentiment analyzed | $0.10 | Charged only when comment sentiment analysis is enabled and successfully completed for the fetched YouTube comments. |
| `residential-fallback-used` | Residential proxy fallback used | $1.00 | Charged only when direct or WARP extraction fails and the actor must use residential proxy fallback to complete the scrape. |
Production charging rules:
- Charge `video-processed` only on successful results
- Charge `comment-sentiment-analyzed` only when sentiment succeeds
- Charge `residential-fallback-used` only when `extraction.proxyMode = residential`
- Do not charge for invalid URLs, unsupported channel URLs, playlists, or failed scrapes
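The charging rules can be read as a small decision function. The sketch below is one possible reading, not the Actor's code: `events_to_charge` is a hypothetical name, and the `commentSentiment.status` field used to detect a successful sentiment pass is an assumption. The `status` and `extraction.proxyMode` fields and the prices come from this README.

```python
# Prices from the PPE table, kept in cents to avoid float rounding.
PRICES_CENTS = {
    "video-processed": 30,
    "comment-sentiment-analyzed": 10,
    "residential-fallback-used": 100,
}


def events_to_charge(item: dict) -> list[str]:
    """Hypothetical helper: list the PPE events one dataset item triggers."""
    if item.get("status") != "success":
        # Invalid URLs, playlists, channel URLs, and failed scrapes are free.
        return []
    events = ["video-processed"]
    # Assumption: sentiment success is reported as commentSentiment.status.
    if (item.get("commentSentiment") or {}).get("status") == "success":
        events.append("comment-sentiment-analyzed")
    if (item.get("extraction") or {}).get("proxyMode") == "residential":
        events.append("residential-fallback-used")
    return events


def total_cents(events: list[str]) -> int:
    """Sum the event prices for one item."""
    return sum(PRICES_CENTS[name] for name in events)
```

Under these prices, a successful video that needed the residential fallback and had no sentiment analysis costs 30 + 100 = 130 cents in events, before compute and proxy usage.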
Output Shape
Each successful dataset item is split into sections:
- `schemaVersion`
- `summary`
- `warnings`
- `input`
- `video`
- `channel`
- `engagement`
- `metrics`
- `comments`
- `publication`
- `liveStream`
- `captions`
- `contentStructure`
- `assets`
- `technical`
- `formats`
- `extraction`
- `dailySnapshot`
- `storage`
Apify dataset views are also configured separately for:
- `overview`
- `metricsOnly`
- `engagement`
- `dailySnapshots`
- `video`
- `channel`
- `comments`
- `commentSentiment`
- `qualityWarnings`
- `technical`
- `errors`
Example Output
    {
      "status": "success",
      "input": {
        "originalUrl": "https://www.youtube.com/live/q3yHzSmemN4",
        "normalizedUrl": "https://www.youtube.com/watch?v=q3yHzSmemN4",
        "videoId": "q3yHzSmemN4",
        "label": "daily-check"
      },
      "video": {
        "id": "q3yHzSmemN4",
        "title": "Big Live Slot Play + HUGE GIVEAWAY On PlayFame",
        "durationText": "55:50",
        "mediaType": "livestream"
      },
      "channel": {
        "name": "NG Slot",
        "followerCount": 1230000,
        "isVerified": true
      },
      "engagement": {
        "viewCount": 14152,
        "likeCount": 423,
        "commentCount": 19
      },
      "metrics": {
        "status": "complete",
        "confidence": "high",
        "rawMetricSource": "warp",
        "retryUsed": false,
        "deltas": {
          "viewCount": 37,
          "likeCount": 0,
          "commentCount": 0
        }
      },
      "liveStream": {
        "liveStatus": "was_live",
        "isLive": false,
        "wasLive": true,
        "hasLiveChatReplay": true
      },
      "dailySnapshot": {
        "scrapeDate": "2026-03-17",
        "viewCount": 14152,
        "likeCount": 423,
        "commentCount": 19
      }
    }
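The `input` section of the example shows a `/live/` URL normalized to a `watch?v=` URL. The README does not describe how the Actor does this, so the following is only a sketch of one way to get the same mapping. The function names, the set of handled path prefixes, and the 11-character ID pattern are all assumptions.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: pull a video ID out of common YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]
    candidate = None
    if host == "youtu.be" and segments:
        candidate = segments[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if segments[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in {"live", "shorts", "embed"}:
            candidate = segments[1]
    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    # Playlists, channel URLs, and non-YouTube hosts end up here.
    return None


def normalize_url(url: str) -> Optional[str]:
    """Return the canonical watch URL, or None for unsupported input."""
    video_id = extract_video_id(url)
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None
```

Returning `None` for playlist and channel URLs matches the charging rule that unsupported inputs are rejected rather than processed.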
Production Notes
- CDN media URLs in raw `yt-dlp` output are temporary and expire. Persist them only if you need same-run debugging.
- If the default WARP/direct path works, the Actor will not send traffic through residential proxies.
- Ended live streams usually expose `was_live`, `live_status`, likes, comments, and live chat replay availability.
- Comment sentiment analysis is optional and uses the actor's server-side `GOOGLE_API_KEY` with Gemini. Users do not pass their own key in the input.
- Public YouTube subscriber counts are rounded. `channel.followerCount` is not an exact private subscriber count.
- View, like, and comment counts can lag or shift slightly on live and recently-ended live streams.
- The actor flags suspicious metric drops and missing counts in `warnings`, `metrics.status`, and `metrics.confidence` instead of hiding them.
- `RUN-SUMMARY` is saved in the key-value store with run totals, fallback counts, sentiment counts, and metric quality counts.
- `RUN-SUMMARY` also includes the expected PPE pricing configuration so production runs can be checked against the published pricing.
Quality Controls
- Metadata is retried once when `viewCount`, `likeCount`, or `commentCount` is missing.
- Previous snapshots are loaded from `RAW-LATEST-{videoId}` to compute deltas and detect suspicious regressions.
- Comments are fetched in a second pass so capped comment retrieval does not overwrite the true YouTube `commentCount`.
- Extraction diagnostics include attempt history, player client used, PO-token usage, and whether residential fallback was needed.
- Comment sentiment uses `gemini-2.5-flash` with `thinkingBudget: 0` to reduce cost and keep sentiment pricing predictable.
- Residential fallback is disabled for a request when the user spending limit cannot cover another `residential-fallback-used` charge.
- Comment sentiment is skipped when the user spending limit cannot cover another `comment-sentiment-analyzed` charge.
- Regression tests cover URL validation, comment normalization, metric regression warnings, and Gemini sentiment payload normalization.
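The delta and regression check described in the list above might look like the sketch below. It is written under stated assumptions: `compute_deltas` is a hypothetical name, and treating any decrease in `viewCount` as suspicious is a rule picked for this example, since the README does not define the Actor's threshold.

```python
from typing import Optional

METRICS = ("viewCount", "likeCount", "commentCount")


def compute_deltas(previous: dict, current: dict) -> tuple[dict, list[str]]:
    """Hypothetical helper: compare two snapshots of the engagement counts."""
    deltas: dict[str, Optional[int]] = {}
    warnings: list[str] = []
    for name in METRICS:
        prev, cur = previous.get(name), current.get(name)
        if cur is None:
            # A missing count is reported, not silently replaced by zero.
            deltas[name] = None
            warnings.append(f"{name} missing in current snapshot")
            continue
        if prev is None:
            # First snapshot for this video: nothing to compare against.
            deltas[name] = None
            continue
        deltas[name] = cur - prev
        # Assumption: any drop in views counts as a suspicious regression.
        if name == "viewCount" and cur < prev:
            warnings.append(f"viewCount dropped from {prev} to {cur}")
    return deltas, warnings
```

Here `previous` stands for the counts read from `RAW-LATEST-{videoId}` and `current` for the counts from the run in progress. Likes and comments can legitimately go down, so this sketch only warns on views.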