YouTube Metadata Scraper & Analyzer

Pricing

from $200.00 / 1,000 videos processed

Extract YouTube video and live stream metadata, views, likes, comments, channel details, captions, and daily snapshots. Analyze performance over time with production-ready output views and optional comment sentiment analysis.

Rating: 0.0 (0)

Developer: Marielise

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 2 days ago

YouTube Video Metadata

Extract structured metadata from YouTube videos, Shorts, premieres, and ended live streams without downloading the media file. The Actor is built around the same yt-dlp + Cloudflare WARP + PO-token stack as the existing downloader actor, with residential proxies used only as a last-resort fallback.

What It Scrapes

  • Channel details: channel name, channel ID, handle, verified status, follower count, URLs
  • Video details: title, description, duration, category, tags, media type, embed/playability fields
  • Engagement: views, likes, comments, concurrent viewers when present
  • Publication: upload date, release date, timestamps, availability
  • Live-specific info: live_status, is_live, was_live, live chat replay presence
  • Captions: subtitle languages, automatic caption languages, optional full caption-track maps
  • Technical info: selected format, codecs, resolution, fps, approximate filesize
  • Comments: YouTube comments are fetched by default, capped at maxComments (default 50) per video
  • Comment sentiment: optional integrated AI summary using the actor environment's Gemini setup
  • Format inventory: counts by muxed/video-only/audio-only/storyboard, plus optional per-format summaries
  • Daily snapshot: a clean view/like/comment snapshot for scheduled tracking
  • Metric quality: confidence, warnings, previous-snapshot deltas, and fallback diagnostics

Input

{
  "url": "https://www.youtube.com/live/q3yHzSmemN4",
  "saveRawToKeyValueStore": true,
  "rawStoreName": "youtube-metadata-raw",
  "historyDatasetName": "youtube-video-history"
}

Input Fields

Field | Type | Required | Default | Description
--- | --- | --- | --- | ---
url | string | Yes | - | Single YouTube video or live-stream URL
urls | array | No | - | Optional batch input for API users who want multiple URLs
proxyCountry | string | No | - | Optional country for residential fallback
useResidentialProxyFallback | boolean | No | true | Only used if WARP/direct extraction fails
maxComments | integer | No | 50 | Maximum YouTube comments to fetch per video. Comments are fetched by default.
analyzeCommentSentiment | boolean | No | false | Optional AI sentiment analysis over fetched comments using the actor's integrated Gemini setup and server-side GOOGLE_API_KEY
includeFormats | boolean | No | false | Include summarized entries for every format variant
includeThumbnails | boolean | No | false | Include all thumbnail objects
includeCaptionTracks | boolean | No | false | Include the full subtitle and auto-caption track maps
includeRawMetadata | boolean | No | false | Embed the raw yt-dlp JSON in the dataset output
saveRawToKeyValueStore | boolean | No | true | Save full raw JSON to key-value store keys
rawStoreName | string | No | default store | Named store for persistent raw snapshots across runs
historyDatasetName | string | No | - | Named dataset to append daily metric snapshots across runs
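
To make the defaults in the table concrete, the following is a minimal client-side sketch that fills them in before a payload is sent. The helper name `build_input` and the validation behaviour are assumptions for illustration only; the Actor applies its own defaults and this is not its implementation.

```python
from typing import Any

# Defaults copied from the Input Fields table. Fields listed with "-" or
# "default store" are left out so the Actor's own behaviour applies.
DEFAULTS: dict[str, Any] = {
    "useResidentialProxyFallback": True,
    "maxComments": 50,
    "analyzeCommentSentiment": False,
    "includeFormats": False,
    "includeThumbnails": False,
    "includeCaptionTracks": False,
    "includeRawMetadata": False,
    "saveRawToKeyValueStore": True,
}
OPTIONAL_WITHOUT_DEFAULT = {"urls", "proxyCountry", "rawStoreName", "historyDatasetName"}


def build_input(url: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge caller overrides over the documented defaults."""
    if not url:
        raise ValueError("url is required")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {"url": url, **DEFAULTS, **overrides}


payload = build_input(
    "https://www.youtube.com/live/q3yHzSmemN4",
    historyDatasetName="youtube-video-history",
)
```

Rejecting unknown keys early is a design choice of this sketch; it catches typos such as `maxComment` before a paid run starts.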

Daily Tracking

For scheduled runs, set:

  • historyDatasetName to append one slim daily record per video across runs
  • rawStoreName to keep RAW-LATEST-{videoId} and RAW-{videoId}-{YYYY-MM-DD} keys in a persistent store

That gives you both:

  • a compact time series for views/likes/comments
  • a full raw metadata snapshot for each day
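
The key naming and the slim daily record described above can be sketched as follows. The two key patterns come from this section and the record fields mirror the dailySnapshot object in the example output; the function names and the added `videoId` field are assumptions for illustration.

```python
from datetime import date


def raw_snapshot_keys(video_id: str, scrape_date: date) -> tuple[str, str]:
    """Return the (latest, dated) key-value store keys for one video."""
    latest = f"RAW-LATEST-{video_id}"
    dated = f"RAW-{video_id}-{scrape_date.isoformat()}"  # YYYY-MM-DD
    return latest, dated


def daily_record(video_id: str, scrape_date: date,
                 views: int, likes: int, comments: int) -> dict:
    """Build one slim history row; videoId is an assumed join key."""
    return {
        "videoId": video_id,
        "scrapeDate": scrape_date.isoformat(),
        "viewCount": views,
        "likeCount": likes,
        "commentCount": comments,
    }


keys = raw_snapshot_keys("q3yHzSmemN4", date(2026, 3, 17))
row = daily_record("q3yHzSmemN4", date(2026, 3, 17), 14152, 423, 19)
```

Under this scheme the latest key is overwritten on each run, while each dated key holds that day's snapshot.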

Required Secrets

  • GOOGLE_API_KEY: only needed when analyzeCommentSentiment=true
  • GEMINI_COMMENTS_MODEL: optional override, defaults to gemini-2.5-flash

Users do not pass these in actor input. They belong in the actor environment.

Monetization

Publish this actor with Pay per event + usage enabled. That is the production-safe setup because it passes compute and proxy usage through to the user and avoids uncovered residential-proxy losses.

Configure these exact PPE events in Apify:

Event name | Title | Price | Description
--- | --- | --- | ---
video-processed | Video processed | $0.30 | Charged once for each successfully processed YouTube video or live stream URL, including metadata, engagement stats, comments, and daily snapshot output.
comment-sentiment-analyzed | Comment sentiment analyzed | $0.10 | Charged only when comment sentiment analysis is enabled and successfully completed for the fetched YouTube comments.
residential-fallback-used | Residential proxy fallback used | $1.00 | Charged only when direct or WARP extraction fails and the actor must use residential proxy fallback to complete the scrape.

Production charging rules:

  • Charge video-processed only on successful results
  • Charge comment-sentiment-analyzed only when sentiment succeeds
  • Charge residential-fallback-used only when extraction.proxyMode = residential
  • Do not charge for invalid URLs, unsupported channel URLs, playlists, or failed scrapes
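
The charging rules above can be read as a small decision function. The sketch below is one interpretation, assuming a failed scrape is never charged even if the residential fallback was attempted; the function names and arguments are hypothetical, and it does not call any Apify charging API.

```python
from decimal import Decimal

# Event prices copied from the PPE table.
PRICES = {
    "video-processed": Decimal("0.30"),
    "comment-sentiment-analyzed": Decimal("0.10"),
    "residential-fallback-used": Decimal("1.00"),
}


def events_to_charge(status: str, sentiment_succeeded: bool, proxy_mode: str) -> list[str]:
    """Return the PPE event names that apply to one processed URL."""
    if status != "success":
        # Invalid URLs, channel URLs, playlists, and failed scrapes cost nothing.
        return []
    events = ["video-processed"]
    if sentiment_succeeded:
        events.append("comment-sentiment-analyzed")
    if proxy_mode == "residential":
        events.append("residential-fallback-used")
    return events


def total_price(events: list[str]) -> Decimal:
    """Sum event prices with Decimal to avoid float rounding."""
    return sum((PRICES[name] for name in events), Decimal("0"))


worst_case = events_to_charge("success", True, "residential")
```

With these prices, a video that needs both sentiment analysis and the residential fallback totals $1.40 in event charges, before platform usage.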

Output Shape

Each successful dataset item is split into sections:

  • schemaVersion
  • summary
  • warnings
  • input
  • video
  • channel
  • engagement
  • metrics
  • comments
  • publication
  • liveStream
  • captions
  • contentStructure
  • assets
  • technical
  • formats
  • extraction
  • dailySnapshot
  • storage

Apify dataset views are also configured separately for:

  • overview
  • metricsOnly
  • engagement
  • dailySnapshots
  • video
  • channel
  • comments
  • commentSentiment
  • qualityWarnings
  • technical
  • errors

Example Output

{
  "status": "success",
  "input": {
    "originalUrl": "https://www.youtube.com/live/q3yHzSmemN4",
    "normalizedUrl": "https://www.youtube.com/watch?v=q3yHzSmemN4",
    "videoId": "q3yHzSmemN4",
    "label": "daily-check"
  },
  "video": {
    "id": "q3yHzSmemN4",
    "title": "Big Live Slot Play + HUGE GIVEAWAY On PlayFame",
    "durationText": "55:50",
    "mediaType": "livestream"
  },
  "channel": {
    "name": "NG Slot",
    "followerCount": 1230000,
    "isVerified": true
  },
  "engagement": {
    "viewCount": 14152,
    "likeCount": 423,
    "commentCount": 19
  },
  "metrics": {
    "status": "complete",
    "confidence": "high",
    "rawMetricSource": "warp",
    "retryUsed": false,
    "deltas": {
      "viewCount": 37,
      "likeCount": 0,
      "commentCount": 0
    }
  },
  "liveStream": {
    "liveStatus": "was_live",
    "isLive": false,
    "wasLive": true,
    "hasLiveChatReplay": true
  },
  "dailySnapshot": {
    "scrapeDate": "2026-03-17",
    "viewCount": 14152,
    "likeCount": 423,
    "commentCount": 19
  }
}
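
The input section of the example maps a /live/ URL to a canonical watch URL. A minimal sketch of that normalization follows, assuming the common 11-character video ID convention and a small set of URL shapes; the Actor's actual validation rules are not documented here, so treat the accepted hosts and paths as assumptions.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> str:
    """Return the video ID, or raise ValueError for non-video URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host in _YOUTUBE_HOSTS:
        if parts[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in {"live", "shorts", "embed"}:
            candidate = parts[1]
    # Channel and playlist URLs never set a candidate, so they are rejected.
    if candidate is None or not _VIDEO_ID.match(candidate):
        raise ValueError(f"not a single-video YouTube URL: {url}")
    return candidate


def normalize_url(url: str) -> str:
    """Rewrite any accepted URL shape to the canonical watch form."""
    return f"https://www.youtube.com/watch?v={extract_video_id(url)}"


normalized = normalize_url("https://www.youtube.com/live/q3yHzSmemN4")
```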

Production Notes

  • CDN media URLs in raw yt-dlp output are temporary and expire. Persist them only if you need same-run debugging.
  • If the default WARP/direct path works, the Actor will not send traffic through residential proxies.
  • Ended live streams usually expose was_live, live_status, likes, comments, and live chat replay availability.
  • Comment sentiment analysis is optional and uses the actor's server-side GOOGLE_API_KEY with Gemini. Users do not pass their own key in the input.
  • Public YouTube subscriber counts are rounded. channel.followerCount is not an exact private subscriber count.
  • View, like, and comment counts can lag or shift slightly on live and recently-ended live streams.
  • The actor flags suspicious metric drops and missing counts in warnings, metrics.status, and metrics.confidence instead of hiding them.
  • RUN-SUMMARY is saved in the key-value store with run totals, fallback counts, sentiment counts, and metric quality counts.
  • RUN-SUMMARY also includes the expected PPE pricing configuration so production runs can be checked against the published pricing.

Quality Controls

  • Metadata is retried once when viewCount, likeCount, or commentCount is missing.
  • Previous snapshots are loaded from RAW-LATEST-{videoId} to compute deltas and detect suspicious regressions.
  • Comments are fetched in a second pass so capped comment retrieval does not overwrite the true YouTube commentCount.
  • Extraction diagnostics include attempt history, player client used, PO-token usage, and whether residential fallback was needed.
  • Comment sentiment uses gemini-2.5-flash with thinkingBudget: 0 to reduce cost and keep sentiment pricing predictable.
  • Residential fallback is disabled for a request when the user spending limit cannot cover another residential-fallback-used charge.
  • Comment sentiment is skipped when the user spending limit cannot cover another comment-sentiment-analyzed charge.
  • Regression tests cover URL validation, comment normalization, metric regression warnings, and Gemini sentiment payload normalization.
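
The delta and regression checks listed above can be sketched as a comparison between the previous snapshot and the current counts. This is an illustration only: the function name is hypothetical, and flagging every decrease as suspicious is an assumption, since the Actor's real thresholds are not stated.

```python
from typing import Optional

METRICS = ("viewCount", "likeCount", "commentCount")


def compare_snapshots(previous: Optional[dict], current: dict) -> dict:
    """Compute per-metric deltas and collect warnings for missing or dropped counts."""
    deltas: dict[str, int] = {}
    warnings: list[str] = []
    for name in METRICS:
        now = current.get(name)
        before = (previous or {}).get(name)
        if now is None:
            warnings.append(f"{name} missing")
            continue
        if before is None:
            # First run for this video, or the old snapshot lacked the metric.
            continue
        deltas[name] = now - before
        if now < before:
            # Assumption: any decrease is reported; real thresholds may differ.
            warnings.append(f"{name} dropped from {before} to {now}")
    return {"deltas": deltas, "warnings": warnings}


result = compare_snapshots(
    {"viewCount": 14115, "likeCount": 423, "commentCount": 19},
    {"viewCount": 14152, "likeCount": 423, "commentCount": 19},
)
```

The previous values here are chosen so the deltas match the example output (37, 0, 0). In the Actor, the previous snapshot would come from RAW-LATEST-{videoId}.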