YouTube Video 360 Intelligence — Bundle Scraper
Pricing
from $11.25 / 1,000 video info snapshots
Developer
SIÁN OÜ
Maintained by Community
YouTube Video 360 Intelligence — Bundle Scraper 🎬
The most complete YouTube video snapshot on Apify. One run pulls metadata, transcript, related videos, and subtitle languages into a single tidy-long dataset — in parallel. Built for AI training, journalism, video SEO, and content intelligence.
💬 Need comments? This actor deliberately excludes comments to keep the bundle focused and predictable. Pair with our Cheapest YouTube Comments Scraper or AI Comments + Questions Extractor.
Why this actor
Most YouTube-video scrapers on Apify ship one endpoint. To build a real per-video pack you have to chain three or four actors together — and your data team still has to JOIN the results.
This actor delivers four YouTube video endpoints in one parallel run, one bill, one dataset:
| | This actor | Top per-endpoint commodity scrapers |
|---|---|---|
| Video metadata (title, views, likes, description, tags) | ✅ | ✅ (single-endpoint) |
| Full transcript with timestamped chunks | ✅ | ✅ (separate transcript-only actor) |
| Related / recommended videos graph | ✅ | partial / not surfaced |
| Subtitle languages (native + 100+ auto-translation) | ✅ | rarely exposed |
| Parallel execution (one wall-clock latency) | ✅ | sequential |
| Unified tidy-long row schema with `rowType` | ✅ | flat blob (flatten yourself) |
| Opt-out per endpoint (transparent pricing) | ✅ | n/a |
Drop the result into Pandas, DuckDB, or BigQuery and filter by rowType to slice the bundle.
What you get back
A single dataset where every row carries a rowType discriminator:
| rowType | What it is | Charged event |
|---|---|---|
| `video_info` | Full per-video metadata snapshot: title, channel, views, likes, description, tags, length, publishedAt, category, available countries | `video-info-result` |
| `transcript` | One consolidated transcript row per video. Full `transcriptText` plus structured chunks (with `startMs` / `endMs` / `text`) in `extra.chunks` | `video-transcript-result` |
| `related_video` | One row per related/recommended video (videoId, channelTitle, views, publishedAt) | `video-related-row` |
| `subtitle_language` | One row per native subtitle language available; the first row carries the full 100+ auto-translation language list in `extra.translationLanguages` | `video-subtitle-language-row` |
| `error` | Status row (`status`: `invalid_video`, `endpoint_unavailable`, or `error`) | none; `invalid_video` and `endpoint_unavailable` rows are not charged |
Every row also carries _sourceVideoInput, _sourceVideoId, _sourceEndpoint, _fetchedAt, and _page so you can group, dedupe, and trace lineage in your downstream pipeline.
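As a minimal sketch of slicing the bundle, the snippet below groups rows by `_sourceVideoId` and then by `rowType` using only the Python standard library. The sample rows are made up for illustration and carry only a few of the documented fields; the helper name `slice_bundle` is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: field names follow this README, values are invented.
rows = [
    {"rowType": "video_info", "_sourceVideoId": "dQw4w9WgXcQ", "status": "success"},
    {"rowType": "transcript", "_sourceVideoId": "dQw4w9WgXcQ", "status": "success"},
    {"rowType": "related_video", "_sourceVideoId": "dQw4w9WgXcQ", "status": "success"},
    {"rowType": "related_video", "_sourceVideoId": "dQw4w9WgXcQ", "status": "success"},
    {"rowType": "error", "_sourceVideoId": "kJQP7kiw5Fk", "status": "invalid_video"},
]


def slice_bundle(rows):
    """Group rows as {video_id: {rowType: [row, ...]}}."""
    bundle = defaultdict(lambda: defaultdict(list))
    for row in rows:
        bundle[row["_sourceVideoId"]][row["rowType"]].append(row)
    return bundle


bundle = slice_bundle(rows)
related = bundle["dQw4w9WgXcQ"]["related_video"]      # all related-video rows for one video
failed = [r for r in rows if r["rowType"] == "error"]  # status rows across the whole run
print(len(related), len(failed))
```

The same grouping maps directly onto a `groupby` in Pandas or a `GROUP BY` in DuckDB or BigQuery once the dataset is loaded there.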
Input
The actor accepts a single video or a batch of videos. Each entry can be:
- Video ID: `dQw4w9WgXcQ` (11 chars, fastest).
- Watch URL: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`.
- Short URL: `https://youtu.be/dQw4w9WgXcQ`.
- Shorts URL: `https://www.youtube.com/shorts/...`.
- Embed URL: `https://www.youtube.com/embed/...`.
- Live URL: `https://www.youtube.com/live/...`.
Single video:

```json
{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "includeEndpoints": ["info", "transcript", "related", "subtitles"],
  "maxRelatedPages": 1
}
```

Batch:

```json
{
  "videoUrls": "dQw4w9WgXcQ\naircAruvnKk\nkJQP7kiw5Fk",
  "includeEndpoints": ["info", "transcript"],
  "maxRelatedPages": 1
}
```
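If you want to normalise entries to bare IDs before submitting a batch, the accepted input forms listed above can be handled with a small helper. This is a sketch of one possible approach, not the actor's own URL extractor; the function name `extract_video_id` and the assumption that IDs are 11 characters drawn from letters, digits, `_`, and `-` are mine.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: a video ID is exactly 11 characters from this alphabet.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(entry):
    """Return the 11-char video ID for a bare ID or a supported URL, else None."""
    entry = entry.strip()
    if _ID_RE.fullmatch(entry):
        return entry

    parsed = urlparse(entry)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]

    candidate = None
    if host == "youtu.be" and segments:
        # Short URL: https://youtu.be/<id>
        candidate = segments[0]
    elif host == "youtube.com":
        if segments == ["watch"]:
            # Watch URL: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(segments) >= 2 and segments[0] in ("shorts", "embed", "live"):
            # Shorts, embed, and live URLs carry the ID as the second path segment.
            candidate = segments[1]

    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None


print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))
```

Entries that return `None` here are worth dropping before the run; the actor itself reports bad inputs as `invalid_video` status rows.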
Input fields
| Field | Type | Default | Description |
|---|---|---|---|
| `videoUrl` | string | none | One video (ID or URL). One of `videoUrl` or `videoUrls` is required. |
| `videoUrls` | string | none | Batch: comma- or newline-separated list. Max 50 per run. Overrides `videoUrl`. |
| `includeEndpoints` | array | All 4 | Pick which endpoints to scrape. Defaults to: `info`, `transcript`, `related`, `subtitles`. `info` is always required (anchor row). |
| `maxRelatedPages` | integer | 1 | Max pages of related videos to fetch (each ~20 rows). Range 1–10. |
| `transcriptLang` | string | none | Optional. Preferred transcript language code (`en`, `es`, `ja`, `pt`). |
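The constraints in the table above can be checked client-side before a run is started. The following is a hedged sketch: `build_input` is a hypothetical helper, and rejecting an input without `info` is this sketch's choice, since the README does not say how the actor itself reacts in that case.

```python
ALLOWED_ENDPOINTS = ("info", "transcript", "related", "subtitles")


def build_input(videos, endpoints=ALLOWED_ENDPOINTS, max_related_pages=1,
                transcript_lang=None):
    """Build an actor input dict, enforcing the limits documented in the table."""
    videos = [v.strip() for v in videos if v.strip()]
    if not videos:
        raise ValueError("at least one video ID or URL is required")
    if len(videos) > 50:
        raise ValueError("max 50 videos per run")

    unknown = set(endpoints) - set(ALLOWED_ENDPOINTS)
    if unknown:
        raise ValueError(f"unknown endpoints: {sorted(unknown)}")
    if "info" not in endpoints:
        raise ValueError("'info' is always required (anchor row)")
    if not 1 <= max_related_pages <= 10:
        raise ValueError("maxRelatedPages must be in the range 1-10")

    payload = {
        "includeEndpoints": list(endpoints),
        "maxRelatedPages": max_related_pages,
    }
    if len(videos) == 1:
        payload["videoUrl"] = videos[0]
    else:
        # Batch input is a single newline-separated string.
        payload["videoUrls"] = "\n".join(videos)
    if transcript_lang:
        payload["transcriptLang"] = transcript_lang
    return payload


batch = build_input(["dQw4w9WgXcQ", "aircAruvnKk", "kJQP7kiw5Fk"],
                    endpoints=("info", "transcript"))
print(batch)
```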
How endpoint-level gating works
Every video is scraped in parallel across all 4 endpoints (Promise.allSettled) for one wall-clock latency, not four. If an endpoint can't return data for a specific video — e.g. transcript on a silent music video, or subtitles on a livestream that YouTube is still processing — the actor:
- Detects the body-level error (yt-api wraps endpoint errors at the body level, not HTTP).
- Does not charge you for that endpoint.
- Pushes one status row (`status: endpoint_unavailable`, with the upstream message) so you have an auditable trail.
The same logic catches invalid videos (404s) and aborts downstream calls for that single video, saving your quota.
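The gating pattern described above can be illustrated in Python, where `asyncio.gather(..., return_exceptions=True)` plays a role similar to `Promise.allSettled`. This is a sketch under stated assumptions, not the actor's source: `fetch_endpoint` is a stub standing in for the upstream calls, the `error` key in the response body and the `message` and `charged` fields are hypothetical, and the invalid-video short-circuit is not modelled.

```python
import asyncio


async def fetch_endpoint(name, video_id):
    """Stub for one upstream call (hypothetical response shape)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    if name == "transcript":
        # Failure reported inside the body rather than via an HTTP status.
        return {"error": "Transcript is not available for this video"}
    return {"data": f"{name} payload for {video_id}"}


async def scrape_video(video_id, endpoints):
    # Launch every endpoint at once; exceptions are returned, not raised,
    # so one failing endpoint cannot cancel the others.
    results = await asyncio.gather(
        *(fetch_endpoint(name, video_id) for name in endpoints),
        return_exceptions=True,
    )
    rows = []
    for name, result in zip(endpoints, results):
        if isinstance(result, Exception):
            rows.append({"_sourceEndpoint": name, "status": "error",
                         "message": str(result), "charged": False})
        elif "error" in result:
            # Body-level error: emit a status row and skip the charge.
            rows.append({"_sourceEndpoint": name, "status": "endpoint_unavailable",
                         "message": result["error"], "charged": False})
        else:
            rows.append({"_sourceEndpoint": name, "status": "success",
                         "charged": True})
    return rows


rows = asyncio.run(scrape_video("dQw4w9WgXcQ",
                                ["info", "transcript", "related", "subtitles"]))
for row in rows:
    print(row["_sourceEndpoint"], row["status"], row["charged"])
```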
Pricing
Pay-per-event. Bronze tier rates shown — higher tiers auto-ladder via the Apify tier system (FREE is intentionally higher to prevent abuse — see Apify pricing docs).
| Event | BRONZE | What it covers |
|---|---|---|
| `apify-actor-start` (once) | $0.005 | Actor lifecycle |
| `video-info-result` ⭐ | $0.015 | Per video: full metadata snapshot (the anchor row) |
| `video-transcript-result` | $0.012 | Per video: one consolidated transcript row with timestamped chunks |
| `video-related-row` | $0.003 | Per related-video row |
| `video-subtitle-language-row` | $0.001 | Per native subtitle language row |
⭐ = primary / headline event (snapshot anchor; exactly 1 per successful video).
Estimated full-bundle cost (Rick Astley, 1 related page, BRONZE tier): $0.005 (start) + $0.015 (info) + $0.012 (transcript) + 18 × $0.003 (related) + 6 × $0.001 (subtitle langs) = ~$0.092 / video.
Cheap-snapshot cost (`includeEndpoints: ["info"]` only): $0.005 (start) + $0.015 (info) = ~$0.020 / video for the metadata snapshot alone.
Skip endpoints you don't need — never pay for data you didn't request.
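To budget a run ahead of time, the two estimates above can be reproduced with a small calculator. This sketch hard-codes the BRONZE rates from the pricing table; the function name and its parameters are hypothetical, and actual row counts (related videos, subtitle languages) vary per video.

```python
from decimal import Decimal

# BRONZE rates copied from the pricing table above.
BRONZE = {
    "apify-actor-start": Decimal("0.005"),
    "video-info-result": Decimal("0.015"),
    "video-transcript-result": Decimal("0.012"),
    "video-related-row": Decimal("0.003"),
    "video-subtitle-language-row": Decimal("0.001"),
}


def estimate_run_cost(videos=1, transcripts=0, related_rows=0, subtitle_rows=0):
    """Estimate one run's BRONZE cost in USD; the start event is counted once."""
    return (
        BRONZE["apify-actor-start"]
        + videos * BRONZE["video-info-result"]
        + transcripts * BRONZE["video-transcript-result"]
        + related_rows * BRONZE["video-related-row"]
        + subtitle_rows * BRONZE["video-subtitle-language-row"]
    )


full_bundle = estimate_run_cost(videos=1, transcripts=1, related_rows=18, subtitle_rows=6)
info_only = estimate_run_cost(videos=1)
print(full_bundle)  # 0.092
print(info_only)    # 0.020
```

`Decimal` is used instead of floats so the sums match the table to the last digit.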
Use cases
- Content-Intelligence Platforms: Per-Video Data Packs. Platforms aggregating creator and brand video performance need a complete per-video snapshot: metadata, transcript, related-video graph, subtitle coverage. One actor run replaces four separate scraping pipelines.
- AI / ML Training Datasets: Multi-Modal Video Data. Teams training video understanding, summarization, or recommendation models need text + relational + metadata signals together. The tidy-long `rowType` schema (`video_info`, `transcript`, `related_video`, `subtitle_language`) is purpose-built for Pandas, DuckDB, and BigQuery pipelines.
- Journalism + OSINT: Full Provenance Pack. A research desk vetting a viral video needs metadata (publishedAt, channel, views), full transcript, the recommended/related sphere, and language reach (subtitle coverage). All in one run and one bill, with no juggling four scrapers under deadline pressure.
- Video SEO + Competitor Research. Marketing teams competing on YouTube need to scrape competitors' top videos as full snapshots: descriptions, tags, related-video graph (their content network), and subtitle coverage (their international reach). One snapshot per competitor video, batched across your watchlist.
- Content-Localisation Teams: Subtitle-Language Audits. L10n teams need to know which videos already have native subtitles versus YouTube auto-translation only. The subtitle-languages endpoint returns native language coverage; combine with metadata and transcript to baseline a creator's localisation maturity.
Output schema (unified rows)
Every row carries:
- Tracing fields: `_sourceVideoInput`, `_sourceVideoId`, `_sourceEndpoint`, `_fetchedAt`, `_page`
- Discriminator: `rowType` (one of `video_info`, `transcript`, `related_video`, `subtitle_language`, `error`)
- Status: `status` (`success` / `invalid_video` / `endpoint_unavailable` / `error`)
Plus per-rowType fields:
video_info — videoId, videoPageUrl, videoTitle, videoDescription, lengthText, lengthSeconds, publishedAt, viewCount, likeCount, category, videoKeywords, thumbnailUrl, isLiveContent, isShortsEligible, isFamilySafe, isPrivate, isUnlisted, hasCaption, defaultVideoLanguage, defaultVideoLanguageCode, availableCountries, channelId, channelTitle.
transcript — transcriptText (full joined text), transcriptLanguageCode, transcriptLanguageTitle, transcriptChunkCount. Structured chunks (with startMs, endMs, text per chunk) live in extra.chunks.
related_video — relatedVideoId, relatedVideoPageUrl, channelId, channelTitle, channelHandle, publishedAt, publishedTimeText, viewCount, viewCountText, lengthText, thumbnailUrl. The related video's own title sits in extra.relatedVideoTitle.
subtitle_language — subtitleLanguageCode, subtitleLanguageName, subtitleUrl (direct srv1 download URL; expires in a few hours), subtitleIsTranslatable. The first row carries the full 100+ auto-translation language list in extra.translationLanguages.
Any endpoint-specific upstream field that doesn't fit the unified schema is captured in extra (object) so you never lose data.
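As an example of working with the structured chunks in `extra.chunks`, the sketch below renders a transcript row as timestamped lines. The sample row is invented, and the code assumes `startMs` and `endMs` are millisecond values that can be converted with `int()`; the helper names are hypothetical.

```python
def format_ms(ms):
    """Render a millisecond offset as HH:MM:SS.mmm."""
    total_seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def chunks_to_lines(transcript_row):
    """Turn extra.chunks into '[start - end] text' lines, in chunk order."""
    chunks = transcript_row.get("extra", {}).get("chunks", [])
    return [
        f"[{format_ms(c['startMs'])} - {format_ms(c['endMs'])}] {c['text']}"
        for c in chunks
    ]


# Hypothetical transcript row: field names follow the schema above, values are made up.
row = {
    "rowType": "transcript",
    "transcriptChunkCount": 2,
    "extra": {
        "chunks": [
            {"startMs": 0, "endMs": 2500, "text": "first line"},
            {"startMs": 61500, "endMs": 64000, "text": "second line"},
        ]
    },
}

lines = chunks_to_lines(row)
for line in lines:
    print(line)
```

A row with no `extra.chunks` yields an empty list, so the helper is safe to map over mixed datasets after filtering on `rowType == "transcript"`.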
FAQ
Q: I only need the transcript. Can I opt out of the rest?
Yes — set includeEndpoints: ["info", "transcript"]. You'll get the metadata anchor + transcript only, and pay only for those events. info is required because it's the bundle's anchor row and source of the row's video title.
Q: Why is info always required?
The info row anchors every downstream row (transcript, related, subtitle) with the video's title and channel. It also lets the actor short-circuit on invalid videos — if info returns "Video unavailable", no other endpoint is called and you don't get charged for them.
Q: What if a video has no transcript?
Some music videos and silent clips have no transcript. The actor pushes one status: endpoint_unavailable row for transcript with the upstream message. No charge.
Q: What if I pass an invalid video ID or a deleted video?
The actor pushes one status: invalid_video row with the upstream message. No upstream calls for transcript/related/subtitles, no charges.
Q: Why are comments excluded?
We ship two dedicated comments actors with simpler input modes and lower per-row pricing:
- Cheapest YouTube Comments Scraper for raw comment harvesting.
- YouTube AI Comments + Questions Extractor for AI-extracted Q&A.
Bundling comments into here would commoditize them and make this actor's pricing harder to read. The bundle stays focused on the 4 non-comment endpoints.
Q: Do you handle Shorts URLs (youtube.com/shorts/...)?
Yes — the URL extractor recognises /shorts/ paths and pulls the 11-char video ID. The info endpoint returns the same row shape, with isShortsEligible flagged when YouTube classifies it as a Short.
Q: How fast is it for a batch?
Endpoints run in parallel within each video, and the run processes 3 videos concurrently. Our local smoke test: 3 videos × 4 endpoints = 103 rows in 2.1 seconds.
Q: Are the subtitle URLs downloadable?
Yes — subtitleUrl is YouTube's direct srv1 timed-text URL. Note that these URLs expire after a few hours; download promptly or re-scrape.
Q: Can I get more than 20 related videos?
Yes — set maxRelatedPages up to 10. Each page returns ~20 related videos (sequential pagination via continuation tokens).
⚠️ Trademark Disclaimer
YouTube® is a trademark of Google LLC. This actor is an independent data-discovery tool. It is not affiliated with, endorsed by, sponsored by, or supported by Google LLC or YouTube. All trademarks, registered trademarks, and brand names are the property of their respective owners.
Legal & data privacy
This actor scrapes publicly accessible YouTube video data. Users are responsible for ensuring compliance with YouTube's Terms of Service and applicable data-protection laws (GDPR, CCPA, etc.) when using the data downstream. Don't scrape PII or use scraped data to harass / discriminate. See Apify's blog on legal scraping.
Support
- Email: apify@sian-agency.online
- Issues: use the Apify Console Issues tab for this actor
- More actors: apify.com/sian.agency