YouTube Intelligence - Metadata, Transcript, Chapters, Entities

Bulk YouTube video data: metadata (title, channel, views, likes, comments, thumbnails) + timestamped transcript + auto-detected chapters + extracted entities/keywords/hashtags + optional auto-translation. Pay only for what you use - 3 tiers.

Developer: Seibs.co (maintained by Community)

What does YouTube Intelligence do?

For each YouTube URL or 11-character video ID you pass in, the actor pulls full video metadata (title, description, channel, views, likes, comments, thumbnails, language, age limit, tags, categories, live status, Shorts flag), auto-detected chapters, hashtags, mentions, and description links. The enriched tier adds the full timestamped transcript with segment-level start and duration, top-20 keywords by frequency, named entities categorized into people / organizations / locations / products / events, speaking pace WPM, and filler-word rate. The premium tier adds auto-translation of non-English transcripts via Google Translate. Optional comment fetching, channel-level enrichment, content-type classification, SEO metrics, sponsorship detection, and lexicon sentiment over comments are all available.

AI / RAG / Agent

Drop YouTube content straight into a RAG pipeline - the enriched tier returns timestamped transcript segments that are pre-chunked for embedding, plus named entities and keywords you can use as metadata filters in your vector store. Pass unwind=transcript_segments to get one document per segment. Compatible with LangChain, LlamaIndex, Pinecone, Weaviate, Chroma, and any MCP-aware agent runtime.

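import os

from apify_client import ApifyClient
from langchain_community.vectorstores import Pinecone
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Read the Apify token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Input keys follow the snake_case schema shown in the Input section.
run = client.actor("seibs.co/youtube-intelligence").call(run_input={
    "video_urls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "tier": "enriched",
    "unwind": "transcript_segments",
})

# One Document per transcript segment; timing and channel go into metadata.
docs = [
    Document(
        page_content=item["text"],
        metadata={
            "video_id": item["video_id"],
            "start": item["start"],
            "duration": item["duration"],
            "channel": item.get("channel_name") or "",
            # Pinecone metadata values must be flat, so collapse the
            # per-category entity lists into a single list of strings.
            "entities": [
                name
                for names in item.get("entities", {}).values()
                for name in names
            ],
        },
    )
    for item in client.dataset(run["defaultDatasetId"]).iterate_items()
]
Pinecone.from_documents(docs, OpenAIEmbeddings(), index_name="youtube-rag")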

Features

  • Metadata - title, description, channel id/name/handle/url, subscriber count, views, likes, comments, upload date, duration, language, age limit, tags, categories, all-resolution thumbnails, live status, Shorts flag.
  • Chapters - YouTube's auto-detected chapters when available, parsed from description timestamps (0:00 Intro style) as fallback. chapters_source tells you which.
  • Transcript (enriched / premium) - full text + segment array with start and duration for every line. Tries English variants first, falls back to any available language.
  • Entities + keywords (enriched / premium) - heuristic NER (people / organizations / locations / products / events / unclassified) + top keywords by frequency from the transcript.
  • Translation (premium) - non-English transcripts auto-translated to your translate_to language code.
  • Engagement metrics - engagement rate, like ratio, comment ratio, views per day/hour, trending flag, days since publish.
  • SEO metrics - title/description length, tag count, hashtag count in title/desc, title-keyword density vs description, CTA detection, timestamp count.
  • Content-type classification - heuristic top-3 over 20 video categories with confidence scores.
  • Sponsorship detection - sponsor-read phrase matches with approximate position in the transcript.
  • Speaking pace and filler rate - WPM and percent of tokens that are um/uh/like/you-know.
  • Comments (optional) - top N with author, likes, replies, pinned/hearted/member flags + lexicon sentiment + top-5 by likes.
  • Channel-level enrichment (optional) - subscriber count, total views, video count, country, created date, verified flag, banner / avatar, badges, tags. Cached per channel across the run.
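
The engagement metrics in the list above are not formally defined on this page. The following is a minimal sketch, assuming each ratio is a percentage of the view count and that velocity is views divided by days since publish. Those formulas are an assumption (they happen to reproduce the figures in the first sample record in the Output section), and the function name is hypothetical rather than part of the actor.

```python
def engagement_metrics(view_count, like_count, comment_count, days_since_publish):
    """Recompute ratio-style metrics from raw counts (assumed formulas)."""
    if view_count <= 0:
        # No views means no meaningful ratios.
        return None
    # Treat same-day uploads as one day old to avoid dividing by zero.
    days = max(days_since_publish, 1)
    views_per_day = view_count / days
    return {
        "engagement_rate": round(100 * (like_count + comment_count) / view_count, 2),
        "like_ratio": round(100 * like_count / view_count, 2),
        "comment_ratio": round(100 * comment_count / view_count, 2),
        "views_per_day": round(views_per_day, 1),
        "views_per_hour": round(views_per_day / 24, 1),
    }


# Counts taken from the first sample record in the Output section.
sample = engagement_metrics(184221, 12482, 1844, 5)
print(sample)
```

Recomputing the ratios locally can be handy when you want to apply your own trending threshold instead of relying on the is_trending flag.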

Use cases

  • AI / RAG pipelines - feed transcript segments into vector stores; the transcript_segments unwind view gives you one row per timestamped line ready for chunking.
  • Content monitoring - track creator output, sponsorship reads, sentiment, and engagement velocity across a channel set.
  • Competitor research - SEO metrics + content-type classification across a competitor's catalog.
  • Search and recommendation - metadata + entities + keywords feed a search index.
  • Caption and subtitle pipelines - transcript segments are time-aligned and ready for SRT/VTT generation.
  • Trend research - engagement metrics + trending flag + views per hour identify breakout videos early.
  • Localization - premium-tier translation produces a same-language corpus from a multilingual creator set.
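
For the caption and subtitle use case above, here is a rough sketch of turning transcript segments into SRT text. It assumes each segment carries text, start, and duration (in seconds) as in the sample output; the helper names are hypothetical and not part of the actor.

```python
def srt_timestamp(seconds):
    """Format a second offset as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT cues, one numbered cue per transcript segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}"
        )
    return "\n\n".join(cues) + "\n"


# Segments copied from the first sample record in the Output section.
segments = [
    {"text": "Welcome back to the channel.", "start": 0.0, "duration": 2.4},
    {"text": "Today we're tearing down a $400 keyboard kit.", "start": 2.4, "duration": 3.1},
]
srt_text = segments_to_srt(segments)
print(srt_text)
```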

FAQ

Q: Is this legal? A: Yes. The actor only reads publicly accessible YouTube video pages and metadata that YouTube exposes to anonymous browsers - the same data you see when you open a video URL. No login walls are bypassed, no age-restricted content is unlocked, and no private / unlisted videos are accessed. You are responsible for complying with YouTube's terms when redistributing the output.

Q: Why might a run fail or return partial data? A: Most failures trace back to upstream YouTube rate-limiting, datacenter-IP blocks, or unavailable transcripts. Use the recommended RESIDENTIAL Apify Proxy group - datacenter IPs are blocked within seconds. Transcripts are not always available (music, low-view content, age-restricted, creator-disabled); the transcript.available flag and transcript.reason field tell you why, and you are only charged the metadata event for those videos, not the transcript event.

Q: How does this differ from karamelo/youtube-transcripts and other YouTube actors on Apify? A: Three things: (1) tiered pay-per-event pricing - $0.001/video metadata-only, $0.004 enriched (transcript + entities + keywords + speaking pace + filler rate), $0.008 premium (auto-translation) - so you only pay for the tier you need; (2) content classification (heuristic top-3 over 20 video categories with confidence scores), sentiment over comments (lexicon-based), sponsorship detection (phrase-library scan of the transcript), and SEO metrics (title/desc length, CTA detection, tag count) that pure transcript actors don't ship; (3) MCP-ready - one row per (video, transcript_line) via unwind=transcript_segments drops straight into LangChain / LlamaIndex / Pinecone / Weaviate / Chroma / any MCP-aware agent runtime. Comments fetching is included in the tier price, not a separate add-on.

Q: How fresh is the data? A: Every record is fetched live at run time - the scraped_at ISO timestamp tells you exactly when each video was collected. No stale cache.

Q: Can I schedule daily or weekly runs? A: Yes. Use Apify's built-in Schedules feature to monitor a creator set on any cron interval. Pair with a webhook to push only the diff (new uploads, sponsorship changes, engagement spikes) to your destination.

Q: Does it integrate with my stack? A: Yes - via Zapier, Make, n8n, direct webhook, or MCP for AI agents. The transcript_segments unwind view is pre-shaped for vector-store ingestion (Pinecone, Weaviate, Chroma) and LangChain / LlamaIndex pipelines. See the Integrations section below.

Q: What does it cost in practice? A: Pay-per-event tiered: metadata $0.001/video, enriched $0.004/video, premium $0.008/video. A 1,000-video competitor-catalog enriched run lands at ~$4. A 100-video premium-with-translation run lands at ~$0.80. No subscription, no minimum.


Pair this actor with other SEIB actors to build a broader content-intelligence workflow:

  • Reddit Topic Watcher - track the same topics across Reddit threads and pair with YouTube transcript signals to triangulate cultural moments
  • B2B Sales Triggers - detect when creators or their sponsors announce funding, partnerships, or hiring to time outreach
  • Google Maps Reviews Pro - cross-reference reviewer-mentioned businesses or products against creator sponsorship reads

Integrations

  • Zapier - push to HubSpot/Salesforce/Pipedrive/Apollo
  • Make.com - workflow automation
  • n8n - self-hosted automation
  • Apify webhooks - POST to your endpoint
  • API + dataset export (JSON/CSV/Excel/XML)
  • MCP / AI agents - call from Claude/GPT/LangChain

Input

{
  "video_urls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0",
    "M7lc1UVf-VE"
  ],
  "tier": "enriched",
  "translate_to": "en",
  "fetch_comments": true,
  "comment_count": 50,
  "fetch_channel_data": true,
  "use_apify_proxy": true,
  "apify_proxy_groups": ["RESIDENTIAL"],
  "max_concurrency": 6
}
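
Because video_urls accepts a mix of full URLs, short links, and bare 11-character IDs, it can help to normalize and dedupe the list before starting a run so the same video is not submitted twice. The sketch below is one possible way to do that; the URL shapes it recognizes are an assumption and may not match everything the actor itself accepts, and extract_video_id is a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a URL or bare ID, else None."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    if candidate and VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


raw_inputs = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0",
    "M7lc1UVf-VE",
    "https://youtu.be/dQw4w9WgXcQ",  # duplicate of the first entry
]
# dict.fromkeys keeps first-seen order while dropping duplicates.
unique_ids = list(dict.fromkeys(
    vid for vid in map(extract_video_id, raw_inputs) if vid
))
run_input = {"video_urls": unique_ids, "tier": "enriched"}
print(run_input)
```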

Output

Sample output: ./.actor/sample-output.json - a copy-paste-ready preview of realistic records.

First record inline:

{
"input": "https://www.youtube.com/watch?v=K8x4ZqRpL2c",
"video_id": "K8x4ZqRpL2c",
"scraped_at": "2026-05-13T19:02:14Z",
"tier": "premium",
"available": true,
"reason": null,
"title": "I rebuilt my $1,200 mechanical keyboard from scratch (and it sounds incredible)",
"description": "Build sheet, switches, and where to buy in the description. Timestamps:\n0:00 Intro\n1:42 Why this build\n4:15 Sourcing the case + plate\n9:08 Lubing 88 switches (oh no)\n18:22 Stabilizers \u2014 the secret to a clean sound\n26:40 Assembly and first sound test\n31:55 Final thoughts\n\nGear used: https://example.com/build-list\nMy keycaps: https://example.com/gmk-set",
"channel_id": "UCqBpZ4_typingmechanic",
"channel_name": "Typing Mechanic",
"channel_handle": "@typingmechanic",
"channel_url": "https://www.youtube.com/@typingmechanic",
"subscriber_count": 412000,
"duration_seconds": 2118,
"view_count": 184221,
"like_count": 12482,
"comment_count": 1844,
"upload_date": "2026-05-08",
"publish_date": "20260508",
"live_status": "not_live",
"is_short": false,
"categories": [
"Howto & Style",
"Science & Technology"
],
"tags": [
"mechanical keyboard",
"custom keyboard",
"lubing switches",
"GMK",
"keyboard build",
"ASMR keyboard"
],
"language": "en",
"age_limit": 0,
"thumbnail_url": "https://i.ytimg.com/vi/K8x4ZqRpL2c/maxresdefault.jpg",
"thumbnails": [
{
"url": "https://i.ytimg.com/vi/K8x4ZqRpL2c/maxresdefault.jpg",
"width": 1280,
"height": 720
}
],
"chapters": [
{
"title": "Intro",
"start_seconds": 0,
"end_seconds": 102,
"timestamp": "0:00"
},
{
"title": "Why this build",
"start_seconds": 102,
"end_seconds": 255,
"timestamp": "1:42"
},
{
"title": "Sourcing the case + plate",
"start_seconds": 255,
"end_seconds": 548,
"timestamp": "4:15"
},
{
"title": "Lubing 88 switches",
"start_seconds": 548,
"end_seconds": 1102,
"timestamp": "9:08"
},
{
"title": "Stabilizers",
"start_seconds": 1102,
"end_seconds": 1600,
"timestamp": "18:22"
},
{
"title": "Assembly and first sound test",
"start_seconds": 1600,
"end_seconds": 1915,
"timestamp": "26:40"
},
{
"title": "Final thoughts",
"start_seconds": 1915,
"end_seconds": 2118,
"timestamp": "31:55"
}
],
"chapters_source": "description",
"hashtags": [
"#keyboardbuild",
"#mechkeys"
],
"mentions": [],
"description_links": [
"https://example.com/build-list",
"https://example.com/gmk-set"
],
"availability": "public",
"webpage_url": "https://www.youtube.com/watch?v=K8x4ZqRpL2c",
"transcript": {
"available": true,
"reason": null,
"language": "en",
"is_generated": false,
"segment_count": 612,
"word_count": 4842,
"text": "Welcome back to the channel \u2014 today we're tearing down a $400 keyboard kit and rebuilding it with everything I've learned over the last two years of typing nonsense...",
"segments": [
{
"text": "Welcome back to the channel.",
"start": 0.0,
"duration": 2.4
},
{
"text": "Today we're tearing down a $400 keyboard kit.",
"start": 2.4,
"duration": 3.1
}
],
"formatted": {
"format": "text",
"text": "Welcome back to the channel. Today we're tearing down..."
}
},
"keywords": [
{
"term": "switch",
"count": 88
},
{
"term": "stabilizer",
"count": 42
},
{
"term": "case",
"count": 31
},
{
"term": "lube",
"count": 28
},
{
"term": "sound",
"count": 24
},
{
"term": "keyboard",
"count": 142
}
],
"named_entities": [],
"named_entities_categorized": {
"people": [
"Theo Browne"
],
"organizations": [
"GMK",
"Holy Pandas",
"Cherry"
],
"locations": [],
"products": [
"Mode SixtyFive",
"GMK Botanical",
"Krytox 205g0",
"Durock V2"
],
"events": [],
"unclassified": []
},
"transcript_translated": null,
"engagement_metrics": {
"engagement_rate": 7.78,
"like_ratio": 6.78,
"comment_ratio": 1.0,
"views_per_day": 36844.2,
"views_per_hour": 1535.2,
"is_trending": true,
"days_since_publish": 5
},
"seo_metrics": {
"title_length_chars": 76,
"title_length_words": 14,
"description_length_chars": 482,
"description_length_words": 78,
"tag_count": 6,
"hashtag_count_in_title": 0,
"hashtag_count_in_description": 2,
"title_keyword_density": [
{
"term": "keyboard",
"title_count": 1,
"description_count": 4
}
],
"has_call_to_action": true,
"has_timestamps_in_description": 7
},
"content_type_classification": {
"top_3": [
{
"type": "tutorial_howto",
"confidence": 0.74
},
{
"type": "review",
"confidence": 0.18
},
{
"type": "vlog",
"confidence": 0.05
}
],
"all_scores": {
"tutorial_howto": 0.74,
"review": 0.18,
"vlog": 0.05,
"unboxing": 0.03
}
},
"sponsorship_detected": {
"has_sponsorship": true,
"segments": [
{
"phrase": "today's video is sponsored by",
"position_estimate": 38.4,
"char_index": 1842
}
]
},
"speaking_pace_wpm": 168.4,
"filler_word_rate": 2.8,
"transcript_urls": [
"https://example.com/build-list"
],
"comments": {
"available": true,
"reason": null,
"count": 50,
"sort": "top",
"total_words": 2412,
"comments": [
{
"id": "Ugw1",
"author_name": "@clackclackdaddy",
"author_channel_id": "UC1",
"text": "The stabilizer breakdown alone earned my sub.",
"like_count": 412,
"reply_count": 4,
"published_relative": "4 days ago",
"is_pinned": true,
"is_hearted": true,
"is_member": false
},
{
"id": "Ugw2",
"author_name": "@switchsommelier",
"author_channel_id": "UC2",
"text": "What lube did you use on the springs vs housings?",
"like_count": 188,
"reply_count": 12,
"published_relative": "3 days ago",
"is_pinned": false,
"is_hearted": false,
"is_member": true
}
]
},
"comments_top_5": [
{
"author_name": "@clackclackdaddy",
"text": "The stabilizer breakdown alone earned my sub.",
"like_count": 412
},
{
"author_name": "@switchsommelier",
"text": "What lube did you use on the springs vs housings?",
"like_count": 188
}
],
"comment_sentiment": {
"positive_pct": 78.4,
"negative_pct": 4.2,
"neutral_pct": 17.4,
"top_positive_phrases": [
"earned my sub",
"incredible build",
"love the detail"
],
"top_negative_phrases": [
"overpriced",
"too long"
]
},
"channel": {
"available": true,
"reason": null,
"channel_id": "UCqBpZ4_typingmechanic",
"handle": "@typingmechanic",
"title": "Typing Mechanic",
"description": "Custom mechanical keyboards, sound tests, and the occasional rant.",
"subscriber_count": 412000,
"total_views": 28412000,
"video_count": 184,
"country": "US",
"created_date": "2019-04-12",
"verified": true,
"custom_url": "@typingmechanic",
"banner_url": null,
"avatar_url": null,
"channel_badges": [
"Verified"
],
"tags": [
"mechanical keyboards",
"ASMR",
"tech"
],
"is_family_safe": true
}
}
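
The record above reports chapters_source as "description", meaning the chapters were parsed from timestamp lines in the description. The actor's parser is not published here, so the following is only a sketch of the general technique: match lines that begin with a timestamp, then close each chapter at the start of the next one (or at the video duration). Unlike the sample output, it keeps chapter titles verbatim rather than trimming them.

```python
import re

# A line that starts with M:SS, MM:SS, or H:MM:SS followed by a title.
TIMESTAMP_LINE = re.compile(r"^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s+(.+?)\s*$")


def to_seconds(timestamp):
    """Convert 'H:MM:SS' or 'M:SS' into whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_chapters(description, duration_seconds):
    """Extract chapters from description timestamp lines."""
    found = []
    for line in description.splitlines():
        match = TIMESTAMP_LINE.match(line)
        if match:
            stamp, title = match.group(1), match.group(2)
            found.append((to_seconds(stamp), stamp, title))
    chapters = []
    for i, (start, stamp, title) in enumerate(found):
        # Each chapter ends where the next begins; the last ends with the video.
        end = found[i + 1][0] if i + 1 < len(found) else duration_seconds
        chapters.append({
            "title": title,
            "start_seconds": start,
            "end_seconds": end,
            "timestamp": stamp,
        })
    return chapters


description = "Timestamps:\n0:00 Intro\n1:42 Why this build\n31:55 Final thoughts\n\nGear used: https://example.com/build-list"
chapters = parse_chapters(description, 2118)
print(chapters)
```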

Sample record (abridged, tier: enriched):

{
  "video_id": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
  "channel_name": "Rick Astley",
  "subscriber_count": 5230000,
  "view_count": 1612345678,
  "like_count": 18234567,
  "comment_count": 2345678,
  "duration_seconds": 213,
  "upload_date": "2009-10-25",
  "is_short": false,
  "tier": "enriched",
  "available": true,
  "chapters": [
    {"title": "Intro", "start_seconds": 0, "end_seconds": 18}
  ],
  "transcript": {
    "available": true,
    "language": "en",
    "is_generated": false,
    "segment_count": 67,
    "word_count": 478,
    "text": "We're no strangers to love...",
    "segments": [
      {"text": "We're no strangers to love", "start": 18.5, "duration": 3.2}
    ]
  },
  "keywords": [{"term": "love", "count": 12}, {"term": "give", "count": 9}],
  "named_entities_categorized": {
    "people": ["Rick Astley"],
    "organizations": [],
    "locations": [],
    "products": [],
    "events": [],
    "unclassified": []
  },
  "engagement_metrics": {
    "engagement_rate": 1.27,
    "like_ratio": 1.13,
    "views_per_day": 261000,
    "is_trending": false,
    "days_since_publish": 6041
  },
  "seo_metrics": {
    "title_length_chars": 56,
    "tag_count": 14,
    "has_call_to_action": false
  },
  "scraped_at": "2026-05-14T12:00:00Z"
}

The dataset preview ships with four views: Overview (key columns), Detailed (every field with formatting hints, including thumbnail images and clickable channel/video links), Transcript segments (one row per timestamped line per video - exploded via unwind, ideal for caption rendering, search indexing, and per-segment LLM analysis), and a CSV download.

Pricing

PAY_PER_EVENT - tiered.

| Tier | Price | What you get |
|---|---|---|
| metadata | $0.001/video | Metadata + chapters + hashtags + description links |
| enriched | $0.004/video | + timestamped transcript + named entities + top keywords + speaking pace + filler rate |
| premium | $0.008/video | + auto-translation of non-English transcripts |

Honest pricing: PPE charges fire on success. If transcripts are unavailable for a video (which happens - YouTube doesn't auto-generate for all content), you are only charged the metadata event for that video, not the transcript event.
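
To budget a run ahead of time, the tier prices above can be turned into a quick estimate. The sketch below assumes that a video with a transcript costs the full tier price and a video without one costs only the metadata price; that reading of the fallback rule, the transcript_hit_rate parameter, and the function name are illustrative assumptions rather than the actor's billing logic.

```python
import math

# Per-video prices from the pricing table (USD).
PRICE_PER_VIDEO = {"metadata": 0.001, "enriched": 0.004, "premium": 0.008}


def estimate_cost(video_count, tier, transcript_hit_rate=1.0):
    """Estimate run cost; videos lacking a transcript fall back to metadata price."""
    if tier not in PRICE_PER_VIDEO:
        raise ValueError(f"unknown tier: {tier}")
    if not 0.0 <= transcript_hit_rate <= 1.0:
        raise ValueError("transcript_hit_rate must be between 0 and 1")
    if tier == "metadata":
        # The metadata tier never fetches transcripts, so the hit rate is moot.
        return round(video_count * PRICE_PER_VIDEO["metadata"], 4)
    with_transcript = video_count * transcript_hit_rate
    without_transcript = video_count - with_transcript
    cost = (
        with_transcript * PRICE_PER_VIDEO[tier]
        + without_transcript * PRICE_PER_VIDEO["metadata"]
    )
    return round(cost, 4)


print(estimate_cost(1000, "enriched"))       # all transcripts available
print(estimate_cost(1000, "enriched", 0.8))  # 20% of videos lack a transcript
print(estimate_cost(100, "premium"))
```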

Technical notes

Auto-generated vs human transcripts: transcript.is_generated: true means YouTube's ASR; false means a creator-provided caption file. The actor prefers human captions when both exist.

Translations: Set translate_to to any Google Translate-supported language code (e.g. es, fr, ja, pt-br). Premium tier only.

is_short: True if the video is a YouTube Short (vertical, sub-60s). Useful for filtering or A/B-comparing Shorts vs long-form.

Sponsorship detection: A curated phrase library scans the transcript for sponsor-read patterns (brought to you by, sponsor of today's video, use code, etc.) and reports approximate positions. Heuristic - high-precision, lower recall.
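
As an illustration of the phrase-scan approach described above (not the actor's actual phrase library or scoring), a minimal sketch might look like the following. The phrase tuple contains only the examples quoted on this page, and position_pct is a hypothetical field expressing how far into the transcript each match starts.

```python
# Example phrases quoted on this page; the real library is assumed to be larger.
SPONSOR_PHRASES = (
    "brought to you by",
    "sponsor of today's video",
    "today's video is sponsored by",
    "use code",
)


def find_sponsor_reads(transcript_text):
    """Return every phrase match with its character offset, earliest first."""
    lowered = transcript_text.lower()
    hits = []
    for phrase in SPONSOR_PHRASES:
        start = lowered.find(phrase)
        while start != -1:
            hits.append({
                "phrase": phrase,
                "char_index": start,
                # Hypothetical: offset as a percentage of transcript length.
                "position_pct": round(100 * start / len(lowered), 1),
            })
            start = lowered.find(phrase, start + 1)
    return sorted(hits, key=lambda hit: hit["char_index"])


text = "Welcome back. This video is brought to you by Acme. Use code SAVE10 at checkout."
hits = find_sponsor_reads(text)
print(hits)
```

Exact substring matching like this favors precision over recall, which is consistent with the trade-off noted above.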

Transcript-segments view: One row per (video, transcript_line) pair. So a 5-minute video with 80 transcript segments expands to 80 rows. Ideal for vector-store ingestion, search indexing, and per-segment LLM analysis.
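
If you already have full records and want the same one-row-per-segment shape locally, the expansion is straightforward. The sketch below assumes the nested transcript.segments layout shown in the sample output; explode_segments and segment_index are hypothetical names, and the exact columns of the actor's own view may differ.

```python
def explode_segments(record):
    """Expand one video record into one row per transcript segment."""
    transcript = record.get("transcript") or {}
    if not transcript.get("available"):
        # Videos without a transcript contribute no rows.
        return []
    return [
        {
            "video_id": record["video_id"],
            "channel_name": record.get("channel_name"),
            "segment_index": index,
            "text": seg["text"],
            "start": seg["start"],
            "duration": seg["duration"],
        }
        for index, seg in enumerate(transcript.get("segments", []))
    ]


# Trimmed version of the first sample record in the Output section.
record = {
    "video_id": "K8x4ZqRpL2c",
    "channel_name": "Typing Mechanic",
    "transcript": {
        "available": True,
        "segments": [
            {"text": "Welcome back to the channel.", "start": 0.0, "duration": 2.4},
            {"text": "Today we're tearing down a $400 keyboard kit.", "start": 2.4, "duration": 3.1},
        ],
    },
}
rows = explode_segments(record)
print(rows)
```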

Channel monitoring: Feed the same video URLs across runs and dedupe on video_id + scraped_at downstream. Engagement velocity is captured in engagement_metrics.views_per_hour.
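
For the downstream dedupe step, one possible approach is to key snapshots on (video_id, scraped_at) and then measure view growth between consecutive snapshots of the same video. This is a sketch under the assumption that scraped_at is an ISO 8601 UTC timestamp ending in "Z", as in the sample output; the helper names are hypothetical.

```python
from datetime import datetime


def parse_scraped_at(value):
    """Parse an ISO timestamp such as '2026-05-13T19:02:14Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def dedupe_snapshots(records):
    """Keep one record per (video_id, scraped_at), ordered by video then time."""
    unique = {(r["video_id"], r["scraped_at"]): r for r in records}
    return sorted(
        unique.values(),
        key=lambda r: (r["video_id"], parse_scraped_at(r["scraped_at"])),
    )


def views_per_hour_between(earlier, later):
    """View growth per hour between two snapshots of the same video."""
    elapsed = parse_scraped_at(later["scraped_at"]) - parse_scraped_at(earlier["scraped_at"])
    hours = elapsed.total_seconds() / 3600
    if hours <= 0:
        return None
    return (later["view_count"] - earlier["view_count"]) / hours


snapshots = [
    {"video_id": "K8x4ZqRpL2c", "scraped_at": "2026-05-14T19:00:00Z", "view_count": 3400},
    {"video_id": "K8x4ZqRpL2c", "scraped_at": "2026-05-13T19:00:00Z", "view_count": 1000},
    {"video_id": "K8x4ZqRpL2c", "scraped_at": "2026-05-13T19:00:00Z", "view_count": 1000},
]
ordered = dedupe_snapshots(snapshots)
velocity = views_per_hour_between(ordered[0], ordered[1])
print(len(ordered), velocity)
```

Unlike engagement_metrics.views_per_hour, which reflects a single record, this measures growth between your own runs.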

Save your input as an Apify Task

Apify Tasks let you save a configured input once and re-run it with a single click - no need to re-enter video URLs, proxy options, or tier settings every time. Tasks are the foundation for everything that comes next: schedules, monitor mode, and webhook routing all attach to a saved Task, not to the raw actor.

Steps to save your current input as a Task:

  1. On this actor's Apify Store page, click Run with your input fully configured.
  2. Click the Save as task button at the top of the run page.
  3. Name the task something memorable (e.g. Competitor channel uploads - daily).
  4. Reload the task page and click Start anytime to re-run with the same inputs.

Tasks unlock the next two features below: scheduling and monitor mode.

Run this weekly with Apify Schedules

Apify Schedules cron-run any saved Task automatically. Pair this with the saved Task above and you get hands-off recurring runs with no manual clicks, no missed weeks, and a steady stream of fresh data into your CRM or warehouse.

Steps to schedule a Task:

  1. Save your input as a Task (see above).
  2. Go to https://console.apify.com/schedules and click Create new schedule.
  3. Pick your Task and set the cron expression. Common patterns:
    • Daily at 9am UTC: 0 9 * * *
    • Weekly on Mondays at 9am: 0 9 * * 1
    • Monthly on the 1st: 0 9 1 * *
  4. Save. Apify will run your Task on that schedule automatically, push the dataset to whatever integrations you have wired up, and fire run-completion webhooks for downstream automation.

Run daily to monitor for new uploads, view-velocity changes, and breakout videos across the channels you track.

Monitor mode (v2, beta)

Monitor mode is the v2 evolution of this actor and is currently in BETA. It turns a recurring schedule into a true change-feed instead of a firehose of duplicate records.

How it works:

  • When this actor runs under an Apify Schedule, monitor mode is enabled automatically.
  • Instead of emitting ALL records every run, it emits ONLY records that are NEW or CHANGED since the last scheduled run.
  • A digest record summarizes the delta (X new, Y changed, Z removed) at the top of every run.
  • Optional: provide a Slack or email webhook URL in the monitor_webhook_url input field and the digest fires there too, so your team gets the delta in their inbox or channel without polling the dataset.
  • Cost: a single scheduled_delta_run event ($0.05) per scheduled run, plus standard PPE on emitted delta records only. Predictable monthly cost, no surprise bills from re-charging for unchanged records.
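
To make the new / changed / removed idea concrete, here is a sketch of how a delta digest could be computed from two runs' records. It is not the actor's implementation: which fields count as a change is unknown, so the watched_fields default below is purely an assumption, and build_digest is a hypothetical name.

```python
def build_digest(previous, current,
                 watched_fields=("view_count", "like_count", "comment_count")):
    """Compare two runs keyed by video_id and list new, changed, removed IDs."""
    prev = {r["video_id"]: r for r in previous}
    curr = {r["video_id"]: r for r in current}
    new = sorted(curr.keys() - prev.keys())
    removed = sorted(prev.keys() - curr.keys())
    changed = sorted(
        vid
        for vid in curr.keys() & prev.keys()
        # A record is "changed" if any watched field differs between runs.
        if any(curr[vid].get(f) != prev[vid].get(f) for f in watched_fields)
    )
    return {"new": new, "changed": changed, "removed": removed}


last_run = [
    {"video_id": "aaaaaaaaaaa", "view_count": 100},
    {"video_id": "bbbbbbbbbbb", "view_count": 50},
]
this_run = [
    {"video_id": "aaaaaaaaaaa", "view_count": 150},
    {"video_id": "ccccccccccc", "view_count": 10},
]
digest = build_digest(last_run, this_run)
print(digest)
```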

Monitor mode is rolling out first to three actors: hotel-motel-lead-finder, google-maps-reviews-pro, and mcp-accounting-firm-leads. Coverage of the full portfolio, including this actor, is planned by end of June.

Support

GitHub issues or DM via the Apify Store contact form.

Found this useful?

If this actor saved you time or money, please consider leaving a quick review on the Apify Store. Reviews help other buyers find work that solves their problem and let me prioritize the features paying customers actually use. Leave a review: https://apify.com/seibs.co/youtube-intelligence#reviews