YouTube Comments — CSV, Likes, Replies, isVerified, No API Key
Pricing
Pay per usage
21 runs. YouTube comments in CSV/JSON — text, author, likes, replies, isVerified. Innertube backend, no API quotas. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For LLM training + brand listening. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer: Alex
YouTube Comments Scraper — Innertube API, No 10K-Quota, No OAuth
Get the full comment stream from any YouTube video into CSV / JSON without an API key. The actor calls YouTube's internal Innertube endpoint (the same one youtube.com uses on every page load) instead of the YouTube Data API v3 — so there is no 10,000-unit/day quota to budget, no OAuth console to register, and no key to rotate.
19 production runs to date by sentiment-analysis, creator-economy, and NLP-dataset teams.
Why Innertube over the Data API
| Factor | YouTube Data API v3 | Innertube (this actor) |
|---|---|---|
| API key | Required | Not needed |
| Daily quota | 10,000 units (≈ 5–8 fully-paginated videos) | No published quota |
| Setup | Google Cloud project + OAuth | Paste video URLs |
| Comment payload | commentRenderer (older) | commentEntityPayload (current) |
| Reply count + verified flag | Available, costs quota | Available, no quota |
Honest disclosure on "no quota": YouTube can — and does — rate-limit Innertube traffic when it looks abusive. The actor includes random 0.8–2 s delays between paginated requests and 1.5–3 s between videos to stay under that radar. At default settings it has run cleanly across 19 jobs; bursting harder is your call.
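A jittered delay like the one described is straightforward to implement. The sketch below is illustrative only (the function name is ours, not the actor's); it samples a random pause from a range and sleeps for it:

```python
import random
import time

def polite_sleep(lo, hi):
    """Sleep a random duration in [lo, hi] seconds and return it.
    Used between paginated requests (0.8-2 s) or videos (1.5-3 s)."""
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay
```

Randomizing the interval, rather than sleeping a fixed amount, avoids the perfectly regular request cadence that rate limiters key on.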
Output schema (per comment, verified against src/main.js)
```json
{
  "videoId": "dQw4w9WgXcQ",
  "commentId": "Ugzge340dBgB75hWBm54AaABAg",
  "author": "@YouTube",
  "authorChannelId": "UCBR8-60-B28hp2BmDPdntcQ",
  "text": "can confirm: he never gave us up",
  "likes": 200,
  "publishedAt": "10 months ago",
  "isReply": false,
  "replyCount": 961,
  "isHearted": false,
  "isVerified": true,
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```
12 fields per comment. publishedAt is the relative string YouTube exposes (e.g. "10 months ago") — there is no absolute timestamp on Innertube's public-page response, so we don't pretend to derive one.
isHearted reflects creator-heart state. isVerified is the channel-verified flag on the comment author. authorChannelId is the YouTube channel ID of the commenter (useful for joining against youtube-channel-scraper).
Input
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| videoUrls | string[] | required | ≥ 1 | YouTube video URLs or 11-character video IDs |
| maxCommentsPerVideo | integer | 100 | 1–100000 | Top-level cap per video (replies flagged, NOT recursed) |
| sortBy | string | "top" | "top" \| "newest" | "newest" is best-effort (see Honest limitations) |
videoUrls accepts:
- `https://www.youtube.com/watch?v=ID`
- `https://youtu.be/ID`
- `https://www.youtube.com/shorts/ID`
- Bare 11-character video IDs (regex `^[a-zA-Z0-9_-]{11}$`)
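Normalizing those four input forms to a bare video ID can be sketched as follows. This is an illustration, not the actor's actual parsing code; the function name is ours:

```python
import re
from urllib.parse import urlparse, parse_qs

VIDEO_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{11}$")

def extract_video_id(entry):
    """Normalize a watch / youtu.be / shorts URL or bare ID to an 11-char video ID."""
    if VIDEO_ID_RE.match(entry):               # bare 11-char ID
        return entry
    parsed = urlparse(entry)
    if parsed.hostname == "youtu.be":          # https://youtu.be/ID
        candidate = parsed.path.lstrip("/")
    elif parsed.path.startswith("/shorts/"):   # https://.../shorts/ID
        candidate = parsed.path.split("/shorts/")[1].split("/")[0]
    else:                                      # https://.../watch?v=ID
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    return candidate if VIDEO_ID_RE.match(candidate) else None
```

Anything that does not yield a valid 11-character ID returns `None`, which a caller can treat as a skippable input.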
How it works
- Fetch the watch page `https://www.youtube.com/watch?v=<videoId>` with a hardcoded Chrome 120 User-Agent. Extract `ytInitialData` and `INNERTUBE_API_KEY` from the inline `<script>` tags.
- Find the comments continuation token inside `twoColumnWatchNextResults`.
- POST to `youtubei/v1/next` with `INNERTUBE_CONTEXT` (clientName=WEB) and the continuation token.
- Optionally re-fetch with the "Newest first" sort token if `sortBy="newest"` and the sort menu surfaces in the response.
- Walk `frameworkUpdates.entityBatchUpdate.mutations`; each `commentEntityPayload` becomes one record. Continue paginating via `onResponseReceivedEndpoints` until the cap is hit or the continuation chain ends.
Pure fetch(), no Cheerio, no Playwright, no browser automation.
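The payload walk in the last step can be sketched as below. This is a simplified illustration in Python (the actor itself is Node.js); the top-level path `frameworkUpdates.entityBatchUpdate.mutations` and the `likeCount*` fallback chain come from the description above, but the inner field names (`properties`, `author`, `toolbar`) are assumptions about the WEB-client response shape, not verified actor code:

```python
def parse_comment_entities(response):
    """Walk frameworkUpdates.entityBatchUpdate.mutations and keep
    every commentEntityPayload as one flat record."""
    mutations = (
        response.get("frameworkUpdates", {})
        .get("entityBatchUpdate", {})
        .get("mutations", [])
    )
    records = []
    for m in mutations:
        entity = m.get("payload", {}).get("commentEntityPayload")
        if not entity:
            continue  # unrelated mutation types are skipped
        props = entity.get("properties", {})
        author = entity.get("author", {})
        toolbar = entity.get("toolbar", {})
        records.append({
            "commentId": props.get("commentId"),
            "text": props.get("content", {}).get("content"),
            "author": author.get("displayName"),
            "isVerified": bool(author.get("isVerified")),
            # fallback chain: a rename on one path degrades, not crashes
            "likes": toolbar.get("likeCountLiked")
                     or toolbar.get("likeCountNotliked")
                     or 0,
        })
    return records
```

Because every lookup uses `.get()` with a default, a renamed field on one path yields a partial record instead of an exception, which matches the graceful-degradation behavior described in the FAQ below.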
Honest limitations (read before bulk runs)
- `maxCommentsPerVideo=0` returns ZERO comments, NOT unlimited. The pagination loop guard is `allComments.length < maxComments`, so `0 < 0` is false, the loop never enters, and the output is empty. The schema now enforces `minimum: 1`. For "as much as possible," set a high explicit number (e.g. 100000); the run ends naturally when the continuation chain ends or the cap is hit.
- Replies are flagged but NOT recursed. Each record exposes `isReply` (bool) and `replyCount` (int). The actor never POSTs into reply-panel continuation tokens. If you need full reply threads, see Custom builds below.
- `sortBy="newest"` is best-effort. The actor re-fetches with the "Newest first" token only if `findSortToken()` finds it in the response. If the comments-header sort menu is missing for a given video (rare but possible), the actor silently falls back to the default "top" sort. Check the run log to confirm; the line `Switching to "Newest first" sort...` indicates success.
- Hardcoded User-Agent (Chrome 120). Both the watch-page GET and the Innertube POST send the same UA string. If YouTube fingerprints UA + IP and that combination triggers a soft-block, this actor cannot rotate. A custom build can wire in UA rotation and proxying.
- No proxy. Direct fetch from the Apify worker IP. Bursting many videos in parallel from the same IP raises rate-limit risk.
- Hardcoded fallback Innertube API key. If the regex `"INNERTUBE_API_KEY":"([^"]+)"` fails to match the watch page (YouTube response shape change), the actor falls back to the well-known public web-embed key (AIzaSyAO_FJ2SlqU8Q...). This usually works but is not guaranteed long-term.
- Comments disabled / video unavailable / age-gated: the actor logs `No comments section found.` or `Video not available` and skips that video; no record is pushed.
- No absolute timestamps. `publishedAt` is the relative string YouTube returns ("10 months ago"). We do NOT infer absolute timestamps from a relative string; that would be a fabrication. If you need ISO 8601 timestamps, use the Data API v3 (at the cost of quota).
- Continuation chain ends naturally: for popular videos with 100K+ comments, YouTube eventually stops returning a next-token. The practical hard cap is whatever YouTube serves you for that specific video.
- Inter-video delay 1.5–3 s, intra-pagination delay 0.8–2 s. Adjust expectations: 100 videos × 100 comments ≈ 5–10 min wall-clock, even though each fetch is sub-second.
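The `maxCommentsPerVideo=0` pitfall above falls directly out of the loop guard. A stripped-down model of that pagination loop (illustrative Python, not the actor's JavaScript) makes the behavior concrete:

```python
def paginate(pages, max_comments):
    """Simplified pagination loop with the same guard shape as the actor:
    stop when the cap is reached or the continuation chain (pages) ends."""
    all_comments = []
    page_idx = 0
    # With max_comments=0 the guard 0 < 0 is False on the first check,
    # so the loop body never runs and the result is empty.
    while len(all_comments) < max_comments and page_idx < len(pages):
        all_comments.extend(pages[page_idx])
        page_idx += 1
    return all_comments[:max_comments]
```

Passing a deliberately oversized cap (e.g. 100000) simply lets the loop run until the page list, the stand-in here for the continuation chain, is exhausted.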
Python integration
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("knotless_cadence/youtube-comments-scraper").call(
    run_input={
        "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
        "maxCommentsPerVideo": 500,
        "sortBy": "newest",
    }
)

for c in client.dataset(run["defaultDatasetId"]).iterate_items():
    flag = " ✓" if c["isVerified"] else ""
    print(f"{c['author']}{flag}: {c['text'][:120]} ({c['likes']} likes)")
```
Common questions
Q: Will this stop working if YouTube changes their internal API?
A: Innertube is YouTube's core internal API — they can't remove it without breaking youtube.com itself. The actor parses the commentEntityPayload schema, which has been stable on the WEB client. If a field is renamed, the actor returns partial records (the ||-fallback chains across likeCountLiked / likeCountNotliked, replyCount, etc. mean a rename on one path doesn't crash extraction — it degrades gracefully).
Q: Can I get full reply threads?
A: Currently the actor records isReply (bool) and replyCount (int). It does not recurse into reply panels. Full-reply-thread extraction is a custom build — see Custom scraping below.
Q: Why does publishedAt look like "10 months ago" instead of an ISO date?
A: That's the literal value YouTube returns in the WEB-client Innertube response. We don't infer absolute timestamps from a relative string — that would be a fabrication. If you need exact timestamps, you'll need the Data API v3 (which exposes publishedAt as ISO 8601, at the cost of quota).
Q: What if a video has comments disabled?
A: The actor logs "No comments section found. Comments may be disabled." and continues to the next video. No record is pushed for that video.
Q: Cost on Apify?
A: Apify charges by compute units, not per-comment. Each fetch is sub-second over plain fetch(); pagination delays dominate runtime. Free tier covers small batches. For sustained large pulls, email and we can quote a fixed-price custom build.
Use cases
- Sentiment analysis on creator audience.
- AI / NLP training-data collection (real conversational text).
- Competitor monitoring on rival channels.
- Influencer engagement-quality audit (verified-vs-unverified ratio, reply density).
- Brand-mention surveillance across creator videos.
- Academic research into platform discourse.
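For the engagement-quality audit use case, the verified-vs-unverified ratio and reply density mentioned above can be computed directly from the actor's `isVerified` and `isReply` fields. A minimal sketch (the function name and metric definitions are ours):

```python
def engagement_metrics(comments):
    """Verified ratio and reply density over a list of comment records,
    using the isVerified / isReply fields from the output schema."""
    total = len(comments)
    if total == 0:
        return {"verified_ratio": 0.0, "reply_density": 0.0}
    verified = sum(1 for c in comments if c.get("isVerified"))
    replies = sum(1 for c in comments if c.get("isReply"))
    return {
        "verified_ratio": verified / total,
        "reply_density": replies / total,
    }
```

Feed it the items iterated from the dataset in the Python integration example above to score a channel's audience in a few lines.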
YouTube tool family (related actors)
Cross-platform pairing:
| Source | Actor | Use |
|---|---|---|
| Trustpilot | Customer reviews | External brand sentiment (949 production runs) |
| Reddit Discussion | Community discussions | Unfiltered audience opinion |
| Bluesky Scraper | AT Protocol | Open social-graph mentions |
| Hacker News | Tech sentiment | Developer-audience reactions |
All 31 published actors free to inspect on Apify Store.
Custom scraping — pilot tiers
Need full reply threads, multi-video pipelines, UA rotation + proxy, or a different schema (translation, profanity flags, sentiment scoring)? Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single creator-comment dump or a one-off competitor channel audit.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most NLP-dataset and creator-economy projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (daily creator-channel sentiment, multi-platform fan-out with TikTok / Instagram / Threads), full reply-thread recursion, proxy-routed UA rotation.
Email: spinov001@gmail.com — drop video URLs and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Designed for sentiment analysis, audience research, and academic use. Respect YouTube's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with YouTube or Google LLC.
Honest disclosure: extracts 12 fields per comment (videoId, commentId, author, authorChannelId, text, likes, publishedAt, isReply, replyCount, isHearted, isVerified, scrapedAt). publishedAt is the relative string YouTube returns ("10 months ago") — no inferred absolute timestamps. Reply threads are flagged but NOT recursed. maxCommentsPerVideo=0 returns 0 comments (use a high integer for "all"). Hardcoded Chrome 120 UA + fallback Innertube API key on watch-page extraction failure. Innertube has no published quota but YouTube can rate-limit aggressive callers; the actor's default 0.8–2 s pagination delay keeps runs polite.