YouTube Comments — CSV, Likes, Replies, isVerified, No API Key
21 runs. YouTube comments in CSV/JSON — text, author, likes, replies, isVerified. Innertube backend, no API quotas. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For LLM training + brand listening. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Pricing: Pay per usage · Rating: 0.0 (0) · Developer: Alex (Maintained by Community)

Actor stats: 0 bookmarks · 4 total users · 1 monthly active user · last modified 2 days ago

YouTube Comments Scraper — Innertube API, No 10K-Quota, No OAuth

Get the full comment stream from any YouTube video into CSV / JSON without an API key. The actor calls YouTube's internal Innertube endpoint (the same one youtube.com uses on every page load) instead of the YouTube Data API v3 — so there is no 10,000-unit/day quota to budget, no OAuth console to register, and no key to rotate.

19 production runs to date by sentiment-analysis, creator-economy, and NLP-dataset teams.


Why Innertube over the Data API

Factor | YouTube Data API v3 | Innertube (this actor)
API key | Required | Not needed
Daily quota | 10,000 units (≈ 5–8 fully-paginated videos) | No published quota
Setup | Google Cloud project + OAuth | Paste video URLs
Comment payload | commentRenderer (older) | commentEntityPayload (current)
Reply count + verified flag | Available, costs quota | Available, no quota

Honest disclosure on "no quota": YouTube can — and does — rate-limit Innertube traffic when it looks abusive. The actor includes random 0.8–2 s delays between paginated requests and 1.5–3 s between videos to stay under that radar. At default settings it has run cleanly across 19 jobs; bursting harder is your call.
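The throttling described above amounts to a jittered sleep between requests. A minimal sketch in Python (the actor itself is Node.js; `polite_sleep` is an illustrative name, not the actor's function):

```python
import random
import time

def polite_sleep(lo: float, hi: float) -> float:
    """Sleep for a random duration in [lo, hi] seconds; return the delay used."""
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay

# Between paginated requests: polite_sleep(0.8, 2.0)
# Between videos:             polite_sleep(1.5, 3.0)
```

The randomness matters more than the magnitude: fixed-interval requests are a much easier fingerprint for rate limiters than jittered ones.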


Output schema (per comment, verified against src/main.js)

{
  "videoId": "dQw4w9WgXcQ",
  "commentId": "Ugzge340dBgB75hWBm54AaABAg",
  "author": "@YouTube",
  "authorChannelId": "UCBR8-60-B28hp2BmDPdntcQ",
  "text": "can confirm: he never gave us up",
  "likes": 200,
  "publishedAt": "10 months ago",
  "isReply": false,
  "replyCount": 961,
  "isHearted": false,
  "isVerified": true,
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}

12 fields per comment. publishedAt is the relative string YouTube exposes (e.g. "10 months ago") — there is no absolute timestamp on Innertube's public-page response, so we don't pretend to derive one.

isHearted reflects creator-heart state. isVerified is the channel-verified flag on the comment author. authorChannelId is the YouTube channel ID of the commenter (useful for joining against youtube-channel-scraper).


Input

Parameter | Type | Default | Range | Description
videoUrls | string[] | required | ≥ 1 | YouTube video URLs or 11-character video IDs
maxCommentsPerVideo | integer | 100 | 1–100000 | Top-level cap per video (replies flagged, NOT recursed)
sortBy | string | "top" | "top" or "newest" | "newest" is best-effort (see Honest limitations)

videoUrls accepts:

  • https://www.youtube.com/watch?v=ID
  • https://youtu.be/ID
  • https://www.youtube.com/shorts/ID
  • Bare 11-char video IDs (regex ^[a-zA-Z0-9_-]{11}$)

How it works

  1. Fetch the watch page https://www.youtube.com/watch?v=<videoId> with a hardcoded Chrome 120 User-Agent. Extract ytInitialData and INNERTUBE_API_KEY from the inline <script> tags.
  2. Find the comments continuation token inside twoColumnWatchNextResults.
  3. POST to youtubei/v1/next with INNERTUBE_CONTEXT (clientName=WEB) and the continuation token.
  4. Optionally re-fetch with the "Newest first" sort token if sortBy="newest" and the sort menu surfaces in the response.
  5. Walk frameworkUpdates.entityBatchUpdate.mutations — each commentEntityPayload becomes one record. Continue paginating via onResponseReceivedEndpoints until the cap is hit or the continuation chain ends.

Pure fetch(), no Cheerio, no Playwright, no browser automation.


Honest limitations (read before bulk runs)

  • maxCommentsPerVideo=0 returns ZERO comments, NOT unlimited. The pagination loop guard is allComments.length < maxComments, so 0 < 0 is false → loop never enters → empty output. The schema now enforces minimum: 1. For "as much as possible," set a high explicit number (e.g. 100000); the run will end naturally when the continuation chain ends or the cap is hit.
  • Replies are flagged but NOT recursed. Each record exposes isReply (bool) and replyCount (int). The actor never POSTs into reply-panel continuation tokens. If you need full reply threads, see Custom builds below.
  • sortBy="newest" is best-effort. The actor re-fetches with the "Newest first" token only if findSortToken() finds it in the response. If the comments-header sort menu is missing for a given video (rare but possible), the actor silently falls back to default "top" sort. Check the run log to confirm; the line Switching to "Newest first" sort... indicates success.
  • Hardcoded User-Agent (Chrome 120). Both the watch-page GET and the Innertube POST send the same UA string. If YouTube fingerprints UA + IP and that combination triggers a soft-block, this actor cannot rotate. Custom build can wire in UA rotation and proxy.
  • No proxy. Direct fetch from the Apify worker IP. Bursting many videos in parallel from the same IP raises rate-limit risk.
  • Hardcoded fallback Innertube API key. If the regex "INNERTUBE_API_KEY":"([^"]+)" fails to match the watch page (YouTube response shape change), the actor falls back to the well-known public web-embed key (AIzaSyAO_FJ2SlqU8Q...). This usually works but is not guaranteed long-term.
  • Comments disabled / video unavailable / age-gated — actor logs No comments section found. or Video not available and skips that video; no record pushed.
  • No absolute timestamps. publishedAt is the relative string YouTube returns ("10 months ago"). We do NOT infer absolute timestamps from a relative string — that would be a fabrication. If you need ISO 8601 timestamps, use the Data API v3 (at the cost of quota).
  • Continuation chain ends naturally — for popular videos with 100K+ comments, YouTube eventually stops returning a next-token. Practical hard cap is whatever YouTube serves you for that specific video.
  • Inter-video delay 1.5–3 s, intra-pagination delay 0.8–2 s. Adjust expectations: 100 videos × 100 comments ≈ 5–10 min wall-clock, even though each fetch is sub-second.
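The maxCommentsPerVideo=0 pitfall falls straight out of the loop guard. A stripped-down sketch of the pagination loop, with continuation fetching replaced by a list of pre-fetched pages for illustration:

```python
def paginate(pages: list[list[dict]], max_comments: int) -> list[dict]:
    """Collect comments page by page until the cap is hit or pages run out."""
    all_comments: list[dict] = []
    page_iter = iter(pages)
    while len(all_comments) < max_comments:  # 0 < 0 is False: loop never enters
        page = next(page_iter, None)
        if page is None:  # continuation chain ended naturally
            break
        all_comments.extend(page)
    return all_comments[:max_comments]

pages = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
```

With max_comments=0 the guard fails immediately and the result is empty; with a high cap the loop simply exhausts the pages, which is why "set a high explicit number" is the right way to ask for everything.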

Python integration

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("knotless_cadence/youtube-comments-scraper").call(
    run_input={
        "videoUrls": [
            "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        ],
        "maxCommentsPerVideo": 500,
        "sortBy": "newest",
    }
)

for c in client.dataset(run["defaultDatasetId"]).iterate_items():
    flag = " ✓" if c["isVerified"] else ""
    print(f"{c['author']}{flag}: {c['text'][:120]} ({c['likes']} likes)")
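To land the same items in a CSV file instead of printing them, standard-library csv is enough (column names are the 12 fields from the output schema above; `write_comments_csv` is an illustrative helper, not part of the actor):

```python
import csv

FIELDS = ["videoId", "commentId", "author", "authorChannelId", "text", "likes",
          "publishedAt", "isReply", "replyCount", "isHearted", "isVerified", "scrapedAt"]

def write_comments_csv(items, path="comments.csv"):
    """Write an iterable of comment dicts to CSV with a fixed column order."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for item in items:
            writer.writerow(item)

# Usage with the run above:
# write_comments_csv(client.dataset(run["defaultDatasetId"]).iterate_items())
```

extrasaction="ignore" keeps the export stable even if the actor ever adds fields.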

Common questions

Q: Will this stop working if YouTube changes their internal API? A: Innertube is YouTube's core internal API — they can't remove it without breaking youtube.com itself. The actor parses the commentEntityPayload schema, which has been stable on the WEB client. If a field is renamed, the actor returns partial records (the ||-fallback chains across likeCountLiked / likeCountNotliked, replyCount, etc. mean a rename on one path doesn't crash extraction — it degrades gracefully).
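The ||-fallback idea translates to Python as a first-non-empty lookup across candidate paths (a sketch, not the actor's JS; the key names mirror the commentEntityPayload fields mentioned above):

```python
def first_present(payload: dict, *paths, default=None):
    """Return the first value found walking each dotted path, else default.

    Mirrors a JS chain like a?.b || c?.d || fallback.
    """
    for path in paths:
        node = payload
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node is not None:
            return node
    return default

payload = {"toolbar": {"likeCountNotliked": "200"}}
likes = first_present(payload, "toolbar.likeCountLiked",
                      "toolbar.likeCountNotliked", default="0")
```

If YouTube renames one of the candidate keys, the other paths still resolve and extraction degrades to partial records instead of crashing.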

Q: Can I get full reply threads? A: Currently the actor records isReply (bool) and replyCount (int). It does not recurse into reply panels. Full-reply-thread extraction is a custom build — see Custom scraping below.

Q: Why does publishedAt look like "10 months ago" instead of an ISO date? A: That's the literal value YouTube returns in the WEB-client Innertube response. We don't infer absolute timestamps from a relative string — that would be a fabrication. If you need exact timestamps, you'll need the Data API v3 (which exposes publishedAt as ISO 8601, at the cost of quota).

Q: What if a video has comments disabled? A: The actor logs "No comments section found. Comments may be disabled." and continues to the next video. No record is pushed for that video.

Q: Cost on Apify? A: Apify charges by compute units, not per-comment. Each fetch is sub-second over plain fetch(); pagination delays dominate runtime. Free tier covers small batches. For sustained large pulls, email and we can quote a fixed-price custom build.


Use cases

  • Sentiment analysis on creator audience.
  • AI / NLP training-data collection (real conversational text).
  • Competitor monitoring on rival channels.
  • Influencer engagement-quality audit (verified-vs-unverified ratio, reply density).
  • Brand-mention surveillance across creator videos.
  • Academic research into platform discourse.

Cross-platform pairing:

Source | Actor | Use
Trustpilot | Customer reviews | External brand sentiment (949 production runs)
Reddit | Community discussions | Unfiltered audience opinion
Bluesky | AT Protocol | Open social-graph mentions
Hacker News | Tech sentiment | Developer-audience reactions

All 31 published actors free to inspect on Apify Store.


Custom scraping — pilot tiers

Need full reply threads, multi-video pipelines, UA rotation + proxy, or a different schema (translation, profanity flags, sentiment scoring)? Three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a single creator-comment dump or a one-off competitor channel audit.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most NLP-dataset and creator-economy projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (daily creator-channel sentiment, multi-platform fan-out with TikTok / Instagram / Threads), full reply-thread recursion, proxy-routed UA rotation.

Email: spinov001@gmail.com — drop video URLs and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Disclaimer

Designed for sentiment analysis, audience research, and academic use. Respect YouTube's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with YouTube or Google LLC.

Honest disclosure: extracts 12 fields per comment (videoId, commentId, author, authorChannelId, text, likes, publishedAt, isReply, replyCount, isHearted, isVerified, scrapedAt). publishedAt is the relative string YouTube returns ("10 months ago") — no inferred absolute timestamps. Reply threads are flagged but NOT recursed. maxCommentsPerVideo=0 returns 0 comments (use a high integer for "all"). Hardcoded Chrome 120 UA + fallback Innertube API key on watch-page extraction failure. Innertube has no published quota but YouTube can rate-limit aggressive callers; the actor's default 0.8–2 s pagination delay keeps runs polite.