Reddit Comment Tree Scraper — Full Threads + Scores

Pricing

from $5.00 / 1,000 threads

Premium: scrape full nested comment trees, with upvote scores and depth, from any Reddit thread or subreddit, using a real browser to get the canonical data that RSS feeds can't provide.

Rating: 0.0 (0 ratings)

Developer: James Taylor (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified a day ago

Get the complete comment tree of any Reddit thread (every comment with its author, body, upvote score, depth, and parent), plus the post's score and comment count. This is the premium, full-fidelity Reddit scraper: it returns the engagement data that lightweight RSS-based scrapers can't provide.

Built for researchers, analysts, sentiment/social-listening tools, and anyone who needs the real structure and scores of a Reddit discussion.

Why this one

Reddit blocks ordinary scrapers from its .json/API endpoints (they get an HTTP 403), which is why cheaper actors fall back to RSS and return comments without scores or nesting. This actor uses a real browser (headless Chromium) through a residential proxy to pass Reddit's anti-bot checks, then reads the canonical thread data, so you get:

  • Per-comment upvote scores, plus the post's score, upvote ratio, and comment count
  • Full nested structure — each comment's depth and parentId to rebuild the tree
  • Author, body, timestamp, and permalink for every comment

What it does

  • Scrapes specific thread URLs, and/or discovers threads from subreddits you name.
  • Returns one record per thread: the post plus a flat-but-tree-preserving comments[] array (each comment carries depth + parentId, so you can reconstruct the hierarchy).

Input

| Field | Type | Default | Description |
|---|---|---|---|
| postUrls | array | [] | Specific Reddit thread URLs to scrape in full. |
| subreddits | array | [] | Discover threads from these subreddits and scrape each. |
| sort | string | hot | Sort for subreddit discovery: hot/new/rising/top. |
| maxPosts | integer | 25 | Total threads to scrape (caps spend). |
| maxComments | integer | 200 | Cap on comments per thread (Reddit serves ~200 per page). |
| maxConcurrency | integer | 3 | Parallel browser contexts (kept low because each is a real browser). |
| proxyConfiguration | object | Apify residential | Required; Reddit blocks datacenter IPs. |

Provide postUrls, subreddits, or both.
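If you build the input in code, a small check can catch mistakes before a run starts. The sketch below applies the defaults from the table above and enforces the "postUrls, subreddits, or both" rule and the allowed sort values. The helper name build_run_input is hypothetical; it mirrors this README, not the actor's own validation, and it passes any other field (such as proxyConfiguration) through untouched.

```python
import copy
from typing import Any

# Defaults as listed in the Input table (proxyConfiguration omitted on purpose).
DEFAULTS: dict[str, Any] = {
    "postUrls": [],
    "subreddits": [],
    "sort": "hot",
    "maxPosts": 25,
    "maxComments": 200,
    "maxConcurrency": 3,
}
ALLOWED_SORTS = {"hot", "new", "rising", "top"}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults
    and check the documented rules. Extra fields are passed through."""
    # Deep-copy so callers can't mutate the shared default lists.
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}
    if not run_input["postUrls"] and not run_input["subreddits"]:
        raise ValueError("provide postUrls, subreddits, or both")
    if run_input["sort"] not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    for key in ("maxPosts", "maxComments", "maxConcurrency"):
        value = run_input[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    return run_input
```

For example, build_run_input(subreddits=["SaaS"], maxPosts=10) returns the documented defaults with those two fields replaced, while build_run_input() raises because neither postUrls nor subreddits is set.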

Example input

{
  "subreddits": ["SaaS"],
  "sort": "hot",
  "maxPosts": 10,
  "maxComments": 200,
  "postUrls": ["https://www.reddit.com/r/Entrepreneur/comments/abc123/some_thread/"]
}

Output

One dataset item per thread:

{
  "type": "post",
  "subreddit": "SaaS",
  "author": "founder_jane",
  "title": "How we cut churn 30%",
  "score": 142,
  "upvoteRatio": 0.97,
  "numComments": 88,
  "commentCount": 200,
  "postUrl": "https://www.reddit.com/r/SaaS/comments/abc123/how_we_cut_churn_30",
  "createdAt": "2026-06-01T12:00:00.000Z",
  "comments": [
    {
      "type": "comment",
      "id": "opcuxfu",
      "postId": "abc123",
      "parentId": "t3_abc123",
      "author": "growth_greg",
      "body": "What did your onboarding look like before?",
      "score": 24,
      "depth": 0,
      "createdAt": "2026-06-01T12:30:00.000Z",
      "url": "https://www.reddit.com/r/SaaS/comments/abc123/_/opcuxfu"
    }
  ]
}
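For spreadsheet or BI tools, the thread items can be flattened into one row per comment, repeating the thread's subreddit, URL, and title on each row. This is a minimal sketch using Python's standard csv module; the column choice and the function name threads_to_csv are assumptions, and any field missing from an item is written as an empty cell.

```python
import csv
import io
from typing import Any

# Thread-level columns first, then per-comment columns from the sample output.
COLUMNS = ["subreddit", "postUrl", "title",
           "id", "parentId", "depth", "author", "score", "createdAt", "body"]


def threads_to_csv(items: list[dict[str, Any]]) -> str:
    """Hypothetical helper: one CSV row per comment across all thread items."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for thread in items:
        for comment in thread.get("comments", []):
            row = {
                # Repeat thread context on every comment row.
                "subreddit": thread.get("subreddit", ""),
                "postUrl": thread.get("postUrl", ""),
                "title": thread.get("title", ""),
            }
            # Per-comment fields; missing keys become empty cells.
            row.update({k: comment.get(k, "") for k in COLUMNS[3:]})
            writer.writerow(row)
    return buf.getvalue()
```

The csv module quotes bodies that contain commas or newlines, so multi-line comments survive the round trip.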

Rebuild the tree from depth + parentId, or use the flat list as-is.
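In the sample output, a top-level comment has parentId "t3_abc123", which matches Reddit's "fullname" convention (t3_ prefixes a post). Assuming replies use the matching t1_<commentId> form (the sample does not show one), a minimal Python sketch can nest one thread's comments[] array by parentId alone, with depth left available for indentation or as a cross-check. The function names build_comment_tree and render are hypothetical.

```python
from typing import Any


def build_comment_tree(comments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest a thread's flat comments[] list using parentId.

    Assumes "t3_<postId>" for top-level parents and "t1_<commentId>" for
    reply parents. Input dicts are copied, not mutated.
    """
    # Copy each comment and give it an empty replies list.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots: list[dict[str, Any]] = []
    for node in nodes.values():
        kind, _, parent_id = node.get("parentId", "").partition("_")
        if kind == "t1" and parent_id in nodes:
            nodes[parent_id]["replies"].append(node)
        else:
            # Parent is the post itself, or the parent comment is missing
            # from the list: treat the comment as a root.
            roots.append(node)
    return roots


def render(nodes: list[dict[str, Any]], indent: int = 0) -> list[str]:
    """Return one line per comment, indented two spaces per nesting level."""
    lines: list[str] = []
    for node in nodes:
        lines.append(f'{"  " * indent}[{node.get("score", 0)}] {node.get("author", "")}')
        lines.extend(render(node["replies"], indent + 1))
    return lines
```

Because Python dicts keep insertion order, siblings stay in the order the actor returned them.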

Pricing & cost control

Pay-Per-Event: you are charged per thread, with all of that thread's comments included. This is a premium tier: it runs a real browser through a residential proxy (Reddit hard-blocks datacenter IPs), so it costs more than the RSS-based Reddit Scraper, but unlike that scraper it returns upvote scores and nested comment trees. Set maxPosts to cap spend.
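Since the charge is per thread and maxPosts caps the total number of threads, the worst-case event charge for a run is maxPosts times the per-thread price. This sketch assumes the listed starting price of $5.00 per 1,000 threads ($0.005 per thread); "from" suggests the rate can differ, so check the pricing tab, and note that the sketch ignores any proxy or platform costs. The function name max_thread_charge is hypothetical.

```python
from decimal import Decimal

# Assumed rate: the listed starting price of $5.00 per 1,000 threads.
PRICE_PER_1000_THREADS = Decimal("5.00")


def max_thread_charge(max_posts: int,
                      price_per_1000: Decimal = PRICE_PER_1000_THREADS) -> Decimal:
    """Upper bound on per-thread event charges for one run, in USD.

    A run scrapes at most max_posts threads, so it incurs at most
    max_posts per-thread charges, however many comments each thread has.
    """
    if max_posts < 0:
        raise ValueError("max_posts must be non-negative")
    return price_per_1000 * max_posts / 1000
```

With the default maxPosts of 25 this gives $0.125; Decimal avoids floating-point rounding in the money arithmetic.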

Two cost levers:

  • Bring your own residential proxy. In the proxy input, choose Custom proxies and paste your own residential proxy URLs ($1–2/GB) instead of using Apify's residential proxy ($8/GB); this is typically 3–5× cheaper.
  • Session reuse (threadsPerSession) amortises browser startup: one warmed browser session fetches the .json for many threads before rotating IP, so you mostly pay for the lightweight JSON payloads, not full page renders.
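To see why session reuse matters, here is a rough bandwidth model, entirely an assumption for illustration: each browser session pays a one-off startup cost (page render and anti-bot checks), and each thread then adds only its JSON payload. Both sizes, startup_mb and per_thread_mb, are hypothetical parameters you would measure yourself, and threads_per_session stands in for the threadsPerSession setting. The per-GB rates ($8 for Apify residential, $1–2 for your own) come from the text above.

```python
import math


def estimate_bandwidth_gb(threads: int, threads_per_session: int,
                          startup_mb: float, per_thread_mb: float) -> float:
    """Hypothetical model of proxy bandwidth for one run, in GB.

    Startup cost is paid once per session; JSON cost once per thread.
    """
    if threads_per_session < 1:
        raise ValueError("threads_per_session must be >= 1")
    sessions = math.ceil(threads / threads_per_session)
    return (sessions * startup_mb + threads * per_thread_mb) / 1024


def proxy_cost_usd(gb: float, usd_per_gb: float) -> float:
    """Bandwidth cost at a flat per-GB rate (e.g. 8.0 or 2.0)."""
    return gb * usd_per_gb
```

Under this model, raising threads_per_session shrinks only the startup term, so the total approaches threads × per_thread_mb, which is the effect the bullet describes.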

Limitations

  • About 200 comments per thread (one Reddit page); in very large threads the deepest branches are truncated because the collapsed "load more" stubs are skipped.
  • Residential proxy required. Datacenter IPs are blocked.
  • Slower and pricier than the RSS scraper by design — use that one when you don't need scores/trees.

Compliance

Reads public Reddit data only, identifies itself, and never logs in, posts, votes, or messages. Use the data in line with Reddit's terms and any laws that apply to you.

FAQ

How is this different from your Reddit Scraper? That one is RSS-based: fast and cheap, but it returns no upvote scores and only flat, top-level comments. This one uses a real browser to get full nested trees plus scores.

Do I need a Reddit account or API key? No. You only need the residential proxy, which is the default.

Why a browser? Reddit fingerprints and challenges non-browser clients; a real browser passes those challenges and then reads the canonical thread data.


Want this turned into action, not just data?

If you want Reddit conversations turned into leads and AI-drafted replies automatically, that's SignalEngine — this actor is a piece of the engine behind it.