Reddit Comment Tree Scraper — Full Threads + Scores
Pricing: from $5.00 / 1,000 threads
Premium: scrape full nested comment trees WITH upvote scores and depth from any Reddit thread or subreddit, using a real browser to get the canonical data that RSS feeds can't provide.
Developer: James Taylor (Maintained by Community)
Get the complete comment tree of any Reddit thread: every comment with its author, body, upvote score, depth, and parent, plus the post's score and comment count. This is the premium, full-fidelity Reddit scraper; it returns the engagement data that lightweight RSS-based scrapers can't provide.
Built for researchers, analysts, sentiment/social-listening tools, and anyone who needs the real structure and scores of a Reddit discussion.
Why this one
Reddit blocks its `.json`/API endpoints for ordinary scrapers (they get a 403), which is why
cheaper actors fall back to RSS and return comments without scores or nesting. This actor uses
a real browser (headless Chromium) through a residential proxy to get past Reddit's anti-bot
checks, then reads the canonical thread data, so you get:
- Per-comment upvote scores and post score / upvote ratio / comment count
- Full nested structure: each comment's `depth` and `parentId`, so you can rebuild the tree
- Author, body, timestamp, and permalink for every comment
What it does
- Scrapes specific thread URLs, and/or discovers threads from subreddits you name.
- Returns one record per thread: the post plus a flat but tree-preserving `comments[]` array (each comment carries `depth` + `parentId`, so you can reconstruct the hierarchy).
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `postUrls` | array | `[]` | Specific Reddit thread URLs to scrape in full. |
| `subreddits` | array | `[]` | Discover threads from these subreddits and scrape each one. |
| `sort` | string | `hot` | Sort order for subreddit discovery: `hot`, `new`, `rising`, or `top`. |
| `maxPosts` | integer | `25` | Total threads to scrape (caps spend). |
| `maxComments` | integer | `200` | Maximum comments per thread (Reddit serves ~200 per page). |
| `maxConcurrency` | integer | `3` | Parallel browser contexts (kept low because each is a real browser). |
| `proxyConfiguration` | object | Apify residential | Required; Reddit blocks datacenter IPs. |
Provide `postUrls`, `subreddits`, or both.
Example input
{"subreddits": ["SaaS"],"sort": "hot","maxPosts": 10,"maxComments": 200,"postUrls": ["https://www.reddit.com/r/Entrepreneur/comments/abc123/some_thread/"]}
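If you start runs from code rather than the Console, a small helper can enforce the input rules above before any spend happens. This is a hypothetical sketch: `build_input` and `run_scraper` are illustrative names, the defaults are copied from the Input table, and `run_scraper` assumes the official `apify-client` Python package (1.x API). Leaving `proxyConfiguration` out assumes the actor falls back to its documented default (Apify residential).

```python
# Hypothetical helpers: build_input and run_scraper are illustrative names,
# not part of this actor or of any Apify library.
import copy

VALID_SORTS = {"hot", "new", "rising", "top"}

# Defaults copied from the Input table; proxyConfiguration is omitted on the
# assumption that the actor then applies its documented default.
DEFAULTS = {
    "postUrls": [],
    "subreddits": [],
    "sort": "hot",
    "maxPosts": 25,
    "maxComments": 200,
    "maxConcurrency": 3,
}


def build_input(**overrides) -> dict:
    """Merge overrides onto the defaults and enforce the documented rules."""
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}
    if not run_input["postUrls"] and not run_input["subreddits"]:
        raise ValueError("Provide postUrls, subreddits, or both.")
    if run_input["sort"] not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    for key in ("maxPosts", "maxComments", "maxConcurrency"):
        if not isinstance(run_input[key], int) or run_input[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    return run_input


def run_scraper(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run, wait for it to finish, and return its dataset items."""
    # Imported here so build_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("run did not start")
    return client.dataset(run["defaultDatasetId"]).list_items().items
```

For example, `run_scraper("<APIFY_TOKEN>", "<username>/<actor-name>", build_input(subreddits=["SaaS"], maxPosts=10))`, where both quoted strings are placeholders for your own token and this actor's ID.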
Output
One dataset item per thread:
{"type": "post","subreddit": "SaaS","author": "founder_jane","title": "How we cut churn 30%","score": 142,"upvoteRatio": 0.97,"numComments": 88,"commentCount": 200,"postUrl": "https://www.reddit.com/r/SaaS/comments/abc123/how_we_cut_churn_30","createdAt": "2026-06-01T12:00:00.000Z","comments": [{"type": "comment","id": "opcuxfu","postId": "abc123","parentId": "t3_abc123","author": "growth_greg","body": "What did your onboarding look like before?","score": 24,"depth": 0,"createdAt": "2026-06-01T12:30:00.000Z","url": "https://www.reddit.com/r/SaaS/comments/abc123/_/opcuxfu"}]}
Rebuild the tree from `depth` + `parentId`, or use the flat list as-is.
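A minimal sketch of rebuilding the tree in Python. It assumes top-level comments have a `parentId` of `t3_<postId>` (as in the example above) and replies have `t1_<commentId>`, following Reddit's usual fullname prefixes; the `t1_` form is an assumption, since the example shows only a top-level comment. `build_comment_tree` and `render` are illustrative names. Replies whose parent was cut off by `maxComments` are kept as roots rather than dropped.

```python
def build_comment_tree(comments: list[dict]) -> list[dict]:
    """Nest a thread's flat comments[] list using parentId.

    Assumption: top-level comments point at the post ("t3_<postId>") and
    replies point at another comment ("t1_<commentId>"). Returns the
    top-level nodes, each with a "replies" list of child nodes.
    """
    # Copy each comment so the caller's records are not mutated.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for comment in comments:
        node = nodes[comment["id"]]
        parent = comment.get("parentId") or ""
        parent_id = parent[3:] if parent.startswith("t1_") else None
        if parent_id in nodes:
            nodes[parent_id]["replies"].append(node)
        else:
            # Top-level comment, or a reply whose parent was truncated away.
            roots.append(node)
    return roots


def render(nodes: list[dict], indent: int = 0) -> list[str]:
    """Flatten the tree into indented "score author" lines, highest score first."""
    lines = []
    for node in sorted(nodes, key=lambda n: n.get("score", 0), reverse=True):
        lines.append("  " * indent + f'{node["score"]} {node["author"]}')
        lines.extend(render(node["replies"], indent + 1))
    return lines
```

Building the `nodes` index before linking means the order of `comments[]` does not matter: a reply can appear before its parent and still be attached correctly.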
Pricing & cost control
Pay-Per-Event: charged per thread (all of a thread's comments included). This is a premium
tier: it runs a real browser through a residential proxy (Reddit hard-blocks datacenter IPs), so
it costs more than the RSS-based Reddit Scraper, but unlike that scraper it returns scores and
nested trees. Set `maxPosts` to cap spend.
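Because every thread is one charged event, `maxPosts` puts a ceiling on the per-thread charges for a run. The sketch below does that arithmetic, assuming the listed floor price of $5.00 per 1,000 threads (the listing says "from", so check the current price before relying on it). `max_thread_charge` is an illustrative name, and the bound covers only the per-thread event.

```python
from decimal import Decimal

# Assumption: the listing's floor price, $5.00 per 1,000 threads.
PRICE_PER_1000_THREADS = Decimal("5.00")


def max_thread_charge(max_posts: int, price_per_1000: Decimal = PRICE_PER_1000_THREADS) -> Decimal:
    """Upper bound on per-thread event charges for one run.

    Each scraped thread is one charged event and maxPosts caps the number of
    threads, so the bound is maxPosts times the price of a single thread.
    """
    return Decimal(max_posts) * price_per_1000 / 1000
```

With the default `maxPosts` of 25, `max_thread_charge(25)` gives a ceiling of $0.125 in thread events; `Decimal` keeps the cents exact.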
Two cost levers:
- Bring your own residential proxy. In the proxy input, choose Custom proxies and paste your own residential proxy URLs ($1–2/GB) instead of using Apify's residential proxy ($8/GB); this is typically 3–5× cheaper.
- `threadsPerSession` amortises browser startup: one warmed session fetches many threads' `.json` before rotating IP, so you mostly pay for the lightweight JSON payloads rather than full page renders.
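A hypothetical helper for switching between the two proxy options when you build the input in code. It assumes the actor's `proxyConfiguration` input uses Apify's standard proxy editor fields (`useApifyProxy` and `apifyProxyGroups` for Apify Proxy, `proxyUrls` for custom proxies); `proxy_configuration` is an illustrative name and the URL in the usage note is a placeholder.

```python
from typing import Optional


def proxy_configuration(custom_urls: Optional[list] = None) -> dict:
    """Build a proxyConfiguration object for the actor input.

    Assumption: the actor uses Apify's standard proxy editor fields.
    Passing your own residential URLs selects Custom proxies; otherwise
    Apify's residential group is used, matching the documented default.
    """
    if custom_urls:
        return {"useApifyProxy": False, "proxyUrls": list(custom_urls)}
    return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
```

For example, `proxy_configuration(["http://user:pass@proxy.example.com:8000"])` produces a custom-proxy object you can place under `proxyConfiguration` in the run input.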
Limitations
- Up to ~200 comments per thread (one Reddit page); on very large threads the deepest branches are truncated, because the collapsed "load more" stubs are skipped.
- Residential proxy required. Datacenter IPs are blocked.
- Slower and pricier than the RSS scraper by design; use that one when you don't need scores or trees.
Compliance
Reads public Reddit data only, identifies itself, and never logs in, posts, votes, or messages. Use the data in line with Reddit's terms and any laws that apply to you.
FAQ
How is this different from your Reddit Scraper? That one is RSS-based: fast and cheap, but it returns no upvote scores and only flat top-level comments. This one uses a real browser to get full nested trees and scores.
Do I need a Reddit account or API key? No. You only need the residential proxy, which is the default.
Why a browser? Reddit fingerprints and challenges non-browser clients; a real browser passes those challenges and then reads the canonical thread data.
Want this turned into action, not just data?
If you want Reddit conversations turned into leads and AI-drafted replies automatically, that's SignalEngine; this actor is one piece of the engine behind it.