Reddit Comments Deep Scraper
Pricing: from $0.001 / result
Scrape Reddit comments with full nested reply trees from any subreddit or post URL. Get author karma, scores, timestamps, flair, and threading depth. Perfect for AI training data, sentiment analysis, and brand monitoring.
Developer: LIAICHI MUSTAPHA
The only Reddit scraper that goes all the way down the thread.
Most Reddit scrapers give you post titles and upvote counts. This one gives you the conversation — every reply, every nested thread, reconstructed as a clean structured dataset ready for analysis, AI pipelines, or brand intelligence.
The problem with other Reddit comment scrapers
Every other Reddit scraper on the market has the same blind spot: they only scrape top-level comments.
You get the surface. You miss the conversation.
Reddit's real value is in the replies. That's where opinions get challenged, nuanced, and refined. That's where the signal lives — buried two or three levels deep in a thread. Scraping only top-level comments from Reddit is like reading only the headlines.
This actor scrapes the full tree. Every reply to every reply, down to whatever depth you set.
What makes this different
| Feature | Other Reddit scrapers | This actor |
|---|---|---|
| Top-level comments | ✅ | ✅ |
| Nested replies (full tree) | ❌ | ✅ |
| Configurable depth (1–10 levels) | ❌ | ✅ |
| parent_id to reconstruct threads | ❌ | ✅ |
| Plain text (markdown stripped) | ❌ | ✅ |
| Author karma + flair signals | ❌ | ✅ |
| Subreddit feed + search + direct URLs | ❌ | ✅ |
| No API key required | ✅ | ✅ |
| No browser / Playwright overhead | ❌ | ✅ |
What you get per comment
Every comment comes out as a flat record with 23 fields. The sample record below shows 20 of them:
{
  "comment_id": "t1_o8z9p9x",
  "author": "some_user",
  "author_karma": 4821,
  "author_flair": "ML Engineer",
  "body": "This is the original **markdown** text",
  "body_plain": "This is the original markdown text",
  "score": 142,
  "depth": 2,
  "parent_id": "t1_o8z1abc",
  "num_replies": 3,
  "created_utc": "2025-03-15T10:22:00Z",
  "is_edited": false,
  "is_deleted": false,
  "is_removed": false,
  "distinguished": null,
  "permalink": "https://www.reddit.com/r/.../comments/.../.../",
  "post_title": "Parent post title",
  "post_url": "https://www.reddit.com/r/.../comments/.../",
  "subreddit": "MachineLearning",
  "post_id": "1rihows"
}
body_plain is the key field — markdown stripped, quotes removed, clean text ready to feed directly into any LLM or sentiment model without preprocessing.
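The page does not spell out exactly which constructs body_plain removes. As a rough illustration of the kind of transformation involved (not the actor's actual code), a hypothetical strip_markdown helper might handle asterisk emphasis, strikethrough, inline code, links, and quote lines. It assumes "quotes removed" refers to Reddit's ">" block quotes, and it unescapes HTML entities first in case a body still contains forms like &gt;:

```python
import html
import re

# Hypothetical approximation of body -> body_plain; the actor's real rules
# are not documented, so this covers only a few common Reddit constructs.
_QUOTE_LINE = re.compile(r"^[ \t]*>.*$", re.MULTILINE)  # "> quoted" lines (assumed meaning of "quotes removed")
_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")            # [label](url) -> label
_INLINE_CODE = re.compile(r"`([^`]+)`")                 # `code` -> code
_EMPHASIS = re.compile(r"(\*\*|\*|~~)(.+?)\1")          # **bold**, *italic*, ~~strike~~


def strip_markdown(body: str) -> str:
    # Turn entities such as &gt; back into ">" so quote lines are detected.
    text = html.unescape(body)
    text = _QUOTE_LINE.sub("", text)
    text = _LINK.sub(r"\1", text)
    text = _INLINE_CODE.sub(r"\1", text)
    text = _EMPHASIS.sub(r"\2", text)
    # Collapse whitespace left behind by removed lines and markers.
    return re.sub(r"\s+", " ", text).strip()


print(strip_markdown("This is the original **markdown** text"))
```

Underscore emphasis is deliberately left alone so snake_case words and usernames survive intact.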
depth + parent_id let you reconstruct the full conversation tree from the flat output. Every comment knows exactly where it sits in the thread.
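Because every record carries parent_id, nesting the flat output back into a tree takes one pass. A minimal sketch, with build_tree, render, and the "replies" key as hypothetical names. It assumes a reply's parent_id equals its parent's comment_id (as in the sample, where both use Reddit's t1_ prefix) and treats any comment whose parent is not in the dataset (a top-level comment, or one whose parent was filtered out) as a root:

```python
from typing import Any


def build_tree(comments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest flat records under their parents by matching parent_id to comment_id."""
    # Copy each record and give it an empty "replies" list (hypothetical key).
    nodes = {c["comment_id"]: {**c, "replies": []} for c in comments}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_id"])
        if parent is None:
            # Parent is the post itself or a comment not present in the dataset.
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots


def render(nodes: list[dict[str, Any]], indent: int = 0) -> None:
    """Print the tree, indenting each reply level by two spaces."""
    for node in nodes:
        print("  " * indent + f"{node['author']}: {node['body_plain']}")
        render(node["replies"], indent + 1)
```

Records keep their original order within each level, so sorting replies by score or created_utc is left to the caller.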
Three ways to run it
1. Direct post URLs
Paste one or more Reddit post URLs and get every comment from them.
startUrls:
- https://www.reddit.com/r/MachineLearning/comments/1abc123/post_title/
- https://www.reddit.com/r/LocalLLaMA/comments/1xyz789/another_post/
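When feeding in many URLs, it can help to validate them first and keep a mapping back to the post_id values in the output. A sketch with a hypothetical parse_post_url helper; it assumes post URLs follow Reddit's /r/<subreddit>/comments/<id>/<slug>/ pattern and that the output's post_id is the same base-36 ID that appears in the URL (the sample's "1rihows" has that form, but the page does not state it):

```python
import re
from urllib.parse import urlparse

# Assumed post path shape: /r/<subreddit>/comments/<post_id>/<optional slug>/
_POST_PATH = re.compile(r"^/r/([^/]+)/comments/([a-z0-9]+)(?:/|$)", re.IGNORECASE)


def parse_post_url(url: str) -> tuple[str, str]:
    """Return (subreddit, post_id) for a Reddit post URL or raise ValueError."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Accept reddit.com and its subdomains (www., old., ...) only.
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        raise ValueError(f"not a reddit.com URL: {url}")
    match = _POST_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"not a post URL: {url}")
    return match.group(1), match.group(2)


for url in [
    "https://www.reddit.com/r/MachineLearning/comments/1abc123/post_title/",
    "https://www.reddit.com/r/LocalLLaMA/comments/1xyz789/another_post/",
]:
    print(parse_post_url(url))
```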
2. Subreddit feeds
Give it subreddit names and it automatically fetches the leading posts from each subreddit's feed (up to maxPostsPerSubreddit), then scrapes all their comments.
Choose sort: hot / new / top / rising / controversial
subreddits: ["MachineLearning", "LocalLLaMA", "artificial"]
subredditSort: "top"
maxPostsPerSubreddit: 20
3. Keyword search
Enter a search query. The actor finds the most relevant Reddit posts and scrapes their comment threads.
searchQuery: "Claude vs GPT-4 performance comparison"
maxPostsPerSubreddit: 10
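If you launch runs programmatically, a small builder can catch typos before a run starts. This sketch only assembles a dict with the input field names used on this page (startUrls, subreddits, subredditSort, searchQuery, maxPostsPerSubreddit). build_run_input is hypothetical, it passes startUrls as plain strings as in the examples above, and the page does not say whether several modes can be combined in one run:

```python
import json

# Feed sorts listed on this page; any other value is rejected locally.
ALLOWED_SORTS = {"hot", "new", "top", "rising", "controversial"}


def build_run_input(start_urls=None, subreddits=None, subreddit_sort=None,
                    search_query=None, max_posts_per_subreddit=None):
    """Assemble an input dict using the field names shown on this page."""
    if not (start_urls or subreddits or search_query):
        raise ValueError("provide startUrls, subreddits, or searchQuery")
    run_input = {}
    if start_urls:
        run_input["startUrls"] = list(start_urls)
    if subreddits:
        run_input["subreddits"] = list(subreddits)
    if subreddit_sort is not None:
        if subreddit_sort not in ALLOWED_SORTS:
            raise ValueError(f"unsupported subredditSort: {subreddit_sort!r}")
        run_input["subredditSort"] = subreddit_sort
    if search_query:
        run_input["searchQuery"] = search_query
    if max_posts_per_subreddit is not None:
        if max_posts_per_subreddit < 1:
            raise ValueError("maxPostsPerSubreddit must be at least 1")
        run_input["maxPostsPerSubreddit"] = max_posts_per_subreddit
    return run_input


if __name__ == "__main__":
    # Reproduces the subreddit-feed example above as JSON.
    print(json.dumps(build_run_input(
        subreddits=["MachineLearning", "LocalLLaMA", "artificial"],
        subreddit_sort="top",
        max_posts_per_subreddit=20,
    ), indent=2))
```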
Key settings
| Setting | Default | What it controls |
|---|---|---|
| maxDepth | 3 | How many reply levels to follow. Set to 1 for top-level only, or 10 (the maximum) to follow the deepest threads |
| maxCommentsPerPost | 100 | Cap per post. Set to 0 for no limit |
| minScore | 0 | Minimum score a comment needs to be kept. Set to 5 or higher to drop low-scoring noise |
| skipDeleted | true | Drop [deleted] and [removed] rows from the output |
| includeReplies | true | Whether to recurse into reply threads at all |
| dateFrom | (none) | Only return comments posted after this date (YYYY-MM-DD) |
| includePostInfo | true | Attach the parent post title and URL to every comment record |
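The actor applies these settings itself during a run. To re-slice an existing dataset locally without another run, a hypothetical apply_settings function can approximate five of them. It assumes top-level comments have depth 0 (as in Reddit's own API), treats dateFrom as inclusive, and caps comments per post in dataset order; the actor's exact semantics may differ:

```python
from collections import Counter
from datetime import datetime, timezone


def _parse_utc(timestamp: str) -> datetime:
    # created_utc in the sample looks like "2025-03-15T10:22:00Z".
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def apply_settings(comments, max_depth=3, max_comments_per_post=100,
                   min_score=0, skip_deleted=True, date_from=None):
    """Local approximation of maxDepth, maxCommentsPerPost, minScore, skipDeleted, dateFrom."""
    cutoff = (datetime.strptime(date_from, "%Y-%m-%d").replace(tzinfo=timezone.utc)
              if date_from else None)
    kept_per_post = Counter()
    result = []
    for c in comments:
        if c["depth"] >= max_depth:  # assumes top-level comments have depth 0
            continue
        if c["score"] < min_score:
            continue
        if skip_deleted and (c["is_deleted"] or c["is_removed"]):
            continue
        if cutoff and _parse_utc(c["created_utc"]) < cutoff:  # inclusive cutoff (assumption)
            continue
        # 0 means "no limit", mirroring maxCommentsPerPost.
        if max_comments_per_post and kept_per_post[c["post_id"]] >= max_comments_per_post:
            continue
        kept_per_post[c["post_id"]] += 1
        result.append(c)
    return result
```

Unlike a crawler that stops recursing, this filter drops comments one by one, so a reply can survive while its low-scoring parent is removed; build any tree after filtering with that in mind.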
Who uses this
AI researchers — Reddit comment trees are multi-turn human dialogue. The depth + parent_id fields let you reconstruct conversation chains for fine-tuning datasets, RLHF, or preference data collection.
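For dialogue datasets, each path from a top-level comment down to a reply is one multi-turn exchange. A minimal sketch, using hypothetical helpers leaf_ids and conversation_chain and an illustrative speaker/text turn format (not something the actor emits). It assumes parent_id matches the parent's comment_id; the walk stops at the first ancestor missing from the dataset:

```python
def leaf_ids(comments):
    """IDs of comments that no other record in the dataset replies to."""
    parents = {c["parent_id"] for c in comments}
    return [c["comment_id"] for c in comments if c["comment_id"] not in parents]


def conversation_chain(comments, leaf_id):
    """Walk parent_id links from leaf_id up to its top ancestor, returned root first."""
    by_id = {c["comment_id"]: c for c in comments}
    chain, seen = [], set()
    current = by_id.get(leaf_id)
    # The seen set guards against malformed data with a parent_id cycle.
    while current is not None and current["comment_id"] not in seen:
        seen.add(current["comment_id"])
        chain.append(current)
        current = by_id.get(current["parent_id"])
    chain.reverse()
    return [{"speaker": c["author"], "text": c["body_plain"]} for c in chain]
```

Comments whose replies were cut off by maxDepth or maxCommentsPerPost also count as leaves here, so a chain can end earlier than the thread on Reddit does.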
Brand & product teams — Scrape subreddits where your product gets mentioned. Every comment comes with a score, author karma, and timestamp — so you can weight opinions by credibility, not just volume.
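One way to weight opinions by credibility rather than volume, sketched under two assumptions: each comment already carries a stance label from whatever classifier you use (+1 positive, -1 negative), and credibility is a hypothetical log-scaled product of score and author_karma. Neither the labels nor the formula come from the actor:

```python
import math


def credibility(comment):
    """Hypothetical weight: log-scaled score times log-scaled author karma."""
    score = max(comment.get("score") or 0, 0)
    karma = max(comment.get("author_karma") or 0, 0)  # karma may be missing or null
    # The 1 + log1p(...) form keeps every weight at least 1.
    return (1 + math.log1p(score)) * (1 + math.log1p(karma))


def weighted_stance(labeled):
    """Credibility-weighted mean of (comment, label) pairs, labels in [-1, 1]."""
    if not labeled:
        return 0.0
    total = sum(credibility(c) for c, _ in labeled)
    return sum(credibility(c) * label for c, label in labeled) / total
```

The log scaling keeps a single viral comment or a very high-karma account from swamping everything else; swap in your own weighting if it suits your data better.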
Market researchers — Find what people actually complain about in a product category. The pain points don't live in the top comment. They're in the third reply where someone says "actually the real issue is..."
Competitive intelligence — Monitor competitor brand mentions across Reddit with daily scheduled runs. Filter by minScore to surface only the comments that resonated.
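A hypothetical resonant_mentions helper shows one way to do that filtering client-side on a scheduled run's output, so a single dataset can serve several brand names. It matches the brand as a whole word in body_plain, case-insensitively, and applies a minScore-style threshold; the default of 5 echoes the settings table and is otherwise arbitrary:

```python
import re


def resonant_mentions(comments, brand, min_score=5):
    """Comments whose body_plain mentions brand as a whole word and that reach min_score."""
    # Lookarounds instead of \b so names ending in symbols (e.g. "C++") still match.
    pattern = re.compile(rf"(?<!\w){re.escape(brand)}(?!\w)", re.IGNORECASE)
    return [c for c in comments
            if c["score"] >= min_score and pattern.search(c["body_plain"])]
```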
Sentiment analysts — body_plain is clean, stripped text. Pipe it directly into VADER, TextBlob, or any transformer-based sentiment model without a preprocessing step.
My other actors
| Actor | Description |
|---|---|
| Apify Store Analyzer | Competitive intelligence across 10,000+ Apify actors |
| n8n Marketplace Analyzer | Scrape and analyze n8n workflow templates |
| Zapier Template Analyzer | Extract and analyze Zapier automation templates |
| Make Template Analyzer | Analyze Make.com templates and automation trends |
| Substack Newsletter Scraper | Scrape Substack posts, authors, and publication data |
| Beehiiv Newsletter Scraper | Extract newsletters and publication data from Beehiiv |
| Mubawab Housing Scraper | Real estate listings from Mubawab.ma |
Built by Mustapha LIAICHI