
Reddit Comments Deep Scraper

Pricing: from $0.001 / result


Scrape Reddit comments with full nested reply trees from any subreddit or post URL. Get author karma, scores, timestamps, flair, and threading depth. Perfect for AI training data, sentiment analysis, and brand monitoring.


Rating: 0.0 (0 ratings)

Developer: LIAICHI MUSTAPHA

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified a day ago


The only Reddit scraper that goes all the way down the thread.

Most Reddit scrapers give you post titles and upvote counts. This one gives you the conversation — every reply, every nested thread, reconstructed as a clean structured dataset ready for analysis, AI pipelines, or brand intelligence.


The problem with other Reddit comment scrapers

Every other Reddit scraper on the market has the same blind spot: they only scrape top-level comments.

You get the surface. You miss the conversation.

Reddit's real value is in the replies. That's where opinions get challenged, nuanced, and refined. That's where the signal lives — buried two or three levels deep in a thread. Scraping only top-level comments from Reddit is like reading only the headlines.

This actor scrapes the full tree. Every reply to every reply, down to whatever depth you set.


What makes this different

This actor supports all of the following:

- Top-level comments
- Nested replies (full tree)
- Configurable depth (1–10 levels)
- parent_id to reconstruct threads
- Plain text (markdown stripped)
- Author karma + flair signals
- Subreddit feed + search + direct URLs
- No API key required
- No browser / Playwright overhead

What you get per comment

Every comment comes out as a flat record with 23 fields; the example below shows 20 of them:

{
  "comment_id": "t1_o8z9p9x",
  "author": "some_user",
  "author_karma": 4821,
  "author_flair": "ML Engineer",
  "body": "This is the original **markdown** text",
  "body_plain": "This is the original markdown text",
  "score": 142,
  "depth": 2,
  "parent_id": "t1_o8z1abc",
  "num_replies": 3,
  "created_utc": "2025-03-15T10:22:00Z",
  "is_edited": false,
  "is_deleted": false,
  "is_removed": false,
  "distinguished": null,
  "permalink": "https://www.reddit.com/r/.../comments/.../.../",
  "post_title": "Parent post title",
  "post_url": "https://www.reddit.com/r/.../comments/.../",
  "subreddit": "MachineLearning",
  "post_id": "1rihows"
}

body_plain is the key field: markdown is stripped and quoted text removed, leaving clean text you can feed directly into an LLM or sentiment model without preprocessing.
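The actor produces body_plain for you, so normally you never clean text yourself. If you want comparable text from the raw `body` field (for example, to see what was removed), here is a rough approximation. It is a hypothetical sketch, not the actor's implementation: the `approximate_body_plain` name and every stripping rule in it are assumptions based only on the description above (markdown stripped, quotes removed).

```python
import re


def approximate_body_plain(body: str) -> str:
    """Hypothetical approximation of body_plain; NOT the actor's own code."""
    # Drop Reddit quote lines ("> quoted text")
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    text = "\n".join(kept)
    # Keep only the visible label of markdown links: [label](url) -> label
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Remove **, __, ~~, * and backtick markers.
    # Single underscores are left alone so names like some_user survive.
    text = re.sub(r"\*\*|__|~~|`|\*", "", text)
    # Collapse all whitespace (including newlines) into single spaces
    return re.sub(r"\s+", " ", text).strip()


print(approximate_body_plain("This is the original **markdown** text"))
# → This is the original markdown text
```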

depth + parent_id let you reconstruct the full conversation tree from the flat output. Every comment knows exactly where it sits in the thread.
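A minimal sketch of that reconstruction in Python, using only the comment_id and parent_id fields from the record above. It assumes a top-level comment's parent_id points at something that is not a comment in the dataset (on Reddit this is normally the post's `t3_` id); the `build_tree` name, the nested `replies` key and the sample records are hypothetical.

```python
from collections import defaultdict


def build_tree(records):
    """Nest flat comment records under their parents using comment_id / parent_id.

    Assumption: any record whose parent_id is not itself a comment_id in the
    dataset (e.g. a top-level comment pointing at its post) becomes a root.
    """
    ids = {r["comment_id"] for r in records}
    children = defaultdict(list)
    roots = []
    for r in records:
        if r["parent_id"] in ids:
            children[r["parent_id"]].append(r)
        else:
            roots.append(r)

    def attach(node):
        # Copy the record and recursively attach its direct replies
        return {**node, "replies": [attach(c) for c in children[node["comment_id"]]]}

    return [attach(r) for r in roots]


# Hypothetical records; "t3_post1" stands in for the parent post's id
records = [
    {"comment_id": "t1_a", "parent_id": "t3_post1", "body_plain": "Top-level comment"},
    {"comment_id": "t1_b", "parent_id": "t1_a", "body_plain": "Reply"},
    {"comment_id": "t1_c", "parent_id": "t1_b", "body_plain": "Reply to the reply"},
]
tree = build_tree(records)
print(tree[0]["replies"][0]["replies"][0]["body_plain"])  # → Reply to the reply
```

Roots are found by a missing parent rather than by depth, so a reply whose parent was filtered out (for example by skipDeleted or minScore) shows up as a root instead of being silently dropped.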


Three ways to run it

1. Direct post URLs

Paste one or more Reddit post URLs and get every comment from them.

startUrls:
- https://www.reddit.com/r/MachineLearning/comments/1abc123/post_title/
- https://www.reddit.com/r/LocalLLaMA/comments/1xyz789/another_post/
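Before launching a run, it can help to check that every entry really is a post URL, since a subreddit or profile link does not fit this mode. A minimal sketch, assuming the `/r/<subreddit>/comments/<post_id>/<slug>/` path shape used in the example above and assuming `startUrls` takes plain URL strings as shown; `post_info` and `build_url_input` are hypothetical helpers, not part of the actor.

```python
import re

# Assumed post URL shape: https://www.reddit.com/r/<subreddit>/comments/<post_id>/<slug>/
POST_URL = re.compile(
    r"^https://(?:www\.|old\.)?reddit\.com/r/(?P<subreddit>\w+)/comments/(?P<post_id>[a-z0-9]+)(?:/|$)"
)


def post_info(url: str) -> tuple[str, str]:
    """Return (subreddit, post_id) for a Reddit post URL, or raise ValueError."""
    match = POST_URL.match(url)
    if match is None:
        raise ValueError(f"not a Reddit post URL: {url}")
    return match["subreddit"], match["post_id"]


def build_url_input(urls: list[str]) -> dict:
    """Validate every URL, then build a run input using the startUrls field name above."""
    for url in urls:
        post_info(url)  # raises on the first bad URL
    return {"startUrls": list(urls)}


run_input = build_url_input([
    "https://www.reddit.com/r/MachineLearning/comments/1abc123/post_title/",
    "https://www.reddit.com/r/LocalLLaMA/comments/1xyz789/another_post/",
])
print(post_info(run_input["startUrls"][0]))  # → ('MachineLearning', '1abc123')
```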

2. Subreddit feeds

Give it subreddit names and it fetches up to maxPostsPerSubreddit posts from each subreddit's feed, then scrapes all their comments.

Choose sort: hot / new / top / rising / controversial

subreddits: ["MachineLearning", "LocalLLaMA", "artificial"]
subredditSort: "top"
maxPostsPerSubreddit: 20
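The same three settings can be assembled programmatically. The sketch below is hypothetical: it rejects sort values outside the five listed above and adds a back-of-the-envelope upper bound on how many comment records a run can return, using maxCommentsPerPost (default 100, from Key settings). It assumes the per-post cap counts replies as well as top-level comments.

```python
from typing import Optional

# Sort options listed above
ALLOWED_SORTS = {"hot", "new", "top", "rising", "controversial"}


def build_subreddit_input(subreddits: list[str], sort: str, max_posts: int) -> dict:
    """Build a subreddit-feed run input using the field names shown above."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort {sort!r}; expected one of {sorted(ALLOWED_SORTS)}")
    # Accept "r/Name" as well as "Name" (a convenience for user input, not an actor feature)
    names = [s.removeprefix("r/") for s in subreddits]
    return {"subreddits": names, "subredditSort": sort, "maxPostsPerSubreddit": max_posts}


def max_results(num_subreddits: int, max_posts: int, max_comments_per_post: int) -> Optional[int]:
    """Upper bound on comment records; None when maxCommentsPerPost is 0 (no limit)."""
    if max_comments_per_post == 0:
        return None
    return num_subreddits * max_posts * max_comments_per_post


run_input = build_subreddit_input(["MachineLearning", "LocalLLaMA", "artificial"], "top", 20)
print(max_results(3, 20, 100))  # → 6000
```

With the example values, 3 × 20 × 100 = 6,000 records. If each comment record is billed as one result, that is about $6 at the listing's starting rate of $0.001 per result; the "from" in the pricing means the actual rate may be higher.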

3. Search

Enter a search query. The actor finds the most relevant Reddit posts and scrapes their comment threads.

searchQuery: "Claude vs GPT-4 performance comparison"
maxPostsPerSubreddit: 10

Key settings

- maxDepth (default 3): How many reply levels to follow. Set to 1 for top-level comments only, or up to 10 to follow deeper threads.
- maxCommentsPerPost (default 100): Cap on comments per post. Set to 0 for no limit.
- minScore (default 0): Minimum score a comment needs to be kept. Raise it (for example to 5) to skip low-quality noise and keep only comments that readers upvoted.
- skipDeleted (default true): Drop [deleted] and [removed] comments from the output.
- includeReplies (default true): Whether to recurse into reply threads at all.
- dateFrom (default: none): Only return comments posted after this date (YYYY-MM-DD).
- includePostInfo (default true): Attach the parent post title and URL to every comment record.
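If you already have a dataset and want to tighten the filters without re-running the actor, minScore, skipDeleted and dateFrom can be re-applied locally, because each record carries score, is_deleted, is_removed and created_utc. A hypothetical sketch; whether the actor treats minScore and dateFrom as inclusive is not documented here, so the inclusive comparisons below (and counting dateFrom from midnight UTC) are assumptions. maxDepth and maxCommentsPerPost are left to the actor.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    # created_utc looks like "2025-03-15T10:22:00Z"; swap "Z" for "+00:00"
    # because datetime.fromisoformat only accepts "Z" from Python 3.11 on
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def refilter(records, min_score=0, skip_deleted=True, date_from=None):
    """Re-apply minScore / skipDeleted / dateFrom-style filters to exported records.

    Assumed semantics: keep score >= min_score; date_from (YYYY-MM-DD) counts from
    midnight UTC of that day, inclusive.
    """
    cutoff = None
    if date_from is not None:
        cutoff = datetime.strptime(date_from, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    kept = []
    for r in records:
        if r["score"] < min_score:
            continue
        if skip_deleted and (r["is_deleted"] or r["is_removed"]):
            continue
        if cutoff is not None and parse_utc(r["created_utc"]) < cutoff:
            continue
        kept.append(r)
    return kept


# Hypothetical sample records with only the fields the filter reads
records = [
    {"comment_id": "t1_a", "score": 142, "is_deleted": False, "is_removed": False, "created_utc": "2025-03-15T10:22:00Z"},
    {"comment_id": "t1_b", "score": 2, "is_deleted": False, "is_removed": False, "created_utc": "2025-03-16T08:00:00Z"},
    {"comment_id": "t1_c", "score": 50, "is_deleted": True, "is_removed": False, "created_utc": "2025-03-17T09:30:00Z"},
    {"comment_id": "t1_d", "score": 30, "is_deleted": False, "is_removed": False, "created_utc": "2024-12-31T23:59:00Z"},
]
kept = refilter(records, min_score=5, date_from="2025-01-01")
print([r["comment_id"] for r in kept])  # → ['t1_a']
```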

Who uses this

AI researchers — Reddit comment trees are multi-turn human dialogue. The depth + parent_id fields let you reconstruct conversation chains for fine-tuning datasets, RLHF, or preference data collection.
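One way to turn the flat output into multi-turn dialogue is to walk from every leaf comment back up through parent_id to its root, producing one chain per leaf. A hypothetical sketch: the `conversation_chains` name and the sample records are illustrative, and, as before, a record whose parent_id is not in the dataset is assumed to start a chain.

```python
def conversation_chains(records):
    """Return every root-to-leaf path through the comment tree as a list of turns.

    Each chain is a list of body_plain strings, oldest ancestor first.
    """
    by_id = {r["comment_id"]: r for r in records}
    # Ids of comments that have at least one reply in the dataset
    has_reply = {r["parent_id"] for r in records if r["parent_id"] in by_id}
    chains = []
    for record in records:
        if record["comment_id"] in has_reply:
            continue  # not a leaf: its replies will produce longer chains
        turns = []
        node = record
        while node is not None:
            turns.append(node["body_plain"])
            node = by_id.get(node["parent_id"])  # walk up toward the root
        chains.append(turns[::-1])
    return chains


records = [
    {"comment_id": "t1_a", "parent_id": "t3_post1", "body_plain": "Question"},
    {"comment_id": "t1_b", "parent_id": "t1_a", "body_plain": "Answer"},
    {"comment_id": "t1_c", "parent_id": "t1_b", "body_plain": "Follow-up"},
    {"comment_id": "t1_d", "parent_id": "t1_a", "body_plain": "Alternative answer"},
]
for chain in conversation_chains(records):
    print(" -> ".join(chain))
```

Chains that branch from the same comment share a prefix, so the opening turns can appear in several examples; deduplicate or sample if that matters for your dataset.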

Brand & product teams — Scrape subreddits where your product gets mentioned. Every comment comes with a score, author karma, and timestamp — so you can weight opinions by credibility, not just volume.
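One illustrative way to weight opinions by credibility, assuming you already have a per-comment opinion value such as a sentiment score. The formula (log-damped score times log-damped author karma) is a hypothetical choice, not something the actor computes; tune or replace it for your use case.

```python
import math


def credibility_weight(record) -> float:
    """Hypothetical weight: log-damped score times log-damped author karma.

    Each factor is at least 1, so low-signal comments still count a little,
    and log1p keeps a single viral comment from dominating.
    """
    score = max(record.get("score") or 0, 0)
    # Assumption: karma may be missing (None) for deleted authors; treat it as 0
    karma = max(record.get("author_karma") or 0, 0)
    return (1 + math.log1p(score)) * (1 + math.log1p(karma))


def weighted_opinion(records, values):
    """Weighted mean of per-comment values (e.g. sentiment in [-1, 1]); needs at least one record."""
    weights = [credibility_weight(r) for r in records]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


comments = [
    {"score": 0, "author_karma": 0},       # no upvotes, brand-new account
    {"score": 142, "author_karma": 4821},  # well-received comment from an established account
]
print(weighted_opinion(comments, [-1.0, 1.0]))  # positive: the credible comment outweighs the other
```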

Market researchers — Find what people actually complain about in a product category. The pain points don't live in the top comment. They're in the third reply where someone says "actually the real issue is..."

Competitive intelligence — Monitor competitor brand mentions across Reddit with daily scheduled runs. Filter by minScore to surface only the comments that resonated.
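For scheduled brand monitoring, each run's dataset can be narrowed to comments that actually name the brand and cleared a score threshold. A hypothetical sketch over body_plain and score; the `brand_mentions` name, the brand, the threshold of 5 and the sample records are illustrative.

```python
import re


def brand_mentions(records, brand: str, min_score: int = 5):
    """Keep comments that mention `brand` as a whole word and scored at least min_score."""
    # \b word boundaries avoid partial matches (e.g. "Claude" inside "Claudette");
    # they assume the brand name starts and ends with a letter, digit or underscore
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    return [r for r in records if r["score"] >= min_score and pattern.search(r["body_plain"])]


records = [
    {"comment_id": "t1_a", "score": 12, "body_plain": "I switched to claude last month"},
    {"comment_id": "t1_b", "score": 40, "body_plain": "Claudette is my cat"},
    {"comment_id": "t1_c", "score": 1, "body_plain": "Claude was fine for me"},
]
print([r["comment_id"] for r in brand_mentions(records, "Claude")])  # → ['t1_a']
```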

Sentiment analysts: body_plain is clean, stripped text. Pipe it directly into VADER, TextBlob, or any transformer-based sentiment model without a preprocessing step.


My other actors

- Apify Store Analyzer: Competitive intelligence across 10,000+ Apify actors
- n8n Marketplace Analyzer: Scrape and analyze n8n workflow templates
- Zapier Template Analyzer: Extract and analyze Zapier automation templates
- Make Template Analyzer: Analyze Make.com templates and automation trends
- Substack Newsletter Scraper: Scrape Substack posts, authors, and publication data
- Beehiiv Newsletter Scraper: Extract newsletters and publication data from Beehiiv
- Mubawab Housing Scraper: Real estate listings from Mubawab.ma

Built by Mustapha LIAICHI