Reddit Comments Scraper
Pricing
from $2.00 / 1,000 comments scraped
Developer
Khadin Akbar
Maintained by Community
Scrape Reddit comments from a thread URL, a subreddit, a user profile, or a search keyword — in one actor, auto-detected per input. Returns one flat record per comment, with deep nested-reply expansion, scores, depth, author, OP flag, timestamps, awards, and permalinks. Fully managed data sourcing — no login, no Reddit account, and no API key on your side. Built for sentiment analysis, AI/LLM training data, and social listening, and ready to call from an AI agent over MCP.
What you can scrape
Put any mix of these in the queries list — each is detected automatically:
| Input type | Example | What you get |
|---|---|---|
| Thread / post URL | https://www.reddit.com/r/technology/comments/1abcd2e/title/ | Every comment on that thread, full nested reply tree |
| Subreddit | r/technology or technology | The newest comments across the whole subreddit |
| User profile | u/spez or https://www.reddit.com/user/spez/ | Every comment that user has posted, newest first (requires Reddit API credentials configured by the author — see How it works) |
| Keyword | ChatGPT alternatives | Comments on the threads matching that search (optionally filtered to comments containing the keyword) |
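The auto-detection described in the table can be approximated with a few pattern checks. The sketch below is illustrative only: `detect_input_mode` and its regular expressions are hypothetical and are not the actor's actual rules. In particular, this sketch treats a bare word such as `technology` as a search keyword, which is the kind of ambiguity the `mode` option exists to resolve.

```python
import re

# Hypothetical heuristics; the actor's real detection rules are not documented here.
_HOST = r"https?://(?:www\.|old\.)?reddit\.com"
THREAD_URL = re.compile(rf"^{_HOST}/r/[^/\s]+/comments/[^/\s]+", re.I)
USER_REF = re.compile(rf"^(?:{_HOST}/(?:user|u)/[^/\s]+/?|/?u/[^/\s]+/?)$", re.I)
SUBREDDIT_REF = re.compile(rf"^(?:{_HOST}/r/[^/\s]+/?|/?r/[^/\s]+/?)$", re.I)


def detect_input_mode(query: str) -> str:
    """Classify one entry of the queries list as thread, user, subreddit, or search."""
    q = query.strip()
    if THREAD_URL.match(q):
        return "thread"
    if USER_REF.match(q):
        return "user"
    if SUBREDDIT_REF.match(q):
        return "subreddit"
    # Anything that does not look like a Reddit reference is treated as a keyword.
    return "search"
```

If a classifier like this misreads an input, forcing `mode` for that run avoids the guess entirely.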
When to use it
- Sentiment & opinion mining — pull a viral thread and analyze how people actually feel.
- AI / LLM training data — clean, flat, deduplicated comment text with metadata.
- Social listening & brand monitoring — track what a subreddit or keyword's comments say about a product.
- Community / user research — pull a user's full comment history for context.
When NOT to use it: if you want post-level data (titles, scores, post bodies, subreddit analytics), use the companion reddit-posts-comments-scraper instead. This actor's output unit is always a comment.
Output
One record per comment. Example (concise format):
{"commentId": "abc123","parentId": "t3_1abcd2e","postId": "1abcd2e","subreddit": "technology","postTitle": "Some interesting thread","postUrl": "https://www.reddit.com/r/technology/comments/1abcd2e/some_thread/","author": "example_user","body": "This is the comment text.","score": 142,"depth": 0,"isSubmitter": false,"isEdited": false,"createdAt": "2026-06-02T14:30:00.000Z","permalink": "https://www.reddit.com/r/technology/comments/1abcd2e/some_thread/abc123/","awardsCount": 0,"numReplies": 3,"inputMode": "subreddit","sourceInput": "r/technology"}
Set responseFormat to detailed to additionally get bodyHtml, authorFlair, controversiality, gilded, stickied, and distinguished.
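Because the output is flat, a reply tree has to be rebuilt from `parentId` when you need one. The sketch below assumes that replies carry a `parentId` of the form `t1_<commentId>` (Reddit's fullname prefix for comments, with `t3_` marking a post) and that `commentId` is stored without the prefix, as in the example record. Both function names are hypothetical helpers, not part of the actor.

```python
from collections import defaultdict


def build_reply_index(records):
    """Group flat comment records by the commentId they reply to (None = top level)."""
    children = defaultdict(list)
    for rec in records:
        parent = rec["parentId"]
        if parent.startswith("t1_"):
            # Assumption: strip the "t1_" prefix to match the bare commentId field.
            children[parent[3:]].append(rec)
        else:
            # A "t3_" parent is the post itself, so this is a top-level comment.
            children[None].append(rec)
    return children


def render_thread(children, parent=None, lines=None):
    """Return indented text lines, highest-scored comment first at each level."""
    lines = [] if lines is None else lines
    for rec in sorted(children.get(parent, []), key=lambda r: -r["score"]):
        indent = "  " * rec["depth"]
        lines.append(f'{indent}{rec["author"]} ({rec["score"]}): {rec["body"]}')
        render_thread(children, rec["commentId"], lines)
    return lines
```

Calling `render_thread(build_reply_index(records))` on a run's dataset items yields one line per comment, indented by `depth`.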
Pricing
$0.002 per comment returned, plus a tiny per-run start fee. You are billed only for comments actually written to the dataset, and never beyond your maxComments cap — so a 200-comment run costs about $0.40. Both pay-per-event and usage-based billing are available; pick whichever suits your volume.
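The arithmetic above can be written as a small helper for budgeting runs. The per-comment price comes from this section; the start fee is not stated as a number, so it is left as a parameter that defaults to zero, and `estimate_max_cost` is a hypothetical name.

```python
PRICE_PER_COMMENT_USD = 0.002  # per-comment price stated in this section


def estimate_max_cost(max_comments: int, start_fee_usd: float = 0.0) -> float:
    """Upper bound on a run's cost: the maxComments cap times the per-comment price.

    start_fee_usd is an assumption-friendly placeholder; the actual per-run
    start fee is not given here, so pass it in once you know it.
    """
    return round(max_comments * PRICE_PER_COMMENT_USD + start_fee_usd, 4)


print(estimate_max_cost(200))   # the 200-comment example from the text
print(estimate_max_cost(1000))  # matches the "per 1,000 comments" headline price
```

Since billing stops at `maxComments`, this figure is a ceiling rather than a prediction: a run that finds fewer comments costs less.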
Input options
| Option | Default | Description |
|---|---|---|
| queries | (required) | List of thread URLs, subreddits, users, or keywords |
| mode | auto | Force thread / subreddit / user / search if auto-detect misreads an input |
| maxComments | 200 | Hard cap on comments returned (and billed) this run |
| sort | top | top, best, new, controversial, old, qa (thread ordering) |
| includeReplies | true | Walk the full nested reply tree and expand "load more" stubs |
| maxDepth | 10 | Maximum reply nesting depth to traverse |
| responseFormat | concise | concise (agent-friendly) or detailed (all fields) |
| filterCommentsByKeyword | false | Keyword mode: return only comments containing the keyword |
| proxy | Residential | Proxy config; residential is required and on by default |
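As a rough illustration of how these options fit together, the sketch below builds a run input using the defaults from the table and validates the enumerated values. It assumes `queries` is a list of plain strings and omits `proxy` so the default applies. `build_run_input` and `run_actor` are hypothetical helpers; `run_actor` relies on the `apify-client` Python package, and the actor ID passed to it is a placeholder you must supply yourself.

```python
ALLOWED_MODES = {"auto", "thread", "subreddit", "user", "search"}
ALLOWED_SORTS = {"top", "best", "new", "controversial", "old", "qa"}
ALLOWED_FORMATS = {"concise", "detailed"}


def build_run_input(queries, mode="auto", max_comments=200, sort="top",
                    include_replies=True, max_depth=10,
                    response_format="concise", filter_by_keyword=False):
    """Return an input dict using the option names and defaults from the table."""
    if not queries:
        raise ValueError("queries is required and must not be empty")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unknown sort: {sort}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown responseFormat: {response_format}")
    return {
        "queries": list(queries),  # assumption: a list of plain strings
        "mode": mode,
        "maxComments": max_comments,
        "sort": sort,
        "includeReplies": include_replies,
        "maxDepth": max_depth,
        "responseFormat": response_format,
        "filterCommentsByKeyword": filter_by_keyword,
    }


def run_actor(actor_id, token, run_input):
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; only needed for real runs

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `run_actor("<actor-id>", "<your-apify-token>", build_run_input(["r/technology"], max_comments=50))`, with both placeholders filled in from your own account.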
How it works
Comments are sourced through a managed, multi-tier chain, so you never supply credentials or proxies:
- Reddit OAuth API — used when the actor author has configured Reddit API credentials. Cheapest path, and the only one that supports user comment history.
- ScrapeCreators — a managed Reddit data API used by default and as the primary fallback. Serves thread, subreddit, and keyword inputs out of the box, with no setup on your side.
- SociaVault — a second managed API used for redundancy if ScrapeCreators is unavailable.
Each comment record carries a source field showing which tier served it. The actor walks the full nested reply tree up to maxDepth and stops exactly at your maxComments cap. User-comment-history mode requires Reddit OAuth credentials (configured by the actor author); without them, user inputs are skipped with a warning. If every configured source is unavailable, the run finishes with a clear status message and an empty dataset rather than erroring.
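The tiered sourcing described above follows a common fallback pattern, sketched below. This is not the actor's code: `fetch_with_fallback`, the fetcher callables, and the tier names used as `source` values are all hypothetical, and real fetchers would call the respective APIs.

```python
def fetch_with_fallback(query, tiers):
    """Try each (name, fetcher) tier in order and tag records with the serving tier.

    Returns (records, status). If every tier fails, the result is an empty
    list plus a status message instead of a raised error.
    """
    for name, fetch in tiers:
        try:
            records = fetch(query)
        except Exception as exc:  # any failure means: move on to the next tier
            print(f"{name} unavailable: {exc}")
            continue
        return [{**rec, "source": name} for rec in records], f"served by {name}"
    return [], "all sources unavailable"


# Stand-in fetchers for illustration; the tier names are assumptions.
def oauth_fetcher(query):
    raise ConnectionError("no credentials configured")


def managed_fetcher(query):
    return [{"commentId": "abc123", "body": "This is the comment text."}]


records, status = fetch_with_fallback(
    "r/technology",
    [("reddit_oauth", oauth_fetcher), ("scrapecreators", managed_fetcher)],
)
print(status, records[0]["source"])
```

Returning an empty result with a status string, rather than raising, mirrors the behavior described for runs where every source is unavailable.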
Use from an AI agent (MCP)
This actor is MCP-ready. Exposed through the Apify MCP server, an agent can pass a single Reddit reference (URL, subreddit, user, or keyword) and receive structured comment JSON — ideal for "summarize what people think about X on Reddit" style tasks. Keep responseFormat: concise for the smallest, cleanest payloads.
Related actors
- reddit-posts-comments-scraper — post-level data + subreddit analytics.
- x-tweet-scraper — comments/replies on X (Twitter).
- youtube-comments-scraper — comments on YouTube videos.
FAQ
Do I need a Reddit account or API key? No. Data sourcing is fully managed by the actor author — you just provide inputs. (The one exception is user-comment-history mode, which requires Reddit API credentials configured by the author.)
Can I scrape an entire subreddit's history? You get the newest comments via the subreddit comment feed (Reddit caps how far back this paginates). For older content, pass specific thread URLs.
Why didn't I get every single reply on a huge thread? Reddit collapses very deep branches behind "continue this thread" stubs and caps how many it serves anonymously. The actor expands as many as it can within maxComments and maxDepth.
Are deleted/removed comments included? Deleted comment bodies appear as [deleted] / [removed] exactly as Reddit returns them.
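To make the interaction of maxComments and maxDepth concrete, here is a minimal depth-first walk over a nested reply tree that stops at either cap. The nested `id` / `body` / `replies` input shape is an assumption made for illustration, as is treating top-level comments as depth 0 and including comments at exactly `max_depth`. `flatten_replies` is a hypothetical helper, not the actor's implementation.

```python
def flatten_replies(tree, max_comments=200, max_depth=10):
    """Walk a nested reply tree depth-first and emit flat records until a cap is hit."""
    out = []
    # Reverse so that the first top-level comment is popped first.
    stack = [(node, 0) for node in reversed(tree)]
    while stack and len(out) < max_comments:
        node, depth = stack.pop()
        if depth > max_depth:
            continue  # branch is deeper than the cap: skip it and its replies
        out.append({"commentId": node["id"], "body": node["body"], "depth": depth})
        for child in reversed(node.get("replies", [])):
            stack.append((child, depth + 1))
    return out
```

With a low `max_depth`, deep branches are dropped but sibling threads are still visited; with a low `max_comments`, the walk simply ends early, which is why a very large thread may come back incomplete.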
Legal & compliance
This actor collects only publicly available Reddit content. It does not log in, bypass authentication, or access private/quarantined communities you could not view in a logged-out browser. You are responsible for using scraped data in compliance with Reddit's Terms of Service, the Reddit User Agreement, applicable data-protection laws (GDPR/CCPA), and any rights of the comment authors. Do not use this data to identify, harass, or de-anonymize individuals. Scrape responsibly.