Reddit Scraper — Posts, Comments, Search & Users (Reliable)
Pricing
Pay per usage
Scrape Reddit posts, comment trees, search results, and user activity as clean JSON — engineered for reliability: rate-limit-aware pacing, host + proxy rotation, and per-target fault isolation. HTTP-only, no browser, no login.
Developer: William Fordyce (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago
A Reddit scraper built for one thing above all: finishing every run. Scrape subreddit posts, full comment trees, Reddit search results, and user post/comment history as clean JSON — no login, no cookies, no official Reddit API key, and no headless browser. Built for social listening, brand monitoring, market research, and dataset building.
Why this scraper succeeds where others fail
Reddit aggressively defends its public endpoints: rate limits (HTTP 429), IP blocks (HTTP 403), and a JavaScript "Please wait for verification" bot wall that silently breaks plain .json scrapers. That wall is exactly why other Reddit actors fail 40%+ of their runs. This actor was engineered around those failure modes from day one:
- Native anonymous identity: instead of hammering the fragile public .json pages, the actor performs the exact same anonymous handshake reddit.com runs for every logged-out visitor and reads data through Reddit's own application gateway. No account, no login, no credentials, and no JS-challenge wall.
- Rate-limit-aware pacing: the actor reads Reddit's x-ratelimit-remaining / x-ratelimit-reset response headers on every request and proactively slows down before a 429 ever happens, on top of politely jittered 1–2 s request spacing.
- Four-host fallback ladder: if Reddit's gateway ever misbehaves, requests automatically fail over across four hosts that serve identical data (oauth → www → old → api.reddit.com).
- Proxy session rotation: when running with Apify residential proxies (the default), every blocked request retries from a brand-new IP session with a brand-new identity and a fresh rate-limit budget.
- Exponential backoff with jitter: retries start at ~2 s and back off up to 60 s (max 5 attempts per request), so transient hiccups never become failed runs.
- Per-target fault isolation: one private, banned, or misspelled subreddit never kills your run. Failures are recorded as dataType: "error" items and every other target still delivers.
- HTTP-only, no headless browser: runs in 256–512 MB of memory, fast and cheap.
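The pacing and backoff bullets above can be sketched as two small delay calculators. This is a minimal illustration, not the actor's actual code: the function names, the `low_water` threshold, and the ±25% jitter range are assumptions, and it assumes header names arrive lowercased with `x-ratelimit-reset` expressed as seconds until the window resets.

```python
import random


def pacing_delay(headers, base_range=(1.0, 2.0), low_water=5, rng=random):
    """Seconds to wait before the next request (hypothetical helper)."""
    delay = rng.uniform(*base_range)  # jittered 1-2 s baseline spacing
    try:
        remaining = float(headers.get("x-ratelimit-remaining", ""))
        reset = float(headers.get("x-ratelimit-reset", ""))
    except ValueError:
        return delay  # headers missing or malformed: keep baseline spacing
    if remaining <= low_water or remaining <= 0:
        # Budget nearly exhausted: wait out the rest of the window.
        return max(delay, reset)
    # Otherwise spread the remaining budget evenly over the window.
    return max(delay, reset / remaining)


def backoff_delay(attempt, base=2.0, cap=60.0, rng=random):
    """Delay before retry number `attempt` (1-based): ~2 s, ~4 s, ~8 s, ... capped."""
    exponential = base * 2 ** (attempt - 1)
    # Assumed jitter of +/-25% so parallel retries do not synchronize.
    return min(cap, exponential * rng.uniform(0.75, 1.25))
```

A caller would sleep for `pacing_delay(response_headers)` after each successful request and for `backoff_delay(attempt)` after each failed one, giving up once the attempt limit (5 in the description above) is reached.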
What you can scrape
| Mode | Input | What you get |
|---|---|---|
| Subreddits | subreddits + sort / topPeriod | Posts from hot / new / top / rising listings — a complete subreddit scraper. |
| Search | searchQueries (+ optional searchSubreddit) | Posts matching any keyword across all of Reddit or inside one subreddit — ideal for brand monitoring. |
| Users | users + userDataType | Any user's submitted posts, comments, or both. |
| Start URLs | startUrls | Paste any Reddit URL — subreddit, post, user, or search page. The type is auto-detected. |
| Comments | includeComments: true | The full comment tree of every scraped post, flattened into one item per comment — a true Reddit comments scraper. |
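To illustrate what auto-detecting a Start URL might involve, here is a rough sketch that classifies a Reddit URL by its path. The patterns and the `detect_url_type` name are assumptions made for this example; the actor's real detection rules are not documented here.

```python
import re
from urllib.parse import urlparse

# Order matters: the more specific patterns are checked before the subreddit one.
_PATTERNS = [
    ("post", re.compile(r"^/r/[^/]+/comments/[^/]+")),
    ("search", re.compile(r"^/(?:r/[^/]+/)?search/?$")),
    ("user", re.compile(r"^/(?:u|user)/[^/]+")),
    ("subreddit", re.compile(r"^/r/[^/]+/?")),
]


def detect_url_type(url):
    """Return 'post', 'search', 'user', 'subreddit', or None (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return None  # not a Reddit URL
    for kind, pattern in _PATTERNS:
        if pattern.match(parsed.path):
            return kind
    return None
```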
All modes are combinable in a single run, and maxItems is split fairly across all your targets.
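One plausible reading of "split fairly" is an even division with the remainder handed to the first few targets. The sketch below shows that reading only; the actor's exact allocation rule is not documented, and `split_budget` and the target labels are hypothetical.

```python
def split_budget(max_items, targets):
    """Divide max_items evenly across unique targets; early targets absorb the remainder."""
    if not targets:
        return {}
    base, extra = divmod(max_items, len(targets))
    return {
        target: base + (1 if index < extra else 0)
        for index, target in enumerate(targets)
    }
```

Under this assumption, a run with maxItems of 60 over two subreddits and one search query would give each target a budget of 20 items.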
Input
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | [] | Reddit URLs of any type (subreddit / post / user / search); the type is auto-detected. |
| subreddits | array | [] | Subreddit names, with or without r/. |
| sort | enum | new | hot, new, top, rising. |
| topPeriod | enum | week | hour, day, week, month, year, all (only for top). |
| searchQueries | array | [] | Keywords/phrases to search for. |
| searchSort | enum | relevance | relevance, hot, top, new, comments. |
| searchSubreddit | string | "" | Restrict all searches to one subreddit. |
| users | array | [] | Usernames, with or without u/. |
| userDataType | enum | posts | posts, comments, or both. |
| includeComments | boolean | false | Also fetch each post's comment tree. |
| maxItems | integer | 100 | Max posts/user items across all sources combined (budget is split fairly per target). |
| maxCommentsPerPost | integer | 50 | Comment cap per post when includeComments is on. |
| proxyConfiguration | object | Apify residential | Proxy settings. Residential strongly recommended. |
Example input — monitor two subreddits and a brand keyword, with comments:
```json
{
  "subreddits": ["programming", "smallbusiness"],
  "sort": "new",
  "searchQueries": ["apify"],
  "includeComments": true,
  "maxItems": 60,
  "maxCommentsPerPost": 20
}
```
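If you build run inputs programmatically, a small helper can apply the defaults from the input table and catch invalid enum values before a run starts. This is a sketch under assumptions: `build_input` and `strip_prefix` are hypothetical names, the prefix stripping mirrors the "with or without r/ (or u/)" note, and proxyConfiguration is left out because its object shape is not shown in this document.

```python
import copy

DEFAULTS = {
    "startUrls": [], "subreddits": [], "sort": "new", "topPeriod": "week",
    "searchQueries": [], "searchSort": "relevance", "searchSubreddit": "",
    "users": [], "userDataType": "posts", "includeComments": False,
    "maxItems": 100, "maxCommentsPerPost": 50,
}

ENUMS = {
    "sort": {"hot", "new", "top", "rising"},
    "topPeriod": {"hour", "day", "week", "month", "year", "all"},
    "searchSort": {"relevance", "hot", "top", "new", "comments"},
    "userDataType": {"posts", "comments", "both"},
}


def strip_prefix(name, prefix):
    """Drop a leading 'r/' or 'u/' (case-insensitive) if present."""
    name = name.strip()
    return name[len(prefix):] if name.lower().startswith(prefix) else name


def build_input(**overrides):
    """Merge overrides into the documented defaults and validate enum fields."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}
    for field, allowed in ENUMS.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    run_input["subreddits"] = [strip_prefix(s, "r/") for s in run_input["subreddits"]]
    run_input["users"] = [strip_prefix(u, "u/") for u in run_input["users"]]
    return run_input
```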
Output
One dataset item per post and per comment.
Post (dataType: "post"):
```json
{
  "dataType": "post",
  "id": "1u28egw",
  "subreddit": "programming",
  "url": "https://example.com/article",
  "title": "The new unwritten laws of software engineering",
  "author": "whiskeytown79",
  "selftext": "",
  "score": 1240,
  "upvoteRatio": 0.95,
  "numComments": 312,
  "createdAt": "2026-06-10T17:34:44.000Z",
  "flair": "Discussion",
  "isNsfw": false,
  "mediaUrls": ["https://external-preview.redd.it/..."],
  "permalink": "https://www.reddit.com/r/programming/comments/1u28egw/...",
  "fetchedAt": "2026-06-10T21:12:20.233Z",
  "commentsScraped": 20,
  "moreCommentsSkipped": 12
}
```
commentsScraped / moreCommentsSkipped appear when includeComments is on — moreCommentsSkipped counts the comments hiding behind Reddit's "load more comments" stubs beyond your per-post cap.
Comment (dataType: "comment") — postId, parentId, and depth let you re-assemble the full thread tree (depth: 0 = top level, where parentId === postId):
```json
{
  "dataType": "comment",
  "id": "oqvw4be",
  "postId": "1u28egw",
  "parentId": "oqvmfr6",
  "depth": 1,
  "subreddit": "programming",
  "author": "vattenpuss",
  "body": "Oh you didn't get the memo? ...",
  "score": 21,
  "createdAt": "2026-06-10T18:02:11.000Z",
  "flair": null,
  "permalink": "https://www.reddit.com/r/programming/comments/1u28egw/.../oqvw4be/",
  "fetchedAt": "2026-06-10T21:12:24.108Z"
}
```
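Since comments arrive flattened, re-assembling a thread means grouping them by parentId and walking down from the post. The sketch below shows one way to do that with the fields described above; `build_thread` and the added `replies` key are illustrative names, not part of the actor's output.

```python
from collections import defaultdict


def build_thread(post_id, comments):
    """Nest flat comment items under their parents, starting from the post."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["parentId"]].append(comment)

    def attach(parent_id):
        # Copy each comment and add its nested replies under a new "replies" key.
        return [
            {**comment, "replies": attach(comment["id"])}
            for comment in children.get(parent_id, [])
        ]

    # Top-level comments are those whose parentId equals the post id.
    # Comments whose parent was not scraped are left out of the tree.
    return attach(post_id)
```

Pass the post's id together with every comment item that shares its postId; the result is a list of top-level comments, each carrying its replies recursively.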
Error (dataType: "error") — pushed instead of crashing when a single target fails; the run continues and still succeeds:
```json
{
  "dataType": "error",
  "target": "r/some_private_subreddit",
  "error": "Reddit refused access (HTTP 403: private)",
  "fetchedAt": "2026-06-10T21:17:48.974Z"
}
```
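Because posts, comments, and errors share one dataset, downstream code usually starts by separating them on dataType. A minimal sketch, assuming the items have already been loaded as a list of dicts (`partition_items` is a hypothetical helper):

```python
def partition_items(items):
    """Group dataset items by their dataType field."""
    buckets = {"post": [], "comment": [], "error": []}
    for item in items:
        # Items without a recognised dataType end up in their own bucket.
        buckets.setdefault(item.get("dataType", "unknown"), []).append(item)
    return buckets
```

Checking `buckets["error"]` after each run is a quick way to spot private or misspelled targets without scanning the run log.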
Pricing (pay per event)
You only pay for data you actually receive:
| Event | Charged when | Suggested price |
|---|---|---|
| item-scraped | One post or comment is pushed to the dataset | $0.002 |
Error items are never charged. Example: 100 posts with 50 comments each = 5,100 items ≈ $10.20; 500 posts without comments ≈ $1.00.
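The arithmetic behind those examples can be reproduced with a short estimator. It assumes the suggested price of $0.002 per item and that every post yields the full number of comments, so it gives an upper bound; `estimate_cost` is a hypothetical helper.

```python
PRICE_PER_ITEM = 0.002  # suggested price per item-scraped event, in USD


def estimate_cost(posts, comments_per_post=0):
    """Return (item count, estimated cost in USD) for a planned run."""
    items = posts * (1 + comments_per_post)  # each post plus its comments
    return items, round(items * PRICE_PER_ITEM, 2)
```

For instance, `estimate_cost(100, 50)` covers 100 posts plus 5,000 comments, matching the 5,100 items in the first example.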
FAQ
Is it legal to scrape Reddit?
This actor only collects publicly available data: the same posts and comments anyone can read in a browser without logging in. It collects no private data and accesses no user accounts. As always, consult your own counsel for your specific use case and respect Reddit's User Agreement when republishing content.
Do I need a Reddit account, API key, or cookies?
No. The actor uses the same anonymous identity Reddit's own website creates for every logged-out visitor. There is nothing to configure, nothing to expire, and no account that can be banned.
How is this different from other Reddit scrapers?
Reliability. The most popular Reddit actors fail a large share of their runs because they treat Reddit's rate limits and bot-wall responses as fatal errors. This actor paces itself using Reddit's own rate-limit headers, retries with exponential backoff, rotates proxy sessions and fallback hosts automatically, and isolates per-target failures, so one bad subreddit or one throttled request never costs you a run.
Why are some comments missing?
Reddit returns large threads partially, hiding deeper branches behind "load more comments" stubs. The actor scrapes up to maxCommentsPerPost comments per post and reports how many remained hidden as moreCommentsSkipped on the post item, so you always know what you got.
Can I monitor subreddits or keywords on a schedule?
Yes — add the actor to an Apify Schedule (e.g. every hour with sort: "new") and connect a webhook or one of Apify's integrations (Google Sheets, Slack, Make, Zapier) for an always-on social listening pipeline.
What proxies should I use?
Apify residential proxies (the default). Reddit blocks most datacenter IP ranges outright; residential sessions combined with the actor's automatic rotation deliver the reliability this actor is built for.
Tips
- For monitoring use cases, sort: "new" plus a schedule beats hot, because you see every post once, as it appears.
- Brand monitoring works best with searchQueries + includeComments: true, since the sentiment usually lives in the comments.
- Use searchSort: "comments" to find the most discussed posts about a topic.
- Keep maxItems aligned with your schedule frequency (e.g. hourly runs rarely need more than 100 items per subreddit).