Reddit Scraper — Posts, Comments, Search & Users (Reliable)

Scrape Reddit posts, comment trees, search results, and user activity as clean JSON — engineered for reliability: rate-limit-aware pacing, host + proxy rotation, and per-target fault isolation. HTTP-only, no browser, no login.

Pricing: Pay per usage
Rating: 0.0 (0)
Developer: William Fordyce
Maintained by: Community
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

A Reddit scraper built for one thing above all: finishing every run. Scrape subreddit posts, full comment trees, Reddit search results, and user post/comment history as clean JSON — no login, no cookies, no official Reddit API key, and no headless browser. Built for social listening, brand monitoring, market research, and dataset building.

Why this scraper succeeds where others fail

Reddit aggressively defends its public endpoints: rate limits (HTTP 429), IP blocks (HTTP 403), and a JavaScript "Please wait for verification" bot wall that silently breaks plain .json scrapers. That wall is exactly why other Reddit actors fail 40%+ of their runs. This actor was engineered around those failure modes from day one:

  • Native anonymous identity — instead of hammering the fragile public .json pages, the actor performs the exact same anonymous handshake reddit.com runs for every logged-out visitor and reads data through Reddit's own application gateway. No account, no login, no credentials — and no JS-challenge wall.
  • Rate-limit-aware pacing — the actor reads Reddit's x-ratelimit-remaining / x-ratelimit-reset response headers on every request and proactively slows down before a 429 ever happens, on top of politely jittered 1–2 s request spacing.
  • Four-host fallback ladder: if Reddit's gateway ever misbehaves, requests automatically fail over across four hosts that serve identical data (oauth.reddit.com, www.reddit.com, old.reddit.com, api.reddit.com).
  • Proxy session rotation — when running with Apify residential proxies (the default), every blocked request retries from a brand-new IP session with a brand-new identity and a fresh rate-limit budget.
  • Exponential backoff with jitter — retries start at ~2 s and back off up to 60 s (max 5 attempts per request), so transient hiccups never become failed runs.
  • Per-target fault isolation — one private, banned, or misspelled subreddit never kills your run. Failures are recorded as dataType: "error" items and every other target still delivers.
  • HTTP-only, no headless browser — runs in 256–512 MB of memory, fast and cheap.
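
The pacing and backoff rules in the list above can be sketched as two small functions. This is an illustrative sketch, not the actor's source code: the function names, the ±20% jitter range, and the `min_remaining` threshold are assumptions, and it assumes `x-ratelimit-reset` holds the number of seconds until the rate-limit window resets. Only the ~2 s start, the 60 s ceiling, and the 5-attempt limit come from the description.

```python
from __future__ import annotations

import random
from typing import Mapping

BASE_DELAY_S = 2.0   # first retry waits about 2 s (from the description)
MAX_DELAY_S = 60.0   # backoff ceiling (from the description)
MAX_ATTEMPTS = 5     # attempts per request (from the description)


def backoff_delay(attempt: int, rng: random.Random | None = None) -> float:
    """Delay before retry number `attempt` (1-based): doubling, jittered, capped."""
    rng = rng or random.Random()
    raw = BASE_DELAY_S * 2 ** (attempt - 1)
    # Assumed jitter of +/-20% so parallel workers do not retry in lockstep.
    return min(MAX_DELAY_S, raw * rng.uniform(0.8, 1.2))


def pacing_delay(headers: Mapping[str, str], min_remaining: float = 5.0) -> float:
    """Extra wait suggested by the rate-limit headers; 0.0 if the budget is healthy."""
    try:
        remaining = float(headers["x-ratelimit-remaining"])
        reset_s = float(headers["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or malformed: fall back to normal spacing
    if remaining >= min_remaining:
        return 0.0
    # Spread the few remaining requests over the time left in the window.
    return reset_s / max(remaining, 1.0)
```

A request loop would sleep for `pacing_delay(response.headers)` after each response and for `backoff_delay(n)` before retry `n`, giving up once `n` reaches `MAX_ATTEMPTS`.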

What you can scrape

| Mode | Input | What you get |
| --- | --- | --- |
| Subreddits | subreddits + sort / topPeriod | Posts from hot / new / top / rising listings: a complete subreddit scraper. |
| Search | searchQueries (+ optional searchSubreddit) | Posts matching any keyword across all of Reddit or inside one subreddit; ideal for brand monitoring. |
| Users | users + userDataType | Any user's submitted posts, comments, or both. |
| Start URLs | startUrls | Paste any Reddit URL (subreddit, post, user, or search page). The type is auto-detected. |
| Comments | includeComments: true | The full comment tree of every scraped post, flattened into one item per comment: a true Reddit comments scraper. |
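
Auto-detecting the type of a start URL could look roughly like the sketch below. The function name `classify_reddit_url` and the exact path rules are assumptions based on Reddit's usual URL layout, not the actor's actual detection logic.

```python
from urllib.parse import urlparse


def classify_reddit_url(url: str) -> str:
    """Return 'search', 'post', 'subreddit', 'user', or 'unknown' for a URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return "unknown"
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        return "unknown"
    # /search?q=... or /r/<sub>/search?q=...
    if parts[0] == "search" or (len(parts) >= 3 and parts[0] == "r" and parts[2] == "search"):
        return "search"
    # /r/<sub>/comments/<post id>/...
    if len(parts) >= 4 and parts[0] == "r" and parts[2] == "comments":
        return "post"
    # /r/<sub>
    if len(parts) >= 2 and parts[0] == "r":
        return "subreddit"
    # /user/<name> or /u/<name>
    if len(parts) >= 2 and parts[0] in ("user", "u"):
        return "user"
    return "unknown"
```

Checking the search pattern before the subreddit pattern matters, because a subreddit-scoped search URL also begins with `/r/<name>`.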

All modes are combinable in a single run, and maxItems is split fairly across all your targets.
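
One plausible reading of "split fairly" is an even division with the remainder handed to the first targets. The sketch below shows that interpretation; the function name `split_budget` and the remainder rule are assumptions, and the actor may distribute its budget differently.

```python
from __future__ import annotations


def split_budget(max_items: int, targets: list[str]) -> dict[str, int]:
    """Divide max_items evenly across targets; early targets absorb the remainder."""
    if not targets:
        return {}
    base, extra = divmod(max_items, len(targets))
    return {
        target: base + (1 if index < extra else 0)
        for index, target in enumerate(targets)
    }
```

Under this reading, `maxItems: 60` with two subreddits and one search query would give each target a budget of 20 posts.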

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | [] | Reddit URLs of any type (subreddit / post / user / search), auto-detected. |
| subreddits | array | [] | Subreddit names, with or without r/. |
| sort | enum | new | hot, new, top, rising. |
| topPeriod | enum | week | hour, day, week, month, year, all (only for top). |
| searchQueries | array | [] | Keywords/phrases to search for. |
| searchSort | enum | relevance | relevance, hot, top, new, comments. |
| searchSubreddit | string | "" | Restrict all searches to one subreddit. |
| users | array | [] | Usernames, with or without u/. |
| userDataType | enum | posts | posts, comments, or both. |
| includeComments | boolean | false | Also fetch each post's comment tree. |
| maxItems | integer | 100 | Max posts/user items across all sources combined (budget is split fairly per target). |
| maxCommentsPerPost | integer | 50 | Comment cap per post when includeComments is on. |
| proxyConfiguration | object | Apify residential | Proxy settings. Residential strongly recommended. |

Example input — monitor two subreddits and a brand keyword, with comments:

{
  "subreddits": ["programming", "smallbusiness"],
  "sort": "new",
  "searchQueries": ["apify"],
  "includeComments": true,
  "maxItems": 60,
  "maxCommentsPerPost": 20
}

Output

One dataset item per post and per comment.

Post (dataType: "post"):

{
  "dataType": "post",
  "id": "1u28egw",
  "subreddit": "programming",
  "url": "https://example.com/article",
  "title": "The new unwritten laws of software engineering",
  "author": "whiskeytown79",
  "selftext": "",
  "score": 1240,
  "upvoteRatio": 0.95,
  "numComments": 312,
  "createdAt": "2026-06-10T17:34:44.000Z",
  "flair": "Discussion",
  "isNsfw": false,
  "mediaUrls": ["https://external-preview.redd.it/..."],
  "permalink": "https://www.reddit.com/r/programming/comments/1u28egw/...",
  "fetchedAt": "2026-06-10T21:12:20.233Z",
  "commentsScraped": 20,
  "moreCommentsSkipped": 12
}

commentsScraped / moreCommentsSkipped appear when includeComments is on — moreCommentsSkipped counts the comments hiding behind Reddit's "load more comments" stubs beyond your per-post cap.

Comment (dataType: "comment") — postId, parentId, and depth let you re-assemble the full thread tree (depth: 0 = top level, where parentId === postId):

{
  "dataType": "comment",
  "id": "oqvw4be",
  "postId": "1u28egw",
  "parentId": "oqvmfr6",
  "depth": 1,
  "subreddit": "programming",
  "author": "vattenpuss",
  "body": "Oh you didn't get the memo? ...",
  "score": 21,
  "createdAt": "2026-06-10T18:02:11.000Z",
  "flair": null,
  "permalink": "https://www.reddit.com/r/programming/comments/1u28egw/.../oqvw4be/",
  "fetchedAt": "2026-06-10T21:12:24.108Z"
}
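
Re-assembling a thread from the flattened comment items might look like the following sketch. It relies only on the id, postId, and parentId fields shown above; the function name `build_thread` and the added `replies` key are illustrative choices, not part of the actor's output.

```python
from __future__ import annotations

from collections import defaultdict


def build_thread(post_id: str, items: list[dict]) -> list[dict]:
    """Nest flat comment items under their parents; returns top-level comments."""
    children: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        if item.get("dataType") == "comment" and item.get("postId") == post_id:
            children[item["parentId"]].append(item)

    def attach(parent_id: str) -> list[dict]:
        # Copy each comment and add its replies under a new "replies" key.
        return [
            {**comment, "replies": attach(comment["id"])}
            for comment in children.get(parent_id, [])
        ]

    # Top-level comments are the ones whose parentId equals the post id.
    return attach(post_id)
```

A comment whose parent is absent from the dataset (for example, because the parent fell outside maxCommentsPerPost) is not reachable from the post and is left out of the tree by this sketch.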

Error (dataType: "error") — pushed instead of crashing when a single target fails; the run continues and still succeeds:

{
  "dataType": "error",
  "target": "r/some_private_subreddit",
  "error": "Reddit refused access (HTTP 403: private)",
  "fetchedAt": "2026-06-10T21:17:48.974Z"
}
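
Because posts, comments, and errors share one dataset, a consumer will usually separate them by dataType first. A minimal sketch, assuming the items have already been downloaded as a list of dictionaries (the helper name `group_by_type` is hypothetical):

```python
from __future__ import annotations

from collections import defaultdict


def group_by_type(items: list[dict]) -> dict[str, list[dict]]:
    """Bucket dataset items by their dataType field."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        grouped[item.get("dataType", "unknown")].append(item)
    return dict(grouped)
```

The `error` bucket then lists every target that failed, which can be logged or retried without touching the posts and comments that were delivered.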

Pricing (pay per event)

You only pay for data you actually receive:

| Event | Charged when | Suggested price |
| --- | --- | --- |
| item-scraped | One post or comment is pushed to the dataset | $0.002 |

Error items are never charged. Example: 100 posts with 50 comments each = 5,100 items ≈ $10.20; 500 posts without comments ≈ $1.00.
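
The arithmetic behind those examples can be reproduced with a few lines. This sketch assumes the suggested $0.002 price applies to every post and comment item and that each post yields exactly the stated number of comments; the function name `estimate_cost` is illustrative.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.002")  # suggested price per item-scraped event


def estimate_cost(posts: int, comments_per_post: int = 0) -> Decimal:
    """Estimated charge in USD for a run; error items are excluded (never charged)."""
    items = posts + posts * comments_per_post
    return items * PRICE_PER_ITEM
```

For example, `estimate_cost(100, 50)` covers 100 + 5,000 = 5,100 items, and `estimate_cost(500)` covers 500 items. Real runs usually cost less than this upper bound, since many posts have fewer comments than the per-post cap.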

FAQ

Is it legal to scrape Reddit? This actor only collects publicly available data — the same posts and comments anyone can read in a browser without logging in. It collects no private data and accesses no user accounts. As always, consult your own counsel for your specific use case and respect Reddit's User Agreement when republishing content.

Do I need a Reddit account, API key, or cookies? No. The actor uses the same anonymous identity Reddit's own website creates for every logged-out visitor. There is nothing to configure, nothing to expire, and no account that can be banned.

How is this different from other Reddit scrapers? Reliability. The most popular Reddit actors fail a large share of their runs because they treat Reddit's rate limits and bot-wall responses as fatal errors. This actor paces itself using Reddit's own rate-limit headers, retries with exponential backoff, rotates proxy sessions and fallback hosts automatically, and isolates per-target failures — so one bad subreddit or one throttled request never costs you a run.

Why are some comments missing? Reddit returns large threads partially, hiding deeper branches behind "load more comments" stubs. The actor scrapes up to maxCommentsPerPost comments per post and reports how many remained hidden as moreCommentsSkipped on the post item, so you always know what you got.

Can I monitor subreddits or keywords on a schedule? Yes — add the actor to an Apify Schedule (e.g. every hour with sort: "new") and connect a webhook or one of Apify's integrations (Google Sheets, Slack, Make, Zapier) for an always-on social listening pipeline.

What proxies should I use? Apify residential proxies (the default). Reddit blocks most datacenter IP ranges outright; residential sessions combined with the actor's automatic rotation deliver the reliability this actor is built for.

Tips

  • For monitoring use cases, sort: "new" + a schedule beats hot — you see every post once, as it appears.
  • Brand monitoring works best with searchQueries + includeComments: true — the sentiment usually lives in the comments.
  • Use searchSort: "comments" to find the most discussed posts about a topic.
  • Keep maxItems aligned with your schedule frequency (e.g. hourly runs rarely need more than 100 items per subreddit).