Reddit Scraper Pro — Posts, Comments, Subreddits, No API Key

Pricing

Pay per usage

Reddit scraper via public JSON — posts + comments, no login. 20 fields/post (score, ratio, flair, NSFW). CSV/JSON. 101 runs · 6 users · u30d=2 · 27/30d. Trend research + LLM training data. blog.spinov.online · dev.to/0012303 · spinov001@gmail.com

Rating: 0.0 (0)

Developer: Alex (maintained by Community)

Actor stats: 0 bookmarked · 6 total users · 2 monthly active users · last modified a day ago

Reddit Discussion Scraper — JSON API, no HTML parsing

Pulls posts (and optionally comment threads) from any public subreddit or Reddit search via Reddit's native JSON endpoint. No login, no Reddit OAuth app, no headless browser.


What you get per post (21 fields verified against src/main.js)

{
  "id": "1b2c3d4",
  "title": "What tools do you use for market research?",
  "author": "startup_founder",
  "subreddit": "Entrepreneur",
  "score": 847,
  "upvoteRatio": 0.94,
  "numComments": 234,
  "createdUtc": "2026-03-17T15:30:00.000Z",
  "url": "https://reddit.com/r/Entrepreneur/comments/...",
  "selfText": "I've been looking for affordable tools.",
  "linkUrl": "https://example.com/article",
  "isVideo": false,
  "thumbnail": "https://...",
  "flair": "Discussion",
  "awards": 3,
  "isNSFW": false,
  "isStickied": false,
  "domain": "self.Entrepreneur",
  "source": "r/Entrepreneur",
  "scrapedAt": "2026-04-29T16:32:00.000Z",
  "comments": [
    {
      "id": "abc123",
      "author": "data_analyst",
      "body": "I use a combination of...",
      "score": 156,
      "createdUtc": "2026-03-17T16:00:00.000Z",
      "depth": 0
    }
  ]
}

Comment objects carry 6 fields: id, author, body, score, createdUtc, depth.
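Because comments arrive nested inside each post, a CSV export of comment data needs one row per comment. The following is a minimal sketch of that flattening, assuming records shaped like the example above; the helper names comment_rows and comments_to_csv and the postId / comment_ column prefixes are illustrative choices, not part of the actor.

```python
import csv
import io

# The 6 comment fields listed in the documentation above.
COMMENT_FIELDS = ["id", "author", "body", "score", "createdUtc", "depth"]


def comment_rows(post):
    """Yield one flat dict per comment, tagged with the parent post's id and title."""
    for comment in post.get("comments") or []:
        row = {"postId": post["id"], "postTitle": post["title"]}
        row.update({f"comment_{key}": comment.get(key) for key in COMMENT_FIELDS})
        yield row


def comments_to_csv(posts):
    """Serialize every comment found in `posts` to a CSV string."""
    header = ["postId", "postTitle"] + [f"comment_{key}" for key in COMMENT_FIELDS]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    for post in posts:
        writer.writerows(comment_rows(post))
    return buffer.getvalue()


# Trimmed copy of the sample record shown above.
sample_post = {
    "id": "1b2c3d4",
    "title": "What tools do you use for market research?",
    "comments": [
        {
            "id": "abc123",
            "author": "data_analyst",
            "body": "I use a combination of...",
            "score": 156,
            "createdUtc": "2026-03-17T16:00:00.000Z",
            "depth": 0,
        }
    ],
}

csv_text = comments_to_csv([sample_post])
```

Posts scraped with includeComments set to false simply contribute no rows, since the helper treats a missing or empty comments field as an empty list.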


Input parameters (verified against .actor/input_schema.json)

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| subreddits | Array | [] | | Subreddit names; the r/ prefix is optional and stripped |
| searchQueries | Array | [] | | Search terms across all of Reddit |
| maxPostsPerSource | Integer | 50 | 1–500 | Cap per subreddit / per search query; multi-source runs multiply |
| includeComments | Boolean | true | | Fetch comment threads for each post |
| maxCommentsPerPost | Integer | 20 | 1–100 | Global cap on comments per post (see depth-first note below) |
| sortBy | String | "hot" | hot/new/top/rising | For search queries: "hot" is silently rewritten to "relevance" |
| timeFilter | String | "week" | hour/day/week/month/year/all | Applies primarily to top sort |

If no subreddits and no searchQueries are provided, the actor falls back to scraping r/technology so the run does not error on empty input.
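The input rules above (prefix stripping, the r/technology fallback, and the search-sort rewrite) can be sketched as a single normalization step. This is an illustration of the documented behaviour, not the actor's source: normalize_input, clamp, and the searchSortBy key are hypothetical names, and the range clamping is an assumption (the platform may instead reject out-of-range values against the input schema).

```python
# Defaults as listed in the parameter table.
DEFAULTS = {
    "subreddits": [],
    "searchQueries": [],
    "maxPostsPerSource": 50,
    "includeComments": True,
    "maxCommentsPerPost": 20,
    "sortBy": "hot",
    "timeFilter": "week",
}


def clamp(value, low, high):
    """Force `value` into the inclusive range [low, high]."""
    return max(low, min(high, value))


def normalize_input(raw):
    """Return a copy of `raw` with defaults applied and documented rewrites done."""
    cfg = {**DEFAULTS, **raw}

    # The r/ prefix is optional and stripped.
    subreddits = []
    for name in cfg["subreddits"]:
        name = name.strip()
        if name.lower().startswith("r/"):
            name = name[2:]
        if name:
            subreddits.append(name)
    cfg["subreddits"] = subreddits

    # Empty input falls back to r/technology.
    if not cfg["subreddits"] and not cfg["searchQueries"]:
        cfg["subreddits"] = ["technology"]

    # Assumption: out-of-range values are clamped rather than rejected.
    cfg["maxPostsPerSource"] = clamp(cfg["maxPostsPerSource"], 1, 500)
    cfg["maxCommentsPerPost"] = clamp(cfg["maxCommentsPerPost"], 1, 100)

    # Search does not honor "hot", so it becomes "relevance" for search queries only.
    cfg["searchSortBy"] = "relevance" if cfg["sortBy"] == "hot" else cfg["sortBy"]
    return cfg
```

For example, normalize_input({"subreddits": ["r/SaaS", "startups"]}) yields the subreddit list ["SaaS", "startups"], and an empty dict yields ["technology"].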


Use cases

  • Market research — what people say about your product, brand, or industry
  • Sentiment analysis — collect posts and comments for NLP / LLM pipelines
  • Trend monitoring — track emerging topics across your target subreddits
  • Competitive intelligence — monitor competitor mentions and complaints
  • Content research — find top questions and topics your audience cares about
  • Lead generation — identify users asking for your kind of product/service

How it works

  • Fetches https://old.reddit.com/r/<subreddit>/<sortBy>.json and https://old.reddit.com/search.json directly. The old.reddit.com host is less aggressive about IP blocking than www.reddit.com and exposes the same JSON shape.
  • Uses Apify Residential Proxy (US) when available; falls back to default Apify proxy; falls back to direct fetch in local development.
  • Random 2–5 second delay before every request (rate-limit hygiene).
  • Rotates through 4 desktop User-Agent strings per request (Chrome Win/Mac/Linux + Firefox Win).
  • Cursor-based pagination via Reddit's after parameter.
  • CheerioCrawler with maxConcurrency: 1, maxRequestRetries: 3, crawler-level cap maxRequestsPerCrawl: 500 (request budget — listing requests + comment requests share this pool).
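The request loop described above can be approximated in a few lines. The sketch below is written in Python for consistency with the programmatic example further down, whereas the actor itself runs on CheerioCrawler; collect_posts, listing_url, and the fetch_json callback are hypothetical names, the User-Agent strings are placeholders rather than the actor's real values, and proxy handling is left out. The data.children / data.after fields follow Reddit's Listing JSON shape.

```python
import random
import time
from urllib.parse import urlencode

# Placeholder values; the actor rotates 4 real desktop User-Agent strings.
USER_AGENTS = [
    "example-chrome-windows",
    "example-chrome-mac",
    "example-chrome-linux",
    "example-firefox-windows",
]


def listing_url(subreddit, sort_by="hot", time_filter="week", after=None, limit=100):
    """Build the old.reddit.com listing URL for one page."""
    params = {"limit": limit, "t": time_filter}
    if after:
        params["after"] = after  # cursor returned by the previous page
    return f"https://old.reddit.com/r/{subreddit}/{sort_by}.json?{urlencode(params)}"


def collect_posts(fetch_json, subreddit, max_posts,
                  sort_by="hot", time_filter="week", sleep=time.sleep):
    """Walk listing pages one at a time until max_posts or the last page.

    fetch_json(url, headers) must return the decoded JSON body as a dict.
    """
    posts, after = [], None
    while len(posts) < max_posts:
        sleep(random.uniform(2, 5))  # random 2-5 second delay before every request
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        url = listing_url(subreddit, sort_by, time_filter, after)
        data = fetch_json(url, headers).get("data", {})
        children = data.get("children", [])
        posts.extend(child["data"] for child in children)
        after = data.get("after")
        if not after or not children:
            break  # no cursor means there are no further pages
    return posts[:max_posts]


# One possible fetch_json, assuming the third-party `requests` package:
#   fetch_json = lambda url, headers: requests.get(url, headers=headers, timeout=30).json()
```

Passing the fetch function and the sleep function in as arguments keeps the loop testable without network access or real delays.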

Honest limitations

  • Comment cap is depth-first, not breadth-first. extractComments recurses into the first reply chain before walking siblings. With maxCommentsPerPost: 20 and a deep first thread, you may get 20 nested replies from a single root comment and zero from the other roots. To sample more roots, raise the cap.
  • Comments are fetched with sort=best only. No knob to switch to top/new/controversial for comment ordering — only the post listing sort is configurable.
  • sortBy: "hot" for search queries silently becomes "relevance". Reddit's /search.json does not honor hot, so the actor rewrites it. If you set top for search, top is preserved.
  • maxRequestsPerCrawl: 500 is a request budget, not a post budget. Each listing page costs 1 request and returns up to 100 posts; with includeComments enabled, each post costs 1 additional request for its comment thread. 100 posts with comments = 1 + 100 = 101 requests. Multi-source runs share this 500-request pool, so budget accordingly.
  • maxPostsPerSource is per source. 5 subreddits × 50 = up to 250 posts per run.
  • Apify Free plan = datacenter proxy only. Some subreddits return 403 from datacenter IPs. The actor falls back gracefully but yield drops. For reliable runs, use a paid Apify plan with residential access, or run locally.
  • old.reddit.com is a long-lived but not-officially-supported subdomain. Reddit could deprecate it; if that happens, this actor would need a host swap to www.reddit.com and likely OAuth.
  • No JSON Listing schema validation. Malformed responses (HTML 502 instead of JSON, anti-bot challenge) are caught and skipped silently with a log.warning. A run with all warnings looks like a successful zero-record run.
  • thumbnail is normalized to null when Reddit returns sentinel values "self" or "default".
  • selfText, linkUrl, flair, domain can be null for posts without those fields.
  • Reddit rate limits are unpublished. The 2–5s delay plus maxConcurrency: 1 makes 429s rare in practice but not impossible. If a 429 persists through the 3 internal retries, the request is dropped and the run can finish with zero records and no error.
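To make the depth-first comment cap concrete, here is a small sketch of a capped depth-first walk. It is an illustration of the behaviour described in the first limitation, not the actor's extractComments code: extract_depth_first is a hypothetical name, and the tree uses a simplified shape (a plain replies list per node) rather than Reddit's raw comment JSON.

```python
def extract_depth_first(nodes, cap, depth=0, out=None):
    """Collect comments depth-first, stopping once `cap` comments are gathered."""
    if out is None:
        out = []
    for node in nodes:
        if len(out) >= cap:
            break  # cap reached: remaining siblings are never visited
        out.append({"id": node["id"], "depth": depth})
        # Recurse into this comment's replies before moving on to its siblings.
        extract_depth_first(node.get("replies", []), cap, depth + 1, out)
    return out


# Three root comments; the first one carries a reply chain three levels deep.
tree = [
    {"id": "a", "replies": [
        {"id": "a1", "replies": [
            {"id": "a2", "replies": [
                {"id": "a3", "replies": []},
            ]},
        ]},
    ]},
    {"id": "b", "replies": []},
    {"id": "c", "replies": []},
]

capped = [c["id"] for c in extract_depth_first(tree, cap=4)]
raised = [c["id"] for c in extract_depth_first(tree, cap=6)]
```

With a cap of 4, every collected comment comes from root a and roots b and c are dropped; raising the cap to 6 is what brings the other roots in.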

Quick start

  1. Click Try for free above.
  2. Add subreddit names (the r/ prefix is optional) to subreddits, or search terms to searchQueries.
  3. Set maxPostsPerSource (default 50, max 500).
  4. Run. Download JSON or CSV from Storage → Dataset.

Programmatic example:

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("knotless_cadence/reddit-discussion-scraper").call(
    run_input={
        "subreddits": ["startups", "SaaS", "Entrepreneur"],
        "maxPostsPerSource": 50,
        "sortBy": "top",
        "timeFilter": "month",
        "includeComments": True,
        "maxCommentsPerPost": 30,
    }
)

# Stream the scraped posts from the run's default dataset.
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(post["score"], post["title"])
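Since blocked or malformed responses are skipped with only a warning, a run can finish successfully with an empty dataset. A caller may want to treat that as a failure. The sketch below is one hedged way to do so; require_items and EmptyRunError are hypothetical helpers, not part of apify_client or the actor.

```python
class EmptyRunError(RuntimeError):
    """Raised when a finished run yields fewer records than expected."""


def require_items(items, min_expected=1):
    """Materialize `items` and fail loudly if too few records came back."""
    items = list(items)
    if len(items) < min_expected:
        raise EmptyRunError(
            f"expected at least {min_expected} record(s), got {len(items)}; "
            "check the run log for warnings about blocked or malformed responses"
        )
    return items


# Usage with the client from the example above:
#   posts = require_items(client.dataset(run["defaultDatasetId"]).iterate_items())
```

The threshold is a judgment call: 1 catches fully blocked runs, while a higher value can flag runs where only some sources were blocked.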

FAQ

Why the JSON API instead of HTML scraping? HTML scrapers tend to break whenever Reddit updates its design. The .json endpoints return structured data in a format that has been stable for years, the same shape used by Reddit's own apps and third-party clients.

Can I scrape private subreddits? No. Only publicly accessible subreddits and public Reddit search results.

Does it need Reddit credentials? No. All requests go to public endpoints.

How many posts per run? Up to maxPostsPerSource per subreddit / per search query, capped at 500 by input_schema. Whole-run total = len(sources) × maxPostsPerSource, bounded by the 500-request crawler budget.
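The interaction between maxPostsPerSource and the 500-request budget can be checked with a little arithmetic before launching a run. The sketch below is an upper-bound estimate under stated assumptions: every source returns its full quota, each listing page holds 100 posts, and retries are not counted (whether retries draw from the budget is not stated here). estimate_requests and fits_budget are hypothetical helpers.

```python
import math

REQUEST_BUDGET = 500  # maxRequestsPerCrawl
PAGE_SIZE = 100       # posts returned per listing request


def estimate_requests(num_sources, max_posts_per_source, include_comments=True):
    """Upper-bound request count, assuming every source fills its quota."""
    listing = num_sources * math.ceil(max_posts_per_source / PAGE_SIZE)
    comments = num_sources * max_posts_per_source if include_comments else 0
    return listing + comments


def fits_budget(num_sources, max_posts_per_source, include_comments=True):
    """True if the estimated request count stays within the crawler budget."""
    estimate = estimate_requests(num_sources, max_posts_per_source, include_comments)
    return estimate <= REQUEST_BUDGET


single = estimate_requests(1, 100)   # 1 listing page + 100 comment requests
multi = estimate_requests(5, 50)     # 5 listing pages + 250 comment requests
```

Under these assumptions, one source at the maximum of 500 posts with comments needs 5 + 500 = 505 requests and would not fit the budget, while the same run without comments needs only 5.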


Proof of delivery: 31 published Apify actors (78 total in portfolio). The flagship Trustpilot scraper has 951 lifetime production runs; this Reddit scraper has 82+ runs. One paid 3-article series shipped in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.

Sample request? Email the word "sample" to spinov001@gmail.com and we'll send 2 published case-study articles within 24 hours.


Need a custom Reddit variant?

Common asks delivered for paying clients:

  • Sentiment dashboard — daily sentiment scoring on a list of subreddits, fed into Looker/Metabase
  • Keyword alerts — webhook fires the moment a brand/term appears in target subreddits
  • Competitor tracking — pull all mentions of competitor names, summarize weekly
  • Comment-thread expansion — recursive comment graph for any post, exportable as edge list
  • Cross-source merge — Reddit + HackerNews + Bluesky into one normalized feed

| Tier | Price | Includes |
|---|---|---|
| Pilot | $97 | 1 custom actor or modification, 7-day support |
| Standard | $297 | Custom actor + Slack/email alerts on results, 30-day support |
| Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round |

Email: spinov001@gmail.com
Blog (case studies + writeups): https://blog.spinov.online
Telegram channel (scraping & data engineering tips): https://t.me/scraping_ai

Pilot pricing while we grow our public portfolio. Most pilots delivered inside 48–72 hours.



Honest disclosure

  • Public Reddit data only. We do not scrape private user data, accounts, deleted content from caches, or anything behind a login.
  • Independent project — not affiliated with Reddit, Inc.
  • This actor is maintained by the same author who runs apify.com/knotless_cadence (78 actors, 31 public).