Reddit Scraper Pro — Posts, Comments, Subreddits, No API Key
Pricing
Pay per usage
Reddit scraper via public JSON — posts + comments, no login. 20 fields/post (score, ratio, flair, NSFW). CSV/JSON. 101 runs · 6 users · u30d=2 · 27/30d. Trend research + LLM training data. blog.spinov.online · dev.to/0012303 · spinov001@gmail.com
Developer
Alex
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 6
Monthly active users: 2
Last modified: a day ago
Reddit Discussion Scraper — JSON API, no HTML parsing
Pulls posts (and optionally comment threads) from any public subreddit or Reddit search via Reddit's native JSON endpoint. No login, no Reddit OAuth app, no headless browser.
What you get per post (21 fields verified against src/main.js)
{
  "id": "1b2c3d4",
  "title": "What tools do you use for market research?",
  "author": "startup_founder",
  "subreddit": "Entrepreneur",
  "score": 847,
  "upvoteRatio": 0.94,
  "numComments": 234,
  "createdUtc": "2026-03-17T15:30:00.000Z",
  "url": "https://reddit.com/r/Entrepreneur/comments/...",
  "selfText": "I've been looking for affordable tools.",
  "linkUrl": "https://example.com/article",
  "isVideo": false,
  "thumbnail": "https://...",
  "flair": "Discussion",
  "awards": 3,
  "isNSFW": false,
  "isStickied": false,
  "domain": "self.Entrepreneur",
  "source": "r/Entrepreneur",
  "scrapedAt": "2026-04-29T16:32:00.000Z",
  "comments": [
    {
      "id": "abc123",
      "author": "data_analyst",
      "body": "I use a combination of...",
      "score": 156,
      "createdUtc": "2026-03-17T16:00:00.000Z",
      "depth": 0
    }
  ]
}
Comment objects carry 6 fields: id, author, body, score, createdUtc, depth.
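For spreadsheet or dataframe work, one row per comment is often easier to handle than nested records. The following is a minimal sketch that assumes the record shape documented above; `flatten_post` is a hypothetical helper, not part of the actor, and the assumption that `comments` may be missing or null (for example when `includeComments` is off) is ours.

```python
from typing import Any


def flatten_post(post: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per comment, carrying a few parent-post fields.

    Hypothetical helper. Assumes the documented record shape; a missing
    or null "comments" list is treated as empty (our assumption).
    """
    rows = []
    for comment in post.get("comments") or []:
        rows.append({
            "postId": post["id"],
            "subreddit": post["subreddit"],
            "postScore": post["score"],
            "commentId": comment["id"],
            "commentAuthor": comment["author"],
            "commentScore": comment["score"],
            "depth": comment["depth"],
            "body": comment["body"],
        })
    return rows


# Trimmed version of the sample record shown above.
sample = {
    "id": "1b2c3d4",
    "subreddit": "Entrepreneur",
    "score": 847,
    "comments": [
        {"id": "abc123", "author": "data_analyst",
         "body": "I use a combination of...", "score": 156,
         "createdUtc": "2026-03-17T16:00:00.000Z", "depth": 0},
    ],
}

for row in flatten_post(sample):
    print(row["postId"], row["commentId"], row["depth"])
```

The resulting rows can be passed straight to `csv.DictWriter` or a dataframe constructor.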
Input parameters (verified against .actor/input_schema.json)
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| subreddits | Array | [] | n/a | Subreddit names; the r/ prefix is optional and stripped |
| searchQueries | Array | [] | n/a | Search terms across all of Reddit |
| maxPostsPerSource | Integer | 50 | 1–500 | Cap per subreddit or per search query; multi-source runs multiply |
| includeComments | Boolean | true | n/a | Fetch comment threads for each post |
| maxCommentsPerPost | Integer | 20 | 1–100 | Global cap on comments per post (see depth-first note below) |
| sortBy | String | "hot" | hot/new/top/rising | For search queries: "hot" is silently rewritten to "relevance" |
| timeFilter | String | "week" | hour/day/week/month/year/all | Applies primarily to top sort |
If no subreddits and no searchQueries are provided, the actor falls back to scraping r/technology so the run does not error on empty input.
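The input rules above (defaults, prefix stripping, the search-sort rewrite, and the r/technology fallback) can be sketched in a few lines. This is an illustration of the documented behavior, not the actor's source: `normalize_input` and the `searchSort` key are hypothetical names, and clamping out-of-range values is an assumption (the real input schema may reject them instead).

```python
import re

# Defaults as listed in the parameter table.
DEFAULTS = {
    "subreddits": [],
    "searchQueries": [],
    "maxPostsPerSource": 50,
    "includeComments": True,
    "maxCommentsPerPost": 20,
    "sortBy": "hot",
    "timeFilter": "week",
}


def normalize_input(raw: dict) -> dict:
    """Hypothetical sketch of the documented input handling."""
    cfg = {**DEFAULTS, **raw}

    # The r/ prefix is optional and stripped.
    cfg["subreddits"] = [
        re.sub(r"^/?r/", "", name.strip(), flags=re.IGNORECASE)
        for name in cfg["subreddits"]
    ]

    # Empty input falls back to r/technology so the run does not error.
    if not cfg["subreddits"] and not cfg["searchQueries"]:
        cfg["subreddits"] = ["technology"]

    # Assumption: clamp to the documented ranges rather than raise.
    cfg["maxPostsPerSource"] = max(1, min(500, cfg["maxPostsPerSource"]))
    cfg["maxCommentsPerPost"] = max(1, min(100, cfg["maxCommentsPerPost"]))

    # Search does not honor "hot", so it is rewritten to "relevance".
    cfg["searchSort"] = "relevance" if cfg["sortBy"] == "hot" else cfg["sortBy"]
    return cfg


print(normalize_input({"subreddits": ["r/SaaS", "startups"], "sortBy": "top"}))
```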
Use cases
- Market research — what people say about your product, brand, or industry
- Sentiment analysis — collect posts and comments for NLP / LLM pipelines
- Trend monitoring — track emerging topics across your target subreddits
- Competitive intelligence — monitor competitor mentions and complaints
- Content research — find top questions and topics your audience cares about
- Lead generation — identify users asking for your kind of product/service
How it works
- Fetches https://old.reddit.com/r/<subreddit>/<sortBy>.json and https://old.reddit.com/search.json directly. The old.reddit.com host is less aggressive about IP blocking than www.reddit.com and exposes the same JSON shape.
- Uses Apify Residential Proxy (US) when available; falls back to default Apify proxy; falls back to direct fetch in local development.
- Random 2–5 second delay before every request (rate-limit hygiene).
- Rotates through 4 desktop User-Agent strings per request (Chrome Win/Mac/Linux + Firefox Win).
- Cursor-based pagination via Reddit's after parameter.
- CheerioCrawler with maxConcurrency: 1, maxRequestRetries: 3, crawler-level cap maxRequestsPerCrawl: 500 (request budget: listing requests and comment requests share this pool).
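The fetch loop described in this list can be approximated with the standard library. The sketch below is not the actor's code (which runs on CheerioCrawler in JavaScript); it only illustrates cursor pagination with `after`, the random 2–5 second delay, and User-Agent rotation. The function names and the two User-Agent strings are placeholders of ours, and proxy handling is left out.

```python
import json
import random
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Placeholder strings; the actor's real User-Agent list is not shown here.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]


def listing_url(subreddit: str, sort_by: str, time_filter: str,
                after: Optional[str]) -> str:
    """Build a listing URL; `after` is the cursor from the previous page."""
    params = {"limit": 100, "t": time_filter}
    if after:
        params["after"] = after
    query = urllib.parse.urlencode(params)
    return f"https://old.reddit.com/r/{subreddit}/{sort_by}.json?{query}"


def http_get_json(url: str) -> dict:
    """Fetch one page after a random 2-5 s pause with a rotated User-Agent."""
    time.sleep(random.uniform(2, 5))
    request = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_posts(subreddit: str, sort_by: str = "hot", time_filter: str = "week",
               max_posts: int = 50,
               fetch: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield raw post objects, following the `after` cursor page by page."""
    after = None
    seen = 0
    while seen < max_posts:
        page = fetch(listing_url(subreddit, sort_by, time_filter, after))["data"]
        if not page["children"]:
            return
        for child in page["children"]:
            yield child["data"]
            seen += 1
            if seen >= max_posts:
                return
        after = page.get("after")
        if not after:  # no cursor means this was the last page
            return
```

Passing `fetch` as a parameter keeps the pagination logic testable without network access; calling `iter_posts("startups", max_posts=10)` with the default fetcher would issue live requests.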
Honest limitations
- Comment cap is depth-first, not breadth-first. extractComments recurses into the first reply chain before walking siblings. With maxCommentsPerPost: 20 and a deep first thread, you may get 20 nested replies from a single root comment and zero from the other roots. To sample more roots, raise the cap.
- Comments are fetched with sort=best only. There is no knob to switch to top/new/controversial for comment ordering; only the post listing sort is configurable.
- sortBy: "hot" for search queries silently becomes "relevance". Reddit's /search.json does not honor hot, so the actor rewrites it. If you set top for search, top is preserved.
- maxRequestsPerCrawl: 500 is a request budget, not a post budget. One listing request returns up to 100 posts, and each post with comments costs 1 additional request. 100 posts with comments = 1 + 100 = 101 requests. Multi-source runs share this 500-request pool, so budget accordingly.
- maxPostsPerSource is per source. 5 subreddits × 50 = up to 250 posts per run.
- Apify Free plan = datacenter proxy only. Some subreddits return 403 from datacenter IPs. The actor falls back gracefully but yield drops. For reliable runs, use a paid Apify plan with residential access, or run locally.
- old.reddit.com is a long-lived but not-officially-supported subdomain. Reddit could deprecate it; if that happens, this actor would need a host swap to www.reddit.com and likely OAuth.
- No JSON Listing schema validation. Malformed responses (HTML 502 instead of JSON, anti-bot challenge) are caught and skipped with only a log.warning. A run in which every request warns looks like a successful zero-record run.
- thumbnail is normalized to null when Reddit returns sentinel values "self" or "default".
- selfText, linkUrl, flair, domain can be null for posts without those fields.
- Reddit rate limits are unpublished. The 2–5s delay + concurrency=1 makes 429s rare in practice but not impossible; a persistent 429 produces a silent zero-record result after the 3 internal retries.
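The depth-first cap from the first limitation is easiest to see on a toy tree. The sketch below is a Python re-creation of the described behavior, not the actor's extractComments source; it assumes Reddit's usual comment shape (kind "t1" nodes whose replies field is either an empty string or a nested Listing) and omits createdUtc for brevity. The `_node` helper exists only to build test data.

```python
def extract_comments(children, max_comments, depth=0, out=None):
    """Collect comments depth-first until max_comments is reached.

    Illustrative re-creation of the documented behavior: the cap is
    global, so a deep first thread can use it up before later root
    comments are visited.
    """
    out = [] if out is None else out
    for child in children:
        if len(out) >= max_comments:
            break
        if child.get("kind") != "t1":  # skip non-comment entries such as "more"
            continue
        data = child["data"]
        out.append({
            "id": data["id"],
            "author": data.get("author"),
            "body": data.get("body"),
            "score": data.get("score"),
            "depth": depth,
        })
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means no replies
            extract_comments(replies["data"]["children"], max_comments,
                             depth + 1, out)
    return out


def _node(comment_id, replies=()):
    """Build a minimal comment node for the demo (test data only)."""
    nested = {"kind": "Listing", "data": {"children": list(replies)}} if replies else ""
    return {"kind": "t1", "data": {"id": comment_id, "author": "u",
                                   "body": "text", "score": 1, "replies": nested}}


# Root A has a reply chain A -> A1 -> A2; root B has no replies.
tree = [_node("A", [_node("A1", [_node("A2")])]), _node("B")]

print([c["id"] for c in extract_comments(tree, 3)])  # root B is never reached
print([c["id"] for c in extract_comments(tree, 4)])  # raising the cap reaches B
```

With a cap of 3 the whole budget goes to the first chain; only at 4 does the second root appear, which is why raising maxCommentsPerPost is the way to sample more roots.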
Quick start
- Click Try for free above.
- Add subreddit names (without r/) to subreddits or search terms to searchQueries.
- Set maxPostsPerSource (default 50, max 500).
- Run. Download JSON or CSV from Storage → Dataset.
Programmatic example:
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("knotless_cadence/reddit-discussion-scraper").call(
    run_input={
        "subreddits": ["startups", "SaaS", "Entrepreneur"],
        "maxPostsPerSource": 50,
        "sortBy": "top",
        "timeFilter": "month",
        "includeComments": True,
        "maxCommentsPerPost": 30,
    }
)

# Read the scraped posts from the run's default dataset.
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(post["score"], post["title"])
FAQ
Why the JSON API instead of HTML scraping?
HTML scrapers tend to break whenever Reddit updates its design. The .json endpoints return structured data in a format that has been stable for years, the same shape used by Reddit's own apps and third-party clients.
Can I scrape private subreddits?
No. Only publicly accessible subreddits and public Reddit search results.
Does it need Reddit credentials?
No. All requests go to public endpoints.
How many posts per run?
Up to maxPostsPerSource per subreddit / per search query, capped at 500 by input_schema. Whole-run total = len(sources) × maxPostsPerSource, bounded by the 500-request crawler budget.
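To check whether a planned run fits the 500-request budget, the arithmetic can be written out as below. This is a rough upper-bound sketch: `estimate_requests` is a hypothetical helper, it assumes every source returns its full quota of posts at 100 posts per listing page, and it ignores retries.

```python
import math

REQUEST_BUDGET = 500  # crawler-level maxRequestsPerCrawl from this page


def estimate_requests(num_sources: int, max_posts_per_source: int,
                      include_comments: bool = True,
                      page_size: int = 100) -> int:
    """Upper-bound request count for a run (hypothetical helper).

    Each source needs ceil(posts / page_size) listing requests, plus one
    request per post when comments are fetched.
    """
    listing = num_sources * math.ceil(max_posts_per_source / page_size)
    comments = num_sources * max_posts_per_source if include_comments else 0
    return listing + comments


for sources, posts in [(1, 100), (5, 50), (3, 200)]:
    needed = estimate_requests(sources, posts)
    verdict = "fits" if needed <= REQUEST_BUDGET else "exceeds budget"
    print(f"{sources} sources x {posts} posts: {needed} requests ({verdict})")
```

Under these assumptions, 5 subreddits at the default 50 posts need about 255 requests, while 3 sources at 200 posts each would need about 606 and run out of budget before finishing.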
Proof of delivery: 31 published Apify actors (78 total in portfolio). The flagship Trustpilot scraper has 951 lifetime production runs; this Reddit scraper has 82+ runs. One paid 3-article series shipped in March 2026 ($150, proxy industry). Pilot pricing locked through May 2026.
Sample request? Reply sample to spinov001@gmail.com and we'll send 2 published case-study articles within 24 hours.
Need a custom Reddit variant?
Common asks delivered for paying clients:
- Sentiment dashboard — daily sentiment scoring on a list of subreddits, fed into Looker/Metabase
- Keyword alerts — webhook fires the moment a brand/term appears in target subreddits
- Competitor tracking — pull all mentions of competitor names, summarize weekly
- Comment-thread expansion — recursive comment graph for any post, exportable as edge list
- Cross-source merge — Reddit + HackerNews + Bluesky into one normalized feed
| Tier | Price | Includes |
|---|---|---|
| Pilot | $97 | 1 custom actor or modification, 7-day support |
| Standard | $297 | Custom actor + Slack/email alerts on results, 30-day support |
| Premium | $797 | Custom actor + dashboard + 90-day support + 1 modification round |
Email: spinov001@gmail.com
Blog (case studies + writeups): https://blog.spinov.online
Telegram channel (scraping & data engineering tips): https://t.me/scraping_ai
Pilot pricing while we grow our public portfolio. Most pilots delivered inside 48–72 hours.
More from this author
- 31 published Apify actors (78 total in portfolio): apify.com/knotless_cadence
- Technical write-ups + case studies: blog.spinov.online
- Daily scraping/AI tips: t.me/scraping_ai
Related tools by knotless_cadence on Apify:
- Walmart Reviews Scraper — product reviews to CSV/JSON/Excel, 17 fields per review, bypasses Walmart's 100-review UI cap. Same pure-HTTP pipeline as this Reddit actor.
- Trustpilot Review Scraper — 951 lifetime production runs
- Email Extractor Pro — bulk email extraction from websites
- Google News Scraper — news mention tracking
- Hacker News Scraper — top/new/best/ask/show/job stories with comment trees
Honest disclosure
- Public Reddit data only. We do not scrape private user data, accounts, deleted content from caches, or anything behind a login.
- Independent project — not affiliated with Reddit, Inc.
- This actor is maintained by the same author who runs apify.com/knotless_cadence (78 actors, 31 public).