Reddit Search Scraper — Posts, Comments & Users
Pricing
from $2.00 / 1,000 results
Scrape Reddit subreddit search with no API key or login. Export posts and comments to CSV/JSON — a Reddit API alternative for keyword monitoring.
Rating
0.0
(0)
Developer
Logiover
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 23
Monthly active users: 17
Last modified: 5 days ago
Reddit Search Scraper
Search within any subreddit by keyword + sort + time window, with no login or API key. Returns each matching post (or comment) with title, author, subreddit, body text, permalink and timestamps.
How it works (important)
As of mid-2026 Reddit hard-blocks the legacy search.json API (both www and old.reddit return 403, even over residential proxies with a browser fingerprint). The only logged-out search endpoint still served is the subreddit-scoped Atom feed (/r/{sub}/search.rss).
This actor uses that feed. Consequences you should know before buying:
- A `subreddit` is required on every search. All-of-Reddit (global) search has no working logged-out endpoint anymore and is skipped with a warning.
- ~25 results per search, no pagination. The feed returns at most ~25 of the most relevant items per query and exposes no cursor. To get more coverage, run more searches (different keywords / subreddits / sort+time combos).
- No numeric signals. The RSS feed does not carry `score`, `upvoteRatio`, `numComments`, or `awards`; those lived only on the now-dead `.json` API. They are not returned. If you need upvotes/comment counts, this is not the right tool.
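To make the feed-based approach concrete, here is a minimal Python sketch of building a subreddit-scoped search feed URL and parsing the Atom entries it returns. It is not the actor's actual code. The query parameter names (`q`, `restrict_sr`, `sort`, `t`, `type`), the function names, and the sample feed contents are assumptions for illustration; only the `/r/{sub}/search.rss` path comes from the description above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample feed, shaped like a standard Atom document (assumption).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>t3_abc123</id>
    <title>Example post</title>
    <author><name>/u/example_user</name></author>
    <link href="https://www.reddit.com/r/example/comments/abc123/example_post/"/>
    <published>2026-06-04T16:51:29+00:00</published>
    <updated>2026-06-04T16:51:29+00:00</updated>
  </entry>
</feed>"""


def build_search_feed_url(subreddit, query, sort="relevance", time="all", result_type="link"):
    """Build a subreddit-scoped search feed URL (parameter names are assumed)."""
    params = {
        "q": query,
        "restrict_sr": "1",  # assumed flag that keeps the search inside the subreddit
        "sort": sort,
        "t": time,
        "type": result_type,
    }
    return f"https://www.reddit.com/r/{quote(subreddit, safe='')}/search.rss?{urlencode(params)}"


def parse_feed(xml_text):
    """Turn Atom <entry> elements into plain dicts; missing fields become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        rows.append({
            "fullname": entry.findtext("atom:id", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM_NS),
            "permalink": link.get("href") if link is not None else None,
            "createdAt": entry.findtext("atom:published", namespaces=ATOM_NS),
            "editedAt": entry.findtext("atom:updated", namespaces=ATOM_NS),
        })
    return rows


if __name__ == "__main__":
    print(build_search_feed_url("MachineLearning", "ai agent", sort="new", time="month"))
    print(parse_feed(SAMPLE_FEED))
```

Note that nothing in the parsed entry carries a score or comment count, which is consistent with the "no numeric signals" limitation listed above.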
Features
- Bulk multi-search: pass many `{query, subreddit, sort, time, type}` objects; each runs independently
- All sort modes the feed honors: `relevance`, `hot`, `top`, `new`, `comments`
- All time windows: `hour`, `day`, `week`, `month`, `year`, `all`
- `type`: `link` (posts, default) or `comment`
- Residential-proxy session rotation + UA rotation + 429/403 backoff for reliable feed fetches
- Clean, deduplicated rows with decoded text (handles Reddit's double-encoded entities)
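The last feature (deduplicated rows with double-encoded entities decoded) can be illustrated with a short Python sketch. This is one plausible way to do it, not the actor's implementation; the function names and the choice of `fullname` as the dedupe key are assumptions.

```python
import html


def decode_entities(text, max_passes=3):
    """Unescape HTML entities repeatedly until the text stops changing."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


def dedupe_rows(rows, key="fullname"):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] in seen:
            continue
        seen.add(row[key])
        unique.append(row)
    return unique


if __name__ == "__main__":
    print(decode_entities("AT&amp;amp;T"))
    print(dedupe_rows([{"fullname": "t3_a"}, {"fullname": "t3_b"}, {"fullname": "t3_a"}]))
```

Repeated unescaping has a trade-off: a post that deliberately contains a literal `&amp;` would also be decoded, so the pass limit is best kept small.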
Input
{
  "searches": [
    { "query": "ai agent", "subreddit": "MachineLearning", "sort": "new", "time": "month", "type": "link" },
    { "query": "side hustle", "subreddit": "Entrepreneur", "sort": "top", "time": "all", "type": "link" }
  ],
  "maxResultsPerSearch": 25
}
| Field | Type | Default | Notes |
|---|---|---|---|
| `searches` | array | (required) | Each `{query, subreddit (required), sort?, time?, type?}` |
| `maxResultsPerSearch` | int | 25 | Cap per search. Feed returns ~25 max, so higher values have no effect. |
| `proxyConfig` | object | residential | Reddit blocks datacenter IPs; residential proxy is used by default. |
Sort + time + type
| Field | Options |
|---|---|
| `sort` | `relevance`, `hot`, `top`, `new`, `comments` |
| `time` | `hour`, `day`, `week`, `month`, `year`, `all` |
| `type` | `link` (posts, default), `comment` |
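The input rules in the tables above can be checked before a run with a small validation helper. The following Python sketch is hypothetical. The function name and the defaults for `sort` (`relevance`) and `time` (`all`) are assumptions, since only the `type` default is documented; the allowed values, the required `subreddit`, and the 25-result cap come from the tables.

```python
import warnings

SORTS = {"relevance", "hot", "top", "new", "comments"}
TIMES = {"hour", "day", "week", "month", "year", "all"}
TYPES = {"link", "comment"}
FEED_CAP = 25  # the feed returns ~25 items at most, so larger caps have no effect


def normalize_input(actor_input):
    """Validate searches, fill defaults, and clamp maxResultsPerSearch to the feed cap."""
    searches = []
    for raw in actor_input.get("searches", []):
        if not raw.get("subreddit"):
            # Global search has no working endpoint, so skip with a warning.
            warnings.warn(f"skipping search {raw.get('query')!r}: subreddit is required")
            continue
        search = {
            "query": raw["query"],
            "subreddit": raw["subreddit"],
            "sort": raw.get("sort", "relevance"),  # assumed default
            "time": raw.get("time", "all"),        # assumed default
            "type": raw.get("type", "link"),       # documented default
        }
        for field, allowed in (("sort", SORTS), ("time", TIMES), ("type", TYPES)):
            if search[field] not in allowed:
                raise ValueError(f"unsupported {field}: {search[field]!r}")
        searches.append(search)
    cap = min(int(actor_input.get("maxResultsPerSearch", FEED_CAP)), FEED_CAP)
    return {"searches": searches, "maxResultsPerSearch": cap}


if __name__ == "__main__":
    print(normalize_input({
        "searches": [{"query": "ai agent", "subreddit": "MachineLearning", "sort": "new"}],
        "maxResultsPerSearch": 100,
    }))
```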
Output (one row per result)
{
  "resultType": "link",
  "id": "1twtdob",
  "fullname": "t3_1twtdob",
  "subreddit": "MachineLearning",
  "author": "Intellerce",
  "title": "We built a source-available LLM reliability library",
  "text": "TL;DR: Reliability techniques that boost an LLM's correctness...",
  "url": "https://www.reddit.com/r/MachineLearning/comments/1twtdob/...",
  "permalink": "https://www.reddit.com/r/MachineLearning/comments/1twtdob/...",
  "createdAt": "2026-06-04T16:51:29+00:00",
  "editedAt": "2026-06-04T16:51:29+00:00",
  "searchQuery": { "query": "ai agent", "subreddit": "MachineLearning", "sort": "new", "time": "month", "type": "link" },
  "scrapedAt": "2026-06-06T17:59:00.000Z"
}
The output object also contains `score`, `upvoteRatio`, `numComments`, `awards`, `flair`, `thumbnail`, `isNsfw` and similar fields for schema stability, but they are always `null` in RSS mode (the feed does not carry them). Do not rely on them.
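If the placeholder fields get in the way downstream, they can be dropped after download. A minimal Python sketch, assuming rows have already been loaded as dicts; the helper name is hypothetical and the field list covers only the names mentioned above, not the "similar fields".

```python
# Placeholder fields named in the README; "similar fields" are not enumerated there.
PLACEHOLDER_FIELDS = ("score", "upvoteRatio", "numComments", "awards", "flair", "thumbnail", "isNsfw")


def strip_placeholders(row):
    """Drop the documented placeholder fields, but only when they are actually null."""
    return {
        key: value
        for key, value in row.items()
        if not (key in PLACEHOLDER_FIELDS and value is None)
    }


if __name__ == "__main__":
    print(strip_placeholders({"id": "1twtdob", "title": "Example", "score": None, "flair": None}))
```

Checking for `None` rather than dropping the keys unconditionally keeps the helper safe if a field is ever populated.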
Use cases
- Brand / keyword monitoring inside specific communities (run on a schedule)
- Competitor & topic intel in niche subreddits
- Trend research / content discovery: pull `sort=top, time=week` per subreddit
- Sentiment & NLP pipelines: bulk-ingest post/comment text
Notes
- Residential proxy recommended/used: Reddit blocks Apify datacenter IPs. The actor defaults to the residential pool and rotates sessions on 403/429.
- Want more than ~25 per topic? Add more `searches` entries (vary keyword, sort, and time window); that is the only way to widen coverage given the feed's cap.
- Need scores/comment counts or full comment threads? Reddit no longer exposes these to logged-out clients; use a dedicated OAuth-based Reddit tool instead.
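Widening coverage by varying keyword, sort, and time window amounts to building a cross product of `searches` entries. The Python sketch below generates such an input; the helper name and the default sort/time selections are illustrative assumptions, not recommendations from the actor's author.

```python
from itertools import product


def expand_searches(subreddit, keywords,
                    sorts=("relevance", "new", "top"),
                    times=("week", "month", "all"),
                    result_type="link"):
    """Build one search entry per keyword x sort x time combination."""
    return [
        {"query": keyword, "subreddit": subreddit, "sort": sort, "time": time, "type": result_type}
        for keyword, sort, time in product(keywords, sorts, times)
    ]


if __name__ == "__main__":
    searches = expand_searches("Entrepreneur", ["side hustle", "passive income"])
    actor_input = {"searches": searches, "maxResultsPerSearch": 25}
    print(len(searches))
```

Different combinations will often return overlapping posts, so the combined result set is usually smaller than 25 times the number of searches.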
FAQ
Is this a Reddit API alternative for searching subreddits?
Yes. Reddit's logged-out search.json API is hard-blocked as of mid-2026, so this acts as a no-API-key way to search any subreddit by keyword. It returns posts and comments from the subreddit-scoped feed, with a ~25-result cap per search.
How do I export Reddit posts and comments to CSV or JSON?
Run a search and Apify stores the matching rows in a dataset you can download as CSV or JSON (or pull via API). Each row carries title, author, subreddit, text, permalink and timestamps — ready for a spreadsheet or NLP pipeline.
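For custom pipelines that start from downloaded JSON rows, a flat CSV can also be produced locally. This is a minimal Python sketch using only the standard library; the column selection and function name are assumptions, and the nested `searchQuery` object is left out for simplicity.

```python
import csv
import io

# Assumed column selection; adjust to the fields your pipeline needs.
COLUMNS = ["resultType", "id", "subreddit", "author", "title", "text", "permalink", "createdAt"]


def rows_to_csv(rows, columns=COLUMNS):
    """Serialize result rows to CSV text, leaving missing or null fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the selected columns; csv writes None as an empty cell.
        writer.writerow({column: row.get(column, "") for column in columns})
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([{"resultType": "link", "id": "abc", "title": "Hello, world"}]))
```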
Can I scrape Reddit without an API or login?
Yes — no login, OAuth, or developer app is required. The actor reads Reddit's public subreddit search feed over a residential proxy, so it works without a Reddit account or API credentials.
📝 Changelog
2026-06-07
- 📚 Docs: added coverage for using the actor as a Reddit API alternative, exporting Reddit posts/comments to CSV/JSON, and scraping Reddit without an API key or login.
2026-06-06
- 📚 Docs & schema accuracy pass: README now reflects the RSS-only reality (subreddit required, ~25/search cap, no score/comments). Removed always-null `score`/`numComments` columns from the dataset table; added the populated `text` column.
2026-06-05
- 🛡️ Reliability fix: results no longer dropped by strict output validation — runs complete cleanly.
2026-06-04
- Verified live & refreshed build — reliability/maintenance pass.