Reddit Search Scraper — Posts, Comments & Users

Pricing

from $2.00 / 1,000 results

Scrape Reddit subreddit search with no API key or login. Export posts and comments to CSV/JSON — a Reddit API alternative for keyword monitoring.

Rating: 0.0 (0)

Developer: Logiover (maintained by Community)

Actor stats: 0 bookmarked, 23 total users, 17 monthly active users, last modified 5 days ago

Reddit Search Scraper

Search any subreddit by keyword, sort order, and time window, with no login or API key. Returns each matching post (or comment) with title, author, subreddit, body text, permalink and timestamps.

How it works (important)

As of mid-2026 Reddit hard-blocks the legacy search.json API (both www and old.reddit return 403, even over residential proxies with a browser fingerprint). The only logged-out search endpoint still served is the subreddit-scoped Atom feed (/r/{sub}/search.rss).

This actor uses that feed. Consequences you should know before buying:

  • A subreddit is required on every search. All-of-Reddit (global) search has no working logged-out endpoint anymore and is skipped with a warning.
  • ~25 results per search, no pagination. The feed returns at most ~25 of the most relevant items per query and exposes no cursor. To get more coverage, run more searches (different keywords / subreddits / sort+time combos).
  • No numeric signals. The RSS feed does not carry score, upvoteRatio, numComments, or awards; those lived only on the now-dead .json API. The fields stay in the output schema but are always null. If you need upvotes/comment counts, this is not the right tool.
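
The constraints above can be sketched as a small URL builder for the subreddit-scoped feed. This is a minimal illustration, not the actor's actual code: the function name, the query-parameter names (q, restrict_sr, sort, t, type) and the default sort and time values are assumptions; only the /r/{sub}/search.rss path and the allowed option values come from this README.

```python
from urllib.parse import urlencode

ALLOWED_SORTS = {"relevance", "hot", "top", "new", "comments"}
ALLOWED_TIMES = {"hour", "day", "week", "month", "year", "all"}
ALLOWED_TYPES = {"link", "comment"}


def build_feed_url(query, subreddit, sort="relevance", time="all", type_="link"):
    """Return a subreddit-scoped search feed URL for one search object."""
    if not subreddit:
        # Global search has no logged-out endpoint, so a subreddit is mandatory.
        raise ValueError("subreddit is required")
    if sort not in ALLOWED_SORTS or time not in ALLOWED_TIMES or type_ not in ALLOWED_TYPES:
        raise ValueError("unsupported sort, time, or type")
    params = {
        "q": query,
        "restrict_sr": "1",  # assumed parameter: keep results inside the subreddit
        "sort": sort,
        "t": time,           # assumed parameter name for the time window
        "type": type_,
    }
    return f"https://www.reddit.com/r/{subreddit}/search.rss?{urlencode(params)}"


print(build_feed_url("ai agent", "MachineLearning", sort="new", time="month"))
```

Because the feed exposes no cursor, there is deliberately no page or offset argument here; more coverage comes from building more URLs, not from paging one.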

Features

  • Bulk multi-search: pass many {query, subreddit, sort, time, type} objects; each one runs independently
  • All sort modes the feed honors: relevance, hot, top, new, comments
  • All time windows: hour, day, week, month, year, all
  • type: link (posts, default) or comment
  • Residential-proxy session rotation + UA rotation + 429/403 backoff for reliable feed fetches
  • Clean, deduplicated rows with decoded text (handles Reddit's double-encoded entities)
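
The last feature, decoding double-encoded entities and deduplicating rows, might look roughly like the sketch below. Both helper names are hypothetical and the actor's real cleanup may differ; the idea is to unescape until the text stops changing, and to keep the first row seen for each id.

```python
import html


def decode_entities(text, max_passes=3):
    """Unescape HTML entities repeatedly until the text stops changing."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


def dedupe_rows(rows):
    """Keep the first row seen for each id, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        unique.append(row)
    return unique


print(decode_entities("Q&amp;amp;A"))
print(len(dedupe_rows([{"id": "a"}, {"id": "b"}, {"id": "a"}])))
```

One trade-off of repeated unescaping: text that is meant to show a literal entity such as `&amp;` gets decoded too, so a pipeline that cares about that case should cap max_passes at the known encoding depth.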

Input

{
  "searches": [
    { "query": "ai agent", "subreddit": "MachineLearning", "sort": "new", "time": "month", "type": "link" },
    { "query": "side hustle", "subreddit": "Entrepreneur", "sort": "top", "time": "all", "type": "link" }
  ],
  "maxResultsPerSearch": 25
}

Field | Type | Default | Notes
--- | --- | --- | ---
searches | array | (required) | Each {query, subreddit (required), sort?, time?, type?}
maxResultsPerSearch | int | 25 | Cap per search. Feed returns ~25 max, so higher values have no effect.
proxyConfig | object | residential | Reddit blocks datacenter IPs; residential proxy is used by default.

Sort + time + type

Field | Options
--- | ---
sort | relevance, hot, top, new, comments
time | hour, day, week, month, year, all
type | link (posts, default), comment
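
Since each search is capped at about 25 results, one way to widen coverage is to generate the searches array as a grid over these options. The helper below is a hypothetical sketch (expand_searches is not part of the actor); it only produces input in the shape documented in the Input section.

```python
from itertools import product


def expand_searches(queries, subreddits, sorts=("relevance",), times=("all",), type_="link"):
    """Build one search object per (query, subreddit, sort, time) combination."""
    return [
        {"query": q, "subreddit": s, "sort": so, "time": t, "type": type_}
        for q, s, so, t in product(queries, subreddits, sorts, times)
    ]


run_input = {
    "searches": expand_searches(
        queries=["ai agent"],
        subreddits=["MachineLearning", "Entrepreneur"],
        sorts=["new", "top"],
        times=["week", "month"],
    ),
    "maxResultsPerSearch": 25,
}

# 1 query x 2 subreddits x 2 sorts x 2 time windows
print(len(run_input["searches"]))
```

Overlapping combinations will often return the same posts, so the number of unique rows is usually well below 25 times the number of searches.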

Output (one row per result)

{
  "resultType": "link",
  "id": "1twtdob",
  "fullname": "t3_1twtdob",
  "subreddit": "MachineLearning",
  "author": "Intellerce",
  "title": "We built a source-available LLM reliability library",
  "text": "TL;DR: Reliability techniques that boost an LLM's correctness...",
  "url": "https://www.reddit.com/r/MachineLearning/comments/1twtdob/...",
  "permalink": "https://www.reddit.com/r/MachineLearning/comments/1twtdob/...",
  "createdAt": "2026-06-04T16:51:29+00:00",
  "editedAt": "2026-06-04T16:51:29+00:00",
  "searchQuery": { "query": "ai agent", "subreddit": "MachineLearning", "sort": "new", "time": "month", "type": "link" },
  "scrapedAt": "2026-06-06T17:59:00.000Z"
}

The output object also contains score, upvoteRatio, numComments, awards, flair, thumbnail, isNsfw and similar fields for schema stability, but they are always null in RSS mode (the feed does not carry them). Do not rely on them.
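
To illustrate why those fields come back null, here is a rough sketch of mapping one Atom entry to an output row. The sample entry layout (a t3_ fullname in id, a /u/ prefix on the author name, published and updated timestamps) is an assumption about the feed and is not confirmed by this README; entry_to_row is a hypothetical helper, not the actor's code.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Illustrative entry only; the real feed layout is assumed, not verified.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>t3_1twtdob</id>
    <title>We built a source-available LLM reliability library</title>
    <author><name>/u/Intellerce</name></author>
    <link href="https://www.reddit.com/r/MachineLearning/comments/1twtdob/..."/>
    <published>2026-06-04T16:51:29+00:00</published>
    <updated>2026-06-04T16:51:29+00:00</updated>
  </entry>
</feed>"""


def entry_to_row(entry):
    """Map one Atom <entry> element to a row in the documented output shape."""
    fullname = entry.findtext("atom:id", default="", namespaces=ATOM)
    author = entry.findtext("atom:author/atom:name", default="", namespaces=ATOM)
    link = entry.find("atom:link", ATOM)
    return {
        "id": fullname.split("_", 1)[-1],
        "fullname": fullname,
        "author": author[3:] if author.startswith("/u/") else author,
        "title": entry.findtext("atom:title", default="", namespaces=ATOM),
        "permalink": link.get("href") if link is not None else None,
        "createdAt": entry.findtext("atom:published", namespaces=ATOM),
        "editedAt": entry.findtext("atom:updated", namespaces=ATOM),
        # The feed has no element for these, so they can only be null.
        "score": None,
        "numComments": None,
    }


entry = ET.fromstring(SAMPLE_FEED).find("atom:entry", ATOM)
print(entry_to_row(entry))
```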

Use cases

  • Brand / keyword monitoring inside specific communities (run on a schedule)
  • Competitor & topic intel in niche subreddits
  • Trend research / content discovery — pull sort=top, time=week per subreddit
  • Sentiment & NLP pipelines — bulk-ingest post/comment text

Notes

  • Residential proxy is used by default: Reddit blocks Apify datacenter IPs, so the actor draws from the residential pool and rotates sessions on 403/429.
  • Want more than ~25 per topic? Add more searches entries (vary keyword, sort, and time window) — that is the only way to widen coverage given the feed's cap.
  • Need scores/comment counts or full comment threads? Reddit no longer exposes these to logged-out clients; use a dedicated OAuth-based Reddit tool instead.
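
The rotate-and-retry behavior on 403/429 mentioned in the first note could be sketched as exponential backoff with jitter. The attempt count, delays, and the fetch callable here are all assumptions for illustration; a real fetch would switch proxy session and User-Agent on each attempt.

```python
import random
import time

RETRYABLE = {403, 429}


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url, attempt) until it returns a non-retryable status.

    fetch is a hypothetical callable returning (status_code, body).
    max_attempts is expected to be at least 1.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url, attempt)
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus up to 0.5 s of jitter: ~1 s, ~2 s, ~4 s
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return status, body


def demo_fetch(url, attempt):
    # Stand-in for a network call: blocked twice, then served.
    return (429, "") if attempt < 2 else (200, "<feed/>")


print(fetch_with_backoff(demo_fetch, "https://example.invalid/feed", sleep=lambda s: None))
```

Passing sleep as a parameter keeps the retry loop easy to test without waiting on real delays.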

FAQ

Is this a Reddit API alternative for searching subreddits?

Yes. Reddit's logged-out search.json API is hard-blocked as of mid-2026, so this acts as a no-API-key way to search any subreddit by keyword. It returns posts and comments from the subreddit-scoped feed, with a ~25-result cap per search.

How do I export Reddit posts and comments to CSV or JSON?

Run a search and Apify stores the matching rows in a dataset you can download as CSV or JSON (or pull via API). Each row carries title, author, subreddit, text, permalink and timestamps — ready for a spreadsheet or NLP pipeline.
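
If you download the dataset as JSON and want a trimmed CSV with only some columns, a local conversion is straightforward. The sketch below uses only the Python standard library; the column selection and the rows_to_csv name are illustrative choices, not something the actor prescribes.

```python
import csv
import io

COLUMNS = ["resultType", "id", "subreddit", "author", "title", "text", "permalink", "createdAt"]


def rows_to_csv(rows, columns=COLUMNS):
    """Serialize dataset rows to CSV text, keeping only the chosen columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        # Nested fields such as searchQuery are dropped; missing ones become "".
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()


rows = [{
    "resultType": "link",
    "id": "1twtdob",
    "subreddit": "MachineLearning",
    "author": "Intellerce",
    "title": "We built a source-available LLM reliability library",
    "text": "TL;DR: Reliability techniques that boost an LLM's correctness...",
    "permalink": "https://www.reddit.com/r/MachineLearning/comments/1twtdob/...",
    "createdAt": "2026-06-04T16:51:29+00:00",
    "searchQuery": {"query": "ai agent"},
}]
print(rows_to_csv(rows))
```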

Can I scrape Reddit without an API or login?

Yes — no login, OAuth, or developer app is required. The actor reads Reddit's public subreddit search feed over a residential proxy, so it works without a Reddit account or API credentials.

📝 Changelog

2026-06-07

  • 📚 Docs: added coverage for using the actor as a Reddit API alternative, exporting Reddit posts/comments to CSV/JSON, and scraping Reddit without an API key or login.

2026-06-06

  • 📚 Docs & schema accuracy pass: README now reflects the RSS-only reality (subreddit required, ~25/search cap, no score/comments). Removed always-null score/numComments columns from the dataset table; added the populated text column.

2026-06-05

  • 🛡️ Reliability fix: results no longer dropped by strict output validation — runs complete cleanly.

2026-06-04

  • Verified live & refreshed build — reliability/maintenance pass.