Reddit Comments Search Scraper

Pricing: from $4.99 / 1,000 results

Developer: Scraper Engine

Last modified: a month ago

🔍 Reddit Search Scraper

Scrape Reddit search results and subreddit listings at scale. Paste any Reddit URL (search, subreddit, or subreddit search) and the actor pulls clean, structured records from public Reddit data archives, with no Reddit login or API key required, and saves each post to the dataset as soon as it is scraped.

ℹ️ How it works: Reddit shut down unauthenticated access to its public .json endpoints. This actor instead reads from two public Reddit data archives — PullPush (primary, full-text + subreddit search) and Arctic Shift (fallback for subreddit/author queries) — so it keeps working without you registering a Reddit OAuth app.
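
To make the routing above concrete, here is a minimal Python sketch of how an input URL might be classified and matched to an archive. The function names `classify_reddit_url` and `source_order` and the source labels are hypothetical. The rule that every subreddit-scoped query is eligible for the Arctic Shift fallback is an assumption, because the actor's real routing logic is not shown in this README.

```python
from urllib.parse import urlparse, parse_qs


def classify_reddit_url(url: str) -> dict:
    """Classify a Reddit URL as 'search', 'subreddit', or 'subreddit_search'."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    query = parse_qs(parsed.query).get("q", [None])[0]
    subreddit = parts[1] if len(parts) >= 2 and parts[0] == "r" else None

    if subreddit and "search" in parts[2:]:
        kind = "subreddit_search"
    elif subreddit:
        kind = "subreddit"
    elif parts[:1] == ["search"]:
        kind = "search"
    else:
        raise ValueError(f"Unsupported Reddit URL: {url}")
    return {"kind": kind, "subreddit": subreddit, "query": query}


def source_order(kind: str) -> list[str]:
    """Return archive labels in the order they would be tried (assumed)."""
    if kind == "search":
        # Global keyword search: only PullPush is described as handling it.
        return ["pullpush"]
    # Assumption: subreddit-scoped queries may fall back to Arctic Shift.
    return ["pullpush", "arctic_shift"]
```

For example, `classify_reddit_url("https://www.reddit.com/r/python/")` yields a `subreddit` record, for which `source_order` lists PullPush first and Arctic Shift second.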

💡 Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.

✨ Why choose this Actor?

🚀 Fast — pure async HTTP, no headless browser overhead.
🔓 No credentials needed — reads public Reddit archives, so there's no OAuth app, client ID, or rate-limited Reddit key to manage.
🛡️ Smart proxy ladder — starts direct, auto-falls-back to datacenter → residential if an archive rate-limits the request IP, and stays on residential once it kicks in.
🔁 Resilient — per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
💾 Live saving — every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
🧱 Bulk URLs — feed it any number of Reddit URLs in one run.
📊 Pre-built dataset views — Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
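
The "jittered backoff" mentioned above can be illustrated with a short Python sketch. The exact formula the actor uses is not documented here, so this shows the common "full jitter" variant as an assumption. The name `backoff_delay` and the `base` and `cap` defaults are hypothetical.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Full-jitter exponential backoff.

    Returns a delay in seconds drawn uniformly from
    [0, min(cap, base * 2**attempt)], where attempt starts at 0.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Randomizing the whole delay, rather than adding a small random offset, spreads retries from concurrent requests so they do not hit a rate-limited archive at the same moment.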

🎯 Key features

🌐 Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
🔎 Optional keyword fallback when no URLs are supplied
📊 Sort by Relevance / Hot / Top / New / Most Comments
🔞 Safe-search toggle
📦 Hard cap on total items via maxItems
🛡️ Default no-proxy, auto-escalating fallback ladder
📝 Detailed real-time logs so you can watch progress live
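
As an illustration of how a hard `maxItems` cap and live saving can work together across several URLs, here is a minimal Python sketch. It is not the actor's source code: `save_capped` and the `save` callback are hypothetical names, and in a real actor the callback would push each record to the dataset.

```python
from itertools import chain, islice
from typing import Callable, Iterable


def save_capped(streams: Iterable[Iterable[dict]], max_items: int,
                save: Callable[[dict], None]) -> int:
    """Save posts one by one from several per-URL streams, up to max_items total."""
    saved = 0
    # islice stops after max_items posts, so later streams are never consumed.
    for post in islice(chain.from_iterable(streams), max_items):
        save(post)  # saved immediately, so a later crash keeps earlier posts
        saved += 1
    return saved
```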

📥 Input

{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

Field	Type	Description
`urls`	array	Reddit URLs to scrape (search, subreddit, or subreddit search).
`query`	string	Keyword fallback used only when `urls` is empty.
`sort`	enum	`relevance` / `hot` / `top` / `new` / `comments`.
`safeSearch`	enum	`off` (include NSFW) or `on` (hide NSFW).
`maxItems`	integer	Hard cap on total posts across all URLs.
`maxRetries`	integer	Per-request retries before escalating proxy tier.
`proxyConfiguration`	object	Standard Apify proxy input. Defaults to no proxy.
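
The input above can also be built and submitted from code. The following Python sketch assumes the `apify-client` package (`pip install apify-client`) and its `ApifyClient.actor(...).call(...)` and `dataset(...).iterate_items()` methods. The helper names `build_run_input` and `run_actor` are hypothetical, the actor ID is a placeholder you must look up in the Apify Console, and the check that either a URL or a query is present is an assumption about what the actor accepts.

```python
from typing import Optional

ALLOWED_SORTS = {"relevance", "hot", "top", "new", "comments"}


def build_run_input(urls: list[str], query: Optional[str] = None,
                    sort: str = "relevance", safe_search: str = "off",
                    max_items: int = 300, max_retries: int = 3) -> dict:
    """Build an input object matching the fields in the table above."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if safe_search not in {"on", "off"}:
        raise ValueError("safe_search must be 'on' or 'off'")
    if not urls and not query:
        # Assumption: the actor needs at least one URL or a fallback keyword.
        raise ValueError("Provide at least one URL or a query")
    run_input = {
        "urls": [{"url": u} for u in urls],
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        "proxyConfiguration": {"useApifyProxy": False},
    }
    if query:
        run_input["query"] = query
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # imported lazily; third-party package

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call might look like `run_actor("<APIFY_TOKEN>", "<actor-id>", build_run_input(["https://www.reddit.com/r/python/"], sort="new"))`, with both placeholders replaced by your own values.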

📤 Output

Each dataset record keeps the nested reference shape (post, subreddit, author) and adds top-level mirror fields (title, subreddit_name, author_name, score, comment_count, url) so the table views work without nested-path lookups:

{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author":    { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",

  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
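
When consuming these records downstream, a small helper can flatten them and parse the timestamp. The Python sketch below is illustrative only: `parse_created`, `first_present`, and `to_row` are hypothetical names, and the timestamp format string is inferred from the single sample value shown above.

```python
from datetime import datetime


def parse_created(ts: str) -> datetime:
    """Parse a value such as '2026-04-30T12:34:21.000000+0000' into an aware datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def first_present(record: dict, top_key: str, nested_key: str, field: str):
    """Prefer the top-level mirror field, falling back to the nested object."""
    if top_key in record:
        return record[top_key]
    return record.get(nested_key, {}).get(field)


def to_row(record: dict) -> dict:
    """Flatten one dataset record into a single-level row."""
    return {
        "title": first_present(record, "title", "post", "title"),
        "url": first_present(record, "url", "post", "url"),
        "score": first_present(record, "score", "post", "score"),
        "comment_count": first_present(record, "comment_count", "post", "comment_count"),
        "subreddit": first_present(record, "subreddit_name", "subreddit", "name"),
        "author": first_present(record, "author_name", "author", "name"),
        "content_type": record.get("content_type"),
        "created": parse_created(record["created_timestamp"]),
    }
```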

🚀 How to use the Actor (via Apify Console)

🔐 Log in at console.apify.com → Actors.
🔎 Find Reddit Search Scraper and open it.
📝 Paste one or more Reddit URLs (or type a keyword in the query field).
⚙️ Pick a sort (Relevance / Hot / Top / New / Most Comments) and set maxItems.
🛡️ Leave Proxy on default (no proxy). The scraper auto-escalates if an archive rate-limits the request IP.
▶️ Click Start.
📊 Watch logs in real time; open the Output tab as records stream in.
📁 Export to JSON / CSV / Excel.

🛡️ Proxy strategy

The scraper uses a three-tier ladder (the archives can rate-limit a busy IP):

Tier	When it's used
🌐 Direct	Default — the archives usually serve fine without a proxy.
🏢 Datacenter	Auto-engaged if direct requests get 403 / 429 / rate-limited.
🏠 Residential	Auto-engaged if datacenter still fails. Retries on this tier before giving up; once engaged, the run stays on residential.

You can also start higher up the ladder by selecting a proxy group in the input.
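
The ladder can be modeled as a small state machine. The Python sketch below is an interpretation of the table above, not the actor's code: the `ProxyLadder` class, the tier labels, and the choice to treat only HTTP 403 and 429 as "blocked" are assumptions.

```python
TIERS = ["direct", "datacenter", "residential"]
BLOCKED_STATUSES = {403, 429}


class ProxyLadder:
    """Track the current proxy tier and decide what to do after each response."""

    def __init__(self, start: str = "direct", max_retries: int = 3):
        self.index = TIERS.index(start)  # start higher up by passing another tier
        self.max_retries = max_retries
        self.failures = 0

    @property
    def tier(self) -> str:
        return TIERS[self.index]

    def on_response(self, status: int) -> str:
        """Return 'ok', 'retry', or 'give_up' for the given HTTP status."""
        if status not in BLOCKED_STATUSES:
            self.failures = 0
            return "ok"  # the tier is never lowered, so escalation sticks
        self.failures += 1
        if self.failures <= self.max_retries:
            return "retry"  # retry on the same tier
        if self.index < len(TIERS) - 1:
            self.index += 1  # escalate to the next tier
            self.failures = 0
            return "retry"
        return "give_up"  # residential retries exhausted
```

Because `on_response` only ever increases `index`, a run that reaches the residential tier stays there for all later requests.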

📊 Sort & data-source notes

Source: PullPush handles global keyword search and subreddit/author search; Arctic Shift serves subreddit- and author-scoped queries as a fast fallback. Both are public Reddit archives.
Sort mapping: each Reddit sort option maps onto one of the archives' sort fields, so Relevance, Top, and Hot all return the same score-based order:
- 🎯 Relevance / ⭐ Top / 🔥 Hot → highest score first
- 🆕 New → newest created first
- 💬 Most Comments → highest comment count first
Coverage: archives index publicly posted content; very recent posts (last few minutes) or removed content may not appear. Pagination walks backward in time, so large maxItems runs are ordered newest-to-oldest within each time window.
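
The sort mapping and backward pagination described above can be sketched in Python as follows. The field names `score`, `num_comments`, and `created_utc` are assumed from Reddit's usual post fields, and `SORT_MAP`, `paginate_backward`, and the `fetch_page(before, limit)` callback are hypothetical. The real archive query parameters are not documented in this README.

```python
from typing import Callable, Optional

# Assumed mapping from the actor's sort option to (sort field, direction).
SORT_MAP = {
    "relevance": ("score", "desc"),
    "top": ("score", "desc"),
    "hot": ("score", "desc"),
    "new": ("created_utc", "desc"),
    "comments": ("num_comments", "desc"),
}


def paginate_backward(fetch_page: Callable[[Optional[int], int], list],
                      max_items: int, page_size: int = 100) -> list:
    """Collect up to max_items posts by walking backward in time.

    fetch_page(before, limit) must return at most `limit` posts created
    strictly before the `before` timestamp (None means "now").
    """
    collected: list = []
    before: Optional[int] = None
    while len(collected) < max_items:
        page = fetch_page(before, min(page_size, max_items - len(collected)))
        if not page:
            break  # archive has nothing older
        collected.extend(page)
        # Next window ends at the oldest post seen so far. Posts sharing that
        # exact timestamp would be skipped; a real client must handle ties.
        before = min(post["created_utc"] for post in page)
    return collected
```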

💼 Best use cases

🤖 Building AI / LLM training datasets from Reddit discussion
📊 Brand monitoring & sentiment analysis
🧠 Market research and competitive intelligence
📝 Content trend discovery
🔬 Academic research on online communities

❓ Frequently asked questions

Q: Does it scrape comments? A: No. This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json, keeping in mind that Reddit no longer serves its .json endpoints to unauthenticated clients.

Q: Does it support private subreddits? A: No — only publicly accessible subreddits and search results.

Q: Do I need a Reddit account or API key? A: No. The actor reads public Reddit data archives, so there's nothing to register or authenticate.

Q: What happens if an archive rate-limits me? A: The scraper auto-escalates the proxy tier (direct → datacenter → residential) and retries. If every tier still fails, the run ends with a clear status message.