Reddit Comments Search Scraper

Under maintenance

Pricing: from $4.99 / 1,000 results

Rating: 0.0 (0)

Developer: API Empire (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 7 days ago

🔍 Reddit Search Scraper

Scrape Reddit search results and subreddit listings at scale. Paste any Reddit URL (search, subreddit, or subreddit search) and the actor paginates Reddit's public JSON API, returns clean structured records, and saves each post to the dataset as it is scraped.

💡 Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.
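
The pagination step mentioned above can be sketched roughly as follows. This is not the actor's source code, only an illustration of how Reddit's public JSON listings are commonly paged: appending `.json` to the path and passing the `after` cursor from the previous response. The helper names `to_json_url` and `extract_page` are hypothetical.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_json_url(url: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Turn a Reddit page URL into the matching public JSON listing URL."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    query = dict(parse_qsl(parts.query))  # keep the caller's q / sort / etc.
    query["limit"] = str(limit)
    if after:
        query["after"] = after  # cursor taken from the previous page
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


def extract_page(payload: dict) -> Tuple[List[dict], Optional[str]]:
    """Return (posts, next_cursor) from one listing response."""
    data = payload.get("data", {})
    posts = [child["data"] for child in data.get("children", [])]
    return posts, data.get("after")  # None means there are no more pages


if __name__ == "__main__":
    print(to_json_url("https://www.reddit.com/search/?q=ai&sort=new"))
    print(to_json_url("https://www.reddit.com/r/python/", after="t3_abc"))
```

A caller would fetch the first URL, pass the decoded body to `extract_page`, and repeat with the returned cursor until it is `None` or `maxItems` is reached.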


✨ Why choose this Actor?

  • 🚀 Fast: pure async HTTP, no headless browser overhead.
  • 🛡️ Smart proxy ladder: starts direct, falls back automatically to datacenter → residential on 403/429/blocked responses, and stays on residential once it kicks in.
  • 🔁 Resilient: per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
  • 💾 Live saving: every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
  • 🧱 Bulk URLs: feed it any number of Reddit URLs in one run.
  • 📊 Pre-built dataset views: Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
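
The "jittered backoff" mentioned in the list above is not specified further, so the following is only a minimal sketch of one common variant ("full jitter"), where each wait is drawn uniformly between zero and an exponentially growing ceiling. The function names and the `base` and `cap` values are assumptions, not the actor's settings.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Full-jitter delay in seconds for a zero-based retry attempt."""
    ceiling = min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, ... up to cap
    return rng.uniform(0.0, ceiling)


def retry_delays(max_retries: int = 3, seed=None) -> list:
    """Delays that would be slept before each retry (seed is for reproducibility)."""
    rng = random.Random(seed)
    return [backoff_delay(attempt, rng=rng) for attempt in range(max_retries)]


if __name__ == "__main__":
    for attempt, delay in enumerate(retry_delays(3)):
        print(f"retry {attempt + 1}: wait {delay:.2f}s")
```

Randomizing the wait spreads retries out so that parallel requests do not all hit Reddit again at the same instant.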

🎯 Key features

  • 🌐 Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
  • 🔎 Optional keyword fallback when no URLs are supplied
  • 📊 Sort by Relevance / Hot / Top / New / Most Comments
  • 🔞 Safe-search toggle
  • 📦 Hard cap on total items via maxItems
  • 🛡️ Default no-proxy, auto-escalating fallback ladder
  • 📝 Detailed real-time logs so you can watch progress live

📥 Input

{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

| Field | Type | Description |
|---|---|---|
| urls | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| query | string | Keyword fallback used only when urls is empty. |
| sort | enum | relevance / hot / top / new / comments. |
| safeSearch | enum | off (include NSFW) or on (hide NSFW). |
| maxItems | integer | Hard cap on total posts across all URLs. |
| maxRetries | integer | Per-request retries before escalating proxy tier. |
| proxyConfiguration | object | Standard Apify proxy input. Defaults to no proxy. |
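
For programmatic runs, the input above could be assembled and submitted along these lines. This is a sketch, not official usage: `build_run_input` and `run_actor` are hypothetical helpers, the actor ID is a placeholder you must replace with the real one from the Store page, and the call relies on the `apify-client` Python package.

```python
ALLOWED_SORTS = {"relevance", "hot", "top", "new", "comments"}


def build_run_input(urls=None, query="", sort="relevance", safe_search="off",
                    max_items=300, max_retries=3) -> dict:
    """Validate the options and return an input object shaped like the example above."""
    urls = list(urls or [])
    if not urls and not query:
        raise ValueError("provide at least one URL or a fallback query")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if safe_search not in {"on", "off"}:
        raise ValueError("safe_search must be 'on' or 'off'")
    run_input = {
        "urls": [{"url": u} for u in urls],
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        "proxyConfiguration": {"useApifyProxy": False},
    }
    if query:
        run_input["query"] = query  # only used by the actor when urls is empty
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Start the actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    payload = build_run_input(urls=["https://www.reddit.com/r/python/"], sort="new", max_items=50)
    print(payload)
    # items = run_actor("<APIFY_TOKEN>", "<username>/<actor-name>", payload)  # placeholders
```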

📤 Output

Each dataset record keeps the nested reference shape (post, subreddit, author) and adds top-level mirror fields (title, subreddit_name, author_name, score, comment_count, url) so the table views work without nested-path lookups:

{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
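
To make the relationship between the nested and mirrored fields concrete, here is a small sketch that derives the top-level mirror fields from the nested objects. The function name `add_mirror_fields` is hypothetical, and the mapping is inferred from the sample record above rather than taken from the actor's code.

```python
def add_mirror_fields(record: dict) -> dict:
    """Return a copy of the record with flat mirror fields filled from nested ones."""
    post = record.get("post", {})
    flat = dict(record)  # shallow copy; nested objects are left untouched
    flat.update({
        "title": post.get("title"),
        "subreddit_name": record.get("subreddit", {}).get("name"),
        "author_name": record.get("author", {}).get("name"),
        "score": post.get("score"),
        "comment_count": post.get("comment_count"),
        "url": post.get("url"),
    })
    return flat


if __name__ == "__main__":
    nested = {
        "post": {"title": "Example", "url": "https://www.reddit.com/r/x/comments/1/", "score": 10, "comment_count": 2},
        "subreddit": {"name": "x"},
        "author": {"name": "someone"},
    }
    print(add_mirror_fields(nested))
```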

🚀 How to use the Actor (via Apify Console)

  1. 🔐 Log in at console.apify.com → Actors.
  2. 🔎 Find Reddit Search Scraper and open it.
  3. 📝 Paste one or more Reddit URLs (or type a keyword in the query field).
  4. ⚙️ Pick a sort (Relevance / Hot / Top / New / Most Comments) and set maxItems.
  5. 🛡️ Leave Proxy on default (no proxy); the scraper auto-escalates if Reddit pushes back.
  6. ▶️ Click Start.
  7. 📊 Watch logs in real time; open the Output tab as records stream in.
  8. 📁 Export to JSON / CSV / Excel.

🛡️ Proxy strategy

The scraper uses a three-tier ladder:

| Tier | When it's used |
|---|---|
| 🌐 Direct | Default. Reddit's public JSON API rarely needs a proxy. |
| 🏒 Datacenter | Auto-engaged if direct requests get 403 / 429 / blocked. |
| 🏠 Residential | Auto-engaged if datacenter still fails. Retries up to 3× then sticks for the rest of the run. |

You can also start higher up the ladder by selecting a proxy group in the input.
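
The escalation logic described above might look roughly like this. It is a simplified sketch, not the actor's implementation: it assumes a fixed number of attempts per tier, treats only 403 and 429 as "blocked", and omits the backoff sleeps. The `fetch` callable and all names are hypothetical.

```python
TIERS = ("direct", "datacenter", "residential")
BLOCK_STATUSES = {403, 429}


def fetch_with_ladder(fetch, start_tier: str = "direct", attempts_per_tier: int = 3):
    """Call fetch(tier) -> HTTP status, escalating tiers while responses are blocked.

    Returns (last_status, tier_used, log), where log lists every (tier, status) tried.
    """
    log = []
    status = None
    tier = start_tier
    for tier in TIERS[TIERS.index(start_tier):]:
        for _ in range(attempts_per_tier):
            status = fetch(tier)
            log.append((tier, status))
            if status not in BLOCK_STATUSES:
                return status, tier, log  # success or a non-block error: stop here
    return status, tier, log  # every tier exhausted; the caller should end the run


if __name__ == "__main__":
    # Simulated responses: direct and datacenter are blocked, residential works.
    responses = {"direct": 403, "datacenter": 429, "residential": 200}
    status, tier, log = fetch_with_ladder(lambda t: responses[t])
    print(status, tier, len(log))
```

Passing the returned tier back in as `start_tier` for the next request gives the "sticks for the rest of the run" behaviour, and starting with a higher `start_tier` corresponds to selecting a proxy group in the input.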


💼 Best use cases

  • 🤖 Building AI / LLM training datasets from Reddit discussion
  • 📊 Brand monitoring & sentiment analysis
  • 🧠 Market research and competitive intelligence
  • 📈 Content trend discovery
  • 🔬 Academic research on online communities

❓ Frequently asked questions

Q: Does it scrape comments? A: This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json.

Q: Does it support private subreddits? A: No, only publicly accessible subreddits and search results.

Q: What happens if Reddit blocks me? A: The scraper auto-escalates the proxy tier and retries. If even residential fails after 3 retries, the run ends with a clear status message.
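
For the comments question above, extending the workflow to fetch `<permalink>.json` would mean parsing Reddit's comment tree. The sketch below assumes the usual response layout: a two-element array whose second element is a listing of `t1` comment objects, each with an optional nested `replies` listing. The function names are hypothetical and nothing here is part of this actor.

```python
def flatten_comments(listing: dict, depth: int = 0) -> list:
    """Walk a Reddit comment listing depth-first and return flat comment dicts."""
    flat = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" placeholders and other kinds
            continue
        data = child["data"]
        flat.append({
            "author": data.get("author"),
            "body": data.get("body"),
            "score": data.get("score"),
            "depth": depth,
        })
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means no replies
            flat.extend(flatten_comments(replies, depth + 1))
    return flat


def comments_from_permalink_payload(payload: list) -> list:
    """payload[0] is the post listing; payload[1] holds the comment tree."""
    return flatten_comments(payload[1])


if __name__ == "__main__":
    reply = {"kind": "t1", "data": {"author": "b", "body": "reply", "score": 1, "replies": ""}}
    top = {"kind": "t1", "data": {"author": "a", "body": "top", "score": 3,
                                  "replies": {"kind": "Listing", "data": {"children": [reply]}}}}
    sample = [{"kind": "Listing", "data": {"children": []}},
              {"kind": "Listing", "data": {"children": [top, {"kind": "more", "data": {}}]}}]
    for comment in comments_from_permalink_payload(sample):
        print(comment)
```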


📨 Support and feedback

For issues, custom features, or feedback: dev.scraperengine@gmail.com


  • Only collect data from publicly accessible Reddit pages.
  • Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
  • The end user is responsible for downstream use of the data.