Reddit Comments Search Scraper

Under maintenance

Pricing

from $4.99 / 1,000 results

Rating

0.0

(0)

Developer

Scraper Engine

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

6 days ago

Last modified

Reddit Search Scraper

Scrape Reddit search results and subreddit listings at scale: paste any Reddit URL (search, subreddit, or subreddit search) and the actor paginates Reddit's public JSON API, returns clean structured records, and live-saves each post to the dataset.
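
As a rough illustration of what "paginates Reddit's public JSON API" can mean, the sketch below converts a Reddit page URL into its `.json` listing endpoint and follows the listing's `after` cursor. It is not the actor's source code: `to_json_endpoint`, `paginate`, and the injected `fetch_json` callable are hypothetical names chosen for this example, and the page size of 100 is an assumption.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_json_endpoint(url: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Turn a Reddit page URL into its .json listing URL, optionally with a cursor."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    query = dict(parse_qsl(parts.query))  # keep q=, sort=, etc. from the pasted URL
    query["limit"] = str(limit)
    if after:
        query["after"] = after  # cursor returned by the previous page
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


def paginate(url: str, fetch_json: Callable[[str], dict], max_items: int) -> Iterator[dict]:
    """Yield post dicts page by page until max_items is reached or the cursor runs out.

    fetch_json is a hypothetical callable (URL in, parsed JSON out) so that the
    HTTP client, retries, and proxy handling stay outside this sketch.
    """
    after, seen = None, 0
    while seen < max_items:
        listing = fetch_json(to_json_endpoint(url, after))["data"]
        for child in listing["children"]:
            if seen >= max_items:
                return
            seen += 1
            yield child["data"]
        after = listing.get("after")
        if not after:  # no further pages
            return
```

Because each post is yielded as soon as its page is parsed, a caller can push it to storage immediately, which is the same idea as the live saving described in this README.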

Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.


Why choose this Actor?

  • Fast: pure async HTTP, no headless browser overhead.
  • Smart proxy ladder: starts direct, automatically falls back to datacenter → residential on 403/429/blocked responses, and stays on residential once it kicks in.
  • Resilient: per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
  • Live saving: every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
  • Bulk URLs: feed it any number of Reddit URLs in one run.
  • Pre-built dataset views: Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
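
For readers unfamiliar with "retries with jittered backoff", here is a minimal sketch of the general technique. The actor's real delays and retry policy are not documented here, so the "full jitter" formula, the `base` and `cap` values, and the names `backoff_delay` and `with_retries` are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt)]."""
    source = rng or random
    return source.uniform(0.0, min(cap, base * 2 ** attempt))


def with_retries(call: Callable[[], T], max_retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on an exception, wait a jittered delay and try again.

    Makes at most max_retries + 1 attempts, then re-raises the last error.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                sleep(backoff_delay(attempt))
    assert last_error is not None
    raise last_error
```

Randomizing the wait keeps many concurrent requests from retrying at the same instant after a 429, which is the usual reason to prefer jitter over a fixed delay.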

Key features

  • Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
  • Optional keyword fallback when no URLs are supplied
  • Sort by Relevance / Hot / Top / New / Most Comments
  • Safe-search toggle
  • Hard cap on total items via maxItems
  • Default no-proxy, auto-escalating fallback ladder
  • Detailed real-time logs so you can watch progress live

Input

{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

| Field | Type | Description |
| --- | --- | --- |
| urls | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| query | string | Keyword fallback used only when urls is empty. |
| sort | enum | relevance / hot / top / new / comments. |
| safeSearch | enum | off (include NSFW) or on (hide NSFW). |
| maxItems | integer | Hard cap on total posts across all URLs. |
| maxRetries | integer | Per-request retries before escalating proxy tier. |
| proxyConfiguration | object | Standard Apify proxy input. Defaults to no proxy. |
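
If you assemble this input programmatically, a small helper can catch mistakes before a run starts. The sketch below is only an illustration: `build_run_input` is a hypothetical helper, its default values are copied from the example input above (the actor's real defaults are not stated here, apart from "no proxy"), and the validation rules are assumptions derived from the field descriptions.

```python
from typing import Dict, List, Optional

SORTS = ("relevance", "hot", "top", "new", "comments")
SAFE_SEARCH = ("off", "on")


def build_run_input(urls: Optional[List[str]] = None, query: str = "",
                    sort: str = "relevance", safe_search: str = "off",
                    max_items: int = 300, max_retries: int = 3) -> Dict:
    """Return an input object in the shape shown above, or raise ValueError."""
    cleaned = [u.strip() for u in (urls or []) if u.strip()]
    if not cleaned and not query.strip():
        raise ValueError("supply at least one URL or a fallback query")
    if sort not in SORTS:
        raise ValueError(f"sort must be one of {SORTS}")
    if safe_search not in SAFE_SEARCH:
        raise ValueError(f"safeSearch must be one of {SAFE_SEARCH}")
    if max_items < 1 or max_retries < 0:
        raise ValueError("maxItems must be >= 1 and maxRetries >= 0")
    return {
        "urls": [{"url": u} for u in cleaned],
        "query": query,  # only used by the actor when urls is empty
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        "proxyConfiguration": {"useApifyProxy": False},  # start without a proxy
    }
```

The returned dictionary can be pasted into the Console's JSON input editor or passed as the run input when starting the actor through the Apify API.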

Output

Each dataset record keeps the nested reference shape (post, subreddit, author) and adds a few top-level mirror fields so the table views work without nested-path lookups:

{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
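
When consuming records downstream, it can help to parse `created_timestamp` into a timezone-aware datetime and to confirm that the mirror fields agree with their nested originals. The sketch below assumes every record follows the sample above; `parse_created`, `to_row`, and the `MIRRORS` mapping are names invented for this example, not part of the actor.

```python
from datetime import datetime

# Top-level mirror field -> (nested section, key), as seen in the sample record.
MIRRORS = {
    "title": ("post", "title"),
    "url": ("post", "url"),
    "score": ("post", "score"),
    "comment_count": ("post", "comment_count"),
    "subreddit_name": ("subreddit", "name"),
    "author_name": ("author", "name"),
}


def parse_created(timestamp: str) -> datetime:
    """Parse values like '2026-04-30T12:34:21.000000+0000' into an aware datetime."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")


def to_row(record: dict) -> dict:
    """Flatten one record into a single-level row, checking mirrors on the way."""
    row = {}
    for flat_key, (section, key) in MIRRORS.items():
        nested = record[section][key]
        row[flat_key] = record.get(flat_key, nested)  # fall back to the nested value
        if row[flat_key] != nested:
            raise ValueError(f"mirror field {flat_key!r} disagrees with {section}.{key}")
    row["content_type"] = record.get("content_type")
    row["created"] = parse_created(record["created_timestamp"])
    return row
```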

How to use the Actor (via Apify Console)

  1. Log in at console.apify.com → Actors.
  2. Find Reddit Search Scraper and open it.
  3. Paste one or more Reddit URLs (or type a keyword in the query field).
  4. Pick a sort (Relevance / Hot / Top / New / Most Comments) and set maxItems.
  5. Leave Proxy on default (no proxy); the scraper auto-escalates if Reddit pushes back.
  6. Click Start.
  7. Watch logs in real time; open the Output tab as records stream in.
  8. Export to JSON / CSV / Excel.

Proxy strategy

The scraper uses a three-tier ladder:

| Tier | When it's used |
| --- | --- |
| Direct | Default. Reddit's public JSON API rarely needs a proxy. |
| Datacenter | Auto-engaged if direct requests get 403 / 429 / blocked. |
| Residential | Auto-engaged if datacenter still fails. Retries up to 3× then sticks for the rest of the run. |

You can also start higher up the ladder by selecting a proxy group in the input.
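
The ladder above can be pictured as a small state machine. The following is a simplified sketch, not the actor's implementation: it escalates on the first blocked response (the real actor first spends its per-request `maxRetries`), never steps back down a tier, and treats only 403 and 429 as "blocked". `ProxyLadder` and the injected `send` callable are hypothetical.

```python
from typing import Callable

TIERS = ("direct", "datacenter", "residential")
BLOCKED_STATUSES = {403, 429}


class ProxyLadder:
    """Tracks the current proxy tier for a run and escalates when blocked."""

    def __init__(self, start: str = "direct", residential_retries: int = 3) -> None:
        self.index = TIERS.index(start)  # a run may start higher up the ladder
        self.residential_retries = residential_retries

    @property
    def tier(self) -> str:
        return TIERS[self.index]

    def fetch(self, url: str, send: Callable[[str, str], int]) -> int:
        """Call send(url, tier) until it returns a non-blocked HTTP status.

        send is a hypothetical callable that performs the request through
        the given tier and returns the status code.
        """
        residential_failures = 0
        while True:
            status = send(url, self.tier)
            if status not in BLOCKED_STATUSES:
                return status
            if self.tier != "residential":
                self.index += 1  # escalate; later requests stay on this tier
            else:
                residential_failures += 1
                if residential_failures > self.residential_retries:
                    raise RuntimeError("blocked on every tier, giving up")
```

Keeping the tier on the object rather than per request is what makes the escalation "stick" for the rest of the run.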


Best use cases

  • Building AI / LLM training datasets from Reddit discussion
  • Brand monitoring & sentiment analysis
  • Market research and competitive intelligence
  • Content trend discovery
  • Academic research on online communities

Frequently asked questions

Q: Does it scrape comments? A: This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json.

Q: Does it support private subreddits? A: No, only publicly accessible subreddits and search results.

Q: What happens if Reddit blocks me? A: The scraper auto-escalates the proxy tier and retries. If even residential fails after 3 retries, the run ends with a clear status message.
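
For the comments question above, this is one way the `<permalink>.json` extension might look. It is a sketch only: it relies on Reddit's public JSON returning a two-element array for a post page (the post listing, then the comment listing), assumes the permalink carries no query string, and uses the hypothetical helper names `comments_endpoint`, `walk`, and `extract_comments`. Fetching the URL is left to the caller.

```python
from typing import Iterator, List


def comments_endpoint(permalink: str) -> str:
    """Append .json to a post permalink (assumes no query string is present)."""
    return permalink.rstrip("/") + ".json"


def walk(listing: dict, depth: int = 0) -> Iterator[dict]:
    """Yield comments depth-first from a Reddit comment listing."""
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # skip "more" placeholders and other kinds
            continue
        data = child["data"]
        yield {"author": data.get("author"), "body": data.get("body", ""), "depth": depth}
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means no replies
            yield from walk(replies, depth + 1)


def extract_comments(payload: List[dict]) -> List[dict]:
    """payload is the parsed response: [post listing, comment listing]."""
    return list(walk(payload[1]))
```

Collapsed "more" entries are skipped here, so long threads would need extra requests to expand them.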


Support and feedback

For issues, custom features, or feedback: dev.scraperengine@gmail.com


  • Only collect data from publicly accessible Reddit pages.
  • Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
  • The end user is responsible for downstream use of the data.