Reddit Comments Search Scraper

Under maintenance

Pricing

from $4.99 / 1,000 results

Scrape Reddit comments by URL or keyword. Returns structured records with subreddit, author, score, comment count, content, and timestamps. Falls back automatically through direct → datacenter → residential proxies if Reddit rate-limits the request.

Developer

Scrapier

Maintained by Community

🔍 Reddit Search Scraper

Scrape Reddit search results and subreddit listings at scale. Paste any Reddit URL (search, subreddit, or subreddit search) and the actor paginates Reddit's public JSON API, returns clean structured records, and saves each post to the dataset as it is scraped.

💡 Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.


✨ Why choose this Actor?

  • 🚀 Fast: pure async HTTP, no headless browser overhead.
  • 🛡️ Smart proxy ladder: starts direct, falls back automatically to datacenter → residential on 403/429/blocked responses, and stays on residential once it kicks in.
  • 🔁 Resilient: per-request retries with jittered backoff, and up to 3 retries on the residential tier before giving up.
  • 💾 Live saving: every post is pushed to the dataset as it's scraped, so a mid-run crash does not lose posts that were already scraped.
  • 🧱 Bulk URLs: feed it any number of Reddit URLs in one run.
  • 📊 Pre-built dataset views: Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.

🎯 Key features

  • 🌐 Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
  • 🔎 Optional keyword fallback when no URLs are supplied
  • 📊 Sort by Relevance / Hot / Top / New / Most Comments
  • 🔞 Safe-search toggle
  • 📦 Hard cap on total items via maxItems
  • 🛡️ No proxy by default, with an auto-escalating fallback ladder
  • 📝 Detailed real-time logs so you can watch progress live
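The pagination and maxItems cap listed above can be pictured with a short Python sketch. It is an illustration under assumptions, not the actor's source: `fetch_page` is a hypothetical callable that returns one parsed listing page, and the `data.children` / `data.after` layout is the shape Reddit's public listing JSON normally uses.

```python
from typing import Callable, Iterator, Optional

# Hypothetical callable: takes an `after` cursor (None for the first page)
# and returns one parsed Reddit listing page as a dict.
FetchPage = Callable[[Optional[str]], dict]


def iter_posts(fetch_page: FetchPage, max_items: int) -> Iterator[dict]:
    """Yield posts page by page until the cursor runs out or the cap is hit."""
    after: Optional[str] = None
    emitted = 0
    while emitted < max_items:
        listing = fetch_page(after).get("data", {})
        children = listing.get("children", [])
        if not children:
            return
        for child in children:
            if emitted >= max_items:
                return
            # Yield one post at a time so a caller can save it immediately.
            yield child["data"]
            emitted += 1
        after = listing.get("after")
        if after is None:  # a null cursor marks the last page
            return
```

Because the generator yields each post as soon as it is parsed, a caller can push records to storage one by one, which is the idea behind the live-saving behaviour described above.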

📥 Input

{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}

| Field | Type | Description |
| --- | --- | --- |
| urls | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| query | string | Keyword fallback used only when urls is empty. |
| sort | enum | relevance / hot / top / new / comments. |
| safeSearch | enum | off (include NSFW) or on (hide NSFW). |
| maxItems | integer | Hard cap on total posts across all URLs. |
| maxRetries | integer | Per-request retries before escalating proxy tier. |
| proxyConfiguration | object | Standard Apify proxy input. Defaults to no proxy. |
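
As a rough illustration of the input fields above, the following Python sketch assembles and sanity-checks an input object before a run. The helper name `build_run_input` and its validation rules (for example, rejecting a call with neither URLs nor a query) are assumptions made for this example and are not part of the actor.

```python
from typing import List, Optional

ALLOWED_SORTS = {"relevance", "hot", "top", "new", "comments"}
ALLOWED_SAFE_SEARCH = {"off", "on"}


def build_run_input(
    urls: Optional[List[str]] = None,
    query: str = "",
    sort: str = "relevance",
    safe_search: str = "off",
    max_items: int = 300,
    max_retries: int = 3,
) -> dict:
    """Build an input dict in the shape shown in the Input section."""
    urls = urls or []
    # Assumption: an input with no URLs and no query has nothing to scrape.
    if not urls and not query:
        raise ValueError("provide at least one URL or a fallback query")
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    if safe_search not in ALLOWED_SAFE_SEARCH:
        raise ValueError(f"unknown safeSearch value: {safe_search!r}")
    if max_items < 1 or max_retries < 0:
        raise ValueError("maxItems must be >= 1 and maxRetries >= 0")
    return {
        "urls": [{"url": u} for u in urls],
        "query": query,
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        # Default from the table above: no proxy unless escalation is needed.
        "proxyConfiguration": {"useApifyProxy": False},
    }
```

With the apify-client Python package, a dict like this would typically be passed as run_input to ApifyClient(token).actor(actor_id).call(...). The actor ID is not stated in this text, so it is left as a placeholder here.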

📤 Output

Each dataset record keeps the nested reference shape (post, subreddit, author) and adds a few top-level mirror fields (title, subreddit_name, author_name, score, comment_count, url) so the table views work without nested-path lookups:

{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
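
For downstream processing, a record in the shape above could be handled along these lines. This is a minimal Python sketch: the helper names are hypothetical, and the timestamp format string is inferred from the single sample value shown, so other records may need a more tolerant parser.

```python
from datetime import datetime
from typing import List

# Top-level mirror field -> (nested object, nested key), as in the sample record.
MIRROR_FIELDS = {
    "title": ("post", "title"),
    "url": ("post", "url"),
    "score": ("post", "score"),
    "comment_count": ("post", "comment_count"),
    "subreddit_name": ("subreddit", "name"),
    "author_name": ("author", "name"),
}


def parse_created(record: dict) -> datetime:
    """Parse created_timestamp into a timezone-aware datetime."""
    # Format inferred from the sample: 2026-04-30T12:34:21.000000+0000
    return datetime.strptime(record["created_timestamp"], "%Y-%m-%dT%H:%M:%S.%f%z")


def mirror_mismatches(record: dict) -> List[str]:
    """Return the mirror fields whose value differs from the nested original."""
    mismatched = []
    for flat_key, (group, nested_key) in MIRROR_FIELDS.items():
        if record.get(flat_key) != record.get(group, {}).get(nested_key):
            mismatched.append(flat_key)
    return mismatched
```

An empty list from mirror_mismatches means the flat columns and the nested objects agree, so either form can be used in an export.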

🚀 How to use the Actor (via Apify Console)

  1. 🔐 Log in at console.apify.com → Actors.
  2. 🔎 Find Reddit Search Scraper and open it.
  3. 📝 Paste one or more Reddit URLs (or type a keyword in the query field).
  4. ⚙️ Pick a sort (Relevance / Hot / Top / New / Most Comments) and set maxItems.
  5. 🛡️ Leave Proxy on default (no proxy). The scraper auto-escalates if Reddit pushes back.
  6. ▶️ Click Start.
  7. 📊 Watch logs in real time; open the Output tab as records stream in.
  8. 📁 Export to JSON / CSV / Excel.

🛡️ Proxy strategy

The scraper uses a three-tier ladder:

| Tier | When it's used |
| --- | --- |
| 🌐 Direct | Default. Reddit's public JSON API rarely needs a proxy. |
| 🏢 Datacenter | Engaged automatically if direct requests get 403 / 429 / blocked. |
| 🏠 Residential | Engaged automatically if datacenter still fails. Retries up to 3× before giving up; once engaged, it stays active for the rest of the run. |

You can also start higher up the ladder by selecting a proxy group in the input.
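
The ladder can be sketched in Python as follows. This is an illustrative model only, not the actor's code: the `fetch` callable, the class name, and the backoff constants are hypothetical, and it makes every tier sticky, whereas the text above only states that for the residential tier. It also counts attempts per tier, which is one possible reading of the retry settings.

```python
import random
import time
from typing import Callable, Optional, Tuple

TIERS = ("direct", "datacenter", "residential")
BLOCKED_STATUSES = {403, 429}

# Hypothetical transport: (url, tier) -> (HTTP status, parsed body or None).
Fetch = Callable[[str, str], Tuple[int, Optional[dict]]]


class ProxyLadder:
    def __init__(self, fetch: Fetch, attempts_per_tier: int = 3,
                 base_delay: float = 1.0, start_tier: str = "direct",
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.fetch = fetch
        self.attempts_per_tier = attempts_per_tier
        self.base_delay = base_delay
        self.sleep = sleep
        self.tier_index = TIERS.index(start_tier)  # start higher if requested

    @property
    def tier(self) -> str:
        return TIERS[self.tier_index]

    def get(self, url: str) -> Optional[dict]:
        while True:
            for attempt in range(self.attempts_per_tier):
                status, body = self.fetch(url, self.tier)
                if status not in BLOCKED_STATUSES:
                    return body  # simplification: other errors are not handled
                # Exponential backoff with full jitter between attempts.
                self.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
            if self.tier_index == len(TIERS) - 1:
                raise RuntimeError(f"still blocked on {self.tier} tier: {url}")
            # Escalate and stay escalated for all later requests.
            self.tier_index += 1
```

Keeping the tier index on the object rather than per request is what makes escalation sticky: once a run has been pushed to a higher tier, later requests do not go back to a lower tier that has already been blocked.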


💼 Best use cases

  • 🤖 Building AI / LLM training datasets from Reddit discussion
  • 📊 Brand monitoring & sentiment analysis
  • 🧠 Market research and competitive intelligence
  • 📝 Content trend discovery
  • 🔬 Academic research on online communities

❓ Frequently asked questions

Q: Does it scrape comments? A: This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json.

Q: Does it support private subreddits? A: No, only publicly accessible subreddits and search results.

Q: What happens if Reddit blocks me? A: The scraper auto-escalates the proxy tier and retries. If even residential fails after 3 retries, the run ends with a clear status message.
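
For the comment-thread extension mentioned in the first answer, a starting point might look like the Python sketch below. It only builds the <permalink>.json URL from a record's url field and splits an already-fetched payload. The helper names are hypothetical, and the two-listing payload layout (post first, comments second, with comments marked as kind "t1") is an assumption about Reddit's public JSON to verify before relying on it.

```python
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit


def comments_json_url(post_url: str) -> str:
    """Turn a post permalink into its <permalink>.json URL."""
    parts = urlsplit(post_url)
    path = parts.path.rstrip("/") + ".json"
    # Query string and fragment are dropped; only the permalink path matters.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def split_thread(payload: list) -> Tuple[dict, List[dict]]:
    """Split a parsed <permalink>.json payload into (post, top-level comments)."""
    # Assumed layout: [listing containing the post, listing containing comments].
    post_listing, comment_listing = payload
    post = post_listing["data"]["children"][0]["data"]
    comments = [
        child["data"]
        for child in comment_listing["data"]["children"]
        if child.get("kind") == "t1"  # skip "more" placeholders
    ]
    return post, comments
```

Fetching the URL would go through the same proxy and retry handling as the post listings; nested replies are not walked in this sketch.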


📨 Support and feedback

For issues, custom features, or feedback: dev.scraperengine@gmail.com


  • Only collect data from publicly accessible Reddit pages.
  • Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
  • The end user is responsible for downstream use of the data.