Reddit Comments Search Scraper
Under maintenance
Pricing: from $4.99 / 1,000 results
Developer
API Empire
Maintained by Community
Reddit Search Scraper
Scrape Reddit search results and subreddit listings at scale. Paste any Reddit URL (search, subreddit, or subreddit search) and the actor paginates Reddit's public JSON API, returns clean structured records, and live-saves each post to the dataset.
Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.
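The pagination mentioned above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: it assumes Reddit's listing shape for `.json` endpoints (`data.children` plus a `data.after` cursor), and the names `to_json_url`, `paginate`, and `fetch_page` are hypothetical. The HTTP call is injected as `fetch_page`, so the sketch itself makes no network requests.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_json_url(url: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Turn a Reddit page URL into its .json listing URL (assumed convention)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    query = dict(parse_qsl(parts.query))
    query["limit"] = str(limit)
    if after:
        query["after"] = after  # cursor returned by the previous page
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


def paginate(url: str, fetch_page: Callable[[str], dict], max_items: int) -> Iterator[dict]:
    """Yield post dicts until the cursor runs out or max_items is reached."""
    after, seen = None, 0
    while seen < max_items:
        listing = fetch_page(to_json_url(url, after))["data"]
        for child in listing["children"]:
            if seen >= max_items:
                return
            yield child["data"]
            seen += 1
        after = listing.get("after")
        if not after:  # no cursor means this was the last page
            return
```

Because `paginate` is a generator, a caller could push each yielded post to storage immediately, which is one way to get the live-saving behaviour described above.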
Why choose this Actor?
- Fast: pure async HTTP, no headless browser overhead.
- Smart proxy ladder: starts direct, then automatically falls back to datacenter and then residential proxies on 403/429/blocked responses, and stays on residential once it kicks in.
- Resilient: per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
- Live saving: every post is pushed to the dataset as it's scraped, so a mid-run crash does not lose posts that were already collected.
- Bulk URLs: feed it any number of Reddit URLs in one run.
- Pre-built dataset views: Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
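To make "retries with jittered backoff" concrete, here is a minimal sketch of one common variant (exponential backoff with full jitter). The actor's real delays and limits are not documented here, so the `base` and `cap` values and the function names are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0,
                   rng: Callable[[], float] = random.random) -> List[float]:
    """Full jitter: delay k is drawn uniformly from [0, min(cap, base * 2**k))."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def with_retries(call: Callable[[], T], max_retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on failure wait a jittered delay and try again."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

The jitter spreads retries from concurrent requests over time instead of letting them hit the server in synchronized bursts.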
Key features
- Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
- Optional keyword fallback when no URLs are supplied
- Sort by Relevance / Hot / Top / New / Most Comments
- Safe-search toggle
- Hard cap on total items via `maxItems`
- Default no-proxy, auto-escalating fallback ladder
- Detailed real-time logs so you can watch progress live
Input
{
  "urls": [
    { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
    { "url": "https://www.reddit.com/r/python/" }
  ],
  "query": "artificial intelligence",
  "sort": "relevance",
  "safeSearch": "off",
  "maxItems": 300,
  "maxRetries": 3,
  "proxyConfiguration": { "useApifyProxy": false }
}
| Field | Type | Description |
|---|---|---|
| `urls` | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| `query` | string | Keyword fallback used only when `urls` is empty. |
| `sort` | enum | `relevance` / `hot` / `top` / `new` / `comments`. |
| `safeSearch` | enum | `off` (include NSFW) or `on` (hide NSFW). |
| `maxItems` | integer | Hard cap on total posts across all URLs. |
| `maxRetries` | integer | Per-request retries before escalating proxy tier. |
| `proxyConfiguration` | object | Standard Apify proxy input. Defaults to no proxy. |
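When the input is generated from code rather than typed into the Console, a small helper can keep it in the documented shape. The sketch below is illustrative only: `build_run_input` is a hypothetical name, and the checks simply mirror the field descriptions in the table; the actor may validate differently.

```python
from typing import Iterable, Optional

SORTS = {"relevance", "hot", "top", "new", "comments"}


def build_run_input(urls: Iterable[str] = (), query: Optional[str] = None,
                    sort: str = "relevance", safe_search: str = "off",
                    max_items: int = 300, max_retries: int = 3) -> dict:
    """Assemble an input object matching the fields documented above."""
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned and not query:
        raise ValueError("provide at least one URL or a fallback query")
    if sort not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    if safe_search not in {"on", "off"}:
        raise ValueError("safeSearch must be 'on' or 'off'")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    run_input = {
        "urls": [{"url": u} for u in cleaned],
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        "proxyConfiguration": {"useApifyProxy": False},  # documented default
    }
    if query:
        run_input["query"] = query  # only used when urls is empty
    return run_input
```

The resulting dictionary can be pasted into the Console's JSON input editor or passed as the run input when starting the actor through the Apify API.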
Output
Each dataset record keeps the nested reference shape (`post`, `subreddit`, `author`) and repeats a few values as top-level mirror fields, so the table views work without nested-path lookups:
{
  "post": {
    "title": "The more young people use AI, the more they hate it",
    "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
    "score": 22036,
    "comment_count": 1612
  },
  "subreddit": { "name": "technology" },
  "author": { "name": "spherocytes" },
  "contentText": "",
  "content_type": "link",
  "created_timestamp": "2026-04-30T12:34:21.000000+0000",
  "title": "The more young people use AI, the more they hate it",
  "subreddit_name": "technology",
  "author_name": "spherocytes",
  "score": 22036,
  "comment_count": 1612,
  "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
}
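For downstream processing, two things in this record are worth handling explicitly: the timestamp format and the duplicated mirror fields. The sketch below assumes every record follows the sample above; `parse_created` and `mirror_fields` are hypothetical helper names.

```python
from datetime import datetime


def parse_created(timestamp: str) -> datetime:
    """Parse created_timestamp, e.g. '2026-04-30T12:34:21.000000+0000'."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")


def mirror_fields(record: dict) -> dict:
    """Recompute the top-level mirror fields from the nested objects."""
    post = record["post"]
    return {
        "title": post["title"],
        "url": post["url"],
        "score": post["score"],
        "comment_count": post["comment_count"],
        "subreddit_name": record["subreddit"]["name"],
        "author_name": record["author"]["name"],
    }


def is_consistent(record: dict) -> bool:
    """True when every mirror field equals its nested source value."""
    return all(record.get(key) == value for key, value in mirror_fields(record).items())
```

A check like `is_consistent` can serve as a cheap sanity test after export, before the records feed an analysis or training pipeline.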
How to use the Actor (via Apify Console)
- Log in at console.apify.com and open Actors.
- Find Reddit Search Scraper and open it.
- Paste one or more Reddit URLs (or type a keyword in the `query` field).
- Pick a `sort` (Relevance / Hot / Top / New / Most Comments) and set `maxItems`.
- Leave Proxy on default (no proxy); the scraper auto-escalates if Reddit pushes back.
- Click Start.
- Watch logs in real time; open the Output tab as records stream in.
- Export to JSON / CSV / Excel.
Proxy strategy
The scraper uses a three-tier ladder:
| Tier | When it's used |
|---|---|
| Direct | Default. Reddit's public JSON API rarely needs a proxy. |
| Datacenter | Auto-engaged if direct requests get 403 / 429 / blocked. |
| Residential | Auto-engaged if datacenter still fails. Retries up to 3 times, then sticks for the rest of the run. |
You can also start higher up the ladder by selecting a proxy group in the input.
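The ladder can be pictured as a small state machine that only ever moves upward. The sketch below is a simplified model under stated assumptions, not the actor's implementation: it escalates on the first blocked response (ignoring the per-request `maxRetries`), treats only 403 and 429 as "blocked", and resets the residential failure count after a success. `Tier` and `ProxyLadder` are hypothetical names.

```python
from enum import IntEnum


class Tier(IntEnum):
    DIRECT = 0
    DATACENTER = 1
    RESIDENTIAL = 2


BLOCKED_STATUSES = {403, 429}  # assumption: other block signals are omitted


class ProxyLadder:
    def __init__(self, start: Tier = Tier.DIRECT, residential_retries: int = 3):
        self.tier = start  # a higher start models selecting a proxy group in the input
        self.residential_retries = residential_retries
        self._residential_failures = 0

    def report(self, status: int) -> bool:
        """Record a response status; return False once the ladder is exhausted."""
        if status not in BLOCKED_STATUSES:
            self._residential_failures = 0
            return True  # success: stay on the current tier, never step down
        if self.tier < Tier.RESIDENTIAL:
            self.tier = Tier(self.tier + 1)  # escalate one rung
            return True
        self._residential_failures += 1
        return self._residential_failures <= self.residential_retries
```

Never stepping back down matches the "sticks for the rest of the run" behaviour in the table: once a run has needed residential proxies, later requests keep using them.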
Best use cases
- Building AI / LLM training datasets from Reddit discussion
- Brand monitoring & sentiment analysis
- Market research and competitive intelligence
- Content trend discovery
- Academic research on online communities
Frequently asked questions
Q: Does it scrape comments?
A: This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json.
Q: Does it support private subreddits?
A: No, only publicly accessible subreddits and search results.
Q: What happens if Reddit blocks me?
A: The scraper auto-escalates the proxy tier and retries. If even residential fails after 3 retries, the run ends with a clear status message.
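For readers who want to follow the first answer and extend the data with comment threads, here is a rough sketch of handling a `<permalink>.json` response. It assumes the response shape Reddit commonly returns for that endpoint: a two-element array (post listing, then comment listing) where comments have kind `t1` and empty `replies` arrive as an empty string. The function names are hypothetical and the HTTP request itself is left out.

```python
from typing import Iterator, List


def comments_url(permalink: str) -> str:
    """Build the JSON URL for a post permalink (the <permalink>.json pattern)."""
    return permalink.rstrip("/") + ".json"


def walk_listing(listing: dict, depth: int = 0) -> Iterator[dict]:
    """Depth-first walk over a comment listing, yielding flat comment rows."""
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # skip "more" stubs and anything unexpected
            continue
        data = child["data"]
        yield {
            "author": data.get("author"),
            "body": data.get("body", ""),
            "score": data.get("score"),
            "depth": depth,
        }
        replies = data.get("replies")
        if isinstance(replies, dict):  # a nested listing; "" means no replies
            yield from walk_listing(replies, depth + 1)


def flatten_comments(payload: List[dict]) -> List[dict]:
    """payload[0] is the post listing, payload[1] the comment listing."""
    return list(walk_listing(payload[1]))
```

The `depth` value preserves the thread structure in a flat table, which tends to be easier to export to CSV than nested replies.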
Support and feedback
For issues, custom features, or feedback: dev.scraperengine@gmail.com
Legal & ethical use
- Only collect data from publicly accessible Reddit pages.
- Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
- The end user is responsible for downstream use of the data.