Reddit Comments Search Scraper
Under maintenance
Pricing: from $4.99 / 1,000 results
Scrape Reddit comments by URL or keyword. Returns structured records with subreddit, author, score, comment count, content, and timestamps. Automatically falls back through direct → datacenter → residential proxies if Reddit rate-limits the request.
Developer: Scrapier
Maintained by Community
Reddit Search Scraper
Scrape Reddit search results and subreddit listings at scale: paste any Reddit URL (search, subreddit, or subreddit search) and the actor paginates Reddit's public JSON API, returns clean structured records, and live-saves each post to the dataset.
Built for marketers, researchers, AI/LLM data pipelines, and competitive-intelligence teams who need clean, structured Reddit data without scraping headaches.
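As a rough illustration of what paginating Reddit's public JSON API involves, the sketch below rewrites a Reddit page URL into its `.json` listing URL and reads the `after` cursor from a response page. The helper names `to_json_url` and `parse_listing` are hypothetical, and the actor's actual implementation may differ.

```python
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_json_url(url: str, after: Optional[str] = None, limit: int = 100) -> str:
    """Rewrite a Reddit page URL to its .json listing URL (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    query = dict(parse_qsl(parts.query))
    query["limit"] = str(limit)  # Reddit listings return at most 100 items per page
    if after:
        query["after"] = after  # cursor taken from the previous page's data.after
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


def parse_listing(payload: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Return (posts, next_cursor) from one listing page; posts are kind 't3'."""
    data = payload.get("data", {})
    posts = [c["data"] for c in data.get("children", []) if c.get("kind") == "t3"]
    return posts, data.get("after")
```

A caller would fetch `to_json_url(url)`, pass the decoded JSON to `parse_listing`, and repeat with the returned cursor until it is `None` or the item cap is reached.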
Why choose this Actor?
- Fast: pure async HTTP, no headless browser overhead.
- Smart proxy ladder: starts direct, auto-falls-back to datacenter → residential on 403/429/blocked responses, and stays on residential once it kicks in.
- Resilient: per-request retries with jittered backoff, and 3 retries on the residential tier before giving up.
- Live saving: every post is pushed to the dataset as it's scraped, so a mid-run crash never loses work.
- Bulk URLs: feed it any number of Reddit URLs in one run.
- Pre-built dataset views: Overview, Post, Subreddit, Author, Content, and Full Record tabs in the Apify Console.
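The "retries with jittered backoff" point above can be sketched as follows. This is a minimal illustration that assumes exponential backoff with full jitter; the base delay, cap, and function names are assumptions, not the actor's documented settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to max_retries times if it raises."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Randomizing the delay keeps many concurrent requests from retrying at the same instant after a shared rate-limit response.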
Key features
- Bulk URL input (search URLs, subreddit URLs, subreddit search URLs)
- Optional keyword fallback when no URLs are supplied
- Sort by Relevance / Hot / Top / New / Most Comments
- Safe-search toggle
- Hard cap on total items via `maxItems`
- Default no-proxy, auto-escalating fallback ladder
- Detailed real-time logs so you can watch progress live
Input

    {
        "urls": [
            { "url": "https://www.reddit.com/search/?q=ai&sort=new" },
            { "url": "https://www.reddit.com/r/python/" }
        ],
        "query": "artificial intelligence",
        "sort": "relevance",
        "safeSearch": "off",
        "maxItems": 300,
        "maxRetries": 3,
        "proxyConfiguration": { "useApifyProxy": false }
    }

| Field | Type | Description |
|---|---|---|
| `urls` | array | Reddit URLs to scrape (search, subreddit, or subreddit search). |
| `query` | string | Keyword fallback used only when `urls` is empty. |
| `sort` | enum | `relevance` / `hot` / `top` / `new` / `comments`. |
| `safeSearch` | enum | `off` (include NSFW) or `on` (hide NSFW). |
| `maxItems` | integer | Hard cap on total posts across all URLs. |
| `maxRetries` | integer | Per-request retries before escalating proxy tier. |
| `proxyConfiguration` | object | Standard Apify proxy input. Defaults to no proxy. |
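If you assemble the input programmatically, a small builder that checks the fields in the table above can catch mistakes before a run starts. The sketch below is an assumption-laden example: `build_run_input` is a hypothetical helper, and the numeric bounds it enforces are guesses rather than documented limits.

```python
from typing import Any, Dict, List, Optional

SORTS = {"relevance", "hot", "top", "new", "comments"}
SAFE_SEARCH = {"on", "off"}


def build_run_input(
    urls: Optional[List[str]] = None,
    query: str = "",
    sort: str = "relevance",
    safe_search: str = "off",
    max_items: int = 300,
    max_retries: int = 3,
) -> Dict[str, Any]:
    """Build the actor input dict, validating values against the field table."""
    if not urls and not query:
        raise ValueError("provide at least one URL or a query")
    if sort not in SORTS:
        raise ValueError(f"sort must be one of {sorted(SORTS)}")
    if safe_search not in SAFE_SEARCH:
        raise ValueError("safe_search must be 'on' or 'off'")
    if max_items < 1 or max_retries < 0:  # assumed bounds, not documented
        raise ValueError("max_items must be >= 1 and max_retries >= 0")
    return {
        "urls": [{"url": u} for u in (urls or [])],
        "query": query,  # only used by the actor when urls is empty
        "sort": sort,
        "safeSearch": safe_search,
        "maxItems": max_items,
        "maxRetries": max_retries,
        "proxyConfiguration": {"useApifyProxy": False},  # default: no proxy
    }
```

The returned dictionary has the same shape as the JSON input shown earlier and can be passed as the run input through the Apify Console or API.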
Output
Each dataset record keeps the nested reference shape (`post`, `subreddit`, `author`) and adds top-level mirror fields (`title`, `subreddit_name`, `author_name`, `score`, `comment_count`, `url`) so the table views work without nested-path lookups:

    {
        "post": {
            "title": "The more young people use AI, the more they hate it",
            "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/",
            "score": 22036,
            "comment_count": 1612
        },
        "subreddit": { "name": "technology" },
        "author": { "name": "spherocytes" },
        "contentText": "",
        "content_type": "link",
        "created_timestamp": "2026-04-30T12:34:21.000000+0000",
        "title": "The more young people use AI, the more they hate it",
        "subreddit_name": "technology",
        "author_name": "spherocytes",
        "score": 22036,
        "comment_count": 1612,
        "url": "https://www.reddit.com/r/technology/comments/1szusu6/the_more_young_people_use_ai_the_more_they_hate_it/"
    }

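To make the relationship between the nested fields and their top-level mirrors concrete, here is a short sketch that derives the mirror fields from a nested record. `add_mirror_fields` is a hypothetical name used for illustration; the field names come from the sample record above.

```python
from typing import Any, Dict


def add_mirror_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with top-level mirrors of the nested fields."""
    post = record.get("post", {})
    flat = dict(record)  # shallow copy; nested objects are kept as-is
    flat.update(
        {
            "title": post.get("title"),
            "url": post.get("url"),
            "score": post.get("score"),
            "comment_count": post.get("comment_count"),
            "subreddit_name": record.get("subreddit", {}).get("name"),
            "author_name": record.get("author", {}).get("name"),
        }
    )
    return flat
```

Downstream code can read either form, for example `record["post"]["score"]` or `record["score"]`, and should get the same value.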
How to use the Actor (via Apify Console)
- Log in at console.apify.com → Actors.
- Find Reddit Search Scraper and open it.
- Paste one or more Reddit URLs (or type a keyword in the `query` field).
- Pick a `sort` (Relevance / Hot / Top / New / Most Comments) and set `maxItems`.
- Leave Proxy on default (no proxy); the scraper auto-escalates if Reddit pushes back.
- Click Start.
- Watch logs in real time; open the Output tab as records stream in.
- Export to JSON / CSV / Excel.
Proxy strategy
The scraper uses a three-tier ladder:
| Tier | When it's used |
|---|---|
| Direct | Default. Reddit's public JSON API rarely needs a proxy. |
| Datacenter | Auto-engaged if direct requests get 403 / 429 / blocked. |
| Residential | Auto-engaged if datacenter still fails. Retries up to 3×, then stays on residential for the rest of the run. |
You can also start higher up the ladder by selecting a proxy group in the input.
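The ladder described above can be modeled as a small state machine that only ever moves upward. The sketch below is illustrative: the `ProxyLadder` class, the `send` callback, and the choice to apply the same retry count on every tier are assumptions, not the actor's actual code.

```python
from typing import Callable

TIERS = ("direct", "datacenter", "residential")
BLOCKED_STATUSES = {403, 429}


class ProxyLadder:
    """Sticky escalation: once a tier is left, the run never goes back to it."""

    def __init__(self, start: str = "direct", retries_per_tier: int = 3) -> None:
        self.index = TIERS.index(start)  # start higher by passing another tier
        self.retries_per_tier = retries_per_tier

    @property
    def tier(self) -> str:
        return TIERS[self.index]

    def fetch(self, send: Callable[[str], int]) -> str:
        """Call send(tier) until a non-blocked HTTP status comes back.

        Returns the tier that succeeded; raises RuntimeError when the
        residential tier is exhausted as well.
        """
        while True:
            for _ in range(self.retries_per_tier + 1):  # first try + retries
                if send(self.tier) not in BLOCKED_STATUSES:
                    return self.tier
            if self.index == len(TIERS) - 1:
                raise RuntimeError("blocked on every proxy tier")
            self.index += 1  # escalate and stay there for later requests
```

Because `index` never decreases, a request made after an escalation starts directly on the higher tier, which matches the "sticks for the rest of the run" behavior in the table.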
Best use cases
- Building AI / LLM training datasets from Reddit discussion
- Brand monitoring & sentiment analysis
- Market research and competitive intelligence
- Content trend discovery
- Academic research on online communities
Frequently asked questions
Q: Does it scrape comments?
A: This actor returns post-level metadata (title, score, comment count, body text). For per-post comment threads, use an additional actor or extend this one to fetch <permalink>.json.
Q: Does it support private subreddits?
A: No. Only publicly accessible subreddits and search results are supported.
Q: What happens if Reddit blocks me?
A: The scraper auto-escalates the proxy tier and retries. If even residential fails after 3 retries, the run ends with a clear status message.
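For the comment-thread question above, fetching a post's permalink with `.json` appended returns a two-element array: the post listing followed by the comment listing. The sketch below flattens that second listing into simple rows. The helper names are hypothetical, the HTTP fetch itself is left out, and "more comments" placeholders are skipped rather than expanded.

```python
from typing import Any, Dict, List


def comments_json_url(post_url: str) -> str:
    """Build the comments JSON URL from a post URL (hypothetical helper)."""
    return post_url.rstrip("/") + ".json"


def flatten_comments(payload: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten the comment tree from a decoded <permalink>.json response."""
    rows: List[Dict[str, Any]] = []

    def walk(listing: Any, depth: int) -> None:
        if not isinstance(listing, dict):
            return  # "replies" is an empty string when a comment has none
        for child in listing.get("data", {}).get("children", []):
            if child.get("kind") != "t1":
                continue  # skip "more" placeholders and non-comment kinds
            data = child["data"]
            rows.append(
                {
                    "author": data.get("author"),
                    "body": data.get("body"),
                    "score": data.get("score"),
                    "depth": depth,
                }
            )
            walk(data.get("replies"), depth + 1)

    if len(payload) > 1:
        walk(payload[1], 0)  # payload[0] is the post, payload[1] the comments
    return rows
```

Feeding each record's `url` field through `comments_json_url` would give the endpoint to request for that post's thread.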
Support and feedback
For issues, custom features, or feedback: dev.scraperengine@gmail.com
Legal & ethical use
- Only collect data from publicly accessible Reddit pages.
- Respect Reddit's terms of service and applicable privacy laws (GDPR / CCPA).
- The end user is responsible for downstream use of the data.