Reddit Subreddit Scraper
Pricing
from $3.50 / 1,000 search results
Scrape posts from any subreddit - title, author, score, comments, flair, text and timestamps. Run it on a schedule for social listening, brand monitoring, lead generation or market research.
Rating: 0.0 (0)
Developer: Logiover
Maintained by Community
Actor stats
- Bookmarked: 0
- Total users: 15
- Monthly active users: 3
- Last modified: 19 hours ago
Reddit Subreddit Scraper: Scrape Reddit Posts, Scores & Comments

Bulk-scrape posts from any subreddit on Reddit (title, author, score, upvote ratio, comment count, flair, self text, video flag, NSFW flag and timestamps) across multiple subreddits in one run. Sort by new, hot, top or rising, with time-window control for top (hour/day/week/month/year/all). Built on Reddit's public JSON endpoints with residential proxy support to sidestep Reddit's datacenter-IP blocks. No login, no Reddit API key, no client secret required.
Built for brand managers tracking mentions, growth marketers mining lead-generation subreddits, market researchers studying community sentiment, alpha hunters scanning crypto/finance subs, content teams sourcing trending posts, and ML engineers building social-listening corpora.
No Reddit account. No API key. No client secret. Residential proxy handled automatically.
Why this scraper
Reddit is the most candid, threaded, opinionated and structured social network on the internet. Every subreddit is its own self-organizing community with stable interests: r/startups for indie founders, r/cryptocurrency for retail traders, r/forhire for freelance demand, r/MachineLearning for ML practitioners, r/Frugal for personal finance signals, r/Entrepreneur, r/SaaS, r/marketing, r/sales, r/personalfinance, r/wallstreetbets, r/AskHistorians, r/legaladvice, r/relationships, r/parenting, r/buildapc, r/cscareerquestions and 10,000+ others. Reading any of them gives you direct, unfiltered insight into a specific audience's pain points, desires, language and trends.
Pulling Reddit at scale yourself runs into:
- Reddit aggressively blocking datacenter IPs (HTTP 403 / 429 / "Too Many Requests")
- The PRAW client requiring a registered app, client ID and client secret
- `.json` endpoint pagination requiring `after`/`before` tokens you must thread manually
- Time-window semantics on `top` sort (Reddit calls it `t=hour|day|week|month|year|all`)
- Differentiating self-posts, link posts and video posts in the response
- Flattening Reddit's deep, snake_cased response into clean camelCase rows
- Persisting flat output for warehouses, BI tools, NLP pipelines or social-listening dashboards
This Actor handles all of it: residential proxies on by default, clean schema, single-call multi-subreddit support, full pagination, schedule-ready output.
Key features
| Feature | What it gives you |
|---|---|
| Any subreddit, with or without `r/` | Pass `["startups", "r/Entrepreneur", "cscareerquestions"]`; all are normalized |
| Multi-subreddit per run | Process many subreddits in one Actor run, one dataset |
| Residential proxy by default | Avoids Reddit's datacenter-IP blocks out of the box, set and forget |
| Four sort orders | `new`, `hot`, `top`, `rising` |
| Time-window control for `top` | `hour`, `day`, `week`, `month`, `year`, `all` |
| Rich post metadata | 15 fields per post: title, author, URL, permalink, self text, score, upvote ratio, comment count, flair, video flag, NSFW flag, timestamps |
| Unlimited mode | Leave `maxPostsPerSub` empty to pull as much as Reddit serves for the sort |
| Flat, export-ready schema | No nested JSON; drop straight into a spreadsheet or warehouse |
| All export formats | JSON, CSV, Excel, HTML, XML, JSONL via the Apify Dataset |
| Schedule-friendly | Idempotent and deterministic; well suited to hourly / daily community monitoring |
| No Reddit account, no API key | Bypasses the OAuth dance with anonymous public JSON access |
| Built-in Overview view | Pre-configured Apify Dataset view with the most-useful columns visible |
Built for these use cases
1. Brand & competitor monitoring
Watch r/<yourindustry>, plus generic subs like r/SaaS, r/Entrepreneur and r/marketing. Pull mentions of your product, competitor and category daily: sentiment, recency, volume. Surface complaints before they trend, and catch product feedback in real time.
2. Community insight & audience research
Before launching to a niche, scrape the dominant subreddit for 30 days. Read the language people actually use, the recurring complaints, the products they recommend, the price points they balk at. Better than any focus group.
3. Lead generation
Subreddits like r/forhire, r/slavelabour, r/hireawriter and industry-specific job/help subs are open lead pipelines for service providers. Schedule the `new` sort hourly to catch fresh demand the moment it is posted.
4. Alpha signals & financial sentiment
r/wallstreetbets, r/cryptocurrency, r/options, r/personalfinance, r/SecurityAnalysis: pull top posts daily, parse for ticker mentions, score sentiment, feed your trading bot or your investing newsletter.
5. Content & trend discovery
For media, newsletters and creator economy: weekly scrape of top posts (t=week) across the subs your audience lives in. Best-performing posts become next week's content ideas, podcast topics, video scripts.
6. Market research & PR
For corporate communications and crisis management: monitor your brand's name across all relevant subreddits. Catch issues at the post stage, not after they hit Twitter or the press.
7. NLP / LLM training corpora
Reddit text is informal, opinionated, code-mixed and dense with topic-specific vocabulary. Pull niche subs to build domain-targeted fine-tuning sets (medical, legal, gaming, finance, parenting).
8. Academic & journalism research
Study online community dynamics, the spread of misinformation, language evolution, generational differences. Reddit is one of the richest research substrates available. This Actor gives you a clean, structured pull on a schedule.
Inputs
| Field | Type | Required | Description |
|---|---|---|---|
| `subreddits` | array of strings | Yes | Subreddit names to scrape (e.g. startups, forhire, cryptocurrency). With or without the `r/` prefix; both forms are accepted. |
| `sort` | enum | No | Post sort order: `new`, `hot`, `top`, `rising`. Default `new`. |
| `timeFilter` | enum | No | Time window applied to `top` sort: `hour`, `day`, `week`, `month`, `year`, `all`. Ignored for non-top sorts. Default `day`. |
| `maxPostsPerSub` | integer | No | Hard cap per subreddit. Leave empty / 0 for as many posts as Reddit returns for the chosen sort. |
| `proxyConfiguration` | object | No | Proxy settings. Reddit blocks datacenter IPs, so residential is used by default; leave as-is unless you have a reason to change it. |
Example inputs
Daily startup-community monitoring:
{"subreddits": ["startups", "Entrepreneur", "SaaS"],"sort": "new","maxPostsPerSub": 200,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Lead generation in freelance subs (newest first, no cap):
{"subreddits": ["forhire", "slavelabour", "hireawriter"],"sort": "new","maxPostsPerSub": 0,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Top crypto posts of the week (alpha sweep):
{"subreddits": ["cryptocurrency", "CryptoMarkets", "ethfinance"],"sort": "top","timeFilter": "week","maxPostsPerSub": 500,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
Trending content ideas for a newsletter:
{"subreddits": ["popular", "AskReddit", "todayilearned"],"sort": "top","timeFilter": "day","maxPostsPerSub": 100,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
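If you generate inputs programmatically, a small helper can keep them consistent with the table above. The following is a minimal sketch, not part of the Actor: `normalize_subreddit` and `build_run_input` are hypothetical names, and the validation rules are only inferred from the documented field descriptions.

```python
from typing import Any

SORTS = {"new", "hot", "top", "rising"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}


def normalize_subreddit(name: str) -> str:
    """Strip whitespace, surrounding slashes and an optional r/ prefix."""
    cleaned = name.strip().strip("/")
    if cleaned.lower().startswith("r/"):
        cleaned = cleaned[2:]
    return cleaned


def build_run_input(
    subreddits: list[str],
    sort: str = "new",
    time_filter: str = "day",
    max_posts_per_sub: int = 0,
) -> dict[str, Any]:
    """Build an input object shaped like the documented examples."""
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    if time_filter not in TIME_FILTERS:
        raise ValueError(f"unknown timeFilter: {time_filter!r}")
    names = [n for n in (normalize_subreddit(s) for s in subreddits) if n]
    if not names:
        raise ValueError("at least one subreddit is required")
    run_input: dict[str, Any] = {
        "subreddits": names,
        "sort": sort,
        "maxPostsPerSub": max_posts_per_sub,  # 0 means no cap, per the Inputs table
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # timeFilter is documented as ignored for non-top sorts, so only send it for top.
    if sort == "top":
        run_input["timeFilter"] = time_filter
    return run_input
```

The Actor is documented to accept both `startups` and `r/startups`, so the normalization here is only a convenience for keeping your own configs tidy.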
Output
One Apify dataset row per post. Sample:
{"postId": "1abc234","subreddit": "startups","title": "How we got our first 100 paying customers without ads","author": "founder_jane","url": "https://www.reddit.com/r/startups/comments/1abc234/how_we_got_our_first_100_paying_customers/","permalink": "/r/startups/comments/1abc234/how_we_got_our_first_100_paying_customers/","selftext": "We spent three months on cold outreach to a list of 800 small SaaS founders...","score": 842,"upvoteRatio": 0.97,"numComments": 134,"flair": "Share Your Startup","isVideo": false,"over18": false,"createdAt": "2026-05-15T18:22:00.000Z","scrapedAt": "2026-05-16T08:00:00.000Z"}
Full field reference
| Field | Type | Meaning |
|---|---|---|
| `postId` | string | Reddit post ID (the part after /comments/ in the URL) |
| `subreddit` | string | The subreddit the post belongs to |
| `title` | string | Post title |
| `author` | string | Reddit username of the post's author |
| `url` | string | URL the post points to (external link, image, video, or Reddit URL for self-posts) |
| `permalink` | string | Permanent Reddit path to the post (prefix with https://www.reddit.com for the full URL) |
| `selftext` | string | Body text of self/text posts (empty for link posts) |
| `score` | integer | Net upvotes (upvotes minus downvotes) |
| `upvoteRatio` | number | Ratio of upvotes to total votes (0 to 1) |
| `numComments` | integer | Total comment count on the post |
| `flair` | string | Post flair label assigned by the author/mods |
| `isVideo` | boolean | Whether the post hosts a video |
| `over18` | boolean | Whether the post is marked NSFW |
| `createdAt` | string | ISO 8601 timestamp the post was created |
| `scrapedAt` | string | ISO 8601 timestamp of the scrape |
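To make the field reference concrete, here is a minimal sketch of how one entry from a Reddit listing could be flattened into the 15 fields above. It is an illustration under assumptions, not the Actor's source: `flatten_post` and `to_iso` are hypothetical names, and the mapping from Reddit's snake_case keys (`upvote_ratio`, `num_comments`, `link_flair_text`, `is_video`, `over_18`, `created_utc`) to the camelCase output is inferred from the table.

```python
from datetime import datetime, timezone


def to_iso(epoch_seconds: float) -> str:
    """Format a Unix timestamp as ISO 8601 UTC with milliseconds and a Z suffix."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def flatten_post(child: dict, scraped_at: float) -> dict:
    """Map one listing child ({"kind": "t3", "data": {...}}) to a flat row."""
    d = child.get("data", {})
    return {
        "postId": d.get("id"),
        "subreddit": d.get("subreddit"),
        "title": d.get("title"),
        "author": d.get("author"),
        "url": d.get("url"),
        "permalink": d.get("permalink"),
        "selftext": d.get("selftext") or "",  # empty for link posts
        "score": d.get("score"),
        "upvoteRatio": d.get("upvote_ratio"),
        "numComments": d.get("num_comments"),
        "flair": d.get("link_flair_text"),
        "isVideo": bool(d.get("is_video", False)),
        "over18": bool(d.get("over_18", False)),
        "createdAt": to_iso(d.get("created_utc", 0)),
        "scrapedAt": to_iso(scraped_at),
    }
```

Because `permalink` is a path, a full link can be built as `"https://www.reddit.com" + row["permalink"]`, as noted in the table.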
How it works
- Parses input: normalizes subreddit names (strips `r/` if present), sort, time filter, cap.
- Picks endpoint: uses Reddit's public `.json` listings: `/r/<sub>/new.json`, `/hot.json`, `/top.json?t={timeFilter}`, `/rising.json` with `limit=100`.
- Routes via residential proxy: uses Apify's `RESIDENTIAL` proxy group by default to dodge Reddit's datacenter-IP blocks.
- Walks pagination with the `after` token until the cap is hit or Reddit returns no more pages.
- Backs off on HTTP 429 / 5xx with exponential retry.
- Flattens the deep Reddit response into the clean 15-field camelCase schema above.
- Streams each post as one flat row directly into the Apify Dataset.
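The back-off step in the list above can be sketched as follows. This is an assumption-laden illustration: the Actor's real retry limits and delays are not documented here, so `backoff_delay`, `fetch_with_retry`, the 1-second base, the 60-second cap and the 5-attempt limit are all hypothetical. The `fetch` callable is injected so the logic can be exercised without network access.

```python
import time
from typing import Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential delay: base * 2^attempt, limited to cap seconds."""
    return min(cap, base * (2 ** attempt))


def is_retryable(status: int) -> bool:
    """Retry on HTTP 429 and any 5xx, as described in the steps above."""
    return status == 429 or 500 <= status < 600


def fetch_with_retry(
    fetch: Callable[[], tuple[int, dict]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it succeeds, sleeping longer after each failure."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status < 400:
            return body
        last_attempt = attempt == max_attempts - 1
        if not is_retryable(status) or last_attempt:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

A production version would usually add random jitter to each delay so that parallel workers do not retry in lockstep; it is left out here to keep the sketch deterministic.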
The Actor uses ONLY Reddit's public JSON listing endpoints: no PRAW, no OAuth, no client secret, no HTML scraping, no headless browser.
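The endpoint selection and `after`-token pagination described in the steps above can be sketched like this. It is a simplified illustration rather than the Actor's code: `listing_url` and `walk_listing` are hypothetical names, the HTTP call is passed in as `fetch_json` (so proxying and retries stay outside the sketch), and the response shape `{"data": {"children": [...], "after": ...}}` is the standard Reddit listing envelope.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def listing_url(
    subreddit: str,
    sort: str = "new",
    time_filter: str = "day",
    after: Optional[str] = None,
    limit: int = 100,
) -> str:
    """Build a public .json listing URL; t= is only sent for the top sort."""
    params: dict = {"limit": limit}
    if sort == "top":
        params["t"] = time_filter
    if after:
        params["after"] = after
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


def walk_listing(
    subreddit: str,
    fetch_json: Callable[[str], dict],
    sort: str = "new",
    time_filter: str = "day",
    max_posts: int = 0,
) -> Iterator[dict]:
    """Yield listing children page by page until the cap or the last page."""
    after: Optional[str] = None
    yielded = 0
    while True:
        page = fetch_json(listing_url(subreddit, sort, time_filter, after))
        data = page.get("data", {})
        children = data.get("children", [])
        for child in children:
            yield child
            yielded += 1
            if max_posts and yielded >= max_posts:
                return  # cap reached (0 means no cap)
        after = data.get("after")
        if not after or not children:
            return  # Reddit returned no further page token
```

Keeping `fetch_json` as a parameter means the same walker can be driven by any HTTP client, or by canned pages in a test.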
Performance
| Workload | Approx time |
|---|---|
| 1 subreddit, 100 posts | ~5 seconds |
| 5 subreddits, 200 posts each | ~30 seconds |
| 10 subreddits, 1,000 posts each | ~3 minutes |
| Daily monitoring (20 subs × 200 new posts) | ~2 minutes |
| Weekly top sweep (10 subs × 500 posts) | ~5 minutes |
Reddit's public listings return up to 100 posts per page and typically allow a few hundred posts of pagination per sort before the listing exhausts.
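Since each page holds up to 100 posts, the number of listing requests behind a workload is simple arithmetic. The sketch below is a rough estimate only: `requests_needed` is a hypothetical helper, and it assumes no retries and that Reddit actually serves the requested depth.

```python
import math

PAGE_SIZE = 100  # maximum posts per listing page, per the text above


def requests_needed(subreddits: int, posts_per_sub: int, page_size: int = PAGE_SIZE) -> int:
    """Listing requests for a run: one per page, per subreddit."""
    return subreddits * math.ceil(posts_per_sub / page_size)


print(requests_needed(20, 200))  # daily monitoring row: 40 requests
print(requests_needed(10, 500))  # weekly top sweep row: 50 requests
```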
Cost model
Pay-Per-Result for post rows + proxy traffic for residential bandwidth. You pay only for the post rows actually saved.
Typical cost shape:
- Hourly lead-gen monitor (5 subs × 50 new posts): small
- Daily brand-mention sweep (20 subs × 200 posts): small-to-moderate
- Weekly community insight pull (10 subs × 1,000 posts): moderate
- One-off market research (50 subs × full pagination): bounded and predictable
Schedule for continuous monitoring
Common scheduling patterns:
- Every 15 minutes for high-velocity lead-gen subs (`r/forhire`, `r/slavelabour`)
- Hourly for brand-mention alerts in your category subs
- Daily for community insight and content curation
- Weekly for top-of-week trend reports and newsletter generation
Pipe each new row into Slack, Discord, Notion, Airtable, Sheets, your CRM, Postgres, BigQuery, your sentiment-analysis API or your own HTTP endpoint via Apify Webhooks.
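When a schedule runs more often than a subreddit turns over, consecutive runs will likely return some of the same posts. A minimal sketch of filtering those out before forwarding rows downstream, assuming you keep a set of already-seen `postId` values between runs (`new_rows` is a hypothetical helper, and where the set is persisted is up to you):

```python
def new_rows(rows: list[dict], seen_ids: set[str]) -> list[dict]:
    """Return rows whose postId has not been seen, updating seen_ids in place."""
    fresh = []
    for row in rows:
        post_id = row["postId"]
        if post_id not in seen_ids:
            seen_ids.add(post_id)
            fresh.append(row)
    return fresh


seen: set[str] = set()
first_run = [{"postId": "a"}, {"postId": "b"}]
second_run = [{"postId": "b"}, {"postId": "c"}]
print(len(new_rows(first_run, seen)))   # 2
print(len(new_rows(second_run, seen)))  # 1, only "c" is new
```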
FAQ
Do I need a Reddit account, API key or client secret?
No. The Actor uses Reddit's public .json listing endpoints: no OAuth, no app registration, no PRAW.
Is scraping Reddit legal?
The Actor reads publicly visible subreddit content. You are responsible for using the data in compliance with Reddit's terms of service, content policy and applicable law (especially for NSFW content, personal data and minors).
Why does it use a residential proxy?
Reddit aggressively blocks datacenter IPs with 403/429 errors. Residential proxies route requests through real consumer IPs, which Reddit is far less likely to block. The Actor turns this on by default; leave as-is unless you have a specific reason to change it.
How many posts can I get from a subreddit?
Reddit's listings cap the depth per sort at somewhere between a few hundred and roughly 1,000 posts. Set maxPostsPerSub to 0 to pull as many as Reddit serves. To go deeper than the listing depth, use a historical/archive scraper (see Related scrapers).
Can I scrape multiple subreddits in one run?
Yes. Pass any number of subreddit names in subreddits and all are scraped into the same dataset.
What does timeFilter do?
It applies to sort=top only. hour = top of the last hour; day = top of the last day; week, month, year, all likewise. Ignored for new, hot and rising.
Can I get comments too?
This Actor returns posts (with numComments count). For full comment threads, use a dedicated comment scraper that takes a post URL and walks the tree.
Can I scrape NSFW subreddits?
Yes, technically. NSFW posts are returned with over18=true. Use the data responsibly and within Reddit's terms.
Is the data fresh?
Yes. Reddit's .json listings reflect new posts and score changes within seconds.
What's the difference between score and upvoteRatio?
score = net votes (upvotes minus downvotes). upvoteRatio = upvotes ÷ total votes, Reddit's measure of how polarizing a post is. A post with a score of 1,000 and a 0.6 ratio is much more divisive than one with the same score and a 0.97 ratio.
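The two definitions can be combined to estimate the underlying vote counts. With upvotes u, downvotes d, score s = u - d and ratio r = u / (u + d), solving gives u = s·r / (2r - 1) and d = u - s. A minimal sketch (`estimate_votes` is a hypothetical helper; the values Reddit reports may be rounded or fuzzed, so treat the result as a rough estimate):

```python
def estimate_votes(score: int, upvote_ratio: float) -> tuple[float, float]:
    """Estimate (upvotes, downvotes) from score = u - d and ratio = u / (u + d)."""
    if upvote_ratio <= 0.5:
        # At or below 0.5 the net score is zero or negative and the formula breaks down.
        raise ValueError("upvote_ratio must be greater than 0.5")
    upvotes = score * upvote_ratio / (2 * upvote_ratio - 1)
    return upvotes, upvotes - score


for ratio in (0.6, 0.97):
    up, down = estimate_votes(1000, ratio)
    print(ratio, round(up), round(down))
```

For a score of 1,000 this works out to roughly 3,000 upvotes against 2,000 downvotes at a 0.6 ratio, versus roughly 1,032 against 32 at 0.97, which is why the lower ratio signals a far more divisive post.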
Can I integrate with Slack / Sheets / Notion / n8n / Zapier?
Yes. Apify provides official integrations and webhooks. Push every new row anywhere your stack can receive HTTP.
What output formats are supported?
JSON, CSV, Excel, HTML, XML, JSONL via the Apify Dataset, plus REST API and webhooks for live integrations.
Related scrapers
Adjacent data sources in the social/dev/content suite:
| Scraper | Purpose |
|---|---|
| reddit-subreddit-scraper | You are here. Bulk posts from any subreddit with sort + time window. |
| reddit-historical-archive-scraper | Years of subreddit history at scale, beyond the listing depth cap. |
| hacker-news-search-scraper | HN stories/comments/Show HN/Ask HN/front page by keyword. |
| hacker-news-who-is-hiring-scraper | Monthly HN "Who is hiring?" thread parsed by company/role/stack. |
| stack-exchange-questions-scraper | Q&A across 170+ Stack Exchange sites by tag/site/sort. |
| github-repository-scraper | Public GitHub repo metadata by search query. |
| devto-articles-scraper | Dev.to articles by tag, author, latest feed. |
| product-hunt-daily-launches-scraper | Today's Product Hunt launches with votes and makers. |
| linkedin-top-content-scraper | Top-performing LinkedIn posts by keyword/author. |
| linkedin-ad-library-scraper | LinkedIn Ad Library: competitor ad creative & spend signals. |
| letterboxd-film-review-scraper | Film reviews from Letterboxd for culture/sentiment work. |
| instagram-media-downloader | Reels/Posts/Stories HD download URLs in bulk. |
Keyword cloud
Core: reddit scraper, subreddit scraper, reddit data export, reddit json api, reddit posts scraper, reddit hot posts scraper, reddit new posts scraper, reddit top posts scraper, reddit rising posts scraper, reddit api free, reddit no api key, reddit residential proxy scraper, reddit comments count scraper, reddit upvote tracker.
Niche: r forhire scraper, r startups scraper, r entrepreneur scraper, r saas scraper, r marketing scraper, r cryptocurrency scraper, r wallstreetbets scraper, r personalfinance scraper, r cscareerquestions scraper, r machinelearning scraper, r popular scraper, r askreddit scraper, r todayilearned scraper, multi subreddit scraper, reddit flair filter, reddit nsfw filter.
Use case: social listening, brand monitoring, competitor monitoring, community insight, market research, audience research, lead generation, freelance lead gen, alpha signals, financial sentiment, crypto sentiment, ticker mention tracking, content discovery, trending posts curation, newsletter content sourcing, journalism research, academic community research, nlp corpus building, llm fine tuning dataset, sentiment analysis pipeline.
Audience: brand managers, growth marketers, founders, indie hackers, content creators, newsletter writers, freelancers, recruiters, crypto traders, retail investors, financial analysts, market researchers, pr teams, journalists, academics, ml engineers, nlp researchers, social listening teams, support and community managers.
Changelog
- 2026-06-04 - Verified live & refreshed build; reliability/maintenance pass.
- 2026-06-01 - Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
- 2026-05-25 - Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
- 2026-05-20 - Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.
Last reviewed: 2026-06-01.
