Reddit Subreddit Scraper

Pricing

from $3.50 / 1,000 search results

Scrape posts from any subreddit - title, author, score, comments, flair, text and timestamps. Run it on a schedule for social listening, brand monitoring, lead generation or market research.

Rating

0.0

(0)

Developer

Logiover

Maintained by Community

Actor stats

0

Bookmarked

15

Total users

3

Monthly active users

19 hours ago

Last modified

👽 Reddit Subreddit Scraper: Scrape Reddit Posts, Scores & Comments

Bulk-scrape posts from any subreddit on Reddit, across multiple subreddits in one run: title, author, score, upvote ratio, comment count, flair, self text, video flag, NSFW flag and timestamps. Sort by new, hot, top or rising, with time-window control for top (hour/day/week/month/year/all). Built on Reddit's public JSON endpoints with residential proxy support to sidestep Reddit's datacenter-IP blocks. No login, no Reddit API key, no client secret required.

Built for brand managers tracking mentions, growth marketers mining lead-generation subreddits, market researchers studying community sentiment, alpha hunters scanning crypto/finance subs, content teams sourcing trending posts, and ML engineers building social-listening corpora.

🟒 No Reddit account. No API key. No client secret. Residential proxy handled automatically.


🚀 Why this scraper

Reddit is the most candid, threaded, opinionated and structured social network on the internet. Every subreddit is its own self-organizing community with stable interests: r/startups for indie founders, r/cryptocurrency for retail traders, r/forhire for freelance demand, r/MachineLearning for ML practitioners, r/Frugal for personal finance signals, r/Entrepreneur, r/SaaS, r/marketing, r/sales, r/personalfinance, r/wallstreetbets, r/AskHistorians, r/legaladvice, r/relationships, r/parenting, r/buildapc, r/cscareerquestions and 10,000+ others. Reading any of them gives you direct, unfiltered insight into a specific audience's pain points, desires, language and trends.

Pulling Reddit at scale yourself runs into:

  • Reddit aggressively blocking datacenter IPs (HTTP 403 / 429 / "Too Many Requests")
  • The PRAW client requiring a registered app, client ID and client secret
  • .json endpoint pagination requiring after/before tokens you must thread manually
  • Time-window semantics on top sort (Reddit calls it t=hour|day|week|month|year|all)
  • Differentiating self-posts, link posts and video posts in the response
  • Flattening Reddit's deep, snake_cased response into clean camelCase rows
  • Persisting flat output for warehouses, BI tools, NLP pipelines or social-listening dashboards

This Actor handles all of it: residential proxies on by default, clean schema, single-call multi-subreddit support, full pagination, schedule-ready output.


✨ Key features

| Feature | What it gives you |
| --- | --- |
| 🌐 Any subreddit, with or without r/ | Pass ["startups", "r/Entrepreneur", "cscareerquestions"]; all are normalized |
| 🔁 Multi-subreddit per run | Process many subreddits in one Actor run, one dataset |
| 🛡️ Residential proxy by default | Avoids Reddit's datacenter-IP blocks out of the box; set and forget |
| 🔢 Four sort orders | new, hot, top, rising |
| ⏳ Time-window control for top | hour, day, week, month, year, all |
| 📊 Rich post metadata | 15 fields per post: title, author, URL, permalink, self text, score, upvote ratio, comment count, flair, video flag, NSFW flag, timestamps |
| ♾️ Unlimited mode | Leave maxPostsPerSub empty to pull as much as Reddit serves for the sort |
| 🧱 Flat, export-ready schema | No nested JSON; drop straight into a spreadsheet or warehouse |
| 📦 All export formats | JSON, CSV, Excel, HTML, XML, JSONL via the Apify Dataset |
| ⏱️ Schedule-friendly | Idempotent and deterministic; great for hourly / daily community monitoring |
| 🔓 No Reddit account, no API key | Skips the OAuth dance: anonymous public JSON access |
| 🧰 Built-in Overview view | Pre-configured Apify Dataset view with the most-useful columns visible |

🎯 Built for these use cases

1. Brand & competitor monitoring

Watch r/<yourindustry>, plus generic subs like r/SaaS, r/Entrepreneur and r/marketing. Pull mentions of your product, competitor and category daily: sentiment, recency, volume. Surface complaints before they trend, catch product feedback in real time.

2. Community insight & audience research

Before launching to a niche, scrape the dominant subreddit for 30 days. Read the language people actually use, the recurring complaints, the products they recommend, the price points they balk at. Better than any focus group.

3. Lead generation

Subreddits like r/forhire, r/slavelabour, r/hireawriter and industry-specific job/help subs are open lead pipelines for service providers. Schedule new sort hourly to catch fresh demand the moment it posts.

4. Alpha signals & financial sentiment

r/wallstreetbets, r/cryptocurrency, r/options, r/personalfinance, r/SecurityAnalysis: pull top posts daily, parse for ticker mentions, score sentiment, feed your trading bot or your investing newsletter.
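As a rough illustration of the "parse for ticker mentions" step, the sketch below counts cashtags in the title and selftext fields of the output rows. The `count_tickers` helper and the rule that a ticker must be written with a leading `$` are assumptions made for this example, not part of the Actor.

```python
import re
from collections import Counter

# Assumption: tickers are written as cashtags such as "$GME".
# Bare uppercase words are ignored so that "CEO" or "USA" do not match.
TICKER_RE = re.compile(r"\$([A-Z]{1,5})\b")

def count_tickers(rows):
    """Count cashtag mentions across the title and selftext of each row."""
    counts = Counter()
    for row in rows:
        text = f"{row.get('title', '')} {row.get('selftext', '')}"
        # Count each ticker once per post so one long post cannot dominate.
        counts.update(set(TICKER_RE.findall(text)))
    return counts

rows = [
    {"title": "$GME to the moon", "selftext": "Also holding $AMC and $GME."},
    {"title": "Thoughts on $AMC?", "selftext": ""},
]
print(count_tickers(rows).most_common())
```

Counting once per post is a design choice: it measures how many posts mention a ticker rather than how often one author repeats it.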

5. Content & trend discovery

For media, newsletters and creator economy: weekly scrape of top posts (t=week) across the subs your audience lives in. Best-performing posts → next week's content ideas, podcast topics, video scripts.

6. Market research & PR

For corporate communications and crisis management: monitor your brand's name across all relevant subreddits. Catch issues at the post stage, not after they hit Twitter or the press.

7. NLP / LLM training corpora

Reddit text is informal, opinionated, code-mixed and dense with topic-specific vocabulary. Pull niche subs to build domain-targeted fine-tuning sets (medical, legal, gaming, finance, parenting).

8. Academic & journalism research

Study online community dynamics, the spread of misinformation, language evolution, generational differences. Reddit is one of the richest research substrates available. This Actor gives you a clean, structured pull on a schedule.


📥 Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| subreddits | array of strings | ✅ Yes | Subreddit names to scrape (e.g. startups, forhire, cryptocurrency). With or without the r/ prefix; both forms are accepted. |
| sort | enum | No | Post sort order: new, hot, top, rising. Default new. |
| timeFilter | enum | No | Time window applied to top sort: hour, day, week, month, year, all. Ignored for non-top sorts. Default day. |
| maxPostsPerSub | integer | No | Hard cap per subreddit. Leave empty / 0 for as many posts as Reddit returns for the chosen sort. |
| proxyConfiguration | object | No | Proxy settings. Reddit blocks datacenter IPs, so residential is used by default; leave as-is unless you have a reason to change it. |
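If you generate inputs programmatically, a small builder can catch invalid values before a run starts. This is a minimal sketch under the schema described above; `build_input` and `normalize_subreddit` are hypothetical helpers, and the Actor's own validation may differ.

```python
VALID_SORTS = {"new", "hot", "top", "rising"}
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def normalize_subreddit(name):
    """Strip whitespace, slashes and an optional leading 'r/' prefix."""
    name = name.strip().strip("/")
    if name.lower().startswith("r/"):
        name = name[2:]
    return name

def build_input(subreddits, sort="new", time_filter="day", max_posts_per_sub=0):
    """Return an input object shaped like the examples in this README."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    if time_filter not in VALID_TIME_FILTERS:
        raise ValueError(f"timeFilter must be one of {sorted(VALID_TIME_FILTERS)}")
    names = [normalize_subreddit(s) for s in subreddits]
    if not names or not all(names):
        raise ValueError("subreddits must contain at least one non-empty name")
    run_input = {
        "subreddits": names,
        "sort": sort,
        "maxPostsPerSub": max_posts_per_sub,  # 0 means no cap
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # timeFilter only matters for the top sort, so it is omitted otherwise.
    if sort == "top":
        run_input["timeFilter"] = time_filter
    return run_input

print(build_input(["r/startups", " SaaS "], sort="top", time_filter="week"))
```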

Example inputs

Daily startup-community monitoring:

{
  "subreddits": ["startups", "Entrepreneur", "SaaS"],
  "sort": "new",
  "maxPostsPerSub": 200,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Lead generation in freelance subs (newest first, no cap):

{
  "subreddits": ["forhire", "slavelabour", "hireawriter"],
  "sort": "new",
  "maxPostsPerSub": 0,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Top crypto posts of the week (alpha sweep):

{
  "subreddits": ["cryptocurrency", "CryptoMarkets", "ethfinance"],
  "sort": "top",
  "timeFilter": "week",
  "maxPostsPerSub": 500,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Trending content ideas for a newsletter:

{
  "subreddits": ["popular", "AskReddit", "todayilearned"],
  "sort": "top",
  "timeFilter": "day",
  "maxPostsPerSub": 100,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

📤 Output

One Apify dataset row per post. Sample:

{
  "postId": "1abc234",
  "subreddit": "startups",
  "title": "How we got our first 100 paying customers without ads",
  "author": "founder_jane",
  "url": "https://www.reddit.com/r/startups/comments/1abc234/how_we_got_our_first_100_paying_customers/",
  "permalink": "/r/startups/comments/1abc234/how_we_got_our_first_100_paying_customers/",
  "selftext": "We spent three months on cold outreach to a list of 800 small SaaS founders...",
  "score": 842,
  "upvoteRatio": 0.97,
  "numComments": 134,
  "flair": "Share Your Startup",
  "isVideo": false,
  "over18": false,
  "createdAt": "2026-05-15T18:22:00.000Z",
  "scrapedAt": "2026-05-16T08:00:00.000Z"
}

Full field reference

| Field | Type | Meaning |
| --- | --- | --- |
| postId | string | Reddit post ID (the part after /comments/ in the URL) |
| subreddit | string | The subreddit the post belongs to |
| title | string | Post title |
| author | string | Reddit username of the post's author |
| url | string | URL the post points to (external link, image, video, or Reddit URL for self-posts) |
| permalink | string | Permanent Reddit path to the post (prefix with https://www.reddit.com for the full URL) |
| selftext | string | Body text of self/text posts (empty for link posts) |
| score | integer | Net upvotes (upvotes minus downvotes) |
| upvoteRatio | number | Ratio of upvotes to total votes (0–1) |
| numComments | integer | Total comment count on the post |
| flair | string | Post flair label assigned by the author/mods |
| isVideo | boolean | Whether the post hosts a video |
| over18 | boolean | Whether the post is marked NSFW |
| createdAt | string | ISO 8601 timestamp the post was created |
| scrapedAt | string | ISO 8601 timestamp of the scrape |
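To make the field reference concrete, here is a sketch of how one raw post object from a Reddit listing could map onto the 15 output fields. The snake_case names on the right are the ones Reddit's listing JSON uses; the exact mapping inside the Actor is an assumption, and `flatten_post`, `to_iso` and `full_url` are hypothetical helpers.

```python
from datetime import datetime, timezone

def to_iso(ts):
    """Convert Unix seconds to an ISO 8601 UTC string with milliseconds."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

def flatten_post(raw, scraped_at):
    """Map one raw post (the inner "data" object) to a flat camelCase row."""
    return {
        "postId": raw["id"],
        "subreddit": raw["subreddit"],
        "title": raw["title"],
        "author": raw["author"],
        "url": raw["url"],
        "permalink": raw["permalink"],
        "selftext": raw.get("selftext", ""),
        "score": raw["score"],
        "upvoteRatio": raw.get("upvote_ratio"),
        "numComments": raw["num_comments"],
        "flair": raw.get("link_flair_text"),
        "isVideo": raw.get("is_video", False),
        "over18": raw.get("over_18", False),
        "createdAt": to_iso(raw["created_utc"]),
        "scrapedAt": scraped_at,
    }

def full_url(row):
    """Build the absolute post URL from the relative permalink."""
    return "https://www.reddit.com" + row["permalink"]

raw = {
    "id": "1abc234", "subreddit": "startups", "author": "founder_jane",
    "title": "How we got our first 100 paying customers without ads",
    "url": "https://www.reddit.com/r/startups/comments/1abc234/example/",
    "permalink": "/r/startups/comments/1abc234/example/",
    "selftext": "", "score": 842, "upvote_ratio": 0.97, "num_comments": 134,
    "link_flair_text": "Share Your Startup", "is_video": False,
    "over_18": False, "created_utc": 1778869320.0,
}
row = flatten_post(raw, to_iso(datetime.now(timezone.utc).timestamp()))
print(row["createdAt"], full_url(row))
```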

⚙️ How it works

  1. Parses input: normalizes subreddit names (strips r/ if present), sort, time filter, cap.
  2. Picks endpoint: uses Reddit's public .json listings (/r/<sub>/new.json, /hot.json, /top.json?t={timeFilter}, /rising.json) with limit=100.
  3. Routes via residential proxy: uses Apify's RESIDENTIAL proxy group by default to dodge Reddit's datacenter-IP blocks.
  4. Walks pagination with the after token until the cap is hit or Reddit returns no more pages.
  5. Backs off on HTTP 429 / 5xx with exponential retry.
  6. Flattens the deep Reddit response into the clean 15-field camelCase schema above.
  7. Streams each post as one flat row directly into the Apify Dataset.
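Steps 2 and 4 can be sketched as follows. This is an illustration of the general technique, not the Actor's source: `listing_url` and `walk_listing` are hypothetical names, and `fetch_json` stands in for whatever HTTP client (with proxy) performs the request, so the example runs offline against a fake fetcher.

```python
from urllib.parse import urlencode

def listing_url(subreddit, sort="new", time_filter="day", after=None, limit=100):
    """Build the public .json listing URL for one page."""
    params = {"limit": limit}
    if sort == "top":
        params["t"] = time_filter  # time window only applies to the top sort
    if after:
        params["after"] = after    # pagination token from the previous page
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"

def walk_listing(fetch_json, subreddit, sort="new", time_filter="day", max_posts=0):
    """Yield raw posts page by page until the cap or the last page is reached."""
    after, yielded = None, 0
    while True:
        page = fetch_json(listing_url(subreddit, sort, time_filter, after))
        children = page["data"]["children"]
        for child in children:
            yield child["data"]
            yielded += 1
            if max_posts and yielded >= max_posts:
                return
        after = page["data"].get("after")
        if not after or not children:
            return  # Reddit returned no further pages

def fake_fetch(url):
    """Offline stand-in for an HTTP GET: serves two small pages."""
    if "after=t3_b" in url:
        return {"data": {"after": None, "children": [{"data": {"id": "c"}}]}}
    return {"data": {"after": "t3_b",
                     "children": [{"data": {"id": "a"}}, {"data": {"id": "b"}}]}}

print([post["id"] for post in walk_listing(fake_fetch, "startups")])
```

Passing the fetcher in as a parameter keeps the pagination logic testable without network access.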

The Actor uses ONLY Reddit's public JSON listing endpoints: no PRAW, no OAuth, no client secret, no HTML scraping, no headless browser.
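The backoff in step 5 can be illustrated with a generic exponential retry. The base delay, cap, retry count and jitter strategy below are assumptions chosen for the example; the Actor's actual values are not documented here. `do_request` is a hypothetical callable returning a `(status, body)` pair.

```python
import random
import time

def is_retryable(status):
    """HTTP 429 and any 5xx response are treated as transient."""
    return status == 429 or 500 <= status < 600

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Upper bound in seconds for retry number `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)

def fetch_with_retry(do_request, max_retries=5, sleep=time.sleep):
    """Repeat do_request() while it returns a retryable status."""
    status, body = do_request()
    for attempt in range(max_retries):
        if not is_retryable(status):
            break
        # Random jitter spreads retries out so parallel workers do not sync up.
        sleep(random.uniform(0, backoff_delay(attempt)))
        status, body = do_request()
    return status, body

# Offline demo: two transient failures, then success. Sleeping is stubbed out.
responses = iter([(429, None), (503, None), (200, "ok")])
print(fetch_with_retry(lambda: next(responses), sleep=lambda seconds: None))
```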


⚡ Performance

| Workload | Approx. time |
| --- | --- |
| 1 subreddit, 100 posts | ~5 seconds |
| 5 subreddits, 200 posts each | ~30 seconds |
| 10 subreddits, 1,000 posts each | ~3 minutes |
| Daily monitoring (20 subs × 200 new posts) | ~2 minutes |
| Weekly top sweep (10 subs × 500 posts) | ~5 minutes |

Reddit's public listings return up to 100 posts per page, and a listing typically exhausts after somewhere between a few hundred and roughly 1,000 posts per sort.
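Because each page holds at most 100 posts, the number of listing requests in a run follows from simple arithmetic. The helper below is a hypothetical back-of-the-envelope calculator; it ignores retries and assumes every page comes back full.

```python
import math

PAGE_SIZE = 100  # maximum posts per listing page, as stated above

def requests_needed(subreddits, posts_per_sub):
    """Minimum number of listing requests for a run, ignoring retries."""
    return subreddits * math.ceil(posts_per_sub / PAGE_SIZE)

# Daily monitoring: 20 subreddits x 200 posts each.
print(requests_needed(20, 200))
# Hourly lead-gen: 5 subreddits x 50 posts each still costs one page per sub.
print(requests_needed(5, 50))
```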


💰 Cost model

Pay-Per-Result for post rows + proxy traffic for residential bandwidth. You pay only for the post rows actually saved.

Typical cost shape:

  • Hourly lead-gen monitor (5 subs × 50 new posts) → small
  • Daily brand-mention sweep (20 subs × 200 posts) → small-to-moderate
  • Weekly community insight pull (10 subs × 1,000 posts) → moderate
  • One-off market research (50 subs × full pagination) → bounded and predictable
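For a rough sense of scale, the result charge can be estimated from the listed "from $3.50 / 1,000" price. This sketch assumes a flat rate at that price and that every requested row is saved; your plan's actual rate may differ, and residential proxy traffic is billed separately and not included.

```python
# Assumption: flat rate equal to the "from" price on the listing, in USD.
PRICE_PER_1000_RESULTS = 3.50

def result_cost(subreddits, posts_per_sub, runs=1):
    """Upper-bound result charge in USD, excluding proxy traffic."""
    rows = subreddits * posts_per_sub * runs
    return round(rows * PRICE_PER_1000_RESULTS / 1000, 2)

# One daily brand-mention sweep: 20 subs x 200 posts.
print(result_cost(20, 200))
# One day of hourly lead-gen runs: 5 subs x 50 posts x 24 runs.
print(result_cost(5, 50, runs=24))
```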

🔄 Schedule for continuous monitoring

Common scheduling patterns:

  • Every 15 minutes for high-velocity lead-gen subs (r/forhire, r/slavelabour)
  • Hourly for brand-mention alerts in your category subs
  • Daily for community insight and content curation
  • Weekly for top-of-week trend reports and newsletter generation

Pipe each new row into Slack, Discord, Notion, Airtable, Sheets, your CRM, Postgres, BigQuery, your sentiment-analysis API or your own HTTP endpoint via Apify Webhooks.
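Scheduled runs with the new sort will usually return posts you have already seen, so a downstream consumer typically deduplicates by postId before forwarding rows. A minimal sketch, assuming you keep the set of seen IDs somewhere persistent between runs (here it lives in memory only); `new_rows` is a hypothetical helper.

```python
def new_rows(rows, seen_ids):
    """Return rows whose postId is not in seen_ids, and record them as seen."""
    fresh = []
    for row in rows:
        if row["postId"] not in seen_ids:
            seen_ids.add(row["postId"])
            fresh.append(row)
    return fresh

seen = set()
first_run = [{"postId": "a1"}, {"postId": "b2"}]
second_run = [{"postId": "b2"}, {"postId": "c3"}]

print([r["postId"] for r in new_rows(first_run, seen)])
print([r["postId"] for r in new_rows(second_run, seen)])
```

In production the `seen` set would be loaded from and saved to durable storage, for example a database table keyed by postId.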


🛠️ FAQ

Do I need a Reddit account, API key or client secret? No. The Actor uses Reddit's public .json listing endpoints: no OAuth, no app registration, no PRAW.

Is scraping Reddit legal? The Actor reads publicly visible subreddit content. You are responsible for using the data in compliance with Reddit's terms of service, content policy and applicable law (especially for NSFW content, personal data and minors).

Why does it use a residential proxy? Reddit aggressively blocks datacenter IPs with 403/429 errors. Residential proxies route requests through real consumer IPs, which Reddit is far less likely to block. The Actor turns this on by default; leave as-is unless you have a specific reason to change it.

How many posts can I get from a subreddit? Reddit caps listing depth per sort, typically somewhere between a few hundred and roughly 1,000 posts. Set maxPostsPerSub=0 to pull as many as Reddit serves. To go deeper than the listing depth, use a historical/archive scraper (see Related scrapers).

Can I scrape multiple subreddits in one run? Yes. Pass any number of subreddit names in subreddits and all are scraped into the same dataset.

What does timeFilter do? It applies to sort=top only. hour = top of the last hour; day = top of the last day; week, month, year, all likewise. Ignored for new, hot and rising.

Can I get comments too? This Actor returns posts (with numComments count). For full comment threads, use a dedicated comment scraper that takes a post URL and walks the tree.

Can I scrape NSFW subreddits? Yes, technically. NSFW posts are returned with over18=true. Use the data responsibly and within Reddit's terms.

Is the data fresh? Yes. Reddit's .json listings reflect live data, usually within seconds of a post or vote.

What's the difference between score and upvoteRatio? score = net votes (upvotes minus downvotes). upvoteRatio = upvotes ÷ total votes, Reddit's measure of how polarizing a post is. A 1000-score post with a 0.6 ratio is much more divisive than one with 0.97.
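The two definitions above can be combined to back out approximate vote counts: since score = ups − downs and upvoteRatio = ups ÷ (ups + downs), total votes = score ÷ (2 × upvoteRatio − 1). The sketch below applies that to the two posts in the example. Treat the results as estimates only, because Reddit rounds the ratio and may fuzz vote counts; `estimate_votes` is a hypothetical helper.

```python
def estimate_votes(score, upvote_ratio):
    """Estimate (upvotes, downvotes) from net score and upvote ratio."""
    if upvote_ratio <= 0.5:
        # At 0.5 the net score is zero, so the total cannot be recovered.
        raise ValueError("upvote_ratio must be greater than 0.5")
    total = score / (2 * upvote_ratio - 1)
    ups = round(total * upvote_ratio)
    return ups, round(total) - ups

print(estimate_votes(1000, 0.6))   # the divisive post
print(estimate_votes(1000, 0.97))  # the broadly liked post
```

Under these formulas the 0.6-ratio post needs about 5,000 total votes to reach a score of 1,000, while the 0.97-ratio post needs only a little over 1,000.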

Can I integrate with Slack / Sheets / Notion / n8n / Zapier? Yes. Apify provides official integrations and webhooks. Push every new row anywhere your stack can receive HTTP.

What output formats are supported? JSON, CSV, Excel, HTML, XML, JSONL via the Apify Dataset, plus REST API and webhooks for live integrations.


Related scrapers

Adjacent data sources in the social/dev/content suite:

| Scraper | Purpose |
| --- | --- |
| reddit-subreddit-scraper | You are here. Bulk posts from any subreddit with sort + time window. |
| reddit-historical-archive-scraper | Years of subreddit history at scale, beyond the listing depth cap. |
| hacker-news-search-scraper | HN stories/comments/Show HN/Ask HN/front page by keyword. |
| hacker-news-who-is-hiring-scraper | Monthly HN "Who is hiring?" thread parsed by company/role/stack. |
| stack-exchange-questions-scraper | Q&A across 170+ Stack Exchange sites by tag/site/sort. |
| github-repository-scraper | Public GitHub repo metadata by search query. |
| devto-articles-scraper | Dev.to articles by tag, author, latest feed. |
| product-hunt-daily-launches-scraper | Today's Product Hunt launches with votes and makers. |
| linkedin-top-content-scraper | Top-performing LinkedIn posts by keyword/author. |
| linkedin-ad-library-scraper | LinkedIn Ad Library: competitor ad creative & spend signals. |
| letterboxd-film-review-scraper | Film reviews from Letterboxd for culture/sentiment work. |
| instagram-media-downloader | Reels/Posts/Stories HD download URLs in bulk. |

🔑 Keyword cloud

Core: reddit scraper, subreddit scraper, reddit data export, reddit json api, reddit posts scraper, reddit hot posts scraper, reddit new posts scraper, reddit top posts scraper, reddit rising posts scraper, reddit api free, reddit no api key, reddit residential proxy scraper, reddit comments count scraper, reddit upvote tracker.

Niche: r forhire scraper, r startups scraper, r entrepreneur scraper, r saas scraper, r marketing scraper, r cryptocurrency scraper, r wallstreetbets scraper, r personalfinance scraper, r cscareerquestions scraper, r machinelearning scraper, r popular scraper, r askreddit scraper, r todayilearned scraper, multi subreddit scraper, reddit flair filter, reddit nsfw filter.

Use case: social listening, brand monitoring, competitor monitoring, community insight, market research, audience research, lead generation, freelance lead gen, alpha signals, financial sentiment, crypto sentiment, ticker mention tracking, content discovery, trending posts curation, newsletter content sourcing, journalism research, academic community research, nlp corpus building, llm fine tuning dataset, sentiment analysis pipeline.

Audience: brand managers, growth marketers, founders, indie hackers, content creators, newsletter writers, freelancers, recruiters, crypto traders, retail investors, financial analysts, market researchers, pr teams, journalists, academics, ml engineers, nlp researchers, social listening teams, support and community managers.


📝 Changelog

  • 2026-06-04 - Verified live & refreshed build; reliability/maintenance pass.
  • 2026-06-01 - Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
  • 2026-05-25 - Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
  • 2026-05-20 - Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

Last reviewed: 2026-06-01.