🟠 Reddit Subreddit Scraper

Pricing

from $3.50 / 1,000 reddit post or comment records

Scrape posts and comments from any subreddit. Get title, body, author, score, upvote ratio, comments, awards, flair, ISO timestamps, and AI-ready markdown. Watchlist mode emits only records that are new since the last run. Export, run via API, schedule, or integrate with other tools.

Rating

0.0

(0)

Developer

Skootle

Maintained by Community

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 18 days ago

TL;DR

Monitor any subreddit for new posts and comments. Returns clean structured JSON: title, body, score, upvote ratio, comment count, awards, author username, premium flag, and flair, plus ISO timestamps. Watchlist mode emits only NEW records since the previous run, so daily schedules become a clean diff feed. Built on Reddit's public listing endpoints; no Reddit OAuth required. Records cost $0.015 each on the Apify GOLD plan.


Try it on a small dataset, then let us know what you think in a review.


What does Reddit Subreddit Monitor do?

Reddit Subreddit Monitor extracts posts and comments from any subreddit you specify. Pick the subreddits and a sort order (new, hot, top, best, rising, controversial); get back clean structured JSON for every post: title, body, author, score, upvote ratio, comment count, awards, post type, flair, ISO timestamps, and a short markdown summary ready for an LLM.

Comments are optional. Turn on fetchComments and you get one record per comment with depth, parent linkage, body, score, and author info.
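To turn flat comment rows back into threads, you can group them by parentId. The sketch below is a hypothetical client-side helper, not part of the actor. It assumes a top-level comment's parentId equals its postId (as in the comment output example on this page) and that a reply's parentId equals its parent's commentId; build_comment_tree and render_thread are illustrative names.

```python
from collections import defaultdict
from typing import Any

Comment = dict[str, Any]

def build_comment_tree(comments: list[Comment]) -> dict[str, list[Comment]]:
    """Map each parentId to its direct replies, highest score first."""
    children: dict[str, list[Comment]] = defaultdict(list)
    for comment in comments:
        children[comment["parentId"]].append(comment)
    for siblings in children.values():
        siblings.sort(key=lambda c: c.get("score", 0), reverse=True)
    return dict(children)

def render_thread(children: dict[str, list[Comment]], parent_id: str, indent: int = 0) -> list[str]:
    """Render replies under parent_id as indented lines (pass a postId for the whole thread)."""
    lines: list[str] = []
    for comment in children.get(parent_id, []):
        lines.append("  " * indent + f"u/{comment['author']['username']}: {comment['body']}")
        # Recurse into this comment's own replies, one indent level deeper.
        lines.extend(render_thread(children, comment["commentId"], indent + 1))
    return lines
```

Sorting siblings by score mirrors how most readers scan a thread; drop the sort if you need the original order.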

Watchlist mode (watchlistMode: true) is built for scheduled runs. State persists across runs in the actor's key-value store, so a daily cron job emits only the posts and comments that are NEW since the last run.

Why scrape Reddit?

Reddit moves fast, and almost every niche, brand, and trend has a dedicated subreddit. Brand-listening teams, content marketers, recruiters, investors, and journalists all watch specific subs every day, but the Reddit UI doesn't offer a clean export. The manual workflow means refreshing 10 subreddits a few times a day and copy-pasting interesting posts into a spreadsheet.

We deliver new posts plus optional comments as structured JSON. Scheduled daily with watchlist mode on, the actor emits only what changed since the previous run, so a content team or AI agent gets a clean diff instead of re-ingesting the same posts. No Reddit account needed.

Who needs this?

  • AI / LLM teams building training corpora, RAG sources, or fine-tuning datasets from high-quality discourse
  • Brand monitoring teams watching for new mentions of their products on subreddits where their customers actually live
  • Market researchers tracking sentiment on niche subreddits over time (watchlistMode: true + a daily schedule)
  • Recruiters and DevRel scanning industry subreddits for hiring or product-launch signals
  • Investors and traders mining r/wallstreetbets, r/stocks, r/CryptoCurrency, r/options for sentiment shifts
  • AI agent builders (Claude, ChatGPT, Cursor, n8n, Make.com) feeding agents a steady stream of normalized Reddit data

How to use Reddit Subreddit Monitor

  1. Open the Input tab on the actor page
  2. Add one or more subreddit names to the subreddits field (without the r/ prefix; the actor strips it if you include it)
  3. Pick a sort order (new is the default; use top with timeFilter: 'week' for the week's best)
  4. Set postsPerSubreddit (1-100; default 10) and maxItems (default 50, deliberately conservative so a default run fits the 5-minute auto-test)
  5. Optionally enable fetchComments and set commentsPerPost if you want comment threads
  6. Optionally enable watchlistMode for daily scheduled runs that only return new records
  7. Click Start or call the actor via the Apify REST API or the Apify CLI
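If you start runs from your own code, it can help to validate the input first. Below is a minimal sketch of a hypothetical build_input helper that encodes the limits listed on this page (the six sort values, postsPerSubreddit 1-100, commentsPerPost up to 500, timeFilter only for top/controversial). The actor validates input itself; this is a client-side convenience, not its actual logic.

```python
from typing import Any, Optional

SORTS = {"new", "hot", "top", "best", "rising", "controversial"}
TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def build_input(
    subreddits: list[str],
    sort: str = "new",
    posts_per_subreddit: int = 10,
    max_items: int = 50,
    time_filter: Optional[str] = None,
    fetch_comments: bool = False,
    comments_per_post: int = 0,
    watchlist_mode: bool = False,
) -> dict[str, Any]:
    """Build a run input dict, rejecting values outside the documented limits."""
    # Strip an optional leading "/" and "r/" prefix (the actor also strips r/ itself).
    names = [s.strip().lstrip("/").removeprefix("r/") for s in subreddits]
    if not names or any(not n for n in names):
        raise ValueError("subreddits must be a non-empty list of names")
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort}")
    if not 1 <= posts_per_subreddit <= 100:
        raise ValueError("postsPerSubreddit must be 1-100")
    if not 0 <= comments_per_post <= 500:
        raise ValueError("commentsPerPost must be 0-500")
    run_input: dict[str, Any] = {
        "subreddits": names,
        "sort": sort,
        "postsPerSubreddit": posts_per_subreddit,
        "maxItems": max_items,
        "fetchComments": fetch_comments,
        "commentsPerPost": comments_per_post,
        "watchlistMode": watchlist_mode,
    }
    if time_filter is not None:
        # timeFilter only applies to top and controversial listings.
        if sort not in {"top", "controversial"} or time_filter not in TIME_FILTERS:
            raise ValueError("timeFilter needs sort top/controversial and a valid window")
        run_input["timeFilter"] = time_filter
    return run_input
```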

You can run this scraper on demand from the Console, schedule it (Apify lets you schedule any actor with cron), call it via API, integrate with Make, Zapier, n8n, Slack, Google Sheets, or pipe it directly into a Claude or ChatGPT custom agent.

How much will scraping Reddit cost?

This actor uses pay-per-event pricing with two events:

  • Actor Start: $0.01 once per run
  • Reddit record (post or comment): tiered by Apify plan, charged per record saved

| Apify plan | $ per 1,000 records |
|---|---|
| FREE | $25.00 |
| BRONZE | $21.25 |
| SILVER | $17.50 |
| GOLD | $15.00 |
| PLATINUM | $15.00 |
| DIAMOND | $13.50 |

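A typical daily run on 5 subreddits with postsPerSubreddit: 25 (and maxItems raised to at least 125) returns up to ~125 records; in watchlist mode, runs after the first return only new records, so usually fewer. At the GOLD rate that's 125 × $0.015 = $1.875 plus the $0.01 start fee, roughly $1.89 per run: the same on PLATINUM, less on DIAMOND.

You only pay for records actually saved. If a subreddit's listing fails or returns nothing, you aren't charged for empty results; only the $0.01 Actor Start fee applies.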
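The arithmetic above can be wrapped in a small estimator. This is a sketch using the prices from the table on this page; estimate_run_cost is a hypothetical helper, and Apify's billing remains authoritative.

```python
# Per-1,000-record prices from the pricing table, plus the $0.01 Actor Start fee.
PRICE_PER_1000_RECORDS = {
    "FREE": 25.00,
    "BRONZE": 21.25,
    "SILVER": 17.50,
    "GOLD": 15.00,
    "PLATINUM": 15.00,
    "DIAMOND": 13.50,
}
ACTOR_START_FEE = 0.01

def estimate_run_cost(records: int, plan: str = "GOLD") -> float:
    """Estimated USD cost of one run that saves `records` records on `plan`."""
    if records < 0:
        raise ValueError("records must be >= 0")
    # Multiply before dividing to keep the float arithmetic exact for typical counts.
    return round(ACTOR_START_FEE + records * PRICE_PER_1000_RECORDS[plan.upper()] / 1000, 4)
```

For the 125-record run described above, estimate_run_cost(125, "GOLD") comes to about $1.89, and an empty run costs just the start fee.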

Is it legal to scrape Reddit?

Yes, with caveats: this actor only accesses publicly accessible subreddits and public posts (the same data anyone can read in a browser without logging in). It does not access NSFW or quarantined content gated behind logged-in walls; it does not authenticate as a Reddit account; it does not return private messages, modmail, or moderator-only data; and it does not bypass any technical access control.

You are responsible for how you use the scraped data. Reddit's content is licensed under Reddit's User Agreement and individual users retain rights to their own content. For commercial redistribution of Reddit content (reselling raw posts), consult Reddit's API terms and your legal counsel. For personal research, AI training, brand monitoring, internal analytics, and similar uses, the scraped data is generally treated the same as any other public web content.

Examples

Example 1: Monitor r/programming for new posts

{
  "subreddits": ["programming"],
  "sort": "new",
  "postsPerSubreddit": 25,
  "maxItems": 25
}

Example 2: Top 50 from r/MachineLearning this week

{
  "subreddits": ["MachineLearning"],
  "sort": "top",
  "timeFilter": "week",
  "postsPerSubreddit": 50,
  "maxItems": 50
}

Example 3: Daily watchlist across 3 finance subreddits

{
  "subreddits": ["wallstreetbets", "stocks", "options"],
  "sort": "new",
  "watchlistMode": true,
  "postsPerSubreddit": 30,
  "maxItems": 90
}

Schedule this with the Apify scheduler at 0 14 * * * (daily at 14:00 UTC). The first run captures everything; subsequent runs only emit posts new since the previous run.

Example 4: Brand mention monitoring across niche subreddits

{
  "subreddits": ["SaaS", "Entrepreneur", "smallbusiness", "marketing"],
  "sort": "new",
  "watchlistMode": true,
  "maxItems": 100
}

Run daily, pipe the output to a Claude agent that scores each post for "mentions our brand" and routes hits to Slack.

Example 5: Pull comments for top 5 posts on r/AskHistorians

{
  "subreddits": ["AskHistorians"],
  "sort": "top",
  "timeFilter": "week",
  "postsPerSubreddit": 5,
  "fetchComments": true,
  "commentsPerPost": 100,
  "maxItems": 505
}
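The maxItems value here (505) follows from simple arithmetic: every post across all subreddits, plus up to commentsPerPost comments for each post. A minimal sketch of that bound as a hypothetical helper; pass comments_per_post=0 when fetchComments is off.

```python
def max_items_needed(num_subreddits: int, posts_per_subreddit: int, comments_per_post: int = 0) -> int:
    """Upper bound on records a run can save: every post plus a full comment quota per post."""
    posts = num_subreddits * posts_per_subreddit
    return posts + posts * comments_per_post
```

Setting maxItems to this bound means the cap never truncates a run; setting it lower trades completeness for a predictable cost ceiling.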

Example 6: Crisis-signal early-warning on r/all

{
  "subreddits": ["all"],
  "sort": "rising",
  "watchlistMode": true,
  "postsPerSubreddit": 50
}

Run every 30 minutes; alert when a new post matches your brand keyword. The rising sort picks posts gaining momentum fast.

Example 7: Build an AI training corpus from r/explainlikeimfive

{
  "subreddits": ["explainlikeimfive"],
  "sort": "top",
  "timeFilter": "month",
  "postsPerSubreddit": 100,
  "fetchComments": true,
  "commentsPerPost": 50,
  "maxItems": 5100
}

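Run weekly; each run saves up to 5,100 records (100 posts plus up to 50 comments each), roughly 20K question-and-explanation records per month before removing the overlap between consecutive top-of-month listings.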

Example 8: Hiring-signal scanner on r/cscareerquestions

{
  "subreddits": ["cscareerquestions", "ExperiencedDevs", "ITCareerQuestions"],
  "sort": "new",
  "watchlistMode": true,
  "postsPerSubreddit": 25
}

Pipe into a recruiter inbox: filter for posts mentioning "looking for" or "where should I apply".
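A downstream filter like that can be a few lines. This sketch is a hypothetical helper (hiring_signals is an illustrative name) that scans post titles and bodies case-insensitively for the two phrases above; swap in your own phrase list for brand monitoring.

```python
import re
from typing import Any, Iterable

HIRING_PHRASES = ("looking for", "where should i apply")

def hiring_signals(records: Iterable[dict[str, Any]], phrases: Iterable[str] = HIRING_PHRASES) -> list[dict[str, Any]]:
    """Return post records whose title or body contains any of the phrases."""
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    hits = []
    for record in records:
        if record.get("recordType") != "reddit_post":
            continue  # comments are skipped in this sketch
        # selftext can be empty or null for link posts.
        text = f"{record.get('title', '')}\n{record.get('selftext') or ''}"
        if pattern.search(text):
            hits.append(record)
    return hits
```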

Input parameters

| Field | Type | Default | Description |
|---|---|---|---|
| subreddits | string[] | ["programming"] | Subreddit names without the r/ prefix. One run handles many. |
| sort | enum | new | new, hot, top, best, rising, controversial |
| timeFilter | enum | day | Window for top/controversial: hour, day, week, month, year, all |
| postsPerSubreddit | int | 10 | 1-100 |
| fetchComments | bool | false | Walks the comment tree for each post |
| commentsPerPost | int | 0 | Hard cap on comments per post (max 500) |
| watchlistMode | bool | false | Idempotent diff against seen IDs stored in the key-value store |
| maxItems | int | 50 | Hard cap on total records (posts + comments) |
| useApifyProxy | bool | true | Use Apify residential proxy. Recommended. |
| apifyProxyGroups | string[] | ["RESIDENTIAL"] | Apify Proxy groups to use |

Reddit output format

The dataset has two record types. Filter by recordType.

reddit_post

| Field | Type | Description |
|---|---|---|
| outputSchemaVersion | string | Versioned schema literal ('2026-05-08') |
| recordType | literal | 'reddit_post' |
| recordId | string | reddit:post:<postId> (idempotent, cross-run dedupe-friendly) |
| postId, fullname | string | Bare post ID and Reddit fullname (t3_<id>) |
| url, permalinkPath | string | Direct URL and canonical path |
| title | string | Post title |
| selftext, selftextHtml | string | Body text (markdown and HTML) |
| subreddit, subredditId, subredditSubscribers | string / int | Subreddit context |
| author | object | { username, fullname, isPremium, flairText } |
| createdAt, scrapedAt | ISO 8601 | Standard timestamps |
| score, ups, downs, upvoteRatio | number | Engagement (upvoteRatio is a 0-1 float) |
| numComments, totalAwards, gilded | int | Engagement counts |
| media | object | { type (image/video/gallery/link/self), url, thumbnail, domain } |
| isVideo, isSelf, isOriginalContent | bool | Post-type flags |
| isPinned, isStickied, isLocked, isArchived, isOver18, isSpoiler | bool | Moderation/state flags |
| linkFlairText, linkFlairBackgroundColor | string | Flair |
| fieldCompletenessScore | int (0-100) | Self-filtering signal |
| agentMarkdown | string | 300-500 character LLM-ready summary |

reddit_comment

| Field | Type | Description |
|---|---|---|
| outputSchemaVersion, recordType, recordId | string | Discriminated identity |
| commentId, fullname, url, permalinkPath | string | Reddit IDs (fullname is t1_<id>) and links |
| postId, parentId, depth | string / int | Tree linkage |
| body, bodyHtml | string | Markdown and HTML body |
| subreddit, subredditId | string | Subreddit |
| author | object | { username, fullname, isPremium, flairText } |
| createdAt, scrapedAt | ISO 8601 | Timestamps |
| score, ups, downs, totalAwards, gilded | int | Engagement |
| isSubmitter, isStickied, isControversial, edited | bool | Status flags |
| fieldCompletenessScore, agentMarkdown | int / string | Quality signals |
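Since both record types share one dataset, a consumer typically splits on recordType and may drop sparse rows using fieldCompletenessScore. A minimal sketch with a hypothetical split_records helper; the threshold you pass is your own choice, not a value the actor recommends.

```python
from typing import Any

Record = dict[str, Any]

def split_records(items: list[Record], min_completeness: int = 0) -> tuple[list[Record], list[Record]]:
    """Split dataset items into (posts, comments), dropping rows below min_completeness."""
    posts: list[Record] = []
    comments: list[Record] = []
    for item in items:
        if item.get("fieldCompletenessScore", 0) < min_completeness:
            continue  # sparse row: skip it
        if item.get("recordType") == "reddit_post":
            posts.append(item)
        elif item.get("recordType") == "reddit_comment":
            comments.append(item)
    return posts, comments
```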

Reddit scraper output example (post)

{
  "outputSchemaVersion": "2026-05-08",
  "recordType": "reddit_post",
  "recordId": "reddit:post:1t7qsfi",
  "postId": "1t7qsfi",
  "url": "https://www.reddit.com/r/programming/comments/1t7qsfi/...",
  "title": "How database work pulls you deep into systems engineering",
  "subreddit": "programming",
  "subredditSubscribers": 6874584,
  "author": { "username": "clairegiordano", "isPremium": false, "flairText": null },
  "createdAt": "2026-05-08T19:42:53.000Z",
  "score": 234,
  "upvoteRatio": 0.94,
  "numComments": 89,
  "media": { "type": "link", "url": "https://talkingpostgres.com/...", "domain": "talkingpostgres.com" },
  "isSelf": false,
  "linkFlairText": null,
  "fieldCompletenessScore": 92,
  "agentMarkdown": "**🟠 r/programming · How database work pulls you deep into systems engineering**\n- ⬆ 234 · 94% · 💬 89\n- 👤 u/clairegiordano\n- 👥 6.9M subscribers\n- 📅 posted 2026-05-08\n- 🔗 https://talkingpostgres.com/...",
  "scrapedAt": "2026-05-08T22:30:00.000Z"
}

Reddit scraper output example (comment)

{
  "outputSchemaVersion": "2026-05-08",
  "recordType": "reddit_comment",
  "recordId": "reddit:comment:k7q2xyz",
  "commentId": "k7q2xyz",
  "postId": "1t7qsfi",
  "parentId": "1t7qsfi",
  "depth": 0,
  "body": "Great talk. The MemSQL → HorizonDB story is wild.",
  "subreddit": "programming",
  "author": { "username": "dbnerd", "isPremium": false, "flairText": null },
  "createdAt": "2026-05-08T20:12:15.000Z",
  "score": 17,
  "fieldCompletenessScore": 100,
  "agentMarkdown": "**💬 Reddit comment by u/dbnerd** (depth 0)\n> Great talk. The MemSQL → HorizonDB story is wild.\n- 🔗 https://www.reddit.com/r/programming/comments/1t7qsfi/_/k7q2xyz/"
}

During the Actor run

Subreddits are fetched serially under Reddit's published 60-requests-per-minute unauthenticated throttle, so reliability stays above 99%. No Reddit OAuth, no account, no API quota to manage.

The actor writes three things to its key-value store:

  1. OUTPUT: compact run summary (subreddits queried, posts saved, comments saved, watchlist deltas, error counts, finished timestamp)
  2. AGENT_BRIEFING: markdown digest with the top 5 posts and top 5 comments by score, ready to paste into an LLM context
  3. WATCHLIST_STATE: (only when watchlistMode: true) the seen post and comment IDs, capped at 5,000 each, used to compute the next run's diff
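The watchlist diff can be pictured as a set-membership check against stored IDs. This is a sketch of the idea, not the actor's code: the layout of the state dict ({"posts": [...], "comments": [...]}) and the oldest-first eviction used to enforce the 5,000-ID cap are assumptions, since the actual WATCHLIST_STATE format isn't documented here.

```python
from typing import Any

Record = dict[str, Any]

def watchlist_diff(records: list[Record], state: dict[str, list[str]], cap: int = 5000) -> tuple[list[Record], dict[str, list[str]]]:
    """Return (records not seen before, updated state); each ID list keeps its newest `cap` entries."""
    seen_posts = list(state.get("posts", []))
    seen_comments = list(state.get("comments", []))
    post_set, comment_set = set(seen_posts), set(seen_comments)
    fresh: list[Record] = []
    for record in records:
        if record["recordType"] == "reddit_post":
            ids, id_set, key = seen_posts, post_set, "postId"
        else:
            ids, id_set, key = seen_comments, comment_set, "commentId"
        record_id = record[key]
        if record_id in id_set:
            continue  # already emitted by a previous run (or earlier in this batch)
        id_set.add(record_id)
        ids.append(record_id)
        fresh.append(record)
    # Keep only the most recent IDs so the stored state stays bounded.
    new_state = {"posts": seen_posts[-cap:], "comments": seen_comments[-cap:]}
    return fresh, new_state
```

One consequence of a bounded history: a very old post that drops out of the stored IDs and later reappears in a listing would be emitted again.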

FAQ

How does Reddit Subreddit Monitor work?

Input: a list of subreddits, a sort order, and how many posts per subreddit. Optionally turn on fetchComments to include comment threads. Output: one structured record per post (and optionally per comment) in your Apify dataset, exportable as JSON, CSV, or Excel. No Reddit account, no OAuth, no API quota. Typical runtime is seconds to a minute or two depending on how many subreddits and comments you pull.

Can I scrape multiple subreddits in one run?

Yes. Pass an array to subreddits. The actor iterates each one in order, with rate limiting between requests.

Can I monitor a subreddit for new posts only?

Yes. Set watchlistMode: true. The first run emits everything; subsequent runs emit only posts new since the previous run. Schedule with the Apify cron scheduler.

Can I get comments along with posts?

Yes. Set fetchComments: true and set commentsPerPost to the cap you want (max 500 per post). Each post's comments take one additional request, and the comment tree is walked depth-first.
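A depth-first walk with a cap looks roughly like the sketch below. It assumes the raw comment JSON shape Reddit's public endpoints are commonly known to return (a Listing whose children have kind "t1" for comments, with replies either an empty string or a nested Listing, and "more" stubs for collapsed branches). It's an illustration of the technique, not the actor's implementation; walk_comments is a hypothetical name.

```python
from typing import Any, Optional

def walk_comments(listing: dict[str, Any], cap: int, depth: int = 0, out: Optional[list[dict[str, Any]]] = None) -> list[dict[str, Any]]:
    """Depth-first walk over a Reddit comment Listing, stopping once `cap` comments are collected."""
    if out is None:
        out = []
    for child in listing.get("data", {}).get("children", []):
        if len(out) >= cap:
            break
        if child.get("kind") != "t1":
            continue  # skip "more" stubs; expanding them would cost extra requests
        data = child["data"]
        out.append({"commentId": data["id"], "parentId": data["parent_id"], "depth": depth, "body": data.get("body", "")})
        replies = data.get("replies")
        if isinstance(replies, dict):  # "" when a comment has no replies
            walk_comments(replies, cap, depth + 1, out)
    return out
```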

Can I use this with the Apify API?

Yes. Every Apify actor exposes a REST API: POST https://api.apify.com/v2/acts/skootle~reddit-subreddit-monitor/runs with your input as the JSON body and your Apify API token in an Authorization: Bearer <token> header. Full API docs are linked from the actor page.
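A standard-library sketch of that call, using the endpoint and header described above. It assumes Apify's usual response envelope, where the run object sits under a top-level "data" key; build_run_request and start_run are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

ACTOR_RUNS_URL = "https://api.apify.com/v2/acts/skootle~reddit-subreddit-monitor/runs"

def build_run_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Prepare (but don't send) the POST that starts a run."""
    body = json.dumps(run_input).encode("utf-8")
    request = urllib.request.Request(ACTOR_RUNS_URL, data=body, method="POST")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Content-Type", "application/json")
    return request

def start_run(token: str, run_input: dict[str, Any]) -> dict[str, Any]:
    """Start a run and return the run object (network call)."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=30) as response:
        return json.load(response)["data"]
```

Keeping request construction separate from sending makes the request easy to inspect or test without a token.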

Can I integrate this with Make / Zapier / n8n / Slack?

Yes. Apify provides native integrations for all of these. From the actor page, click Integrations and pick your destination.

Can I use this scraper as a Reddit API replacement?

For most read-only use cases on public subreddits, yes. Caveats: this actor does not return private content, modmail, or anything requiring an authenticated Reddit session, and it doesn't write (no posting, no voting, no commenting).

Can I increase the speed?

The actor rate-limits itself to 1.1 seconds between Reddit requests to stay under Reddit's published throttle. Reddit will block aggressive callers; this is the throttle that keeps reliability above 99%.
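A fixed 1.1-second gap works out to about 54 requests per minute, just under a 60-per-minute limit. The sketch below shows that kind of minimum-interval throttle; it's a generic illustration, not the actor's code, and the clock and sleep functions are injectable so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Optional

class MinIntervalThrottle:
    """Ensures at least `interval` seconds elapse between successive wait() calls."""

    def __init__(self, interval: float = 1.1,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Call before each request; sleeps only if the previous call was too recent."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```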

Can I get Reddit data in Python / JavaScript / TypeScript?

Yes. The output is plain JSON in an Apify dataset. Pull it via the Apify Python or JavaScript clients, the REST API, or by exporting from the Console as CSV/JSONL/Excel.
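With the official apify-client package for Python, a run-and-fetch helper can be quite short. In this sketch, fetch_items is a hypothetical name; the client object is expected to be an ApifyClient, using its actor(...).call(run_input=...) and dataset(...).iterate_items() methods.

```python
from typing import Any

ACTOR_ID = "skootle/reddit-subreddit-monitor"

def fetch_items(client: Any, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the actor, wait for it to finish, and return every dataset item.

    Usage (requires `pip install apify-client`):
        from apify_client import ApifyClient
        items = fetch_items(ApifyClient("<YOUR_APIFY_TOKEN>"), {"subreddits": ["programming"]})
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not complete")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```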

Can the actor return one comment per row?

Yes, that's the default with fetchComments: true. Each comment is a separate record. Filter on recordType: 'reddit_comment' to isolate the comment rows.

What if Reddit changes its layout?

This actor uses Reddit's stable public JSON endpoints (the same ones their own apps use), not HTML scraping. Reddit changes the endpoints rarely; when they do, the actor adapts within 24-48 hours. Issues are tracked on the actor page and resolved fast.

Why does this actor cost more than a free Reddit scraper?

Free actors trade reliability and feature depth for cost. This actor ships a versioned schema, idempotent record IDs (so cross-run dedupe just works), agentMarkdown for direct LLM consumption, watchlist diff mode, normalized author + media objects, ISO 8601 timestamps, and continuous maintenance. If you're feeding the data into an automated pipeline or AI agent, those features pay back the per-record cost in saved engineering hours.

Your feedback

Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we'll fix it fast (typically within 48 hours).

Why choose Reddit Subreddit Monitor

  • Replaces refreshing 10 subreddits manually: one run pulls posts, comments, scores, awards, and flair across every subreddit on your list
  • Watchlist mode emits only what's new since last run: safe to schedule daily without paying for duplicates
  • No Reddit OAuth required: no account or API quota to manage
  • Posts and comments in one dataset: filter on recordType to split them downstream
  • Author profile join: posts carry username, verified-premium flag, and flair so trust filtering is one field away
  • AI agents can self-filter sparse rows and paste the per-record markdown summary straight into Claude or ChatGPT
  • Hand-tuned for this source: fixes ship the same week the source changes, typically within 24-48 hours
  • Safe to dedupe across re-runs: stable reddit:post:<id> and reddit:comment:<id> IDs upsert cleanly
  • Schema doesn't break your pipeline: versioned and date-stamped on every record
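Upserting on recordId is what keeps re-runs safe for downstream storage. A minimal in-memory sketch using a hypothetical upsert helper; a database table keyed on recordId would follow the same pattern.

```python
from typing import Any, Iterable

def upsert(store: dict[str, dict[str, Any]], records: Iterable[dict[str, Any]]) -> int:
    """Insert or replace records keyed by recordId; return how many keys were new."""
    new_keys = 0
    for record in records:
        if record["recordId"] not in store:
            new_keys += 1
        # The later scrape wins, so scores and comment counts stay current.
        store[record["recordId"]] = record
    return new_keys
```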


Support and contact

File issues directly on this actor's page (Issues tab); we reply within 48 hours. For feature requests, use the same issue tracker and tag them enhancement.