🟠 Reddit Subreddit Scraper

Scrape posts and comments from any subreddit. Get title, body, author, score, upvote ratio, comments, awards, flair, ISO timestamps, AI-ready markdown. Watchlist mode emits only new records since last run. Export, run via API, schedule, or integrate with other tools.

Pricing

from $3.50 / 1,000 Reddit post or comment records

Developer: Skootle (Maintained by Community)

TL;DR

Monitor any subreddit for new posts and comments. Returns clean structured JSON: title, body, score, upvote ratio, comment count, awards, author karma + account age, ISO timestamps. Watchlist mode emits only NEW records since the previous run, so daily schedules become a clean diff feed. Built on Reddit's public listing endpoints. No Reddit OAuth required. Premium tier $0.015/record at GOLD.


Try it on a small dataset, then let us know what you think in a review.


What does Reddit Subreddit Monitor do?

Reddit Subreddit Monitor extracts posts and comments from any subreddit you specify. You give it a list of subreddits and a sort order (new, hot, top, best, rising, controversial); it returns clean, agent-ready JSON for every post: title, body, author, score, upvote ratio, comment count, awards, post type (image, video, gallery, link, self-post), flair, ISO timestamps, and a 300-500 character markdown summary you can drop straight into an LLM context.

Comments are optional. With fetchComments: true, the actor walks the full comment tree per post (configurable depth, max 500 per post) and emits one record per comment with depth, parent linkage, body, score, and author info.

Watchlist mode (watchlistMode: true) makes this scraper schedulable. State persists across runs in the actor's key-value store, so a daily cron only emits posts and comments NEW since the last run.
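Conceptually, the diff is a set difference against the record IDs persisted from earlier runs. A minimal sketch in Python (illustrative only — the actor's real state lives in its key-value store as WATCHLIST_STATE; the function and variable names here are assumptions):

```python
def watchlist_diff(fetched, seen_ids, cap=5000):
    """Emit only records whose recordId has not appeared in a previous run,
    and return the updated seen-ID state (capped, mirroring the actor's
    5,000-ID limit on stored state)."""
    new_records = [r for r in fetched if r["recordId"] not in seen_ids]
    updated = list(seen_ids) + [r["recordId"] for r in new_records]
    return new_records, set(updated[-cap:])

# First run: no prior state, so everything is emitted
run1 = [{"recordId": "reddit:post:a"}, {"recordId": "reddit:post:b"}]
new1, state = watchlist_diff(run1, set())

# Next run: only the genuinely new post comes through
run2 = run1 + [{"recordId": "reddit:post:c"}]
new2, state = watchlist_diff(run2, state)
```

The stable `recordId` values (see the output format below) are what make this diff safe across runs.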

Why scrape Reddit?

Reddit is the largest open community-content corpus on the internet, and almost every niche, brand, sentiment, and trend has a dedicated subreddit. Resellers monitor product subreddits for buying intent. AI teams build training corpora from r/explainlikeimfive, r/MachineLearning, r/programming. Brand teams watch for crisis signals on r/all and industry subreddits. Recruiters mine r/cscareerquestions for hiring intent. Investors track sentiment on r/wallstreetbets, r/stocks, r/CryptoCurrency.

The data is public. The Reddit API has tight rate limits and demands a registered Reddit account; this actor uses the same public listing endpoints your browser hits, with proper rate limiting and a compliant User-Agent, so you don't have to manage Reddit credentials or worry about API quotas.

Who needs this?

  • AI / LLM teams building training corpora, RAG sources, or fine-tuning datasets from high-quality discourse
  • Brand monitoring teams watching for new mentions of their products on subreddits where their customers actually live
  • Market researchers tracking sentiment on niche subreddits over time (watchlistMode: true + a daily schedule)
  • Recruiters and DevRel scanning industry subreddits for hiring or product-launch signals
  • Investors and traders mining r/wallstreetbets, r/stocks, r/CryptoCurrency, r/options for sentiment shifts
  • AI agent builders (Claude, ChatGPT, Cursor, n8n, Make.com) feeding agents a steady stream of normalized Reddit data

How to use Reddit Subreddit Monitor

  1. Open the Input tab on the actor page
  2. Add one or more subreddit names to the subreddits field (without the r/ prefix; the actor strips it if you include it)
  3. Pick a sort order (new is the default; use top with timeFilter: 'week' for the week's best)
  4. Set postsPerSubreddit (1-100; default 10) and maxItems (default 50, conservative for the 5-minute auto-test)
  5. Optionally enable fetchComments and set commentsPerPost if you want comment threads
  6. Optionally enable watchlistMode for daily scheduled runs that only return new records
  7. Click Start or call the actor via the Apify REST API or the Apify CLI

You can run this scraper on demand from the Console, schedule it (Apify lets you schedule any actor with cron), call it via API, integrate with Make, Zapier, n8n, Slack, Google Sheets, or pipe it directly into a Claude or ChatGPT custom agent.

How much will scraping Reddit cost?

This actor is priced per event, with two event types:

  • Actor Start: $0.01 once per run
  • Reddit record (post or comment): tiered, charged per record written

Apify plan | $ / 1,000 records
FREE | $25.00
BRONZE | $21.25
SILVER | $17.50
GOLD | $15.00
PLATINUM | $15.00
DIAMOND | $13.50

A typical daily watchlist run on 5 subreddits with postsPerSubreddit: 25 returns ~125 records (or fewer in watchlist mode after the first run): roughly $1.88 per run on the GOLD plan, less on PLATINUM/DIAMOND.
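That estimate is simple arithmetic — a flat start event plus records at the plan's per-1,000 rate (a sketch; tier prices come from the table above):

```python
def run_cost(records, per_1000_usd, start_fee_usd=0.01):
    """Estimated cost of one run: the $0.01 Actor Start event plus
    records charged at the plan's per-1,000 rate."""
    return start_fee_usd + records * (per_1000_usd / 1000)

# 5 subreddits x postsPerSubreddit: 25 = 125 records on GOLD ($15.00 / 1,000)
gold_cost = run_cost(125, 15.00)  # 0.01 + 125 * 0.015 = 1.885
```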

You only pay for records actually saved. If a subreddit's listing fails or returns nothing, you don't pay for empty results.

Is it legal to scrape Reddit?

Yes, with caveats: this actor only accesses publicly accessible subreddits and public posts (the same data anyone can read in a browser without logging in). It does not access NSFW or quarantined content gated behind logged-in walls; it does not authenticate as a Reddit account; it does not return private messages, modmail, or moderator-only data; and it does not bypass any technical access control.

You are responsible for how you use the scraped data. Reddit's content is licensed under Reddit's User Agreement and individual users retain rights to their own content. For commercial redistribution of Reddit content (reselling raw posts), consult Reddit's API terms and your legal counsel. For personal research, AI training, brand monitoring, internal analytics, and similar uses, the scraped data is generally treated the same as any other public web content.

Examples

Example 1: Monitor r/programming for new posts

{
  "subreddits": ["programming"],
  "sort": "new",
  "postsPerSubreddit": 25,
  "maxItems": 25
}

Example 2: Top 50 from r/MachineLearning this week

{
  "subreddits": ["MachineLearning"],
  "sort": "top",
  "timeFilter": "week",
  "postsPerSubreddit": 50,
  "maxItems": 50
}

Example 3: Daily watchlist across 3 finance subreddits

{
  "subreddits": ["wallstreetbets", "stocks", "options"],
  "sort": "new",
  "watchlistMode": true,
  "postsPerSubreddit": 30,
  "maxItems": 90
}

Schedule this with the Apify scheduler at 0 14 * * * (daily at 14:00 UTC). The first run captures everything; subsequent runs only emit posts new since the previous run.

Example 4: Brand mention monitoring across niche subreddits

{
  "subreddits": ["SaaS", "Entrepreneur", "smallbusiness", "marketing"],
  "sort": "new",
  "watchlistMode": true,
  "maxItems": 100
}

Run daily, pipe the output to a Claude agent that scores each post for "mentions our brand" and routes hits to Slack.

Example 5: Pull comments for top 5 posts on r/AskHistorians

{
  "subreddits": ["AskHistorians"],
  "sort": "top",
  "timeFilter": "week",
  "postsPerSubreddit": 5,
  "fetchComments": true,
  "commentsPerPost": 100,
  "maxItems": 505
}

Example 6: Crisis-signal early-warning on r/all

{
  "subreddits": ["all"],
  "sort": "rising",
  "watchlistMode": true,
  "postsPerSubreddit": 50
}

Run every 30 minutes; alert when a new post matches your brand keyword. The rising sort picks posts gaining momentum fast.

Example 7: Build an AI training corpus from r/explainlikeimfive

{
  "subreddits": ["explainlikeimfive"],
  "sort": "top",
  "timeFilter": "month",
  "postsPerSubreddit": 100,
  "fetchComments": true,
  "commentsPerPost": 50,
  "maxItems": 5100
}

Run weekly; accumulate 20-50K labeled question-and-explanation pairs per month for fine-tuning.

Example 8: Hiring-signal scanner on r/cscareerquestions

{
  "subreddits": ["cscareerquestions", "ExperiencedDevs", "ITCareerQuestions"],
  "sort": "new",
  "watchlistMode": true,
  "postsPerSubreddit": 25
}

Pipe into a recruiter inbox: filter for posts mentioning "looking for" or "where should I apply".

Input parameters

Field | Type | Default | Description
subreddits | string[] | ["programming"] | Subreddit names without the r/ prefix. One run handles many.
sort | enum | new | new, hot, top, best, rising, controversial
timeFilter | enum | day | Window for top/controversial: hour, day, week, month, year, all
postsPerSubreddit | int | 10 | 1-100
fetchComments | bool | false | Walks the comment tree per post
commentsPerPost | int | 0 | Hard cap on comments per post (max 500)
watchlistMode | bool | false | Idempotent diff against KV-stored seen IDs
maxItems | int | 50 | Hard cap on total records (posts + comments)
useApifyProxy | bool | true | Apify residential proxy. Recommended.
apifyProxyGroups | string[] | ["RESIDENTIAL"] |

Reddit output format

The dataset has two record types. Filter by recordType.
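Downstream, splitting the mixed export is a one-line filter on that discriminator — for example, in Python:

```python
def split_records(dataset):
    """Partition a mixed dataset export by the recordType discriminator."""
    posts = [r for r in dataset if r["recordType"] == "reddit_post"]
    comments = [r for r in dataset if r["recordType"] == "reddit_comment"]
    return posts, comments

# Toy export with one record of each type
dataset = [
    {"recordType": "reddit_post", "recordId": "reddit:post:1t7qsfi"},
    {"recordType": "reddit_comment", "recordId": "reddit:comment:k7q2xyz"},
]
posts, comments = split_records(dataset)
```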

reddit_post

Field | Type | Description
outputSchemaVersion | string | Versioned schema literal ('2026-05-08')
recordType | literal | 'reddit_post'
recordId | string | reddit:post:<postId> (idempotent, cross-run dedupe-friendly)
postId, fullname | string | Reddit's t3_<id> IDs
url, permalinkPath | string | Direct URL + canonical path
title | string | Post title
selftext, selftextHtml | string | Body text (markdown + HTML)
subreddit, subredditId, subredditSubscribers | string/int | Subreddit context
author | object | { username, fullname, isPremium, flairText }
createdAt, scrapedAt | ISO 8601 | Standard timestamps
score, ups, downs, upvoteRatio | number | Engagement (upvoteRatio is a 0-1 float)
numComments, totalAwards, gilded | int | Engagement counts
media | object | { type (image/video/gallery/link/self), url, thumbnail, domain }
isVideo, isSelf, isOriginalContent | bool | Post-type flags
isPinned, isStickied, isLocked, isArchived, isOver18, isSpoiler | bool | Mod/state flags
linkFlairText, linkFlairBackgroundColor | string | Flair
fieldCompletenessScore | int 0-100 | Self-filtering signal
agentMarkdown | string | 300-500 char LLM-ready summary

reddit_comment

Field | Type | Description
outputSchemaVersion, recordType, recordId | string | Discriminated identity
commentId, fullname, url, permalinkPath | string | Reddit IDs
postId, parentId, depth | string/int | Tree linkage
body, bodyHtml | string | Markdown + HTML body
subreddit, subredditId | string | Subreddit
author | object | { username, fullname, isPremium, flairText }
createdAt, scrapedAt | ISO 8601 | Timestamps
score, ups, downs, totalAwards, gilded | int | Engagement
isSubmitter, isStickied, isControversial, edited | bool | Status flags
fieldCompletenessScore, agentMarkdown | int / string | Quality signals

Reddit scraper output example (post)

{
  "outputSchemaVersion": "2026-05-08",
  "recordType": "reddit_post",
  "recordId": "reddit:post:1t7qsfi",
  "postId": "1t7qsfi",
  "url": "https://www.reddit.com/r/programming/comments/1t7qsfi/...",
  "title": "How database work pulls you deep into systems engineering",
  "subreddit": "programming",
  "subredditSubscribers": 6874584,
  "author": { "username": "clairegiordano", "isPremium": false, "flairText": null },
  "createdAt": "2026-05-08T19:42:53.000Z",
  "score": 234,
  "upvoteRatio": 0.94,
  "numComments": 89,
  "media": { "type": "link", "url": "https://talkingpostgres.com/...", "domain": "talkingpostgres.com" },
  "isSelf": false,
  "linkFlairText": null,
  "fieldCompletenessScore": 92,
  "agentMarkdown": "**🟠 r/programming · How database work pulls you deep into systems engineering**\n- ⬆ 234 · 94% · 💬 89\n- 👤 u/clairegiordano\n- 👥 6.9M subscribers\n- 📅 posted 2026-05-08\n- 🔗 https://talkingpostgres.com/...",
  "scrapedAt": "2026-05-08T22:30:00.000Z"
}

Reddit scraper output example (comment)

{
  "outputSchemaVersion": "2026-05-08",
  "recordType": "reddit_comment",
  "recordId": "reddit:comment:k7q2xyz",
  "commentId": "k7q2xyz",
  "postId": "1t7qsfi",
  "parentId": "1t7qsfi",
  "depth": 0,
  "body": "Great talk. The MemSQL → HorizonDB story is wild.",
  "subreddit": "programming",
  "author": { "username": "dbnerd", "isPremium": false, "flairText": null },
  "createdAt": "2026-05-08T20:12:15.000Z",
  "score": 17,
  "fieldCompletenessScore": 100,
  "agentMarkdown": "**💬 Reddit comment by u/dbnerd** (depth 0)\n> Great talk. The MemSQL → HorizonDB story is wild.\n- 🔗 https://www.reddit.com/r/programming/comments/1t7qsfi/_/k7q2xyz/"
}

During the Actor run

Each subreddit is fetched serially with a 1.1-second delay between requests (well under Reddit's published 60 requests/minute unauthenticated limit). When fetchComments is enabled, comment threads are walked depth-first per post, capped at commentsPerPost per post.
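That pacing keeps throughput comfortably below the ceiling: with a 1.1 s gap the theoretical maximum is 60 / 1.1 ≈ 54.5 requests per minute. A minimal sketch of such a pacing loop (illustrative, not the actor's actual code):

```python
import time

def paced(urls, delay_s=1.1):
    """Yield URLs with a fixed pause between them, keeping the request
    rate under Reddit's published 60 requests/minute unauthenticated cap."""
    for i, url in enumerate(urls):
        if i:  # no sleep before the first request
            time.sleep(delay_s)
        yield url

max_rpm = 60 / 1.1  # ≈ 54.5 requests per minute at 1.1 s spacing
```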

Each record is validated against a Zod schema before push. Validation failures, parse errors, and fetch errors are tracked in counters; you'll see them in the run summary OUTPUT object alongside the success counts.

The actor writes three things to its key-value store:

  1. OUTPUT — compact run summary: subreddits queried, posts saved, comments saved, watchlist deltas, error counts, finished timestamp
  2. AGENT_BRIEFING — markdown digest with the top 5 posts and top 5 comments by score, ready to paste into an LLM context
  3. WATCHLIST_STATE — (only when watchlistMode: true) the seen post and comment IDs, capped at 5,000 each, used to compute the next run's diff

FAQ

How does Reddit Subreddit Monitor work?

The actor calls Reddit's public listing endpoints (reddit.com/r/<subreddit>/<sort>.json) with a Reddit-compliant User-Agent, parses the response, normalizes each post into a versioned Zod schema, and pushes the records to your dataset. Optionally walks the comment tree for each post via reddit.com/r/<subreddit>/comments/<postId>.json.
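The listing URL is a simple template over subreddit and sort; for top/controversial the time window rides along as Reddit's t query parameter. A hedged sketch (the query-parameter names follow Reddit's public JSON conventions; the helper itself is illustrative, not the actor's code):

```python
def listing_url(subreddit, sort="new", time_filter=None, limit=25):
    """Build the public listing endpoint for one subreddit."""
    sub = subreddit.removeprefix("r/")  # tolerate an r/ prefix, as the actor does
    url = f"https://www.reddit.com/r/{sub}/{sort}.json?limit={limit}"
    if sort in ("top", "controversial") and time_filter:
        url += f"&t={time_filter}"
    return url

listing_url("MachineLearning", sort="top", time_filter="week", limit=50)
```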

Can I scrape multiple subreddits in one run?

Yes. Pass an array to subreddits. The actor iterates each one in order, with rate limiting between requests.

Can I monitor a subreddit for new posts only?

Yes. Set watchlistMode: true. The first run emits everything; subsequent runs emit only posts new since the previous run. Schedule with the Apify cron scheduler.

Can I get comments along with posts?

Yes. Set fetchComments: true and commentsPerPost to the cap you want (max 500 per post tree). One additional request per post, walked depth-first.

Can I use this with the Apify API?

Yes. Every Apify actor exposes a REST API: POST https://api.apify.com/v2/acts/skootle~reddit-subreddit-monitor/runs with your input as JSON body and your Apify API token as the Authorization: Bearer header. Full API docs are linked from the actor page.
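Sketched with the Python standard library only (the token is a placeholder; this builds the request without sending it, so you can inspect it first):

```python
import json
import urllib.request

def start_run_request(token, run_input, actor_id="skootle~reddit-subreddit-monitor"):
    """Build the POST that starts an actor run via the Apify REST API.
    Send it with urllib.request.urlopen(req) when ready."""
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = start_run_request("<YOUR_APIFY_TOKEN>", {"subreddits": ["programming"], "maxItems": 25})
```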

Can I integrate this with Make / Zapier / n8n / Slack?

Yes. Apify provides native integrations for all of these. From the actor page, click Integrations and pick your destination.

Can I use this scraper as a Reddit API replacement?

For most read-only use cases on public subreddits, yes. Caveats: this actor does not return private content, modmail, or anything requiring an authenticated Reddit session, and it doesn't write (no posting, no voting, no commenting).

Can I increase the speed?

The actor rate-limits itself to 1.1 seconds between Reddit requests to stay under Reddit's published throttle. Reddit will block aggressive callers; this is the throttle that keeps reliability above 99%.

Can I get Reddit data in Python / JavaScript / TypeScript?

Yes. The output is plain JSON in an Apify dataset. Pull it via the Apify Python or JavaScript clients, the REST API, or by exporting from the Console as CSV/JSONL/Excel.

Can the actor return one comment per row?

Yes — that's the default with fetchComments: true. Each comment is a separate record. Filter on recordType: 'reddit_comment' to isolate the comment rows.

What if Reddit changes its layout?

This actor uses Reddit's stable public JSON endpoints (the same ones their own apps use), not HTML scraping. Reddit changes the endpoints rarely; when they do, the actor adapts within 24-48 hours. Issues are tracked on the actor page and resolved fast.

Why does this actor cost more than a free Reddit scraper?

Free actors trade reliability and feature depth for cost. This actor ships a versioned schema, idempotent record IDs (so cross-run dedupe just works), agentMarkdown for direct LLM consumption, watchlist diff mode, normalized author + media objects, ISO 8601 timestamps, and continuous maintenance. If you're feeding the data into an automated pipeline or AI agent, those features pay back the per-record cost in saved engineering hours.

Your feedback

Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we'll fix it fast (typically within 48 hours).

Why choose Reddit Subreddit Monitor

  • Agent-grade by design: every record carries agentMarkdown (300-500 chars) and a fieldCompletenessScore (0-100), so AI agents can self-filter and consume records directly without wrapping logic
  • Watchlist diff mode: only emits NEW records since the last run; safe to schedule daily without paying for duplicates
  • Versioned schema: outputSchemaVersion: '2026-05-08' literal on every record; never break a downstream pipeline
  • Idempotent record IDs: reddit:post:<id> and reddit:comment:<id> are stable across runs, so cross-run dedupe is trivial
  • Author profile join: posts carry author.{username, fullname, isPremium, flairText} so trust filtering is one field away
  • Discriminated union: posts and comments share one dataset (recordType: 'reddit_post' | 'reddit_comment'), filterable downstream
  • No Reddit OAuth required: uses Reddit's public listing endpoints with a compliant User-Agent
  • Continuous maintenance: when Reddit changes its layout, the actor adapts within 24-48 hours

Support and contact

File issues directly on this actor's page (Issues tab) — replies within 48 hours. For feature requests, drop them in the same issue tracker tagged enhancement.