Reddit Scraper - Monitoring, Signals & Attention Queue

Pricing

Pay per usage

Scrapes Reddit and returns a ranked attention queue: brand monitoring, mention tracking, sentiment analysis, and breakout detection in one run. Drop-in compatible with existing Reddit scraper workflows. $0.002 per record.

Developer: Ryan Clinton (maintained by Community)

Reddit Scraper - stop reading Reddit rows, start tracking what changed

In one sentence

Reddit Scraper is a Reddit monitoring and attention-routing engine that searches public Reddit posts, comments, communities, and users and returns a ranked attention queue of what changed, what matters, and what needs a look right now.

Category: Reddit monitoring tool. Reddit mention tracker. Reddit sentiment analysis actor. Primary use case: Track a brand or keyword across Reddit and get back the few threads that need attention now, ranked, with a reason and a recommended action on each. Can also be used for trend discovery, community research, search reranking, and drop-in row scraping.

Also known as: Reddit scraper, Reddit brand monitoring, Reddit mention tracking, Reddit sentiment signals, Reddit attention queue, subreddit monitor.

What you want to know → what Reddit Scraper tells you

You want to know | Reddit Scraper tells you | Field
Is Reddit talking about us more? | Mention spike vs baseline | signalEvents (mention_spike)
Is sentiment changing? | Sentiment shift, with evidence | signalEvents (sentiment_shift)
Which threads matter most? | Ranked attention queue | attentionIndex, watchStatus
Is this getting worse or better? | Direction + volatility | signalTrajectory
Is this recurring or structural? | Recurrence class + history | persistentSignal, narrativeMemory
What changed since last run? | Delta since last comparison | stateTransition, delta arrays
What did you ignore, and why? | Suppressed-noise audit | suppressedSignals, trustDiagnostics

What this actor does

  • What it is: the first operational Reddit intelligence layer of its kind on the Store. It scrapes public Reddit and adds a decision layer on top of the rows.
  • What it checks: breakout posts, mention spikes, sentiment shifts, topic surges, community acceleration, and author momentum, each as a typed, evidenced signal.
  • What it returns: a sortable attentionIndex (0-100), a watchStatus, plain-English whyNow reasons, a recommendedAction, and the full substrate fields a standard Reddit scraper emits.
  • What it does NOT do: it never logs in, never reads private content, and performs no in-Reddit actions (no posting, replying, voting, or DMing). It is not a brand-safety or controversy scorer.
  • Who it's for: brand and social teams, PR and comms, market researchers, trend and VC scouts, AI/RAG teams, competitor analysts.

Reddit Scraper functions as a Reddit signals API rather than a row dumper. Where a plain Reddit scraper hands back thousands of flat post, comment, community, and user records and leaves you to read every one, Reddit Scraper detects breakout posts, mention spikes, and sentiment shifts, deduplicates threads, and ranks the output so the first rows are the ones worth opening. Its moat is persistent operational memory: when you name a watchlist and run on a schedule, state compounds across runs and surfaces what changed since last time, the kind of longitudinal context that cannot be backfilled from a single scrape.

  • What it does: searches public Reddit and returns a ranked attention queue with reasons and a recommended action per record.
  • Best for: brand monitoring, mention tracking, sentiment shift detection, trend discovery, search reranking.
  • Speed: first results in about 60 seconds for a single-brand monitor run.
  • Pricing: $0.002 per record ($2 / 1,000), roughly half the incumbent row scraper's per-result cost; $0.20 per monitor or search query; the signal layer is included.
  • Output: JSON with attentionIndex, watchStatus, whyNow, recommendedAction plus full substrate fields.

  • Key limitation: delta, trajectory, and topic-memory fields are maturity-gated. Run 1 shows first-run / building status and empty delta arrays; they sharpen after several scheduled runs on the same watchlist.
  • What it is not: a logged-in scraper, a brand-safety scorer, or a replacement for Reddit's official API for write actions.
  • Does not include: in-Reddit actions, private or authenticated content, cross-community propagation mapping (a future capability), and LLM-generated narratives.
  • Results may be incomplete when: subreddits are private or unavailable, Reddit's ~1,000-item list cap is hit, or requests are rate-limited. The coverage block reports exactly what was and was not scanned.

Workflow roles: brand-monitoring feed, PR early-warning, trend-discovery scout. Use Reddit Scraper whenever an AI agent or a human needs to evaluate Reddit activity, not just retrieve Reddit rows.

What you get from one call

Input: { "mode": "monitor", "track": ["Notion"] } Returns:

  • A ranked attention queue: the top threads about Notion, ordered by attentionIndex.
  • A watchStatus and whyNow on each record (for example, on a "Notion just changed its pricing again" thread: "Upvote velocity 9x community baseline within 12h" + "Sentiment on Notion shifted negative over the last 7 days").
  • A recommendedAction per record (for example, "Review this thread within 3 days").
  • Deterministic comment sentiment and theme synthesis when comment sampling is on.
  • A run summary with a dailyBriefing, portfolioState, coverage, and a publicDataCompliance block.

Typical time to first result: about 60 seconds for a single-brand monitor run. Typical time to integrate: under 30 minutes for an existing Reddit-scraper workflow, since the input shape is drop-in compatible.
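As a rough illustration of reading the queue once a run has finished, the sketch below sorts already-downloaded records by attentionIndex.value and keeps the top few. The sample records are hypothetical and trimmed to the fields used; the assumption that summary records carry no attentionIndex is mine, not something the actor documents.

```python
from typing import Any


def top_of_queue(records: list[dict[str, Any]], n: int = 5) -> list[dict[str, Any]]:
    """Return the n records with the highest attentionIndex.value."""
    # Skip records without an attentionIndex object (assumed here: e.g. a run summary).
    scored = [r for r in records if isinstance(r.get("attentionIndex"), dict)]
    return sorted(scored, key=lambda r: r["attentionIndex"]["value"], reverse=True)[:n]


# Hypothetical records, trimmed to the fields this sketch reads.
sample = [
    {"title": "Thread A", "attentionIndex": {"value": 41}, "watchStatus": "monitor"},
    {
        "title": "Notion just changed its pricing again",
        "attentionIndex": {"value": 78},
        "watchStatus": "attention-required",
    },
    {"recordType": "summary"},
]

for record in top_of_queue(sample, n=2):
    print(record["attentionIndex"]["value"], record["watchStatus"], record["title"])
```

If the dataset is already returned in ranked order, the sort is redundant; it is kept so the sketch does not depend on that.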

[Figure: The Reddit Scraper intelligence stack, from a brand or keyword to a ranked attention queue]

What makes this different

  • Attention queue, not a row dump — every run starts ordered by what to look at first, with a reason and a recommended action on each record, instead of thousands of flat rows you read by hand.
  • Persistent operational memory — name a watchlist and state compounds across runs, surfacing what changed since the last run. This is the part a competitor cannot backfill from a single scrape.
  • Deterministic, no-LLM synthesis — sentiment and theme clustering use a fixed lexicon and TF-IDF (lexicon-tfidf-v1), so every result re-runs byte-identical and is fully auditable, with no probabilistic drift and no external model dependency.

If you were building this yourself, you would need to scrape Reddit, compute community baselines, detect breakouts and mention spikes, run deterministic sentiment, persist per-term state across runs, and rerank, then keep all of it versioned and reproducible.

It functions as a Reddit signals API, producing scored, decision-ready records, useful for brand monitoring, PR early-warning, and trend discovery.

Reddit Scraper models state evolution, not snapshots

A traditional Reddit scraper answers one question: what exists right now. Reddit Scraper answers what changed, what accelerated, what stabilised, what is recurring, and what needs attention now. Signals are the ingredients; the run-over-run state evolution is the product. That is why a single scrape cannot reproduce its output and why the value compounds the longer you run it on a watchlist.
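To make "what changed" concrete, here is a minimal sketch of a run-over-run diff keyed on eventId. It is not the actor's implementation: the returned keys (new, gone, escalated) are hypothetical names and do not mirror the stateTransition or changeFlags shapes.

```python
def diff_runs(previous: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Compare two runs by eventId and report what appeared, vanished, or rose."""
    prev_by_id = {r["eventId"]: r for r in previous}
    curr_by_id = {r["eventId"]: r for r in current}

    new_ids = sorted(curr_by_id.keys() - prev_by_id.keys())
    gone_ids = sorted(prev_by_id.keys() - curr_by_id.keys())
    # A record "escalated" if its attentionIndex value rose between the two runs.
    escalated = sorted(
        event_id
        for event_id in curr_by_id.keys() & prev_by_id.keys()
        if curr_by_id[event_id]["attentionIndex"]["value"]
        > prev_by_id[event_id]["attentionIndex"]["value"]
    )
    return {"new": new_ids, "gone": gone_ids, "escalated": escalated}


# Hypothetical records from two consecutive runs.
run_1 = [
    {"eventId": "t3_a", "attentionIndex": {"value": 40}},
    {"eventId": "t3_b", "attentionIndex": {"value": 55}},
]
run_2 = [
    {"eventId": "t3_b", "attentionIndex": {"value": 78}},
    {"eventId": "t3_c", "attentionIndex": {"value": 30}},
]
print(diff_runs(run_1, run_2))
```

A diff like this only works when the earlier run was stored somewhere, which is why the state has to accumulate from scheduled runs rather than being rebuilt from one scrape.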

Before vs after

Before: Scrape 2,000 Reddit posts mentioning your brand, dump them to a spreadsheet, read every row, eyeball sentiment by hand. Hours per week, every week. After: Run monitor mode once, read the top five rows of the attention queue, act on the ones flagged urgent or critical. Minutes per day.

What Reddit Scraper replaces

Instead of:

  • scraping Reddit into spreadsheets and reading rows by hand
  • eyeballing sentiment thread by thread
  • comparing this week's export against last week's by hand
  • writing and maintaining custom Reddit monitoring scripts
  • paying for a general social-listening suite to cover Reddit

Reddit Scraper automatically:

  • detects mention spikes and sentiment shifts, with evidence
  • ranks attention-worthy threads into one queue
  • tracks what changed between runs
  • suppresses duplicate-crosspost noise and shows you what it ignored
  • builds persistent monitoring memory that compounds over time

In short: it replaces the manual Reddit-triage workflow, not just the scraper that feeds it.

Quick answers

What is it? Reddit Scraper is a Reddit monitoring and attention-routing engine. It scrapes public Reddit and returns a ranked attention queue with a reason and a recommended action per record.

How do I monitor a brand on Reddit? Run monitor mode with your brand or keyword in track. You get back the threads that need attention now, ranked by a sortable attentionIndex, each with whyNow and a recommendedAction.

What makes it different? It ships decisions, not just rows, and its persistent operational memory compounds across scheduled runs to show what changed, which a single scrape cannot reconstruct.

What data sources does it use? Public Reddit posts, comments, communities, and users only. No login, no private content, no Reddit write actions.

What does it return? A sortable attentionIndex (0-100), a watchStatus, whyNow reasons, a recommendedAction, and the full substrate fields a standard Reddit scraper emits.

How much does it cost? $0.002 per record ($2 per 1,000) plus $0.20 per monitor or search query. The signal layer is included; monitoring adds no extra per-record cost.

Is it deterministic? Yes. Sentiment and themes use a fixed lexicon and TF-IDF, so every run reproduces byte-identical results.

Reddit Scraper at a Glance

Quick facts:

  • Input: a brand or keyword (track), subreddits (communities), post URLs, usernames, or a search query.
  • Output: attentionIndex, watchStatus, whyNow, recommendedAction, plus full substrate fields (title, url, upVotes, numberOfComments, communityName, body).
  • Pricing: $0.002 per record; $0.20 per monitor/search query.
  • Batch size: up to 1,000 records per run (Reddit's platform-wide list cap; use scheduled monitor runs to capture beyond it incrementally).
  • Modes: monitor, communities, posts, users, search.
  • Output profiles: signals (default), compat (drop-in substrate), agent (compact decisions), minimal (IDs only).
  • Determinism: every result re-runs byte-identical (no external LLM).
  • Compliance: public-content-only, no login, no in-Reddit actions.

Input -> Output:

  • Input: a brand, keyword, subreddit, post URL, username, or search query.
  • Process: scrape public Reddit, detect typed signals, score, rank, and (in monitor mode) diff against persisted state.
  • Output: a ranked attention queue of decision-ready records plus a run summary.

  • Best fit: brand and product mention tracking, PR issue detection, trend and topic discovery, community momentum research, search reranking, drop-in migration from a row scraper.
  • Not ideal for: logged-in or private content, real-time webhook alerting (use scheduling), brand-safety or toxicity scoring, write actions on Reddit.
  • Does not include: in-Reddit actions, private content, cross-community propagation mapping, LLM-generated prose.

Problems this solves:

  • How to track brand mentions across Reddit without reading every row.
  • How to catch a brewing PR issue on Reddit before it blows up.
  • How to find emerging Reddit topics and shifting opinion.
  • How to know what changed on Reddit since your last check.

Data trust: all data is scraped from public Reddit. Delta, trajectory, and memory fields are maturity-gated and stay null or building until enough scheduled runs accumulate. The actor never fabricates history; the coverage and historicalProfile blocks state exactly how much it knows.

Best fit / Less suitable

Best fit:

  • Brand and social teams running a daily mention-tracking feed across several subreddits.
  • PR and comms teams that need a sentiment shift or mention spike surfaced early, with evidence.
  • Researchers and trend scouts watching topic surges and community acceleration over weeks.

Less suitable:

  • Reading private, deleted, or quarantined content. Reddit Scraper reads public content only.
  • Real-time second-by-second alerting. Schedule monitor runs instead (daily or hourly).
  • Judging whether a post is "problematic." contested_thread is a descriptive engagement-divergence signal, not a controversy or safety call.

Scope disclaimer: Reddit Scraper is a read-only public-monitoring actor. It does not perform any in-Reddit action, and it does not score brand safety or toxicity. (Field-by-field definitions are in the Definitions section below.)

What is a Reddit monitoring tool?

A Reddit monitoring tool watches Reddit over time for changes that matter to you (mentions, sentiment, emerging topics) and tells you what needs attention, rather than just exporting rows. Most Reddit actors on the Store stop at extraction; Reddit Scraper adds the interpretation, ranking, and run-over-run state that turns a scrape into a monitoring feed.

Why Reddit monitoring is hard

Reddit monitoring is harder than it looks, which is why most tools stop at extraction:

  • Reddit lists cap at roughly 1,000 items, so a single scrape can silently miss the rest.
  • Communities behave differently. A spike that matters in a 20k-member subreddit is noise in a 2M-member one.
  • Viral spikes create noisy false positives. One crosspost duplicated across communities looks like a trend.
  • Sentiment shifts are usually gradual, not a single dramatic post, so they hide in the row dump.
  • Most tools forget every prior run completely, so they can never tell you whether something is recurring or structural.

Reddit Scraper solves these with community-relative baselines, deterministic scoring, duplicate-crosspost suppression with a trust audit, and persistent operational memory that accumulates state across scheduled runs. That last piece is why a competitor cannot replicate the output by scraping the same data tomorrow.
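The idea of a community-relative baseline can be sketched as a z-score of a post's upvotes against that community's own recent posts. This is only an illustration of the concept under my own assumptions; the actor's actual baseline windows and statistics are not described here, and the history figures below are invented.

```python
from statistics import mean, pstdev


def upvote_z_score(upvotes: float, community_history: list[float]) -> float:
    """Standard score of one post's upvotes against its own community's history."""
    mu = mean(community_history)
    sigma = pstdev(community_history)
    if sigma == 0:
        return 0.0  # flat history: no meaningful deviation to report
    return (upvotes - mu) / sigma


# Invented upvote histories for a small and a large community.
small_community = [20, 35, 50, 40, 30]
large_community = [1500, 2500, 2000, 1800, 2200]

# The same 1,800 upvotes is extreme in one community and ordinary in the other.
print(upvote_z_score(1800, small_community))
print(upvote_z_score(1800, large_community))
```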

Common Reddit monitoring problems

Reddit exports too many rows

Traditional Reddit scrapers export thousands of rows with no prioritisation. Reddit Scraper returns a ranked attention queue instead, so the first rows are the ones worth opening.

Reddit sentiment changes slowly

Most Reddit issues emerge gradually across comments and communities, not in one dramatic post. Reddit Scraper tracks sentiment shifts across runs and surfaces the trend before it is obvious.

Reddit spikes create false positives

Crossposts and viral reposts duplicate the same content across communities and look like a trend. Reddit Scraper suppresses duplicate-crosspost artifacts and exposes exactly what it ignored in trustDiagnostics.
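One simple way to picture duplicate-crosspost suppression is to group posts by a normalised title, keep the highest-scoring copy, and record every dropped copy with a reason. This is a sketch under that assumption; the matching rule, the "duplicate_crosspost" reason code, and the duplicateOf key are hypothetical and are not taken from trustDiagnostics.

```python
import re


def normalise_title(title: str) -> str:
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def suppress_crossposts(posts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Keep the most-upvoted copy of each title; return (kept, suppressed)."""
    kept: dict[str, dict] = {}
    suppressed: list[dict] = []
    for post in sorted(posts, key=lambda p: p["upVotes"], reverse=True):
        key = normalise_title(post["title"])
        if key in kept:
            # Record what was dropped and why, rather than discarding it silently.
            suppressed.append(
                {"url": post["url"], "reason": "duplicate_crosspost", "duplicateOf": kept[key]["url"]}
            )
        else:
            kept[key] = post
    return list(kept.values()), suppressed


# Hypothetical posts: the second is a crosspost of the first.
posts = [
    {"title": "Notion just changed its pricing again", "url": "https://example.com/a", "upVotes": 1842},
    {"title": "Notion just changed its pricing, again!", "url": "https://example.com/b", "upVotes": 120},
    {"title": "Best templates for students", "url": "https://example.com/c", "upVotes": 300},
]
kept_posts, suppressed_posts = suppress_crossposts(posts)
print(len(kept_posts), suppressed_posts)
```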

Most monitoring tools forget prior runs

A one-shot scrape cannot tell you whether an issue is new or recurring. Reddit Scraper persists operational memory per watchlist, so recurring narratives and structural issues are detected over time via persistentSignal and narrativeMemory.

What data can you extract?

Data Point | Source | Availability | Example
Post title | Public post | Always | "Notion just changed its pricing again"
Up votes | Public post | Always | 1,842
Number of comments | Public post | Always | 326
Community name | Post / community | Always | r/Notion
Comment body | Public comment | When comments sampled | "The new plan is way too expensive"
User karma | Public profile | Users mode | postKarma 4, commentKarma 10
Attention index | Computed | signals profile | 78
Watch status | Computed | signals profile | attention-required
Why now | Computed | signals profile | "Upvote velocity 9x baseline within 12h"
Comment sentiment | Computed (deterministic) | When comments sampled | positive 0.41, negative 0.33
Comment themes | Computed (deterministic) | When comments sampled | Pricing complaints (weight 0.34)
Signal profile | Computed | Community records | emerging

[Figure: Four features: ranked not dumped, tracks recurrence, quiet when it's quiet, shows what it ignored]

Why use Reddit Scraper?

Before: scraping Reddit for brand mentions ends the same way for every team: a spreadsheet, read row by row, sentiment eyeballed by hand, every day. The bottleneck in 2026 is not getting Reddit data, it is operationalising it fast enough.

Reddit Scraper closes that gap in one run. Instead of 2,000 rows you triage by hand, you get the threads that need attention now, ranked, with a reason and a recommended action on each. The real competitor here is not another scraper; it is the spreadsheet workflow that sits downstream of one.

Key difference: a plain Reddit scraper hands you rows and stops. Reddit Scraper hands you decisions and remembers what it told you last run.

The monitoring workflow, step by step. A row scraper does step one and leaves the rest to you:

Workflow step | Standard Reddit scraper | Reddit Scraper
Scrape posts and comments | Yes | Yes
Detect breakout threads | Manual | Automatic
Detect sentiment shifts | Manual | Automatic
Rank what matters | Manual | Automatic
Compare against prior runs | Manual | Automatic
Suppress duplicate-crosspost noise | Manual | Automatic
Build persistent memory of recurring issues | Impossible | Built in
Read thousands of rows by hand | Required | Replaced

The grounded feature comparison against the incumbent:

Feature | Reddit Scraper | trudax/reddit-scraper-lite
Public Reddit substrate (posts, comments, communities, users) | Yes | Yes
Ranked attention queue (attentionIndex) | Yes | Not a feature
Mention spike + sentiment shift detection | Yes | Not a feature
Deterministic comment sentiment + theme synthesis | Yes (no LLM) | Not a feature
Breakout + topic-surge detection | Yes | Not a feature
Persistent monitoring state across runs | Yes | Not a feature
Search reranking (breakoutPotential, momentum) | Yes | Native sort only
Drop-in compatible input + substrate fields | Yes (compat profile) | n/a
Per-result price | $0.002 / record ($2 / 1k) | $0.004 / result ($4 / 1k)
Free-tier headroom ($5 credits) | ~2,500 records | ~1,250 results

Pricing and features based on publicly available information as of May 2026 and may change. Re-verify the incumbent's live price before relying on the comparison.

Unlike a row scraper, which is built to export everything and let you sort it out, Reddit Scraper is built for automation-first monitoring workflows where the first rows are the ones that matter.

Platform capabilities

  • Scheduling — run monitor mode daily or hourly with the same watchlistName to build a persistent Reddit feed.
  • API access — trigger from Python, JavaScript, or any HTTP client via the Apify API.
  • Proxy rotation — Apify residential proxies by default, with conservative rate limits and a circuit breaker on consecutive blocks.
  • Monitoring — Slack or email alerts when runs fail, via Apify integrations.
  • Integrations — Zapier, Make, Google Sheets, webhooks, and MCP/agent consumers via the agent output profile.
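For the API-access route, the sketch below only builds the HTTP request for the Apify API's synchronous "run and return dataset items" endpoint using the standard library; it does not send it. The actor ID and token are placeholders I made up, since this page does not state the actor's ID, so substitute the real values from the Store page and your account.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "example-user~example-actor"  # placeholder, not the real actor ID


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    ACTOR_ID,
    "YOUR_APIFY_TOKEN",  # placeholder token
    {"mode": "monitor", "track": ["Notion"]},
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would start a real, billable run, so that step is left out of the sketch.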

The five modes

Each mode produces one buyer-facing outcome.

Mode | One-line outcome
monitor (hero) | "These Reddit threads about your brand need attention right now." Track a brand or keyword over time; name a watchlist to build a feed of only what changed.
communities | "These subreddits, ranked by what's heating up." Scan communities, each tagged with a signal profile.
posts | "Here's what's notable about each thread." Analyse post URLs; each carries whyThisMatters plus comment-theme synthesis.
users | "This author is gaining momentum." Surface posting and karma velocity per user.
search | "Of 500 posts matching this query, these 4 are popping." Rerank search results by breakout potential, momentum, or attention.

Reddit Monitoring Features

Reddit Scraper layers a deterministic signal engine and a persistent state engine on top of a drop-in Reddit substrate. Signals are typed and evidenced, scoring is bounded so no single component dominates, and every version constant is pinned so a run is reproducible and auditable.

Signal detection

  • Eight typed signal events: breakout_post, community_acceleration, community_deceleration, topic_surge, sentiment_shift, mention_spike, author_emergence, contested_thread. Each carries signalStrength (heuristic 0-1), evidenceGrade (weak/moderate/strong), a decayStatus (fresh/active/fading/expired), and an evidence object with the underlying z-scores and windows.
  • Eight community signal profiles: emerging, breakout, stable-authority, viral-fragile, high-engagement-niche, decelerating, dormant, unclassified, with strength and plain-English evidence.
  • Suppressed signals — signals that fired but were judged noise are surfaced with a reason and a noiseRisk, never silently dropped, so you can see what the actor ignored across communities.

Decision and routing

  • attentionIndex (0-100) — the single sortable composite, with a breakdown audit trail and top-3 paste-ready drivers.
  • watchStatus — the routing primitive to branch on (no-action, monitor, attention-required, urgent, critical).
  • whyNow + recommendedAction — plain-English reasons plus a prioritisation instruction (Review, Read, Monitor, Track, Re-check, Compare, Investigate), never an in-Reddit engagement instruction.
  • agentContract + agentDecision — a compact decision surface (review_now / monitor / ignore) for MCP, AI-agent, and RAG consumers.
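Branching on watchStatus might look like the sketch below. It assumes the five values are listed in ascending severity, which the ordering in the text suggests but does not state outright, and the destination names (page-on-call, review-queue, weekly-digest, drop) are hypothetical.

```python
# Assumed severity order, lowest to highest, following the order the values are listed in.
WATCH_STATUS_ORDER = ["no-action", "monitor", "attention-required", "urgent", "critical"]


def needs_escalation(record: dict, threshold: str = "urgent") -> bool:
    """True when the record's watchStatus is at or above the threshold."""
    status = record.get("watchStatus", "no-action")
    return WATCH_STATUS_ORDER.index(status) >= WATCH_STATUS_ORDER.index(threshold)


def route(record: dict) -> str:
    """Map a watchStatus to a hypothetical downstream destination."""
    status = record.get("watchStatus", "no-action")
    if status in ("urgent", "critical"):
        return "page-on-call"
    if status == "attention-required":
        return "review-queue"
    if status == "monitor":
        return "weekly-digest"
    return "drop"


print(route({"watchStatus": "attention-required"}))
print(needs_escalation({"watchStatus": "critical"}))
```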

Comment intelligence (deterministic, no LLM)

  • Sentiment — positive/negative/neutral with a confidenceBand that scales with sample size, via a fixed lexicon.
  • Themes — TF-IDF clustering mapped to a fixed theme dictionary (stable themeCode enum), with keywords and example comment IDs.
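To show why a fixed lexicon yields reproducible output, here is a toy lexicon classifier. The word lists are invented for illustration and are not the lexicon-tfidf-v1 lexicon; the sketch covers only the sentiment half and leaves out TF-IDF theme clustering.

```python
import re
from collections import Counter

# Toy lexicon, invented for this sketch.
POSITIVE = {"love", "great", "useful", "fast"}
NEGATIVE = {"expensive", "slow", "broken", "hate"}
LABELS = ("positive", "negative", "neutral")


def classify(comment: str) -> str:
    """Label one comment by counting lexicon hits; ties and no hits are neutral."""
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def sentiment_shares(comments: list[str]) -> dict[str, float]:
    """Share of comments per label, rounded to two decimals."""
    if not comments:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(c) for c in comments)
    return {label: round(counts[label] / len(comments), 2) for label in LABELS}


comments = [
    "The new plan is way too expensive",
    "I love the new editor, great update",
    "Does anyone know when this ships?",
    "Sync is slow and broken",
]
print(sentiment_shares(comments))
```

Because nothing here is sampled or learned at run time, the same comments always produce the same shares, which is the property the determinism claim rests on.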

Persistent state (monitor mode)

  • signalTrajectory — direction, velocity, stability, phase, and trajectoryClass in one block. Maturity-gated to unknown until 3+ runs.
  • persistentSignal — recurrence classification (transient, recurring, persistent, cyclical, dormant) from accumulated state.
  • narrativeMemory — cross-run topic memory with historical peaks and cycle length; advanced fields stay null until enough cycles exist.
  • stateTransition + changeFlags — what changed since the last run (NEW_BREAKOUT, MENTION_SPIKE, SENTIMENT_DOWN, SIGNAL_EXPIRED).

Run summary and trust

  • dailyBriefing + portfolioState — a morning brand-monitoring surface and a CIO-glance over the run.
  • runOutcome — a quiet-mode status (quiet / active / high-activity) so a monitor that says "nothing fired" is trusted when it does fire.
  • coverage — requested vs scanned communities, typed skip reasons, and a coverageStatus.
  • publicDataCompliance — a machine-readable read-only posture with inRedditActionsPerformed hard-wired to false.
  • trustDiagnostics + runManifest — suppression counts, dedup counts, coverage confidence, and every pinned version constant for audit.

Use cases for Reddit monitoring

Reddit monitoring for product marketing

Use when you need to know how Reddit is talking about your product this week. Set mode: monitor, put your brand in track, name a watchlistName, and schedule daily. PMM teams use Reddit Scraper to surface mention spikes and recurring complaint narratives without reading every thread. Key outputs: attentionIndex, whyNow, signalEvents, commentIntelligence.

Reddit monitoring for PR teams

Use when a brewing issue needs to be caught before it blows up. Set persona: pr-comms so sentiment shifts and mention spikes escalate immediately, with evidence on each. Key outputs: watchStatus, attentionWindow, signalTrajectory, changeFlags.

Reddit monitoring for researchers and trend scouts

Use when you are scouting which topics or communities are accelerating before they are obvious. Set persona: trend-research and rankBy: breakoutPotential. Key outputs: signalProfile, topic_surge events, communityHealth, narrativeConcentration.

Reddit monitoring for competitor analysis

Use when you track how a community discusses rivals over time. Schedule monitor runs and read the run-over-run delta to see what shifted. Key outputs: signalTrajectory, persistentSignal, stateTransition, portfolioState.

Reddit monitoring for investors and scouts

Use when you want to spot a community or product narrative accelerating early. Schedule weekly monitor runs on a watchlist; signalTrajectory and persistentSignal tell you whether momentum is building or fading and whether an issue is recurring. Key outputs: signalTrajectory, persistentSignal, narrativeMemory, communityHealth.

Reddit monitoring for AI agents and RAG pipelines

Use when an agent or pipeline needs Reddit content with quality and sentiment signals attached. Set outputProfile: agent for a compact, deterministic decision surface, or signals for the full envelope. Key outputs: agentContract, commentIntelligence, attentionIndex, materiality.

When to use Reddit Scraper

Best for:

  • Daily brand monitoring across 5-10 subreddits with a named watchlist.
  • Weekly trend and competitor research that compares against the prior run.
  • Search reranking across hundreds of results to surface what is popping.
  • Migrating an existing Reddit-scraper workflow to a cheaper, signal-rich substrate.

Not ideal for:

  • Logged-in or private content. Reddit Scraper reads public content only.
  • Real-time alerting. Schedule monitor runs (hourly or daily) instead.
  • Brand-safety or toxicity scoring. Use a dedicated content-moderation tool.

How to monitor a brand on Reddit

  1. Enter your brand or keyword — set mode to monitor and add your term to track, for example ["Notion"]. Add subreddits in communities to focus the scan.
  2. Configure options — name a watchlistName to persist state across runs (leave empty for a one-shot run). Defaults cover most cases: rankBy: attention, persona: brand-monitoring.
  3. Run the actor — click Start. A single-brand monitor run returns first results in about 60 seconds.
  4. Download results — open the Attention Queue view, or export JSON, CSV, or Excel from the Dataset tab.

First run tips

  • Start with the monitor demo — leave the fields at their defaults to run the built-in track: ["Notion"] example and see the attention queue before pointing it at your own brand.
  • Run 1 will not show deltas: historicalProfile reports first-run and delta arrays are empty by design. Schedule a second run with the same watchlistName to start the memory clock; it cannot be backfilled.
  • Turn on comment sampling for sentiment — set includeCommentsSample: true to unlock commentIntelligence (sentiment shift detection needs sampled comments).
  • Name your watchlist — without watchlistName, the run is one-shot with no persistence and no delta intelligence.
  • Test small — keep maxResults low (for example 50) for your first run before scaling up.

Typical performance

Observed in internal testing (May 2026, small sample). Reddit is rate-limited and client-rendered, so figures vary by mode, depth, and proxy conditions.

Metric | Typical value
Records per run | up to 1,000 (Reddit list cap)
Run time (single-brand monitor) | ~60-120 seconds
Run time (multi-community deep scan) | several minutes
First result latency | ~60 seconds
Cost per 1,000 records | $2.00 + $0.20 per query
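The cost row can be worked through with a small estimator built on the two listed prices. How queries are counted per run (for example, whether each tracked term is one query) is an assumption to check against your own billing; the function simply takes the count as an argument.

```python
from decimal import Decimal

PRICE_PER_RECORD = Decimal("0.002")  # listed price per record
PRICE_PER_QUERY = Decimal("0.20")    # listed price per monitor or search query


def estimate_cost(records: int, queries: int = 1) -> Decimal:
    """Estimated charge in USD for one run."""
    return PRICE_PER_RECORD * records + PRICE_PER_QUERY * queries


# One run that emits 1,000 records from a single query.
print(estimate_cost(1000, 1))

# A daily 100-record, single-query monitor over 30 days.
print(estimate_cost(100, 1) * 30)
```

At these prices the first example comes to $2.20 and the 30-day schedule to $12.00; the per-query fee dominates small runs, so batch size matters more than record count for low-volume monitors.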

Input parameters

Parameter | Type | Required | Default | Description
mode | string | No | monitor | Entry point: monitor, communities, posts, users, search.
track | array | No | ["Notion"] | Monitor mode: brand names or keywords to track across Reddit.
communities | array | No | [] | Subreddits to scan or monitor (r/Notion, Notion, or a URL).
posts | array | No | [] | Posts mode: Reddit post permalinks to analyse.
users | array | No | [] | Users mode: usernames or profile URLs.
searchQuery | string | No | "" | Search mode: keyword or phrase to search and rerank.
searchType | string | No | posts | Search mode target: posts, communities, users.
rankBy | string | No | attention | Ordering axis: attention, breakoutPotential, momentum, engagement, recency, relevance.
watchlistName | string | No | "" | Name a watchlist to persist state across runs and unlock delta intelligence.
deltaWindowDays | integer | No | 7 | Window for run-over-run change detection.
persona | string | No | brand-monitoring | Reshapes materiality weights: brand-monitoring, trend-research, community-discovery, pr-comms, generic.
includeCommentsSample | boolean | No | false | Sample comments and run deterministic theme + sentiment synthesis.
commentsSamplePerPost | integer | No | 100 | Comments sampled per post (raises sentiment confidence band).
outputProfile | string | No | signals | signals (full), compat (drop-in substrate), agent (compact decisions), minimal (IDs only).
analysisDepth | string | No | standard | Coverage/runtime tradeoff: fast, standard, deep.
explainability | string | No | standard | Verbosity: standard (decision surface) or full (all evidence blocks).
maxResults | integer | No | 100 | Hard cap on records emitted (max 1,000).
maxPostsPerCommunity | integer | No | 100 | Per-community post cap.
maxRecentItems | integer | No | 50 | Users mode: recent posts/comments per user.
sort | string | No | hot | Reddit listing/search sort.
time | string | No | month | Time window for top/search sorts.
includeNSFW | boolean | No | false | Include over-18 communities/posts.
maxRuntimeSeconds | integer | No | 3600 | Runtime budget; emits partial results plus a summary before timeout.
startUrls | array | No | [] | Compatibility alias: paste post/user/community URLs from an existing workflow.
searches | array | No | [] | Compatibility alias for keyword search.
searchCommunityName | string | No | "" | Compatibility alias: restrict search to one community.
proxyConfiguration | object | No | residential | Proxy settings; residential recommended for Reddit.

Input examples

  • Monitor a brand (hero): { "mode": "monitor", "track": ["Notion"], "rankBy": "attention" }
  • Daily brand feed with persistence: { "mode": "monitor", "track": ["Notion", "Obsidian"], "communities": ["r/productivity", "r/Notion"], "watchlistName": "notion-brand", "deltaWindowDays": 7, "persona": "pr-comms", "includeCommentsSample": true }
  • Drop-in migration (compat): { "startUrls": [{ "url": "https://www.reddit.com/r/Notion/" }], "outputProfile": "compat", "maxResults": 200 }
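When inputs like these are generated by code rather than typed by hand, a small builder can catch out-of-range values before a run is started. The sketch below checks only what the parameter table states (output profile names and the 1,000-record cap); the function name and its keyword arguments are my own, not part of the actor.

```python
OUTPUT_PROFILES = {"signals", "compat", "agent", "minimal"}
MAX_RESULTS_CAP = 1000  # documented hard cap on maxResults


def build_monitor_input(
    track: list[str],
    watchlist_name: str = "",
    max_results: int = 100,
    output_profile: str = "signals",
    include_comments: bool = False,
) -> dict:
    """Build a monitor-mode input object, rejecting values outside the documented ranges."""
    if not track:
        raise ValueError("track needs at least one brand or keyword")
    if output_profile not in OUTPUT_PROFILES:
        raise ValueError(f"unknown outputProfile: {output_profile}")
    if not 1 <= max_results <= MAX_RESULTS_CAP:
        raise ValueError(f"maxResults must be between 1 and {MAX_RESULTS_CAP}")

    run_input = {
        "mode": "monitor",
        "track": list(track),
        "maxResults": max_results,
        "outputProfile": output_profile,
        "includeCommentsSample": include_comments,
    }
    # Only a named watchlist persists state; leave the key out for a one-shot run.
    if watchlist_name:
        run_input["watchlistName"] = watchlist_name
    return run_input


print(build_monitor_input(["Notion"], watchlist_name="notion-brand", max_results=50))
```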

Input tips

  • Start with defaults — the default monitor mode covers most first runs.
  • Name a watchlist for monitoring: watchlistName is what unlocks delta intelligence; without it the run is one-shot.
  • Use the compat profile to verify migration: outputProfile: compat returns the exact substrate field set with no signal fields.
  • Sample comments for sentiment: includeCommentsSample: true is required for sentiment shift detection.
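A migration check for the compat profile could be sketched as below: confirm each record has the substrate fields your pipeline reads and none of the signal fields. The two field sets are taken from the quick facts on this page and may be incomplete, so treat them as a starting point rather than the full schema.

```python
# Field names as listed in the quick facts; possibly not exhaustive.
SUBSTRATE_FIELDS = {"title", "url", "upVotes", "numberOfComments", "communityName", "body"}
SIGNAL_FIELDS = {"attentionIndex", "watchStatus", "whyNow", "recommendedAction", "signalEvents"}


def check_compat_record(record: dict) -> dict[str, list[str]]:
    """Report substrate fields that are missing and signal fields that should not be there."""
    present = set(record)
    return {
        "missing": sorted(SUBSTRATE_FIELDS - present),
        "unexpected": sorted(SIGNAL_FIELDS & present),
    }


# Hypothetical compat-profile record.
compat_record = {
    "title": "Notion just changed its pricing again",
    "url": "https://www.reddit.com/r/Notion/comments/144w7sn/",
    "upVotes": 1842,
    "numberOfComments": 326,
    "communityName": "r/Notion",
    "body": "",
}
print(check_compat_record(compat_record))
```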

Output example

{
  "schemaVersion": "1.0",
  "recordType": "post",
  "eventId": "t3_144w7sn",
  "attentionIndex": {
    "value": 78,
    "breakdown": { "activeSignals": 22, "mentionSpike": 18, "sentimentShift": 14, "breakoutStrength": 16, "communityMomentum": 8 },
    "drivers": [
      "Breakout post: upvote velocity 9x community baseline within 12h.",
      "Mention volume up 3x vs 30-day baseline.",
      "Sentiment shifted negative (-0.22) over 7 days."
    ]
  },
  "watchStatus": "attention-required",
  "whyNow": [
    "Breakout post detected 12h ago, upvote velocity 9x community baseline.",
    "Sentiment on 'Notion' shifted negative over the last 7 days."
  ],
  "agentDecision": "review_now",
  "recommendedAction": "Review this thread within 3 days.",
  "attentionWindow": { "urgency": "high", "recommendedReviewWithinHours": 24 },
  "attentionPriority": "high",
  "communityName": "r/Notion",
  "signalEvents": [
    {
      "type": "breakout_post",
      "signalStrength": 0.91,
      "active": true,
      "firstDetectedAt": "2026-05-21T09:00:00Z",
      "peakAt": "2026-05-20T15:00:00Z",
      "daysSincePeak": 1,
      "decayStatus": "fresh",
      "evidenceGrade": "strong",
      "reason": "Upvote velocity 9.2x subreddit p90 within 12h of posting.",
      "evidence": { "postId": "t3_144w7sn", "upvoteZScore": 4.1, "ageHours": 12, "communityP90": 4200 }
    }
  ],
  "commentIntelligence": {
    "status": "full",
    "method": "lexicon-tfidf-v1",
    "sampleSize": 100,
    "sentiment": { "positive": 0.41, "negative": 0.33, "neutral": 0.26, "confidenceBand": "medium" },
    "themes": [
      { "themeCode": "pricing_complaints", "label": "Pricing complaints", "weight": 0.34, "keywords": ["price", "expensive", "plan"], "exampleCommentIds": ["t1_x", "t1_y"] }
    ],
    "audienceSignals": ["Comments referencing professional/team use."],
    "limitations": ["Theme labels are deterministic keyword labels, not generated summaries."]
  },
  "signalTrajectory": {
    "direction": "accelerating",
    "velocity": 0.81,
    "stability": 0.22,
    "phase": "expansion",
    "trendDurationDays": 12,
    "trajectoryClass": "emerging-risk"
  },
  "title": "Notion just changed its pricing again",
  "url": "https://www.reddit.com/r/Notion/comments/144w7sn/",
  "upVotes": 1842,
  "numberOfComments": 326,
  "dataType": "post"
}

Sample Reddit Scraper output: a ranked attention queue with watch status and recommended action per row

Definitions

  • Attention queue — a ranked list of Reddit threads ordered by operational importance, not by date or raw upvotes.
  • Attention index — a bounded 0-100 composite score for how worth reviewing a Reddit thread is right now.
  • Watch status — the routing state for a record: no-action, monitor, attention-required, urgent, or critical.
  • Persistent operational memory — cross-run Reddit monitoring state that accumulates over time and cannot be reconstructed from a single scrape.
  • Signal trajectory — whether a narrative is accelerating, rising, stable, fading, or volatile.
  • Persistent signal — a recurring signal classified as transient, recurring, persistent, cyclical, or dormant from accumulated history.
  • Suppressed signal — a signal that fired but was filtered as noise, surfaced with its reason rather than silently dropped.

Output fields

| Field | Type | Description |
| --- | --- | --- |
| schemaVersion | string | Output schema version (additive-only). Pin automations to this. |
| recordType | string | Discriminator: post, comment, community, user, search_result, summary, error. |
| eventId | string | Stable dedup key (native Reddit fullname, or syn_ sha256 for synthetic records). |
| attentionIndex | object | The single sortable composite (0-100) with breakdown and top-3 drivers. |
| watchStatus | string | Routing primitive: no-action, monitor, attention-required, urgent, critical. |
| whyNow | array | Plain-English reasons this record needs attention now. |
| recommendedAction | string | Prioritisation instruction (never an in-Reddit engagement instruction). |
| attentionWindow | object | Machine primitive: urgency plus recommendedReviewWithinHours. |
| agentDecision | string | Flat decision enum: review_now, monitor, ignore. |
| agentContract | object | Compact decision surface for MCP/AI/RAG consumers. |
| signalEvents | array | Typed, evidenced events with signalStrength, evidenceGrade, decayStatus. |
| signalProfile | string | Community archetype (8-value enum). |
| whyThisMatters | array | Per-post operational notes (posts mode). |
| commentIntelligence | object | Deterministic sentiment + themes (lexicon-tfidf-v1). |
| actionability | number | 0-1: how worth acting on now. |
| materiality | number | 0-100: how much this matters to the tracked term/persona. |
| signalTrajectory | object | State backbone: direction, velocity, stability, phase, class. |
| persistentSignal | object | Recurrence classification from accumulated state. |
| narrativeMemory | object | Cross-run topic memory (maturity-gated). |
| suppressedSignals | array | Signals judged noise, with reason and noiseRisk. |
| coverage | object | Requested vs scanned communities, skip reasons, status (summary record). |
| publicDataCompliance | object | Read-only posture; inRedditActionsPerformed hard-wired false (summary record). |
| runOutcome | object | Quiet-mode status: quiet, active, high-activity (summary record). |
| portfolioState | object | CIO-glance over the run (summary record). |
| trustDiagnostics | object | Suppression, dedup, and coverage confidence (summary record). |
| title, url, upVotes, numberOfComments, body, communityName, dataType | mixed | Substrate fields, identical name and type to a standard Reddit scraper (compat). |

How much does it cost to monitor Reddit?

Reddit Scraper uses pay-per-event pricing — you pay $0.002 per record ($2 per 1,000) plus $0.20 per monitor or search query. The signal layer is included, and monitor runs cost the same per record as one-shot runs. Platform compute costs are included.

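| Scenario | Records | Cost per record | Query cost | Total cost |
| --- | --- | --- | --- | --- |
| Quick test | 50 | $0.002 | $0.20 | $0.30 |
| Small batch | 200 | $0.002 | $0.20 | $0.60 |
| Daily brand feed | 500 | $0.002 | $0.20 | $1.20 |
| Large pull | 1,000 | $0.002 | $0.20 | $2.20 |
| Heavy monitoring | 10,000 | $0.002 | $2.00 (10 queries) | $22.00 |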

That is roughly half the effective per-result cost of a standard row scraper, with the signal layer included. Apify's free tier ($5 monthly credits) covers about 2,400 records in a single-query run ($4.80 in record fees plus the $0.20 query fee). Set a spending limit on the actor to cap costs.
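The arithmetic behind the pricing table can be reproduced with a small helper. This is a minimal sketch that assumes only the two rates quoted above; the function name `estimate_cost_usd` is illustrative and not part of the actor.

```python
RECORD_PRICE_MILLS = 2    # $0.002 per record, in thousandths of a dollar
QUERY_PRICE_MILLS = 200   # $0.20 per monitor or search query


def estimate_cost_usd(records: int, queries: int = 1) -> float:
    """Estimate a run's cost in dollars from the listed pay-per-event rates."""
    if records < 0 or queries < 0:
        raise ValueError("records and queries must be non-negative")
    # Work in integer mills so the result is not affected by float rounding.
    total_mills = records * RECORD_PRICE_MILLS + queries * QUERY_PRICE_MILLS
    return total_mills / 1000


if __name__ == "__main__":
    for label, records, queries in [
        ("Quick test", 50, 1),
        ("Daily brand feed", 500, 1),
        ("Heavy monitoring", 10_000, 10),
    ]:
        print(f"{label}: ${estimate_cost_usd(records, queries):.2f}")
```

Integer mills are used so that totals such as $0.30 come out exact rather than as accumulated floating-point sums.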

Monitor Reddit using the API

All three examples use the same input — the canonical monitor call { "mode": "monitor", "track": ["Notion"], "rankBy": "attention" }.

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/reddit-scraper").call(run_input={
    "mode": "monitor",
    "track": ["Notion"],
    "rankBy": "attention",
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    ai = item.get("attentionIndex") or {}
    print(f"{ai.get('value')} {item.get('watchStatus')} {item.get('title')}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/reddit-scraper").call({
    mode: "monitor",
    track: ["Notion"],
    rankBy: "attention",
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`${item.attentionIndex?.value} ${item.watchStatus} ${item.title}`);
}

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~reddit-scraper/runs?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "mode": "monitor", "track": ["Notion"], "rankBy": "attention" }'

The run response returns data.defaultDatasetId; once the run has finished, fetch the results from /v2/datasets/{defaultDatasetId}/items.
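The cURL call only starts the run, so the dataset still has to be read in a second request. The sketch below builds that second URL from a parsed run response. It assumes the response wraps the run object in a top-level data key, and the helper name `dataset_items_url` is hypothetical. Authentication and waiting for the run to finish are left out.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(run_response: dict) -> str:
    """Build the dataset items URL from a parsed run response."""
    # Accept either the wrapped form {"data": {...}} or a bare run object.
    run = run_response.get("data", run_response)
    dataset_id = run["defaultDatasetId"]
    # Percent-encode the ID so it is always a safe path segment.
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items"
```

A GET request to the returned URL, authenticated the same way as the run call, should return the ranked records once the run has completed.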

Why deterministic Reddit monitoring matters

Most "AI-powered" monitoring tools generate different outputs from the same Reddit data on different runs, because an LLM sits in the scoring path. Reddit Scraper uses deterministic sentiment and theme synthesis (lexicon-tfidf-v1):

  • the same comments always produce the same sentiment and themes
  • signals are reproducible and the scoring is auditable
  • outputs are stable across runs, so automation can branch on them safely
  • there is no external LLM dependency and no probabilistic drift

That makes it suitable for operational monitoring, compliance-sensitive environments, AI pipelines that need stable inputs, and any automation that branches on the output. Unlike tools that paraphrase Reddit through a model, Reddit Scraper's numbers mean the same thing every run.

How Reddit Scraper works

Mental model: Reddit -> substrate fetch -> signal detection -> state engine -> decision -> ranked attention queue.

| Layer | What happens |
| --- | --- |
| 1 Extraction | Scrape public posts, comments, communities, users (the compat substrate). |
| 2 Interpretation | Detect typed signal events with evidence and a decay status. |
| 3 State engine | Diff against persisted watchlist state; compute trajectory, persistence, memory. |
| 4 Decision | Compute attentionIndex, watchStatus, agentDecision, attentionWindow. |
| 5 Portfolio | Roll up portfolioState, narrativeConcentration, trustDiagnostics over the run. |
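One way to picture the decision layer's composite score is as a sum of capped components. This is a rough sketch, not the actor's scoring code. It assumes the value is the sum of the breakdown entries, which holds for the sample output (22 + 18 + 14 + 16 + 8 = 78). The per-component cap of 25 is a hypothetical number chosen only to illustrate "no single component dominates".

```python
COMPONENT_CAP = 25  # hypothetical per-component ceiling, not a documented value


def attention_index(breakdown: dict) -> int:
    """Sum capped, non-negative components and clamp the result to 0-100."""
    total = sum(min(max(score, 0), COMPONENT_CAP) for score in breakdown.values())
    return int(min(round(total), 100))


sample_breakdown = {
    "activeSignals": 22,
    "mentionSpike": 18,
    "sentimentShift": 14,
    "breakoutStrength": 16,
    "communityMomentum": 8,
}
```

With the sample breakdown, `attention_index(sample_breakdown)` reproduces the value shown in the output example.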

Signal detection

After the substrate fetch, deterministic detectors fire against community baselines. A breakout_post needs an upvote z-score at or above 2 within 48 hours; a mention_spike needs current-window volume at or above 2x baseline; a sentiment_shift needs a delta magnitude at or above 0.15 on a 100+ comment sample. Thresholds are pinned to signalDetectionVersion: 1.0.
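The three thresholds above can be written as plain comparisons. This is an illustrative sketch, not the actor's detector code. It assumes "within 48 hours" means the post is at most 48 hours old, and that a zero baseline never fires a spike. The function and constant names are hypothetical.

```python
Z_SCORE_MIN = 2.0         # breakout_post: minimum upvote z-score
BREAKOUT_MAX_AGE_H = 48   # breakout_post: maximum post age in hours (assumed reading)
SPIKE_RATIO_MIN = 2.0     # mention_spike: current volume vs baseline
SHIFT_DELTA_MIN = 0.15    # sentiment_shift: minimum absolute sentiment delta
SHIFT_SAMPLE_MIN = 100    # sentiment_shift: minimum sampled comments


def detect_signals(upvote_z, age_hours, mentions_now, mentions_baseline,
                   sentiment_delta, comment_sample_size):
    """Return the signal types whose stated thresholds are met."""
    fired = []
    if upvote_z >= Z_SCORE_MIN and age_hours <= BREAKOUT_MAX_AGE_H:
        fired.append("breakout_post")
    # Assumption: with no baseline there is nothing to compare against.
    if mentions_baseline > 0 and mentions_now >= SPIKE_RATIO_MIN * mentions_baseline:
        fired.append("mention_spike")
    if abs(sentiment_delta) >= SHIFT_DELTA_MIN and comment_sample_size >= SHIFT_SAMPLE_MIN:
        fired.append("sentiment_shift")
    return fired
```

Fed the evidence from the output example (z-score 4.1 at 12 hours, mentions at 3x baseline, a -0.22 shift on 100 comments), all three signals fire.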

Comment intelligence

Comments are tokenised, stopword-stripped, scored with TF-IDF, clustered by term co-occurrence, and mapped to a fixed theme dictionary. Sentiment uses the VADER lexicon. There is no external LLM, so the same comments produce the same themes every run.
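The pipeline can be approximated in a few lines of standard-library Python. The sketch covers tokenising, stopword stripping, TF-IDF scoring, and mapping to a fixed theme dictionary. It skips the co-occurrence clustering step and sentiment scoring. The stopword list, the theme dictionary, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "is", "its", "too", "for", "and", "this", "to", "of"}
# Hypothetical fixed theme dictionary: keyword -> theme code.
THEMES = {
    "price": "pricing_complaints",
    "expensive": "pricing_complaints",
    "plan": "pricing_complaints",
    "sync": "reliability_issues",
}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def theme_weights(comments):
    """Sum TF-IDF scores of theme keywords per theme, normalised to total 1."""
    docs = [tokenize(c) for c in comments]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            theme = THEMES.get(term)
            if theme is None:
                continue
            tf = count / len(doc)
            idf = math.log((1 + len(docs)) / (1 + doc_freq[term])) + 1  # smoothed IDF
            totals[theme] += tf * idf
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {theme: score / grand_total for theme, score in sorted(totals.items())}
```

Nothing here depends on randomness or an external model, so the same list of comments yields the same weights on every call.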

State engine (the moat)

When a watchlist is named, per-term and per-community history accumulates in a separate named key-value store (reddit-monitor-<name>). Trajectory, recurrence, and topic memory are derived from that history. The maturity model is honest: advanced fields stay null or building until enough runs exist.
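The run-over-run diff can be sketched as a pure function over two snapshots. This is a simplified illustration that tracks only per-term mention counts. Loading and saving the snapshot in the named key-value store is omitted, and the "compared" status label is hypothetical.

```python
def diff_watchlist(previous, current):
    """Compare per-term mention counts between stored state and this run."""
    if previous is None:
        # No persisted state yet: report a first run with empty deltas.
        return {"status": "first-run", "deltas": {}}
    deltas = {}
    for term in sorted(set(previous) | set(current)):
        before = previous.get(term, 0)
        after = current.get(term, 0)
        if after != before:
            deltas[term] = {"previous": before, "current": after, "change": after - before}
    return {"status": "compared", "deltas": deltas}
```

Returning an explicit first-run status instead of fabricated deltas mirrors the maturity model described above: history has to exist before a change can be reported.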

Tips for best results

  1. Schedule the same watchlist daily. The product is the run-over-run delta; one run cannot show what changed.
  2. Pick the persona that matches your job. pr-comms escalates sentiment and mention spikes; trend-research escalates topic surges and community acceleration.
  3. Use rankBy to change the lens. breakoutPotential for scouts, attention for monitoring, momentum for trend acceleration.
  4. Filter on watchStatus, not raw scores. Branch automation on WHERE watchStatus IN ('urgent','critical').
  5. Read the Suppressed Signals view. Seeing what the actor ignored, and why, builds trust in the alerts it raises.
  6. Use outputProfile: agent for AI consumers. It returns a compact decision surface for MCP, agent, and RAG pipelines.
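Tip 4 translates directly into a filter over dataset items. This is a minimal sketch that assumes items are the dictionaries returned by the dataset, with the watchStatus and attentionIndex fields described under Output fields. The helper name `escalation_queue` is illustrative.

```python
ESCALATED = {"urgent", "critical"}


def escalation_queue(items):
    """Keep urgent/critical records, highest attentionIndex value first."""
    hits = [item for item in items if item.get("watchStatus") in ESCALATED]
    return sorted(
        hits,
        # Records without an attentionIndex value sort last.
        key=lambda item: (item.get("attentionIndex") or {}).get("value") or 0,
        reverse=True,
    )
```

Branching on the enum keeps automation stable even if the numeric scoring is recalibrated in a later schema version.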

Combine with other Apify actors

| Actor | How to combine |
| --- | --- |
| Trustpilot Review Analyzer | Pair Reddit sentiment with review sentiment for the same brand. |
| Multi-Review Analyzer | Join Reddit signals with Trustpilot and BBB review trends. |
| Website Change Monitor | Correlate a Reddit mention spike with a pricing-page change. |
| Company Deep Research | Add Reddit attention signals to a company intelligence report. |
| Website Content to Markdown | Convert linked articles to markdown for an AI/RAG pipeline alongside Reddit threads. |

Limitations

  • Reddit's ~1,000-item list cap. Any Reddit list stops after about 1,000 items. This is platform-wide, not a scraper limit. Use scheduled monitor runs to capture beyond it incrementally.
  • Public content only. No login, no private, deleted, or quarantined content.
  • Delta intelligence is maturity-gated. Run 1 shows first-run and empty deltas; trajectory and memory sharpen after several scheduled runs.
  • Sentiment is lexicon-based. Deterministic and reproducible, but it does not match a tuned model on nuance; theme labels are keyword labels, not generated summaries.
  • No cross-community propagation. Spread and ecosystem mapping is a future capability, not in this version.
  • Rate-limited and proxy-dependent. Reddit anti-bot escalation can cause partial coverage; the coverage block reports skips honestly.
  • Not a controversy or safety scorer. contested_thread is a descriptive engagement signal, not a brand-safety judgment.

Integrations

  • Zapier — push urgent/critical attention records into a Slack channel or ticketing tool.
  • Make — route monitor-run summaries into a daily brand-monitoring digest.
  • Google Sheets — append the attention queue to a tracking sheet.
  • Apify API — trigger monitor runs and read the dataset programmatically.
  • Webhooks — fire on run completion to feed downstream automation.
  • LangChain / LlamaIndex — pull the agent profile decision surface into an AI workflow.

Best tool for Reddit brand monitoring

Reddit Scraper is built for brand monitoring on Reddit: tracking mentions across communities, detecting sentiment shifts, finding emerging complaint narratives, watching subreddit momentum, and surfacing breakout threads. Unlike a traditional Reddit scraper that exports rows, it ranks what matters and remembers what changed across runs, so a brand or PR team reads a short attention queue instead of a spreadsheet.

Best Reddit scraper for AI agents

Reddit Scraper is built for AI consumption: MCP clients, RAG pipelines, monitoring agents, and automation that needs deterministic, stable inputs. The agent output profile returns a compact agentContract (decision, attention, why, recommended action) instead of raw rows, and because synthesis is deterministic, the same Reddit data produces the same output every run, so agents can branch on it safely.

How do I track Reddit mentions automatically?

Set mode: monitor, add your brand to track, set a watchlistName, and schedule the actor to run daily. Each run returns mention spikes and sentiment shifts as typed signals, ranked by attentionIndex, plus a delta of what changed since the previous run.
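The settings above can be assembled into one input object. This is a sketch using only the input keys named in this README. The helper name and the example watchlist name are hypothetical, and scheduling itself is configured on the Apify platform, not in this input.

```python
def build_monitor_input(terms, watchlist_name, sample_comments=True):
    """Assemble a monitor-mode input from the keys described in this README."""
    if not terms:
        raise ValueError("track needs at least one term")
    return {
        "mode": "monitor",
        "track": list(terms),
        "watchlistName": watchlist_name,           # enables run-over-run deltas
        "includeCommentsSample": sample_comments,  # needed for sentiment shift detection
        "rankBy": "attention",
    }


# "notion-brand" is an illustrative watchlist name.
run_input = build_monitor_input(["Notion"], "notion-brand")
```

The resulting dictionary can be passed as run_input to the Python client call shown in the API section.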

How do I do Reddit sentiment analysis without an LLM?

Set includeCommentsSample: true. Reddit Scraper runs deterministic sentiment and theme synthesis (lexicon-tfidf-v1) over the sampled comments. Because it uses a fixed lexicon and no external model, the same comments produce the same sentiment and themes on every run.

Responsible use

  • Reddit Scraper extracts publicly available content from Reddit. It does not bypass authentication or CAPTCHAs, does not access restricted content, and performs no in-Reddit actions (publicDataCompliance.inRedditActionsPerformed is hard-wired false).
  • Users are responsible for ensuring their use complies with applicable laws and platform terms, including data protection regulations in their jurisdiction.
  • Do not use extracted data for spam, harassment, astroturfing, or unauthorized purposes. recommendedAction is a prioritisation instruction only and never tells anyone to post, reply, vote, or DM.
  • For guidance on web scraping legality, see Apify's guide.

FAQ

What is the difference between a Reddit scraper and a Reddit monitoring tool? A Reddit scraper exports rows: posts, comments, communities, users. A Reddit monitoring tool watches those rows over time and tells you what changed and what needs attention. Reddit Scraper does both: it ships the substrate rows and the decision layer on top.

Can I use it as a drop-in replacement for my existing Reddit scraper? Yes. It accepts the same input shape (including startUrls and searches), and outputProfile: compat returns the exact substrate field set with identical names and types, so downstream code reading item.upVotes works unchanged.

Can I monitor several brands at once? Yes. Put multiple terms in track and several subreddits in communities. Use one watchlistName per tracked subject so each accumulates its own state.

How does the persistent memory work? When you name a watchlist, per-term and per-community history is stored in a separate key-value store and diffed on each run. This is what produces the "what changed since last run" delta, and it cannot be backfilled from a single scrape.

Why is delta intelligence empty on my first run? State cannot be invented. Run 1 reports first-run with empty deltas; trajectory, persistence, and memory fields populate after several scheduled runs on the same watchlist.

Does it perform any actions on Reddit? No. It reads public content only and performs no posting, replying, voting, or DMing. inRedditActionsPerformed is always false.

What does the attentionIndex actually measure? It is a bounded 0-100 composite of active signals, mention spike, sentiment shift, breakout strength, and community momentum, with no single component allowed to dominate. The breakdown and drivers show how it was built.

How accurate is the sentiment? Sentiment is deterministic and lexicon-based, with a confidenceBand that scales with sample size: a 12-comment thread gets a low band, a 200-comment thread a high one. It is reproducible and auditable rather than a probabilistic model output.

Can I use this with an AI agent or MCP? Yes. Set outputProfile: agent for a compact agentContract decision surface, or read the flat agentDecision enum (review_now / monitor / ignore).

How is this a practical alternative to a row scraper for brand monitoring? A row scraper leaves the monitoring, ranking, and sentiment work to you. Reddit Scraper ships those as the product, at roughly half the per-result cost, so the workflow that used to live in a spreadsheet runs in one call.

Is it legal to scrape Reddit? Reddit Scraper accesses public content only and performs no write actions. Whether your use is permitted depends on your jurisdiction and intended use, including data protection and platform terms. Consult legal counsel for your specific case.

What happens when nothing is happening? The summary record returns runOutcome.status: "quiet" with a clear message. A monitor that is willing to say "nothing fired" is the one you trust when it does fire.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.