Reddit Post & Modqueue Scraper — MCP, Community Intel + AI
RAG-ready Reddit feed with 99%+ run success — posts, comments, modqueue, sentiment at $0.002/post, no OAuth. MCP-compatible alternative to Brandwatch/Sprout/Mention ($4 vs $249/mo). AI digest across 5 LLM providers. Trudax-grade with bonus moderation. x402-ready.


Reddit Scraper — Posts, Comments, Modqueue & AI Sentiment Analysis

Scrape Reddit posts, comments, and moderation queues from any subreddit or search query — no OAuth required for public data. Extract titles, scores, upvote ratios, comment threads, flair, virality scores, and full post metadata with a single run. Add an optional built-in LLM analyst to get sentiment scores, theme detection, community health assessment, emerging trends, and actionable recommendations — all in one dataset.

What it does

Unlike most Reddit scrapers on Apify, this one covers three distinct use cases in a single actor: subreddit browsing, site-wide search, and a moderation-queue tier for community managers who need to export modqueue / reports / spam / unmoderated feeds into Apify dashboards and triage workflows.

This actor connects to Reddit's public HTML endpoints and (optionally) Reddit's authenticated JSON API to extract structured data. For subreddit and search modes, no Reddit account or API key is needed — the actor appends .json to Reddit URLs and parses the response. For modqueue mode, the actor exchanges your moderator OAuth credentials for a user-scope token and calls Reddit's private moderation endpoints.

Scraping modes:

  • Subreddit mode — Browse any public subreddit sorted by hot, top, new, or rising with configurable time filters and pagination up to 100 posts per run
  • Search mode — Search across all of Reddit (or scoped to a subreddit) for posts matching any keyword, phrase, or topic with relevance and time-based sorting
  • Moderation Queue mode — Community managers and subreddit mods can pipe the modqueue, reports, spam, or unmoderated feeds into Apify for triage dashboards, escalation workflows, or long-term moderation analytics
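
For the first two modes, the public-endpoint path boils down to appending .json to ordinary Reddit URLs, as described above. A minimal sketch of how such request URLs could be built — the exact query parameters the actor sends are an assumption, not its actual code:

```python
from urllib.parse import urlencode


def reddit_listing_url(mode: str, *, subreddit: str = "", query: str = "",
                       sort: str = "hot", time_filter: str = "week",
                       limit: int = 25) -> str:
    """Build a public Reddit JSON endpoint URL (illustrative sketch only)."""
    base = "https://www.reddit.com"
    if mode == "subreddit":
        # e.g. https://www.reddit.com/r/python/hot.json?t=week&limit=25
        return f"{base}/r/{subreddit}/{sort}.json?" + urlencode(
            {"t": time_filter, "limit": limit})
    if mode == "search":
        # scope to a subreddit when one is given, else search site-wide
        scope = f"/r/{subreddit}" if subreddit and subreddit != "all" else ""
        return f"{base}{scope}/search.json?" + urlencode(
            {"q": query, "sort": sort, "t": time_filter,
             "limit": limit, "restrict_sr": "1" if scope else "0"})
    raise ValueError(f"unsupported mode: {mode}")
```

Modqueue mode does not follow this pattern; it goes through Reddit's authenticated oauth endpoints instead.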

Data enrichment:

  • Optional comment extraction pulls top-level comments for each post including author, score, and body — and as a side effect enriches each post with selftext (full post body) and upvote_ratio from the post-detail page
  • Optional Reddit OAuth mode (useJsonApi: true) unlocks richer metadata on every post (upvote_ratio, subreddit_subscribers, awards_count, author_flair, edited, distinguished, num_crossposts) and increases rate limits from ~30 req/min to ~100 req/min

AI analysis:

When enableAiAnalysis is enabled, the actor sends scraped post titles and metadata to your chosen LLM provider and returns a structured analysis item with overall sentiment scores, top discussion themes, emerging trends, common complaints, common praises, community health score, most engaging posts, and actionable recommendations — all as structured JSON in the same dataset as the raw posts.

Features

  • No OAuth required by default — Uses Reddit's public HTML endpoints; no Reddit account, no API keys, no app registration needed for subreddit and search modes
  • Subreddit scraping — Browse any public subreddit by hot, top, new, or rising posts with configurable time filters and pagination
  • Reddit search — Search across all of Reddit or scoped to a specific subreddit for posts matching any keyword, phrase, or topic
  • Moderation queue mode — Community managers and subreddit mods can pipe the modqueue, reports, spam, or unmoderated feeds into Apify for triage dashboards, escalation workflows, or long-term moderation analytics. Each item carries item_type (post/comment), reported, num_reports, mod_reports[], user_reports[], removal_reason, banned_by, approved_by, ignore_reports, and spam metadata. Requires a moderator-scope OAuth token
  • Comment extraction — Optionally scrape top-level comments for each post including author, score, and body text. Also enriches each post with selftext and upvote_ratio from the post-detail page — fields not available in listing markup
  • Full post metadata — title, author, score, upvote ratio, comment count, flair, awards, NSFW flag, stickied, spoiler, locked, archived, is_self, is_video flags, creation date, permalink, content text, and a derived virality_score (score normalized by age in hours, useful for ranking trending posts)
  • AI sentiment analysis — Automatically analyze scraped posts to identify overall community sentiment with positive, negative, and neutral scores
  • Theme and trend detection — AI identifies the top discussion themes, emerging trends, common complaints, and common praises across the scraped posts
  • Community health scoring — AI-generated community health assessment based on engagement patterns, discussion quality, and upvote ratios
  • Multi-LLM provider support — OpenRouter (recommended — 300+ models), Anthropic (Claude), Google AI (Gemini), OpenAI (GPT), or Ollama (self-hosted) for AI analysis
  • Optional Reddit OAuth mode — Bring your own Reddit app credentials (useJsonApi: true) to unlock richer metadata and ~3x higher rate limits. Falls back to HTML scraping on auth failure so a bad token never breaks a run
  • Rate-limit compliant — Built-in 2-second delay between requests (0.65s in OAuth mode) and a descriptive User-Agent to comply with Reddit's public API guidelines
  • Pay-per-event pricing — You pay only for what you scrape: $0.002 per post, $0.001 per comment, $0.01 per modqueue item, $0.05 per AI analysis

Use Cases

Market Research and Consumer Insights

Monitor what consumers are saying about products, brands, or industries in relevant subreddits. Track sentiment shifts over time by scheduling recurring scrapes of subreddits like r/technology, r/gadgets, or industry-specific communities. Identify unmet needs and feature requests that surface in user discussions. The AI analysis highlights the most common themes and complaints from hundreds of posts without manual reading.

Brand and Reputation Management

Track mentions of your brand, products, or competitors across Reddit. AI analysis highlights common complaints and praises, giving you an early warning system for reputation issues. Combine subreddit monitoring with Reddit search to capture discussions that happen outside your primary communities. Schedule hourly or daily runs to catch emerging threads before they go viral.

Product Teams and UX Research

Scrape subreddits where your users congregate to understand their pain points, feature requests, and workflow challenges. The structured comment data lets you build qualitative research datasets without manual copy-pasting. Use AI analysis to quickly surface the most common themes from hundreds of posts. Route the dataset to Google Sheets or Airtable via Apify integrations for collaborative tagging.

Community Management and Moderation

Assess the health and engagement levels of subreddits you manage or participate in. Community health scoring evaluates discussion quality, engagement levels, and overall mood. Track how community sentiment changes after product launches, policy changes, or major announcements.

Moderation Queue tier: mods of a subreddit can point the actor at their modqueue, reports, spam, or unmoderated feed to export items awaiting action into Apify. Typical uses: (1) build a triage dashboard in Retool / Metabase / Sheets that surfaces high-report-count items to senior mods first, (2) archive modqueue snapshots for audit trails, (3) escalate repeatedly-reported users via webhook, (4) cross-reference report volume against post themes using the same AI analysis layer.
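
Use (1) reduces to a sort over the report metadata each modqueue item carries (num_reports, mod_reports, score). A minimal Python sketch — the tie-breaking order here is an assumption about what a triage dashboard might want, not part of the actor:

```python
def triage_order(items: list[dict]) -> list[dict]:
    """Order exported modqueue items: most-reported first, mod-flagged before
    user-flagged, lowest-scoring first within ties."""
    return sorted(
        items,
        key=lambda it: (
            -it.get("num_reports", 0),          # highest report count first
            len(it.get("mod_reports", [])) == 0,  # items with mod reports first
            it.get("score", 0),                 # lowest score first
        ),
    )
```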

Content Creation and Marketing

Discover what topics are trending in your niche by scraping relevant subreddits sorted by hot or rising posts. Identify high-engagement content formats and topics that resonate with your target audience. Use AI trend analysis to plan content calendars around emerging discussions. The most-engaging-posts field in the AI output shows which formats and angles drive the most comments.

Investment Research and Due Diligence

Monitor subreddits like r/investing, r/wallstreetbets, r/stocks, or industry-specific communities for sentiment around companies, sectors, or market events. AI sentiment scoring provides quantitative data that can complement financial analysis. Track how community mood shifts around earnings reports, product launches, or regulatory changes.

Academic and Social Research

Build datasets for social science research on community dynamics, information propagation, or platform culture. The actor exports clean JSON, CSV, or Excel — compatible with pandas, R, and standard research toolchains. The virality_score field provides a normalized engagement signal useful for comparative analysis across subreddits of different sizes.

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | select | subreddit | Scraping mode: subreddit (browse a specific subreddit), search (search across Reddit), or modqueue (fetch moderation queue — requires mod auth) |
| subreddit | string | artificial | Subreddit name without the r/ prefix (e.g. "python", "webdev", "startups"). Required for subreddit mode |
| searchQuery | string | — | Search query to find posts across Reddit. Required for search mode |
| searchSubreddit | string | all | Scope the search to a specific subreddit. Leave as all to search site-wide. Scoping to a specific subreddit is far more reliable because Reddit's bot filters are aggressive on the site-wide /search endpoint |
| sortBy | select | hot | Sort order: hot (trending), top (highest scored), new (most recent), rising (gaining traction), relevance (search mode only) |
| timeFilter | select | week | Time period filter for top and relevance sorting: hour, day, week, month, year, all |
| maxPosts | integer | 25 | Maximum posts to scrape (1–100). Higher values paginate across multiple Reddit pages |
| includeComments | boolean | false | Scrape top comments for each post. Also enriches each post with selftext and upvote_ratio. Adds ~2–3s per post |
| maxCommentsPerPost | integer | 10 | Maximum top-level comments per post (1–50). Only used when includeComments is enabled |
| enableAiAnalysis | boolean | false | Generate AI sentiment and trend analysis. Requires an LLM API key |
| llmProvider | select | openrouter | AI provider: openrouter (recommended), anthropic (Claude), google (Gemini), openai (GPT), or ollama (self-hosted) |
| llmModel | string | (auto) | Override the default model. Leave empty for provider default |
| openrouterApiKey | string | — | OpenRouter API key. Required when using OpenRouter provider. Get one at openrouter.ai/keys |
| anthropicApiKey | string | — | Anthropic API key. Get one at console.anthropic.com |
| googleApiKey | string | — | Google AI (Gemini) API key. Get one at aistudio.google.com/app/apikey |
| openaiApiKey | string | — | OpenAI API key. Get one at platform.openai.com/api-keys |
| ollamaBaseUrl | string | http://localhost:11434 | Base URL for your local Ollama instance |
| useJsonApi | boolean | false | Opt in to Reddit's authenticated JSON API. Requires redditClientId + redditClientSecret. Unlocks richer metadata and ~3x higher rate limit |
| redditClientId | string | — | Reddit app client ID from reddit.com/prefs/apps. Required when useJsonApi is enabled or mode=modqueue |
| redditClientSecret | string | — | Reddit app secret. Required when useJsonApi is enabled or mode=modqueue |
| redditUsername | string | — | Moderator Reddit username for mode=modqueue password-grant auth |
| redditPassword | string | — | Moderator Reddit password. Use an app password if 2FA is enabled |
| redditRefreshToken | string | — | Pre-obtained Reddit OAuth refresh token (preferred for modqueue production pipelines) |
| modqueueFeed | select | modqueue | Modqueue mode only — which feed to fetch: modqueue (all pending), reports, spam, unmoderated |
| modqueueOnly | select | (both) | Modqueue mode only — narrow to links (posts) or comments. Empty means both |
| proxyConfiguration | object | Residential | Proxy settings. Residential proxies are strongly recommended for Reddit at scale |

Cost (pay-per-event pricing):

| Event | Price |
|---|---|
| Post scraped | $0.002 |
| Comment scraped | $0.001 |
| Modqueue item scraped | $0.01 |
| AI analysis completed | $0.05 |

Plus Apify compute costs and your LLM provider's charges (typically $0.001–0.01 per analysis with OpenRouter + Gemini Flash). A typical 25-post run with AI analysis costs roughly $0.10 all-in.
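
Using the per-event prices above, a quick pre-run estimate can be sketched as follows (Apify compute and LLM-provider charges are excluded, so treat this as a lower bound):

```python
# Per-event prices from the pricing table (USD).
PRICES = {"post": 0.002, "comment": 0.001, "modqueue_item": 0.01, "ai_analysis": 0.05}


def estimate_cost(posts: int = 0, comments: int = 0,
                  modqueue_items: int = 0, ai_analyses: int = 0) -> float:
    """Per-event cost of a run in USD, excluding compute and LLM fees."""
    total = (posts * PRICES["post"]
             + comments * PRICES["comment"]
             + modqueue_items * PRICES["modqueue_item"]
             + ai_analyses * PRICES["ai_analysis"])
    return round(total, 3)
```

A 25-post run with one AI analysis comes to $0.10 in per-event charges, consistent with the figure above.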

Output

Each post is pushed as an individual dataset item:

{
  "post_id": "1abc123",
  "title": "What AI tools are you using in your workflow?",
  "author": "techuser42",
  "subreddit": "artificial",
  "subreddit_subscribers": 428512,
  "score": 342,
  "upvote_ratio": 0.94,
  "num_comments": 87,
  "created_utc": "2026-04-10T14:30:00+00:00",
  "url": "https://www.reddit.com/r/artificial/comments/1abc123/...",
  "permalink": "https://www.reddit.com/r/artificial/comments/1abc123/...",
  "selftext": "I've been experimenting with several AI tools for my daily work...",
  "is_self": true,
  "is_video": false,
  "flair": "Discussion",
  "author_flair": "Senior Dev",
  "awards_count": 2,
  "over_18": false,
  "stickied": false,
  "spoiler": false,
  "locked": false,
  "archived": false,
  "edited": false,
  "distinguished": "",
  "num_crossposts": 3,
  "domain": "self.artificial",
  "virality_score": 28.5,
  "scraped_at": "2026-04-10T16:00:00+00:00",
  "comments": [
    {
      "comment_id": "def456",
      "author": "devpro",
      "body": "I've been using Cursor for code and it's been a game changer...",
      "score": 89,
      "created_utc": "2026-04-10T15:00:00+00:00"
    }
  ]
}

Fields subreddit_subscribers, author_flair, edited, distinguished, and num_crossposts are fully populated when useJsonApi: true is set. In HTML mode they default to 0 / "" / false so the schema stays consistent across both paths.

When AI analysis is enabled, a summary item is pushed at the end of the dataset:

{
  "type": "summary",
  "source": "r/artificial",
  "mode": "subreddit",
  "sort_by": "hot",
  "posts_found": 25,
  "ai_analysis": {
    "top_themes": ["AI coding tools", "LLM comparisons", "AI ethics"],
    "sentiment": {
      "positive_score": 55,
      "negative_score": 15,
      "neutral_score": 30,
      "summary": "Generally optimistic community with excitement about new tools"
    },
    "most_engaging_posts": [
      {
        "title": "What AI tools are you using in your workflow?",
        "score": 342,
        "num_comments": 87,
        "why_engaging": "Practical discussion that invites personal experience sharing"
      }
    ],
    "common_complaints": ["API pricing too high", "Model hallucinations"],
    "common_praises": ["Productivity improvements", "Open source model quality"],
    "emerging_trends": ["Local LLM hosting", "AI agents for automation"],
    "community_health": {
      "score": 8,
      "engagement_level": "high",
      "discussion_quality": "high",
      "summary": "Active community with constructive discussions"
    },
    "recommendations": [
      "Share practical tool comparisons for high engagement",
      "Address pricing concerns with cost-optimization guides"
    ]
  },
  "generated_at": "2026-04-10T16:00:00+00:00"
}

Moderation queue items (mode=modqueue) include full report metadata:

{
  "type": "modqueue_item",
  "item_type": "post",
  "item_id": "1xy9abc",
  "title": "Is this a legit discount?",
  "author": "someuser123",
  "subreddit": "yoursubreddit",
  "permalink": "https://www.reddit.com/r/yoursubreddit/comments/1xy9abc/...",
  "score": -3,
  "num_reports": 4,
  "reported": true,
  "mod_reports": [{"reason": "spam link", "moderator": "ModAlice"}],
  "user_reports": [{"reason": "misleading title", "count": 2}, {"reason": "spam", "count": 2}],
  "removal_reason": "",
  "banned_by": "",
  "approved_by": "",
  "ignore_reports": false,
  "spam": false,
  "scraped_at": "2026-04-22T15:00:00+00:00"
}

Output is available as JSON, CSV, or Excel via Apify's dataset export. Use Apify integrations to send results to Google Sheets, Slack, webhooks, or any downstream system.

Quick Start

Scrape hot posts from a subreddit with AI analysis (no OAuth needed):

{
  "mode": "subreddit",
  "subreddit": "artificial",
  "sortBy": "hot",
  "maxPosts": 25,
  "includeComments": false,
  "enableAiAnalysis": true,
  "openrouterApiKey": "sk-or-..."
}

Sentiment Deep-Dive — Posts + Threaded Comments

Pulls top posts plus their top-level comments for sentiment / discourse analysis. Comments are charged $0.001 each, so a 10-post run with 50 comments per post plus one AI summary comes to $0.57, ideal for community-research projects.

{
  "mode": "subreddit",
  "subreddit": "MachineLearning",
  "sortBy": "top",
  "timeFilter": "week",
  "maxPosts": 10,
  "includeComments": true,
  "maxCommentsPerPost": 50,
  "enableAiAnalysis": true,
  "llmProvider": "openrouter",
  "openrouterApiKey": "sk-or-..."
}

Per-run cost: 10 posts × $0.002 + 500 comments × $0.001 + 1 AI summary × $0.05 = $0.57. That is less than a minute of an analyst's time at $50/hr, and you get structured JSON of every comment alongside the sentiment scores.

Best for: academic research, community-pulse reports, brand-mention deep-dives, modqueue analysis, product-launch reaction tracking.

Search Reddit for a topic with comments:

{
  "mode": "search",
  "searchQuery": "best project management tools",
  "sortBy": "relevance",
  "timeFilter": "month",
  "maxPosts": 50,
  "includeComments": true,
  "maxCommentsPerPost": 10,
  "enableAiAnalysis": true,
  "openrouterApiKey": "sk-or-..."
}

Raw data collection without AI (zero API keys required):

{
  "mode": "subreddit",
  "subreddit": "webdev",
  "sortBy": "top",
  "timeFilter": "week",
  "maxPosts": 100,
  "includeComments": true,
  "maxCommentsPerPost": 5,
  "enableAiAnalysis": false
}

Reddit OAuth mode — richer metadata, ~3x higher rate limit:

Register a Reddit app at https://www.reddit.com/prefs/apps (select "script" type — simplest). Copy the client ID and secret.

{
  "mode": "subreddit",
  "subreddit": "startups",
  "sortBy": "top",
  "timeFilter": "week",
  "maxPosts": 100,
  "useJsonApi": true,
  "redditClientId": "YOUR_REDDIT_APP_CLIENT_ID",
  "redditClientSecret": "YOUR_REDDIT_APP_SECRET",
  "enableAiAnalysis": true,
  "openrouterApiKey": "sk-or-..."
}

OAuth mode falls back to HTML scraping automatically if credentials are invalid, so a bad token never breaks a run.

Moderation Queue mode — community managers:

{
  "mode": "modqueue",
  "subreddit": "yoursubreddit",
  "modqueueFeed": "modqueue",
  "maxPosts": 100,
  "redditClientId": "YOUR_REDDIT_APP_CLIENT_ID",
  "redditClientSecret": "YOUR_REDDIT_APP_SECRET",
  "redditUsername": "your_mod_username",
  "redditPassword": "your_mod_password"
}

Preferred for production — use a refresh token (no password stored):

{
  "mode": "modqueue",
  "subreddit": "yoursubreddit",
  "modqueueFeed": "reports",
  "maxPosts": 50,
  "redditClientId": "YOUR_REDDIT_APP_CLIENT_ID",
  "redditClientSecret": "YOUR_REDDIT_APP_SECRET",
  "redditRefreshToken": "YOUR_PRE_OBTAINED_REFRESH_TOKEN"
}

Modqueue OAuth setup (one-time):

  1. Go to https://www.reddit.com/prefs/apps and click "create another app..."
  2. For the simplest path choose "script" — list the moderator Reddit account as a developer. For production choose "web app" and mint a refresh token via the authorization code flow
  3. The moderator account must have the "posts" or "access" moderator permissions on the target subreddit
  4. Script apps: pass redditClientId, redditClientSecret, redditUsername, redditPassword as inputs (or matching env vars REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USERNAME, REDDIT_PASSWORD)
  5. Web apps: generate a refresh token once via the code flow with the "read" and "modposts" scopes, then pass redditClientId, redditClientSecret, redditRefreshToken
  6. Modqueue mode fails fast with a clear status message if credentials are missing or rejected — it does NOT silently fall back to HTML

Troubleshooting

Getting 403 or empty results from subreddit/search mode

Reddit blocks requests from datacenter IP ranges at scale. Ensure the proxy configuration uses Residential proxies (the default prefill). Without residential proxies, most requests return 403.

Search returns fewer results than expected

Reddit search is notoriously inconsistent for unauthenticated scrapers. Try sort: "new" or sort: "top" instead of relevance. Very new posts may not be indexed yet. For comprehensive coverage, scrape the subreddit directly with mode: "subreddit" and filter by keyword in post-processing. Scoping search to a specific subreddit (searchSubreddit: "python") is far more reliable than site-wide search.

upvote_ratio is null on some posts

Reddit only exposes upvote ratio on the post-detail page, not on listing pages (HTML path). Two fixes: (1) set includeComments: true — the actor fetches the detail page for each post and populates upvote_ratio and selftext as a side effect; or (2) enable useJsonApi: true with your Reddit app credentials to get upvote_ratio on every post from the authenticated JSON API with no extra requests.

Posts return empty body text (selftext)

Reddit posts can be link posts (no text), image posts, or video posts. The selftext field is empty for link posts by design. Check the is_self field: true = text post, false = link/image/video post.

Modqueue returns 0 items or HTTP 403

Modqueue mode requires you to be authenticated as a moderator of the target subreddit. Verify: (1) the Reddit app credentials are correct, (2) the moderator account has the "posts" or "access" permissions on that subreddit (check reddit.com/r/SUBREDDIT/about/moderators), (3) for script apps the moderator account is listed as a developer on the app.

Modqueue fails with auth error despite correct credentials

If 2FA is enabled on the moderator account, use an app password (not the account password) for the redditPassword field. Generate one at reddit.com/prefs/apps under "authorized apps." For production pipelines, mint a refresh token once via the code flow and use redditRefreshToken instead of username/password.

Comments load but are empty or truncated

Some subreddits hide comment trees until users are logged in, or rate-limit comment fetches. Set includeComments: false if you only need post metadata, or add a residential proxy configuration to bypass comment-level rate limits. Comment bodies are limited to 500 characters and post selftext to 1000 characters to keep dataset sizes practical.

AI analysis returns an error about missing API key

Make sure you have set both the llmProvider field and the corresponding API key field (e.g., openrouterApiKey for OpenRouter). API keys can also be provided as environment variables: OPENROUTER_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, OPENAI_API_KEY. Ollama does not require an API key; set ollamaBaseUrl to your local instance URL.

Actor runs slowly with comments enabled

Each post with comments requires an additional detail-page request subject to the 2-second rate limit delay. 100 posts with comments = ~3–4 minutes minimum. Enable OAuth mode (useJsonApi: true) to reduce the delay to 0.65 seconds per request (~3x faster). Alternatively, reduce maxCommentsPerPost or set includeComments: false for the initial exploratory run.
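
That runtime math can be sketched as a quick lower-bound estimator. The 25-posts-per-listing-page figure is an assumption about Reddit's default page size, and network latency is ignored, so real runs will be somewhat slower:

```python
import math


def estimate_runtime_s(max_posts: int, include_comments: bool,
                       oauth: bool = False, page_size: int = 25) -> float:
    """Lower-bound run time implied by the actor's per-request delay."""
    delay = 0.65 if oauth else 2.0           # delays quoted in the docs
    listing_pages = math.ceil(max_posts / page_size)  # page_size is an assumption
    detail_pages = max_posts if include_comments else 0
    return (listing_pages + detail_pages) * delay
```

Under these assumptions, 100 posts with comments in HTML mode needs (4 + 100) × 2 s = 208 s, about 3.5 minutes, which matches the "~3–4 minutes minimum" figure.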

Legal and Compliance

This actor scrapes publicly available data from Reddit. By using this actor, you agree to the following:

  • Your responsibility: You are solely responsible for ensuring your use complies with all applicable laws, regulations, and Reddit's terms of service and API terms. This includes but is not limited to GDPR (EU), CCPA (California), and other data protection laws in your jurisdiction.
  • Reddit API Terms: This actor uses Reddit's public JSON endpoints and HTML interface. Review Reddit's API Terms of Use (https://www.reddit.com/wiki/api-terms) and ensure your use case is compliant. Automated access to Reddit is subject to their acceptable use policies.
  • No legal advice: This actor does not constitute legal advice. Consult a qualified attorney if you have questions about the legality of your specific use case.
  • Intended use: This actor is designed for legitimate business purposes such as market research, competitive analysis, brand monitoring, and academic research using publicly accessible data.
  • Data handling: You are responsible for how you store, process, and share any data collected. Ensure you have a lawful basis for processing any personal data (such as usernames) under applicable privacy laws.
  • Rate limiting: This actor implements polite crawling practices including a 2-second delay between requests and a descriptive User-Agent header to minimize impact on Reddit's servers.
  • No warranty: This actor is provided "as is" without warranty. Data accuracy depends on Reddit's content and API availability.
  • Content copyright: Reddit posts and comments are authored by their respective users. Do not republish scraped content without considering the original authors' rights and Reddit's content policy.
  • Personal data notice: Reddit posts and comments contain usernames which may constitute personal data under GDPR. Ensure you have a lawful basis for processing. Do not use extracted data for unsolicited contact, doxxing, or harassment of Reddit users.
  • Public subreddits only: This actor can only access public subreddits. Private, quarantined, or restricted communities may return empty results or errors. The modqueue mode requires explicit moderator authorization and cannot access moderation data for subreddits where the authenticated account is not a moderator.
Related Actors

  • Google News Monitor — Combine Reddit community sentiment with news media coverage for a complete picture. Track what the media says alongside what Reddit communities discuss about your brand or industry.
  • YouTube Scraper — Scrape YouTube creator content and video metadata for cross-platform social media analysis. Compare Reddit thread discussions with YouTube audience reactions to the same topic.
  • ProductHunt Scraper — Monitor product launches and community feedback on ProductHunt alongside subreddit discussions (r/SideProject, r/startups, r/programming) for a multi-platform launch sentiment view.