Reddit Intelligence Scraper
Pricing
from $0.75 / 1,000 results
Developer: Andy Page
Last modified: 9 days ago
Extract Reddit posts, comments, user profiles, and subreddit analytics at scale. No API key required. AI-ready JSON, CSV, and Markdown output formats. Built for market research, brand monitoring, and ML training datasets.
Now with built-in sentiment analysis, engagement scoring, keyword search, and smart filtering.
Features
- Keyword Search — Search Reddit by keyword across all subreddits or within specific ones. Find every discussion about your topic.
- Subreddit Posts — Scrape posts from any subreddit with sorting (hot, new, top, rising, controversial) and time range filtering
- Comment Threads — Extract full comment trees with configurable depth, including hidden "more comments" expansion
- User Profiles — Scrape user karma, account age, post history, and comment history
- Subreddit Analytics — Get subscriber counts, active users, rules, and moderator lists
- Sentiment Analysis — Automatic positive/negative/neutral/mixed scoring on every post and comment (zero API cost)
- Engagement Scoring — Virality scores, engagement rates, controversy detection, and quality signals
- Keyword Filtering — Include/exclude results by keyword after scraping to reduce noise
- Deduplication — Automatic removal of duplicate posts across multi-subreddit scrapes
- Concurrent Scraping — Comments fetched in parallel (5x faster than sequential)
- Multiple Output Formats — JSON (default), CSV (for spreadsheets), Markdown (for RAG/LLM pipelines)
- Built-in Proxy Rotation — Residential proxies included, no extra cost
- Rate Limit Handling — Smart throttling with exponential backoff, no IP bans
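For instance, the deduplication feature can be sketched as a keep-first pass keyed on `postId` (a minimal illustration using the output schema's field name, not the actor's internal code):

```python
def dedupe_posts(posts):
    """Drop duplicate posts across multi-subreddit scrapes, keeping the first occurrence."""
    seen = set()
    unique = []
    for post in posts:
        if post["postId"] not in seen:
            seen.add(post["postId"])
            unique.append(post)
    return unique
```

A post that appears in several scraped subreddits (e.g. a crosspost) is emitted only once.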
How It Works
This actor uses Reddit's public JSON endpoints to extract data. No API key or Reddit account is required. Data is cleaned, structured, enriched with sentiment and engagement scores, and pushed to Apify Dataset in your chosen format.
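For reference, Reddit's public listing endpoints follow the pattern `https://www.reddit.com/r/<subreddit>/<sort>.json`. The helper below is an illustrative sketch of how such a URL is assembled (the function name and defaults are assumptions, not the actor's internals):

```python
def listing_url(subreddit, sort_by="hot", time_range=None, limit=100):
    """Build a public Reddit JSON listing URL (no authentication required)."""
    url = f"https://www.reddit.com/r/{subreddit}/{sort_by}.json?limit={limit}"
    if time_range:  # only meaningful for top/controversial listings
        url += f"&t={time_range}"
    return url
```

For example, `listing_url("technology", "top", "month")` yields the endpoint behind input example 2 below.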
Input Examples
1. Keyword Search (NEW)
Find all discussions about a topic across Reddit:
```json
{
  "mode": "search",
  "searchQuery": "best CRM for small business",
  "subreddits": ["smallbusiness", "SaaS", "startups"],
  "sortBy": "relevance",
  "timeRange": "month",
  "maxPosts": 200,
  "includeComments": true,
  "maxCommentDepth": 2
}
```
Search all of Reddit (no subreddit restriction):
```json
{
  "mode": "search",
  "searchQuery": "\"product review\" iPhone",
  "searchType": "both",
  "sortBy": "top",
  "timeRange": "week",
  "maxPosts": 500
}
```
2. Scrape Subreddit Posts
Extract the top 500 posts from r/technology this month:
```json
{
  "mode": "subredditPosts",
  "subreddits": ["technology"],
  "sortBy": "top",
  "timeRange": "month",
  "maxPosts": 500,
  "includeComments": false,
  "outputFormat": "json"
}
```
3. Brand Monitoring with Filtering
Monitor mentions by scraping hot posts and comments, filtering for your brand:
```json
{
  "mode": "subredditPosts",
  "subreddits": ["technology", "gadgets", "apple"],
  "sortBy": "hot",
  "maxPosts": 100,
  "includeComments": true,
  "maxCommentDepth": 2,
  "filterKeywords": ["iPhone", "Apple Watch", "MacBook"],
  "excludeKeywords": ["spam", "giveaway"],
  "outputFormat": "json"
}
```
4. AI Training Dataset
Collect high-quality conversational data in Markdown for LLM fine-tuning:
```json
{
  "mode": "subredditPosts",
  "subreddits": ["AskHistorians", "explainlikeimfive", "science"],
  "sortBy": "top",
  "timeRange": "month",
  "maxPosts": 1000,
  "includeComments": true,
  "maxCommentDepth": 5,
  "expandMoreComments": true,
  "outputFormat": "markdown"
}
```
5. Scrape Comment Threads
Extract all comments from specific Reddit posts with full "more comments" expansion:
```json
{
  "mode": "commentThreads",
  "postUrls": ["https://www.reddit.com/r/technology/comments/abc123/example_post/"],
  "maxCommentDepth": 5,
  "sortComments": "top",
  "expandMoreComments": true
}
```
6. User Profile Research
Scrape profiles and activity history for specific users:
```json
{
  "mode": "userProfiles",
  "usernames": ["spez", "AutoModerator"],
  "includeUserPosts": true,
  "includeUserComments": true,
  "maxItems": 50
}
```
7. Subreddit Analytics
Get metadata, rules, and moderator lists for subreddits:
```json
{
  "mode": "subredditAnalytics",
  "subreddits": ["technology", "programming", "datascience"],
  "includeRules": true,
  "includeModerators": true
}
```
8. Sentiment-Only Analysis
Scrape posts with sentiment enabled but scoring disabled (lighter output):
```json
{
  "mode": "search",
  "searchQuery": "customer service experience",
  "subreddits": ["CustomerService", "TalesFromRetail"],
  "maxPosts": 200,
  "enableSentiment": true,
  "enableScoring": false,
  "includeComments": true
}
```
Output Schema
Post Output (with Sentiment & Engagement)
```json
{
  "postId": "abc123",
  "subreddit": "technology",
  "title": "AI breakthrough in...",
  "author": "username",
  "score": 4582,
  "upvoteRatio": 0.94,
  "numComments": 234,
  "created": "2026-01-15T10:23:00.000Z",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/technology/comments/abc123/...",
  "selftext": "Post body text...",
  "isVideo": false,
  "mediaUrl": "https://i.redd.it/...",
  "flair": "News",
  "gilded": 2,
  "awardsReceived": 5,
  "isNsfw": false,
  "domain": "example.com",
  "sentiment": {
    "score": 0.42,
    "label": "positive",
    "magnitude": 0.65,
    "wordCount": 156,
    "scoredWords": 23
  },
  "engagement": {
    "engagementRate": 0.051,
    "viralityScore": 7.23,
    "isControversial": false,
    "qualitySignal": "high"
  }
}
```
Comment Output (with Sentiment & Engagement)
```json
{
  "commentId": "def456",
  "postId": "abc123",
  "parentId": "t1_ghi789",
  "author": "commenter",
  "body": "Comment text...",
  "score": 156,
  "created": "2026-01-15T11:45:00.000Z",
  "depth": 2,
  "gilded": 0,
  "edited": false,
  "replies": 12,
  "isSubmitter": false,
  "controversiality": 0,
  "sentiment": {
    "score": -0.25,
    "label": "negative",
    "magnitude": 0.45,
    "wordCount": 42,
    "scoredWords": 8
  },
  "engagement": {
    "engagementRate": 0.077,
    "viralityScore": 4.1,
    "isControversial": false,
    "qualitySignal": "medium"
  }
}
```
Search Result Output
```json
{
  "postId": "xyz789",
  "subreddit": "smallbusiness",
  "title": "Best CRM recommendations?",
  "author": "username",
  "score": 234,
  "_searchQuery": "best CRM for small business",
  "_searchScope": "r/smallbusiness",
  "_resultType": "post",
  "_mode": "search",
  "sentiment": { "score": 0.15, "label": "positive", "magnitude": 0.3 },
  "engagement": { "engagementRate": 0.12, "viralityScore": 5.8, "isControversial": false, "qualitySignal": "medium" }
}
```
User Profile Output
```json
{
  "username": "reddituser",
  "accountCreated": "2018-03-15T00:00:00.000Z",
  "postKarma": 45234,
  "commentKarma": 89123,
  "totalKarma": 134357,
  "isPremium": false,
  "isEmployee": false,
  "recentPosts": ["..."],
  "recentComments": ["..."]
}
```
Subreddit Analytics Output
```json
{
  "name": "technology",
  "subscribers": 15234567,
  "activeUsers": 45234,
  "created": "2008-01-25T00:00:00.000Z",
  "description": "Subreddit description...",
  "isNsfw": false,
  "rules": ["..."],
  "moderators": ["..."]
}
```
Input Parameters Reference
Core Settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | enum | `subredditPosts` | Scraping mode: `subredditPosts`, `commentThreads`, `userProfiles`, `subredditAnalytics`, `search` |
| `subreddits` | string[] | — | Subreddit names (without the r/ prefix) |
| `postUrls` | string[] | — | Reddit post URLs (for `commentThreads` mode) |
| `usernames` | string[] | — | Reddit usernames (without the u/ prefix) |
| `searchQuery` | string | — | Keyword search query (for `search` mode). Supports Reddit search syntax. |
| `searchType` | enum | `posts` | Search result type: `posts`, `comments`, `both` |
Scraping Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `sortBy` | enum | `hot` | Sort order: `hot`, `new`, `top`, `rising`, `controversial`, `relevance`, `comments` |
| `timeRange` | enum | `week` | Time range for `top`/`controversial`: `hour`, `day`, `week`, `month`, `year`, `all` |
| `maxPosts` | integer | 100 | Max posts per subreddit (1-10,000) |
| `includeComments` | boolean | false | Also scrape comments for each post |
| `maxCommentDepth` | integer | 3 | Max comment tree depth (1-10) |
| `expandMoreComments` | boolean | false | Fetch hidden comments behind "more comments" links |
| `sortComments` | enum | `confidence` | Comment sort: `confidence` (best), `top`, `new`, `controversial`, `old`, `qa` |
| `maxItems` | integer | 100 | Max posts/comments per user profile |
Filtering
| Parameter | Type | Default | Description |
|---|---|---|---|
| `filterKeywords` | string[] | [] | Keep only results containing at least one of these keywords (case-insensitive) |
| `excludeKeywords` | string[] | [] | Remove results containing any of these keywords (case-insensitive) |
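The two filters compose as an include-then-exclude pass over each result's text. A minimal sketch of that behaviour (the function name and the fields checked are assumptions based on the post output schema):

```python
def filter_results(results, filter_keywords=(), exclude_keywords=()):
    """Keep results matching at least one filter keyword and none of the
    exclude keywords. All matching is case-insensitive substring matching."""
    kept = []
    for r in results:
        text = f"{r.get('title', '')} {r.get('selftext', '')}".lower()
        if filter_keywords and not any(k.lower() in text for k in filter_keywords):
            continue  # no include keyword matched
        if any(k.lower() in text for k in exclude_keywords):
            continue  # an exclude keyword matched
        kept.append(r)
    return kept
```

With an empty `filterKeywords` list, everything passes the include step and only the exclusions apply.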
Intelligence Features
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enableSentiment` | boolean | true | Add sentiment scoring (positive/negative/neutral/mixed) to posts and comments |
| `enableScoring` | boolean | true | Add engagement metrics (virality, engagement rate, controversy, quality) |
Output & Advanced
| Parameter | Type | Default | Description |
|---|---|---|---|
| `outputFormat` | enum | `json` | Output format: `json`, `csv`, `markdown` |
| `proxyConfig` | object | Apify residential | Proxy configuration |
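The Markdown output format renders each post as prose suitable for RAG/LLM pipelines. A rough sketch of the idea (the exact template the actor uses isn't published; this layout is illustrative):

```python
def post_to_markdown(post):
    """Render a scraped post dict as a Markdown snippet for RAG/LLM ingestion."""
    lines = [
        f"## {post['title']}",
        f"*r/{post['subreddit']} · u/{post['author']} · score {post['score']}*",
        "",
        post.get("selftext", ""),
    ]
    return "\n".join(lines)
```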
Sentiment Analysis
Every post and comment is automatically scored with a built-in lexicon-based sentiment analyzer. No external API calls are made, so there's zero extra cost.
Output fields:
- `sentiment.score` — Normalized score from -1.0 (very negative) to +1.0 (very positive)
- `sentiment.label` — Human-readable label: `positive`, `negative`, `neutral`, or `mixed`
- `sentiment.magnitude` — Strength of sentiment regardless of direction (0.0 to 1.0)
- `sentiment.wordCount` — Total words analyzed
- `sentiment.scoredWords` — Words that contributed to the score
The analyzer handles negation ("not good" = negative), intensifiers ("very good" = more positive), and common Reddit vocabulary.
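A toy lexicon-based scorer illustrating the approach. The tiny word list, negator set, and weights here are illustrative stand-ins, not the actor's actual lexicon:

```python
LEXICON = {"good": 2, "great": 3, "bad": -2, "terrible": -3, "love": 3, "hate": -3}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}

def sentiment_score(text):
    """Return a score normalized to [-1, 1], with simple negation and intensifier handling."""
    words = text.lower().split()
    total, scored = 0.0, 0
    for i, word in enumerate(words):
        value = LEXICON.get(word)
        if value is None:
            continue
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value                        # "not good" flips polarity
        elif i > 0 and words[i - 1] in INTENSIFIERS:
            value *= INTENSIFIERS[words[i - 1]]   # "very good" amplifies
        total += value
        scored += 1
    # normalize by the maximum per-word weight (3) and clamp to [-1, 1]
    return max(-1.0, min(1.0, total / (scored * 3))) if scored else 0.0
```

Text with no lexicon words scores 0.0, which maps to the `neutral` label.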
Engagement Scoring
Each post and comment gets an engagement score to help you identify the most valuable content:
Post engagement fields:
- `engagement.engagementRate` — Comments-to-score ratio (higher = more discussion per upvote)
- `engagement.viralityScore` — 0-10 scale based on score, comments, awards, and upvote ratio
- `engagement.isControversial` — True if a post has a low upvote ratio with a meaningful score
- `engagement.qualitySignal` — `high`, `medium`, or `low`, based on the virality score
Comment engagement fields:
- Same fields, plus depth-adjusted scoring (top-level comments scored higher)
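As a sketch, the post-level metrics could be derived from the raw counters like this. The actor's exact weights and thresholds aren't published; the log scaling and cutoffs below are illustrative assumptions:

```python
import math

def engagement_metrics(post):
    """Compute illustrative engagement metrics from raw post counters."""
    score = max(post["score"], 1)
    rate = post["numComments"] / score          # discussion per upvote
    # log-scaled blend of score, comments, and awards, capped to a 0-10 scale
    virality = min(10.0, math.log10(score + 1) * 2
                   + math.log10(post["numComments"] + 1)
                   + post.get("awardsReceived", 0) * 0.5)
    controversial = post.get("upvoteRatio", 1.0) < 0.6 and score > 10
    quality = "high" if virality >= 7 else "medium" if virality >= 4 else "low"
    return {"engagementRate": round(rate, 3), "viralityScore": round(virality, 2),
            "isControversial": controversial, "qualitySignal": quality}
```

Log scaling keeps a 100k-upvote megathread from flattening everything else onto the bottom of the 0-10 scale.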
Cost Estimation
This actor uses Pay-Per-Result pricing — you pay a flat rate per result in the dataset:
| Item | Cost |
|---|---|
| Per result (post, comment, search result, profile, or analytics) | $0.00075 |
| Per actor run (one-time start fee) | $0.00005 |
That's $0.75 per 1,000 results.
Examples:
- Scraping 500 posts (no comments): ~$0.38
- Scraping 500 posts with ~50 comments each (25,500 dataset items): ~$19.13
- Searching for a keyword and getting 200 results: ~$0.15
- 10 user profiles: ~$0.0075
Sentiment analysis and engagement scoring are computed locally and add zero extra cost.
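The arithmetic behind these estimates, using the rates from the table above (the helper name is ours; every post and every comment counts as one dataset item):

```python
RESULT_COST = 0.00075   # per dataset item (post, comment, profile, ...)
START_FEE = 0.00005     # one-time fee per actor run

def estimate_cost(posts, comments_per_post=0):
    """Estimate a run's cost in dollars."""
    items = posts + posts * comments_per_post
    return round(items * RESULT_COST + START_FEE, 2)
```

For example, 500 posts with ~50 comments each is 25,500 items, or about $19.13, matching the example above.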
Rate Limits and Best Practices
- The actor respects Reddit's rate limits (30 requests/minute, with built-in throttling and a 2-second minimum delay between requests)
- Comments are now scraped concurrently (5 at a time) for significantly faster runs
- For large scrapes (>1,000 posts), expect ~5 minutes per 1,000 posts without comments
- Including comments significantly increases scraping time (each post requires an additional request)
- Use `maxCommentDepth: 1` for faster scrapes when you only need top-level comments
- Use `expandMoreComments: true` to get complete comment threads (slower but more thorough)
- Residential proxies are used by default for best reliability
- Use `filterKeywords` to reduce noise when scraping broad subreddits
- Disable `enableSentiment` and `enableScoring` if you don't need intelligence features (slightly faster)
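The retry behaviour mentioned above follows the standard exponential-backoff pattern. A generic sketch, not the actor's internal code (here a `RuntimeError` stands in for an HTTP 429 response):

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=2.0):
    """Call fetch(); on a rate-limit error, wait 2s, 4s, 8s, ... (plus jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RuntimeError:            # stand-in for a rate-limit (429) error
            if attempt == max_retries - 1:
                raise                   # retries exhausted, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

The random jitter keeps concurrent workers from retrying in lock-step against the same rate limit.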
FAQ
Q: Do I need a Reddit API key? A: No. This actor uses Reddit's public JSON endpoints, which don't require authentication.
Q: Can I scrape private/quarantined subreddits? A: No. Only public subreddits are accessible without authentication.
Q: What happens if Reddit rate-limits the actor? A: The actor automatically backs off and retries with exponential delays. This is handled transparently.
Q: Can I schedule recurring scrapes? A: Yes! Use Apify's built-in scheduling to run the actor on a recurring basis (daily, hourly, etc.).
Q: What's the maximum number of posts I can scrape? A: The input accepts up to 10,000 posts per subreddit. Reddit's API typically makes ~1,000 unique posts available per listing, so actual results may be lower for some subreddits.
Q: Does sentiment analysis use an external API? A: No. Sentiment analysis uses a built-in lexicon (AFINN-style word list) that runs entirely locally. There are no external API calls and no extra cost.
Q: What's the difference between filterKeywords and searchQuery?
A: searchQuery (search mode) tells Reddit's search engine what to find. filterKeywords is a post-processing filter applied after scraping to further narrow results. You can use both together for maximum precision.
Q: How does "Expand Hidden Comments" work?
A: Reddit hides many comments behind "more comments" links. When expandMoreComments is enabled, the actor makes additional API calls to fetch these hidden comments, resulting in more complete threads but slower scraping.
Use Cases
- Market Research — Track product discussions, competitor mentions, and industry trends
- Brand Monitoring — Monitor your brand mentions across Reddit with sentiment scoring
- AI/ML Training Data — Collect high-quality conversational data for LLM fine-tuning
- Lead Generation — Find potential customers discussing problems your product solves
- Academic Research — Study online communities and discourse patterns
- Content Strategy — Discover trending topics and what resonates with audiences
- Competitive Intelligence — Monitor competitor discussions and user sentiment
- PR Crisis Detection — Get early warnings on negative sentiment spikes
- Product Feedback Mining — Extract feature requests and complaints from Reddit threads
- Investment Research — Monitor sentiment around stocks, crypto, and markets
Support
If you encounter any issues, please open an issue on this actor's page or reach out via Apify community channels. We typically respond within 24 hours.
