Reddit Intelligence Scraper

Pricing

from $0.75 / 1,000 results

Scrape Reddit posts, comments, user profiles & subreddit analytics at scale. No API key needed. Supports all sort types (hot, new, top, rising, controversial). Built for market research, brand monitoring, sentiment analysis & AI/ML datasets. Outputs clean JSON, CSV & Markdown. Pay only per result scraped.

Rating

5.0

(1)

Developer

Andy Page

Maintained by Community

Actor stats

  • Bookmarked: 2
  • Total users: 7
  • Monthly active users: 2
  • Last modified: 9 days ago

Extract Reddit posts, comments, user profiles, and subreddit analytics at scale. No API key required. AI-ready JSON, CSV, and Markdown output formats. Built for market research, brand monitoring, and ML training datasets.

Now with built-in sentiment analysis, engagement scoring, keyword search, and smart filtering.

Features

  • Keyword Search — Search Reddit by keyword across all subreddits or within specific ones. Find every discussion about your topic.
  • Subreddit Posts — Scrape posts from any subreddit with sorting (hot, new, top, rising, controversial) and time range filtering
  • Comment Threads — Extract full comment trees with configurable depth, including hidden "more comments" expansion
  • User Profiles — Scrape user karma, account age, post history, and comment history
  • Subreddit Analytics — Get subscriber counts, active users, rules, and moderator lists
  • Sentiment Analysis — Automatic positive/negative/neutral/mixed scoring on every post and comment (zero API cost)
  • Engagement Scoring — Virality scores, engagement rates, controversy detection, and quality signals
  • Keyword Filtering — Include/exclude results by keyword after scraping to reduce noise
  • Deduplication — Automatic removal of duplicate posts across multi-subreddit scrapes
  • Concurrent Scraping — Comments fetched in parallel (5x faster than sequential)
  • Multiple Output Formats — JSON (default), CSV (for spreadsheets), Markdown (for RAG/LLM pipelines)
  • Built-in Proxy Rotation — Residential proxies included, no extra cost
  • Rate Limit Handling — Smart throttling with exponential backoff, no IP bans
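
To make the deduplication feature concrete, here is a minimal sketch of how duplicates could be dropped when the same post turns up in more than one scrape. It assumes `postId` is the identity key and that the first occurrence wins; the function name `dedupe_posts` is hypothetical, and the actor's real deduplication rule is not documented on this page.

```python
def dedupe_posts(items):
    """Keep the first occurrence of each postId, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        key = item.get("postId")
        if key is not None and key in seen:
            continue  # already emitted by an earlier subreddit or query
        seen.add(key)
        unique.append(item)
    return unique


posts = [{"postId": "abc123"}, {"postId": "xyz789"}, {"postId": "abc123"}]
print([p["postId"] for p in dedupe_posts(posts)])  # ['abc123', 'xyz789']
```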

How It Works

This actor reads Reddit's public JSON endpoints, so no API key or Reddit account is required. Each result is cleaned, structured, enriched with sentiment and engagement scores, and pushed to an Apify dataset in your chosen format.
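
For readers unfamiliar with the public JSON endpoints: Reddit serves listing pages as JSON when `.json` is appended to the path. The sketch below only builds such a URL and sends no request. The helper name `listing_url` is hypothetical, and the actor's actual request logic, headers, and proxy handling are not described on this page.

```python
from urllib.parse import urlencode


def listing_url(subreddit, sort="hot", time_range=None, limit=100, after=None):
    """Build a public JSON listing URL for a subreddit (no request is sent)."""
    params = {"limit": limit}
    if time_range and sort in ("top", "controversial"):
        params["t"] = time_range  # the time window only applies to these sorts
    if after:
        params["after"] = after  # pagination cursor taken from the previous page
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"


print(listing_url("technology", sort="top", time_range="month"))
```

Each page of a listing carries an `after` cursor in its response; passing it back in requests the next page.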

Input Examples

1. Keyword Search (NEW)

Find all discussions about a topic across Reddit:

{
  "mode": "search",
  "searchQuery": "best CRM for small business",
  "subreddits": ["smallbusiness", "SaaS", "startups"],
  "sortBy": "relevance",
  "timeRange": "month",
  "maxPosts": 200,
  "includeComments": true,
  "maxCommentDepth": 2
}

Search all of Reddit (no subreddit restriction):

{
  "mode": "search",
  "searchQuery": "\"product review\" iPhone",
  "searchType": "both",
  "sortBy": "top",
  "timeRange": "week",
  "maxPosts": 500
}

2. Scrape Subreddit Posts

Extract the top 500 posts from r/technology this month:

{
  "mode": "subredditPosts",
  "subreddits": ["technology"],
  "sortBy": "top",
  "timeRange": "month",
  "maxPosts": 500,
  "includeComments": false,
  "outputFormat": "json"
}

3. Brand Monitoring with Filtering

Monitor mentions by scraping hot posts and comments, filtering for your brand:

{
  "mode": "subredditPosts",
  "subreddits": ["technology", "gadgets", "apple"],
  "sortBy": "hot",
  "maxPosts": 100,
  "includeComments": true,
  "maxCommentDepth": 2,
  "filterKeywords": ["iPhone", "Apple Watch", "MacBook"],
  "excludeKeywords": ["spam", "giveaway"],
  "outputFormat": "json"
}

4. AI Training Dataset

Collect high-quality conversational data in Markdown for LLM fine-tuning:

{
  "mode": "subredditPosts",
  "subreddits": ["AskHistorians", "explainlikeimfive", "science"],
  "sortBy": "top",
  "timeRange": "month",
  "maxPosts": 1000,
  "includeComments": true,
  "maxCommentDepth": 5,
  "expandMoreComments": true,
  "outputFormat": "markdown"
}

5. Scrape Comment Threads

Extract all comments from specific Reddit posts with full "more comments" expansion:

{
  "mode": "commentThreads",
  "postUrls": [
    "https://www.reddit.com/r/technology/comments/abc123/example_post/"
  ],
  "maxCommentDepth": 5,
  "sortComments": "top",
  "expandMoreComments": true
}

6. User Profile Research

Scrape profiles and activity history for specific users:

{
  "mode": "userProfiles",
  "usernames": ["spez", "AutoModerator"],
  "includeUserPosts": true,
  "includeUserComments": true,
  "maxItems": 50
}

7. Subreddit Analytics

Get metadata, rules, and moderator lists for subreddits:

{
  "mode": "subredditAnalytics",
  "subreddits": ["technology", "programming", "datascience"],
  "includeRules": true,
  "includeModerators": true
}

8. Sentiment-Only Analysis

Scrape posts with sentiment analysis enabled but engagement scoring disabled (lighter output):

{
  "mode": "search",
  "searchQuery": "customer service experience",
  "subreddits": ["CustomerService", "TalesFromRetail"],
  "maxPosts": 200,
  "enableSentiment": true,
  "enableScoring": false,
  "includeComments": true
}
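
Any of the inputs above can also be sent programmatically. The following is a minimal sketch using the official `apify-client` Python package; the helper names are hypothetical, and the actor ID is a placeholder because the actor's store identifier is not given on this page.

```python
def build_search_input(query, subreddits=None, max_posts=100, **options):
    """Assemble a search-mode input using the parameter names documented here."""
    if not query.strip():
        raise ValueError("searchQuery must not be empty")
    run_input = {"mode": "search", "searchQuery": query, "maxPosts": max_posts}
    if subreddits:
        # Subreddit names are passed without the r/ prefix.
        run_input["subreddits"] = [s.removeprefix("r/") for s in subreddits]
    run_input.update(options)  # e.g. timeRange="month", includeComments=True
    return run_input


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # lazy import: the builder above works without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_search_input("best CRM for small business", ["r/SaaS"], timeRange="month"))
```

A call would then look like `run_actor("<APIFY_TOKEN>", "<username>/<actor-name>", build_search_input(...))`, with both placeholders replaced by your own values.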

Output Schema

Post Output (with Sentiment & Engagement)

{
  "postId": "abc123",
  "subreddit": "technology",
  "title": "AI breakthrough in...",
  "author": "username",
  "score": 4582,
  "upvoteRatio": 0.94,
  "numComments": 234,
  "created": "2026-01-15T10:23:00.000Z",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/technology/comments/abc123/...",
  "selftext": "Post body text...",
  "isVideo": false,
  "mediaUrl": "https://i.redd.it/...",
  "flair": "News",
  "gilded": 2,
  "awardsReceived": 5,
  "isNsfw": false,
  "domain": "example.com",
  "sentiment": {
    "score": 0.42,
    "label": "positive",
    "magnitude": 0.65,
    "wordCount": 156,
    "scoredWords": 23
  },
  "engagement": {
    "engagementRate": 0.051,
    "viralityScore": 7.23,
    "isControversial": false,
    "qualitySignal": "high"
  }
}

Comment Output (with Sentiment & Engagement)

{
  "commentId": "def456",
  "postId": "abc123",
  "parentId": "t1_ghi789",
  "author": "commenter",
  "body": "Comment text...",
  "score": 156,
  "created": "2026-01-15T11:45:00.000Z",
  "depth": 2,
  "gilded": 0,
  "edited": false,
  "replies": 12,
  "isSubmitter": false,
  "controversiality": 0,
  "sentiment": {
    "score": -0.25,
    "label": "negative",
    "magnitude": 0.45,
    "wordCount": 42,
    "scoredWords": 8
  },
  "engagement": {
    "engagementRate": 0.077,
    "viralityScore": 4.1,
    "isControversial": false,
    "qualitySignal": "medium"
  }
}

Search Result Output

{
  "postId": "xyz789",
  "subreddit": "smallbusiness",
  "title": "Best CRM recommendations?",
  "author": "username",
  "score": 234,
  "_searchQuery": "best CRM for small business",
  "_searchScope": "r/smallbusiness",
  "_resultType": "post",
  "_mode": "search",
  "sentiment": { "score": 0.15, "label": "positive", "magnitude": 0.3 },
  "engagement": { "engagementRate": 0.12, "viralityScore": 5.8, "isControversial": false, "qualitySignal": "medium" }
}

User Profile Output

{
  "username": "reddituser",
  "accountCreated": "2018-03-15T00:00:00.000Z",
  "postKarma": 45234,
  "commentKarma": 89123,
  "totalKarma": 134357,
  "isPremium": false,
  "isEmployee": false,
  "recentPosts": ["..."],
  "recentComments": ["..."]
}

Subreddit Analytics Output

{
  "name": "technology",
  "subscribers": 15234567,
  "activeUsers": 45234,
  "created": "2008-01-25T00:00:00.000Z",
  "description": "Subreddit description...",
  "isNsfw": false,
  "rules": ["..."],
  "moderators": ["..."]
}
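
As an example of consuming these records, the sketch below picks out negative-sentiment items above a virality threshold, which is roughly what a brand-monitoring or crisis-detection workflow would look for. It relies only on the `sentiment.label` and `engagement.viralityScore` fields shown above; the function name and the default threshold of 5.0 are illustrative assumptions.

```python
def _virality(item):
    """Read engagement.viralityScore, treating a missing value as 0.0."""
    return item.get("engagement", {}).get("viralityScore", 0.0)


def negative_high_reach(items, min_virality=5.0):
    """Return negative-sentiment items at or above the threshold, most viral first."""
    hits = [
        item for item in items
        if item.get("sentiment", {}).get("label") == "negative"
        and _virality(item) >= min_virality
    ]
    return sorted(hits, key=_virality, reverse=True)


sample = [
    {"postId": "a", "sentiment": {"label": "negative"}, "engagement": {"viralityScore": 7.2}},
    {"postId": "b", "sentiment": {"label": "positive"}, "engagement": {"viralityScore": 9.0}},
    {"postId": "c", "sentiment": {"label": "negative"}, "engagement": {"viralityScore": 2.1}},
    {"postId": "d"},  # e.g. a run with enableSentiment and enableScoring set to false
]
print([item["postId"] for item in negative_high_reach(sample)])  # ['a']
```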

Input Parameters Reference

Core Settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | enum | subredditPosts | Scraping mode: subredditPosts, commentThreads, userProfiles, subredditAnalytics, search |
| subreddits | string[] | | Subreddit names (without r/ prefix) |
| postUrls | string[] | | Reddit post URLs (for commentThreads mode) |
| usernames | string[] | | Reddit usernames (without u/ prefix) |
| searchQuery | string | | Keyword search query (for search mode). Supports Reddit search syntax. |
| searchType | enum | posts | Search result type: posts, comments, both |

Scraping Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| sortBy | enum | hot | Sort order: hot, new, top, rising, controversial, relevance, comments |
| timeRange | enum | week | Time range for top/controversial: hour, day, week, month, year, all |
| maxPosts | integer | 100 | Max posts per subreddit (1-10,000) |
| includeComments | boolean | false | Also scrape comments for each post |
| maxCommentDepth | integer | 3 | Max comment tree depth (1-10) |
| expandMoreComments | boolean | false | Fetch hidden comments behind "more comments" links |
| sortComments | enum | confidence | Comment sort: confidence (best), top, new, controversial, old, qa |
| maxItems | integer | 100 | Max posts/comments per user profile |

Filtering

| Parameter | Type | Default | Description |
|---|---|---|---|
| filterKeywords | string[] | [] | Only keep results containing at least one of these keywords (case-insensitive) |
| excludeKeywords | string[] | [] | Remove results containing any of these keywords (case-insensitive) |
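
The two filters combine as "keep if any include keyword matches, unless any exclude keyword matches". A minimal sketch of that rule follows. It assumes plain case-insensitive substring matching and that exclusion takes precedence; which fields the actor actually searches (title, body, or both) is not stated here, so the function takes the text as a single string.

```python
def passes_keyword_filter(text, filter_keywords=(), exclude_keywords=()):
    """Apply the include/exclude rule to one piece of text."""
    lowered = text.lower()
    if any(keyword.lower() in lowered for keyword in exclude_keywords):
        return False  # assumption: an exclude match always wins
    if not filter_keywords:
        return True  # an empty include list keeps everything
    return any(keyword.lower() in lowered for keyword in filter_keywords)


print(passes_keyword_filter("New MACBOOK review", ["MacBook"], ["giveaway"]))    # True
print(passes_keyword_filter("MacBook giveaway!", ["MacBook"], ["giveaway"]))     # False
```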

Intelligence Features

| Parameter | Type | Default | Description |
|---|---|---|---|
| enableSentiment | boolean | true | Add sentiment scoring (positive/negative/neutral/mixed) to posts and comments |
| enableScoring | boolean | true | Add engagement metrics (virality, engagement rate, controversy, quality) |

Output & Advanced

| Parameter | Type | Default | Description |
|---|---|---|---|
| outputFormat | enum | json | Output format: json, csv, markdown |
| proxyConfig | object | Apify residential | Proxy configuration |

Sentiment Analysis

Every post and comment is automatically scored with a built-in lexicon-based sentiment analyzer. No external API calls are made, so there's zero extra cost.

Output fields:

  • sentiment.score — Normalized score from -1.0 (very negative) to +1.0 (very positive)
  • sentiment.label — Human-readable label: positive, negative, neutral, or mixed
  • sentiment.magnitude — Strength of sentiment regardless of direction (0.0 to 1.0)
  • sentiment.wordCount — Total words analyzed
  • sentiment.scoredWords — Words that contributed to the score

The analyzer handles negation ("not good" = negative), intensifiers ("very good" = more positive), and common Reddit vocabulary.
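
To illustrate the general technique, here is a toy lexicon-based scorer with negation and intensifier handling. It is not the actor's analyzer: the six-word lexicon, the multipliers, the normalization by the AFINN maximum of 5, and the 0.05 label threshold are all assumptions chosen for the example, and `magnitude` is left out because its definition is not given.

```python
import re

# Tiny illustrative lexicon with AFINN-style valences; NOT the actor's word list.
LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "terrible": -3, "hate": -3}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}
MAX_VALENCE = 5.0  # AFINN valences range from -5 to +5


def analyze(text):
    """Score text in [-1, 1] and label it positive, negative, neutral, or mixed."""
    tokens = re.findall(r"[a-z']+", text.lower())
    values = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = float(LEXICON[token])
        j = i - 1
        if j >= 0 and tokens[j] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[j]]  # "very good" counts more than "good"
            j -= 1
        if j >= 0 and tokens[j] in NEGATORS:
            value = -value  # "not good" flips the sign
        values.append(value)
    if not values:
        return {"score": 0.0, "label": "neutral", "wordCount": len(tokens), "scoredWords": 0}
    score = max(-1.0, min(1.0, sum(values) / (len(values) * MAX_VALENCE)))
    has_both = any(v > 0 for v in values) and any(v < 0 for v in values)
    if abs(score) < 0.05:
        label = "mixed" if has_both else "neutral"
    else:
        label = "positive" if score > 0 else "negative"
    return {"score": round(score, 2), "label": label,
            "wordCount": len(tokens), "scoredWords": len(values)}


print(analyze("not good"))
print(analyze("I love it but the battery is terrible"))
```

With this toy lexicon, "not good" comes out negative, "very good" scores higher than "good", and a sentence with offsetting positive and negative words is labeled mixed.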

Engagement Scoring

Each post and comment gets an engagement score to help you identify the most valuable content:

Post engagement fields:

  • engagement.engagementRate — Comments-to-score ratio (higher = more discussion per upvote)
  • engagement.viralityScore — 0-10 scale based on score, comments, awards, and upvote ratio
  • engagement.isControversial — True if low upvote ratio with meaningful score
  • engagement.qualitySignal — high, medium, or low based on virality score

Comment engagement fields:

  • Same fields, plus depth-adjusted scoring (top-level comments scored higher)
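
The engagement rate is the only field here with a simple stated definition, so it can be checked against the sample records in the Output Schema section. The sketch below reproduces both sample values. Using `replies` as the numerator for comments is inferred from the sample rather than stated, and the zero-score guard is an assumption. The virality and controversy formulas are not documented, so they are not sketched.

```python
def engagement_rate(interactions, score):
    """Comments-to-score ratio, rounded to three decimals like the sample output."""
    if score <= 0:
        return 0.0  # assumption: avoid dividing by a zero or negative score
    return round(interactions / score, 3)


print(engagement_rate(234, 4582))  # post sample (numComments / score): 0.051
print(engagement_rate(12, 156))    # comment sample (replies / score): 0.077
```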

Cost Estimation

This actor uses Pay-Per-Result pricing — you pay a flat rate per result in the dataset:

| Item | Cost |
|---|---|
| Per result (post, comment, search result, profile, or analytics) | $0.00075 |
| Per actor run (one-time start fee) | $0.00005 |

That's $0.75 per 1,000 results.

Examples:

  • Scraping 500 posts (no comments): ~$0.38
  • Scraping 500 posts with ~50 comments each (25,500 dataset items): ~$19.13
  • Searching for a keyword and getting 200 results: ~$0.15
  • 10 user profiles: ~$0.0075

Sentiment analysis and engagement scoring are computed locally and add zero extra cost.
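
The examples above can be reproduced with a small estimator built from the two prices in the table. The function names are hypothetical; `Decimal` is used so that sub-cent prices do not pick up floating-point error.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.00075")   # from the pricing table
START_FEE_PER_RUN = Decimal("0.00005")  # from the pricing table


def dataset_items(posts, avg_comments_per_post=0):
    """Each post and each comment is billed as one dataset item."""
    return posts + posts * avg_comments_per_post


def estimate_cost(results, runs=1):
    """Exact cost in USD for a number of results spread over `runs` runs."""
    return PRICE_PER_RESULT * results + START_FEE_PER_RUN * runs


print(f"${estimate_cost(dataset_items(500)):.2f}")      # $0.38
print(f"${estimate_cost(dataset_items(500, 50)):.2f}")  # $19.13
```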

Rate Limits and Best Practices

  • The actor respects Reddit's rate limits: built-in throttling enforces a 2-second minimum delay between requests, which works out to at most 30 requests per minute
  • Comments are now scraped concurrently (5 at a time) for significantly faster runs
  • For large scrapes (>1,000 posts), expect ~5 minutes per 1,000 posts without comments
  • Including comments significantly increases scraping time (each post requires an additional request)
  • Use maxCommentDepth: 1 for faster scrapes when you only need top-level comments
  • Use expandMoreComments: true to get complete comment threads (slower but more thorough)
  • Residential proxies are used by default for best reliability
  • Use filterKeywords to reduce noise when scraping broad subreddits
  • Disable enableSentiment and enableScoring if you don't need intelligence features (slightly faster)
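
For readers who want a picture of the throttling behaviour, here is a minimal sketch of an exponential backoff schedule that starts from the 2-second minimum delay mentioned above. The doubling factor, the 60-second cap, and the optional jitter are assumptions for illustration; the actor's actual retry parameters are not published here.

```python
import random

MIN_DELAY_SECONDS = 2.0  # from the text: 2-second minimum gap, i.e. 30 requests/minute


def backoff_delay(attempt, base=MIN_DELAY_SECONDS, cap=60.0, jitter=0.0):
    """Seconds to wait before retry number `attempt` (0 = first retry)."""
    delay = min(cap, base * (2 ** attempt))  # double each time, up to the cap
    return delay + random.uniform(0.0, jitter)  # optional random spread


print([backoff_delay(n) for n in range(6)])  # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

Adding a little jitter (for example `jitter=1.0`) keeps several concurrent workers from retrying at the same instant.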

FAQ

Q: Do I need a Reddit API key? A: No. This actor uses Reddit's public JSON endpoints, which don't require authentication.

Q: Can I scrape private/quarantined subreddits? A: No. Only public subreddits are accessible without authentication.

Q: What happens if Reddit rate-limits the actor? A: The actor automatically backs off and retries with exponential delays. This is handled transparently.

Q: Can I schedule recurring scrapes? A: Yes! Use Apify's built-in scheduling to run the actor on a recurring basis (daily, hourly, etc.).

Q: What's the maximum number of posts I can scrape? A: The input accepts up to 10,000 posts per subreddit. Reddit's API typically makes ~1,000 unique posts available per listing, so actual results may be lower for some subreddits.

Q: Does sentiment analysis use an external API? A: No. Sentiment analysis uses a built-in lexicon (AFINN-style word list) that runs entirely locally. There are no external API calls and no extra cost.

Q: What's the difference between filterKeywords and searchQuery? A: searchQuery (search mode) tells Reddit's search engine what to find. filterKeywords is a post-processing filter applied after scraping to further narrow results. You can use both together for maximum precision.

Q: How does "Expand Hidden Comments" work? A: Reddit hides many comments behind "more comments" links. When expandMoreComments is enabled, the actor makes additional API calls to fetch these hidden comments, resulting in more complete threads but slower scraping.

Use Cases

  1. Market Research — Track product discussions, competitor mentions, and industry trends
  2. Brand Monitoring — Monitor your brand mentions across Reddit with sentiment scoring
  3. AI/ML Training Data — Collect high-quality conversational data for LLM fine-tuning
  4. Lead Generation — Find potential customers discussing problems your product solves
  5. Academic Research — Study online communities and discourse patterns
  6. Content Strategy — Discover trending topics and what resonates with audiences
  7. Competitive Intelligence — Monitor competitor discussions and user sentiment
  8. PR Crisis Detection — Get early warnings on negative sentiment spikes
  9. Product Feedback Mining — Extract feature requests and complaints from Reddit threads
  10. Investment Research — Monitor sentiment around stocks, crypto, and markets

Support

If you encounter any issues, please open an issue on this actor's page or reach out via Apify community channels. We typically respond within 24 hours.