Reddit Intelligence Scraper
Pricing
from $3.00 / 1,000 reddit posts
Reddit is one of the largest real-time sources of consumer opinions, trends, and product feedback. Reddit Intelligence Scraper is an Apify Actor that turns Reddit posts and comments into structured data for business, research, and growth analysis.
Developer: charith wijesundara
A production-ready Apify Actor for scraping Reddit posts and comments with AI-powered intelligence extraction.
Features
- Multi-source scraping: Subreddits, search results, and user profiles
- Full comment trees: Extract nested comment threads
- AI-powered analysis: Sentiment, topic extraction, entity recognition (via LlamaIndex with OpenAI, Gemini, or Anthropic models)
- Anti-ban system: Proxy rotation, session pooling, CAPTCHA detection, human-like delays
- Playwright support: JavaScript rendering for dynamic content
- LangGraph orchestration: Agentic crawl decision-making
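To make the "human-like delays" item concrete, here is a minimal sketch of a randomized wait with exponential backoff. It is not the Actor's actual middleware: the function name `human_delay` and all default values are hypothetical, chosen only for illustration.

```python
import random
from typing import Optional


def human_delay(attempt: int = 0, base: float = 2.0, jitter: float = 0.5,
                cap: float = 60.0, rng: Optional[random.Random] = None) -> float:
    """Return a randomized wait time in seconds (hypothetical helper).

    attempt=0 is a normal request; each retry doubles the base wait up to `cap`.
    """
    rng = rng or random.Random()
    backoff = min(cap, base * (2 ** attempt))
    # Scale by a random factor in [1 - jitter, 1 + jitter] so requests
    # are not evenly spaced.
    return backoff * rng.uniform(1.0 - jitter, 1.0 + jitter)
```

A caller would typically pass the result to `time.sleep()` between requests and raise `attempt` after a rate-limit or CAPTCHA response.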
Input Schema
    {
      "subreddits": ["entrepreneur", "startups"],
      "keywords": ["stripe", "shopify", "saas"],
      "users": ["spez"],
      "maxPosts": 100,
      "maxCommentsPerPost": 50,
      "sort": "hot",
      "time": "week",
      "minScore": 10,
      "includeNSFW": false,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      },
      "enablePlaywright": true,
      "enableAI": false,
      "openaiApiKey": ""
    }
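When the input is assembled programmatically, a small builder can catch obvious mistakes before a run starts. The sketch below is an illustration only: `build_run_input` is a hypothetical helper, the starting values are copied from the example input rather than confirmed defaults, and the allowed `sort` and `time` values are assumed to match Reddit's own listing options.

```python
import json
from typing import Any

# Assumption: the Actor accepts Reddit's standard sort orders and time filters.
ASSUMED_SORTS = {"hot", "new", "top", "rising"}
ASSUMED_TIMES = {"hour", "day", "week", "month", "year", "all"}


def build_run_input(subreddits=(), keywords=(), users=(), **overrides: Any) -> dict:
    """Build an input dict shaped like the example above (hypothetical helper)."""
    if not (subreddits or keywords or users):
        raise ValueError("provide at least one subreddit, keyword, or user")
    # Starting values mirror the example input; they are not confirmed defaults.
    run_input = {
        "subreddits": list(subreddits),
        "keywords": list(keywords),
        "users": list(users),
        "maxPosts": 100,
        "maxCommentsPerPost": 50,
        "sort": "hot",
        "time": "week",
        "minScore": 10,
        "includeNSFW": False,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
        "enablePlaywright": True,
        "enableAI": False,
    }
    run_input.update(overrides)
    if run_input["sort"] not in ASSUMED_SORTS:
        raise ValueError(f"unexpected sort: {run_input['sort']!r}")
    if run_input["time"] not in ASSUMED_TIMES:
        raise ValueError(f"unexpected time: {run_input['time']!r}")
    if run_input["maxPosts"] <= 0 or run_input["maxCommentsPerPost"] < 0:
        raise ValueError("maxPosts must be positive and maxCommentsPerPost non-negative")
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_run_input(subreddits=["startups"], maxPosts=25), indent=2))
```

The resulting dict can be pasted into the Actor's input editor as JSON or passed as the run input when starting the Actor through the Apify API.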
Output Format
Each scraped post produces a dataset item:
    {
      "type": "reddit_post",
      "post": {
        "post_id": "abc123",
        "subreddit": "startups",
        "title": "Post title",
        "body": "Post content...",
        "author": "username",
        "score": 150,
        "upvote_ratio": 0.95,
        "num_comments": 42,
        "awards": ["Gold"],
        "flair": "Discussion",
        "created_utc": "2024-01-15T10:30:00Z",
        "post_age_hours": 24.5,
        "url": "https://...",
        "permalink": "/r/startups/comments/...",
        "is_nsfw": false,
        "is_locked": false,
        "is_archived": false
      },
      "comments": [
        {
          "comment_id": "xyz789",
          "parent_id": "abc123",
          "author": "commenter",
          "body": "Comment text...",
          "score": 25,
          "depth": 0,
          "created_utc": "2024-01-15T11:00:00Z",
          "is_op": false,
          "is_deleted": false
        }
      ],
      "ai": {
        "sentiment": 0.75,
        "topics": ["entrepreneurship", "funding", "product-market-fit"],
        "entities": ["Stripe", "Y Combinator", "Series A"]
      },
      "scraped_at": "2024-01-15T12:00:00Z",
      "source": "reddit"
    }
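The `comments` array is flat, with `parent_id` and `depth` describing the thread structure. A consumer that wants nested threads could rebuild them roughly as follows. This is a sketch under two assumptions not confirmed by the Actor: a reply's `parent_id` equals its parent's `comment_id`, and a top-level comment's `parent_id` is the post ID (as in the example item). `build_comment_tree` is a hypothetical helper name.

```python
from typing import Any


def build_comment_tree(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Nest a dataset item's flat comment list into threads (hypothetical helper).

    Each returned node is a copy of the comment with an added "replies" list.
    Comments whose parent is not in the list (top-level comments, or replies
    whose parent was not scraped) become roots.
    """
    # Copy each comment so the original dataset item is not mutated.
    nodes = {c["comment_id"]: {**c, "replies": []} for c in item.get("comments", [])}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["parent_id"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            roots.append(node)
    return roots
```

Treating orphaned replies as roots keeps every scraped comment reachable even when `maxCommentsPerPost` cuts a thread short.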
Local Development
    # Install dependencies
    pip install -r requirements.txt

    # Install Playwright browsers
    playwright install chromium

    # Run locally
    apify run
Deployment
    # Login to Apify
    apify login

    # Deploy to Apify platform
    apify push
Configuration
Proxy Settings
Residential proxies are strongly recommended for Reddit scraping:
    {
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
AI Processing
The actor supports three AI providers for sentiment analysis, topic extraction, and entity recognition:
| Provider | Models | Default |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-3.5-turbo | gpt-4o |
| Gemini | gemini-2.0-flash, gemini-1.5-pro | gemini-2.0-flash |
| Anthropic | claude-3-5-sonnet, claude-3-5-haiku | claude-3-5-sonnet |
Configuration Example:
    {
      "enableAI": true,
      "aiProvider": "openai",
      "aiModel": "gpt-4o",
      "openaiApiKey": "sk-..."
    }
For Gemini:
    {
      "enableAI": true,
      "aiProvider": "gemini",
      "geminiApiKey": "AIza..."
    }
For Anthropic:
    {
      "enableAI": true,
      "aiProvider": "anthropic",
      "anthropicApiKey": "sk-ant-..."
    }
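Each provider uses its own API-key field, which is easy to mix up when switching providers. The sketch below builds the AI portion of the input from the provider table and the examples above. `ai_settings` is a hypothetical helper, and it assumes that `aiModel` is optional and falls back to the provider's default when omitted, as the Gemini and Anthropic examples suggest.

```python
from typing import Optional

# Key field names and model lists are taken from the table and examples above.
PROVIDERS = {
    "openai": ("openaiApiKey", {"gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"}),
    "gemini": ("geminiApiKey", {"gemini-2.0-flash", "gemini-1.5-pro"}),
    "anthropic": ("anthropicApiKey", {"claude-3-5-sonnet", "claude-3-5-haiku"}),
}


def ai_settings(provider: str, api_key: str, model: Optional[str] = None) -> dict:
    """Return the AI-related input fields for one provider (hypothetical helper)."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not api_key:
        raise ValueError("api_key must not be empty")
    key_field, models = PROVIDERS[provider]
    settings = {"enableAI": True, "aiProvider": provider, key_field: api_key}
    if model is not None:
        if model not in models:
            raise ValueError(f"{model!r} is not listed for {provider}")
        # Assumption: aiModel is only needed to override the provider default.
        settings["aiModel"] = model
    return settings
```

The returned dict can be merged into the rest of the Actor input, for example `{**base_input, **ai_settings("gemini", key)}`.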
Architecture
    src/
    ├── __init__.py          # Package init
    ├── __main__.py          # Entry point
    ├── main.py              # Actor initialization and orchestration
    ├── items.py             # Scrapy item definitions
    ├── middlewares.py       # Anti-ban middlewares
    ├── pipelines.py         # Data processing pipelines
    ├── settings.py          # Scrapy configuration
    ├── agents/              # LangGraph orchestration
    │   ├── __init__.py
    │   └── graph.py
    ├── ai/                  # LlamaIndex AI processing
    │   ├── __init__.py
    │   └── processor.py
    └── spiders/             # Scrapy spiders
        ├── __init__.py
        └── reddit_spider.py