Reddit Intelligence Scraper
Under maintenance

Pricing

from $3.00 / 1,000 reddit posts

Reddit is one of the largest real-time sources of consumer opinions, trends, and product feedback. Reddit Intelligence Scraper is an advanced Apify Actor built to turn Reddit into a powerful business, research, and growth-hacking intelligence engine.

Developer: charith wijesundara (maintained by Community)

A production-ready Apify Actor for scraping Reddit posts and comments with AI-powered intelligence extraction.

Features

  • Multi-source scraping: Subreddits, search results, and user profiles
  • Full comment trees: Extract nested comment threads
  • AI-powered analysis: Sentiment, topic extraction, entity recognition (via LlamaIndex with OpenAI, Gemini, or Anthropic)
  • Anti-ban system: Proxy rotation, session pooling, CAPTCHA detection, human-like delays
  • Playwright support: JavaScript rendering for dynamic content
  • LangGraph orchestration: Agentic crawl decision-making

Input Schema

{
  "subreddits": ["entrepreneur", "startups"],
  "keywords": ["stripe", "shopify", "saas"],
  "users": ["spez"],
  "maxPosts": 100,
  "maxCommentsPerPost": 50,
  "sort": "hot",
  "time": "week",
  "minScore": 10,
  "includeNSFW": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "enablePlaywright": true,
  "enableAI": false,
  "openaiApiKey": ""
}
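Before starting a run, it can help to sanity-check the input locally. The sketch below is not part of the actor: `validate_run_input` is a hypothetical helper, and the allowed `sort` and `time` values are assumed to mirror Reddit's own listing options rather than taken from the actor's schema.

```python
from typing import Any

# Assumption: the actor accepts Reddit's usual listing sorts and time windows.
ALLOWED_SORTS = {"hot", "new", "top", "rising", "controversial"}
ALLOWED_TIMES = {"hour", "day", "week", "month", "year", "all"}
API_KEY_FIELDS = ("openaiApiKey", "geminiApiKey", "anthropicApiKey")


def validate_run_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the input; empty means it looks sane."""
    problems: list[str] = []

    # At least one source list should be non-empty.
    if not any(run_input.get(key) for key in ("subreddits", "keywords", "users")):
        problems.append("provide at least one of: subreddits, keywords, users")

    # Limits must be non-negative integers (bool is excluded explicitly).
    for key in ("maxPosts", "maxCommentsPerPost"):
        value = run_input.get(key)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{key} must be a non-negative integer")

    if run_input.get("sort", "hot") not in ALLOWED_SORTS:
        problems.append(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    if run_input.get("time", "week") not in ALLOWED_TIMES:
        problems.append(f"time must be one of {sorted(ALLOWED_TIMES)}")

    # AI analysis needs an API key for some provider.
    if run_input.get("enableAI") and not any(run_input.get(k) for k in API_KEY_FIELDS):
        problems.append("enableAI is true but no provider API key is set")

    return problems
```

Calling `validate_run_input` on the example input above should return an empty list, since it names sources, uses non-negative limits, and leaves `enableAI` off.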

Output Format

Each scraped post produces a dataset item:

{
  "type": "reddit_post",
  "post": {
    "post_id": "abc123",
    "subreddit": "startups",
    "title": "Post title",
    "body": "Post content...",
    "author": "username",
    "score": 150,
    "upvote_ratio": 0.95,
    "num_comments": 42,
    "awards": ["Gold"],
    "flair": "Discussion",
    "created_utc": "2024-01-15T10:30:00Z",
    "post_age_hours": 24.5,
    "url": "https://...",
    "permalink": "/r/startups/comments/...",
    "is_nsfw": false,
    "is_locked": false,
    "is_archived": false
  },
  "comments": [
    {
      "comment_id": "xyz789",
      "parent_id": "abc123",
      "author": "commenter",
      "body": "Comment text...",
      "score": 25,
      "depth": 0,
      "created_utc": "2024-01-15T11:00:00Z",
      "is_op": false,
      "is_deleted": false
    }
  ],
  "ai": {
    "sentiment": 0.75,
    "topics": ["entrepreneurship", "funding", "product-market-fit"],
    "entities": ["Stripe", "Y Combinator", "Series A"]
  },
  "scraped_at": "2024-01-15T12:00:00Z",
  "source": "reddit"
}
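The `comments` array is flat, with `parent_id` and `depth` describing the thread structure. A minimal sketch of rebuilding the nested tree from one dataset item follows. It assumes that a reply's `parent_id` equals its parent's `comment_id` and that top-level comments point at the `post_id`, as in the example above; `build_comment_tree` and the `replies` key are hypothetical names, not part of the actor's output.

```python
from typing import Any


def build_comment_tree(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Nest the flat `comments` list of one dataset item under `replies` keys."""
    # Copy each comment so the original dataset item is not mutated.
    nodes = {
        comment["comment_id"]: {**comment, "replies": []}
        for comment in item.get("comments", [])
    }

    roots: list[dict[str, Any]] = []
    for node in nodes.values():
        parent = nodes.get(node["parent_id"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            # Parent is the post itself (or was not scraped): treat as top level.
            roots.append(node)
    return roots


def print_tree(nodes: list[dict[str, Any]], indent: int = 0) -> None:
    """Print authors and scores, indented by nesting level."""
    for node in nodes:
        print(" " * indent + f'{node["author"]} ({node["score"]})')
        print_tree(node["replies"], indent + 2)
```

Comments whose parent is missing from the item (for example because `maxCommentsPerPost` cut the thread short) end up at the top level rather than being dropped.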

Local Development

# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
# Run locally
apify run

Deployment

# Login to Apify
apify login
# Deploy to Apify platform
apify push

Configuration

Proxy Settings

Residential proxies are strongly recommended for Reddit scraping:

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

AI Processing

The actor supports three AI providers for sentiment analysis, topic extraction, and entity recognition:

Provider    Models                                 Default
OpenAI      gpt-4o, gpt-4o-mini, gpt-3.5-turbo     gpt-4o
Gemini      gemini-2.0-flash, gemini-1.5-pro       gemini-2.0-flash
Anthropic   claude-3-5-sonnet, claude-3-5-haiku    claude-3-5-sonnet

Configuration Example:

{
  "enableAI": true,
  "aiProvider": "openai",
  "aiModel": "gpt-4o",
  "openaiApiKey": "sk-..."
}

For Gemini:

{
  "enableAI": true,
  "aiProvider": "gemini",
  "geminiApiKey": "AIza..."
}

For Anthropic:

{
  "enableAI": true,
  "aiProvider": "anthropic",
  "anthropicApiKey": "sk-ant-..."
}
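The provider, model, and key fields above can be read as a small lookup: each provider has a default model and its own key field. The sketch below shows one way that resolution could work. It is not the actor's actual code: `resolve_ai_settings` is a hypothetical helper, and falling back to `openai` when `aiProvider` is absent is an assumption.

```python
from typing import Any, Optional

# Default models and key field names, taken from the provider list and examples above.
PROVIDERS = {
    "openai": {"default_model": "gpt-4o", "key_field": "openaiApiKey"},
    "gemini": {"default_model": "gemini-2.0-flash", "key_field": "geminiApiKey"},
    "anthropic": {"default_model": "claude-3-5-sonnet", "key_field": "anthropicApiKey"},
}


def resolve_ai_settings(run_input: dict[str, Any]) -> Optional[dict[str, str]]:
    """Return provider, model, and API key, or None when AI is disabled."""
    if not run_input.get("enableAI"):
        return None

    # Assumption: OpenAI is used when no provider is named.
    provider = run_input.get("aiProvider", "openai")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown aiProvider: {provider!r}")

    spec = PROVIDERS[provider]
    api_key = run_input.get(spec["key_field"], "")
    if not api_key:
        raise ValueError(f"{spec['key_field']} is required for provider {provider!r}")

    return {
        "provider": provider,
        # An explicit aiModel wins; otherwise use the provider's default.
        "model": run_input.get("aiModel") or spec["default_model"],
        "api_key": api_key,
    }
```

With the Gemini example above, which omits `aiModel`, this resolves to the `gemini-2.0-flash` default.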

Architecture

src/
├── __init__.py          # Package init
├── __main__.py          # Entry point
├── main.py              # Actor initialization and orchestration
├── items.py             # Scrapy item definitions
├── middlewares.py       # Anti-ban middlewares
├── pipelines.py         # Data processing pipelines
├── settings.py          # Scrapy configuration
├── agents/              # LangGraph orchestration
│   ├── __init__.py
│   └── graph.py
├── ai/                  # LlamaIndex AI processing
│   ├── __init__.py
│   └── processor.py
└── spiders/             # Scrapy spiders
    ├── __init__.py
    └── reddit_spider.py