Reddit Advanced Scraper
Pricing: Pay per usage
The Advanced Reddit Scraper collects Reddit posts and comments and returns one output item per post.
Developer: Crazee Media
Reddit Advanced Scraper - Any Subreddit (Anti-Rate-Limit)
Powerful Reddit scraper for any subreddit with advanced anti-rate-limit protection. Scrape posts and comments with full configurability.
✨ Features
- 📊 Scrape Any Subreddit - Choose any subreddit (AskReddit, technology, gaming, science, etc.)
- 💬 Full Comment Threads - Scrape nested comments with configurable depth
- 🛡️ Anti-Rate-Limit Protection - Automatic session rotation, adaptive delays, user agent switching
- 🎯 Fully Configurable - Control post count, comment depth, delays, and more
- 📤 Webhook Support - Send results to your webhook (n8n, Zapier, etc.)
- 📈 Real-Time Logging - Track progress with detailed logs
- 🔄 Adaptive Intelligence - Automatically adjusts delays based on rate limiting
🚀 Quick Start
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| subreddit | string | AskReddit | Name of subreddit (without r/) |
| sort | enum | hot | Sort posts by: hot, new, top, rising |
| maxPosts | integer | 1000 | Maximum posts to scrape (1-10000) |
| maxComments | integer | 50 | Max comments per post (0-500) |
| maxDepth | integer | 5 | Comment nesting depth (0-10) |
| baseDelay | integer | 6 | Base delay between requests in seconds |
| sessionRotationRequests | integer | 50 | Rotate session every N requests |
| webhookUrl | string | null | Optional webhook URL for results |
| adaptiveDelays | boolean | true | Auto-adjust delays based on rate limiting |
Example Configuration
```json
{
  "subreddit": "technology",
  "sort": "hot",
  "maxPosts": 500,
  "maxComments": 100,
  "maxDepth": 5,
  "baseDelay": 6,
  "adaptiveDelays": true
}
```
📦 Output Format
Each scraped post includes:
```json
{
  "title": "Post title",
  "author": "username",
  "subreddit": "technology",
  "score": 12345,
  "num_comments": 567,
  "created_utc": "2026-01-01T12:00:00Z",
  "url": "https://reddit.com/...",
  "permalink": "https://old.reddit.com/r/technology/...",
  "selftext": "Post body text",
  "is_self": true,
  "comments": [
    {
      "author": "commenter",
      "body": "Comment text",
      "score": 123,
      "depth": 0,
      "replies": [...]
    }
  ],
  "total_comments_scraped": 45
}
```
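Since each output item nests its comment tree under `comments`, a short helper can flatten it for analysis. This is a minimal sketch, not part of the Actor; the sample item below is illustrative, not real scraped data.

```python
# Flatten the nested `comments` tree from one output item into a flat
# list of (depth, author, body) tuples via a depth-first walk.

def flatten_comments(comments, out=None):
    """Recursively walk a nested comment list in depth-first order."""
    if out is None:
        out = []
    for c in comments:
        out.append((c["depth"], c["author"], c["body"]))
        flatten_comments(c.get("replies", []), out)
    return out

# Illustrative sample matching the output schema above (not real data).
item = {
    "title": "Post title",
    "comments": [
        {"author": "a", "body": "top", "score": 10, "depth": 0,
         "replies": [{"author": "b", "body": "reply", "score": 3,
                      "depth": 1, "replies": []}]},
    ],
}

flat = flatten_comments(item["comments"])
# flat == [(0, "a", "top"), (1, "b", "reply")]
```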
🛡️ Anti-Rate-Limit Features
1. Session Rotation
Automatically creates fresh sessions every 50 requests (configurable) with new identities.
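The rotation described above can be sketched roughly like this. It is a simplified stand-in for the Actor's internal logic; the class name, counters, and truncated user-agent strings are illustrative assumptions.

```python
# Sketch: start a fresh identity (new user agent) every N requests.
import itertools

# Illustrative, truncated user-agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0",
]

class SessionRotator:
    def __init__(self, rotate_every=50):
        self.rotate_every = rotate_every
        self.request_count = 0
        self.session_id = 0
        self._agents = itertools.cycle(USER_AGENTS)
        self.headers = {"User-Agent": next(self._agents)}

    def before_request(self):
        """Call before each request; rotates identity every N requests."""
        if self.request_count and self.request_count % self.rotate_every == 0:
            self.session_id += 1
            self.headers = {"User-Agent": next(self._agents)}
        self.request_count += 1
        return self.headers

rot = SessionRotator(rotate_every=2)
for _ in range(5):
    rot.before_request()
# After 5 requests with rotate_every=2, two rotations have occurred.
```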
2. Adaptive Delays
- Starts with base delay (default 6s)
- Increases exponentially when rate limited
- Gradually decreases with successful requests
- Adds random jitter to appear more human
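A rough sketch of how such an adaptive delay might behave. The class name, the growth factor of 2, and the decay factor of 0.9 are illustrative assumptions, not the Actor's exact values:

```python
import random

class AdaptiveDelay:
    """Sketch of the adaptive-delay idea: grow on 429s, shrink on success."""
    def __init__(self, base=6.0, max_delay=120.0):
        self.base = base
        self.current = base
        self.max_delay = max_delay

    def on_rate_limited(self):
        # Exponential increase, capped at max_delay.
        self.current = min(self.current * 2, self.max_delay)

    def on_success(self):
        # Gradual decrease, never below the base delay.
        self.current = max(self.current * 0.9, self.base)

    def next_delay(self):
        # Random jitter of ±20% to look less machine-like.
        return self.current * (1 + random.uniform(-0.2, 0.2))

d = AdaptiveDelay(base=6.0)
d.on_rate_limited()   # current: 12.0
d.on_rate_limited()   # current: 24.0
d.on_success()        # current: 21.6
```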
3. User Agent Rotation
Rotates through 10+ different user agents:
- Chrome (Windows, Mac, Linux)
- Firefox (Windows, Mac, Linux)
- Safari (Mac)
4. Exponential Backoff
Automatically backs off with increasing delays on failures (2s, 4s, 8s, 16s, etc.)
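The backoff schedule (2s, 4s, 8s, ...) can be sketched as a retry wrapper. Here `sleep` is injectable so the demo records the delays instead of actually waiting; this is a simplified illustration, not the Actor's code:

```python
import time

def with_backoff(fetch, retries=5, first_delay=2.0, sleep=time.sleep):
    """Retry `fetch` with exponentially growing delays (2s, 4s, 8s, ...)."""
    delay = first_delay
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(delay)
            delay *= 2

# Demo: a fetch that fails three times, then succeeds.
delays = []
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 4:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, sleep=delays.append)
# delays == [2.0, 4.0, 8.0], result == "ok"
```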
5. Smart Request Timing
- Random jitter on all delays (±20%)
- Configurable base delays
- Respects Reddit's Retry-After headers
🎯 Use Cases
- Market Research - Analyze sentiment and trends in specific communities
- Academic Research - Collect data for social media studies
- Community Analysis - Understand discussion patterns and popular topics
- Content Discovery - Find trending posts and discussions
- Data Collection - Build datasets for ML/AI training
- Monitoring - Track specific subreddits for mentions or topics
💡 Tips for Best Results
Avoid Rate Limits
- Use default delay of 6+ seconds
- Enable adaptive delays
- Keep session rotation at 50 requests
- Don't scrape too many posts at once (stay under 1000)
Get More Data
- Increase `maxComments` for deeper discussions
- Increase `maxDepth` for nested reply chains
- Use `sort: "new"` for the latest content
- Use `sort: "top"` for the highest-quality posts

Performance
- Lower `maxComments` for faster scraping
- Reduce `maxDepth` if you don't need deep threads
- Increase `baseDelay` if you get rate limited
🔗 Webhook Integration
Send results to your webhook endpoint (n8n, Zapier, Make, etc.):
```json
{
  "webhookUrl": "https://your-webhook.com/endpoint"
}
```
Webhook payload format:
```json
{
  "timestamp": "2026-01-01T12:00:00Z",
  "scraper_type": "reddit",
  "action": "scrape_complete",
  "count": 500,
  "metadata": {
    "version": "2.0-apify",
    "source": "Apify Actor",
    "scraped_at": "2026-01-01T12:00:00Z",
    "count": 500
  },
  "data": [...]
}
```
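For reference, a payload of this shape could be assembled like this before POSTing it to `webhookUrl` with a `Content-Type: application/json` header. This is a sketch assuming the field names shown above, not the Actor's actual code:

```python
import json
from datetime import datetime, timezone

def build_webhook_payload(items):
    """Assemble a payload matching the documented field names."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "timestamp": now,
        "scraper_type": "reddit",
        "action": "scrape_complete",
        "count": len(items),
        "metadata": {"version": "2.0-apify", "source": "Apify Actor",
                     "scraped_at": now, "count": len(items)},
        "data": items,
    }

payload = build_webhook_payload([{"title": "Post"}])
body = json.dumps(payload).encode()  # request body for the webhook POST
```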
Deploy to Apify
1. Log in to Apify:

```shell
apify login
```

2. Deploy your Actor:

```shell
apify push
```

3. Find it under Actors -> My Actors.
⚠️ Important Notes
- Reddit's old.reddit.com interface is used (more scraping-friendly)
- No API authentication required (uses web scraping)
- Respects Reddit's rate limits automatically
- Data is saved to Apify dataset storage
- Always check Reddit's Terms of Service for your use case
📊 Popular Subreddits to Scrape
- r/AskReddit - Popular Q&A discussions
- r/technology - Tech news and discussions
- r/science - Scientific articles and comments
- r/gaming - Gaming community discussions
- r/news - Breaking news and comments
- Any other public subreddit!
🔧 Technical Details
- Built with: Python, BeautifulSoup, Requests, Apify SDK
- Parsing: lxml parser for fast HTML processing
- Session Management: Connection pooling for efficiency
- Error Handling: Automatic retries with exponential backoff
- Logging: Detailed progress tracking via Apify SDK
Version: 2.0. Built with advanced anti-rate-limit technology.