
Reddit Advanced Scraper - Any Subreddit (Anti-Rate-Limit)

Powerful Reddit scraper for any subreddit with advanced anti-rate-limit protection. Scrape posts and comments with full configurability; each post is returned as a single output item with its comments attached.

✨ Features

  • 📊 Scrape Any Subreddit - Choose any subreddit (AskReddit, technology, gaming, science, etc.)
  • 💬 Full Comment Threads - Scrape nested comments with configurable depth
  • 🛡️ Anti-Rate-Limit Protection - Automatic session rotation, adaptive delays, user agent switching
  • 🎯 Fully Configurable - Control post count, comment depth, delays, and more
  • 📤 Webhook Support - Send results to your webhook (n8n, Zapier, etc.)
  • 📈 Real-Time Logging - Track progress with detailed logs
  • 🔄 Adaptive Intelligence - Automatically adjusts delays based on rate limiting

🚀 Quick Start

Input Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| subreddit | string | AskReddit | Name of the subreddit (without r/) |
| sort | enum | hot | Sort posts by: hot, new, top, rising |
| maxPosts | integer | 1000 | Maximum posts to scrape (1-10000) |
| maxComments | integer | 50 | Max comments per post (0-500) |
| maxDepth | integer | 5 | Comment nesting depth (0-10) |
| baseDelay | integer | 6 | Base delay between requests in seconds |
| sessionRotationRequests | integer | 50 | Rotate session every N requests |
| webhookUrl | string | null | Optional webhook URL for results |
| adaptiveDelays | boolean | true | Auto-adjust delays based on rate limiting |

Example Configuration

{
  "subreddit": "technology",
  "sort": "hot",
  "maxPosts": 500,
  "maxComments": 100,
  "maxDepth": 5,
  "baseDelay": 6,
  "adaptiveDelays": true
}
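
You can also start runs programmatically with the Apify Python client; a minimal sketch, where `<YOUR_APIFY_TOKEN>` and `<ACTOR_ID>` are placeholders for your own values:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run_input = {
    "subreddit": "technology",
    "sort": "hot",
    "maxPosts": 500,
    "maxComments": 100,
    "maxDepth": 5,
    "baseDelay": 6,
    "adaptiveDelays": True,
}

# <ACTOR_ID> is a placeholder: copy the real ID from the Actor's Console page.
run = client.actor("<ACTOR_ID>").call(run_input=run_input)

# Results land in the run's default dataset (see Output Format below).
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["score"])
```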

📦 Output Format

Each scraped post includes:

{
  "title": "Post title",
  "author": "username",
  "subreddit": "technology",
  "score": 12345,
  "num_comments": 567,
  "created_utc": "2026-01-01T12:00:00Z",
  "url": "https://reddit.com/...",
  "permalink": "https://old.reddit.com/r/technology/...",
  "selftext": "Post body text",
  "is_self": true,
  "comments": [
    {
      "author": "commenter",
      "body": "Comment text",
      "score": 123,
      "depth": 0,
      "replies": [...]
    }
  ],
  "total_comments_scraped": 45
}
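
Since comments arrive as a nested replies tree, downstream analysis often starts by flattening them; a small helper sketch using the fields above:

```python
def flatten_comments(comments, out=None):
    """Walk the nested comments/replies tree into a flat list."""
    if out is None:
        out = []
    for c in comments:
        out.append({k: c[k] for k in ("author", "body", "score", "depth")})
        flatten_comments(c.get("replies", []), out)
    return out

# Tiny sample mirroring the output format above.
post = {"comments": [
    {"author": "commenter", "body": "Comment text", "score": 123, "depth": 0,
     "replies": [{"author": "replier", "body": "A reply", "score": 4,
                  "depth": 1, "replies": []}]},
]}
print(len(flatten_comments(post["comments"])))  # 2
```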

🛡️ Anti-Rate-Limit Features

1. Session Rotation

Automatically creates fresh sessions every 50 requests (configurable) with new identities.
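
The Actor's source isn't shown here, but the mechanism is simple to sketch with requests; the two-entry user-agent pool below is a stand-in for the Actor's larger list (feature 3 below):

```python
import random
import requests

# Hypothetical two-entry pool; the Actor rotates through 10+ agents.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def fresh_session():
    """A new session means a new connection pool, cookies, and identity."""
    s = requests.Session()
    s.headers["User-Agent"] = random.choice(USER_AGENTS)
    return s

work_queue = ["https://old.reddit.com/r/AskReddit/"]  # placeholder URL list
session = fresh_session()
for i, url in enumerate(work_queue):
    if i and i % 50 == 0:          # sessionRotationRequests
        session.close()
        session = fresh_session()
    response = session.get(url)
    print(response.status_code)
```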

2. Adaptive Delays

  • Starts with base delay (default 6s)
  • Increases exponentially when rate limited
  • Gradually decreases with successful requests
  • Adds random jitter to appear more human
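
As a sketch of that policy (not the Actor's actual code), the delay controller might look like:

```python
import random
import time

class AdaptiveDelay:
    def __init__(self, base=6.0, ceiling=120.0):
        self.base = base          # baseDelay; never drops below this
        self.delay = base
        self.ceiling = ceiling    # assumed cap; the Actor's limit is unknown

    def wait(self):
        # Random jitter (±20%) so request timing looks less mechanical.
        time.sleep(self.delay * random.uniform(0.8, 1.2))

    def on_rate_limited(self):
        self.delay = min(self.delay * 2, self.ceiling)   # grow exponentially

    def on_success(self):
        self.delay = max(self.delay * 0.9, self.base)    # shrink gradually
```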

3. User Agent Rotation

Rotates through 10+ different user agents:

  • Chrome (Windows, Mac, Linux)
  • Firefox (Windows, Mac, Linux)
  • Safari (Mac)

4. Exponential Backoff

Automatically backs off with increasing delays on failures (2s, 4s, 8s, 16s, and so on).

5. Smart Request Timing

  • Random jitter on all delays (±20%)
  • Configurable base delays
  • Respects Reddit's Retry-After headers
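
Features 4 and 5 combine naturally into a single retry loop. A sketch, assuming Retry-After arrives as a plain number of seconds:

```python
import time
import requests

def get_with_backoff(session, url, max_retries=5):
    """Retry on HTTP 429 with 2s, 4s, 8s, 16s... delays."""
    for attempt in range(max_retries):
        response = session.get(url)
        if response.status_code != 429:
            return response
        # Prefer Reddit's own Retry-After hint over our schedule.
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else 2 ** (attempt + 1))
    raise RuntimeError(f"Still rate limited after {max_retries} retries: {url}")
```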

🎯 Use Cases

  • Market Research - Analyze sentiment and trends in specific communities
  • Academic Research - Collect data for social media studies
  • Community Analysis - Understand discussion patterns and popular topics
  • Content Discovery - Find trending posts and discussions
  • Data Collection - Build datasets for ML/AI training
  • Monitoring - Track specific subreddits for mentions or topics

💡 Tips for Best Results

Avoid Rate Limits

  • Use the default base delay of 6 seconds or higher
  • Enable adaptive delays
  • Keep session rotation at 50 requests
  • Don't scrape too many posts at once (stay under 1000)

Get More Data

  • Increase maxComments for deeper discussions
  • Increase maxDepth for nested reply chains
  • Use sort: "new" for latest content
  • Use sort: "top" for the highest-scored posts

Performance

  • Lower maxComments for faster scraping
  • Reduce maxDepth if you don't need deep threads
  • Increase baseDelay if you get rate limited

🔗 Webhook Integration

Send results to your webhook endpoint (n8n, Zapier, Make, etc.):

{
  "webhookUrl": "https://your-webhook.com/endpoint"
}

Webhook payload format:

{
  "timestamp": "2026-01-01T12:00:00Z",
  "scraper_type": "reddit",
  "action": "scrape_complete",
  "count": 500,
  "metadata": {
    "version": "2.0-apify",
    "source": "Apify Actor",
    "scraped_at": "2026-01-01T12:00:00Z",
    "count": 500
  },
  "data": [...]
}
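
To test the integration before wiring up n8n or Zapier, a minimal local receiver is enough; this Flask sketch uses a hypothetical /endpoint route and the field names from the payload above:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/endpoint", methods=["POST"])
def receive_results():
    payload = request.get_json(force=True)
    # Field names follow the webhook payload format shown above.
    print(f"{payload.get('action')}: {payload.get('count')} posts scraped")
    for post in payload.get("data", []):
        print(post["title"], post["num_comments"])
    return {"ok": True}

if __name__ == "__main__":
    app.run(port=8000)
```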

Deploy to Apify

  1. Log in to Apify:

     $ apify login

  2. Deploy your Actor:

     $ apify push

  3. Find it under Actors -> My Actors.

⚠️ Important Notes

  • Reddit's old.reddit.com interface is used (more scraping-friendly)
  • No API authentication required (uses web scraping)
  • Respects Reddit's rate limits automatically
  • Data is saved to Apify dataset storage
  • Always check Reddit's Terms of Service for your use case

Works with any public subreddit, for example:

  • r/AskReddit - Popular Q&A discussions
  • r/technology - Tech news and discussions
  • r/science - Scientific articles and comments
  • r/gaming - Gaming community discussions
  • r/news - Breaking news and comments

🔧 Technical Details

  • Built with: Python, BeautifulSoup, Requests, Apify SDK
  • Parsing: lxml parser for fast HTML processing
  • Session Management: Connection pooling for efficiency
  • Error Handling: Automatic retries with exponential backoff
  • Logging: Detailed progress tracking via Apify SDK
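
For orientation only, a stripped-down fetch-and-parse step might look like the sketch below; the CSS selectors are assumptions based on old.reddit.com's public markup, not the Actor's actual code:

```python
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36",
}
response = requests.get("https://old.reddit.com/r/technology/", headers=headers)
soup = BeautifulSoup(response.text, "lxml")

# Assumption: old.reddit.com wraps each post in div.thing, title in a.title.
for thing in soup.select("div.thing"):
    title = thing.select_one("a.title")
    if title:
        print(title.get_text(strip=True))
```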

Version: 2.0

Built with advanced anti-rate-limit technology.