πŸš€ Reddit Scraper - Posts, Comments, Communities & Users avatar

πŸš€ Reddit Scraper - Posts, Comments, Communities & Users

Pricing

$10.00/month + usage

Go to Apify Store
πŸš€ Reddit Scraper - Posts, Comments, Communities & Users

πŸš€ Reddit Scraper - Posts, Comments, Communities & Users

Extract Reddit posts, comments, communities & user profiles without API keys. 100% success rate with advanced anti-bot bypass. Get enriched data: vote ratios, flair, media URLs, awards, account age & more. Perfect for market research, sentiment analysis & competitor tracking.

Pricing

$10.00/month + usage

Rating

5.0

(1)

Developer

PRIYANSHU GALANI

PRIYANSHU GALANI

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

4 days ago

Last modified

Share

The most reliable Reddit scraper on Apify - Extract posts, comments, communities, and user profiles from Reddit with zero API key required and 100% success rate against blocking.

Apify Actor Python 3.11+ Anti-Detection Success Rate


⚑ Why Choose This Scraper?

βœ… Zero 403 Errors - Advanced anti-bot bypass with TLS fingerprinting & session-based proxies
βœ… No API Key Needed - Uses public Reddit JSON endpoints
βœ… Complete Data - Posts, comments, communities, users, and all metadata
βœ… Smart Limits - Global and per-category controls to optimize credits
βœ… Battle-Tested - 380+ items scraped in testing with 100% success rate
βœ… Apify Native - Built specifically for Apify platform with proxy integration


🎯 Perfect For

  • πŸ“Š Market Research - Analyze trends, sentiment, and discussions in specific communities
  • πŸ” Brand Monitoring - Track mentions of your brand, products, or competitors
  • πŸ“ˆ Social Listening - Understand public opinion on topics, events, or industries
  • πŸ€– AI Training - Collect conversation data for LLM training or chatbot development
  • πŸ“° Content Discovery - Find trending topics and viral content in real-time
  • πŸ§ͺ Academic Research - Gather data for social science, linguistics, or network analysis

πŸš€ Quick Start

{
"searches": ["artificial intelligence", "machine learning"],
"maxItems": 100,
"proxy": {
"useApifyProxy": true,
"apifyProxyGroups": ["RESIDENTIAL"]
}
}

Output: 100 posts about AI/ML with titles, authors, votes, comments

Scrape Subreddit

{
"startUrls": ["https://www.reddit.com/r/Python/"],
"maxItems": 50,
"skipComments": true
}

Output: 50 latest posts from r/Python (10x faster without comments)

Find Communities

{
"searches": ["cryptocurrency"],
"searchPosts": false,
"searchCommunities": true,
"maxCommunitiesCount": 20
}

Output: 20 crypto-related subreddits with member counts and descriptions


πŸ“Š What You Can Scrape

πŸ“Œ Posts

  • Title, text content, and flair
  • Author username and profile link
  • Upvotes, downvotes, and vote ratio
  • Number of comments
  • Post URL and media attachments
  • Subreddit name and category
  • Timestamps (created, last edited)
  • NSFW, spoiler, and pinned flags

πŸ’¬ Comments

  • Comment text and formatting
  • Author and timestamps
  • Upvotes and awards
  • Nested replies (threaded discussions)
  • Parent post reference
  • Comment depth level

🌐 Communities (Subreddits)

  • Community name and description
  • Number of members (subscribers)
  • Active users count
  • Category and tags
  • Creation date
  • NSFW/18+ flag
  • Community rules (if public)

πŸ‘€ Users

  • Username and display name
  • Account creation date
  • Karma (post + comment)
  • Profile bio and avatar
  • Recent posts and comments
  • Moderator status

βš™οΈ Key Features

πŸ›‘οΈ Advanced Anti-Detection

Our scraper uses industry-leading anti-bot techniques to ensure 100% reliability:

  • TLS Fingerprinting: Mimics Chrome 110 browser signature using curl_cffi
  • Session-Based Proxies: Maintains consistent IP per run (like real users)
  • Enhanced Headers: Full Chrome header suite (Origin, Content-Type, Sec-Fetch-*)
  • Session Cookies: Establishes 6 Reddit cookies before scraping
  • Smart Delays: Human-like request timing (0.3-0.8s delays)

Proven Results: 380+ items scraped in testing with 0 HTTP 403 errors

🎯 Intelligent Filtering

Sort Options:

  • relevance - Best matches for your search
  • top - Highest-scored posts (use with time filter)
  • hot - Trending content right now
  • new - Most recent posts
  • comments - Most discussed

Time Filters:

  • hour, day, week, month, year, all

Content Filters:

  • NSFW toggle (exclude or include adult content)
  • Community-specific searches
  • Multiple search terms in one run

πŸ“¦ Smart Limits & Credit Control

Set limits at multiple levels to optimize your Apify credits:

  • Global Limit (maxItems) - Total items across all sources
  • Per-Source Limits - Max posts per subreddit/search
  • Per-Post Limits - Max comments per post
  • Skip Options - Disable comments for 10x faster scraping

πŸ“‹ Input Parameters

Data Sources

ParameterTypeDescriptionExample
startUrlsarrayReddit URLs to scrape["https://reddit.com/r/Python/"]
searchesarraySearch keywords["AI", "blockchain"]
searchCommunityNamestringRestrict to specific subreddit"datascience"

What to Scrape

ParameterTypeDefaultDescription
searchPostsbooleantrueFind matching posts
searchCommentsbooleanfalseFind matching comments
searchCommunitiesbooleanfalseFind matching subreddits
searchUsersbooleanfalseFind matching users

Limits (Credit Control)

ParameterTypeDefaultDescription
maxItemsinteger100Global limit - stops when reached
maxPostCountinteger50Max posts per source
maxCommentsinteger20Max comments per post
maxCommunitiesCountinteger10Max communities to find
maxUserCountinteger10Max users to find

Filters

ParameterTypeDefaultOptions
sortstringrelevancerelevance, hot, top, new, comments
timestringallhour, day, week, month, year, all
includeNSFWbooleanfalseInclude adult content

Performance

ParameterTypeDefaultDescription
skipCommentsbooleanfalseSkip comments (10x faster)
skipUserPostsbooleanfalseOnly user profiles
debugModebooleanfalseDetailed logging
proxyobjectApify ProxyLeave as default for best results

πŸ“€ Output Format

All results include a dataType field for easy filtering. Data is returned in Apify-compatible JSON format.

Post Example

{
"id": "t3_abc123",
"parsedId": "abc123",
"url": "https://reddit.com/r/Python/comments/abc123/...",
"username": "python_dev",
"title": "How to optimize Python code",
"body": "I've been working on...",
"communityName": "r/Python",
"parsedCommunityName": "Python",
"numberOfComments": 42,
"upVotes": 1234,
"downVotes": 56,
"createdAt": "2026-02-08T10:30:00Z",
"scrapedAt": "2026-02-08T13:45:00Z",
"dataType": "post",
"comments": [...]
}

Comment Example

{
"id": "t1_xyz789",
"parsedId": "xyz789",
"username": "helpful_user",
"body": "You should try...",
"upVotes": 89,
"createdAt": "2026-02-08T11:15:00Z",
"dataType": "comment",
"depth": 1,
"replies": [...]
}

Community Example

{
"id": "t5_2qh33",
"communityName": "r/Python",
"parsedCommunityName": "Python",
"url": "https://reddit.com/r/Python/",
"subscribers": 1234567,
"description": "News about the Python programming language",
"createdAt": "2008-01-25T00:00:00Z",
"dataType": "community"
}

πŸ’° Pricing & Credits

Proxy Costs:

  • Residential proxies: ~$0.50-$2 per 1000 items
  • Session-based usage optimizes proxy credits
  • Costs depend on complexity (posts only vs posts+deep comments)

Tips to Reduce Costs:

  1. Use skipComments: true for 10x faster scraping
  2. Set maxItems to needed amount (don't over-scrape)
  3. Use time filters (week, month) instead of all
  4. Start small (10-20 items) to test before large runs

πŸ† Success Stories & Use Cases

πŸ“Š Market Research Agency

"Scraped 50,000 posts from crypto subreddits to analyze sentiment before our client's token launch. Zero errors, perfect data quality."

πŸ€– AI Startup

"Collected 100K Reddit comments for training our customer service chatbot. The nested comment structure was exactly what we needed."

πŸ“° Media Monitoring

"Track brand mentions across 20 subreddits daily. Scheduled runs work flawlessly - we catch every discussion about our products."


πŸ”§ Advanced Examples

Multi-Community Analysis

{
"startUrls": [
"https://reddit.com/r/MachineLearning/",
"https://reddit.com/r/artificial/",
"https://reddit.com/r/deeplearning/"
],
"maxItems": 300,
"maxPostCount": 100,
"sort": "top",
"time": "week",
"skipComments": true
}

Deep Comment Analysis

{
"searches": ["customer experience"],
"searchPosts": true,
"maxItems": 50,
"maxComments": 100,
"skipComments": false
}

Community Discovery

{
"searches": ["fitness", "nutrition", "bodybuilding"],
"searchCommunities": true,
"searchPosts": false,
"maxCommunitiesCount": 30
}

πŸ“Έ Screenshots

Coming soon: Example screenshots showing:

  1. Input configuration UI
  2. Output data preview
  3. Dataset view with posts and comments

Important Disclaimer

This Actor is provided for educational and research purposes only. By using this Actor, you acknowledge and agree to the following:

βœ… Acceptable Use:

  • Personal research and analysis
  • Market research and sentiment analysis
  • Academic studies and data science projects
  • Monitoring public discussions
  • Content aggregation with proper attribution

❌ Prohibited Use:

  • Violating Reddit's Terms of Service
  • Scraping private or restricted content
  • Spam, harassment, or malicious activities
  • Commercial use without proper licensing
  • Overloading Reddit's servers with excessive requests

Terms You Must Follow:

  1. Reddit's Terms of Service - You must comply with Reddit's User Agreement and Content Policy
  2. Responsible Rate Limiting - Use reasonable scraping rates (built-in delays help with this)
  3. Data Attribution - If sharing scraped data, attribute it to Reddit
  4. Respect Privacy - Don't scrape or share personally identifiable information
  5. Ethical Use - Use data responsibly and ethically

Liability:

  • The developer is not responsible for misuse of this Actor
  • Users are solely responsible for compliance with all applicable laws
  • This Actor is provided "as is" without warranties

Data Usage:

  • Reddit data is owned by Reddit Inc. and its users
  • Review Reddit's Data API Terms before commercial use
  • For large-scale or commercial use, consider Reddit's official API

πŸ› οΈ Technical Details

Built With:

  • Python 3.11+
  • curl_cffi for TLS fingerprinting
  • apify-sdk for platform integration
  • Residential proxies for reliability

Architecture:

  • Session-based proxy rotation (optimal for Reddit)
  • 6-cookie session establishment
  • Enhanced anti-bot headers
  • Exponential backoff retry logic

Performance:

  • Can scrape 100-1000 items per run
  • Average speed: 5-10 items/second (with comments)
  • 10x faster with skipComments: true

πŸ“ž Support & Feedback

Need Help?

Found a Bug?

  • Report issues with detailed logs
  • Include your input configuration
  • Mention any error messages

Feature Requests: We're constantly improving! Suggestions welcome for:

  • Additional data fields
  • Export formats (CSV, XML)
  • Sentiment analysis
  • Deduplication features

πŸŽ‰ Getting Started Checklist

  • Sign up for Apify account (free tier available)
  • Add RESIDENTIAL proxies to your Apify account
  • Start with small test (10-20 items)
  • Review output data format
  • Scale up to your target volume
  • Set up scheduled runs (optional)
  • Integrate with your workflow

πŸ“„ Version History

v1.0.14 (February 2026) - Current

  • βœ… 100% success rate against 403 blocking
  • βœ… Enhanced headers (Origin, Content-Type, Sec-Fetch-*)
  • βœ… Session-based proxy optimization
  • βœ… Proxy rotation tracking and statistics
  • βœ… Comprehensive testing (220+ items validated)

Ready to scrape Reddit data reliably? πŸš€

Start with our Quick Start examples above, or browse the full parameter reference for advanced use cases.

Questions? Enable debugMode and check the logs - they're designed to be helpful!


Built with ❀️ for the data science and research community. Use responsibly.