Reddit Posts & Comments Scrape
Pricing
$7.99/month + usage
Unlock the power of Reddit’s vast community discussions with this high-performance data extraction tool. Whether you are tracking trends, performing sentiment analysis, or generating niche leads, the Reddit Posts & Comments Scraper provides clean, structured data in seconds.
Developer: Scrape Pilot
🚀 Reddit Posts & Comments Scraper
📋 Table of Contents
- About
- Features
- Installation
- Quick Start
- Configuration
- Input/Output Format
- API Reference
- Examples
- Proxy Support
- Rate Limiting
- Troubleshooting
- Contributing
- License
- FAQ
📖 About
The Reddit Posts & Comments Scraper is a professional-grade tool designed to efficiently extract public posts, comments, and metadata from Reddit subreddits. Whether you're conducting market research, performing sentiment analysis, or building data-driven applications, it provides reliable, structured data extraction.
This tool is built with scalability and compliance in mind, respecting Reddit's API guidelines while delivering high-performance data extraction for developers, researchers, and businesses.
✨ Features
| Feature | Description |
|---|---|
| 🎯 Targeted Scraping | Extract posts from specific subreddits with custom filters |
| 💬 Comment Extraction | Optional comment scraping for deeper insights |
| 🔒 Proxy Support | Residential & datacenter proxy configuration included |
| 📊 Rich Metadata | Get scores, upvote ratios, authors, flairs, and more |
| 🔄 Multiple Sort Options | Sort by hot, new, top, rising, and controversial |
| ⏱️ Time Filtering | Filter posts by hour, day, week, month, year, or all time |
| 📁 Multiple Formats | Export data in JSON, CSV, or XML formats |
| 🚀 High Performance | Optimized for large-scale data extraction |
| 🛡️ Rate Limiting | Built-in rate limiting to avoid IP bans |
| 📝 Detailed Logging | Comprehensive logging for debugging and monitoring |
📦 Installation
Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- Reddit API credentials (optional but recommended)
Step-by-Step Installation
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/reddit-scraper.git
cd reddit-scraper

# 2. Create a virtual environment (recommended)
python -m venv venv

# 3. Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# 4. Install dependencies
pip install -r requirements.txt

# 5. Verify installation
python reddit_scraper.py --version
```
Requirements File (requirements.txt)
```text
requests>=2.28.0
praw>=7.7.0
pandas>=1.5.0
beautifulsoup4>=4.11.0
lxml>=4.9.0
python-dotenv>=1.0.0
apify-client>=1.0.0
```
⚡ Quick Start
Basic Usage
```python
from reddit_scraper import RedditScraper

# Initialize the scraper
scraper = RedditScraper()

# Define your configuration
config = {
    "include_comments": False,
    "subreddit": "technology",
    "sort": "hot",
    "time_filter": "all",
    "max_results": 25
}

# Run the scraper
results = scraper.scrape(config)

# Export to JSON
scraper.export_to_json(results, "output.json")

# Export to CSV
scraper.export_to_csv(results, "output.csv")
```
Command Line Usage
```bash
# Basic scrape
python reddit_scraper.py --subreddit technology --max-results 25

# With comments
python reddit_scraper.py --subreddit technology --include-comments --max-results 50

# With custom sort and time filter
python reddit_scraper.py --subreddit technology --sort top --time-filter week --max-results 100

# With proxy configuration
python reddit_scraper.py --subreddit technology --use-proxy --proxy-group RESIDENTIAL
```
⚙️ Configuration
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `subreddit` | string | ✅ Yes | - | Target subreddit name (e.g., `"technology"`) |
| `include_comments` | boolean | ❌ No | `false` | Whether to scrape comments for each post |
| `sort` | string | ❌ No | `"hot"` | Sort order: `hot`, `new`, `top`, `rising`, `controversial` |
| `time_filter` | string | ❌ No | `"all"` | Time range: `hour`, `day`, `week`, `month`, `year`, `all` |
| `max_results` | integer | ❌ No | `25` | Maximum number of posts to scrape (1-1000) |
| `proxyConfiguration.useApifyProxy` | boolean | ❌ No | `false` | Enable the Apify proxy service |
| `proxyConfiguration.apifyProxyGroups` | array | ❌ No | `[]` | Proxy groups: `RESIDENTIAL`, `DATACENTER` |
Example Configuration
```json
{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}
```
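As a sanity check before launching a run, the allowed values from the parameter table above can be enforced client-side. The `check_config` helper below is an illustrative sketch, not part of the scraper's API:

```python
# Illustrative client-side validation of an input config. The allowed
# values mirror the parameter table above; the helper itself is hypothetical.
ALLOWED_SORTS = {"hot", "new", "top", "rising", "controversial"}
ALLOWED_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def check_config(config):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not config.get("subreddit"):
        problems.append("subreddit is required")
    if config.get("sort", "hot") not in ALLOWED_SORTS:
        problems.append(f"invalid sort: {config.get('sort')!r}")
    if config.get("time_filter", "all") not in ALLOWED_TIME_FILTERS:
        problems.append(f"invalid time_filter: {config.get('time_filter')!r}")
    if not 1 <= config.get("max_results", 25) <= 1000:
        problems.append("max_results must be between 1 and 1000")
    return problems

print(check_config({"subreddit": "technology", "sort": "hot", "max_results": 25}))  # → []
```

Catching a typo like `"sort": "best"` locally is cheaper than discovering it through an empty result set.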
📥 Input/Output Format
Input Example
```json
{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}
```
Output Example
```json
[
  {
    "post_id": "1rt52qa",
    "title": "Meta planning sweeping layoffs as AI costs mount",
    "text": null,
    "score": 4506,
    "upvote_ratio": 0.97,
    "url": "https://www.reuters.com/business/world-at-work/meta-planning-sweeping-layoffs-ai-costs-mount-2026-03-14/",
    "permalink": "https://www.reddit.com/r/technology/comments/1rt52qa/meta_planning_sweeping_layoffs_as_ai_costs_mount/",
    "author": "joe4942",
    "subreddit": "technology",
    "flair": "Business",
    "num_comments": 569,
    "awards": 0,
    "is_video": false,
    "domain": "reuters.com",
    "thumbnail": "https://external-preview.redd.it/...",
    "created_at": "1773448769"
  }
]
```
Output Fields Description
| Field | Type | Description |
|---|---|---|
| `post_id` | string | Unique Reddit post identifier |
| `title` | string | Post title |
| `text` | string/null | Self-post text content (null for link posts) |
| `score` | integer | Total upvotes minus downvotes |
| `upvote_ratio` | float | Fraction of votes that are upvotes (0.0 - 1.0) |
| `url` | string | Original link URL (for link posts) |
| `permalink` | string | Reddit post permalink |
| `author` | string | Post author username |
| `subreddit` | string | Subreddit name |
| `flair` | string/null | Post flair text |
| `num_comments` | integer | Number of comments on the post |
| `awards` | integer | Total awards received |
| `is_video` | boolean | Whether the post is a video |
| `domain` | string/null | Domain of the linked content |
| `thumbnail` | string/null | Thumbnail image URL |
| `created_at` | string | Unix timestamp of post creation |
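To illustrate working with these fields, the standard-library sketch below converts the `created_at` Unix timestamp to a UTC datetime and derives an approximate total-vote count from `score` and `upvote_ratio` (approximate because Reddit fuzzes vote counts; the helper is not part of the scraper):

```python
from datetime import datetime, timezone

# A trimmed record matching the output fields above; created_at is a
# Unix timestamp stored as a string.
record = {
    "post_id": "1rt52qa",
    "score": 4506,
    "upvote_ratio": 0.97,
    "created_at": "1773448769",
}

# Convert the timestamp string to a timezone-aware datetime.
created = datetime.fromtimestamp(int(record["created_at"]), tz=timezone.utc)
print(created.isoformat())

# Since score = ups - downs and upvote_ratio = ups / (ups + downs),
# total votes ≈ score / (2 * upvote_ratio - 1).
approx_votes = round(record["score"] / (2 * record["upvote_ratio"] - 1))
print(approx_votes)  # → 4794
```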
🔌 API Reference
Class: RedditScraper
Constructor
```python
scraper = RedditScraper(api_credentials=None, rate_limit=True)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_credentials` | dict | `None` | Reddit API credentials (`client_id`, `client_secret`) |
| `rate_limit` | boolean | `True` | Enable automatic rate limiting |
Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `scrape(config)` | `config: dict` | list | Main scraping method |
| `export_to_json(data, filename)` | `data: list`, `filename: str` | bool | Export data to a JSON file |
| `export_to_csv(data, filename)` | `data: list`, `filename: str` | bool | Export data to a CSV file |
| `export_to_xml(data, filename)` | `data: list`, `filename: str` | bool | Export data to an XML file |
| `validate_config(config)` | `config: dict` | bool | Validate configuration parameters |
| `get_subreddit_info(name)` | `name: str` | dict | Get subreddit metadata |
💡 Examples
Example 1: Scrape Top Posts from r/technology
```python
config = {
    "subreddit": "technology",
    "sort": "top",
    "time_filter": "week",
    "max_results": 50
}
results = scraper.scrape(config)
print(f"Scraped {len(results)} posts")
```
Example 2: Scrape with Comments
```python
config = {
    "subreddit": "programming",
    "include_comments": True,
    "sort": "hot",
    "max_results": 10
}
results = scraper.scrape(config)

for post in results:
    print(f"Post: {post['title']}")
    print(f"Comments: {len(post.get('comments', []))}")
```
Example 3: Multiple Subreddits
```python
subreddits = ["technology", "programming", "artificial"]

for subreddit in subreddits:
    config = {"subreddit": subreddit, "max_results": 25}
    results = scraper.scrape(config)
    scraper.export_to_json(results, f"{subreddit}_posts.json")
```
Example 4: With Proxy Configuration
```python
config = {
    "subreddit": "technology",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"]
    },
    "max_results": 100
}
results = scraper.scrape(config)
```
🔐 Proxy Support
This scraper supports advanced proxy configuration to help avoid rate limiting and IP bans.
Supported Proxy Types
| Proxy Type | Description | Best For |
|---|---|---|
| `RESIDENTIAL` | Real user IP addresses | High-volume scraping |
| `DATACENTER` | Datacenter IP addresses | Fast, cost-effective scraping |
Proxy Configuration
```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
```
Environment Variables
```bash
# .env file
APIFY_API_TOKEN=your_apify_token_here
PROXY_ENABLED=true
PROXY_GROUP=RESIDENTIAL
```
⏱️ Rate Limiting
To encourage responsible usage and avoid bans, the scraper includes built-in rate limiting:
| Action | Rate Limit | Recommendation |
|---|---|---|
| API Requests | 60/minute | Use proxy for higher limits |
| Post Scraping | 100/minute | Enable delays between requests |
| Comment Scraping | 50/minute | Use residential proxies |
Rate Limit Configuration
```python
scraper = RedditScraper(
    rate_limit=True,
    rate_limit_delay=1.0,  # seconds between requests
    max_retries=3
)
```
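To illustrate what delay-based rate limiting does, here is a minimal, self-contained sketch of a fixed-delay limiter. It mirrors the `rate_limit_delay` idea above but is not the scraper's internal implementation:

```python
import time

# Illustrative fixed-delay limiter: each call to wait() enforces at least
# `delay` seconds since the previous call.
class FixedDelayLimiter:
    def __init__(self, delay=1.0):
        self.delay = delay
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to keep `delay` seconds between calls."""
        remaining = self.delay - (time.monotonic() - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()

limiter = FixedDelayLimiter(delay=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # stand-in for one scraper request
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f}s")  # first call is free, then two 0.1s gaps
```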
🛠️ Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| 429 Too Many Requests | Enable proxy, increase delay between requests |
| 403 Forbidden | Check subreddit privacy settings, use API credentials |
| Empty Results | Verify subreddit name, check sort/time_filter values |
| Connection Timeout | Enable proxy, check network connection |
| Invalid JSON Output | Validate input configuration format |
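For the `429 Too Many Requests` case, a common client-side mitigation alongside proxies is exponential backoff. The sketch below is illustrative; `fetch` is a stand-in callable, not a scraper API:

```python
import time

# Retry a request with exponential backoff after HTTP 429 responses.
# `fetch` must return a (status_code, body) tuple.
def with_backoff(fetch, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...
    raise RuntimeError("still rate-limited after retries")

# Simulated endpoint that succeeds on the third call:
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return (429, None) if calls["n"] < 3 else (200, "ok")

print(with_backoff(fake_fetch))  # → ok
```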
Debug Mode
```bash
# Enable verbose logging
python reddit_scraper.py --subreddit technology --debug

# Check API status
python reddit_scraper.py --status-check
```
Log Files
Logs are saved to `./logs/scraper.log` by default. To configure the log level:
```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
🤝 Contributing
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Setup
```bash
# Clone your fork
git clone https://github.com/yourusername/reddit-scraper.git

# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Run linting
flake8 .
black .
```
Code Style
- Follow PEP 8 guidelines
- Add docstrings for all functions
- Write unit tests for new features
- Update documentation for changes
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
```text
MIT License

Copyright (c) 2026 Reddit Scraper

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
```
❓ FAQ
Q: Is this reddit scraper legal?
A: Yes, this tool only scrapes publicly available data and respects Reddit's API terms of service. Always use responsibly and comply with Reddit's guidelines.
Q: Do I need Reddit API credentials?
A: Not required, but recommended for higher rate limits and better reliability. You can get free API credentials at Reddit Apps.
Q: Can I scrape private subreddits?
A: No, this scraper only works with public subreddits. Private subreddits require authentication and are not supported.
Q: What's the maximum number of posts I can scrape?
A: Technically unlimited, but we recommend staying under 1000 posts per run to avoid rate limiting. Use pagination for larger datasets.
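One way to paginate a larger dataset is to plan a series of runs whose sizes stay under the recommended cap. The helper below is a hypothetical planning sketch (how each batch is offset or deduplicated depends on your setup):

```python
# Split a large target into successive run sizes of at most `batch_size`,
# per the recommendation above to stay under 1000 posts per run.
def plan_batches(total_posts, batch_size=1000):
    """Return the max_results value to use for each successive run."""
    full, remainder = divmod(total_posts, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

print(plan_batches(3500))  # → [1000, 1000, 1000, 500]
```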
Q: Does this work with Reddit's new API changes?
A: Yes, this tool is updated regularly to comply with Reddit's API changes. Check the releases page for the latest version.
Q: Can I scrape comments recursively?
A: Yes, enable include_comments: true in your configuration. Note that this increases API calls significantly.
Q: How do I report bugs or request features?
A: Please open an issue on our GitHub Issues page with detailed information.
📞 Support
- Documentation: https://reddit-scraper.readthedocs.io
- Issues: https://github.com/yourusername/reddit-scraper/issues
- Discussions: https://github.com/yourusername/reddit-scraper/discussions
- Email: support@reddit-scraper.com
🙏 Acknowledgments
- Reddit API - For providing the data access
- PRAW - Python Reddit API Wrapper
- Apify - Proxy services
- All contributors and supporters