# Reddit Scraper | Posts, Search, Comments & User Data

Extract structured Reddit data from subreddits, search results, single posts, and user profiles. Get titles, text, scores, upvote ratios, comments, authors, flairs, timestamps, and more in clean JSON. Built for research, monitoring, trend tracking, and automation.

- **Pricing:** $5.99/month + usage
- **Developer:** Scrape Pilot
# Reddit Posts & Comments Scraper
## Table of Contents
- About
- Features
- Installation
- Quick Start
- Configuration
- Input/Output Format
- API Reference
- Examples
- Proxy Support
- Rate Limiting
- Troubleshooting
- Contributing
- License
- FAQ
## About

The Reddit Posts & Comments Scraper is a professional-grade tool designed to efficiently extract public posts, comments, and metadata from Reddit subreddits. Whether you're conducting market research, running sentiment analysis, or building data-driven applications, this scraper provides reliable, structured data extraction.

It is built with scalability and compliance in mind: it respects Reddit's API guidelines while delivering high-performance data extraction for developers, researchers, and businesses.
## Features

| Feature | Description |
|---|---|
| Targeted Scraping | Extract posts from specific subreddits with custom filters |
| Comment Extraction | Optional comment scraping for deeper insights |
| Proxy Support | Residential and datacenter proxy configuration included |
| Rich Metadata | Get scores, upvote ratios, authors, flairs, and more |
| Multiple Sort Options | Sort by hot, new, top, rising, or controversial |
| Time Filtering | Filter posts by hour, day, week, month, year, or all time |
| Multiple Formats | Export data in JSON, CSV, or XML |
| High Performance | Optimized for large-scale data extraction |
| Rate Limiting | Built-in rate limiting to avoid IP bans |
| Detailed Logging | Comprehensive logging for debugging and monitoring |
## Quick Start

### Basic Usage

```python
from reddit_scraper import RedditScraper

# Initialize the scraper
scraper = RedditScraper()

# Define your configuration
config = {
    "subreddit": "technology",
    "include_comments": False,
    "sort": "hot",
    "time_filter": "all",
    "max_results": 25,
}

# Run the scraper
results = scraper.scrape(config)

# Export to JSON
scraper.export_to_json(results, "output.json")

# Export to CSV
scraper.export_to_csv(results, "output.csv")
```
### Command Line Usage

```bash
# Basic scrape
python reddit_scraper.py --subreddit technology --max-results 25

# With comments
python reddit_scraper.py --subreddit technology --include-comments --max-results 50

# With custom sort and time filter
python reddit_scraper.py --subreddit technology --sort top --time-filter week --max-results 100

# With proxy configuration
python reddit_scraper.py --subreddit technology --use-proxy --proxy-group RESIDENTIAL
```
## Configuration

### Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `subreddit` | string | Yes | - | Target subreddit name (e.g., "technology") |
| `include_comments` | boolean | No | `false` | Whether to scrape comments for each post |
| `sort` | string | No | `"hot"` | Sort order: `hot`, `new`, `top`, `rising`, `controversial` |
| `time_filter` | string | No | `"all"` | Time range: `hour`, `day`, `week`, `month`, `year`, `all` |
| `max_results` | integer | No | `25` | Maximum number of posts to scrape (1-1000) |
| `proxyConfiguration.useApifyProxy` | boolean | No | `false` | Enable the Apify proxy service |
| `proxyConfiguration.apifyProxyGroups` | array | No | `[]` | Proxy groups: `RESIDENTIAL`, `DATACENTER` |

### Example Configuration

```json
{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}
```
## Input/Output Format

### Input Example

```json
{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}
```

### Output Example

```json
[
  {
    "post_id": "1rt52qa",
    "title": "Meta planning sweeping layoffs as AI costs mount",
    "text": null,
    "score": 4506,
    "upvote_ratio": 0.97,
    "url": "https://www.reuters.com/business/world-at-work/meta-planning-sweeping-layoffs-ai-costs-mount-2026-03-14/",
    "permalink": "https://www.reddit.com/r/technology/comments/1rt52qa/meta_planning_sweeping_layoffs_as_ai_costs_mount/",
    "author": "joe4942",
    "subreddit": "technology",
    "flair": "Business",
    "num_comments": 569,
    "awards": 0,
    "is_video": false,
    "domain": "reuters.com",
    "thumbnail": "https://external-preview.redd.it/...",
    "created_at": "1773448769"
  }
]
```
### Output Fields Description

| Field | Type | Description |
|---|---|---|
| `post_id` | string | Unique Reddit post identifier |
| `title` | string | Post title |
| `text` | string/null | Self-post text content (null for link posts) |
| `score` | integer | Total upvotes minus downvotes |
| `upvote_ratio` | float | Fraction of votes that are upvotes (0.0 - 1.0) |
| `url` | string | Original link URL (for link posts) |
| `permalink` | string | Reddit post permalink |
| `author` | string | Post author username |
| `subreddit` | string | Subreddit name |
| `flair` | string/null | Post flair text |
| `num_comments` | integer | Number of comments on the post |
| `awards` | integer | Total awards received |
| `is_video` | boolean | Whether the post is a video |
| `domain` | string/null | Domain of the linked content |
| `thumbnail` | string/null | Thumbnail image URL |
| `created_at` | string | Unix timestamp of post creation |
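To show how these fields are typically consumed, here is a small, self-contained sketch that filters records shaped like the output above by `upvote_ratio` and converts `created_at` to a datetime. The sample values are illustrative, not real scrape results:

```python
from datetime import datetime, timezone

# Sample records shaped like the scraper's documented output.
posts = [
    {"post_id": "abc123", "title": "Example A", "score": 4506,
     "upvote_ratio": 0.97, "num_comments": 569, "created_at": "1773448769"},
    {"post_id": "def456", "title": "Example B", "score": 120,
     "upvote_ratio": 0.81, "num_comments": 14, "created_at": "1700000000"},
]

# Keep only well-received posts and convert the Unix timestamp to a datetime.
popular = [p for p in posts if p["upvote_ratio"] >= 0.9]
for p in popular:
    p["created_dt"] = datetime.fromtimestamp(int(p["created_at"]), tz=timezone.utc)

print([p["post_id"] for p in popular])
```

Note that `created_at` is a string in the output, so it must be cast to `int` before conversion.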
## API Reference

### Class: `RedditScraper`

#### Constructor

```python
scraper = RedditScraper(api_credentials=None, rate_limit=True)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_credentials` | dict | `None` | Reddit API credentials (`client_id`, `client_secret`) |
| `rate_limit` | boolean | `True` | Enable automatic rate limiting |
#### Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| `scrape(config)` | `config: dict` | `list` | Main scraping method |
| `export_to_json(data, filename)` | `data: list`, `filename: str` | `bool` | Export data to a JSON file |
| `export_to_csv(data, filename)` | `data: list`, `filename: str` | `bool` | Export data to a CSV file |
| `export_to_xml(data, filename)` | `data: list`, `filename: str` | `bool` | Export data to an XML file |
| `validate_config(config)` | `config: dict` | `bool` | Validate configuration parameters |
| `get_subreddit_info(name)` | `name: str` | `dict` | Get subreddit metadata |
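The kind of checks `validate_config` performs can be illustrated with a standalone sketch. This is not the library's actual implementation, just a minimal validator enforcing the documented parameter constraints:

```python
# Allowed values taken from the Input Parameters table above.
VALID_SORTS = {"hot", "new", "top", "rising", "controversial"}
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def validate_config(config: dict) -> bool:
    """Return True if config satisfies the documented parameter rules."""
    if not config.get("subreddit"):          # subreddit is required
        return False
    if config.get("sort", "hot") not in VALID_SORTS:
        return False
    if config.get("time_filter", "all") not in VALID_TIME_FILTERS:
        return False
    max_results = config.get("max_results", 25)
    return isinstance(max_results, int) and 1 <= max_results <= 1000

print(validate_config({"subreddit": "technology"}))       # → True
print(validate_config({"subreddit": "", "sort": "hot"}))  # → False
```

Running validation before `scrape()` surfaces bad input early instead of failing mid-run.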
## Examples

### Example 1: Scrape Top Posts from r/technology

```python
config = {
    "subreddit": "technology",
    "sort": "top",
    "time_filter": "week",
    "max_results": 50,
}
results = scraper.scrape(config)
print(f"Scraped {len(results)} posts")
```
### Example 2: Scrape with Comments

```python
config = {
    "subreddit": "programming",
    "include_comments": True,
    "sort": "hot",
    "max_results": 10,
}
results = scraper.scrape(config)
for post in results:
    print(f"Post: {post['title']}")
    print(f"Comments: {len(post.get('comments', []))}")
```
### Example 3: Multiple Subreddits

```python
subreddits = ["technology", "programming", "artificial"]
for subreddit in subreddits:
    config = {"subreddit": subreddit, "max_results": 25}
    results = scraper.scrape(config)
    scraper.export_to_json(results, f"{subreddit}_posts.json")
```
### Example 4: With Proxy Configuration

```python
config = {
    "subreddit": "technology",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
    "max_results": 100,
}
results = scraper.scrape(config)
```
## Proxy Support

This reddit scraper supports advanced proxy configurations to avoid rate limiting and IP bans.

### Supported Proxy Types

| Proxy Type | Description | Best For |
|---|---|---|
| `RESIDENTIAL` | Real user IP addresses | High-volume scraping |
| `DATACENTER` | Datacenter IP addresses | Fast, cost-effective scraping |

### Proxy Configuration

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
```
### Environment Variables

```bash
# .env file
APIFY_API_TOKEN=your_apify_token_here
PROXY_ENABLED=true
PROXY_GROUP=RESIDENTIAL
```
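A common pattern is to translate these environment variables into the `proxyConfiguration` input at runtime. The helper below is a hypothetical sketch (not part of the scraper's API) using only the standard library:

```python
import os

def load_proxy_settings() -> dict:
    """Build a proxyConfiguration dict from environment variables,
    falling back to the defaults documented above."""
    return {
        "useApifyProxy": os.environ.get("PROXY_ENABLED", "false").lower() == "true",
        "apifyProxyGroups": [os.environ.get("PROXY_GROUP", "DATACENTER")],
    }

# Simulate the .env values shown above.
os.environ["PROXY_ENABLED"] = "true"
os.environ["PROXY_GROUP"] = "RESIDENTIAL"
print(load_proxy_settings())
```

In practice you would load the `.env` file first (e.g., with `python-dotenv`) so these variables are present in `os.environ`.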
## Rate Limiting
To ensure responsible usage and avoid bans, this reddit scraper includes built-in rate limiting:
| Action | Rate Limit | Recommendation |
|---|---|---|
| API Requests | 60/minute | Use proxy for higher limits |
| Post Scraping | 100/minute | Enable delays between requests |
| Comment Scraping | 50/minute | Use residential proxies |
### Rate Limit Configuration

```python
scraper = RedditScraper(
    rate_limit=True,
    rate_limit_delay=1.0,  # seconds between requests
    max_retries=3,
)
```
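The effect of `rate_limit_delay` can be sketched with a minimal delay-based limiter. This illustrates the idea only; it is not the scraper's internal implementation:

```python
import time

class RateLimiter:
    """Minimal limiter: enforce a fixed gap between consecutive requests."""

    def __init__(self, delay: float = 1.0):
        self.delay = delay
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough that `delay` seconds separate requests.
        elapsed = time.monotonic() - self._last
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self._last = time.monotonic()

limiter = RateLimiter(delay=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # each call after the first blocks until 0.1 s has passed
elapsed = time.monotonic() - start
```

With `delay=1.0` this caps throughput at roughly 60 requests/minute, matching the API limit in the table above.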
## Troubleshooting

### Common Issues
| Issue | Solution |
|---|---|
| 429 Too Many Requests | Enable proxy, increase delay between requests |
| 403 Forbidden | Check subreddit privacy settings, use API credentials |
| Empty Results | Verify subreddit name, check sort/time_filter values |
| Connection Timeout | Enable proxy, check network connection |
| Invalid JSON Output | Validate input configuration format |
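For the `429 Too Many Requests` case, a common remedy besides proxies is exponential backoff. Below is a self-contained sketch; the `fetch` callable is a stand-in for whatever request function you use:

```python
import time

def fetch_with_backoff(fetch, max_retries: int = 3, base_delay: float = 1.0):
    """Retry `fetch` on a 429 status, doubling the delay each attempt."""
    for attempt in range(max_retries):
        status, body = fetch()
        if status != 429:
            return body
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("rate limited after retries")

# Fake fetcher for demonstration: returns 429 twice, then succeeds.
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return (429, None) if calls["n"] < 3 else (200, "ok")

result = fetch_with_backoff(fake_fetch, base_delay=0.01)
print(result)  # → ok
```

Combined with residential proxies, backoff usually clears transient 429s without manual intervention.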
### Debug Mode

```bash
# Enable verbose logging
python reddit_scraper.py --subreddit technology --debug

# Check API status
python reddit_scraper.py --status-check
```
### Log Files

Logs are saved to `./logs/scraper.log` by default. To configure the log level:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
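If you want full DEBUG detail in the log file but only INFO on the console, a standard two-handler setup works. This is a generic `logging` recipe, not scraper-specific code:

```python
import logging
import os

os.makedirs("logs", exist_ok=True)

logger = logging.getLogger("reddit_scraper")
logger.setLevel(logging.DEBUG)

# File handler captures everything; console handler stays at INFO.
file_handler = logging.FileHandler("logs/scraper.log")
file_handler.setLevel(logging.DEBUG)
console = logging.StreamHandler()
console.setLevel(logging.INFO)

fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
for handler in (file_handler, console):
    handler.setFormatter(fmt)
    logger.addHandler(handler)

logger.debug("debug detail goes to the file only")
logger.info("info goes to both file and console")
```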
## Contributing

We welcome contributions! Here's how you can help:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Development Setup

```bash
# Clone your fork
git clone https://github.com/yourusername/reddit-scraper.git

# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Run linting
flake8 .
black .
```
### Code Style
- Follow PEP 8 guidelines
- Add docstrings for all functions
- Write unit tests for new features
- Update documentation for changes