Reddit Posts & Comments Scrape

Unlock the power of Reddit’s vast community discussions with this high-performance data extraction tool. Whether you are tracking trends, performing sentiment analysis, or generating niche leads, the Reddit Posts & Comments Scraper provides clean, structured data in seconds.

Pricing: $7.99/month + usage
Rating: 0.0 (0 reviews)
Developer: Scrape Pilot (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 days ago

🚀 Reddit Posts & Comments Scraper



📖 About

The Reddit Posts & Comments Scraper is a professional-grade tool designed to efficiently extract public posts, comments, and metadata from Reddit subreddits. Whether you're conducting market research, performing sentiment analysis, or building data-driven applications, it provides reliable, structured data extraction.

This tool is built with scalability and compliance in mind, respecting Reddit's API guidelines while delivering high-performance data extraction for developers, researchers, and businesses.


✨ Features

| Feature | Description |
|---|---|
| 🎯 Targeted Scraping | Extract posts from specific subreddits with custom filters |
| 💬 Comment Extraction | Optional comment scraping for deeper insights |
| 🔒 Proxy Support | Residential & datacenter proxy configuration included |
| 📊 Rich Metadata | Get scores, upvote ratios, authors, flairs, and more |
| 🔄 Multiple Sort Options | Sort by hot, new, top, rising, and controversial |
| ⏱️ Time Filtering | Filter posts by hour, day, week, month, year, or all time |
| 📁 Multiple Formats | Export data in JSON, CSV, or XML formats |
| 🚀 High Performance | Optimized for large-scale data extraction |
| 🛡️ Rate Limiting | Built-in rate limiting to avoid IP bans |
| 📝 Detailed Logging | Comprehensive logging for debugging and monitoring |

📦 Installation

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)
  • Reddit API credentials (optional but recommended)

Step-by-Step Installation

# 1. Clone the repository
git clone https://github.com/yourusername/reddit-scraper.git
cd reddit-scraper
# 2. Create a virtual environment (recommended)
python -m venv venv
# 3. Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# 4. Install dependencies
pip install -r requirements.txt
# 5. Verify installation
python reddit_scraper.py --version

Requirements File (requirements.txt)

requests>=2.28.0
praw>=7.7.0
pandas>=1.5.0
beautifulsoup4>=4.11.0
lxml>=4.9.0
python-dotenv>=1.0.0
apify-client>=1.0.0

⚡ Quick Start

Basic Usage

from reddit_scraper import RedditScraper

# Initialize the scraper
scraper = RedditScraper()

# Define your configuration
config = {
    "include_comments": False,
    "subreddit": "technology",
    "sort": "hot",
    "time_filter": "all",
    "max_results": 25
}

# Run the scraper
results = scraper.scrape(config)

# Export to JSON
scraper.export_to_json(results, "output.json")

# Export to CSV
scraper.export_to_csv(results, "output.csv")

Command Line Usage

# Basic scrape
python reddit_scraper.py --subreddit technology --max-results 25
# With comments
python reddit_scraper.py --subreddit technology --include-comments --max-results 50
# With custom sort and time filter
python reddit_scraper.py --subreddit technology --sort top --time-filter week --max-results 100
# With proxy configuration
python reddit_scraper.py --subreddit technology --use-proxy --proxy-group RESIDENTIAL
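For orientation, here is a sketch of how the flags shown above could map onto the scraper's config dict using `argparse`. This is an illustrative assumption, not the actor's actual CLI code, and the flag set is limited to the documented options:

```python
import argparse

def build_config(argv=None):
    """Sketch: translate the documented CLI flags into a config dict."""
    parser = argparse.ArgumentParser(description="Reddit Posts & Comments Scraper")
    parser.add_argument("--subreddit", required=True, help="Target subreddit name")
    parser.add_argument("--sort", default="hot",
                        choices=["hot", "new", "top", "rising", "controversial"])
    parser.add_argument("--time-filter", default="all",
                        choices=["hour", "day", "week", "month", "year", "all"])
    parser.add_argument("--max-results", type=int, default=25)
    parser.add_argument("--include-comments", action="store_true")
    args = parser.parse_args(argv)
    return {
        "subreddit": args.subreddit,
        "sort": args.sort,
        "time_filter": args.time_filter,
        "max_results": args.max_results,
        "include_comments": args.include_comments,
    }

# Mirrors the "custom sort and time filter" invocation above
config = build_config(["--subreddit", "technology", "--sort", "top",
                       "--time-filter", "week", "--max-results", "100"])
```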

⚙️ Configuration

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| subreddit | string | ✅ Yes | - | Target subreddit name (e.g., "technology") |
| include_comments | boolean | ❌ No | false | Whether to scrape comments for each post |
| sort | string | ❌ No | "hot" | Sort order: hot, new, top, rising, controversial |
| time_filter | string | ❌ No | "all" | Time range: hour, day, week, month, year, all |
| max_results | integer | ❌ No | 25 | Maximum number of posts to scrape (1-1000) |
| proxyConfiguration.useApifyProxy | boolean | ❌ No | false | Enable Apify proxy service |
| proxyConfiguration.apifyProxyGroups | array | ❌ No | [] | Proxy groups: RESIDENTIAL, DATACENTER |
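The constraints in this table can be checked up front before a run. The snippet below is a stand-alone sketch of the checks the table implies, not the actor's own `validate_config` implementation:

```python
VALID_SORTS = {"hot", "new", "top", "rising", "controversial"}
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def validate_config(config):
    """Sketch: enforce the documented parameter constraints."""
    if not isinstance(config.get("subreddit"), str) or not config["subreddit"]:
        return False  # subreddit is the only required field
    if config.get("sort", "hot") not in VALID_SORTS:
        return False
    if config.get("time_filter", "all") not in VALID_TIME_FILTERS:
        return False
    max_results = config.get("max_results", 25)
    if not isinstance(max_results, int) or not 1 <= max_results <= 1000:
        return False  # documented range is 1-1000
    return True

assert validate_config({"subreddit": "technology"})                      # defaults pass
assert not validate_config({"subreddit": "technology", "sort": "best"})  # invalid sort
```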

Example Configuration

{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}

📥 Input/Output Format

Input Example

{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}

Output Example

[
  {
    "post_id": "1rt52qa",
    "title": "Meta planning sweeping layoffs as AI costs mount",
    "text": null,
    "score": 4506,
    "upvote_ratio": 0.97,
    "url": "https://www.reuters.com/business/world-at-work/meta-planning-sweeping-layoffs-ai-costs-mount-2026-03-14/",
    "permalink": "https://www.reddit.com/r/technology/comments/1rt52qa/meta_planning_sweeping_layoffs_as_ai_costs_mount/",
    "author": "joe4942",
    "subreddit": "technology",
    "flair": "Business",
    "num_comments": 569,
    "awards": 0,
    "is_video": false,
    "domain": "reuters.com",
    "thumbnail": "https://external-preview.redd.it/...",
    "created_at": "1773448769"
  }
]

Output Fields Description

| Field | Type | Description |
|---|---|---|
| post_id | string | Unique Reddit post identifier |
| title | string | Post title |
| text | string/null | Self-post text content (null for link posts) |
| score | integer | Total upvotes minus downvotes |
| upvote_ratio | float | Percentage of upvotes (0.0 - 1.0) |
| url | string | Original link URL (for link posts) |
| permalink | string | Reddit post permalink |
| author | string | Post author username |
| subreddit | string | Subreddit name |
| flair | string/null | Post flair text |
| num_comments | integer | Number of comments on the post |
| awards | integer | Total awards received |
| is_video | boolean | Whether the post is a video |
| domain | string/null | Domain of the linked content |
| thumbnail | string/null | Thumbnail image URL |
| created_at | string | Unix timestamp of post creation |
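Because `created_at` is a Unix timestamp delivered as a string, convert it before doing any date arithmetic. A minimal example using the timestamp from the output above:

```python
from datetime import datetime, timezone

# created_at arrives as a string; parse it into an aware UTC datetime.
post = {"created_at": "1773448769"}
created = datetime.fromtimestamp(int(post["created_at"]), tz=timezone.utc)
print(created.isoformat())  # → 2026-03-14T00:39:29+00:00
```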

🔌 API Reference

Class: RedditScraper

Constructor

scraper = RedditScraper(api_credentials=None, rate_limit=True)

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_credentials | dict | None | Reddit API credentials (client_id, client_secret) |
| rate_limit | boolean | True | Enable automatic rate limiting |

Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| scrape(config) | config: dict | list | Main scraping method |
| export_to_json(data, filename) | data: list, filename: str | bool | Export data to JSON file |
| export_to_csv(data, filename) | data: list, filename: str | bool | Export data to CSV file |
| export_to_xml(data, filename) | data: list, filename: str | bool | Export data to XML file |
| validate_config(config) | config: dict | bool | Validate configuration parameters |
| get_subreddit_info(name) | name: str | dict | Get subreddit metadata |

💡 Examples

Example 1: Scrape Top Posts from r/technology

config = {
    "subreddit": "technology",
    "sort": "top",
    "time_filter": "week",
    "max_results": 50
}
results = scraper.scrape(config)
print(f"Scraped {len(results)} posts")

Example 2: Scrape with Comments

config = {
    "subreddit": "programming",
    "include_comments": True,
    "sort": "hot",
    "max_results": 10
}
results = scraper.scrape(config)
for post in results:
    print(f"Post: {post['title']}")
    print(f"Comments: {len(post.get('comments', []))}")

Example 3: Multiple Subreddits

subreddits = ["technology", "programming", "artificial"]
for subreddit in subreddits:
    config = {
        "subreddit": subreddit,
        "max_results": 25
    }
    results = scraper.scrape(config)
    scraper.export_to_json(results, f"{subreddit}_posts.json")
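When combining results from several runs or subreddits, the same post can show up more than once; deduplicating on `post_id` avoids double-counting. A stdlib-only sketch (the sample data here is made up for illustration):

```python
def merge_results(batches):
    """Merge several result lists, keeping the first copy of each post_id."""
    seen, merged = set(), []
    for batch in batches:
        for post in batch:
            if post["post_id"] not in seen:
                seen.add(post["post_id"])
                merged.append(post)
    return merged

# Hypothetical batches, e.g. loaded from the per-subreddit JSON files above
batches = [
    [{"post_id": "a1", "title": "first"}, {"post_id": "b2", "title": "second"}],
    [{"post_id": "b2", "title": "second"}, {"post_id": "c3", "title": "third"}],
]
combined = merge_results(batches)
print(len(combined))  # → 3
```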

Example 4: With Proxy Configuration

config = {
    "subreddit": "technology",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"]
    },
    "max_results": 100
}
results = scraper.scrape(config)

🔐 Proxy Support

This scraper supports advanced proxy configurations to avoid rate limiting and IP bans.

Supported Proxy Types

| Proxy Type | Description | Best For |
|---|---|---|
| RESIDENTIAL | Real user IP addresses | High-volume scraping |
| DATACENTER | Datacenter IP addresses | Fast, cost-effective scraping |

Proxy Configuration

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}

Environment Variables

# .env file
APIFY_API_TOKEN=your_apify_token_here
PROXY_ENABLED=true
PROXY_GROUP=RESIDENTIAL
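`python-dotenv` (already listed in requirements.txt) is the usual way to load this file via `dotenv.load_dotenv()`. Purely for illustration of the expected `KEY=value` format, here is a minimal stdlib-only loader; note that unlike real dotenv it overwrites keys that are already set:

```python
import os
import tempfile

def load_env(path=".env"):
    """Sketch of what dotenv.load_dotenv() does for simple KEY=value files."""
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Demo against a throwaway file mirroring the .env shown above
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# .env file\nPROXY_ENABLED=true\nPROXY_GROUP=RESIDENTIAL\n")
load_env(fh.name)
print(os.environ["PROXY_GROUP"])  # → RESIDENTIAL
```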

⏱️ Rate Limiting

To ensure responsible usage and avoid bans, the scraper includes built-in rate limiting:

| Action | Rate Limit | Recommendation |
|---|---|---|
| API Requests | 60/minute | Use proxy for higher limits |
| Post Scraping | 100/minute | Enable delays between requests |
| Comment Scraping | 50/minute | Use residential proxies |

Rate Limit Configuration

scraper = RedditScraper(
    rate_limit=True,
    rate_limit_delay=1.0,  # seconds between requests
    max_retries=3
)
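The `rate_limit_delay` / `max_retries` semantics boil down to sleeping between requests and retrying transient failures with backoff. The following is a sketch of that pattern with a simulated flaky call, not the actor's internal implementation:

```python
import time

def call_with_rate_limit(fn, rate_limit_delay=1.0, max_retries=3):
    """Sketch: pause between requests, retry transient errors with backoff."""
    for attempt in range(max_retries + 1):
        try:
            result = fn()
            time.sleep(rate_limit_delay)  # pause before the next request
            return result
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            time.sleep(rate_limit_delay * (2 ** attempt))  # exponential backoff

# Simulated request that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated 429")
    return "ok"

result = call_with_rate_limit(flaky, rate_limit_delay=0.01)
print(result)  # → ok
```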

🛠️ Troubleshooting

Common Issues

| Issue | Solution |
|---|---|
| 429 Too Many Requests | Enable proxy, increase delay between requests |
| 403 Forbidden | Check subreddit privacy settings, use API credentials |
| Empty Results | Verify subreddit name, check sort/time_filter values |
| Connection Timeout | Enable proxy, check network connection |
| Invalid JSON Output | Validate input configuration format |

Debug Mode

# Enable verbose logging
python reddit_scraper.py --subreddit technology --debug
# Check API status
python reddit_scraper.py --status-check

Log Files

Logs are saved in ./logs/scraper.log by default. Configure log level:

import logging
logging.basicConfig(level=logging.DEBUG)
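To write to the documented `./logs/scraper.log` location rather than the console, attach a file handler. A minimal sketch (the logger name and format here are assumptions, not the actor's actual setup):

```python
import logging
import os

# Sketch: route log records to the documented ./logs/scraper.log path
os.makedirs("logs", exist_ok=True)
handler = logging.FileHandler("logs/scraper.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("reddit_scraper")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.debug("scrape started")
```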

🤝 Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Clone your fork
git clone https://github.com/yourusername/reddit-scraper.git
# Install dev dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/
# Run linting
flake8 .
black .

Code Style

  • Follow PEP 8 guidelines
  • Add docstrings for all functions
  • Write unit tests for new features
  • Update documentation for changes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License
Copyright (c) 2026 Reddit Scraper
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

❓ FAQ

Q: Is it legal to scrape Reddit?

A: Yes, this tool only scrapes publicly available data and respects Reddit's API terms of service. Always use responsibly and comply with Reddit's guidelines.

Q: Do I need Reddit API credentials?

A: Not required, but recommended for higher rate limits and better reliability. You can get free API credentials at Reddit Apps.

Q: Can I scrape private subreddits?

A: No, this scraper only works with public subreddits. Private subreddits require authentication and are not supported.

Q: What's the maximum number of posts I can scrape?

A: Technically unlimited, but we recommend staying under 1000 posts per run to avoid rate limiting. Use pagination for larger datasets.

Q: Does this work with Reddit's new API changes?

A: Yes, this tool is updated regularly to comply with Reddit's API changes. Check the releases page for the latest version.

Q: Can I scrape comments recursively?

A: Yes, enable include_comments: true in your configuration. Note that this increases API calls significantly.

Q: How do I report bugs or request features?

A: Please open an issue on our GitHub Issues page with detailed information.


📞 Support


🙏 Acknowledgments

  • Reddit API - For providing the data access
  • PRAW - Python Reddit API Wrapper
  • Apify - Proxy services
  • All contributors and supporters

