Reddit Scraper | Posts, Search, Comments & User Data

Pricing: $5.99/month + usage

Extract structured Reddit data from subreddits, search results, single posts, and user profiles. Get titles, text, scores, upvote ratios, comments, authors, flairs, timestamps, and more in clean JSON. Built for research, monitoring, trend tracking, and automation.

Developer: Scrape Pilot (Maintained by Community)


πŸš€ Reddit Posts & Comments Scraper



πŸ“– About

The Reddit Posts & Comments Scraper is a professional-grade tool for efficiently extracting public posts, comments, and metadata from Reddit subreddits. Whether you're conducting market research, running sentiment analysis, or building data-driven applications, it provides reliable, structured data extraction.

This tool is built with scalability and compliance in mind, respecting Reddit's API guidelines while delivering high-performance data extraction for developers, researchers, and businesses.


✨ Features

| Feature | Description |
| --- | --- |
| 🎯 Targeted Scraping | Extract posts from specific subreddits with custom filters |
| πŸ’¬ Comment Extraction | Optional comment scraping for deeper insights |
| πŸ”’ Proxy Support | Residential & datacenter proxy configuration included |
| πŸ“Š Rich Metadata | Get scores, upvote ratios, authors, flairs, and more |
| πŸ”„ Multiple Sort Options | Sort by hot, new, top, rising, and controversial |
| ⏱️ Time Filtering | Filter posts by hour, day, week, month, year, or all time |
| πŸ“ Multiple Formats | Export data in JSON, CSV, or XML formats |
| πŸš€ High Performance | Optimized for large-scale data extraction |
| πŸ›‘οΈ Rate Limiting | Built-in rate limiting to avoid IP bans |
| πŸ“ Detailed Logging | Comprehensive logging for debugging and monitoring |

⚑ Quick Start

Basic Usage

```python
from reddit_scraper import RedditScraper

# Initialize the scraper
scraper = RedditScraper()

# Define your configuration
config = {
    "include_comments": False,
    "subreddit": "technology",
    "sort": "hot",
    "time_filter": "all",
    "max_results": 25
}

# Run the scraper
results = scraper.scrape(config)

# Export to JSON
scraper.export_to_json(results, "output.json")

# Export to CSV
scraper.export_to_csv(results, "output.csv")
```

Command Line Usage

```bash
# Basic scrape
python reddit_scraper.py --subreddit technology --max-results 25

# With comments
python reddit_scraper.py --subreddit technology --include-comments --max-results 50

# With custom sort and time filter
python reddit_scraper.py --subreddit technology --sort top --time-filter week --max-results 100

# With proxy configuration
python reddit_scraper.py --subreddit technology --use-proxy --proxy-group RESIDENTIAL
```

βš™οΈ Configuration

Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `subreddit` | string | βœ… Yes | - | Target subreddit name (e.g., "technology") |
| `include_comments` | boolean | ❌ No | `false` | Whether to scrape comments for each post |
| `sort` | string | ❌ No | `"hot"` | Sort order: `hot`, `new`, `top`, `rising`, `controversial` |
| `time_filter` | string | ❌ No | `"all"` | Time range: `hour`, `day`, `week`, `month`, `year`, `all` |
| `max_results` | integer | ❌ No | `25` | Maximum number of posts to scrape (1-1000) |
| `proxyConfiguration.useApifyProxy` | boolean | ❌ No | `false` | Enable Apify proxy service |
| `proxyConfiguration.apifyProxyGroups` | array | ❌ No | `[]` | Proxy groups: `RESIDENTIAL`, `DATACENTER` |

Example Configuration

```json
{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}
```

πŸ“₯ Input/Output Format

Input Example

```json
{
  "include_comments": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "subreddit": "technology",
  "sort": "hot",
  "time_filter": "all",
  "max_results": 25
}
```

Output Example

```json
[
  {
    "post_id": "1rt52qa",
    "title": "Meta planning sweeping layoffs as AI costs mount",
    "text": null,
    "score": 4506,
    "upvote_ratio": 0.97,
    "url": "https://www.reuters.com/business/world-at-work/meta-planning-sweeping-layoffs-ai-costs-mount-2026-03-14/",
    "permalink": "https://www.reddit.com/r/technology/comments/1rt52qa/meta_planning_sweeping_layoffs_as_ai_costs_mount/",
    "author": "joe4942",
    "subreddit": "technology",
    "flair": "Business",
    "num_comments": 569,
    "awards": 0,
    "is_video": false,
    "domain": "reuters.com",
    "thumbnail": "https://external-preview.redd.it/...",
    "created_at": "1773448769"
  }
]
```

Output Fields Description

| Field | Type | Description |
| --- | --- | --- |
| `post_id` | string | Unique Reddit post identifier |
| `title` | string | Post title |
| `text` | string/null | Self-post text content (null for link posts) |
| `score` | integer | Total upvotes minus downvotes |
| `upvote_ratio` | float | Fraction of votes that are upvotes (0.0 to 1.0) |
| `url` | string | Original link URL (for link posts) |
| `permalink` | string | Reddit post permalink |
| `author` | string | Post author username |
| `subreddit` | string | Subreddit name |
| `flair` | string/null | Post flair text |
| `num_comments` | integer | Number of comments on the post |
| `awards` | integer | Total awards received |
| `is_video` | boolean | Whether the post is a video |
| `domain` | string/null | Domain of the linked content |
| `thumbnail` | string/null | Thumbnail image URL |
| `created_at` | string | Unix timestamp of post creation |
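Note that `created_at` is a Unix timestamp serialized as a string, so downstream code usually needs to convert it. A small stdlib-only sketch of post-processing one record (the `post` dict below is a trimmed, hypothetical example; the upvote estimate is derived algebraically from the `score` and `upvote_ratio` definitions above, not a field the scraper returns):

```python
from datetime import datetime, timezone

# Hypothetical output record, trimmed to the fields used below
post = {
    "score": 4506,
    "upvote_ratio": 0.97,
    "created_at": "1773448769",
}

# created_at is a Unix timestamp stored as a string; convert to an aware datetime
created = datetime.fromtimestamp(int(post["created_at"]), tz=timezone.utc)

# Estimate raw upvotes: score = ups - downs and ratio = ups / (ups + downs)
# imply ups = score * ratio / (2 * ratio - 1), valid only when ratio > 0.5
ratio = post["upvote_ratio"]
ups = round(post["score"] * ratio / (2 * ratio - 1)) if ratio > 0.5 else None

print(created.isoformat(), ups)
```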

πŸ”Œ API Reference

Class: RedditScraper

Constructor

```python
scraper = RedditScraper(api_credentials=None, rate_limit=True)
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_credentials` | dict | `None` | Reddit API credentials (`client_id`, `client_secret`) |
| `rate_limit` | boolean | `True` | Enable automatic rate limiting |

Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| `scrape(config)` | `config: dict` | `list` | Main scraping method |
| `export_to_json(data, filename)` | `data: list`, `filename: str` | `bool` | Export data to a JSON file |
| `export_to_csv(data, filename)` | `data: list`, `filename: str` | `bool` | Export data to a CSV file |
| `export_to_xml(data, filename)` | `data: list`, `filename: str` | `bool` | Export data to an XML file |
| `validate_config(config)` | `config: dict` | `bool` | Validate configuration parameters |
| `get_subreddit_info(name)` | `name: str` | `dict` | Get subreddit metadata |
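To make the `validate_config` contract concrete, here is a minimal, self-contained sketch of the checks such a method could perform, derived from the input parameter table earlier in this README. This is assumed behavior for illustration, not the shipped implementation:

```python
VALID_SORTS = {"hot", "new", "top", "rising", "controversial"}
VALID_TIME_FILTERS = {"hour", "day", "week", "month", "year", "all"}

def validate_config(config: dict) -> bool:
    """Sketch of config validation mirroring the documented constraints."""
    # subreddit is the only required field and must be a non-empty string
    if not isinstance(config.get("subreddit"), str) or not config["subreddit"]:
        return False
    # sort and time_filter fall back to their documented defaults
    if config.get("sort", "hot") not in VALID_SORTS:
        return False
    if config.get("time_filter", "all") not in VALID_TIME_FILTERS:
        return False
    # max_results must stay within the documented 1-1000 range
    max_results = config.get("max_results", 25)
    return isinstance(max_results, int) and 1 <= max_results <= 1000

print(validate_config({"subreddit": "technology", "sort": "top", "max_results": 50}))
print(validate_config({"subreddit": "technology", "sort": "best"}))
```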

πŸ’‘ Examples

Example 1: Scrape Top Posts from r/technology

```python
config = {
    "subreddit": "technology",
    "sort": "top",
    "time_filter": "week",
    "max_results": 50
}
results = scraper.scrape(config)
print(f"Scraped {len(results)} posts")
```

Example 2: Scrape with Comments

```python
config = {
    "subreddit": "programming",
    "include_comments": True,
    "sort": "hot",
    "max_results": 10
}
results = scraper.scrape(config)
for post in results:
    print(f"Post: {post['title']}")
    print(f"Comments: {len(post.get('comments', []))}")
```

Example 3: Multiple Subreddits

```python
subreddits = ["technology", "programming", "artificial"]
for subreddit in subreddits:
    config = {
        "subreddit": subreddit,
        "max_results": 25
    }
    results = scraper.scrape(config)
    scraper.export_to_json(results, f"{subreddit}_posts.json")
```

Example 4: With Proxy Configuration

```python
config = {
    "subreddit": "technology",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"]
    },
    "max_results": 100
}
results = scraper.scrape(config)
```

πŸ” Proxy Support

The scraper supports advanced proxy configurations to help avoid rate limiting and IP bans.

Supported Proxy Types

| Proxy Type | Description | Best For |
| --- | --- | --- |
| RESIDENTIAL | Real user IP addresses | High-volume scraping |
| DATACENTER | Datacenter IP addresses | Fast, cost-effective scraping |

Proxy Configuration

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
```

Environment Variables

```bash
# .env file
APIFY_API_TOKEN=your_apify_token_here
PROXY_ENABLED=true
PROXY_GROUP=RESIDENTIAL
```
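As a sketch of how these variables could feed the `proxyConfiguration` input, the helper below translates `.env`-style settings into the documented JSON shape. The `proxy_config_from_env` function is hypothetical, not part of the scraper; in real use you would pass `dict(os.environ)` instead of a literal dict:

```python
def proxy_config_from_env(env: dict) -> dict:
    """Translate .env-style settings into a proxyConfiguration block (sketch)."""
    enabled = env.get("PROXY_ENABLED", "false").lower() == "true"
    # Only attach a proxy group when the proxy is actually enabled
    groups = [env["PROXY_GROUP"]] if enabled and "PROXY_GROUP" in env else []
    return {"useApifyProxy": enabled, "apifyProxyGroups": groups}

print(proxy_config_from_env({"PROXY_ENABLED": "true", "PROXY_GROUP": "RESIDENTIAL"}))
```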

⏱️ Rate Limiting

To ensure responsible usage and avoid bans, the scraper includes built-in rate limiting:

| Action | Rate Limit | Recommendation |
| --- | --- | --- |
| API Requests | 60/minute | Use proxy for higher limits |
| Post Scraping | 100/minute | Enable delays between requests |
| Comment Scraping | 50/minute | Use residential proxies |

Rate Limit Configuration

```python
scraper = RedditScraper(
    rate_limit=True,
    rate_limit_delay=1.0,  # seconds between requests
    max_retries=3
)
```

πŸ› οΈ Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| 429 Too Many Requests | Enable proxy, increase delay between requests |
| 403 Forbidden | Check subreddit privacy settings, use API credentials |
| Empty Results | Verify subreddit name, check sort/time_filter values |
| Connection Timeout | Enable proxy, check network connection |
| Invalid JSON Output | Validate input configuration format |
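For transient errors like 429, "increase delay between requests" is typically implemented as exponential backoff with jitter. A hedged, self-contained sketch (the `fetch_with_backoff` helper is illustrative and not part of this tool; `RuntimeError` stands in for whatever 429 exception your HTTP client raises):

```python
import random
import time

def fetch_with_backoff(fetch, max_retries: int = 3, base_delay: float = 1.0):
    """Retry a callable that raises on HTTP 429, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RuntimeError:  # stand-in for a 429 error from your HTTP client
            if attempt == max_retries:
                raise
            # base, 2x base, 4x base, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: simulate two 429 responses, then a success
calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return {"status": "ok"}

result = fetch_with_backoff(flaky_fetch, base_delay=0.01)
print(result, calls["count"])
```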

Debug Mode

```bash
# Enable verbose logging
python reddit_scraper.py --subreddit technology --debug

# Check API status
python reddit_scraper.py --status-check
```

Log Files

Logs are saved to `./logs/scraper.log` by default. To configure the log level:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
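For a log format with timestamps and levels, a named logger with an explicit handler gives more control than `basicConfig`. The sketch below writes to an in-memory buffer purely so the example is self-contained; in the real scraper you would use `logging.FileHandler("logs/scraper.log")` instead:

```python
import logging
from io import StringIO

# In-memory sink for the demo; swap in logging.FileHandler("logs/scraper.log")
buffer = StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))

log = logging.getLogger("reddit_scraper")
log.setLevel(logging.DEBUG)
log.addHandler(handler)

log.debug("fetching r/technology, sort=hot")
print(buffer.getvalue().strip())
```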

🀝 Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (`git checkout -b feature/amazing-feature`)
  3. Commit your changes (`git commit -m 'Add amazing feature'`)
  4. Push to the branch (`git push origin feature/amazing-feature`)
  5. Open a Pull Request

Development Setup

```bash
# Clone your fork
git clone https://github.com/yourusername/reddit-scraper.git

# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Run linting
flake8 .
black .
```


Code Style

  • Follow PEP 8 guidelines
  • Add docstrings for all functions
  • Write unit tests for new features
  • Update documentation for changes