Reddit Community Scraper
Pricing
Pay per usage
Efficiently extract detailed data from Reddit communities and subreddits. This lightweight actor is designed for speed and simplicity. For optimal performance and to minimize the risk of rate limiting or blocking, the use of residential proxies is highly recommended.
Developer

Shahid Irfan
Reddit Community Scraper
Extract comprehensive data from Reddit communities with ease. Collect posts, comments, and user information at scale for research, analysis, and monitoring. Perfect for market intelligence, content analysis, and social media insights.
Features
- Post Extraction: Collect complete post data including titles, content, and metadata
- Comment Collection: Gather threaded comments with configurable depth limits
- Advanced Search: Find posts, communities, and users across Reddit
- Flexible Filtering: Filter by date ranges, content type, and custom criteria
- Pagination Control: Specify exact page ranges for precise data collection
- High Performance: Concurrent scraping with built-in proxy support
- Rich Data Output: Structured JSON with all relevant fields and timestamps
Use Cases
Social Media Research
Analyze trending topics and community sentiment across Reddit. Understand what topics are gaining traction and how communities are responding to current events.
Market Intelligence
Track product mentions, brand discussions, and consumer feedback. Identify emerging trends and customer pain points across different communities.
Content Analysis
Build comprehensive datasets for sentiment analysis and topic modeling. Study how information spreads and evolves within online communities.
Competitive Monitoring
Monitor competitor mentions and industry discussions. Stay informed about market changes and customer perceptions in real-time.
Academic Research
Collect large-scale social data for sociological and psychological studies. Analyze community behavior patterns and information diffusion.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | Yes | - | Reddit URLs to scrape (subreddits, posts, or user profiles) |
| maxPostCount | Integer | No | 4 | Maximum number of posts to collect (0-10000) |
| maxCommentsPerPost | Integer | No | 2 | Maximum comments per post (0-1000, 0 = no comments) |
| skipComments | Boolean | No | false | Skip comment scraping entirely |
| startPage | Integer | No | 1 | Starting page number |
| endPage | Integer | No | null | Ending page number (null for unlimited) |
| searchQuery | String | No | "" | Search term for posts, communities, or users |
| searchPosts | Boolean | No | false | Enable post search |
| searchCommunities | Boolean | No | false | Enable community search |
| searchComments | Boolean | No | false | Enable comment search |
| sort | String | No | "new" | Sort order: hot, new, top, rising, relevance, best, comments |
| time | String | No | "all" | Time filter: hour, day, week, month, year, all |
| maxPostAgeDays | Integer | No | null | Only collect posts from the last N days |
| includeNSFW | Boolean | No | false | Include NSFW content |
| ignoreStartUrls | Boolean | No | false | Ignore start URLs when using search |
| maxConcurrency | Integer | No | 10 | Maximum concurrent requests |
| maxRequestRetries | Integer | No | 3 | Retry limit for failed requests |
| debugMode | Boolean | No | false | Enable detailed logging |
| proxyConfiguration | Object | No | - | Proxy settings for scraping |
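The ranges and allowed values in the table above can be checked locally before starting a run. The following is a minimal sketch, not part of the actor: `validate_input` is a hypothetical helper, and two of its rules are assumptions rather than documented behavior (that `startUrls` may be omitted when `ignoreStartUrls` is true, as the search examples on this page suggest, and that `endPage` should not be smaller than `startPage`).

```python
# Allowed values copied from the Input Parameters table.
ALLOWED_SORT = {"hot", "new", "top", "rising", "relevance", "best", "comments"}
ALLOWED_TIME = {"hour", "day", "week", "month", "year", "all"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    # Assumption: startUrls is only needed when search does not replace it.
    if not run_input.get("startUrls") and not run_input.get("ignoreStartUrls", False):
        problems.append("startUrls is required unless ignoreStartUrls is true")

    # Documented range: 0-10000 (default 4).
    if not 0 <= run_input.get("maxPostCount", 4) <= 10000:
        problems.append("maxPostCount must be between 0 and 10000")

    # Documented range: 0-1000 (default 2).
    if not 0 <= run_input.get("maxCommentsPerPost", 2) <= 1000:
        problems.append("maxCommentsPerPost must be between 0 and 1000")

    if run_input.get("sort", "new") not in ALLOWED_SORT:
        problems.append("sort must be one of: " + ", ".join(sorted(ALLOWED_SORT)))

    if run_input.get("time", "all") not in ALLOWED_TIME:
        problems.append("time must be one of: " + ", ".join(sorted(ALLOWED_TIME)))

    # Assumption: a page range that ends before it starts is a mistake.
    end_page = run_input.get("endPage")
    if end_page is not None and end_page < run_input.get("startPage", 1):
        problems.append("endPage must not be smaller than startPage")

    return problems
```

Calling `validate_input` on a draft input and printing the returned messages is a cheap way to catch typos such as an unsupported `sort` value before paying for a run.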
Output Data
Each item in the dataset contains:
| Field | Type | Description |
|---|---|---|
| dataType | String | Type of data: post, comment, or community |
| id | String | Reddit's unique identifier |
| url | String | Direct link to the content |
| username | String | Author's username |
| title | String | Post or community title |
| body | String | Text content |
| communityName | String | Subreddit name |
| numberOfComments | Integer | Comment count |
| upVotes | Integer | Upvote count |
| upVoteRatio | Number | Upvote ratio (0-1) |
| createdAt | String | Creation timestamp (ISO 8601) |
| scrapedAt | String | Scraping timestamp (ISO 8601) |
| isVideo | Boolean | Whether content is a video |
| over18 | Boolean | NSFW content flag |
| thumbnailUrl | String | Thumbnail image URL |
| imageUrls | Array | List of image URLs |
Usage Examples
Basic Subreddit Scraping
Extract recent posts from a subreddit:
{
  "startUrls": [{ "url": "https://www.reddit.com/r/technology/" }],
  "maxPostCount": 50,
  "maxCommentsPerPost": 10
}
Advanced Search and Filtering
Search for posts about artificial intelligence with time filters:
{
  "searchQuery": "artificial intelligence",
  "searchPosts": true,
  "ignoreStartUrls": true,
  "maxPostCount": 100,
  "sort": "top",
  "time": "week",
  "maxPostAgeDays": 7
}
User Profile Analysis
Collect posts from a specific user:
{
  "startUrls": [{ "url": "https://www.reddit.com/user/username/" }],
  "maxPostCount": 25,
  "skipComments": true,
  "sort": "new"
}
Deep Comment Thread Extraction
Extract up to 500 comments from a specific post (maxCommentsPerPost accepts values up to 1000):
{
  "startUrls": [{ "url": "https://www.reddit.com/r/AskReddit/comments/abc123/" }],
  "maxPostCount": 1,
  "maxCommentsPerPost": 500
}
Community Discovery
Find and analyze communities related to a topic:
{
  "searchQuery": "machine learning",
  "searchCommunities": true,
  "ignoreStartUrls": true,
  "maxPostCount": 20
}
Sample Output
{
  "dataType": "post",
  "id": "t3_abc123",
  "url": "https://www.reddit.com/r/technology/comments/abc123/example-post/",
  "username": "tech_enthusiast",
  "title": "New breakthrough in quantum computing",
  "communityName": "r/technology",
  "body": "Researchers have achieved a major milestone in quantum computing technology...",
  "numberOfComments": 42,
  "upVotes": 1234,
  "upVoteRatio": 0.95,
  "isVideo": false,
  "over18": false,
  "createdAt": "2025-01-15T10:30:00.000Z",
  "scrapedAt": "2025-01-15T12:00:00.000Z",
  "imageUrls": ["https://i.redd.it/example.jpg"]
}
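The `createdAt` and `scrapedAt` fields are ISO 8601 strings, so they need parsing before any date arithmetic. A small sketch of one way to do this with the Python standard library follows; `parse_timestamp` and `scrape_delay` are hypothetical helper names, and the timestamps are the ones from the sample item.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-01-15T10:30:00.000Z."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def scrape_delay(item: dict) -> timedelta:
    """Time between the item's creation on Reddit and the moment it was scraped."""
    return parse_timestamp(item["scrapedAt"]) - parse_timestamp(item["createdAt"])


item = {
    "createdAt": "2025-01-15T10:30:00.000Z",
    "scrapedAt": "2025-01-15T12:00:00.000Z",
}
print(scrape_delay(item))  # 1:30:00
```

The parsed values are timezone-aware, so they can be compared safely with other UTC datetimes.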
Tips for Best Results
Optimize Search Queries
- Use specific keywords for better results
- Combine multiple terms for precise targeting
- Test queries on Reddit first to verify results
Manage Data Volume
- Start with smaller limits for testing (20-50 posts)
- Increase gradually for production runs
- Use date filters to focus on recent content
Handle Rate Limits
- Enable proxy support (proxyConfiguration) for large-scale scraping
- Lower maxConcurrency (default 10) if requests start failing or getting blocked
- Raise maxRequestRetries (default 3) for more reliable data collection
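The three settings above could be combined into a run input along the following lines. This is only a sketch: `build_run_input` is a hypothetical helper, the lowered concurrency and raised retry values are illustrative rather than recommended by the actor, and the `proxyConfiguration` shape (`useApifyProxy`, `apifyProxyGroups`) is assumed from the common Apify proxy input convention, which this page does not confirm for this actor.

```python
def build_run_input(subreddit_url: str, max_posts: int = 50, large_scale: bool = False) -> dict:
    """Build an input object, tightening the settings for large runs."""
    run_input = {
        "startUrls": [{"url": subreddit_url}],
        "maxPostCount": max_posts,
        "maxConcurrency": 10,    # documented default
        "maxRequestRetries": 3,  # documented default
    }
    if large_scale:
        # Illustrative values: fewer parallel requests, more retries.
        run_input["maxConcurrency"] = 5
        run_input["maxRequestRetries"] = 5
        # Assumed shape, based on the usual Apify proxy input convention.
        run_input["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return run_input
```

The resulting dictionary can be serialized with `json.dumps` and pasted into the actor's JSON input editor.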
Quality Filtering
- Exclude NSFW content unless specifically needed
- Filter the output by engagement metrics (upVotes, numberOfComments)
- Use time ranges to focus on current discussions
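The input parameters listed on this page do not include a score threshold, so engagement filtering is most easily done on the downloaded dataset. A minimal sketch, assuming the items follow the Output Data fields; `filter_items` and its thresholds are hypothetical.

```python
def filter_items(items, min_upvotes=0, min_comments=0, include_nsfw=False):
    """Keep posts that meet the engagement thresholds and the NSFW preference."""
    kept = []
    for item in items:
        # Only posts carry the engagement fields used here.
        if item.get("dataType") != "post":
            continue
        if item.get("over18", False) and not include_nsfw:
            continue
        # Fields may be empty if Reddit did not provide them, so treat None as 0.
        if (item.get("upVotes") or 0) < min_upvotes:
            continue
        if (item.get("numberOfComments") or 0) < min_comments:
            continue
        kept.append(item)
    return kept
```

For example, `filter_items(items, min_upvotes=100, min_comments=10)` keeps only non-NSFW posts with at least 100 upvotes and 10 comments.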
Integrations
Connect your Reddit data with:
- Google Sheets: Export for collaborative analysis
- Airtable: Build searchable community databases
- Slack: Get notifications on trending topics
- Webhooks: Send data to custom endpoints
- Make: Create automated social monitoring workflows
- Zapier: Trigger actions based on Reddit activity
Export Formats
Download data in multiple formats:
- JSON: For developers and API integrations
- CSV: For spreadsheet analysis and reporting
- Excel: For business intelligence dashboards
- XML: For system integrations and feeds
Frequently Asked Questions
How many posts can I collect?
You can collect up to 10,000 posts per run. The practical limit depends on the subreddit size and your filtering criteria.
Can I scrape comments without posts?
Yes. When scraping a specific post, point startUrls at that post, set maxPostCount to 1, and set maxCommentsPerPost to your desired comment limit (up to 1000).
What if I get rate limited?
The scraper includes built-in proxy support. Enable residential proxies in the proxy configuration for best results with large-scale scraping.
How do I search across multiple subreddits?
Use the search functionality with ignoreStartUrls: true to search across all of Reddit, or provide multiple start URLs for specific communities.
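When targeting specific communities, the `startUrls` array can be generated from a list of subreddit names instead of being typed by hand. A short sketch follows; `subreddit_start_urls` is a hypothetical helper, and the URL pattern is the one used in the examples on this page.

```python
def subreddit_start_urls(names):
    """Turn subreddit names such as "technology" or "r/AskReddit" into startUrls entries."""
    urls = []
    for name in names:
        # Drop surrounding whitespace and slashes, then an optional "r/" prefix.
        cleaned = name.strip().strip("/")
        if cleaned.lower().startswith("r/"):
            cleaned = cleaned[2:]
        urls.append({"url": f"https://www.reddit.com/r/{cleaned}/"})
    return urls


run_input = {
    "startUrls": subreddit_start_urls(["technology", "r/AskReddit"]),
    "maxPostCount": 50,
}
```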
Can I filter by post score or engagement?
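The input parameters do not include a score threshold. Use the sort parameter (for example hot or top) to surface high-engagement posts, narrow by date with maxPostAgeDays, and filter on the upVotes and numberOfComments output fields afterwards.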
What data is included in the output?
The scraper extracts all publicly available data including text content, metadata, timestamps, and media URLs. Some fields may be empty if not provided by Reddit.
How do I handle NSFW content?
Set includeNSFW: true to include adult content, or leave as false to filter it out. Always respect community guidelines and local laws.
Can I run this continuously?
Use Apify's scheduling features to run the scraper at regular intervals for ongoing monitoring and data collection.
Support
For issues or feature requests, contact support through the Apify Console.
Legal Notice
This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with Reddit's terms of service and applicable laws. Use data responsibly and respect rate limits.