Reddit Community Scraper 👾

Pricing: Pay per usage

Efficiently extract detailed data from Reddit communities and subreddits. This lightweight actor is designed for speed and simplicity. For optimal performance and to minimize the risk of rate limiting or blocking, the use of residential proxies is highly recommended.


Developer: Shahid Irfan (maintained by Community)

Actor stats: 0 bookmarks · 13 total users · 5 monthly active users · last modified 18 days ago

Reddit Community Scraper

Extract comprehensive data from Reddit communities with ease. Collect posts, comments, and user information at scale for research, analysis, and monitoring. Perfect for market intelligence, content analysis, and social media insights.

Features

  • Post Extraction — Collect complete post data including titles, content, and metadata
  • Comment Collection — Gather threaded comments with configurable depth limits
  • Advanced Search — Find posts, communities, and users across Reddit
  • Flexible Filtering — Filter by date ranges, content type, and custom criteria
  • Pagination Control — Specify exact page ranges for precise data collection
  • High Performance — Concurrent scraping with built-in proxy support
  • Rich Data Output — Structured JSON with all relevant fields and timestamps

Use Cases

Social Media Research

Analyze trending topics and community sentiment across Reddit. Understand what topics are gaining traction and how communities are responding to current events.

Market Intelligence

Track product mentions, brand discussions, and consumer feedback. Identify emerging trends and customer pain points across different communities.

Content Analysis

Build comprehensive datasets for sentiment analysis and topic modeling. Study how information spreads and evolves within online communities.

Competitive Monitoring

Monitor competitor mentions and industry discussions. Stay informed about market changes and customer perceptions in real-time.

Academic Research

Collect large-scale social data for sociological and psychological studies. Analyze community behavior patterns and information diffusion.


Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrls | Array | Yes | — | Reddit URLs to scrape (subreddits, posts, or user profiles) |
| maxPostCount | Integer | No | 4 | Maximum number of posts to collect (0-10000) |
| maxCommentsPerPost | Integer | No | 2 | Maximum comments per post (0-1000, 0 = no comments) |
| skipComments | Boolean | No | false | Skip comment scraping entirely |
| startPage | Integer | No | 1 | Starting page number |
| endPage | Integer | No | null | Ending page number (null for unlimited) |
| searchQuery | String | No | "" | Search term for posts, communities, or users |
| searchPosts | Boolean | No | false | Enable post search |
| searchCommunities | Boolean | No | false | Enable community search |
| searchComments | Boolean | No | false | Enable comment search |
| sort | String | No | "new" | Sort order: hot, new, top, rising, relevance, best, comments |
| time | String | No | "all" | Time filter: hour, day, week, month, year, all |
| maxPostAgeDays | Integer | No | null | Only collect posts from the last N days |
| includeNSFW | Boolean | No | false | Include NSFW content |
| ignoreStartUrls | Boolean | No | false | Ignore start URLs when using search |
| maxConcurrency | Integer | No | 10 | Maximum concurrent requests |
| maxRequestRetries | Integer | No | 3 | Retry limit for failed requests |
| debugMode | Boolean | No | false | Enable detailed logging |
| proxyConfiguration | Object | No | — | Proxy settings for scraping |
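
A quick client-side sanity check of these constraints can catch configuration mistakes before a run. This is an illustrative sketch only; `validate_input` is a hypothetical helper, not part of the actor:

```python
# Hypothetical helper: sanity-check an input dict against the
# documented parameter constraints before starting a run.
def validate_input(run_input: dict) -> list[str]:
    errors = []
    # The actor needs something to start from: URLs or a search query.
    if not run_input.get("startUrls") and not run_input.get("searchQuery"):
        errors.append("provide startUrls or a searchQuery")
    max_posts = run_input.get("maxPostCount", 4)
    if not 0 <= max_posts <= 10000:
        errors.append("maxPostCount must be in 0-10000")
    max_comments = run_input.get("maxCommentsPerPost", 2)
    if not 0 <= max_comments <= 1000:
        errors.append("maxCommentsPerPost must be in 0-1000")
    if run_input.get("sort", "new") not in {
        "hot", "new", "top", "rising", "relevance", "best", "comments"
    }:
        errors.append("unknown sort value")
    if run_input.get("time", "all") not in {
        "hour", "day", "week", "month", "year", "all"
    }:
        errors.append("unknown time value")
    return errors

print(validate_input({
    "startUrls": [{"url": "https://www.reddit.com/r/technology/"}],
    "maxPostCount": 50,
}))  # []
```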

Output Data

Each item in the dataset contains:

| Field | Type | Description |
| --- | --- | --- |
| dataType | String | Type of data: post, comment, or community |
| id | String | Reddit's unique identifier |
| url | String | Direct link to the content |
| username | String | Author's username |
| title | String | Post or community title |
| body | String | Text content |
| communityName | String | Subreddit name |
| numberOfComments | Integer | Comment count |
| upVotes | Integer | Upvote count |
| upVoteRatio | Number | Upvote ratio (0-1) |
| createdAt | String | Creation timestamp (ISO 8601) |
| scrapedAt | String | Scraping timestamp (ISO 8601) |
| isVideo | Boolean | Whether content is a video |
| over18 | Boolean | NSFW content flag |
| thumbnailUrl | String | Thumbnail image URL |
| imageUrls | Array | List of image URLs |

Usage Examples

Basic Subreddit Scraping

Extract recent posts from a subreddit:

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/technology/" }
  ],
  "maxPostCount": 50,
  "maxCommentsPerPost": 10
}

Advanced Search and Filtering

Search for posts about artificial intelligence with time filters:

{
  "searchQuery": "artificial intelligence",
  "searchPosts": true,
  "ignoreStartUrls": true,
  "maxPostCount": 100,
  "sort": "top",
  "time": "week",
  "maxPostAgeDays": 7
}
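
The time and maxPostAgeDays filters both bound post age, and the cutoff they imply can be reproduced client-side against the createdAt field from the output. A minimal sketch (the timestamp is illustrative):

```python
from datetime import datetime, timedelta, timezone

# maxPostAgeDays: 7 keeps only posts created in the last week.
max_post_age_days = 7
cutoff = datetime.now(timezone.utc) - timedelta(days=max_post_age_days)

# createdAt is ISO 8601, as in the output schema. Replace the trailing
# "Z" so fromisoformat() accepts it on Python versions before 3.11.
created_at = "2025-01-15T10:30:00.000Z"
created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))

is_fresh = created >= cutoff  # True only if the post is within the window
```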

User Profile Analysis

Collect posts from a specific user:

{
  "startUrls": [
    { "url": "https://www.reddit.com/user/username/" }
  ],
  "maxPostCount": 25,
  "skipComments": true,
  "sort": "new"
}

Deep Comment Thread Extraction

Extract all comments from a specific post:

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/AskReddit/comments/abc123/" }
  ],
  "maxPostCount": 1,
  "maxCommentsPerPost": 500
}

Community Discovery

Find and analyze communities related to a topic:

{
  "searchQuery": "machine learning",
  "searchCommunities": true,
  "ignoreStartUrls": true,
  "maxPostCount": 20
}

Sample Output

{
  "dataType": "post",
  "id": "t3_abc123",
  "url": "https://www.reddit.com/r/technology/comments/abc123/example-post/",
  "username": "tech_enthusiast",
  "title": "New breakthrough in quantum computing",
  "communityName": "r/technology",
  "body": "Researchers have achieved a major milestone in quantum computing technology...",
  "numberOfComments": 42,
  "upVotes": 1234,
  "upVoteRatio": 0.95,
  "isVideo": false,
  "over18": false,
  "createdAt": "2025-01-15T10:30:00.000Z",
  "scrapedAt": "2025-01-15T12:00:00.000Z",
  "imageUrls": ["https://i.redd.it/example.jpg"]
}
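
Items in this shape are plain JSON, so they map directly onto standard tooling. A minimal Python sketch that parses one item and builds a one-line summary (the item is abridged from the sample above):

```python
import json

# One item in the documented output shape (abridged from the sample output).
item = json.loads("""
{
  "dataType": "post",
  "id": "t3_abc123",
  "title": "New breakthrough in quantum computing",
  "communityName": "r/technology",
  "numberOfComments": 42,
  "upVotes": 1234,
  "upVoteRatio": 0.95
}
""")

# Fields are plain JSON types, so they map directly to Python values.
if item["dataType"] == "post":
    summary = f'{item["communityName"]}: "{item["title"]}" ({item["upVotes"]} upvotes)'
    print(summary)
```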

Tips for Best Results

Optimize Search Queries

  • Use specific keywords for better results
  • Combine multiple terms for precise targeting
  • Test queries on Reddit first to verify results

Manage Data Volume

  • Start with smaller limits for testing (20-50 posts)
  • Increase gradually for production runs
  • Use date filters to focus on recent content

Handle Rate Limits

  • Enable proxy support for large-scale scraping
  • Adjust concurrency based on your needs
  • Use retry settings for reliable data collection
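
The actor applies maxRequestRetries internally, but the same pattern is useful for any client-side fetching you do around it. A generic retry-with-backoff sketch (`fetch` here is a stand-in callable, not the actor's API):

```python
import time

def with_retries(fetch, max_retries=3, base_delay=1.0):
    """Call fetch(), retrying on failure with exponential backoff.

    Mirrors the spirit of maxRequestRetries: each failed attempt waits
    base_delay * 2**attempt seconds before retrying.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as exc:  # in real code, catch specific errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(base_delay * 2 ** attempt)
    raise last_error

# Example: a fetch that fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"

print(with_retries(flaky, max_retries=3, base_delay=0.0))  # ok
```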

Quality Filtering

  • Exclude NSFW content unless specifically needed
  • Filter by engagement metrics (upvotes, comments)
  • Use time ranges to focus on current discussions
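
The actor has no minimum-score input, so engagement filtering happens after the run. A sketch with illustrative thresholds (50 upvotes, 0.8 ratio; neither is an actor setting):

```python
# Post-process scraped items: keep well-received, SFW posts.
# The thresholds are illustrative defaults, not actor parameters.
def quality_filter(items, min_upvotes=50, min_ratio=0.8, allow_nsfw=False):
    kept = []
    for item in items:
        if item.get("dataType") != "post":
            continue  # skip comments and communities
        if item.get("over18") and not allow_nsfw:
            continue
        if item.get("upVotes", 0) < min_upvotes:
            continue
        if item.get("upVoteRatio", 0.0) < min_ratio:
            continue
        kept.append(item)
    return kept

posts = [
    {"dataType": "post", "upVotes": 1234, "upVoteRatio": 0.95, "over18": False},
    {"dataType": "post", "upVotes": 3, "upVoteRatio": 0.5, "over18": False},
    {"dataType": "comment", "upVotes": 900},
]
print(len(quality_filter(posts)))  # 1
```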

Integrations

Connect your Reddit data with:

  • Google Sheets — Export for collaborative analysis
  • Airtable — Build searchable community databases
  • Slack — Get notifications on trending topics
  • Webhooks — Send data to custom endpoints
  • Make — Create automated social monitoring workflows
  • Zapier — Trigger actions based on Reddit activity

Export Formats

Download data in multiple formats:

  • JSON — For developers and API integrations
  • CSV — For spreadsheet analysis and reporting
  • Excel — For business intelligence dashboards
  • XML — For system integrations and feeds
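
If you post-process the JSON export yourself, converting it to CSV needs only the standard library. A sketch with a field list abridged from the output table:

```python
import csv
import io
import json

# A JSON export with one item (abridged to a few documented fields).
items_json = """[
  {"dataType": "post", "id": "t3_abc123", "title": "Example",
   "communityName": "r/technology", "upVotes": 1234}
]"""

fields = ["dataType", "id", "title", "communityName", "upVotes"]

buf = io.StringIO()
# extrasaction="ignore" drops any fields not in the chosen column list.
writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
for item in json.loads(items_json):
    writer.writerow(item)

print(buf.getvalue())
```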

Frequently Asked Questions

How many posts can I collect?

You can collect up to 10,000 posts per run. The practical limit depends on the subreddit size and your filtering criteria.

Can I scrape comments without posts?

Yes. When you start from a specific post URL, set maxPostCount to 1 and maxCommentsPerPost to your desired comment limit; the output contains that single post record plus its comment items.

What if I get rate limited?

The scraper includes built-in proxy support. Enable residential proxies in the proxy configuration for best results with large-scale scraping.

How do I search across multiple subreddits?

Use the search functionality with ignoreStartUrls: true to search across all of Reddit, or provide multiple start URLs for specific communities.

Can I filter by post score or engagement?

Use the sort parameter to order results by hot, top, or best, and use maxPostAgeDays to restrict results to a recent date range.

What data is included in the output?

The scraper extracts all publicly available data including text content, metadata, timestamps, and media URLs. Some fields may be empty if not provided by Reddit.

How do I handle NSFW content?

Set includeNSFW: true to include adult content, or leave as false to filter it out. Always respect community guidelines and local laws.

Can I run this continuously?

Use Apify's scheduling features to run the scraper at regular intervals for ongoing monitoring and data collection.


Support

For issues or feature requests, contact support through the Apify Console.

Resources


This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with Reddit's terms of service and applicable laws. Use data responsibly and respect rate limits.