Reddit Community Scraper 👾

Pricing: Pay per usage

Efficiently extract detailed data from Reddit communities and subreddits. This lightweight actor is designed for speed and simplicity. For optimal performance and to minimize the risk of rate limiting or blocking, the use of residential proxies is highly recommended.


Developer: Shahid Irfan (maintained by Community)

Actor stats: 0 bookmarks · 13 total users · 5 monthly active users · last modified 18 days ago

Reddit Community Scraper

Extract comprehensive data from Reddit communities with ease. Collect posts, comments, and user information at scale for research, analysis, and monitoring. Perfect for market intelligence, content analysis, and social media insights.

Features

  • Post Extraction — Collect complete post data including titles, content, and metadata
  • Comment Collection — Gather threaded comments with configurable depth limits
  • Advanced Search — Find posts, communities, and users across Reddit
  • Flexible Filtering — Filter by date ranges, content type, and custom criteria
  • Pagination Control — Specify exact page ranges for precise data collection
  • High Performance — Concurrent scraping with built-in proxy support
  • Rich Data Output — Structured JSON with all relevant fields and timestamps

Use Cases

Social Media Research

Analyze trending topics and community sentiment across Reddit. Understand what topics are gaining traction and how communities are responding to current events.

Market Intelligence

Track product mentions, brand discussions, and consumer feedback. Identify emerging trends and customer pain points across different communities.

Content Analysis

Build comprehensive datasets for sentiment analysis and topic modeling. Study how information spreads and evolves within online communities.

Competitive Monitoring

Monitor competitor mentions and industry discussions. Stay informed about market changes and customer perceptions in real-time.

Academic Research

Collect large-scale social data for sociological and psychological studies. Analyze community behavior patterns and information diffusion.


Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrls | Array | Yes | — | Reddit URLs to scrape (subreddits, posts, or user profiles) |
| maxPostCount | Integer | No | 4 | Maximum number of posts to collect (0-10000) |
| maxCommentsPerPost | Integer | No | 2 | Maximum comments per post (0-1000, 0 = no comments) |
| skipComments | Boolean | No | false | Skip comment scraping entirely |
| startPage | Integer | No | 1 | Starting page number |
| endPage | Integer | No | null | Ending page number (null for unlimited) |
| searchQuery | String | No | "" | Search term for posts, communities, or users |
| searchPosts | Boolean | No | false | Enable post search |
| searchCommunities | Boolean | No | false | Enable community search |
| searchComments | Boolean | No | false | Enable comment search |
| sort | String | No | "new" | Sort order: hot, new, top, rising, relevance, best, comments |
| time | String | No | "all" | Time filter: hour, day, week, month, year, all |
| maxPostAgeDays | Integer | No | null | Only collect posts from the last N days |
| includeNSFW | Boolean | No | false | Include NSFW content |
| ignoreStartUrls | Boolean | No | false | Ignore start URLs when using search |
| maxConcurrency | Integer | No | 10 | Maximum concurrent requests |
| maxRequestRetries | Integer | No | 3 | Retry limit for failed requests |
| debugMode | Boolean | No | false | Enable detailed logging |
| proxyConfiguration | Object | No | — | Proxy settings for scraping |
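
A quick client-side sanity check of these constraints can catch configuration mistakes before a run. This is an illustrative sketch only; `validate_input` is a hypothetical helper, not part of the actor:

```python
# Hypothetical helper: sanity-check an input dict against the
# documented parameter constraints before starting a run.
def validate_input(run_input: dict) -> list[str]:
    errors = []
    # The actor needs something to start from: URLs or a search query.
    if not run_input.get("startUrls") and not run_input.get("searchQuery"):
        errors.append("provide startUrls or a searchQuery")
    max_posts = run_input.get("maxPostCount", 4)
    if not 0 <= max_posts <= 10000:
        errors.append("maxPostCount must be in 0-10000")
    max_comments = run_input.get("maxCommentsPerPost", 2)
    if not 0 <= max_comments <= 1000:
        errors.append("maxCommentsPerPost must be in 0-1000")
    if run_input.get("sort", "new") not in {
        "hot", "new", "top", "rising", "relevance", "best", "comments"
    }:
        errors.append("unknown sort value")
    if run_input.get("time", "all") not in {
        "hour", "day", "week", "month", "year", "all"
    }:
        errors.append("unknown time value")
    return errors

print(validate_input({
    "startUrls": [{"url": "https://www.reddit.com/r/technology/"}],
    "maxPostCount": 50,
}))  # []
```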

Output Data

Each item in the dataset contains:

| Field | Type | Description |
| --- | --- | --- |
| dataType | String | Type of data: post, comment, or community |
| id | String | Reddit's unique identifier |
| url | String | Direct link to the content |
| username | String | Author's username |
| title | String | Post or community title |
| body | String | Text content |
| communityName | String | Subreddit name |
| numberOfComments | Integer | Comment count |
| upVotes | Integer | Upvote count |
| upVoteRatio | Number | Upvote ratio (0-1) |
| createdAt | String | Creation timestamp (ISO 8601) |
| scrapedAt | String | Scraping timestamp (ISO 8601) |
| isVideo | Boolean | Whether content is a video |
| over18 | Boolean | NSFW content flag |
| thumbnailUrl | String | Thumbnail image URL |
| imageUrls | Array | List of image URLs |

Usage Examples

Basic Subreddit Scraping

Extract recent posts from a subreddit:

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/technology/" }
  ],
  "maxPostCount": 50,
  "maxCommentsPerPost": 10
}

Advanced Search and Filtering

Search for posts about artificial intelligence with time filters:

{
  "searchQuery": "artificial intelligence",
  "searchPosts": true,
  "ignoreStartUrls": true,
  "maxPostCount": 100,
  "sort": "top",
  "time": "week",
  "maxPostAgeDays": 7
}
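
The time and maxPostAgeDays filters both bound post age, and the cutoff they imply can be reproduced client-side against the createdAt field from the output. A minimal sketch (the timestamp is illustrative):

```python
from datetime import datetime, timedelta, timezone

# maxPostAgeDays: 7 keeps only posts created in the last week.
max_post_age_days = 7
cutoff = datetime.now(timezone.utc) - timedelta(days=max_post_age_days)

# createdAt is ISO 8601, as in the output schema. Replace the trailing
# "Z" so fromisoformat() accepts it on Python versions before 3.11.
created_at = "2025-01-15T10:30:00.000Z"
created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))

is_fresh = created >= cutoff  # True only if the post is within the window
```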

User Profile Analysis

Collect posts from a specific user:

{
  "startUrls": [
    { "url": "https://www.reddit.com/user/username/" }
  ],
  "maxPostCount": 25,
  "skipComments": true,
  "sort": "new"
}

Deep Comment Thread Extraction

Extract all comments from a specific post:

{
  "startUrls": [
    { "url": "https://www.reddit.com/r/AskReddit/comments/abc123/" }
  ],
  "maxPostCount": 1,
  "maxCommentsPerPost": 500
}

Community Discovery

Find and analyze communities related to a topic:

{
  "searchQuery": "machine learning",
  "searchCommunities": true,
  "ignoreStartUrls": true,
  "maxPostCount": 20
}

Sample Output

{
  "dataType": "post",
  "id": "t3_abc123",
  "url": "https://www.reddit.com/r/technology/comments/abc123/example-post/",
  "username": "tech_enthusiast",
  "title": "New breakthrough in quantum computing",
  "communityName": "r/technology",
  "body": "Researchers have achieved a major milestone in quantum computing technology...",
  "numberOfComments": 42,
  "upVotes": 1234,
  "upVoteRatio": 0.95,
  "isVideo": false,
  "over18": false,
  "createdAt": "2025-01-15T10:30:00.000Z",
  "scrapedAt": "2025-01-15T12:00:00.000Z",
  "imageUrls": ["https://i.redd.it/example.jpg"]
}
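
Items in this shape are plain JSON, so they map directly onto standard tooling. A minimal Python sketch that parses one item and builds a one-line summary (the item is abridged from the sample above):

```python
import json

# One item in the documented output shape (abridged from the sample output).
item = json.loads("""
{
  "dataType": "post",
  "id": "t3_abc123",
  "title": "New breakthrough in quantum computing",
  "communityName": "r/technology",
  "numberOfComments": 42,
  "upVotes": 1234,
  "upVoteRatio": 0.95
}
""")

# Fields are plain JSON types, so they map directly to Python values.
if item["dataType"] == "post":
    summary = f'{item["communityName"]}: "{item["title"]}" ({item["upVotes"]} upvotes)'
    print(summary)
```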

Tips for Best Results

Optimize Search Queries

  • Use specific keywords for better results
  • Combine multiple terms for precise targeting
  • Test queries on Reddit first to verify results

Manage Data Volume

  • Start with smaller limits for testing (20-50 posts)
  • Increase gradually for production runs
  • Use date filters to focus on recent content

Handle Rate Limits

  • Enable proxy support for large-scale scraping
  • Adjust concurrency based on your needs
  • Use retry settings for reliable data collection
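
The actor applies maxRequestRetries internally, but the same pattern is useful for any client-side fetching you do around it. A generic retry-with-backoff sketch (`fetch` here is a stand-in callable, not the actor's API):

```python
import time

def with_retries(fetch, max_retries=3, base_delay=1.0):
    """Call fetch(), retrying on failure with exponential backoff.

    Mirrors the spirit of maxRequestRetries: each failed attempt waits
    base_delay * 2**attempt seconds before retrying.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception as exc:  # in real code, catch specific errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(base_delay * 2 ** attempt)
    raise last_error

# Example: a fetch that fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"

print(with_retries(flaky, max_retries=3, base_delay=0.0))  # ok
```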

Quality Filtering

  • Exclude NSFW content unless specifically needed
  • Filter by engagement metrics (upvotes, comments)
  • Use time ranges to focus on current discussions
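
The actor has no minimum-score input, so engagement filtering happens after the run. A sketch with illustrative thresholds (50 upvotes, 0.8 ratio; neither is an actor setting):

```python
# Post-process scraped items: keep well-received, SFW posts.
# The thresholds are illustrative defaults, not actor parameters.
def quality_filter(items, min_upvotes=50, min_ratio=0.8, allow_nsfw=False):
    kept = []
    for item in items:
        if item.get("dataType") != "post":
            continue  # skip comments and communities
        if item.get("over18") and not allow_nsfw:
            continue
        if item.get("upVotes", 0) < min_upvotes:
            continue
        if item.get("upVoteRatio", 0.0) < min_ratio:
            continue
        kept.append(item)
    return kept

posts = [
    {"dataType": "post", "upVotes": 1234, "upVoteRatio": 0.95, "over18": False},
    {"dataType": "post", "upVotes": 3, "upVoteRatio": 0.5, "over18": False},
    {"dataType": "comment", "upVotes": 900},
]
print(len(quality_filter(posts)))  # 1
```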

Integrations

Connect your Reddit data with:

  • Google Sheets — Export for collaborative analysis
  • Airtable — Build searchable community databases
  • Slack — Get notifications on trending topics
  • Webhooks — Send data to custom endpoints
  • Make — Create automated social monitoring workflows
  • Zapier — Trigger actions based on Reddit activity

Export Formats

Download data in multiple formats:

  • JSON — For developers and API integrations
  • CSV — For spreadsheet analysis and reporting
  • Excel — For business intelligence dashboards
  • XML — For system integrations and feeds
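
If you post-process the JSON export yourself, converting it to CSV needs only the standard library. A sketch with a field list abridged from the output table:

```python
import csv
import io
import json

# A JSON export with one item (abridged to a few documented fields).
items_json = """[
  {"dataType": "post", "id": "t3_abc123", "title": "Example",
   "communityName": "r/technology", "upVotes": 1234}
]"""

fields = ["dataType", "id", "title", "communityName", "upVotes"]

buf = io.StringIO()
# extrasaction="ignore" drops any fields not in the chosen column list.
writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
for item in json.loads(items_json):
    writer.writerow(item)

print(buf.getvalue())
```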

Frequently Asked Questions

How many posts can I collect?

You can collect up to 10,000 posts per run. The practical limit depends on the subreddit size and your filtering criteria.

Can I scrape comments without posts?

Yes. When you start from a specific post URL, set maxPostCount to 1 and maxCommentsPerPost to your desired comment limit; the output contains that single post record plus its comment items.

What if I get rate limited?

The scraper includes built-in proxy support. Enable residential proxies in the proxy configuration for best results with large-scale scraping.

How do I search across multiple subreddits?

Use the search functionality with ignoreStartUrls: true to search across all of Reddit, or provide multiple start URLs for specific communities.

Can I filter by post score or engagement?

Use the sort parameter to order results by hot, top, or best, and use maxPostAgeDays to restrict results to a recent date range.

What data is included in the output?

The scraper extracts all publicly available data including text content, metadata, timestamps, and media URLs. Some fields may be empty if not provided by Reddit.

How do I handle NSFW content?

Set includeNSFW: true to include adult content, or leave as false to filter it out. Always respect community guidelines and local laws.

Can I run this continuously?

Use Apify's scheduling features to run the scraper at regular intervals for ongoing monitoring and data collection.


Support

For issues or feature requests, contact support through the Apify Console.

Resources


This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with Reddit's terms of service and applicable laws. Use data responsibly and respect rate limits.