Reddit Community Scraper 👾
Pricing
Pay per usage
Efficiently extract detailed data from Reddit communities and subreddits. This lightweight actor is designed for speed and simplicity. For optimal performance and to minimize the risk of rate limiting or blocking, the use of residential proxies is highly recommended.
Developer: Shahid Irfan
Maintained by Community
Reddit Community Scraper
Extract Reddit posts and comments from subreddits, user feeds, and direct thread URLs in a structured dataset. Collect titles, text, engagement metrics, community metadata, comment depth, author details, and source references for research, monitoring, and analysis.
Features
- Subreddit collection: Gather recent posts from one or more Reddit communities
- Thread comment extraction: Collect top-level comments and nested replies from each post
- User feed support: Pull post activity from Reddit user profile URLs
- Rich metadata output: Save author, community, engagement, moderation, flair, and timestamp fields
- Clean datasets: Skip duplicate records and omit null or empty values from saved items
- Flexible limits: Control how many posts and comments to collect in each run
Use Cases
Community Research
Track what specific Reddit communities are discussing right now. Use structured post and comment data to study trends, recurring questions, and audience language.
Competitive Monitoring
Watch how people talk about products, services, and industries inside target subreddits. Capture both original posts and comment responses for a fuller view of sentiment and objections.
Content Analysis
Build datasets for topic clustering, moderation research, or engagement analysis. The actor saves community and author context alongside each record for easier downstream filtering.
User Activity Review
Collect recent post activity from public Reddit user profiles. This is useful for creator research, outreach preparation, and niche account monitoring.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | Yes | Community example in actor input | Reddit URLs to scrape. Supports subreddit URLs, user profile URLs, and direct post URLs. |
| maxPostCount | Integer | No | 4 | Maximum number of posts to save across the run. |
| maxCommentsPerPost | Integer | No | 2 | Maximum number of comments to save for each post. Set to 0 to skip comments. |
| skipComments | Boolean | No | false | Skip comment extraction entirely. |
| sort | String | No | "new" | Sort order for subreddit listings. |
| time | String | No | "all" | Time filter used with supported sort modes. |
| includeNSFW | Boolean | No | false | Include NSFW posts in the results. |
| maxPostAgeDays | Integer | No | - | Only save posts newer than the specified number of days. |
| proxy | Object | No | Residential prefill | Proxy settings for higher-volume runs. |
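The parameters above map onto a single JSON input object. As a minimal sketch, the hypothetical helper `build_run_input` below assembles such an object in Python using the defaults from the table. The helper name and the validation rules (non-empty URL list, non-negative limits) are assumptions for illustration, not documented actor behavior.

```python
from typing import Any, Dict, List, Optional


def build_run_input(
    urls: List[str],
    max_post_count: int = 4,
    max_comments_per_post: int = 2,
    skip_comments: bool = False,
    sort: str = "new",
    time: str = "all",
    include_nsfw: bool = False,
    max_post_age_days: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an input dict whose keys match the parameter table."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    # Assumed sanity check: counts cannot be negative.
    if max_post_count < 0 or max_comments_per_post < 0:
        raise ValueError("limits must be non-negative")

    run_input: Dict[str, Any] = {
        "startUrls": [{"url": url} for url in urls],
        "maxPostCount": max_post_count,
        "maxCommentsPerPost": max_comments_per_post,
        "skipComments": skip_comments,
        "sort": sort,
        "time": time,
        "includeNSFW": include_nsfw,
    }
    # maxPostAgeDays has no default, so it is included only when set.
    if max_post_age_days is not None:
        run_input["maxPostAgeDays"] = max_post_age_days
    return run_input
```

Serializing the result with `json.dumps` yields an object in the same shape as the usage examples in this document.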
Output Data
Each dataset item is either a post record or a comment record.
| Field | Type | Description |
|---|---|---|
| dataType | String | Record type: post or comment |
| id | String | Reddit full identifier such as t3_... or t1_... |
| parsedId | String | Short Reddit item ID |
| url | String | Absolute Reddit URL for the record |
| permalink | String | Relative Reddit permalink |
| title | String | Post title |
| body | String | Post or comment text content |
| html | String | HTML version of the text content when available |
| username | String | Author username |
| userId | String | Author full Reddit identifier |
| communityName | String | Community name such as r/GrowthHacking |
| parsedCommunityName | String | Community slug without the r/ prefix |
| subredditId | String | Reddit community identifier |
| postId | String | Parent post full identifier for comment records |
| parsedPostId | String | Parent post short ID for comment records |
| postTitle | String | Parent post title for comment records |
| postAuthor | String | Parent post author for comment records |
| postUrl | String | Parent post URL for comment records |
| parentId | String | Parent item identifier for comment records |
| numberOfComments | Integer | Total comment count on a post |
| depth | Integer | Comment nesting depth |
| upVotes | Integer | Score shown for the record |
| ups | Integer | Upvote count when available |
| downs | Integer | Downvote count when available |
| upVoteRatio | Number | Post upvote ratio |
| controversiality | Integer | Comment controversiality indicator |
| flair | String | Post flair text |
| authorFlairText | String | Comment author flair text |
| authorFlairRichtext | String | Combined rich-text author flair content |
| domain | String | Post domain value |
| link | String | External link for link posts |
| thumbnailUrl | String | Thumbnail URL or Reddit thumbnail marker |
| imageUrls | Array | Extracted preview or gallery image URLs |
| mediaType | String | Derived content type such as text, image, video, or link |
| totalAwardsReceived | Integer | Number of awards received |
| gilded | Integer | Gilded count |
| isVideo | Boolean | Whether the post is a video |
| isAd | Boolean | Whether the post is promoted |
| isSelf | Boolean | Whether the post is a self-post |
| isPinned | Boolean | Whether the post is pinned |
| isStickied | Boolean | Whether the record is stickied |
| isLocked | Boolean | Whether the record is locked |
| isSpoiler | Boolean | Whether the post is marked as spoiler |
| isArchived | Boolean | Whether the record is archived |
| isCollapsed | Boolean | Whether the comment is collapsed |
| isSubmitter | Boolean | Whether the comment author is the original poster |
| over18 | Boolean | NSFW flag |
| authorIsBlocked | Boolean | Author blocked flag when exposed |
| authorPremium | Boolean | Reddit premium flag for the author |
| distinguished | String | Moderator or admin distinction value |
| discussionType | String | Post discussion type when present |
| category | String | Category value when present |
| removedByCategory | String | Removal category when exposed |
| createdAt | String | Creation timestamp in ISO 8601 format |
| editedAt | String | Edit timestamp in ISO 8601 format |
| scrapedAt | String | Extraction timestamp in ISO 8601 format |
| sourceUrl | String | Source URL from the input run |
| retrievalSource | String | Extraction path used for the record |
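Because posts and comments share one dataset, downstream code usually needs to regroup them. The sketch below is one possible approach, assuming that a comment's `postId` equals the `id` of its parent post record (both are described above as full identifiers). The function name `group_by_thread` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_thread(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each post record with the comment records that point at it."""
    posts: Dict[str, Dict[str, Any]] = {}
    comments: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    for item in items:
        if item.get("dataType") == "post":
            posts[item["id"]] = item
        elif item.get("dataType") == "comment":
            # Fields may be omitted when empty, so read them defensively.
            comments[item.get("postId", "")].append(item)

    threads = []
    for post_id, post in posts.items():
        # Stable sort: shallower comments first, original order otherwise.
        replies = sorted(comments.get(post_id, []), key=lambda c: c.get("depth", 0))
        threads.append({"post": post, "comments": replies})
    return threads
```

Comments whose parent post was not saved in the same run are left out of the result; whether that matters depends on the analysis.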
Usage Examples
Scrape a Subreddit
Collect recent posts and comments from one subreddit:
{
  "startUrls": [{ "url": "https://www.reddit.com/r/GrowthHacking/" }],
  "maxPostCount": 20,
  "maxCommentsPerPost": 10,
  "sort": "new"
}
Scrape Multiple Communities
Collect posts from several subreddits in one run:
{
  "startUrls": [
    { "url": "https://www.reddit.com/r/technology/" },
    { "url": "https://www.reddit.com/r/startups/" }
  ],
  "maxPostCount": 30,
  "maxCommentsPerPost": 5,
  "sort": "top",
  "time": "week"
}
Scrape a Direct Post Thread
Collect a single thread and its comments:
{
  "startUrls": [
    { "url": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/" }
  ],
  "maxPostCount": 1,
  "maxCommentsPerPost": 50
}
Scrape a User Feed
Collect recent posts from a Reddit user profile:
{
  "startUrls": [{ "url": "https://www.reddit.com/user/example_user/" }],
  "maxPostCount": 15,
  "skipComments": true
}
Sample Output
{
  "dataType": "comment",
  "id": "t1_opaxdor",
  "parsedId": "opaxdor",
  "url": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/opaxdor/",
  "permalink": "/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/opaxdor/",
  "parentId": "t3_1tuorhf",
  "postId": "t3_1tuorhf",
  "parsedPostId": "1tuorhf",
  "postTitle": "Best inbound ai sdr tools in 2026 or are we all just paying for better dashboards?",
  "postAuthor": "GoldTap9957",
  "postUrl": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/",
  "username": "LeaderAtLeading",
  "userId": "t2_2c0mv5otpl",
  "communityName": "r/GrowthHacking",
  "parsedCommunityName": "GrowthHacking",
  "subredditId": "t5_2vpgj",
  "body": "Most are dashboards with LLMs glued on. Real signal is still manual to verify.",
  "html": "<div class=\"md\"><p>Most are dashboards with LLMs glued on. Real signal is still manual to verify.</p></div>",
  "depth": 0,
  "upVotes": 1,
  "ups": 1,
  "downs": 0,
  "controversiality": 0,
  "totalAwardsReceived": 0,
  "isSubmitter": false,
  "isStickied": false,
  "isLocked": false,
  "isArchived": false,
  "isCollapsed": false,
  "authorIsBlocked": false,
  "authorPremium": true,
  "createdAt": "2026-06-02T12:06:29.000Z",
  "scrapedAt": "2026-06-02T13:43:02.535Z",
  "sourceUrl": "https://www.reddit.com/r/GrowthHacking",
  "retrievalSource": "comment_thread"
}
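The `createdAt` and `scrapedAt` fields in the sample record are ISO 8601 strings with a trailing `Z`. As a small sketch, the hypothetical helpers below parse them with the standard library and compute how long after creation a record was scraped. The `Z` suffix is rewritten to `+00:00` because `datetime.fromisoformat` only accepts `Z` directly from Python 3.11 onward.

```python
from datetime import datetime
from typing import Any, Dict


def parse_iso(timestamp: str) -> datetime:
    """Parse a timestamp such as 2026-06-02T12:06:29.000Z as timezone-aware UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def scrape_lag_seconds(record: Dict[str, Any]) -> float:
    """Seconds elapsed between a record's creation and its extraction."""
    created = parse_iso(record["createdAt"])
    scraped = parse_iso(record["scrapedAt"])
    return (scraped - created).total_seconds()
```

For the sample record this works out to roughly 5793.5 seconds, a little over an hour and a half.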
Tips for Best Results
Choose Strong Source URLs
- Use direct subreddit, user, or post URLs from www.reddit.com
- Start with a small maxPostCount when validating a new target
- Use direct post URLs when you care most about one specific thread
Control Dataset Size
- Keep maxCommentsPerPost low for fast monitoring runs
- Increase comment limits only when thread depth matters to your analysis
- Use skipComments: true when you only need post-level data
Improve Signal Quality
- Use sort: "top" with a time filter for high-engagement posts
- Use maxPostAgeDays to focus on fresh discussions
- Leave includeNSFW disabled unless that content is required
Proxy Configuration
For larger or more sensitive runs, configure Apify Proxy in the actor input:
{
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Integrations
Connect your Reddit dataset with:
- Google Sheets: Review community activity in a familiar spreadsheet workflow
- Airtable: Build searchable records of posts and comments
- Slack: Share noteworthy thread activity with your team
- Webhooks: Send new records into your own systems
- Make: Automate monitoring and reporting flows
- Zapier: Trigger downstream actions from dataset exports
Export Formats
- JSON: For structured application workflows
- CSV: For spreadsheets and bulk analysis
- Excel: For reporting and review
- XML: For system integrations that require XML
Frequently Asked Questions
What kinds of Reddit URLs can I use?
You can provide subreddit URLs, user profile URLs, or direct thread URLs. Mixed inputs are supported in the same run.
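When preparing mixed inputs, it can help to check which kind each URL is before starting a run. The sketch below classifies a URL by its path. The patterns are assumptions based on the example URLs in this document (plus the common `/u/` shorthand) and are not taken from the actor's own routing logic.

```python
import re
from urllib.parse import urlparse

# Assumed path shapes, modeled on the example URLs in this document.
_POST = re.compile(r"^/r/[^/]+/comments/[^/]+")
_SUBREDDIT = re.compile(r"^/r/[^/]+/?$")
_USER = re.compile(r"^/(?:user|u)/[^/]+/?$")


def classify_reddit_url(url: str) -> str:
    """Return 'post', 'subreddit', 'user', or 'unknown' for a Reddit URL."""
    path = urlparse(url).path
    if _POST.match(path):
        return "post"
    if _SUBREDDIT.match(path):
        return "subreddit"
    if _USER.match(path):
        return "user"
    return "unknown"
```

URLs that come back as `unknown` are worth reviewing by hand before they are added to startUrls.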
Does the actor collect comments?
Yes. Comments are collected for saved posts unless you set skipComments to true or set maxCommentsPerPost to 0.
How are duplicate records handled?
The actor tracks seen post and comment IDs during the run and skips repeat records before saving them.
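The same idea is useful when merging datasets from several runs, since deduplication inside the actor only covers a single run. A minimal sketch of seen-ID filtering follows; it illustrates the general technique and is not the actor's internal code. Keeping records that lack an `id` is an assumption made here.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_id(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first record for each id, preserving input order."""
    seen = set()
    unique = []
    for item in items:
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen:
                continue  # repeat record, skip it
            seen.add(item_id)
        # Records without an id cannot be compared, so they are kept.
        unique.append(item)
    return unique
```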
Why do some fields appear only on certain items?
Reddit does not expose every field on every post or comment. Empty values are omitted from saved records to keep the dataset cleaner.
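Because records are sparse, code that flattens them into rows should not assume a fixed set of keys. The sketch below, using only the Python standard library, builds a CSV whose columns are the union of all keys seen and leaves missing fields blank. The function name `to_csv` is hypothetical, and list values such as imageUrls would be written as their Python string form.

```python
import csv
import io
from typing import Any, Dict, List


def to_csv(items: List[Dict[str, Any]]) -> str:
    """Serialize sparse records to CSV, leaving absent fields empty."""
    # Collect column names in first-seen order across all records.
    fieldnames: List[str] = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()
```

When reading individual fields in application code, `record.get("title")` is a safer habit than `record["title"]` for the same reason.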
Can I focus on only recent posts?
Yes. Use maxPostAgeDays to restrict the run to newer posts.
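An equivalent filter can also be applied to an existing dataset after the run. The sketch below compares each record's `createdAt` against a cutoff. Treating a post exactly at the cutoff as still fresh is an assumption, since the actor's boundary handling is not documented here.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def is_within_age(
    record: Dict[str, Any], max_age_days: int, now: Optional[datetime] = None
) -> bool:
    """True when the record was created within the last max_age_days days."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Rewrite the trailing "Z" so older Python versions can parse it.
    created = datetime.fromisoformat(record["createdAt"].replace("Z", "+00:00"))
    return now - created <= timedelta(days=max_age_days)
```

Passing an explicit `now` makes the check reproducible, which is convenient in tests and backfills.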
What happens when a thread has more hidden comments?
The actor continues expanding thread comments until it reaches your per-post comment limit or the thread has no more retrievable comments.
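The stopping rule can be illustrated with a depth-first walk over a nested comment tree that halts once the limit is reached. This is a simplified sketch: the input shape (dicts with `id` and an optional `replies` list) and the traversal order are assumptions, and the actor's real expansion logic may differ.

```python
from typing import Any, Dict, List, Optional


def flatten_comments(
    comments: List[Dict[str, Any]],
    limit: int,
    depth: int = 0,
    out: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Collect up to `limit` comments depth-first, recording nesting depth."""
    if out is None:
        out = []
    for comment in comments:
        if len(out) >= limit:
            break  # per-post limit reached, stop expanding
        out.append({"id": comment["id"], "depth": depth})
        # Descend into replies; the shared `out` list enforces the limit.
        flatten_comments(comment.get("replies", []), limit, depth + 1, out)
    return out
```

With a limit of 0 the function returns an empty list, which matches the documented meaning of setting maxCommentsPerPost to 0.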
Do I need a proxy for every run?
Not always. Smaller local runs can work without a proxy, while larger or repeated collection is more reliable with Apify Proxy enabled.
Support
For issues or feature requests, contact support through the Apify Console.
Legal Notice
This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with Reddit's terms of service and applicable laws. Use data responsibly and respect platform limits.
