Reddit Community Scraper 👾

Pricing

Pay per usage

Extract detailed data from Reddit communities and subreddits. This lightweight actor is designed for speed and simplicity. Residential proxies are strongly recommended to reduce the risk of rate limiting or blocking.

Rating: 5.0 (1)

Developer: Shahid Irfan (Maintained by Community)

Actor stats

  • Bookmarked: 2
  • Total users: 27
  • Monthly active users: 7
  • Last modified: a day ago

Reddit Community Scraper

Extract Reddit posts and comments from subreddits, user feeds, and direct thread URLs in a structured dataset. Collect titles, text, engagement metrics, community metadata, comment depth, author details, and source references for research, monitoring, and analysis.

Features

  • Subreddit collection: Gather recent posts from one or more Reddit communities
  • Thread comment extraction: Collect top-level comments and nested replies from each post
  • User feed support: Pull post activity from Reddit user profile URLs
  • Rich metadata output: Save author, community, engagement, moderation, flair, and timestamp fields
  • Clean datasets: Skip duplicate records and omit null or empty values from saved items
  • Flexible limits: Control how many posts and comments to collect in each run

Use Cases

Community Research

Track what specific Reddit communities are discussing right now. Use structured post and comment data to study trends, recurring questions, and audience language.

Competitive Monitoring

Watch how people talk about products, services, and industries inside target subreddits. Capture both original posts and comment responses for a fuller view of sentiment and objections.

Content Analysis

Build datasets for topic clustering, moderation research, or engagement analysis. The actor saves community and author context alongside each record for easier downstream filtering.

User Activity Review

Collect recent post activity from public Reddit user profiles. This is useful for creator research, outreach preparation, and niche account monitoring.


Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | Yes | Community example in actor input | Reddit URLs to scrape. Supports subreddit URLs, user profile URLs, and direct post URLs. |
| maxPostCount | Integer | No | 4 | Maximum number of posts to save across the run. |
| maxCommentsPerPost | Integer | No | 2 | Maximum number of comments to save for each post. Set 0 to skip comments. |
| skipComments | Boolean | No | false | Skip comment extraction entirely. |
| sort | String | No | "new" | Sort order for subreddit listings. |
| time | String | No | "all" | Time filter used with supported sort modes. |
| includeNSFW | Boolean | No | false | Include NSFW posts in the results. |
| maxPostAgeDays | Integer | No | - | Only save posts newer than the specified number of days. |
| proxy | Object | No | Residential prefill | Proxy settings for higher-volume runs. |
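
As a rough illustration of how these parameters fit together, the sketch below builds an input object in Python. The default values are copied from the table above; the helper name `build_run_input` and its validation rules are assumptions made for this example, not part of the actor.

```python
from typing import Any

# Defaults copied from the parameter table; the helper itself is illustrative.
DEFAULTS: dict[str, Any] = {
    "maxPostCount": 4,
    "maxCommentsPerPost": 2,
    "skipComments": False,
    "sort": "new",
    "time": "all",
    "includeNSFW": False,
}

# Parameters the table lists without a concrete default value.
OPTIONAL_WITHOUT_DEFAULT = {"maxPostAgeDays", "proxy"}


def build_run_input(urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Return an input dict: required startUrls plus defaults and overrides."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input: dict[str, Any] = {"startUrls": [{"url": u} for u in urls]}
    run_input.update(DEFAULTS)
    run_input.update(overrides)
    return run_input
```

Rejecting unknown keys up front is a design choice for this sketch: it catches misspelled parameter names before a run is started.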

Output Data

Each dataset item is either a post record or a comment record.

| Field | Type | Description |
|---|---|---|
| dataType | String | Record type: post or comment |
| id | String | Reddit full identifier such as t3_... or t1_... |
| parsedId | String | Short Reddit item ID |
| url | String | Absolute Reddit URL for the record |
| permalink | String | Relative Reddit permalink |
| title | String | Post title |
| body | String | Post or comment text content |
| html | String | HTML version of the text content when available |
| username | String | Author username |
| userId | String | Author full Reddit identifier |
| communityName | String | Community name such as r/GrowthHacking |
| parsedCommunityName | String | Community slug without the r/ prefix |
| subredditId | String | Reddit community identifier |
| postId | String | Parent post full identifier for comment records |
| parsedPostId | String | Parent post short ID for comment records |
| postTitle | String | Parent post title for comment records |
| postAuthor | String | Parent post author for comment records |
| postUrl | String | Parent post URL for comment records |
| parentId | String | Parent item identifier for comment records |
| numberOfComments | Integer | Total comment count on a post |
| depth | Integer | Comment nesting depth |
| upVotes | Integer | Score shown for the record |
| ups | Integer | Upvote count when available |
| downs | Integer | Downvote count when available |
| upVoteRatio | Number | Post upvote ratio |
| controversiality | Integer | Comment controversiality indicator |
| flair | String | Post flair text |
| authorFlairText | String | Comment author flair text |
| authorFlairRichtext | String | Combined rich-text author flair content |
| domain | String | Post domain value |
| link | String | External link for link posts |
| thumbnailUrl | String | Thumbnail URL or Reddit thumbnail marker |
| imageUrls | Array | Extracted preview or gallery image URLs |
| mediaType | String | Derived content type such as text, image, video, or link |
| totalAwardsReceived | Integer | Number of awards received |
| gilded | Integer | Gilded count |
| isVideo | Boolean | Whether the post is a video |
| isAd | Boolean | Whether the post is promoted |
| isSelf | Boolean | Whether the post is a self-post |
| isPinned | Boolean | Whether the post is pinned |
| isStickied | Boolean | Whether the record is stickied |
| isLocked | Boolean | Whether the record is locked |
| isSpoiler | Boolean | Whether the post is marked as spoiler |
| isArchived | Boolean | Whether the record is archived |
| isCollapsed | Boolean | Whether the comment is collapsed |
| isSubmitter | Boolean | Whether the comment author is the original poster |
| over18 | Boolean | NSFW flag |
| authorIsBlocked | Boolean | Author blocked flag when exposed |
| authorPremium | Boolean | Reddit premium flag for the author |
| distinguished | String | Moderator or admin distinction value |
| discussionType | String | Post discussion type when present |
| category | String | Category value when present |
| removedByCategory | String | Removal category when exposed |
| createdAt | String | Creation timestamp in ISO 8601 format |
| editedAt | String | Edit timestamp in ISO 8601 format |
| scrapedAt | String | Extraction timestamp in ISO 8601 format |
| sourceUrl | String | Source URL from the input run |
| retrievalSource | String | Extraction path used for the record |
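
Because posts and comments share one dataset, a common first step is to separate them and attach comments to their parent post. The following is a minimal sketch, assuming items have already been loaded as Python dicts; the function name `group_by_post` is hypothetical, and it relies only on the `dataType`, `id`, and `postId` fields listed above.

```python
from collections import defaultdict


def group_by_post(items):
    """Split items by dataType and group comments under their parent post id."""
    posts = {}
    comments = defaultdict(list)
    for item in items:
        kind = item.get("dataType")
        if kind == "post":
            posts[item["id"]] = item
        elif kind == "comment":
            # postId holds the parent post's full identifier (t3_...).
            comments[item.get("postId")].append(item)
    return posts, dict(comments)
```

Comments whose parent post was not saved in the same run still appear in the second mapping, keyed by their `postId`.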

Usage Examples

Scrape a Subreddit

Collect recent posts and comments from one subreddit:

{
    "startUrls": [
        { "url": "https://www.reddit.com/r/GrowthHacking/" }
    ],
    "maxPostCount": 20,
    "maxCommentsPerPost": 10,
    "sort": "new"
}

Scrape Multiple Communities

Collect posts from several subreddits in one run:

{
    "startUrls": [
        { "url": "https://www.reddit.com/r/technology/" },
        { "url": "https://www.reddit.com/r/startups/" }
    ],
    "maxPostCount": 30,
    "maxCommentsPerPost": 5,
    "sort": "top",
    "time": "week"
}

Scrape a Direct Post Thread

Collect a single thread and its comments:

{
    "startUrls": [
        {
            "url": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/"
        }
    ],
    "maxPostCount": 1,
    "maxCommentsPerPost": 50
}

Scrape a User Feed

Collect recent posts from a Reddit user profile:

{
    "startUrls": [
        { "url": "https://www.reddit.com/user/example_user/" }
    ],
    "maxPostCount": 15,
    "skipComments": true
}
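
Any of the inputs above can also be submitted programmatically. The sketch below assumes the `apify-client` Python package, whose `ApifyClient` exposes `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. The actor ID and API token are not given on this page, so they appear only as placeholders, and the helper name `run_and_collect` is hypothetical.

```python
def run_and_collect(client, actor_id, run_input):
    """Start the actor, wait for it to finish, and return all dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    # Each run stores its results in a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (needs network access; both angle-bracket values are placeholders):
#   from apify_client import ApifyClient
#   items = run_and_collect(
#       ApifyClient("<APIFY_TOKEN>"),
#       "<ACTOR_ID>",
#       {"startUrls": [{"url": "https://www.reddit.com/r/GrowthHacking/"}]},
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before spending credits on a real run.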

Sample Output

{
    "dataType": "comment",
    "id": "t1_opaxdor",
    "parsedId": "opaxdor",
    "url": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/opaxdor/",
    "permalink": "/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/opaxdor/",
    "parentId": "t3_1tuorhf",
    "postId": "t3_1tuorhf",
    "parsedPostId": "1tuorhf",
    "postTitle": "Best inbound ai sdr tools in 2026 or are we all just paying for better dashboards?",
    "postAuthor": "GoldTap9957",
    "postUrl": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/",
    "username": "LeaderAtLeading",
    "userId": "t2_2c0mv5otpl",
    "communityName": "r/GrowthHacking",
    "parsedCommunityName": "GrowthHacking",
    "subredditId": "t5_2vpgj",
    "body": "Most are dashboards with LLMs glued on. Real signal is still manual to verify.",
    "html": "<div class=\"md\"><p>Most are dashboards with LLMs glued on. Real signal is still manual to verify.</p></div>",
    "depth": 0,
    "upVotes": 1,
    "ups": 1,
    "downs": 0,
    "controversiality": 0,
    "totalAwardsReceived": 0,
    "isSubmitter": false,
    "isStickied": false,
    "isLocked": false,
    "isArchived": false,
    "isCollapsed": false,
    "authorIsBlocked": false,
    "authorPremium": true,
    "createdAt": "2026-06-02T12:06:29.000Z",
    "scrapedAt": "2026-06-02T13:43:02.535Z",
    "sourceUrl": "https://www.reddit.com/r/GrowthHacking",
    "retrievalSource": "comment_thread"
}
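
The sample above is a top-level comment: its `parentId` equals its `postId` and its `depth` is 0. To rebuild nested threads from flat comment records, one possible approach is sketched below. It assumes that a reply's `parentId` holds the full identifier (`t1_...`) of the comment it answers, which follows Reddit's usual convention but is not shown in the sample; the function name `build_reply_tree` is hypothetical.

```python
def build_reply_tree(comments):
    """Nest comment records under their parents using id and parentId."""
    # Copy each record so the input list is left untouched.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node.get("parentId"))
        if parent is not None:
            parent["replies"].append(node)
        else:
            # The parent is the post (t3_...) or a comment that was not collected.
            roots.append(node)
    return roots
```

Treating comments with a missing parent as roots keeps partially collected threads usable when a per-post comment limit cut the run short.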

Tips for Best Results

Choose Strong Source URLs

  • Use direct subreddit, user, or post URLs from www.reddit.com
  • Start with a small maxPostCount when validating a new target
  • Use direct post URLs when you care most about one specific thread

Control Dataset Size

  • Keep maxCommentsPerPost low for fast monitoring runs
  • Increase comment limits only when thread depth matters to your analysis
  • Use skipComments: true when you only need post-level data

Improve Signal Quality

  • Use sort: "top" with a time filter for high-engagement posts
  • Use maxPostAgeDays to focus on fresh discussions
  • Leave includeNSFW disabled unless that content is required

Proxy Configuration

For larger or more sensitive runs, configure Apify Proxy in the actor input:

{
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

Integrations

Connect your Reddit dataset with:

  • Google Sheets: Review community activity in a familiar spreadsheet workflow
  • Airtable: Build searchable records of posts and comments
  • Slack: Share noteworthy thread activity with your team
  • Webhooks: Send new records into your own systems
  • Make: Automate monitoring and reporting flows
  • Zapier: Trigger downstream actions from dataset exports

Export Formats

  • JSON: For structured application workflows
  • CSV: For spreadsheets and bulk analysis
  • Excel: For reporting and review
  • XML: For system integrations that require XML

Frequently Asked Questions

What kinds of Reddit URLs can I use?

You can provide subreddit URLs, user profile URLs, or direct thread URLs. Mixed inputs are supported in the same run.

Does the actor collect comments?

Yes. Comments are collected for saved posts unless you set skipComments to true or set maxCommentsPerPost to 0.
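
The rule above can be expressed as a small check on the input object. This is only a sketch of the documented behavior, not the actor's code; the function name `effective_comment_limit` is hypothetical, and the fallback of 2 is the documented default for `maxCommentsPerPost`.

```python
def effective_comment_limit(run_input):
    """Return how many comments per post a run may save (0 means none)."""
    if run_input.get("skipComments", False):
        return 0
    # 2 is the documented default for maxCommentsPerPost.
    return max(0, run_input.get("maxCommentsPerPost", 2))
```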

How are duplicate records handled?

The actor tracks seen post and comment IDs during the run and skips repeat records before saving them.
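
The same idea is useful when merging datasets from several runs on your side. The sketch below shows seen-ID deduplication in general terms; it illustrates the technique and is not the actor's actual implementation. The function name `skip_duplicates` is hypothetical.

```python
def skip_duplicates(records):
    """Yield each record once, keyed by its Reddit id, in first-seen order."""
    seen = set()
    for record in records:
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen:
                continue
            seen.add(record_id)
        # Records without an id cannot be compared, so they pass through.
        yield record
```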

Why do some fields appear only on certain items?

Reddit does not expose every field on every post or comment. Empty values are omitted from saved records to keep the dataset cleaner.
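
In practice this means consumers should read fields defensively, for example with `record.get("flair")` rather than `record["flair"]`. The sketch below shows one way such pruning could work. The exact rule the actor applies is not documented here, so treating `None`, empty strings, empty lists, and empty objects as "empty" is an assumption, and `drop_empty` is a hypothetical name.

```python
def drop_empty(record):
    """Return a copy without None, empty-string, empty-list, or empty-dict values."""
    # 0 and False are real values (for example downs or isLocked) and are kept.
    return {k: v for k, v in record.items() if v not in (None, "", [], {})}
```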

Can I focus on only recent posts?

Yes. Use maxPostAgeDays to restrict the run to newer posts.
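
If a dataset was collected without this option, a similar filter can be applied afterwards using the `createdAt` field. The sketch below is a client-side approximation under stated assumptions: the helper names are hypothetical, and whether the actor treats the cutoff as inclusive is not documented, so the `>=` comparison is a guess.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2026-06-02T12:06:29.000Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def newer_than(items, max_age_days, now=None):
    """Keep items whose createdAt falls within max_age_days of now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        item for item in items
        if "createdAt" in item and parse_ts(item["createdAt"]) >= cutoff
    ]
```

Items without a `createdAt` value are dropped here because their age cannot be determined.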

What happens when a thread has more hidden comments?

The actor continues expanding thread comments until it reaches your per-post comment limit or the thread has no more retrievable comments.
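
The stopping rule described above can be pictured as a simple loop. This is an illustrative sketch only, not the actor's code: `fetch_batch` is a hypothetical callable that takes a cursor (`None` on the first call) and returns a `(comments, next_cursor)` pair, with `next_cursor` set to `None` once nothing more can be retrieved.

```python
def collect_comments(fetch_batch, limit):
    """Collect comments batch by batch until limit is reached or batches run out."""
    collected = []
    cursor = None
    while len(collected) < limit:
        batch, cursor = fetch_batch(cursor)
        # Take only as many comments as the remaining budget allows.
        collected.extend(batch[: limit - len(collected)])
        if cursor is None or not batch:
            break
    return collected
```

With a limit of 0 the loop never runs, so no requests are made at all.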

Do I need a proxy for every run?

Not always. Smaller local runs can work without a proxy, while larger or repeated collection is more reliable with Apify Proxy enabled.


Support

For issues or feature requests, contact support through the Apify Console.


This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with Reddit's terms of service and applicable laws. Use data responsibly and respect platform limits.