Reddit Community Scraper 👾

Efficiently extract detailed data from Reddit communities and subreddits. This lightweight actor is designed for speed and simplicity. For optimal performance and to minimize the risk of rate limiting or blocking, the use of residential proxies is highly recommended.

Pricing: Pay per usage
Rating: 5.0 (1)
Developer: Shahid Irfan
Last modified: a day ago

Reddit Community Scraper

Extract Reddit posts and comments from subreddits, user feeds, and direct thread URLs in a structured dataset. Collect titles, text, engagement metrics, community metadata, comment depth, author details, and source references for research, monitoring, and analysis.

Features

Subreddit collection — Gather recent posts from one or more Reddit communities
Thread comment extraction — Collect top-level comments and nested replies from each post
User feed support — Pull post activity from Reddit user profile URLs
Rich metadata output — Save author, community, engagement, moderation, flair, and timestamp fields
Clean datasets — Skip duplicate records and omit null or empty values from saved items
Flexible limits — Control how many posts and comments to collect in each run

Use Cases

Community Research

Track what specific Reddit communities are discussing right now. Use structured post and comment data to study trends, recurring questions, and audience language.

Competitive Monitoring

Watch how people talk about products, services, and industries inside target subreddits. Capture both original posts and comment responses for a fuller view of sentiment and objections.

Content Analysis

Build datasets for topic clustering, moderation research, or engagement analysis. The actor saves community and author context alongside each record for easier downstream filtering.

User Activity Review

Collect recent post activity from public Reddit user profiles. This is useful for creator research, outreach preparation, and niche account monitoring.

Input Parameters

Parameter	Type	Required	Default	Description
`startUrls`	Array	Yes	Community example in actor input	Reddit URLs to scrape. Supports subreddit URLs, user profile URLs, and direct post URLs.
`maxPostCount`	Integer	No	`4`	Maximum number of posts to save across the run.
`maxCommentsPerPost`	Integer	No	`2`	Maximum number of comments to save for each post. Set `0` to skip comments.
`skipComments`	Boolean	No	`false`	Skip comment extraction entirely.
`sort`	String	No	`"new"`	Sort order for subreddit listings.
`time`	String	No	`"all"`	Time filter used with supported sort modes.
`includeNSFW`	Boolean	No	`false`	Include NSFW posts in the results.
`maxPostAgeDays`	Integer	No	—	Only save posts newer than the specified number of days.
`proxy`	Object	No	Residential prefill	Proxy settings for higher-volume runs.
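
As a rough illustration of how the defaults in the table combine with your own values, here is a minimal Python sketch that assembles an input object. The helper name `build_input` is hypothetical and not part of the actor; only the parameter names and default values come from the table above.

```python
from typing import Any, Dict, List

# Defaults copied from the parameter table above.
DEFAULTS: Dict[str, Any] = {
    "maxPostCount": 4,
    "maxCommentsPerPost": 2,
    "skipComments": False,
    "sort": "new",
    "time": "all",
    "includeNSFW": False,
}

# Parameters that have no fixed default in the table.
OPTIONAL_KEYS = {"maxPostAgeDays", "proxy"}


def build_input(start_urls: List[str], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: table defaults first, caller overrides second."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input: Dict[str, Any] = {"startUrls": [{"url": u} for u in start_urls]}
    run_input.update(DEFAULTS)
    run_input.update(overrides)
    return run_input


run_input = build_input(
    ["https://www.reddit.com/r/GrowthHacking/"],
    maxPostCount=20,
    sort="top",
    time="week",
)
```

Rejecting unknown keys is a design choice of this sketch, meant to catch typos such as `maxPostsCount` before a run starts.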

Output Data

Each dataset item is either a post record or a comment record.

Field	Type	Description
`dataType`	String	Record type: `post` or `comment`
`id`	String	Reddit full identifier such as `t3_...` or `t1_...`
`parsedId`	String	Short Reddit item ID
`url`	String	Absolute Reddit URL for the record
`permalink`	String	Relative Reddit permalink
`title`	String	Post title
`body`	String	Post or comment text content
`html`	String	HTML version of the text content when available
`username`	String	Author username
`userId`	String	Author full Reddit identifier
`communityName`	String	Community name such as `r/GrowthHacking`
`parsedCommunityName`	String	Community slug without the `r/` prefix
`subredditId`	String	Reddit community identifier
`postId`	String	Parent post full identifier for comment records
`parsedPostId`	String	Parent post short ID for comment records
`postTitle`	String	Parent post title for comment records
`postAuthor`	String	Parent post author for comment records
`postUrl`	String	Parent post URL for comment records
`parentId`	String	Parent item identifier for comment records
`numberOfComments`	Integer	Total comment count on a post
`depth`	Integer	Comment nesting depth
`upVotes`	Integer	Score shown for the record
`ups`	Integer	Upvote count when available
`downs`	Integer	Downvote count when available
`upVoteRatio`	Number	Post upvote ratio
`controversiality`	Integer	Comment controversiality indicator
`flair`	String	Post flair text
`authorFlairText`	String	Comment author flair text
`authorFlairRichtext`	String	Combined rich-text author flair content
`domain`	String	Post domain value
`link`	String	External link for link posts
`thumbnailUrl`	String	Thumbnail URL or Reddit thumbnail marker
`imageUrls`	Array	Extracted preview or gallery image URLs
`mediaType`	String	Derived content type such as `text`, `image`, `video`, or `link`
`totalAwardsReceived`	Integer	Number of awards received
`gilded`	Integer	Gilded count
`isVideo`	Boolean	Whether the post is a video
`isAd`	Boolean	Whether the post is promoted
`isSelf`	Boolean	Whether the post is a self-post
`isPinned`	Boolean	Whether the post is pinned
`isStickied`	Boolean	Whether the record is stickied
`isLocked`	Boolean	Whether the record is locked
`isSpoiler`	Boolean	Whether the post is marked as spoiler
`isArchived`	Boolean	Whether the record is archived
`isCollapsed`	Boolean	Whether the comment is collapsed
`isSubmitter`	Boolean	Whether the comment author is the original poster
`over18`	Boolean	NSFW flag
`authorIsBlocked`	Boolean	Author blocked flag when exposed
`authorPremium`	Boolean	Reddit premium flag for the author
`distinguished`	String	Moderator or admin distinction value
`discussionType`	String	Post discussion type when present
`category`	String	Category value when present
`removedByCategory`	String	Removal category when exposed
`createdAt`	String	Creation timestamp in ISO 8601 format
`editedAt`	String	Edit timestamp in ISO 8601 format
`scrapedAt`	String	Extraction timestamp in ISO 8601 format
`sourceUrl`	String	Source URL from the input run
`retrievalSource`	String	Extraction path used for the record
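
Because posts and comments share one dataset, a common first step is to separate them and attach comments to their parent post. The sketch below assumes the dataset has already been loaded as a list of dictionaries. It relies only on the `dataType`, `id`, and `postId` fields described above, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def group_comments_by_post(
    items: List[Record],
) -> Tuple[Dict[str, Record], Dict[str, List[Record]]]:
    """Split mixed records into posts (by id) and comments (by parent postId)."""
    posts: Dict[str, Record] = {}
    comments: Dict[str, List[Record]] = defaultdict(list)
    for item in items:
        if item.get("dataType") == "post":
            posts[item["id"]] = item
        elif item.get("dataType") == "comment":
            comments[item["postId"]].append(item)
    return posts, dict(comments)


# Illustrative records only; real items carry many more fields.
items = [
    {"dataType": "post", "id": "t3_1tuorhf", "title": "Example post"},
    {"dataType": "comment", "id": "t1_opaxdor", "postId": "t3_1tuorhf", "depth": 0},
]
posts, comments = group_comments_by_post(items)
```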

Usage Examples

Scrape a Subreddit

Collect recent posts and comments from one subreddit:

{
    "startUrls": [
        { "url": "https://www.reddit.com/r/GrowthHacking/" }
    ],
    "maxPostCount": 20,
    "maxCommentsPerPost": 10,
    "sort": "new"
}

Scrape Multiple Communities

Collect posts from several subreddits in one run:

{
    "startUrls": [
        { "url": "https://www.reddit.com/r/technology/" },
        { "url": "https://www.reddit.com/r/startups/" }
    ],
    "maxPostCount": 30,
    "maxCommentsPerPost": 5,
    "sort": "top",
    "time": "week"
}

Scrape a Direct Post Thread

Collect a single thread and its comments:

{
    "startUrls": [
        {
            "url": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/"
        }
    ],
    "maxPostCount": 1,
    "maxCommentsPerPost": 50
}

Scrape a User Feed

Collect recent posts from a Reddit user profile:

{
    "startUrls": [
        { "url": "https://www.reddit.com/user/example_user/" }
    ],
    "maxPostCount": 15,
    "skipComments": true
}

Sample Output

{
    "dataType": "comment",
    "id": "t1_opaxdor",
    "parsedId": "opaxdor",
    "url": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/opaxdor/",
    "permalink": "/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/opaxdor/",
    "parentId": "t3_1tuorhf",
    "postId": "t3_1tuorhf",
    "parsedPostId": "1tuorhf",
    "postTitle": "Best inbound ai sdr tools in 2026 or are we all just paying for better dashboards?",
    "postAuthor": "GoldTap9957",
    "postUrl": "https://www.reddit.com/r/GrowthHacking/comments/1tuorhf/best_inbound_ai_sdr_tools_in_2026_or_are_we_all/",
    "username": "LeaderAtLeading",
    "userId": "t2_2c0mv5otpl",
    "communityName": "r/GrowthHacking",
    "parsedCommunityName": "GrowthHacking",
    "subredditId": "t5_2vpgj",
    "body": "Most are dashboards with LLMs glued on. Real signal is still manual to verify.",
    "html": "<div class=\"md\"><p>Most are dashboards with LLMs glued on. Real signal is still manual to verify.</p></div>",
    "depth": 0,
    "upVotes": 1,
    "ups": 1,
    "downs": 0,
    "controversiality": 0,
    "totalAwardsReceived": 0,
    "isSubmitter": false,
    "isStickied": false,
    "isLocked": false,
    "isArchived": false,
    "isCollapsed": false,
    "authorIsBlocked": false,
    "authorPremium": true,
    "createdAt": "2026-06-02T12:06:29.000Z",
    "scrapedAt": "2026-06-02T13:43:02.535Z",
    "sourceUrl": "https://www.reddit.com/r/GrowthHacking",
    "retrievalSource": "comment_thread"
}
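
The timestamp fields are ISO 8601 strings, so they can be parsed with the standard library. This short Python sketch computes how old the sample comment was when it was scraped. The helper names are hypothetical.

```python
from datetime import datetime


def parse_iso(ts: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that Python versions
    # before 3.11 can also parse the value.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def age_at_scrape_seconds(record: dict) -> float:
    """Seconds between createdAt and scrapedAt for one record."""
    delta = parse_iso(record["scrapedAt"]) - parse_iso(record["createdAt"])
    return delta.total_seconds()


sample = {
    "createdAt": "2026-06-02T12:06:29.000Z",
    "scrapedAt": "2026-06-02T13:43:02.535Z",
}
print(age_at_scrape_seconds(sample))  # 5793.535
```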

Tips for Best Results

Choose Strong Source URLs

Use direct subreddit, user, or post URLs from www.reddit.com
Start with a small `maxPostCount` when validating a new target
Use direct post URLs when you care most about one specific thread

Control Dataset Size

Keep `maxCommentsPerPost` low for fast monitoring runs
Increase comment limits only when thread depth matters to your analysis
Use `skipComments: true` when you only need post-level data

Improve Signal Quality

Use `sort: "top"` with a `time` filter for high-engagement posts
Use `maxPostAgeDays` to focus on fresh discussions
Leave `includeNSFW` disabled unless that content is required

Proxy Configuration

For larger or more sensitive runs, configure Apify Proxy in the actor input:

{
    "proxy": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

Integrations

Connect your Reddit dataset with:

Google Sheets — Review community activity in a familiar spreadsheet workflow
Airtable — Build searchable records of posts and comments
Slack — Share noteworthy thread activity with your team
Webhooks — Send new records into your own systems
Make — Automate monitoring and reporting flows
Zapier — Trigger downstream actions from dataset exports

Export Formats

JSON — For structured application workflows
CSV — For spreadsheets and bulk analysis
Excel — For reporting and review
XML — For system integrations that require XML
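
If you post-process a JSON export yourself, keep in mind that records do not all share the same keys. The following Python sketch writes mixed records to CSV by taking the union of all field names. It is an illustration using the standard `csv` module, not a description of how the platform's own CSV export works.

```python
import csv
import io
from typing import Dict, List


def records_to_csv(records: List[Dict]) -> str:
    """Serialize records with differing keys into one CSV string."""
    # Collect column names in first-seen order across all records.
    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills cells for fields a record does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv([
    {"id": "t3_a", "title": "Hello"},
    {"id": "t1_b", "body": "Hi", "depth": 0},
])
```

List-valued fields such as `imageUrls` would be written using Python's default string form here. Join them into a single string first if you need a specific delimiter.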

Frequently Asked Questions

What kinds of Reddit URLs can I use?

You can provide subreddit URLs, user profile URLs, or direct thread URLs. Mixed inputs are supported in the same run.
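
To check a mixed list of URLs before a run, you could classify them locally. The patterns below are a sketch based on the URL shapes shown in the usage examples. The actor's actual URL parsing is not documented here, so treat the rules and the `classify_reddit_url` name as assumptions.

```python
import re
from urllib.parse import urlparse

# Order matters: a post URL also starts with /r/<name>/.
_PATTERNS = [
    ("post", re.compile(r"^/r/[^/]+/comments/[^/]+")),
    ("subreddit", re.compile(r"^/r/[^/]+/?$")),
    ("user", re.compile(r"^/user/[^/]+/?$")),
]


def classify_reddit_url(url: str) -> str:
    """Return 'post', 'subreddit', 'user', or 'unsupported' (assumed rules)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return "unsupported"
    for kind, pattern in _PATTERNS:
        if pattern.search(parsed.path):
            return kind
    return "unsupported"


kinds = [
    classify_reddit_url("https://www.reddit.com/r/technology/"),
    classify_reddit_url("https://www.reddit.com/user/example_user/"),
]
```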

Does the actor collect comments?

Yes. Comments are collected for saved posts unless you set `skipComments` to `true` or set `maxCommentsPerPost` to `0`.

How are duplicate records handled?

The actor tracks seen post and comment IDs during the run and skips repeat records before saving them.
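
The same idea is useful when you merge datasets from several runs yourself. This is a minimal sketch of ID-based deduplication in Python. It illustrates the technique described above and is not the actor's own code.

```python
from typing import Dict, Iterable, Iterator, Optional, Set


def skip_seen(records: Iterable[Dict], seen: Optional[Set[str]] = None) -> Iterator[Dict]:
    """Yield each record once, keyed by its full Reddit `id`."""
    seen = set() if seen is None else seen
    for record in records:
        record_id = record["id"]
        if record_id in seen:
            continue  # already saved earlier
        seen.add(record_id)
        yield record


unique = list(skip_seen([
    {"id": "t3_a"},
    {"id": "t1_b"},
    {"id": "t3_a"},  # repeat, skipped
]))
```

Passing the same `seen` set into later calls carries the deduplication state across several batches.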

Why do some fields appear only on certain items?

Reddit does not expose every field on every post or comment. Empty values are omitted from saved records to keep the dataset cleaner.
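
In downstream code this means you should read optional fields defensively. The sketch below shows one plausible cleaning rule together with a safe read. Exactly which values the actor treats as empty is an assumption. The sample output keeps `0` and `false`, so this version keeps them too.

```python
from typing import Any, Dict


def drop_empty(record: Dict[str, Any]) -> Dict[str, Any]:
    """Remove None, empty strings, and empty containers; keep 0 and False."""
    return {
        key: value
        for key, value in record.items()
        if value is not None and value != "" and value != [] and value != {}
    }


cleaned = drop_empty({
    "title": "Hello",
    "flair": None,
    "body": "",
    "imageUrls": [],
    "depth": 0,
    "isLocked": False,
})
# cleaned == {"title": "Hello", "depth": 0, "isLocked": False}

# Read optional fields with a fallback instead of indexing directly.
flair = cleaned.get("flair", "(no flair)")
```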

Can I focus on only recent posts?

Yes. Use `maxPostAgeDays` to restrict the run to newer posts.
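
You can apply the same kind of cutoff to an existing dataset. The following Python sketch filters on `createdAt`. Whether the actor treats the boundary as inclusive is not stated, so the `<=` comparison is an assumption, as is the function name.

```python
from datetime import datetime, timedelta, timezone


def is_within_age(created_at: str, max_age_days: int, now: datetime) -> bool:
    """True if the record was created no more than max_age_days before now."""
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return now - created <= timedelta(days=max_age_days)


# A fixed reference time keeps the example reproducible.
now = datetime(2026, 6, 2, 13, 43, tzinfo=timezone.utc)
fresh = is_within_age("2026-06-02T12:06:29.000Z", 1, now)
stale = is_within_age("2026-05-20T00:00:00.000Z", 7, now)
```

In real use, pass `datetime.now(timezone.utc)` as `now`.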

What happens when a thread has more hidden comments?

The actor continues expanding thread comments until it reaches your per-post comment limit or the thread has no more retrievable comments.
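
To make the stopping rule concrete, here is a small Python sketch that walks a nested comment tree and stops at a limit. The `replies` key and the depth-first order are assumptions chosen for illustration. The actor's internal traversal is not documented here.

```python
from typing import Dict, List


def flatten_comments(tree: List[Dict], limit: int) -> List[Dict]:
    """Depth-first walk that stops once `limit` comments are collected."""
    flat: List[Dict] = []
    # Stack of (comment, depth); reversed so the first comment is visited first.
    stack = [(comment, 0) for comment in reversed(tree)]
    while stack and len(flat) < limit:
        comment, depth = stack.pop()
        flat.append({"id": comment["id"], "depth": depth})
        for reply in reversed(comment.get("replies", [])):
            stack.append((reply, depth + 1))
    return flat


# Hypothetical thread: c1 has two replies, one of which has its own reply.
thread = [
    {"id": "c1", "replies": [
        {"id": "c1a", "replies": [{"id": "c1a1"}]},
        {"id": "c1b"},
    ]},
    {"id": "c2"},
]
first_three = flatten_comments(thread, limit=3)
```

With `limit=3` the walk returns `c1`, `c1a`, and `c1a1` at depths 0, 1, and 2. A larger limit returns all five comments and then stops because nothing is left to expand.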

Do I need a proxy for every run?

Not always. Smaller local runs can work without a proxy, while larger or repeated collection is more reliable with Apify Proxy enabled.

Support

For issues or feature requests, contact support through the Apify Console.

Legal Notice

This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with Reddit's terms of service and applicable laws. Use data responsibly and respect platform limits.
