# Reddit Scraper (`brilliant_gum/reddit-scraper`) Actor

Extract posts, comments, and user profiles from any subreddit, search query, or Reddit URL. No API key required. Built for market research, brand monitoring, sentiment analysis, and AI/LLM training datasets.

- **URL**: https://apify.com/brilliant_gum/reddit-scraper.md
- **Developed by:** [Yuliia Kulakova](https://apify.com/brilliant_gum) (community)
- **Categories:** Social media, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $8.00 per 1,000 posts scraped

This Actor is paid per event and usage. You are charged a fixed price for specific events, plus Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Reddit Scraper — Posts, Comments & Profiles

![Reddit Scraper](https://i.postimg.cc/W3D6j2Hj/reddit-banner-clean.png)

Extract posts, comments, and user profiles from any subreddit, search query, or Reddit URL — no API key required. Built for market research, brand monitoring, sentiment analysis, competitor intelligence, and AI/LLM training datasets.

---

### 💰 Pricing

Pay only for what you extract — three separate billing events:

| What               | Cost                    |
|--------------------|-------------------------|
| 📄 Posts           | **$8 per 1,000**        |
| 💬 Comments        | **$6 per 1,000**        |
| 👤 User profiles   | **$8 per 1,000**        |

A small one-time actor-start fee applies per run. Posts filtered out by score, date, or comment count are **not** charged.

---

### ✨ Key Features

#### 📄 Full Post Data
Every post record includes title, full body text, author, subreddit, score, upvote ratio, comment count, publish date, flair, NSFW flag, award count, thumbnail URL, and external link (for link posts). Rich structured JSON ready for analysis or AI pipelines.

#### 💬 Comments with Full Thread Structure
Captures top-level comments and all nested replies in a single flat dataset. Each comment includes the full body text, author, score, depth level, and `parentId` for reconstructing threads. Deleted and removed comments are automatically skipped.

#### 👤 User Profiles
Fetches the Reddit profile of each post author: total karma, link karma, comment karma, account age, gold status. Each unique author is fetched only once per run — no duplicate charges.

#### 🔄 Four Input Types
- **Subreddit URLs** — scrape posts from any public subreddit
- **Post URLs** — scrape a specific post and optionally its comments
- **User profile URLs** — scrape a specific Reddit user's profile
- **Search queries** — find posts matching keywords across all of Reddit

#### 🔃 Sort & Time Filters
Choose how posts are sorted: **Hot**, **New**, **Top**, or **Rising**. For Top posts, filter by time range: past hour, day, week, month, year, or all time.

#### 🔍 Powerful Filters
- **Minimum score** — skip low-engagement posts
- **Minimum comments** — only posts with real discussion
- **Date from** — only posts published after a specific date
- **Exclude NSFW** — filter out adult content

---

### 🚀 Quick Start

#### Option 1 — Subreddit
Paste a subreddit URL to scrape its posts.
```
https://www.reddit.com/r/technology/
https://www.reddit.com/r/MachineLearning/
https://www.reddit.com/r/startups/
```

#### Option 2 — Specific Post
Paste a post URL to scrape that post and optionally all its comments.
```
https://www.reddit.com/r/technology/comments/abc123/post_title/
```

#### Option 3 — User Profile
Paste a user profile URL to scrape their Reddit profile data.
```
https://www.reddit.com/u/spez
https://www.reddit.com/user/spez
```

#### Option 4 — Search Queries
Provide one or more search terms as a list. The scraper returns the most relevant posts for each query.
```json
["ChatGPT alternatives", "best productivity tools 2025", "startup advice"]
```

***

### ⚙️ Input Parameters

| Parameter              | Type    | Default  | Description |
|------------------------|---------|----------|-------------|
| `startUrls`            | array   | —        | Reddit URLs: subreddit, post, or user profile |
| `searchQueries`        | array   | —        | Keywords to search across Reddit |
| `maxPosts`             | integer | 50       | Maximum posts to scrape (hard cap: 1,000) |
| `sort`                 | string  | `hot`    | Post sort order: `hot`, `new`, `top`, `rising` |
| `timeFilter`           | string  | `week`   | Time range for Top sort: `hour`, `day`, `week`, `month`, `year`, `all` |
| `scrapeComments`       | boolean | false    | Extract comments for each post |
| `maxCommentsPerPost`   | integer | 100      | Maximum comments per post (including replies) |
| `scrapeUserProfiles`   | boolean | false    | Fetch author profile for each post |
| `filterByMinScore`     | integer | 0        | Skip posts with fewer upvotes than this |
| `filterByMinComments`  | integer | 0        | Skip posts with fewer comments than this |
| `filterByDateFrom`     | string  | —        | Only posts published on or after `YYYY-MM-DD` |
| `excludeNsfw`          | boolean | false    | Exclude NSFW (18+) posts |

***

### 📦 Output Format

All results are saved to the Apify dataset. Three record types are mixed in a single dataset and can be filtered by the `type` field.
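Once you have fetched the dataset items (see the [API](#api) section), splitting them by record type is a one-line filter. A minimal sketch using hypothetical sample records in the documented shape:

```python
# Hypothetical sample of mixed dataset items; a real run's items have the same shape.
items = [
    {"type": "post", "postId": "1abc23", "title": "Example post"},
    {"type": "comment", "commentId": "k1x2y3z", "postId": "1abc23"},
    {"type": "profile", "username": "tech_user_42"},
]

# Split the mixed dataset into the three record types by the `type` field.
posts = [r for r in items if r["type"] == "post"]
comments = [r for r in items if r["type"] == "comment"]
profiles = [r for r in items if r["type"] == "profile"]
```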

***

#### Post Record (`type: "post"`)

One record per scraped post.

```json
{
  "type": "post",
  "postId": "1abc23",
  "url": "https://www.reddit.com/r/technology/comments/1abc23/title/",
  "title": "OpenAI releases new model with 10x lower cost",
  "body": "The full post body text goes here...",
  "author": "tech_user_42",
  "subreddit": "technology",
  "subredditSubscribers": 14500000,
  "score": 8420,
  "upvoteRatio": 0.94,
  "numComments": 312,
  "createdAt": "2026-04-01",
  "flair": "AI",
  "isNsfw": false,
  "isSelf": true,
  "isStickied": false,
  "isLocked": false,
  "awards": 3,
  "thumbnailUrl": null,
  "externalUrl": null,
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}
```

**Field reference:**

| Field | Type | Description |
|-------|------|-------------|
| `postId` | string | Reddit post ID |
| `url` | string | Full URL to the post |
| `title` | string | Post title |
| `body` | string | Post body text (empty for link posts) |
| `author` | string | Reddit username of the author |
| `subreddit` | string | Subreddit name |
| `subredditSubscribers` | integer | Number of subreddit members |
| `score` | integer | Net upvotes (upvotes minus downvotes) |
| `upvoteRatio` | number | Ratio of upvotes to total votes (0–1) |
| `numComments` | integer | Number of comments on the post |
| `createdAt` | string | Date the post was published (YYYY-MM-DD) |
| `flair` | string | Post flair tag set by the author or moderators |
| `isNsfw` | boolean | True if the post is marked as 18+ |
| `isSelf` | boolean | True if this is a text post; false for link posts |
| `isStickied` | boolean | True if the post is pinned by moderators |
| `isLocked` | boolean | True if comments are disabled |
| `awards` | integer | Total number of awards received |
| `thumbnailUrl` | string | Thumbnail image URL (link posts only) |
| `externalUrl` | string | External link URL (link posts only) |
| `scrapedAt` | string | ISO timestamp of when the record was created |

***

#### Comment Record (`type: "comment"`)

One record per comment or reply. Use `depth` and `parentId` to reconstruct the thread structure.

```json
{
  "type": "comment",
  "commentId": "k1x2y3z",
  "postId": "1abc23",
  "postTitle": "OpenAI releases new model with 10x lower cost",
  "parentId": "t3_1abc23",
  "body": "This is a really interesting development. The cost reduction alone changes everything for startups.",
  "author": "ml_engineer_99",
  "score": 342,
  "depth": 0,
  "createdAt": "2026-04-01",
  "isStickied": false,
  "distinguished": null,
  "awards": 1,
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}
```

**Field reference:**

| Field | Type | Description |
|-------|------|-------------|
| `commentId` | string | Unique Reddit comment ID |
| `postId` | string | ID of the parent post |
| `postTitle` | string | Title of the parent post |
| `parentId` | string | ID of the parent comment or post (`t1_...` for comment parent, `t3_...` for top-level) |
| `body` | string | Full comment text |
| `author` | string | Reddit username of the commenter |
| `score` | integer | Net upvotes on the comment |
| `depth` | integer | Nesting level (0 = top-level, 1 = reply, 2 = reply to reply, etc.) |
| `createdAt` | string | Date the comment was posted (YYYY-MM-DD) |
| `isStickied` | boolean | True if the comment is pinned by moderators |
| `distinguished` | string | `"moderator"` or `"admin"` if applicable, otherwise null |
| `awards` | integer | Number of awards on the comment |
| `scrapedAt` | string | ISO timestamp of when the record was created |

***

#### Profile Record (`type: "profile"`)

One record per unique user. Only created when `scrapeUserProfiles` is enabled.

```json
{
  "type": "profile",
  "username": "tech_user_42",
  "profileUrl": "https://www.reddit.com/user/tech_user_42",
  "totalKarma": 48200,
  "linkKarma": 12000,
  "commentKarma": 36200,
  "isGold": false,
  "isEmployee": false,
  "createdAt": "2019-03-15",
  "iconUrl": "https://styles.redditmedia.com/...",
  "scrapedAt": "2026-04-09T10:00:00.000Z"
}
```

**Field reference:**

| Field | Type | Description |
|-------|------|-------------|
| `username` | string | Reddit username |
| `profileUrl` | string | Full URL to the user's profile |
| `totalKarma` | integer | Total karma (link + comment) |
| `linkKarma` | integer | Karma from posts |
| `commentKarma` | integer | Karma from comments |
| `isGold` | boolean | True if the user has Reddit Gold |
| `isEmployee` | boolean | True if the user is a Reddit employee |
| `createdAt` | string | Account creation date (YYYY-MM-DD) |
| `iconUrl` | string | Profile avatar image URL |
| `scrapedAt` | string | ISO timestamp of when the record was created |

***

### 🔍 Use Case Examples

#### Brand monitoring — find mentions of your product

```json
{
  "searchQueries": ["notion app", "notion review", "notion alternative"],
  "maxPosts": 200,
  "scrapeComments": true,
  "maxCommentsPerPost": 200,
  "filterByMinScore": 10
}
```

#### Competitor sentiment analysis

```json
{
  "searchQueries": ["linear vs jira", "figma vs sketch 2026", "shopify vs woocommerce"],
  "maxPosts": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 300,
  "filterByMinComments": 20
}
```

#### Trending topics in a niche subreddit

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/MachineLearning/" }],
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "week",
  "filterByMinScore": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 100
}
```

#### AI/LLM training dataset from a subreddit

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/personalfinance/" }],
  "maxPosts": 1000,
  "sort": "top",
  "timeFilter": "year",
  "scrapeComments": true,
  "maxCommentsPerPost": 200,
  "filterByMinScore": 50
}
```

#### Lead generation — find people asking for recommendations

```json
{
  "searchQueries": ["looking for CRM recommendation", "best project management tool", "need accounting software"],
  "maxPosts": 100,
  "scrapeComments": true,
  "maxCommentsPerPost": 100,
  "filterByMinComments": 5,
  "scrapeUserProfiles": true
}
```

#### Monitor a subreddit for recent posts

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/entrepreneur/" }],
  "maxPosts": 50,
  "sort": "new",
  "filterByDateFrom": "2026-04-01",
  "scrapeComments": false
}
```

#### Scrape a specific viral post with all comments

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/AskReddit/comments/xyz123/post_title/" }],
  "scrapeComments": true,
  "maxCommentsPerPost": 500
}
```

#### Research influencers in a subreddit

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/webdev/" }],
  "maxPosts": 100,
  "sort": "top",
  "timeFilter": "month",
  "scrapeUserProfiles": true,
  "filterByMinScore": 200
}
```

***

### 📊 Who Uses This

| Use Case | Who | What They Get |
|----------|-----|---------------|
| **Brand monitoring** | Marketing teams | All Reddit mentions of a brand or product in structured JSON |
| **Competitor research** | Product managers | What users say about competitor products across relevant subreddits |
| **Sentiment analysis** | Analysts | Comment corpora with scores, dates, and thread context |
| **Lead generation** | Sales teams | Posts where people ask for product/service recommendations |
| **LLM training data** | AI & ML teams | High-quality discussion threads from expert communities |
| **Trend discovery** | Marketers & creators | What's going viral in a niche before it hits mainstream |
| **Academic research** | Researchers | Public discussion datasets for NLP and social science |
| **Influencer identification** | Agencies | Top contributors in niche subreddits with karma and activity |
| **Market research** | Consultants | Consumer opinions, pain points, and demand signals |
| **Financial research** | Investors | Retail investor sentiment from finance subreddits |

***

### 💡 Pro Tips

**1. Use Top + time filter for the best content**
Set `sort: "top"` with `timeFilter: "month"` or `"year"` to get the highest-quality, most-upvoted posts in a subreddit. These tend to have the most valuable comments and discussion.

**2. Combine subreddits and search in one run**
You can mix `startUrls` (subreddits) and `searchQueries` in a single run. Results from all sources are deduplicated — each post is processed only once.

**3. Filter by minimum score to skip noise**
Set `filterByMinScore: 50` or higher to skip low-engagement posts that have few votes and are likely low quality. This reduces cost and improves dataset quality.

**4. Author profiles are deduplicated automatically**
When `scrapeUserProfiles` is enabled, each unique author is fetched only once — even if they authored multiple posts in the run. You are only charged once per author.

**5. Use search for cross-subreddit coverage**
A search query like `"best CRM tool"` finds posts from r/sales, r/startups, r/smallbusiness, and more — all in one run. More comprehensive than scraping individual subreddits.

**6. Nested comments via parentId**
Comment records include a `parentId` field. If `parentId` starts with `t3_`, the comment is a top-level reply to the post; if it starts with `t1_`, it is a reply to another comment. Use `depth` to quickly filter by nesting level.
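Reconstructing the thread tree from the flat comment list can be sketched as a single pass that groups children under their `parentId` (the sample records here are hypothetical, in the documented shape):

```python
from collections import defaultdict

def build_thread_tree(comments):
    """Group flat comment records into a {parentId: [children]} map.

    Top-level comments have a parentId of the form "t3_<postId>";
    replies point at "t1_<commentId>" of their parent comment.
    """
    children = defaultdict(list)
    for c in comments:
        children[c["parentId"]].append(c)
    return children

# Hypothetical flat records, as produced with scrapeComments enabled.
flat = [
    {"commentId": "aaa", "parentId": "t3_1abc23", "depth": 0, "body": "top-level"},
    {"commentId": "bbb", "parentId": "t1_aaa", "depth": 1, "body": "reply"},
]
tree = build_thread_tree(flat)
```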

**7. Schedule weekly incremental runs**
Use Apify Scheduler with `filterByDateFrom` set to the previous Monday. This way each run only picks up new posts and you never scrape the same content twice.
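Computing the previous Monday for `filterByDateFrom` might look like this; the helper is our own illustration, not part of the Actor:

```python
from datetime import date, timedelta

def previous_monday(today):
    """Return the most recent Monday strictly before `today`, as YYYY-MM-DD."""
    days_back = today.weekday() or 7  # Monday itself -> a full week back
    return (today - timedelta(days=days_back)).isoformat()
```

Pass the result as the `filterByDateFrom` input field when scheduling the run.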

**8. NSFW filtering**
Enable `excludeNsfw` when scraping general-topic subreddits (like r/AskReddit or r/funny) to keep datasets clean for professional or academic use.

***

### ❓ FAQ

**Q: Do I need a Reddit API key or account?**
No. The scraper uses Reddit's public JSON API — accessible by appending `.json` to any Reddit URL. No API key, OAuth token, or Reddit account is required.
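As an illustration (our own helper, not part of the Actor), turning a Reddit URL into its public JSON endpoint is simple string manipulation:

```python
def to_json_url(url):
    """Append `.json` to a Reddit URL, dropping any trailing slash first."""
    return url.rstrip("/") + ".json"
```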

**Q: Why is there a 1,000 post limit?**
This is a hard limit enforced by Reddit's API: regardless of pagination, Reddit's listing endpoints return at most 1,000 posts per sort category, and the limit cannot be bypassed. For most use cases, 1,000 posts provide more than enough data.

**Q: Can I scrape private subreddits?**
No. The scraper only accesses publicly available content — the same content visible to any logged-out user. Private, quarantined, and banned subreddits return an error and are skipped.

**Q: Can I scrape NSFW subreddits?**
NSFW subreddit content requires Reddit account authentication, which this scraper does not use. NSFW content from public feeds (mixed in with regular posts) is accessible, but dedicated NSFW subreddits are not.

**Q: Why might some posts show score: 0?**
Reddit applies vote fuzzing to all posts — the displayed score is slightly randomized to deter vote manipulation. Posts with very few votes may show 0 even if they have some upvotes.

**Q: How are comments structured?**
Comments are returned as a flat list. Use `depth` (0 = top-level, 1 = reply, 2 = reply to reply) and `parentId` to reconstruct the full thread tree in your own code.

**Q: Are deleted comments included?**
No. Comments where the body is `[deleted]` or `[removed]` are automatically skipped. Only comments with actual text content are saved.

**Q: How does billing work?**
You are charged per event: **$8 per 1,000 posts**, **$6 per 1,000 comments**, and **$8 per 1,000 user profiles**. Posts that are filtered out by score, date, or comment count are not billed. A small one-time actor-start fee applies per run.

**Q: Can I run this on a schedule?**
Yes. Use Apify Scheduler to run the Actor daily or weekly. Set `filterByDateFrom` to avoid re-scraping old content. Each run only processes posts published from the specified date onward.

**Q: What happens if Reddit rate-limits the scraper?**
The scraper automatically reads Reddit's rate-limit headers (`X-Ratelimit-Remaining`, `X-Ratelimit-Reset`) and pauses when the quota is nearly exhausted. On HTTP 429 responses, it backs off with increasing delays before retrying, so runs slow down rather than fail when limits are hit.

***

### ⚠️ Limits & Notes

- **1,000 post cap** — Reddit's API hard limit per listing endpoint; it cannot be bypassed.
- **Public content only** — Private, quarantined, and banned subreddits are not accessible without authentication.
- **Vote fuzzing** — Reddit randomizes vote counts slightly; `score` values may differ slightly from what you see in the browser.
- **Comment depth** — Reddit limits comment thread depth to 10 levels. Deeply nested replies beyond level 10 are not returned by the API.
- **`[deleted]` content** — Posts or comments whose author deleted their account show `author: "[deleted]"`; the body may still be visible or may have been removed as well.
- **Dates** — All dates are converted from Unix timestamps to `YYYY-MM-DD` format for consistency.
- **NSFW subreddits** — Dedicated adult-content subreddits require OAuth authentication and are not accessible with this scraper.
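The date conversion mentioned above is the standard Unix-epoch-to-date mapping; in Python it might be sketched as:

```python
from datetime import datetime, timezone

def epoch_to_date(created_utc):
    """Convert Reddit's `created_utc` (Unix seconds) to a YYYY-MM-DD string (UTC)."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m-%d")
```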

***

### ⚖️ Legal & Ethical Use

This scraper accesses publicly available data on Reddit — the same data visible to any user without logging in. Use it for legitimate research, content analysis, and data science purposes.

Always comply with:

- [Reddit Terms of Service](https://www.redditinc.com/policies/user-agreement)
- [Reddit Privacy Policy](https://www.reddit.com/policies/privacy-policy)
- Applicable data protection regulations (GDPR, CCPA, etc.)

Do not use scraped data to harass individual users, build spam systems, or engage in vote manipulation.

***

### 🛠️ Technical Notes

- Built on the **Apify SDK** with pay-per-event billing (`Actor.charge()`)
- Uses **Reddit's public JSON API** via `www.reddit.com` — pure HTTP requests, no browser automation, for speed and low resource usage
- Automatically reads `X-Ratelimit-*` response headers and pauses before quota exhaustion
- Exponential backoff on HTTP 429 (rate limit) and transient HTTP errors
- Residential proxy routing on all requests for reliable access
- Comment threads are flattened recursively — all nested replies returned by the API are captured
- Author profiles are deduplicated per run — each unique username is fetched at most once
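The backoff behavior described above can be sketched as follows; this is our illustrative reimplementation, not the Actor's source:

```python
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Retry `fetch()` with exponential backoff while it returns HTTP 429."""
    for attempt in range(max_retries):
        status, body = fetch()
        if status != 429:
            return status, body
        # Wait 2s, 4s, 8s, ... before the next attempt.
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError("still rate-limited after retries")

# Demo with a fake fetcher: two 429 responses, then success (no network needed).
responses = iter([(429, None), (429, None), (200, "ok")])
delays = []
status, body = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` as a parameter keeps the retry logic testable without real delays.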

# Actor input Schema

## `startUrls` (type: `array`):

Reddit URLs to scrape. Supports: subreddit URLs (/r/subreddit), post URLs (/r/sub/comments/id/...), and user profile URLs (/u/username or /user/username).

## `searchQueries` (type: `array`):

Keywords to search for across all of Reddit. Each query returns the most relevant posts.

## `maxPosts` (type: `integer`):

Maximum number of posts to scrape. Default: 50. Maximum: 1,000 (Reddit API hard limit).

## `sort` (type: `string`):

How to sort posts when scraping a subreddit.

## `timeFilter` (type: `string`):

Time range for Top posts. Only applies when Sort by is set to Top.

## `scrapeComments` (type: `boolean`):

If enabled, scrapes comments for each post. Charged separately at $6 per 1,000 comments.

## `maxCommentsPerPost` (type: `integer`):

Maximum number of comments to extract per post (including nested replies). Default: 100.

## `scrapeUserProfiles` (type: `boolean`):

If enabled, fetches the Reddit profile of each post's author (karma, account age, etc.). Each unique author is fetched only once. Charged at $8 per 1,000 profiles.

## `filterByMinScore` (type: `integer`):

Only include posts with at least this many upvotes. Leave 0 for no minimum.

## `filterByMinComments` (type: `integer`):

Only include posts with at least this many comments. Leave 0 for no minimum.

## `filterByDateFrom` (type: `string`):

Only include posts published on or after this date. Format: YYYY-MM-DD.

## `excludeNsfw` (type: `boolean`):

If enabled, posts marked as NSFW (18+) are excluded from results.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.reddit.com/r/technology/"
    }
  ],
  "searchQueries": [],
  "maxPosts": 50,
  "sort": "hot",
  "timeFilter": "week",
  "scrapeComments": false,
  "maxCommentsPerPost": 100,
  "scrapeUserProfiles": false,
  "filterByMinScore": 0,
  "filterByMinComments": 0,
  "excludeNsfw": false
}
```

# Actor output Schema

## `results` (type: `string`):

All scraped records in the default dataset. Post records include postId, url, title, body, author, subreddit, score, upvoteRatio, numComments, createdAt, flair, isNsfw, awards, thumbnailUrl, externalUrl. Comment records include commentId, postId, postTitle, parentId, body, author, score, depth, createdAt. Profile records include username, profileUrl, totalKarma, linkKarma, commentKarma, isGold, createdAt.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.reddit.com/r/technology/"
        }
    ],
    "searchQueries": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("brilliant_gum/reddit-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://www.reddit.com/r/technology/" }],
    "searchQueries": [],
}

# Run the Actor and wait for it to finish
run = client.actor("brilliant_gum/reddit-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.reddit.com/r/technology/"
    }
  ],
  "searchQueries": []
}' |
apify call brilliant_gum/reddit-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=brilliant_gum/reddit-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Scraper",
        "description": "Extract posts, comments, and user profiles from any subreddit, search query, or Reddit URL. No API key required. Built for market research, brand monitoring, sentiment analysis, and AI/LLM training datasets.",
        "version": "0.1",
        "x-build-id": "qPi1qW81sKfO12jP0"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/brilliant_gum~reddit-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-brilliant_gum-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/brilliant_gum~reddit-scraper/runs": {
            "post": {
                "operationId": "runs-sync-brilliant_gum-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/brilliant_gum~reddit-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-brilliant_gum-reddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Reddit URLs to scrape. Supports: subreddit URLs (/r/subreddit), post URLs (/r/sub/comments/id/...), and user profile URLs (/u/username or /user/username).",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "searchQueries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "Keywords to search for across all of Reddit. Each query returns the most relevant posts.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPosts": {
                        "title": "Max posts",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of posts to scrape. Default: 50. Maximum: 1,000 (Reddit API hard limit).",
                        "default": 50
                    },
                    "sort": {
                        "title": "Sort by",
                        "enum": [
                            "hot",
                            "new",
                            "top",
                            "rising"
                        ],
                        "type": "string",
                        "description": "How to sort posts when scraping a subreddit.",
                        "default": "hot"
                    },
                    "timeFilter": {
                        "title": "Time filter",
                        "enum": [
                            "hour",
                            "day",
                            "week",
                            "month",
                            "year",
                            "all"
                        ],
                        "type": "string",
                        "description": "Time range for Top posts. Only applies when Sort by is set to Top.",
                        "default": "week"
                    },
                    "scrapeComments": {
                        "title": "Scrape comments",
                        "type": "boolean",
                        "description": "If enabled, scrapes comments for each post. Charged separately at $6 per 1,000 comments.",
                        "default": false
                    },
                    "maxCommentsPerPost": {
                        "title": "Max comments per post",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of comments to extract per post (including nested replies). Default: 100.",
                        "default": 100
                    },
                    "scrapeUserProfiles": {
                        "title": "Scrape author profiles",
                        "type": "boolean",
                        "description": "If enabled, fetches the Reddit profile of each post's author (karma, account age, etc.). Each unique author is fetched only once. Charged at $8 per 1,000 profiles.",
                        "default": false
                    },
                    "filterByMinScore": {
                        "title": "Minimum score (upvotes)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only include posts with at least this many upvotes. Leave 0 for no minimum.",
                        "default": 0
                    },
                    "filterByMinComments": {
                        "title": "Minimum comments",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only include posts with at least this many comments. Leave 0 for no minimum.",
                        "default": 0
                    },
                    "filterByDateFrom": {
                        "title": "Published from date",
                        "pattern": "^(\\d{4}-\\d{2}-\\d{2})?$",
                        "type": "string",
                        "description": "Only include posts published on or after this date. Format: YYYY-MM-DD."
                    },
                    "excludeNsfw": {
                        "title": "Exclude NSFW posts",
                        "type": "boolean",
                        "description": "If enabled, posts marked as NSFW (18+) are excluded from results.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
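The input schema above can be exercised from Python with the official `apify-client` package. The sketch below builds a payload that respects the schema's constraints (`maxPosts` between 1 and 1,000) and shows, in comments, how it would be passed to the Actor; the subreddit URL and token placeholder are illustrative assumptions, not values from this document.

```python
import json


def build_run_input(start_url: str, max_posts: int = 50,
                    scrape_comments: bool = False) -> dict:
    """Build a run input matching the Actor's input schema above."""
    if not 1 <= max_posts <= 1000:
        # Mirrors the schema's minimum/maximum for maxPosts
        raise ValueError("maxPosts must be between 1 and 1,000")
    return {
        "startUrls": [{"url": start_url}],
        "maxPosts": max_posts,
        "sort": "hot",
        "scrapeComments": scrape_comments,
    }


run_input = build_run_input("https://www.reddit.com/r/webdev", max_posts=25)
print(json.dumps(run_input, indent=2))

# With the official client (pip install apify-client):
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_APIFY_TOKEN>")
# run = client.actor("brilliant_gum/reddit-scraper").call(run_input=run_input)
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["title"])
```

Enabling `scrapeComments` or `scrapeUserProfiles` adds per-event charges ($6 per 1,000 comments, $8 per 1,000 profiles, per the descriptions above), so keep them off unless the run needs that data.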
