# Reddit Historical Archive Scraper (`logiover/reddit-historical-archive-scraper`) Actor

Access 10+ years of archived Reddit posts and comments via PullPush. Full-text comment search (Reddit can't do this). No login, no proxy. $0.001/item.

- **URL**: https://apify.com/logiover/reddit-historical-archive-scraper.md
- **Developed by:** [Logiover](https://apify.com/logiover) (community)
- **Categories:** Developer tools, Social media, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.50 / 1,000 results

This Actor is paid per event: you are not charged for Apify platform usage, only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is written with a capital "A".

## How to integrate an Actor?

To integrate an Actor into your project, use the official client library that matches your stack. The recommended options are as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```
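As a sketch of the Python flow (the Actor ID is taken from this page, the `run_input` fields from the input schema below; `scrape_reddit` and `YOUR_APIFY_TOKEN` are illustrative names, not part of the client library):

```python
def scrape_reddit(token: str, run_input: dict) -> list[dict]:
    """Start the Actor, wait for it to finish, and return all dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor("logiover/reddit-historical-archive-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Input fields match the "Example inputs" section of this README.
run_input = {
    "subreddits": ["programming"],
    "sort": "top",
    "timeframe": "week",
    "maxItemsPerTarget": 100,
}
# items = scrape_reddit("YOUR_APIFY_TOKEN", run_input)  # network call; needs a real token
```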

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
```

```powershell
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🎯 Reddit All-in-One Scraper — Posts, Comments, Users, Subreddits, Search

**The most complete Reddit scraper on Apify. One actor, six modes, zero authentication.** Scrape subreddit listings, full post comment trees (including "load more" expansion), user profiles with their full submission and comment history, global search results, and subreddit discovery — all in a single run, all via Reddit's public `.json` endpoints. **$0.001 per item — the cheapest Reddit scraper available, half the price of the next-cheapest competitor.** Free 50 items per run.

This is the only Reddit scraper you need. No PRAW. No OAuth tokens. No API keys. No login. Just paste subreddits, URLs, usernames, or search queries — get clean structured JSON back. Pure HTTP, no browser overhead, no cookies, no risk of rate-limit bans on your account (because there's no account involved).

---

### 🚀 Why this Reddit scraper beats every alternative

| Feature | This actor | Most Apify Reddit scrapers |
|---|---|---|
| Price per 1,000 items | **$1.00** | $2.00 – $5.00 |
| Login required | ❌ No | ⚠️ Some need OAuth |
| Subreddit listings (hot/new/top/rising/controversial/best) | ✅ Yes | ✅ Most |
| Full post + complete comment tree | ✅ Yes (with recursive "load more" expansion) | ⚠️ Often shallow |
| **"More comments" expansion** (Reddit's hidden comments) | ✅ Yes (automatic) | ❌ Usually missing |
| User profile + posts + comments | ✅ Yes (all in one mode) | ⚠️ Often separate actors |
| Global search (posts) | ✅ Yes | ⚠️ Some |
| Subreddit discovery search | ✅ Yes | ❌ Rare |
| Auto-detect any Reddit URL type | ✅ Yes | ❌ No |
| Mix multiple input modes in ONE run | ✅ Yes (subreddits + URLs + users + searches together) | ❌ One mode per run |
| Image / video / gallery extraction | ✅ Yes (preview URLs + dimensions + gallery items) | ⚠️ Partial |
| Comment depth tracking | ✅ Yes | ❌ Rare |
| NSFW / spoiler / locked / stickied flags | ✅ Yes | ⚠️ Partial |
| Custom request delay | ✅ Yes (default 1500ms) | ⚠️ Often hardcoded |
| Auto-retry with exponential backoff on 429 | ✅ Yes | ⚠️ Partial |
| Pure HTTP (256 MB memory) | ✅ Yes | ⚠️ Some use Playwright |
| Free items per run | **50** | 0 – 25 |

**Used by:** Brand monitoring teams, market researchers, AI training data engineers, journalists, academic researchers, social listening platforms, sentiment analysis tools, B2B lead generation, content marketing agencies, crypto/finance signal extraction, recruitment scouts, customer support intelligence, product feedback aggregators, niche community analysts.

---

### 💎 What makes this scraper different — 6 unique angles

#### 1. All six modes in ONE actor — not six separate purchases
Most Reddit "scrapers" on Apify only do one thing. Subreddit-only. User-only. Search-only. Comments-only. To get a complete Reddit dataset you'd run 4-6 different actors, pay 4-6 separate fees, and stitch the data together yourself. **This one does everything.** Subreddits, posts (with comments), users, search, subreddit discovery, and any-URL auto-detection — all in a single run, single price, unified output schema. The dataset comes back tagged with `type: "post"`, `type: "comment"`, `type: "user"`, or `type: "subreddit"` so you can filter or split it however you want.

#### 2. Full comment tree expansion — including hidden "load more" branches
Reddit's API returns comment trees in chunks. After the first few replies, you get a `kind: "more"` placeholder — a list of comment IDs that weren't included in the response. **Most scrapers ignore these.** This one automatically calls Reddit's `/api/morechildren.json` endpoint to fetch them, recursively, until the entire tree is captured. On a popular post with 500+ comments, this is often the difference between seeing 100 comments and seeing 800. Configurable via `expandComments` and `maxCommentDepth`.

#### 3. Mix-and-match input — one run, many sources
You can pass **all of these in the same input**:
- 10 subreddits
- 5 specific post URLs
- 3 usernames you want to profile
- 4 search queries
- And let the scraper auto-detect 7 random Reddit URLs you pasted from your browser

Everything runs in sequence with a unified budget. Want all comments on a specific viral post + a profile of its author + the top 100 posts in that subreddit + 50 results from a related search query? **One run.** Other scrapers force you to set up four separate jobs.

#### 4. The cheapest Reddit scraper on Apify — by 50%+
- **This actor**: $0.001 per item ($1 per 1,000)
- Most competitors: $0.002–$0.005 per item ($2–$5 per 1,000)
- Subscription-model scrapers: $20–$45/month flat (only break even if you scrape 20K+ items)

If you scrape 100,000 Reddit items per month, you save **$100–$400/month** vs. alternatives. If you scrape less than 5,000/month, you avoid subscription lock-in entirely.

#### 5. Built for AI training and RAG pipelines
Output is clean structured JSON with consistent field names across all record types. `selftext`, `body`, `title` are flat strings (no HTML escaping headaches). `createdUtc` is ISO 8601, not Unix epoch. Comments include `depth` for tree reconstruction. Posts include media URLs, gallery items, and preview images for multimodal training. Drop the dataset directly into LangChain, LlamaIndex, Pinecone, Weaviate, or any vector DB. Used by teams training instruction-following models on Reddit Q&A data and sentiment-analysis models on long-form discussion.

#### 6. Zero authentication = zero ban risk
This scraper accesses Reddit's PUBLIC `.json` endpoints — the same data anyone can see by appending `.json` to any Reddit URL in their browser. **No OAuth tokens, no Reddit account, no PRAW credentials.** You can't get your Reddit account rate-limited or banned because no account is involved. Compare this to OAuth-based scrapers where a heavy run can flag your developer app for review or your account for temporary suspension.

---

### 💡 What you can do with this data

#### 1. **Brand & product mention monitoring**
Set up search queries for your brand name, your competitors, and your product category. Schedule daily runs and pipe results into Slack, Notion, or your CRM. Catch a viral negative thread within hours of it going up. Used by DTC brands, SaaS companies, and PR teams.

#### 2. **Customer support and bug intelligence**
Scrape your product's subreddit (or related discussion subreddits) on the `new` sort weekly. Filter for posts with low `score` + high `numComments` — these are usually unresolved complaints or bugs. Pipe into your support ticketing system as proactive leads.
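That "low score, high comment count" filter is a one-liner over the dataset; a sketch (the thresholds and `likely_unresolved` name are illustrative, tune them for your community):

```python
def likely_unresolved(posts, max_score=5, min_comments=20):
    """Flag posts whose low score but busy thread suggests a complaint or bug report."""
    return [
        p for p in posts
        if p.get("score", 0) <= max_score and p.get("numComments", 0) >= min_comments
    ]

posts = [
    {"title": "App crashes on login", "score": 2, "numComments": 47},
    {"title": "Great release!", "score": 850, "numComments": 30},
]
flagged = likely_unresolved(posts)
# only the crash thread is flagged
```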

#### 3. **AI training corpus assembly**
Build a high-quality conversational corpus by scraping subreddits like r/AskHistorians, r/explainlikeimfive, r/AskScience — each containing curated Q&A pairs. Use `includeComments: true` to capture full discussion trees. Hundreds of thousands of high-quality training examples for instruction-following models.

#### 4. **Influencer and thought leader discovery**
Use `searchType: "user"` with topic keywords to find the most-active Reddit voices in your niche. Filter by `commentKarma > 50000` and recent activity. These are your potential AMA guests, podcast guests, or community partnerships.

#### 5. **Subreddit discovery for niche targeting**
Use `subredditSearch` mode with industry keywords to find every relevant community for your product. A SaaS company in the fitness space might discover 50+ relevant subreddits — most far smaller and more engaged than r/fitness — perfect for grassroots community marketing.

#### 6. **Sentiment analysis on real consumer language**
Scrape Reddit discussions about products, brands, or topics. Pipe into a sentiment analysis model. Get richer, more candid signal than Twitter (Reddit comments are longer and less performative) and broader than survey responses. Used by hedge funds, CPG product teams, and market research firms.

#### 7. **Reddit trend monitoring for content marketing**
Schedule daily scrapes of `top` posts in your category subreddits with `timeframe: "day"`. Tomorrow's content trends are today's top Reddit posts. Used by content marketing teams and editorial newsrooms.

#### 8. **Crypto and finance signal extraction**
Subreddits like r/wallstreetbets, r/CryptoCurrency, r/stocks are leading indicators for retail sentiment. Scrape on a schedule, run NLP for ticker mention frequency, build a sentiment-tracking dashboard. Used by quant funds and finance content sites.

#### 9. **Competitive intelligence for SaaS**
Scrape your competitors' subreddits or search for their product names. See what features users wish existed, where they're frustrated, what alternatives they mention. **Free product research** — far more honest than user interviews.

#### 10. **Academic and journalistic research**
Scrape historical posts from any subreddit. Build longitudinal datasets of community discourse. Track how community language, sentiment, or topic emphasis has shifted over time. Used by linguistics researchers, political scientists, and investigative journalists.

#### 11. **Lead generation from active redditors**
Find users posting in your category's subreddits with `submitted` or `comments` user mode. Their post history reveals their professional role, location, and interests — qualifying signal for B2B outreach (when paired with a careful, consent-respecting outreach approach).

#### 12. **Content idea engine**
Stuck on what to post? Scrape `top` posts from your category subreddit, study the titles (every title is a tested headline) and discussions, and reverse-engineer winning content angles. Feed scraped titles into an AI summarizer for emerging themes.

---

### 📦 Output fields

Every record has a `type` field telling you what it is: `post`, `comment`, `user`, or `subreddit`. Fields are populated as relevant to each type.

#### Posts (`type: "post"`)

| Field | Description |
|-------|-------------|
| `id`, `fullname` | Reddit post ID (`1abc234`) and fullname (`t3_1abc234`) |
| `subreddit`, `subredditId`, `subredditNamePrefixed` | Where the post lives |
| `author`, `authorFullname` | Poster's username and fullname (`t2_...`) |
| `title` | Post title |
| `selftext`, `selftextHtml` | Post body (text posts only) |
| `url`, `permalink` | External link / direct Reddit permalink |
| `domain` | Domain of the linked URL |
| `isSelf`, `isVideo`, `isGallery` | Type flags |
| `over18`, `spoiler`, `locked`, `stickied`, `archived` | Status flags |
| `score`, `upvoteRatio`, `ups`, `downs` | Engagement metrics |
| `numComments`, `numCrossposts` | Discussion volume |
| `gilded`, `totalAwardsReceived` | Award counts |
| `flairText`, `flairCss`, `authorFlairText` | Subreddit flair |
| `thumbnail`, `thumbnailWidth`, `thumbnailHeight` | Thumbnail (when available) |
| `preview` | Array of preview images with URLs and dimensions |
| `media` | Embedded media (YouTube, video, gif) with oembed and direct URLs |
| `galleryImages` | For multi-image gallery posts — array of all images with captions |
| `crosspostParent` | If this is a crosspost |
| `createdUtc`, `edited` | ISO timestamps |
| `distinguished`, `suggestedSort` | Mod/admin flags |

#### Comments (`type: "comment"`)

| Field | Description |
|-------|-------------|
| `id`, `fullname` | Comment ID (`c3v7f8u`) and fullname (`t1_c3v7f8u`) |
| `parentId`, `linkId` | Parent comment/post and parent post fullnames |
| `subreddit`, `author`, `authorFullname` | Where and who |
| `body`, `bodyHtml` | Raw text and HTML-formatted version |
| `score`, `scoreHidden`, `ups`, `downs` | Engagement (downvotes are usually 0 due to Reddit's fuzzing) |
| `gilded`, `controversial` | Award and controversy flags |
| `depth` | How deep in the reply tree (0 = top-level, 1 = direct reply, etc.) |
| `permalink`, `createdUtc`, `edited` | URLs and timestamps |
| `distinguished`, `isSubmitter`, `stickied` | Mod/OP/sticky flags |
| `flairText` | Author flair text |

#### Users (`type: "user"`)

| Field | Description |
|-------|-------------|
| `id`, `fullname`, `username` | Reddit user ID and username |
| `linkKarma`, `commentKarma`, `totalKarma` | Karma breakdown |
| `awardeeKarma`, `awarderKarma` | Award-related karma |
| `isMod`, `isGold`, `isEmployee` | Account flags |
| `verified`, `hasVerifiedEmail` | Verification status |
| `createdUtc` | Account age |
| `iconImg` | Avatar URL |
| `subredditTitle`, `subredditDescription`, `subredditSubscribers` | The user's own profile page subreddit |
| `subredditNsfw` | Whether profile is marked NSFW |
| `hiddenFromBots` | If user opted out of search indexing |

#### Subreddits (`type: "subreddit"`)

| Field | Description |
|-------|-------------|
| `id`, `fullname` | Subreddit ID (`t5_...`) |
| `displayName`, `displayNamePrefixed` | `programming`, `r/programming` |
| `title`, `description`, `publicDescription` | Subreddit metadata |
| `subscribers`, `activeUserCount` | Community size and currently online |
| `subredditType` | `public`, `private`, `restricted`, `gold_restricted` |
| `over18`, `quarantine` | Status flags |
| `url`, `iconImg`, `bannerImg`, `communityIcon`, `headerImg` | Visual assets |
| `primaryColor`, `keyColor` | Branding |
| `lang` | Primary language |
| `createdUtc` | Subreddit age |
| `submissionType`, `submitText` | What kind of posts are allowed |
| `wikiEnabled` | Whether wiki is available |

---

### ⚙️ Input configuration

#### Input sources (use any combination — they all run together)

| Field | Description |
|-------|-------------|
| `subreddits` | List of subreddit names (no /r/ prefix) |
| `postIds` | List of post IDs from `/comments/XXXX/` URLs |
| `usernames` | List of usernames (no /u/ prefix) |
| `searchQueries` | Free-text search queries (Reddit-wide unless `restrictToSubreddit` set) |
| `subredditSearch` | Find subreddits matching a keyword (community discovery) |
| `startUrls` | Any Reddit URLs — type auto-detected from URL structure |

#### Filters

| Field | Default | Options |
|-------|---------|---------|
| `sort` | `hot` | `hot`, `new`, `top`, `rising`, `controversial`, `best` |
| `timeframe` | `all` | `hour`, `day`, `week`, `month`, `year`, `all` |
| `userContent` | `overview` | `overview`, `submitted`, `comments`, `about` |
| `searchType` | `link` | `link` (posts), `sr` (subreddits), `user` |
| `searchSort` | `relevance` | `relevance`, `hot`, `top`, `new`, `comments` |
| `restrictToSubreddit` | null | Subreddit name to scope search to one community |

#### Volume & depth

| Field | Default | Description |
|-------|---------|-------------|
| `maxItems` | `1000` | Hard ceiling across ALL targets. `0` = unlimited. |
| `maxItemsPerTarget` | `200` | Cap per subreddit/user/post/search. Prevents one big target from eating the budget. |
| `expandComments` | `true` | Recursively call Reddit's `/api/morechildren` to fetch hidden comments. |
| `maxCommentDepth` | `10` | Max nesting depth for comment trees. |
| `includeComments` | `false` | When scraping subreddit listings, also fetch ALL comments for each post. Dramatically increases data volume. |

#### Politeness

| Field | Default | Description |
|-------|---------|-------------|
| `requestDelayMs` | `1500` | Milliseconds between requests. Lower = faster but risks 429. |
| `maxRetries` | `4` | Retry attempts on 429/5xx with exponential backoff. |

---

### 💡 Example inputs

#### Quick subreddit scrape — top posts of the week
```json
{
  "subreddits": ["programming"],
  "sort": "top",
  "timeframe": "week",
  "maxItemsPerTarget": 100
}
```

~100 posts, $0.10, runs in ~3 minutes. The simplest possible run.

#### Multi-subreddit brand monitoring

```json
{
  "subreddits": ["programming", "webdev", "javascript", "node", "reactjs"],
  "sort": "new",
  "searchQueries": ["my-product-name"],
  "maxItems": 500,
  "maxItemsPerTarget": 100
}
```

500 items, mixed across 5 subreddit listings AND a global search for your brand name. $0.50.

#### Full deep-dive on one viral post

```json
{
  "postIds": ["1abc234"],
  "expandComments": true,
  "maxCommentDepth": 10,
  "maxItemsPerTarget": 5000
}
```

Captures the post plus every comment (including all "load more" expansions). For a 2000-comment viral thread: ~$2.00. Perfect for post-mortem analysis of viral content.

#### Full user profile and history

```json
{
  "usernames": ["spez", "kn0thing"],
  "userContent": "overview",
  "maxItemsPerTarget": 200
}
```

Returns each user's profile + their most recent 200 posts and comments. $0.40 total.

#### Search Reddit-wide for emerging trends

```json
{
  "searchQueries": ["AI coding agents", "vibe coding", "AI editor"],
  "searchSort": "new",
  "timeframe": "month",
  "maxItemsPerTarget": 100
}
```

Captures fresh posts mentioning each query across all of Reddit. Daily-scheduled = real-time emerging trend tracker. $0.30.

#### Subreddit discovery for niche targeting

```json
{
  "subredditSearch": ["3d printing", "home brewing", "vintage cameras"],
  "maxItemsPerTarget": 50
}
```

Finds the 50 most relevant subreddits per topic. Returns subscriber counts and descriptions so you can identify the most valuable niches. $0.15.

#### Mix-and-match — full Reddit intelligence run

```json
{
  "subreddits": ["startups", "smallbusiness"],
  "searchQueries": ["my-product", "competitor-product"],
  "usernames": ["my-power-user", "competitor-founder"],
  "startUrls": [
    "https://www.reddit.com/r/saas/comments/abc123/...",
    "https://www.reddit.com/r/entrepreneurship/top/"
  ],
  "sort": "new",
  "maxItems": 2000
}
```

2 subreddits + 2 search queries + 2 users + 2 auto-detected URLs in ONE run. $2.00. The flagship use case.

#### URL-only mode (paste any Reddit URLs)

```json
{
  "startUrls": [
    "https://www.reddit.com/r/programming/",
    "https://www.reddit.com/r/webdev/top/?t=week",
    "https://www.reddit.com/r/javascript/comments/abc123/some_post_title/",
    "https://www.reddit.com/user/spez/",
    "https://www.reddit.com/search?q=startup"
  ]
}
```

The lazy mode. Paste any 5 Reddit URLs from your browser. Each one is auto-detected and scraped accordingly. $0–$2 depending on what's behind the URLs.

---

### 📊 Output sample (post)

```json
{
  "type": "post",
  "id": "1abc234",
  "fullname": "t3_1abc234",
  "subreddit": "programming",
  "subredditId": "t5_2fwo",
  "subredditNamePrefixed": "r/programming",
  "author": "example_user",
  "authorFullname": "t2_abcdef",
  "title": "Why X always beats Y for production workloads",
  "selftext": "After 5 years running this in production, I've learned...",
  "selftextHtml": "&lt;div class=\"md\"&gt;...&lt;/div&gt;",
  "url": "https://example.com/article",
  "permalink": "https://www.reddit.com/r/programming/comments/1abc234/why_x_always_beats_y/",
  "domain": "example.com",
  "isSelf": false,
  "isVideo": false,
  "isGallery": false,
  "over18": false,
  "score": 4823,
  "upvoteRatio": 0.94,
  "numComments": 412,
  "gilded": 3,
  "flairText": "Discussion",
  "thumbnail": "https://b.thumbs.redditmedia.com/...",
  "preview": [
    { "url": "https://preview.redd.it/...", "width": 1200, "height": 630, "variants": ["gif", "mp4"] }
  ],
  "createdUtc": "2026-05-05T14:32:11.000Z"
}
```

### 📊 Output sample (comment)

```json
{
  "type": "comment",
  "id": "c3v7f8u",
  "fullname": "t1_c3v7f8u",
  "parentId": "t3_1abc234",
  "linkId": "t3_1abc234",
  "subreddit": "programming",
  "author": "thoughtful_reply_user",
  "body": "Great points, but I'd push back on the third one because...",
  "score": 234,
  "depth": 0,
  "isSubmitter": false,
  "permalink": "https://www.reddit.com/r/programming/comments/1abc234/.../c3v7f8u/",
  "createdUtc": "2026-05-05T15:01:44.000Z"
}
```

---

### 💰 Pricing

Pay-per-event model. **You pay only for items actually saved.**

| Volume | Estimated cost |
|--------|---------------|
| 50 items | **FREE** (every run) |
| 100 items | $0.05 |
| 500 items | $0.45 |
| 1,000 items | $0.95 |
| 5,000 items | $4.95 |
| 10,000 items | $9.95 |
| 50,000 items | $49.95 |
| 100,000 items | $99.95 |

| Subscription tier | Effective price per 1,000 items |
|---|---|
| Free / Starter | $1.00 |
| Bronze | $0.90 |
| Silver | $0.80 |
| Gold | $0.65 |

#### Cost comparison vs other Apify Reddit scrapers

| Scraper | Price / 1,000 items | Monthly minimum |
|---------|--------------------|-----------------|
| **This actor** | **$1.00** | **$0** (pay-as-you-go) |
| Fast Reddit Scraper (practicaltools) | $2.00 | $0 |
| Reddit API Scraper (comchat) | $5.00 | $0 |
| Reddit Scraper Pro (harshmaur) | "Unlimited" | $20/month flat |
| Reddit Scraper (trudax) | + Apify usage | $45/month + usage |
| Reddit Scraper Plus (ctrlaltwin) | + usage | $30/month + usage |

**At 100K items/month**: this actor = $100, competitors = $200-$500+.
**At 1M items/month**: this actor = $1,000, competitors = $2,000-$5,000+.

---

### ⚡ Performance

- **Pure HTTP, no browser** — `.json` endpoints return clean JSON, 10× faster than Playwright-based scrapers. Runs in 256 MB memory.
- **No login or OAuth** — public `.json` endpoints, no Reddit account ever touched.
- **No proxy required** for most workloads — Apify Datacenter proxy is sufficient. Add Residential for very large daily volumes.
- **100 items per page** for listings (Reddit's max).
- **Throughput**: ~40 items per minute at default 1500ms delay. Lower the delay (e.g., 800ms) for ~75 items/minute if you're willing to risk occasional 429s.
- **Auto-retry**: 4 retries with exponential backoff on 429/503 — recovers cleanly from temporary throttling.
- **Auto-deduplication** within a single comment tree by fullname.
- **Stable selectors** — Reddit's `.json` endpoints have been stable for 10+ years. No DOM fragility.

---

### 🔗 Integrations

Export as **JSON**, **CSV**, **Excel**, or **XML**. Connect via:

- **Zapier / Make / n8n** — auto-add new Reddit mentions to Slack/CRM
- **Google Sheets** — live brand-monitoring dashboard, refreshed hourly via Apify Schedules
- **Slack / Discord** — daily/hourly digest of new mentions
- **REST API** — programmatic access from Python, Node.js, any language
- **Airtable / Notion** — visual content swipe file or community CRM
- **LangChain / LlamaIndex** — RAG pipelines on Reddit Q&A and discussion data
- **HubSpot / Salesforce** — enrich lead records with their Reddit activity
- **Apollo / Outreach / SalesLoft** — feed active redditors into B2B sequencer
- **BigQuery / Snowflake / PostgreSQL** — data warehouse for sentiment analytics
- **Pinecone / Weaviate / Chroma** — vector DBs for semantic search
- **Webhooks** — push every new item to your backend in real time
- **MCP (Model Context Protocol)** — usable by Claude, ChatGPT, and other AI assistants for natural-language Reddit research

---

### 🆚 Reddit All-in-One Scraper vs alternatives

#### vs Reddit's official API (OAuth)

Reddit's official API requires OAuth app registration, scopes, refresh tokens, and 100 requests/minute per app. It locks you to a specific Reddit account that can be rate-limited or banned. This scraper uses the public `.json` endpoints — same data, no account, no OAuth, no app review. **Faster to set up, safer to operate at scale.**

#### vs PRAW (Python Reddit API Wrapper)

PRAW is a great library if you're building a Reddit bot or moderator tool — but it requires OAuth credentials and shares your account's rate-limit budget. For data extraction, this scraper is the right tool: no Python install, no OAuth, no account, full structured JSON output.

#### vs Pushshift / Reddit data dumps

Pushshift used to be the standard for Reddit historical research, but its API was effectively shut down after Reddit's 2023 policy changes. Public Reddit data dumps exist but require terabytes of storage and complex querying. This scraper gives you targeted, real-time access to exactly the slice of Reddit you need.

#### vs Brand24 / Mention / Brandwatch

Social listening platforms cost $99–$5,000/month and aggregate Reddit alongside Twitter, news, etc. Their Reddit coverage is often shallow (titles only, not full comment trees). This scraper at $1/1000 items delivers deeper Reddit-specific intelligence at 1–10% of the cost.

#### vs other Apify Reddit scrapers

There are 10+ Reddit scrapers on Apify, but most have at least one major limitation: single-mode (only subreddits OR only comments), shallow comment extraction (skipping "load more"), or 2–5× the price. **This is the only Reddit scraper that combines all six modes, deep comment expansion, mix-and-match input, and lowest-on-store pricing.**

---

### ❓ Frequently asked questions

#### Does this require a Reddit account or login?

**No.** This uses Reddit's public `.json` endpoints — the same data anyone can see by appending `.json` to any Reddit URL in their browser (try it: `reddit.com/r/programming.json`). No account, no OAuth, no API keys, no credentials.

#### Is this legal?

This scraper accesses **only publicly available data**. Public Reddit posts and comments are explicitly accessible without authentication and have been programmatically accessible via `.json` endpoints for over a decade. You are responsible for complying with Reddit's User Agreement and applicable privacy laws (GDPR, CCPA) when processing the scraped data. **Do not** scrape private subreddits, user direct messages, or any content requiring authentication.

#### How does the comment tree expansion work?

When you scrape a post, Reddit returns the comment tree in chunks. Deep replies and very long threads include `kind: "more"` placeholders — lists of comment IDs that weren't included in the response (Reddit truncates large threads). With `expandComments: true` (default), this scraper automatically calls Reddit's `/api/morechildren.json` to fetch those hidden comments, recursively, until the tree is complete. Without this, you'd often only see 20-30% of comments on heavily-discussed posts.

#### What's the difference between `subreddits` and `subredditSearch` modes?

- `subreddits` = scrape POSTS from specific subreddits you already know (`r/programming`)
- `subredditSearch` = DISCOVER subreddits matching a keyword (find every subreddit about "3D printing")

The first is about content; the second is about community discovery.

#### Can I scrape private or quarantined subreddits?

**No.** Private subreddits require authentication and are out of scope. Quarantined subreddits may sometimes work depending on Reddit's current policy, but are not officially supported. NSFW subreddits work but may require setting `over18` consent in some regions — flagged in output as `over18: true`.

#### How do I know what post IDs to use?

A post URL like `reddit.com/r/programming/comments/1abc234/some_post_title/` has the post ID `1abc234`. Pass that to `postIds`. Or just paste the full URL into `startUrls` — the scraper auto-extracts it.
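If you'd rather extract IDs yourself before building the input, a small regex does it (the `extract_post_id` helper is illustrative; the `/comments/<id>/` URL shape is the one described above):

```python
import re

POST_ID_RE = re.compile(r"/comments/([a-z0-9]+)", re.IGNORECASE)

def extract_post_id(url: str):
    """Pull the post ID out of a Reddit permalink, or None if it isn't a post URL."""
    m = POST_ID_RE.search(url)
    return m.group(1) if m else None

extract_post_id("https://www.reddit.com/r/programming/comments/1abc234/some_post_title/")
# → "1abc234"
```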

#### How long does it take?

Reddit's anonymous rate limit is roughly 60 requests/minute. At default 1500ms delay, you get ~40 items/min for listings. A 1000-item run takes ~25 minutes. Comment-heavy runs are slower due to per-post detail calls; user runs are fast. You can lower `requestDelayMs` to 800-1000ms to roughly double throughput at higher 429-risk.

#### What if I hit rate limits?

The scraper auto-retries 429/503 responses with exponential backoff up to `maxRetries` (default 4). If retries are exhausted, the run aborts gracefully and logs the failure. To avoid: use higher `requestDelayMs`, smaller `maxItemsPerTarget`, or enable Apify Residential proxy for IP rotation on very large runs.
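The retry pattern described here is standard exponential backoff; a hedged sketch of how a client might wrap its own follow-up requests the same way (the `fetch_with_backoff` helper and the simulated endpoint are illustrative, not the Actor's internals):

```python
import time

def fetch_with_backoff(fetch, max_retries=4, base_delay=1.0):
    """Retry `fetch()` on 429/5xx with exponentially growing sleeps."""
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status not in (429, 500, 502, 503):
            return body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...
    raise RuntimeError("retries exhausted")

# Simulated endpoint: throttled twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"ok": True})])
result = fetch_with_backoff(lambda: next(responses), base_delay=0.001)
# → {"ok": True} after two retries
```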

#### Why is `downs` always 0?

Reddit deliberately fuzzes upvote and downvote totals to disrupt bots. The `score` field (`ups - downs`) is accurate; the individual counts are obfuscated. Use `score` and `upvoteRatio` (the % of upvotes among voters) for real engagement signal.

#### Can I get a user's email or phone number?

**No.** Reddit never exposes these publicly. The scraper returns only publicly displayed user data: username, karma, account age, avatar, profile description, and submitted/comment history. For enrichment beyond that, pipe `username` into Apollo, Clearbit, or another B2B enrichment service in a separate step.

#### Can I monitor Reddit continuously?

**Yes.** Use Apify Schedules to run this actor every N minutes/hours/days. Combine with webhooks to push every new item to Slack, Discord, or your backend in real time. The classic setup: `searchQueries: ["my-brand"]`, `searchSort: "new"`, run every 15 minutes, webhook to Slack on completion.

#### Are deleted or removed posts/comments included?

- **Deleted by user**: `author` becomes `"[deleted]"`, `body`/`selftext` becomes `"[deleted]"`. The record is still returned.
- **Removed by mods**: `body`/`selftext` becomes `"[removed]"` but other fields stay. Returned.
- **Removed by Reddit (admin/AEO)**: usually invisible to the API entirely. Not returned.

#### Can I scrape Reddit search results sorted by date?

**Yes.** Set `searchSort: "new"` and `timeframe: "day"`/`"week"`/etc. This is the recommended setup for real-time brand monitoring.

#### What's the difference between `score` and `upvoteRatio`?

- `score` = net votes (`ups - downs`). E.g., 200 = net 200 upvotes.
- `upvoteRatio` = % of voters who upvoted (0.0–1.0). E.g., 0.85 = 85% upvoted.

A post can have score=200 with upvoteRatio=0.95 (broad consensus, 200 net upvotes, almost all up) or score=200 with upvoteRatio=0.55 (controversial — 1000 upvotes and 800 downvotes). Use both fields for nuance.
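Since `score = ups - downs` and `upvoteRatio = ups / (ups + downs)`, you can recover approximate vote counts: `total = score / (2*ratio - 1)` and `ups = ratio * total`. A sketch (the `approximate_votes` helper is illustrative; results are estimates because Reddit fuzzes the underlying numbers):

```python
def approximate_votes(score: int, upvote_ratio: float):
    """Estimate (ups, downs) from score and upvoteRatio; undefined at ratio == 0.5."""
    denom = 2 * upvote_ratio - 1
    if abs(denom) < 1e-9:
        return None
    total = score / denom            # ups + downs
    ups = round(total * upvote_ratio)
    return ups, ups - score

approximate_votes(200, 0.95)  # → (211, 11): broad consensus
approximate_votes(200, 0.55)  # → (1100, 900): controversial
```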

#### Can I integrate with Make / Zapier / n8n?

**Yes.** All three platforms have native Apify integrations. Common automations: new brand mentions → Slack, viral posts in your category → Airtable, new posts by your target users → email digest.

#### Is the output AI-ready / RAG-friendly?

**Yes.** Clean structured JSON, consistent field types, ISO 8601 timestamps. Each post/comment is a self-contained document. Common embedding strategy: use `title` + `selftext` for posts, `body` for comments. Filter by `subreddit`, `type`, `createdUtc`, `score` for retrieval. Used in production by AI training pipelines and RAG-based research assistants.
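The embedding strategy above can be sketched as a small transform. Field names follow this Actor's output schema; the shape of the returned document is just one reasonable choice:

```python
def to_document(item: dict) -> dict:
    """Turn a scraped item into a self-contained text + metadata document.

    Posts embed title + selftext; comments embed body. The metadata keys
    are the filter fields suggested above (subreddit, type, createdUtc, score).
    """
    if item.get("type") == "post":
        text = f"{item.get('title', '')}\n\n{item.get('selftext', '')}".strip()
    else:
        text = item.get("body", "")
    return {
        "text": text,
        "metadata": {k: item.get(k) for k in ("subreddit", "type", "createdUtc", "score")},
    }

doc = to_document({
    "type": "post", "title": "Hello", "selftext": "World",
    "subreddit": "programming", "createdUtc": "2024-01-01T00:00:00Z", "score": 42,
})
```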

#### How complete are the output fields?

- `id`, `fullname`, `type`, `subreddit`, `createdUtc`: 100%
- `author`, `body`/`title`, `score`, `permalink`: ~99% (occasional deleted records)
- `selftext` for self-posts: 100% if not deleted
- `preview`/`thumbnail`/`media`: depends on post type — ~30-50% of posts have one
- `galleryImages`: only for gallery posts (~5% of posts)
- `flairText`, `authorFlairText`: varies by subreddit (~30-60%)
- `gilded`, `awards`: depends on post popularity
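To check these rates against your own dataset, a quick audit over downloaded items can help. A sketch that treats `null`, empty strings, and empty containers as missing:

```python
from collections import Counter

def fill_rates(items: list[dict]) -> dict[str, float]:
    """Share of items with a non-empty value for each field seen in the dataset."""
    counts: Counter = Counter()
    for item in items:
        for key, value in item.items():
            if value not in (None, "", [], {}):
                counts[key] += 1
    return {key: n / len(items) for key, n in counts.items()}

rates = fill_rates([
    {"id": "a", "flairText": "News"},
    {"id": "b", "flairText": None},
])
```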

#### What payment methods does Apify support?

Credit card, invoicing for enterprise, and platform credits. New users get **$5 free credits** monthly — enough to scrape ~5,000 Reddit items for free. No credit card required to start.

***

### ⚖️ Legal & Compliance

This scraper accesses **only publicly available data** from Reddit's public `.json` endpoints — the same data Reddit exposes to anonymous browser users. No private subreddits, no authentication, no internal API endpoints.

**You are responsible** for ensuring your specific use of the scraped data complies with:

- **Reddit's User Agreement** and Content Policy
- **GDPR** (EU/UK) — lawful basis for processing personal data of EU/UK individuals
- **CCPA** (California) — consumer rights for CA residents
- **Local data protection laws** in any jurisdiction
- **Anti-spam laws** — CAN-SPAM (US), CASL (Canada), GDPR consent (EU) for any outreach use

**Best-practice guidelines for Reddit data:**

- Treat usernames as PII; anonymize before redistribution
- Do not republish full post/comment text without attribution
- Do not use scraped contact info (from user bios) for unsolicited outreach
- Respect users who set `hiddenFromBots: true` — exclude them from your downstream processing where feasible
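The first guideline — treating usernames as PII — can be implemented with a salted one-way hash applied before any redistribution. A sketch (the `user_` prefix and 12-character truncation are arbitrary choices):

```python
import hashlib

def anonymize(username: str, salt: str) -> str:
    """Replace a Reddit username with a salted one-way hash.

    Keep the salt secret and constant so the same user maps to the same
    pseudonym across your dataset, while the output stays irreversible.
    """
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}"
```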

This scraper is a **general-purpose tool**. The Actor author and Apify provide no warranty regarding the legality of any specific use case. When in doubt, consult legal counsel.

**Not affiliated with Reddit, Inc.** Reddit® is a registered trademark of Reddit, Inc. All trademarks belong to their respective owners.

***

### 🛠️ Technical details

- **Endpoints used**:
  - `GET /r/{subreddit}/{sort}.json` — listings
  - `GET /r/{subreddit}/comments/{id}.json` — post + comment tree
  - `GET /user/{name}/(about|submitted|comments|overview).json` — user data
  - `GET /search.json` — global post/subreddit/user search
  - `GET /subreddits/search.json` — subreddit discovery
  - `GET /api/morechildren.json` — comment expansion
- **Method**: GET, JSON responses, anonymous (no auth)
- **Headers**: Required `User-Agent` header (Reddit returns 429 without it). This scraper sends a distinctive UA.
- **Pagination**: Cursor-based via `after` parameter (max 100 items/page)
- **Comment tree expansion**: Recursive via `/api/morechildren.json` with depth cap
- **Rate limiting**: Configurable inter-request delay (default 1500ms) + exponential backoff retry
- **Concurrency**: Sequential by design — Reddit's per-IP limit makes parallel requests risky
- **Memory**: Runs comfortably in 256 MB
- **Tech stack**: Apify SDK v3, Crawlee v3, Node.js 20+, native fetch
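The cursor pagination described above can be sketched as a loop over `after` tokens. `fetch_page` is a hypothetical stand-in for the actual HTTP call to a listing endpoint, injected so the sketch runs without the network:

```python
import time

def paginate(fetch_page, delay_s: float = 1.5, max_pages: int = 10):
    """Walk Reddit-style cursor pagination.

    Each page returns (items, after); `after` feeds the next request
    until it comes back empty. The inter-request delay mirrors the
    Actor's default 1500 ms spacing.
    """
    after = None
    for _ in range(max_pages):
        items, after = fetch_page(after)
        yield from items
        if not after:
            break
        time.sleep(delay_s)  # stay under Reddit's ~60 req/min anonymous limit

# Fake two-page listing standing in for real HTTP responses
pages = {None: ([1, 2], "t3_x"), "t3_x": ([3], None)}
collected = list(paginate(pages.__getitem__, delay_s=0))
```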

***

### 🚦 Getting started in 30 seconds

1. **Click "Try for free"** on this Actor's page
2. **Paste a subreddit name** into `subreddits` (e.g., `programming`) **or paste any Reddit URL** into `startUrls`
3. **Click "Start"**
4. **Wait ~60 seconds** for the first 50-100 items
5. **Download** as JSON / CSV / Excel from the Storage tab

No credit card required. First 50 items per run are always free. Paid usage starts after that, billed monthly via Apify.

***

### 💬 Support

- **Issues / feature requests**: Open a ticket in the **Issues** tab on this Actor's page
- **Custom scraping needs**: Contact the Actor author for tailored solutions
- **General Apify support**: [help.apify.com](https://help.apify.com)

***

### 🔍 Search keywords

Reddit scraper, Reddit API scraper, Reddit JSON scraper, scrape Reddit, Reddit data extractor, Reddit post scraper, Reddit comment scraper, Reddit user scraper, Reddit subreddit scraper, Reddit search scraper, Reddit community scraper, Reddit no login scraper, Reddit anonymous scraper, Reddit without API, Reddit without OAuth, Reddit without PRAW, Reddit pure HTTP scraper, cheapest Reddit scraper, Reddit data dump, Reddit bulk download, Reddit JSON export, Reddit CSV export, Reddit data for AI, Reddit AI training data, Reddit RAG dataset, Reddit LLM corpus, Reddit sentiment analysis, Reddit brand monitoring, Reddit mention tracker, Reddit search monitoring, Reddit trend tracking, Reddit competitor monitoring, Reddit market research, Reddit social listening, Reddit lead generation, Reddit influencer discovery, Reddit power user finder, Reddit niche subreddit finder, Reddit community discovery, Reddit subreddit search, Reddit post search, Reddit user profile scraper, Reddit karma scraper, Reddit comment tree scraper, Reddit nested comment scraper, Reddit full thread scraper, Reddit hot posts scraper, Reddit new posts scraper, Reddit top posts scraper, Reddit rising posts scraper, Reddit controversial posts scraper, Reddit gallery scraper, Reddit image scraper, Reddit video scraper, Reddit NSFW scraper, Pushshift alternative, PRAW alternative, Reddit Wayback alternative, r/wallstreetbets scraper, r/CryptoCurrency scraper, r/AskReddit scraper, Brand24 alternative, Mention alternative, Brandwatch alternative, Reddit Scraper Pro alternative, Reddit Scraper Plus alternative, trudax Reddit scraper alternative, harshmaur Reddit scraper alternative, practicaltools Reddit scraper alternative.

***

**Ready to scrape Reddit at half the price of any competitor?** Hit "Try for free" above. First 50 items on us. No credit card. No login. No risk.

# Actor input Schema

## `subreddits` (type: `array`):

List of subreddit names to scrape (no /r/ prefix needed). Each runs as a listing scrape using the global `sort` setting. Example: `['programming', 'webdev', 'startups']`

## `postIds` (type: `array`):

Reddit post IDs (the alphanumeric piece in /comments/XXXX/). Each post is fetched along with its full comment tree. Example: `['1abc234', '5xyz678']`

## `usernames` (type: `array`):

Reddit usernames to scrape (no /u/ prefix needed). Returns the user's profile plus their submitted posts and/or comments based on `userContent`.

## `searchQueries` (type: `array`):

Free-text search queries. Searches across all of Reddit unless `restrictToSubreddit` is set. Returns posts, subreddits, or users based on `searchType`.

## `subredditSearch` (type: `array`):

Find subreddits by keyword. Returns metadata for all matching subreddits (name, subscribers, description, NSFW flag, etc). Different from `searchQueries` — this is specifically for finding communities, not posts.

## `startUrls` (type: `array`):

Any Reddit URL — subreddit, post, user, or search. The scraper auto-detects the type and scrapes accordingly. Most flexible way to add targets. Mix and match URL types in one input.
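The auto-detection presumably classifies URLs by path shape. The Actor's actual routing logic isn't published, so the patterns below are illustrative only:

```python
import re

def detect_target(url: str) -> str:
    """Classify a Reddit URL the way startUrls auto-detection might:
    post, user, search, or subreddit (a sketch, not the Actor's code)."""
    path = re.sub(r"^https?://(www\.|old\.)?reddit\.com", "", url)
    if "/comments/" in path:
        return "post"
    if path.startswith(("/user/", "/u/")):
        return "user"
    if path.startswith("/search"):
        return "search"
    if path.startswith("/r/"):
        return "subreddit"
    return "unknown"
```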

## `sort` (type: `string`):

How to sort subreddit listings.

## `timeframe` (type: `string`):

Time window when sorting by `top` or `controversial`.

## `userContent` (type: `string`):

When scraping users, choose what to return after the profile: their posts, their comments, both, or just profile metadata.

## `searchType` (type: `string`):

What to return from a search query.

## `searchSort` (type: `string`):

How to rank search results.

## `restrictToSubreddit` (type: `string`):

Limit `searchQueries` to a single subreddit. Leave blank for global Reddit-wide search.

## `maxItems` (type: `integer`):

Total cap across ALL targets in this run. Hard ceiling — run stops when reached. `0` = unlimited.

## `maxItemsPerTarget` (type: `integer`):

Cap per subreddit / post / user / search. Prevents one big target from consuming the entire budget.

## `expandComments` (type: `boolean`):

When scraping posts, automatically call Reddit's /api/morechildren endpoint to fetch comments hidden behind 'load more' links. Doubles the data on heavily-commented posts.

## `maxCommentDepth` (type: `integer`):

How deep to follow nested reply chains. Reddit's natural max is ~10. Lower this if you only care about top-level discussion.
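A depth cap like this is typically enforced during the tree walk. A minimal sketch, assuming each comment carries its children under a `replies` key (the Actor's internal structure may differ):

```python
def walk_comments(comment: dict, max_depth: int, depth: int = 0):
    """Yield comment bodies up to max_depth levels of nesting (0 = top-level only)."""
    if depth > max_depth:
        return
    yield comment["body"]
    for reply in comment.get("replies", []):
        yield from walk_comments(reply, max_depth, depth + 1)

tree = {"body": "top", "replies": [
    {"body": "reply", "replies": [
        {"body": "nested", "replies": []},
    ]},
]}
top_only = list(walk_comments(tree, max_depth=0))
two_levels = list(walk_comments(tree, max_depth=1))
```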

## `includeComments` (type: `boolean`):

When scraping subreddit listings, ALSO fetch the full comment tree for each post. Massively increases data volume — use only when you genuinely need comments alongside posts.

## `requestDelayMs` (type: `integer`):

Milliseconds between requests. Reddit's anonymous limit is ~60/min — 1500ms (default) is safe. Lower at your own risk.

## `maxRetries` (type: `integer`):

How many times to retry a rate-limited or failed request with exponential backoff.
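Exponential backoff as described here looks roughly like the sketch below; `request` and the injectable `sleep` are hypothetical stand-ins so the example runs instantly:

```python
import time

def with_retries(request, max_retries: int = 4, base_delay_s: float = 1.0,
                 sleep=time.sleep):
    """Retry a failing request with exponential backoff: 1s, 2s, 4s, 8s...

    `request` stands in for one HTTP call that raises on 429/5xx;
    `sleep` is injectable so tests don't have to wait.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay_s * 2 ** attempt)

# Fails twice, then succeeds — stands in for transient 429s
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_retries(flaky, sleep=lambda s: None)
```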

## Actor input object example

```json
{
  "subreddits": [],
  "postIds": [],
  "usernames": [],
  "searchQueries": [],
  "subredditSearch": [],
  "startUrls": [],
  "sort": "hot",
  "timeframe": "all",
  "userContent": "overview",
  "searchType": "link",
  "searchSort": "relevance",
  "restrictToSubreddit": null,
  "maxItems": 1000,
  "maxItemsPerTarget": 200,
  "expandComments": true,
  "maxCommentDepth": 10,
  "includeComments": false,
  "requestDelayMs": 1500,
  "maxRetries": 4
}
```

# Actor output Schema

## `type` (type: `string`):

Item kind, e.g. `post` or `comment`.

## `subreddit` (type: `string`):

Subreddit the item belongs to (without the `/r/` prefix).

## `author` (type: `string`):

Username of the item's author (`[deleted]` if the account deleted it).

## `title` (type: `string`):

Post title (posts only).

## `score` (type: `string`):

Net votes (`ups - downs`).

## `numComments` (type: `string`):

Number of comments on the post.

## `permalink` (type: `string`):

Relative Reddit URL of the item.

## `createdUtc` (type: `string`):

Creation timestamp (ISO 8601, UTC).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input (see the input schema above for all fields)
const input = {
    subreddits: ['programming'],
    maxItems: 100,
};

// Run the Actor and wait for it to finish
const run = await client.actor("logiover/reddit-historical-archive-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input (see the input schema above for all fields)
run_input = {
    "subreddits": ["programming"],
    "maxItems": 100,
}

# Run the Actor and wait for it to finish
run = client.actor("logiover/reddit-historical-archive-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"subreddits": ["programming"], "maxItems": 100}' |
apify call logiover/reddit-historical-archive-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=logiover/reddit-historical-archive-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Historical Archive Scraper",
        "description": "Access 10+ years of archived Reddit posts and comments via PullPush. Full-text comment search (Reddit can't do this). No login, no proxy. $0.001/item.",
        "version": "0.0",
        "x-build-id": "8eVODXMqjrgJchNYp"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/logiover~reddit-historical-archive-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-logiover-reddit-historical-archive-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/logiover~reddit-historical-archive-scraper/runs": {
            "post": {
                "operationId": "runs-sync-logiover-reddit-historical-archive-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/logiover~reddit-historical-archive-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-logiover-reddit-historical-archive-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "subreddits": {
                        "title": "Subreddits",
                        "type": "array",
                        "description": "List of subreddit names to scrape (no /r/ prefix needed). Each runs as a listing scrape using the global `sort` setting. Example: ['programming', 'webdev', 'startups']",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "postIds": {
                        "title": "Post IDs",
                        "type": "array",
                        "description": "Reddit post IDs (the alphanumeric piece in /comments/XXXX/). Each post is fetched along with its full comment tree. Example: ['1abc234', '5xyz678']",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "usernames": {
                        "title": "Usernames",
                        "type": "array",
                        "description": "Reddit usernames to scrape (no /u/ prefix needed). Returns the user's profile plus their submitted posts and/or comments based on `userContent`.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchQueries": {
                        "title": "Post Search Queries",
                        "type": "array",
                        "description": "Free-text search queries. Searches across all of Reddit unless `restrictToSubreddit` is set. Returns posts, subreddits, or users based on `searchType`.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "subredditSearch": {
                        "title": "Subreddit Discovery Queries",
                        "type": "array",
                        "description": "Find subreddits by keyword. Returns metadata for all matching subreddits (name, subscribers, description, NSFW flag, etc). Different from `searchQueries` — this is specifically for finding communities, not posts.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "startUrls": {
                        "title": "Reddit URLs (auto-detected)",
                        "type": "array",
                        "description": "Any Reddit URL — subreddit, post, user, or search. The scraper auto-detects the type and scrapes accordingly. Most flexible way to add targets. Mix and match URL types in one input.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "sort": {
                        "title": "Subreddit Sort",
                        "enum": [
                            "hot",
                            "new",
                            "top",
                            "rising",
                            "controversial",
                            "best"
                        ],
                        "type": "string",
                        "description": "How to sort subreddit listings.",
                        "default": "hot"
                    },
                    "timeframe": {
                        "title": "Timeframe (for top/controversial)",
                        "enum": [
                            "hour",
                            "day",
                            "week",
                            "month",
                            "year",
                            "all"
                        ],
                        "type": "string",
                        "description": "Time window when sorting by `top` or `controversial`.",
                        "default": "all"
                    },
                    "userContent": {
                        "title": "User Content Type",
                        "enum": [
                            "overview",
                            "submitted",
                            "comments",
                            "about"
                        ],
                        "type": "string",
                        "description": "When scraping users, choose what to return after the profile: their posts, their comments, both, or just profile metadata.",
                        "default": "overview"
                    },
                    "searchType": {
                        "title": "Search Type",
                        "enum": [
                            "link",
                            "sr",
                            "user"
                        ],
                        "type": "string",
                        "description": "What to return from a search query.",
                        "default": "link"
                    },
                    "searchSort": {
                        "title": "Search Sort",
                        "enum": [
                            "relevance",
                            "hot",
                            "top",
                            "new",
                            "comments"
                        ],
                        "type": "string",
                        "description": "How to rank search results.",
                        "default": "relevance"
                    },
                    "restrictToSubreddit": {
                        "title": "Restrict Search to Subreddit",
                        "type": "string",
                        "description": "Limit `searchQueries` to a single subreddit. Leave blank for global Reddit-wide search.",
                        "default": null
                    },
                    "maxItems": {
                        "title": "Max Items (Global)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Total cap across ALL targets in this run. Hard ceiling — run stops when reached. `0` = unlimited.",
                        "default": 1000
                    },
                    "maxItemsPerTarget": {
                        "title": "Max Items Per Target",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Cap per subreddit / post / user / search. Prevents one big target from consuming the entire budget.",
                        "default": 200
                    },
                    "expandComments": {
                        "title": "Expand 'More Comments' Placeholders",
                        "type": "boolean",
                        "description": "When scraping posts, automatically call Reddit's /api/morechildren endpoint to fetch comments hidden behind 'load more' links. Doubles the data on heavily-commented posts.",
                        "default": true
                    },
                    "maxCommentDepth": {
                        "title": "Max Comment Depth",
                        "minimum": 0,
                        "maximum": 50,
                        "type": "integer",
                        "description": "How deep to follow nested reply chains. Reddit's natural max is ~10. Lower this if you only care about top-level discussion.",
                        "default": 10
                    },
                    "includeComments": {
                        "title": "Include Comments in Subreddit Runs",
                        "type": "boolean",
                        "description": "When scraping subreddit listings, ALSO fetch the full comment tree for each post. Massively increases data volume — use only when you genuinely need comments alongside posts.",
                        "default": false
                    },
                    "requestDelayMs": {
                        "title": "Request Delay (ms)",
                        "minimum": 100,
                        "type": "integer",
                        "description": "Milliseconds between requests. Reddit's anonymous limit is ~60/min — 1500ms (default) is safe. Lower at your own risk.",
                        "default": 1500
                    },
                    "maxRetries": {
                        "title": "Max Retries on 429/5xx",
                        "minimum": 0,
                        "type": "integer",
                        "description": "How many times to retry a rate-limited or failed request with exponential backoff.",
                        "default": 4
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
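The run object above reports per-event charges in the `usageUsd` map alongside the aggregate `usageTotalUsd`. As a minimal sketch (the helper name and the sample run dict are hypothetical, not real Actor output), you can cross-check that the per-event charges add up to the reported total:

```python
def run_cost_usd(run: dict) -> float:
    """Sum the per-event USD charges in a run object's `usageUsd` map."""
    return sum(run.get("usageUsd", {}).values())


# Hypothetical run object shaped like the schema above.
run = {
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
        "DATA_TRANSFER_EXTERNAL_GBYTES": 0,
    },
}

# Per-event charges should sum (up to float rounding) to usageTotalUsd.
assert abs(run_cost_usd(run) - run["usageTotalUsd"]) < 1e-12
```

In practice you would obtain the run object from the Apify API (e.g. via the Python client's run endpoints) rather than construct it by hand; the comparison uses a small tolerance because the values are floating-point.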
