# Reddit Crawler (`lighthouse_keeper/redditcrawler`) Actor

Works after the Reddit 11/06/2026 update!
Crawl and scrape Reddit subreddits, user profiles, and posts.

- **URL**: https://apify.com/lighthouse_keeper/redditcrawler.md
- **Developed by:** [r. mann](https://apify.com/lighthouse_keeper) (community)
- **Categories:** Automation, Social media
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

From $1.00 / 1,000 records scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## REDDIT API UPDATE NOTICE

This crawler was built from scratch after the 11/06/2026 Reddit API update, which made Reddit much more difficult to scrape.
This crawler does NOT care.

## Reddit Scraper

Crawl and scrape Reddit: subreddits, user profiles, and posts (with their
comments), and get structured JSON output.

### Scope

This module does one thing well: pull posts and comments off Reddit and push
them to the dataset. Point it at any mix of subreddits, users, and post URLs and
it crawls each target, following listing pagination up to a configurable depth.
It does not log in, post, vote, or modify anything - it is read-only and stealthy.

### Capabilities

- Crawl subreddit listings (front-page style feeds) with pagination
- Crawl user profile listings (a user's posts and comments) with pagination
- Crawl individual posts together with their comments
- Configurable request cooldown, plus automatic backoff that reads Reddit's
  `X-Ratelimit` headers and slows down before you get blocked
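
As a rough illustration of "following listing pagination up to a configurable depth", the loop below walks a listing page by page and stops at the page limit or when no further page exists. It is a minimal sketch, not the Actor's actual code: `crawl_listing`, `fetch_page`, and the cursor convention are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# (items on this page, cursor for the next page or None when the listing ends)
Page = Tuple[List[Dict], Optional[str]]


def crawl_listing(
    fetch_page: Callable[[Optional[str]], Page],  # hypothetical page fetcher
    max_pages: int,
) -> Iterator[Dict]:
    """Yield items from at most `max_pages` listing pages."""
    cursor: Optional[str] = None  # None requests the first page
    for _ in range(max_pages):
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            # The listing has no further pages, so stop early.
            break
```

Passing the fetcher in as a parameter keeps the loop independent of any HTTP library, so it can be exercised with a stub.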

### Input schema

Provide at least one of `subreddits`, `users`, or `postUrls`. Everything else is
optional and has sensible defaults.

```jsonc
{
  "subreddits": [],            // Subreddit names to crawl, without the "r/" prefix (e.g. "technology").
                               // Each entry is one subreddit.

  "users": [],                 // Reddit usernames to crawl, without the "u/" prefix (e.g. "spez").
                               // Crawls that user's profile listing.

  "postUrls": [],              // Accepts Reddit post URLs or permalinks (e.g. "/r/technology/comments/abc123/title/").
                               // Each post is crawled together with its comments.

  "maxPages": 1,               // Max number of listing pages to follow per subreddit/user.
                               // Each page is 25 items. 1 = 25, 2 = 50, and so on.

  "cooldown": 1,               // Baseline delay, in seconds, between requests. Raised automatically
                               // when Reddit reports the rate-limit quota is running low.

  "useProxy": true             // Route requests through Apify proxy (recommended).
}
````
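
The rules above (at least one target, bare names without prefixes, 25 items per listing page) can be checked on the client side before a run is started. The helper below is a sketch under those stated rules; `build_input`, `_strip_prefix`, and `max_listing_items` are hypothetical names and are not part of the Actor or of any Apify library. The lower bounds on `maxPages` and `cooldown` follow the `minimum` values in the OpenAPI schema further down.

```python
from typing import Any, Dict, List, Optional

ITEMS_PER_PAGE = 25  # page size stated in the input schema comments


def _strip_prefix(name: str, prefix: str) -> str:
    """Drop a leading "r/" or "u/" (optionally preceded by "/") from a name."""
    name = name.strip().lstrip("/")
    if name.lower().startswith(prefix):
        name = name[len(prefix):]
    return name


def build_input(
    subreddits: Optional[List[str]] = None,
    users: Optional[List[str]] = None,
    post_urls: Optional[List[str]] = None,
    max_pages: int = 1,
    cooldown: int = 1,
    use_proxy: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: normalise and validate input before a run."""
    clean_subs = [_strip_prefix(s, "r/") for s in subreddits or []]
    clean_users = [_strip_prefix(u, "u/") for u in users or []]
    urls = [u.strip() for u in post_urls or []]

    if not (clean_subs or clean_users or urls):
        raise ValueError("provide at least one of subreddits, users, or postUrls")
    if max_pages < 1:
        raise ValueError("maxPages must be at least 1")
    if cooldown < 0:
        raise ValueError("cooldown must not be negative")

    return {
        "subreddits": clean_subs,
        "users": clean_users,
        "postUrls": urls,
        "maxPages": max_pages,
        "cooldown": cooldown,
        "useProxy": use_proxy,
    }


def max_listing_items(run_input: Dict[str, Any]) -> int:
    """Upper bound on listing records: 25 per page, per subreddit or user."""
    targets = len(run_input["subreddits"]) + len(run_input["users"])
    return targets * run_input["maxPages"] * ITEMS_PER_PAGE
```

Because pricing is per record scraped, `max_listing_items` gives a quick ceiling on listing records for a planned run; records from `postUrls` are not included in that bound.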

### Output

Every crawled item is pushed to the dataset.

- **Subreddits and users** push one record per post:

  ```jsonc
  {
    "id": "t3_1u41vjv",                       // Reddit fullname
    "url": "https://example.com/article",     // The post's outbound/link URL
    "permalink": "/r/technology/comments/...",// Reddit permalink to the post
    "title": "Post title",
    "author": { "username": "username", "uri": null },
    "published": "2026-06-12T17:28:14+00:00", // ISO 8601 timestamp
    "updated": null,
    "content": "self-post text, if any",      // post/comment text; null for link posts
    "thumbnail": null,
    "source": { "type": "subreddit", "name": "technology" } // where this came from
  }
  ```

- **Post URLs** push one record per post, with its comments attached:

  ```jsonc
  {
    "post": { /* same shape as above */ },
    "comments": [ /* same shape, one per comment */ ],
    "source": { "type": "post", "name": "<the post url>" }
  }
  ```

  Comments share the post shape, but `title` is always null (comments have no
  title) and their text is in `content`.
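
Since the dataset can hold both record shapes, downstream code may want a single flat stream. The generator below is a minimal sketch that assumes only the two shapes documented above; `iter_records` and the added `kind` field are hypothetical and are not produced by the Actor.

```python
from typing import Any, Dict, Iterator


def iter_records(item: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield flat records from either dataset item shape."""
    if "post" in item and "comments" in item:
        # Post URL item: the post and its comments are nested one level down,
        # and the source lives on the wrapper, so copy it onto each record.
        source = item.get("source")
        yield {**item["post"], "kind": "post", "source": source}
        for comment in item["comments"]:
            yield {**comment, "kind": "comment", "source": source}
    else:
        # Subreddit or user listing item: already flat.
        yield {**item, "kind": "listing"}
```

A typical use is `rows = [r for item in items for r in iter_records(item)]`, where `items` is the list of dataset items fetched from a run.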

### FAQ

**Does it work after the new Reddit API changes?**

Yes. That is why it exists.

**What does the proxy toggle do?**

When `useProxy` is on (the default), every request is routed through
Apify's residential proxies, which look like ordinary home connections and are
the only reliable way to scrape Reddit at any volume. This is the recommended
setting.

When it is off, requests go out directly from Apify's datacenter IPs. This is
cheaper - you pay no residential proxy usage - but riskier: Reddit blocks
datacenter IPs quickly, so you will likely get throttled or blocked after a
small number of requests. Turn it off only for quick tests or very light runs.

**It stopped returning results / I see rate-limit warnings.**

You are being throttled. Increase `cooldown`, lower `maxPages`, and make sure
`useProxy` is on. The Actor already backs off automatically when
Reddit signals a low quota, but a heavier run needs gentler settings.
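
To make the interaction between `cooldown` and the automatic backoff more concrete, here is one way such a delay could be computed. This is a sketch only, not the Actor's implementation: `next_delay` and `low_water` are hypothetical, and it assumes the response carries `X-Ratelimit-Remaining` (requests left in the window) and `X-Ratelimit-Reset` (seconds until the window resets).

```python
from typing import Mapping


def next_delay(
    headers: Mapping[str, str],
    cooldown: float = 1.0,
    low_water: float = 10.0,  # hypothetical threshold for "quota running low"
) -> float:
    """Return the number of seconds to wait before the next request."""
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        remaining = float(lowered["x-ratelimit-remaining"])
        reset = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        # Headers missing or malformed: fall back to the baseline cooldown.
        return cooldown
    if remaining <= 0:
        # Quota exhausted: wait until the window resets.
        return max(cooldown, reset)
    if remaining < low_water:
        # Quota low: spread the remaining requests over the rest of the window.
        return max(cooldown, reset / remaining)
    return cooldown
```

Under this scheme a larger `cooldown` only ever raises the delay, which matches the advice above to increase it for heavier runs.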

# Actor input Schema

## `subreddits` (type: `array`):

Subreddit names to crawl (without the 'r/' prefix), e.g. 'technology'.

## `users` (type: `array`):

Reddit usernames to crawl (without the 'u/' prefix).

## `postUrls` (type: `array`):

Reddit post URLs or permalinks to crawl together with their comments.

## `maxPages` (type: `integer`):

Maximum number of listing pages to follow per subreddit/user.

## `cooldown` (type: `integer`):

Baseline wait time between requests. Raised automatically when rate limits get low.

## `useProxy` (type: `boolean`):

Route requests through Apify proxies. Strongly recommended - Reddit blocks datacenter IPs fast. Turn off to run cheaper without a proxy, at the cost of getting blocked sooner.

## Actor input object example

```json
{
  "subreddits": [
    "technology"
  ],
  "users": [],
  "postUrls": [],
  "maxPages": 1,
  "cooldown": 1,
  "useProxy": true
}
```

# Actor output Schema

## `posts` (type: `string`):

Crawled posts and comments from the configured subreddits, users, and post URLs.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "subreddits": [
        "technology"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("lighthouse_keeper/redditcrawler").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "subreddits": ["technology"] }

# Run the Actor and wait for it to finish
run = client.actor("lighthouse_keeper/redditcrawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "subreddits": [
    "technology"
  ]
}' |
apify call lighthouse_keeper/redditcrawler --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=lighthouse_keeper/redditcrawler",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Crawler",
        "description": "Works after reddit 11/06/2026 update!\nCrawl and scrape Reddit subreddits, user profiles, and posts.",
        "version": "1.0",
        "x-build-id": "9bFYsgOXlER3S2eY5"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/lighthouse_keeper~redditcrawler/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-lighthouse_keeper-redditcrawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/lighthouse_keeper~redditcrawler/runs": {
            "post": {
                "operationId": "runs-sync-lighthouse_keeper-redditcrawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/lighthouse_keeper~redditcrawler/run-sync": {
            "post": {
                "operationId": "run-sync-lighthouse_keeper-redditcrawler",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "subreddits": {
                        "title": "Subreddits",
                        "type": "array",
                        "description": "Subreddit names to crawl (without the 'r/' prefix), e.g. 'technology'.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "users": {
                        "title": "Users",
                        "type": "array",
                        "description": "Reddit usernames to crawl (without the 'u/' prefix).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "postUrls": {
                        "title": "Post URLs",
                        "type": "array",
                        "description": "Reddit post URLs or permalinks to crawl together with their comments.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of listing pages to follow per subreddit/user.",
                        "default": 1
                    },
                    "cooldown": {
                        "title": "Cooldown (seconds)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Baseline wait time between requests. Raised automatically when rate limits get low.",
                        "default": 1
                    },
                    "useProxy": {
                        "title": "Use proxy",
                        "type": "boolean",
                        "description": "Route requests through Apify proxies. Strongly recommended - Reddit blocks datacenter IPs fast. Turn off to run cheaper without a proxy, at the cost of getting blocked sooner.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
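
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below uses only the Python standard library; `build_request` and `run_and_fetch` are hypothetical helper names, and it assumes the endpoint responds with the dataset items as a JSON array, as the operation summary suggests.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/lighthouse_keeper~redditcrawler/run-sync-get-dataset-items"


def build_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(
    token: str, run_input: Dict[str, Any], timeout: float = 300.0
) -> List[Dict[str, Any]]:
    """Run the Actor synchronously and return the parsed response body."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `run_and_fetch("<YOUR_API_TOKEN>", {"subreddits": ["technology"]})` would block until the run finishes. The 300-second timeout is an arbitrary choice for this sketch; longer crawls are likely better served by the asynchronous `/runs` endpoint.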
