# Reddit Subreddit Scraper (`rambunctious_fingerprint/reddit-subreddit-scraper`) Actor

Extract Reddit posts from any subreddit without an API key. Get titles, scores, authors, comment counts, flairs, and URLs from old.reddit.com.

- **URL**: https://apify.com/rambunctious_fingerprint/reddit-subreddit-scraper.md
- **Developed by:** [Casey Marsh](https://apify.com/rambunctious_fingerprint) (community)
- **Categories:** Social media, Automation
- **Stats:** 2 total users, 1 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$1.50 / 1,000 results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Reddit Subreddit Scraper

Extract posts from any public subreddit on Reddit with zero API authentication required. Get post titles, scores, authors, comment counts, flairs, domains, and full permalinks using old.reddit.com for reliable, fast, and lightweight scraping. Ideal for social media monitoring, content research, and sentiment analysis.

### Summary

The Reddit Subreddit Scraper is a production-grade Apify Actor that extracts posts from any public subreddit without requiring a Reddit API key, app registration, or OAuth token. It uses `old.reddit.com`, Reddit's lightweight, server-rendered interface, which is far more scraper-friendly than the modern React-based Reddit frontend.

Built on Crawlee's CheerioCrawler with Apify residential proxy rotation, this Actor handles Reddit's rate limiting gracefully, retries failed requests automatically, and extracts rich metadata including upvote ratios, NSFW flags, sticky post detection, award counts, and post flairs. Whether you need 10 posts or 500, the Actor paginates automatically to collect your requested volume.

### How It Works

1. **Input**: You provide a subreddit name, sort order, and maximum post count.
2. **URL Construction**: The Actor builds the `old.reddit.com/r/{subreddit}/{sort}/` URL. Old Reddit renders complete HTML without JavaScript, which makes it well suited to Cheerio-based scraping.
3. **Rate Limit Detection**: On each page load, the Actor checks for Reddit's rate-limit or "blocked" pages and throws an error to trigger a retry with a fresh residential proxy IP.
4. **Post Extraction**: Each post (`.thing.link`) is parsed for title, URL, author, score, comment count, flair, domain, timestamp, and more using multiple CSS selector fallbacks.
5. **Session Management**: A Crawlee session pool rotates user agents and cookies to reduce fingerprinting.
6. **Pagination**: The Actor keeps requesting further listing pages until it reaches your `maxPosts` limit.
7. **Output**: Clean, structured JSON saved to your Apify dataset with ISO 8601 timestamps.
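
The URL construction and pagination steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the Actor's actual source (which runs on Crawlee in Node.js). The names `build_listing_url`, `collect_posts`, and the `PageFetcher` interface are hypothetical, and the fake fetcher with its `?page=2` URL stands in for real HTTP requests.

```python
from typing import Callable, Dict, List, Optional, Tuple

BASE_URL = "https://old.reddit.com"
VALID_SORTS = ("hot", "new", "top", "rising")

# Hypothetical interface: a fetcher returns the posts found on one listing
# page plus the URL of the next page (None when there is no further page).
PageFetcher = Callable[[str], Tuple[List[Dict], Optional[str]]]


def build_listing_url(subreddit: str, sort: str = "hot") -> str:
    """Build the old.reddit.com listing URL for a subreddit and sort order."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")
    return f"{BASE_URL}/r/{subreddit}/{sort}/"


def collect_posts(start_url: str, max_posts: int, fetch_page: PageFetcher) -> List[Dict]:
    """Follow listing pages until max_posts posts are collected or pages run out."""
    posts: List[Dict] = []
    url: Optional[str] = start_url
    while url is not None and len(posts) < max_posts:
        page_posts, url = fetch_page(url)
        if not page_posts:
            break  # an empty page means there is nothing more to collect
        # Take only as many posts as are still needed to reach the limit.
        posts.extend(page_posts[: max_posts - len(posts)])
    return posts


def fake_fetch(url: str) -> Tuple[List[Dict], Optional[str]]:
    """Stand-in fetcher serving two fake pages of 25 posts each (no network)."""
    first = build_listing_url("popular", "hot")
    if url == first:
        return [{"title": f"post {i}"} for i in range(25)], first + "?page=2"
    return [{"title": f"post {i}"} for i in range(25, 50)], None


start = build_listing_url("popular", "hot")
result = collect_posts(start, max_posts=30, fetch_page=fake_fetch)
print(start, len(result))
```

Passing the fetcher in as a parameter keeps the loop testable without touching Reddit, and truncating each page to the remaining budget ensures the result never exceeds `maxPosts`.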

### Input Parameters

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `subreddit` | string | No | `popular` | Subreddit name without the `r/` prefix (e.g. `australia`, `programming`, `worldnews`) |
| `sort` | string | No | `hot` | Sort order: `hot`, `new`, `top`, or `rising` |
| `maxPosts` | integer | No | `50` | Maximum number of posts to extract (1–500) |
| `includeComments` | boolean | No | `false` | Whether to also scrape comments from each post (adds significant runtime) |
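
If you assemble the input in your own code, it can help to apply the defaults and limits from the table before starting a run. The following is a minimal client-side sketch, assuming the defaults and the 1–500 range shown above. `ScraperInput` and `normalize_input` are hypothetical helper names, and stripping a leading `r/` is a convenience added here, not documented Actor behavior.

```python
from dataclasses import dataclass

VALID_SORTS = ("hot", "new", "top", "rising")


@dataclass(frozen=True)
class ScraperInput:
    """Input fields from the table above, with their documented defaults."""
    subreddit: str = "popular"
    sort: str = "hot"
    max_posts: int = 50
    include_comments: bool = False

    def to_actor_input(self) -> dict:
        """Return the camelCase JSON object used as Actor input."""
        return {
            "subreddit": self.subreddit,
            "sort": self.sort,
            "maxPosts": self.max_posts,
            "includeComments": self.include_comments,
        }


def normalize_input(raw: dict) -> ScraperInput:
    """Apply defaults and reject values outside the documented ranges."""
    subreddit = str(raw.get("subreddit", "popular")).strip()
    # The table asks for the name without the r/ prefix; tolerate it anyway.
    if subreddit.lower().startswith("r/"):
        subreddit = subreddit[2:]
    if not subreddit:
        raise ValueError("subreddit must not be empty")

    sort = raw.get("sort", "hot")
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")

    max_posts = raw.get("maxPosts", 50)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_posts, bool) or not isinstance(max_posts, int):
        raise ValueError("maxPosts must be an integer")
    if not 1 <= max_posts <= 500:
        raise ValueError("maxPosts must be between 1 and 500")

    include_comments = bool(raw.get("includeComments", False))
    return ScraperInput(subreddit, sort, max_posts, include_comments)


cleaned = normalize_input({"subreddit": "r/programming", "maxPosts": 10})
print(cleaned.to_actor_input())
```

The dictionary returned by `to_actor_input()` can be passed as the run input to any of the clients shown in the API section.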

### Output Example

```json
{
  "title": "What's the most interesting fact you learned this week?",
  "url": "https://example.com/article",
  "author": "curious_user42",
  "subreddit": "popular",
  "score": 15420,
  "commentCount": 2301,
  "flair": "Discussion",
  "postedAt": "2026-07-04T08:00:00.000Z",
  "domain": "self.popular",
  "redditUrl": "https://old.reddit.com/r/popular/comments/abc123/",
  "isNSFW": false,
  "isSticky": false,
  "awards": 3,
  "upvoteRatio": "0.92",
  "sort": "hot",
  "scrapedAt": "2026-07-04T10:30:00.000Z"
}
````
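
Note that in the example above `upvoteRatio` is a string and the timestamps are ISO 8601 strings, so downstream code usually needs to convert them. Here is a small sketch of that conversion, using only fields from the example item. The helper names `parse_timestamp` and `post_age_at_scrape` are illustrative.

```python
from datetime import datetime, timedelta

# A subset of the example item shown above.
item = {
    "score": 15420,
    "commentCount": 2301,
    "postedAt": "2026-07-04T08:00:00.000Z",
    "upvoteRatio": "0.92",
    "scrapedAt": "2026-07-04T10:30:00.000Z",
}


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # datetime.fromisoformat only accepts a literal "Z" from Python 3.11 on,
    # so replace it with an explicit UTC offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def post_age_at_scrape(record: dict) -> timedelta:
    """Return how old the post was at the moment it was scraped."""
    return parse_timestamp(record["scrapedAt"]) - parse_timestamp(record["postedAt"])


age = post_age_at_scrape(item)
ratio = float(item["upvoteRatio"])  # the example stores the ratio as a string
print(age, ratio)
```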

### Pricing

This Actor uses Apify's pay-per-result model. You only pay for the posts you successfully extract. No monthly subscriptions, no Reddit API costs, no minimums. At the listed rate of $1.50 per 1,000 results, a typical run of 50 posts from a single subreddit costs about $0.075.

Because the Actor uses `old.reddit.com` (static HTML served directly from Reddit's servers without JavaScript rendering), it is efficient and needs no headless browser. Residential proxies provide reliability against rate limiting, but you can switch to datacenter proxies for lower costs if your use case allows occasional blocks.
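
For budgeting, the cost of a run can be estimated from the $1.50 per 1,000 results rate listed for this Actor. The sketch below assumes every extracted post counts as exactly one billable result. Whether error records or scraped comments are billed is not stated here, so treat the figure as a rough estimate. `estimate_cost_usd` is a hypothetical helper name.

```python
PRICE_PER_1000_RESULTS_USD = 1.50  # rate listed in the Pricing section


def estimate_cost_usd(result_count: int,
                      price_per_1000: float = PRICE_PER_1000_RESULTS_USD) -> float:
    """Estimate the charge for a run, assuming one billable result per post."""
    if result_count < 0:
        raise ValueError("result_count must not be negative")
    return result_count * price_per_1000 / 1000


for count in (50, 500):
    print(f"{count} posts: about ${estimate_cost_usd(count):.3f}")
```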

### Use Cases

- **Social Media Monitoring**: Track trending topics, brand mentions, and community discussions across multiple subreddits. Monitor sentiment around products, companies, or public figures in real time.
- **Content Research and Curation**: Discover viral content, identify trending formats, and understand what resonates with specific communities. Source content ideas for blogs, newsletters, and social media channels.
- **Community Sentiment Analysis**: Analyze post titles, flairs, and scores to gauge community sentiment on specific topics. Feed scraped data into NLP pipelines for large-scale sentiment tracking.
- **Data Collection for Machine Learning**: Build training datasets for text classification, toxicity detection, or recommendation systems using Reddit's diverse, community-labeled content.
- **Competitor Research**: Monitor competitor subreddits, track product announcement threads, and analyze community engagement patterns.
- **Market Research**: Understand consumer pain points, feature requests, and product discussions within niche communities relevant to your industry.

### FAQ

**Q: Do I need a Reddit API key or OAuth token?**
A: No. This Actor scrapes publicly available pages on old.reddit.com. No Reddit account, API key, or authentication is required. This is one of its key advantages over the official Reddit API, which requires app registration and has stricter rate limits on the free tier.

**Q: Why use old.reddit.com instead of the new Reddit?**
A: Old Reddit renders complete HTML server-side with predictable CSS classes (`.thing.link`, `.score.unvoted`, `.linkflairlabel`). The new Reddit is a React SPA that requires JavaScript execution, making it much slower and more expensive to scrape. Old Reddit is also less aggressively rate-limited.

**Q: What if a subreddit is private or banned?**
A: The Actor can only scrape public subreddits. If a subreddit is private, banned, or quarantined, the request will fail, and an error record will be saved to the dataset.

**Q: Can I scrape comments too?**
A: Basic comment count extraction is included. For full comment scraping (comment bodies, nested threads), set `includeComments` to `true`. Note this significantly increases runtime and data volume, as each post spawns additional requests.

**Q: How does the actor handle Reddit's rate limiting?**
A: When a rate limit (HTTP 429) or block page is detected — identified by checking the page title — the error is re-thrown to Crawlee's retry mechanism, which rotates to a fresh residential proxy IP and retries the request automatically. Up to 4 retries are attempted.

**Q: Is it legal to scrape Reddit?**
A: This Actor scrapes publicly accessible pages. You are responsible for complying with Reddit's terms of service, robots.txt, and applicable laws. For production use at scale, review Reddit's API terms and consider using the official API for sensitive data.
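
The rate-limit handling described in the FAQ (detect HTTP 429 or a block page by its title, then retry up to 4 times) can be illustrated with a plain-Python analogy. In the Actor itself this is delegated to Crawlee's retry mechanism and proxy rotation. The marker phrases in `BLOCK_TITLE_MARKERS`, the `fetch(attempt)` interface, and the function names below are assumptions made for this sketch.

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 4  # the FAQ states that up to 4 retries are attempted

# Hypothetical marker phrases; the real block-page titles are not documented here.
BLOCK_TITLE_MARKERS = ("too many requests", "blocked")


class RateLimitedError(Exception):
    """Raised when every attempt was answered with a rate-limit or block page."""


def is_blocked(status_code: int, page_title: str) -> bool:
    """Treat HTTP 429 or a block-style page title as a rate-limit response."""
    title = page_title.lower()
    return status_code == 429 or any(marker in title for marker in BLOCK_TITLE_MARKERS)


def fetch_with_retries(fetch: Callable[[int], Tuple[int, str, str]]) -> str:
    """Call fetch(attempt) until a non-blocked page arrives or retries run out.

    fetch returns (status_code, page_title, html). The attempt index lets the
    caller pick a fresh proxy or session for each try.
    """
    for attempt in range(MAX_RETRIES + 1):  # first try plus up to 4 retries
        status, title, html = fetch(attempt)
        if not is_blocked(status, title):
            return html
    raise RateLimitedError(f"still blocked after {MAX_RETRIES} retries")


calls: List[int] = []


def fake_fetch(attempt: int) -> Tuple[int, str, str]:
    """Stand-in fetcher: blocked on the first two attempts, then succeeds."""
    calls.append(attempt)
    if attempt < 2:
        return 429, "Too Many Requests", ""
    return 200, "popular links", "<html>ok</html>"


html = fetch_with_retries(fake_fetch)
print(len(calls), html)
```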

***

**Actor ID**: `reddit-subreddit-scraper` · **Runtime**: Node.js 20 · **Type**: CheerioCrawler

# Actor input Schema

## `subreddit` (type: `string`):

Subreddit name without r/

## `sort` (type: `string`):

hot, new, top, rising

## `maxPosts` (type: `integer`):

Maximum posts to extract

## `includeComments` (type: `boolean`):

Whether to also scrape comments

## Actor input object example

```json
{
  "subreddit": "popular",
  "sort": "hot",
  "maxPosts": 50,
  "includeComments": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

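// Prepare Actor input (all fields are optional; these are the documented defaults)
const input = {
    subreddit: 'popular',
    sort: 'hot',
    maxPosts: 50,
    includeComments: false,
};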

// Run the Actor and wait for it to finish
const run = await client.actor("rambunctious_fingerprint/reddit-subreddit-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

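# Prepare the Actor input (all fields are optional; these are the documented defaults)
run_input = {
    "subreddit": "popular",
    "sort": "hot",
    "maxPosts": 50,
    "includeComments": False,
}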

# Run the Actor and wait for it to finish
run = client.actor("rambunctious_fingerprint/reddit-subreddit-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"subreddit": "popular", "sort": "hot", "maxPosts": 50, "includeComments": false}' |
apify call rambunctious_fingerprint/reddit-subreddit-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=rambunctious_fingerprint/reddit-subreddit-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Subreddit Scraper",
        "description": "Extract Reddit posts from any subreddit without an API key. Get titles, scores, authors, comment counts, flairs, and URLs from old.reddit.com.",
        "version": "0.0",
        "x-build-id": "8qQmcg8cCX9mmdScV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/rambunctious_fingerprint~reddit-subreddit-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-rambunctious_fingerprint-reddit-subreddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/rambunctious_fingerprint~reddit-subreddit-scraper/runs": {
            "post": {
                "operationId": "runs-sync-rambunctious_fingerprint-reddit-subreddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/rambunctious_fingerprint~reddit-subreddit-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-rambunctious_fingerprint-reddit-subreddit-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "subreddit": {
                        "title": "Subreddit",
                        "type": "string",
                        "description": "Subreddit name without r/",
                        "default": "popular"
                    },
                    "sort": {
                        "title": "Sort By",
                        "enum": [
                            "hot",
                            "new",
                            "top",
                            "rising"
                        ],
                        "type": "string",
                        "description": "hot, new, top, rising",
                        "default": "hot"
                    },
                    "maxPosts": {
                        "title": "Max Posts",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum posts to extract",
                        "default": 50
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Whether to also scrape comments",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
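
If you prefer to call the REST API directly, the `run-sync-get-dataset-items` operation from the specification above can be prepared with the Python standard library alone. This sketch only builds the request and does not send it. The function name `build_run_sync_request` is hypothetical, and it assumes the endpoint returns the dataset items as a JSON array, as the operation summary suggests.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "rambunctious_fingerprint~reddit-subreddit-scraper"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    # The specification passes the API token as the `token` query parameter.
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"subreddit": "popular", "sort": "hot", "maxPosts": 50},
)
print(request.get_method(), request.full_url)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Because a synchronous run keeps the HTTP connection open until the Actor finishes, the `/runs` endpoint from the same specification may be a better fit for long runs.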