# Reddit Post & Comment Scraper (`miccho27/reddit-post-scraper`) Actor

Scrape Reddit posts and comments from any subreddit or thread URL. Extract titles, scores, authors, comment trees, and metadata. No Reddit API key or OAuth required.

- **URL**: https://apify.com/miccho27/reddit-post-scraper.md
- **Developed by:** [Tatsuya Mizuno](https://apify.com/miccho27) (community)
- **Categories:** Social media
- **Stats:** 1 total user, 0 monthly users, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is billed per platform usage. The Actor itself is free to use; you pay only for the Apify platform resources it consumes, which get cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in the key-value store.
In Standby mode, an Actor runs a web server that can serve as a website, an API, or an MCP server.
"Actor" is written with a capital "A".

## How to integrate an Actor?

Actors can be integrated into projects in any stack, and the official clients below make integrations safe, well-documented, and production-ready.
The recommended approaches are as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Reddit Post & Comment Scraper - Free Subreddit Data Extractor (Alternative to Reddit API, Pushshift, Arctic Shift)

Scrape Reddit posts and comments from any subreddit or thread URL -- titles, scores, authors, awards, flairs, and full comment trees. No Reddit API key, no OAuth, no developer application. The best free alternative to Reddit Data API ($0.24/1K calls), Pushshift (discontinued), Arctic Shift, and SocialGrep.

### Who Is This For?

- **Content marketers** -- Find trending topics, popular questions, and content gaps in your niche subreddits
- **Market researchers** -- Analyze sentiment, feature requests, and pain points from product-related subreddits
- **SEO specialists** -- Discover high-engagement keywords and questions people are asking on Reddit
- **Data scientists** -- Build NLP datasets from Reddit comments for sentiment analysis and topic modeling
- **Product managers** -- Monitor user feedback, feature requests, and bug reports on product subreddits
- **Competitive intelligence** -- Track competitor mentions, comparisons, and user sentiment across subreddits

### Pricing -- Free to Start

| Tier | Cost | What You Get |
|------|------|-------------|
| **Free trial** | $0 | Apify free tier includes monthly compute credits |
| **Pay per result** | ~$2.00 / 1,000 posts | Subreddit scraping with comments |
| **vs. Reddit API** | Saves $0.24/1K calls | No OAuth, no developer application |
| **vs. Pushshift** | Still works | Pushshift was discontinued in 2023 |

### Quick Start (3 Steps)

1. **Click "Try for free"** on this Actor's page in Apify Store
2. **Enter subreddits** (e.g., `["webdev", "javascript"]`) or paste post URLs
3. **Click "Start"** and get Reddit data as JSON, CSV, or Excel

### Features

- **Subreddit scraping**: Extract posts from any public subreddit (hot, new, top, rising)
- **Post detail scraping**: Scrape individual posts with full comment trees
- **Rich metadata**: Title, score, upvote ratio, author, flair, awards, NSFW flag, pinned status
- **Comment extraction**: Top-level comments with author, score, and OP indicator
- **Time filters**: Filter top posts by hour, day, week, month, year, or all time
- **No API key**: Uses Reddit's public JSON endpoints
- **Retry & rate limiting**: Automatic retries with configurable delays
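
To illustrate the "no API key" point, the listing URL the Actor fetches can be assembled directly from the input fields. The helper below is a hypothetical sketch, not the Actor's actual code; it only builds the public `.json` endpoint URL from a subreddit name, sort, and time filter:

```python
# Hypothetical sketch of how a public Reddit JSON listing URL could be
# assembled from this Actor's input fields. Not the Actor's actual code.

def subreddit_listing_url(subreddit: str, sort_by: str = "hot",
                          time_filter: str = "week", limit: int = 25) -> str:
    """Build the public .json listing URL for a subreddit."""
    url = f"https://www.reddit.com/r/{subreddit}/{sort_by}.json?limit={limit}"
    if sort_by == "top":
        # The time filter only applies to the "top" sort.
        url += f"&t={time_filter}"
    return url

print(subreddit_listing_url("webdev", sort_by="top", time_filter="week"))
# https://www.reddit.com/r/webdev/top.json?limit=25&t=week
```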

### Input

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `subreddits` | array | Subreddit names without `r/` (e.g. `["webdev", "javascript"]`) | -- |
| `postUrls` | array | Direct Reddit post URLs to scrape with comments | -- |
| `sortBy` | string | `"hot"`, `"new"`, `"top"`, `"rising"` | `"hot"` |
| `timeFilter` | string | Time range for top sort: `"hour"`, `"day"`, `"week"`, `"month"`, `"year"`, `"all"` | `"week"` |
| `maxPostsPerSubreddit` | integer | Max posts per subreddit (1-100) | `25` |
| `includeComments` | boolean | Extract top-level comments for each post | `false` |
| `maxCommentsPerPost` | integer | Max comments per post (1-50) | `10` |
| `delayBetweenRequestsMs` | integer | Delay between requests in ms (min 1000) | `2000` |
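
The defaults and ranges in the table above can also be applied client-side before calling the Actor. A small hypothetical normalizer (assuming only the field names and limits documented in the table) might look like:

```python
# Hypothetical input normalizer based on the table above: fills defaults
# and clamps numeric fields to their documented ranges.

DEFAULTS = {
    "sortBy": "hot",
    "timeFilter": "week",
    "maxPostsPerSubreddit": 25,
    "includeComments": False,
    "maxCommentsPerPost": 10,
    "delayBetweenRequestsMs": 2000,
}

def normalize_input(raw: dict) -> dict:
    inp = {**DEFAULTS, **raw}
    inp["maxPostsPerSubreddit"] = max(1, min(100, inp["maxPostsPerSubreddit"]))
    inp["maxCommentsPerPost"] = max(1, min(50, inp["maxCommentsPerPost"]))
    inp["delayBetweenRequestsMs"] = max(1000, inp["delayBetweenRequestsMs"])
    return inp

print(normalize_input({"subreddits": ["webdev"], "maxPostsPerSubreddit": 500}))
```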

#### Example Input -- Subreddit Scraping

```json
{
  "subreddits": ["webdev", "javascript", "reactjs"],
  "sortBy": "top",
  "timeFilter": "week",
  "maxPostsPerSubreddit": 25,
  "includeComments": false
}
```

#### Example Input -- Post with Comments

```json
{
  "postUrls": [
    "https://www.reddit.com/r/webdev/comments/abc123/best_frameworks_2024/"
  ],
  "includeComments": true,
  "maxCommentsPerPost": 20
}
```

#### Example Input -- Market Research

```json
{
  "subreddits": ["SaaS", "startups", "entrepreneur"],
  "sortBy": "top",
  "timeFilter": "month",
  "maxPostsPerSubreddit": 50,
  "includeComments": true,
  "maxCommentsPerPost": 5
}
```

### Output

```json
{
  "id": "1abc2de",
  "title": "What's the best JS framework in 2024?",
  "author": "webdev_user",
  "subreddit": "webdev",
  "score": 1247,
  "upvoteRatio": 0.94,
  "numComments": 384,
  "url": "https://www.reddit.com/r/webdev/comments/1abc2de/...",
  "permalink": "https://www.reddit.com/r/webdev/comments/1abc2de/...",
  "selfText": "I've been comparing React, Vue, and Svelte...",
  "flair": "Discussion",
  "awards": 5,
  "createdUtc": "2024-01-15T08:30:00.000Z",
  "isNsfw": false,
  "isPinned": false,
  "comments": [
    {
      "id": "k5f6g7h",
      "author": "senior_dev",
      "body": "React is still the safe bet for most teams...",
      "score": 523,
      "createdUtc": "2024-01-15T09:15:00.000Z",
      "isOp": false,
      "awards": 2
    }
  ],
  "scrapedAt": "2024-01-15T10:30:00.000Z"
}
```
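
If the nested `comments` array gets in the way of spreadsheet export, each dataset item can be flattened into one row per comment. A minimal sketch, assuming the field names shown in the output example above:

```python
# Flatten a dataset item (post with nested comments) into flat rows,
# one per comment, suitable for CSV export. Field names assume the
# output shape shown above.

def flatten_item(item: dict) -> list[dict]:
    base = {
        "postId": item["id"],
        "postTitle": item["title"],
        "subreddit": item["subreddit"],
        "postScore": item["score"],
    }
    rows = []
    for c in item.get("comments", []):
        rows.append({**base,
                     "commentId": c["id"],
                     "commentAuthor": c["author"],
                     "commentBody": c["body"],
                     "commentScore": c["score"]})
    return rows or [base]  # posts without comments still yield one row

sample = {"id": "1abc2de", "title": "Best JS framework?", "subreddit": "webdev",
          "score": 1247, "comments": [{"id": "k5f6g7h", "author": "senior_dev",
                                       "body": "React...", "score": 523}]}
print(flatten_item(sample))
```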

### Real-World Use Cases

#### 1. Content Research for Blog Posts

Scrape top posts from niche subreddits to find the most discussed topics. Use titles and comments as inspiration for blog articles and YouTube videos.

#### 2. Product Feedback Mining

Monitor your product's subreddit for feature requests, bug reports, and user sentiment. Schedule weekly runs and export to Google Sheets for product team review.

#### 3. SEO Keyword Discovery

Extract post titles from relevant subreddits. Analyze the language users actually use when asking questions -- these become long-tail keyword opportunities.
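
For instance, question-style titles can be surfaced with a few lines of post-processing on the scraped items. A minimal sketch (the title values below are illustrative, not real scraped data):

```python
# Minimal sketch: surface question-style titles and count common words
# across scraped post titles, as a starting point for long-tail
# keyword research. Titles here are illustrative placeholders.
from collections import Counter

titles = [
    "What's the best JS framework in 2024?",
    "How do I deploy a React app for free?",
    "Tailwind vs plain CSS",
]

questions = [t for t in titles if t.rstrip().endswith("?")]
words = Counter(w.lower().strip("?.,") for t in titles for w in t.split())

print(questions)
print(words.most_common(3))
```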

#### 4. Competitive Intelligence

Track competitor mentions across industry subreddits. Compare sentiment and feature discussions to inform your product roadmap.

#### 5. Academic NLP Dataset

Build labeled datasets from subreddit comments for sentiment analysis, topic classification, and language model fine-tuning.

### FAQ

**Q: Can I scrape private subreddits?**
A: No. Only public subreddits are accessible.

**Q: What about Reddit's API pricing?**
A: This Actor uses Reddit's public JSON endpoints, not the official API. No API key or payment required.

**Q: How many posts can I scrape per run?**
A: Up to 100 per subreddit, multiple subreddits per run. For large-scale scraping, use multiple runs.

**Q: Will Reddit block me?**
A: The Actor includes rate limiting. Use Apify's proxy pool for consistent access with large batches.

### Notes & Limitations

- **Public subreddits only**: Private and quarantined subreddits are not accessible.
- **JSON endpoints**: Uses Reddit's `.json` endpoint (old.reddit.com). No OAuth required.
- **Rate limiting**: 2s+ delay between requests. Reddit may throttle aggressive scraping.
- **Comment depth**: Extracts top-level comments only. Nested reply trees are not included.
- **NSFW content**: Posts flagged as NSFW include the `isNsfw: true` field.
- **For research purposes**: Use in compliance with Reddit's Terms of Service.
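
When Reddit does throttle (HTTP 429), a client-side retry schedule with exponential backoff keeps delays above the Actor's 1000 ms floor. A hypothetical sketch of such a schedule, not the Actor's internal logic:

```python
# Hypothetical backoff schedule for retrying throttled requests.
# Doubles the delay on each attempt, starting at the Actor's documented
# minimum of 1000 ms, capped at 30 s. Not the Actor's internal code.

def backoff_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 30_000) -> int:
    return min(cap_ms, base_ms * 2 ** attempt)

print([backoff_ms(a) for a in range(6)])
# [1000, 2000, 4000, 8000, 16000, 30000]
```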

# Actor input Schema

## `subreddits` (type: `array`):

Subreddit names without r/ (e.g. \["webdev", "javascript", "datascience"])

## `postUrls` (type: `array`):

Direct Reddit post URLs to scrape with comments

## `sortBy` (type: `string`):

How to sort subreddit posts

## `timeFilter` (type: `string`):

Time range when sorting by top

## `maxPostsPerSubreddit` (type: `integer`):

Maximum posts to extract per subreddit (1-100)

## `includeComments` (type: `boolean`):

Extract top-level comments for each post

## `maxCommentsPerPost` (type: `integer`):

Maximum comments to extract per post (1-50)

## `delayBetweenRequestsMs` (type: `integer`):

Delay between requests in milliseconds (min 1000)

## Actor input object example

```json
{
  "subreddits": [
    "webdev",
    "javascript"
  ],
  "postUrls": [],
  "sortBy": "hot",
  "timeFilter": "week",
  "maxPostsPerSubreddit": 25,
  "includeComments": false,
  "maxCommentsPerPost": 10,
  "delayBetweenRequestsMs": 2000
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "subreddits": [
        "webdev",
        "javascript"
    ],
    "postUrls": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("miccho27/reddit-post-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "subreddits": [
        "webdev",
        "javascript",
    ],
    "postUrls": [],
}

# Run the Actor and wait for it to finish
run = client.actor("miccho27/reddit-post-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "subreddits": [
    "webdev",
    "javascript"
  ],
  "postUrls": []
}' |
apify call miccho27/reddit-post-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=miccho27/reddit-post-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Reddit Post & Comment Scraper",
        "description": "Scrape Reddit posts and comments from any subreddit or thread URL. Extract titles, scores, authors, comment trees, and metadata. No Reddit API key or OAuth required.",
        "version": "1.0",
        "x-build-id": "2MdpOyHBhawb84NcC"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/miccho27~reddit-post-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-miccho27-reddit-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/miccho27~reddit-post-scraper/runs": {
            "post": {
                "operationId": "runs-sync-miccho27-reddit-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/miccho27~reddit-post-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-miccho27-reddit-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "subreddits": {
                        "title": "Subreddits",
                        "type": "array",
                        "description": "Subreddit names without r/ (e.g. [\"webdev\", \"javascript\", \"datascience\"])",
                        "items": {
                            "type": "string"
                        }
                    },
                    "postUrls": {
                        "title": "Post URLs",
                        "type": "array",
                        "description": "Direct Reddit post URLs to scrape with comments",
                        "items": {
                            "type": "string"
                        }
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "hot",
                            "new",
                            "top",
                            "rising"
                        ],
                        "type": "string",
                        "description": "How to sort subreddit posts",
                        "default": "hot"
                    },
                    "timeFilter": {
                        "title": "Time Filter (for top sort)",
                        "enum": [
                            "hour",
                            "day",
                            "week",
                            "month",
                            "year",
                            "all"
                        ],
                        "type": "string",
                        "description": "Time range when sorting by top",
                        "default": "week"
                    },
                    "maxPostsPerSubreddit": {
                        "title": "Max Posts Per Subreddit",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum posts to extract per subreddit (1-100)",
                        "default": 25
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Extract top-level comments for each post",
                        "default": false
                    },
                    "maxCommentsPerPost": {
                        "title": "Max Comments Per Post",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Maximum comments to extract per post (1-50)",
                        "default": 10
                    },
                    "delayBetweenRequestsMs": {
                        "title": "Delay Between Requests (ms)",
                        "minimum": 1000,
                        "type": "integer",
                        "description": "Delay between requests in milliseconds (min 1000)",
                        "default": 2000
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
