# Substack Scraper (`goat255/substack-scraper`) Actor

Scrape Substack posts and comments without a login. Pull a publication's archive of posts by newest or top, a single post by URL, and full comment trees. Walks pagination up to your chosen limit.

- **URL**: https://apify.com/goat255/substack-scraper.md
- **Developed by:** [Goutam Soni](https://apify.com/goat255) (community)
- **Categories:** Social media, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

The best way to integrate an Actor depends on your stack, as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Substack Scraper

Scrape Substack posts and comments from any publication, newsletter, or post URL with no login and no API key required. Extract titles, authors, like and comment counts, restacks, word counts, tags, cover images, full post text, and complete comment threads as clean structured data.

### What it does

- **Publication archive scraper.** Give it a Substack username, address, or custom domain and it pulls the publication's posts, newest first or by top, paging automatically up to your chosen limit.
- **Single post scraper.** Pass any post URL and get that post back with its full plain-text body.
- **Comment scraper.** Optionally fetch the full comment tree for each post, flattened into clean rows with author, body, likes, and reply counts.
- **Rich metrics on every post.** Like count, comment count, restacks, and word count, all in one row.
- **No login, no password, no API key.** Just provide publications or post links.
- **Custom domains supported.** Works with both `*.substack.com` addresses and publications on their own domain.
- **Tunable scale.** Set how many posts per publication, how many comments per post, and how many publications to process in parallel.

### Use cases

- **Newsletter market research.** Track what topics, formats, and headlines are getting the most likes, comments, and restacks across the newsletters in your niche.
- **Lead generation.** Build lists of active writers and publications, with author names and post engagement, for outreach or partnerships.
- **Content monitoring.** Watch a set of publications and capture every new post with its metrics on a schedule.
- **Competitive and trend analysis.** Compare engagement and posting cadence across publications you follow.
- **Audience and sentiment research.** Pull comment threads to understand what readers actually say under top posts.
- **Dataset building.** Export thousands of posts to CSV, JSON, or Excel for analysis, dashboards, or model training.

### Input

| Field | Type | Description |
|---|---|---|
| `publications` | array | Publications to pull posts from. Use a username (`example`), a full address (`example.substack.com`), or a custom domain (`news.example.com`). |
| `postUrls` | array | Specific post links to fetch. The full post body is always included for these. |
| `maxItemsPerSource` | integer | Cap on posts returned per publication. The Actor pages through the archive until this cap is reached or the archive runs out. Default 50. |
| `sort` | string | Order of posts in a publication archive. `new` (newest first) or `top`. Default `new`. |
| `includeContent` | boolean | When on, each post carries a plain-text body. Default off. |
| `includeComments` | boolean | When on, the comment tree is also fetched for each post. Default off (keeps runs fast and cheap). |
| `maxCommentsPerPost` | integer | Cap on comments returned per post when comments are included. Default 100. |
| `concurrency` | integer | How many publications to process in parallel. Default 5. |
| `proxyConfig` | object | Proxy configuration. Residential is the default and recommended setting for the most reliable results. |

#### Example input

```json
{
  "publications": ["example.substack.com", "news.example.com"],
  "maxItemsPerSource": 200,
  "sort": "new",
  "includeContent": true,
  "includeComments": false,
  "proxyConfig": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
````

### Output

Each post is one clean row. Columns are ordered by importance: identity first, then engagement metrics, then content, then media, then metadata.

```json
{
  "type": "post",
  "id": 123456789,
  "url": "https://example.substack.com/p/an-example-post",
  "slug": "an-example-post",
  "authorName": "Jane Doe",
  "authorHandle": "janedoe",
  "likeCount": 240,
  "commentCount": 18,
  "restackCount": 12,
  "wordCount": 1200,
  "title": "An example post",
  "subtitle": "A short standfirst.",
  "description": "A one line summary.",
  "tags": ["example", "writing"],
  "content": "Full plain-text body when includeContent is on.",
  "coverImage": "https://example.com/cover.png",
  "podcastUrl": null,
  "audience": "everyone",
  "postType": "newsletter",
  "publishedAt": "2026-06-01T12:00:00.000Z"
}
```

Each comment is also one row:

```json
{
  "type": "comment",
  "id": 987654321,
  "postId": 123456789,
  "authorName": "Reader Example",
  "authorHandle": "readerexample",
  "likeCount": 5,
  "restackCount": 0,
  "replyCount": 2,
  "body": "Great piece, thanks for writing this.",
  "createdAt": "2026-06-01T14:30:00.000Z",
  "editedAt": null
}
```

**Key fields:** `type` tells a post row from a comment row. `likeCount`, `commentCount`, and `restackCount` are the engagement metrics. `audience` is `everyone` for free posts or `only_paid` for paywalled ones. `publishedAt` is an ISO timestamp.
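
Because posts and comments share one dataset, a consumer usually separates them by `type` and then joins comments to posts through `postId`. The following is a minimal sketch of that step, assuming rows shaped like the two examples above; the helper names `split_rows` and `comments_by_post` and the sample values are illustrative and not part of the Actor.

```python
from collections import defaultdict


def split_rows(items):
    """Separate post rows from comment rows using the `type` field."""
    posts = [row for row in items if row.get("type") == "post"]
    comments = [row for row in items if row.get("type") == "comment"]
    return posts, comments


def comments_by_post(comments):
    """Group comment rows under the `postId` of the post they belong to."""
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment["postId"]].append(comment)
    return dict(grouped)


# Sample rows shaped like the examples above (values are illustrative).
sample_items = [
    {"type": "post", "id": 123456789, "title": "An example post", "likeCount": 240},
    {"type": "comment", "id": 987654321, "postId": 123456789, "body": "Great piece."},
    {"type": "comment", "id": 987654322, "postId": 123456789, "body": "Agreed."},
]

posts, comments = split_rows(sample_items)
grouped = comments_by_post(comments)
for post in posts:
    # Print each post title with the number of comment rows attached to it.
    print(post["title"], len(grouped.get(post["id"], [])))
```

In practice `sample_items` would be replaced by the items read from the run's dataset.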

### FAQ

**Is it free? How is it priced?**
You pay only for what you scrape, billed per result. There is no separate start fee, so small test runs cost almost nothing and large pulls scale predictably.

**Do I need a Substack login or API key?**
No. It reads only public data, so no account, password, or key is needed.

**How many posts can it return per publication?**
As many as the publication has published. Set `maxItemsPerSource` to your target and the scraper pages through the archive until it reaches that number or runs out of posts.
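
The stopping rule described here (stop at the cap or when the archive is exhausted) can be sketched in a few lines. This is not the Actor's implementation; `fetch_page` is a hypothetical callable, the page size of 25 is an assumption, and the in-memory archive exists only so the sketch runs without network access.

```python
def collect_posts(fetch_page, max_items, page_size=25):
    """Collect up to max_items posts, stopping early if the archive runs out.

    fetch_page(offset, limit) is a hypothetical callable returning a list of posts.
    """
    collected = []
    offset = 0
    while len(collected) < max_items:
        # Never ask for more than is still needed to reach the cap.
        limit = min(page_size, max_items - len(collected))
        page = fetch_page(offset, limit)
        if not page:
            break  # archive ran out before the cap was reached
        collected.extend(page)
        offset += len(page)
    return collected[:max_items]


# Stand-in archive of 60 posts (illustrative data only).
fake_archive = [{"id": n} for n in range(60)]


def fake_fetch_page(offset, limit):
    return fake_archive[offset:offset + limit]


capped = collect_posts(fake_fetch_page, max_items=50)
everything = collect_posts(fake_fetch_page, max_items=200)
print(len(capped), len(everything))
```

With a 60-post archive, a cap of 50 stops at the cap, while a cap of 200 stops when the archive is exhausted.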

**Can it scrape paywalled (subscriber-only) post content?**
No. Subscriber-only posts return their public fields (title, author, metrics, description) but not the locked body text, because no login is used. Free posts return their full body when `includeContent` is on.
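
When `includeContent` is on, it can be useful to check which posts came back without a body. The sketch below assumes that a locked post has a missing or empty `content` field; how the Actor actually represents a locked body is not stated here, so the check treats both cases the same. The helper name and sample rows are illustrative.

```python
from collections import Counter


def posts_without_body(posts):
    """Return posts whose `content` is missing or empty (assumed for locked posts)."""
    return [post for post in posts if not post.get("content")]


# Illustrative rows; only the fields needed for the check are included.
sample_posts = [
    {"id": 1, "audience": "everyone", "content": "Full plain-text body."},
    {"id": 2, "audience": "only_paid", "content": None},
    {"id": 3, "audience": "only_paid"},
]

missing = posts_without_body(sample_posts)
# Count the missing bodies per audience value to confirm they are paywalled posts.
audience_counts = Counter(post["audience"] for post in missing)
print(audience_counts)
```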

**Does it work with custom domains?**
Yes. Enter the publication as a username, a `*.substack.com` address, or a custom domain. All three are accepted.

**How fast is it?**
Multiple publications are processed in parallel (set by `concurrency`), and each archive is paged efficiently. A few hundred posts typically complete in well under a minute.
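
As a rough illustration of what a `concurrency` setting means, the sketch below processes a list of sources with a bounded worker pool. It is a generic pattern using Python's standard library, not the Actor's code; `fake_worker` is a hypothetical stand-in for scraping one publication.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(sources, worker, concurrency=5):
    """Run worker(source) for every source with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in the same order as the input sources.
        return list(pool.map(worker, sources))


def fake_worker(source):
    # Hypothetical stand-in for "scrape one publication".
    return {"source": source, "nameLength": len(source)}


results = process_all(["example", "news.example.com"], fake_worker, concurrency=2)
print(results)
```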

### Notes

- Counts (likes, comments, restacks) reflect their values at scrape time.
- Only public posts and comments are returned.
- Output can be exported as JSON, CSV, Excel, or HTML, or pulled via the Apify API.
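
If you post-process rows locally instead of using the platform export, a CSV can be produced with the standard library. This is a minimal sketch under stated assumptions: the column selection in `POST_COLUMNS` and the choice to join `tags` with `;` are illustrative, not the format of Apify's own CSV export.

```python
import csv
import io

# Illustrative column selection taken from the post row example in this README.
POST_COLUMNS = ["id", "url", "title", "authorName", "likeCount",
                "commentCount", "restackCount", "wordCount", "tags", "publishedAt"]


def posts_to_csv(posts, columns=POST_COLUMNS):
    """Serialize post rows to CSV text, joining the `tags` list into one cell."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that are not listed in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        row = dict(post)
        row["tags"] = ";".join(post.get("tags") or [])
        writer.writerow(row)
    return buffer.getvalue()


sample_posts = [{
    "type": "post",
    "id": 123456789,
    "url": "https://example.substack.com/p/an-example-post",
    "title": "An example post",
    "authorName": "Jane Doe",
    "likeCount": 240,
    "commentCount": 18,
    "restackCount": 12,
    "wordCount": 1200,
    "tags": ["example", "writing"],
    "publishedAt": "2026-06-01T12:00:00.000Z",
}]

csv_text = posts_to_csv(sample_posts)
print(csv_text)
```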

# Actor input Schema

## `publications` (type: `array`):

Publications to pull posts from. Use a username, a full publication address, or a custom domain. Example: example, example.substack.com, https://example.substack.com, news.example.com.

## `postUrls` (type: `array`):

Specific post links to fetch. The post body is included. Example: https://example.substack.com/p/an-example-post.

## `maxItemsPerSource` (type: `integer`):

Cap on posts returned per publication. Pagination is walked across multiple pages until this is reached or the archive is exhausted.

## `sort` (type: `string`):

Order of posts in a publication archive. Newest first or top.

## `includeContent` (type: `boolean`):

When on, each post carries a plain-text content body. Single posts from Post URLs always include the body.

## `includeComments` (type: `boolean`):

When on, the comment tree is also fetched for each post. Off by default to keep runs fast and cheap.

## `maxCommentsPerPost` (type: `integer`):

Cap on comments returned per post when comments are included.

## `concurrency` (type: `integer`):

How many sources to process in parallel.

## `proxyConfig` (type: `object`):

Apify proxy. RESIDENTIAL is the default and recommended option for the most reliable results.

## Actor input object example

```json
{
  "publications": [
    "astralcodexten.substack.com"
  ],
  "postUrls": [],
  "maxItemsPerSource": 50,
  "sort": "new",
  "includeContent": false,
  "includeComments": false,
  "maxCommentsPerPost": 100,
  "concurrency": 5,
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```
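
Before starting a run, it can help to validate the input on the client side. The sketch below builds an input object and checks the `sort` values and the integer ranges listed in the OpenAPI specification in this document (`maxItemsPerSource` 1 to 50000, `maxCommentsPerPost` 1 to 10000, `concurrency` 1 to 20). The helper `build_input` is hypothetical and is not part of any Apify client.

```python
ALLOWED_SORTS = {"new", "top"}

# (minimum, maximum) pairs copied from the OpenAPI specification in this document.
INTEGER_BOUNDS = {
    "maxItemsPerSource": (1, 50000),
    "maxCommentsPerPost": (1, 10000),
    "concurrency": (1, 20),
}


def build_input(publications=(), post_urls=(), sort="new", **options):
    """Return an Actor input dict, raising ValueError on out-of-range values."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {sorted(ALLOWED_SORTS)}")
    run_input = {
        "publications": list(publications),
        "postUrls": list(post_urls),
        "sort": sort,
    }
    for name, value in options.items():
        if name in INTEGER_BOUNDS:
            low, high = INTEGER_BOUNDS[name]
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not low <= value <= high:
                raise ValueError(f"{name} must be an integer between {low} and {high}")
        # Other fields (includeContent, includeComments, proxyConfig) pass through as given.
        run_input[name] = value
    return run_input


example_input = build_input(
    publications=["example.substack.com"],
    maxItemsPerSource=200,
    includeContent=True,
)
print(example_input)
```

The resulting dict can be passed as the run input in the client examples in the API section.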

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "publications": [
        "astralcodexten.substack.com"
    ],
    "postUrls": [],
    "proxyConfig": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("goat255/substack-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "publications": ["astralcodexten.substack.com"],
    "postUrls": [],
    "proxyConfig": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("goat255/substack-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "publications": [
    "astralcodexten.substack.com"
  ],
  "postUrls": [],
  "proxyConfig": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call goat255/substack-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=goat255/substack-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Scraper",
        "description": "Scrape Substack posts and comments without a login. Pull a publication's archive of posts by newest or top, a single post by URL, and full comment trees. Walks pagination up to your chosen limit.",
        "version": "0.1",
        "x-build-id": "IG2yV0j8ArIK12lMA"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/goat255~substack-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-goat255-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/goat255~substack-scraper/runs": {
            "post": {
                "operationId": "runs-sync-goat255-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/goat255~substack-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-goat255-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "publications": {
                        "title": "Publications (archive mode)",
                        "type": "array",
                        "description": "Publications to pull posts from. Use a username, a full publication address, or a custom domain. Example: example, example.substack.com, https://example.substack.com, news.example.com.",
                        "default": [
                            "astralcodexten.substack.com"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "postUrls": {
                        "title": "Post URLs (single post mode)",
                        "type": "array",
                        "description": "Specific post links to fetch. The post body is included. Example: https://example.substack.com/p/an-example-post.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItemsPerSource": {
                        "title": "Max posts per publication",
                        "minimum": 1,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Cap on posts returned per publication. Pagination is walked across multiple pages until this is reached or the archive is exhausted.",
                        "default": 50
                    },
                    "sort": {
                        "title": "Sort",
                        "enum": [
                            "new",
                            "top"
                        ],
                        "type": "string",
                        "description": "Order of posts in a publication archive. Newest first or top.",
                        "default": "new"
                    },
                    "includeContent": {
                        "title": "Include post body text",
                        "type": "boolean",
                        "description": "When on, each post carries a plain-text content body. Single posts from Post URLs always include the body.",
                        "default": false
                    },
                    "includeComments": {
                        "title": "Include comments",
                        "type": "boolean",
                        "description": "When on, the comment tree is also fetched for each post. Off by default to keep runs fast and cheap.",
                        "default": false
                    },
                    "maxCommentsPerPost": {
                        "title": "Max comments per post",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Cap on comments returned per post when comments are included.",
                        "default": 100
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "How many sources to process in parallel.",
                        "default": 5
                    },
                    "proxyConfig": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy. RESIDENTIAL is the default and recommended option for the most reliable results.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
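
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a minimal sketch using only Python's standard library; the helper names, the 300-second timeout, and the assumption that the response body is JSON are illustrative and are not stated in the specification.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path as listed in the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/goat255~substack-scraper/run-sync-get-dataset-items"


def build_run_url(token):
    """Build the endpoint URL with the `token` query parameter from the spec."""
    return f"{API_BASE}{ACTOR_PATH}?{urllib.parse.urlencode({'token': token})}"


def build_request(token, run_input):
    """Prepare a POST request carrying the Actor input as a JSON body."""
    return urllib.request.Request(
        build_run_url(token),
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token, run_input, timeout=300):
    """Send the request and decode the response (makes a real network call)."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the endpoint answers with a JSON document of dataset items.
        return json.loads(response.read().decode("utf-8"))


# Build the request without sending it, so the sketch runs offline.
request = build_request("<YOUR_API_TOKEN>", {"publications": ["example.substack.com"]})
print(request.get_method(), request.full_url)
```

Calling `run_actor` with a real token starts a run and blocks until it finishes, so a generous timeout is advisable for large archives.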
