# Substack Scraper – Newsletter Posts, Engagement & Monitoring (`bitofacoder/substack-scraper`) Actor

Scrape any Substack newsletter's full post archive with engagement metadata (likes, comments, paywall status, word count, authors), fetch single posts, and monitor newsletters incrementally — via Substack's public JSON API. No login.

- **URL**: https://apify.com/bitofacoder/substack-scraper.md
- **Developed by:** [Bobby](https://apify.com/bitofacoder) (community)
- **Categories:** Lead generation, Social media, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Substack Scraper – Newsletter Posts, Engagement & Monitoring

Give it the Substack newsletters you care about and pull their full post history with the numbers that matter — likes, comments, paywall status, word count, and authors — as a clean CSV/JSON. Then keep it fresh: **monitor mode** returns only the posts published since your last run, so a scheduled task becomes a live feed instead of a re-scrape.

No login, no cookies. Works with both `name.substack.com` subdomains and custom domains (e.g. `www.thefp.com`).

> **You provide the newsletter URLs.** This Actor scrapes the publications you specify; it does not discover newsletters by topic or keyword (see *What this does — and doesn't — return* below).

### What you can do

| Mode | What it does | Fields it uses |
|---|---|---|
| **Publication archive** | Every post from one or more newsletters, newest-first (or top). | `publicationUrls`, `sort`, `maxItems`, `includePostBody` |
| **Single posts** | Full detail (incl. body + word count) for specific post URLs. | `postUrls` |
| **Monitor** *(incremental)* | Only posts published **since the last run**, per newsletter. State persists between runs. | `publicationUrls`, `maxItems`, `monitorStoreName` |

### Example input

Scrape the latest 50 posts from two newsletters:

```json
{
  "mode": "publication",
  "publicationUrls": ["astralcodexten.substack.com", "https://www.thefp.com"],
  "maxItems": 50
}
````

Track new posts daily (schedule this run; each run only returns what's new):

```json
{
  "mode": "monitor",
  "publicationUrls": ["platformer", "stratechery.com"],
  "maxItems": 50
}
```
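
Because each mode reads a different set of fields, it can help to build the input through a small client-side helper that rejects mismatched combinations before a run starts. The sketch below is only an illustration: `build_input` is a hypothetical helper, not part of the Actor or of `apify-client`, and the field names are the ones listed in the mode table.

```python
def build_input(mode, publication_urls=None, post_urls=None, max_items=10, **extra):
    """Return an Actor input dict for the given mode (hypothetical helper)."""
    if mode in ("publication", "monitor"):
        if not publication_urls:
            raise ValueError(f"mode '{mode}' needs at least one entry in publicationUrls")
        actor_input = {
            "mode": mode,
            "publicationUrls": list(publication_urls),
            "maxItems": max_items,
        }
    elif mode == "post":
        if not post_urls:
            raise ValueError("mode 'post' needs at least one entry in postUrls")
        actor_input = {"mode": mode, "postUrls": list(post_urls)}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Pass-through for optional fields such as sort="top" or includePostBody=True.
    actor_input.update(extra)
    return actor_input


daily_monitor = build_input("monitor", ["platformer", "stratechery.com"], max_items=50)
print(daily_monitor)
```

The resulting dict can be passed as `run_input` to the Python client call shown in the API section.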

### Output (per post)

```json
{
  "type": "post",
  "title": "Preliminary Thoughts On The Midjourney Scanner",
  "publication": "www.astralcodexten.com",
  "url": "https://www.astralcodexten.com/p/preliminary-thoughts-on-the-midjourney",
  "postDate": "2026-06-19T...Z",
  "postType": "newsletter",
  "audience": "everyone",
  "isPaywalled": false,
  "reactionCount": 270,
  "commentCount": 166,
  "wordcount": 3327,
  "authors": [{ "name": "Scott Alexander", "handle": "...", "url": "https://substack.com/@..." }],
  "authorNames": "Scott Alexander"
}
```
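
Once items are in hand, the engagement fields lend themselves to simple ranking. The following is a minimal sketch, assuming items shaped like the sample above; the first item reuses the sample's numbers, the second is made up, and "likes plus comments" is just one possible engagement measure, not something the Actor computes.

```python
# First item mirrors the sample output; the second is invented for illustration.
items = [
    {"title": "Preliminary Thoughts On The Midjourney Scanner",
     "reactionCount": 270, "commentCount": 166, "wordcount": 3327, "isPaywalled": False},
    {"title": "Hypothetical short post",
     "reactionCount": 40, "commentCount": 5, "wordcount": 900, "isPaywalled": True},
]


def engagement(item):
    """Public likes plus public comments; missing or null counts count as 0."""
    return (item.get("reactionCount") or 0) + (item.get("commentCount") or 0)


def engagement_per_1000_words(item):
    """Normalize engagement by length; returns None when wordcount is unavailable."""
    words = item.get("wordcount") or 0
    return round(engagement(item) * 1000 / words, 1) if words else None


ranked = sorted(items, key=engagement, reverse=True)
for item in ranked:
    print(item["title"], engagement(item), engagement_per_1000_words(item))
```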

### What this does — and doesn't — return

To keep expectations honest:

- ✅ **Public, structured post data**: titles, subtitles, slugs, URLs, publish dates, post type (newsletter / podcast / thread), audience + paywall flag, public like counts, public comment counts, word counts, authors, cover images. Optional full HTML body for non-paywalled posts.
- ❌ **No private analytics.** Substack does not expose open rates, click rates, or private subscriber counts publicly, so this Actor cannot return them. Anyone promising those from a no-login scraper is over-promising.
- ❌ **No email/contact extraction.** This is a content-intelligence tool, not a lead-gen scraper.
- ❌ **No keyword/topic discovery.** Substack's global search endpoint is gated, so this Actor scrapes newsletters you name rather than finding them by topic. Point it at the publications you already know.
- ⚠️ **Paywalled posts return a preview body only.** For `only_paid`/`founding` posts, `wordcount` reflects the full article but the fetched `bodyHtml` is just the free preview — the Actor flags these with `isPreviewOnly: true` so you can filter them. All public metadata (title, engagement counts, audience, authors) is still returned.
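
When `includePostBody` is on, the `isPreviewOnly` flag described above is what separates complete bodies from paywall previews. A minimal sketch of that filtering step, using made-up items (only the field names `bodyHtml`, `audience`, and `isPreviewOnly` come from this README):

```python
def split_by_body(items):
    """Separate items with a complete body from preview-only (paywalled) ones."""
    full, preview_only = [], []
    for item in items:
        # A missing flag is treated as "not preview-only".
        target = preview_only if item.get("isPreviewOnly") else full
        target.append(item)
    return full, preview_only


# Invented items for illustration.
items = [
    {"title": "Free essay", "audience": "everyone", "bodyHtml": "<p>Full text</p>"},
    {"title": "Subscriber post", "audience": "only_paid",
     "bodyHtml": "<p>Teaser</p>", "isPreviewOnly": True},
]

full, preview_only = split_by_body(items)
print([i["title"] for i in full], [i["title"] for i in preview_only])
```

Text analysis that depends on the whole article (for example anything comparing `bodyHtml` length with `wordcount`) would normally run on `full` only.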

### Notes

- **Custom domains just work.** Migrated newsletters 301-redirect from their `*.substack.com` subdomain; the Actor follows the redirect automatically.
- **Proxy is off by default.** Substack's public API is open; enable Apify Proxy only for large runs to spread the per-IP rate limit.
- **`maxTotalItems`** caps results (and charges) across the whole run, independent of the per-newsletter `maxItems`.

### Pricing

Pay per result — you're charged once per post stored to the dataset. Cap any run with `maxTotalItems`.
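
For budgeting, the per-result model reduces to simple arithmetic. The sketch below assumes the listed "from $3.00 / 1,000 results" rate applies as-is; the actual rate may differ by plan, so treat `price_per_1000` as an adjustable assumption.

```python
from decimal import Decimal


def estimate_cost_usd(result_count, price_per_1000=Decimal("3.00")):
    """Estimated charge in USD for a number of stored posts (rate is an assumption)."""
    cost = Decimal(result_count) * price_per_1000 / Decimal(1000)
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(130))   # two or three newsletters at modest limits
print(estimate_cost_usd(1000))  # a run capped with maxTotalItems = 1000
```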

# Actor input Schema

## `mode` (type: `string`):

What to scrape. Each mode uses a different set of fields below.

## `publicationUrls` (type: `array`):

Used by 'publication' and 'monitor' modes. Each entry is a Substack subdomain (acx.substack.com), a custom domain (www.thefp.com), or a bare name (platformer → platformer.substack.com). A single run scrapes every newsletter listed.
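
To illustrate the three accepted forms, here is a rough client-side sketch that maps an entry to a hostname. It is an assumption about how such normalization could work, useful for de-duplicating your own lists; `to_hostname` is a hypothetical helper and the Actor's internal handling may differ.

```python
from urllib.parse import urlparse


def to_hostname(entry):
    """Map a subdomain, custom domain, URL, or bare name to a hostname (sketch)."""
    entry = entry.strip()
    # Add a scheme when missing so urlparse places the host in .netloc.
    parsed = urlparse(entry if "://" in entry else "https://" + entry)
    host = parsed.netloc.lower()
    # A bare name such as "platformer" has no dot; treat it as a Substack subdomain.
    if "." not in host:
        host += ".substack.com"
    return host


for entry in ["acx.substack.com", "https://www.thefp.com", "platformer"]:
    print(entry, "->", to_hostname(entry))
```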

## `sort` (type: `string`):

For 'publication' mode: archive order. 'new' = newest first, 'top' = most popular.

## `includePostBody` (type: `boolean`):

Fetch each post's full HTML body (one extra HTTP request per post — slower and uses more compute, though it does NOT add billable result items). Full body is only available for free posts; paywalled posts return the free preview only and are flagged with isPreviewOnly.

## `postUrls` (type: `array`):

Used by 'post' mode. Full Substack post URLs, e.g. `['https://www.astralcodexten.com/p/open-thread-439']`.

## `monitorStoreName` (type: `string`):

Named key-value store that holds per-newsletter watermarks across runs. IMPORTANT: give each separate monitor task its OWN unique name — two tasks sharing a name (including this default) will overwrite each other's watermarks, and concurrent runs on the same name can lose new posts. Use the same name across runs of the SAME task to keep tracking incrementally.
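
The watermark idea can be pictured with a short sketch. This is a conceptual illustration only: it assumes a watermark is the newest `postDate` seen per newsletter, which is a guess at the general technique rather than the Actor's documented state format, and all data below is made up.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new_posts(posts, watermarks):
    """Return (posts newer than their newsletter's watermark, updated watermarks)."""
    updated = dict(watermarks)
    new_posts = []
    for post in posts:
        pub, stamp = post["publication"], post["postDate"]
        seen = watermarks.get(pub)
        # No watermark yet means a first (baseline) run: everything counts as new.
        if seen is None or parse_ts(stamp) > parse_ts(seen):
            new_posts.append(post)
            if pub not in updated or parse_ts(stamp) > parse_ts(updated[pub]):
                updated[pub] = stamp
    return new_posts, updated


posts = [
    {"publication": "a.substack.com", "postDate": "2026-01-02T09:00:00Z", "title": "old"},
    {"publication": "a.substack.com", "postDate": "2026-01-05T09:00:00Z", "title": "new"},
    {"publication": "b.substack.com", "postDate": "2026-01-03T09:00:00Z", "title": "first seen"},
]
state = {"a.substack.com": "2026-01-03T00:00:00Z"}

new_posts, state = select_new_posts(posts, state)
print([p["title"] for p in new_posts])
print(state)
```

Under this model, two tasks writing to the same `state` would advance each other's watermarks and hide posts from one another, which is why each task should get its own store name.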

## `resetMonitorState` (type: `boolean`):

For 'monitor' mode: clear the saved watermarks before running so this run re-baselines (re-emits recent posts) instead of returning only what's new. Use to start fresh on a newsletter.

## `maxItems` (type: `integer`):

Upper limit of posts per newsletter (in monitor mode the limit also applies per newsletter). The default is kept low for a fast, cheap first run; raise it to pull more. The full archive of a large newsletter can be hundreds of posts.

## `maxTotalItems` (type: `integer`):

Hard ceiling on the total number of results — and pay-per-result charges — across ALL newsletters in one run. Leave empty for no overall cap. 'Max posts per newsletter' above still applies to each.
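
How the two limits combine can be sketched as follows. This reflects one reading of the descriptions above (the per-newsletter cap is applied first, then the overall cap); `posts_available` is a hypothetical list of archive sizes you would have to estimate yourself.

```python
def expected_result_count(posts_available, max_items, max_total_items=None):
    """Upper bound on stored results for one run under both caps (sketch)."""
    # Each newsletter contributes at most maxItems posts.
    per_newsletter = [min(count, max_items) for count in posts_available]
    total = sum(per_newsletter)
    # maxTotalItems, when set, limits the sum across all newsletters.
    return total if max_total_items is None else min(total, max_total_items)


archives = [500, 30, 80]  # made-up archive sizes for three newsletters
print(expected_result_count(archives, max_items=50))
print(expected_result_count(archives, max_items=50, max_total_items=100))
```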

## `requestDelayMs` (type: `integer`):

Optional pause before each HTTP request. Leave at 0 for normal runs. Raise to ~200–500 for very large archives or includePostBody runs (hundreds of requests to one site) when running without a proxy, to avoid rate limiting.
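
To get a feel for the trade-off, the added wait can be estimated from the request count. The sketch below only counts the one-extra-request-per-post cost of `includePostBody` mentioned above; archive listing requests are also delayed but their number is not documented here, so the result is a lower bound.

```python
def min_added_wait_seconds(post_count, delay_ms, include_post_body=True):
    """Lower bound on time added by requestDelayMs for post-body requests (sketch)."""
    body_requests = post_count if include_post_body else 0
    return body_requests * delay_ms / 1000


# 500 posts with bodies at a 300 ms delay
print(min_added_wait_seconds(500, 300))
```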

## `proxyConfiguration` (type: `object`):

Off by default — Substack's public API is open and small runs don't need a proxy. Enable Apify Proxy to rotate IPs and spread the per-IP rate limit on large runs.

## Actor input object example

```json
{
  "mode": "publication",
  "publicationUrls": [
    "astralcodexten.substack.com"
  ],
  "sort": "new",
  "includePostBody": false,
  "postUrls": [
    "https://www.astralcodexten.com/p/open-thread-439"
  ],
  "monitorStoreName": "substack-monitor-state",
  "resetMonitorState": false,
  "maxItems": 10,
  "requestDelayMs": 0,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
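const input = {
    // "mode" is required by the input schema; "publication" is its default value
    "mode": "publication",
    "publicationUrls": [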
        "astralcodexten.substack.com"
    ],
    "postUrls": [
        "https://www.astralcodexten.com/p/open-thread-439"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("bitofacoder/substack-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
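run_input = {
    # "mode" is required by the input schema; "publication" is its default value
    "mode": "publication",
    "publicationUrls": ["astralcodexten.substack.com"],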
    "postUrls": ["https://www.astralcodexten.com/p/open-thread-439"],
}

# Run the Actor and wait for it to finish
run = client.actor("bitofacoder/substack-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "publication",
  "publicationUrls": [
    "astralcodexten.substack.com"
  ],
  "postUrls": [
    "https://www.astralcodexten.com/p/open-thread-439"
  ]
}' |
apify call bitofacoder/substack-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=bitofacoder/substack-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Scraper – Newsletter Posts, Engagement & Monitoring",
        "description": "Scrape any Substack newsletter's full post archive with engagement metadata (likes, comments, paywall status, word count, authors), fetch single posts, and monitor newsletters incrementally — via Substack's public JSON API. No login.",
        "version": "0.1",
        "x-build-id": "CM3H9qft2ARNLA5Uv"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/bitofacoder~substack-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-bitofacoder-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/bitofacoder~substack-scraper/runs": {
            "post": {
                "operationId": "runs-sync-bitofacoder-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/bitofacoder~substack-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-bitofacoder-substack-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "publication",
                            "post",
                            "monitor"
                        ],
                        "type": "string",
                        "description": "What to scrape. Each mode uses a different set of fields below.",
                        "default": "publication"
                    },
                    "publicationUrls": {
                        "title": "Newsletter URLs",
                        "type": "array",
                        "description": "Used by 'publication' and 'monitor' modes. A Substack subdomain (acx.substack.com), a custom domain (www.thefp.com), or a bare name (platformer → platformer.substack.com). One run scrapes each.",
                        "default": [
                            "astralcodexten.substack.com"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "sort": {
                        "title": "Sort order",
                        "enum": [
                            "new",
                            "top"
                        ],
                        "type": "string",
                        "description": "For 'publication' mode: archive order. 'new' = newest first, 'top' = most popular.",
                        "default": "new"
                    },
                    "includePostBody": {
                        "title": "Include full post body",
                        "type": "boolean",
                        "description": "Fetch each post's full HTML body (one extra HTTP request per post — slower and uses more compute, though it does NOT add billable result items). Full body is only available for free posts; paywalled posts return the free preview only and are flagged with isPreviewOnly.",
                        "default": false
                    },
                    "postUrls": {
                        "title": "Post URLs",
                        "type": "array",
                        "description": "Used by 'post' mode. Full Substack post URLs, e.g. ['https://www.astralcodexten.com/p/open-thread-439'].",
                        "items": {
                            "type": "string"
                        }
                    },
                    "monitorStoreName": {
                        "title": "Monitor state store name",
                        "type": "string",
                        "description": "Named key-value store that holds per-newsletter watermarks across runs. IMPORTANT: give each separate monitor task its OWN unique name — two tasks sharing a name (including this default) will overwrite each other's watermarks, and concurrent runs on the same name can lose new posts. Use the same name across runs of the SAME task to keep tracking incrementally.",
                        "default": "substack-monitor-state"
                    },
                    "resetMonitorState": {
                        "title": "Reset monitor state",
                        "type": "boolean",
                        "description": "For 'monitor' mode: clear the saved watermarks before running so this run re-baselines (re-emits recent posts) instead of returning only what's new. Use to start fresh on a newsletter.",
                        "default": false
                    },
                    "maxItems": {
                        "title": "Max posts per newsletter",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Upper limit of posts per newsletter (per newsletter in monitor mode). Default is kept low for a fast, cheap first run — raise it to pull more; the full archive of a large newsletter can be hundreds of posts.",
                        "default": 10
                    },
                    "maxTotalItems": {
                        "title": "Max items total (whole run)",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Hard ceiling on the total number of results — and pay-per-result charges — across ALL newsletters in one run. Leave empty for no overall cap. 'Max posts per newsletter' above still applies to each."
                    },
                    "requestDelayMs": {
                        "title": "Delay between requests (ms)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Optional pause before each HTTP request. Leave at 0 for normal runs. Raise to ~200–500 for very large archives or includePostBody runs (hundreds of requests to one site) when running without a proxy, to avoid rate limiting.",
                        "default": 0
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Off by default — Substack's public API is open and small runs don't need a proxy. Enable Apify Proxy to rotate IPs and spread the per-IP rate limit on large runs.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
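
For projects that do not use a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly. The following is a minimal sketch using only the Python standard library; `build_run_sync_request` is a hypothetical helper, and the commented-out send step assumes the endpoint returns the dataset items as a JSON array.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/bitofacoder~substack-scraper/run-sync-get-dataset-items"


def build_run_sync_request(token, actor_input):
    """Build (but do not send) the POST request described in the OpenAPI spec."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"mode": "publication", "publicationUrls": ["astralcodexten.substack.com"], "maxItems": 10},
)
print(request.get_method(), request.full_url)

# Sending the request starts a real, billable run, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)  # assumed to be a JSON array of posts
```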
