# Substack Scraper — Publication Posts | $1.50/1K (`bovi/substack-publication`) Actor

Scrape any Substack newsletter's post list via the official Substack public API. No auth, no proxy. Title, subtitle, date, free/paid audience, type, reactions, restacks, podcast\_url. Podcast posts billed at premium rate ($2.50/1K). Pay per post.

- **URL**: https://apify.com/bovi/substack-publication.md
- **Developed by:** [Vitalii Bondarev](https://apify.com/bovi) (community)
- **Categories:** Marketing, News, Social media
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 listings

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Substack Scraper — Publication Posts & Metadata | $1.50/1K | No Auth, Official API

For newsletter researchers, content agencies, competitive intelligence teams, and AI pipelines that need Substack content at scale.

**Pricing: $1.50 per 1,000 post records · $2.50 per 1,000 podcast posts** (posts where `type=podcast` and `podcast_url` is present — audio file URL included). No monthly fees. No authentication required.

Scrape any Substack publication's post listing via the **official public REST API** — no authentication, no proxy, no browser required. Returns structured metadata for every post: title, subtitle, publish date, audience (free vs paid), post type, reactions, comments, restacks, cover image, wordcount, and canonical URL.

**Pay per post returned** (PPE pricing).

---

### What you get

| Field | Description |
|---|---|
| `title` | Post title |
| `subtitle` | Deck / tagline |
| `canonical_url` | Full URL to the post |
| `slug` | URL slug |
| `post_date` | Published timestamp (ISO 8601 UTC) |
| `audience` | `everyone` (free) or `only_paid` (paywalled) |
| `type` | `newsletter`, `podcast`, `video`, etc. |
| `podcast_url` | Audio file URL (podcast posts only) |
| `reactions_count` | Total hearts |
| `comment_count` | Number of comments |
| `restacks` | Number of Substack reposts |
| `cover_image` | Cover image URL |
| `wordcount` | Approximate word count |
| `publication_slug` | Short publication identifier |
| `parse_confidence` | Data quality score 0–1 |
| `warnings` | List of missing-field codes |

> **Note:** Post bodies (full text / HTML) are not returned by the listing API. Paywalled posts return metadata only — body content requires a paid subscription and is not scraped.

---

### Pricing example

**$1.50 per 1,000 newsletter posts · $2.50 per 1,000 podcast posts** (posts where `type=podcast` and `podcast_url` is present). A 500-post archive = $0.75. Scraping 5 newsletters × 200 posts = $1.50. No per-run fee.
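To estimate a bill before running, the arithmetic above can be sketched in Python. This is a minimal sketch, assuming the two rates stated on this page and the podcast rule as written (`type == "podcast"` with a `podcast_url` present); `estimate_cost` and `is_billed_as_podcast` are hypothetical helpers, not part of the Actor.

```python
# Hypothetical cost estimator using the rates stated on this page.
STANDARD_RATE_PER_1K = 1.50  # USD per 1,000 standard (newsletter) posts
PODCAST_RATE_PER_1K = 2.50   # USD per 1,000 podcast posts


def is_billed_as_podcast(item: dict) -> bool:
    """Premium rate applies when type is 'podcast' and podcast_url is present."""
    return item.get("type") == "podcast" and bool(item.get("podcast_url"))


def estimate_cost(items: list[dict]) -> float:
    """Return the estimated charge in USD for a list of dataset items."""
    podcast = sum(1 for item in items if is_billed_as_podcast(item))
    standard = len(items) - podcast
    cost = (standard * STANDARD_RATE_PER_1K + podcast * PODCAST_RATE_PER_1K) / 1000
    return round(cost, 4)


if __name__ == "__main__":
    # A 500-post newsletter archive, as in the example above.
    archive = [{"type": "newsletter", "podcast_url": None}] * 500
    print(estimate_cost(archive))
```

Run as a script, it prints `0.75`, matching the 500-post example above.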

### Sample output

```json
{
  "title": "The Collapse of Web Scraping",
  "subtitle": "Why every major site now requires a browser — and what to do about it",
  "canonical_url": "https://on.substack.com/p/the-collapse-of-web-scraping",
  "post_date": "2026-05-18T14:00:00Z",
  "audience": "everyone",
  "type": "newsletter",
  "podcast_url": null,
  "reactions_count": 312,
  "comment_count": 47,
  "restacks": 89,
  "wordcount": 2100,
  "publication_slug": "on",
  "parse_confidence": 1.0,
  "scraped_at": "2026-06-05T09:00:00Z"
}
````

### Frequently asked questions

**Do I need a Substack account or API key?**
No. The actor uses the official Substack public listing API — no authentication required for public publications.

**Do I need a proxy?**
No. The Substack listing API is publicly accessible, so there is no proxy cost to you.

**What formats does the output come in?**
JSON, CSV, and Excel via the Apify dataset. Native integration with n8n, Make, Zapier.

**What if the publication returns a 403 or empty results?**
Some publications restrict their API access (e.g. bankless.substack.com). The actor logs the error and exits cleanly — it pushes nothing and charges nothing. Switch to a different slug and try again.

### Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| `publication` | string | `on` | Slug (e.g. `on`), full URL (e.g. `https://on.substack.com`), or custom domain |
| `maxPosts` | integer | `100` | Max posts to return. `0` = no limit (fetch entire archive) |

#### Publication examples

```
on                          →  on.substack.com
bankless                    →  bankless.substack.com
https://on.substack.com     →  on.substack.com (same)
https://platformer.news     →  custom domain (supported)
```
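The input mapping above can be sketched as a small normalizer. This is a hypothetical helper that mirrors the listed examples, not the Actor's actual parsing code; it assumes that a value with neither a dot nor a scheme is a bare Substack slug, and that custom domains expose the same `/api/v1/posts` path as `*.substack.com` hosts.

```python
from urllib.parse import urlparse


def publication_base_url(publication: str) -> str:
    """Map a slug, a full Substack URL, or a custom domain to a base URL.

    Hypothetical helper; the Actor's own rules may differ.
    """
    value = publication.strip()
    if "://" not in value and "." not in value:
        # Bare slug such as "on" or "bankless"
        return f"https://{value.strip('/')}.substack.com"
    if "://" not in value:
        # Host without a scheme, e.g. "platformer.news"
        value = "https://" + value
    # Keep only the host; drop paths, query strings, and trailing slashes.
    host = urlparse(value).netloc.lower()
    return f"https://{host}"


def posts_endpoint(publication: str) -> str:
    """Listing endpoint, assuming custom domains serve /api/v1/posts too."""
    return publication_base_url(publication) + "/api/v1/posts"
```

For example, `publication_base_url("on")` and `publication_base_url("https://on.substack.com/")` both resolve to `https://on.substack.com`.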

***

### How it works

Uses the Substack per-publication public REST endpoint:

```
GET https://<publication>.substack.com/api/v1/posts?offset=0&limit=50
```

Paginates via `offset` until all posts are retrieved or `maxPosts` is reached. No auth headers needed. No proxy required for public publications.
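The offset-based loop described above can be sketched with the Python standard library. This is a minimal sketch, not the Actor's code: it assumes the endpoint returns a JSON array of post objects and that an empty page marks the end of the archive. `paginate_posts`, `http_fetch_page`, and the `User-Agent` value are illustrative names.

```python
import json
import urllib.request
from typing import Callable

PAGE_SIZE = 50  # same page size as limit=50 in the endpoint above

# fetch_page(offset, limit) -> list of post dicts for that slice
FetchPage = Callable[[int, int], list[dict]]


def http_fetch_page(base_url: str) -> FetchPage:
    """Build a fetcher for <base_url>/api/v1/posts (no auth headers)."""
    def fetch(offset: int, limit: int) -> list[dict]:
        url = f"{base_url}/api/v1/posts?offset={offset}&limit={limit}"
        # Illustrative User-Agent; the Actor's real headers are not documented.
        req = urllib.request.Request(url, headers={"User-Agent": "substack-listing-sketch/0.1"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)  # assumed: a JSON array of posts
    return fetch


def paginate_posts(fetch_page: FetchPage, max_posts: int = 100) -> list[dict]:
    """Advance `offset` until the archive ends or max_posts is reached.

    max_posts == 0 means no limit, matching the Actor's input schema.
    """
    posts: list[dict] = []
    offset = 0
    while max_posts == 0 or len(posts) < max_posts:
        remaining = PAGE_SIZE if max_posts == 0 else max_posts - len(posts)
        page = fetch_page(offset, min(PAGE_SIZE, remaining))
        if not page:
            break  # empty page: no more posts in the archive
        posts.extend(page)
        offset += len(page)  # advance by what was actually returned
    return posts if max_posts == 0 else posts[:max_posts]
```

Passing the fetcher in keeps the loop testable without network access; a live call would look like `paginate_posts(http_fetch_page("https://on.substack.com"), max_posts=100)`. Stopping only on an empty page costs one extra request, but it avoids ending early if the server returns fewer items than requested.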

***

### Our edge over incumbents

- **Reliable pagination** — offset-based, not page-based; survives large archives.
- **Reactions normalized** — raw `reactions` dict summed to `reactions_count` (compatible with publications adding new reaction types).
- **`parse_confidence` score** — every record includes a data-quality score and `warnings` list so you can detect schema drift without re-running.
- **`restacks` field** — Substack's repost count, absent from most competitor actors.
- **Custom domain support** — not just `*.substack.com` slugs.
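
The reactions normalization in the list above can be sketched as a sum over the raw dict. This assumes `reactions` maps each reaction emoji to a count (for example `{"❤": 312}`); `reactions_count` here is a hypothetical helper, not the Actor's source.

```python
def reactions_count(post: dict) -> int:
    """Sum every value in the raw `reactions` dict.

    Summing all keys, rather than reading one fixed emoji, means a
    reaction type added later is counted without code changes.
    A missing or null `reactions` field yields 0.
    """
    reactions = post.get("reactions") or {}
    return sum(int(count) for count in reactions.values() if count is not None)
```

For example, `reactions_count({"reactions": {"❤": 312}})` returns `312`.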

***

### Limitations

- **Post bodies not included** — listing API returns metadata only. Full HTML/Markdown bodies require the individual post endpoint (not in this actor's scope).
- **Paywalled posts** — metadata is returned for all posts, but body content is not accessible without a paid subscription.
- **Publications blocking API access** — some publications (e.g. bankless.substack.com) return 403; this is a publication-level restriction, not a Substack platform restriction.

***

### Competitor comparison

| | This scraper | Other Substack actors |
|---|:---:|:---:|
| Official Substack API | ✓ | partial |
| `restacks` field | ✓ | ✗ |
| Custom domain support | ✓ | ✗ |
| `podcast_url` field | ✓ | ✗ |
| `parse_confidence` on every record | ✓ | ✗ |
| No auth required | ✓ | ✓ |

### Podcast use case

The `podcast_url` field makes this actor useful for extracting Substack podcast episode lists without an RSS parser. Filter records where `type == "podcast"` and `podcast_url` is non-null.
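A minimal sketch of that filter, assuming `items` is the list of dataset records (for example, collected from `client.dataset(run["defaultDatasetId"]).iterate_items()` as in the Python example in the API section). `podcast_episodes` and `episode_rows` are hypothetical helpers.

```python
from typing import Iterable


def podcast_episodes(items: Iterable[dict]) -> list[dict]:
    """Keep records where type == "podcast" and podcast_url is non-null."""
    return [
        item for item in items
        if item.get("type") == "podcast" and item.get("podcast_url") is not None
    ]


def episode_rows(items: Iterable[dict]) -> list[tuple]:
    """Reduce podcast records to (post_date, title, audio URL) tuples."""
    return [
        (item.get("post_date"), item.get("title"), item["podcast_url"])
        for item in podcast_episodes(items)
    ]
```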

Podcast posts are billed at the `podcast-post` premium event ($2.50/1K, set in the Apify console) because they include a direct audio file URL — useful for media monitoring tools, podcast discovery platforms, and content aggregators that need the actual audio stream. Regular newsletter posts are billed at the standard `post-item` rate ($1.50/1K).

### Monitoring use case

Track a competitor's newsletter for new posts — run daily and filter by `post_date` to see only new content. Set `maxPosts=20` for fast incremental runs.
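A minimal sketch of the daily filter, assuming you persist the newest `post_date` from the previous run yourself (for example in a key-value store or a local file). `new_posts_since` and `latest_post_date` are hypothetical helpers; timestamps are compared as timezone-aware datetimes rather than as strings, so `last_seen` must include a `Z` or an explicit UTC offset.

```python
from datetime import datetime


def parse_post_date(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as "2026-05-18T14:00:00Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset for older interpreters.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_posts_since(items: list[dict], last_seen: str) -> list[dict]:
    """Return records published strictly after last_seen, oldest first."""
    cutoff = parse_post_date(last_seen)
    fresh = [
        item for item in items
        if item.get("post_date") and parse_post_date(item["post_date"]) > cutoff
    ]
    return sorted(fresh, key=lambda item: parse_post_date(item["post_date"]))


def latest_post_date(items: list[dict], fallback: str) -> str:
    """Newest post_date in this run, to store as the next run's last_seen."""
    dated = [item["post_date"] for item in items if item.get("post_date")]
    return max(dated, key=parse_post_date) if dated else fallback
```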

### Use with AI Agents (MCP)

This Substack scraper is callable as a **tool by AI agents** (Claude Desktop, Cursor, VS Code, n8n, or any MCP-compatible client) via Apify's hosted Model Context Protocol server.

```json
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.apify.com/?tools=bovi/substack-publication",
        "--header",
        "Authorization: Bearer <YOUR_APIFY_TOKEN>"
      ]
    }
  }
}
```

### Integrations

Built for newsletter researchers, content agencies, and AI-pipeline teams ingesting Substack post metadata at scale — the JSON/dataset output drops into the tools you already run, no glue code:

- **n8n / Make / Zapier** — trigger a run or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code: [n8n](https://docs.apify.com/platform/integrations/n8n), [Make](https://docs.apify.com/platform/integrations/make), [Zapier](https://docs.apify.com/platform/integrations/zapier).
- **Webhooks** — fire your own endpoint the moment a run finishes, to push results straight into your pipeline ([docs](https://docs.apify.com/platform/integrations/webhooks)).
- **MCP server** — expose this actor as a tool to Claude, Cursor, or any [MCP client](https://mcp.apify.com) so an AI agent can pull this data mid-conversation ([guide](https://blog.apify.com/how-to-use-mcp/)).
- **API & SDKs** — fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.

See all [Apify integrations](https://apify.com/integrations).

# Actor input Schema

## `publication` (type: `string`):

Substack publication to scrape. Accepts a slug (e.g. 'on'), a full URL (e.g. 'https://on.substack.com'), or a custom domain (e.g. 'https://www.platformer.news'). Required.

## `maxPosts` (type: `integer`):

Maximum number of posts to return. 0 = no limit (fetch all). Default 100.

## Actor input object example

```json
{
  "publication": "on",
  "maxPosts": 100
}
```

# Actor output Schema

## `results` (type: `string`):

Dataset containing Substack Publication records (title, subtitle, audience, type, reactions\_count, comment\_count, restacks, wordcount, publication\_slug, canonical\_url, post\_date, parse\_confidence).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "publication": "on",
    "maxPosts": 100
};

// Run the Actor and wait for it to finish
const run = await client.actor("bovi/substack-publication").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "publication": "on",
    "maxPosts": 100,
}

# Run the Actor and wait for it to finish
run = client.actor("bovi/substack-publication").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "publication": "on",
  "maxPosts": 100
}' |
apify call bovi/substack-publication --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=bovi/substack-publication",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Scraper — Publication Posts | $1.50/1K",
        "description": "Scrape any Substack newsletter's post list via the official Substack public API. No auth, no proxy. Title, subtitle, date, free/paid audience, type, reactions, restacks, podcast_url. Podcast posts billed at premium rate ($2.50/1K). Pay per post.",
        "version": "0.1",
        "x-build-id": "wbv0gOk5CdbYYUvju"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/bovi~substack-publication/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-bovi-substack-publication",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/bovi~substack-publication/runs": {
            "post": {
                "operationId": "runs-sync-bovi-substack-publication",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/bovi~substack-publication/run-sync": {
            "post": {
                "operationId": "run-sync-bovi-substack-publication",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "publication"
                ],
                "properties": {
                    "publication": {
                        "title": "Publication",
                        "type": "string",
                        "description": "Substack publication to scrape. Accepts a slug (e.g. 'on'), a full URL (e.g. 'https://on.substack.com'), or a custom domain (e.g. 'https://www.platformer.news'). Required."
                    },
                    "maxPosts": {
                        "title": "Max posts",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of posts to return. 0 = no limit (fetch all). Default 100."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
