# Substack Newsletter Scraper & Analytics (`buildtolaunch/substack-creator-research`) Actor

Scrape and analyze any Substack newsletter in seconds. Get engagement rate, reaction trends, paywall ratio, and publishing cadence. The only Substack analytics scraper that computes engagement rate — compare newsletters regardless of size. Built for creator research and competitive analysis.

- **URL**: https://apify.com/buildtolaunch/substack-creator-research.md
- **Developed by:** [Jenny Ouyang](https://apify.com/buildtolaunch) (community)
- **Categories:** Social media, Lead generation
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: 5.00 out of 5 stars

## Pricing

from $2.00 / 1,000 results

This Actor is billed per event plus usage: you pay a fixed price for specific events and, separately, for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
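
As a rough illustration of the per-result part of the bill, the sketch below multiplies a row count by the listed starting price of $2.00 per 1,000 results. It assumes every dataset row counts as one result and leaves out platform usage, which is charged on top; the function name `estimate_result_cost` is hypothetical.

```python
# Listed "from" price on this page; actual event prices may differ.
PRICE_PER_1000_RESULTS_USD = 2.00


def estimate_result_cost(rows: int) -> float:
    """Estimate the per-result charge in USD (platform usage not included)."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return rows * PRICE_PER_1000_RESULTS_USD / 1000


# 100 newsletters, one result row each: about $0.20 in result charges.
print(f"${estimate_result_cost(100):.2f}")
```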

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Substack Creator Research Tool

Research any Substack newsletter in seconds. Drop in a list of URLs and get back engagement metrics, monetization signals, publishing cadence, and full article content — everything you need to evaluate a creator before a sponsorship, partnership, or competitive analysis.

**Most Substack scrapers give you content. This one tells you whether anyone is actually reading it.**

---

### What It Does

Pass a list of newsletter URLs. For each one, the Actor pulls:

- **Engagement quality** — avg reactions, comments, restacks, and engagement rate (reactions per subscriber). The metric that separates a 50K-subscriber ghost town from a 5K-subscriber community.
- **Engagement trend** — the direction of reactions across recent posts: the 5 most recent posts old enough for reactions to have settled (>14 days) vs the 5 before them (`up` / `down` / `flat`, or `null` below 10 matured posts). Only matured posts are compared, so a newsletter isn't flagged "down" just because its freshest posts haven't collected reactions yet. Directional, not a precise score — one viral post can tilt it.
- **Publishing cadence** — posts per week + consistency score (very_consistent / consistent / irregular)
- **Monetization model** — paywall ratio, paid vs free post count, podcast format, voiceover usage
- **Content depth** — average wordcount across recent posts
- **Top performer** — the single highest-reaction post with its URL
- **Full article content** — HTML or plain text for all free posts (optional, great for LLM analysis)
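
The engagement trend rule described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the Actor's actual code: the post fields `published_at` and `reactions` are hypothetical names, and the ±10% tolerance that separates `flat` from `up` or `down` is an assumption, since the README does not state the threshold.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Optional

MATURITY_DAYS = 14     # posts must be older than this (from the description)
WINDOW = 5             # posts per comparison window (from the description)
FLAT_TOLERANCE = 0.10  # ASSUMPTION: changes within +/-10% count as "flat"


def engagement_trend(posts: list[dict], now: datetime) -> Optional[str]:
    """Compare the 5 newest matured posts with the 5 before them.

    Each post is a dict with hypothetical keys 'published_at'
    (timezone-aware datetime) and 'reactions' (int).
    """
    cutoff = now - timedelta(days=MATURITY_DAYS)
    # Keep only matured posts, newest first.
    matured = sorted(
        (p for p in posts if p["published_at"] < cutoff),
        key=lambda p: p["published_at"],
        reverse=True,
    )
    if len(matured) < 2 * WINDOW:
        return None  # fewer than 10 matured posts: no trend
    recent = mean(p["reactions"] for p in matured[:WINDOW])
    previous = mean(p["reactions"] for p in matured[WINDOW:2 * WINDOW])
    if previous == 0:
        return "up" if recent > 0 else "flat"
    change = (recent - previous) / previous
    if change > FLAT_TOLERANCE:
        return "up"
    if change < -FLAT_TOLERANCE:
        return "down"
    return "flat"
```

Because fresh posts are filtered out before the windows are built, a newsletter with several new, low-reaction posts is judged only on posts that have had time to collect reactions.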

---

### Output Modes

| Mode | Output | Best For |
|---|---|---|
| `analytics` (default) | One summary row per newsletter | Comparing 10–100 newsletters at once |
| `posts` | One row per article | Content analysis, NLP, LLM processing |
| `full` | Both — summary + all articles | Complete newsletter snapshot |

All rows include a `row_type` field (`"newsletter"` or `"post"`) so you can split them in downstream tools.
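
A minimal sketch of that split in Python, assuming the dataset items have already been loaded as a list of dicts (the helper name `split_rows` is hypothetical):

```python
def split_rows(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate newsletter summary rows from post rows using `row_type`."""
    newsletters = [row for row in items if row.get("row_type") == "newsletter"]
    posts = [row for row in items if row.get("row_type") == "post"]
    return newsletters, posts


# Example with a tiny hand-written dataset in `full` mode.
items = [
    {"row_type": "newsletter", "subdomain": "lenny"},
    {"row_type": "post", "title": "First post"},
    {"row_type": "post", "title": "Second post"},
]
newsletters, posts = split_rows(items)
print(len(newsletters), len(posts))
```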

---

### Input

| Field | Type | Default | Description |
|---|---|---|---|
| `newsletterUrls` | array | — | Newsletter URLs or plain subdomains (e.g. `platformer` or `https://platformer.substack.com`) |
| `outputMode` | string | `analytics` | `analytics`, `posts`, or `full` |
| `maxPostsPerNewsletter` | integer | 25 | 1–100. In analytics mode this drives the engagement averages. |
| `bodyFormat` | string | `text` | `text` (clean plain text), `html` (raw Substack HTML), or `both` |
| `delayMs` | integer | 1000 | Milliseconds between newsletters (500–5000). Increase if you hit rate limits. |
| `proxyConfiguration` | object | Auto (datacenter) | Proxy for requests. Apify's automatic datacenter proxy is the default: fast and reliable for Substack's HTTP API. Switch to Residential only if you hit blocks. |
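
To catch bad values before starting a run, the input can be checked client-side. The sketch below uses the defaults from the table and the ranges from the OpenAPI schema at the end of this document (`maxPostsPerNewsletter` 1–100, `delayMs` 500–5000). The helper name `build_input` is hypothetical, and requiring at least one URL is an assumption rather than a documented rule.

```python
OUTPUT_MODES = {"analytics", "posts", "full"}
BODY_FORMATS = {"text", "html", "both"}


def build_input(
    newsletter_urls: list[str],
    output_mode: str = "analytics",
    max_posts: int = 25,
    body_format: str = "text",
    delay_ms: int = 1000,
) -> dict:
    """Validate values locally and return an Actor input dict."""
    if not newsletter_urls:
        # ASSUMPTION: an empty list is treated as an error here.
        raise ValueError("newsletterUrls needs at least one entry")
    if output_mode not in OUTPUT_MODES:
        raise ValueError(f"outputMode must be one of {sorted(OUTPUT_MODES)}")
    if body_format not in BODY_FORMATS:
        raise ValueError(f"bodyFormat must be one of {sorted(BODY_FORMATS)}")
    if not 1 <= max_posts <= 100:
        raise ValueError("maxPostsPerNewsletter must be between 1 and 100")
    if not 500 <= delay_ms <= 5000:
        raise ValueError("delayMs must be between 500 and 5000")
    return {
        "newsletterUrls": list(newsletter_urls),
        "outputMode": output_mode,
        "maxPostsPerNewsletter": max_posts,
        "bodyFormat": body_format,
        "delayMs": delay_ms,
        "proxyConfiguration": {"useApifyProxy": True},
    }
```

The returned dict can be passed as the run input to any of the clients shown in the API section.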

---

### Sample Output (analytics mode)

```json
{
  "row_type": "newsletter",
  "subdomain": "lenny",
  "url": "https://lenny.substack.com",
  "publication_name": "Lenny's Newsletter",
  "author_name": "Lenny Rachitsky",
  "subscriber_count": 1200000,
  "avg_reactions": 395.4,
  "avg_comments": 42.1,
  "avg_restacks": 18.3,
  "engagement_rate": 0.033,
  "avg_wordcount": 2957,
  "paywall_ratio": 0.8,
  "paid_post_count": 20,
  "free_post_count": 5,
  "posts_per_week": 0.9,
  "posting_consistency": "consistent",
  "engagement_trend": "up",
  "monetized": true,
  "top_post": {
    "title": "How to get your first 1,000 users",
    "url": "https://lenny.substack.com/p/how-to-get-your-first-1000-users",
    "reactions": 2841,
    "comments": 187
  },
  "tags": ["product", "growth", "startups"],
  "scraped_at": "2026-06-05T08:00:00+00:00"
}
````

***

### Use Cases

**Sponsorship research** — Before paying $5K for a newsletter spot, verify the audience is actually engaged. Compare engagement rate (reactions/subscribers) across 20 newsletters in one run.

**Competitive analysis** — Map your category: who publishes how often, who's monetized, who dominates on reactions vs wordcount.

**Creator outreach lists** — Filter by `monetized: true`, `posts_per_week > 2`, and `subscriber_count > 10000` to find newsletters worth approaching.
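
That filter could look like the following sketch, which works on newsletter rows shaped like the sample output. The helper name `outreach_candidates` is hypothetical; rows with a hidden (`null`) subscriber count are skipped rather than guessed at.

```python
def outreach_candidates(
    rows: list[dict],
    min_posts_per_week: float = 2,
    min_subscribers: int = 10_000,
) -> list[dict]:
    """Keep monetized, frequently publishing newsletters above a size floor."""
    return [
        row for row in rows
        if row.get("row_type") == "newsletter"
        and row.get("monetized") is True
        # `or 0` treats null values as failing the threshold.
        and (row.get("posts_per_week") or 0) > min_posts_per_week
        and (row.get("subscriber_count") or 0) > min_subscribers
    ]
```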

**Content research** — Pull full article text from 50 newsletters in `posts` mode and feed to an LLM for topic mapping, writing style analysis, or gap analysis.

**Partnership discovery** — Use `tags` and engagement metrics together to find category-adjacent newsletters with real audiences.

***

### Use with Claude (MCP)

This Actor works as a tool inside Claude, Cursor, or any MCP client through Apify's MCP server — so you can research newsletters in plain language without leaving your chat.

Add it to Claude Desktop (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/actors-mcp-server", "--tools", "buildtolaunch/substack-creator-research"],
      "env": { "APIFY_TOKEN": "your-apify-token" }
    }
  }
}
```

Then just ask:

> "Pull engagement analytics for lenny, platformer, and bensbites — rank them by engagement rate and tell me which one is trending up."

Claude calls the Actor, gets back the computed metrics (engagement rate, trend, paywall ratio, cadence), and does the ranking for you. No CSV exports, no manual math.

***

### How Engagement Rate Works

Raw reaction counts don't tell you much on their own. A newsletter with 50K subscribers getting 400 reactions (0.8%) is performing worse than a newsletter with 5K subscribers getting 200 reactions (4%).

Engagement rate = `avg_reactions / subscriber_count × 100`

This Actor computes it automatically and reports it as a percentage, so `engagement_rate: 0.033` in the sample output means 0.033%. Industry ballpark: >0.1% is solid, >0.3% is strong, >0.5% is exceptional.
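
For reference, the formula and the ballpark thresholds can be reproduced in a few lines. This is a sketch, not the Actor's code: the function names and the `"below ballpark"` and `"unknown"` labels are hypothetical.

```python
from typing import Optional


def engagement_rate(avg_reactions: float, subscriber_count: Optional[int]) -> Optional[float]:
    """Return reactions per subscriber as a percentage, or None if unknown."""
    if not subscriber_count:
        return None  # hidden subscriber count: no rate can be computed
    return avg_reactions / subscriber_count * 100


def rate_label(rate: Optional[float]) -> str:
    """Map a rate (in percent) onto the ballpark bands quoted above."""
    if rate is None:
        return "unknown"
    if rate > 0.5:
        return "exceptional"
    if rate > 0.3:
        return "strong"
    if rate > 0.1:
        return "solid"
    return "below ballpark"


# The two newsletters from the comparison above: roughly 0.8% and 4%.
print(engagement_rate(400, 50_000), engagement_rate(200, 5_000))
```

With the sample output's numbers (395.4 average reactions, 1,200,000 subscribers) the same function gives roughly 0.033, matching the `engagement_rate` field shown there.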

***

### Notes

- **Paid post content**: Substack enforces the paywall server-side. Paid posts return `content_available: false` and `body_text: null`. Free post content is fully extracted.
- **Subscriber count**: Extracted from the newsletter homepage where visible. Some newsletters hide this; those will return `subscriber_count: null` and `engagement_rate: null`.
- **Custom domains**: Works with plain subdomains (`platformer`), full Substack URLs (`https://platformer.substack.com`), or custom domains that resolve to Substack.
- **Rate limiting**: Default 1s delay between newsletters. Increase `delayMs` to 2000–3000 for large batches.
- **Proxy**: Requests route through Apify's automatic datacenter proxy by default, which is fast and reliable for Substack's HTTP API. A fresh IP is used per newsletter, and any request that fails through the proxy automatically retries on a direct connection, so one bad IP never drops a newsletter. Switch to Residential in `proxyConfiguration` only if you run into blocks.
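
The delay and proxy fallback behaviour described in these notes follows a common pattern, sketched below. It is not the Actor's source: `fetch_via_proxy` and `fetch_direct` are hypothetical callables that you would supply (for example thin wrappers around an HTTP library), and the sketch simplifies to one fetch per newsletter.

```python
import time
from typing import Callable, Iterable


def fetch_all(
    urls: Iterable[str],
    fetch_via_proxy: Callable[[str], str],
    fetch_direct: Callable[[str], str],
    delay_ms: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each URL through the proxy, retrying directly on failure."""
    url_list = list(urls)
    results: dict[str, str] = {}
    for index, url in enumerate(url_list):
        try:
            results[url] = fetch_via_proxy(url)
        except Exception:
            # Proxy attempt failed: retry once on a direct connection.
            results[url] = fetch_direct(url)
        if index < len(url_list) - 1:
            sleep(delay_ms / 1000)  # pause between newsletters
    return results
```

Passing `sleep` as a parameter keeps the function easy to test; in normal use the default `time.sleep` applies the `delayMs` pause.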

***

### Tech

Python 3.11 · requests · No browser / Playwright required · HTTP-only for fast execution

***

Built by [Build to Launch](https://buildtolaunch.ai). AI systems for one-person businesses.

# Actor input Schema

## `newsletterUrls` (type: `array`):

Substack newsletter URLs or subdomains to research. Accepts full URLs (https://platformer.substack.com) or plain subdomains (platformer).
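
One way to picture how the accepted forms map to a single URL is the sketch below. The rules are assumptions for illustration (a bare word is treated as a Substack subdomain, anything containing a dot as a domain); the Actor's own normalization may differ, and `normalize_newsletter_url` is a hypothetical name.

```python
def normalize_newsletter_url(value: str) -> str:
    """Turn a subdomain, bare domain, or full URL into an https URL."""
    value = value.strip().rstrip("/")
    if value.startswith(("http://", "https://")):
        return value  # already a full URL
    if "." in value:
        return f"https://{value}"  # bare domain, e.g. a custom domain
    return f"https://{value}.substack.com"  # plain subdomain


print(normalize_newsletter_url("platformer"))
```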

## `outputMode` (type: `string`):

analytics — one row per newsletter with engagement metrics and creator signals. posts — one row per article with full content. full — both: a summary row followed by article rows for each newsletter.

## `maxPostsPerNewsletter` (type: `integer`):

How many recent posts to pull per newsletter. In analytics mode this drives the engagement averages. In posts/full mode each post becomes one output row.

## `bodyFormat` (type: `string`):

Format for article body when outputMode is posts or full. text is clean plain text (best for NLP/LLMs). html is raw Substack HTML. both includes text and html fields.

## `delayMs` (type: `integer`):

Milliseconds to wait between processing newsletters. Increase if you encounter rate limiting.

## `proxyConfiguration` (type: `object`):

Apify's automatic (datacenter) proxy is the default: fast and reliable for Substack's HTTP API. Switch to Residential here only if you hit blocks. Requests fall back to a direct connection if the proxy fails.

## Actor input object example

```json
{
  "newsletterUrls": [
    "https://platformer.substack.com",
    "https://lenny.substack.com"
  ],
  "outputMode": "analytics",
  "maxPostsPerNewsletter": 25,
  "bodyFormat": "text",
  "delayMs": 1000,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `analytics` (type: `string`):

One row per newsletter: engagement rate, reaction trend, paywall ratio, posting cadence, and the top post. The default output.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "newsletterUrls": [
        "https://platformer.substack.com",
        "https://lenny.substack.com"
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("buildtolaunch/substack-creator-research").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "newsletterUrls": [
        "https://platformer.substack.com",
        "https://lenny.substack.com",
    ],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("buildtolaunch/substack-creator-research").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "newsletterUrls": [
    "https://platformer.substack.com",
    "https://lenny.substack.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call buildtolaunch/substack-creator-research --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=buildtolaunch/substack-creator-research",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Substack Newsletter Scraper & Analytics",
        "description": "Scrape and analyze any Substack newsletter in seconds. Get engagement rate, reaction trends, paywall ratio, and publishing cadence. The only Substack analytics scraper that computes engagement rate — compare newsletters regardless of size. Built for creator research and competitive analysis.",
        "version": "0.2",
        "x-build-id": "abuadUxmjSIS5de9g"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/buildtolaunch~substack-creator-research/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-buildtolaunch-substack-creator-research",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/buildtolaunch~substack-creator-research/runs": {
            "post": {
                "operationId": "runs-sync-buildtolaunch-substack-creator-research",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/buildtolaunch~substack-creator-research/run-sync": {
            "post": {
                "operationId": "run-sync-buildtolaunch-substack-creator-research",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "newsletterUrls": {
                        "title": "Newsletter URLs",
                        "type": "array",
                        "description": "Substack newsletter URLs or subdomains to research. Accepts full URLs (https://platformer.substack.com) or plain subdomains (platformer).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "outputMode": {
                        "title": "Output mode",
                        "enum": [
                            "analytics",
                            "posts",
                            "full"
                        ],
                        "type": "string",
                        "description": "analytics — one row per newsletter with engagement metrics and creator signals. posts — one row per article with full content. full — both: a summary row followed by article rows for each newsletter.",
                        "default": "analytics"
                    },
                    "maxPostsPerNewsletter": {
                        "title": "Max posts per newsletter",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "How many recent posts to pull per newsletter. In analytics mode this drives the engagement averages. In posts/full mode each post becomes one output row.",
                        "default": 25
                    },
                    "bodyFormat": {
                        "title": "Article body format",
                        "enum": [
                            "text",
                            "html",
                            "both"
                        ],
                        "type": "string",
                        "description": "Format for article body when outputMode is posts or full. text is clean plain text (best for NLP/LLMs). html is raw Substack HTML. both includes text and html fields.",
                        "default": "text"
                    },
                    "delayMs": {
                        "title": "Delay between newsletters (ms)",
                        "minimum": 500,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Milliseconds to wait between processing newsletters. Increase if you encounter rate limiting.",
                        "default": 1000
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify's automatic (datacenter) proxy is the default: fast and reliable for Substack's HTTP API. Switch to Residential here only if you hit blocks. Requests fall back to a direct connection if the proxy fails.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
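
For projects that cannot use an official client, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with Python's standard library; the helper name `build_sync_request` is hypothetical, and actually sending the request needs a valid token and network access.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "buildtolaunch~substack-creator-research"  # "~" replaces "/" in API paths


def build_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it (not executed here):
# with urllib.request.urlopen(build_sync_request("<YOUR_API_TOKEN>", {...})) as resp:
#     items = json.load(resp)
```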
