Substack Newsletter Scraper & Analytics
Pricing
from $2.00 / 1,000 results
Scrape and analyze any Substack newsletter in seconds. Get engagement rate, reaction trends, paywall ratio, and publishing cadence. The only Substack analytics scraper that computes engagement rate — compare newsletters regardless of size. Built for creator research and competitive analysis.
Rating: 5.0 (1)
Developer: Jenny Ouyang
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 0
Last modified: 4 days ago
Substack Creator Research Tool
Research any Substack newsletter in seconds. Drop in a list of URLs and get back engagement metrics, monetization signals, publishing cadence, and full article content — everything you need to evaluate a creator before a sponsorship, partnership, or competitive analysis.
Most Substack scrapers give you content. This one tells you whether anyone is actually reading it.
What It Does
Pass a list of newsletter URLs. For each one, the Actor pulls:
- Engagement quality — avg reactions, comments, restacks, and engagement rate (reactions per subscriber). The metric that separates a 50K-subscriber ghost town from a 5K-subscriber community.
- Engagement trend — the direction of reactions across recent posts: the 5 most recent posts old enough for reactions to have settled (>14 days) vs the 5 before them (`up`/`down`/`flat`, or `null` below 10 matured posts). Only matured posts are compared, so a newsletter isn't flagged "down" just because its freshest posts haven't collected reactions yet. Directional, not a precise score — one viral post can tilt it.
- Publishing cadence — posts per week + consistency score (`very_consistent` / `consistent` / `irregular`)
- Monetization model — paywall ratio, paid vs free post count, podcast format, voiceover usage
- Content depth — average wordcount across recent posts
- Top performer — the single highest-reaction post with its URL
- Full article content — HTML or plain text for all free posts (optional, great for LLM analysis)
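The engagement-trend rule described above can be sketched in a few lines. This is an illustration, not the Actor's actual implementation: the post shape (`published_at`, `reactions`) and the ±10% band that separates `flat` from `up`/`down` are assumptions, since the README does not state the threshold.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

MATURITY_DAYS = 14  # from the description: only posts older than 14 days are compared
WINDOW = 5          # 5 most recent matured posts vs the 5 before them
FLAT_BAND = 0.10    # ASSUMPTION: a change within +/-10% counts as "flat"


def engagement_trend(posts, now=None):
    """Return "up", "down", "flat", or None (fewer than 10 matured posts).

    Each post is assumed to be a dict like
    {"published_at": <timezone-aware datetime>, "reactions": <int>}.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=MATURITY_DAYS)

    # Keep only matured posts, newest first.
    matured = sorted(
        (p for p in posts if p["published_at"] < cutoff),
        key=lambda p: p["published_at"],
        reverse=True,
    )
    if len(matured) < 2 * WINDOW:
        return None

    recent = mean(p["reactions"] for p in matured[:WINDOW])
    previous = mean(p["reactions"] for p in matured[WINDOW:2 * WINDOW])

    if previous == 0:
        return "up" if recent > 0 else "flat"

    change = (recent - previous) / previous
    if change > FLAT_BAND:
        return "up"
    if change < -FLAT_BAND:
        return "down"
    return "flat"
```

Because both windows are plain averages, a single viral post in either one can flip the result, which is why the trend is best read as directional.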
Output Modes
| Mode | Output | Best For |
|---|---|---|
| `analytics` (default) | One summary row per newsletter | Comparing 10–100 newsletters at once |
| `posts` | One row per article | Content analysis, NLP, LLM processing |
| `full` | Both — summary + all articles | Complete newsletter snapshot |
All rows include a row_type field ("newsletter" or "post") so you can split them in downstream tools.
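A minimal sketch of that split, assuming the dataset has already been loaded as a list of dicts. The `title` field on the post row in the usage example is hypothetical; only `row_type` and its two values come from this README.

```python
from collections import defaultdict


def split_rows(rows):
    """Group dataset rows by their row_type field.

    Returns (newsletter_rows, post_rows). Rows with a missing or
    unexpected row_type are ignored.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.get("row_type")].append(row)
    return grouped["newsletter"], grouped["post"]


# Usage with a tiny mixed dataset, as "full" mode would produce.
rows = [
    {"row_type": "newsletter", "subdomain": "lenny"},
    {"row_type": "post", "subdomain": "lenny", "title": "Example post"},  # "title" is hypothetical
]
newsletters, posts = split_rows(rows)
print(len(newsletters), len(posts))
```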
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `newsletterUrls` | array | — | Newsletter URLs or plain subdomains (e.g. `platformer` or `https://platformer.substack.com`) |
| `outputMode` | string | `analytics` | `analytics`, `posts`, or `full` |
| `maxPostsPerNewsletter` | integer | 25 | 1–100. In analytics mode this drives the engagement averages. |
| `bodyFormat` | string | `text` | `text` (clean plain text), `html` (raw Substack HTML), or `both` |
| `delayMs` | integer | 1000 | Milliseconds between newsletters. Increase if you hit rate limits. |
| `proxyConfiguration` | object | Auto (datacenter) | Proxy for requests. Apify's automatic datacenter proxy is the default: fast and reliable for Substack's HTTP API. Switch to Residential only if you hit blocks. |
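As a rough sketch, the input fields above can be assembled and checked in Python before starting a run. The helper names `build_run_input` and `run_actor` are illustrative, and the Actor ID is an assumption taken from the MCP configuration in this README. `run_actor` uses the `apify-client` package and is defined here but not called.

```python
ACTOR_ID = "buildtolaunch/substack-creator-research"  # ASSUMPTION: ID as it appears in the MCP config

VALID_MODES = {"analytics", "posts", "full"}
VALID_BODY_FORMATS = {"text", "html", "both"}


def build_run_input(newsletter_urls, output_mode="analytics",
                    max_posts=25, body_format="text", delay_ms=1000):
    """Validate values against the input table and return the run input dict."""
    if not newsletter_urls:
        raise ValueError("newsletterUrls must contain at least one entry")
    if output_mode not in VALID_MODES:
        raise ValueError(f"outputMode must be one of {sorted(VALID_MODES)}")
    if not 1 <= max_posts <= 100:
        raise ValueError("maxPostsPerNewsletter must be between 1 and 100")
    if body_format not in VALID_BODY_FORMATS:
        raise ValueError(f"bodyFormat must be one of {sorted(VALID_BODY_FORMATS)}")
    if delay_ms < 0:
        raise ValueError("delayMs must not be negative")
    # proxyConfiguration is omitted so the Actor's default (datacenter) applies.
    return {
        "newsletterUrls": list(newsletter_urls),
        "outputMode": output_mode,
        "maxPostsPerNewsletter": max_posts,
        "bodyFormat": body_format,
        "delayMs": delay_ms,
    }


def run_actor(token, run_input):
    """Start the Actor and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so the builder works without it

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["platformer", "https://lenny.substack.com"], delay_ms=2000)
print(run_input)
```

Validating locally catches an out-of-range `maxPostsPerNewsletter` or a mistyped mode before a run is started.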
Sample Output (analytics mode)

    {
      "row_type": "newsletter",
      "subdomain": "lenny",
      "url": "https://lenny.substack.com",
      "publication_name": "Lenny's Newsletter",
      "author_name": "Lenny Rachitsky",
      "subscriber_count": 1200000,
      "avg_reactions": 395.4,
      "avg_comments": 42.1,
      "avg_restacks": 18.3,
      "engagement_rate": 0.033,
      "avg_wordcount": 2957,
      "paywall_ratio": 0.8,
      "paid_post_count": 20,
      "free_post_count": 5,
      "posts_per_week": 0.9,
      "posting_consistency": "consistent",
      "engagement_trend": "up",
      "monetized": true,
      "top_post": {
        "title": "How to get your first 1,000 users",
        "url": "https://lenny.substack.com/p/how-to-get-your-first-1000-users",
        "reactions": 2841,
        "comments": 187
      },
      "tags": ["product", "growth", "startups"],
      "scraped_at": "2026-06-05T08:00:00+00:00"
    }

Use Cases
Sponsorship research — Before paying $5K for a newsletter spot, verify the audience is actually engaged. Compare engagement rate (reactions/subscribers) across 20 newsletters in one run.
Competitive analysis — Map your category: who publishes how often, who's monetized, who dominates on reactions vs wordcount.
Creator outreach lists — Filter by monetized: true, posts_per_week > 2, and subscriber_count > 10000 to find newsletters worth approaching.
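The outreach filter can be written as a short function over analytics-mode rows. This sketch assumes the rows are plain dicts using the field names from the sample output; the function name and the usage rows are hypothetical. A `null` subscriber count is treated as failing the filter.

```python
def outreach_candidates(rows, min_posts_per_week=2, min_subscribers=10_000):
    """Return newsletter rows that are monetized, active, and large enough."""
    selected = []
    for row in rows:
        if row.get("row_type") != "newsletter":
            continue
        if row.get("monetized") is not True:
            continue
        # "or 0" makes null (None) values fail the comparison instead of raising.
        if (row.get("posts_per_week") or 0) <= min_posts_per_week:
            continue
        if (row.get("subscriber_count") or 0) <= min_subscribers:
            continue
        selected.append(row)
    return selected


# Hypothetical rows for illustration only.
rows = [
    {"row_type": "newsletter", "subdomain": "a", "monetized": True,
     "posts_per_week": 3.0, "subscriber_count": 25_000},
    {"row_type": "newsletter", "subdomain": "b", "monetized": True,
     "posts_per_week": 3.0, "subscriber_count": None},
]
print([r["subdomain"] for r in outreach_candidates(rows)])
```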
Content research — Pull full article text from 50 newsletters in posts mode and feed to an LLM for topic mapping, writing style analysis, or gap analysis.
Partnership discovery — Use tags and engagement metrics together to find category-adjacent newsletters with real audiences.
Use with Claude (MCP)
This Actor works as a tool inside Claude, Cursor, or any MCP client through Apify's MCP server — so you can research newsletters in plain language without leaving your chat.
Add it to Claude Desktop (claude_desktop_config.json):

    {
      "mcpServers": {
        "apify": {
          "command": "npx",
          "args": ["-y", "@apify/actors-mcp-server", "--tools", "buildtolaunch/substack-creator-research"],
          "env": { "APIFY_TOKEN": "your-apify-token" }
        }
      }
    }

Then just ask:
"Pull engagement analytics for lenny, platformer, and bensbites — rank them by engagement rate and tell me which one is trending up."
Claude calls the Actor, gets back the computed metrics (engagement rate, trend, paywall ratio, cadence), and does the ranking for you. No CSV exports, no manual math.
How Engagement Rate Works
Raw reaction counts don't tell you much on their own. A newsletter with 50K subscribers getting 400 reactions is performing worse than a newsletter with 5K subscribers getting 200 reactions.
Engagement rate = avg_reactions / subscriber_count × 100
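This Actor computes it automatically and reports it as a percentage, so the `engagement_rate` of 0.033 in the sample output means 0.033% (395.4 average reactions across 1,200,000 subscribers). Industry ballpark: >0.1% is solid, >0.3% is strong, >0.5% is exceptional.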
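The formula is simple enough to check by hand. The sketch below reproduces it and maps a rate onto the ballpark tiers. The function names and the "below ballpark" label are illustrative additions, and returning `None` for a missing subscriber count mirrors the behavior described under Notes.

```python
def engagement_rate(avg_reactions, subscriber_count):
    """Engagement rate as a percentage, or None when the subscriber count is unknown."""
    if not subscriber_count:
        return None
    return avg_reactions / subscriber_count * 100


def rate_label(rate):
    """Map a percentage onto the ballpark tiers quoted in this README."""
    if rate is None:
        return None
    if rate > 0.5:
        return "exceptional"
    if rate > 0.3:
        return "strong"
    if rate > 0.1:
        return "solid"
    return "below ballpark"  # illustrative label, not from the Actor


# The comparison from the text: 400 reactions on 50K subscribers vs 200 on 5K.
big = engagement_rate(400, 50_000)
small = engagement_rate(200, 5_000)
print(big < small)

# The sample output row: 395.4 average reactions on 1,200,000 subscribers.
print(round(engagement_rate(395.4, 1_200_000), 3))
```

In the comparison from the text, the 5K-subscriber newsletter works out to about five times the rate of the 50K one despite receiving half as many reactions.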
Notes
- Paid post content: Substack enforces the paywall server-side. Paid posts return `content_available: false` and `body_text: null`. Free post content is fully extracted.
- Subscriber count: Extracted from the newsletter homepage where visible. Some newsletters hide this; those will return `subscriber_count: null` and `engagement_rate: null`.
- Custom domains: Works with plain subdomains (`platformer`), full Substack URLs (`https://platformer.substack.com`), or custom domains that resolve to Substack.
- Rate limiting: Default 1s delay between newsletters. Increase `delayMs` to 2000–3000 for large batches.
- Proxy: Requests route through Apify's automatic datacenter proxy by default, which is fast and reliable for Substack's HTTP API. A fresh IP is used per newsletter, and any request that fails through the proxy automatically retries on a direct connection, so one bad IP never drops a newsletter. Switch to Residential in `proxyConfiguration` only if you run into blocks.
Tech
Python 3.11 · requests · No browser / Playwright required · HTTP-only for fast execution
Built by Build to Launch. AI systems for one-person businesses.