Substack Scraper — Posts, Authors & Newsletters
Pricing
$4.99/month + usage
Scrape Substack newsletters and articles without a subscription. Extract post titles, content previews, author info, subscriber counts, and publication stats. Search by topic or publication URL. Monitor newsletter growth and trends. Export to JSON or CSV.
Developer
Web Data Labs
Substack Scraper
Extract newsletter posts, author metadata, and subscriber signals from any Substack publication. No API key required — works with Substack's internal endpoints to deliver structured, queryable data.
Why Substack Data?
Substack hosts 35,000+ active publications with millions of posts. It's become the default platform for independent journalism, tech analysis, and niche expertise. Use this data for:
- Competitive analysis — track what newsletters in your niche publish and how they perform
- Content research — discover trending topics and engagement patterns
- Media monitoring — follow specific writers or publications for mentions and trends
- LLM training data — feed high-quality long-form content into AI pipelines
Input Parameters
| Parameter | Type | Required | Default | Description | Example |
|---|---|---|---|---|---|
| publicationUrls | array | Yes | - | Substack publication URLs to scrape | ["https://stratechery.com"] |
| maxPosts | integer | No | 100 | Maximum posts to return per publication | 50 |
| includeBodyText | boolean | No | false | Include full post text (free posts only) | true |
| freePostsOnly | boolean | No | false | Skip paywalled posts | true |
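
The parameters above can be assembled and sanity-checked before a run is started. The following is a minimal sketch, not part of the actor: `build_run_input` is a hypothetical helper, and its validation rules (absolute URLs, a positive `maxPosts`) are assumptions layered on top of the names and defaults listed in the table.

```python
from typing import Any


def build_run_input(
    publication_urls: list[str],
    max_posts: int = 100,
    include_body_text: bool = False,
    free_posts_only: bool = False,
) -> dict[str, Any]:
    """Build an input dict using the documented parameter names and defaults.

    Hypothetical helper: the checks below are assumptions, not actor behavior.
    """
    if not publication_urls:
        raise ValueError("publicationUrls is required and must not be empty")
    for url in publication_urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute URL: {url!r}")
    if max_posts < 1:
        raise ValueError("maxPosts must be a positive integer")
    return {
        "publicationUrls": list(publication_urls),
        "maxPosts": max_posts,
        "includeBodyText": include_body_text,
        "freePostsOnly": free_posts_only,
    }


run_input = build_run_input(["https://stratechery.com"], max_posts=50)
print(run_input)
```

The resulting dict can be passed as `run_input` to the Apify client call shown under "Using with Python".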
Output Fields
| Field | Type | Description | Example |
|---|---|---|---|
| postId | string | Unique post identifier | "148293847" |
| title | string | Post title | "The Year of AI Agents" |
| subtitle | string | Post subtitle | "Why 2025 is different..." |
| slug | string | URL slug | "the-year-of-ai-agents" |
| publicationName | string | Newsletter name | "Lenny's Newsletter" |
| authorName | string | Writer name | "Lenny Rachitsky" |
| publishedAt | string | ISO 8601 publish date | "2025-01-08T13:00:00.000Z" |
| type | string | Content type | "newsletter" / "thread" / "podcast" |
| wordCount | integer | Word count | 3240 |
| readingTime | integer | Estimated reading minutes | 13 |
| isPaywalled | boolean | Behind paywall | false |
| likeCount | integer | Likes/hearts | 4821 |
| commentCount | integer | Comment count | 187 |
| restackCount | integer | Restacks (shares) | 341 |
| subscriberCount | integer | Approximate subscriber count | 780000 |
| tags | array | Topic tags | ["product", "AI", "strategy"] |
| sectionName | string | Publication section | "Main" |
| publicationUrl | string | Publication base URL | "https://www.lennysnewsletter.com" |
| bodyHtml | string | Full HTML content (free posts) | "<p>The year started with..." |
| bodyText | string | Plain text content (free posts) | "The year started with..." |
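
Because `tags` is an array, records need a small flattening step before they fit a spreadsheet-style export. Below is a rough sketch of one way to do that locally with Python's standard `csv` module. The column selection, the `;` tag separator, and the `posts_to_csv` helper are illustrative assumptions, not something the actor provides.

```python
import csv
import io

# Column order for the export; names are taken from the output fields table.
COLUMNS = [
    "postId", "title", "publicationName", "authorName", "publishedAt",
    "isPaywalled", "likeCount", "commentCount", "restackCount", "tags",
]


def posts_to_csv(posts: list[dict]) -> str:
    """Serialize post records to CSV text, joining the tags array with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for post in posts:
        # Missing fields become empty cells instead of raising KeyError.
        row = {name: post.get(name, "") for name in COLUMNS}
        row["tags"] = ";".join(post.get("tags") or [])
        writer.writerow(row)
    return buffer.getvalue()


sample = [{
    "postId": "148293847",
    "title": "The Year of AI Agents",
    "publicationName": "Lenny's Newsletter",
    "authorName": "Lenny Rachitsky",
    "publishedAt": "2025-01-08T13:00:00.000Z",
    "isPaywalled": False,
    "likeCount": 4821,
    "commentCount": 187,
    "restackCount": 341,
    "tags": ["product", "AI", "strategy"],
}]
csv_text = posts_to_csv(sample)
print(csv_text)
```

Fields such as `bodyHtml` and `bodyText` are left out of the sketch on purpose, since long multi-line text tends to make CSV files unwieldy.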
Example Input
```json
{
  "publicationUrls": [
    "https://stratechery.com",
    "https://www.lennysnewsletter.com"
  ],
  "maxPosts": 50,
  "includeBodyText": true,
  "freePostsOnly": false
}
```
Example Output
```json
{
  "postId": "148293847",
  "title": "The Year of AI Agents",
  "subtitle": "Why 2025 is different from every previous AI cycle",
  "slug": "the-year-of-ai-agents",
  "publicationName": "Lenny's Newsletter",
  "authorName": "Lenny Rachitsky",
  "publishedAt": "2025-01-08T13:00:00.000Z",
  "type": "newsletter",
  "wordCount": 3240,
  "readingTime": 13,
  "isPaywalled": false,
  "likeCount": 4821,
  "commentCount": 187,
  "restackCount": 341,
  "subscriberCount": 780000,
  "tags": ["product", "AI", "strategy"],
  "publicationUrl": "https://www.lennysnewsletter.com"
}
```
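
A record like the one above usually needs light post-processing before analysis. The sketch below parses `publishedAt` into a timezone-aware `datetime` and derives a simple engagement figure from the example values. The metric (interactions per 1,000 subscribers) and both helper names are assumptions chosen for illustration, and since `subscriberCount` is described as approximate, the result is only a rough indicator.

```python
from datetime import datetime


def parse_published_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-01-08T13:00:00.000Z'."""
    # Swap the trailing 'Z' for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def engagement_per_1k_subscribers(post: dict) -> float:
    """Likes + comments + restacks per 1,000 subscribers (illustrative metric)."""
    interactions = post["likeCount"] + post["commentCount"] + post["restackCount"]
    subscribers = post.get("subscriberCount") or 0
    if subscribers <= 0:
        return 0.0  # avoid division by zero when the count is missing
    return round(1000 * interactions / subscribers, 2)


# Values copied from the example output record.
post = {
    "publishedAt": "2025-01-08T13:00:00.000Z",
    "likeCount": 4821,
    "commentCount": 187,
    "restackCount": 341,
    "subscriberCount": 780000,
}
published = parse_published_at(post["publishedAt"])
print(published.isoformat())
print(engagement_per_1k_subscribers(post))
```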
Using with Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("cryptosignals/substack-scraper").call(run_input={
    "publicationUrls": ["https://www.lennysnewsletter.com"],
    "maxPosts": 50,
    "includeBodyText": True,
})

# Iterate over the scraped posts in the run's default dataset
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    status = "PAID" if post["isPaywalled"] else "FREE"
    print(f"[{status}] {post['title']}")
    print(f"  {post['likeCount']} likes | {post['commentCount']} comments | {post['wordCount']} words")
```
Using with JavaScript
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('cryptosignals/substack-scraper').call({
    publicationUrls: ['https://stratechery.com'],
    maxPosts: 25,
    includeBodyText: true,
    freePostsOnly: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((post) => {
    console.log(`${post.title} — ${post.wordCount} words, ${post.likeCount} likes`);
});
```
Proxy
Substack rate-limits by IP and occasionally serves Cloudflare challenges. Residential proxies with US or EU IP addresses help with both issues and keep scraping stable across large publication archives.
ThorData offers residential proxies in 195+ countries that maintain high success rates with Substack's anti-bot protections.
Integrations
Connect this actor to Google Sheets, Airtable, BigQuery, Slack, Zapier, Make, or use the Apify API for programmatic access and webhook notifications.
Built by cryptosignals
⭐ Support This Actor
If this actor saved you time, please leave a quick review — it takes 30 seconds and helps others discover it. Thank you!
Questions or issues? Drop a comment below and I'll respond within 24 hours.