Substack Scraper — Posts, Authors & Newsletters

Pricing: $4.99/month + usage

Scrape Substack newsletters and articles without a subscription. Extract post titles, content previews, author info, subscriber counts, and publication stats. Search by topic or publication URL. Monitor newsletter growth and trends. Export to JSON or CSV.

Rating: 0.0 (0)
Developer: Web Data Labs (maintained by Community)
Actor stats: 0 bookmarked, 17 total users, 9 monthly active users, last modified a day ago

Substack Scraper

Extract newsletter posts, author metadata, and subscriber signals from any Substack publication. No API key required — works with Substack's internal endpoints to deliver structured, queryable data.

Why Substack Data?

Substack hosts 35,000+ active publications with millions of posts. It's become the default platform for independent journalism, tech analysis, and niche expertise. Use this data for:

  • Competitive analysis — track what newsletters in your niche publish and how they perform
  • Content research — discover trending topics and engagement patterns
  • Media monitoring — follow specific writers or publications for mentions and trends
  • LLM training data — feed high-quality long-form content into AI pipelines

Input Parameters

| Parameter | Type | Required | Default | Description | Example |
|---|---|---|---|---|---|
| publicationUrls | array | Yes | | Substack publication URLs to scrape | ["https://stratechery.com"] |
| maxPosts | integer | No | 100 | Maximum posts to return per publication | 50 |
| includeBodyText | boolean | No | false | Include full post text (free posts only) | true |
| freePostsOnly | boolean | No | false | Skip paywalled posts | true |
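
The parameters listed above can be assembled and checked before a run is started. The following is a minimal sketch, not part of the actor itself: `build_run_input` is a hypothetical helper, and its validation rules (non-empty URL list, absolute URLs, positive `maxPosts`) are assumptions rather than documented actor behavior. Only the parameter names and defaults come from the table.

```python
from typing import Any


def build_run_input(
    publication_urls: list[str],
    max_posts: int = 100,
    include_body_text: bool = False,
    free_posts_only: bool = False,
) -> dict[str, Any]:
    """Build an actor input dict from the documented parameter names and defaults."""
    # publicationUrls is the only required parameter.
    if not publication_urls:
        raise ValueError("publicationUrls is required and must not be empty")
    # Assumption: the actor expects absolute http(s) URLs, as in the examples.
    for url in publication_urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute URL: {url!r}")
    # Assumption: a non-positive post limit is never intended.
    if max_posts < 1:
        raise ValueError("maxPosts must be a positive integer")
    return {
        "publicationUrls": list(publication_urls),
        "maxPosts": max_posts,
        "includeBodyText": include_body_text,
        "freePostsOnly": free_posts_only,
    }


run_input = build_run_input(["https://stratechery.com"], max_posts=50)
print(run_input)
```

The resulting dict can be passed as `run_input` to the Python client call shown later in this document.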

Output Fields

| Field | Type | Description | Example |
|---|---|---|---|
| postId | string | Unique post identifier | "148293847" |
| title | string | Post title | "The Year of AI Agents" |
| subtitle | string | Post subtitle | "Why 2025 is different..." |
| slug | string | URL slug | "the-year-of-ai-agents" |
| publicationName | string | Newsletter name | "Lenny's Newsletter" |
| authorName | string | Writer name | "Lenny Rachitsky" |
| publishedAt | string | ISO 8601 publish date | "2025-01-08T13:00:00.000Z" |
| type | string | Content type | "newsletter" / "thread" / "podcast" |
| wordCount | integer | Word count | 3240 |
| readingTime | integer | Estimated reading time in minutes | 13 |
| isPaywalled | boolean | Behind paywall | false |
| likeCount | integer | Likes/hearts | 4821 |
| commentCount | integer | Comment count | 187 |
| restackCount | integer | Restacks (shares) | 341 |
| subscriberCount | integer | Approximate subscriber count | 780000 |
| tags | array | Topic tags | ["product", "AI", "strategy"] |
| sectionName | string | Publication section | "Main" |
| publicationUrl | string | Publication base URL | "https://www.lennysnewsletter.com" |
| bodyHtml | string | Full HTML content (free posts) | "<p>The year started with..." |
| bodyText | string | Plain text content (free posts) | "The year started with..." |

Example Input

{
    "publicationUrls": [
        "https://stratechery.com",
        "https://www.lennysnewsletter.com"
    ],
    "maxPosts": 50,
    "includeBodyText": true,
    "freePostsOnly": false
}

Example Output

{
    "postId": "148293847",
    "title": "The Year of AI Agents",
    "subtitle": "Why 2025 is different from every previous AI cycle",
    "slug": "the-year-of-ai-agents",
    "publicationName": "Lenny's Newsletter",
    "authorName": "Lenny Rachitsky",
    "publishedAt": "2025-01-08T13:00:00.000Z",
    "type": "newsletter",
    "wordCount": 3240,
    "readingTime": 13,
    "isPaywalled": false,
    "likeCount": 4821,
    "commentCount": 187,
    "restackCount": 341,
    "subscriberCount": 780000,
    "tags": ["product", "AI", "strategy"],
    "publicationUrl": "https://www.lennysnewsletter.com"
}
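
A record like the one above can be post-processed with the standard library alone. The sketch below is illustrative: the helper names are hypothetical, the "engagement per 1,000 subscribers" figure is an ad hoc metric and not something the actor reports, and the post URL pattern `<publicationUrl>/p/<slug>` is an assumption based on how Substack post links are commonly structured.

```python
from datetime import datetime

# Sample record copied from the example output (trimmed to the fields used here).
post = {
    "title": "The Year of AI Agents",
    "slug": "the-year-of-ai-agents",
    "publishedAt": "2025-01-08T13:00:00.000Z",
    "likeCount": 4821,
    "commentCount": 187,
    "restackCount": 341,
    "subscriberCount": 780000,
    "publicationUrl": "https://www.lennysnewsletter.com",
}


def parse_published_at(value: str) -> datetime:
    """Parse the ISO 8601 timestamp into a timezone-aware datetime.

    The trailing 'Z' is rewritten to '+00:00' so that Python versions
    before 3.11 also accept it.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def engagement_per_1k_subscribers(record: dict) -> float:
    """Likes + comments + restacks per 1,000 subscribers (hypothetical metric)."""
    interactions = record["likeCount"] + record["commentCount"] + record["restackCount"]
    subscribers = record["subscriberCount"]
    return 1000 * interactions / subscribers if subscribers else 0.0


def post_url(record: dict) -> str:
    """Build a post link, assuming the <publicationUrl>/p/<slug> pattern."""
    return f"{record['publicationUrl'].rstrip('/')}/p/{record['slug']}"


published = parse_published_at(post["publishedAt"])
print(published.date(), post_url(post))
print(f"{engagement_per_1k_subscribers(post):.2f} interactions per 1,000 subscribers")
```

Since `subscriberCount` is described as approximate, any ratio derived from it is best treated as a rough comparison between publications, not an exact rate.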

Using with Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("cryptosignals/substack-scraper").call(run_input={
    "publicationUrls": ["https://www.lennysnewsletter.com"],
    "maxPosts": 50,
    "includeBodyText": True,
})

# Iterate over the scraped posts in the run's default dataset.
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    status = "PAID" if post["isPaywalled"] else "FREE"
    print(f"[{status}] {post['title']}")
    print(f"  {post['likeCount']} likes | {post['commentCount']} comments | {post['wordCount']} words")
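
If the results are needed as CSV on the client side, the dataset items can be written out with Python's `csv` module. This is a sketch under assumptions: `posts_to_csv` and the chosen column list are hypothetical, the sample record stands in for real dataset items, and joining `tags` with a semicolon is just one possible way to flatten the array.

```python
import csv
import io

# Columns to export; the names come from the Output Fields table.
COLUMNS = [
    "postId", "title", "authorName", "publishedAt", "isPaywalled",
    "likeCount", "commentCount", "wordCount", "tags",
]


def posts_to_csv(posts, columns=COLUMNS) -> str:
    """Serialize post records to CSV text.

    Missing fields become empty cells, and list-valued tags are joined with ';'.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for post in posts:
        row = {name: post.get(name, "") for name in columns}
        if isinstance(row.get("tags"), list):
            row["tags"] = ";".join(row["tags"])
        writer.writerow(row)
    return buffer.getvalue()


# Stand-in for items returned by iterate_items(); values match the example output.
sample = [{
    "postId": "148293847",
    "title": "The Year of AI Agents",
    "authorName": "Lenny Rachitsky",
    "publishedAt": "2025-01-08T13:00:00.000Z",
    "isPaywalled": False,
    "likeCount": 4821,
    "commentCount": 187,
    "wordCount": 3240,
    "tags": ["product", "AI", "strategy"],
}]

csv_text = posts_to_csv(sample)
print(csv_text)
```

In a real run, the list returned by `iterate_items()` would replace `sample`, and `csv_text` could be written to a file opened with `newline=""`.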

Using with JavaScript

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('cryptosignals/substack-scraper').call({
    publicationUrls: ['https://stratechery.com'],
    maxPosts: 25,
    includeBodyText: true,
    freePostsOnly: true,
});

// Fetch the scraped posts from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((post) => {
    console.log(`${post.title}: ${post.wordCount} words, ${post.likeCount} likes`);
});

Proxy

Substack rate-limits by IP and occasionally serves Cloudflare challenges. Residential proxies with real US/EU IP addresses resolve both issues and keep scraping stable across large publication archives.

ThorData offers residential proxies in 195+ countries that maintain high success rates with Substack's anti-bot protections.

Integrations

Connect this actor to Google Sheets, Airtable, BigQuery, Slack, Zapier, Make, or use the Apify API for programmatic access and webhook notifications.


Built by cryptosignals

⭐ Support This Actor

If this actor saved you time, please leave a quick review — it takes 30 seconds and helps others discover it. Thank you!


Questions or issues? Drop a comment below and I'll respond within 24 hours.