Substack Scraper
Scrape Substack newsletters — posts with full content, comments with nested replies, and publication metadata. Unlimited archive depth, no proxy needed, 100% success rate. Export to JSON, CSV, Excel.
What does Substack Scraper do?
Substack Scraper extracts data from any Substack newsletter -- posts with full HTML content, comments with nested replies, and publication metadata including subscriber counts. It supports unlimited archive depth (no 12-post cap), works with both *.substack.com and custom domain newsletters, and exports to JSON, CSV, Excel, or connects via API.
Unlike other scrapers, this actor uses Substack's public JSON API directly -- no browser, no proxy, 100% success rate.
Key capabilities:
- Scrape full post archives from any Substack newsletter (no post limit)
- Extract complete HTML content, metadata, and engagement metrics for each post
- Fetch full comment threads with nested replies and author info
- Pull publication-level data: subscriber counts, pricing, author profiles
- Filter by date range, content type (newsletter/podcast/thread), or paywall status
- Works with both `*.substack.com` and custom domain newsletters
Who is Substack Scraper for?
- Content strategists analyzing newsletter trends, posting cadence, and topic coverage across the Substack ecosystem
- Market researchers benchmarking competitor newsletters by subscriber growth, engagement, and pricing models
- Sales and marketing teams building lead lists from newsletter author profiles and publication metadata
- Data journalists investigating media trends with structured datasets from newsletter archives
- Academic researchers studying online discourse through newsletter content and comment analysis
- AI/ML engineers building training datasets from high-quality long-form writing on Substack
- Newsletter creators auditing their own archive performance and comparing with peers
Why use Substack Scraper?
- Unlimited archive depth -- Scrape the complete archive of any newsletter. No 12-post cap like the market leader
- 100% success rate -- Uses Substack's public JSON API. No anti-bot, no proxy needed, no failures
- Full comment threads -- Extract comments with nested replies, reaction counts, and author metadata
- Publication metadata -- Subscriber counts, pricing plans, author info, and 100+ publication fields
- No proxy cost -- Direct API access means zero proxy fees. Runs on minimal 256MB memory
- Clean pay-per-event pricing -- No hidden start fees or completion charges. Pay only for results
- 66+ fields per post -- The richest output of any Substack scraper on Apify Store
- Custom domain support -- Works with both `newsletter.substack.com` and custom domains like `www.lennysnewsletter.com`
What data can you extract from Substack?
Per post (30+ fields):
| Field | Description |
|---|---|
| `title`, `subtitle`, `slug` | Post title, subtitle, and URL slug |
| `url` | Full canonical URL |
| `publishedAt`, `updatedAt` | Publication and update timestamps |
| `postType` | `newsletter`, `podcast`, or `thread` |
| `audience`, `isPaid` | Paywall status (`everyone` or `only_paid`) |
| `bodyHtml` | Full HTML content (free posts) |
| `truncatedBodyText` | Preview text for all posts, including paywalled |
| `wordcount` | Total word count (even for paid posts) |
| `coverImage` | Cover image URL |
| `description` | Post description/excerpt |
| `tags` | Post tags/categories |
| `reactionCount`, `commentCount`, `restacks` | Engagement metrics |
| `authorName`, `authorHandle`, `authorBio` | Author information |
| `authorPhotoUrl`, `authorId` | Author photo and ID |
| `publicationName`, `publicationId`, `publicationUrl` | Newsletter metadata |
| `subscriberCount` | Newsletter subscriber count |
| `podcastUrl`, `podcastDuration` | Podcast episode data (if applicable) |
| `hasVoiceover` | Whether the post has audio narration |
| `scrapedAt` | Timestamp when the data was collected |
Per comment (12 fields): `id`, `body`, `date`, `editedAt`, `name`, `handle`, `photoUrl`, `reactionCount`, `restacks`, `isAuthor`, `isPinned`, nested replies

Per publication (14 fields): `id`, `name`, `subdomain`, `customDomain`, `baseUrl`, `authorName`, `authorHandle`, `authorBio`, `authorPhotoUrl`, `subscriberCount`, `logoUrl`, `heroText`, `language`, `paymentsEnabled`
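If you only need a handful of these fields, you can ask the dataset API to return just those columns instead of downloading full records. A minimal sketch using the Python client's `fields` parameter (field names come from the tables above; `RUN_ID` is a placeholder for a finished Substack Scraper run):

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# RUN_ID is the ID of a finished Substack Scraper run
dataset_id = client.run("RUN_ID").get()["defaultDatasetId"]

# Pull only the columns needed for a quick engagement overview
items = client.dataset(dataset_id).list_items(
    fields=["title", "url", "publishedAt", "reactionCount", "commentCount"],
).items

for post in items:
    print(post["publishedAt"], post["reactionCount"], post["title"])
```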
How much does it cost to scrape Substack?
This actor uses pay-per-event pricing -- you pay only for what you scrape. No monthly subscription required. All platform compute costs are included in the per-event price.
| Event | Free plan | Starter ($49/mo) | Scale ($499/mo) |
|---|---|---|---|
| Start | $0.005 | $0.004 | $0.003 |
| Per post (metadata) | $0.001 | $0.0008 | $0.0006 |
| Per post (with content) | $0.002 | $0.0017 | $0.0014 |
| Per comment | $0.0005 | $0.0004 | $0.0003 |
Real-world cost examples:
| Scenario | Results | Duration | Cost (Free tier) |
|---|---|---|---|
| 1 newsletter, 50 posts (metadata) | 50 posts | ~3s | ~$0.06 |
| 1 newsletter, 50 posts (with content) | 50 posts | ~5s | ~$0.11 |
| 1 newsletter, 50 posts + comments | 50 posts + ~200 comments | ~15s | ~$0.21 |
| 1 newsletter, full archive (500 posts) | 500 posts | ~30s | ~$1.01 |
| 5 newsletters, 100 posts each | 500 posts | ~60s | ~$1.03 |
Substack Scraper is one of the cheapest content scrapers on Apify. Because it uses direct API calls instead of a browser, there are no proxy or rendering costs -- you only pay for the data you receive.
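For a rough sense of how the events add up, the cost-example rows can be reproduced from the Free-plan prices above. A minimal sanity-check sketch (estimates only; actual billing is computed per event on the platform):

```python
# Free-plan event prices from the pricing table above (USD)
START = 0.005                  # charged once per run
POST_METADATA = 0.001          # per post, metadata only
POST_WITH_CONTENT = 0.002      # per post with full HTML content
PER_COMMENT = 0.0005           # per comment

def estimate_cost(posts, with_content=True, comments=0):
    """Rough Free-plan cost estimate for a single run."""
    per_post = POST_WITH_CONTENT if with_content else POST_METADATA
    return START + posts * per_post + comments * PER_COMMENT

print(estimate_cost(50, with_content=False))   # ~0.055 -> matches the ~$0.06 row
print(estimate_cost(50, comments=200))         # ~0.205 -> matches the ~$0.21 row
print(estimate_cost(500))                      # ~1.005 -> matches the ~$1.01 row
```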
How to scrape Substack newsletters step by step
- Go to the Substack Scraper page on Apify Store
- Click Start for free to open the actor configuration screen
- Enter one or more newsletter URLs in the Newsletter URLs field (e.g., `https://www.lennysnewsletter.com`)
- Choose your output options:
  - Include Post Content -- toggle on for full HTML body, off for metadata-only (faster and cheaper)
  - Include Comments -- toggle on to fetch comment threads with nested replies
  - Include Publication Info -- toggle on for subscriber counts and newsletter metadata
- Optionally set filters:
  - Content Type -- limit to newsletters, podcasts, or threads
  - Start/End Date -- restrict to a date range (YYYY-MM-DD format)
  - Free Posts Only -- skip paywalled content
- Set Max Posts per Newsletter (default 100, set to 0 for unlimited full archive)
- Click Start and wait for results
- Download your data in JSON, CSV, or Excel from the Dataset tab, or connect via API
How do I scrape Substack newsletters and posts?
| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | array | required | Substack newsletter URLs. Accepts homepage, custom domain, post URLs, or /archive URLs |
| `maxPostsPerNewsletter` | integer | 100 | Max posts per newsletter. 0 = unlimited (full archive) |
| `includeContent` | boolean | true | Include full HTML body. Disable for metadata-only (faster, cheaper) |
| `includeComments` | boolean | false | Fetch comments for each post. Adds one API call per post |
| `includePublicationInfo` | boolean | true | Include newsletter metadata (subscriber count, pricing, author) |
| `contentType` | string | all | Filter: all, newsletter, podcast, or thread |
| `startDate` | string | -- | Only posts after this date (YYYY-MM-DD) |
| `endDate` | string | -- | Only posts before this date (YYYY-MM-DD) |
| `onlyFree` | boolean | false | Only include free posts. Skip paywalled content |
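Putting the parameters together, an input for a filtered full-archive run might look like the sketch below. Parameter names match the table above; the values are purely illustrative:

```python
# Illustrative input -- parameter names are taken from the table above
run_input = {
    "urls": ["https://www.lennysnewsletter.com"],  # one or more newsletter URLs
    "maxPostsPerNewsletter": 0,                    # 0 = unlimited (full archive)
    "includeContent": True,                        # full HTML body for each free post
    "includeComments": False,                      # skip comments to keep the run cheap
    "includePublicationInfo": True,                # subscriber count, pricing, author
    "contentType": "newsletter",                   # ignore podcasts and threads
    "startDate": "2025-01-01",                     # only posts after this date
    "onlyFree": False,                             # keep paywalled posts (metadata only)
}
```

The same object can be passed as `run_input` in the API examples further down, or entered field by field in the actor's configuration form.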
What data can I extract from Substack?
{"postId": 186226252,"title": "How to build AI product sense","subtitle": "The secret is using Cursor for non-technical work","slug": "how-to-build-ai-product-sense","url": "https://www.lennysnewsletter.com/p/how-to-build-ai-product-sense","publishedAt": "2026-02-03T13:45:58.303Z","updatedAt": "2026-02-04T17:29:56.949Z","postType": "newsletter","audience": "everyone","isPaid": false,"wordcount": 5867,"coverImage": "https://substackcdn.com/image/fetch/...","tags": ["AI"],"reactionCount": 298,"commentCount": 31,"childCommentCount": 15,"restacks": 20,"hasVoiceover": false,"bodyHtml": "<div class=\"body markup\">...</div>","authorName": "Tal Raviv","authorHandle": "talsraviv","publicationName": "Lenny's Newsletter","subscriberCount": "1,100,000","comments": [{"id": 209331673,"body": "This article creates a whole new paradigm for learning...","date": "2026-02-03T15:34:25.318Z","name": "Jack Cohen","handle": "jackcohen10","reactionCount": 9,"isAuthor": false,"replies": [{"id": 209340123,"body": "Thanks Jack!","name": "Tal Raviv","isAuthor": true,"replies": []}]}],"scrapedAt": "2026-02-06T02:07:09.750Z"}
Tips for best results
- Start with metadata-only (`includeContent: false`) to quickly survey a newsletter's archive before doing a full content scrape (see the sketch after this list)
- Use date filters to scrape only recent posts instead of full archives -- saves time and money
- Comments are optional -- each post with comments requires an extra API call, so only enable them when needed
- Paid posts return all metadata (title, wordcount, reactions), but `bodyHtml` will be empty
- Custom domains work the same as `*.substack.com` URLs -- just paste the full URL
- Use `maxPostsPerNewsletter: 0` for unlimited archive depth -- scrapes every post ever published
- Batch multiple newsletters in a single run by adding multiple URLs -- more efficient than running separate jobs
- Filter by content type to target only newsletter posts, podcast episodes, or threads separately
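As a concrete illustration of the first tip, here is a minimal two-phase sketch with the Python client: a cheap metadata-only survey first, then a full-content scrape restricted to a date range. It assumes an Apify API token and reuses the actor ID and parameters shown in the API section below:

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
actor = client.actor("automation-lab/substack-scraper")

# Phase 1: metadata-only survey of the full archive (cheapest per-post event)
survey = actor.call(run_input={
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPostsPerNewsletter": 0,      # full archive
    "includeContent": False,         # metadata only
})
posts = client.dataset(survey["defaultDatasetId"]).list_items().items
print(f"Archive has {len(posts)} posts")

# Phase 2: full content, but only for recent posts
recent = actor.call(run_input={
    "urls": ["https://www.lennysnewsletter.com"],
    "includeContent": True,
    "startDate": "2025-01-01",       # date filter keeps the run small and cheap
})
items = client.dataset(recent["defaultDatasetId"]).list_items().items
```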
Use with Claude AI (MCP)
This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
Setup for Claude Code
```bash
claude mcp add --transport http apify "https://mcp.apify.com"
```
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
{"mcpServers": {"apify": {"url": "https://mcp.apify.com"}}}
Example prompts
- "Scrape the last 30 posts from Lenny's Newsletter and summarize the main topics covered."
- "Get all free posts from this Substack newsletter published in 2025 with their engagement metrics."
- "Fetch the comments on this Substack post and tell me what readers think about the article."
Learn more in the Apify MCP documentation.
Integrations
Connect Substack Scraper with your existing tools and automate newsletter data workflows:
- Make -- Automate workflows triggered by new newsletter data
- Zapier -- Connect to 5,000+ apps
- Google Sheets -- Export directly to spreadsheets
- Slack -- Get notifications for new posts
- GitHub -- Trigger workflows on new data
- Webhooks -- Send data to any endpoint
You can also schedule the scraper to run automatically at regular intervals. Set up a schedule to monitor newsletters for new posts daily or weekly.
Programmatic access via API
Node.js:
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/substack-scraper').call({
    urls: ['https://www.lennysnewsletter.com'],
    maxPostsPerNewsletter: 50,
    includeContent: true,
    includeComments: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python:
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/substack-scraper').call(run_input={
    'urls': ['https://www.lennysnewsletter.com'],
    'maxPostsPerNewsletter': 50,
    'includeContent': True,
    'includeComments': False,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```
cURL:
curl "https://api.apify.com/v2/acts/automation-lab~substack-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \-X POST -H "Content-Type: application/json" \-d '{"urls": ["https://www.lennysnewsletter.com"], "maxPostsPerNewsletter": 50, "includeContent": true}'
Is it legal to scrape Substack?
Substack Scraper accesses only publicly available data through Substack's public API endpoints. It does not bypass any authentication, paywalls, or access controls. Paywalled content metadata (title, wordcount, engagement metrics) is returned, but the full body text of paid posts is not extracted.
Key points:
- The scraper uses the same public API that powers Substack's own website
- No login credentials are required or used
- Paywalled post content is not accessed or extracted
- The scraper respects rate limits with polite delays between requests
- Extracted data should be used in compliance with applicable laws and Substack's Terms of Service
Users are responsible for ensuring their specific use case complies with local data protection regulations (GDPR, CCPA, etc.) and Substack's Terms of Service. For more information, see the Apify blog on web scraping legality.
FAQ
How fast is the scraper?
Very fast. 50 posts (metadata only) complete in ~3 seconds, 50 posts with full content in ~5 seconds, and full archives of 500+ posts finish in under 30 seconds. No browser or proxy overhead.
Can I scrape paid/paywalled posts?
You get all metadata for paid posts (title, subtitle, wordcount, reactions, comments count) but bodyHtml will be empty since content access requires an active subscription.
Does it work with custom domains?
Yes. Enter the full URL (e.g., https://www.lennysnewsletter.com) and the scraper auto-detects it as a Substack newsletter.
How many posts can I scrape?
There is no limit. Set maxPostsPerNewsletter: 0 to scrape the complete archive. This is the only Substack scraper on Apify with unlimited archive depth.
Does it extract comments?
Yes. Set includeComments: true to get full comment threads with nested replies, author info, and reaction counts. Each post with comments requires one extra API call.
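Because replies are nested inside each comment, flattening a thread takes a small recursive walk. A minimal sketch, assuming the comment structure shown in the output example above (each comment carries a `replies` array):

```python
def flatten_comments(comments, depth=0):
    """Yield (depth, comment) pairs for a nested Substack comment thread."""
    for comment in comments:
        yield depth, comment
        # replies sit in a nested "replies" array on each comment
        yield from flatten_comments(comment.get("replies", []), depth + 1)

# `post` would be one dataset item scraped with includeComments enabled;
# a tiny hard-coded stand-in is used here so the sketch runs on its own
post = {"comments": [
    {"name": "Jack Cohen", "body": "This article creates a whole new paradigm...",
     "replies": [{"name": "Tal Raviv", "body": "Thanks Jack!", "replies": []}]},
]}

for depth, c in flatten_comments(post["comments"]):
    print("  " * depth + f'{c["name"]}: {c["body"]}')
```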
What about rate limits?
Substack's public API has no detected rate limits. The scraper adds a polite delay between requests to be respectful.
The bodyHtml field is empty for some posts.
This is expected for paid/paywalled posts. The scraper returns all metadata (title, wordcount, reactions) but cannot access content behind a paywall without an active subscription.
The scraper fails with "not a valid Substack" for a custom domain.
Make sure you are using the full URL including https://. The scraper auto-detects Substack newsletters on custom domains, but the URL must be reachable and point to a valid Substack publication.
Can I scrape multiple newsletters at once?
Yes. Add multiple URLs to the urls input field. The scraper processes them sequentially and outputs all posts to a single dataset.
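If you batch several newsletters in one run, the combined dataset can be split back out by publication after download. A minimal sketch keyed on the `publicationName` field from the output example (the stand-in records below stand in for items returned by the dataset API):

```python
from collections import defaultdict

# `items` would come from client.dataset(run["defaultDatasetId"]).list_items().items;
# two stand-in records are used here so the sketch runs on its own
items = [
    {"publicationName": "Lenny's Newsletter", "title": "How to build AI product sense"},
    {"publicationName": "Example Newsletter", "title": "Some other post"},
]

by_publication = defaultdict(list)
for post in items:
    by_publication[post["publicationName"]].append(post)

for name, posts in by_publication.items():
    print(f"{name}: {len(posts)} posts")
```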
What URL formats are accepted?
The scraper accepts homepage URLs (https://newsletter.substack.com), custom domains (https://www.lennysnewsletter.com), individual post URLs, and /archive URLs. All formats are auto-detected.
Related scrapers
- Dev.to Scraper -- Scrape articles and profiles from Dev.to
- TechCrunch Scraper -- Extract articles and news from TechCrunch
- Hacker News Scraper -- Extract posts and comments from Hacker News
- Reddit Scraper -- Scrape Reddit posts, comments, and subreddit data