Substack Scraper
Scrape Substack newsletters — posts with full content, comments with nested replies, and publication metadata. Unlimited archive depth, no proxy needed. Export to JSON, CSV, Excel.
What does Substack Scraper do?
Substack Scraper extracts data from any Substack newsletter — posts with full HTML content, comments with nested replies, and publication metadata including subscriber counts. It supports unlimited archive depth (no 12-post cap), works with both *.substack.com and custom domain newsletters, and exports to JSON, CSV, Excel, or connects via API.
Unlike other scrapers, this actor uses Substack's public JSON API directly — no browser, no proxy, 100% success rate.
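To illustrate what "public JSON API, no browser" means in practice, here is a minimal sketch of that style of request. It assumes Substack publications expose an archive endpoint at `/api/v1/archive` with fields such as `title` and `canonical_url`; the endpoint name and fields are assumptions for illustration, not documentation of this Actor's internals.

```python
import requests

# Illustrative only: the endpoint path and response fields below are assumptions
# about Substack's public JSON API, not a description of this Actor's code.
BASE = "https://www.lennysnewsletter.com"

resp = requests.get(
    f"{BASE}/api/v1/archive",
    params={"sort": "new", "offset": 0, "limit": 12},
    timeout=30,
)
resp.raise_for_status()

for post in resp.json():
    # Field names here are assumptions about the raw Substack payload.
    print(post.get("title"), "-", post.get("canonical_url"))
```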
Use cases
- Content analysis — Download full newsletter archives for content audits, topic analysis, or AI training datasets
- Market research — Track subscriber counts, posting frequency, and engagement metrics across multiple newsletters
- Lead generation — Extract author profiles, social links, and publication metadata for B2B outreach
- Competitor monitoring — Monitor competing newsletters for new posts, engagement trends, and pricing changes
- Academic research — Build datasets of newsletter content with comments for sentiment analysis or discourse studies
Why use Substack Scraper?
- Unlimited archive depth — Scrape the complete archive of any newsletter. No 12-post cap like the market leader
- 100% success rate — Uses Substack's public JSON API. No anti-bot, no proxy needed, no failures
- Full comment threads — Extract comments with nested replies, reaction counts, and author metadata
- Publication metadata — Subscriber counts, pricing plans, author info, and 100+ publication fields
- No proxy cost — Direct API access means zero proxy fees. Runs on minimal 256MB memory
- Clean pay-per-event pricing — No hidden start fees or completion charges. Pay only for results
- 66+ fields per post — The richest output of any Substack scraper on Apify Store
- Custom domain support — Works with both newsletter.substack.com and custom domains like www.lennysnewsletter.com
What data can you extract?
Per post (30+ fields):
| Field | Description |
|---|---|
| title, subtitle, slug | Post title, subtitle, and URL slug |
| url | Full canonical URL |
| publishedAt, updatedAt | Publication and update timestamps |
| postType | newsletter, podcast, or thread |
| audience, isPaid | Paywall status (everyone or only_paid) |
| bodyHtml | Full HTML content (free posts) |
| wordcount | Total word count (even for paid posts) |
| coverImage | Cover image URL |
| tags | Post tags/categories |
| reactionCount, commentCount, restacks | Engagement metrics |
| authorName, authorHandle, authorBio | Author information |
| publicationName, subscriberCount | Newsletter metadata |
Per comment (12 fields): body, date, name, handle, reactionCount, isAuthor, isPinned, nested replies
Per publication: name, subscriberCount, baseUrl, paymentsEnabled, logoUrl, heroText, language
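Because replies can nest, a small recursive walk is the simplest way to flatten a post's comment tree, for example before a CSV export. A minimal sketch, assuming the comments/replies structure shown in the output example below:

```python
def flatten_comments(comments, depth=0):
    """Yield (depth, comment) pairs for a nested comments/replies tree."""
    for comment in comments or []:
        yield depth, comment
        # Each comment carries its own "replies" list; recurse into it.
        yield from flatten_comments(comment.get("replies"), depth + 1)

# Tiny illustrative record using the field names from the output example.
post = {
    "comments": [
        {"name": "Jack Cohen", "body": "Great post", "replies": [
            {"name": "Tal Raviv", "body": "Thanks Jack!", "replies": []},
        ]},
    ],
}

for depth, comment in flatten_comments(post["comments"]):
    print("  " * depth + f'{comment["name"]}: {comment["body"]}')
```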
How much does it cost to scrape Substack?
This Actor uses pay-per-event pricing — you pay only for what you scrape. No monthly subscription. All platform costs are included.
| Event | Free plan | Starter ($49/mo) | Scale ($499/mo) |
|---|---|---|---|
| Start | $0.005 | $0.004 | $0.003 |
| Per post (metadata) | $0.001 | $0.0008 | $0.0006 |
| Per post (with content) | $0.002 | $0.0017 | $0.0014 |
| Per comment | $0.0005 | $0.0004 | $0.0003 |
Real-world cost examples:
| Scenario | Results | Duration | Cost (Free tier) |
|---|---|---|---|
| 1 newsletter, 50 posts (metadata) | 50 posts | ~3s | ~$0.06 |
| 1 newsletter, 50 posts (with content) | 50 posts | ~5s | ~$0.11 |
| 1 newsletter, 50 posts + comments | 50 posts + ~200 comments | ~15s | ~$0.21 |
| 1 newsletter, full archive (500 posts) | 500 posts | ~30s | ~$1.01 |
| 5 newsletters, 100 posts each | 500 posts | ~60s | ~$1.03 |
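To sanity-check a budget before a run, the per-event prices above combine into a simple back-of-the-envelope formula. A minimal sketch using the Free-plan prices; it assumes the start event is charged once per newsletter, which is how the five-newsletter row above appears to be calculated:

```python
# Free-plan event prices from the table above (USD).
START = 0.005
POST_METADATA = 0.001
POST_WITH_CONTENT = 0.002
COMMENT = 0.0005

def estimate_cost(newsletters, posts, comments=0, with_content=True):
    """Rough cost estimate; assumes one start event per newsletter."""
    per_post = POST_WITH_CONTENT if with_content else POST_METADATA
    return newsletters * START + posts * per_post + comments * COMMENT

print(estimate_cost(1, 50, with_content=False))  # ~0.055 -> ~$0.06
print(estimate_cost(1, 50))                      # ~0.105 -> ~$0.11
print(estimate_cost(1, 50, comments=200))        # ~0.205 -> ~$0.21
print(estimate_cost(5, 100 * 5))                 # ~1.025 -> ~$1.03
```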
How to scrape Substack newsletters
- Go to the Substack Scraper page on Apify Store
- Enter one or more newsletter URLs (e.g., https://www.lennysnewsletter.com)
- Choose your output options (content, comments, publication info)
- Set filters if needed (date range, content type, free posts only)
- Click Start and wait for results
- Download your data in JSON, CSV, Excel, or connect via API
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | array | required | Substack newsletter URLs. Accepts homepage, custom domain, post URLs, or /archive URLs |
| maxPostsPerNewsletter | integer | 100 | Max posts per newsletter. 0 = unlimited (full archive) |
| includeContent | boolean | true | Include full HTML body. Disable for metadata-only (faster, cheaper) |
| includeComments | boolean | false | Fetch comments for each post. Adds one API call per post |
| includePublicationInfo | boolean | true | Include newsletter metadata (subscriber count, pricing, author) |
| contentType | string | all | Filter: all, newsletter, podcast, or thread |
| startDate | string | — | Only posts after this date (YYYY-MM-DD) |
| endDate | string | — | Only posts before this date (YYYY-MM-DD) |
| onlyFree | boolean | false | Only include free posts. Skip paywalled content |
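Putting the parameters together, a typical input for a scoped run might look like this (the URL and dates are placeholders):

```json
{
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPostsPerNewsletter": 0,
    "includeContent": true,
    "includeComments": false,
    "includePublicationInfo": true,
    "contentType": "newsletter",
    "startDate": "2025-01-01",
    "onlyFree": true
}
```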
Output example
{"postId": 186226252,"title": "How to build AI product sense","subtitle": "The secret is using Cursor for non-technical work","slug": "how-to-build-ai-product-sense","url": "https://www.lennysnewsletter.com/p/how-to-build-ai-product-sense","publishedAt": "2026-02-03T13:45:58.303Z","updatedAt": "2026-02-04T17:29:56.949Z","postType": "newsletter","audience": "everyone","isPaid": false,"wordcount": 5867,"coverImage": "https://substackcdn.com/image/fetch/...","tags": ["AI"],"reactionCount": 298,"commentCount": 31,"childCommentCount": 15,"restacks": 20,"hasVoiceover": false,"bodyHtml": "<div class=\"body markup\">...</div>","authorName": "Tal Raviv","authorHandle": "talsraviv","publicationName": "Lenny's Newsletter","subscriberCount": "1,100,000","comments": [{"id": 209331673,"body": "This article creates a whole new paradigm for learning...","date": "2026-02-03T15:34:25.318Z","name": "Jack Cohen","handle": "jackcohen10","reactionCount": 9,"isAuthor": false,"replies": [{"id": 209340123,"body": "Thanks Jack!","name": "Tal Raviv","isAuthor": true,"replies": []}]}],"scrapedAt": "2026-02-06T02:07:09.750Z"}
Tips for best results
- Start with metadata-only (includeContent: false) to quickly survey a newsletter's archive before doing a full content scrape (see the two-pass sketch after this list)
- Use date filters to scrape only recent posts instead of full archives — saves time and money
- Comments are optional — each post with comments requires an extra API call, so only enable when needed
- Paid posts return all metadata (title, wordcount, reactions) but bodyHtml will be empty
- Custom domains work the same as *.substack.com URLs — just paste the full URL
- Use maxPostsPerNewsletter: 0 for unlimited archive depth — scrapes every post ever published
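As a concrete illustration of the first tip, here is one way to run the survey-then-scrape workflow with the Apify Python client. The actor ID and input parameters match the examples below; the control flow itself is only a sketch.

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')
actor = client.actor('automation-lab/substack-scraper')

# Pass 1: cheap metadata-only survey of the full archive.
survey = actor.call(run_input={
    'urls': ['https://www.lennysnewsletter.com'],
    'maxPostsPerNewsletter': 0,
    'includeContent': False,
})
posts = client.dataset(survey['defaultDatasetId']).list_items().items

# Decide what is worth a full scrape, e.g. only posts from 2025 onwards.
recent = [p for p in posts if p.get('publishedAt', '') >= '2025-01-01']
print(f'{len(recent)} of {len(posts)} posts are recent enough for pass 2')

# Pass 2: full content (and comments) only for the window you care about.
full = actor.call(run_input={
    'urls': ['https://www.lennysnewsletter.com'],
    'startDate': '2025-01-01',
    'includeContent': True,
    'includeComments': True,
})
full_items = client.dataset(full['defaultDatasetId']).list_items().items
```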
Integrations
Connect Substack Scraper with your existing tools:
- Make — Automate workflows triggered by new newsletter data
- Zapier — Connect to 5,000+ apps
- Google Sheets — Export directly to spreadsheets
- Slack — Get notifications for new posts
- GitHub — Trigger workflows on new data
- Webhooks — Send data to any endpoint
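As a minimal example of the webhook option, you can forward a finished run's dataset to any HTTP endpoint yourself; Apify also supports attaching webhooks to runs natively from the platform UI. The endpoint URL below is a placeholder.

```python
import requests
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')
run = client.actor('automation-lab/substack-scraper').call(run_input={
    'urls': ['https://www.lennysnewsletter.com'],
    'maxPostsPerNewsletter': 10,
})
items = client.dataset(run['defaultDatasetId']).list_items().items

# Forward the scraped posts to your own endpoint (placeholder URL).
requests.post('https://example.com/substack-webhook', json=items, timeout=30)
```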
Using the Apify API
Node.js:
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/substack-scraper').call({
    urls: ['https://www.lennysnewsletter.com'],
    maxPostsPerNewsletter: 50,
    includeContent: true,
    includeComments: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python:
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/substack-scraper').call(run_input={
    'urls': ['https://www.lennysnewsletter.com'],
    'maxPostsPerNewsletter': 50,
    'includeContent': True,
    'includeComments': False,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```
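If you want the CSV or Excel export programmatically rather than from the UI, the run's default dataset can be fetched in those formats from the Apify dataset items endpoint. A sketch, assuming you already have the dataset ID from a finished run; adjust `format` (e.g. csv, xlsx) as needed.

```python
import requests

DATASET_ID = 'YOUR_DATASET_ID'  # run['defaultDatasetId'] from the call above
TOKEN = 'YOUR_API_TOKEN'

# The dataset items endpoint serves alternative formats such as CSV directly.
resp = requests.get(
    f'https://api.apify.com/v2/datasets/{DATASET_ID}/items',
    params={'format': 'csv', 'token': TOKEN},
    timeout=60,
)
resp.raise_for_status()

with open('substack_posts.csv', 'wb') as f:
    f.write(resp.content)
```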
FAQ
How fast is the scraper?
Very fast. 50 posts (metadata only) complete in ~3 seconds. 50 posts with full content in ~5 seconds. Full archives of 500+ posts finish in under 30 seconds. No browser or proxy overhead.
Can I scrape paid/paywalled posts?
You get all metadata for paid posts (title, subtitle, wordcount, reactions, comments count) but bodyHtml will be empty since content access requires an active subscription.
Does it work with custom domains?
Yes. Enter the full URL (e.g., https://www.lennysnewsletter.com) and the scraper auto-detects it as a Substack newsletter.
How many posts can I scrape?
There is no limit. Set maxPostsPerNewsletter: 0 to scrape the complete archive. This is the only Substack scraper on Apify with unlimited archive depth.
Does it extract comments?
Yes. Set includeComments: true to get full comment threads with nested replies, author info, and reaction counts. Each post with comments requires one extra API call.
What about rate limits?
No rate limits have been detected on Substack's public API. The scraper still adds a polite delay between requests to be respectful.
Related scrapers
- Reddit Scraper — Scrape Reddit posts, comments, and subreddit data
- YouTube Transcript Scraper — Extract transcripts from YouTube videos