Pricing

from $0.30 / 1,000 results

Substack Scraper – Posts, Comments & Notes

Scrape every Substack surface in one actor — full posts (50+ fields, complete article HTML), nested comment threads, emoji reaction breakdown, Substack Notes, restacker identity, multi-byline authors, custom domains. Direct JSON API + RSS, no browser, no Cloudflare. From $0.30 per 1,000 posts.

Developer: Sourabh Kumar

Issues response: 8.1 hours

Last modified: 20 days ago

Why this scraper

Substack doesn't publish a content API. Most actors on the Apify Store fight Cloudflare with a headless browser and split the surface across 3 to 6 separate scrapers — one for posts, one for comments, one for Notes, one for the leaderboard. This actor talks directly to Substack's internal JSON endpoints with plain HTTP. One actor, one price, every surface.

| Concern | Browser-based scrapers | Internal-API scrapers (this one) |
| --- | --- | --- |
| Setup time | Minutes (proxy + fingerprint config) | Seconds (paste a URL) |
| Price per 1,000 posts | $2 to $20 | $0.30 |
| Reliability | Cloudflare blocks happen | No anti-bot to fight |
| Comment threads | Often a separate paid actor | Included, full nested tree |
| Reactor + restacker identity | Nobody exposes this | Optional opt-in, included |
| Notes scraping | Often a separate actor | Built in, same actor |
| Fields per post | 5 to 25 | 50+ |

Plain HTTP isn't risk-free though, so the actor retries 429 responses with exponential backoff that honors the Retry-After header, and offers an opt-in proxyConfiguration field for very large runs. With Apify Proxy on, each request rotates through a fresh IP, so Substack's per-IP rate limit is much less likely to slow the run.
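The retry behavior described above could look roughly like the sketch below. It is an illustration only, not the actor's actual code: the function name `retry_delay`, the 1-second base, the 60-second cap, and the use of full jitter are all assumptions, and only the delay-in-seconds form of Retry-After is parsed.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=True):
    """Seconds to wait before retry number `attempt` (0-based) after an HTTP 429.

    A numeric Retry-After value takes priority over the computed backoff.
    The defaults are illustrative assumptions, not the actor's real settings.
    """
    if retry_after is not None:
        try:
            return max(float(retry_after), 0.0)
        except (TypeError, ValueError):
            pass  # HTTP-date form or unparsable value: fall back to backoff
    # Exponential growth: base, 2*base, 4*base, ... limited by the cap.
    delay = min(base * (2 ** attempt), cap)
    if jitter:
        # Full jitter keeps concurrent requests from retrying in lockstep.
        delay = random.uniform(0.0, delay)
    return delay
```

A caller would sleep for `retry_delay(attempt, response_headers.get("Retry-After"))` seconds before re-sending the request, where `response_headers` is whatever header mapping the HTTP client provides.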

What data can you extract?

📝 Article HTML + plain text + ProseMirror `bodyJson`	💬 Full nested comment threads (opt-in)	❤️ Reactions object keyed by emoji	👥 Reactor + restacker identity (opt-in)
📊 Reaction count, comment count, restack count	👤 Multi-byline guest authors with their pubs	🔓 Paywall preview text + meter taxonomy	📅 Publish date, update date, archive nav slugs
🎧 Voiceover + auto-TTS audio + podcast URL	🔍 SEO title + description + social title	📝 Substack Notes feed by author handle	⏱️ Reading time, word count, tags

URL mode (post scraping)

Paste any Substack URL — custom domain or .substack.com subdomain — and get every post with full HTML, plain text, structured bodyJson, reading time, multi-byline guest authors, full paywall taxonomy, SEO fields, audio renditions, and archive navigation slugs.

Two opt-in flags add depth:

includeComments: true — full nested comment tree per post in one HTTP call. Each comment carries its own emoji-keyed reactions, ProseMirror bodyJson, link-preview attachments, depth + parentId for tree reconstruction, and pinned/edited/deleted flags.
includeFacepile: true — names, handles, and primary publication of every user who reacted to or restacked the post. Useful for influence analysis. No other Substack scraper exposes this.

Notes mode (Substack Notes scraping)

Pass notesHandles instead of (or alongside) urls to scrape any author's Notes feed. Each Note record carries body text, structured bodyJson, attachments (link previews, image embeds), emoji-keyed reactions, restack count, and reply count.

How to scrape Substack: step by step

Create a free Apify account. Takes 30 seconds, no card.
Open Substack Newsletter Scraper in the Apify Console.
Paste newsletter URLs into urls (URL mode), or pass author handles in notesHandles for Notes mode. Tick includeComments if you want comment threads.
Click Start. A typical 50-post run finishes in under 10 seconds.
Export results as JSON, CSV, or Excel — or fetch via the Apify API.

How much does Substack Newsletter Scraper cost?

Per 1,000 posts, comments, or Notes: $0.30
Per 1,000 reactor/restacker entries (opt-in includeFacepile): $0.20
Free-plan yield: roughly 16,000 results per month on the $5 free credit
Starter-plan yield: about 96,000 results per month on the $29 Starter plan

A 50-post run with includeComments=true and an average of about 10 comments per post comes to 50 + 500 = 550 billable items, or roughly $0.17. Adding includeFacepile=true at about 100 reactors per post adds 5,000 facepile entries, or about $1.00. The Actor Start event is $0.00005 per gigabyte at startup; for any real workload that's a rounding error.
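The arithmetic above can be turned into a small estimator for budgeting a run ahead of time. This is a sketch built from the listed rates; the function name `estimate_cost` and its parameters are made up for illustration, and the Actor Start event is left out.

```python
POST_RATE = 0.30 / 1000      # USD per post, comment, or Note
FACEPILE_RATE = 0.20 / 1000  # USD per reactor/restacker entry


def estimate_cost(posts, comments_per_post=0, facepile_per_post=0):
    """Rough run cost in USD, ignoring the Actor Start event."""
    billable_items = posts + posts * comments_per_post
    facepile_entries = posts * facepile_per_post
    return billable_items * POST_RATE + facepile_entries * FACEPILE_RATE


# The two scenarios from the paragraph above.
print(f"{estimate_cost(50, comments_per_post=10):.3f}")                         # 0.165
print(f"{estimate_cost(50, comments_per_post=10, facepile_per_post=100):.3f}")  # 1.165
```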

Pause whenever. There's no subscription lock-in.

Input

{
    "urls": [
        "https://newsletter.pragmaticengineer.com",
        "https://www.lennysnewsletter.com"
    ],
    "maxPosts": 50,
    "includeContent": true,
    "contentFormat": "both",
    "includeComments": false,
    "includeFacepile": false,
    "searchKeyword": null,
    "audienceFilter": "all",
    "typeFilter": "all",
    "dateFrom": null,
    "dateTo": null,
    "sortBy": "newest"
}

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `urls` | array | — | Newsletter URLs. Custom domains or `.substack.com` subdomains. Required for URL mode. |
| `maxPosts` | number | `50` | Max posts per newsletter. `0` means every post in the archive. |
| `includeContent` | boolean | `true` | Include `contentHtml` + `contentText`. Disable for fast metadata-only runs. |
| `contentFormat` | enum | `both` | `html`, `text`, or `both`. |
| `searchKeyword` | string | — | Filter the archive by keyword via Substack's `archive?search=` endpoint. Server-side filter, max 100 chars. |
| `audienceFilter` | enum | `all` | `all`, `free`, or `paid`. |
| `typeFilter` | enum | `all` | `all`, `newsletter`, `podcast`, `thread`, or `video`. |
| `includeComments` | boolean | `false` | Fetch full nested comment thread per post (one extra HTTP call per post). |
| `includeFacepile` | boolean | `false` | Fetch reactor + restacker identity per post (one extra HTTP call per post). |
| `dateFrom` / `dateTo` | string | — | YYYY-MM-DD bounds. |
| `sortBy` | enum | `newest` | `newest` or `oldest`. |
| `notesHandles` | array | — | Substack handles for Notes mode (e.g. `["lenny", "paulgraham"]`). At least one of `urls` or `notesHandles` is required. |
| `maxNotesPerHandle` | number | `50` | Max Notes per handle (1-1000). |
| `proxyConfiguration` | object | — | Optional Apify Proxy. Recommended for runs over 100 posts with `includeFacepile`, where Substack's per-IP rate limit otherwise eats wall time. Apify Proxy bandwidth is billed by Apify separately. |
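When inputs are generated by a script, it can help to check them against the constraints in the table before starting a run. The sketch below is a hypothetical client-side helper (`validate_input` is not part of the actor), and it only covers the rules the table states: at least one of `urls` or `notesHandles`, the enum values, the 100-character keyword limit, the 1-1000 Notes range, and the date format.

```python
from datetime import datetime

# Allowed values as listed in the input table.
ENUM_FIELDS = {
    "contentFormat": {"html", "text", "both"},
    "audienceFilter": {"all", "free", "paid"},
    "typeFilter": {"all", "newsletter", "podcast", "thread", "video"},
    "sortBy": {"newest", "oldest"},
}


def validate_input(run_input):
    """Return a list of problems found in an input dict (empty list means OK)."""
    problems = []
    if not run_input.get("urls") and not run_input.get("notesHandles"):
        problems.append("provide urls, notesHandles, or both")
    keyword = run_input.get("searchKeyword")
    if keyword is not None and len(keyword) > 100:
        problems.append("searchKeyword is limited to 100 characters")
    max_notes = run_input.get("maxNotesPerHandle")
    if max_notes is not None and not 1 <= max_notes <= 1000:
        problems.append("maxNotesPerHandle must be between 1 and 1000")
    for field, allowed in ENUM_FIELDS.items():
        value = run_input.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    for field in ("dateFrom", "dateTo"):
        value = run_input.get(field)
        if value is not None:
            try:
                datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                problems.append(f"{field} must be YYYY-MM-DD")
    return problems
```

Fields left out or set to `null` are treated as "use the default", which matches the sample input where `searchKeyword`, `dateFrom`, and `dateTo` are `null`.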

Notes mode input

{
    "notesHandles": ["lenny"],
    "maxNotesPerHandle": 50
}

Recipes

Ready-to-paste inputs for common jobs.

Pull the last 25 posts of a newsletter with full content

{
    "urls": ["https://newsletter.pragmaticengineer.com"],
    "maxPosts": 25,
    "includeContent": true
}

Scrape posts plus their full comment threads

{
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPosts": 25,
    "includeComments": true,
    "audienceFilter": "free"
}

Find every post about a topic in a newsletter

{
    "urls": ["https://newsletter.pragmaticengineer.com"],
    "searchKeyword": "AI",
    "maxPosts": 50,
    "includeContent": true
}

Get reactor + restacker identity for influence analysis

{
    "urls": ["https://www.lennysnewsletter.com"],
    "maxPosts": 10,
    "includeFacepile": true,
    "includeContent": false
}

Scrape Substack Notes for a list of authors

{
    "notesHandles": ["lenny", "paulgraham", "sahilbloom"],
    "maxNotesPerHandle": 50
}

Bulk archive crawl with proxy (avoid rate limits)

{
    "urls": ["https://newsletter.pragmaticengineer.com"],
    "maxPosts": 0,
    "includeFacepile": true,
    "proxyConfiguration": { "useApifyProxy": true }
}

Free posts only, sorted oldest first

{
    "urls": ["https://www.lennysnewsletter.com"],
    "audienceFilter": "free",
    "sortBy": "oldest",
    "maxPosts": 100
}

Output

Each post is one JSON record. Fields populated only by their corresponding flag (comments, facepile) are null when the flag is off.

{
    "id": 165204731,
    "title": "New: A free year of Cursor, Google AI Pro, Notion",
    "subtitle": "Subscriber perks for paid members",
    "slug": "new-a-free-year-of-cursor-google",
    "url": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google",
    "canonicalUrl": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google",
    "author": "Lenny Rachitsky",
    "authorHandle": "lenny",
    "authorImageUrl": "https://substackcdn.com/image/...",
    "authorBio": "Writing • Angel investing • Advising",
    "authorTwitter": "lennysan",
    "bylines": [
        {
            "id": 1849774,
            "name": "Lenny Rachitsky",
            "handle": "lenny",
            "photoUrl": "https://substackcdn.com/...",
            "bio": "Writing • Angel investing • Advising",
            "twitterHandle": "lennysan",
            "isGuest": false,
            "primaryPublicationName": "Lenny's Newsletter",
            "primaryPublicationUrl": "https://www.lennysnewsletter.com"
        }
    ],
    "publishedAt": "2026-04-21T15:30:00.000Z",
    "updatedAt": null,
    "contentHtml": "<p>Today I'm thrilled to share...</p>",
    "contentText": "Today I'm thrilled to share...",
    "bodyJson": { "type": "doc", "content": [/* ProseMirror tree */] },
    "wordCount": 1602,
    "readingTimeMinutes": 7,
    "description": "Subscriber perks for paid members",
    "socialTitle": null,
    "searchEngineTitle": "A free year of Cursor + Google AI Pro for subscribers",
    "searchEngineDescription": "Lenny's Newsletter subscriber perks include...",
    "coverImageUrl": "https://substackcdn.com/image/...",
    "coverImageIsExplicit": false,
    "audienceType": "everyone",
    "isPaywalled": false,
    "truncatedBodyText": null,
    "meterType": null,
    "freeUnlockRequired": false,
    "teaserPostEligible": false,
    "isGeoblocked": false,
    "hasCashtag": false,
    "reactionCount": 293,
    "reactions": { "❤": 293 },
    "commentCount": 33,
    "childCommentCount": 20,
    "restackCount": 9,
    "tags": ["AI", "Tools"],
    "type": "newsletter",
    "hasAudio": false,
    "audioUrl": null,
    "audioItems": [],
    "podcastUrl": null,
    "podcastDuration": null,
    "previousPostSlug": "your-couch-to-5k-for-ai",
    "nextPostSlug": "a-visual-guide-to-getting-out-of",
    "newsletter": {
        "name": "Lenny's Newsletter",
        "description": "Deeply researched product, growth, and career advice",
        "url": "https://www.lennysnewsletter.com"
    },
    "comments": null,
    "facepile": null
}

When includeComments is on, comments is a flat array of records shaped like the one below. Each record's parentId and depth let you rebuild the nested thread:

{
    "id": 246915982,
    "parentId": null,
    "depth": 0,
    "bodyText": "Our infra is getting slammed, please bear with us...",
    "bodyJson": { "type": "doc", "content": [/* … */] },
    "authorId": 1849774,
    "authorName": "Lenny Rachitsky",
    "authorHandle": "lenny",
    "authorPhotoUrl": "https://substackcdn.com/...",
    "authorBestsellerTier": 10000,
    "publishedAt": "2026-04-21T15:56:31.307Z",
    "editedAt": "2026-04-21T16:00:11.789Z",
    "isPinned": true,
    "isDeleted": false,
    "reactionCount": 8,
    "reactions": { "❤": 8 },
    "restackCount": 0
}
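Rebuilding the thread from the flat array is a matter of grouping records under their `parentId`. The following is a minimal sketch that assumes every record has the `id` and `parentId` fields shown above; the function name `build_comment_tree` and the `replies` key it adds are illustrative and not part of the actor's output.

```python
def build_comment_tree(comments):
    """Nest a flat comment list using each record's id and parentId."""
    # Copy every record and give it an empty list for its children.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for c in comments:
        node = nodes[c["id"]]
        parent = nodes.get(c["parentId"])
        if parent is None:
            # Top-level comment, or a reply whose parent is not in the data.
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots
```

Because all nodes are indexed before any are linked, the function does not depend on parents appearing before their replies in the array.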

When includeFacepile is on, facepile.reactors[] and facepile.restackers[] look like:

{
    "id": 73273682,
    "name": "Miles Kohl",
    "handle": "mileskohl504716",
    "photoUrl": "https://substackcdn.com/...",
    "bio": null,
    "primaryPublicationName": "Miles' Substack",
    "primaryPublicationUrl": "https://mileskohl504716.substack.com",
    "bestsellerTier": null
}
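For influence analysis, one simple starting point is to count which publications the reactors and restackers come from. The sketch below assumes post records shaped like the output above, with `facepile` either `null` or an object holding `reactors` and `restackers` arrays; the helper name `top_reactor_publications` is hypothetical.

```python
from collections import Counter


def top_reactor_publications(posts, n=10):
    """Count primary publications across all reactors and restackers."""
    counts = Counter()
    for post in posts:
        facepile = post.get("facepile") or {}  # null when includeFacepile is off
        people = (facepile.get("reactors") or []) + (facepile.get("restackers") or [])
        for person in people:
            name = person.get("primaryPublicationName")
            if name:  # skip users without a publication
                counts[name] += 1
    return counts.most_common(n)
```

Note that a user who both reacted to and restacked the same post is counted twice here; deduplicating on the user `id` per post would avoid that if it matters for the analysis.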

In Notes mode each record has a different shape:

{
    "id": 216329331,
    "handle": "lenny",
    "authorName": "Lenny Rachitsky",
    "authorHandle": "lenny",
    "authorPhotoUrl": "https://substackcdn.com/...",
    "authorBio": "Writing • Angel investing • Advising",
    "bestsellerTier": 10000,
    "publishedAt": "2026-02-18T18:18:50.293Z",
    "bodyText": "I'm thrilled to welcome The Skip with @Nikhyl Singhal to Lenny's Podcast Network",
    "bodyJson": { "type": "doc", "content": [/* … */] },
    "reactionCount": 142,
    "reactions": { "❤": 130, "🔥": 12 },
    "restackCount": 5,
    "childrenCount": 8,
    "attachments": [/* link previews, embedded images */],
    "publicationName": "Lenny's Newsletter",
    "publicationUrl": "https://www.lennysnewsletter.com"
}

Field availability by mode

| Field group | URL mode (default) | URL + `includeComments` | URL + `includeFacepile` | Notes mode |
| --- | --- | --- | --- | --- |
| Post identity (id, title, slug, url) | ✅ | ✅ | ✅ | — |
| Author (name, handle, photo, bylines) | ✅ | ✅ | ✅ | ✅ (single author) |
| Content (HTML, text, bodyJson, wordCount) | ✅ | ✅ | ✅ | bodyText + bodyJson only |
| Engagement (reactionCount, reactions emoji map, commentCount, restackCount) | ✅ | ✅ | ✅ | reactionCount + restackCount + childrenCount |
| Paywall taxonomy | ✅ | ✅ | ✅ | — |
| SEO + navigation (search engine fields, prev/next slug) | ✅ | ✅ | ✅ | — |
| Audio + podcast | ✅ | ✅ | ✅ | — |
| `comments` array | `null` | ✅ full nested tree | `null` | — |
| `facepile.reactors` + `facepile.restackers` | `null` | `null` | ✅ | — |
| Note `attachments` (link previews, embeds) | — | — | — | ✅ |

FAQ

How much does Substack Newsletter Scraper cost?

Substack Newsletter Scraper uses pay-per-result pricing. Posts, comments, and Notes are $0.30 per 1,000 items. Optional includeFacepile reactor/restacker entries are $0.20 per 1,000. The Apify Free plan gives you $5 in usage credits a month, enough for around 16,000 results. If you run regularly, the $29/month Starter plan covers about 96,000 results.

No subscription lock-in. Pause whenever.

Is it legal to scrape Substack?

Scraping public data is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches publicly accessible Substack pages (no login, no paywall bypass). How you use the output is on you.

Apify's full breakdown: "Is web scraping legal?"

Can I integrate Substack Newsletter Scraper with other tools?

Yes. Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: Apify integrations.

Can I use Substack Newsletter Scraper with the Apify API?

Yes. You can start runs and fetch their results through the Apify REST API:

curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~substack-scraper/runs?token=APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://newsletter.pragmaticengineer.com"],"maxPosts":25,"includeComments":true}'

Docs: Apify API reference.
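The same call can be made from Python using only the standard library. The sketch below mirrors the curl command above (same endpoint, actor ID, and token query parameter); the helper name `build_run_request` is made up for illustration, and it only builds the request so you can inspect it before sending.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "sourabhbgp~substack-scraper"  # taken from the curl example above


def build_run_request(run_input, token):
    """Build (but do not send) the POST request that starts an actor run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To start the run, pass the returned object to `urllib.request.urlopen()` and parse the JSON response. Keeping the token in an environment variable rather than in source code is a sensible default.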

Can I use Substack Newsletter Scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call Substack Newsletter Scraper directly. Setup: Apify MCP docs.

Your feedback

Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.
