Substack Scraper – Posts, Comments & Notes

Pricing

from $0.30 / 1,000 results

Scrape every Substack surface in one actor — full posts (50+ fields, complete article HTML), nested comment threads, emoji reaction breakdown, Substack Notes, restacker identity, multi-byline authors, custom domains. Direct JSON API + RSS, no browser, no Cloudflare. From $0.30 per 1,000 posts.

Rating

0.0

(0)

Developer

Sourabh Kumar

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 14
Monthly active users: 7
Last modified: 3 days ago

Scrape any Substack newsletter for posts, full article content, comment threads, Substack Notes, and reactor identity — talking to Substack's own JSON endpoints, not fighting a browser. Custom domains and .substack.com subdomains both work. No login, no proxy required (but optional Apify Proxy is supported for very large runs).

$0.30 per 1,000 results. Each scraped post, comment, or Note counts as one result. Reactor and restacker identity (opt-in via includeFacepile) is $0.20 per 1,000 entries.

Designed for the global Substack community: most large publications are English, but the actor handles UTF-8 cleanly so non-English newsletters (German, Spanish, Arabic, Japanese, Hebrew) come through with original characters intact.

Why this scraper

Substack doesn't publish a content API. Most actors on the Apify Store fight Cloudflare with a headless browser and split the surface across 3 to 6 separate scrapers — one for posts, one for comments, one for Notes, one for the leaderboard. This actor talks directly to Substack's internal JSON endpoints with plain HTTP. One actor, one price, every surface.

| Concern | Browser-based scrapers | Internal-API scrapers (this one) |
| --- | --- | --- |
| Setup time | Minutes (proxy + fingerprint config) | Seconds (paste a URL) |
| Price per 1,000 posts | $2 to $20 | $0.30 |
| Reliability | Cloudflare blocks happen | No anti-bot to fight |
| Comment threads | Often a separate paid actor | Included, full nested tree |
| Reactor + restacker identity | Nobody exposes this | Optional opt-in, included |
| Notes scraping | Often a separate actor | Built in, same actor |
| Fields per post | 5 to 25 | 50+ |

Plain HTTP isn't risk-free, though, so the actor adds Retry-After-aware exponential backoff for 429 responses and an opt-in proxyConfiguration field for very large runs. With Apify Proxy on, each request rotates through a fresh IP, so Substack's per-IP rate limits are far less likely to slow a run.
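
The actor's own retry code isn't shown here, so the following is only a minimal sketch of what Retry-After-aware exponential backoff can look like. The names retry_delay and fetch_with_backoff, the (status, headers, body) response shape, and the base, cap, and jitter values are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's hint when Retry-After is a plain number of seconds.
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # The HTTP-date form is not handled here; use backoff instead.
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(fetch: Callable[[], Response], max_retries: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `fetch` until it returns a non-429 status or retries run out."""
    for attempt in range(max_retries + 1):
        status, headers, body = fetch()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        # Header lookup is case-sensitive for a plain dict in this sketch.
        delay = retry_delay(attempt, headers.get("Retry-After"))
        # Small random jitter so parallel workers do not retry in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the waiting logic testable without real delays; any HTTP client can be wrapped in the fetch callable.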

What data can you extract?

  • 📝 Article HTML + plain text + ProseMirror bodyJson
  • 💬 Full nested comment threads (opt-in)
  • ❤️ Reactions object keyed by emoji
  • 👥 Reactor + restacker identity (opt-in)
  • 📊 Reaction count, comment count, restack count
  • 👤 Multi-byline guest authors with their pubs
  • 🔓 Paywall preview text + meter taxonomy
  • 📅 Publish date, update date, archive nav slugs
  • 🎧 Voiceover + auto-TTS audio + podcast URL
  • 🔍 SEO title + description + social title
  • 📝 Substack Notes feed by author handle
  • ⏱️ Reading time, word count, tags

URL mode (post scraping)

Paste any Substack URL — custom domain or .substack.com subdomain — and get every post with full HTML, plain text, structured bodyJson, reading time, multi-byline guest authors, full paywall taxonomy, SEO fields, audio renditions, and archive navigation slugs.

Two opt-in flags add depth:

  • includeComments: true fetches the full nested comment tree per post in one HTTP call. Each comment carries its own emoji-keyed reactions, ProseMirror bodyJson, link-preview attachments, depth + parentId for tree reconstruction, and pinned/edited/deleted flags.
  • includeFacepile: true returns the names, handles, and primary publication of every user who reacted to or restacked the post. Useful for influence analysis. No other Substack scraper exposes this.

Notes mode (Substack Notes scraping)

Pass notesHandles instead of (or alongside) urls to scrape any author's Notes feed. Each Note record carries body text, structured bodyJson, attachments (link previews, image embeds), emoji-keyed reactions, restack count, and reply count.

How to scrape Substack: step by step

  1. Create a free Apify account. Takes 30 seconds, no card.
  2. Open Substack Newsletter Scraper in the Apify Console.
  3. Paste newsletter URLs into urls (URL mode), or pass author handles in notesHandles for Notes mode. Tick includeComments if you want comment threads.
  4. Click Start. A typical 50-post run finishes in under 10 seconds.
  5. Export results as JSON, CSV, or Excel — or fetch via the Apify API.

How much does Substack Newsletter Scraper cost?

  • Per 1,000 posts, comments, or Notes: $0.30
  • Per 1,000 reactor/restacker entries (opt-in includeFacepile): $0.20
  • Free-plan yield: roughly 16,000 results per month on the $5 free credit
  • Starter-plan yield: about 96,000 results per month on the $29 Starter plan

A 50-post run with includeComments=true, averaging about 10 comments per post, produces 50 + 500 = 550 billable items, or roughly $0.17. Adding includeFacepile=true at an average of 100 reactors per post adds 50 × 100 = 5,000 facepile entries, or about $1.00. The Actor Start event is $0.00005 per gigabyte at startup; for any real workload that's rounding error.
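
The arithmetic above can be reproduced with a small estimator. This is a rough sketch that uses only the two per-1,000 rates quoted in this section; the function name estimate_cost and its parameters are made up for illustration, and it ignores the Actor Start event and any proxy bandwidth.

```python
POST_RATE_USD = 0.30 / 1000      # per post, comment, or Note
FACEPILE_RATE_USD = 0.20 / 1000  # per reactor/restacker entry


def estimate_cost(posts: int, comments_per_post: float = 0.0,
                  facepile_per_post: float = 0.0) -> dict:
    """Estimate billable items and cost in USD for one run."""
    # Every post and every comment counts as one billable result.
    results = posts + posts * comments_per_post
    # Facepile entries are billed separately at their own rate.
    facepile_entries = posts * facepile_per_post
    return {
        "results": results,
        "facepile_entries": facepile_entries,
        "results_usd": round(results * POST_RATE_USD, 4),
        "facepile_usd": round(facepile_entries * FACEPILE_RATE_USD, 4),
    }


if __name__ == "__main__":
    # The worked example: 50 posts, ~10 comments and 100 reactors per post.
    print(estimate_cost(50, comments_per_post=10, facepile_per_post=100))
```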

Pause whenever. There's no subscription lock-in.

Input

{
  "urls": [
    "https://newsletter.pragmaticengineer.com",
    "https://www.lennysnewsletter.com"
  ],
  "maxPosts": 50,
  "includeContent": true,
  "contentFormat": "both",
  "includeComments": false,
  "includeFacepile": false,
  "searchKeyword": null,
  "audienceFilter": "all",
  "typeFilter": "all",
  "dateFrom": null,
  "dateTo": null,
  "sortBy": "newest"
}
| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| urls | array | | Newsletter URLs. Custom domains or .substack.com subdomains. Required for URL mode. |
| maxPosts | number | 50 | Max posts per newsletter. 0 means every post in the archive. |
| includeContent | boolean | true | Include contentHtml + contentText. Disable for fast metadata-only runs. |
| contentFormat | enum | both | html, text, or both. |
| searchKeyword | string | | Filter the archive by keyword via Substack's archive?search= endpoint. Server-side filter, max 100 chars. |
| audienceFilter | enum | all | all, free, or paid. |
| typeFilter | enum | all | all, newsletter, podcast, thread, or video. |
| includeComments | boolean | false | Fetch full nested comment thread per post (one extra HTTP call per post). |
| includeFacepile | boolean | false | Fetch reactor + restacker identity per post (one extra HTTP call per post). |
| dateFrom / dateTo | string | | YYYY-MM-DD bounds. |
| sortBy | enum | newest | newest or oldest. |
| notesHandles | array | | Substack handles for Notes mode (e.g. ["lenny", "paulgraham"]). Provide either urls or notesHandles. |
| maxNotesPerHandle | number | 50 | Max Notes per handle (1-1000). |
| proxyConfiguration | object | | Optional Apify Proxy. Recommended for runs over 100 posts with includeFacepile, where Substack's per-IP rate limit otherwise eats wall time. Apify Proxy bandwidth is billed by Apify separately. |
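
The constraints in the table above can be checked on the client before a run is started. The sketch below is a hypothetical pre-flight check, not the actor's own validation (which isn't documented here); check_input and its messages are invented for illustration, and only the limits stated in the table are enforced.

```python
import re
from typing import Any, Dict, List

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Allowed enum values, copied from the input table.
ENUMS = {
    "contentFormat": {"html", "text", "both"},
    "audienceFilter": {"all", "free", "paid"},
    "typeFilter": {"all", "newsletter", "podcast", "thread", "video"},
    "sortBy": {"newest", "oldest"},
}


def check_input(actor_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an input dict (empty if none)."""
    problems = []
    # At least one of the two modes has to be selected.
    if not actor_input.get("urls") and not actor_input.get("notesHandles"):
        problems.append("provide urls or notesHandles")
    keyword = actor_input.get("searchKeyword")
    if keyword is not None and len(keyword) > 100:
        problems.append("searchKeyword exceeds 100 characters")
    max_notes = actor_input.get("maxNotesPerHandle", 50)
    if not 1 <= max_notes <= 1000:
        problems.append("maxNotesPerHandle must be between 1 and 1000")
    for key in ("dateFrom", "dateTo"):
        value = actor_input.get(key)
        if value is not None and not DATE_RE.match(value):
            problems.append(f"{key} must be YYYY-MM-DD")
    for key, allowed in ENUMS.items():
        if key in actor_input and actor_input[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.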

Notes mode input

{
  "notesHandles": ["lenny"],
  "maxNotesPerHandle": 50
}

Recipes

Ready-to-paste inputs for common jobs.

Pull the last 25 posts of a newsletter with full content

{
  "urls": ["https://newsletter.pragmaticengineer.com"],
  "maxPosts": 25,
  "includeContent": true
}

Scrape posts plus their full comment threads

{
  "urls": ["https://www.lennysnewsletter.com"],
  "maxPosts": 25,
  "includeComments": true,
  "audienceFilter": "free"
}

Find every post about a topic in a newsletter

{
  "urls": ["https://newsletter.pragmaticengineer.com"],
  "searchKeyword": "AI",
  "maxPosts": 50,
  "includeContent": true
}

Get reactor + restacker identity for influence analysis

{
  "urls": ["https://www.lennysnewsletter.com"],
  "maxPosts": 10,
  "includeFacepile": true,
  "includeContent": false
}

Scrape Substack Notes for a list of authors

{
  "notesHandles": ["lenny", "paulgraham", "sahilbloom"],
  "maxNotesPerHandle": 50
}

Bulk archive crawl with proxy (avoid rate limits)

{
  "urls": ["https://newsletter.pragmaticengineer.com"],
  "maxPosts": 0,
  "includeFacepile": true,
  "proxyConfiguration": { "useApifyProxy": true }
}

Free posts only, sorted oldest first

{
  "urls": ["https://www.lennysnewsletter.com"],
  "audienceFilter": "free",
  "sortBy": "oldest",
  "maxPosts": 100
}

Output

Each post is one JSON record. Fields populated only by their corresponding flag (comments, facepile) are null when the flag is off.

{
  "id": 165204731,
  "title": "New: A free year of Cursor, Google AI Pro, Notion",
  "subtitle": "Subscriber perks for paid members",
  "slug": "new-a-free-year-of-cursor-google",
  "url": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google",
  "canonicalUrl": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google",
  "author": "Lenny Rachitsky",
  "authorHandle": "lenny",
  "authorImageUrl": "https://substackcdn.com/image/...",
  "authorBio": "Writing • Angel investing • Advising",
  "authorTwitter": "lennysan",
  "bylines": [
    {
      "id": 1849774,
      "name": "Lenny Rachitsky",
      "handle": "lenny",
      "photoUrl": "https://substackcdn.com/...",
      "bio": "Writing • Angel investing • Advising",
      "twitterHandle": "lennysan",
      "isGuest": false,
      "primaryPublicationName": "Lenny's Newsletter",
      "primaryPublicationUrl": "https://www.lennysnewsletter.com"
    }
  ],
  "publishedAt": "2026-04-21T15:30:00.000Z",
  "updatedAt": null,
  "contentHtml": "<p>Today I'm thrilled to share...</p>",
  "contentText": "Today I'm thrilled to share...",
  "bodyJson": { "type": "doc", "content": [/* ProseMirror tree */] },
  "wordCount": 1602,
  "readingTimeMinutes": 7,
  "description": "Subscriber perks for paid members",
  "socialTitle": null,
  "searchEngineTitle": "A free year of Cursor + Google AI Pro for subscribers",
  "searchEngineDescription": "Lenny's Newsletter subscriber perks include...",
  "coverImageUrl": "https://substackcdn.com/image/...",
  "coverImageIsExplicit": false,
  "audienceType": "everyone",
  "isPaywalled": false,
  "truncatedBodyText": null,
  "meterType": null,
  "freeUnlockRequired": false,
  "teaserPostEligible": false,
  "isGeoblocked": false,
  "hasCashtag": false,
  "reactionCount": 293,
  "reactions": { "❤": 293 },
  "commentCount": 33,
  "childCommentCount": 20,
  "restackCount": 9,
  "tags": ["AI", "Tools"],
  "type": "newsletter",
  "hasAudio": false,
  "audioUrl": null,
  "audioItems": [],
  "podcastUrl": null,
  "podcastDuration": null,
  "previousPostSlug": "your-couch-to-5k-for-ai",
  "nextPostSlug": "a-visual-guide-to-getting-out-of",
  "newsletter": {
    "name": "Lenny's Newsletter",
    "description": "Deeply researched product, growth, and career advice",
    "url": "https://www.lennysnewsletter.com"
  },
  "comments": null,
  "facepile": null
}

When includeComments is on, comments is a flat array (with parentId + depth for tree reconstruction):

{
  "id": 246915982,
  "parentId": null,
  "depth": 0,
  "bodyText": "Our infra is getting slammed, please bear with us...",
  "bodyJson": { "type": "doc", "content": [/* … */] },
  "authorId": 1849774,
  "authorName": "Lenny Rachitsky",
  "authorHandle": "lenny",
  "authorPhotoUrl": "https://substackcdn.com/...",
  "authorBestsellerTier": 10000,
  "publishedAt": "2026-04-21T15:56:31.307Z",
  "editedAt": "2026-04-21T16:00:11.789Z",
  "isPinned": true,
  "isDeleted": false,
  "reactionCount": 8,
  "reactions": { "❤": 8 },
  "restackCount": 0
}
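
Because comments arrive as a flat array, rebuilding the thread is left to the consumer. The following is a minimal sketch of one way to do it from id and parentId alone; the helper name build_comment_tree and the added children key are assumptions for illustration and are not part of the actor's output.

```python
from typing import Any, Dict, List


def build_comment_tree(comments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest a flat comment array into a tree using id and parentId."""
    # Copy each comment and give it a hypothetical "children" list.
    nodes = {c["id"]: {**c, "children": []} for c in comments}
    roots = []
    for c in comments:
        node = nodes[c["id"]]
        parent = nodes.get(c["parentId"])
        if parent is None:
            # Top-level comment, or the parent is missing from the array.
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


if __name__ == "__main__":
    flat = [
        {"id": 1, "parentId": None, "depth": 0},
        {"id": 2, "parentId": 1, "depth": 1},
        {"id": 3, "parentId": 2, "depth": 2},
    ]
    tree = build_comment_tree(flat)
    print(tree[0]["children"][0]["children"][0]["id"])
```

The input records are copied, not mutated, so the original flat array can still be exported as-is.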

When includeFacepile is on, facepile.reactors[] and facepile.restackers[] look like:

{
  "id": 73273682,
  "name": "Miles Kohl",
  "handle": "mileskohl504716",
  "photoUrl": "https://substackcdn.com/...",
  "bio": null,
  "primaryPublicationName": "Miles' Substack",
  "primaryPublicationUrl": "https://mileskohl504716.substack.com",
  "bestsellerTier": null
}
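
For the influence-analysis use case, one simple starting point is to count which publications the reactors come from. This is an illustrative sketch only: top_reactor_publications is a made-up helper, and it assumes post records shaped like the samples above, with facepile set to null when the flag is off.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_reactor_publications(posts: List[Dict[str, Any]],
                             limit: int = 10) -> List[Tuple[str, int]]:
    """Count reactors per primary publication across scraped posts."""
    counts: Counter = Counter()
    for post in posts:
        # facepile is null unless includeFacepile was enabled.
        facepile = post.get("facepile") or {}
        for person in facepile.get("reactors") or []:
            name = person.get("primaryPublicationName")
            if name:
                counts[name] += 1
    return counts.most_common(limit)
```

The same loop works for facepile.restackers by swapping the key.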

In Notes mode each record has a different shape:

{
  "id": 216329331,
  "handle": "lenny",
  "authorName": "Lenny Rachitsky",
  "authorHandle": "lenny",
  "authorPhotoUrl": "https://substackcdn.com/...",
  "authorBio": "Writing • Angel investing • Advising",
  "bestsellerTier": 10000,
  "publishedAt": "2026-02-18T18:18:50.293Z",
  "bodyText": "I'm thrilled to welcome The Skip with @Nikhyl Singhal to Lenny's Podcast Network",
  "bodyJson": { "type": "doc", "content": [/* … */] },
  "reactionCount": 142,
  "reactions": { "❤": 130, "🔥": 12 },
  "restackCount": 5,
  "childrenCount": 8,
  "attachments": [/* link previews, embedded images */],
  "publicationName": "Lenny's Newsletter",
  "publicationUrl": "https://www.lennysnewsletter.com"
}
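
Posts, comments, and Notes all carry the same emoji-keyed reactions map, so totals can be merged across records of any type. A small sketch, assuming records shaped like the samples in this section; total_reactions is a hypothetical helper name.

```python
from collections import Counter
from typing import Any, Dict, List


def total_reactions(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Sum emoji-keyed reaction counts across posts, comments, or Notes."""
    totals: Counter = Counter()
    for record in records:
        # Treat a missing or null reactions map as empty.
        totals.update(record.get("reactions") or {})
    return dict(totals)
```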

Field availability by mode

| Field group | URL mode (default) | URL + includeComments | URL + includeFacepile | Notes mode |
| --- | --- | --- | --- | --- |
| Post identity (id, title, slug, url) | | | | |
| Author (name, handle, photo, bylines) | | | | ✅ (single author) |
| Content (HTML, text, bodyJson, wordCount) | | | | bodyText + bodyJson only |
| Engagement (reactionCount, reactions emoji map, commentCount, restackCount) | | | | reactionCount + restackCount + childrenCount |
| Paywall taxonomy | | | | |
| SEO + navigation (search engine fields, prev/next slug) | | | | |
| Audio + podcast | | | | |
| comments array | null | ✅ full nested tree | null | |
| facepile.reactors + facepile.restackers | null | null | | |
| Note attachments (link previews, embeds) | | | | |

FAQ

How much does Substack Newsletter Scraper cost?

Substack Newsletter Scraper uses pay-per-result pricing. Posts, comments, and Notes are $0.30 per 1,000 items. Optional includeFacepile reactor/restacker entries are $0.20 per 1,000. The Apify Free plan gives you $5 in usage credits a month, enough for around 16,000 results. If you run regularly, the $29/month Starter plan covers about 96,000 results.

No subscription lock-in. Pause whenever.

Is it legal to scrape Substack?

Scraping public data is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches publicly accessible Substack pages (no login, no paywall bypass). How you use the output is on you.

Apify's full breakdown: Is web scraping legal?

Can I integrate Substack Newsletter Scraper with other tools?

Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: Apify integrations.

Can I use Substack Newsletter Scraper with the Apify API?

Yes. Every run is available via the Apify REST API:

curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~substack-scraper/runs?token=APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://newsletter.pragmaticengineer.com"],"maxPosts":25,"includeComments":true}'

Docs: Apify API reference.
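
The same request can be prepared from Python with only the standard library. This sketch mirrors the curl call above and nothing more: the endpoint and actor ID are copied from that command, and build_run_request is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "sourabhbgp~substack-scraper"  # taken from the curl example


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To start a run, pass the result to urllib.request.urlopen and parse the JSON response; the response format is described in the Apify API reference and is not reproduced here.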

Can I use Substack Newsletter Scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call Substack Newsletter Scraper directly. Setup: Apify MCP docs.

Your feedback

Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.