Substack Scraper – Posts, Comments & Notes
Pricing
from $0.30 / 1,000 results
Scrape every Substack surface in one actor — full posts (50+ fields, complete article HTML), nested comment threads, emoji reaction breakdown, Substack Notes, restacker identity, multi-byline authors, custom domains. Direct JSON API + RSS, no browser, no Cloudflare. From $0.30 per 1,000 posts.
Developer
Sourabh Kumar
Scrape any Substack newsletter for posts, full article content, comment threads, Substack Notes, and reactor identity — talking to Substack's own JSON endpoints, not fighting a browser. Custom domains and .substack.com subdomains both work. No login, no proxy required (but optional Apify Proxy is supported for very large runs).
$0.30 per 1,000 results. Each scraped post, comment, or Note counts as one result. Reactor and restacker identity (opt-in via includeFacepile) is $0.20 per 1,000 entries.
Designed for the global Substack community: most large publications are English, but the actor handles UTF-8 cleanly so non-English newsletters (German, Spanish, Arabic, Japanese, Hebrew) come through with original characters intact.
Why this scraper
Substack doesn't publish a content API. Most actors on the Apify Store fight Cloudflare with a headless browser and split the surface across 3 to 6 separate scrapers — one for posts, one for comments, one for Notes, one for the leaderboard. This actor talks directly to Substack's internal JSON endpoints with plain HTTP. One actor, one price, every surface.
| Concern | Browser-based scrapers | Internal-API scrapers (this one) |
|---|---|---|
| Setup time | Minutes (proxy + fingerprint config) | Seconds — paste a URL |
| Price per 1,000 posts | $2 to $20 | $0.30 |
| Reliability | Cloudflare blocks happen | No anti-bot to fight |
| Comment threads | Often a separate paid actor | Included, full nested tree |
| Reactor + restacker identity | Nobody exposes this | Optional opt-in, included |
| Notes scraping | Often a separate actor | Built in, same actor |
| Fields per post | 5 to 25 | 50+ |
Plain HTTP still runs into rate limits, so the actor retries 429 responses with exponential backoff that honors the Retry-After header, and it offers an opt-in proxyConfiguration field for very large runs. With Apify Proxy on, requests rotate through different IPs, so Substack's per-IP rate limit is much less likely to stall a run.
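The actor's retry code is not shown on this page, so the following is only a minimal sketch of what a Retry-After aware exponential backoff can look like. The function name `backoff_delay` and the `base` and `cap` parameters are assumptions made for illustration, not part of the actor.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if retry_after is not None:
        try:
            # Honor the server's hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except (TypeError, ValueError):
            pass  # HTTP-date form or junk: fall back to exponential backoff
    # Exponential growth with full jitter, so parallel workers
    # do not all retry at the same moment.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A caller would sleep for `backoff_delay(attempt, response_headers.get("Retry-After"))` seconds before re-sending the request, and give up after a fixed number of attempts.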
What data can you extract?
| 📝 Article HTML + plain text + ProseMirror bodyJson | 💬 Full nested comment threads (opt-in) | ❤️ Reactions object keyed by emoji | 👥 Reactor + restacker identity (opt-in) |
| 📊 Reaction count, comment count, restack count | 👤 Multi-byline guest authors with their pubs | 🔓 Paywall preview text + meter taxonomy | 📅 Publish date, update date, archive nav slugs |
| 🎧 Voiceover + auto-TTS audio + podcast URL | 🔍 SEO title + description + social title | 📝 Substack Notes feed by author handle | ⏱️ Reading time, word count, tags |
URL mode (post scraping)
Paste any Substack URL — custom domain or .substack.com subdomain — and get every post with full HTML, plain text, structured bodyJson, reading time, multi-byline guest authors, full paywall taxonomy, SEO fields, audio renditions, and archive navigation slugs.
Two opt-in flags add depth:
- `includeComments: true` fetches the full nested comment tree for each post in one HTTP call. Each comment carries its own emoji-keyed reactions, ProseMirror `bodyJson`, link-preview attachments, depth + parentId for tree reconstruction, and pinned/edited/deleted flags.
- `includeFacepile: true` returns the names, handles, and primary publication of every user who reacted to or restacked the post. Useful for influence analysis. No other Substack scraper exposes this.
Notes mode (Substack Notes scraping)
Pass notesHandles instead of (or alongside) urls to scrape any author's Notes feed. Each Note record carries body text, structured bodyJson, attachments (link previews, image embeds), emoji-keyed reactions, restack count, and reply count (childrenCount).
How to scrape Substack: step by step
- Create a free Apify account. Takes 30 seconds, no card.
- Open Substack Newsletter Scraper in the Apify Console.
- Paste newsletter URLs into `urls` (URL mode), or pass author handles in `notesHandles` for Notes mode. Tick `includeComments` if you want comment threads.
- Click Start. A typical 50-post run finishes in under 10 seconds.
- Export results as JSON, CSV, or Excel — or fetch via the Apify API.
How much does Substack Newsletter Scraper cost?
- Per 1,000 posts, comments, or Notes: $0.30
- Per 1,000 reactor/restacker entries (opt-in `includeFacepile`): $0.20
- Free-plan yield: roughly 16,000 results per month on the $5 free credit
- Starter-plan yield: about 96,000 results per month on the $29 Starter plan
A 50-post run with includeComments=true, averaging about 10 comments per post, produces 50 + 500 = 550 billable items, or about $0.17. Adding includeFacepile=true at roughly 100 reactors per post adds 5,000 facepile entries, or about $1.00. The Actor Start event is $0.00005 per gigabyte at startup; for any real workload that's rounding error.
Pause whenever. There's no subscription lock-in.
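To budget a run before starting it, the pricing arithmetic above can be wrapped in a small helper. This is a rough sketch: `estimate_cost` is a hypothetical name, the per-post averages are your own guesses, and the Actor Start event is ignored because it is negligible.

```python
PRICE_PER_1000_RESULTS = 0.30   # posts, comments, Notes (from the price list above)
PRICE_PER_1000_FACEPILE = 0.20  # reactor/restacker entries (includeFacepile)


def estimate_cost(posts, comments_per_post=0, facepile_per_post=0):
    """Return (billable results, facepile entries, estimated USD cost)."""
    results = posts + posts * comments_per_post
    facepile_entries = posts * facepile_per_post
    cost = (results / 1000 * PRICE_PER_1000_RESULTS
            + facepile_entries / 1000 * PRICE_PER_1000_FACEPILE)
    return results, facepile_entries, round(cost, 4)


# The worked example from the text: 50 posts, ~10 comments and ~100 reactors each.
print(estimate_cost(50, comments_per_post=10))
print(estimate_cost(50, comments_per_post=10, facepile_per_post=100))
```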
Input
{"urls": ["https://newsletter.pragmaticengineer.com","https://www.lennysnewsletter.com"],"maxPosts": 50,"includeContent": true,"contentFormat": "both","includeComments": false,"includeFacepile": false,"searchKeyword": null,"audienceFilter": "all","typeFilter": "all","dateFrom": null,"dateTo": null,"sortBy": "newest"}
| Field | Type | Default | Notes |
|---|---|---|---|
| urls | array | — | Newsletter URLs. Custom domains or .substack.com subdomains. Required for URL mode. |
| maxPosts | number | 50 | Max posts per newsletter. 0 means every post in the archive. |
| includeContent | boolean | true | Include contentHtml + contentText. Disable for fast metadata-only runs. |
| contentFormat | enum | both | html, text, or both. |
| searchKeyword | string | — | Filter the archive by keyword via Substack's archive?search= endpoint. Server-side filter, max 100 chars. |
| audienceFilter | enum | all | all, free, or paid. |
| typeFilter | enum | all | all, newsletter, podcast, thread, or video. |
| includeComments | boolean | false | Fetch full nested comment thread per post (one extra HTTP call per post). |
| includeFacepile | boolean | false | Fetch reactor + restacker identity per post (one extra HTTP call per post). |
| dateFrom / dateTo | string | — | YYYY-MM-DD bounds. |
| sortBy | enum | newest | newest or oldest. |
| notesHandles | array | — | Substack handles for Notes mode (e.g. ["lenny", "paulgraham"]). Provide either urls or notesHandles. |
| maxNotesPerHandle | number | 50 | Max Notes per handle (1-1000). |
| proxyConfiguration | object | — | Optional Apify Proxy. Recommended for runs over 100 posts with includeFacepile, where Substack's per-IP rate limit otherwise eats wall time. Apify Proxy bandwidth is billed by Apify separately. |
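If you generate inputs programmatically, it can help to check them against the constraints in the table before starting a run. The sketch below is a client-side check only; `validate_input` is a hypothetical helper, and the actor may validate differently or more strictly.

```python
import re
from datetime import date

# Allowed enum values and defaults, copied from the input table above.
ENUMS = {
    "contentFormat": ({"html", "text", "both"}, "both"),
    "audienceFilter": ({"all", "free", "paid"}, "all"),
    "typeFilter": ({"all", "newsletter", "podcast", "thread", "video"}, "all"),
    "sortBy": ({"newest", "oldest"}, "newest"),
}


def validate_input(actor_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not actor_input.get("urls") and not actor_input.get("notesHandles"):
        problems.append("provide urls or notesHandles")
    keyword = actor_input.get("searchKeyword")
    if keyword is not None and len(keyword) > 100:
        problems.append("searchKeyword is longer than 100 characters")
    if not 1 <= actor_input.get("maxNotesPerHandle", 50) <= 1000:
        problems.append("maxNotesPerHandle must be between 1 and 1000")
    for key in ("dateFrom", "dateTo"):
        value = actor_input.get(key)
        if value is None:
            continue
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
                raise ValueError(value)
            date.fromisoformat(value)  # rejects impossible dates like 2026-02-30
        except ValueError:
            problems.append(f"{key} is not a YYYY-MM-DD date")
    for key, (allowed, default) in ENUMS.items():
        if actor_input.get(key, default) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    return problems
```

For example, `validate_input({"notesHandles": ["lenny"], "maxNotesPerHandle": 5000})` reports that `maxNotesPerHandle` is out of range.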
Notes mode input
{"notesHandles": ["lenny"],"maxNotesPerHandle": 50}
Recipes
Ready-to-paste inputs for common jobs.
Pull the last 25 posts of a newsletter with full content
{"urls": ["https://newsletter.pragmaticengineer.com"],"maxPosts": 25,"includeContent": true}
Scrape posts plus their full comment threads
{"urls": ["https://www.lennysnewsletter.com"],"maxPosts": 25,"includeComments": true,"audienceFilter": "free"}
Find every post about a topic in a newsletter
{"urls": ["https://newsletter.pragmaticengineer.com"],"searchKeyword": "AI","maxPosts": 50,"includeContent": true}
Get reactor + restacker identity for influence analysis
{"urls": ["https://www.lennysnewsletter.com"],"maxPosts": 10,"includeFacepile": true,"includeContent": false}
Scrape Substack Notes for a list of authors
{"notesHandles": ["lenny", "paulgraham", "sahilbloom"],"maxNotesPerHandle": 50}
Bulk archive crawl with proxy (avoid rate limits)
{"urls": ["https://newsletter.pragmaticengineer.com"],"maxPosts": 0,"includeFacepile": true,"proxyConfiguration": { "useApifyProxy": true }}
Free posts only, sorted oldest first
{"urls": ["https://www.lennysnewsletter.com"],"audienceFilter": "free","sortBy": "oldest","maxPosts": 100}
Output
Each post is one JSON record. Fields populated only by their corresponding flag (comments, facepile) are null when the flag is off.
{"id": 165204731,"title": "New: A free year of Cursor, Google AI Pro, Notion","subtitle": "Subscriber perks for paid members","slug": "new-a-free-year-of-cursor-google","url": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google","canonicalUrl": "https://www.lennysnewsletter.com/p/new-a-free-year-of-cursor-google","author": "Lenny Rachitsky","authorHandle": "lenny","authorImageUrl": "https://substackcdn.com/image/...","authorBio": "Writing • Angel investing • Advising","authorTwitter": "lennysan","bylines": [{"id": 1849774,"name": "Lenny Rachitsky","handle": "lenny","photoUrl": "https://substackcdn.com/...","bio": "Writing • Angel investing • Advising","twitterHandle": "lennysan","isGuest": false,"primaryPublicationName": "Lenny's Newsletter","primaryPublicationUrl": "https://www.lennysnewsletter.com"}],"publishedAt": "2026-04-21T15:30:00.000Z","updatedAt": null,"contentHtml": "<p>Today I'm thrilled to share...</p>","contentText": "Today I'm thrilled to share...","bodyJson": { "type": "doc", "content": [/* ProseMirror tree */] },"wordCount": 1602,"readingTimeMinutes": 7,"description": "Subscriber perks for paid members","socialTitle": null,"searchEngineTitle": "A free year of Cursor + Google AI Pro for subscribers","searchEngineDescription": "Lenny's Newsletter subscriber perks include...","coverImageUrl": "https://substackcdn.com/image/...","coverImageIsExplicit": false,"audienceType": "everyone","isPaywalled": false,"truncatedBodyText": null,"meterType": null,"freeUnlockRequired": false,"teaserPostEligible": false,"isGeoblocked": false,"hasCashtag": false,"reactionCount": 293,"reactions": { "❤": 293 },"commentCount": 33,"childCommentCount": 20,"restackCount": 9,"tags": ["AI", "Tools"],"type": "newsletter","hasAudio": false,"audioUrl": null,"audioItems": [],"podcastUrl": null,"podcastDuration": null,"previousPostSlug": "your-couch-to-5k-for-ai","nextPostSlug": "a-visual-guide-to-getting-out-of","newsletter": {"name": "Lenny's Newsletter","description": "Deeply researched product, growth, and career advice","url": "https://www.lennysnewsletter.com"},"comments": null,"facepile": null}
When includeComments is on, comments is a flat array (with parentId + depth for tree reconstruction):
{"id": 246915982,"parentId": null,"depth": 0,"bodyText": "Our infra is getting slammed, please bear with us...","bodyJson": { "type": "doc", "content": [/* … */] },"authorId": 1849774,"authorName": "Lenny Rachitsky","authorHandle": "lenny","authorPhotoUrl": "https://substackcdn.com/...","authorBestsellerTier": 10000,"publishedAt": "2026-04-21T15:56:31.307Z","editedAt": "2026-04-21T16:00:11.789Z","isPinned": true,"isDeleted": false,"reactionCount": 8,"reactions": { "❤": 8 },"restackCount": 0}
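Because the comments arrive as a flat array, rebuilding the thread is left to the consumer. A minimal sketch, assuming every comment record has the `id` and `parentId` fields shown above; `build_comment_tree` and the added `replies` key are illustrative names, not actor output.

```python
def build_comment_tree(comments):
    """Nest a flat comment list using each record's id and parentId."""
    # Copy each record and give it an empty list for its direct replies.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for c in comments:
        node = nodes[c["id"]]
        parent = nodes.get(c["parentId"])
        if parent is None:
            # Top-level comment, or the parent is missing from the export.
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots


flat = [
    {"id": 1, "parentId": None, "depth": 0, "bodyText": "Top-level comment"},
    {"id": 2, "parentId": 1, "depth": 1, "bodyText": "A reply"},
    {"id": 3, "parentId": 2, "depth": 2, "bodyText": "A reply to the reply"},
]
tree = build_comment_tree(flat)
```

Comments whose parent is absent are kept as roots rather than dropped, which seems the safer default for partially exported threads.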
When includeFacepile is on, facepile.reactors[] and facepile.restackers[] look like:
{"id": 73273682,"name": "Miles Kohl","handle": "mileskohl504716","photoUrl": "https://substackcdn.com/...","bio": null,"primaryPublicationName": "Miles' Substack","primaryPublicationUrl": "https://mileskohl504716.substack.com","bestsellerTier": null}
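For the influence-analysis use case, one simple approach is to count how many posts each user reacted to. The sketch below assumes post records shaped like the samples above, with `facepile` set to null when the flag is off; `top_reactors` is a hypothetical helper name.

```python
from collections import Counter


def top_reactors(posts, limit=10):
    """Rank users by the number of posts they reacted to."""
    counts = Counter()
    people = {}
    for post in posts:
        facepile = post.get("facepile") or {}  # null when includeFacepile is off
        for person in facepile.get("reactors") or []:
            counts[person["id"]] += 1
            people[person["id"]] = (person["handle"], person["name"])
    return [(*people[user_id], n) for user_id, n in counts.most_common(limit)]


posts = [
    {"facepile": {"reactors": [{"id": 1, "handle": "a", "name": "Ann"},
                               {"id": 2, "handle": "b", "name": "Bo"}],
                  "restackers": []}},
    {"facepile": {"reactors": [{"id": 1, "handle": "a", "name": "Ann"}],
                  "restackers": []}},
    {"facepile": None},
]
ranking = top_reactors(posts)
```

The same loop works for `restackers` by swapping the key.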
In Notes mode each record has a different shape:
{"id": 216329331,"handle": "lenny","authorName": "Lenny Rachitsky","authorHandle": "lenny","authorPhotoUrl": "https://substackcdn.com/...","authorBio": "Writing • Angel investing • Advising","bestsellerTier": 10000,"publishedAt": "2026-02-18T18:18:50.293Z","bodyText": "I'm thrilled to welcome The Skip with @Nikhyl Singhal to Lenny's Podcast Network","bodyJson": { "type": "doc", "content": [/* … */] },"reactionCount": 142,"reactions": { "❤": 130, "🔥": 12 },"restackCount": 5,"childrenCount": 8,"attachments": [/* link previews, embedded images */],"publicationName": "Lenny's Newsletter","publicationUrl": "https://www.lennysnewsletter.com"}
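Although Notes records have a different shape from post records, both samples above carry a `reactions` object keyed by emoji, so the two can be aggregated the same way. A small sketch, with `total_reactions` as an illustrative name:

```python
from collections import Counter


def total_reactions(records):
    """Sum emoji-keyed reaction counts across post and/or Note records."""
    totals = Counter()
    for record in records:
        # Treat a missing or null reactions field as "no reactions".
        totals.update(record.get("reactions") or {})
    return dict(totals)


records = [
    {"id": 165204731, "reactions": {"❤": 293}},            # post sample
    {"id": 216329331, "reactions": {"❤": 130, "🔥": 12}},  # Note sample
    {"id": 3, "reactions": None},
]
summary = total_reactions(records)
```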
Field availability by mode
| Field group | URL mode (default) | URL + includeComments | URL + includeFacepile | Notes mode |
|---|---|---|---|---|
| Post identity (id, title, slug, url) | ✅ | ✅ | ✅ | — |
| Author (name, handle, photo, bylines) | ✅ | ✅ | ✅ | ✅ (single author) |
| Content (HTML, text, bodyJson, wordCount) | ✅ | ✅ | ✅ | bodyText + bodyJson only |
| Engagement (reactionCount, reactions emoji map, commentCount, restackCount) | ✅ | ✅ | ✅ | reactionCount + restackCount + childrenCount |
| Paywall taxonomy | ✅ | ✅ | ✅ | — |
| SEO + navigation (search engine fields, prev/next slug) | ✅ | ✅ | ✅ | — |
| Audio + podcast | ✅ | ✅ | ✅ | — |
| comments array | null | ✅ full nested tree | null | — |
| facepile.reactors + facepile.restackers | null | null | ✅ | — |
| Note attachments (link previews, embeds) | — | — | — | ✅ |
FAQ
How much does Substack Newsletter Scraper cost?
Substack Newsletter Scraper uses pay-per-result pricing. Posts, comments, and Notes are $0.30 per 1,000 items. Optional includeFacepile reactor/restacker entries are $0.20 per 1,000. The Apify Free plan gives you $5 in usage credits a month, enough for around 16,000 results. If you run regularly, the $29/month Starter plan covers about 96,000 results.
No subscription lock-in. Pause whenever.
Is it legal to scrape Substack?
Scraping public data is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches publicly accessible Substack pages (no login, no paywall bypass). How you use the output is on you.
Apify's full breakdown: Is web scraping legal?.
Can I integrate Substack Newsletter Scraper with other tools?
Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.
Full list: Apify integrations.
Can I use Substack Newsletter Scraper with the Apify API?
Yes. Every run is available via the Apify REST API:
curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~substack-scraper/runs?token=APIFY_TOKEN" -H "Content-Type: application/json" -d '{"urls":["https://newsletter.pragmaticengineer.com"],"maxPosts":25,"includeComments":true}'
Docs: Apify API reference.
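The same call can be made from Python with only the standard library. This sketch mirrors the curl example above: the actor ID and endpoint are taken from it, while `build_run_request` is a hypothetical helper and the token is a placeholder you must supply.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "sourabhbgp~substack-scraper"  # taken from the curl example above


def build_run_request(token, actor_input):
    """Build (but do not send) the POST request that starts an actor run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "APIFY_TOKEN",  # placeholder, replace with your own token
    {"urls": ["https://newsletter.pragmaticengineer.com"],
     "maxPosts": 25, "includeComments": True},
)

# To actually start the run (needs a real token and network access):
# with urllib.request.urlopen(request) as response:
#     run_info = json.load(response)
```

Building the request separately from sending it keeps the example testable offline and makes it easy to log or inspect the payload first.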
Can I use Substack Newsletter Scraper through an MCP Server?
Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call Substack Newsletter Scraper directly. Setup: Apify MCP docs.
Your feedback
Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.