Substack Newsletter Scraper

Pricing

from $3.00 / 1,000 posts

Scrape Substack newsletter posts — titles, content, reactions, comments, tags, and author data. Supports custom domains. No login needed.

Developer: Boundary

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 4 days ago

Scrape posts from any Substack newsletter — titles, full content, reactions, comments, tags, author data, and more. Works with custom domains. No login or API key needed.

Features

  • Extracts full post data: title, subtitle, full HTML content, description, word count, cover image, tags
  • Scrapes engagement metrics: reactions, comments (with text and reactions), restacks
  • Includes author info: name, handle, bio, profile image, Twitter handle
  • Includes newsletter info: name, subdomain, custom domain, logo, description
  • Captures podcast and audio: podcast URLs, TTS audio URLs when available
  • Accepts flexible input: Substack URLs or custom domains — just paste from your browser
  • Handles pagination automatically — scrape hundreds or thousands of posts
  • No authentication required — uses Substack's public API
  • Supports multiple newsletters in a single run
  • Lightweight — uses HTTP requests only, no browser needed

How much does it cost?

With Substack Newsletter Scraper, pricing starts at $3.00 per 1,000 posts in Apify platform credits ($0.003 per post).

Running on the free plan? You can scrape several hundred posts per month at no cost.
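To budget a run before starting it, the arithmetic can be sketched in a few lines of Python. This is only an illustration: `estimate_cost` is a hypothetical helper, not part of the Actor, and the rate is passed in as a parameter so you can plug in whatever per-1,000 price your listing shows (the example call uses the listed "from $3.00 / 1,000 posts"). It relies on the post limit applying per newsletter, as noted under Tips.

```python
def estimate_cost(newsletters: int, post_limit: int, price_per_1000: float) -> float:
    """Return the worst-case cost in USD for a single run.

    The post limit applies per newsletter, so the upper bound on billed
    posts is newsletters * post_limit.
    """
    if newsletters < 0 or price_per_1000 < 0:
        raise ValueError("newsletters and price_per_1000 must be non-negative")
    if post_limit <= 0:
        # A limit of 0 means "unlimited", so no upper bound can be computed.
        raise ValueError("post_limit must be positive to compute an upper bound")
    max_posts = newsletters * post_limit
    return round(max_posts * price_per_1000 / 1000, 2)


# Example: 3 newsletters capped at 100 posts each, at an example rate of $3.00 per 1,000 posts.
print(estimate_cost(newsletters=3, post_limit=100, price_per_1000=3.00))
```

The result is an upper bound: a newsletter with fewer posts than the limit, or a "Not Older Than" cutoff, will bring the actual cost down.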

Input

Field | Description | Default
Newsletter URLs | Paste newsletter URLs from your browser (e.g. https://www.lennysnewsletter.com). Works with both Substack subdomains (name.substack.com) and custom domains. | Required
Post Limit | Maximum posts per newsletter (0 = unlimited) | 100
Not Older Than | Stop scraping when posts are older than this date (YYYY-MM-DD) | No limit
Include Paid Posts | Also attempt to fetch paywalled posts (teaser content only; see "About paid content") | Off
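If you start runs programmatically, it can help to validate the input before sending it. The sketch below builds an input object in Python. The JSON key names (`newsletterUrls`, `postLimit`, `notOlderThan`) are assumptions made for illustration; this page only shows the display names of the fields, so check the Actor's input schema for the real keys.

```python
from datetime import date


def build_input(urls, post_limit=100, not_older_than=None):
    """Build a run input dict. All key names below are hypothetical."""
    if not urls:
        raise ValueError("at least one newsletter URL is required")
    if post_limit < 0:
        raise ValueError("post_limit must be 0 (unlimited) or a positive number")

    run_input = {
        "newsletterUrls": list(urls),  # assumed key for "Newsletter URLs"
        "postLimit": post_limit,       # assumed key for "Post Limit"
    }

    if not_older_than is not None:
        # Accept only strict YYYY-MM-DD, the format the input table asks for.
        try:
            valid = date.fromisoformat(not_older_than).isoformat() == not_older_than
        except ValueError:
            valid = False
        if not valid:
            raise ValueError("not_older_than must be formatted as YYYY-MM-DD")
        run_input["notOlderThan"] = not_older_than  # assumed key for "Not Older Than"

    return run_input


print(build_input(["https://www.lennysnewsletter.com"], post_limit=50, not_older_than="2026-01-01"))
```

The resulting dict is what you would pass as the run input through the Apify console, API, or client library once the key names are confirmed.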

Output example

Each post is saved as a JSON object:

{
  "postId": 190767098,
  "title": "BOOM: Senate Votes to Block Private Equity from Buying Homes",
  "subtitle": "For 15 years, elites have fought against homeownership...",
  "slug": "boom-senate-votes-to-block-private",
  "url": "https://www.thebignewsletter.com/p/boom-senate-votes-to-block-private",
  "date": "2026-03-13T22:39:57.833Z",
  "updatedAt": "2026-03-13T23:39:58.809Z",
  "type": "newsletter",
  "audience": "everyone",
  "description": "For 15 years, elites have fought against homeownership...",
  "contentHtml": "<p>Until the Iran war, the main political pressure on...</p>",
  "contentText": "Until the Iran war, the main political pressure on...",
  "wordCount": 2040,
  "coverImageUrl": "https://substackcdn.com/image/fetch/...",
  "sectionName": null,
  "reactionCount": 320,
  "reactions": { "❤": 320 },
  "commentCount": 8,
  "restackCount": 42,
  "tags": ["Private equity", "housing", "Wall Street"],
  "authorName": "Matt Stoller",
  "authorHandle": "mattstoller",
  "authorBio": "Matt Stoller is Research Director for the American Economic...",
  "authorImageUrl": "https://bucketeer-e05bbc84-baa3-437e.../915fa1b4-....png",
  "authorTwitter": null,
  "newsletterName": "BIG by Matt Stoller",
  "newsletterSubdomain": "mattstoller",
  "newsletterDomain": "www.thebignewsletter.com",
  "newsletterLogoUrl": "https://bucketeer-e05bbc84-baa3-437e.../c12cbcf7-....png",
  "newsletterDescription": "The history and politics of monopoly power.",
  "podcastUrl": null,
  "podcastDuration": null,
  "ttsAudioUrl": "https://substack-video.s3.amazonaws.com/video_upload/post/...",
  "comments": [
    {
      "name": "marku52",
      "body": "I don't understand how they intend to keep Wall Street from...",
      "date": "2026-03-14T00:36:45.063Z",
      "reactions": { "❤": 20 }
    }
  ],
  "scrapedAt": "2026-03-14T11:40:28.777Z"
}
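As a rough illustration of working with one of these records, the Python sketch below reduces a post to a few engagement numbers. `summarize` is a hypothetical helper, and the sample dict is a trimmed copy of the example above; several fields can be null or empty, so the code falls back to empty containers instead of assuming they are present.

```python
def summarize(post):
    """Reduce one scraped post to a small engagement summary."""
    reactions = post.get("reactions") or {}
    comments = post.get("comments") or []

    # Pick the comment with the most reactions, if any comments were scraped.
    top_comment = max(
        comments,
        key=lambda c: sum((c.get("reactions") or {}).values()),
        default=None,
    )

    return {
        "title": post.get("title"),
        "author": post.get("authorName"),
        "total_reactions": sum(reactions.values()),
        "comment_count": post.get("commentCount") or 0,
        "top_commenter": top_comment["name"] if top_comment else None,
    }


# Trimmed copy of the example record above.
post = {
    "title": "BOOM: Senate Votes to Block Private Equity from Buying Homes",
    "authorName": "Matt Stoller",
    "reactions": {"❤": 320},
    "commentCount": 8,
    "comments": [{"name": "marku52", "reactions": {"❤": 20}}],
}

print(summarize(post))
```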

All output fields

Field | Type | Description
postId | number | Substack post ID
title | string | Post title
subtitle | string | Post subtitle
slug | string | URL slug
url | string | Full post URL
date | string | Publish date (ISO 8601)
updatedAt | string | Last update date (ISO 8601)
type | string | Post type (newsletter or podcast)
audience | string | everyone = free, only_paid = paywalled, only_free = free-only
description | string | Short description / preview text
contentHtml | string | Full HTML content for free posts. Teaser (first few paragraphs) for paid posts.
contentText | string | Truncated plain text (~200 chars)
wordCount | number | Word count
coverImageUrl | string | Cover image URL
sectionName | string | Newsletter section name
reactionCount | number | Total reaction count
reactions | object | Reactions by emoji (e.g. {"❤": 320})
commentCount | number | Number of comments
restackCount | number | Number of restacks
tags | array | Post tags
authorName | string | Author display name
authorHandle | string | Author handle
authorBio | string | Author bio
authorImageUrl | string | Author profile image URL
authorTwitter | string | Author Twitter/X handle
newsletterName | string | Newsletter name
newsletterSubdomain | string | Substack subdomain
newsletterDomain | string | Custom domain (or Substack subdomain)
newsletterLogoUrl | string | Newsletter logo URL
newsletterDescription | string | Newsletter tagline
podcastUrl | string | Podcast audio URL
podcastDuration | number | Podcast duration in seconds
ttsAudioUrl | string | AI-generated text-to-speech audio URL
comments | array | Comments with name, body, date, and reactions
scrapedAt | string | When the data was scraped (ISO 8601)
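Because contentText is truncated to roughly 200 characters, full plain text has to be derived from contentHtml. The sketch below is one way to do that with Python's standard library only. `html_to_text` is a hypothetical helper, not something the Actor provides, and it is deliberately simple: it drops all tags and inserts line breaks around common block elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, adding line breaks around block-level tags."""

    BLOCK_TAGS = {"p", "br", "div", "li", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Convert a contentHtml string to plain text, one line per block."""
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<p>Hello <b>world</b></p><p>Second &amp; last</p>"))
```

For paid posts the result will only cover the teaser, since that is all contentHtml contains.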

About paid content

By default, this scraper only collects free posts (audience: "everyone"). Paid/paywalled posts are skipped.

If you turn on "Include Paid Posts", the scraper will attempt to fetch paywalled posts too. However, Substack's public API has limitations for paid content:

  • Content is partial — you'll get a teaser (first few paragraphs), not the full article. This is a Substack limitation, not ours.
  • Some newsletters block paid posts entirely — the API returns HTTP 403 for paid posts on certain newsletters. These posts are skipped automatically.
  • Comments may be empty — if comment permissions are set to paid subscribers only, the API returns no comments.
  • No TTS audio — text-to-speech audio is not available for paid posts.

You can always check the audience field in the output to see whether a post is free (everyone) or paywalled (only_paid).
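For example, when "Include Paid Posts" is on, you may want to keep full articles and teasers apart in downstream processing. A minimal Python sketch, assuming the dataset items have already been loaded as a list of dicts (`split_by_audience` is a hypothetical helper):

```python
def split_by_audience(posts):
    """Group posts into free, paywalled, and other audiences."""
    groups = {"free": [], "paywalled": [], "other": []}
    for post in posts:
        audience = post.get("audience")
        if audience == "everyone":
            groups["free"].append(post)
        elif audience == "only_paid":
            # contentHtml holds only a teaser for these posts.
            groups["paywalled"].append(post)
        else:
            # e.g. only_free, or a missing value
            groups["other"].append(post)
    return groups


posts = [
    {"postId": 1, "audience": "everyone"},
    {"postId": 2, "audience": "only_paid"},
]
groups = split_by_audience(posts)
print(len(groups["free"]), len(groups["paywalled"]), len(groups["other"]))
```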

Tips

  • Custom domains work the same: www.lennysnewsletter.com and lenny.substack.com both work. Just paste the URL from your browser.
  • Use "Not Older Than" to save costs — for recurring scrapes, set a date cutoff so you only fetch new posts.
  • Post limit is per newsletter — if you scrape 3 newsletters with a limit of 100, you'll get up to 300 posts total.
  • TTS audio — many free Substack posts have AI-generated audio versions. The ttsAudioUrl field captures these when available.
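For recurring scrapes, one way to choose the next "Not Older Than" value is to take the newest publish date from the previous run. The Python sketch below does this; `next_cutoff` is a hypothetical helper, and it assumes the date field uses the ISO 8601 format shown in the output example. Whether the cutoff day itself is included is not stated here, so it is safest to de-duplicate on postId after merging runs.

```python
from datetime import datetime, timezone


def next_cutoff(posts):
    """Return the newest publish date among posts as YYYY-MM-DD, or None."""
    dates = [
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        datetime.fromisoformat(p["date"].replace("Z", "+00:00"))
        for p in posts
        if p.get("date")
    ]
    if not dates:
        return None
    return max(dates).astimezone(timezone.utc).date().isoformat()


previous_run = [
    {"postId": 1, "date": "2026-02-01T10:00:00.000Z"},
    {"postId": 2, "date": "2026-03-13T22:39:57.833Z"},
]
print(next_cutoff(previous_run))
```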