Substack Newsletter Scraper - Articles, Metadata & Full Content

Pricing: from $1.00 per 1,000 results

Extract articles, metadata, and full content from any Substack newsletter via Substack's public API. No proxy needed. Supports multiple newsletters, full article body extraction, audience filtering (free/paid), date ranges, keyword search, and pagination. Works with both substack.com subdomains and custom domains.

Developer: Moris Chao (Maintained by Community)

Actor stats

0

Bookmarked

7

Total users

3

Monthly active users

21 days ago

Last modified

Share

Substack Newsletter Scraper

Scrape articles from any Substack newsletter using the public Substack API. Supports both subdomain.substack.com URLs and custom domains (e.g., www.lennysnewsletter.com).

Features

  • Scrape article metadata (title, author, date, reactions, comments, etc.)
  • Optionally fetch full article HTML body
  • Filter by audience (free/paid), content type, date range, or keyword
  • Scrape multiple newsletters in a single run (comma-separated URLs)
  • Concurrent batch fetching for full article bodies
  • Polite rate limiting (500ms delay between requests)
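The pagination and rate-limiting features above can be pictured with the minimal sketch below. It is not the Actor's source code. The archive path `/api/v1/archive` and its `sort`/`offset`/`limit` query parameters are assumptions (this README does not name the endpoints it calls), and `iter_posts` and `fetch_page` are hypothetical names. The HTTP call is injected as a function so the paging logic can be exercised without network access.

```python
import time
from typing import Callable, Iterator
from urllib.parse import urlencode

# Assumption: this endpoint path and its query parameters are illustrative;
# the README does not document which Substack endpoints the Actor uses.
ARCHIVE_PATH = "/api/v1/archive"


def archive_url(base_url: str, sort: str, offset: int, limit: int) -> str:
    """Build the URL of one archive page for a newsletter base URL."""
    query = urlencode({"sort": sort, "offset": offset, "limit": limit})
    return f"{base_url.rstrip('/')}{ARCHIVE_PATH}?{query}"


def iter_posts(
    base_url: str,
    fetch_page: Callable[[str], list],
    sort: str = "new",
    max_items: int = 50,
    page_size: int = 12,
    delay_s: float = 0.5,
) -> Iterator[dict]:
    """Yield posts page by page, pausing delay_s seconds between requests.

    max_items == 0 means unlimited, mirroring the maxItems input.
    """
    offset = 0
    yielded = 0
    first_request = True
    while max_items == 0 or yielded < max_items:
        if not first_request:
            time.sleep(delay_s)  # polite pause between API requests
        first_request = False
        page = fetch_page(archive_url(base_url, sort, offset, page_size))
        if not page:
            break  # an empty page means the archive is exhausted
        for post in page:
            if max_items and yielded >= max_items:
                return
            yield post
            yielded += 1
        offset += len(page)
```

Advancing `offset` by the number of posts actually returned, rather than by `page_size`, keeps paging correct even if the server returns short pages.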

Input

| Field | Type | Default | Description |
|---|---|---|---|
| newsletterUrl | string | (required) | Newsletter URL(s). Separate multiple URLs with commas. |
| maxItems | integer | 50 | Maximum articles per newsletter. 0 = unlimited. |
| sortBy | string | "new" | "new" (newest first) or "top" (most popular). |
| includeBody | boolean | false | Fetch the full HTML body of each article. |
| audienceFilter | string | "all" | "all", "free", or "paid". |
| typeFilter | string | "all" | "all", "newsletter", "podcast", or "thread". |
| dateFrom | string | (none) | Only include articles published on or after this date (YYYY-MM-DD). |
| dateTo | string | (none) | Only include articles published on or before this date (YYYY-MM-DD). |
| searchKeyword | string | (none) | Only include articles whose title or description contains this keyword. |
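How the filter inputs combine can be sketched as a predicate over one output record. This is a hypothetical illustration, not the Actor's implementation. It assumes that audienceFilter "free" and "paid" correspond to the output `audience` values "everyone" and "only_paid", that date bounds are inclusive and compared against the UTC calendar date in `postDate`, and that keyword matching is case-insensitive.

```python
from datetime import date
from typing import Optional

# Assumed mapping from the audienceFilter input to the output "audience" field.
AUDIENCE_MAP = {"free": "everyone", "paid": "only_paid"}


def matches_filters(
    article: dict,
    audience_filter: str = "all",
    type_filter: str = "all",
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    search_keyword: Optional[str] = None,
) -> bool:
    """Return True if an output record passes every configured filter."""
    if audience_filter != "all":
        if article.get("audience") != AUDIENCE_MAP[audience_filter]:
            return False
    if type_filter != "all" and article.get("type") != type_filter:
        return False
    # postDate is ISO 8601; its first 10 characters are the YYYY-MM-DD date.
    published = date.fromisoformat(article["postDate"][:10])
    if date_from and published < date.fromisoformat(date_from):
        return False
    if date_to and published > date.fromisoformat(date_to):
        return False
    if search_keyword:
        title = article.get("title") or ""
        description = article.get("description") or ""
        if search_keyword.lower() not in f"{title} {description}".lower():
            return False
    return True
```

Every filter defaults to a pass-through value, so a record with no filters configured is always kept.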

Example Input

{
  "newsletterUrl": "https://www.lennysnewsletter.com",
  "maxItems": 10,
  "sortBy": "new",
  "includeBody": false
}

Multiple Newsletters

{
  "newsletterUrl": "https://www.lennysnewsletter.com, https://example.substack.com",
  "maxItems": 20
}
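A comma-separated `newsletterUrl` value has to be split into individual newsletters before scraping. The sketch below shows one plausible way to do it. `parse_newsletter_urls` is a hypothetical helper, and its behavior (defaulting to `https://` when no scheme is given, reducing each URL to scheme plus lowercased host, dropping duplicates) reflects assumptions, not documented Actor behavior.

```python
from urllib.parse import urlparse


def parse_newsletter_urls(raw: str) -> list[str]:
    """Split a comma-separated newsletterUrl value into normalized base URLs."""
    urls: list[str] = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled commas
        if "://" not in part:
            part = "https://" + part  # assume HTTPS when no scheme is given
        parsed = urlparse(part)
        # Keep only scheme + host; a newsletter is identified by its domain.
        base = f"{parsed.scheme}://{parsed.netloc.lower()}"
        if base not in urls:
            urls.append(base)  # keep the first occurrence, drop duplicates
    return urls
```

For example, `"https://www.lennysnewsletter.com, example.substack.com/,"` becomes `["https://www.lennysnewsletter.com", "https://example.substack.com"]`.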

Output

Each article is saved to the default dataset with the following fields:

| Field | Type | Description |
|---|---|---|
| id | number | Substack post ID |
| title | string | Article title |
| subtitle | string | Article subtitle |
| slug | string | URL slug |
| url | string | Full canonical URL |
| postDate | string | Publish date (ISO 8601) |
| audience | string | "everyone" or "only_paid" |
| type | string | "newsletter", "podcast", or "thread" |
| wordcount | number | Word count |
| reactions | object | Reaction counts (e.g., {"❤": 532}) |
| commentCount | number | Number of comments |
| coverImage | string | Cover image URL |
| author | string | Author name(s) |
| description | string | Article description/excerpt |
| body | string | Full HTML body (only when includeBody is true) |

Example Output

{
  "id": 123456,
  "title": "How to build a great product",
  "subtitle": "Lessons from top PMs",
  "slug": "how-to-build-a-great-product",
  "url": "https://www.lennysnewsletter.com/p/how-to-build-a-great-product",
  "postDate": "2026-03-03T13:45:17.054Z",
  "audience": "everyone",
  "type": "newsletter",
  "wordcount": 2642,
  "reactions": { "❤": 532 },
  "commentCount": 9,
  "coverImage": "https://substackcdn.com/image/...",
  "author": "Lenny Rachitsky",
  "description": "A deep dive into product excellence..."
}
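Once records like the one above are downloaded from the dataset, a few small helpers make them easier to analyze. This is an illustrative sketch only. `total_reactions`, `published_at`, and `top_by_reactions` are hypothetical names, and summing all emoji counts into one total is an assumed convenience metric, not a field the Actor outputs.

```python
from datetime import datetime


def total_reactions(article: dict) -> int:
    """Sum every emoji count in the reactions object (e.g. {"❤": 532})."""
    return sum((article.get("reactions") or {}).values())


def published_at(article: dict) -> datetime:
    """Parse postDate into a timezone-aware datetime.

    The trailing "Z" is replaced with "+00:00" because
    datetime.fromisoformat only accepts "Z" from Python 3.11 on.
    """
    return datetime.fromisoformat(article["postDate"].replace("Z", "+00:00"))


def top_by_reactions(articles: list[dict], n: int = 5) -> list[dict]:
    """Return the n articles with the highest total reaction count."""
    return sorted(articles, key=total_reactions, reverse=True)[:n]
```

Applied to the example output, `total_reactions` returns 532 and `published_at` yields 2026-03-03 13:45:17.054 UTC.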

Notes

  • Paid articles: For articles whose audience is "only_paid", the body may contain only the free preview, not the full text.
  • Rate limiting: The Actor adds a 500ms delay between API requests to avoid overloading Substack servers.
  • Custom domains: Both <name>.substack.com subdomains and custom domains such as www.lennysnewsletter.com are supported.
  • No authentication required: Uses Substack's public API endpoints.

Cost

This Actor uses only HTTP API calls (no browser), so it's very lightweight:

  • About 256 MB of memory is sufficient
  • No proxy required
  • A typical run of 50 articles completes in under a minute