Substack Newsletter Scraper - Articles, Metadata & Full Content

Pricing

from $1.00 / 1,000 results

Extract articles, metadata, and content from any Substack newsletter via the public Substack API. No proxy needed. Supports multiple newsletters, full article body extraction, audience filtering (free/paid), date range, keyword search, and pagination. Works with both substack.com subdomains and custom domains.

Rating

0.0 (0)

Developer

Moris Chao

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

Substack Newsletter Scraper

Scrape articles from any Substack newsletter using the public Substack API. Supports both subdomain.substack.com URLs and custom domains (e.g., www.lennysnewsletter.com).

Features

  • Scrape article metadata (title, author, date, reactions, comments, etc.)
  • Optionally fetch full article HTML body
  • Filter by audience (free/paid), content type, date range, or keyword
  • Scrape multiple newsletters in a single run (comma-separated URLs)
  • Concurrent batch fetching for full article bodies
  • Polite rate limiting (500ms delay between requests)

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| newsletterUrl | string | required | Newsletter URL(s). Comma-separated for multiple. |
| maxItems | integer | 50 | Max articles per newsletter. 0 = unlimited. |
| sortBy | string | "new" | "new" (newest first) or "top" (most popular). |
| includeBody | boolean | false | Fetch full HTML body for each article. |
| audienceFilter | string | "all" | "all", "free", or "paid". |
| typeFilter | string | "all" | "all", "newsletter", "podcast", or "thread". |
| dateFrom | string | | Only articles on/after this date (YYYY-MM-DD). |
| dateTo | string | | Only articles on/before this date (YYYY-MM-DD). |
| searchKeyword | string | | Filter by keyword in title or description. |
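
The defaults and allowed values in the table can be checked locally before a run is started. The following is a minimal sketch of such a check, not the Actor's own validation: `build_input`, `DEFAULTS`, and `ALLOWED` are hypothetical names, and the rules are only those the table states.

```python
from datetime import datetime

# Defaults copied from the input table (fields without a default are omitted).
DEFAULTS = {
    "maxItems": 50,
    "sortBy": "new",
    "includeBody": False,
    "audienceFilter": "all",
    "typeFilter": "all",
}

# Allowed values copied from the input table.
ALLOWED = {
    "sortBy": {"new", "top"},
    "audienceFilter": {"all", "free", "paid"},
    "typeFilter": {"all", "newsletter", "podcast", "thread"},
}


def build_input(newsletter_url, **overrides):
    """Return an input dict; raise ValueError for values the table does not list."""
    if not newsletter_url.strip():
        raise ValueError("newsletterUrl is required")
    run_input = {"newsletterUrl": newsletter_url, **DEFAULTS, **overrides}
    for field, allowed in ALLOWED.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    if run_input["maxItems"] < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    for field in ("dateFrom", "dateTo"):
        if field in run_input:
            # Raises ValueError if the value does not parse as year-month-day.
            datetime.strptime(run_input[field], "%Y-%m-%d")
    return run_input
```

For example, `build_input("https://www.lennysnewsletter.com", maxItems=10)` returns a dict with `maxItems` set to 10 and every other field at its table default.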

Example Input

{
  "newsletterUrl": "https://www.lennysnewsletter.com",
  "maxItems": 10,
  "sortBy": "new",
  "includeBody": false
}

Multiple Newsletters

{
  "newsletterUrl": "https://www.lennysnewsletter.com, https://stratechery.com",
  "maxItems": 20
}
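
A comma-separated `newsletterUrl` value has to be split into individual newsletters at some point. How the Actor does this internally is not documented here, so the following is only an illustrative sketch; `split_newsletter_urls` is a hypothetical helper, and the choices to add a missing scheme and drop duplicates are assumptions.

```python
from urllib.parse import urlsplit


def split_newsletter_urls(value):
    """Split a comma-separated newsletterUrl string into unique base URLs."""
    urls = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        if "://" not in part:
            part = "https://" + part  # assumption: default to HTTPS
        pieces = urlsplit(part)
        base = f"{pieces.scheme}://{pieces.netloc}"  # keep scheme and host only
        if base not in urls:
            urls.append(base)
    return urls
```

Splitting up front like this would also make it straightforward to apply `maxItems` to each newsletter separately, which is how the input table describes that limit.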

Output

Each article is saved to the default dataset with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | number | Substack post ID |
| title | string | Article title |
| subtitle | string | Article subtitle |
| slug | string | URL slug |
| url | string | Full canonical URL |
| postDate | string | ISO 8601 publish date |
| audience | string | "everyone" or "only_paid" |
| type | string | "newsletter", "podcast", or "thread" |
| wordcount | number | Word count |
| reactions | object | Reaction counts (e.g., {"❤": 532}) |
| commentCount | number | Number of comments |
| coverImage | string | Cover image URL |
| author | string | Author name(s) |
| description | string | Article description/excerpt |
| body | string | Full HTML body (only when includeBody: true) |

Example Output

{
  "id": 123456,
  "title": "How to build a great product",
  "subtitle": "Lessons from top PMs",
  "slug": "how-to-build-a-great-product",
  "url": "https://www.lennysnewsletter.com/p/how-to-build-a-great-product",
  "postDate": "2026-03-03T13:45:17.054Z",
  "audience": "everyone",
  "type": "newsletter",
  "wordcount": 2642,
  "reactions": { "❤": 532 },
  "commentCount": 9,
  "coverImage": "https://substackcdn.com/image/...",
  "author": "Lenny Rachitsky",
  "description": "A deep dive into product excellence..."
}
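
Records in this shape are easy to post-process once they have been exported from the dataset. The sketch below assumes the items are already loaded as a list of Python dicts with the field names from the output table; `total_reactions` and `free_posts_since` are hypothetical helpers, not part of the Actor.

```python
def total_reactions(item):
    """Sum all reaction counts for one article (0 if there are none)."""
    return sum((item.get("reactions") or {}).values())


def free_posts_since(items, since):
    """Return free articles published on or after `since` (YYYY-MM-DD).

    ISO 8601 timestamps start with the date, so comparing the first ten
    characters as strings orders them correctly.
    """
    return [
        item
        for item in items
        if item["audience"] == "everyone" and item["postDate"][:10] >= since
    ]
```

With the example record above, `total_reactions` gives 532, and `free_posts_since([record], "2026-03-01")` keeps the record because it was published on 2026-03-03 for everyone.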

Notes

  • Paid articles: Articles marked as "only_paid" may return only a preview of the body content.
  • Rate limiting: The Actor adds a 500ms delay between API requests to avoid overloading Substack servers.
  • Custom domains: Both newsletter.substack.com and custom domains like platformer.news are supported.
  • No authentication required: Uses Substack's public API endpoints.
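
The rate-limiting note can be illustrated with a small loop that pauses between requests. This is a sequential sketch only and does not reproduce the Actor's concurrent batch fetching; `fetch_politely` and its `fetch` and `sleep` parameters are hypothetical, with the 0.5 second default taken from the 500ms delay mentioned above.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=0.5, sleep=time.sleep):
    """Call fetch(url) for each URL in order, pausing between calls.

    `fetch` is any callable that takes a URL and returns a result.
    `sleep` is injectable so the delay can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results
```

At this pace, 50 sequential requests would spend about 24.5 seconds waiting (49 pauses of 0.5 seconds), which is consistent with a 50-article run finishing in under a minute.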

Cost

This Actor uses only HTTP API calls (no browser), so it's very lightweight:

  • ~256MB memory is sufficient
  • No proxy required
  • A typical run of 50 articles completes in under a minute
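
For a rough budget, the listed price of "from $1.00 / 1,000 results" can be turned into a per-run estimate. This is back-of-the-envelope arithmetic only: `estimate_cost_usd` is a hypothetical helper, it treats $1.00 per 1,000 results as a flat rate, and actual billing may differ.

```python
def estimate_cost_usd(result_count, price_per_1000=1.00):
    """Estimate the result-based cost of a run in US dollars."""
    return result_count / 1000 * price_per_1000
```

Under that assumption, a run that returns 50 articles costs about $0.05, and two newsletters at `maxItems` 20 each cost about $0.04.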