Substack Scraper – Newsletter Posts, Engagement & Monitoring

Pricing

from $3.00 / 1,000 results

Scrape any Substack newsletter's full post archive with engagement metadata (likes, comments, paywall status, word count, authors), fetch single posts, and monitor newsletters incrementally — via Substack's public JSON API. No login.

Developer

Bobby

Maintained by Community

Give it the Substack newsletters you care about and pull their full post history with the numbers that matter — likes, comments, paywall status, word count, and authors — as a clean CSV/JSON. Then keep it fresh: monitor mode returns only the posts published since your last run, so a scheduled task becomes a live feed instead of a re-scrape.

No login, no cookies. Works with both name.substack.com subdomains and custom domains (e.g. www.thefp.com).

You provide the newsletter URLs. This Actor scrapes the publications you specify; it does not discover newsletters by topic or keyword (see the section below on what this Actor does and doesn't return).

What you can do

| Mode | What it does | Fields it uses |
| --- | --- | --- |
| Publication archive | Every post from one or more newsletters, newest-first (or top). | publicationUrls, sort, maxItems, includePostBody |
| Single posts | Full detail (incl. body + word count) for specific post URLs. | postUrls |
| Monitor (incremental) | Only posts published since the last run, per newsletter. State persists between runs. | publicationUrls, maxItems, monitorStoreName |
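
One way to picture monitor mode is as a per-newsletter high-water mark on postDate. The sketch below illustrates that idea only; it is an assumption about how such state could work, not the Actor's actual implementation. `select_new_posts`, `parse_ts`, and the `state` dictionary are hypothetical names; only the `publication` and `postDate` fields come from the output format documented on this page.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new_posts(posts, state):
    """Return the posts that are newer than the stored mark for their newsletter.

    `state` maps publication -> timestamp string of the newest post seen so far.
    It is a hypothetical stand-in for state persisted between runs and is
    updated in place once all posts have been checked.
    """
    fresh = []
    new_marks = dict(state)
    for post in posts:
        pub = post["publication"]
        ts = parse_ts(post["postDate"])
        seen = state.get(pub)  # compare against the mark from the previous run
        if seen is None or ts > parse_ts(seen):
            fresh.append(post)
            current = new_marks.get(pub)
            if current is None or ts > parse_ts(current):
                new_marks[pub] = post["postDate"]
    state.update(new_marks)
    return fresh
```

Comparing every post against the mark from the previous run, rather than a mark that moves during the loop, means several new posts from the same newsletter are all returned in one run.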

Example input

Scrape the latest 50 posts from two newsletters:

{
  "mode": "publication",
  "publicationUrls": ["astralcodexten.substack.com", "https://www.thefp.com"],
  "maxItems": 50
}
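
To submit an input like this from code rather than the Apify Console, one option is Apify's official Python client (`apify-client`). The following is a minimal sketch under stated assumptions: this page does not give the Actor's ID, so the ID in the usage comment is a placeholder, and `fetch_posts` is a hypothetical helper name. The client is passed in as a parameter so the function can be exercised without network access.

```python
def fetch_posts(client, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like `apify_client.ApifyClient`.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Each stored post is one item in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and an Apify API token):
#
#   from apify_client import ApifyClient
#
#   ACTOR_ID = "<username>/<actor-name>"  # placeholder, not the real ID
#   posts = fetch_posts(ApifyClient("<YOUR_APIFY_TOKEN>"), ACTOR_ID, {
#       "mode": "publication",
#       "publicationUrls": ["astralcodexten.substack.com", "https://www.thefp.com"],
#       "maxItems": 50,
#   })
```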

Track new posts daily (schedule this run; each run only returns what's new):

{
  "mode": "monitor",
  "publicationUrls": ["platformer", "stratechery.com"],
  "maxItems": 50
}
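
The examples above pass newsletters in three shapes: a bare handle, a hostname, and a full URL. If you want to deduplicate or validate your own list before a run, a normalization along these lines may help. This is a sketch of one plausible convention, not the Actor's documented parsing logic; `normalize_publication` is a hypothetical helper, and mapping a bare handle to a `substack.com` subdomain is an assumption.

```python
from urllib.parse import urlparse


def normalize_publication(value: str) -> str:
    """Turn a handle, hostname, or URL into an https base URL (assumed convention)."""
    value = value.strip()
    if "://" not in value:
        if "." not in value:
            # Bare handle such as "platformer": assume a substack.com subdomain.
            value = f"{value}.substack.com"
        value = f"https://{value}"
    parsed = urlparse(value)
    # Keep only the host; paths and query strings are dropped.
    return f"https://{parsed.netloc.lower()}"
```

For instance, `normalize_publication("platformer")` yields `https://platformer.substack.com`, while `https://www.thefp.com` is returned unchanged.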

Output (per post)

{
  "type": "post",
  "title": "Preliminary Thoughts On The Midjourney Scanner",
  "publication": "www.astralcodexten.com",
  "url": "https://www.astralcodexten.com/p/preliminary-thoughts-on-the-midjourney",
  "postDate": "2026-06-19T...Z",
  "postType": "newsletter",
  "audience": "everyone",
  "isPaywalled": false,
  "reactionCount": 270,
  "commentCount": 166,
  "wordcount": 3327,
  "authors": [{ "name": "Scott Alexander", "handle": "...", "url": "https://substack.com/@..." }],
  "authorNames": "Scott Alexander"
}
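
As an illustration of working with these records, the sketch below derives a simple engagement rate from the fields shown above. The metric (likes plus comments per 1,000 words) is an illustrative choice, not something the Actor computes, and `engagement_per_1000_words` and `top_posts` are hypothetical helper names.

```python
def engagement_per_1000_words(post):
    """Return (reactions + comments) per 1,000 words, or None without a word count."""
    words = post.get("wordcount") or 0
    if words <= 0:
        return None
    interactions = (post.get("reactionCount") or 0) + (post.get("commentCount") or 0)
    return round(1000 * interactions / words, 1)


def top_posts(posts, n=10):
    """Rank posts by the rate above, skipping posts where it is undefined."""
    scored = [(engagement_per_1000_words(p), p) for p in posts]
    scored = [(score, p) for score, p in scored if score is not None]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:n]]
```

For the sample record, 270 reactions plus 166 comments over 3,327 words gives a rate of about 131.0.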

What this does — and doesn't — return

To keep expectations honest:

  • Public, structured post data: titles, subtitles, slugs, URLs, publish dates, post type (newsletter / podcast / thread), audience + paywall flag, public like counts, public comment counts, word counts, authors, cover images. Optional full HTML body for non-paywalled posts.
  • No private analytics. Substack does not expose open rates, click rates, or private subscriber counts publicly, so this Actor cannot return them. Anyone promising those from a no-login scraper is over-promising.
  • No email/contact extraction. This is a content-intelligence tool, not a lead-gen scraper.
  • No keyword/topic discovery. Substack's global search endpoint is gated, so this Actor scrapes newsletters you name rather than finding them by topic. Point it at the publications you already know.
  • ⚠️ Paywalled posts return a preview body only. For only_paid/founding posts, wordcount reflects the full article but the fetched bodyHtml is just the free preview — the Actor flags these with isPreviewOnly: true so you can filter them. All public metadata (title, engagement counts, audience, authors) is still returned.
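
Because paywalled posts carry only a preview body, text analysis usually needs to separate them first. A minimal sketch, assuming records shaped like the output shown on this page and treating a missing `isPreviewOnly` flag as false; `split_by_body_completeness` is a hypothetical helper name.

```python
def split_by_body_completeness(posts):
    """Split posts into (full_body, preview_only) using the isPreviewOnly flag."""
    full, preview = [], []
    for post in posts:
        # A missing flag is treated as "not preview-only" (assumption).
        (preview if post.get("isPreviewOnly") else full).append(post)
    return full, preview
```

Metadata such as titles and engagement counts remains usable for both groups; only body-based analysis should be limited to the first list.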

Notes

  • Custom domains just work. Migrated newsletters 301-redirect from their *.substack.com subdomain; the Actor follows the redirect automatically.
  • Proxy is off by default. Substack's public API is open; enable Apify Proxy only for large runs, where spreading requests across several IPs helps you stay under the per-IP rate limit.
  • maxTotalItems caps results (and charges) across the whole run, independent of the per-newsletter maxItems.

Pricing

Pay per result — you're charged once per post stored to the dataset. Cap any run with maxTotalItems.
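
To budget a run in publication or monitor mode, you can combine the two caps with the per-result price. The sketch below assumes the listed starting price of $3.00 per 1,000 results; your actual rate may differ, and both function names are hypothetical.

```python
def max_billable_results(num_newsletters, max_items, max_total_items=None):
    """Upper bound on stored posts given the per-newsletter and whole-run caps."""
    per_newsletter_cap = num_newsletters * max_items
    if max_total_items is None:
        return per_newsletter_cap
    return min(per_newsletter_cap, max_total_items)


def estimate_max_cost(results, usd_per_1000=3.00):
    """Worst-case charge in USD at an assumed price per 1,000 results."""
    return results * usd_per_1000 / 1000
```

Two newsletters with maxItems set to 50 can store at most 100 posts, about $0.30 at the assumed rate. Ten newsletters with the same maxItems and maxTotalItems set to 200 are capped at 200 posts, about $0.60.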