Substack Scraper – Newsletter Posts, Engagement & Monitoring
Pricing
from $3.00 / 1,000 results
Substack Scraper – Newsletter Posts, Engagement & Monitoring
Scrape any Substack newsletter's full post archive with engagement metadata (likes, comments, paywall status, word count, authors), fetch single posts, and monitor newsletters incrementally — via Substack's public JSON API. No login.
Pricing
from $3.00 / 1,000 results
Rating
0.0
(0)
Developer
Bobby
Maintained by CommunityActor stats
0
Bookmarked
2
Total users
1
Monthly active users
a day ago
Last modified
Categories
Share
Give it the Substack newsletters you care about and pull their full post history with the numbers that matter — likes, comments, paywall status, word count, and authors — as a clean CSV/JSON. Then keep it fresh: monitor mode returns only the posts published since your last run, so a scheduled task becomes a live feed instead of a re-scrape.
No login, no cookies. Works with both name.substack.com subdomains and custom domains (e.g. www.thefp.com).
You provide the newsletter URLs — this Actor scrapes publications you specify, it does not discover newsletters by topic/keyword (see What this does — and doesn't — return below).
What you can do
| Mode | What it does | Fields it uses |
|---|---|---|
| Publication archive | Every post from one or more newsletters, newest-first (or top). | publicationUrls, sort, maxItems, includePostBody |
| Single posts | Full detail (incl. body + word count) for specific post URLs. | postUrls |
| Monitor (incremental) | Only posts published since the last run, per newsletter. State persists between runs. | publicationUrls, maxItems, monitorStoreName |
Example input
Scrape the latest 50 posts from two newsletters:
{"mode": "publication","publicationUrls": ["astralcodexten.substack.com", "https://www.thefp.com"],"maxItems": 50}
Track new posts daily (schedule this run; each run only returns what's new):
{"mode": "monitor","publicationUrls": ["platformer", "stratechery.com"],"maxItems": 50}
Output (per post)
{"type": "post","title": "Preliminary Thoughts On The Midjourney Scanner","publication": "www.astralcodexten.com","url": "https://www.astralcodexten.com/p/preliminary-thoughts-on-the-midjourney","postDate": "2026-06-19T...Z","postType": "newsletter","audience": "everyone","isPaywalled": false,"reactionCount": 270,"commentCount": 166,"wordcount": 3327,"authors": [{ "name": "Scott Alexander", "handle": "...", "url": "https://substack.com/@..." }],"authorNames": "Scott Alexander"}
What this does — and doesn't — return
To keep expectations honest:
- ✅ Public, structured post data: titles, subtitles, slugs, URLs, publish dates, post type (newsletter / podcast / thread), audience + paywall flag, public like counts, public comment counts, word counts, authors, cover images. Optional full HTML body for non-paywalled posts.
- ❌ No private analytics. Substack does not expose open rates, click rates, or private subscriber counts publicly, so this Actor cannot return them. Anyone promising those from a no-login scraper is over-promising.
- ❌ No email/contact extraction. This is a content-intelligence tool, not a lead-gen scraper.
- ❌ No keyword/topic discovery. Substack's global search endpoint is gated, so this Actor scrapes newsletters you name rather than finding them by topic. Point it at the publications you already know.
- ⚠️ Paywalled posts return a preview body only. For
only_paid/foundingposts,wordcountreflects the full article but the fetchedbodyHtmlis just the free preview — the Actor flags these withisPreviewOnly: trueso you can filter them. All public metadata (title, engagement counts, audience, authors) is still returned.
Notes
- Custom domains just work. Migrated newsletters 301-redirect from their
*.substack.comsubdomain; the Actor follows the redirect automatically. - Proxy is off by default. Substack's public API is open; enable Apify Proxy only for large runs to spread the per-IP rate limit.
maxTotalItemscaps results (and charges) across the whole run, independent of the per-newslettermaxItems.
Pricing
Pay per result — you're charged once per post stored to the dataset. Cap any run with maxTotalItems.