Substack Scraper: Newsletter Posts, Archives & Subscribers
Pricing
$1.00 / 1,000 result items
Scrape any Substack publication in seconds. Get the full post archive, single-post detail with body and comments, reactions, paid/free audience tier, podcast metadata - everything that Substack itself shows publicly. No browser, no proxies, no cookies, no Substack account needed. The actor calls Substack's official JSON API directly, so the data you get is the same data the web app gets, with full fidelity.
Works with every Substack publication: subdomains like stratechery.substack.com, custom domains like lennysnewsletter.com or astralcodexten.com, and paid newsletters (teaser content shown for paid posts).
Why use this Substack scraper?
- No auth, no setup - point at any publication URL and start scraping. Substack's archive API is fully public.
- Real data, full fidelity - 25+ fields per post: title, subtitle, body HTML, reaction counts, comment counts, audience tier (free/paid), podcast duration, author bylines, cover image, canonical URL.
- Custom-domain support - we follow redirects, so `lennysnewsletter.com` works exactly like `*.substack.com`.
- Date-range filters - `since` and `until` keep your runs scoped and cheap when monitoring on a schedule.
- Pay-per-result pricing - you only pay for posts you actually receive. Stop a run any time and the meter stops.
- API + scheduler + integrations - run from the Apify API, schedule with cron, and fire webhooks into Slack/Make/Zapier on every new post.
Use cases
- Competitor / market research - track top newsletters in your niche, see what topics drive reactions, monitor publication cadence.
- Content audits - export an entire newsletter's archive to spreadsheet, sort by reaction count, find your best-performing posts.
- Lead generation - identify active subscribers / commenters on industry newsletters (comment counts are public).
- NLP / research datasets - bulk-export posts with body text and metadata for sentiment, topic modeling, embedding indexes.
- News monitoring - schedule daily scrapes of trade newsletters, alert on new posts via Apify webhook.
- Migration backups - if you run a Substack, this is the easiest way to back up your full archive as JSON.
How to use the Substack scraper
- Pick an action from the What do you want to scrape? dropdown: `getArchive` for a publication's post list, `getPost` for one post's full body.
- Fill Substack publication URLs - subdomain (`https://stratechery.substack.com`) or custom domain (`https://www.lennysnewsletter.com`).
- Set Max posts per publication (default 100, 0 = unlimited).
- Optional: `since`/`until` ISO dates, `audience` filter (everyone / only_paid / only_free), `includeBody` to fetch the full HTML body for every post.
- Click Save & Start. Results stream into the Dataset tab in real time. Export as JSON, CSV, Excel, HTML, or XML.
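Put together, the steps above might translate into an input like this (the URL and date are examples; the exact shape of `publications` - plain strings vs. objects - follows the actor's input schema):

```json
{
  "action": "getArchive",
  "publications": ["https://www.lennysnewsletter.com"],
  "maxItems": 100,
  "since": "2026-01-01",
  "audience": "everyone",
  "includeBody": false
}
```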
Input
| Field | Required | What it does |
|---|---|---|
| `action` | yes | `getArchive` or `getPost` |
| `publications` | yes | Publication URLs (`getArchive`) or post URLs (`getPost`), one per line |
| `maxItems` | no | Max posts per publication for `getArchive` (0 = unlimited) |
| `since` / `until` | no | ISO date filter for `post_date` |
| `audience` | no | `everyone` (default), `only_paid`, or `only_free` |
| `includeBody` | no | Default `false`. Set `true` to fetch each post's `body_html` in the archive run (one extra API call per post). |
Output
Every run produces one Dataset with one item per result. Real example - getArchive on lennysnewsletter.com, 3 most recent posts:
```json
[
  {
    "_type": "post",
    "_action": "getArchive",
    "_publication": "https://www.lennysnewsletter.com",
    "title": "Why SaaS freemium playbooks don't work in AI, and what to do instead",
    "post_date": "2026-05-05T...",
    "audience": "only_paid",
    "canonical_url": "https://www.lennysnewsletter.com/p/...",
    "reaction_count": 292,
    "comment_count": 187,
    "word_count": 4521,
    "author_name": "Lenny Rachitsky"
  },
  {
    "_type": "post",
    "title": "Your Couch-to-5K for AI",
    "audience": "only_paid",
    "reaction_count": 363,
    "comment_count": 142
  },
  {
    "_type": "post",
    "title": "New: A free year of Cursor, Google AI Pro...",
    "audience": "everyone",
    "reaction_count": 297,
    "comment_count": 88
  }
]
```
You can download the dataset in various formats such as JSON, JSON-Lines, CSV, Excel, HTML or XML, or fetch programmatically via the Apify Dataset API.
Data fields
| Field | Type | Description |
|---|---|---|
| `_type` | string | `post` or `error` |
| `_action` | string | The action that produced this row |
| `_publication` | string | Publication base URL |
| `id` / `slug` | string | Substack internal id and URL slug |
| `title` / `subtitle` / `description` | string | Headlines |
| `post_date` | ISO 8601 | When the post was published |
| `type` | string | `newsletter`, `podcast`, `thread`, etc. |
| `audience` | string | `everyone` (free), `only_paid`, `only_free` |
| `canonical_url` | string | Web URL of the post |
| `cover_image` | string | Hero image URL |
| `word_count` | int | Word count (may be null for some posts) |
| `reaction_count` / `reactions` | int / object | Heart count + per-emoji breakdown |
| `comment_count` / `child_comment_count` | int | Top-level and total comments |
| `podcast_duration` | int | Seconds (podcast posts only) |
| `author_name` / `author_handle` / `author_id` | string | Primary author |
| `body_html` | string | Full HTML body. Empty in archive mode unless `includeBody=true`; always populated in `getPost` mode. |
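For a quick content audit, the fields above are enough to rank downloaded posts without any extra API calls. A sketch over a pre-fetched list of dataset items (the sample data here is ours):

```python
def top_posts(items, n=3):
    """Rank post items by reactions + comments, skipping error rows."""
    posts = [it for it in items if it.get("_type") == "post"]
    return sorted(
        posts,
        key=lambda p: (p.get("reaction_count") or 0) + (p.get("comment_count") or 0),
        reverse=True,
    )[:n]

items = [
    {"_type": "post", "title": "A", "reaction_count": 292, "comment_count": 187},
    {"_type": "post", "title": "B", "reaction_count": 363, "comment_count": 142},
    {"_type": "error"},
]
print([p["title"] for p in top_posts(items)])  # → ['B', 'A']
```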
Pricing - what does scraping Substack cost?
Pricing is pay-per-result - you pay only for posts you receive. A budget cap on each run means you never spend more than you allow.
Sample budgets at the published per-item price:
| Use case | Items | Approx. cost |
|---|---|---|
| Monitor a newsletter for 100 latest posts | 100 | ~$0.10 |
| Full archive of a 500-post publication | 500 | ~$0.50 |
| Daily scheduled scrape, 5 new posts/day, 30 days | 150 | ~$0.15 |
| 50-publication competitive scan, 20 posts each | 1 000 | ~$1.00 |
See the Pricing section on this page for the exact per-item rate.
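The sample budgets follow directly from the per-item rate. A one-liner to estimate a run, with the rate hardcoded to the published $1.00 / 1,000 items (check the Pricing section for the current figure):

```python
PRICE_PER_ITEM = 1.00 / 1000  # $1.00 per 1,000 result items

def estimated_cost(items: int) -> float:
    """Approximate run cost in USD at the published per-item rate."""
    return round(items * PRICE_PER_ITEM, 2)

print(estimated_cost(500))   # → 0.5
print(estimated_cost(1000))  # → 1.0
```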
Tips & advanced options
- Schedule it. Set `since` to the last-run timestamp in your scheduled task so each run only ingests new posts. Cost stays flat regardless of publication age.
- Skip bodies by default. `getArchive` returns metadata only by default - that's enough to know what's new. Use `getPost` separately when you actually need a full post's body.
- Custom domains work transparently. `https://www.lennysnewsletter.com` and `https://lenny.substack.com` both work; we follow Substack's redirects automatically.
- Audience filter is useful for "free-only" datasets. Paid posts return teaser content only - if you don't want them, set `audience: only_free`.
Integrations
- REST API - `POST /v2/acts/perconey~substack-scraper/runs`
- Scheduler - cron-style scheduling in the Apify console
- Webhooks - Slack, Discord, or custom endpoints on `RUN_SUCCEEDED`
- Sheets / Notion / Airtable / Google Drive via Apify Integrations
- Make / Zapier / n8n via the same catalog
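Starting a run over the REST API is a single POST whose body is the actor input JSON. A sketch that builds the request without sending it (the token is a placeholder; only the documented run endpoint is used):

```python
import json

ACTOR_ID = "perconey~substack-scraper"

def run_request(token: str, run_input: dict) -> tuple[str, bytes]:
    """Build URL and JSON body for POST /v2/acts/{actorId}/runs."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?token={token}"
    return url, json.dumps(run_input).encode()

# Send with any HTTP client, e.g.:
#   url, body = run_request("YOUR_APIFY_TOKEN", {"action": "getArchive", ...})
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```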
FAQ
Do I need a Substack account? No. Every action works fully anonymously.
What about paid posts? Paid posts return teaser content (title, subtitle, preview, audience flag). The full body stays behind the paywall - the actor surfaces exactly what Substack itself exposes to non-subscribers.
Is this allowed by Substack's terms of service? The actor uses Substack's public JSON API the same way the web app does. Public data is publicly readable. Use the results responsibly - respect privacy, attribute creators, don't redistribute paid-only content.
Can I scrape comments too?
The comment_count field is included on every post. Full comment threads are a separate Substack endpoint - planned for a future release.
Rate limits?
Substack is generally lenient on the public archive endpoint. The actor paces requests at roughly 6 req/s and retries on 429, honoring Retry-After. Heavy parallel runs may still hit limits - start with a single run at a time.
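If you call Substack or the Apify API yourself in post-processing, the pacing described above amounts to a standard Retry-After loop. A generic sketch of that pattern (ours, not the actor's internal code); `fetch` is any callable returning `(status, retry_after_seconds, result)`:

```python
import time

def with_retry(fetch, max_attempts=3, default_wait=1.0):
    """Call fetch(); on a 429 response, wait per Retry-After and try again."""
    for attempt in range(max_attempts):
        status, retry_after, result = fetch()
        if status != 429:
            return result
        if attempt < max_attempts - 1:
            time.sleep(retry_after if retry_after is not None else default_wait)
    raise RuntimeError("still rate-limited after retries")
```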
Support & feedback
Bug, feature, or custom version? Open an issue from the Issues tab on the Apify page, or message @perconey.bsky.social on Bluesky.
Disclaimer: this scraper reads public Substack data only. Don't use it to harass writers, scrape paid content for redistribution, or violate Substack's Terms of Use.