Substack Scraper: Newsletter Posts, Archives & Subscribers

Scrape any Substack publication: full post archive, single-post detail with body, comment counts, reactions, paid/free audience tier, podcast metadata. No auth, no proxies, no cookies. Uses Substack's official JSON API. Pay only per result.

Pricing

$1.00 / 1,000 result items

Developer: Perconey (Maintained by Community)

Scrape any Substack publication in seconds. Get the full post archive, single-post detail with body and comments, reactions, paid/free audience tier, podcast metadata - everything that Substack itself shows publicly. No browser, no proxies, no cookies, no Substack account needed. The actor calls Substack's official JSON API directly, so the data you get is the same data the web app gets, with full fidelity.

Works with every Substack publication: subdomains like stratechery.substack.com, custom domains like lennysnewsletter.com or astralcodexten.com, and paid newsletters (teaser content shown for paid posts).

Why use this Substack scraper?

  • No auth, no setup - point at any publication URL and start scraping. Substack's archive API is fully public.
  • Real data, full fidelity - 25+ fields per post: title, subtitle, body HTML, reaction counts, comment counts, audience tier (free/paid), podcast duration, author bylines, cover image, canonical URL.
  • Custom-domain support - we follow redirects, so lennysnewsletter.com works exactly like *.substack.com.
  • Date range filters - since and until keep your runs scoped and cheap when monitoring on a schedule.
  • Pay-per-result pricing - you only pay for posts you actually receive. Stop a run any time and the meter stops.
  • API + scheduler + integrations - run from the Apify API, schedule with cron, fire webhooks into Slack/Make/Zapier on every new post.
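The "fully public" archive API mentioned above is the same endpoint the Substack web app requests, commonly known as /api/v1/archive. A minimal sketch of building that URL - the path and query parameters are observations of the public web app, not a guaranteed contract:

```python
from urllib.parse import urlencode

def archive_api_url(publication_url: str, limit: int = 12, offset: int = 0) -> str:
    """Build the public archive endpoint URL for a Substack publication.

    The /api/v1/archive path and its parameters are what Substack's own
    web app requests; treat them as unofficial.
    """
    base = publication_url.rstrip("/")
    query = urlencode({"sort": "new", "limit": limit, "offset": offset})
    return f"{base}/api/v1/archive?{query}"

url = archive_api_url("https://www.lennysnewsletter.com", limit=20)
```

Custom domains work here for the same reason they work in the actor: the endpoint lives under whatever host the publication resolves to.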

Use cases

  • Competitor / market research - track top newsletters in your niche, see what topics drive reactions, monitor publication cadence.
  • Content audits - export an entire newsletter's archive to spreadsheet, sort by reaction count, find your best-performing posts.
  • Lead generation - identify active subscribers / commenters on industry newsletters (comment counts are public).
  • NLP / research datasets - bulk-export posts with body text and metadata for sentiment, topic modeling, embedding indexes.
  • News monitoring - schedule daily scrapes of trade newsletters, alert on new posts via Apify webhook.
  • Migration backups - if you run a Substack, this is the easiest way to back up your full archive as JSON.

How to use the Substack scraper

  1. Pick an action from the What do you want to scrape? dropdown: getArchive for a publication's post list, getPost for one post's full body.
  2. Fill Substack publication URLs - subdomain (https://stratechery.substack.com) or custom domain (https://www.lennysnewsletter.com).
  3. Set Max posts per publication (default 100, 0 = unlimited).
  4. Optional: since / until ISO dates, audience filter (everyone / only_paid / only_free), includeBody to fetch full HTML body for every post.
  5. Click Save & Start. Results stream into the Dataset tab in real time. Export as JSON, CSV, Excel, HTML or XML.
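The same five steps can be driven from code. A sketch using the official apify-client Python package - the helper function and its defaults mirror the Input table below, and the live `.call()` is commented out so the sketch stays self-contained:

```python
def build_run_input(publications, action="getArchive", max_items=100,
                    since=None, until=None, audience="everyone",
                    include_body=False):
    """Assemble the actor input JSON from the fields described on this page."""
    run_input = {
        "action": action,
        "publications": list(publications),
        "maxItems": max_items,
        "audience": audience,
        "includeBody": include_body,
    }
    if since:
        run_input["since"] = since   # ISO date, e.g. "2025-01-01"
    if until:
        run_input["until"] = until
    return run_input

run_input = build_run_input(["https://stratechery.substack.com"], max_items=50)

# With a real Apify token, start the run and stream the dataset:
# from apify_client import ApifyClient
# client = ApifyClient("MY_APIFY_TOKEN")
# run = client.actor("perconey~substack-scraper").call(run_input=run_input)
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["title"])
```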

Input

Field          Required  What it does
action         yes       getArchive or getPost
publications   yes       Publication URLs (getArchive) or post URLs (getPost), one per line
maxItems       no        Max posts per publication for getArchive (0 = unlimited)
since / until  no        ISO date filter on post_date
audience       no        everyone (default), only_paid, or only_free
includeBody    no        Default false. Set true to fetch each post's body_html during the archive run (one extra API call per post).
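Putting the table together, a complete example input (field names as listed above; values are illustrative):

```json
{
  "action": "getArchive",
  "publications": [
    "https://stratechery.substack.com",
    "https://www.lennysnewsletter.com"
  ],
  "maxItems": 100,
  "since": "2025-01-01",
  "audience": "everyone",
  "includeBody": false
}
```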

Output

Every run produces one Dataset with one item per result. Real example - getArchive on lennysnewsletter.com, 3 most recent posts:

[
  {
    "_type": "post",
    "_action": "getArchive",
    "_publication": "https://www.lennysnewsletter.com",
    "title": "Why SaaS freemium playbooks don't work in AI, and what to do instead",
    "post_date": "2026-05-05T...",
    "audience": "only_paid",
    "canonical_url": "https://www.lennysnewsletter.com/p/...",
    "reaction_count": 292,
    "comment_count": 187,
    "word_count": 4521,
    "author_name": "Lenny Rachitsky"
  },
  {
    "_type": "post",
    "title": "Your Couch-to-5K for AI",
    "audience": "only_paid",
    "reaction_count": 363,
    "comment_count": 142
  },
  {
    "_type": "post",
    "title": "New: A free year of Cursor, Google AI Pro...",
    "audience": "everyone",
    "reaction_count": 297,
    "comment_count": 88
  }
]

You can download the dataset in various formats such as JSON, JSON-Lines, CSV, Excel, HTML or XML, or fetch it programmatically via the Apify Dataset API.
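For the content-audit use case, ranking downloaded items by engagement is a one-liner. A sketch over items shaped like the sample output above (titles and counts taken from that example):

```python
items = [
    {"title": "Why SaaS freemium playbooks don't work in AI, and what to do instead",
     "reaction_count": 292, "comment_count": 187},
    {"title": "Your Couch-to-5K for AI", "reaction_count": 363, "comment_count": 142},
    {"title": "New: A free year of Cursor, Google AI Pro...",
     "reaction_count": 297, "comment_count": 88},
]

# Rank posts by engagement, best first; items missing the count sort last.
ranked = sorted(items, key=lambda p: p.get("reaction_count", 0), reverse=True)
top = ranked[0]["title"]
```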

Data fields

Field                                    Type          Description
_type                                    string        post or error
_action                                  string        The action that produced this row
_publication                             string        Publication base URL
id / slug                                string        Substack internal id and URL slug
title / subtitle / description           string        Headlines
post_date                                ISO 8601      When the post was published
type                                     string        newsletter, podcast, thread, etc.
audience                                 string        everyone (free), only_paid, only_free
canonical_url                            string        Web URL of the post
cover_image                              string        Hero image URL
word_count                               int           Word count (may be null for some posts)
reaction_count / reactions               int / object  Heart count + per-emoji breakdown
comment_count / child_comment_count      int           Top-level and total comments
podcast_duration                         int           Seconds (podcast posts only)
author_name / author_handle / author_id  string        Primary author
body_html                                string        Full HTML body. Empty in archive mode unless includeBody=true; always populated in getPost mode.
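Several of these fields are nullable or mode-dependent (word_count, body_html), so downstream code should not assume every key is present. A defensive flattening sketch for CSV export, using only fields from the table above:

```python
import csv
import io

COLUMNS = ["title", "post_date", "audience", "reaction_count",
           "comment_count", "word_count", "canonical_url"]

def to_csv(items):
    """Flatten dataset items to CSV, tolerating missing or null fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        # .get() turns absent keys into empty cells instead of raising.
        writer.writerow({col: item.get(col, "") for col in COLUMNS})
    return buf.getvalue()

csv_text = to_csv([{"title": "Hello", "audience": "everyone", "reaction_count": 5}])
```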

Pricing - what does scraping Substack cost?

Pricing is pay-per-result - you pay only for posts you receive. A budget cap on each run means you never spend more than you allow.

Sample budgets at the published per-item price:

Use case                                          Items  Approx. cost
Monitor a newsletter for 100 latest posts           100  ~$0.10
Full archive of a 500-post publication              500  ~$0.50
Daily scheduled scrape, 5 new posts/day, 30 days    150  ~$0.15
50-publication competitive scan, 20 posts each    1,000  ~$1.00

See the Pricing section on this page for the exact per-item rate.
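The budgets above follow directly from the per-item rate. A sketch of the arithmetic, with the rate hardcoded to the $1.00 / 1,000 items shown on this page (check the Pricing section for the current figure):

```python
PRICE_PER_1000 = 1.00  # USD per 1,000 result items, from the pricing box above

def estimated_cost(result_items: int) -> float:
    """Pay-per-result: cost scales linearly with items received."""
    return round(result_items / 1000 * PRICE_PER_1000, 4)

# 50 publications x 20 posts each:
scan_cost = estimated_cost(50 * 20)
```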

Tips & advanced options

  • Schedule it. Set since to the last-run timestamp in your scheduled task so each run only ingests new posts. Cost stays flat regardless of publication age.
  • Skip bodies by default. getArchive returns metadata only by default - that's enough to know what's new. Use getPost separately when you actually need a full post's body.
  • Custom domains work transparently. https://www.lennysnewsletter.com and https://lenny.substack.com both work; we follow Substack's redirects automatically.
  • Audience filter is useful for "free-only" datasets. Paid posts return teaser content only - if you don't want them, set audience: only_free.

Integrations

  • REST API - POST /v2/acts/perconey~substack-scraper/runs
  • Scheduler - cron-style in Apify console
  • Webhooks - Slack, Discord, custom endpoints on RUN_SUCCEEDED
  • Sheets / Notion / Airtable / Google Drive via Apify Integrations
  • Make / Zapier / n8n via the same catalog
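Hitting the REST endpoint above needs only the actor ID and an Apify token. A sketch that builds the request parts - the endpoint path comes from the list above, and passing the token as a query parameter is standard Apify API usage; the network call itself is commented out:

```python
import json

API_BASE = "https://api.apify.com"
ACTOR_ID = "perconey~substack-scraper"

def run_request(token: str, run_input: dict):
    """Build URL, headers, and body for POST /v2/acts/{id}/runs."""
    url = f"{API_BASE}/v2/acts/{ACTOR_ID}/runs?token={token}"
    body = json.dumps(run_input)
    headers = {"Content-Type": "application/json"}
    return url, headers, body

url, headers, body = run_request(
    "MY_TOKEN",
    {"action": "getArchive", "publications": ["https://stratechery.substack.com"]},
)
# import urllib.request
# req = urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")
# run = json.load(urllib.request.urlopen(req))
```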

FAQ

Do I need a Substack account? No. Every action works fully anonymously.

What about paid posts? Paid posts return teaser content (title, subtitle, preview, audience flag). The full body stays behind the paywall - the actor surfaces exactly what Substack itself exposes to non-subscribers.

Is this allowed by Substack's terms of service? The actor uses Substack's public JSON API the same way the web app does. Public data is publicly readable. Use the results responsibly - respect privacy, attribute creators, don't redistribute paid-only content.

Can I scrape comments too? The comment_count field is included on every post. Full comment threads are a separate Substack endpoint - planned for a future release.

Rate limits? Substack is generally lenient on the public archive endpoint. The actor paces requests at ~6 req/s and retries on 429, honoring the Retry-After header. Heavy parallel runs may still hit limits - start with a single concurrent run.
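If you wrap the actor in your own retry logic (e.g. when polling the Apify API yourself), the Retry-After handling described above looks roughly like this - a sketch of the standard pattern, not the actor's internal code:

```python
import random

def retry_delay(attempt: int, retry_after=None, base: float = 1.0,
                cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt after a 429.

    Honor the server's Retry-After header when present; otherwise fall
    back to capped exponential backoff with jitter.
    """
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After can also be an HTTP-date; ignored for brevity
    return min(cap, base * 2 ** attempt) * (0.5 + random.random() / 2)

delay = retry_delay(3)              # backoff path: between 4 and 8 seconds
server_delay = retry_delay(0, "7")  # header path: exactly 7.0 seconds
```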

Support & feedback

Bug, feature, or custom version? Open an issue from the Issues tab on the Apify page, or message @perconey.bsky.social on Bluesky.


Disclaimer: this scraper reads public Substack data only. Don't use it to harass writers, scrape paid content for redistribution, or violate Substack's Terms of Use.