Substack Newsletter Scraper
Pricing
Pay per usage
Substack Newsletter Scraper
Scrape posts from any Substack publication — title, subtitle, date, paywall status, reaction count, comment count, word count, and full body HTML for free posts. Handles custom domains. Paginates to the full archive.
Pricing
Pay per usage
Rating
0.0
(0)
Developer
DevilScrapes
Maintained by CommunityActor stats
0
Bookmarked
2
Total users
1
Monthly active users
4 days ago
Last modified
Categories
Share
🎯 What this scrapes
Substack hosts thousands of newsletters across every niche — from tech and finance to culture and investigative journalism. This Actor taps each publication's public JSON API to extract the full post archive: metadata for every post, plus body HTML for free-tier content. Works with native Substack domains (author.substack.com) and custom domains alike.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation —
curl-cffiimpersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python. - 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on
408 / 429 / 5xx— up to 5 attempts per page,Retry-Afterhonoured. - 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.
💡 Use cases
- Newsletter intelligence — track a competitor publication's cadence, topics, and engagement over time.
- Content research — aggregate posts across multiple newsletters on a niche to find trends and gaps.
- Lead gen — identify high-engagement free posts to understand what resonates with a target audience.
- Archive backfill — pull the full post history into a warehouse for analysis or ML training.
- Paywall mapping — understand the free-vs-paid content split for any publication.
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
publications | array | yes | ['https://www.lennysnewsletter.com'] | One or more Substack publication URLs or custom domains. Accepts both native substack.com URLs and custom domains (e.g. |
maxPostsPerPublication | integer | no | 50 | Maximum number of posts to collect per publication. Set to 0 to collect the full archive. |
includeBody | boolean | no | True | When enabled, body_html is included for free posts. Paywalled posts return null body. |
proxyConfiguration | object | no | {'useApifyProxy': True} | Apify Proxy configuration. Substack's public API is well-behaved; the default shared pool is sufficient for most volumes |
Example input
{"publications": ["https://www.lennysnewsletter.com"],"maxPostsPerPublication": 5,"includeBody": false,"proxyConfiguration": {"useApifyProxy": true}}
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
publication | string | Base URL of the publication that produced this post. |
post_id | integer | Substack internal numeric post ID. |
title | string | Post title. |
subtitle | ['string', 'null'] | Post subtitle / deck (may be null). |
url | string | Canonical URL of the post. |
post_date | string | Publication timestamp (ISO-8601). |
audience | string | Access tier: free or only_paid. |
is_paywalled | boolean | True when the post is behind the paywall. |
reaction_count | integer | Total like/heart reactions. |
comment_count | integer | Number of public comments. |
word_count | ['integer', 'null'] | Word count when provided by the API. |
body_html | ['string', 'null'] | Full post HTML for free posts; null for paywalled posts or when includeBody is false. |
scraped_at | string | ISO-8601 timestamp of when this row was recorded. |
Example output
{"publication": "https://www.lennysnewsletter.com","post_id": 123456,"title": "The most important metric you're ignoring","subtitle": "How to think about retention in your product","url": "https://www.lennysnewsletter.com/p/the-most-important-metric","post_date": "2024-03-15T09:00:00.000Z","audience": "free","is_paywalled": false,"reaction_count": 847,"comment_count": 43,"word_count": 2100,"body_html": null,"scraped_at": "2026-06-07T10:00:00Z"}
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result | $0.0025 | Per dataset item |
Example: 1 000 results at the rates above ≈ $2.50. No subscription, no minimum, no card to start — Apify gives every new account $5 of free credit.
🚧 Limitations
Body HTML is only available for posts the publication makes free. Paywalled posts return metadata only. Private Substack communities (invite-only) are not accessible. The Actor honours Substack's public API rate limits — very large archives may take several minutes.
❓ FAQ
Does this work with custom domains?
Yes. Pass the full base URL of the publication — whether it's author.substack.com or a custom domain like https://www.lennysnewsletter.com — and the Actor resolves the correct API endpoint.
Can I get paywalled post bodies?
No. Paywalled posts require a paid subscription; body_html returns null for those. You get all public metadata (title, date, reaction count, etc.) regardless of paywall status.
How far back does the archive go?
As far as the publication's API returns. We paginate until we hit maxPostsPerPublication or exhaust the archive, whichever comes first.
What if a publication has thousands of posts?
Set maxPostsPerPublication to 0 and the Actor will paginate to the full archive. Runs are priced per result, so longer archives cost proportionally more.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.