Pricing

from $2.50 / 1,000 results

Substack Newsletter Scraper

Scrape posts from any Substack publication — title, subtitle, date, paywall status, reaction count, comment count, word count, and full body HTML for free posts. Handles custom domains. Paginates to the full archive.

Pricing

from $2.50 / 1,000 results

Rating

0.0

(0)

Developer

DevilScrapes

Actor stats

Bookmarked

Total users

Monthly active users

3 days ago

Last modified

🎯 What this scrapes

Substack hosts thousands of newsletters across every niche — from tech and finance to culture and investigative journalism. This Actor taps each publication's public JSON API to extract the full post archive: metadata for every post, plus body HTML for free-tier content. Works with native Substack domains (author.substack.com) and custom domains alike.

🔥 What we handle for you

🛡️ Browser fingerprint rotation — curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block.
🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per page, Retry-After honoured.
🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.

💡 Use cases

Newsletter intelligence — track a competitor publication's cadence, topics, and engagement over time.
Content research — aggregate posts across multiple newsletters on a niche to find trends and gaps.
Lead gen — identify high-engagement free posts to understand what resonates with a target audience.
Archive backfill — pull the full post history into a warehouse for analysis or ML training.
Paywall mapping — understand the free-vs-paid content split for any publication.

⚙️ How to use it

Click Try for free at the top of the page.
Fill in the input form — most fields have sensible defaults.
Click Start. Output streams into the run's dataset.
Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.

📥 Input

Field	Type	Required	Default	Notes
`publications`	`array`	yes	['https://www.lennysnewsletter.com']	One or more Substack publication URLs or custom domains. Accepts both native substack.com URLs and custom domains (e.g.
`maxPostsPerPublication`	`integer`	no	50	Maximum number of posts to collect per publication. Set to 0 to collect the full archive.
`includeBody`	`boolean`	no	True	When enabled, body_html is included for free posts. Paywalled posts return null body.
`proxyConfiguration`	`object`	no	{'useApifyProxy': True}	Apify Proxy configuration. Substack's public API is well-behaved; the default shared pool is sufficient for most volumes

Example input

{
  "publications": [
    "https://www.lennysnewsletter.com"
  ],
  "maxPostsPerPublication": 5,
  "includeBody": false,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

📤 Output

Every row is one dataset item.

Field	Type	Notes
`publication`	`string`	Base URL of the publication that produced this post.
`post_id`	`integer`	Substack internal numeric post ID.
`title`	`string`	Post title.
`subtitle`	`['string', 'null']`	Post subtitle / deck (may be null).
`url`	`string`	Canonical URL of the post.
`post_date`	`string`	Publication timestamp (ISO-8601).
`audience`	`string`	Access tier: `free` or `only_paid`.
`is_paywalled`	`boolean`	True when the post is behind the paywall.
`reaction_count`	`integer`	Total like/heart reactions.
`comment_count`	`integer`	Number of public comments.
`word_count`	`['integer', 'null']`	Word count when provided by the API.
`body_html`	`['string', 'null']`	Full post HTML for free posts; null for paywalled posts or when includeBody is false.
`scraped_at`	`string`	ISO-8601 timestamp of when this row was recorded.

Example output

{
  "publication": "https://www.lennysnewsletter.com",
  "post_id": 123456,
  "title": "The most important metric you're ignoring",
  "subtitle": "How to think about retention in your product",
  "url": "https://www.lennysnewsletter.com/p/the-most-important-metric",
  "post_date": "2024-03-15T09:00:00.000Z",
  "audience": "free",
  "is_paywalled": false,
  "reaction_count": 847,
  "comment_count": 43,
  "word_count": 2100,
  "body_html": null,
  "scraped_at": "2026-06-07T10:00:00Z"
}

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

Event	USD	What it is
`actor-start`	$0.005	One-off warm-up charge per run
`result`	$0.0025	Per dataset item

Example: 1 000 results at the rates above ≈ $2.50. No subscription, no minimum, no card to start — Apify gives every new account $5 of free credit.

🚧 Limitations

Body HTML is only available for posts the publication makes free. Paywalled posts return metadata only. Private Substack communities (invite-only) are not accessible. The Actor honours Substack's public API rate limits — very large archives may take several minutes.

❓ FAQ

Does this work with custom domains?

Yes. Pass the full base URL of the publication — whether it's author.substack.com or a custom domain like https://www.lennysnewsletter.com — and the Actor resolves the correct API endpoint.

Can I get paywalled post bodies?

No. Paywalled posts require a paid subscription; body_html returns null for those. You get all public metadata (title, date, reaction count, etc.) regardless of paywall status.

How far back does the archive go?

As far as the publication's API returns. We paginate until we hit maxPostsPerPublication or exhaust the archive, whichever comes first.

What if a publication has thousands of posts?

Set maxPostsPerPublication to 0 and the Actor will paginate to the full archive. Runs are priced per result, so longer archives cost proportionally more.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.

Substack Posts & Creator Scraper

sleek_waveform/substack-creator-scraper

Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.

Daniel Dimitrov

Substack Post Scraper

seemuapps/substack-post-scraper

Scrape all posts from any Substack publication. Title, publish date, likes, comments, restacks, word count, paywall status, and author for every post in the archive.

Andrew

Substack Scraper – Newsletter Posts, Engagement & Monitoring

bitofacoder/substack-scraper

Scrape any Substack newsletter's full post archive with engagement metadata (likes, comments, paywall status, word count, authors), fetch single posts, and monitor newsletters incrementally — via Substack's public JSON API. No login.

Bobby

Substack Scraper

sheshinmcfly/substack-scraper

Scrape posts from any Substack publication (subdomain or custom domain). Get title, subtitle, description, word count, reactions, restacks, comment counts, tags, authors, and publication metadata.

Sheshinmcfly

Substack Posts Scraper — Newsletter Archive & Stats

darknezz/substack-posts-scraper

Scrape any Substack publication's post archive: titles, subtitles, publish dates, likes, comments, paywall status and (optionally) full post text. Works with custom domains. Perfect for newsletter research, content analysis and AI training data.

Oaida Adrian

Substack Scraper — Posts, Authors & Newsletter Data

sian.agency/substack-scraper

Substack newsletter scraper for any publication. Extract posts: title, subtitle, author, date, reactions, comments, restacks, word count, cover image — plus full article HTML in detail mode. Search by handle, subdomain or custom domain. Clean JSON/CSV, no-code, no API key needed.

SIÁN OÜ

Substack Newsletter Scraper - Posts & Archives

wetyr_corporation/substack-newsletter-scraper

Bulk extract posts from any public Substack newsletter. Title, full content, author, paywall status, reactions. For creator economy intelligence and content trends.

WETYR

Substack Publication Scraper

parseforge/substack-publication-scraper

Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.