Pricing

from $3.00 / 1,000 posts

Substack Scraper: Posts, Comments & Authors

Scrape any Substack publication: post archives, article text, comments, author profiles and subscriber signals. Search across newsletters and export structured data for research, monitoring and AI datasets. No browser. Output to CSV, JSON or Excel.

Rating

5.0

(2)

Developer

Doggo

Last modified

22 days ago

Substack Scraper

Scrape any Substack newsletter, post, author, or comment — fast, cheap, and at scale.

This Apify actor extracts structured data from Substack publications via their public JSON API. No browser, no JavaScript rendering, no login required. Built for newsletter research, content monitoring, author discovery, competitive intelligence, and LLM training datasets.

What you can scrape

- Substack posts: title, subtitle, full HTML and plain-text body, word count, publish date, tags, cover image, paywall status, reactions, comment count, restacks
- Substack publications: name, subdomain, custom domain, description, logo, category, language, subscriber count (when public), founding plan
- Substack authors: profile, handle, bio, photo, the publications they write for, the publications they subscribe to
- Substack comments: full nested comment threads, author handles, publish dates, reactions, reply depth

Works with any Substack URL: https://*.substack.com, custom domains (https://stratechery.com), individual post URLs, https://substack.com/@handle author profiles, and https://open.substack.com/pub/... share links.
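
As a rough illustration of how these URL shapes differ, the sketch below sorts a URL into one of four kinds using only the patterns listed above. The function name and the returned labels are hypothetical, and this is not the Actor's actual routing logic. Note that a custom domain cannot be confirmed as a Substack publication from the URL alone, so any unrecognized URL falls through to "publication".

```python
from urllib.parse import urlparse


def classify_substack_url(url: str) -> str:
    """Return 'author', 'share_link', 'post', or 'publication' (hypothetical labels)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")

    # https://substack.com/@handle -> author profile
    if host in ("substack.com", "www.substack.com") and path.startswith("/@"):
        return "author"
    # https://open.substack.com/pub/<publication>/p/<slug> -> share link
    if host == "open.substack.com" and path.startswith("/pub/"):
        return "share_link"
    # <any host>/p/<slug> -> individual post
    if path.startswith("/p/"):
        return "post"
    # Anything else is assumed to be a publication home page
    return "publication"
```

Calling `classify_substack_url("https://substack.com/@handle")` returns `"author"`, while a bare custom domain such as `https://stratechery.com` returns `"publication"`.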

Why use this Substack scraper

- Pay only for data, not for browser time: no Playwright, no rendering overhead, no per-minute compute billing. You pay per result, and failed requests are never charged.
- Full archives, not just the front page: the scraper paginates through the entire publication archive, back to the very first post.
- Clean, typed output: one dataset with a type field (post / publication / author / comment) and per-type table views, so you can export straight to BI tools, CSV, JSON, Excel, or Google Sheets.
- No duplicates, no surprises: every post is delivered exactly once, limits are enforced even across platform restarts, and proxy rotation is handled for you.

Common use cases

- Newsletter research: download the full archive of a competitor's Substack for content analysis, topic clustering, or SEO research
- Content monitoring: schedule a daily run with maxPostsPerPublication: 5 to capture new posts from a tracked list of newsletters and pipe them to Slack or email
- Author discovery and lead generation: crawl author profiles to map who writes for which publications, then export handles for outreach
- LLM training data: bulk-extract long-form Substack content (with word counts and metadata) for fine-tuning datasets
- Competitive intelligence: track subscriber counts, post frequency, paywall strategy, and engagement metrics (reactions, comments, restacks) across a competitor set
- Academic and journalism research: gather statements, essays, and commentary from Substack writers with citable timestamps
- Archiving and backup: export your own Substack publication before a migration

Input

Field	Type	Default	Description
`startUrls`	array of URLs	—	Substack publication, post, or author URLs. Leave empty only when using Discovery mode
`mode`	`posts` / `publication`	`posts`	What to pull for each publication URL
`maxPostsPerPublication`	integer	50	Cap per publication. `0` = entire archive. Lower = cheaper
`includeContent`	boolean	true	Fetch each post's full HTML body
`includeComments`	boolean	false	Fetch comments for each post (each comment is a separate result)
`onlyFreePosts`	boolean	false	Skip paid / subscriber-only posts in archives
`searchQuery`	string	—	Filter the publication archive by keyword
`discoveryMode`	`none` / `leaderboard` / `search`	`none`	Auto-discover many publications without providing URLs
`discoveryQuery`	string	—	Keyword for `search` discovery
`maxPublicationsToDiscover`	integer	25	Cap on discovered publications. Lower = cheaper
`maxConcurrency`	integer	5	Parallel requests
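
The table above can be turned into a small input builder that fills in the documented defaults and rejects values outside the listed options. This is a minimal sketch: `build_input` is a hypothetical helper, not part of the Actor, and the `{"url": ...}` shape for startUrls is copied from the example input in this listing.

```python
# Defaults copied from the input table; fields without a default are optional.
DEFAULTS = {
    "mode": "posts",
    "maxPostsPerPublication": 50,
    "includeContent": True,
    "includeComments": False,
    "onlyFreePosts": False,
    "discoveryMode": "none",
    "maxPublicationsToDiscover": 25,
    "maxConcurrency": 5,
}
OPTIONAL_FIELDS = {"searchQuery", "discoveryQuery"}
ALLOWED_VALUES = {
    "mode": {"posts", "publication"},
    "discoveryMode": {"none", "leaderboard", "search"},
}


def build_input(start_urls=(), **overrides):
    """Merge overrides into the documented defaults and run basic checks."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")

    run_input = {**DEFAULTS, **overrides}
    for field, allowed in ALLOWED_VALUES.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")

    # The table says startUrls may be left empty only in Discovery mode.
    if not start_urls and run_input["discoveryMode"] == "none":
        raise ValueError("startUrls is required unless discoveryMode is set")

    run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input
```

For a cheap test run, `build_input(["https://noahpinion.substack.com"], maxPostsPerPublication=10)` yields a complete input object with every other field at its default.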

Discovery mode — scrape many publications without a list

If you don't have a list of specific newsletters, turn on Discovery mode and the actor will find publications for you:

- Top publications (leaderboard): seeds from 5 curated top Substacks and expands through each publication's recommendations until maxPublicationsToDiscover is reached
- Search (search): same expansion, with your discoveryQuery keyword applied as a filter to every discovered publication's archive

Each discovered publication is then scraped using the same mode / maxPostsPerPublication settings as startUrls, so you can go from zero URLs to a full corpus in one run. Discovery is off by default — a discovery run scrapes many publications and produces a correspondingly large dataset.

{
    "discoveryMode": "search",
    "discoveryQuery": "AI",
    "maxPublicationsToDiscover": 50,
    "mode": "posts",
    "maxPostsPerPublication": 20,
    "includeContent": true
}
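
The expansion described above (start from seeds, follow recommendations, stop at the cap) resembles a breadth-first traversal. The sketch below shows that idea only; the Actor's real traversal order is not documented here. `discover` and `get_recommendations` are hypothetical names, and counting the seeds toward the cap is an assumption.

```python
from collections import deque
from typing import Callable, Iterable


def discover(
    seeds: Iterable[str],
    get_recommendations: Callable[[str], Iterable[str]],
    limit: int,
) -> list[str]:
    """Breadth-first expansion from seed publications, capped at `limit`."""
    found: list[str] = []
    seen: set[str] = set()
    queue: deque[str] = deque()

    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    while queue and len(found) < limit:
        publication = queue.popleft()
        found.append(publication)
        # Queue each recommendation once, so no publication is visited twice.
        for rec in get_recommendations(publication):
            if rec not in seen:
                seen.add(rec)
                queue.append(rec)

    return found
```

In this sketch, `get_recommendations` would be whatever lookup returns a publication's recommended newsletters; here it is just a callable so the traversal can be tried against an in-memory graph.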

Example input

{
    "startUrls": [
        { "url": "https://www.thefitzwilliam.com" },
        { "url": "https://noahpinion.substack.com" },
        { "url": "https://substack.com/@mattyglesias" }
    ],
    "mode": "posts",
    "maxPostsPerPublication": 100,
    "includeContent": true,
    "includeComments": false
}

Output

All records land in the run's dataset with a type discriminator (post, publication, author, comment). The Output tab offers per-type table views (Posts, Publications, Authors, Comments); for exports, filter on the type field to split record types into separate files.
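
Once an export is loaded as a list of records, splitting it on the type field takes only a few lines. This is a minimal sketch: `split_by_type` is a hypothetical helper, and the only thing it relies on from this listing is the type field itself.

```python
from collections import defaultdict


def split_by_type(records):
    """Group dataset records by their 'type' discriminator."""
    groups = defaultdict(list)
    for record in records:
        # Records without a type field are kept under 'unknown' rather than dropped.
        groups[record.get("type", "unknown")].append(record)
    return dict(groups)
```

Each resulting list can then be written to its own file, for example one CSV for posts and another for comments.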

Post record

{
    "type": "post",
    "id": 123456,
    "title": "Why newsletters won",
    "slug": "why-newsletters-won",
    "url": "https://example.substack.com/p/why-newsletters-won",
    "publication": "example",
    "publicationName": "The Example",
    "publishedAt": "2026-02-01T14:00:00Z",
    "audience": "everyone",
    "isPaid": false,
    "author": "Jane Author",
    "authors": [{ "id": 99, "name": "Jane Author", "handle": "janeauthor" }],
    "bodyHtml": "<p>...</p>",
    "bodyText": "...",
    "wordcount": 1842,
    "reactionCount": 213,
    "commentCount": 42,
    "restacks": 18,
    "postTags": ["media", "business"]
}

Publication record

{
    "type": "publication",
    "id": 42,
    "name": "The Example",
    "subdomain": "example",
    "customDomain": null,
    "url": "https://example.substack.com",
    "description": "A newsletter about newsletters.",
    "categoryName": "Business",
    "totalSubscribers": 48211,
    "paidSubscribers": 1203,
    "createdAt": "2022-06-14T09:12:00Z"
}

Author record

{
    "type": "author",
    "id": 99,
    "name": "Jane Author",
    "handle": "janeauthor",
    "profileUrl": "https://substack.com/@janeauthor",
    "bio": "Writing about media.",
    "photoUrl": "https://.../photo.jpg",
    "publications": [{ "publicationName": "The Example", "subdomain": "example", "role": "admin" }],
    "subscriptions": [{ "publicationName": "Noahpinion", "subdomain": "noahpinion" }]
}

Comment record

{
    "type": "comment",
    "id": 55512,
    "postId": 123456,
    "postSlug": "why-newsletters-won",
    "postTitle": "Why newsletters won",
    "publication": "example",
    "parentId": null,
    "depth": 0,
    "body": "Great piece.",
    "authorName": "A Reader",
    "authorHandle": "areader",
    "publishedAt": "2026-02-01T16:30:00Z",
    "reactionCount": 4
}
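
Because each comment arrives as a flat record, a nested thread has to be rebuilt from the id and parentId fields. The sketch below shows one way to do that, assuming parentId holds the id of another comment record in the same dataset (null for top-level comments). `build_threads` and the added `replies` key are hypothetical.

```python
def build_threads(comments):
    """Rebuild nested threads from flat comment records; returns top-level comments."""
    # Copy each record and give it an empty list to collect its replies.
    nodes = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for comment in comments:
        node = nodes[comment["id"]]
        parent = nodes.get(comment.get("parentId"))
        if parent is None:
            # Top-level comment, or a reply whose parent is not in the dataset.
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots
```

Replies whose parent is missing are promoted to top level instead of being discarded, which keeps partial exports usable.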

How to scrape Substack (step-by-step)

1. Click "Try for free" at the top of this page; you'll be taken to the Apify Console.
2. Paste your target URLs into the Start URLs field. Examples:
   - A publication: https://stratechery.com or https://noahpinion.substack.com
   - A single post: https://example.substack.com/p/some-post
   - An author profile: https://substack.com/@handle
   - A share link: https://open.substack.com/pub/astralcodexten/p/some-post
3. Set maxPostsPerPublication: start with 10 for a test, then bump it (or set 0 for the whole archive).
4. Click "Start". When the run completes, open the Output tab to browse results or hit Export for CSV / JSON / Excel.

FAQ

How am I charged? Per record in your results — each post, publication, author, and comment counts as one result. Failed or retried requests are never charged, and you'll never receive the same post twice. Control your bill with maxPostsPerPublication, includeComments, and maxPublicationsToDiscover; you can also set a maximum budget for any run in the Apify Console.
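
A back-of-the-envelope estimate can help when choosing limits. The sketch below counts only post and comment records and assumes a flat $3.00 per 1,000 results for every record type. That rate is the listed "from" price, so the real rate for your plan or for other record types may differ. Both function names are hypothetical.

```python
def estimate_results(publications: int, posts_per_publication: int,
                     avg_comments_per_post: float = 0.0) -> int:
    """Upper-bound count of post and comment records for a run."""
    posts = publications * posts_per_publication
    comments = round(posts * avg_comments_per_post)
    return posts + comments


def estimate_cost_usd(results: int, usd_per_1000: float = 3.00) -> float:
    """Cost at an assumed flat rate per 1,000 results."""
    return results / 1000 * usd_per_1000
```

With the discovery example from this listing (50 publications, 20 posts each, comments off), the estimate is 1,000 results. Turning comments on multiplies that by however many comments the posts carry, which is why includeComments is the setting to watch.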

Does it scrape paywalled posts? Paid posts are listed with metadata and the free preview text; full paid bodies require a subscriber login, which this scraper does not use. Enable onlyFreePosts to skip them entirely.

How many comments will a post produce? Whatever the thread holds — popular posts can carry hundreds of comments, each delivered (and charged) as its own result. Leave includeComments off unless you need them.

Will it get blocked? Blocking is handled for you: proxy rotation, retries, and rate-limit handling are built in, so no setup is needed on your side.

Can I schedule it? Yes — use Apify Schedules for daily/weekly monitoring runs, and connect the dataset to Google Sheets, webhooks, or the API for delivery.
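
For delivery through the API, a run can also be started and read back from code. The sketch below is written against a client object shaped like the one in the apify-client Python package (`actor(...).call(...)`, `dataset(...).iterate_items()`); check it against the client version you install. The Actor ID is taken from the MCP server URL in this listing, and `run_and_fetch` is a hypothetical helper.

```python
# Actor ID as it appears in the MCP server URL on this listing.
ACTOR_ID = "doggo/substack-scraper-posts-comments-authors"


def run_and_fetch(client, run_input: dict) -> list:
    """Start a run, wait for it to finish, and return all dataset records."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The run did not return any run details")
    # Each finished run exposes its results through a default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

With apify-client installed, passing `ApifyClient("<YOUR_API_TOKEN>")` as `client` is the intended usage. The function takes the client as a parameter so it can also be exercised with a stub.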

Use with AI agents (MCP)

This Actor works as a tool for AI agents via the Apify MCP server. Add it to Claude, Cursor, or any MCP-compatible assistant with this server URL:

https://mcp.apify.com?tools=doggo/substack-scraper-posts-comments-authors

Then just ask something like "Scrape the latest 20 posts from noahpinion" — the agent fills in the input, runs the Actor, and reads the results from the dataset. Standard pay-per-result pricing applies to the calling account.
