Under maintenance

Pricing

from $0.002 / actor start

Try for free

Go to Apify Store

Substack Scraper: Posts, Comments & Newsletter Leaderboards

Under maintenance

Try for free

Scrape Substack: post archives, full content, comments, author profiles, leaderboards. No login. 6 modes. Half the price of competitors.

Pricing

from $0.002 / actor start

Rating

0.0

(0)

Developer

Charlie Krug

Actor stats

Bookmarked

Total users

Monthly active users

a month ago

Last modified

Substack Scraper

Scrape the public web of Substack — no login, no API key. Pull full post archives, individual posts with clean text, top-level + nested comments, author profiles, and ranked publication leaderboards — all six data types in one actor, where most tools do just one. Date filtering, audience filtering, and AI-ready clean-text output are built in.

What makes this different

Feature	This Actor	Typical competitors
Comments + replies (depth-first flat list)	✅	❌ rarely offered
Subscriber count estimates from Substack's ranking data	✅	❌
Clean plain text extracted from body HTML	✅ (automatic)	❌ raw HTML only
Date range filtering for archive	✅	❌
Post type filter (newsletter / podcast / thread)	✅	❌
Custom domain support (e.g. platformer.news)	✅	partial
All 6 data types in one actor	✅	1–2 per actor
No login / no cookies required	✅	✅

Modes

`publication` — Post archive

Pull a publication's full post list, newest-first. Paginated automatically.

Input: publication (subdomain or URL), limit, audienceFilter, postType, dateFrom, dateTo, includeBody

Output fields per post:

{
  "postId": 140602898,
  "title": "Why Platformer is leaving Substack",
  "subtitle": "Casey explains the move to Ghost",
  "url": "https://platformer.substack.com/p/why-platformer-is-leaving-substack",
  "slug": "why-platformer-is-leaving-substack",
  "postDate": "2024-01-12T18:00:00.000Z",
  "type": "newsletter",
  "audience": "everyone",
  "isPaid": false,
  "reactionCount": 312,
  "commentCount": 54,
  "restackCount": 28,
  "wordcount": 2300,
  "coverImage": "https://...",
  "description": "Casey explains why Platformer is moving to Ghost.",
  "authors": [{ "name": "Casey Newton", "handle": "platformer", "photoUrl": "https://..." }],
  "tags": ["tech", "media"],
  "publication": "platformer",
  "bodyHtml": "<p>Full HTML content...</p>",
  "bodyText": "Full content here in clean plain text..."
}

`post` — Single post

Full content of one post. Paywalled posts return only the public preview — no paywall bypass.

Input: publication, slug (slug or full post URL)

`comments` — Post comments ⭐ rarely offered

All top-level comments plus nested replies, depth-first flattened — charged at the same flat per-record rate as everything else, with no per-comment surcharge.

Input: publication, slug, limit

Output fields per comment:

{
  "id": 47125539,
  "postId": 140602898,
  "postSlug": "why-platformer-is-leaving-substack",
  "postTitle": "Why Platformer is leaving Substack",
  "parentId": null,
  "depth": 0,
  "authorName": "Gordon Strause",
  "authorHandle": "gordonstrause",
  "body": "Too bad. I think Substack's policies are the right ones...",
  "date": "2024-01-12T02:18:02.661Z",
  "reactionCount": 132,
  "childCount": 2,
  "isDeleted": false,
  "isPinned": false
}

Replies have "depth": 1, "parentId": <parent comment id>.

`author` — Publication / author profile

Author bio, publication description, custom domain, paid status. Extracted from the publication's own API — no authentication required.

Input: publication (subdomain or URL) or handle

Output:

{
  "name": "Casey Newton",
  "handle": "platformer",
  "subdomain": "platformer",
  "customDomain": "www.platformer.news",
  "bio": "Casey Newton is the founder and editor of Platformer...",
  "photoUrl": "https://...",
  "twitterHandle": "CaseyNewton",
  "publicationName": "Platformer",
  "publicationDescription": "News at the intersection of Silicon Valley and democracy.",
  "publicationLogoUrl": "https://...",
  "hasPaid": false
}

`category` — Leaderboard ⭐ with subscriber counts

Ranked publications in any of Substack's 32 categories, including real subscriber-count estimates.

Input: category (slug or id), limit

Valid category slugs: technology, business, finance, health, science, culture, sports, news, music, crypto, education, literature, fiction, philosophy, climate, travel, parenting, design, art, humor, comics, history, faith, food, film-and-tv, home-garden, international, podcast.

Output:

{
  "name": "The Pragmatic Engineer",
  "subdomain": "pragmaticengineer",
  "customDomain": null,
  "description": "The #1 technology newsletter on Substack...",
  "logo": "https://...",
  "authorName": "Gergely Orosz",
  "authorHandle": "pragmaticengineer",
  "hasPaid": true,
  "subscriberCountEstimate": "1.1M+",
  "rankingScore": 10000,
  "tier": 2,
  "type": "newsletter"
}

`search` — Keyword search

Search posts and publications by keyword.

⚠️ Note: Substack's search endpoint returns empty results for anonymous requests. Results may be sparse. For reliable discovery, use category mode instead.

Filters (publication mode)

Filter	Input field	Example
Audience	`audienceFilter`	`free`, `paid`, `all`
Date range	`dateFrom`, `dateTo`	`2024-01-01`, `2024-12-31`
Content type	`postType`	`newsletter`, `podcast`, `thread`
Full body	`includeBody`	`true` — adds `bodyHtml` + `bodyText` per post

Use cases

Newsletter research & competitive intelligence Use publication mode to pull a competitor's full archive. Analyze posting frequency, topic mix (via tags), and engagement trends (reactionCount, commentCount) over time.

Lead generation for agencies Use category mode to pull the top 200 tech newsletters sorted by Substack's own ranking, with subscriber-count estimates (1.1M+, 228K+, etc.) and contact handles. Export to CSV for outreach.

AI training data Use publication with includeBody: true + audienceFilter: free for large batches of high-quality long-form text. bodyText is already clean — no HTML stripping needed. A 1,000-post archive of a major publication costs ~$0.30.

Audience sentiment analysis Use comments mode to pull every comment + reply thread for a specific post. Analyze sentiment, top commenters, and reaction counts. The depth field lets you reconstruct the full conversation tree.

Author discovery Combine category and author modes: pull the top 50 tech newsletters, then loop over each subdomain with author mode to get bios, Twitter handles, and paid status for a complete contact list.

Pricing — $0.30 per 1,000 records

The lowest price of any all-in-one Substack scraper — one flat rate for every data type.

Event	Price
Actor start (once per run)	$0.002
Per record returned	$0.0003

What you get	Records	Cost
Quick 50-post archive	50	~$0.02
Full archive of a 500-post newsletter	500	~$0.15
1,000-post AI training dataset	1,000	~$0.30
Top 200 tech newsletters (category mode)	200	~$0.06
All comments on a popular post	500	~$0.15

How we compare

Per 1,000 records, verified live on Apify Store (June 2026):

Actor	Price / 1K	Comments	Data types
This actor	$0.30	✅ flat rate	6 (posts, content, comments, authors, leaderboards, search)
sourabhbgp/substack-scraper	$0.30	➕ extra per-comment fee	3
benthepythondev/newsletter-scraper	$1.00	❌	1
easyapi/substack-*	$2.99–4.99 each	❌	1 per actor (6 separate actors to match this one)
automation-lab/substack-scraper	higher	❌	1

Same lowest price as the nearest rival — but comments are included at the flat rate (others surcharge), and it's all six data types in one actor instead of six separate purchases.

Why we can price this low: pure JSON API — no Puppeteer, no proxy, no JS rendering. The extra request budget (for includeBody) is the Actor's cost, not yours.

Run locally

# Unit tests (95 tests, no network required)
python3 tests/test_substack.py

# Quick live test — 3 posts from Platformer
python3 -c "
from src.substack import archive
import json
rows = archive('platformer', limit=3)
print(json.dumps(rows, indent=2))
"

# Full Actor run (requires: pip install apify)
apify run

Publish to Apify

npm i -g apify-cli
apify login    # paste your API token
apify push

Full step-by-step for pay-per-event pricing → PUBLISH.md

Legal / data usage

Substack's public post data (no-login content) is covered by the same legal framework as other public-web scrapers. The 2024 Meta Platforms v. Bright Data ruling affirmed that scraping publicly accessible logged-out content is defensible. This actor:

Never bypasses paywalls — paywalled posts return only the public preview
Never requires login — all data is publicly accessible without authentication
Is polite — built-in rate limiting (0.3s between archive pages, 0.5s between body fetches)

Endpoint status (verified 2026-06-30)

Endpoint	Status
Archive	✅ Stable
Single post	✅ Stable
Comments	✅ Stable
Author (via bylines)	✅ Stable
Category leaderboard	✅ Stable
Categories list	✅ Stable
Search	⚠️ Returns empty without session cookie

Substack Newsletter Scraper

dataharvest/substack-scraper

Scrape Substack newsletters, posts and comments.

Alex v

Substack Posts Scraper

fetch_cat/substack-posts-scraper

Collect public Substack newsletter posts, archives, and metadata for content research and media monitoring.

Hanna Nosova

Substack Newsletter Scraper

opalescent_quintet/substack-newsletter-scraper

Substack-Newsletter-Scraper Extract complete newsletter archives from any Substack publication with advanced filtering, multiple export formats, and engagement analytics. ## Features - Scrape entire newsletter archives from Substack - Extract full metadata: titles, content, author details etc.

Aryan

Substack Newsletter Scraper

boundary/substack-newsletter-scraper

Scrape Substack newsletter posts — titles, content, reactions, comments, tags, and author data. Supports custom domains. No login needed.

Boundary

Substack Explore Leaderboards Scraper - Low-cost💲🔥📰📈

delectable_incubator/substack-explore-leaderboards-scraper-low-cost

🔍 Scrape Substack leaderboards and extract top newsletters, publication titles, author names, handles, bios, custom domains, categories, and URLs. Ideal for lead generation, influencer discovery, newsletter research, competitive analysis, and AI automation workflows 🚀📊

Prime Scrape

Substack Scraper

noximilian/substack-scraper

Scrape Substack newsletters — fetch post archives, individual posts, comments, recommendations, and publication metadata. Search Substack for publications and content. No auth required for public content.

Noximilian

Substack Newsletter Scraper - Posts & Archives

wetyr_corporation/substack-newsletter-scraper

Bulk extract posts from any public Substack newsletter. Title, full content, author, paywall status, reactions. For creator economy intelligence and content trends.

WETYR

Substack Scraper — Posts, Comments & Newsletter Intelligence

cryptosignals/substack-scraper

Turn the newsletter economy into a dataset. Scrape any Substack publication — posts with full text, likes, comments, paywall status, reader comments, and metadata — or search by keyword across all of Substack. No login. For competitive intel, sponsor research, and media monitoring. $0.005/result.

Web Data Labs

Substack Newsletter Scraper

prince.sh/substack-scraper

Scrape Substack newsletter archives. Get post titles, body text, authors, and publish dates for any Substack publication. Perfect for content aggregation, news monitoring, writer research, and AI training datasets.

Prince Jain

Substack Leaderboard Scraper 📊

easyapi/substack-leaderboard-scraper

Scrape detailed publication data from Substack leaderboards. Get comprehensive insights about top newsletters including subscriber counts, pricing, author details, and more. Perfect for newsletter research and market analysis.