Substack Scraper: Posts, Comments & Newsletter Leaderboards
Pricing
from $0.002 / actor start
Scrape Substack: post archives, full content, comments, author profiles, leaderboards. No login. 6 modes. Half the price of competitors.
Developer
Charlie Krug
Substack Scraper
Scrape the public web of Substack — no login, no API key. Pull full post archives, individual posts with clean text, top-level + nested comments, author profiles, and ranked publication leaderboards — all six data types in one actor, where most tools do just one. Date filtering, audience filtering, and AI-ready clean-text output are built in.
What makes this different
| Feature | This Actor | Typical competitors |
|---|---|---|
| Comments + replies (depth-first flat list) | ✅ | ❌ rarely offered |
| Subscriber count estimates from Substack's ranking data | ✅ | ❌ |
| Clean plain text extracted from body HTML | ✅ (automatic) | ❌ raw HTML only |
| Date range filtering for archive | ✅ | ❌ |
| Post type filter (newsletter / podcast / thread) | ✅ | ❌ |
| Custom domain support (e.g. platformer.news) | ✅ | partial |
| All 6 data types in one actor | ✅ | 1–2 per actor |
| No login / no cookies required | ✅ | ✅ |
Modes
publication — Post archive
Pull a publication's full post list, newest-first. Paginated automatically.
Input: publication (subdomain or URL), limit, audienceFilter, postType, dateFrom, dateTo, includeBody
Output fields per post:
    {
      "postId": 140602898,
      "title": "Why Platformer is leaving Substack",
      "subtitle": "Casey explains the move to Ghost",
      "url": "https://platformer.substack.com/p/why-platformer-is-leaving-substack",
      "slug": "why-platformer-is-leaving-substack",
      "postDate": "2024-01-12T18:00:00.000Z",
      "type": "newsletter",
      "audience": "everyone",
      "isPaid": false,
      "reactionCount": 312,
      "commentCount": 54,
      "restackCount": 28,
      "wordcount": 2300,
      "coverImage": "https://...",
      "description": "Casey explains why Platformer is moving to Ghost.",
      "authors": [
        { "name": "Casey Newton", "handle": "platformer", "photoUrl": "https://..." }
      ],
      "tags": ["tech", "media"],
      "publication": "platformer",
      "bodyHtml": "<p>Full HTML content...</p>",
      "bodyText": "Full content here in clean plain text..."
    }
post — Single post
Full content of one post. Paywalled posts return only the public preview — no paywall bypass.
Input: publication, slug (slug or full post URL)
comments — Post comments ⭐ rarely offered
All top-level comments plus nested replies, depth-first flattened — charged at the same flat per-record rate as everything else, with no per-comment surcharge.
Input: publication, slug, limit
Output fields per comment:
    {
      "id": 47125539,
      "postId": 140602898,
      "postSlug": "why-platformer-is-leaving-substack",
      "postTitle": "Why Platformer is leaving Substack",
      "parentId": null,
      "depth": 0,
      "authorName": "Gordon Strause",
      "authorHandle": "gordonstrause",
      "body": "Too bad. I think Substack's policies are the right ones...",
      "date": "2024-01-12T02:18:02.661Z",
      "reactionCount": 132,
      "childCount": 2,
      "isDeleted": false,
      "isPinned": false
    }
Replies have "depth": 1, "parentId": <parent comment id>.
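Because the output is a flat list, rebuilding the conversation tree is left to the consumer. A minimal sketch, assuming only the `id` and `parentId` fields shown above; the `build_comment_tree` helper and the `replies` key are illustrative names, not part of the actor's output. Replies whose parent is missing from the list (for example because `limit` cut the run short) are treated as top-level here, which is a design choice of this sketch.

```python
from collections import defaultdict


def build_comment_tree(comments):
    """Nest a flat comment list into a tree using id / parentId.

    Input order is preserved among siblings. Comments whose parent is not
    present in the list are promoted to top level instead of being dropped.
    """
    known_ids = {c["id"] for c in comments}
    children = defaultdict(list)
    for c in comments:
        parent = c.get("parentId")
        key = parent if parent in known_ids else None
        children[key].append(c)

    def attach(parent_id):
        # Copy each comment and add a hypothetical "replies" key.
        return [
            {**c, "replies": attach(c["id"])}
            for c in children.get(parent_id, [])
        ]

    return attach(None)


flat = [
    {"id": 1, "parentId": None, "depth": 0, "body": "Top-level comment"},
    {"id": 2, "parentId": 1, "depth": 1, "body": "First reply"},
    {"id": 3, "parentId": 1, "depth": 1, "body": "Second reply"},
    {"id": 4, "parentId": None, "depth": 0, "body": "Another top-level comment"},
]
tree = build_comment_tree(flat)
```

Here `tree` holds two top-level comments, and the first one carries both replies under `replies`.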
author — Publication / author profile
Author bio, publication description, custom domain, paid status. Extracted from the publication's own API — no authentication required.
Input: publication (subdomain or URL) or handle
Output:
    {
      "name": "Casey Newton",
      "handle": "platformer",
      "subdomain": "platformer",
      "customDomain": "www.platformer.news",
      "bio": "Casey Newton is the founder and editor of Platformer...",
      "photoUrl": "https://...",
      "twitterHandle": "CaseyNewton",
      "publicationName": "Platformer",
      "publicationDescription": "News at the intersection of Silicon Valley and democracy.",
      "publicationLogoUrl": "https://...",
      "hasPaid": false
    }
category — Leaderboard ⭐ with subscriber counts
Ranked publications in any of Substack's 32 categories, including real subscriber-count estimates.
Input: category (slug or id), limit
Valid category slugs include: technology, business, finance, health, science, culture,
sports, news, music, crypto, education, literature, fiction, philosophy,
climate, travel, parenting, design, art, humor, comics, history, faith,
food, film-and-tv, home-garden, international, podcast.
Output:
    {
      "name": "The Pragmatic Engineer",
      "subdomain": "pragmaticengineer",
      "customDomain": null,
      "description": "The #1 technology newsletter on Substack...",
      "logo": "https://...",
      "authorName": "Gergely Orosz",
      "authorHandle": "pragmaticengineer",
      "hasPaid": true,
      "subscriberCountEstimate": "1.1M+",
      "rankingScore": 10000,
      "tier": 2,
      "type": "newsletter"
    }
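`subscriberCountEstimate` is a display string such as "1.1M+", so it needs converting before you can sort or filter on it numerically. A minimal sketch, assuming the values follow the "number, optional K/M suffix, optional +" shape seen in the examples on this page; other formats may exist, in which case the hypothetical `parse_subscriber_estimate` helper returns `None`.

```python
import re

_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000}
# Digits with optional thousands separators and decimals, then K or M, then "+".
_PATTERN = re.compile(
    r"^\s*(\d+(?:,\d{3})*(?:\.\d+)?)\s*([KM]?)\+?\s*$", re.IGNORECASE
)


def parse_subscriber_estimate(text):
    """Convert strings like '1.1M+' or '228K+' to an integer lower bound."""
    if not text:
        return None
    match = _PATTERN.match(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    return round(number * _MULTIPLIER[match.group(2).upper()])


rows = [
    {"name": "A", "subscriberCountEstimate": "228K+"},
    {"name": "B", "subscriberCountEstimate": "1.1M+"},
    {"name": "C", "subscriberCountEstimate": None},
]
# Largest first; rows without a parseable estimate sort last.
ranked = sorted(
    rows,
    key=lambda r: parse_subscriber_estimate(r["subscriberCountEstimate"]) or -1,
    reverse=True,
)
```

The trailing "+" suggests the figure is a floor rather than an exact count, so the parsed integer is best read as "at least this many".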
search — Keyword search
Search posts and publications by keyword.
⚠️ Note: Substack's search endpoint generally returns empty results for anonymous requests, so expect sparse or empty output. For reliable discovery, use category mode instead.
Filters (publication mode)
| Filter | Input field | Example |
|---|---|---|
| Audience | audienceFilter | free, paid, all |
| Date range | dateFrom, dateTo | 2024-01-01, 2024-12-31 |
| Content type | postType | newsletter, podcast, thread |
| Full body | includeBody | true — adds bodyHtml + bodyText per post |
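The filters in the table combine into a single run input. A minimal sketch of building and validating that input in Python before starting a run; the field names come from the table above, while the `mode` key and the `build_publication_input` helper are assumptions made for illustration, so check the actor's input schema for the exact selector name.

```python
from datetime import date

AUDIENCES = {"free", "paid", "all"}
POST_TYPES = {"newsletter", "podcast", "thread"}


def build_publication_input(publication, limit=100, audience="all",
                            post_type=None, date_from=None, date_to=None,
                            include_body=False):
    """Validate filter values and return a run-input dict."""
    if audience not in AUDIENCES:
        raise ValueError(f"audience must be one of {sorted(AUDIENCES)}")
    if post_type is not None and post_type not in POST_TYPES:
        raise ValueError(f"post_type must be one of {sorted(POST_TYPES)}")
    # date.fromisoformat raises ValueError on strings that are not ISO dates.
    start = date.fromisoformat(date_from) if date_from else None
    end = date.fromisoformat(date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("date_from must not be after date_to")

    run_input = {
        "mode": "publication",  # assumed name of the mode selector field
        "publication": publication,
        "limit": limit,
        "audienceFilter": audience,
        "includeBody": include_body,
    }
    # Optional filters are only sent when set.
    if post_type:
        run_input["postType"] = post_type
    if start:
        run_input["dateFrom"] = start.isoformat()
    if end:
        run_input["dateTo"] = end.isoformat()
    return run_input


run_input = build_publication_input(
    "platformer", limit=50, audience="free",
    date_from="2024-01-01", date_to="2024-12-31",
)
```

Validating locally catches a reversed date range or a misspelled filter value before a run is started and the start fee is charged.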
Use cases
Newsletter research & competitive intelligence
Use publication mode to pull a competitor's full archive. Analyze posting frequency, topic
mix (via tags), and engagement trends (reactionCount, commentCount) over time.
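As a rough illustration of that analysis, the sketch below groups archive records by calendar month and averages reactions per post. It assumes only the `postDate`, `reactionCount`, and `commentCount` fields from the publication output; `monthly_engagement` is a hypothetical helper, and months are bucketed by the UTC timestamp as-is.

```python
from collections import defaultdict


def monthly_engagement(posts):
    """Return {"YYYY-MM": {posts, reactions, comments, avgReactions}}."""
    totals = defaultdict(lambda: {"posts": 0, "reactions": 0, "comments": 0})
    for post in posts:
        # postDate looks like "2024-01-12T18:00:00.000Z"; the first seven
        # characters are the year and month.
        bucket = totals[post["postDate"][:7]]
        bucket["posts"] += 1
        bucket["reactions"] += post.get("reactionCount") or 0
        bucket["comments"] += post.get("commentCount") or 0
    return {
        month: {**b, "avgReactions": b["reactions"] / b["posts"]}
        for month, b in sorted(totals.items())
    }


sample = [
    {"postDate": "2024-01-12T18:00:00.000Z", "reactionCount": 312, "commentCount": 54},
    {"postDate": "2024-01-20T18:00:00.000Z", "reactionCount": 100, "commentCount": 10},
    {"postDate": "2024-02-02T18:00:00.000Z", "reactionCount": 50, "commentCount": 5},
]
stats = monthly_engagement(sample)
```

With the sample above, January has two posts averaging 206 reactions and February has one post with 50.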
Lead generation for agencies
Use category mode to pull the top 200 tech newsletters sorted by Substack's own ranking, with
subscriber-count estimates (1.1M+, 228K+, etc.) and contact handles. Export to CSV for outreach.
AI training data
Use publication with includeBody: true + audienceFilter: free for large batches of
high-quality long-form text. bodyText is already clean — no HTML stripping needed. A 1,000-post
archive of a major publication costs ~$0.30.
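For a training pipeline, the records usually end up as one JSON object per line. A minimal sketch of that conversion, assuming the `postId`, `title`, `url`, `bodyText`, and `wordcount` fields from the publication output; the `to_jsonl` helper, the output keys, and the `min_words` cutoff are illustrative choices rather than anything the actor provides.

```python
import json


def to_jsonl(posts, min_words=0):
    """Serialize posts with non-empty bodyText as JSON Lines text."""
    lines = []
    for post in posts:
        text = (post.get("bodyText") or "").strip()
        # Skip posts without body text and posts below the word cutoff.
        if not text or (post.get("wordcount") or 0) < min_words:
            continue
        record = {
            "id": post.get("postId"),
            "title": post.get("title"),
            "url": post.get("url"),
            "text": text,
        }
        # ensure_ascii=False keeps non-ASCII characters readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


posts = [
    {"postId": 1, "title": "A", "url": "https://example.com/a",
     "bodyText": " Some long-form text. ", "wordcount": 3},
    {"postId": 2, "title": "B", "url": "https://example.com/b",
     "bodyText": "", "wordcount": 0},
]
jsonl = to_jsonl(posts)
```

Posts with an empty `bodyText` are skipped, so only usable text reaches the output.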
Audience sentiment analysis
Use comments mode to pull every comment and reply thread for a specific post. Analyze sentiment,
top commenters, and reaction counts. The parentId and depth fields let you reconstruct the full
conversation tree.
Author discovery
Combine category and author modes: pull the top 50 tech newsletters, then loop over each
subdomain with author mode to get bios, Twitter handles, and paid status for a complete
contact list.
Pricing — $0.30 per 1,000 records
Matches the lowest per-record price in the comparison below, with one flat rate for every data type.
| Event | Price |
|---|---|
| Actor start (once per run) | $0.002 |
| Per record returned | $0.0003 |
| What you get | Records | Cost |
|---|---|---|
| Quick 50-post archive | 50 | ~$0.02 |
| Full archive of a 500-post newsletter | 500 | ~$0.15 |
| 1,000-post AI training dataset | 1,000 | ~$0.30 |
| Top 200 tech newsletters (category mode) | 200 | ~$0.06 |
| All comments on a popular post | 500 | ~$0.15 |
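The figures in the table follow from the two events above: one start fee per run plus a flat fee per record. A minimal sketch of that arithmetic, assuming those two events are the only charges; `estimate_cost` is a hypothetical helper, and `Decimal` is used so the small fees are not distorted by binary floating point.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.002")   # charged once per run
PER_RECORD_FEE = Decimal("0.0003")   # charged per record returned


def estimate_cost(records, runs=1):
    """Estimated charge in USD for `records` results across `runs` runs."""
    if records < 0 or runs < 1:
        raise ValueError("records must be >= 0 and runs must be >= 1")
    return ACTOR_START_FEE * runs + PER_RECORD_FEE * records


for n in (50, 200, 500, 1000):
    print(f"{n:>5} records: ${estimate_cost(n):.3f}")
# prints:
#    50 records: $0.017
#   200 records: $0.062
#   500 records: $0.152
#  1000 records: $0.302
```

The table rounds these to two decimals, which is why 50 records shows as ~$0.02. Splitting the same workload across many runs adds one start fee per run.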
How we compare
Per 1,000 records, verified live on Apify Store (June 2026):
| Actor | Price / 1K | Comments | Data types |
|---|---|---|---|
| This actor | $0.30 | ✅ flat rate | 6 (posts, content, comments, authors, leaderboards, search) |
| sourabhbgp/substack-scraper | $0.30 | ➕ extra per-comment fee | 3 |
| benthepythondev/newsletter-scraper | $1.00 | ❌ | 1 |
| easyapi/substack-* | $2.99–4.99 each | ❌ | 1 per actor (6 separate actors to match this one) |
| automation-lab/substack-scraper | higher | ❌ | 1 |
Same lowest price as the nearest rival — but comments are included at the flat rate (others surcharge), and it's all six data types in one actor instead of six separate purchases.
Why we can price this low: pure JSON API — no Puppeteer, no proxy, no JS rendering.
The extra requests needed to fetch post bodies (includeBody) are absorbed by the Actor and are not billed to you.
Run locally
    # Unit tests (95 tests, no network required)
    python3 tests/test_substack.py

    # Quick live test: 3 posts from Platformer
    python3 -c "
    from src.substack import archive
    import json
    rows = archive('platformer', limit=3)
    print(json.dumps(rows, indent=2))
    "

    # Full Actor run (requires: pip install apify)
    apify run
Publish to Apify
    npm i -g apify-cli
    apify login   # paste your API token
    apify push
Full step-by-step for pay-per-event pricing → PUBLISH.md
Legal / data usage
Scraping Substack's public, no-login post data falls under the same legal framework as other public-web scraping. In Meta Platforms v. Bright Data (N.D. Cal., 2024), the court held that Meta's terms of service did not bar scraping of publicly accessible data while logged out. This actor:
- Never bypasses paywalls — paywalled posts return only the public preview
- Never requires login — all data is publicly accessible without authentication
- Is polite — built-in rate limiting (0.3s between archive pages, 0.5s between body fetches)
Endpoint status (verified 2026-06-30)
| Endpoint | Status |
|---|---|
| Archive | ✅ Stable |
| Single post | ✅ Stable |
| Comments | ✅ Stable |
| Author (via bylines) | ✅ Stable |
| Category leaderboard | ✅ Stable |
| Categories list | ✅ Stable |
| Search | ⚠️ Returns empty without session cookie |