Substack Scraper — Newsletters, Posts, Authors & Subscribers
Pricing
from $2.00 / 1,000 results
Substack Scraper — Newsletters, Posts, Authors & Subscribers
Discover Substack newsletters by category & leaderboard rank, then pull every post, author and publication. 30+ categories or direct subdomain. Per post: title, audience (free/paid), reactions, restacks. Per pub: subdomain, custom domain, author, subscription tiers. Public Substack API — no auth.
Pricing
from $2.00 / 1,000 results
Rating
0.0
(0)
Developer
Logiover
Maintained by CommunityActor stats
0
Bookmarked
2
Total users
1
Monthly active users
11 hours ago
Last modified
Categories
Share
Substack Scraper — Newsletter Discovery, Posts, Authors & Subscriber Data
Discover Substack newsletters by category and leaderboard rank, then pull every post, every author and full publication metadata in a single normalized dataset. Search across 30+ Substack categories — Technology, Business, Finance, Crypto, News, Culture, Health, Politics, Science, Design and more — or jump straight to a known subdomain or custom domain.
Built on Substack's public JSON API — no authentication, no proxy, no scraping fight. Per post: title, subtitle, audience (free / paid / founding), post date, reaction count, restacks, cover image, podcast info and canonical URL. Per publication: name, subdomain, custom domain, author display name, hero description, language, subscription benefits and founded date.
Perfect for content intelligence, RAG / LLM training data, newsletter sponsorship outreach, competitive monitoring, author / creator lead generation, and podcast network discovery.
🚀 What does this Substack scraper do?
Two complementary modes — combine them or use one in isolation:
| Mode | When to use | What it returns |
|---|---|---|
| Category Discovery | Find every tech / finance / crypto newsletter on Substack ranked by leaderboard, top paid, or all-publications | Top-N publications per category + every post per publication |
| Direct Newsletter URLs | You already know the newsletter — pass its subdomain, custom domain, or just its name | All posts of that publication, with metadata |
Optional: also push one publication-level record per discovered newsletter (denormalized list of every newsletter alongside posts) for downstream relational joins.
💡 Use cases
- Content intelligence platforms — track every AI / startup / crypto newsletter on Substack with daily refresh
- Newsletter sponsorship outreach — pull author names, custom domains, free-vs-paid mix, subscription tier descriptions for cold outreach lists
- RAG / LLM training data — every public Substack post is high-quality long-form content with author + date + topic metadata, ready for vector indexing
- Substack-to-Beehiiv migration tools — bulk-export an author's archive
- Competitive monitoring — track when a competitor's newsletter publishes, what audience tier it targets, and how reader reactions trend
- Investor / VC research — every fintech / crypto / SaaS founder publishes here now; find them at scale
- Podcast network discovery — Substack hosts thousands of independent podcasts; the actor exposes
podcastFeedUrlfor each publication - Newsletter ranking dashboards — leaderboard mode returns subscriber-weighted rankings per category
⚙️ Input configuration
| Field | Type | Default | Description |
|---|---|---|---|
categorySlugs | string[] | [] | Substack category slugs (see list below). Each is paginated and enumerated to maxPublicationsPerCategory. |
categoryRanking | string | "leaderboard" | leaderboard (top by subs + engagement), paid (top paid), all (default Substack sort). |
newsletterUrls | string[] | [] | Specific newsletters to scrape. Accepts subdomain, subdomain.substack.com, or custom domain. |
maxPublicationsPerCategory | integer | 25 | Hard cap per category. Substack returns 25 per page, the actor auto-paginates. |
maxPostsPerPublication | integer | 50 | Hard cap per publication via the archive API. 0 = skip posts (publication-only mode). |
audienceFilter | string | "all" | all / free (only posts open to everyone) / paid (only paid-subscriber posts). |
minPostDate | string | null | Drop posts published before this YYYY-MM-DD. |
maxPostDate | string | null | Drop posts published after this YYYY-MM-DD. |
keywordFilter | string[] | [] | Client-side title/subtitle substring filter (case-insensitive). E.g. ["ai","gpt"]. |
includePublicationMetadata | boolean | true | Enrich each post with parent-publication fields. |
alsoPushPublicationRecord | boolean | false | Push one extra record per publication (recordType: "publication") alongside post records. |
language | string | "" | ISO 639-1 publication-language filter (e.g. en). |
Supported category slugs
technology · business · finance · crypto · news · us-politics · world-politics · health-politics · culture · science · health · design · travel · parenting · literature · fiction · philosophy · history · climate · art · music · sports · food · film-and-tv · comics · humor · fashionandbeauty · education · faith · international · home-garden · podcast
The full live category list with subcategories is fetched from Substack at runtime, so additions automatically flow through.
📦 Output fields
Records have recordType: "post" or recordType: "publication".
Per-post fields
| Field | Description | Example |
|---|---|---|
recordType | "post" | "post" |
postId | Substack post ID | 195672025 |
postSlug | URL slug | "why-saas-freemium-playbooks-dont" |
postTitle | Post title | "Why SaaS freemium playbooks don't work in AI..." |
postSubtitle | Subtitle | "How to build an AI monetization strategy..." |
postType | newsletter, podcast, thread, etc. | "newsletter" |
audience | everyone / only_paid / founding | "only_paid" |
postDate | Publication timestamp (ISO) | "2026-05-05T13:03:32.007Z" |
canonicalUrl | Full post URL | "https://www.lennysnewsletter.com/p/..." |
coverImage | Hero image URL | "https://substackcdn.com/.../cover.png" |
reactions | Total reactions | 313 |
restacks | Number of restacks | 10 |
wordCount | Word count (if returned) | 2500 |
podcastDuration | Episode duration (seconds) | 1820 |
podcastUrl | Podcast audio URL | "https://.../episode.mp3" |
videoUploadId | Video upload ID (if any) | "vid_abc" |
sectionName | Substack section | "Premium" |
Per-publication fields (added when includePublicationMetadata: true)
| Field | Description | Example |
|---|---|---|
publicationId | Substack publication ID | 10845 |
publicationName | Newsletter name | "Lenny's Newsletter" |
subdomain | *.substack.com subdomain | "lennysnewsletter" |
customDomain | Custom domain (if any) | "www.lennysnewsletter.com" |
publicationUrl | Primary base URL | "https://www.lennysnewsletter.com" |
logoUrl | Logo image | "https://substackcdn.com/.../logo.png" |
coverPhotoUrl | Cover photo | "https://..." |
publicationDescription | Hero/about text | "The #1 product / growth newsletter..." |
language | ISO 639-1 | "en" |
authorId | Author user ID | 22329494 |
authorName | Display name / copyright owner | "Lenny Rachitsky" |
category | Category slug used to discover | "technology" |
publicationCreatedAt | Founded timestamp | "2020-09-08T..." |
freeSubscriptionBenefits | Bullets shown to free subscribers | ["Weekly free post"] |
paidSubscriptionBenefits | Bullets for paid tier | ["Full archive", "AMAs", ...] |
foundingSubscriptionBenefits | Top-tier benefits | ["Direct access to Lenny"] |
communityEnabled | Comments / Notes enabled | true |
podcastFeedUrl | Substack RSS feed (audio) | "https://lennysnewsletter.substack.com/feed/podcast" |
🧪 Example inputs
1. Top 25 tech newsletters and their latest 50 posts
{"categorySlugs": ["technology"],"categoryRanking": "leaderboard","maxPublicationsPerCategory": 25,"maxPostsPerPublication": 50}
2. Free posts from top AI/crypto newsletters in the last month
{"categorySlugs": ["technology", "crypto", "finance"],"categoryRanking": "leaderboard","maxPublicationsPerCategory": 50,"maxPostsPerPublication": 20,"audienceFilter": "free","minPostDate": "2026-04-15","keywordFilter": ["ai", "gpt", "llm", "crypto"]}
3. One specific newsletter's full archive
{"newsletterUrls": ["lennysnewsletter"],"maxPostsPerPublication": 1000}
4. Build a newsletter directory (publication records only)
{"categorySlugs": ["technology", "business", "finance", "crypto", "news", "science"],"categoryRanking": "leaderboard","maxPublicationsPerCategory": 100,"maxPostsPerPublication": 0,"alsoPushPublicationRecord": true}
5. Mix categories + direct URLs in one run
{"categorySlugs": ["technology"],"categoryRanking": "paid","maxPublicationsPerCategory": 30,"newsletterUrls": ["www.semianalysis.com", "stratechery.substack.com"],"maxPostsPerPublication": 30,"audienceFilter": "all"}
6. English-only top-paid tech newsletters with podcast feeds
{"categorySlugs": ["technology"],"categoryRanking": "paid","language": "en","alsoPushPublicationRecord": true,"maxPublicationsPerCategory": 100,"maxPostsPerPublication": 5}
🧠 How it works
- Categories →
GET https://substack.com/api/v1/categoriesreturns every active category and subcategory with numeric IDs. - Discovery →
GET https://substack.com/api/v1/category/public/{id}/{leaderboard|paid|all}?page=Npaginates 25 publications per page. - Posts →
GET https://{subdomain}.substack.com/api/v1/archive?sort=new&offset=O&limit=12(or custom-domain equivalent) paginates 12 posts per page in reverse-chronological order. - Direct URLs → if you pass a custom domain, the actor probes the
/api/v1/archiveendpoint on multiple host candidates to find a working base URL. - Deduplication → publications are keyed by
publicationId; cross-category enumeration never double-fetches the same newsletter.
No authentication. No proxy. Substack publishes all of this data on its public web.
🛑 Limits & notes
- Word count, full post body, and subscriber counts are not exposed in the archive endpoint. For the full post HTML/body, the per-post detail endpoint
https://{base}/api/v1/posts/{slug}can be added — open an issue if you need it. - Subscriber counts are private — Substack only shows them to publication owners. The actor returns proxies (reactions, restacks, leaderboard rank).
- Paid post bodies are paywalled — you only get the public preview unless the actor is run with a paid-subscriber cookie (out of scope here).
- Rate limits — Substack does not publish explicit limits but throttles aggressive callers. The actor uses exponential backoff and realistic browser headers; in practice 25k+ posts per run runs cleanly.
- Non-Substack platforms (Beehiiv, Ghost, ConvertKit, Stratechery's custom platform) will be skipped with a warning when passed in
newsletterUrls.
💰 Pricing
Monetized via pay-per-event on Apify — pay per post or publication record saved. Substack's public API is free.
❓ FAQ
Can I get subscriber counts? No — Substack treats subscriber counts as private. The leaderboard rank, reactions, and restack count are the public proxy metrics.
Can I get the full post body? Free posts only. The actor currently returns metadata + canonical URL; for the rendered HTML body, request the per-post detail endpoint as a feature addition.
Does this work with Beehiiv / Ghost? No — Substack-only. Beehiiv has its own public API (separate actor).
How is this different from existing Substack actors? Most existing actors require a list of URLs upfront. This one does discovery first (by category + leaderboard rank), which is the hard part for outreach / intelligence use cases.
Can I export to CSV / Excel? Yes — every Apify dataset can be exported in CSV, Excel, XML, JSONL or RSS straight from the run page.
Will Substack block this? The endpoints used are the same ones the Substack website itself calls. No authentication is required and no rate limit has been hit at typical use. Use respectfully.
🔗 Related actors
logiover/apple-podcasts-episode-scraper— feed thepodcastFeedUrlfrom each publication into the podcast scraper for full episode listslogiover/website-contact-scraper— enrich each publication'scustomDomainwith author contact emails for sponsorship outreachlogiover/google-news-scraper— track press mentions of the newsletters you scrapedlogiover/sitemap-to-url-crawler— crawl the custom domain of each publication for landing pages and partnerships
🆘 Support
Need a specific Substack-related feature (full post body, comments, subscriber recommendations graph)? Open an issue on the actor's Apify page.