Substack Scraper — Newsletters, Posts & Creator Leads
Pricing
from $4.00 / 1,000 publications scraped
Scrape Substack: search newsletters by keyword, browse category leaderboards, pull full publication profiles (subscribers, paid pricing, podcast), posts, authors and the recommendation network. Turn creators into leads with contact emails. Monitoring mode. No API key, no browser.
Developer
Scrape Sage
Maintained by Community
Substack Scraper — Newsletters, Posts & Creator Leads (Subscribers, Pricing, Emails)
Extract complete Substack data — search newsletters by keyword, browse category leaderboards, and pull the fields other scrapers miss: free-subscriber counts, paid-subscriber tiers, real paid pricing (monthly / yearly / founding), podcast details, the recommendation network, and full author profiles. Optionally turn every creator into a ready-to-contact lead by crawling their own website for contact emails, phone, and socials.
No login, no cookies, no browser — fast first-party JSON extraction with 99%+ reliability.
Why this Substack scraper?
Most Substack scrapers return a thin slice — a title, a date, maybe a subscriber number. This actor reads Substack's own public API and ships the richest dataset in the category, across newsletters, posts and authors in one run:
| Data | Typical scrapers | This actor |
|---|---|---|
| Search by keyword + category leaderboards | partial | ✅ both |
| Free subscriber count | partial | ✅ |
| Paid-subscriber tier (e.g. "Thousands of paid subscribers") | ❌ | ✅ |
| Real paid pricing — monthly / yearly / founding + currency | ❌ | ✅ |
| Accepts sponsorships (ad-sales signal) | ❌ | ✅ |
| Podcast title / description / flags | ❌ | ✅ |
| Recommendation network (who recommends whom) | ❌ | ✅ opt-in |
| Posts — reactions, restacks, comments, word count | partial | ✅ opt-in |
| Full post content (HTML + plain text) | ❌ | ✅ opt-in |
| Author profiles — followers, bio, external links, all publications | ❌ | ✅ opt-in |
| Creator contact emails (from their website) | ❌ | ✅ opt-in |
| Lead score (0–100) per newsletter | ❌ | ✅ |
| No start fee | ❌ many charge per run | ✅ pay per result only |
Use cases
- Creator & newsletter lead generation: Substack creators are active buyers and sellers who want tools, sponsors, cross-promotion, and ghostwriters. Score them by audience (freeSubscriberCount, paidSubscriberTier) and reach them directly (supportEmail, contactEmails).
- Sponsorship & ad-sales prospecting: find paid newsletters with acceptsSponsorships set to true, ranked by subscriber tier and niche, with contact data attached.
- Market & competitor research: track category leaderboards, paid pricing, posting cadence, and engagement (reactions, restacks, comments) across any topic.
- Content & trend analysis: pull posts with full content for summarization, RAG, sentiment, and topic modeling.
- Influencer / partnership discovery: map the recommendation network to find who the top newsletters endorse.
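As a rough illustration of the sponsorship-prospecting use case, the sketch below filters dataset items after a run. The field names come from the Output section; the function name, the 10,000-subscriber threshold, and the sample records are hypothetical.

```javascript
// Sketch only: pick sponsorship prospects from records shaped like the actor's output.
// The default threshold of 10,000 free subscribers is an arbitrary example value.
function pickSponsorshipProspects(items, minFreeSubscribers = 10000) {
  return items
    .filter((item) => item.type === 'publication')
    .filter((item) => item.acceptsSponsorships === true)
    .filter((item) => (item.freeSubscriberCount ?? 0) >= minFreeSubscribers)
    .sort((a, b) => (b.freeSubscriberCount ?? 0) - (a.freeSubscriberCount ?? 0))
    .map((item) => ({
      name: item.name,
      url: item.url,
      freeSubscriberCount: item.freeSubscriberCount,
      paidSubscriberTier: item.paidSubscriberTier ?? null,
      // Merge site-crawled emails with supportEmail and drop duplicates.
      emails: [...new Set([...(item.contactEmails ?? []), item.supportEmail].filter(Boolean))],
    }));
}

// Hypothetical sample records for demonstration.
const sampleItems = [
  { type: 'publication', name: 'A', url: 'https://a.example', acceptsSponsorships: true,
    freeSubscriberCount: 25000, paidSubscriberTier: 'Hundreds of paid subscribers',
    supportEmail: 'a@substack.com', contactEmails: ['hi@a.example'] },
  { type: 'publication', name: 'B', url: 'https://b.example', acceptsSponsorships: false,
    freeSubscriberCount: 90000 },
  { type: 'publication', name: 'C', url: 'https://c.example', acceptsSponsorships: true,
    freeSubscriberCount: 40000, supportEmail: 'c@substack.com' },
  { type: 'post', title: 'not a publication' },
];

console.log(pickSponsorshipProspects(sampleItems));
```

Records without a subscriber count are treated as zero here, so they drop out of the list rather than causing an error.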
How to use
- Sign up for Apify — the free plan is enough to try this actor.
- Open the Substack Scraper, enter search queries and/or categories (or paste Substack URLs), and click Start.
- Watch results stream into the dataset table.
- Export as JSON, CSV, Excel, XML, or RSS — or pull results programmatically via the Apify API.
Input
{
  "searchQueries": ["artificial intelligence"],
  "categories": ["Technology", "Business"],
  "maxPublications": 200,
  "includePosts": true,
  "maxPostsPerPublication": 20,
  "includeRecommendations": true,
  "includeAuthorProfiles": true,
  "enrichContactEmails": true,
  "onlyPaidPublications": false,
  "minFreeSubscribers": 1000
}
- searchQueries: keywords to search publications (each returns full newsletter profiles).
- categories: category leaderboards by name (Technology, Business, Finance, Culture, U.S. Politics, Food & Drink, Sports, …) or numeric id.
- startUrls: direct publication URLs (https://newsletter.substack.com or a custom domain), post URLs (.../p/the-slug), or author profiles (https://substack.com/@handle).
- maxPublications (default 100): cap on unique publications from search + categories.
- includePosts / maxPostsPerPublication / includePostContent: add recent posts, and optionally their full HTML + plain text.
- includeRecommendations (default false): add each newsletter's recommendation network as a recommends array.
- includeAuthorProfiles (default false): emit one author record per unique creator (followers, bio, links, all publications).
- enrichContactEmails (default false): crawl the publication's own website (home + about/contact, max 3 pages) for emails, phone, and extra socials. Substack does not publish creators' own contact emails (some newsletters expose only a supportEmail), so this is the only way to collect them.
- onlyPaidPublications / minFreeSubscribers: filters that keep only paid newsletters and/or only newsletters with at least the given number of free subscribers.
- monitorMode (default false): emit only publications/posts not seen in previous runs (see Monitoring mode below).
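The three kinds of startUrls differ only in their URL shape. The sketch below shows one way to tell them apart before building an input. It is an illustration based on the URL patterns listed above, not the actor's own routing logic, and the function name is hypothetical.

```javascript
// Sketch only: classify a start URL by the patterns described in the Input list.
// Assumes author profiles live on substack.com/@handle and posts under /p/<slug>.
function classifyStartUrl(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.toLowerCase();
  const path = url.pathname;

  // Author profile: https://substack.com/@handle
  if ((host === 'substack.com' || host === 'www.substack.com') && path.startsWith('/@')) {
    return { kind: 'author', handle: path.slice(2).split('/')[0] };
  }

  // Post: .../p/the-slug on a substack.com subdomain or a custom domain
  const postMatch = path.match(/^\/p\/([^/]+)/);
  if (postMatch) {
    return { kind: 'post', slug: postMatch[1] };
  }

  // Anything else is treated as a publication home page.
  return { kind: 'publication', host };
}

console.log(classifyStartUrl('https://newsletter.substack.com'));
console.log(classifyStartUrl('https://www.astralcodexten.com/p/the-slug'));
console.log(classifyStartUrl('https://substack.com/@handle'));
```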
Output
One record per newsletter (type: "publication"), plus optional post records (type: "post") and author records (type: "author"):
{
  "type": "publication",
  "id": 89120,
  "name": "Astral Codex Ten",
  "subdomain": "astralcodexten",
  "url": "https://www.astralcodexten.com",
  "customDomain": "www.astralcodexten.com",
  "publicationType": "newsletter",
  "tagline": "P(A|B) = [P(A)*P(B|A)]/P(B), all the rest is commentary",
  "authorName": "Scott Alexander",
  "authorHandle": "astralcodexten",
  "authorBio": "Psychiatrist, blogger…",
  "freeSubscriberCount": 91000,
  "paidSubscriberTier": "Thousands of paid subscribers",
  "bestsellerTier": 1000,
  "isPaid": true,
  "currency": "USD",
  "monthlyPrice": 10,
  "yearlyPrice": 100,
  "foundingPrice": 300,
  "acceptsSponsorships": false,
  "hasPodcast": true,
  "supportEmail": "astralcodexten@substack.com",
  "website": "https://www.astralcodexten.com",
  "contactEmails": ["scott@slatestarcodex.com"],
  "contactSocials": { "twitter": "https://twitter.com/slatestarcodex" },
  "recommends": [
    { "name": "Slow Boring", "subdomain": "slowboring", "url": "https://www.slowboring.com" }
  ],
  "leadScore": 86,
  "category": "Technology",
  "searchQuery": "artificial intelligence",
  "scrapedAt": "2026-06-14T12:00:00.000Z"
}
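Because one dataset can mix publication, post, and author records, a consumer usually splits them by the type field first. A minimal sketch, assuming only the three type values named above; the helper name and the sample records (including the post title and author handle fields) are hypothetical:

```javascript
// Sketch only: split mixed dataset items into the three documented record types.
function groupByType(items) {
  const groups = { publication: [], post: [], author: [] };
  for (const item of items) {
    const bucket = groups[item.type];
    // Records with an unknown or missing type are ignored.
    if (Array.isArray(bucket)) bucket.push(item);
  }
  return groups;
}

// Hypothetical sample records for demonstration.
const mixedItems = [
  { type: 'publication', name: 'Astral Codex Ten' },
  { type: 'post', title: 'Example post' },
  { type: 'author', handle: 'example' },
  { type: 'publication', name: 'Slow Boring' },
  { type: 'something-else' },
];

const grouped = groupByType(mixedItems);
console.log(grouped.publication.length, grouped.post.length, grouped.author.length);
```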
Monitoring mode
Turn on monitorMode to make the actor remember every publication and post it has already returned (in a named key-value store) and emit only new ones on the next run. Combine it with Apify Schedules to:
- watch a category or keyword for newly launched newsletters,
- alert on new posts from a set of newsletters you track,
- keep a CRM topped up with fresh creator leads.
Monitoring mode is independent of the scheduler: Schedules decide when a run starts; monitoring decides what counts as new. Use a distinct monitorStoreName per tracked target to keep histories separate.
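Conceptually, monitoring is a "seen set" check that persists between runs. The sketch below illustrates the idea with an in-memory Set standing in for the named key-value store. It is not the actor's implementation, and keying records by type plus id is an assumption (the Output section shows an id on publication records only).

```javascript
// Sketch only: emit records whose key has not been seen before, then remember them.
// `seenKeys` stands in for state that the actor would persist between runs.
function filterNew(items, seenKeys) {
  const fresh = [];
  for (const item of items) {
    const key = `${item.type}:${item.id}`; // assumed key format
    if (seenKeys.has(key)) continue;
    seenKeys.add(key);
    fresh.push(item);
  }
  return fresh;
}

const seen = new Set();

// First run: everything is new.
const firstRun = filterNew(
  [{ type: 'publication', id: 1 }, { type: 'publication', id: 2 }],
  seen,
);

// Second run: publication 2 was already returned, so only the other two come back.
const secondRun = filterNew(
  [{ type: 'publication', id: 2 }, { type: 'publication', id: 3 }, { type: 'post', id: 3 }],
  seen,
);

console.log(firstRun.length, secondRun.length);
```

Keeping a separate seen set per tracked target mirrors the advice to use a distinct monitorStoreName for each one.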
Automate & schedule
Run this actor on autopilot and pull results into your own stack:
- Apify API — start runs, fetch datasets, and manage schedules over REST.
- apify-client for JavaScript and apify-client for Python — official SDKs.
- Schedules — run it hourly/daily/weekly to monitor new newsletters, posts, or leads.
- Webhooks — trigger downstream actions (CRM import, Slack alert, email sequence) the moment a run finishes.
import { ApifyClient } from 'apify-client';

// Replace with your own Apify API token.
const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('scrapesage/substack-scraper').call({
    searchQueries: ['fintech'],
    categories: ['Finance'],
    maxPublications: 200,
    enrichContactEmails: true,
    onlyPaidPublications: true,
});

// Fetch the scraped records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} newsletters & creator leads`);
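For the webhook route, a receiving service typically reads the finished run's dataset id from the webhook body and then fetches the items. The sketch below assumes Apify's default webhook payload (an eventType string plus a resource object holding the run, including defaultDatasetId); check the payload your webhook actually sends, since the template can be customized. The function name and sample payload are hypothetical.

```javascript
// Sketch only: turn a run-succeeded webhook payload into a dataset items URL.
// Assumes the default payload shape: { eventType, resource: { defaultDatasetId, ... } }.
function datasetUrlFromWebhook(payload) {
  if (!payload || payload.eventType !== 'ACTOR.RUN.SUCCEEDED') return null;
  const datasetId = payload.resource?.defaultDatasetId;
  if (!datasetId) return null;
  return `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items?format=json`;
}

// Hypothetical payload for demonstration.
const samplePayload = {
  eventType: 'ACTOR.RUN.SUCCEEDED',
  resource: { defaultDatasetId: 'abc123' },
};

console.log(datasetUrlFromWebhook(samplePayload));
```

Requests to that URL may need your API token, depending on how the dataset is shared.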
Integrate with any app
Connect the dataset to 5,000+ apps — no code required:
- Make — multi-step automation scenarios.
- Zapier — push new creator leads straight into your CRM.
- Slack — get notified when a monitored search finds new newsletters.
- Google Drive / Sheets — auto-export every run to a spreadsheet.
- Airbyte — pipe results into your data warehouse.
- GitHub — trigger runs from commits or releases.
Use with AI assistants (MCP)
The output is clean, LLM-ready JSON. You can call this actor from Claude, ChatGPT, or any agent framework through the Apify MCP server — ask your assistant to "find the top AI newsletters on Substack and list their contact emails" and let it run this scraper for you.
More scrapers from scrapesage
Build a complete creator & event lead-gen stack:
- Eventbrite Scraper — events + organizer leads (prices, emails, socials).
- Sched Conference Scraper — sessions, speakers & sponsors from Sched event sites.
- Whova Event Scraper — attendees, agendas, and sponsors from Whova event apps.
- Swapcard Exhibitor Scraper — exhibitor lists and booth data from Swapcard trade shows.
- Facebook Ad Library Scraper — competitor ad intelligence (Meta + Instagram).
- Google Ads Transparency Scraper — who's advertising what on Google.
- LinkedIn Jobs Scraper — job postings as hiring-intent signals.
- Bark Listing Scraper — service-provider leads from Bark.
- Airbnb Scraper — listings, prices, and availability.
Tips
- Exhaust a niche: combine searchQueries (keywords) with categories (leaderboards) to cover both long-tail and top newsletters; raise maxPublications.
- Best leads: set onlyPaidPublications: true + minFreeSubscribers + enrichContactEmails: true to get monetizing creators with real contact data and a high leadScore.
- Cost control: posts, recommendations, author profiles and email enrichment are all opt-in, so you only pay for what you turn on; email enrichment only runs for publications that actually have a website.
- Monitoring: combine monitorMode with Schedules to track only new newsletters/posts.
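The "best leads" tip can be sketched as an input builder plus a ranking step on the results. The input keys and the leadScore field come from this README; the helper names, the default of 1,000 free subscribers, and the sample records are hypothetical.

```javascript
// Sketch only: build the "best leads" input described in the tips.
function bestLeadsInput(searchQueries, minFreeSubscribers = 1000) {
  return {
    searchQueries,
    onlyPaidPublications: true,
    minFreeSubscribers,
    enrichContactEmails: true,
  };
}

// Sketch only: keep publication records that have a numeric leadScore, highest first.
function topLeads(items, limit = 10) {
  return items
    .filter((item) => item.type === 'publication' && typeof item.leadScore === 'number')
    .sort((a, b) => b.leadScore - a.leadScore)
    .slice(0, limit);
}

// Hypothetical sample records for demonstration.
const leadItems = [
  { type: 'publication', name: 'Low', leadScore: 40 },
  { type: 'publication', name: 'High', leadScore: 86 },
  { type: 'publication', name: 'Unscored', leadScore: null },
  { type: 'post', title: 'Example post' },
];

console.log(bestLeadsInput(['fintech']));
console.log(topLeads(leadItems, 2).map((item) => item.name));
```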
FAQ
How do I scrape the top newsletters in a topic? Put the category name in categories (e.g. Technology, Finance) to pull its leaderboard, and/or add keywords to searchQueries.
Where do the emails come from? Never from Substack (they don't publish creator emails). With enrichContactEmails on, the actor visits the newsletter's own public website and extracts publicly listed contact emails — the same thing a human visitor would see. Many newsletters also expose a supportEmail directly.
Does it expose exact paid-subscriber counts? Substack hides exact paid counts, but publishes a tier band (e.g. "Hundreds/Thousands of paid subscribers") which this actor returns as paidSubscriberTier, plus the exact freeSubscriberCount for most newsletters.
Can I export to Google Sheets, CSV, or Excel? Yes — one click in the dataset view, or automatically on every run via the Google Drive integration.
Is scraping Substack legal? This actor collects publicly available data only. You are responsible for using the data in compliance with applicable laws (GDPR/CCPA for personal data) and Substack's terms.
A field is null — why? Some newsletters genuinely don't publish a price (free-only), a website, or a podcast. Fields are null only when the data doesn't exist, not because the scraper skipped them.
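Since paidSubscriberTier is a text band and may be null, sorting by it needs a small mapping step. The sketch below ranks the two bands named above ("Hundreds", "Thousands") and, as an assumption, two larger bands that follow the same wording pattern. Unknown or missing tiers rank as 0.

```javascript
// Sketch only: map a paid-subscriber tier band to a sortable rank.
// The "tens of thousands" and "hundreds of thousands" bands are assumed, not confirmed by this README.
const TIER_RANKS = [
  ['hundreds of thousands', 4],
  ['tens of thousands', 3],
  ['thousands', 2],
  ['hundreds', 1],
];

function paidTierRank(tier) {
  if (typeof tier !== 'string') return 0; // null or missing: no published tier
  const text = tier.trim().toLowerCase();
  // Longer phrases are listed first so "hundreds of thousands" is not read as "hundreds".
  for (const [phrase, rank] of TIER_RANKS) {
    if (text.startsWith(phrase)) return rank;
  }
  return 0;
}

console.log(paidTierRank('Thousands of paid subscribers'));
console.log(paidTierRank(null));
```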
Need help?
Open an issue on the actor's Issues tab, or visit the Apify help center. Feature requests are welcome — this actor is actively maintained.