Pricing

from $1.00 / 1,000 newsletter profiles

Substack Lead Gen Scraper

Generate B2B leads from Substack newsletters — author profiles, subscriber counts, Twitter handles, paid pricing, recommendations network, and sponsorship signals. URL or category mode. Flat $1 per 1,000 profiles. HTTP-only, no browser.

Developer

Sourabh Kumar

Last modified

10 days ago

What does Substack Lead Gen Scraper do?

Substack Lead Gen Scraper turns any Substack newsletter (or an entire category leaderboard) into a structured lead record with the author's name, handle, bio, Twitter, custom domain, subscriber count, paid tier pricing, posting frequency, and optional enrichment data. Two modes:

URL Mode — paste a list of newsletter URLs and get a detailed profile for each.
Discovery Mode — pick a category (Technology, Business, Finance, Crypto, …) and pull the top newsletters with subscriber metrics.

Both modes hit Substack's public JSON endpoints directly — no Cloudflare juggling, no headless browser, no proxy required.

Why scrape Substack?

Substack hosts the highest concentration of independent media operators in B2B — solo writers, ex-journalists, and analysts with engaged paid audiences. That makes it a goldmine for:

📨 Cold outreach to newsletter authors — pitch a tool, sponsorship, or interview using the author's name, handle, and Twitter.
💰 Sponsorship prospecting — combine subscriber count + paid tier price to estimate audience value before you reach out.
📈 Market research — map the newsletter landscape in your category and benchmark against competitors.
🏢 Competitive intelligence — track who's paid vs. free, who's growing (bestseller tier), who recommends whom.
🎯 Influencer & thought-leader discovery — sort by subscriber count and bestseller tier to surface the loudest voices in any niche.

Apify platform extras: scheduled runs, REST/JS/Python API, integrations with Make/Zapier/n8n, dataset export to JSON/CSV/Excel, and pay-per-result billing.

What data can Substack Lead Gen Scraper extract?

Field	Type	Description
`name`	string	Newsletter name (e.g., The Pragmatic Engineer)
`authorName`	string	Author's display name
`authorHandle`	string	Substack handle (`@pragmaticengineer`)
`authorBio`	string	Author bio / about text
`twitterHandle`	string	Twitter/X handle
`websiteUrl`	string	Author's external website
`subscriberCount`	string	Free subscriber count (e.g., `241,000`)
`subscriberCountEstimate`	string	Rounded subscriber estimate (`241K+`)
`bestsellerTier`	number	Substack bestseller tier (1 = top tier)
`hasPaidSubscription`	boolean	Whether a paid tier exists
`monthlyPriceCents` / `yearlyPriceCents`	number	Paid tier pricing
`estimatedPostsPerMonth`	number	Posting frequency from the last ~5 posts
`lastPostDate`	string	Most recent post timestamp
`category`	string	Substack category
`customDomain`	string	Custom domain (e.g., `newsletter.pragmaticengineer.com`)
`hasPodcast` / `hasCommunity`	boolean	Feature flags on the publication

Plus optional enrichment fields (recommendations, contributors, full pricing plans, sponsorship signal, recent podcast guests, health flags) — see the Enrichment flags section below.

How to scrape Substack newsletters

1. Click Try for free on the Apify Store page.
2. Choose your mode:
   - URL mode: paste full newsletter URLs (must include https://) into the Newsletter URLs field.
   - Discovery mode: pick a Category from the dropdown and (optionally) a Newsletter Type (all, paid, free).
3. Set Max Results (default 50, max 1000).
4. (Optional) Enable any enrichment flags: sponsorship signal, recommendations network, contributors, full pricing, health signals, recent guests.
5. Click Start and watch results populate the dataset in real time.
6. Export as JSON, CSV, Excel, or HTML, or stream them via the Apify API into your CRM / data warehouse.

You can also schedule the run (daily/weekly) to keep a live snapshot of any category, or trigger it from Make / Zapier / n8n when you need fresh leads on demand.
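
For programmatic runs, a minimal Python sketch might look like the following. It assumes the `apify-client` package, whose actor client exposes `call(run_input=...)` and whose dataset client exposes `iterate_items()`, and it assumes the run details come back as a dict with a `defaultDatasetId` key. The helper name `run_and_collect` is hypothetical, and the actor ID and token are placeholders you would replace with your own.

```python
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> list:
    """Start the actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient (assumption).
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Each dataset item is one newsletter profile.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example usage (needs `pip install apify-client` and a real API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   leads = run_and_collect(
#       client,
#       "<username>/<actor-name>",  # placeholder, copy the ID from the API tab
#       {"category": "technology", "newsletterType": "paid", "maxResults": 100},
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before spending any platform credit.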

How much does it cost to scrape Substack?

Flat $1.00 per 1,000 newsletter profiles ($0.001 per profile). Enrichment flags do not add to the price — only the result count is billed.

Volume	Cost
100 newsletters	$0.10
500 newsletters	$0.50
1,000 newsletters	$1.00
10,000 newsletters	$10.00

The Apify Free plan includes $5 of platform usage per month, which covers about 5,000 newsletter profiles at the flat rate.
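
The arithmetic behind the table can be sketched in a few lines of Python. The function names are illustrative only; the sole input taken from this page is the flat rate of $1.00 per 1,000 profiles.

```python
PRICE_PER_1000_USD = 1.00  # flat rate stated on this page


def estimate_cost_usd(profiles: int) -> float:
    """Cost in USD for a given number of newsletter profiles."""
    if profiles < 0:
        raise ValueError("profiles must be non-negative")
    return round(profiles / 1000 * PRICE_PER_1000_USD, 3)


def profiles_covered(credit_usd: float) -> int:
    """How many profiles a given amount of credit pays for."""
    return int(credit_usd * 1000 / PRICE_PER_1000_USD)


for volume in (100, 500, 1_000, 10_000):
    print(f"{volume:>6} newsletters -> ${estimate_cost_usd(volume):.2f}")
```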

Input

See the Input tab for the full configuration UI. The two main parameters:

urls — array of full newsletter URLs. Must include https://. Custom domains and *.substack.com subdomains both work.
category — one of the 31 official Substack category slugs (case-sensitive). Use the dropdown to avoid typos.

You can provide either urls or category — or both in the same run.

URL Mode

{
    "urls": [
        "https://newsletter.pragmaticengineer.com",
        "https://platformer.news",
        "https://lenny.substack.com"
    ]
}

Discovery Mode

{
    "category": "technology",
    "newsletterType": "paid",
    "maxResults": 100
}

All input parameters

Parameter	Type	Default	Description
`urls`	string[]	—	Newsletter URLs (substack.com or custom domains, with `https://`)
`category`	string	—	Category slug for discovery mode (e.g., `technology`)
`newsletterType`	enum	`all`	`all`, `paid`, or `free`
`maxResults`	integer	`50`	Max newsletter profiles to return (1–1000)
`includeActivityMetrics`	boolean	`true`	Calculate posting frequency + last post date
`includeHomepageLinks`	boolean	`false`	External links + sponsorship signal
`includeRecommendations`	boolean	`false`	Outbound + inbound recommendations
`includeContributors`	boolean	`false`	Co-authors / contributors
`includeFullPricing`	boolean	`false`	All Stripe plans + group coupons
`includeHealthSignals`	boolean	`false`	paymentsState, paused, spam flag, etc.
`includeGuests`	boolean	`false`	Recent podcast / co-author guests
`concurrency`	integer	`20`	Newsletters processed in parallel (1–50)
`requestDelayMs`	integer	`0`	Per-worker delay in ms (raise if rate-limited)
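
If you build the input in code, it can help to check it against the documented ranges before starting a run. The sketch below is an assumption-laden helper (the name `build_run_input` is hypothetical) that enforces only what the table above states: the `https://` prefix, the `newsletterType` values, and the numeric bounds. It does not validate category slugs, because the exact slug list is only available in the dropdown.

```python
from typing import Iterable, Optional


def build_run_input(
    urls: Optional[Iterable[str]] = None,
    category: Optional[str] = None,
    newsletter_type: str = "all",
    max_results: int = 50,
    concurrency: int = 20,
    request_delay_ms: int = 0,
) -> dict:
    """Return an input dict using the parameter names from the table above."""
    url_list = list(urls or [])
    if not url_list and not category:
        raise ValueError("Provide urls, category, or both")
    bare = [u for u in url_list if not u.startswith("https://")]
    if bare:
        raise ValueError(f"URLs must include https://: {bare}")
    if newsletter_type not in {"all", "paid", "free"}:
        raise ValueError("newsletterType must be all, paid, or free")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")
    if request_delay_ms < 0:
        raise ValueError("requestDelayMs must not be negative")

    run_input = {
        "maxResults": max_results,
        "concurrency": concurrency,
        "requestDelayMs": request_delay_ms,
    }
    if url_list:
        run_input["urls"] = url_list
    if category:
        # Assumption: newsletterType is only relevant to discovery mode.
        run_input["category"] = category
        run_input["newsletterType"] = newsletter_type
    return run_input
```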

Output

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel. Each newsletter is one record:

{
    "name": "The Pragmatic Engineer",
    "description": "Big Tech and high-growth startups, from the inside.",
    "url": "https://newsletter.pragmaticengineer.com",
    "substackUrl": "https://pragmaticengineer.substack.com",
    "customDomain": "newsletter.pragmaticengineer.com",
    "logoUrl": "https://substackcdn.com/image/...",
    "language": "en",
    "createdAt": "2019-07-15T00:00:00.000Z",
    "authorName": "Gergely Orosz",
    "authorHandle": "pragmaticengineer",
    "authorBio": "Writing about big tech and startups...",
    "authorPhotoUrl": "https://substackcdn.com/image/...",
    "twitterHandle": "GergelyOrosz",
    "websiteUrl": "https://newsletter.pragmaticengineer.com",
    "socialLinks": [
        { "platform": "twitter", "url": "https://twitter.com/GergelyOrosz" }
    ],
    "subscriberCount": "241,000",
    "subscriberCountEstimate": "241K+",
    "paidSubscriberDetail": "Thousands of paid subscribers",
    "bestsellerTier": 1,
    "hasPaidSubscription": true,
    "monthlyPriceCents": 1500,
    "yearlyPriceCents": 15000,
    "lastPostDate": "2026-02-25T12:00:00.000Z",
    "estimatedPostsPerMonth": 8.2,
    "category": "Technology",
    "hasPodcast": true,
    "hasCommunity": true,
    "scrapedAt": "2026-02-27T10:30:00.000Z"
}
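
Because `subscriberCount` is a formatted string and prices are in cents, a record usually needs light normalization before it goes into a CRM or spreadsheet. The following Python sketch shows one way to do that; the helper names and the output keys are illustrative, while the input keys come from the sample record above.

```python
from typing import Optional


def parse_count(text: Optional[str]) -> Optional[int]:
    """Turn a string like "241,000" into 241000; return None if not numeric."""
    if not text:
        return None
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def cents_to_usd(cents: Optional[int]) -> Optional[float]:
    """Convert a price in cents to dollars, passing None through."""
    return None if cents is None else cents / 100


def lead_summary(record: dict) -> dict:
    """Pick the fields most useful for outreach and normalize the numbers."""
    return {
        "name": record.get("name"),
        "twitter": record.get("twitterHandle"),
        "subscribers": parse_count(record.get("subscriberCount")),
        "monthly_price_usd": cents_to_usd(record.get("monthlyPriceCents")),
        "yearly_price_usd": cents_to_usd(record.get("yearlyPriceCents")),
    }
```

Missing or non-numeric values come back as `None` rather than raising, so free newsletters without pricing fields do not break a batch.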

Enrichment flags

All flags are optional and off by default. Enabling them adds extra HTTP calls per newsletter (no extra billing — pricing is per result, not per call):

Flag	What you get	Extra HTTP calls per newsletter
`includeHomepageLinks`	External links + `acceptsSponsorships: boolean` + `sponsorshipInquiryUrl`	1
`includeRecommendations`	Outbound + inbound recommendation network	2
`includeContributors`	Co-authors / contributors with handles, photos, bios	0 (Discovery) / 1 (URL)
`includeFullPricing`	All Stripe plans, founding tier, group coupons	0 (Discovery) / 1 (URL)
`includeHealthSignals`	paymentsState, pauseReturnDate, flaggedAsSpam, explicit, noIndex/noFollow, inviteOnly	0 (Discovery) / 1 (URL)
`includeGuests`	Recent podcast / co-author guests across last 12 posts	1 (0 if `includeActivityMetrics` is on, which is the default)

In URL mode, includeContributors / includeFullPricing / includeHealthSignals share a single posts/by-id/{id} call per newsletter, so enabling all three together still costs only one extra HTTP call.
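
To see how the flags combine, the table and the shared-call rule can be expressed as a small Python function. This is only a reading of the numbers documented above, not the actor's actual code, and the function name is hypothetical.

```python
from typing import Set

# In URL mode these three flags share a single extra call per newsletter.
SHARED_URL_MODE_FLAGS = {
    "includeContributors",
    "includeFullPricing",
    "includeHealthSignals",
}


def extra_calls_per_newsletter(
    flags: Set[str], mode: str, activity_metrics: bool = True
) -> int:
    """Extra HTTP calls per newsletter for the enabled enrichment flags."""
    if mode not in {"url", "discovery"}:
        raise ValueError("mode must be 'url' or 'discovery'")
    calls = 0
    if "includeHomepageLinks" in flags:
        calls += 1
    if "includeRecommendations" in flags:
        calls += 2
    if mode == "url" and flags & SHARED_URL_MODE_FLAGS:
        calls += 1  # one shared call, however many of the three are enabled
    if "includeGuests" in flags and not activity_metrics:
        calls += 1  # no extra call when activity metrics are already fetched
    return calls
```

With every flag enabled and activity metrics left on, this gives 4 extra calls per newsletter in URL mode and 3 in Discovery mode.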

ℹ️ recommendedByCount reflects the inbound recommendations returned by Substack's recommendations/to/{pubId} endpoint, which currently caps at 50 per call. Newsletters with more than 50 inbound recommenders will report 50.

Tips & advanced options

Speed up large discovery runs: raise concurrency to 40–50. Discovery mode pulls 25 newsletters per leaderboard call, so 1,000 results need only 40 leaderboard calls and finish in seconds.
If you see rate-limit warnings: drop concurrency to 10 and set requestDelayMs to 100–500.
Faster runs: leave all enrichment flags off. They do not change the price, but each one adds HTTP calls. The base profile already includes subscriber count, pricing, Twitter handle, and posting frequency, which is enough for most outreach use cases.
Filter dead newsletters: enable includeHealthSignals and discard records where paymentsState !== "active", flaggedAsSpam === true, or pauseReturnDate is set.
Find sponsorship-friendly authors: enable includeHomepageLinks and filter acceptsSponsorships === true.
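
The two filtering tips translate into a short post-processing step. The Python sketch below assumes the health and sponsorship fields appear as top-level keys on each record under the names used above; check a real record from your own run before relying on that layout. The function names are illustrative.

```python
from typing import Iterable, List


def is_live(record: dict) -> bool:
    """Keep newsletters with active payments, no spam flag, and no pause date."""
    return (
        record.get("paymentsState") == "active"
        and record.get("flaggedAsSpam") is not True
        and not record.get("pauseReturnDate")
    )


def sponsorship_prospects(records: Iterable[dict]) -> List[dict]:
    """Live newsletters whose homepage signals that they accept sponsorships."""
    return [
        r for r in records
        if is_live(r) and r.get("acceptsSponsorships") is True
    ]
```

Both `includeHealthSignals` and `includeHomepageLinks` must be enabled for `sponsorship_prospects` to return anything, since records without those fields are treated as not live.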

Available categories

Culture, Technology, Business, U.S. Politics, Finance, Food & Drink, Podcast, Sports, Art & Illustration, World Politics, Health Politics, News, Fashion & Beauty, Music, Faith & Spirituality, Climate & Environment, Science, Literature, Fiction, Health & Wellness, Design, Travel, Parenting, Philosophy, Comics, International, Crypto, History, Humor, Education, Film & TV.

Use the Category dropdown in the input form to pick the exact slug.

What's new in v0.3

6 new opt-in enrichment flags (sponsorship signal, recommendations, contributors, full pricing, health signals, guests).
Worker-pool concurrency (default 20) — large discovery jobs finish in a fraction of v0.2 time.
Cleaner error handling — input issues now exit with a status message instead of marking the run Failed.

Heads-up for v0.2 users (breaking changes)

The "first 100 free" allowance has been removed. Every result is charged at $0.001, starting with the first.
URLs must include https://. Bare domains like newsletter.pragmaticengineer.com are rejected.
Categories must be one of the 31 official Substack slugs (case-sensitive). Variants like "Technology" or "tech" are rejected — use the dropdown.

FAQ

Is it legal to scrape Substack?

The actor only reads data that Substack itself exposes publicly via its own JSON endpoints — the same data any visitor can see in their browser. We do not bypass logins, paywalls, or rate limits, and we do not extract paid-only post content.

Why is my custom domain returning no profile?

Make sure the URL includes https://. Bare hostnames like newsletter.example.com are rejected by the input schema.

How fresh is the data?

Live — every run hits Substack's API directly. No caching layer.

Can I scrape posts / full content?

This actor is for lead generation (publication + author + metrics). For full post content, use the Substack Scraper.

Why does `recommendedByCount` cap at 50?

Substack's recommendations/to/{pubId} endpoint returns a maximum of 50 recommenders per call. Pagination support is tracked for v0.4.

What if a run fails or migrates mid-way?

The actor checkpoints pushed records on every push and on platform migrating events. Resumed runs skip already-pushed newsletters and won't double-charge.

Disclaimer

Our actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly on Substack. We therefore believe that this actor, when used for ethical purposes by Apify users, is safe. However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

Support & feedback

Found a bug or want a feature? Open it on the Issues tab.
Need programmatic access? See the API tab for ready-to-use code snippets in JS, Python, and curl.

Substack Scraper — scrape full post content from Substack newsletters
