Substack Lead Gen Scraper

Pricing

from $1.00 / 1,000 newsletter profiles

Generate B2B leads from Substack newsletters — author profiles, subscriber counts, Twitter handles, paid pricing, recommendations network, and sponsorship signals. URL or category mode. Flat $1 per 1,000 profiles. HTTP-only, no browser.

Rating: 0.0 (0 ratings)

Developer: Sourabh Kumar (maintained by Community)

Actor stats: 0 bookmarked, 12 total users, 5 monthly active users, last modified 13 days ago

Generate B2B leads from Substack newsletters: author profiles, subscriber counts, Twitter/social handles, paid pricing, recommendations network, sponsorship signals, and posting activity. A Substack API alternative for sales teams, sponsorship prospectors, and market researchers — HTTP-only, no browser, no proxy.

What does Substack Lead Gen Scraper do?

Substack Lead Gen Scraper turns any Substack newsletter, or an entire category leaderboard, into structured lead records with the author's name, handle, bio, Twitter handle, custom domain, subscriber count, paid tier pricing, posting frequency, and optional enrichment data. It has two modes:

  • URL Mode — paste a list of newsletter URLs and get a detailed profile for each.
  • Discovery Mode — pick a category (Technology, Business, Finance, Crypto, …) and pull the top newsletters with subscriber metrics.

Both modes hit Substack's public JSON endpoints directly — no Cloudflare juggling, no headless browser, no proxy required.

Why scrape Substack?

Substack hosts one of the largest concentrations of independent media operators relevant to B2B: solo writers, ex-journalists, and analysts with engaged paid audiences. That makes it a goldmine for:

  • 📨 Cold outreach to newsletter authors — pitch a tool, sponsorship, or interview using the author's name, handle, and Twitter.
  • 💰 Sponsorship prospecting — combine subscriber count + paid tier price to estimate audience value before you reach out.
  • 📈 Market research — map the newsletter landscape in your category and benchmark against competitors.
  • 🏢 Competitive intelligence — track who's paid vs. free, who's growing (bestseller tier), who recommends whom.
  • 🎯 Influencer & thought-leader discovery — sort by subscriber count and bestseller tier to surface the loudest voices in any niche.

Apify platform extras: scheduled runs, REST/JS/Python API, integrations with Make/Zapier/n8n, dataset export to JSON/CSV/Excel, and pay-per-result billing.

What data can Substack Lead Gen Scraper extract?

| Field | Type | Description |
|---|---|---|
| name | string | Newsletter name (e.g., The Pragmatic Engineer) |
| authorName | string | Author's display name |
| authorHandle | string | Substack handle (@pragmaticengineer) |
| authorBio | string | Author bio / about text |
| twitterHandle | string | Twitter/X handle |
| websiteUrl | string | Author's external website |
| subscriberCount | string | Free subscriber count (e.g., 241,000) |
| subscriberCountEstimate | string | Order-of-magnitude estimate (241K+) |
| bestsellerTier | number | Substack bestseller tier (1 = top tier) |
| hasPaidSubscription | boolean | Whether a paid tier exists |
| monthlyPriceCents / yearlyPriceCents | number | Paid tier pricing, in cents |
| estimatedPostsPerMonth | number | Posting frequency estimated from the last ~5 posts |
| lastPostDate | string | Most recent post timestamp |
| category | string | Substack category |
| customDomain | string | Custom domain (e.g., newsletter.pragmaticengineer.com) |
| hasPodcast / hasCommunity | boolean | Feature flags on the publication |
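The page says only that estimatedPostsPerMonth is derived from the last ~5 posts; the exact formula is not documented. As a minimal sketch, assuming the estimate counts the gaps between consecutive posts and scales that rate to a 30-day month, the calculation could look like this. The function names and the 30-day month are assumptions, not the actor's code.

```python
from __future__ import annotations

from datetime import datetime


def parse_timestamp(ts: str) -> datetime:
    # Timestamps look like "2026-02-25T12:00:00.000Z"; fromisoformat() on
    # Python < 3.11 rejects the trailing "Z", so swap it for an explicit offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def estimate_posts_per_month(post_dates: list[str]) -> float | None:
    """Hypothetical estimator: posting intervals per 30 days across the sample."""
    if len(post_dates) < 2:
        return None  # a single post gives no interval to measure
    dates = sorted(parse_timestamp(d) for d in post_dates)
    span_days = (dates[-1] - dates[0]).total_seconds() / 86_400
    if span_days <= 0:
        return None  # all posts share one timestamp
    # n posts define n - 1 gaps; scale the gap rate to a 30-day month.
    return round((len(dates) - 1) / span_days * 30, 1)


recent = [
    "2026-02-01T09:00:00.000Z",
    "2026-02-08T09:00:00.000Z",
    "2026-02-15T09:00:00.000Z",
    "2026-02-22T09:00:00.000Z",
    "2026-03-01T09:00:00.000Z",
]
print(estimate_posts_per_month(recent))  # weekly cadence → 4.3
```

Counting gaps rather than posts keeps a five-post sample from overstating frequency: five weekly posts span four weeks, not five.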

Plus optional enrichment fields (recommendations, contributors, full pricing plans, sponsorship signal, recent podcast guests, health flags) — see the Enrichment flags section below.

How to scrape Substack newsletters

  1. Click Try for free on the Apify Store page.
  2. Choose your mode:
    • URL mode — paste full newsletter URLs (must include https://) into the Newsletter URLs field.
    • Discovery mode — pick a Category from the dropdown and (optionally) a Newsletter Type (all, paid, free).
  3. Set Max Results (default 50, max 1000).
  4. (Optional) Enable any enrichment flags — sponsorship signal, recommendations network, contributors, full pricing, health signals, recent guests.
  5. Click Start and watch results populate the dataset in real time.
  6. Export as JSON, CSV, Excel, or HTML, or stream them via the Apify API into your CRM / data warehouse.

You can also schedule the run (daily/weekly) to keep a live snapshot of any category, or trigger it from Make / Zapier / n8n when you need fresh leads on demand.

How much does it cost to scrape Substack?

Flat $1.00 per 1,000 newsletter profiles ($0.001 per profile). Enrichment flags do not add to the price — only the result count is billed.

| Volume | Cost |
|---|---|
| 100 newsletters | $0.10 |
| 500 newsletters | $0.50 |
| 1,000 newsletters | $1.00 |
| 10,000 newsletters | $10.00 |

The Apify Free plan includes $5 of platform usage per month, which covers about 5,000 newsletter profiles at $0.001 each.
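The pricing arithmetic can be reproduced with a small calculator. This is a minimal sketch that assumes the only charge is the flat per-result price and that the whole Free plan credit goes to result charges; the function names are illustrative. Decimal avoids float rounding noise on cent amounts.

```python
from decimal import Decimal

# $1.00 per 1,000 newsletter profiles, as stated in the pricing section.
PRICE_PER_PROFILE = Decimal("1.00") / 1000


def run_cost(profiles: int) -> Decimal:
    """Result charge for a run, rounded to cents (ignores any other platform usage)."""
    if profiles < 0:
        raise ValueError("profiles must be non-negative")
    return (profiles * PRICE_PER_PROFILE).quantize(Decimal("0.01"))


def profiles_for_budget(budget_usd: Decimal) -> int:
    """How many profiles a budget covers if all of it goes to result charges."""
    return int(budget_usd / PRICE_PER_PROFILE)


for n in (100, 500, 1_000, 10_000):
    print(f"{n:>6} newsletters -> ${run_cost(n)}")
print(profiles_for_budget(Decimal("5")))  # Free plan's monthly credit
```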

Input

See the Input tab for the full configuration UI. The two main parameters:

  • urls — array of full newsletter URLs. Must include https://. Custom domains and *.substack.com subdomains both work.
  • category — one of the 31 official Substack category slugs (case-sensitive). Use the dropdown to avoid typos.

You can provide either urls or category — or both in the same run.
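Before starting a run, the URL rule above can be checked client-side. This is a minimal sketch: input_errors is a hypothetical helper, it assumes a run needs at least one of urls or category, and it does not check category slugs because the full slug list is best taken from the dropdown. The actor's own input schema remains the real validator.

```python
from __future__ import annotations

from urllib.parse import urlparse


def input_errors(run_input: dict) -> list[str]:
    """Pre-flight check mirroring the input rules described above (a client-side sketch)."""
    errors: list[str] = []
    urls = run_input.get("urls") or []
    category = run_input.get("category")
    # Assumption: a run needs at least one of the two inputs.
    if not urls and not category:
        errors.append("provide urls, category, or both")
    for url in urls:
        parsed = urlparse(url)
        # Bare hosts such as "platformer.news" parse with an empty scheme and netloc.
        if parsed.scheme != "https" or not parsed.netloc:
            errors.append(f"{url!r} must be a full URL starting with https://")
    return errors


print(input_errors({"urls": ["https://platformer.news", "platformer.news"]}))
```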

URL Mode

{
  "urls": [
    "https://newsletter.pragmaticengineer.com",
    "https://platformer.news",
    "https://lenny.substack.com"
  ]
}

Discovery Mode

{
  "category": "technology",
  "newsletterType": "paid",
  "maxResults": 100
}

All input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string[] | | Newsletter URLs (substack.com or custom domains, with https://) |
| category | string | | Category slug for discovery mode (e.g., technology) |
| newsletterType | enum | all | all, paid, or free |
| maxResults | integer | 50 | Max newsletter profiles to return (1–1000) |
| includeActivityMetrics | boolean | true | Calculate posting frequency + last post date |
| includeHomepageLinks | boolean | false | External links + sponsorship signal |
| includeRecommendations | boolean | false | Outbound + inbound recommendations |
| includeContributors | boolean | false | Co-authors / contributors |
| includeFullPricing | boolean | false | All Stripe plans + group coupons |
| includeHealthSignals | boolean | false | paymentsState, paused, spam flag, etc. |
| includeGuests | boolean | false | Recent podcast / co-author guests |
| concurrency | integer | 20 | Newsletters processed in parallel (1–50) |
| requestDelayMs | integer | 0 | Per-worker delay in ms (raise if rate-limited) |
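To see how a partial input resolves against this table, the defaults and ranges can be mirrored in a few lines. This is a minimal sketch: resolve_input is a hypothetical client-side helper built only from the defaults and limits listed above, not the actor's schema logic.

```python
from __future__ import annotations

# Defaults and limits copied from the parameter table above.
DEFAULTS: dict[str, object] = {
    "newsletterType": "all",
    "maxResults": 50,
    "includeActivityMetrics": True,
    "includeHomepageLinks": False,
    "includeRecommendations": False,
    "includeContributors": False,
    "includeFullPricing": False,
    "includeHealthSignals": False,
    "includeGuests": False,
    "concurrency": 20,
    "requestDelayMs": 0,
}
INT_RANGES = {"maxResults": (1, 1000), "concurrency": (1, 50)}
NEWSLETTER_TYPES = {"all", "paid", "free"}


def resolve_input(user_input: dict) -> dict:
    """Merge user input over the defaults and reject out-of-range values."""
    merged = {**DEFAULTS, **user_input}
    for key, (low, high) in INT_RANGES.items():
        value = merged[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{key} must be an integer from {low} to {high}, got {value!r}")
    if merged["newsletterType"] not in NEWSLETTER_TYPES:
        raise ValueError(f"newsletterType must be one of {sorted(NEWSLETTER_TYPES)}")
    return merged


run_input = resolve_input({"category": "technology", "newsletterType": "paid", "maxResults": 100})
print(run_input["concurrency"], run_input["includeActivityMetrics"])
```

Note that includeActivityMetrics is the only flag that defaults to true, so a minimal Discovery input still gets posting frequency.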

Output

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel. Each newsletter is one record:

{
  "name": "The Pragmatic Engineer",
  "description": "Big Tech and high-growth startups, from the inside.",
  "url": "https://newsletter.pragmaticengineer.com",
  "substackUrl": "https://pragmaticengineer.substack.com",
  "customDomain": "newsletter.pragmaticengineer.com",
  "logoUrl": "https://substackcdn.com/image/...",
  "language": "en",
  "createdAt": "2019-07-15T00:00:00.000Z",
  "authorName": "Gergely Orosz",
  "authorHandle": "pragmaticengineer",
  "authorBio": "Writing about big tech and startups...",
  "authorPhotoUrl": "https://substackcdn.com/image/...",
  "twitterHandle": "GergelyOrosz",
  "websiteUrl": "https://newsletter.pragmaticengineer.com",
  "socialLinks": [
    { "platform": "twitter", "url": "https://twitter.com/GergelyOrosz" }
  ],
  "subscriberCount": "241,000",
  "subscriberCountEstimate": "241K+",
  "paidSubscriberDetail": "Thousands of paid subscribers",
  "bestsellerTier": 1,
  "hasPaidSubscription": true,
  "monthlyPriceCents": 1500,
  "yearlyPriceCents": 15000,
  "lastPostDate": "2026-02-25T12:00:00.000Z",
  "estimatedPostsPerMonth": 8.2,
  "category": "Technology",
  "hasPodcast": true,
  "hasCommunity": true,
  "scrapedAt": "2026-02-27T10:30:00.000Z"
}
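Because subscriberCount is a formatted string and prices are in cents, records usually need light normalization before they can be sorted or imported into a CRM. This is a minimal sketch: to_lead and its output field names are a hypothetical schema, and the price conversion only divides by 100 because the record shown does not state a currency.

```python
from __future__ import annotations


def parse_count(text: str | None) -> int | None:
    """Turn "241,000" into 241000; rough labels like "241K+" return None."""
    if not text:
        return None
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def cents_to_units(cents: int | None) -> float | None:
    # Prices are stored in cents; the currency itself is not shown in the record.
    return None if cents is None else cents / 100


def to_lead(record: dict) -> dict:
    """Flatten one dataset record into CRM-friendly typed fields (hypothetical schema)."""
    handle = record.get("twitterHandle")
    return {
        "newsletter": record.get("name"),
        "author": record.get("authorName"),
        "twitter": f"@{handle}" if handle else None,
        "subscribers": parse_count(record.get("subscriberCount")),
        "monthlyPrice": cents_to_units(record.get("monthlyPriceCents")),
        "yearlyPrice": cents_to_units(record.get("yearlyPriceCents")),
    }


sample = {
    "name": "The Pragmatic Engineer",
    "authorName": "Gergely Orosz",
    "twitterHandle": "GergelyOrosz",
    "subscriberCount": "241,000",
    "monthlyPriceCents": 1500,
    "yearlyPriceCents": 15000,
}
print(to_lead(sample))
```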

Enrichment flags

All six flags below are optional and off by default. Enabling one adds extra HTTP calls per newsletter but no extra billing, since pricing is per result, not per call:

| Flag | What you get | Extra HTTP calls per newsletter |
|---|---|---|
| includeHomepageLinks | External links, acceptsSponsorships (boolean), and sponsorshipInquiryUrl | 1 |
| includeRecommendations | Outbound + inbound recommendation network | 2 |
| includeContributors | Co-authors / contributors with handles, photos, bios | 0 (Discovery) / 1 (URL) |
| includeFullPricing | All Stripe plans, founding tier, group coupons | 0 (Discovery) / 1 (URL) |
| includeHealthSignals | paymentsState, pauseReturnDate, flaggedAsSpam, explicit, noIndex/noFollow, inviteOnly | 0 (Discovery) / 1 (URL) |
| includeGuests | Recent podcast / co-author guests across the last 12 posts | 1 (none if includeActivityMetrics is on) |

In URL mode, includeContributors / includeFullPricing / includeHealthSignals share a single posts/by-id/{id} call per newsletter, so enabling all three together still costs only one extra HTTP call.
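The call counts in the table, including the shared URL-mode call, can be totted up per newsletter to anticipate request volume before raising concurrency. This is a minimal sketch: extra_calls_per_newsletter is a hypothetical helper, and it assumes "free if Activity Metrics is on" means includeGuests adds no call when includeActivityMetrics (default true) is enabled.

```python
from __future__ import annotations

# Flags that, in URL mode, share one extra request (per the note above).
SHARED_URL_MODE_FLAGS = ("includeContributors", "includeFullPricing", "includeHealthSignals")


def extra_calls_per_newsletter(mode: str, flags: dict[str, bool]) -> int:
    """Count extra HTTP calls per newsletter from the enrichment-flag table."""
    if mode not in ("url", "discovery"):
        raise ValueError("mode must be 'url' or 'discovery'")
    calls = 0
    if flags.get("includeHomepageLinks"):
        calls += 1
    if flags.get("includeRecommendations"):
        calls += 2  # outbound + inbound
    # 0 extra calls in Discovery mode; in URL mode the three flags share one call.
    if mode == "url" and any(flags.get(name) for name in SHARED_URL_MODE_FLAGS):
        calls += 1
    # Assumed: guests cost nothing extra while activity metrics (default True) are on.
    if flags.get("includeGuests") and not flags.get("includeActivityMetrics", True):
        calls += 1
    return calls


everything = {name: True for name in (
    "includeHomepageLinks", "includeRecommendations", "includeContributors",
    "includeFullPricing", "includeHealthSignals", "includeGuests",
)}
print(extra_calls_per_newsletter("url", everything))
print(extra_calls_per_newsletter("discovery", everything))
```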

ℹ️ recommendedByCount reflects the inbound recommendations returned by Substack's recommendations/to/{pubId} endpoint, which currently caps at 50 per call. Newsletters with more than 50 inbound recommenders will report 50.

Tips & advanced options

  • Speed up large discovery runs — bump concurrency to 40–50. Discovery mode pulls 25 newsletters per leaderboard call, so 1,000 results finish in seconds.
  • If you see rate-limit warnings — drop concurrency to 10 and set requestDelayMs to 100–500.
  • Cheaper runs — leave all enrichment flags off. The base profile already includes subscriber count, pricing, Twitter handle, and posting frequency — enough for most outreach use cases.
  • Filter dead newsletters — enable includeHealthSignals and discard records where paymentsState !== "active", flaggedAsSpam === true, or pauseReturnDate is set.
  • Find sponsorship-friendly authors — enable includeHomepageLinks and filter acceptsSponsorships === true.
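The tips above translate into a few lines of post-processing. This is a minimal sketch over exported records: the helper names and sample records are hypothetical, the filters apply the tip conditions literally, and the page size of 25 comes from the Discovery-mode tip.

```python
from __future__ import annotations

import math

LEADERBOARD_PAGE_SIZE = 25  # newsletters per Discovery-mode leaderboard call


def leaderboard_calls(max_results: int) -> int:
    """Minimum leaderboard requests needed to reach max_results."""
    return math.ceil(max_results / LEADERBOARD_PAGE_SIZE)


def is_live(record: dict) -> bool:
    """Health filter from the tips above; needs includeHealthSignals enabled."""
    return (
        record.get("paymentsState") == "active"
        and record.get("flaggedAsSpam") is not True
        and not record.get("pauseReturnDate")
    )


def sponsorship_prospects(records: list[dict]) -> list[dict]:
    """Live newsletters that accept sponsorships (needs includeHomepageLinks too)."""
    return [r for r in records if is_live(r) and r.get("acceptsSponsorships") is True]


# Hypothetical records, trimmed to the fields the filters read.
records = [
    {"name": "A", "paymentsState": "active", "flaggedAsSpam": False, "acceptsSponsorships": True},
    {"name": "B", "paymentsState": "active", "pauseReturnDate": "2026-04-01", "acceptsSponsorships": True},
    {"name": "C", "paymentsState": "disabled", "acceptsSponsorships": True},
]
print(leaderboard_calls(1000), [r["name"] for r in sponsorship_prospects(records)])
```

Records exported without includeHealthSignals have no paymentsState, so is_live drops all of them; run the health filter only on runs that enabled that flag.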

Available categories

Culture, Technology, Business, U.S. Politics, Finance, Food & Drink, Podcast, Sports, Art & Illustration, World Politics, Health Politics, News, Fashion & Beauty, Music, Faith & Spirituality, Climate & Environment, Science, Literature, Fiction, Health & Wellness, Design, Travel, Parenting, Philosophy, Comics, International, Crypto, History, Humor, Education, Film & TV.

Use the Category dropdown in the input form to pick the exact slug.

What's new in v0.3

  • 6 new opt-in enrichment flags (sponsorship signal, recommendations, contributors, full pricing, health signals, guests).
  • Worker-pool concurrency (default 20): large discovery jobs finish in a fraction of the v0.2 run time.
  • Cleaner error handling — input issues now exit with a status message instead of marking the run Failed.

Heads-up for v0.2 users (breaking changes)

  • The "first 100 results free" offer has been removed. All results are charged at $0.001 each, starting from the first result.
  • URLs must include https://. Bare domains like newsletter.pragmaticengineer.com are rejected.
  • Categories must be one of the 31 official Substack slugs (case-sensitive). Variants like "Technology" or "tech" are rejected — use the dropdown.
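Saved v0.2 inputs with bare domains can be upgraded before re-running. This is a minimal sketch: migrate_url is a hypothetical helper, and it assumes a publication listed with http:// also serves over HTTPS. Categories are deliberately not rewritten, since valid slugs should come from the dropdown rather than from guessing.

```python
def migrate_url(url: str) -> str:
    """Upgrade a v0.2-style URL to the https:// form v0.3 requires (sketch)."""
    url = url.strip()
    if url.startswith("https://"):
        return url
    if url.startswith("http://"):
        # Assumption: the publication also serves over HTTPS.
        return "https://" + url[len("http://"):]
    return "https://" + url  # bare domain such as newsletter.pragmaticengineer.com


old_urls = ["newsletter.pragmaticengineer.com", "http://platformer.news", "https://lenny.substack.com"]
print([migrate_url(u) for u in old_urls])
```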

FAQ

What data does the actor read from Substack?

The actor only reads data that Substack itself exposes publicly via its own JSON endpoints — the same data any visitor can see in their browser. We do not bypass logins, paywalls, or rate limits, and we do not extract paid-only post content.

Why is my custom domain returning no profile?

Make sure the URL includes https://. Bare hostnames like newsletter.example.com are rejected by the input schema.

How fresh is the data?

Live — every run hits Substack's API directly. No caching layer.

Can I scrape posts / full content?

This actor is for lead generation (publication + author + metrics). For full post content, use the Substack Scraper.

Why does recommendedByCount cap at 50?

Substack's recommendations/to/{pubId} endpoint returns a maximum of 50 recommenders per call. Pagination support is tracked for v0.4.

What if a run fails or migrates mid-way?

The actor checkpoints its progress after every dataset push and whenever the platform emits a migrating event. A resumed run skips newsletters that were already pushed, so they are not charged twice.
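The general checkpoint-and-skip pattern behind resumable runs can be sketched as follows. This illustrates the technique only, not the actor's implementation: the in-memory dict stands in for a persistent key-value store, and STATE_KEY, run, and the URLs are hypothetical.

```python
from __future__ import annotations

import json
from typing import Callable, MutableMapping

STATE_KEY = "PUSHED_URLS"  # hypothetical key for the checkpoint record


def load_pushed(store: MutableMapping[str, str]) -> set[str]:
    raw = store.get(STATE_KEY)
    return set(json.loads(raw)) if raw else set()


def run(urls: list[str], store: MutableMapping[str, str], push: Callable[[dict], None]) -> int:
    """Push one record per URL, skipping URLs a previous attempt already pushed."""
    pushed_urls = load_pushed(store)
    new_pushes = 0
    for url in urls:
        if url in pushed_urls:
            continue  # pushed (and billed) before the restart
        push({"url": url})
        pushed_urls.add(url)
        # Checkpoint right after the push so a restarted attempt sees it.
        store[STATE_KEY] = json.dumps(sorted(pushed_urls))
        new_pushes += 1
    return new_pushes


kv_store: dict[str, str] = {}  # stands in for a persistent key-value store
dataset: list[dict] = []
run(["https://a.example", "https://b.example"], kv_store, dataset.append)  # first attempt
resumed = run(["https://a.example", "https://b.example", "https://c.example"], kv_store, dataset.append)
print(resumed, len(dataset))
```

Writing the checkpoint immediately after each push narrows the window for duplicates but cannot close it entirely: a crash between the push and the checkpoint write would repeat that one record on resume.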

Disclaimer

Our actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly on Substack. We therefore believe that this actor, when used for ethical purposes by Apify users, is safe. However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

Support & feedback

  • Found a bug or want a feature? Open it on the Issues tab.
  • Need programmatic access? See the API tab for ready-to-use code snippets in JS, Python, and curl.