Substack Lead Gen Scraper
Pricing
from $1.00 / 1,000 newsletter profiles
Generate B2B leads from Substack newsletters — author profiles, subscriber counts, Twitter handles, paid pricing, recommendations network, and sponsorship signals. URL or category mode. Flat $1 per 1,000 profiles. HTTP-only, no browser.
Developer
Sourabh Kumar
Maintained by Community
Actor stats: 0 bookmarked, 12 total users, 5 monthly active users, last modified 13 days ago
Generate B2B leads from Substack newsletters: author profiles, subscriber counts, Twitter/social handles, paid pricing, recommendations network, sponsorship signals, and posting activity. A Substack API alternative for sales teams, sponsorship prospectors, and market researchers — HTTP-only, no browser, no proxy.
What does Substack Lead Gen Scraper do?
Substack Lead Gen Scraper turns any Substack newsletter (or an entire category leaderboard) into a structured lead record with the author's name, handle, bio, Twitter, custom domain, subscriber count, paid tier pricing, posting frequency, and optional enrichment data. Two modes:
- URL Mode — paste a list of newsletter URLs and get a detailed profile for each.
- Discovery Mode — pick a category (Technology, Business, Finance, Crypto, …) and pull the top newsletters with subscriber metrics.
Both modes hit Substack's public JSON endpoints directly — no Cloudflare juggling, no headless browser, no proxy required.
Why scrape Substack?
Substack hosts the highest concentration of independent media operators in B2B — solo writers, ex-journalists, and analysts with engaged paid audiences. That makes it a goldmine for:
- 📨 Cold outreach to newsletter authors — pitch a tool, sponsorship, or interview using the author's name, handle, and Twitter.
- 💰 Sponsorship prospecting — combine subscriber count + paid tier price to estimate audience value before you reach out.
- 📈 Market research — map the newsletter landscape in your category and benchmark against competitors.
- 🏢 Competitive intelligence — track who's paid vs. free, who's growing (bestseller tier), who recommends whom.
- 🎯 Influencer & thought-leader discovery — sort by subscriber count and bestseller tier to surface the loudest voices in any niche.
Apify platform extras: scheduled runs, REST/JS/Python API, integrations with Make/Zapier/n8n, dataset export to JSON/CSV/Excel, and pay-per-result billing.
What data can Substack Lead Gen Scraper extract?
| Field | Type | Description |
|---|---|---|
| name | string | Newsletter name (e.g., The Pragmatic Engineer) |
| authorName | string | Author's display name |
| authorHandle | string | Substack handle (@pragmaticengineer) |
| authorBio | string | Author bio / about text |
| twitterHandle | string | Twitter/X handle |
| websiteUrl | string | Author's external website |
| subscriberCount | string | Free subscriber count (e.g., 241,000) |
| subscriberCountEstimate | string | Order-of-magnitude estimate (241K+) |
| bestsellerTier | number | Substack bestseller tier (1 = top tier) |
| hasPaidSubscription | boolean | Whether a paid tier exists |
| monthlyPriceCents / yearlyPriceCents | number | Paid tier pricing, in cents |
| estimatedPostsPerMonth | number | Posting frequency from the last ~5 posts |
| lastPostDate | string | Most recent post timestamp |
| category | string | Substack category |
| customDomain | string | Custom domain (e.g., newsletter.pragmaticengineer.com) |
| hasPodcast / hasCommunity | boolean | Feature flags on the publication |
Plus optional enrichment fields (recommendations, contributors, full pricing plans, sponsorship signal, recent podcast guests, health flags) — see the Enrichment flags section below.
How to scrape Substack newsletters
- Click Try for free on the Apify Store page.
- Choose your mode:
  - URL mode: paste full newsletter URLs (must include https://) into the Newsletter URLs field.
  - Discovery mode: pick a Category from the dropdown and (optionally) a Newsletter Type (all, paid, free).
- Set Max Results (default 50, max 1000).
- (Optional) Enable any enrichment flags: sponsorship signal, recommendations network, contributors, full pricing, health signals, recent guests.
- Click Start and watch results populate the dataset in real time.
- Export as JSON, CSV, Excel, or HTML, or stream them via the Apify API into your CRM / data warehouse.
You can also schedule the run (daily/weekly) to keep a live snapshot of any category, or trigger it from Make / Zapier / n8n when you need fresh leads on demand.
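The same run can be started from code instead of the UI. The following is a minimal sketch, assuming the official `apify-client` Python package and its `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods. The helper name `fetch_profiles`, the token, and the actor ID are placeholders, not values taken from this page.

```python
def fetch_profiles(client, actor_id, run_input):
    """Start an actor run, wait for it to finish, and return the dataset items.

    `client` is assumed to behave like `apify_client.ApifyClient`.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run returned no run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage (needs `pip install apify-client`; token and actor ID are placeholders):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   leads = fetch_profiles(client, "<username>/<actor-name>",
#                          {"category": "technology", "maxResults": 100})
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before spending any credits on a real run.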
How much does it cost to scrape Substack?
Flat $1.00 per 1,000 newsletter profiles ($0.001 per profile). Enrichment flags do not add to the price — only the result count is billed.
| Volume | Cost |
|---|---|
| 100 newsletters | $0.10 |
| 500 newsletters | $0.50 |
| 1,000 newsletters | $1.00 |
| 10,000 newsletters | $10.00 |
The Apify Free plan includes $5 of platform usage per month — enough to pull ~5,000 newsletter profiles before paying for results.
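The pricing arithmetic above can be sketched in a few lines. This is only an illustration of the flat $0.001 per profile rate; the function names are hypothetical, and it does not model any other platform charges.

```python
from decimal import Decimal

# Flat rate quoted above: $1.00 per 1,000 profiles.
PRICE_PER_PROFILE_USD = Decimal("0.001")


def estimate_cost_usd(profile_count: int) -> Decimal:
    """Return the result cost in USD for a given number of profiles."""
    if profile_count < 0:
        raise ValueError("profile_count must be non-negative")
    return PRICE_PER_PROFILE_USD * profile_count


def profiles_within_budget(budget_usd: Decimal) -> int:
    """Return how many profiles a budget covers at the flat rate."""
    return int(budget_usd // PRICE_PER_PROFILE_USD)


for n in (100, 500, 1_000, 10_000):
    print(f"{n:>6} newsletters: ${estimate_cost_usd(n):.2f}")
```

Decimal is used rather than float so that per-profile amounts add up exactly.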
Input
See the Input tab for the full configuration UI. The two main parameters:
- urls: array of full newsletter URLs. Must include https://. Custom domains and *.substack.com subdomains both work.
- category: one of the 31 official Substack category slugs (case-sensitive). Use the dropdown to avoid typos.
You can provide either urls or category — or both in the same run.
URL Mode
{"urls": ["https://newsletter.pragmaticengineer.com","https://platformer.news","https://lenny.substack.com"]}
Discovery Mode
{"category": "technology","newsletterType": "paid","maxResults": 100}
All input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | string[] | — | Newsletter URLs (substack.com or custom domains, with https://) |
| category | string | — | Category slug for discovery mode (e.g., technology) |
| newsletterType | enum | all | all, paid, or free |
| maxResults | integer | 50 | Max newsletter profiles to return (1–1000) |
| includeActivityMetrics | boolean | true | Calculate posting frequency + last post date |
| includeHomepageLinks | boolean | false | External links + sponsorship signal |
| includeRecommendations | boolean | false | Outbound + inbound recommendations |
| includeContributors | boolean | false | Co-authors / contributors |
| includeFullPricing | boolean | false | All Stripe plans + group coupons |
| includeHealthSignals | boolean | false | paymentsState, paused, spam flag, etc. |
| includeGuests | boolean | false | Recent podcast / co-author guests |
| concurrency | integer | 20 | Newsletters processed in parallel (1–50) |
| requestDelayMs | integer | 0 | Per-worker delay in ms (raise if rate-limited) |
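If you build the input programmatically, it can help to check the documented constraints before starting a run. The sketch below is a hypothetical client-side helper (`build_run_input` is not part of the actor); it checks only the rules stated in the table above and does not validate category slugs, since the exact slug list is not reproduced here.

```python
ALLOWED_NEWSLETTER_TYPES = {"all", "paid", "free"}


def build_run_input(urls=None, category=None, newsletter_type="all",
                    max_results=50, concurrency=20):
    """Assemble an input dict, enforcing the documented limits."""
    urls = list(urls or [])
    if not urls and not category:
        raise ValueError("provide urls, category, or both")
    bare = [u for u in urls if not u.startswith("https://")]
    if bare:
        raise ValueError(f"URLs must include https://: {bare}")
    if newsletter_type not in ALLOWED_NEWSLETTER_TYPES:
        raise ValueError("newsletterType must be all, paid, or free")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")

    run_input = {
        "newsletterType": newsletter_type,
        "maxResults": max_results,
        "concurrency": concurrency,
    }
    if urls:
        run_input["urls"] = urls
    if category:
        # Slug is passed through unchecked; the actor itself rejects unknown slugs.
        run_input["category"] = category
    return run_input
```

For example, `build_run_input(category="technology", newsletter_type="paid", max_results=100)` produces the Discovery Mode input shown earlier, plus the default concurrency.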
Output
You can download the dataset in various formats such as JSON, HTML, CSV, or Excel. Each newsletter is one record:
{"name": "The Pragmatic Engineer","description": "Big Tech and high-growth startups, from the inside.","url": "https://newsletter.pragmaticengineer.com","substackUrl": "https://pragmaticengineer.substack.com","customDomain": "newsletter.pragmaticengineer.com","logoUrl": "https://substackcdn.com/image/...","language": "en","createdAt": "2019-07-15T00:00:00.000Z","authorName": "Gergely Orosz","authorHandle": "pragmaticengineer","authorBio": "Writing about big tech and startups...","authorPhotoUrl": "https://substackcdn.com/image/...","twitterHandle": "GergelyOrosz","websiteUrl": "https://newsletter.pragmaticengineer.com","socialLinks": [{ "platform": "twitter", "url": "https://twitter.com/GergelyOrosz" }],"subscriberCount": "241,000","subscriberCountEstimate": "241K+","paidSubscriberDetail": "Thousands of paid subscribers","bestsellerTier": 1,"hasPaidSubscription": true,"monthlyPriceCents": 1500,"yearlyPriceCents": 15000,"lastPostDate": "2026-02-25T12:00:00.000Z","estimatedPostsPerMonth": 8.2,"category": "Technology","hasPodcast": true,"hasCommunity": true,"scrapedAt": "2026-02-27T10:30:00.000Z"}
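Because subscriberCount arrives as a formatted string and prices arrive in cents, a small post-processing step is usually needed before sorting or loading records into a CRM. A minimal sketch, assuming records shaped like the example above; `to_lead_row` and its output keys are hypothetical names.

```python
def to_lead_row(record: dict) -> dict:
    """Convert display-oriented fields into numeric values for sorting."""
    raw_count = record.get("subscriberCount") or ""
    digits = raw_count.replace(",", "")
    # Leave the count empty when the value is not a plain number.
    subscribers = int(digits) if digits.isdigit() else None

    def cents_to_usd(value):
        return value / 100 if value is not None else None

    return {
        "name": record.get("name"),
        "authorName": record.get("authorName"),
        "twitterHandle": record.get("twitterHandle"),
        "subscribers": subscribers,
        "monthlyPriceUsd": cents_to_usd(record.get("monthlyPriceCents")),
        "yearlyPriceUsd": cents_to_usd(record.get("yearlyPriceCents")),
    }


sample = {
    "name": "The Pragmatic Engineer",
    "authorName": "Gergely Orosz",
    "twitterHandle": "GergelyOrosz",
    "subscriberCount": "241,000",
    "monthlyPriceCents": 1500,
    "yearlyPriceCents": 15000,
}
row = to_lead_row(sample)
print(row["subscribers"], row["monthlyPriceUsd"], row["yearlyPriceUsd"])
```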
Enrichment flags
All flags are optional and off by default. Enabling them adds extra HTTP calls per newsletter (no extra billing — pricing is per result, not per call):
| Flag | What you get | Extra HTTP calls per newsletter |
|---|---|---|
| includeHomepageLinks | External links + acceptsSponsorships: boolean + sponsorshipInquiryUrl | 1 |
| includeRecommendations | Outbound + inbound recommendation network | 2 |
| includeContributors | Co-authors / contributors with handles, photos, bios | 0 (Discovery) / 1 (URL) |
| includeFullPricing | All Stripe plans, founding tier, group coupons | 0 (Discovery) / 1 (URL) |
| includeHealthSignals | paymentsState, pauseReturnDate, flaggedAsSpam, explicit, noIndex/noFollow, inviteOnly | 0 (Discovery) / 1 (URL) |
| includeGuests | Recent podcast / co-author guests across last 12 posts | 1 (free if Activity Metrics is on) |
In URL mode, includeContributors / includeFullPricing / includeHealthSignals share a single posts/by-id/{id} call per newsletter, so enabling all three together still costs only one extra HTTP call.
ℹ️ recommendedByCount reflects the inbound recommendations returned by Substack's recommendations/to/{pubId} endpoint, which currently caps at 50 per call. Newsletters with more than 50 inbound recommenders will report 50.
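The call counts in the table above can be added up for a given flag combination. This is a rough sketch of that arithmetic only; `extra_calls_per_newsletter` is a hypothetical helper, and it assumes includeActivityMetrics keeps its documented default of true unless set otherwise.

```python
# In URL mode these three flags share one posts/by-id/{id} call.
SHARED_URL_MODE_FLAGS = ("includeContributors", "includeFullPricing",
                         "includeHealthSignals")


def extra_calls_per_newsletter(flags: dict, url_mode: bool) -> int:
    """Count extra HTTP calls per newsletter for the enabled flags."""
    calls = 0
    if flags.get("includeHomepageLinks"):
        calls += 1
    if flags.get("includeRecommendations"):
        calls += 2
    if url_mode and any(flags.get(name) for name in SHARED_URL_MODE_FLAGS):
        calls += 1
    # Guests cost one call only when activity metrics are switched off.
    if flags.get("includeGuests") and not flags.get("includeActivityMetrics", True):
        calls += 1
    return calls


all_on = {name: True for name in (
    "includeHomepageLinks", "includeRecommendations", "includeContributors",
    "includeFullPricing", "includeHealthSignals", "includeGuests")}
print(extra_calls_per_newsletter(all_on, url_mode=True))
print(extra_calls_per_newsletter(all_on, url_mode=False))
```

More calls per newsletter mean longer runs, not a higher bill, since pricing is per result.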
Tips & advanced options
- Speed up large discovery runs: bump concurrency to 40–50. Discovery mode pulls 25 newsletters per leaderboard call, so 1,000 results take about 40 leaderboard calls and finish in seconds.
- If you see rate-limit warnings: drop concurrency to 10 and set requestDelayMs to 100–500.
- Lighter, faster runs: leave all enrichment flags off. The flags do not change the per-result price, but each one adds HTTP calls. The base profile already includes subscriber count, pricing, Twitter handle, and posting frequency, which is enough for most outreach use cases.
- Filter dead newsletters: enable includeHealthSignals and discard records where paymentsState !== "active", flaggedAsSpam === true, or pauseReturnDate is set.
- Find sponsorship-friendly authors: enable includeHomepageLinks and filter acceptsSponsorships === true.
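The two filtering tips above can be applied to an exported dataset as follows. This is a sketch under one assumption worth flagging: that paymentsState, flaggedAsSpam, pauseReturnDate, and acceptsSponsorships appear as top-level keys on each record. Check a real record first, as the enrichment fields may be nested differently.

```python
def is_live(record: dict) -> bool:
    """Apply the 'dead newsletter' rule from the tips above.

    Records without paymentsState (flag not enabled) are treated as not live,
    which follows the literal rule paymentsState !== "active".
    """
    return (
        record.get("paymentsState") == "active"
        and record.get("flaggedAsSpam") is not True
        and not record.get("pauseReturnDate")
    )


def sponsorship_leads(records: list) -> list:
    """Keep live newsletters whose homepage signals sponsorship interest."""
    return [r for r in records
            if is_live(r) and r.get("acceptsSponsorships") is True]


records = [
    {"name": "A", "paymentsState": "active", "acceptsSponsorships": True},
    {"name": "B", "paymentsState": "active", "flaggedAsSpam": True,
     "acceptsSponsorships": True},
    {"name": "C", "paymentsState": "active",
     "pauseReturnDate": "2026-03-01T00:00:00.000Z"},
    {"name": "D", "paymentsState": "disabled", "acceptsSponsorships": True},
]
print([r["name"] for r in sponsorship_leads(records)])
```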
Available categories
Culture, Technology, Business, U.S. Politics, Finance, Food & Drink, Podcast, Sports, Art & Illustration, World Politics, Health Politics, News, Fashion & Beauty, Music, Faith & Spirituality, Climate & Environment, Science, Literature, Fiction, Health & Wellness, Design, Travel, Parenting, Philosophy, Comics, International, Crypto, History, Humor, Education, Film & TV.
Use the Category dropdown in the input form to pick the exact slug.
What's new in v0.3
- 6 new opt-in enrichment flags (sponsorship signal, recommendations, contributors, full pricing, health signals, guests).
- Worker-pool concurrency (default 20): large discovery jobs finish in a fraction of the time they took in v0.2.
- Cleaner error handling — input issues now exit with a status message instead of marking the run Failed.
Heads-up for v0.2 users (breaking changes)
- The "first 100 free" wedge has been removed. All results are charged at $0.001 each from result 1.
- URLs must include https://. Bare domains like newsletter.pragmaticengineer.com are rejected.
- Categories must be one of the 31 official Substack slugs (case-sensitive). Variants like "Technology" or "tech" are rejected; use the dropdown.
FAQ
Is it legal to scrape Substack?
The actor only reads data that Substack itself exposes publicly via its own JSON endpoints — the same data any visitor can see in their browser. We do not bypass logins, paywalls, or rate limits, and we do not extract paid-only post content.
Why is my custom domain returning no profile?
Make sure the URL includes https://. Bare hostnames like newsletter.example.com are rejected by the input schema.
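If your URL list comes from a spreadsheet or an older v0.2 input, bare hostnames can be fixed before the run. A minimal sketch; `to_https_url` is a hypothetical helper, and upgrading http:// to https:// is an assumption about what you want, not something the actor does for you.

```python
def to_https_url(value: str) -> str:
    """Return the value with an https:// scheme, adding one if missing."""
    value = value.strip()
    if value.startswith("https://"):
        return value
    if value.startswith("http://"):
        # Assumption: the site is also reachable over https.
        return "https://" + value[len("http://"):]
    return "https://" + value


legacy = ["newsletter.pragmaticengineer.com", "https://platformer.news",
          " lenny.substack.com "]
print([to_https_url(u) for u in legacy])
```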
How fresh is the data?
Live — every run hits Substack's API directly. No caching layer.
Can I scrape posts / full content?
This actor is for lead generation (publication + author + metrics). For full post content, use the Substack Scraper.
Why does recommendedByCount cap at 50?
Substack's recommendations/to/{pubId} endpoint returns a maximum of 50 recommenders per call. Pagination support is tracked for v0.4.
What if a run fails or migrates mid-way?
The actor checkpoints pushed records on every push and on platform migrating events. Resumed runs skip already-pushed newsletters and won't double-charge.
Disclaimer
Our actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly on Substack. We therefore believe that this actor, when used for ethical purposes by Apify users, is safe. However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.
Support & feedback
- Found a bug or want a feature? Open it on the Issues tab.
- Need programmatic access? See the API tab for ready-to-use code snippets in JS, Python, and curl.
Related actors
- Substack Scraper — scrape full post content from Substack newsletters