Substack Leads Scraper

Pricing

from $1.00 / 1,000 lead scrape events

Find newsletter creators to pitch. Extracts author contact email, social profiles (Twitter, LinkedIn, Facebook, Instagram), subscriber counts, and publication metadata from Substack newsletters. One flat lead row per URL — drop straight into your CRM. Pay-per-event pricing.

Rating

0.0 (0)

Developer

Akram

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 6 days ago

Substack Author Scraper

Find newsletter creators to pitch. Takes Substack URLs, returns one CRM-ready lead row per author — name, email, social profiles, subscriber count, publication metadata.

Built for B2B outreach, sponsorship sales, partnership prospecting, and competitive lead research. It does not scrape posts, comments, or articles; it returns only the contact data you need.

🚀 Key Features

  • 📧 Email Discovery: Author contact emails extracted from About pages, footers, and mailto: links with confidence-based ranking. Blacklists 100+ noise domains (analytics, CDNs, social platforms) and generic prefixes (info@, support@, no-reply@).
  • 🔗 Social Profiles: Twitter, LinkedIn, Facebook, Instagram, and personal website URLs.
  • 📊 Publication Metadata: Title, description, custom domain, logo, created date, paid status, founding-plan name, subscriber count (when publicly visible).
  • 🌐 Custom Domain Support: Handles *.substack.com and custom domains (e.g. platformer.news).
  • 👥 Multi-Publication Authors: Correctly attributes the requested publication (not the author's primary one) for writers who run multiple newsletters.
  • ⚡ Batch Processing: Process hundreds of URLs in a single run.
  • 💰 Pay-per-Lead: One charge per URL processed, whether enrichment succeeds or fails.
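
To make the email filtering idea concrete, here is a minimal Python sketch of one way such a filter could work. It is not the Actor's actual implementation: the function name `pick_contact_email` is hypothetical, `NOISE_DOMAINS` is a made-up stand-in for the 100+ domain blacklist, and the sketch assumes the candidates arrive already ordered from most to least confident.

```python
from typing import Iterable, Optional

# Generic prefixes named in the feature list; the Actor's real list may be longer.
GENERIC_PREFIXES = {"info", "support", "no-reply"}
# Hypothetical sample entries standing in for the 100+ noise-domain blacklist.
NOISE_DOMAINS = {"sentry.io", "cloudflare.com", "twitter.com"}


def pick_contact_email(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is neither generic nor on a noise domain.

    Assumes `candidates` is already sorted by confidence, highest first.
    """
    for raw in candidates:
        email = raw.strip().lower()
        local, sep, domain = email.partition("@")
        if not sep or not local or not domain:
            continue  # not shaped like an address
        if local in GENERIC_PREFIXES or domain in NOISE_DOMAINS:
            continue  # generic mailbox or noise domain
        return email
    return None  # mirrors the README: email may be null
```

A filter like this favors precision over recall: a publication whose only public address is info@ would come back with no email at all.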

📋 Input

Required

  • newsletterUrls (array): List of Substack newsletter URLs to enrich
    • Example: [{"url": "https://platformer.substack.com"}, {"url": "https://lennysnewsletter.com"}]
    • Supports both *.substack.com URLs and custom domains

Optional

  • delayBetweenRequests (number, default: 3000, range: 500-10000): Delay in milliseconds between HTTP requests to avoid rate limiting
  • maxRetries (number, default: 2, range: 0-5): Retry count per URL on failure

💡 Example Input

{
  "newsletterUrls": [
    { "url": "https://platformer.substack.com" },
    { "url": "https://lennysnewsletter.com" },
    { "url": "https://andrewsullivan.substack.com" }
  ],
  "delayBetweenRequests": 3000,
  "maxRetries": 2
}
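
If you build the input programmatically, a small helper can enforce the documented ranges before a run starts. The sketch below is one possible approach in Python. `build_run_input` and `run_actor` are hypothetical names, the non-empty check is an assumption based on the field being required, and `token` and `actor_id` are placeholders you must supply (this README does not state the Actor ID). `run_actor` assumes the `apify-client` package is installed.

```python
from typing import Any, Dict, List


def build_run_input(urls: List[str], delay_ms: int = 3000, max_retries: int = 2) -> Dict[str, Any]:
    """Build the Actor input, checking the ranges documented above."""
    if not urls:
        raise ValueError("newsletterUrls must contain at least one URL")
    if not 500 <= delay_ms <= 10000:
        raise ValueError("delayBetweenRequests must be between 500 and 10000 ms")
    if not 0 <= max_retries <= 5:
        raise ValueError("maxRetries must be between 0 and 5")
    return {
        "newsletterUrls": [{"url": url} for url in urls],
        "delayBetweenRequests": delay_ms,
        "maxRetries": max_retries,
    }


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run and return its dataset items (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Validating locally keeps an out-of-range value from failing the run after it has already been submitted.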

📊 Example Output

One row per submitted URL:

{
  "author_id": 241262,
  "author_name": "Casey Newton",
  "author_handle": "platformer",
  "author_bio": "Casey Newton is the founder and editor of Platformer...",
  "author_photo_url": "https://substack-post-media.s3.amazonaws.com/...",
  "author_previous_name": null,
  "author_subscriber_count": 176000,
  "author_subscriber_count_string": "176K+ subscribers",
  "author_follower_count": 209469,
  "author_twitter_url": "https://twitter.com/CaseyNewton",
  "author_linkedin_url": null,
  "author_facebook_url": "https://www.facebook.com/...",
  "author_instagram_url": "https://instagram.com/crumbler",
  "author_is_active": true,
  "author_profile_set_up_at": "2021-04-22T18:51:48.648Z",
  "author_profile_disabled": false,
  "email": "casey@platformer.news",
  "website_url": "https://www.platformer.news",
  "publication_id": 7976,
  "publication_name": "Platformer",
  "publication_subdomain": "platformer",
  "publication_url": "https://www.platformer.news",
  "publication_description": "News at the intersection of Silicon Valley and democracy...",
  "publication_logo_url": "https://bucketeer-...",
  "publication_custom_domain": "www.platformer.news",
  "publication_created_at": "2019-03-29T13:28:21.009Z",
  "publication_subscriber_count": 176000,
  "publication_subscriber_count_visible": true,
  "publication_is_paid": false,
  "publication_founding_plan_name": "Mystery Tier",
  "publication_has_posts": true,
  "publication_count": 1,
  "leaderboard_category": "Technology",
  "leaderboard_rank": 4,
  "leaderboard_ranking_type": "paid",
  "all_publications": [...],
  "scraped_at": "2026-05-23T19:19:13.110957+00:00"
}
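
Because each row is flat, turning a run's output into a CRM import file takes only a few lines. The following Python sketch shows one way to do it with the standard library. The function name `rows_to_csv` and the column selection in `CRM_COLUMNS` are illustrative choices, not part of the Actor.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, Sequence

# Illustrative subset of output fields; pick whichever columns your CRM expects.
CRM_COLUMNS = [
    "author_name",
    "email",
    "author_twitter_url",
    "author_linkedin_url",
    "publication_name",
    "publication_url",
    "publication_subscriber_count",
]


def rows_to_csv(rows: Iterable[Dict[str, Any]], columns: Sequence[str] = CRM_COLUMNS) -> str:
    """Serialize lead rows to CSV text, writing null or missing fields as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Nested fields such as all_publications are skipped unless listed in `columns`.
        writer.writerow({col: "" if row.get(col) is None else row[col] for col in columns})
    return buffer.getvalue()
```

Nested data such as all_publications does not map cleanly onto a single CSV cell, so this sketch leaves it out by default.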

📋 Output Fields

Grouped by prefix. All fields except those marked required may be null.

Author identity & profile

  • author_id (int, required), author_name (string, required), author_handle (string, required)
  • author_bio, author_photo_url, author_previous_name
  • author_subscriber_count (int), author_subscriber_count_string (string), author_follower_count (int)
  • author_twitter_url, author_linkedin_url, author_facebook_url, author_instagram_url
  • author_is_active (bool), author_profile_set_up_at (ISO 8601), author_profile_disabled (bool)

Contact

  • email — best-effort, may be a personal address, team alias, or null
  • website_url — personal/business site from the author's profile

Publication

  • publication_id (int, required), publication_name (string, required), publication_url (string, required)
  • publication_subdomain, publication_description, publication_logo_url, publication_custom_domain, publication_created_at (ISO 8601)
  • publication_subscriber_count (int), publication_subscriber_count_visible (bool), publication_is_paid (bool), publication_founding_plan_name
  • publication_has_posts (bool); publication_count (int): how many publications this author runs

Leaderboard (present only when ranked)

  • leaderboard_category, leaderboard_rank (int), leaderboard_ranking_type

Other

  • all_publications (array of objects) — full list with id/name/subdomain/url/is_primary/role/payments_state per entry
  • scraped_at (ISO 8601, required)
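
Since only the fields marked required are guaranteed to be non-null, downstream code should treat everything else as optional. As a rough sketch in Python, the check below lists which required fields are absent from a row. `REQUIRED_FIELDS` is taken from the field list above, and `missing_required` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Fields marked "required" in the output field list above.
REQUIRED_FIELDS = (
    "author_id",
    "author_name",
    "author_handle",
    "publication_id",
    "publication_name",
    "publication_url",
    "scraped_at",
)


def missing_required(row: Dict[str, Any]) -> List[str]:
    """Return the names of required fields that are absent or null in `row`."""
    return [name for name in REQUIRED_FIELDS if row.get(name) is None]
```

Reading optional fields with `row.get(...)` rather than `row[...]` also covers the leaderboard fields, which are present only when the publication is ranked.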

🔧 Use Cases

  • Newsletter sponsorship outreach: Build target lists with the actual emails you need to pitch
  • Creator partnership prospecting: Find collaborators in specific niches with subscriber-count context
  • B2B lead generation: Convert a Substack URL list into a CRM-ready contact file
  • Competitive intelligence: Map who's writing what and at what scale
  • Market research: Audience size + monetization status across a topic area

⚠️ Limitations

  • Email is best-effort, not guaranteed. When an author doesn't expose a contact address publicly, email is null.
  • Subscriber count visibility is opt-in. Authors can hide it; publication_subscriber_count_visible: false means it isn't exposed.
  • Broken/dead URLs are still billed. One lead-scrape charge fires per URL regardless of whether enrichment succeeds — compute is consumed either way. Validate your URL list before submitting.
  • Some custom-domain newsletters can't be resolved. When the requested URL doesn't expose the data we need, that URL produces a typed failure record in the run's key-value store under failed_urls.
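
Because every submitted URL is billed, a cheap local cleanup pass before the run can pay for itself. The Python sketch below shows one possible pre-filter: it trims whitespace, adds a missing scheme, drops entries that are not plausible http(s) URLs, and removes duplicates by host. The function name `clean_urls` is hypothetical, and deduplicating by host assumes one newsletter per hostname. The check is purely syntactic and does not test whether a site is still live.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def clean_urls(urls: Iterable[str]) -> List[str]:
    """Normalize newsletter URLs to https://<host> and drop malformed or duplicate entries."""
    seen = set()
    cleaned = []
    for raw in urls:
        candidate = raw.strip()
        if "://" not in candidate:
            candidate = "https://" + candidate  # tolerate bare hostnames
        try:
            parts = urlsplit(candidate)
        except ValueError:
            continue  # e.g. malformed IPv6 literal
        host = (parts.hostname or "").lower()
        if parts.scheme not in ("http", "https") or "." not in host:
            continue  # not a plausible web URL
        if host in seen:
            continue  # same newsletter listed twice
        seen.add(host)
        cleaned.append(f"https://{host}")
    return cleaned
```

Hosts that differ only by a www. prefix are treated as distinct here, so merge those yourself if your list mixes both forms.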

💰 Pricing

Pay-per-event:

  • One lead-scrape charge per URL processed, whether enrichment succeeds or fails
  • Failed URLs (broken sites, missing data, parsing errors) are billed the same — compute is consumed regardless
  • Validate URLs before submitting to avoid charges on dead newsletters
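
For budgeting, the charge scales linearly with the number of URLs submitted. Here is a minimal Python sketch of the arithmetic, assuming the listed starting rate of $1.00 per 1,000 lead-scrape events. The rate you actually pay may differ from this starting price, so treat the default as an assumption. `estimate_cost_usd` is a hypothetical helper name.

```python
def estimate_cost_usd(url_count: int, usd_per_1000_events: float = 1.00) -> float:
    """Estimate the run cost: one lead-scrape event per submitted URL, success or fail."""
    if url_count < 0:
        raise ValueError("url_count must not be negative")
    return url_count * usd_per_1000_events / 1000
```

For example, `estimate_cost_usd(250)` gives 0.25, meaning 250 URLs cost about $0.25 at the starting rate, including any URLs that fail.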

🛟 Support

Found a bug or have a feature request? Open an issue on the Actor's Issues tab in Apify Console. We respond within a few business days.

🏷️ Tags

Substack, lead generation, B2B outreach, newsletter, email discovery, contact enrichment, creator economy, sponsorship sales