Substack Scraper — Newsletters, Posts, Authors & Subscribers

Pricing

from $2.00 / 1,000 results

Discover Substack newsletters by category & leaderboard rank, then pull every post, author and publication. 30+ categories or direct subdomain. Per post: title, audience (free/paid), reactions, restacks. Per pub: subdomain, custom domain, author, subscription tiers. Public Substack API — no auth.

Developer

Logiover

Maintained by Community

Substack Scraper — Newsletter Discovery, Posts, Authors & Subscriber Data

Discover Substack newsletters by category and leaderboard rank, then pull every post, every author and full publication metadata in a single normalized dataset. Search across 30+ Substack categories — Technology, Business, Finance, Crypto, News, Culture, Health, Politics, Science, Design and more — or jump straight to a known subdomain or custom domain.

Built on Substack's public JSON API — no authentication, no proxy, no scraping fight. Per post: title, subtitle, audience (free / paid / founding), post date, reaction count, restacks, cover image, podcast info and canonical URL. Per publication: name, subdomain, custom domain, author display name, hero description, language, subscription benefits and founded date.

Perfect for content intelligence, RAG / LLM training data, newsletter sponsorship outreach, competitive monitoring, author / creator lead generation, and podcast network discovery.


🚀 What does this Substack scraper do?

Two complementary modes — combine them or use one in isolation:

| Mode | When to use | What it returns |
| --- | --- | --- |
| Category Discovery | Find every tech / finance / crypto newsletter on Substack, ranked by leaderboard, top paid, or all publications | Top-N publications per category + every post per publication |
| Direct Newsletter URLs | You already know the newsletter: pass its subdomain, custom domain, or just its name | All posts of that publication, with metadata |

Optional: set alsoPushPublicationRecord to also push one publication-level record per discovered newsletter. This gives a denormalized list of every newsletter alongside the post records, for downstream relational joins.


💡 Use cases

  • Content intelligence platforms — track every AI / startup / crypto newsletter on Substack with daily refresh
  • Newsletter sponsorship outreach — pull author names, custom domains, free-vs-paid mix, subscription tier descriptions for cold outreach lists
  • RAG / LLM training data — every public Substack post is high-quality long-form content with author + date + topic metadata, ready for vector indexing
  • Substack-to-Beehiiv migration tools — bulk-export an author's archive
  • Competitive monitoring — track when a competitor's newsletter publishes, what audience tier it targets, and how reader reactions trend
  • Investor / VC research — every fintech / crypto / SaaS founder publishes here now; find them at scale
  • Podcast network discovery — Substack hosts thousands of independent podcasts; the actor exposes podcastFeedUrl for each publication
  • Newsletter ranking dashboards — leaderboard mode returns subscriber-weighted rankings per category

⚙️ Input configuration

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| categorySlugs | string[] | [] | Substack category slugs (see list below). Each category is paginated and enumerated up to maxPublicationsPerCategory. |
| categoryRanking | string | "leaderboard" | leaderboard (top by subs + engagement), paid (top paid), all (default Substack sort). |
| newsletterUrls | string[] | [] | Specific newsletters to scrape. Accepts subdomain, subdomain.substack.com, or custom domain. |
| maxPublicationsPerCategory | integer | 25 | Hard cap per category. Substack returns 25 per page; the actor auto-paginates. |
| maxPostsPerPublication | integer | 50 | Hard cap per publication via the archive API. 0 = skip posts (publication-only mode). |
| audienceFilter | string | "all" | all / free (only posts open to everyone) / paid (only paid-subscriber posts). |
| minPostDate | string | null | Drop posts published before this date (YYYY-MM-DD). |
| maxPostDate | string | null | Drop posts published after this date (YYYY-MM-DD). |
| keywordFilter | string[] | [] | Client-side title/subtitle substring filter (case-insensitive). E.g. ["ai","gpt"]. |
| includePublicationMetadata | boolean | true | Enrich each post with parent-publication fields. |
| alsoPushPublicationRecord | boolean | false | Push one extra record per publication (recordType: "publication") alongside post records. |
| language | string | "" | ISO 639-1 publication-language filter (e.g. en). |
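
The post-level filters above (audienceFilter, minPostDate, maxPostDate, keywordFilter) can be read as a single predicate applied to each post record. The following is a minimal sketch of that reading, not the actor's actual code. The function name post_matches is hypothetical, and the mapping of "free" to audience "everyone" and "paid" to any other audience value is an assumption based on the field descriptions.

```python
from datetime import date


def post_matches(post, audience_filter="all", min_post_date=None,
                 max_post_date=None, keyword_filter=()):
    """Return True if a post record passes the filters (assumed semantics)."""
    # Assumption: "free" keeps audience == "everyone"; "paid" keeps the rest.
    audience = post.get("audience")
    if audience_filter == "free" and audience != "everyone":
        return False
    if audience_filter == "paid" and audience == "everyone":
        return False

    # postDate is an ISO timestamp; compare on its calendar date (first 10 chars).
    published = date.fromisoformat(post["postDate"][:10])
    if min_post_date and published < date.fromisoformat(min_post_date):
        return False
    if max_post_date and published > date.fromisoformat(max_post_date):
        return False

    # Case-insensitive substring match against title + subtitle.
    if keyword_filter:
        haystack = f'{post.get("postTitle") or ""} {post.get("postSubtitle") or ""}'.lower()
        if not any(keyword.lower() in haystack for keyword in keyword_filter):
            return False
    return True
```

Because the keyword filter is a plain substring match, a short keyword such as "ai" also matches words like "email"; longer or more specific keywords give cleaner results.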

Supported category slugs

technology · business · finance · crypto · news · us-politics · world-politics · health-politics · culture · science · health · design · travel · parenting · literature · fiction · philosophy · history · climate · art · music · sports · food · film-and-tv · comics · humor · fashionandbeauty · education · faith · international · home-garden · podcast

The full live category list with subcategories is fetched from Substack at runtime, so additions automatically flow through.
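
Resolving slugs against the live category list could look roughly like the sketch below. The response shape is an assumption (a list of objects with "id", "slug" and an optional "subcategories" list of the same shape), the helper names are hypothetical, and the numeric IDs in the usage comment are placeholders rather than real Substack IDs.

```python
def index_categories(categories):
    """Map slug -> numeric ID, walking subcategories recursively.

    Assumed shape: [{"id": int, "slug": str, "subcategories": [...]}, ...]
    """
    index = {}
    for category in categories:
        index[category["slug"]] = category["id"]
        index.update(index_categories(category.get("subcategories") or []))
    return index


def resolve_slugs(categories, slugs):
    """Split requested slugs into (found: slug -> id, missing: [slug])."""
    index = index_categories(categories)
    found = {slug: index[slug] for slug in slugs if slug in index}
    missing = [slug for slug in slugs if slug not in index]
    return found, missing


# Usage with placeholder data (IDs are made up for illustration):
# categories = [{"id": 1, "slug": "technology", "subcategories": []}]
# resolve_slugs(categories, ["technology", "unknown"])
```

Reporting unknown slugs separately, rather than failing the whole run, keeps a long categorySlugs list usable when a single entry is mistyped.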


📦 Output fields

Records have recordType: "post" or recordType: "publication".

Per-post fields

| Field | Description | Example |
| --- | --- | --- |
| recordType | "post" | "post" |
| postId | Substack post ID | 195672025 |
| postSlug | URL slug | "why-saas-freemium-playbooks-dont" |
| postTitle | Post title | "Why SaaS freemium playbooks don't work in AI..." |
| postSubtitle | Subtitle | "How to build an AI monetization strategy..." |
| postType | newsletter, podcast, thread, etc. | "newsletter" |
| audience | everyone / only_paid / founding | "only_paid" |
| postDate | Publication timestamp (ISO) | "2026-05-05T13:03:32.007Z" |
| canonicalUrl | Full post URL | "https://www.lennysnewsletter.com/p/..." |
| coverImage | Hero image URL | "https://substackcdn.com/.../cover.png" |
| reactions | Total reactions | 313 |
| restacks | Number of restacks | 10 |
| wordCount | Word count (if returned) | 2500 |
| podcastDuration | Episode duration (seconds) | 1820 |
| podcastUrl | Podcast audio URL | "https://.../episode.mp3" |
| videoUploadId | Video upload ID (if any) | "vid_abc" |
| sectionName | Substack section | "Premium" |

Per-publication fields (added when includePublicationMetadata: true)

| Field | Description | Example |
| --- | --- | --- |
| publicationId | Substack publication ID | 10845 |
| publicationName | Newsletter name | "Lenny's Newsletter" |
| subdomain | *.substack.com subdomain | "lennysnewsletter" |
| customDomain | Custom domain (if any) | "www.lennysnewsletter.com" |
| publicationUrl | Primary base URL | "https://www.lennysnewsletter.com" |
| logoUrl | Logo image | "https://substackcdn.com/.../logo.png" |
| coverPhotoUrl | Cover photo | "https://..." |
| publicationDescription | Hero/about text | "The #1 product / growth newsletter..." |
| language | ISO 639-1 | "en" |
| authorId | Author user ID | 22329494 |
| authorName | Display name / copyright owner | "Lenny Rachitsky" |
| category | Category slug used to discover | "technology" |
| publicationCreatedAt | Founded timestamp | "2020-09-08T..." |
| freeSubscriptionBenefits | Bullets shown to free subscribers | ["Weekly free post"] |
| paidSubscriptionBenefits | Bullets for paid tier | ["Full archive", "AMAs", ...] |
| foundingSubscriptionBenefits | Top-tier benefits | ["Direct access to Lenny"] |
| communityEnabled | Comments / Notes enabled | true |
| podcastFeedUrl | Substack RSS feed (audio) | "https://lennysnewsletter.substack.com/feed/podcast" |
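
When a dataset mixes both record types, a consumer will usually want to split on recordType and join posts to their publication via publicationId. The sketch below shows one way to do that on already-downloaded records. The function names are hypothetical, and it assumes post records carry publicationId, which the field list suggests is the case when includePublicationMetadata is true.

```python
from collections import defaultdict


def split_records(records):
    """Return (publications keyed by publicationId, posts grouped by publicationId)."""
    publications, posts_by_pub = {}, defaultdict(list)
    for record in records:
        kind = record.get("recordType")
        if kind == "publication":
            publications[record["publicationId"]] = record
        elif kind == "post":
            posts_by_pub[record.get("publicationId")].append(record)
    return publications, dict(posts_by_pub)


def reactions_per_publication(records):
    """Sum the reactions field over each publication's posts."""
    publications, posts_by_pub = split_records(records)
    totals = {}
    for pub_id, posts in posts_by_pub.items():
        # Fall back to the raw ID when no publication record was pushed.
        name = publications.get(pub_id, {}).get("publicationName", pub_id)
        totals[name] = sum(post.get("reactions") or 0 for post in posts)
    return totals
```

The same grouping works for any per-publication aggregate, such as restacks or the share of paid posts.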

🧪 Example inputs

1. Top 25 tech newsletters and their latest 50 posts

{
  "categorySlugs": ["technology"],
  "categoryRanking": "leaderboard",
  "maxPublicationsPerCategory": 25,
  "maxPostsPerPublication": 50
}

2. Free posts from top AI/crypto newsletters in the last month

{
  "categorySlugs": ["technology", "crypto", "finance"],
  "categoryRanking": "leaderboard",
  "maxPublicationsPerCategory": 50,
  "maxPostsPerPublication": 20,
  "audienceFilter": "free",
  "minPostDate": "2026-04-15",
  "keywordFilter": ["ai", "gpt", "llm", "crypto"]
}

3. One specific newsletter's full archive

{
  "newsletterUrls": ["lennysnewsletter"],
  "maxPostsPerPublication": 1000
}

4. Build a newsletter directory (publication records only)

{
  "categorySlugs": ["technology", "business", "finance", "crypto", "news", "science"],
  "categoryRanking": "leaderboard",
  "maxPublicationsPerCategory": 100,
  "maxPostsPerPublication": 0,
  "alsoPushPublicationRecord": true
}

5. Mix categories + direct URLs in one run

{
  "categorySlugs": ["technology"],
  "categoryRanking": "paid",
  "maxPublicationsPerCategory": 30,
  "newsletterUrls": ["www.semianalysis.com", "lennysnewsletter.substack.com"],
  "maxPostsPerPublication": 30,
  "audienceFilter": "all"
}

6. English-only top-paid tech newsletters with podcast feeds

{
  "categorySlugs": ["technology"],
  "categoryRanking": "paid",
  "language": "en",
  "alsoPushPublicationRecord": true,
  "maxPublicationsPerCategory": 100,
  "maxPostsPerPublication": 5
}
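
Any of the inputs above can also be submitted programmatically. The sketch below uses the official apify-client Python package (pip install apify-client) with example input 1. The ACTOR_ID value is a placeholder, not this actor's confirmed ID; copy the real one from the actor's Apify page. The helper name run_actor is hypothetical.

```python
# Placeholder: replace with the actor ID shown on the actor's Apify page.
ACTOR_ID = "<username>/<actor-name>"

# Example input 1: top 25 tech newsletters and their latest 50 posts.
RUN_INPUT = {
    "categorySlugs": ["technology"],
    "categoryRanking": "leaderboard",
    "maxPublicationsPerCategory": 25,
    "maxPostsPerPublication": 50,
}


def run_actor(actor_id, run_input, token):
    """Start the actor, wait for it to finish, and return all dataset items."""
    # Imported lazily so the module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires a valid API token in the APIFY_TOKEN environment variable):
#   import os
#   items = run_actor(ACTOR_ID, RUN_INPUT, os.environ["APIFY_TOKEN"])
#   posts = [item for item in items if item.get("recordType") == "post"]
```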

🧠 How it works

  1. Categories → GET https://substack.com/api/v1/categories returns every active category and subcategory with numeric IDs.
  2. Discovery → GET https://substack.com/api/v1/category/public/{id}/{leaderboard|paid|all}?page=N paginates 25 publications per page.
  3. Posts → GET https://{subdomain}.substack.com/api/v1/archive?sort=new&offset=O&limit=12 (or custom-domain equivalent) paginates 12 posts per page in reverse-chronological order.
  4. Direct URLs → if you pass a custom domain, the actor probes the /api/v1/archive endpoint on multiple host candidates to find a working base URL.
  5. Deduplication → publications are keyed by publicationId; cross-category enumeration never double-fetches the same newsletter.
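
The steps above can be sketched as a handful of pure helpers that build the request URLs, expand a newsletterUrls entry into host candidates, and deduplicate publications. This is an illustrative reading of the steps, not the actor's source. All function names are hypothetical, the first discovery page index and the order in which hosts are probed are assumptions, and no HTTP requests are made here.

```python
from math import ceil

PUBS_PER_PAGE = 25   # discovery page size stated in step 2
POSTS_PER_PAGE = 12  # archive page size stated in step 3


def discovery_urls(category_id, ranking, max_publications, first_page=0):
    """Discovery page URLs needed to cover max_publications (first_page is assumed)."""
    pages = ceil(max_publications / PUBS_PER_PAGE)
    base = f"https://substack.com/api/v1/category/public/{category_id}/{ranking}"
    return [f"{base}?page={first_page + i}" for i in range(pages)]


def archive_urls(base_url, max_posts):
    """Archive page URLs covering up to max_posts posts, newest first."""
    return [
        f"{base_url}/api/v1/archive?sort=new&offset={offset}&limit={POSTS_PER_PAGE}"
        for offset in range(0, max_posts, POSTS_PER_PAGE)
    ]


def host_candidates(value):
    """Hosts to probe for one newsletterUrls entry (assumed probing order)."""
    host = value.strip().lower()
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.split("/")[0]
    if "." not in host:                 # bare name, e.g. "lennysnewsletter"
        return [f"{host}.substack.com"]
    if host.endswith(".substack.com"):  # already a Substack subdomain
        return [host]
    bare = host[4:] if host.startswith("www.") else host
    # Custom domain: try the host as given, then its www / non-www twin.
    return [host, f"www.{bare}" if host == bare else bare]


def dedupe_publications(publications):
    """Keep the first record seen for each publicationId."""
    seen = {}
    for publication in publications:
        seen.setdefault(publication["publicationId"], publication)
    return list(seen.values())
```

Note that the last archive page may return more posts than the cap allows (for example, a cap of 30 needs three pages of 12), so the caller still has to truncate the combined result to maxPostsPerPublication.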

No authentication. No proxy. Substack publishes all of this data on its public web.


🛑 Limits & notes

  • The archive endpoint does not expose the full post body, and word count is not reliably returned (wordCount is populated only when Substack includes it). For the full post HTML/body, the per-post detail endpoint https://{base}/api/v1/posts/{slug} can be added; open an issue if you need it.
  • Subscriber counts are private — Substack only shows them to publication owners. The actor returns proxies (reactions, restacks, leaderboard rank).
  • Paid post bodies are paywalled — you only get the public preview unless the actor is run with a paid-subscriber cookie (out of scope here).
  • Rate limits — Substack does not publish explicit limits but throttles aggressive callers. The actor uses exponential backoff and realistic browser headers; in practice, runs of 25k+ posts complete cleanly.
  • Non-Substack platforms (Beehiiv, Ghost, ConvertKit, Stratechery's custom platform) will be skipped with a warning when passed in newsletterUrls.
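
For readers unfamiliar with the exponential backoff mentioned in the rate-limit note, here is a generic sketch of the technique. It is not the actor's implementation: the base delay, factor, cap, retry count, jitter, and both function names are illustrative assumptions.

```python
import random
import time


def backoff_delays(retries, base=1.0, factor=2.0, cap=60.0):
    """Delays in seconds before each retry: base, base*factor, ... capped at cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def call_with_backoff(fn, retries=5, sleep=time.sleep, jitter=random.random):
    """Call fn(); on any exception wait and retry, re-raising after the last attempt."""
    # One initial attempt plus `retries` retries; None marks the final attempt.
    for delay in backoff_delays(retries) + [None]:
        try:
            return fn()
        except Exception:
            if delay is None:
                raise
            # Random jitter (0 to 1 s) avoids retrying in lockstep with other callers.
            sleep(delay + jitter())
```

A real client would typically retry only on throttling or transient errors (for example HTTP 429 or 5xx) rather than on every exception; the broad catch here just keeps the sketch short.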

💰 Pricing

Monetized via pay-per-event on Apify — pay per post or publication record saved. Substack's public API is free.
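
A rough upper bound on run cost follows from the input caps. The sketch below assumes a flat rate of $2.00 per 1,000 saved records, taken from the listed "from $2.00 / 1,000 results" price; the actual per-event rates may differ, so treat the output as an estimate. Both function names are hypothetical.

```python
def estimate_records(publications, posts_per_publication, push_publication_records=False):
    """Upper bound on saved records for a run, given the input caps."""
    records = publications * posts_per_publication
    if push_publication_records:
        records += publications  # one extra record per publication
    return records


def estimate_cost_usd(records, price_per_1000=2.00):
    """Cost at an assumed flat price per 1,000 records."""
    return records * price_per_1000 / 1000


# Example input 1: 25 publications x 50 posts = 1,250 records.
example_1 = estimate_cost_usd(estimate_records(25, 50))

# Example input 4: 6 categories x 100 publications, no posts, publication records only.
example_4 = estimate_cost_usd(estimate_records(6 * 100, 0, push_publication_records=True))
```

Under these assumptions example 1 costs at most $2.50 and example 4 at most $1.20. Real runs can come in lower, because publications with fewer posts than the cap, active filters, and cross-category deduplication all reduce the record count.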


❓ FAQ

Can I get subscriber counts? No — Substack treats subscriber counts as private. The leaderboard rank, reactions, and restack count are the public proxy metrics.

Can I get the full post body? Not yet. The actor currently returns metadata plus the canonical URL. Full bodies are publicly available for free posts only; if you need the rendered HTML body, request the per-post detail endpoint as a feature addition.

Does this work with Beehiiv / Ghost? No — Substack-only. Beehiiv has its own public API (separate actor).

How is this different from existing Substack actors? Most existing actors require a list of URLs upfront. This one does discovery first (by category + leaderboard rank), which is the hard part for outreach / intelligence use cases.

Can I export to CSV / Excel? Yes — every Apify dataset can be exported in CSV, Excel, XML, JSONL or RSS straight from the run page.

Will Substack block this? The endpoints used are the same ones the Substack website itself calls. No authentication is required and no rate limit has been hit at typical use. Use respectfully.


🔗 Related actors

  • logiover/apple-podcasts-episode-scraper — feed the podcastFeedUrl from each publication into the podcast scraper for full episode lists
  • logiover/website-contact-scraper — enrich each publication's customDomain with author contact emails for sponsorship outreach
  • logiover/google-news-scraper — track press mentions of the newsletters you scraped
  • logiover/sitemap-to-url-crawler — crawl the custom domain of each publication for landing pages and partnerships

🆘 Support

Need a specific Substack-related feature (full post body, comments, subscriber recommendations graph)? Open an issue on the actor's Apify page.