Pricing

Pay per event

Substack Scraper - Posts, Authors, Reactions & Newsletters

Scrape Substack newsletters via the public archive API that Substack's own web app uses. Title, author, bio, audience (free/paid), reactions, comments, cover, podcast duration. HTTP only, $5/1K.

Developer

deusex machine

2 months ago

Last modified

Why use this Substack scraper

Substack has no advertised public API, but every publication on the platform exposes its archive at /api/v1/archive — the same endpoint Substack's own web app uses to render the post list. The data is canonical, complete and stable.

This actor uses that endpoint directly. That means:

✅ Decoupled from the UI: the endpoint is undocumented, but Substack's own web app depends on it, so it tends to stay stable when the page layout changes
✅ Complete fields: every datapoint Substack itself stores (audience, per-emoji reaction counts, podcast duration, word count when available)
✅ Full byline objects: author name, handle, bio, photo URL, primary publication, guest status
✅ No anti-bot measures encountered on the archive endpoint as of the latest tests
✅ Fast: 4–8 posts per second, single worker, 256 MB memory
✅ Cheap: $0.005 per post ($5 per 1,000)
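
To make "uses that endpoint directly" concrete, here is a minimal sketch that builds the URL for one page of a publication's archive. The /api/v1/archive path comes from the description above; the query parameter names (`sort`, `offset`, `limit`), the default page size and the helper name `build_archive_url` are assumptions for illustration, since the endpoint has no published contract.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_archive_url(publication_url: str, offset: int = 0, limit: int = 12,
                      sort: str = "new") -> str:
    """Return the archive URL for one page of a publication's posts.

    The query parameter names are assumed, not taken from official docs.
    """
    parts = urlsplit(publication_url)
    query = urlencode({"sort": sort, "offset": offset, "limit": limit})
    # Keep scheme and host, replace any existing path with the archive path.
    return urlunsplit((parts.scheme, parts.netloc, "/api/v1/archive", query, ""))


print(build_archive_url("https://www.lennysnewsletter.com", offset=24))
```

Custom domains and *.substack.com subdomains go through the same helper because only the scheme and host are kept.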

What this Substack scraper extracts

Per post

Field	Description	Example
`postId`	Substack internal post ID	`156234812`
`publicationId`	Internal publication ID	`10845`
`publicationUrl`	Canonical publication URL	`https://www.lennysnewsletter.com`
`title`	Post title	`Why SaaS freemium playbooks don't work in AI`
`subtitle`	Subhead/dek shown under the title	`How to build an AI monetization strategy that actually works`
`slug`	URL-safe post slug	`why-saas-freemium-playbooks-dont`
`canonicalUrl`	Full canonical URL of the post	`https://www.lennysnewsletter.com/p/why-saas-freemium...`
`postType`	`newsletter`, `podcast`, `thread`, `video`	`newsletter`
`audience`	Who can read: `everyone`, `only_paid`, `only_subscribers`, `founding`	`only_paid`
`postDate`	ISO 8601 publication date	`2026-05-05T13:03:32.007Z`
`description`	Short marketing description	`"How to build an AI monetization strategy..."`
`coverImage`	Hero/cover image URL	`https://substackcdn.com/image/...`
`reactions`	Emoji-keyed dict of reaction counts	`{"❤": 315}`
`reactionsTotal`	Sum across all emoji	`315`
`commentCount`	Total comments on the post	`6`
`wordCount`	Word count (when Substack stores it)	`2840`
`podcastDuration`	Seconds (for podcast posts only)	`2715`
`freeUnlockRequired`	True if reader must subscribe (even free) to unlock	`false`
`isAudio`	True if post has audio content (podcast/voiceover)	`false`
`sectionId`	Internal section ID if publication has sections	`8127`
`bylines`	Array of full author objects (see below)	`[{...}]`
`bylinesCount`	Number of authors on the post	`1`
`scrapedAt`	ISO 8601 UTC timestamp	`2026-05-18T20:55:14+00:00`

Per byline (author embedded in each post)

Field	Description	Example
`id`	Author user ID	`131847289`
`name`	Display name	`Vikas Kansal`
`handle`	Substack username	`vikaskansal`
`bio`	Self-written bio	`"Product lead for Google AI subscriptions..."`
`photoUrl`	Profile photo URL	`https://substack-post-media...`
`isGuest`	True if the author is a guest writer	`true`
`primaryPublicationName`	Their own newsletter name	`Vikas Kansal`
`primaryPublicationUrl`	Their own newsletter URL	`https://vikaskansal.substack.com`
`primaryPublicationId`	Their own publication ID	`8927213`
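
If you prefer typed objects over raw dictionaries, the two tables above could be mapped as in the sketch below. The class names `Post` and `Byline`, the function `parse_post` and the choice of which fields to keep are illustrative assumptions; only the record keys come from the tables.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Byline:
    id: int
    name: str
    handle: str
    is_guest: bool


@dataclass
class Post:
    post_id: int
    title: str
    audience: str
    post_date: datetime
    reactions_total: int
    word_count: Optional[int]  # None when Substack does not store it
    bylines: List[Byline] = field(default_factory=list)


def parse_post(record: dict) -> Post:
    """Convert one raw dataset record into a typed Post."""
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also accepts the value on Python versions before 3.11.
    post_date = datetime.fromisoformat(record["postDate"].replace("Z", "+00:00"))
    bylines = [
        Byline(b["id"], b["name"], b["handle"], bool(b.get("isGuest", False)))
        for b in record.get("bylines") or []
    ]
    return Post(
        post_id=record["postId"],
        title=record["title"],
        audience=record["audience"],
        post_date=post_date,
        reactions_total=record.get("reactionsTotal") or 0,
        word_count=record.get("wordCount"),
        bylines=bylines,
    )
```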

Optional: global category catalog

Set includeCategories: true and the actor emits one extra helper record containing all 32 Substack categories (id, name, canonical name, rank). Useful for building a category tree in your application.

Use cases for this Substack data API

📨 Newsletter ad networks & sponsorship platforms

Tools like Beehiiv's ad marketplace, Swapstack, Letterhead and Hypefury sponsorship need fresh metadata for every newsletter they list. Schedule this actor weekly per publication URL to get the latest 50 posts, audience type (paid vs free) and reactions, then use them to qualify ad inventory.

💰 B2B SaaS sales prospecting Substack writers

Substack writers are heavy buyers of newsletter automation, video tools, email design tools, course platforms, podcast hosting and CRM software. Build a sales target list by scraping the top 200 Substack publications in your niche and enriching each byline's bio for fit signals ("ex-Google", "Y Combinator", "PhD"...).
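
The bio enrichment step can be as simple as keyword matching over each byline's `bio`. The sketch below is one possible approach; the keyword list and the function names `matching_signals` and `find_prospects` are hypothetical, and substring matching will produce some false positives.

```python
FIT_SIGNALS = ("ex-google", "y combinator", "phd")  # hypothetical keyword list


def matching_signals(bio, signals=FIT_SIGNALS):
    """Return the signals found in a bio (case-insensitive substring match)."""
    text = (bio or "").lower()
    return [signal for signal in signals if signal in text]


def find_prospects(posts, signals=FIT_SIGNALS):
    """Return one entry per author handle whose bio contains a signal."""
    prospects = {}
    for post in posts:
        for byline in post.get("bylines") or []:
            handle = byline.get("handle")
            if not handle or handle in prospects:
                continue  # skip anonymous bylines and authors already seen
            hits = matching_signals(byline.get("bio"), signals)
            if hits:
                prospects[handle] = {"name": byline.get("name"), "signals": hits}
    return prospects
```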

🤖 LLM training data and RAG pipelines

Substack hosts a large body of long-form, original, well-edited writing. Use the post metadata and canonical URLs from technology, finance, science or culture publications as the index of a corpus for fine-tuning, retrieval-augmented generation, or topical agents. Post bodies are not part of the v1 output, so the full text has to be fetched separately.

📊 Creator economy analytics

How does post frequency correlate with reactions? Which audience type (free vs paid) gets the most comments? Pull thousands of posts across hundreds of publications and answer those questions with real data.
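
As an example of the second question, the sketch below averages `commentCount` per `audience` value using only the standard library. The function name `mean_comments_by_audience` is hypothetical, and treating a missing comment count as zero is an assumption.

```python
from collections import defaultdict
from statistics import mean


def mean_comments_by_audience(posts):
    """Return the average comment count for each audience value."""
    buckets = defaultdict(list)
    for post in posts:
        # A missing or null commentCount is counted as zero (assumption).
        buckets[post["audience"]].append(post.get("commentCount") or 0)
    return {audience: mean(counts) for audience, counts in buckets.items()}
```

Feed it the post records from one or more datasets; a larger sample across many publications gives more meaningful averages.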

✍️ Competitive content marketing

Marketing teams at Notion, Hubspot, Linear, Vercel and Stripe monitor what their target audience reads on Substack. Schedule this scraper to send daily digests of new posts from competitor / industry publications to a Slack channel via Apify integrations.

📰 Journalism trend tracking

Want to know what every tech newsletter said about the OpenAI o5 launch? Pull the latest 20 posts from the top 50 Substack tech writers and run sentiment and keyword extraction on the titles, subtitles and descriptions (post bodies are not returned in v1).

🎯 Investor / VC research

VCs increasingly track Substack engagement as a leading indicator of founder reputation, market thesis traction and sector heat. Pull the top tech / finance / AI publications and watch reaction velocity.

How to use this Substack scraper

Mode 1: Publication URLs (core)

Pass one or more Substack publication URLs. The actor paginates through each publication's archive using the public /api/v1/archive endpoint and returns one record per post.

{
  "publicationUrls": [
    "https://www.lennysnewsletter.com",
    "https://stratechery.com",
    "https://www.platformer.news"
  ],
  "maxPostsPerPublication": 100,
  "maxTotalPosts": 500,
  "sortOrder": "new",
  "audienceFilter": "all"
}
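
The pagination described above can be sketched as an offset loop with a per-publication cap. This is not the actor's actual source: `iter_archive`, the `fetch_page(offset, limit)` callback and the page size of 12 are assumptions, and the 0.3 s default mirrors the 300 ms pause mentioned in the FAQ.

```python
import time
from typing import Callable, Iterator


def iter_archive(fetch_page: Callable[[int, int], list], max_posts: int,
                 page_size: int = 12, pause_seconds: float = 0.3) -> Iterator[dict]:
    """Yield posts page by page until the archive ends or max_posts is reached.

    fetch_page(offset, limit) is supplied by the caller and returns a list of posts.
    """
    offset = 0
    yielded = 0
    while yielded < max_posts:
        page = fetch_page(offset, page_size)
        if not page:
            break  # an empty page means the archive is exhausted
        for post in page:
            yield post
            yielded += 1
            if yielded >= max_posts:
                return
        offset += len(page)
        time.sleep(pause_seconds)  # stay polite between paginated requests
```

Injecting `fetch_page` keeps the loop testable without network access; in real use it would wrap an HTTP GET against the archive endpoint.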

Mode 2: Audience filter

Filter results by audience tier:

"all" (default) — every post regardless of paywall
"free" — only audience: "everyone" posts (readable without subscribing)
"paid" — only audience: "only_paid", "only_subscribers", "founding" posts (paywalled)

Mode 3: Categories catalog

Set includeCategories: true to also receive the global list of 32 Substack categories (Culture, Technology, Business, U.S. Politics, Finance, AI, Crypto, etc) with internal IDs, ranks and parent relationships. Useful if you're building a Substack discovery interface in your own product.

{
  "publicationUrls": ["https://www.lennysnewsletter.com"],
  "includeCategories": true,
  "maxPostsPerPublication": 20
}
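
With includeCategories enabled, the dataset mixes post records with the extra helper record, so downstream code should separate them. The sketch below relies on the `type` field shown in the output example; the source does not state the helper record's `type` value, so anything that is not `"post"` is treated as a helper here.

```python
def split_records(items):
    """Separate post records from any other record in a dataset."""
    posts = [item for item in items if item.get("type") == "post"]
    # The helper record's exact type value is not documented (assumption).
    helpers = [item for item in items if item.get("type") != "post"]
    return posts, helpers
```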

Step-by-step tutorial — your first Substack run in 2 minutes

1. Click "Try for free" on this actor's Apify Store page. Apify gives every new user $5 in credit.
2. Find a Substack publication you want to scrape. Any URL like https://{anything}.substack.com or any custom domain (stratechery.com, lennysnewsletter.com) works.
3. Paste the example input:

{
  "publicationUrls": ["https://www.lennysnewsletter.com"],
  "maxPostsPerPublication": 20,
  "audienceFilter": "all"
}

4. Click "Start". The actor pages through the publication's archive and pushes one record per post.
5. Download your dataset as JSON, CSV, Excel, RSS or HTML.

You'll have 20 fully-structured Substack posts in under 10 seconds.

Performance and cost

HTTP only, no browser, no proxy. Uses curl_cffi with Chrome 120 impersonation against the publication's native API endpoints.
4–8 posts per second sustained throughput.
No anti-bot on the archive endpoint as of the latest tests (Substack designed this endpoint to power their own web app — it's intentionally fast and unauthenticated for free archive access).
Pricing: $0.005 per post + $0.00005 per actor start. No subscription, no commitment.

Pricing scenarios

Workload	Posts	Cost
Try the actor on Lenny's	20	$0.10
Spending the full $5 Apify free credit	~1,000	$5.00
Top 10 tech publications × 50 latest posts	500	$2.50
Daily refresh of 30 newsletters (1 month)	~9,000	$45.00
Bulk archive snapshot — 100 pubs × 500 posts each	50,000	$250.00
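
The scenarios above follow from the two prices in the previous section. A small estimator, assuming one start fee per run (the function name `estimate_cost` is illustrative):

```python
from decimal import Decimal

PRICE_PER_POST = Decimal("0.005")     # from the pricing line above
PRICE_PER_START = Decimal("0.00005")  # charged once per actor start


def estimate_cost(posts: int, runs: int = 1) -> Decimal:
    """Return the estimated cost in USD for a number of posts and runs."""
    return posts * PRICE_PER_POST + runs * PRICE_PER_START


print(estimate_cost(500))        # 10 publications x 50 posts, one run
print(estimate_cost(9000, 30))   # about 9,000 posts over 30 daily runs
```

The table rounds to whole cents, which is why the start fee does not show up there.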

Output example (single Substack post)

{
  "type": "post",
  "postId": 156234812,
  "publicationId": 10845,
  "publicationUrl": "https://www.lennysnewsletter.com",
  "title": "Why SaaS freemium playbooks don't work in AI, and what to do instead",
  "subtitle": "How to build an AI monetization strategy that actually works",
  "slug": "why-saas-freemium-playbooks-dont",
  "canonicalUrl": "https://www.lennysnewsletter.com/p/why-saas-freemium-playbooks-dont",
  "postType": "newsletter",
  "audience": "only_paid",
  "postDate": "2026-05-05T13:03:32.007Z",
  "description": "How to build an AI monetization strategy that actually works",
  "coverImage": "https://substackcdn.com/image/...",
  "reactions": {"❤": 315},
  "reactionsTotal": 315,
  "commentCount": 6,
  "wordCount": null,
  "podcastDuration": null,
  "freeUnlockRequired": false,
  "isAudio": false,
  "sectionId": null,
  "bylines": [
    {
      "id": 131847289,
      "name": "Vikas Kansal",
      "handle": "vikaskansal",
      "bio": "Product lead for Google AI subscriptions...",
      "photoUrl": "https://substack-post-media.s3.amazonaws.com/...",
      "isGuest": true,
      "primaryPublicationName": "Vikas Kansal",
      "primaryPublicationUrl": "https://vikaskansal.substack.com",
      "primaryPublicationId": 8927213
    }
  ],
  "bylinesCount": 1,
  "scrapedAt": "2026-05-18T20:55:14+00:00"
}
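
Because `bylines` is a nested array, spreadsheet-style exports are easier to work with after flattening. The sketch below keeps only the first author per post, which is a simplification; the column names `authorName` and `authorHandle` and both function names are hypothetical.

```python
import csv
import io


def flatten_post(post):
    """Return a flat dict with a few post fields and the first author."""
    first = (post.get("bylines") or [{}])[0]  # empty dict when no authors
    return {
        "postId": post.get("postId"),
        "title": post.get("title"),
        "audience": post.get("audience"),
        "postDate": post.get("postDate"),
        "reactionsTotal": post.get("reactionsTotal"),
        "commentCount": post.get("commentCount"),
        "authorName": first.get("name"),
        "authorHandle": first.get("handle"),
    }


def to_csv(posts):
    """Serialize flattened posts to a CSV string with a header row."""
    rows = [flatten_post(post) for post in posts]
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()
```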

How this Substack scraper compares

Approach	Pros	Cons
This actor	Substack's own archive endpoint, full bylines, $5/1K, no proxy	No global discovery: the user provides publication URLs
Substack RSS feeds	Free	Sparse fields, no audience info, no reactions, no byline IDs
Newsletter Stack DBs (Letterhead, Hypefury)	Curated	Paid subscriptions $50–$300/mo, smaller coverage
BeehiivAds inventory data	Real-time	Beehiiv-only, no Substack overlap
Manual RSS-to-CSV scripts	Free	Brittle, no paid-post metadata, no API stability
Hiring a freelancer	Custom	$200–$800 one-off, not maintained

How to call this Substack scraper from your code

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("makework36/substack-scraper").call(run_input={
    "publicationUrls": ["https://www.lennysnewsletter.com"],
    "maxPostsPerPublication": 50,
    "audienceFilter": "all",
})
for p in client.dataset(run["defaultDatasetId"]).iterate_items():
    if p["type"] == "post":
        print(p["title"], p["audience"], p["reactionsTotal"], p["commentCount"])

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('makework36/substack-scraper').call({
  publicationUrls: ['https://stratechery.com'],
  maxPostsPerPublication: 30,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(p => console.log(p.title, p.bylines[0]?.name, p.reactionsTotal));

cURL (synchronous run)

curl -X POST "https://api.apify.com/v2/acts/makework36~substack-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"publicationUrls":["https://www.lennysnewsletter.com"],"maxPostsPerPublication":10}'

Frequently Asked Questions about scraping Substack

Is scraping Substack legal?

The /api/v1/archive endpoint is the same one Substack's own web client uses to render the archive page on every publication. It is unauthenticated by design — Substack actively wants the archive to be indexable. This actor consumes that same public endpoint at human-like rates. You are responsible for how you store and redistribute the data; respect each writer's copyright on post content, and the Substack Terms of Service for downstream use.

Why doesn't the scraper return post body text?

The archive endpoint returns post metadata only — title, subtitle, description, audience, byline. Full body content lives at the post detail endpoint ({pub}/p/{slug}) and is paywalled for only_paid and only_subscribers posts. Extracting paid bodies is not part of v1 (and would not pass Substack's TOS without subscriber credentials).

Will my account get banned?

The actor runs unauthenticated against public endpoints, so no Substack account is involved and there is nothing to ban. We have not observed any rate limiting or IP blocks on the archive endpoint as of the latest tests. The actor inserts a 300 ms pause between paginated requests to remain polite.

How current is the data?

Live — every run hits Substack directly and returns posts as listed at request time. There is no cache.

Can I scrape every Substack publication that exists?

Substack hosts hundreds of thousands of publications. There is no public "top publications" discovery endpoint. To build a comprehensive dataset, supply known URLs (Substack writers usually advertise their newsletters on Twitter/X, LinkedIn or their own websites), or parse Substack's sitemap-tt.xml.gz (millions of pubs — uses lots of credits).

How do I find publication URLs?

Visit https://substack.com/explore and click any newsletter
Look for *.substack.com subdomains or custom domains on Twitter/X bios of writers in your niche
Use the includeCategories: true option to get the 32-category catalog, then manually browse top publications

Can I filter by paid vs free posts?

Yes — set audienceFilter to "free" (only public posts) or "paid" (paywalled posts). The actor still extracts metadata for paid posts; only the body is gated.
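
If you run with "all" and want to split the results afterwards, the same rule can be applied client-side. The mapping below follows the audience values listed under Mode 2; the function name `matches_audience_filter` is hypothetical, and this is a sketch of the rule rather than the actor's own code.

```python
PAID_AUDIENCES = {"only_paid", "only_subscribers", "founding"}


def matches_audience_filter(post, audience_filter="all"):
    """Return True if a post record passes the given audience filter."""
    audience = post.get("audience")
    if audience_filter == "all":
        return True
    if audience_filter == "free":
        return audience == "everyone"
    if audience_filter == "paid":
        return audience in PAID_AUDIENCES
    raise ValueError(f"unknown audience filter: {audience_filter!r}")
```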

Does the actor support podcasts?

Yes. Substack hosts thousands of podcast publications. Posts of postType: "podcast" include podcastDuration in seconds. Full audio is on the canonical URL and not part of the JSON output.
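
Since `podcastDuration` is in seconds, a small helper makes it readable. The function name `format_duration` is illustrative; it returns None for non-podcast posts where the field is null.

```python
def format_duration(seconds):
    """Format a duration in seconds as H:MM:SS, or None if missing."""
    if seconds is None:
        return None
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_duration(2715))  # 0:45:15
```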

Can I schedule this scraper?

Yes. Use Apify's built-in scheduler to refresh your dataset daily, weekly or monthly. Push results to Google Sheets, BigQuery, Postgres or Slack via Apify integrations.
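
Scheduled runs return overlapping posts, because each run re-reads the latest part of the archive. One way to forward only new items is to track `postId` values between runs, as in this sketch. The function name `new_posts` is hypothetical, and how you persist `seen_ids` (a file, a database, a key-value store) is left open.

```python
def new_posts(items, seen_ids):
    """Return items whose postId is not in seen_ids, and record them."""
    fresh = []
    for item in items:
        post_id = item.get("postId")
        if post_id is None or post_id in seen_ids:
            continue  # skip helper records and posts already processed
        seen_ids.add(post_id)
        fresh.append(item)
    return fresh
```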

Will the reactions count include all emoji?

Yes — the reactions field is a dict like {"❤": 315, "🔥": 22, "👍": 11}. The reactionsTotal field sums them.
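
A quick consistency check on a record is to recompute the total yourself. The helper name `reactions_total` is illustrative; it treats a missing or null `reactions` value as zero reactions.

```python
def reactions_total(reactions):
    """Sum the counts in an emoji-keyed reactions dict."""
    return sum((reactions or {}).values())


print(reactions_total({"❤": 315, "🔥": 22, "👍": 11}))  # 348
```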

How accurate is the byline data?

Bylines come straight from Substack's user database. The bio field is the self-written bio at the moment of post publication (which Substack stores per-post, not as a live join). primaryPublicationName and primaryPublicationUrl may have changed if the author migrated newsletters since publishing.

Is there a free trial?

Yes — Apify gives every new user $5 in platform credit, enough to extract ~1,000 Substack posts with this actor.

Can I get subscriber counts per publication?

Substack does not expose subscriber counts publicly, and the archive endpoint doesn't include them. Some publications display "X,000 subscribers" in their hero copy; extracting that requires HTML scraping, which is planned for the v1.1 release.

What about Substack Notes (the Twitter-like feed)?

Notes are a separate platform with its own API. This actor covers posts only. A Notes-specific scraper may come in v2.

Can I use this for academic / research projects?

Absolutely. Many social-science researchers and digital-humanities labs use Substack data for studying long-form journalism, polarization, paid content dynamics and creator economy trends. Cite the actor in your bibliography.

🔗 Other actors by makework36

Building content marketing, sales prospecting, or creator-economy tooling? Combine with these:

Shopify Products Scraper — full Shopify catalog: title, SKU, price, variants, inventory
Goodreads Scraper — books, authors, ratings, ISBN
IndiaMART Suppliers Scraper — India B2B suppliers
Email Finder Scraper — verified business emails
Reddit SaaS Leads Scraper — startup pain points & buyers
Lovable Sites Scraper — enumerate AI-builder apps
Trustpilot Reviews Scraper — customer ratings

See all actors by makework36 on the Apify Store.

Roadmap

v1.1: subscriber count extraction from publication homepage HTML.
v1.2: post body extraction for audience: "everyone" posts (free posts only).
v1.3: comments thread extraction.
v2: Substack Notes scraper as a separate actor.

Disclaimer

This actor consumes the /api/v1/archive endpoint that every Substack publication exposes by design — the same endpoint that powers the publication's own web client. You are responsible for respecting each writer's copyright on the post content, Substack's Terms of Service, and applicable data protection regulations (GDPR for EU subjects, CCPA for California subjects) when storing, transforming or redistributing the data.

🙏 Ran this Substack scraper successfully? Leaving a review helps the Apify algorithm surface this actor to other newsletter operators and creator-economy teams. Much appreciated.
