Substack Posts & Creator Scraper
Pricing: from $2.00 / 1,000 posts
Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.
Developer: Daniel Dimitrov (Maintained by Community)
What does Substack Scraper do?
Substack Scraper extracts post content, engagement metrics, and author data from any Substack publication or individual post URL. It accesses Substack's internal JSON structure directly — no headless browser needed — giving you clean, structured data in seconds rather than minutes.
With a single publication URL, the scraper automatically paginates through the entire archive and returns every post with its title, author, publish date, total reactions, comment count, paywall status, and full body content in your choice of Markdown or HTML.
Why scrape Substack?
Substack has become the home for thousands of high-quality newsletters and independent journalists. The data available on public posts is invaluable for:
- Competitor and trend analysis — track what content performs best in your niche, monitor publishing frequency and engagement patterns across publications
- Creator and influencer research — build lists of authors with engagement benchmarks for outreach and partnership decisions
- Newsletter research — study the structure, cadence, and topics of top-performing newsletters before launching your own
- Content backup — archive your own Substack posts with engagement history before platform changes
- AI and NLP training data — extract clean, structured long-form text with rich metadata at scale
- PR and media monitoring — track journalist activity and media coverage across Substack publications
If you would like more inspiration on how scraping Substack could help your business, check out our industry pages.
Before you start scraping Substack
You need a free Apify account to run this Actor. The free plan includes $5 in monthly credits — enough to scrape several thousand posts. No credit card required.
For large-scale jobs (100,000+ posts), the Actor automatically uses Apify datacenter proxies to rotate IPs and avoid Substack rate limiting.
How to scrape Substack
- Open the Actor in Apify Console and click Try for free
- Enter one or more Substack publication URLs (e.g., https://www.astralcodexten.com/) or direct post URLs
- Set Max Posts Per Publication (default: 100) and choose your preferred Output Format
- Click Start and wait for the run to complete
- Download your data from the Dataset tab — available in JSON, CSV, Excel, and HTML
Substack Scraper input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ | — | Substack publication homepages or individual post URLs |
| maxItems | Number | ❌ | 100 | Max posts to scrape per publication |
| scrapeFormat | String | ❌ | "markdown" | Post body format: "markdown", "html", or "none" (metadata only) |
| maxRequestRetries | Number | ❌ | 3 | Retry attempts before a request is abandoned |
| maxSessionRotations | Number | ❌ | 10 | Session rotations per request before giving up |
| webhookUrl | String | ❌ | — | URL to notify when the run finishes (success or failure). Useful for Zapier, Make, and n8n integrations |
Input examples

Scrape 50 recent posts from a publication

```json
{
  "startUrls": [{ "url": "https://www.astralcodexten.com/" }],
  "maxItems": 50,
  "scrapeFormat": "markdown"
}
```

Scrape specific posts

```json
{
  "startUrls": [{ "url": "https://www.astralcodexten.com/p/seiu-delenda-est" }],
  "scrapeFormat": "html"
}
```

Metadata only — fastest option, no body content

```json
{
  "startUrls": [
    { "url": "https://www.astralcodexten.com/" },
    { "url": "https://platformer.news/" }
  ],
  "maxItems": 500,
  "scrapeFormat": "none"
}
```
Substack Scraper output
Each scraped post is stored as a single JSON record in the Actor's dataset:
```json
{
  "url": "https://www.astralcodexten.com/p/seiu-delenda-est",
  "publicationName": "Astral Codex Ten",
  "authorName": "Scott Alexander",
  "title": "SEIU Delenda Est",
  "subtitle": "",
  "postDate": "2024-01-15T10:00:00.000Z",
  "likes": 551,
  "comments": 655,
  "isPaywalled": false,
  "body": "# SEIU Delenda Est\n\nPost content in markdown..."
}
```
| Field | Type | Description |
|---|---|---|
| url | String | Canonical post URL |
| publicationName | String | Name of the Substack publication |
| authorName | String | Author's display name |
| title | String | Post title |
| subtitle | String | Post subtitle (if present) |
| postDate | String | ISO 8601 publish timestamp |
| likes | Number | Total reactions across all 8 reaction types (❤ 👍 🎉 🔥 😂 😮 😢 😡) |
| comments | Number | Number of comments |
| isPaywalled | Boolean | true if the post requires a paid subscription to read in full |
| body | String \| null | Post content in the requested format; null when scrapeFormat is "none" |
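To illustrate consuming these fields downstream, here is a minimal Python sketch that ranks downloaded records by engagement. The sample records are invented for the example and only use fields documented above:

```python
# Invented sample records shaped like the Actor's output fields
posts = [
    {"title": "Post A", "likes": 551, "comments": 655, "isPaywalled": False},
    {"title": "Post B", "likes": 120, "comments": 30, "isPaywalled": True},
]

def engagement(post):
    """Total reactions plus comment count for a single post record."""
    return post["likes"] + post["comments"]

# Rank posts by total engagement, most engaged first
ranked = sorted(posts, key=engagement, reverse=True)

# Keep only posts that are free to read in full
free_posts = [p for p in ranked if not p["isPaywalled"]]

print(ranked[0]["title"])  # most engaged post
print(len(free_posts))     # how many are free to read
```

The same pattern works on real dataset items fetched via the Apify client, since each record carries the fields from the table above.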
How much will it cost to scrape Substack?
This Actor uses Pay Per Result pricing — you are charged per post scraped, not per compute time.
Apify gives you $5 in free usage credits every month on the Apify Free plan — enough to scrape around 2,500 Substack posts per month at no cost.
If you need Substack data regularly, an Apify subscription is the better deal. The $49/month Personal plan covers up to 25,000 posts per month, and the $499/month Team plan covers 250,000+.
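Because Pay Per Result pricing is linear in result count, estimating a job's cost is simple arithmetic. A quick helper, using the $2.00 per 1,000 posts rate stated above:

```python
PRICE_PER_1000 = 2.00  # USD, from the Actor's Pay Per Result pricing

def estimate_cost(num_posts: int) -> float:
    """Estimated USD cost of a run that returns num_posts results."""
    return num_posts / 1000 * PRICE_PER_1000

print(estimate_cost(2_500))   # 5.0  -> covered by the free monthly credit
print(estimate_cost(24_500))  # 49.0 -> roughly the Personal plan's allowance
```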
What are the limitations of Substack Scraper?
- Paywalled content — only free-preview text is available for paid-only posts; full body requires a subscriber session, which is not supported
- Rate limiting — Substack may throttle aggressive scraping; the Actor uses automatic IP rotation via datacenter proxies to mitigate this
- Frontend changes — if Substack modifies their internal page structure, the Actor may need an update
- Custom domains — most custom-domain Substacks work correctly; a small number with non-standard configurations may not
- Comment content — only the comment count is extracted; individual comment text is not supported
Is it legal to scrape Substack?
This Actor only accesses publicly available posts and metadata. Paywalled content is never extracted. Web scraping of publicly accessible data is generally considered lawful in most jurisdictions for research, journalism, and personal use.
Note that personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.
You are responsible for complying with Substack's Terms of Service and applicable laws in your jurisdiction. We also recommend that you read our blog post: is web scraping legal?
Scrape Substack with the Apify API
You can trigger this Actor and download results programmatically using the Apify API. See the API tab on this Actor's page for ready-to-use code examples in JavaScript and Python, or check out the Apify API reference for full details.
Substack Scraper integrations
This Actor works with any platform that supports webhooks or the Apify API:
- Zapier / Make / n8n — use the webhookUrl input field to receive a POST notification when the run finishes, then pass the actorRunId to the Apify API to fetch your results
- Apify Integrations tab — configure webhooks, scheduled runs, and connections to Google Sheets, Slack, Airtable, and more directly in the Apify Console without writing code
- REST API — start a run, poll for completion, and download the dataset via the Apify API v2
API example — Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("sleek_waveform/substack-creator-scraper").call(run_input={
    "startUrls": [{"url": "https://www.astralcodexten.com/"}],
    "maxItems": 100,
    "scrapeFormat": "markdown",
})

posts = client.dataset(run["defaultDatasetId"]).list_items().items
for post in posts:
    print(f"{post['postDate'][:10]} | {post['likes']} likes | {post['title']}")
```
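If you prefer plain HTTP over the client library, here is a minimal stdlib sketch using the Apify API v2 synchronous run endpoint, which starts the run, waits, and returns dataset items in one call. The endpoint name and the convention of writing `~` in place of `/` in the actor ID follow the Apify API v2 docs; verify both against the current API reference before relying on them:

```python
import json
import urllib.parse
import urllib.request

APIFY_TOKEN = "YOUR_APIFY_TOKEN"
ACTOR_ID = "sleek_waveform~substack-creator-scraper"  # "/" is written as "~" in API URLs

def run_sync_url(actor_id: str, token: str) -> str:
    """Endpoint that starts a run, waits for it, and returns the dataset items."""
    query = urllib.parse.urlencode({"token": token, "format": "json"})
    return f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items?{query}"

def scrape(start_url: str, max_items: int = 100):
    """POST the run input and return the parsed JSON dataset items."""
    payload = json.dumps({
        "startUrls": [{"url": start_url}],
        "maxItems": max_items,
        "scrapeFormat": "markdown",
    }).encode()
    req = urllib.request.Request(
        run_sync_url(ACTOR_ID, APIFY_TOKEN),
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)

# Requires network access and a valid token:
# posts = scrape("https://www.astralcodexten.com/", max_items=20)
```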
FAQ about Substack Scraper
Does this Actor require a Substack account or login?
No. It only extracts publicly available posts and metadata — no login credentials, session cookies, or Substack API key are required.
Can I scrape paid/paywalled posts?
Only the free-preview portion of paywalled posts is accessible. Full body content behind a paid subscription wall is not extracted. The isPaywalled field tells you whether a post is behind a paywall.
How do I scrape the full archive of a newsletter?
Set maxItems to a high number (e.g., 1000) and point startUrls to the publication homepage (e.g., https://platformer.news/). The scraper auto-paginates through the entire archive until it hits maxItems or exhausts all posts.
Can I scrape multiple publications at once?
Yes. Add multiple URLs to startUrls. Each publication is scraped independently, and all results land in the same dataset with publicationName as a filter column.
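Splitting the combined dataset back out by publication takes only a few lines. A sketch, with sample records invented for the example:

```python
from collections import defaultdict

# Invented sample records shaped like the Actor's output
posts = [
    {"publicationName": "Astral Codex Ten", "title": "Post A"},
    {"publicationName": "Platformer", "title": "Post B"},
    {"publicationName": "Astral Codex Ten", "title": "Post C"},
]

# Group records under their publication name
by_publication = defaultdict(list)
for post in posts:
    by_publication[post["publicationName"]].append(post)

for name, items in sorted(by_publication.items()):
    print(f"{name}: {len(items)} posts")
```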
What format does the body field use?
Your choice: "markdown" (clean prose, good for LLMs and vector databases), "html" (preserves formatting for display), or "none" (metadata only — fastest option for engagement analysis without needing body text).
How many posts can I scrape on the free plan?
With Apify's $5 monthly free credit, approximately 2,500 posts per month at no cost.
Does it scrape reader comments?
Comment count is extracted (comments field), but individual comment text is not — Substack serves comments via a separate authenticated endpoint.
How do I monitor new posts from a publication weekly?
Set up a scheduled run on Apify: Actor page → Schedule → weekly. Filter for posts newer than a specific date by combining maxItems: 20 (which always returns the most recent) with the postDate field in your downstream processing.
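The downstream date filter can be sketched like this. The sample records are invented, and the parsing assumes the ISO 8601 postDate format shown in the output table:

```python
from datetime import datetime, timedelta, timezone

# Invented sample records; postDate is ISO 8601 as in the Actor's output
posts = [
    {"title": "Old post", "postDate": "2024-01-15T10:00:00.000Z"},
    {"title": "New post", "postDate": "2099-01-01T10:00:00.000Z"},
]

cutoff = datetime.now(timezone.utc) - timedelta(days=7)

def published_after(post, cutoff):
    """Parse the ISO timestamp and compare it to a cutoff datetime."""
    # Replace the Z suffix so fromisoformat accepts it on Python < 3.11
    published = datetime.fromisoformat(post["postDate"].replace("Z", "+00:00"))
    return published > cutoff

new_posts = [p for p in posts if published_after(p, cutoff)]
print([p["title"] for p in new_posts])  # only posts from the last 7 days
```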
Can I use this for LLM training data?
Yes. The "markdown" output format produces clean, boilerplate-free prose ideal for LLM fine-tuning and RAG pipelines. Pair with the Website to Markdown Scraper to build multi-source AI training datasets.
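As a starting point for a RAG pipeline, here is a naive paragraph-packing chunker for the markdown bodies. This is a sketch only; production pipelines usually use tokenizer-aware splitters:

```python
def chunk_markdown(body: str, max_chars: int = 1000):
    """Naive chunker: split on blank lines, then pack paragraphs up to max_chars."""
    chunks, current = [], ""
    for para in body.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# Synthetic markdown body: a heading plus five long paragraphs
sample = "# Title\n\n" + "\n\n".join(f"Paragraph {i}. " + "x" * 300 for i in range(5))
print(len(chunk_markdown(sample, max_chars=700)))  # prints 3
```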
High-value Substack publications to scrape
| Category | Example publications |
|---|---|
| AI / Tech | Stratechery, Import AI, The Batch, AI Supremacy |
| Finance | The Diff, Money Stuff (Bloomberg), Odd Lots |
| Media / Politics | Semafor, Platformer, The Atlantic |
| Growth / Startups | Lenny's Newsletter, The Generalist, Not Boring |
| Newsletter operators | The Rebooting, Inbox Collective |
Other sleek_waveform Actors you might like
- Website to Markdown Scraper — crawl any website and extract clean Markdown for RAG pipelines. Pairs with Substack Scraper to build multi-source LLM datasets.
- Threads Profile & Post Scraper — scrape Threads posts, hashtags, and engagement metrics. Many Substack writers cross-post to Threads — combine both scrapers for a full picture of a creator's reach.
- YouTube Trend Scraper — track trending YouTube videos by keyword. Compare Substack newsletter topics against what's gaining traction on YouTube for cross-platform content strategy.
Found this Actor useful? Leave a review on the Apify Store — it takes 30 seconds and helps other developers discover it.