Substack Posts & Creator Scraper

Pricing

from $2.00 / 1,000 posts

Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.

Rating

0.0 (0)

Developer

Daniel Dimitrov

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 11
  • Monthly active users: 3
  • Last modified: 11 hours ago

What does Substack Scraper do?

Substack Scraper extracts post content, engagement metrics, and author data from any Substack publication or individual post URL. It accesses Substack's internal JSON structure directly — no headless browser needed — giving you clean, structured data in seconds rather than minutes.

With a single publication URL, the scraper automatically paginates through the entire archive and returns every post with its title, author, publish date, total reactions, comment count, paywall status, and full body content in your choice of Markdown or HTML.

Why scrape Substack?

Substack has become the home for thousands of high-quality newsletters and independent journalists. The data available on public posts is invaluable for:

  • Competitor and trend analysis — track what content performs best in your niche, monitor publishing frequency and engagement patterns across publications
  • Creator and influencer research — build lists of authors with engagement benchmarks for outreach and partnership decisions
  • Newsletter research — study the structure, cadence, and topics of top-performing newsletters before launching your own
  • Content backup — archive your own Substack posts with engagement history before platform changes
  • AI and NLP training data — extract clean, structured long-form text with rich metadata at scale
  • PR and media monitoring — track journalist activity and media coverage across Substack publications

If you would like more inspiration on how scraping Substack could help your business, check out our industry pages.

Before you start scraping Substack

You need a free Apify account to run this Actor. The free plan includes $5 in monthly credits, enough for roughly 2,500 posts at the listed rate of $2.00 per 1,000 posts. No credit card is required.

For large-scale jobs (100,000+ posts), the Actor automatically uses Apify datacenter proxies to rotate IPs and avoid Substack rate limiting.

How to scrape Substack

  1. Open the Actor in Apify Console and click Try for free
  2. Enter one or more Substack publication URLs (e.g., https://www.astralcodexten.com/) or direct post URLs
  3. Set Max Posts Per Publication (default: 100) and choose your preferred Output Format
  4. Click Start and wait for the run to complete
  5. Download your data from the Dataset tab — available in JSON, CSV, Excel, and HTML

Substack Scraper input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrls | Array |  |  | Substack publication homepages or individual post URLs |
| maxItems | Number |  | 100 | Max posts to scrape per publication |
| scrapeFormat | String |  | "markdown" | Post body format: "markdown", "html", or "none" (metadata only) |
| maxRequestRetries | Number |  | 3 | Retry attempts before a request is abandoned |
| maxSessionRotations | Number |  | 10 | Session rotations per request before giving up |
| webhookUrl | String |  |  | URL to notify when the run finishes (success or failure). Useful for Zapier, Make, and n8n integrations |
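The parameters above can be assembled and sanity-checked in Python before a run is started. This is only a minimal sketch: `build_run_input` is a hypothetical helper, not part of the Actor or the Apify client, and the lower bound on `maxItems` is an assumption that the parameter table does not state.

```python
from typing import Any

# Allowed values for scrapeFormat, taken from the parameter table.
SCRAPE_FORMATS = {"markdown", "html", "none"}


def build_run_input(
    urls: list[str],
    max_items: int = 100,
    scrape_format: str = "markdown",
) -> dict[str, Any]:
    """Build an input object for the Actor (hypothetical helper)."""
    if not urls:
        raise ValueError("startUrls needs at least one URL")
    if scrape_format not in SCRAPE_FORMATS:
        raise ValueError(f"scrapeFormat must be one of {sorted(SCRAPE_FORMATS)}")
    if max_items < 1:
        # Assumption: the table gives no minimum, so 1 is a guess.
        raise ValueError("maxItems must be a positive integer")
    return {
        # startUrls is a list of {"url": ...} objects, as in the Actor's input examples.
        "startUrls": [{"url": u} for u in urls],
        "maxItems": max_items,
        "scrapeFormat": scrape_format,
    }


run_input = build_run_input(["https://www.astralcodexten.com/"], max_items=50)
```

The resulting dictionary can be passed as `run_input` to the Apify client or pasted as JSON into the Console.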

Input examples

Scrape 50 recent posts from a publication

{
  "startUrls": [{ "url": "https://www.astralcodexten.com/" }],
  "maxItems": 50,
  "scrapeFormat": "markdown"
}

Scrape specific posts

{
  "startUrls": [
    { "url": "https://www.astralcodexten.com/p/seiu-delenda-est" }
  ],
  "scrapeFormat": "html"
}

Metadata only — fastest option, no body content

{
  "startUrls": [
    { "url": "https://www.astralcodexten.com/" },
    { "url": "https://platformer.news/" }
  ],
  "maxItems": 500,
  "scrapeFormat": "none"
}

Substack Scraper output

Each scraped post is stored as a single JSON record in the Actor's dataset:

{
  "url": "https://www.astralcodexten.com/p/seiu-delenda-est",
  "publicationName": "Astral Codex Ten",
  "authorName": "Scott Alexander",
  "title": "SEIU Delenda Est",
  "subtitle": "",
  "postDate": "2024-01-15T10:00:00.000Z",
  "likes": 551,
  "comments": 655,
  "isPaywalled": false,
  "body": "# SEIU Delenda Est\n\nPost content in markdown..."
}

| Field | Type | Description |
| --- | --- | --- |
| url | String | Canonical post URL |
| publicationName | String | Name of the Substack publication |
| authorName | String | Author's display name |
| title | String | Post title |
| subtitle | String | Post subtitle (if present) |
| postDate | String | ISO 8601 publish timestamp |
| likes | Number | Total reactions across all 8 reaction types (❤ 👍 🎉 🔥 😂 😮 😢 😡) |
| comments | Number | Number of comments |
| isPaywalled | Boolean | true if the post requires a paid subscription to read in full |
| body | String or null | Post content in the requested format; null when scrapeFormat is "none" |
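For downstream processing it can help to turn each record into a typed object. The sketch below is one possible mapping of the fields above; the `Post` class and `parse_post` function are hypothetical names introduced for illustration, and the sample record reuses the values from the output example with `body` set to null, as it would be with scrapeFormat "none".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    url: str
    publication_name: str
    author_name: str
    title: str
    subtitle: str
    post_date: datetime
    likes: int
    comments: int
    is_paywalled: bool
    body: Optional[str]  # None when scrapeFormat is "none"


def parse_post(record: dict) -> Post:
    """Convert one dataset record into a Post (hypothetical helper)."""
    return Post(
        url=record["url"],
        publication_name=record["publicationName"],
        author_name=record["authorName"],
        title=record["title"],
        subtitle=record.get("subtitle", ""),
        # "Z" is rewritten to "+00:00" so fromisoformat also works before Python 3.11.
        post_date=datetime.fromisoformat(record["postDate"].replace("Z", "+00:00")),
        likes=record["likes"],
        comments=record["comments"],
        is_paywalled=record["isPaywalled"],
        body=record.get("body"),
    )


sample = {
    "url": "https://www.astralcodexten.com/p/seiu-delenda-est",
    "publicationName": "Astral Codex Ten",
    "authorName": "Scott Alexander",
    "title": "SEIU Delenda Est",
    "subtitle": "",
    "postDate": "2024-01-15T10:00:00.000Z",
    "likes": 551,
    "comments": 655,
    "isPaywalled": False,
    "body": None,
}
post = parse_post(sample)
engagement = post.likes + post.comments  # simple combined engagement score
```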

How much will it cost to scrape Substack?

This Actor uses Pay Per Result pricing: you are charged per post scraped, not for compute time.

Apify gives you $5 in free usage credits every month on the Apify Free plan. At $2.00 per 1,000 posts, that covers around 2,500 Substack posts per month at no cost.

If you need more data from Substack on a regular basis, consider a paid Apify subscription. At the same rate, the $49/month Personal plan covers about 24,500 posts every month.

The Team plan covers about 249,500 posts for $499.
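The plan estimates above follow from simple arithmetic on the per-result price. The sketch below reproduces the approximate figures, assuming a flat rate of $2.00 per 1,000 posts (the listed "from" price; the rate on a given plan may differ). The function names are hypothetical.

```python
from decimal import Decimal

# Assumption: flat listed rate in USD per 1,000 posts.
PRICE_PER_1000_POSTS = Decimal("2.00")


def posts_for_budget(budget_usd, price_per_1000=PRICE_PER_1000_POSTS) -> int:
    """Whole number of posts a budget covers at a flat per-result price."""
    # Decimal avoids float rounding surprises with currency amounts.
    return int(Decimal(str(budget_usd)) / Decimal(str(price_per_1000)) * 1000)


def cost_for_posts(posts: int, price_per_1000=PRICE_PER_1000_POSTS) -> Decimal:
    """Cost in USD of scraping a given number of posts."""
    return Decimal(posts) * Decimal(str(price_per_1000)) / 1000


free_plan_posts = posts_for_budget(5)    # 2500
personal_posts = posts_for_budget(49)    # 24500
team_posts = posts_for_budget(499)       # 249500
```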

What are the limitations of Substack Scraper?

  • Paywalled content — only free-preview text is available for paid-only posts; full body requires a subscriber session, which is not supported
  • Rate limiting — Substack may throttle aggressive scraping; the Actor uses automatic IP rotation via datacenter proxies to mitigate this
  • Frontend changes — if Substack modifies their internal page structure, the Actor may need an update
  • Custom domains — most custom-domain Substacks work correctly; a small number with non-standard configurations may not
  • Comment content — only the comment count is extracted; individual comment text is not supported

This Actor only accesses publicly available posts and metadata. Content behind the paywall is never extracted; for paid posts, only the public free preview is returned. Web scraping of publicly accessible data is generally considered lawful in most jurisdictions for research, journalism, and personal use.

Note that personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

You are responsible for complying with Substack's Terms of Service and applicable laws in your jurisdiction. We also recommend that you read our blog post: is web scraping legal?

Scrape Substack with the Apify API

You can trigger this Actor and download results programmatically using the Apify API. See the API tab on this Actor's page for ready-to-use code examples in JavaScript and Python, or check out the Apify API reference for full details.

Substack Scraper integrations

This Actor works with any platform that supports webhooks or the Apify API:

  • Zapier / Make / n8n — use the webhookUrl input field to receive a POST notification when the run finishes, then pass the actorRunId to the Apify API to fetch your results
  • Apify Integrations tab — configure webhooks, scheduled runs, and connections to Google Sheets, Slack, Airtable, and more directly in the Apify Console without writing code
  • REST API — start a run, poll for completion, and download the dataset via the Apify API v2
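The webhook flow described above (receive a notification, then use the run ID to fetch results) might look like the sketch below. The function name `posts_from_webhook` is hypothetical, and the payload shape is an assumption: the code expects a JSON body with an `actorRunId` key, so inspect a real notification from your own run before relying on it. The `client` argument is expected to be an `apify_client.ApifyClient` instance.

```python
import json
from typing import Any


def posts_from_webhook(client: Any, raw_body: bytes) -> list:
    """Fetch the dataset items of the run named in a webhook notification.

    Assumption: the POST body is JSON containing an "actorRunId" key.
    """
    payload = json.loads(raw_body)
    run_id = payload["actorRunId"]

    # Look up the run to find its default dataset; get() returns None if not found.
    run = client.run(run_id).get()
    if run is None:
        raise LookupError(f"run {run_id} not found")

    # Download the scraped posts from the run's default dataset.
    return client.dataset(run["defaultDatasetId"]).list_items().items
```

In a real handler you would create the client with `ApifyClient("YOUR_APIFY_TOKEN")` and pass in the request body received from the Actor.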

API example — Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("sleek_waveform/substack-creator-scraper").call(run_input={
    "startUrls": [{"url": "https://www.astralcodexten.com/"}],
    "maxItems": 100,
    "scrapeFormat": "markdown",
})

# Read the scraped posts from the run's default dataset.
posts = client.dataset(run["defaultDatasetId"]).list_items().items
for post in posts:
    print(f"{post['postDate'][:10]} | {post['likes']} likes | {post['title']}")

FAQ about Substack Scraper

Does this Actor require a Substack account or login? No. It only extracts publicly available posts and metadata — no login credentials, session cookies, or Substack API key are required.

Can I scrape paid/paywalled posts? Only the free-preview portion of paywalled posts is accessible. Full body content behind a paid subscription wall is not extracted. The isPaywalled field tells you whether a post is behind a paywall.

How do I scrape the full archive of a newsletter? Set maxItems to a high number (e.g., 1000) and point startUrls to the publication homepage (e.g., https://platformer.news/). The scraper auto-paginates through the entire archive until it hits maxItems or exhausts all posts.

Can I scrape multiple publications at once? Yes. Add multiple URLs to startUrls. Each publication is scraped independently, and all results land in the same dataset with publicationName as a filter column.
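Because all publications share one dataset, splitting results by `publicationName` is a small post-processing step. The following is a minimal sketch with a hypothetical helper name; the sample records are abbreviated, and their titles and like counts (other than the first) are made up for illustration.

```python
from collections import defaultdict


def group_by_publication(posts: list) -> dict:
    """Group dataset records by their publicationName field."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post["publicationName"]].append(post)
    return dict(grouped)


# Abbreviated sample records; real records carry all output fields.
posts = [
    {"publicationName": "Astral Codex Ten", "title": "SEIU Delenda Est", "likes": 551},
    {"publicationName": "Platformer", "title": "Example post A", "likes": 7},
    {"publicationName": "Astral Codex Ten", "title": "Example post B", "likes": 10},
]

grouped = group_by_publication(posts)
# Total likes per publication, e.g. for a quick engagement comparison.
total_likes = {name: sum(p["likes"] for p in items) for name, items in grouped.items()}
```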

What format does the body field use? Your choice: "markdown" (clean prose, good for LLMs and vector databases), "html" (preserves formatting for display), or "none" (metadata only — fastest option for engagement analysis without needing body text).

How many posts can I scrape on the free plan? With Apify's $5 monthly free credit, approximately 2,500 posts per month at no cost.

Does it scrape reader comments? Comment count is extracted (comments field), but individual comment text is not — Substack serves comments via a separate authenticated endpoint.

How do I monitor new posts from a publication weekly? Set up a scheduled run on Apify: Actor page → Schedule → weekly. Filter for posts newer than a specific date by combining maxItems: 20 (which always returns the most recent) with the postDate field in your downstream processing.
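The downstream date filter mentioned above could be sketched as follows. `posts_since` is a hypothetical helper; it assumes `postDate` is an ISO 8601 UTC timestamp as in the output example, and that the cutoff is a timezone-aware datetime. The second sample record is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def posts_since(posts: list, cutoff: datetime) -> list:
    """Keep only posts published at or after a timezone-aware cutoff."""
    recent = []
    for post in posts:
        # "Z" is rewritten to "+00:00" so fromisoformat also works before Python 3.11.
        published = datetime.fromisoformat(post["postDate"].replace("Z", "+00:00"))
        if published >= cutoff:
            recent.append(post)
    return recent


# A fixed "now" keeps the example deterministic; in a scheduled job you would
# use datetime.now(timezone.utc) instead.
now = datetime(2024, 1, 20, tzinfo=timezone.utc)
posts = [
    {"title": "SEIU Delenda Est", "postDate": "2024-01-15T10:00:00.000Z"},
    {"title": "Older example post", "postDate": "2024-01-02T09:00:00.000Z"},
]
new_posts = posts_since(posts, now - timedelta(days=7))
```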

Can I use this for LLM training data? Yes. The "markdown" output format produces clean, boilerplate-free prose ideal for LLM fine-tuning and RAG pipelines. Pair with the Website to Markdown Scraper to build multi-source AI training datasets.

High-value Substack publications to scrape

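| Category | Example publications |
| --- | --- |
| AI / Tech | Stratechery, Import AI, The Batch, AI Supremacy |
| Finance | The Diff, Money Stuff (Bloomberg), Odd Lots |
| Media / Politics | Semafor, Platformer, The Atlantic |
| Growth / Startups | Lenny's Newsletter, The Generalist, Not Boring |
| Newsletter operators | The Rebooting, Inbox Collective |

Not every publication listed here is hosted on Substack: Money Stuff and Odd Lots are Bloomberg properties, and Stratechery, Semafor, and The Atlantic run on their own sites. Confirm that a publication runs on Substack before adding it to startUrls.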

Other sleek_waveform Actors you might like

  • Website to Markdown Scraper — crawl any website and extract clean Markdown for RAG pipelines. Pairs with Substack Scraper to build multi-source LLM datasets.
  • Threads Profile & Post Scraper — scrape Threads posts, hashtags, and engagement metrics. Many Substack writers cross-post to Threads — combine both scrapers for a full picture of a creator's reach.
  • YouTube Trend Scraper — track trending YouTube videos by keyword. Compare Substack newsletter topics against what's gaining traction on YouTube for cross-platform content strategy.

Found this Actor useful? Leave a review on the Apify Store — it takes 30 seconds and helps other developers discover it.