Substack Scraper

Pricing

from $0.35 / 1,000 posts

Extract complete data from Substack newsletters including posts, authors, engagement metrics, and article text. 13 fields per post. Fast and reliable.

Rating

2.6 (2)

Developer

LIAICHI MUSTAPHA

Maintained by Community

Actor stats

  • Bookmarked: 5
  • Total users: 46
  • Monthly active users: 2
  • Issues response: 4.9 hours
  • Last modified: a day ago

Substack Newsletter Scraper

Scrape any Substack newsletter and extract posts, engagement metrics, author data, and full article text — ready for AI training, competitive analysis, or content research.

Features

  • 13 data fields per post — headline, subheading, author, date, likes, comments, restacks, article text, and more
  • Full article text extraction (preview text for paywalled posts)
  • Engagement metrics — likes, comments, and restacks per post
  • Two scraping methods — sitemap (fast, recommended) and archive page (fallback)
  • Batch processing — scrape dozens of newsletters in a single run
  • Dynamic memory — scales automatically, no manual configuration needed

Use Cases

  • AI training data — Build large text datasets from thousands of Substack articles
  • Competitive analysis — Track what newsletters in your niche publish and what resonates
  • Content research — Identify trending topics and high-engagement post formats
  • Newsletter audits — Analyze posting frequency, author mix, and free/paid ratio
  • Market research — Monitor thought leaders and industry publications at scale

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| substackUrls | Array | Yes | | Substack newsletter URLs (e.g. https://example.substack.com) |
| scrapingMethod | String | No | sitemap | "sitemap" (faster) or "archive" (fallback) |
| maxPostsPerSubstack | Integer | No | 0 | Posts per newsletter; 0 means unlimited |
| batchSize | Integer | No | 20 | Newsletters processed per batch |

Example input:

{
    "substackUrls": [
        "https://tedhope.substack.com",
        "https://stratechery.com"
    ],
    "scrapingMethod": "sitemap",
    "maxPostsPerSubstack": 100,
    "batchSize": 20
}
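
Before starting a run, it can help to check the input locally. The sketch below is a hypothetical helper (`build_run_input` is not part of the actor or the Apify client) that fills in the defaults from the input table and rejects obviously invalid values. The validation rules are assumptions about sensible input, not rules the actor is known to enforce.

```python
from urllib.parse import urlparse

VALID_METHODS = {"sitemap", "archive"}


def build_run_input(urls, method="sitemap", max_posts=0, batch_size=20):
    """Build an input dict using the field names and defaults from the input table.

    Hypothetical helper: the checks below are assumptions, not actor behavior.
    """
    if not urls:
        raise ValueError("substackUrls is required and must not be empty")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a valid URL: {url!r}")
    if method not in VALID_METHODS:
        raise ValueError(f"scrapingMethod must be one of {sorted(VALID_METHODS)}")
    if max_posts < 0 or batch_size < 1:
        raise ValueError("maxPostsPerSubstack must be >= 0 and batchSize >= 1")
    return {
        "substackUrls": list(urls),
        "scrapingMethod": method,
        "maxPostsPerSubstack": max_posts,  # 0 means unlimited
        "batchSize": batch_size,
    }


run_input = build_run_input(["https://example.substack.com"], max_posts=100)
print(run_input)
```

The resulting dict can be passed as `run_input` when calling the actor through the API.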

Output

Each item in the dataset represents one Substack post:

{
    "substack_url": "https://tedhope.substack.com",
    "post_url": "https://tedhope.substack.com/p/the-regeneration-will-be-live",
    "headline": "The Regeneration Will Be Live",
    "subheading": "Why in-person experiences are making a comeback",
    "author_name": "Ted Hope",
    "author_url": "https://substack.com/@tedhope",
    "date": "December 10, 2024",
    "free_or_paid": "Free",
    "likes": 156,
    "comments": 23,
    "restacks": 12,
    "article_text": "Full article content here...",
    "content_type": "full"
}

content_type values: "full" (complete text), "preview_only" (paywalled), "failed" (extraction error).
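
A quick way to check a run's quality is to count items per content_type and list the posts that failed. This is a minimal sketch on made-up sample items shaped like the output above; in practice `items` would come from your dataset export.

```python
from collections import Counter

# Sample items for illustration only; headlines and values are invented.
items = [
    {"headline": "A", "content_type": "full"},
    {"headline": "B", "content_type": "preview_only"},
    {"headline": "C", "content_type": "failed"},
    {"headline": "D", "content_type": "full"},
]

# Tally how many posts fall into each content_type value.
counts = Counter(item["content_type"] for item in items)

# Collect headlines of posts whose text extraction failed.
failed = [item["headline"] for item in items if item["content_type"] == "failed"]

print(dict(counts))
print("failed:", failed)
```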

How to Use

Via Apify Console

  1. Open the actor on Apify Store
  2. Click Try for free
  3. Paste your Substack URLs into the Substack URLs field
  4. Optionally set maxPostsPerSubstack to limit results per newsletter
  5. Click Start and wait for the run to complete
  6. Download your results as JSON, CSV, or Excel

Via Apify API (Python)

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("USERNAME/substack-scraper").call(run_input={
    "substackUrls": ["https://tedhope.substack.com"],
    "scrapingMethod": "sitemap",
    "maxPostsPerSubstack": 100,
})

# Each dataset item is one post.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["headline"], item["likes"])
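
Once the items are retrieved, a common next step is ranking posts by engagement. The sketch below sums likes, comments, and restacks into one score. The unweighted sum and the `engagement` helper are assumptions made for illustration, and the second and third sample items are invented.

```python
def engagement(item):
    """Sum the three engagement metrics, treating missing or null values as 0."""
    return sum(item.get(key) or 0 for key in ("likes", "comments", "restacks"))


# The first item uses the metrics from the output example; the others are made up.
items = [
    {"headline": "The Regeneration Will Be Live", "likes": 156, "comments": 23, "restacks": 12},
    {"headline": "Quiet post", "likes": 4, "comments": 0, "restacks": None},
    {"headline": "Viral post", "likes": 900, "comments": 120, "restacks": 75},
]

# Highest-engagement posts first.
ranked = sorted(items, key=engagement, reverse=True)

for item in ranked:
    print(engagement(item), item["headline"])
```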

Via Apify API (JavaScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('USERNAME/substack-scraper').call({
    substackUrls: ['https://tedhope.substack.com'],
    maxPostsPerSubstack: 100,
});

// Fetch the scraped posts from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();

Pricing

This actor is billed by compute units — you pay only for what you use.

| Scale | Approximate Cost | Estimated Time |
|---|---|---|
| 1 newsletter, 100 posts | < $0.01 | ~30 seconds |
| 10 newsletters, 100 posts each | $0.01–$0.05 | 2–5 minutes |
| 100 newsletters | $0.50–$1.00 | 30–60 minutes |
| 1,000 newsletters | $5–$10 | 5–10 hours |

New to Apify? Every account includes $5 in free monthly credits — enough to scrape thousands of posts at no cost.
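
For a rough budget check, the arithmetic can be sketched as follows. It assumes the store listing's headline rate of $0.35 per 1,000 posts applies uniformly. That is an assumption: the listing says "from", and actual compute-unit costs may differ from this figure.

```python
# Assumed flat rate taken from the listing headline ("from $0.35 / 1,000 posts").
RATE_PER_1000_POSTS = 0.35


def estimate_cost(posts):
    """Estimated cost in USD for a given number of posts at the assumed rate."""
    return posts / 1000 * RATE_PER_1000_POSTS


def posts_within_budget(budget_usd):
    """Whole number of posts that fit in a budget at the assumed rate."""
    return int(budget_usd / RATE_PER_1000_POSTS * 1000)


print(f"1,000 posts: ${estimate_cost(1000):.2f}")
print(f"Posts covered by $5 of credits: {posts_within_budget(5.0)}")
```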

FAQ

Can I scrape paywalled Substack posts?
You'll receive preview text for paid posts, not the full article. Full content is extracted for free posts only. Login-based access is not currently supported.

How many newsletters can I scrape in one run?
There is no hard limit. For large-scale runs (100+ newsletters), lower batchSize to 5–10 for more stable results.
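
To see what a smaller batchSize means in practice, here is a minimal sketch of splitting a URL list into batches. It illustrates the general idea only: how the actor schedules batches internally is not described here, the `chunk` helper is hypothetical, and the URLs are placeholders.

```python
def chunk(urls, batch_size):
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


# Placeholder newsletter URLs for illustration.
urls = [f"https://newsletter{i}.substack.com" for i in range(23)]

batches = list(chunk(urls, 10))
print([len(batch) for batch in batches])
```

With 23 URLs and a batch size of 10, the list splits into two full batches and one smaller final batch.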

What is the difference between the sitemap and archive scraping methods?
Sitemap is faster and more reliable because it discovers all posts directly from the newsletter's XML sitemap. The archive page method is a fallback for newsletters that don't publish a sitemap.
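
The idea behind sitemap discovery can be sketched with the standard library. This is not the actor's implementation: it parses a small inline sample that follows the sitemaps.org format, handles only a flat urlset (not nested sitemap indexes), and assumes post URLs contain "/p/", as in the output example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Inline sample for illustration; a real sitemap would be downloaded first.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.substack.com/about</loc></url>
  <url><loc>https://example.substack.com/p/first-post</loc></url>
  <url><loc>https://example.substack.com/p/second-post</loc></url>
</urlset>"""


def post_urls_from_sitemap(xml_text):
    """Return the <loc> entries that look like post URLs (assumed to contain '/p/')."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    return [url for url in locs if "/p/" in url]


print(post_urls_from_sitemap(SAMPLE_SITEMAP))
```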

How do I get only the most recent posts?
Set maxPostsPerSubstack to a small number (e.g. 10). Posts are returned newest-first, so you'll always get the latest content.
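
If you merge results from several newsletters, you may want to re-sort them by date yourself. The sketch below assumes the date field always uses the "December 10, 2024" format shown in the output example and that an English locale is active for month names. The sample items are invented.

```python
from datetime import datetime


def parse_date(item):
    """Parse the date field, assuming the 'Month day, year' format."""
    return datetime.strptime(item["date"], "%B %d, %Y")


# Made-up items from different newsletters, in arbitrary order.
items = [
    {"headline": "Older", "date": "March 3, 2023"},
    {"headline": "Newest", "date": "December 10, 2024"},
    {"headline": "Middle", "date": "January 15, 2024"},
]

# Sort newest-first and keep the two most recent posts.
latest = sorted(items, key=parse_date, reverse=True)[:2]

print([item["headline"] for item in latest])
```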

How do I use the scraped data for AI training?
The article_text field contains the full article body. Export the dataset as JSON or CSV, then load it into your training pipeline. Keep only items whose content_type is "full" to exclude previews and failed extractions.
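
One common training-data layout is JSON Lines, with one record per article. The sketch below keeps only "full" items and writes a title and text per line. The `to_jsonl` helper and the "title"/"text" record keys are assumptions; adapt them to whatever your pipeline expects.

```python
import json


def to_jsonl(items):
    """Serialize full-text posts as JSON Lines, skipping previews and failures."""
    lines = []
    for item in items:
        if item.get("content_type") != "full":
            continue
        # Record keys are an assumed schema, not something the actor defines.
        record = {"title": item["headline"], "text": item["article_text"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Sample items for illustration only.
items = [
    {"headline": "Free post", "article_text": "Full body.", "content_type": "full"},
    {"headline": "Paid post", "article_text": "Teaser...", "content_type": "preview_only"},
]

print(to_jsonl(items))
```

The returned string can be written to a `.jsonl` file with a regular `open(..., "w", encoding="utf-8")` call.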

Is scraping Substack legal?
This actor accesses only publicly available content. Always respect Substack's robots.txt, use reasonable rate limits, and comply with applicable terms of service and data regulations.


Changelog

v1.0.4 — April 2, 2026

  • Fixed: Incomplete scraping results caused by mishandled nested sitemap indexes
  • Fixed: Memory misconfiguration (default was incorrectly set to 16GB)
  • Improved: Cost efficiency — single-newsletter runs now ~94% cheaper

v1.0.0 — Initial Release

  • Full post scraping via sitemap and archive methods
  • 13 data fields per post including engagement metrics
  • Batch processing support
  • Paid/free post detection

Built by Mustapha Liaichi — Automation & Web Scraping Specialist