Substack Scraper
Pricing
from $0.35 / 1,000 posts
Extract complete data from Substack newsletters including posts, authors, engagement metrics, and article text. 13 fields per post. Fast and reliable.
Rating: 2.6 (2)
Developer: LIAICHI MUSTAPHA
Maintained by Community
Actor stats
- Bookmarked: 5
- Total users: 46
- Monthly active users: 2
- Issues response: 4.9 hours
- Last modified: a day ago
Substack Newsletter Scraper
Scrape any Substack newsletter and extract posts, engagement metrics, author data, and full article text — ready for AI training, competitive analysis, or content research.
Features
- 13 data fields per post — headline, subheading, author, date, likes, comments, restacks, article text, and more
- Full article text extraction (preview text for paywalled posts)
- Engagement metrics — likes, comments, and restacks per post
- Two scraping methods — sitemap (fast, recommended) and archive page (fallback)
- Batch processing — scrape dozens of newsletters in a single run
- Dynamic memory — scales automatically, no manual configuration needed
Use Cases
- AI training data — Build large text datasets from thousands of Substack articles
- Competitive analysis — Track what newsletters in your niche publish and what resonates
- Content research — Identify trending topics and high-engagement post formats
- Newsletter audits — Analyze posting frequency, author mix, and free/paid ratio
- Market research — Monitor thought leaders and industry publications at scale
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| substackUrls | Array | Yes | - | Substack newsletter URLs (e.g. https://example.substack.com) |
| scrapingMethod | String | No | sitemap | "sitemap" (faster) or "archive" (fallback) |
| maxPostsPerSubstack | Integer | No | 0 | Maximum posts per newsletter; 0 means unlimited |
| batchSize | Integer | No | 20 | Newsletters processed per batch |
Example input:
    {
      "substackUrls": [
        "https://tedhope.substack.com",
        "https://stratechery.com"
      ],
      "scrapingMethod": "sitemap",
      "maxPostsPerSubstack": 100,
      "batchSize": 20
    }
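It can help to validate the input locally before starting a run. The sketch below builds a run-input dict from the fields in the table above. The helper name `build_run_input` and its checks are illustrative assumptions, not part of the actor, which may validate its input differently.

```python
from urllib.parse import urlparse

# The two method names documented in the Input table.
VALID_METHODS = {"sitemap", "archive"}


def build_run_input(urls, scraping_method="sitemap", max_posts=0, batch_size=20):
    """Return a run-input dict (hypothetical helper; defaults mirror the table)."""
    if not urls:
        raise ValueError("substackUrls is required and must not be empty")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if scraping_method not in VALID_METHODS:
        raise ValueError(f"scrapingMethod must be one of {sorted(VALID_METHODS)}")
    if max_posts < 0:
        raise ValueError("maxPostsPerSubstack must be 0 (unlimited) or positive")
    if batch_size < 1:
        raise ValueError("batchSize must be at least 1")
    return {
        "substackUrls": list(urls),
        "scrapingMethod": scraping_method,
        "maxPostsPerSubstack": max_posts,
        "batchSize": batch_size,
    }


run_input = build_run_input(
    ["https://tedhope.substack.com", "https://stratechery.com"],
    max_posts=100,
)
print(run_input)
```

The resulting dict can be passed as `run_input` to the Apify client calls shown in the How to Use section.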
Output
Each item in the dataset represents one Substack post:
    {
      "substack_url": "https://tedhope.substack.com",
      "post_url": "https://tedhope.substack.com/p/the-regeneration-will-be-live",
      "headline": "The Regeneration Will Be Live",
      "subheading": "Why in-person experiences are making a comeback",
      "author_name": "Ted Hope",
      "author_url": "https://substack.com/@tedhope",
      "date": "December 10, 2024",
      "free_or_paid": "Free",
      "likes": 156,
      "comments": 23,
      "restacks": 12,
      "article_text": "Full article content here...",
      "content_type": "full"
    }
content_type values: "full" (complete text), "preview_only" (paywalled), "failed" (extraction error).
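A quick way to sanity-check a finished run is to count items by `content_type` and `free_or_paid`. The sketch below does that over a few hand-written sample items. The `"Paid"` value is an assumption (only `"Free"` appears in the example output), and `summarize` is a hypothetical helper.

```python
from collections import Counter


def summarize(items):
    """Count dataset items by content_type and by free_or_paid."""
    content_types = Counter(item.get("content_type", "unknown") for item in items)
    access = Counter(item.get("free_or_paid", "unknown") for item in items)
    return {"content_types": dict(content_types), "access": dict(access)}


# Hand-written sample items; real items carry all 13 fields.
sample_items = [
    {"headline": "A", "free_or_paid": "Free", "content_type": "full"},
    {"headline": "B", "free_or_paid": "Paid", "content_type": "preview_only"},  # "Paid" is assumed
    {"headline": "C", "free_or_paid": "Free", "content_type": "failed"},
    {"headline": "D", "free_or_paid": "Free", "content_type": "full"},
]

summary = summarize(sample_items)
print(summary)
```

A high share of `"failed"` items is a signal to re-run the affected newsletters, possibly with the other scraping method.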
How to Use
Via Apify Console
- Open the actor on Apify Store
- Click Try for free
- Paste your Substack URLs into the Substack URLs field
- Optionally set maxPostsPerSubstack to limit results per newsletter
- Click Start and wait for the run to complete
- Download your results as JSON, CSV, or Excel
Via Apify API (Python)
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    # Start the actor and wait for the run to finish
    run = client.actor("USERNAME/substack-scraper").call(
        run_input={
            "substackUrls": ["https://tedhope.substack.com"],
            "scrapingMethod": "sitemap",
            "maxPostsPerSubstack": 100,
        }
    )

    # Iterate over the scraped posts in the run's default dataset
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["headline"], item["likes"])
Via Apify API (JavaScript)
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    // Start the actor and wait for the run to finish
    const run = await client.actor('USERNAME/substack-scraper').call({
        substackUrls: ['https://tedhope.substack.com'],
        maxPostsPerSubstack: 100,
    });

    // Fetch the scraped posts from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
Pricing
This actor is billed by compute units — you pay only for what you use.
| Scale | Approximate Cost | Estimated Time |
|---|---|---|
| 1 newsletter, 100 posts | < $0.01 | ~30 seconds |
| 10 newsletters, 100 posts each | $0.01–$0.05 | 2–5 minutes |
| 100 newsletters | $0.50–$1.00 | 30–60 minutes |
| 1,000 newsletters | $5–$10 | 5–10 hours |
New to Apify? Every account includes $5 in free monthly credits — enough to scrape thousands of posts at no cost.
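For a rough budget check, the arithmetic can be sketched as below. It assumes the headline rate of $0.35 per 1,000 posts quoted at the top of this listing. That is a "from" price, and the compute-unit figures in the table above may work out differently, so treat the result as an estimate only. Both function names are hypothetical.

```python
# Headline "from" rate quoted in the listing; actual billing may differ.
RATE_PER_1000_POSTS_USD = 0.35
# Free monthly credit mentioned in the Pricing section.
FREE_MONTHLY_CREDIT_USD = 5.00


def estimate_cost_usd(post_count, rate_per_1000=RATE_PER_1000_POSTS_USD):
    """Estimated cost in USD for scraping post_count posts at a flat rate."""
    if post_count < 0:
        raise ValueError("post_count must not be negative")
    return post_count / 1000 * rate_per_1000


def posts_covered_by(credit_usd, rate_per_1000=RATE_PER_1000_POSTS_USD):
    """Whole number of posts a given credit covers at a flat rate."""
    return int(credit_usd / rate_per_1000 * 1000)


print(f"10,000 posts: about ${estimate_cost_usd(10_000):.2f}")
print(f"Posts covered by free credit: {posts_covered_by(FREE_MONTHLY_CREDIT_USD)}")
```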
FAQ
Can I scrape paywalled Substack posts?
You'll receive preview text for paid posts, not the full article. Full content is extracted for free posts only. Login-based access is not currently supported.
How many newsletters can I scrape in one run?
There is no hard limit. For large-scale runs (100+ newsletters), lower batchSize to 5–10 for more stable results.
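To see what a given batchSize means for a large URL list, the sketch below splits a list into consecutive batches. It only illustrates the idea: the actor's internal batching is not documented here, and `chunked` is a hypothetical helper. The same function could also be used to split a very large list across several separate runs.

```python
def chunked(urls, batch_size):
    """Split urls into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# 120 placeholder newsletter URLs (not real newsletters).
urls = [f"https://newsletter{i}.substack.com" for i in range(120)]

batches = chunked(urls, batch_size=10)
print(len(batches), "batches; last batch has", len(batches[-1]), "URLs")
```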
What is the difference between sitemap and archive scraping methods?
Sitemap is faster and more reliable: it discovers all posts directly from the newsletter's XML sitemap. Archive page is a fallback for newsletters that don't publish a sitemap.
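As an illustration of sitemap-based discovery, the sketch below parses a sitemap document with Python's standard library and keeps the URLs that look like posts. It is not the actor's implementation. The sample XML is invented, and the `/p/` filter is an assumption based on the post URL shown in the example output. A sitemap index (a sitemap that lists other sitemaps) is detected separately, since each child sitemap would need to be fetched and parsed in turn.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return (kind, locs): kind is "index" or "urlset", locs the <loc> values."""
    root = ET.fromstring(xml_text.strip())
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    locs = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    return kind, locs


# Invented sample documents for illustration.
URLSET = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.substack.com/p/first-post</loc></url>
  <url><loc>https://example.substack.com/about</loc></url>
  <url><loc>https://example.substack.com/p/second-post</loc></url>
</urlset>
"""

INDEX = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.substack.com/sitemap/2024</loc></sitemap>
</sitemapindex>
"""

kind, locs = parse_sitemap(URLSET)
post_urls = [url for url in locs if "/p/" in url]  # assumed post URL pattern
index_kind, child_sitemaps = parse_sitemap(INDEX)
print(kind, post_urls)
print(index_kind, child_sitemaps)
```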
How do I get only the most recent posts?
Set maxPostsPerSubstack to a small number (e.g. 10). Posts are returned newest-first, so you'll always get the latest content.
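If you already have a larger dataset and want the latest posts from it, you can also sort locally. The sketch below assumes the `date` field always uses the format shown in the example output ("December 10, 2024"). Other formats would need extra handling, and `%B` relies on English month names in the active locale.

```python
from datetime import datetime


def parse_post_date(text):
    """Parse a date such as "December 10, 2024" (format assumed from the example)."""
    return datetime.strptime(text, "%B %d, %Y").date()


def latest(items, n):
    """Return the n most recent items, newest first."""
    ordered = sorted(items, key=lambda item: parse_post_date(item["date"]), reverse=True)
    return ordered[:n]


# Hand-written sample items; only the fields used here are included.
sample_items = [
    {"headline": "Older", "date": "November 2, 2024"},
    {"headline": "Newest", "date": "December 10, 2024"},
    {"headline": "Middle", "date": "December 1, 2024"},
]

recent = latest(sample_items, 2)
print([item["headline"] for item in recent])
```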
How do I use the scraped data for AI training?
The article_text field contains the full article body. Export the dataset as JSON or CSV, then load it into your training pipeline. Filter by content_type: "full" to exclude previews.
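A minimal sketch of that filtering step is shown below: it keeps only items whose `content_type` is `"full"` and writes one JSON object per line (JSONL). The output record layout (`text`, `source`, `title`) is an illustrative choice, not something the actor produces, and `write_training_jsonl` is a hypothetical helper.

```python
import io
import json


def write_training_jsonl(items, fp):
    """Write full-text items to fp as JSONL; return the number of records written."""
    written = 0
    for item in items:
        if item.get("content_type") != "full":
            continue  # skip previews and failed extractions
        text = (item.get("article_text") or "").strip()
        if not text:
            continue  # skip items without usable text
        record = {
            "text": text,
            "source": item.get("post_url"),
            "title": item.get("headline"),
        }
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


# Hand-written sample items for illustration.
sample_items = [
    {"headline": "A", "post_url": "https://example.substack.com/p/a",
     "article_text": "Full text of A.", "content_type": "full"},
    {"headline": "B", "post_url": "https://example.substack.com/p/b",
     "article_text": "Preview of B...", "content_type": "preview_only"},
    {"headline": "C", "post_url": "https://example.substack.com/p/c",
     "article_text": "", "content_type": "failed"},
]

buffer = io.StringIO()
count = write_training_jsonl(sample_items, buffer)
print(count, "record(s) written")
```

To write to disk instead, pass a file opened with `open("posts.jsonl", "w", encoding="utf-8")` in place of the in-memory buffer.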
Is scraping Substack legal?
This actor accesses only publicly available content. Always respect Substack's robots.txt, use reasonable rate limits, and comply with applicable terms of service and data regulations.
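If you want to check robots.txt rules yourself, Python's standard `urllib.robotparser` can evaluate them. The sketch below uses an invented rule set, not Substack's actual robots.txt, and a hypothetical user-agent name. In practice you would load the real file from the newsletter's domain.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration; not Substack's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
"""

USER_AGENT = "my-research-bot"  # hypothetical user-agent name


def build_parser(robots_txt):
    """Create a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
post_allowed = parser.can_fetch(USER_AGENT, "https://example.substack.com/p/first-post")
api_allowed = parser.can_fetch(USER_AGENT, "https://example.substack.com/api/v1/posts")
print("post allowed:", post_allowed)
print("api allowed:", api_allowed)
```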
Changelog
v1.0.4 — April 2, 2026
- Fixed: Incomplete scraping results caused by mishandled nested sitemap indexes
- Fixed: Memory misconfiguration (default was incorrectly set to 16GB)
- Improved: Cost efficiency — single-newsletter runs now ~94% cheaper
v1.0.0 — Initial Release
- Full post scraping via sitemap and archive methods
- 13 data fields per post including engagement metrics
- Batch processing support
- Paid/free post detection
Built by Mustapha Liaichi — Automation & Web Scraping Specialist