Substack Posts & Creator Scraper
Pricing
from $2.00 / 1,000 posts
Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.
Developer
Daniel Dimitrov
Actor stats: 0 bookmarks, 3 total users, 1 monthly active user. Last modified 15 days ago.
What does Substack Scraper do?
Substack Scraper extracts post content, engagement metrics, and author data from any Substack publication or individual post URL. It accesses Substack's internal JSON structure directly — no headless browser needed — giving you clean, structured data in seconds rather than minutes.
With a single publication URL, the scraper automatically paginates through the entire archive and returns every post with its title, author, publish date, total reactions, comment count, paywall status, and full body content in your choice of Markdown or HTML.
Why scrape Substack?
Substack has become the home for thousands of high-quality newsletters and independent journalists. The data available on public posts is invaluable for:
- Competitor and trend analysis — track what content performs best in your niche, monitor publishing frequency and engagement patterns across publications
- Creator and influencer research — build lists of authors with engagement benchmarks for outreach and partnership decisions
- Newsletter research — study the structure, cadence, and topics of top-performing newsletters before launching your own
- Content backup — archive your own Substack posts with engagement history before platform changes
- AI and NLP training data — extract clean, structured long-form text with rich metadata at scale
- PR and media monitoring — track journalist activity and media coverage across Substack publications
If you would like more inspiration on how scraping Substack could help your business, check out our industry pages.
Before you start scraping Substack
You need a free Apify account to run this Actor. The free plan includes $5 in monthly credits, enough to scrape around 2,500 posts at $2.00 per 1,000. No credit card required.
For large-scale jobs (100,000+ posts), the Actor automatically uses Apify datacenter proxies to rotate IPs and avoid Substack rate limiting.
How to scrape Substack
- Open the Actor in Apify Console and click Try for free
- Enter one or more Substack publication URLs (e.g., https://www.astralcodexten.com/) or direct post URLs
- Set Max Posts Per Publication (default: 100) and choose your preferred Output Format
- Click Start and wait for the run to complete
- Download your data from the Dataset tab — available in JSON, CSV, Excel, and HTML
Substack Scraper input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `startUrls` | Array | ✅ | — | Substack publication homepages or individual post URLs |
| `maxItems` | Number | ❌ | 100 | Max posts to scrape per publication |
| `scrapeFormat` | String | ❌ | "markdown" | Post body format: "markdown", "html", or "none" (metadata only) |
| `maxRequestRetries` | Number | ❌ | 3 | Retry attempts before a request is abandoned |
| `maxSessionRotations` | Number | ❌ | 10 | Session rotations per request before giving up |
| `webhookUrl` | String | ❌ | — | URL to notify when the run finishes (success or failure). Useful for Zapier, Make, and n8n integrations |
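The parameter table above can be turned into a small input builder. The following is a minimal sketch, not part of the Actor itself: the function name `build_input` and the checks it performs (non-empty URL list, positive `maxItems`) are assumptions for illustration; only the field names and the allowed `scrapeFormat` values come from the table.

```python
from typing import Any

# Allowed values for scrapeFormat, as listed in the parameter table.
ALLOWED_FORMATS = {"markdown", "html", "none"}


def build_input(
    urls: list[str],
    max_items: int = 100,
    scrape_format: str = "markdown",
) -> dict[str, Any]:
    """Assemble an Actor input object (hypothetical helper, not an Apify API)."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if scrape_format not in ALLOWED_FORMATS:
        raise ValueError(f"scrapeFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if max_items < 1:
        # Assumption: the table does not state a lower bound for maxItems.
        raise ValueError("maxItems must be a positive number")
    return {
        "startUrls": [{"url": url} for url in urls],
        "maxItems": max_items,
        "scrapeFormat": scrape_format,
    }


if __name__ == "__main__":
    import json

    actor_input = build_input(["https://www.astralcodexten.com/"], max_items=50)
    print(json.dumps(actor_input, indent=2))
```

Validating the input locally catches a misspelled format before a run is started.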
Input examples
Scrape 50 recent posts from a publication
{
  "startUrls": [{ "url": "https://www.astralcodexten.com/" }],
  "maxItems": 50,
  "scrapeFormat": "markdown"
}
Scrape specific posts
{
  "startUrls": [{ "url": "https://www.astralcodexten.com/p/seiu-delenda-est" }],
  "scrapeFormat": "html"
}
Metadata only — fastest option, no body content
{
  "startUrls": [
    { "url": "https://www.astralcodexten.com/" },
    { "url": "https://platformer.news/" }
  ],
  "maxItems": 500,
  "scrapeFormat": "none"
}
Substack Scraper output
Each scraped post is stored as a single JSON record in the Actor's dataset:
{
  "url": "https://www.astralcodexten.com/p/seiu-delenda-est",
  "publicationName": "Astral Codex Ten",
  "authorName": "Scott Alexander",
  "title": "SEIU Delenda Est",
  "subtitle": "",
  "postDate": "2024-01-15T10:00:00.000Z",
  "likes": 551,
  "comments": 655,
  "isPaywalled": false,
  "body": "# SEIU Delenda Est\n\nPost content in markdown..."
}
| Field | Type | Description |
|---|---|---|
| `url` | String | Canonical post URL |
| `publicationName` | String | Name of the Substack publication |
| `authorName` | String | Author's display name |
| `title` | String | Post title |
| `subtitle` | String | Post subtitle (if present) |
| `postDate` | String | ISO 8601 publish timestamp |
| `likes` | Number | Total reactions across all 8 reaction types (❤ 👍 🎉 🔥 😂 😮 😢 😡) |
| `comments` | Number | Number of comments |
| `isPaywalled` | Boolean | true if the post requires a paid subscription to read in full |
| `body` | String or null | Post content in the requested format; null when scrapeFormat is "none" |
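Once the dataset is downloaded as JSON, the fields above are straightforward to work with. The sketch below ranks free posts by a simple engagement score. The score (`likes + comments`) and the helper names are assumptions chosen for illustration, not something the Actor computes; the field names are taken from the output table.

```python
from datetime import datetime


def engagement(post: dict) -> int:
    """Hypothetical engagement score: reactions plus comment count."""
    return post["likes"] + post["comments"]


def parse_post_date(post: dict) -> datetime:
    """Parse the ISO 8601 postDate; the 'Z' suffix is rewritten as a UTC offset."""
    return datetime.fromisoformat(post["postDate"].replace("Z", "+00:00"))


def top_free_posts(posts: list[dict], limit: int = 10) -> list[dict]:
    """Return non-paywalled posts, highest engagement first."""
    free = [p for p in posts if not p["isPaywalled"]]
    return sorted(free, key=engagement, reverse=True)[:limit]


if __name__ == "__main__":
    sample = [
        {
            "title": "SEIU Delenda Est",
            "postDate": "2024-01-15T10:00:00.000Z",
            "likes": 551,
            "comments": 655,
            "isPaywalled": False,
        }
    ]
    for post in top_free_posts(sample):
        print(parse_post_date(post).date(), post["title"], engagement(post))
```

When `scrapeFormat` is "none", `body` is null, so any code that reads `body` should handle `None`.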
How much will it cost to scrape Substack?
This Actor uses Pay Per Result pricing — you are charged per post scraped, not per compute time.
Apify gives you $5 in free usage credits every month on the Apify Free plan. At $2.00 per 1,000 posts, that covers around 2,500 Substack posts per month at no cost.
If you need more data from Substack on a regular basis, consider a paid Apify subscription. The $49/month Personal plan covers roughly 24,500 posts per month at the same rate.
The $499/month Team plan covers roughly 249,500 posts.
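As a rough budgeting aid, the per-result pricing can be expressed as a short calculation. This sketch assumes the listed starting rate of $2.00 per 1,000 posts applies to the whole budget; the actual rate may differ by plan, and `posts_for_budget` is a hypothetical helper name.

```python
def posts_for_budget(budget_usd: float, price_per_1000_usd: float = 2.00) -> int:
    """Estimate how many posts a budget covers under pay-per-result pricing."""
    # Work in whole cents to avoid floating-point rounding surprises.
    budget_cents = round(budget_usd * 100)
    price_cents = round(price_per_1000_usd * 100)
    return budget_cents * 1000 // price_cents


if __name__ == "__main__":
    for budget in (5, 49, 499):
        print(f"${budget}: about {posts_for_budget(budget):,} posts")
```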
What are the limitations of Substack Scraper?
- Paywalled content — only free-preview text is available for paid-only posts; full body requires a subscriber session, which is not supported
- Rate limiting — Substack may throttle aggressive scraping; the Actor uses automatic IP rotation via datacenter proxies to mitigate this
- Frontend changes — if Substack modifies their internal page structure, the Actor may need an update
- Custom domains — most custom-domain Substacks work correctly; a small number with non-standard configurations may not
- Comment content — only the comment count is extracted; individual comment text is not supported
Is it legal to scrape Substack?
This Actor only accesses publicly available posts and metadata. Content behind the paywall is never extracted; for paid posts, only the public preview is returned. Web scraping of publicly accessible data is generally considered lawful in most jurisdictions for research, journalism, and personal use.
Note that personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.
You are responsible for complying with Substack's Terms of Service and applicable laws in your jurisdiction. We also recommend that you read our blog post: is web scraping legal?
Scrape Substack with the Apify API
You can trigger this Actor and download results programmatically using the Apify API. See the API tab on this Actor's page for ready-to-use code examples in JavaScript and Python, or check out the Apify API reference for full details.
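For orientation, here is a minimal sketch of a programmatic run that uses only the Python standard library and the Apify API v2 `run-sync-get-dataset-items` endpoint. The Actor ID is a placeholder (this page does not state it), the token is read from an assumed `APIFY_TOKEN` environment variable, and the helper names are hypothetical. The synchronous endpoint suits short runs; longer jobs are better started asynchronously and polled.

```python
import json
import os
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request and parse the JSON array of scraped posts."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=330) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder ID in the "username~actor-name" form; replace with the real one.
    ACTOR_ID = "<username>~<actor-name>"
    posts = run_actor(
        ACTOR_ID,
        os.environ["APIFY_TOKEN"],
        {"startUrls": [{"url": "https://www.astralcodexten.com/"}], "maxItems": 50},
    )
    print(f"Fetched {len(posts)} posts")
```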
Substack Scraper integrations
This Actor works with any platform that supports webhooks or the Apify API:
- Zapier / Make / n8n: use the `webhookUrl` input field to receive a POST notification when the run finishes, then pass the `actorRunId` to the Apify API to fetch your results
- Apify Integrations tab: configure webhooks, scheduled runs, and connections to Google Sheets, Slack, Airtable, and more directly in the Apify Console without writing code
- REST API — start a run, poll for completion, and download the dataset via the Apify API v2
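The webhook flow can be sketched as follows. This is an illustration under stated assumptions: the notification body is assumed to be JSON with a top-level `actorRunId` field (the exact payload shape is not documented here), and `dataset_items_url` is a hypothetical helper. The URL it builds targets the Apify API v2 endpoint that returns items from a run's default dataset.

```python
import json

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(webhook_body: bytes, fmt: str = "json") -> str:
    """Turn a webhook notification into the URL of the run's dataset items."""
    payload = json.loads(webhook_body)
    # Assumption: actorRunId sits at the top level of the payload.
    run_id = payload.get("actorRunId")
    if not run_id:
        raise ValueError("webhook payload has no actorRunId")
    return f"{API_BASE}/actor-runs/{run_id}/dataset/items?format={fmt}"


if __name__ == "__main__":
    example_body = b'{"actorRunId": "EXAMPLE_RUN_ID"}'
    print(dataset_items_url(example_body))
```

The resulting URL is fetched with your API token, for example in an `Authorization: Bearer` header.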