Substack Posts & Creator Scraper
Pricing: from $2.00 / 1,000 posts
Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.
Developer: Daniel Dimitrov (Maintained by Community)
What does Substack Scraper do?
Substack Scraper extracts post content, engagement metrics, and author data from any Substack publication or individual post URL. It accesses Substack's internal JSON structure directly — no headless browser needed — giving you clean, structured data in seconds rather than minutes.
With a single publication URL, the scraper automatically paginates through the entire archive and returns every post with its title, author, publish date, total reactions, comment count, paywall status, and full body content in your choice of Markdown or HTML.
Why scrape Substack?
Substack has become the home for thousands of high-quality newsletters and independent journalists. The data available on public posts is invaluable for:
- Competitor and trend analysis — track what content performs best in your niche, monitor publishing frequency and engagement patterns across publications
- Creator and influencer research — build lists of authors with engagement benchmarks for outreach and partnership decisions
- Newsletter research — study the structure, cadence, and topics of top-performing newsletters before launching your own
- Content backup — archive your own Substack posts with engagement history before platform changes
- AI and NLP training data — extract clean, structured long-form text with rich metadata at scale
- PR and media monitoring — track journalist activity and media coverage across Substack publications
If you would like more inspiration on how scraping Substack could help your business, check out our industry pages.
Before you start scraping Substack
You need a free Apify account to run this Actor. The free plan includes $5 in monthly credits — enough to scrape several thousand posts. No credit card required.
For large-scale jobs (100,000+ posts), the Actor automatically uses Apify datacenter proxies to rotate IPs and avoid Substack rate limiting.
How to scrape Substack
- Open the Actor in Apify Console and click Try for free
- Enter one or more Substack publication URLs (e.g., https://www.astralcodexten.com/) or direct post URLs
- Set Max Posts Per Publication (default: 100) and choose your preferred Output Format
- Click Start and wait for the run to complete
- Download your data from the Dataset tab — available in JSON, CSV, Excel, and HTML
Substack Scraper input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ | — | Substack publication homepages or individual post URLs |
| maxItems | Number | ❌ | 100 | Max posts to scrape per publication |
| scrapeFormat | String | ❌ | "markdown" | Post body format: "markdown", "html", or "none" (metadata only) |
| maxRequestRetries | Number | ❌ | 3 | Retry attempts before a request is abandoned |
| maxSessionRotations | Number | ❌ | 10 | Session rotations per request before giving up |
| webhookUrl | String | ❌ | — | URL to notify when the run finishes (success or failure). Useful for Zapier, Make, and n8n integrations |
Input examples

Scrape 50 recent posts from a publication

```json
{
  "startUrls": [{ "url": "https://www.astralcodexten.com/" }],
  "maxItems": 50,
  "scrapeFormat": "markdown"
}
```

Scrape specific posts

```json
{
  "startUrls": [{ "url": "https://www.astralcodexten.com/p/seiu-delenda-est" }],
  "scrapeFormat": "html"
}
```

Metadata only — fastest option, no body content

```json
{
  "startUrls": [
    { "url": "https://www.astralcodexten.com/" },
    { "url": "https://platformer.news/" }
  ],
  "maxItems": 500,
  "scrapeFormat": "none"
}
```
Substack Scraper output
Each scraped post is stored as a single JSON record in the Actor's dataset:
```json
{
  "url": "https://www.astralcodexten.com/p/seiu-delenda-est",
  "publicationName": "Astral Codex Ten",
  "authorName": "Scott Alexander",
  "title": "SEIU Delenda Est",
  "subtitle": "",
  "postDate": "2024-01-15T10:00:00.000Z",
  "likes": 551,
  "comments": 655,
  "isPaywalled": false,
  "body": "# SEIU Delenda Est\n\nPost content in markdown..."
}
```
| Field | Type | Description |
|---|---|---|
| url | String | Canonical post URL |
| publicationName | String | Name of the Substack publication |
| authorName | String | Author's display name |
| title | String | Post title |
| subtitle | String | Post subtitle (if present) |
| postDate | String | ISO 8601 publish timestamp |
| likes | Number | Total reactions across all 8 reaction types (❤ 👍 🎉 🔥 😂 😮 😢 😡) |
| comments | Number | Number of comments |
| isPaywalled | Boolean | true if the post requires a paid subscription to read in full |
| body | String \| null | Post content in the requested format; null when scrapeFormat is "none" |
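To illustrate consuming these fields downstream, here is a minimal Python sketch that ranks downloaded records by engagement. The sample records are invented for the example and only use fields documented above:

```python
# Invented sample records shaped like the Actor's output fields
posts = [
    {"title": "Post A", "likes": 551, "comments": 655, "isPaywalled": False},
    {"title": "Post B", "likes": 120, "comments": 30, "isPaywalled": True},
]

def engagement(post):
    """Total reactions plus comment count for a single post record."""
    return post["likes"] + post["comments"]

# Rank posts by total engagement, most engaged first
ranked = sorted(posts, key=engagement, reverse=True)

# Keep only posts that are free to read in full
free_posts = [p for p in ranked if not p["isPaywalled"]]

print(ranked[0]["title"])  # most engaged post
print(len(free_posts))     # how many are free to read
```

The same pattern works on real dataset items fetched via the Apify client, since each record carries the fields from the table above.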
How much will it cost to scrape Substack?
This Actor uses Pay Per Result pricing — you are charged per post scraped, not per compute time.
Apify gives you $5 in free usage credits every month on the Apify Free plan — enough to scrape around 2,500 Substack posts per month at no cost.
If you need Substack data regularly, an Apify subscription is the better deal. The $49/month Personal plan covers up to 25,000 posts per month, and the $499/month Team plan covers 250,000+.
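Because Pay Per Result pricing is linear in result count, estimating a job's cost is simple arithmetic. A quick helper, using the $2.00 per 1,000 posts rate stated above:

```python
PRICE_PER_1000 = 2.00  # USD, from the Actor's Pay Per Result pricing

def estimate_cost(num_posts: int) -> float:
    """Estimated USD cost of a run that returns num_posts results."""
    return num_posts / 1000 * PRICE_PER_1000

print(estimate_cost(2_500))   # 5.0  -> covered by the free monthly credit
print(estimate_cost(24_500))  # 49.0 -> roughly the Personal plan's allowance
```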
What are the limitations of Substack Scraper?
- Paywalled content — only free-preview text is available for paid-only posts; full body requires a subscriber session, which is not supported
- Rate limiting — Substack may throttle aggressive scraping; the Actor uses automatic IP rotation via datacenter proxies to mitigate this
- Frontend changes — if Substack modifies their internal page structure, the Actor may need an update
- Custom domains — most custom-domain Substacks work correctly; a small number with non-standard configurations may not
- Comment content — only the comment count is extracted; individual comment text is not supported
Is it legal to scrape Substack?
This Actor only accesses publicly available posts and metadata. Paywalled content is never extracted. Web scraping of publicly accessible data is generally considered lawful in most jurisdictions for research, journalism, and personal use.
Note that personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.
You are responsible for complying with Substack's Terms of Service and applicable laws in your jurisdiction. We also recommend that you read our blog post: is web scraping legal?
Scrape Substack with the Apify API
You can trigger this Actor and download results programmatically using the Apify API. See the API tab on this Actor's page for ready-to-use code examples in JavaScript and Python, or check out the Apify API reference for full details.
Substack Scraper integrations
This Actor works with any platform that supports webhooks or the Apify API:
- Zapier / Make / n8n — use the webhookUrl input field to receive a POST notification when the run finishes, then pass the actorRunId to the Apify API to fetch your results
- Apify Integrations tab — configure webhooks, scheduled runs, and connections to Google Sheets, Slack, Airtable, and more directly in the Apify Console without writing code
- REST API — start a run, poll for completion, and download the dataset via the Apify API v2
API example — Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("sleek_waveform/substack-creator-scraper").call(run_input={
    "startUrls": [{"url": "https://www.astralcodexten.com/"}],
    "maxItems": 100,
    "scrapeFormat": "markdown",
})

posts = client.dataset(run["defaultDatasetId"]).list_items().items
for post in posts:
    print(f"{post['postDate'][:10]} | {post['likes']} likes | {post['title']}")
```
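If you prefer plain HTTP over the client library, here is a minimal stdlib sketch using the Apify API v2 synchronous run endpoint, which starts the run, waits, and returns dataset items in one call. The endpoint name and the convention of writing `~` in place of `/` in the actor ID follow the Apify API v2 docs; verify both against the current API reference before relying on them:

```python
import json
import urllib.parse
import urllib.request

APIFY_TOKEN = "YOUR_APIFY_TOKEN"
ACTOR_ID = "sleek_waveform~substack-creator-scraper"  # "/" is written as "~" in API URLs

def run_sync_url(actor_id: str, token: str) -> str:
    """Endpoint that starts a run, waits for it, and returns the dataset items."""
    query = urllib.parse.urlencode({"token": token, "format": "json"})
    return f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items?{query}"

def scrape(start_url: str, max_items: int = 100):
    """POST the run input and return the parsed JSON dataset items."""
    payload = json.dumps({
        "startUrls": [{"url": start_url}],
        "maxItems": max_items,
        "scrapeFormat": "markdown",
    }).encode()
    req = urllib.request.Request(
        run_sync_url(ACTOR_ID, APIFY_TOKEN),
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)

# Requires network access and a valid token:
# posts = scrape("https://www.astralcodexten.com/", max_items=20)
```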
FAQ about Substack Scraper
Does this Actor require a Substack account or login?
No. It only extracts publicly available posts and metadata — no login credentials, session cookies, or Substack API key are required.
Can I scrape paid/paywalled posts?
Only the free-preview portion of paywalled posts is accessible. Full body content behind a paid subscription wall is not extracted. The isPaywalled field tells you whether a post is behind a paywall.
How do I scrape the full archive of a newsletter?
Set maxItems to a high number (e.g., 1000) and point startUrls to the publication homepage (e.g., https://platformer.news/). The scraper auto-paginates through the entire archive until it hits maxItems or exhausts all posts.
Can I scrape multiple publications at once?
Yes. Add multiple URLs to startUrls. Each publication is scraped independently, and all results land in the same dataset with publicationName as a filter column.
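Splitting the combined dataset back out by publication takes only a few lines. A sketch, with sample records invented for the example:

```python
from collections import defaultdict

# Invented sample records shaped like the Actor's output
posts = [
    {"publicationName": "Astral Codex Ten", "title": "Post A"},
    {"publicationName": "Platformer", "title": "Post B"},
    {"publicationName": "Astral Codex Ten", "title": "Post C"},
]

# Group records under their publication name
by_publication = defaultdict(list)
for post in posts:
    by_publication[post["publicationName"]].append(post)

for name, items in sorted(by_publication.items()):
    print(f"{name}: {len(items)} posts")
```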
What format does the body field use?
Your choice: "markdown" (clean prose, good for LLMs and vector databases), "html" (preserves formatting for display), or "none" (metadata only — fastest option for engagement analysis without needing body text).
How many posts can I scrape on the free plan?
With Apify's $5 monthly free credit, approximately 2,500 posts per month at no cost.
Does it scrape reader comments?
Comment count is extracted (comments field), but individual comment text is not — Substack serves comments via a separate authenticated endpoint.
How do I monitor new posts from a publication weekly?
Set up a scheduled run on Apify: Actor page → Schedule → weekly. Filter for posts newer than a specific date by combining maxItems: 20 (which always returns the most recent) with the postDate field in your downstream processing.
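The downstream date filter can be sketched like this. The sample records are invented, and the parsing assumes the ISO 8601 postDate format shown in the output table:

```python
from datetime import datetime, timedelta, timezone

# Invented sample records; postDate is ISO 8601 as in the Actor's output
posts = [
    {"title": "Old post", "postDate": "2024-01-15T10:00:00.000Z"},
    {"title": "New post", "postDate": "2099-01-01T10:00:00.000Z"},
]

cutoff = datetime.now(timezone.utc) - timedelta(days=7)

def published_after(post, cutoff):
    """Parse the ISO timestamp and compare it to a cutoff datetime."""
    # Replace the Z suffix so fromisoformat accepts it on Python < 3.11
    published = datetime.fromisoformat(post["postDate"].replace("Z", "+00:00"))
    return published > cutoff

new_posts = [p for p in posts if published_after(p, cutoff)]
print([p["title"] for p in new_posts])  # only posts from the last 7 days
```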
Can I use this for LLM training data?
Yes. The "markdown" output format produces clean, boilerplate-free prose ideal for LLM fine-tuning and RAG pipelines. Pair with the Website to Markdown Scraper to build multi-source AI training datasets.
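As a starting point for a RAG pipeline, here is a naive paragraph-packing chunker for the markdown bodies. This is a sketch only; production pipelines usually use tokenizer-aware splitters:

```python
def chunk_markdown(body: str, max_chars: int = 1000):
    """Naive chunker: split on blank lines, then pack paragraphs up to max_chars."""
    chunks, current = [], ""
    for para in body.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# Synthetic markdown body: a heading plus five long paragraphs
sample = "# Title\n\n" + "\n\n".join(f"Paragraph {i}. " + "x" * 300 for i in range(5))
print(len(chunk_markdown(sample, max_chars=700)))  # prints 3
```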
High-value Substack publications to scrape
| Category | Example publications |
|---|---|
| AI / Tech | Stratechery, Import AI, The Batch, AI Supremacy |
| Finance | The Diff, Money Stuff (Bloomberg), Odd Lots |
| Media / Politics | Semafor, Platformer, The Atlantic |
| Growth / Startups | Lenny's Newsletter, The Generalist, Not Boring |
| Newsletter operators | The Rebooting, Inbox Collective |
Other sleek_waveform Actors you might like
- Website to Markdown Scraper — crawl any website and extract clean Markdown for RAG pipelines. Pairs with Substack Scraper to build multi-source LLM datasets.
- Threads Profile & Post Scraper — scrape Threads posts, hashtags, and engagement metrics. Many Substack writers cross-post to Threads — combine both scrapers for a full picture of a creator's reach.
- YouTube Trend Scraper — track trending YouTube videos by keyword. Compare Substack newsletter topics against what's gaining traction on YouTube for cross-platform content strategy.
Found this Actor useful? Leave a review on the Apify Store — it takes 30 seconds and helps other developers discover it.