Substack Scraper — Posts, Authors & Newsletters
Pricing: $4.99/month + usage
Extract Substack newsletter content. Get post titles, authors, publish dates, paywall status, subscriber counts, and full article text. Ideal for newsletter research and content monitoring. PPE pricing — pay only for results.
Rating: 0.0 (0)
Developer: Web Data Labs
Actor stats: 0 bookmarked · 17 total users · 1 monthly active user · last modified a day ago
Substack Scraper — Posts, Comments & Publication Data
Extract structured data from any Substack newsletter at scale. Scrape posts with full article text, reader comments, and publication metadata — no login required. Export to JSON, CSV, or Excel with a single click.
Why Use This Scraper?
Substack has grown into one of the most important platforms for independent journalism, thought leadership, and niche expertise. With over 35 million active subscriptions and 17,000+ paid writers, it's a goldmine for researchers, marketers, and analysts — but Substack offers no bulk export or public API.
This actor solves that. It programmatically extracts posts, comments, and publication info from any Substack newsletter, giving you clean, structured data ready for analysis.
Key Features
- Three scrape modes: Posts, comments, and publication info
- Search across Substack: Find posts by keyword across the entire platform
- Publication-specific scraping: Target one or more newsletters by subdomain
- Full article text: Optionally include the complete body text of each post
- Flexible sorting: Sort by newest or top-performing posts
- Scale control: Scrape from 1 to 500 items per run
- No authentication needed: Works without any Substack account
- Multiple export formats: JSON, CSV, Excel, XML, HTML
Use Cases
1. Content Research & Competitive Analysis
Track what topics are trending across newsletters in your industry. Monitor competitors' publishing frequency, engagement, and content strategy.
2. Media Monitoring & PR Intelligence
Set up regular scrapes to track mentions of your brand, product, or industry across Substack newsletters. Stay ahead of narratives before they hit mainstream media.
3. Academic & Market Research
Collect large datasets of expert opinion pieces, industry analysis, and commentary for qualitative research. Study how narratives form and spread through independent media.
4. Newsletter Discovery & Curation
Search for newsletters covering specific topics, then scrape their publication info to evaluate subscriber counts, posting cadence, and content quality.
5. Sentiment & Trend Analysis
Extract posts about specific topics or companies, then run NLP or sentiment analysis on the text. Detect shifts in expert opinion over time.
6. Lead Generation for B2B
Find Substack authors writing about your domain and extract their publication details. These are high-value contacts who are actively engaged in your space.
7. Content Repurposing & Summarization
Pull posts from newsletters you subscribe to and feed them into LLMs for summarization, translation, or content repurposing workflows.
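For the analysis-oriented use cases above, the scraped items can be aggregated with a few lines of Python. This is a minimal sketch (not part of the actor itself) that rolls posts up into per-publication engagement totals, using the `publicationSubdomain` and `likes` field names from the sample output further down:

```python
from collections import defaultdict

def engagement_by_publication(items):
    """Aggregate scraped post dicts into per-publication totals.

    Assumes each item has the 'publicationSubdomain' and 'likes'
    fields shown in the actor's sample posts output.
    """
    totals = defaultdict(lambda: {"posts": 0, "likes": 0})
    for post in items:
        pub = post.get("publicationSubdomain", "unknown")
        totals[pub]["posts"] += 1
        totals[pub]["likes"] += post.get("likes", 0)
    return dict(totals)

# Illustrative items, shaped like the actor's posts output:
items = [
    {"publicationSubdomain": "platformer", "likes": 847},
    {"publicationSubdomain": "platformer", "likes": 120},
    {"publicationSubdomain": "thebrowser", "likes": 55},
]
print(engagement_by_publication(items))
```

The same pattern extends naturally to posting cadence (count by month of `publishedAt`) or word-count distributions.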
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `publications` | Array of strings | No | — | Substack subdomains to scrape (e.g., `platformer` for platformer.substack.com) |
| `searchQuery` | String | No | — | Search keyword to find posts across all of Substack |
| `scrapeType` | String | No | `posts` | What to scrape: `posts`, `comments`, or `info` |
| `maxItems` | Integer | No | 50 | Maximum number of items to return (1–500) |
| `sortBy` | String | No | `new` | Sort order: `new` (newest first) or `top` (most popular) |
| `includeBodyText` | Boolean | No | `false` | Include the full body text of each post |
Tip: Use `publications` to target specific newsletters, or `searchQuery` to search across the entire platform. You can combine both.
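For example, a run input combining both targeting options might look like the following (the specific values are illustrative):

```json
{
  "publications": ["platformer", "thebrowser"],
  "searchQuery": "artificial intelligence",
  "scrapeType": "posts",
  "maxItems": 100,
  "sortBy": "top",
  "includeBodyText": false
}
```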
Sample Output
Posts Output
```json
{
  "title": "The AI Trust Crisis",
  "subtitle": "Why users are losing faith in AI-generated content",
  "slug": "the-ai-trust-crisis",
  "publishedAt": "2026-03-01T10:30:00.000Z",
  "canonicalUrl": "https://platformer.substack.com/p/the-ai-trust-crisis",
  "author": "Casey Newton",
  "publicationName": "Platformer",
  "publicationSubdomain": "platformer",
  "likes": 847,
  "comments": 132,
  "wordCount": 2450,
  "isPaywalled": false,
  "previewText": "The past month has brought a reckoning for AI companies...",
  "coverImage": "https://substackcdn.com/image/fetch/...",
  "tags": ["AI", "trust", "technology"]
}
```
Comments Output
```json
{
  "body": "This is exactly what I've been seeing in my industry...",
  "author": "John Reader",
  "date": "2026-03-01T14:22:00.000Z",
  "likes": 23,
  "postTitle": "The AI Trust Crisis",
  "publicationSubdomain": "platformer"
}
```
Publication Info Output
```json
{
  "name": "Platformer",
  "subdomain": "platformer",
  "description": "Tech and democracy coverage",
  "authorName": "Casey Newton",
  "heroImage": "https://substackcdn.com/image/fetch/...",
  "logoUrl": "https://substackcdn.com/image/fetch/...",
  "themeColor": "#FF6719",
  "subscriberCount": 250000,
  "postCount": 1200
}
```
Integration Examples
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "publications": ["platformer", "thebrowser"],
    "scrapeType": "posts",
    "maxItems": 50,
    "sortBy": "new",
    "includeBodyText": True,
}

run = client.actor("cryptosignals/substack-scraper").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item.get('likes', 0)} likes")
```
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
    publications: ["platformer", "thebrowser"],
    scrapeType: "posts",
    maxItems: 50,
    sortBy: "new",
    includeBodyText: true,
};

const run = await client.actor("cryptosignals/substack-scraper").call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();

items.forEach(item => {
    console.log(`${item.title} — ${item.likes || 0} likes`);
});
```
Using the Apify API Directly
```bash
curl -X POST "https://api.apify.com/v2/acts/cryptosignals~substack-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"publications": ["platformer"], "scrapeType": "posts", "maxItems": 20}'
```
Pricing & Costs
This actor runs on the Apify platform using your account's compute units (CUs).
| Scenario | Estimated Cost |
|---|---|
| 50 posts from one publication | ~$0.01–$0.02 |
| 200 posts from multiple publications | ~$0.05–$0.10 |
| 500 posts with full body text | ~$0.10–$0.25 |
Costs depend on the number of items, whether body text is included (larger payloads), and the Apify plan you're on. Free plan users get $5/month in platform credits — enough for hundreds of scrapes.
Tips for Best Results
- Start small: Set `maxItems` to 5–10 for your first run to verify the output format meets your needs.
- Use publication subdomains: For platformer.substack.com, enter just `platformer` in the publications list.
- Enable body text selectively: Full article text significantly increases output size. Only enable it when you need the content for analysis.
- Combine with Apify integrations: Send results directly to Google Sheets, Slack, Zapier, Make, or webhooks for automated workflows.
- Schedule regular runs: Set up recurring scrapes to build longitudinal datasets or monitor newsletters over time.
Frequently Asked Questions
Can I scrape paywalled/subscriber-only posts?
The scraper extracts publicly available data. For paywalled posts, you'll get the title, preview text, metadata, and publication info, but not the full subscriber-only content.
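If your analysis should only cover freely readable articles, you can filter on the `isPaywalled` flag from the posts output after the run. A minimal sketch:

```python
def free_posts(items):
    """Keep only posts whose 'isPaywalled' flag (from the actor's
    posts output) is false or missing."""
    return [p for p in items if not p.get("isPaywalled", False)]

# Illustrative items, shaped like the actor's posts output:
posts = [
    {"title": "Open post", "isPaywalled": False},
    {"title": "Subscriber-only post", "isPaywalled": True},
]
print([p["title"] for p in free_posts(posts)])  # ['Open post']
```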
How do I find a publication's subdomain?
Look at the newsletter URL. For https://platformer.substack.com, the subdomain is platformer. For custom domains, check the Substack about page.
Can I scrape custom domain Substack newsletters?
Yes. Use the publication's original Substack subdomain (before they switched to a custom domain). You can usually find it referenced on their about page or through a web search.
How often is the data updated?
Every run fetches live data directly from Substack. You always get the latest posts, comments, and metrics.
Is there a rate limit?
The scraper handles rate limiting automatically with built-in delays and retries. You don't need to configure anything.
Can I search for posts about a specific topic?
Yes! Use the searchQuery parameter to search across all of Substack, or combine it with publications to search within specific newsletters.
What export formats are available?
Apify supports JSON, CSV, Excel (XLSX), XML, HTML, and RSS. You can download in any format from the dataset tab after a run completes.
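Exports can also be fetched programmatically via Apify's standard dataset items endpoint, where the `format` query parameter selects the output type. A sketch that just builds the download URL (the dataset ID and token here are placeholders):

```python
from urllib.parse import urlencode

def dataset_export_url(dataset_id: str, token: str, fmt: str = "csv") -> str:
    """Build the Apify dataset export URL for a given format
    (e.g. json, csv, xlsx, xml, html, rss)."""
    query = urlencode({"format": fmt, "token": token})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"

# Placeholder dataset ID and token:
print(dataset_export_url("abc123", "YOUR_API_TOKEN", "xlsx"))
```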
How do I integrate this with my existing workflow?
Use Apify's built-in integrations (Zapier, Make, Google Sheets, webhooks) or call the API directly from any programming language. See the code examples above.
Can I run this on a schedule?
Yes. Apify supports cron-like scheduling. Set up daily, weekly, or custom schedules from the actor's Schedules tab. Each run stores results in a new dataset.
What happens if a publication doesn't exist?
The scraper will log a warning for invalid subdomains and continue processing the remaining publications. Your run won't fail because of one bad input.