Substack Scraper — Posts, Authors & Newsletters

Extract Substack newsletter content. Get post titles, authors, publish dates, paywall status, subscriber counts, and full article text. Ideal for newsletter research and content monitoring. PPE pricing — pay only for results.

Pricing: $4.99/month + usage
Rating: 0.0 (0 reviews)
Developer: Web Data Labs (Maintained by Community)
Actor stats: 0 bookmarks · 17 total users · 1 monthly active user · last modified a day ago

Substack Scraper — Posts, Comments & Publication Data

Extract structured data from any Substack newsletter at scale. Scrape posts with full article text, reader comments, and publication metadata — no login required. Export to JSON, CSV, or Excel with a single click.

Why Use This Scraper?

Substack has grown into one of the most important platforms for independent journalism, thought leadership, and niche expertise. With over 35 million active subscriptions and 17,000+ paid writers, it's a goldmine for researchers, marketers, and analysts — but Substack offers no bulk export or public API.

This actor solves that. It programmatically extracts posts, comments, and publication info from any Substack newsletter, giving you clean, structured data ready for analysis.

Key Features

  • Three scrape modes: Posts, comments, and publication info
  • Search across Substack: Find posts by keyword across the entire platform
  • Publication-specific scraping: Target one or more newsletters by subdomain
  • Full article text: Optionally include the complete body text of each post
  • Flexible sorting: Sort by newest or top-performing posts
  • Scale control: Scrape from 1 to 500 items per run
  • No authentication needed: Works without any Substack account
  • Multiple export formats: JSON, CSV, Excel, XML, HTML

Use Cases

1. Content Research & Competitive Analysis

Track what topics are trending across newsletters in your industry. Monitor competitors' publishing frequency, engagement, and content strategy.

2. Media Monitoring & PR Intelligence

Set up regular scrapes to track mentions of your brand, product, or industry across Substack newsletters. Stay ahead of narratives before they hit mainstream media.

3. Academic & Market Research

Collect large datasets of expert opinion pieces, industry analysis, and commentary for qualitative research. Study how narratives form and spread through independent media.

4. Newsletter Discovery & Curation

Search for newsletters covering specific topics, then scrape their publication info to evaluate subscriber counts, posting cadence, and content quality.

5. Sentiment & Trend Analysis

Extract posts about specific topics or companies, then run NLP or sentiment analysis on the text. Detect shifts in expert opinion over time.
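To illustrate the shape of this workflow, here is a deliberately naive keyword-based sentiment scorer over scraped post text. The keyword sets are arbitrary placeholders; a real pipeline would use a proper NLP library (VADER, a transformer model, etc.) on the scraped titles, previews, or full body text.

```python
# Naive keyword-based sentiment scoring over scraped Substack post text.
# The keyword lists below are illustrative placeholders, not a real lexicon.

POSITIVE = {"growth", "optimism", "breakthrough", "trust", "win"}
NEGATIVE = {"crisis", "losing", "reckoning", "decline", "risk"}

def sentiment_score(text: str) -> int:
    """Positive minus negative keyword hits; > 0 leans positive."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)

# Items shaped like the actor's Posts output (title + previewText).
posts = [
    {"title": "The AI Trust Crisis",
     "previewText": "The past month has brought a reckoning for AI companies..."},
    {"title": "Signs of Optimism",
     "previewText": "A breakthrough quarter points to real growth..."},
]

for post in posts:
    score = sentiment_score(post["title"] + " " + post["previewText"])
    print(post["title"], score)
```

Running scores like this over posts collected on a schedule is one way to chart how coverage of a topic shifts over time.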

6. Lead Generation for B2B

Find Substack authors writing about your domain and extract their publication details. These are high-value contacts who are actively engaged in your space.

7. Content Repurposing & Summarization

Pull posts from newsletters you subscribe to and feed them into LLMs for summarization, translation, or content repurposing workflows.
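Long posts usually need to be split into chunks that fit an LLM's context window before summarization. A minimal character-based chunker (the chunk size and overlap are arbitrary, and the LLM call itself is omitted):

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character chunks for LLM processing."""
    if len(text) <= max_chars:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        # Step forward, keeping `overlap` characters of shared context.
        start += max_chars - overlap
    return chunks

# With includeBodyText enabled, each item carries the full article text.
body = "word " * 2000  # stand-in for a scraped post body (~10,000 chars)
chunks = chunk_text(body)
print(len(chunks), "chunks")
```

Token-aware splitting (e.g., with a tokenizer matched to your model) is more precise, but character chunking is often good enough for summarization workflows.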

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| publications | Array of strings | No | | Substack subdomains to scrape (e.g., platformer for platformer.substack.com) |
| searchQuery | String | No | | Search keyword to find posts across all of Substack |
| scrapeType | String | No | posts | What to scrape: posts, comments, or info |
| maxItems | Integer | No | 50 | Maximum items to return (1–500) |
| sortBy | String | No | new | Sort order: new (newest first) or top (most popular) |
| includeBodyText | Boolean | No | false | Include the full body text of each post |

Tip: Use publications to target specific newsletters, or searchQuery to search across the entire platform. You can combine both.
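For instance, a run input that combines both targeting options might look like this (all values are illustrative):

```python
# Illustrative run input combining publication targeting with a keyword search.
run_input = {
    "publications": ["platformer"],   # restrict the search to these newsletters
    "searchQuery": "AI regulation",   # keyword to match within those publications
    "scrapeType": "posts",            # one of: "posts", "comments", "info"
    "maxItems": 25,                   # 1-500; defaults to 50 when omitted
    "sortBy": "top",                  # "new" (default) or "top"
    "includeBodyText": False,         # defaults to False
}
print(run_input)
```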

Sample Output

Posts Output

{
  "title": "The AI Trust Crisis",
  "subtitle": "Why users are losing faith in AI-generated content",
  "slug": "the-ai-trust-crisis",
  "publishedAt": "2026-03-01T10:30:00.000Z",
  "canonicalUrl": "https://platformer.substack.com/p/the-ai-trust-crisis",
  "author": "Casey Newton",
  "publicationName": "Platformer",
  "publicationSubdomain": "platformer",
  "likes": 847,
  "comments": 132,
  "wordCount": 2450,
  "isPaywalled": false,
  "previewText": "The past month has brought a reckoning for AI companies...",
  "coverImage": "https://substackcdn.com/image/fetch/...",
  "tags": ["AI", "trust", "technology"]
}
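Items in this shape are easy to post-process once the dataset is downloaded. For instance, ranking the free (non-paywalled) posts by a simple engagement score (the sample items below are illustrative):

```python
# Rank non-paywalled posts by a simple engagement score (likes + comments).
posts = [
    {"title": "The AI Trust Crisis", "likes": 847, "comments": 132, "isPaywalled": False},
    {"title": "Subscriber Q&A", "likes": 300, "comments": 45, "isPaywalled": True},
    {"title": "Weekly Roundup", "likes": 120, "comments": 10, "isPaywalled": False},
]

free_posts = [p for p in posts if not p["isPaywalled"]]
free_posts.sort(key=lambda p: p["likes"] + p["comments"], reverse=True)

for p in free_posts:
    print(p["title"], p["likes"] + p["comments"])
```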

Comments Output

{
  "body": "This is exactly what I've been seeing in my industry...",
  "author": "John Reader",
  "date": "2026-03-01T14:22:00.000Z",
  "likes": 23,
  "postTitle": "The AI Trust Crisis",
  "publicationSubdomain": "platformer"
}

Publication Info Output

{
  "name": "Platformer",
  "subdomain": "platformer",
  "description": "Tech and democracy coverage",
  "authorName": "Casey Newton",
  "heroImage": "https://substackcdn.com/image/fetch/...",
  "logoUrl": "https://substackcdn.com/image/fetch/...",
  "themeColor": "#FF6719",
  "subscriberCount": 250000,
  "postCount": 1200
}

Integration Examples

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "publications": ["platformer", "thebrowser"],
    "scrapeType": "posts",
    "maxItems": 50,
    "sortBy": "new",
    "includeBodyText": True,
}

run = client.actor("cryptosignals/substack-scraper").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']}: {item.get('likes', 0)} likes")

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
    publications: ["platformer", "thebrowser"],
    scrapeType: "posts",
    maxItems: 50,
    sortBy: "new",
    includeBodyText: true,
};

const run = await client.actor("cryptosignals/substack-scraper").call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(`${item.title}: ${item.likes || 0} likes`);
});

Using the Apify API Directly

curl -X POST "https://api.apify.com/v2/acts/cryptosignals~substack-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "publications": ["platformer"],
    "scrapeType": "posts",
    "maxItems": 20
  }'

Pricing & Costs

This actor runs on the Apify platform using your account's compute units (CUs).

| Scenario | Estimated Cost |
|---|---|
| 50 posts from one publication | ~$0.01–$0.02 |
| 200 posts from multiple publications | ~$0.05–$0.10 |
| 500 posts with full body text | ~$0.10–$0.25 |

Costs depend on the number of items, whether body text is included (larger payloads), and the Apify plan you're on. Free plan users get $5/month in platform credits — enough for hundreds of scrapes.

Tips for Best Results

  1. Start small: Set maxItems to 5–10 for your first run to verify the output format meets your needs.
  2. Use publication subdomains: For platformer.substack.com, enter just platformer in the publications list.
  3. Enable body text selectively: Full article text significantly increases output size. Only enable it when you need the content for analysis.
  4. Combine with Apify integrations: Send results directly to Google Sheets, Slack, Zapier, Make, or webhooks for automated workflows.
  5. Schedule regular runs: Set up recurring scrapes to build longitudinal datasets or monitor newsletters over time.
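When building longitudinal datasets from recurring runs (tip 5), the same post will appear in the output of multiple runs. Deduplicating on the slug field keeps one record per post; a minimal sketch over already-downloaded run results:

```python
# Merge items from several runs, keeping the first occurrence of each slug.
def dedupe_by_slug(runs: list[list[dict]]) -> list[dict]:
    seen, merged = set(), []
    for run_items in runs:
        for item in run_items:
            if item["slug"] not in seen:
                seen.add(item["slug"])
                merged.append(item)
    return merged

# Illustrative results from two scheduled runs on consecutive days.
monday = [{"slug": "the-ai-trust-crisis"}, {"slug": "weekly-roundup"}]
tuesday = [{"slug": "weekly-roundup"}, {"slug": "a-new-post"}]
print(len(dedupe_by_slug([monday, tuesday])), "unique posts")
```

Keeping the first occurrence preserves the earliest-seen metrics; keep the last instead if you want the most recent like and comment counts.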

Frequently Asked Questions

Can I scrape paywalled/subscriber-only posts?

The scraper extracts publicly available data. For paywalled posts, you'll get the title, preview text, metadata, and publication info, but not the full subscriber-only content.

How do I find a publication's subdomain?

Look at the newsletter URL. For https://platformer.substack.com, the subdomain is platformer. For custom domains, check the Substack about page.

Can I scrape custom domain Substack newsletters?

Yes. Use the publication's original Substack subdomain (before they switched to a custom domain). You can usually find it referenced on their about page or through a web search.

How often is the data updated?

Every run fetches live data directly from Substack. You always get the latest posts, comments, and metrics.

Is there a rate limit?

The scraper handles rate limiting automatically with built-in delays and retries. You don't need to configure anything.

Can I search for posts about a specific topic?

Yes! Use the searchQuery parameter to search across all of Substack, or combine it with publications to search within specific newsletters.

What export formats are available?

Apify supports JSON, CSV, Excel (XLSX), XML, HTML, and RSS. You can download in any format from the dataset tab after a run completes.

How do I integrate this with my existing workflow?

Use Apify's built-in integrations (Zapier, Make, Google Sheets, webhooks) or call the API directly from any programming language. See the code examples above.

Can I run this on a schedule?

Yes. Apify supports cron-like scheduling. Set up daily, weekly, or custom schedules from the actor's Schedules tab. Each run stores results in a new dataset.

What happens if a publication doesn't exist?

The scraper will log a warning for invalid subdomains and continue processing the remaining publications. Your run won't fail because of one bad input.