
Substack Scraper - Newsletter Posts and Authors

Pricing: Pay per usage


Free Substack scraper. Extract newsletter posts, authors, and subscribers at scale. No API key needed. Export to JSON, CSV, or Excel. Built for newsletter intelligence and lead generation.


Developer: CryptoSignals Agent

Maintained by Community

Actor stats: 0 bookmarks, 2 total users, 1 monthly active user. Last modified 2 hours ago.


Substack Scraper

Scrape Substack newsletter posts, authors, and publication data at scale. No API key or login needed.

What It Does

  • Scrape newsletter posts from any public Substack publication
  • Extract post metadata: title, author, date, engagement metrics, paywall status, word count
  • Get publication info: name, description, author, logo, custom domain
  • Search publications by keyword to discover newsletters
  • Pagination built-in: scrape hundreds or thousands of posts automatically
  • Export to JSON, CSV, or Excel

Use Cases

  • Newsletter Intelligence: Track what topics competitors cover, how often they publish, and which posts get the most engagement
  • B2B Lead Generation: Find newsletter authors and publications in your industry for outreach and partnerships
  • Content Analysis: Analyze publishing frequency, word counts, and engagement patterns across newsletters
  • Competitive Research: Monitor competitor newsletters for content strategy insights
  • Journalist Research: Find expert voices and trending topics across Substack's ecosystem
  • Market Research: Discover which newsletter niches have the most engagement and subscriber interest

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| publications | Array of strings | [] | Substack publication subdomains (e.g., ["platformer", "thebrowser"]) |
| searchQuery | String | "" | Search for publications by keyword |
| scrapeType | String | "posts" | What to scrape: "posts", "publications", or "both" |
| maxItems | Integer | 50 | Max items per publication (0 = unlimited) |
| sortBy | String | "new" | Sort posts: "new" or "top" |
| includeBodyText | Boolean | false | Include body text excerpt in output |
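
Before starting a run, it can help to check an input object against the options in the table. The Python sketch below does that. `build_run_input` is a hypothetical helper (not part of the actor or of `apify-client`), and the rule that either `publications` or `searchQuery` must be set is an assumption, since the table does not state it.

```python
from typing import Any, Dict, List, Optional

# Allowed values, taken from the Input Parameters table.
SCRAPE_TYPES = {"posts", "publications", "both"}
SORT_OPTIONS = {"new", "top"}


def build_run_input(
    publications: Optional[List[str]] = None,
    search_query: str = "",
    scrape_type: str = "posts",
    max_items: int = 50,
    sort_by: str = "new",
    include_body_text: bool = False,
) -> Dict[str, Any]:
    """Build an actor input dict, raising ValueError on invalid values.

    Hypothetical helper: the actor may apply its own, different validation.
    """
    publications = publications or []
    if scrape_type not in SCRAPE_TYPES:
        raise ValueError(f"scrapeType must be one of {sorted(SCRAPE_TYPES)}")
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_OPTIONS)}")
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 = unlimited)")
    # Assumption: a run needs at least one subdomain or a search query.
    if not publications and not search_query:
        raise ValueError("set publications or searchQuery")
    return {
        "publications": publications,
        "searchQuery": search_query,
        "scrapeType": scrape_type,
        "maxItems": max_items,
        "sortBy": sort_by,
        "includeBodyText": include_body_text,
    }
```

The returned dict can be passed as `run_input` to `client.actor(...).call()` in the Python example under API Integration.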

Example Inputs

Scrape Posts from Multiple Publications

{
  "publications": ["platformer", "thebrowser", "slow-boring"],
  "scrapeType": "posts",
  "maxItems": 25,
  "sortBy": "new"
}

Get Publication Info

{
  "publications": ["noahpinion", "astralcodexten"],
  "scrapeType": "publications"
}

Scrape Posts with Body Text

{
  "publications": ["stratechery"],
  "scrapeType": "both",
  "maxItems": 100,
  "includeBodyText": true
}

Search for AI Newsletters

{
  "searchQuery": "AI",
  "maxItems": 10
}

Output

Post Fields

| Field | Type | Description |
|---|---|---|
| type | String | Always "post" |
| title | String | Post title |
| subtitle | String | Post subtitle |
| slug | String | URL slug |
| postUrl | String | Full URL to the post |
| authorName | String | Author's display name |
| publicationName | String | Publication name |
| publicationUrl | String | Publication URL |
| publishDate | String | ISO 8601 publish date |
| description | String | Post description/excerpt |
| likeCount | Integer | Total reactions/likes |
| restackCount | Integer | Number of restacks |
| commentCount | Integer | Total comments (including replies) |
| wordCount | Integer | Word count |
| isPaid | Boolean | Whether the post is behind a paywall |
| audience | String | "everyone", "only_paid", etc. |
| coverImage | String | Cover image URL |
| canonicalUrl | String | Canonical URL |
| postType | String | "newsletter", "podcast", etc. |
| tags | Array | Post tags |
| bodyTextExcerpt | String | Truncated body text (if includeBodyText is true) |
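
The engagement and date fields support the kind of content analysis described under Use Cases. The sketch below ranks posts by a combined score and estimates the average number of days between posts. The score (likes + restacks + comments) is an arbitrary illustrative choice, and the date parsing assumes `publishDate` is ISO 8601 with a `Z` or a numeric UTC offset.

```python
from datetime import datetime
from typing import Dict, List, Optional


def engagement(post: Dict) -> int:
    # Illustrative score: likes + restacks + comments; missing or null counts count as 0.
    return sum(post.get(k) or 0 for k in ("likeCount", "restackCount", "commentCount"))


def top_posts(posts: List[Dict], n: int = 5) -> List[Dict]:
    # Highest combined engagement first.
    return sorted(posts, key=engagement, reverse=True)[:n]


def avg_days_between_posts(posts: List[Dict]) -> Optional[float]:
    # Assumes publishDate looks like "2024-01-01T09:00:00.000Z" or carries a UTC offset.
    dates = sorted(
        datetime.fromisoformat(p["publishDate"].replace("Z", "+00:00"))
        for p in posts
        if p.get("publishDate")
    )
    if len(dates) < 2:
        return None  # not enough posts to measure a gap
    span_days = (dates[-1] - dates[0]).total_seconds() / 86400
    return span_days / (len(dates) - 1)
```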

Publication Fields

| Field | Type | Description |
|---|---|---|
| type | String | Always "publication" |
| name | String | Publication name |
| subdomain | String | Substack subdomain |
| description | String | Publication description |
| authorName | String | Primary author name |
| authorBio | String | Author biography |
| logoUrl | String | Publication logo URL |
| heroImageUrl | String | Hero/banner image URL |
| publicationUrl | String | Substack URL |
| customDomainUrl | String | Custom domain URL (if set) |
| communityEnabled | Boolean | Whether community features are on |
| paymentsState | String | Payment status |
| twitterHandle | String | Author's Twitter/X handle |
| createdAt | String | Publication creation date |
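
When `scrapeType` is "both", post and publication records can be told apart by the `type` field. This sketch turns publication records into a simple lead list for outreach, preferring `customDomainUrl` over `publicationUrl` when both are present. `lead_rows`, `preferred_url`, and the chosen output columns are hypothetical.

```python
from typing import Dict, List, Optional


def preferred_url(pub: Dict) -> Optional[str]:
    # Prefer the custom domain if one is set; otherwise use the Substack URL.
    return pub.get("customDomainUrl") or pub.get("publicationUrl")


def lead_rows(records: List[Dict]) -> List[Dict]:
    rows = []
    for rec in records:
        if rec.get("type") != "publication":
            continue  # skip post records (e.g. when scrapeType is "both")
        rows.append({
            "name": rec.get("name"),
            "author": rec.get("authorName"),
            "url": preferred_url(rec),
            "twitter": rec.get("twitterHandle") or "",
        })
    return rows
```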

API Integration

JavaScript

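// Top-level await requires an ES module context (e.g. a .mjs file).
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
// Start the actor and wait for the run to finish.
const run = await client.actor('cryptosignals/substack-scraper').call({
    publications: ['platformer', 'thebrowser'],
    scrapeType: 'posts',
    maxItems: 50,
});
// Read the scraped items from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);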

Python

from apify_client import ApifyClient
client = ApifyClient("YOUR_API_TOKEN")
# Start the actor and wait for the run to finish.
run = client.actor("cryptosignals/substack-scraper").call(run_input={
    "publications": ["platformer", "thebrowser"],
    "scrapeType": "posts",
    "maxItems": 50,
})
# Read the scraped posts from the run's default dataset.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
for item in items:
    print(f"{item['title']} - {item['likeCount']} likes")
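
Apify can export datasets directly, but once `items` is in Python you can also write them to CSV locally. A minimal standard-library sketch follows; `rows_to_csv` is a hypothetical helper, and `POST_COLUMNS` is an illustrative subset of the Post Fields, not a required schema.

```python
import csv
import io
from typing import Dict, Iterable, List

# Illustrative subset of the Post Fields; any field names can be used.
POST_COLUMNS = ["title", "authorName", "publishDate", "likeCount", "commentCount", "isPaid", "postUrl"]


def rows_to_csv(items: Iterable[Dict], columns: List[str]) -> str:
    """Render dataset items as CSV text with a header row."""
    buf = io.StringIO()
    # extrasaction="ignore" drops fields not listed in `columns`;
    # fields missing from an item are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buf.getvalue()
```

With the `items` list from the example above: `with open("posts.csv", "w", newline="", encoding="utf-8") as f: f.write(rows_to_csv(items, POST_COLUMNS))`.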

cURL

curl "https://api.apify.com/v2/acts/cryptosignals~substack-scraper/runs?token=YOUR_API_TOKEN" \
-X POST \
-H "Content-Type: application/json" \
-d '{
"publications": ["platformer"],
"scrapeType": "posts",
"maxItems": 10
}'
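
The POST request above starts a run and returns the run object right away; the scraped items then have to be read from the run's default dataset. For short runs, Apify's `run-sync-get-dataset-items` endpoint instead waits for the run to finish and returns the items in the response. A standard-library sketch follows. `build_sync_request` and `run_and_fetch` are hypothetical helper names, and the maximum time the synchronous endpoint will wait is set by Apify (check the API reference).

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"


def build_sync_request(actor_id: str, token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    # In API paths the actor ID uses "~" between username and actor name.
    path_id = actor_id.replace("/", "~")
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(actor_id: str, token: str, run_input: Dict[str, Any],
                  timeout: float = 360) -> List[Dict[str, Any]]:
    # Blocks until the run finishes, then returns the dataset items as parsed JSON.
    request = build_sync_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `run_and_fetch("cryptosignals/substack-scraper", "YOUR_API_TOKEN", {"publications": ["platformer"], "maxItems": 10})` would return the posts as a list of dicts.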

FAQ

Does this need a Substack account or API key? No. This scraper uses Substack's public API endpoints. No authentication required.

Can it scrape paywalled content? No. Paywalled posts are flagged with isPaid: true, but the full content is not accessible. You'll get the title, metadata, and public excerpt.

How do I find a publication's subdomain? Look at the URL: for https://platformer.substack.com, the subdomain is platformer. For publications on a custom domain, look up the publication's original substack.com address.
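
The lookup described above can be automated for `*.substack.com` addresses. A small sketch, using the hypothetical helper name `substack_subdomain`; it returns `None` for custom domains, which still need to be matched to their Substack subdomain by hand.

```python
from typing import Optional
from urllib.parse import urlparse


def substack_subdomain(url: str) -> Optional[str]:
    """Return the subdomain of a *.substack.com URL, or None (e.g. for custom domains)."""
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower()
    suffix = ".substack.com"
    if not host.endswith(suffix):
        return None
    sub = host[: -len(suffix)]
    # Reject empty, nested, or "www" hosts.
    if not sub or "." in sub or sub == "www":
        return None
    return sub
```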

What about publications with custom domains? If a publication has moved to a custom domain, you still use the original Substack subdomain. The scraper will return both the Substack URL and custom domain URL.

Is there a rate limit? The scraper includes built-in rate limiting and retries. It's designed to be respectful of Substack's servers.
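
The scraper's exact rate-limiting and retry policy is not documented here. If you call Substack or the Apify API from your own code, a common pattern is retrying with exponential backoff and jitter, sketched below. This is a generic technique, not the actor's implementation; `with_retries` is hypothetical, and in practice you would retry only on transient errors such as HTTP 429 or 5xx rather than on every exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # in real code, catch only transient errors (e.g. HTTP 429/5xx)
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.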

How many posts can I scrape? There's no hard limit. Set maxItems to control how many posts are scraped per publication, or set it to 0 for no limit.
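
The maxItems semantics (a cap per publication, with 0 meaning no cap) can be illustrated with a generic offset-based pager. This is a sketch under assumptions: `fetch_page(offset, limit)` stands in for whatever listing request is used, and the page size of 20 is arbitrary; neither reflects the actor's actual requests.

```python
from typing import Callable, Dict, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[Dict]],
    max_items: int = 50,
    page_size: int = 20,
) -> Iterator[Dict]:
    """Yield items from an offset-paged listing; max_items=0 means no limit."""
    yielded = 0
    offset = 0
    while not (max_items and yielded >= max_items):
        page = fetch_page(offset, page_size)  # hypothetical request for items at [offset, offset + page_size)
        if not page:
            break  # listing exhausted
        for item in page:
            yield item
            yielded += 1
            if max_items and yielded >= max_items:
                break  # cap reached mid-page
        offset += len(page)
```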

Cost

This actor runs on the Apify platform. Typical runs:

  • 50 posts from 1 publication: ~5 seconds, minimal compute
  • 500 posts from 5 publications: ~30 seconds

Pricing details coming soon.

This scraper accesses publicly available data through Substack's public API. Users are responsible for complying with Substack's terms of service and applicable laws. Do not use scraped data for spam, harassment, or any illegal purpose.