Substack Publication Scraper
Pricing
from $8.25 / 1,000 items
Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.
Developer
ParseForge

📰 Substack Publication Scraper
🚀 Pull every public post from any Substack publication. Title, body preview, author, podcast, paywall flag, comment count, reactions. No login, no API key, no manual scrolling.
🕒 Last updated: 2026-05-01 · 📊 27 fields per post · 📰 millions of newsletters · 🎙️ podcast metadata included · 💎 paid + free posts
The Substack Publication Scraper queries the public Substack archive endpoints for any publication and returns every post in the feed. Each record includes the post title, social title, subtitle, description, slug, canonical URL, publish date, post type, audience flag, paywall status, cover image, podcast duration, word count, reaction count, comment count, restack count, section info, and a truncated body preview.
Substack hosts millions of newsletters and is the largest creator-operated publishing platform on the internet. Top publications cross hundreds of thousands of paid subscribers and rival traditional media in influence. This Actor exports the full archive of any publication in a single run, letting you research content cadence, audience signals, and editorial mix without a manual subscribe-and-scroll workflow.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Newsletter writers, content marketers, ghost writers, journalists, podcasters, researchers | Content research, cadence analysis, audience mining, podcast discovery, competitive benchmarking |
📋 What the Substack Publication Scraper does
Five filtering workflows in a single run:
- 📰 Full archive export. Submit one publication subdomain or custom domain and pull its entire post archive.
- 📅 Date range filter. Pin to a specific year, quarter, or month using `minDate` and `maxDate`.
- 🎙️ Type filter. Restrict to `newsletter`, `podcast`, or `thread` posts.
- 💎 Paywall awareness. Each record flags whether the post is `everyone` (free) or `only_paid` (subscriber-only).
- 🔍 Engagement signals. Comment count, reaction count, restack count, and word count surface engagement patterns.
Each row reports the publication slug, post ID, full title and subtitle, slug, canonical URL, publish timestamp, type, audience, cover image URL, podcast duration when present, word count, engagement counters, and a 200-character body preview.
💡 Why it matters: Substack publications are time-machines for content strategy. Cadence, average word count, paywall ratio, and reaction-to-comment ratios all reveal what resonates. Researchers cite Substack archives in studies of opinion journalism. Ghost writers reverse-engineer voice from existing posts. Content marketers benchmark themselves against the best operators in their niche.
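The metrics named above can be computed directly from an exported dataset. The following is a minimal sketch, assuming the records are loaded as a list of dictionaries with the field names from the output schema (`postDate`, `wordCount`, `isPaid`, `reactionCount`, `commentCount`). The `summarize` helper and its output keys are illustrative names, not part of the Actor.

```python
from datetime import datetime
from statistics import mean


def summarize(posts):
    """Compute cadence, length, paywall, and engagement metrics for post records."""
    # postDate is an ISO 8601 string ending in "Z"; make the offset explicit
    # so datetime.fromisoformat accepts it on older Python versions too.
    dates = sorted(
        datetime.fromisoformat(p["postDate"].replace("Z", "+00:00")) for p in posts
    )
    # Gap in days between consecutive posts.
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(dates, dates[1:])]
    # Nullable counters are skipped or treated as zero.
    word_counts = [p["wordCount"] for p in posts if p.get("wordCount") is not None]
    reactions = sum(p.get("reactionCount") or 0 for p in posts)
    comments = sum(p.get("commentCount") or 0 for p in posts)
    paid = sum(1 for p in posts if p.get("isPaid"))
    return {
        "posts": len(posts),
        "avgDaysBetweenPosts": mean(gaps) if gaps else None,
        "avgWordCount": mean(word_counts) if word_counts else None,
        "paywallRatio": paid / len(posts) if posts else None,
        "reactionsPerComment": reactions / comments if comments else None,
    }
```

Each metric falls back to `None` when there is not enough data to compute it, for example a single post (no cadence) or zero comments (no ratio).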
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| `maxItems` | integer | 10 | Posts to return. Free plan caps at 10, paid plan at 1,000,000. |
| `publication` | string | "lex" | Subdomain (lex) or full custom domain (www.lennysnewsletter.com). |
| `postType` | string | "all" | Filter to newsletter, podcast, thread, or all. |
| `minDate` | string | empty | ISO date YYYY-MM-DD. Only posts on or after this date. |
| `maxDate` | string | empty | ISO date YYYY-MM-DD. Only posts on or before this date. |
Example: 100 most recent posts from a custom-domain publication.
{"maxItems": 100,"publication": "www.lennysnewsletter.com"}
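Example: up to 200 podcast episodes published in 2026. The input has no paywall filter, so select paid episodes afterwards using the `isPaid` field.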
{"maxItems": 200,"publication": "lex","postType": "podcast","minDate": "2026-01-01","maxDate": "2026-12-31"}
⚠️ Good to Know: The Actor normalizes the publication subdomain to lowercase before the request. Paid posts return only the truncated free preview in `truncatedBodyText`. Subscriber-only full body content is not exposed by the public archive endpoint and is out of scope.
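If you build inputs in code, it can help to validate them before starting a run. The sketch below assembles an input object from the five documented fields and rejects values the input table rules out. The `build_input` helper is a hypothetical name, and the client-side checks are an assumption about what is useful to catch early, not a description of the Actor's own validation.

```python
import re
from datetime import date

POST_TYPES = {"all", "newsletter", "podcast", "thread"}
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _check_date(name, value):
    """Return value unchanged if it is None or a real YYYY-MM-DD date."""
    if value is None:
        return None
    if not ISO_DATE.fullmatch(value):
        raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}")
    date.fromisoformat(value)  # raises ValueError for impossible dates
    return value


def build_input(publication, max_items=10, post_type="all",
                min_date=None, max_date=None):
    """Assemble an Actor input dict from the documented fields."""
    if post_type not in POST_TYPES:
        raise ValueError(f"postType must be one of {sorted(POST_TYPES)}")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    min_date = _check_date("minDate", min_date)
    max_date = _check_date("maxDate", max_date)
    # ISO dates compare correctly as plain strings.
    if min_date and max_date and min_date > max_date:
        raise ValueError("minDate must not be after maxDate")
    run_input = {
        "maxItems": max_items,
        # Lowercasing here is harmless; the Actor is documented to do the same.
        "publication": publication.strip().lower(),
        "postType": post_type,
    }
    if min_date:
        run_input["minDate"] = min_date
    if max_date:
        run_input["maxDate"] = max_date
    return run_input
```

Calling `build_input("lex", 200, "podcast", "2026-01-01", "2026-12-31")` produces the same object as the 2026 podcast example above.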
📊 Output
Each post record contains 27 fields. Download as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🏷️ publication | string | "lex" |
| 🆔 postId | integer | 195849359 |
| 📰 title | string | "Analysis: The Machines are working..." |
| 🪧 subtitle | string | "AI capital is being mobilized..." |
| 🔖 slug | string | "analysis-the-machines-are-working" |
| 🔗 url | string | "https://lex.substack.com/p/..." |
| 📅 postDate | ISO 8601 | "2026-04-29T16:14:34.158Z" |
| 🏷️ type | string | "newsletter" |
| 👥 audience | string | "only_paid" |
| 💎 isPaid | boolean | true |
| 🖼️ coverImage | string or null | "https://substackcdn.com/..." |
| 🎙️ podcastDuration | integer or null | 1820 |
| 📝 wordCount | integer or null | 2116 |
| 💬 commentCount | integer or null | 1 |
| ❤️ reactionCount | integer or null | 6 |
| 🔁 restackCount | integer or null | 4 |
| 🎧 audioItems | integer | 1 |
| 🎬 videoUploadId | integer or null | null |
| 🆔 podcastUploadId | integer or null | null |
| 🗂️ sectionId | integer or null | 27625 |
| 🏷️ sectionName | string or null | "👑 Premium Analysis " |
| 📝 truncatedBodyText | string | "Gm Fintech Architects..." |
| 🕒 scrapedAt | ISO 8601 | "2026-05-01T00:35:02.344Z" |
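Because several fields are nullable and the paywall flag is only available in the output, downstream code usually needs a small filtering layer. The sketch below is illustrative only: `PostRecord` types a subset of the schema fields, and `select_posts` and `total_podcast_duration` are hypothetical helpers, not functions the Actor provides. The unit of `podcastDuration` is not stated in the schema, so the total is returned in whatever unit the Actor reports.

```python
from typing import List, Optional, TypedDict


class PostRecord(TypedDict, total=False):
    """Subset of the output schema, typed for downstream code."""
    publication: str
    postId: int
    title: str
    type: str
    audience: str
    isPaid: bool
    podcastDuration: Optional[int]
    wordCount: Optional[int]


def select_posts(records: List[PostRecord],
                 post_type: Optional[str] = None,
                 paid: Optional[bool] = None) -> List[PostRecord]:
    """Keep records matching the given type and paywall flag (None = any)."""
    selected = []
    for record in records:
        if post_type is not None and record.get("type") != post_type:
            continue
        if paid is not None and bool(record.get("isPaid")) != paid:
            continue
        selected.append(record)
    return selected


def total_podcast_duration(records: List[PostRecord]) -> int:
    """Sum podcastDuration, treating null or missing values as zero."""
    return sum(r.get("podcastDuration") or 0 for r in records)
```

For example, `select_posts(records, post_type="podcast", paid=True)` returns only subscriber-only podcast episodes from a dataset that mixes free and paid posts.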
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 🆓 | No login. Reads the public Substack archive endpoints, no subscription needed. |
| 📰 | Subdomain or custom domain. Works with slug.substack.com and bring-your-own domains alike. |
| 🎙️ | Podcast and newsletter. Full coverage of all post types. |
| 💎 | Paywall flag. Each post tells you whether it is free or subscriber-only. |
| 📊 | Engagement signals. Reactions, comments, restacks, and word count out of the box. |
| 📅 | Date filtering. Restrict to a specific year, quarter, or month. |
| 🔄 | Bulk pagination. Pull thousands of posts per run with built-in throttling. |
📊 In a single 13-second run the Actor returned 100 posts from a single publication including paid and free items.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| Manual subscribe + scroll | Free + paywall | Limited per session | One-shot | Date only | Account per publication |
| Generic web scrapers | $$ subscription | Brittle CSS | Daily | None | Engineer hours |
| RSS readers | Free | Latest 20 only | Live | None | Per-feed setup |
| ⭐ Substack Publication Scraper (this Actor) | Pay-per-event | Full archive | Live | Type, dates, paywall flag | None |
The Actor reads the same archive endpoints Substack itself uses and exposes the results as clean structured records.
🚀 How to use
- 🆓 Create a free Apify account. Sign up here and get $5 in free credit.
- 🔍 Open the Actor. Search for "Substack Publication" in the Apify Store.
- ⚙️ Set the publication. Enter the subdomain or custom domain and any filters.
- ▶️ Click Start. A 100-post run finishes in under 15 seconds.
- 📥 Download. Export as CSV, Excel, JSON, or XML.
⏱️ Total time from sign-up to first dataset: under five minutes.
💼 Business use cases
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🔌 Automating Substack Publication Scraper
Run this Actor on a schedule, from your codebase, or inside another tool:
- Node.js SDK: see Apify JavaScript client for programmatic runs and dataset exports.
- Python SDK: see Apify Python client for the same flow in Python.
- HTTP API: see Apify API docs for raw REST integration.
Schedule daily, weekly, or monthly runs from the Apify Console. Pipe results into Google Sheets, S3, BigQuery, or your own webhook with the built-in integrations.
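For raw REST integration, the sketch below uses only the Python standard library to call the Apify API's synchronous "run Actor and get dataset items" endpoint. The Actor ID `parseforge~substack-publication-scraper` is an assumption made for illustration, so copy the real ID from the Actor's API tab in the Apify Console. The token is a placeholder, and `build_run_request` and `run_actor` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"

# Assumed Actor ID for illustration only; check the Actor's page for the real one.
ACTOR_ID = "parseforge~substack-publication-scraper"


def build_run_request(actor_id, token, run_input):
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id, token, run_input, timeout=300):
    """Send the request and parse the JSON array of post records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    posts = run_actor(ACTOR_ID, "<YOUR_APIFY_TOKEN>",
                      {"maxItems": 100, "publication": "lex"})
    print(len(posts), "posts returned")
```

The synchronous endpoint suits short runs like the 100-post example. For large archives, starting the run asynchronously and reading the dataset afterwards is likely the safer pattern.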
❓ Frequently Asked Questions
🔌 Integrate with any app
- Make - drop run results into 1,800+ apps with a no-code visual builder.
- Zapier - trigger automations off completed runs.
- Slack - post run summaries to a channel.
- Google Sheets - sync each run into a spreadsheet.
- Webhooks - notify your own services on run finish.
- Airbyte - load runs into Snowflake, BigQuery, or Postgres.
🔗 Recommended Actors
- 🐝 Beehiiv Newsletter Scraper - the same workflow for Beehiiv-hosted newsletters.
- 📚 Wikipedia Pageviews Scraper - cross-reference newsletter trends with public-interest spikes.
- 💼 Indie Hackers Posts Scraper - mine founder commentary that often parallels Substack content.
- 🐙 GitHub Trending Repos Scraper - pair with technical newsletters for a developer-attention signal.
- 🅱️ Bing Search Scraper - track which posts rank for which keywords.
💡 Pro Tip: browse the complete ParseForge collection for more pre-built scrapers and data tools.
🆘 Need Help? Open our contact form and we'll route the question to the right person.
Substack is a registered trademark of Substack Inc. This Actor is not affiliated with or endorsed by Substack. It reads only publicly accessible archive endpoints and respects per-publication terms of service.