Substack Publication Scraper

Pricing

from $8.25 / 1,000 items

Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.

Rating: 0.0 (0)

Developer: ParseForge (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

📰 Substack Publication Scraper

🚀 Pull every public post from any Substack publication. Title, body preview, author, podcast, paywall flag, comment count, reactions. No login, no API key, no manual scrolling.

🕒 Last updated: 2026-05-01 · 📊 27 fields per post · 📰 millions of newsletters · 🎙️ podcast metadata included · 💎 paid + free posts

The Substack Publication Scraper queries the public Substack archive endpoints for any publication and returns every post in the feed. Each record includes the post title, social title, subtitle, description, slug, canonical URL, publish date, post type, audience flag, paywall status, cover image, podcast duration, word count, reaction count, comment count, restack count, section info, and a truncated body preview.

Substack hosts millions of newsletters and is the largest creator-operated publishing platform on the internet. Top publications cross hundreds of thousands of paid subscribers and rival traditional media in influence. This Actor exports the full archive of any publication in a single run, letting you research content cadence, audience signals, and editorial mix without a manual subscribe-and-scroll workflow.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Newsletter writers, content marketers, ghost writers, journalists, podcasters, researchers | Content research, cadence analysis, audience mining, podcast discovery, competitive benchmarking |

📋 What the Substack Publication Scraper does

Five capabilities in a single run:

  • 📰 Full archive export. Submit one publication subdomain or custom domain and pull its entire post archive.
  • 📅 Date range filter. Pin to a specific year, quarter, or month using minDate and maxDate.
  • 🎙️ Type filter. Restrict to newsletter, podcast, or thread posts.
  • 💎 Paywall awareness. Each record flags whether the post is everyone (free) or only_paid (subscriber-only).
  • 🔍 Engagement signals. Comment count, reaction count, restack count, and word count surface engagement patterns.

Each row reports the publication slug, post ID, full title and subtitle, slug, canonical URL, publish timestamp, type, audience, cover image URL, podcast duration when present, word count, engagement counters, and a 200-character body preview.
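
As a rough illustration of the type and date filters described above, the sketch below applies the same rules client-side to records that have already been downloaded. It assumes the `type` and `postDate` field names from the output schema and treats both date bounds as inclusive. `filter_posts` is a hypothetical helper written for this example, not part of the Actor.

```python
from datetime import date
from typing import Iterable, Optional


def filter_posts(
    records: Iterable[dict],
    post_type: str = "all",
    min_date: Optional[str] = None,
    max_date: Optional[str] = None,
) -> list:
    """Keep records matching a post type and an inclusive YYYY-MM-DD range."""
    lower = date.fromisoformat(min_date) if min_date else None
    upper = date.fromisoformat(max_date) if max_date else None
    kept = []
    for record in records:
        if post_type != "all" and record.get("type") != post_type:
            continue
        # postDate looks like "2026-04-29T16:14:34.158Z"; the first ten
        # characters are the calendar date in UTC.
        published = date.fromisoformat(record["postDate"][:10])
        if lower and published < lower:
            continue
        if upper and published > upper:
            continue
        kept.append(record)
    return kept
```

Filtering in the Actor input is usually cheaper because fewer items are returned. A local filter like this is mainly useful for slicing one large export several ways.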

💡 Why it matters: Substack publications are time-machines for content strategy. Cadence, average word count, paywall ratio, and reaction-to-comment ratios all reveal what resonates. Researchers cite Substack archives in studies of opinion journalism. Ghost writers reverse-engineer voice from existing posts. Content marketers benchmark themselves against the best operators in their niche.
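
The signals named above (cadence, average word count, paywall ratio, reaction-to-comment ratio) can be computed from an exported dataset in a few lines. The following is a minimal sketch, assuming the `postDate`, `wordCount`, `isPaid`, `reactionCount`, and `commentCount` fields from the output schema. `publication_metrics` is a hypothetical helper, and the metric definitions are one reasonable choice rather than anything the Actor prescribes.

```python
from datetime import datetime
from statistics import mean


def publication_metrics(records: list) -> dict:
    """Summarise cadence, length, paywall ratio, and engagement for one archive."""
    if not records:
        return {}
    dates = sorted(
        datetime.fromisoformat(r["postDate"].replace("Z", "+00:00")) for r in records
    )
    span_days = (dates[-1] - dates[0]).total_seconds() / 86400
    # Counters are nullable in the schema, so missing values are skipped or
    # treated as zero.
    word_counts = [r["wordCount"] for r in records if r.get("wordCount") is not None]
    reactions = sum(r.get("reactionCount") or 0 for r in records)
    comments = sum(r.get("commentCount") or 0 for r in records)
    gaps = len(records) - 1
    return {
        "posts": len(records),
        # Average gap between consecutive posts; None with a single post.
        "avg_days_between_posts": span_days / gaps if gaps else None,
        "avg_word_count": mean(word_counts) if word_counts else None,
        "paid_ratio": sum(1 for r in records if r.get("isPaid")) / len(records),
        "reactions_per_comment": reactions / comments if comments else None,
    }
```

Running this per publication and comparing the resulting dictionaries is one simple way to benchmark newsletters in the same niche.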


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Posts to return. Free plan caps at 10, paid plan at 1,000,000. |
| publication | string | "lex" | Subdomain (lex) or full custom domain (www.lennysnewsletter.com). |
| postType | string | "all" | Filter to newsletter, podcast, thread, or all. |
| minDate | string | empty | ISO date YYYY-MM-DD. Only posts on or after this date. |
| maxDate | string | empty | ISO date YYYY-MM-DD. Only posts on or before this date. |

Example: 100 most recent posts from a custom-domain publication.

{
    "maxItems": 100,
    "publication": "www.lennysnewsletter.com"
}

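Example: up to 200 podcast episodes published in 2026 (there is no paywall input filter, so filter on isPaid afterwards to keep only paid episodes).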

{
    "maxItems": 200,
    "publication": "lex",
    "postType": "podcast",
    "minDate": "2026-01-01",
    "maxDate": "2026-12-31"
}
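
When inputs like the ones above are generated from code, it can help to validate them before starting a run. This is a minimal sketch, assuming only the five input names and the allowed `postType` values from the input table. `build_run_input` and its error messages are hypothetical and not part of the Actor.

```python
from datetime import date
from typing import Optional

ALLOWED_POST_TYPES = {"all", "newsletter", "podcast", "thread"}


def build_run_input(
    publication: str,
    max_items: int = 10,
    post_type: str = "all",
    min_date: Optional[str] = None,
    max_date: Optional[str] = None,
) -> dict:
    """Return an input dict for the Actor, raising ValueError on bad values."""
    if not publication.strip():
        raise ValueError("publication must not be empty")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if post_type not in ALLOWED_POST_TYPES:
        raise ValueError(f"postType must be one of {sorted(ALLOWED_POST_TYPES)}")
    # date.fromisoformat raises ValueError for anything that is not a valid date.
    parsed = {
        key: date.fromisoformat(value)
        for key, value in (("minDate", min_date), ("maxDate", max_date))
        if value
    }
    if len(parsed) == 2 and parsed["minDate"] > parsed["maxDate"]:
        raise ValueError("minDate must not be later than maxDate")
    run_input = {
        "publication": publication.strip(),
        "maxItems": max_items,
        "postType": post_type,
    }
    run_input.update({key: value.isoformat() for key, value in parsed.items()})
    return run_input
```

For instance, `build_run_input("lex", 200, "podcast", "2026-01-01", "2026-12-31")` reproduces the second example above.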

⚠️ Good to Know: The Actor normalizes the publication value to lowercase before the request, so Lex and lex resolve to the same publication. Paid posts return only the truncated free preview in truncatedBodyText. The public archive endpoint does not expose subscriber-only full body content, which is out of scope for this Actor.
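
The lowercase normalization mentioned above might look like the following sketch. It is an illustration only: `publication_base_url` is a hypothetical helper, and the rule that a value without a dot is a `substack.com` subdomain is an assumption inferred from the input description, not the Actor's confirmed logic.

```python
def publication_base_url(publication: str) -> str:
    """Turn a subdomain or custom domain into a lowercase base URL."""
    host = publication.strip().lower()
    # Tolerate a pasted URL: drop the scheme and anything after the host.
    if "://" in host:
        host = host.split("://", 1)[1]
    host = host.split("/", 1)[0]
    if not host:
        raise ValueError("publication must not be empty")
    # Assumption: a value without a dot is a Substack subdomain.
    if "." not in host:
        host = f"{host}.substack.com"
    return f"https://{host}"
```

With this rule, `"Lex"` and `"lex"` both map to `https://lex.substack.com`, while `"www.lennysnewsletter.com"` is kept as a custom domain.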


📊 Output

Each post record contains 27 fields; the schema below lists 23 of them. Download as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🏷️ publication | string | "lex" |
| 🆔 postId | integer | 195849359 |
| 📰 title | string | "Analysis: The Machines are working..." |
| 🪧 subtitle | string | "AI capital is being mobilized..." |
| 🔖 slug | string | "analysis-the-machines-are-working" |
| 🔗 url | string | "https://lex.substack.com/p/..." |
| 📅 postDate | ISO 8601 | "2026-04-29T16:14:34.158Z" |
| 🏷️ type | string | "newsletter" |
| 👥 audience | string | "only_paid" |
| 💎 isPaid | boolean | true |
| 🖼️ coverImage | string or null | "https://substackcdn.com/..." |
| 🎙️ podcastDuration | integer or null | 1820 |
| 📝 wordCount | integer or null | 2116 |
| 💬 commentCount | integer or null | 1 |
| ❤️ reactionCount | integer or null | 6 |
| 🔁 restackCount | integer or null | 4 |
| 🎧 audioItems | integer | 1 |
| 🎬 videoUploadId | integer or null | null |
| 🆔 podcastUploadId | integer or null | null |
| 🗂️ sectionId | integer or null | 27625 |
| 🏷️ sectionName | string or null | "👑 Premium Analysis " |
| 📝 truncatedBodyText | string | "Gm Fintech Architects..." |
| 🕒 scrapedAt | ISO 8601 | "2026-05-01T00:35:02.344Z" |
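
For downstream code, it can be convenient to load each record into a typed structure. The sketch below covers a subset of the fields in the schema above and parses `postDate` into a timezone-aware `datetime`. `PostSummary` and `from_item` are hypothetical names chosen for this example, and nullable fields are modelled as `Optional`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PostSummary:
    publication: str
    post_id: int
    title: str
    url: str
    post_date: datetime
    post_type: str
    is_paid: bool
    word_count: Optional[int]
    podcast_duration: Optional[int]

    @classmethod
    def from_item(cls, item: dict) -> "PostSummary":
        """Build a summary from one dataset item (a plain dict)."""
        return cls(
            publication=item["publication"],
            post_id=item["postId"],
            title=item["title"],
            url=item["url"],
            # Replace the trailing "Z" so older Python versions can parse it.
            post_date=datetime.fromisoformat(item["postDate"].replace("Z", "+00:00")),
            post_type=item["type"],
            is_paid=item["isPaid"],
            word_count=item.get("wordCount"),
            podcast_duration=item.get("podcastDuration"),
        )
```

Using `item.get(...)` for the nullable counters means a missing key and an explicit null both end up as `None`.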


✨ Why choose this Actor

  • 🆓 No login. Reads the public Substack archive endpoints, no subscription needed.
  • 📰 Subdomain or custom domain. Works with slug.substack.com and bring-your-own domains alike.
  • 🎙️ Podcast and newsletter. Full coverage of all post types.
  • 💎 Paywall flag. Each post tells you whether it is free or subscriber-only.
  • 📊 Engagement signals. Reactions, comments, restacks, and word count out of the box.
  • 📅 Date filtering. Restrict to a specific year, quarter, or month.
  • 🔄 Bulk pagination. Pull thousands of posts per run with built-in throttling.

📊 In one 13-second run, the Actor returned 100 posts from a single publication, including both paid and free items.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| Manual subscribe + scroll | Free + paywall | Limited per session | One-shot | Date only | Account per publication |
| Generic web scrapers | $$ subscription | Brittle CSS | Daily | None | Engineer hours |
| RSS readers | Free | Latest 20 only | Live | None | Per-feed setup |
| ⭐ Substack Publication Scraper (this Actor) | Pay-per-event | Full archive | Live | Type, dates, paywall flag | None |

The Actor reads the same archive endpoints Substack itself uses and exposes them as clean structured records.


🚀 How to use

  1. 🆓 Create a free Apify account. Sign up here and get $5 in free credit.
  2. 🔍 Open the Actor. Search for "Substack Publication" in the Apify Store.
  3. ⚙️ Set the publication. Enter the subdomain or custom domain and any filters.
  4. ▶️ Click Start. A 100-post run finishes in under 15 seconds.
  5. 📥 Download. Export as CSV, Excel, JSON, or XML.

⏱️ Total time from sign-up to first dataset: under five minutes.


💼 Business use cases

📰 Content marketing

  • Reverse-engineer top newsletter cadence
  • Mine high-engagement headlines for inspiration
  • Track competitor launch announcements
  • Build editorial calendars from real archives

👻 Ghost writing

  • Match author voice from past posts
  • Research recurring themes per publication
  • Identify gap topics the audience asks for
  • Quote and credit accurately by date and post ID

📰 Journalism

  • Find sources for stories on creator economy
  • Track newsletter consolidation and migrations
  • Cite specific posts with stable canonical URLs
  • Cross-reference posts with public reactions

📊 Market research

  • Size niche communities by post engagement
  • Spot rising newsletters before mainstream pickup
  • Build alternative data feeds for finance and policy
  • Benchmark your own newsletter against operators in the same niche

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

🔌 Automating Substack Publication Scraper

Run this Actor on a schedule, from your codebase, or inside another tool.

Schedule daily, weekly, or monthly runs from the Apify Console. Pipe results into Google Sheets, S3, BigQuery, or your own webhook with the built-in integrations.
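
To call the Actor from your own codebase, one option is the official `apify-client` package for Python. The sketch below is hedged in two ways: the Actor ID is a placeholder to be copied from the Actor page, and the `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` calls follow the apify-client 1.x interface as I understand it, so check them against the version you install. `run_and_collect` is a hypothetical helper that takes the client as an argument so it can be tested without network access.

```python
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> list:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Typical wiring (requires `pip install apify-client` and an API token):
#
#   from apify_client import ApifyClient
#
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   posts = run_and_collect(
#       client,
#       "<username>/<actor-name>",  # placeholder: copy the ID from the Actor page
#       {"publication": "lex", "maxItems": 100},
#   )
#   print(len(posts), "posts downloaded")
```

Passing the client in, rather than constructing it inside the function, keeps the token handling in one place and makes the helper easy to reuse from a scheduler or a webhook handler.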



🔌 Integrate with any app

  • Make - drop run results into 1,800+ apps with a no-code visual builder.
  • Zapier - trigger automations off completed runs.
  • Slack - post run summaries to a channel.
  • Google Sheets - sync each run into a spreadsheet.
  • Webhooks - notify your own services on run finish.
  • Airbyte - load runs into Snowflake, BigQuery, or Postgres.

💡 Pro Tip: browse the complete ParseForge collection for more pre-built scrapers and data tools.


🆘 Need Help? Open our contact form and we'll route the question to the right person.


Substack is a registered trademark of Substack Inc. This Actor is not affiliated with or endorsed by Substack. It reads only publicly accessible archive endpoints and respects per-publication terms of service.