Substack Publication Scraper

Pricing

from $8.25 / 1,000 items

Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 8 days ago

📰 Substack Publication Scraper

🚀 Pull every public post from any Substack publication. Title, body preview, author, podcast, paywall flag, comment count, reactions. No login, no API key, no manual scrolling.

🕒 Last updated: 2026-05-01 · 📊 27 fields per post · 📰 millions of newsletters · 🎙️ podcast metadata included · 💎 paid + free posts

The Substack Publication Scraper queries the public Substack archive endpoints for any publication and returns every post in the feed. Each record includes the post title, social title, subtitle, description, slug, canonical URL, publish date, post type, audience flag, paywall status, cover image, podcast duration, word count, reaction count, comment count, restack count, section info, and a truncated body preview.

Substack hosts millions of newsletters and is the largest creator-operated publishing platform on the internet. Top publications cross hundreds of thousands of paid subscribers and rival traditional media in influence. This Actor exports the full archive of any publication in a single run, letting you research content cadence, audience signals, and editorial mix without a manual subscribe-and-scroll workflow.

🎯 Target Audience: Newsletter writers, content marketers, ghost writers, journalists, podcasters, researchers

💡 Primary Use Cases: Content research, cadence analysis, audience mining, podcast discovery, competitive benchmarking

📋 What the Substack Publication Scraper does

Five filtering workflows in a single run:

  • 📰 Full archive export. Submit one publication subdomain or custom domain and pull its entire post archive.
  • 📅 Date range filter. Pin to a specific year, quarter, or month using minDate and maxDate.
  • 🎙️ Type filter. Restrict to newsletter, podcast, or thread posts.
  • 💎 Paywall awareness. Each record flags whether the post is everyone (free) or only_paid (subscriber-only).
  • 🔍 Engagement signals. Comment count, reaction count, restack count, and word count surface engagement patterns.

Each row reports the publication slug, post ID, full title and subtitle, post slug, canonical URL, publish timestamp, type, audience, cover image URL, podcast duration when present, word count, engagement counters, and a 200-character body preview.

💡 Why it matters: Substack publications are time-machines for content strategy. Cadence, average word count, paywall ratio, and reaction-to-comment ratios all reveal what resonates. Researchers cite Substack archives in studies of opinion journalism. Ghost writers reverse-engineer voice from existing posts. Content marketers benchmark themselves against the best operators in their niche.
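The signals named above can be computed from an exported dataset with a few lines of Python. This is a minimal sketch, assuming the records are already loaded as a list of dicts using the field names from the output schema (postDate, wordCount, isPaid, reactionCount, commentCount). The function name archive_metrics and the keys of the summary it returns are illustrative, not part of the Actor.

```python
from datetime import datetime
from statistics import mean


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-04-29T16:14:34.158Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def archive_metrics(posts: list) -> dict:
    """Summarize cadence, length, paywall mix, and engagement for one archive."""
    dates = sorted(parse_ts(p["postDate"]) for p in posts)
    # Cadence: gap in days between each pair of consecutive posts.
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(dates, dates[1:])]
    # Nullable counters are skipped or treated as zero.
    words = [p["wordCount"] for p in posts if p.get("wordCount") is not None]
    reactions = sum(p.get("reactionCount") or 0 for p in posts)
    comments = sum(p.get("commentCount") or 0 for p in posts)
    paid = sum(1 for p in posts if p.get("isPaid"))
    return {
        "posts": len(posts),
        "avgDaysBetweenPosts": mean(gaps) if gaps else None,
        "avgWordCount": mean(words) if words else None,
        "paywallRatio": paid / len(posts) if posts else None,
        "reactionsPerComment": reactions / comments if comments else None,
    }
```

Ratios come back as None when there is nothing to divide by (a single post, or no comments at all), so downstream code can tell "no data" apart from zero.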


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Posts to return. Free plan caps at 10, paid plan at 1,000,000. |
| publication | string | "lex" | Subdomain (lex) or full custom domain (www.lennysnewsletter.com). |
| postType | string | "all" | Filter to newsletter, podcast, thread, or all. |
| minDate | string | empty | ISO date YYYY-MM-DD. Only posts on or after this date. |
| maxDate | string | empty | ISO date YYYY-MM-DD. Only posts on or before this date. |

Example: 100 most recent posts from a custom-domain publication.

{
  "maxItems": 100,
  "publication": "www.lennysnewsletter.com"
}

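Example: up to 200 podcast episodes published in 2026. The input has no paywall filter, so this returns free and paid episodes alike; use the isPaid field in the output to separate them.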

{
  "maxItems": 200,
  "publication": "lex",
  "postType": "podcast",
  "minDate": "2026-01-01",
  "maxDate": "2026-12-31"
}
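
If you generate inputs from code, a small helper can catch typos before a run is started. The sketch below assumes only the five input fields documented above. The helper name build_input and its validation rules are illustrative; the Actor may apply its own checks.

```python
from datetime import date

ALLOWED_POST_TYPES = {"all", "newsletter", "podcast", "thread"}


def build_input(publication, max_items=10, post_type="all",
                min_date=None, max_date=None):
    """Return an input dict for the Actor, raising ValueError on bad values."""
    if not publication or not publication.strip():
        raise ValueError("publication must be a subdomain or custom domain")
    if post_type not in ALLOWED_POST_TYPES:
        raise ValueError(f"postType must be one of {sorted(ALLOWED_POST_TYPES)}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    run_input = {
        "maxItems": max_items,
        "publication": publication.strip(),
        "postType": post_type,
    }
    parsed = {}
    for key, value in (("minDate", min_date), ("maxDate", max_date)):
        if value:
            # date.fromisoformat raises ValueError if the string is not a valid ISO date.
            parsed[key] = date.fromisoformat(value)
            run_input[key] = value
    if len(parsed) == 2 and parsed["minDate"] > parsed["maxDate"]:
        raise ValueError("minDate must not be after maxDate")
    return run_input
```

Calling build_input("lex", 200, "podcast", "2026-01-01", "2026-12-31") reproduces the second example above.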

⚠️ Good to Know: Hostnames are not case sensitive, and the Actor normalizes the publication value to lowercase before the request. Paid posts return only the truncated free preview in truncatedBodyText. Full subscriber-only body content is not exposed by the public archive endpoint and is out of scope.


📊 Output

Each post record contains 27 fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🏷️ publication | string | "lex" |
| 🆔 postId | integer | 195849359 |
| 📰 title | string | "Analysis: The Machines are working..." |
| 🪧 subtitle | string | "AI capital is being mobilized..." |
| 🔖 slug | string | "analysis-the-machines-are-working" |
| 🔗 url | string | "https://lex.substack.com/p/..." |
| 📅 postDate | ISO 8601 | "2026-04-29T16:14:34.158Z" |
| 🏷️ type | string | "newsletter" |
| 👥 audience | string | "only_paid" |
| 💎 isPaid | boolean | true |
| 🖼️ coverImage | string or null | "https://substackcdn.com/..." |
| 🎙️ podcastDuration | integer or null | 1820 |
| 📝 wordCount | integer or null | 2116 |
| 💬 commentCount | integer or null | 1 |
| ❤️ reactionCount | integer or null | 6 |
| 🔁 restackCount | integer or null | 4 |
| 🎧 audioItems | integer | 1 |
| 🎬 videoUploadId | integer or null | null |
| 🆔 podcastUploadId | integer or null | null |
| 🗂️ sectionId | integer or null | 27625 |
| 🏷️ sectionName | string or null | "👑 Premium Analysis " |
| 📝 truncatedBodyText | string | "Gm Fintech Architects..." |
| 🕒 scrapedAt | ISO 8601 | "2026-05-01T00:35:02.344Z" |
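
For typed pipelines, the fields listed in the schema table can be mirrored in a TypedDict. This is a sketch covering only the 23 fields shown in the table, with nullable columns mapped to Optional. The class name PostRecord and the helper missing_fields are illustrative names, not something the Actor ships.

```python
from typing import Optional, TypedDict


class PostRecord(TypedDict):
    """One dataset row, limited to the fields shown in the schema table."""
    publication: str
    postId: int
    title: str
    subtitle: str
    slug: str
    url: str
    postDate: str  # ISO 8601 timestamp
    type: str
    audience: str
    isPaid: bool
    coverImage: Optional[str]
    podcastDuration: Optional[int]
    wordCount: Optional[int]
    commentCount: Optional[int]
    reactionCount: Optional[int]
    restackCount: Optional[int]
    audioItems: int
    videoUploadId: Optional[int]
    podcastUploadId: Optional[int]
    sectionId: Optional[int]
    sectionName: Optional[str]
    truncatedBodyText: str
    scrapedAt: str  # ISO 8601 timestamp


def missing_fields(record: dict) -> list:
    """List schema fields that are absent from a downloaded record."""
    return [name for name in PostRecord.__annotations__ if name not in record]
```

Running missing_fields over the first few rows of an export is a quick way to notice if the dataset shape differs from the table.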

✨ Why choose this Actor

  • 🆓 No login. Reads the public Substack archive endpoints, no subscription needed.
  • 📰 Subdomain or custom domain. Works with slug.substack.com and bring-your-own domains alike.
  • 🎙️ Podcast and newsletter. Full coverage of all post types.
  • 💎 Paywall flag. Each post tells you whether it is free or subscriber-only.
  • 📊 Engagement signals. Reactions, comments, restacks, and word count out of the box.
  • 📅 Date filtering. Restrict to a specific year, quarter, or month.
  • 🔄 Bulk pagination. Pull thousands of posts per run with built-in throttling.

📊 In a single 13-second run the Actor returned 100 posts from a single publication including paid and free items.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| Manual subscribe + scroll | Free + paywall | Limited per session | One-shot | Date only | Account per publication |
| Generic web scrapers | $$ subscription | Brittle CSS | Daily | None | Engineer hours |
| RSS readers | Free | Latest 20 only | Live | None | Per-feed setup |
| ⭐ Substack Publication Scraper (this Actor) | Pay-per-event | Full archive | Live | Type, dates, paywall flag | None |

The Actor reads the same archive endpoints Substack itself uses and exposes the results as clean structured records.


🚀 How to use

  1. 🆓 Create a free Apify account. Sign up and get $5 in free credit.
  2. 🔍 Open the Actor. Search for "Substack Publication" in the Apify Store.
  3. ⚙️ Set the publication. Enter the subdomain or custom domain and any filters.
  4. ▶️ Click Start. A 100-post run finishes in under 15 seconds.
  5. 📥 Download. Export as CSV, Excel, JSON, or XML.

⏱️ Total time from sign-up to first dataset: under five minutes.


💼 Business use cases

📰 Content marketing

  • Reverse-engineer top newsletter cadence
  • Mine high-engagement headlines for inspiration
  • Track competitor launch announcements
  • Build editorial calendars from real archives

👻 Ghost writing

  • Match author voice from past posts
  • Research recurring themes per publication
  • Identify gap topics the audience asks for
  • Quote and credit accurately by date and post ID

📰 Journalism

  • Find sources for stories on the creator economy
  • Track newsletter consolidation and migrations
  • Cite specific posts with stable canonical URLs
  • Cross-reference posts with public reactions

📊 Market research

  • Size niche communities by post engagement
  • Spot rising newsletters before mainstream pickup
  • Build alternative data feeds for finance and policy
  • Benchmark your own newsletter against operators in the same niche

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

🔌 Automating Substack Publication Scraper

Run this Actor on a schedule, from your codebase, or inside another tool:

Schedule daily, weekly, or monthly runs from the Apify Console. Pipe results into Google Sheets, S3, BigQuery, or your own webhook with the built-in integrations.
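
Running the Actor from your own codebase might look like the sketch below, which uses the official apify-client package for Python (pip install apify-client). The ACTOR_ID value is a guess at the Store identifier and is an assumption; copy the real ID from the Actor page. The function name run_scraper is illustrative.

```python
# Hypothetical identifier: replace with the ID shown on the Actor page.
ACTOR_ID = "parseforge/substack-publication-scraper"


def run_scraper(run_input: dict, token: str) -> list:
    """Start one run, wait for it to finish, and return the dataset items."""
    # Imported lazily so this module loads even where apify-client is not installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # call() blocks until the run finishes and returns the run details.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Example call (needs a valid API token and network access):
# posts = run_scraper({"publication": "lex", "maxItems": 10}, token="<YOUR_APIFY_TOKEN>")
```

Keep the token in an environment variable or secret store instead of hard-coding it.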



❓ Frequently Asked Questions

📰 What publications does this support?

Any public Substack publication, whether it sits on {name}.substack.com or a custom domain. The Actor sends the request to the publication host's /api/v1/archive endpoint, which Substack serves identically for both setups.
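
As a rough illustration of how one publication value maps to a request URL, the sketch below resolves a subdomain or custom domain to a host and appends the /api/v1/archive path. The sort, offset, and limit query parameters are assumptions about the endpoint, not something this page documents, and the function names are illustrative. The Actor's actual request logic may differ.

```python
from urllib.parse import urlencode


def publication_host(publication: str) -> str:
    """Turn 'lex' or 'www.lennysnewsletter.com' into the host to query."""
    value = publication.strip().lower()
    # Tolerate a pasted URL by dropping the scheme and any path.
    value = value.split("://", 1)[-1].split("/", 1)[0]
    # A bare slug has no dot, so treat it as a substack.com subdomain.
    return value if "." in value else f"{value}.substack.com"


def archive_url(publication: str, offset: int = 0, limit: int = 12) -> str:
    """Build one page of the archive URL (query parameters are assumed)."""
    query = urlencode({"sort": "new", "offset": offset, "limit": limit})
    return f"https://{publication_host(publication)}/api/v1/archive?{query}"
```

Paging through an archive would then mean raising offset by limit until a page comes back empty.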

💎 Can I read full body content of paid posts?

No. The public archive endpoint returns the truncated free preview in truncatedBodyText for paid posts. Full subscriber-only content requires a paid subscription and a session cookie, which is out of scope for this Actor.

🔠 How do I find the publication slug?

For Substack-hosted publications, the slug is the part before .substack.com. For custom domains, use the full host like www.lennysnewsletter.com. The actor normalizes both forms.

📅 How far back does the data go?

The archive returns every public post the publication has ever published, going back to the publication's first post. Some long-running publications have thousands of posts.

📦 How many posts can I pull at once?

Free plan caps at 10 posts per run. Paid plans allow up to 1,000,000 posts. Each run paginates through the archive automatically.

🎙️ Are podcast episodes included?

Yes. Set postType to podcast to filter, or leave as all to mix newsletters, podcasts, and threads in the same dataset. Podcast posts include duration in seconds.
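
Because podcastDuration is reported in seconds, a small formatter helps when building episode lists. This is a minimal sketch, assuming records shaped like the output schema. Both function names are illustrative.

```python
def format_duration(seconds):
    """Render a duration in seconds as H:MM:SS, or None when absent."""
    if seconds is None:
        return None
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def podcast_episodes(posts):
    """Keep podcast posts from a mixed dataset and attach a readable duration."""
    return [
        {"title": p["title"], "duration": format_duration(p.get("podcastDuration"))}
        for p in posts
        if p.get("type") == "podcast"
    ]
```

For the schema example value of 1820 seconds, format_duration returns "0:30:20".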

📊 Do reactions and comments work for paid posts?

Yes. Engagement counters are visible to non-subscribers and surfaced in every record regardless of paywall status.

💼 Can I use this for commercial work?

Yes. Substack post metadata is publicly accessible and the Actor reads only what Substack already publishes. Always respect each publication's terms of service when republishing content.

💳 Do I need a paid Apify plan?

The free plan returns up to 10 posts per run. Paid plans return up to 1,000,000 posts. The Actor uses pay-per-event pricing, so you only pay for the posts you receive.
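
For budgeting, a back-of-the-envelope estimate follows from the listed rate. The sketch assumes the "from $8.25 / 1,000 items" price shown on this page applies per returned post and ignores any other billed events. The actual rate can vary by plan, so treat the result as a rough figure.

```python
from decimal import Decimal

# Assumption: the "from" price on the listing, in USD per 1,000 returned posts.
PRICE_PER_1000 = Decimal("8.25")


def estimate_cost(items: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimate the USD cost of a run that returns `items` posts."""
    return Decimal(items) * price_per_1000 / 1000
```

At that rate, a 100-post run comes to about $0.83 and a 1,000-post run to $8.25.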

⚠️ What if a run fails or returns empty?

The most common cause is a misspelled publication slug or a publication that has been deleted. Verify that the publication URL loads in a browser, then retry. If the issue persists, use the contact form and include the run URL.

🔁 How fresh is the data?

Live. The Actor calls the Substack archive endpoint at run time, so you get whatever is publicly visible on the publication right now.

This Actor reads Substack's own public archive endpoints, the same ones browsers use to render the archive page. It does not bypass paywalls or use credentials.


🔌 Integrate with any app

  • Make - drop run results into 1,800+ apps with a no-code visual builder.
  • Zapier - trigger automations off completed runs.
  • Slack - post run summaries to a channel.
  • Google Sheets - sync each run into a spreadsheet.
  • Webhooks - notify your own services on run finish.
  • Airbyte - load runs into Snowflake, BigQuery, or Postgres.

💡 Pro Tip: browse the complete ParseForge collection for more pre-built scrapers and data tools.


🆘 Need Help? Open our contact form and we'll route the question to the right person.


Substack is a registered trademark of Substack Inc. This Actor is not affiliated with or endorsed by Substack. It reads only publicly accessible archive endpoints and respects per-publication terms of service.