📰 Substack Posts Monitor
Pull every post from any Substack publication via the public archive API: title, author, paywall flag, full body, engagement counts, reading time. For newsletter aggregators, content marketers, and AI summarizers. Export, run via API, schedule, or integrate with other tools.

TL;DR
Newsletter trend analysts, competitive-intelligence teams, and AI training-data curators bookmark 30 Substacks and forget to check them. This pulls every recent post from any list of Substack publications (subdomain or custom domain) into clean structured JSON: title, author, typed isPaywalled boolean, publish timestamp in ISO 8601, wordcount, reaction and comment counts, optional full body for LLM summarization. Watchlist mode emits only posts new since the last run, so a daily schedule feeds your dashboard, Notion sync, or agent pipeline with zero duplicates and zero HTML parsing.
Try it on a small dataset, then let us know what you think in a review.
What does Substack Posts Monitor do?
Give it a list of Substack publications and it returns every recent post as a structured JSON record. Each record includes the post title, subtitle, slug, canonical URL, publication name, author name and handle, publish timestamp in ISO 8601, a typed isPaywalled boolean, raw audience value (everyone, only_paid, or founding), wordcount, estimated reading time in minutes, cover image URL, reaction count, comment count, post tags, and an agentMarkdown field that drops straight into Claude, GPT, Slack, or a Notion card.
Set includeFullBody: true and the actor adds the full HTML body and a stripped plain-text version to each record for downstream summarization or full-text search. Leave it off and the actor returns the publicly available short description plus the truncated free preview that Substack shows non-subscribers.
The actor talks to Substack's public archive API directly: no headless browser, no HTML parsing, no scraping fragility. Runs are fast (seconds per publication), and your downstream pipeline never has to deal with HTML drift.
Pass a bare subdomain (thedailyloop) for publications that live on substack.com, or a full URL (https://www.lennysnewsletter.com) for publications on a custom domain. Mix freely in one run.
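The identifier rule above can be sketched in a few lines. This is an illustration only, not the actor's own code: the helper name resolve_base_url is hypothetical, and it assumes a bare subdomain maps to https://<subdomain>.substack.com.

```python
from urllib.parse import urlparse


def resolve_base_url(publication: str) -> str:
    """Map a publication identifier to a base URL (hypothetical helper)."""
    value = publication.strip()
    if "://" in value:
        parsed = urlparse(value)
        # Full URL: keep scheme and host, drop any path or trailing slash.
        return f"{parsed.scheme}://{parsed.netloc}"
    # Bare subdomain: assumed to be hosted on substack.com.
    return f"https://{value}.substack.com"


for pub in ["thedailyloop", "https://www.lennysnewsletter.com/"]:
    print(pub, "->", resolve_base_url(pub))
```

Both identifier styles can sit in the same publications array, since each one resolves independently.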
Why scrape Substack?
Substack now hosts the working notebooks of operators and writers across every niche that matters: product (Lenny's Newsletter), tech strategy (Stratechery), AI (Latent Space), VC (The Generalist), finance (Doomberg), and thousands of vertical newsletters. Newsletter trend analysts, competitive-intel teams tracking competitor cadence and paywall mix, and AI training-data curators all need a way to watch 20+ publications without opening 20 tabs.
Substack offers no feed product for non-subscribers and no one-call API to monitor many publications at once. Bookmark 30 Substacks, forget to check half of them, and miss the post that mattered. One scheduled run replaces the manual ritual and feeds a digest, Notion sync, or agent pipeline.
Who needs this?
- Newsletter aggregators building daily or weekly digests from 20+ Substack publications
- AI agent builders wiring LLM newsletter summarizers, podcast notes, or topic monitors
- Content marketers tracking competitor newsletter cadence, paywall mix, and engagement signals
- VC analysts scouting operator-writers in their thesis areas (founder content as a deal-flow signal)
- Journalists covering the creator economy or any vertical Substack dominates
- Market researchers in tech, finance, AI, or biotech building literature reviews from Substack-hosted commentary
- PR teams monitoring brand mentions and influencer commentary across niche newsletters
- Subscriber-acquisition tools building lookalike-publication recommendations
How to use Substack Posts Monitor
- Open the actor on Apify Console.
- Paste publication identifiers into the Publications field. Use bare subdomains (thedailyloop) for substack.com-hosted publications, full URLs (https://www.lennysnewsletter.com) for custom-domain publications.
- Set Posts per publication (default 20).
- Optional: set a Published after ISO date, change the Audience filter, toggle Include full body, or enable Watchlist mode.
- Click Start and check the dataset when the run completes.
- Schedule the actor or call it via the Apify API to run on a cadence.
How much will scraping Substack cost?
Pricing is pay-per-event, no monthly minimum, no platform-cost surprises.
| Plan | Price per saved post | Posts on $5 free credit |
|---|---|---|
| FREE | $0.004 | ~1,250 |
| BRONZE | $0.0035 | n/a (paid plans have separate credit) |
| SILVER | $0.003 | |
| GOLD | $0.0025 | |
| PLATINUM | $0.002 | |
| DIAMOND | $0.002 | |
Each run also incurs a $0.005 actor-start charge. A typical daily-monitor run that pulls 20 new posts across 5 publications costs roughly $0.005 + 20 × $0.004 = $0.085 on the FREE tier. Turning on includeFullBody adds one extra request per post but does not change the per-record price.
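To budget a schedule, the arithmetic above can be wrapped in a small estimator. This is a sketch that hard-codes the prices from the table; the function name estimate_run_cost is hypothetical, and it assumes a run is billed only for the start event plus each saved post.

```python
from decimal import Decimal

# Per-saved-post prices copied from the pricing table above.
PRICE_PER_POST = {
    "FREE": Decimal("0.004"),
    "BRONZE": Decimal("0.0035"),
    "SILVER": Decimal("0.003"),
    "GOLD": Decimal("0.0025"),
    "PLATINUM": Decimal("0.002"),
    "DIAMOND": Decimal("0.002"),
}
ACTOR_START = Decimal("0.005")  # charged once per run


def estimate_run_cost(saved_posts: int, plan: str = "FREE") -> Decimal:
    """Estimated cost in USD of one run that saves `saved_posts` posts."""
    return ACTOR_START + saved_posts * PRICE_PER_POST[plan]


print(estimate_run_cost(20))  # 0.085, matching the worked example above
```

Decimal is used instead of float so that sub-cent prices add up exactly.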
Is it legal to scrape Substack?
Substack's robots.txt does not block the archive endpoints this actor uses. The actor only fetches publicly viewable post metadata that any logged-out browser visitor can see. It does not bypass the paywall: for paywalled posts, only the public preview text is available. It honors a polite request rate. As with any data you collect from third-party sources, consult your legal counsel before commercial redistribution. Always respect publication terms of use and applicable copyright law.
Examples
Daily monitor of one publication:
{"publications": ["thedailyloop"],"postsPerPublication": 5,"watchlistMode": true}
Backfill the last 6 months across multiple publications, free posts only:
{"publications": ["https://www.lennysnewsletter.com","https://stratechery.com","thedailyloop"],"postsPerPublication": 50,"publishedAfter": "2025-11-10","audience": "free"}
Pull full bodies for a small batch (for LLM summarization):
{"publications": ["thedailyloop"],"postsPerPublication": 10,"includeFullBody": true}
Input parameters
| Field | Type | Description |
|---|---|---|
| publications | array of string | Required. Substack subdomains or full publication URLs. |
| postsPerPublication | integer | Default 20. Max 1000 per publication. |
| publishedAfter | string | Optional. ISO date (YYYY-MM-DD). Stops walking the archive past this date. |
| audience | enum | all (default), free, or paid_only. |
| includeFullBody | boolean | Default false. Adds full HTML + stripped text body per post. |
| watchlistMode | boolean | Default false. Emits only posts new since the previous run. |
| maxItems | integer | Default 10. Hard cap across all publications. Raise for production runs. |
| proxyConfiguration | object | Optional. Apify proxy config for high-volume runs. |
Substack output format
substack_post
| Field | Type | Description |
|---|---|---|
| recordType | string | Always substack_post. |
| outputSchemaVersion | string | 2026-05-10. Bumped on breaking schema changes. |
| postId | string | Substack's stable numeric post ID. |
| recordId | string | substack:post:<postId>, idempotent across runs. |
| publication | string | Publication subdomain. |
| publicationName | string | Human-readable publication name. |
| url | string | Canonical post URL. |
| title | string | |
| subtitle | string or null | |
| slug | string | |
| type | string | newsletter, podcast, thread, etc. |
| author | object | { name, handle, photoUrl }. |
| publishedAt | string | ISO 8601. |
| isPaywalled | boolean | True if audience is only_paid or founding. |
| audience | string | Raw Substack audience value. |
| description | string or null | Short blurb. |
| excerpt | string or null | First ~500 chars of body. |
| fullBodyHtml | string or null | Only when includeFullBody=true. |
| fullBodyText | string or null | HTML-stripped plain text. |
| wordcount | integer or null | |
| estimatedReadMinutes | integer or null | wordcount / 200. |
| coverImageUrl | string or null | |
| reactionsCount | integer | |
| commentsCount | integer | |
| tags | array of string | |
| agentMarkdown | string | Pre-formatted markdown card for LLM context. |
| fieldCompletenessScore | number | 0.0 to 1.0. Filter on this for high-quality records. |
| scrapedAt | string | ISO 8601 of this run. |
Sample agentMarkdown:
📰 How to build an AI monetization strategy that actually works
✍️ Vikas Kansal · lennysnewsletter
📅 2026-05-05 · 15 min read · 💰 PAID
👍 283 reactions · 💬 6 comments
🔗 https://www.lennysnewsletter.com/p/why-saas-freemium-playbooks-dont
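On the consuming side, the typed fields make filtering and idempotent storage short to write. The sketch below uses only field names from the output table. The two sample records are made up for illustration, and the helpers free_posts and upsert_posts are hypothetical, with a plain dict standing in for whatever database you use.

```python
def free_posts(records: list) -> list:
    """Keep only posts that are not paywalled, using the typed boolean."""
    return [r for r in records if not r["isPaywalled"]]


def upsert_posts(store: dict, records: list, min_score: float = 0.0) -> int:
    """Upsert records keyed by recordId; return how many keys were new."""
    added = 0
    for record in records:
        # Skip sparse rows below the completeness threshold.
        if record.get("fieldCompletenessScore", 0.0) < min_score:
            continue
        if record["recordId"] not in store:
            added += 1
        store[record["recordId"]] = record  # overwrite keeps the store idempotent
    return added


# Made-up sample records with the documented field names.
records = [
    {"recordId": "substack:post:101", "isPaywalled": True, "fieldCompletenessScore": 0.9},
    {"recordId": "substack:post:102", "isPaywalled": False, "fieldCompletenessScore": 0.4},
]

store = {}
print(upsert_posts(store, records, min_score=0.5))  # 1 (only post 101 passes the threshold)
print(upsert_posts(store, records, min_score=0.5))  # 0 (second pass adds nothing new)
print([r["recordId"] for r in free_posts(records)])  # ['substack:post:102']
```

Keying on recordId rather than url or title means a re-run overwrites an existing row instead of duplicating it.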
During the actor run
No authentication needed. A 5-post run on one publication completes in under 10 seconds; a 100-post run across 5 publications usually finishes in under 60 seconds. The actor honors a polite request cadence so publications stay reachable.
A run summary lands at the OUTPUT key, and a top-5 most-engaged digest at AGENT_BRIEFING.md, ready to drop into a Slack channel or daily LLM context window.
FAQ
How is this different from the existing Substack scrapers on the Store?
The leading Substack actor today is easyapi/substack-posts-scraper. It is rated 1.86 stars at the time of writing and ships generic keyword-search positioning. We built this actor specifically to fix the things that drove its rating down: a typed isPaywalled boolean (so paywalled and free posts are easy to filter without parsing the audience string), ISO 8601 timestamps everywhere (no mixing date and datetime formats), watchlist diff mode (so daily schedules emit only new posts), agent-grade fields (agentMarkdown, fieldCompletenessScore, recordId), pay-per-event pricing without a flat monthly fee, and an explicit per-publication base-URL strategy that handles both substack.com subdomains and custom domains in the same run.
Can I bulk-scrape paywalled posts?
No. The actor only returns what Substack returns publicly. For paywalled posts that means metadata, the preview text, and the truncated free preview body, never the full subscriber-only content. There is no bypassPaywall option and one will not be added.
What about authors who block scraping?
Substack's robots.txt is permissive on the archive endpoints this actor uses. We honor a polite request cadence and do not retry aggressively. If a specific publication has explicit terms of use that prohibit automated retrieval of their content, you should not scrape that publication.
Can I monitor only new posts?
Yes. Set watchlistMode: true. The actor stores post IDs it has already emitted in the key-value store and skips them on subsequent runs. The state is per-actor-run-storage, so duplicate-prevention works automatically when you schedule the actor.
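The idea behind the diff can be sketched as a set-membership check on post IDs. This is an illustration of the technique, not the actor's implementation: the function name diff_new_posts is hypothetical, and an in-memory set stands in for the key-value store.

```python
def diff_new_posts(fetched: list, seen_ids: set) -> tuple:
    """Return (posts not seen before, updated set of seen post IDs)."""
    new_posts = [post for post in fetched if post["postId"] not in seen_ids]
    updated_seen = seen_ids | {post["postId"] for post in fetched}
    return new_posts, updated_seen


# First run: nothing has been seen yet, so both posts are emitted.
seen = set()
run_1 = [{"postId": "101"}, {"postId": "102"}]
new_posts, seen = diff_new_posts(run_1, seen)
print([p["postId"] for p in new_posts])  # ['101', '102']

# Second run: only the post published since the first run is emitted.
run_2 = [{"postId": "102"}, {"postId": "103"}]
new_posts, seen = diff_new_posts(run_2, seen)
print([p["postId"] for p in new_posts])  # ['103']
```

Because the seen set only grows, a post that reappears in a later archive page is never emitted twice.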
Can I use this with Python, JavaScript, n8n, Make, or Zapier?
Yes. Apify has integrations for all of those plus a REST API. The dataset returned by this actor is plain JSON; pull it into any tool that consumes JSON.
Why does this cost more than free Substack scrapers?
Free actors break when source HTML or APIs change and there is no notification when they go silent. This actor uses Substack's public archive API, ships a versioned schema with recordId for idempotent upserts, and has watchlist diff mode built in so your scheduled runs do not re-emit posts you already have. If you are feeding this into a customer-facing product or a daily AI agent, the pennies per record buy you reliability the free actors cannot deliver.
How fast is it?
A 5-post run on one publication typically completes in under 10 seconds. A 100-post run across 5 publications with includeFullBody=false typically completes in under 60 seconds.
What happens with custom-domain publications?
Pass the full URL (e.g., https://www.lennysnewsletter.com) instead of a bare subdomain. The actor calls the API on whatever host you provide. If a bare subdomain redirects to a custom domain, the error message tells you to switch to the full URL.
Why choose Substack Posts Monitor
- Monitor mode emits only what's new since the last run: it tracks seen post IDs across runs, so your competitor-cadence dashboard ingests each post exactly once
- Reliability free Substack scrapers can't deliver: the leading free actor sits at 1.86 stars because HTML scrapers break monthly with no notification. This actor uses Substack's archive API and has a 24-48 hour fix turnaround
- Watchlist 30+ publications in one run: subdomains and custom domains mixed freely (thedailyloop or https://www.lennysnewsletter.com)
- Filter paywall vs free without string-parsing: typed isPaywalled boolean and ISO 8601 timestamps everywhere
- Sub-minute runtime: HTTP-only against Substack's archive API, no Playwright, no HTML parsing
- Drop-in LLM context: agentMarkdown per record plus a per-run AGENT_BRIEFING.md digest of the top 5 most-engaged posts
- Re-runs are safe to dedupe by ID: stable substack:post:<postId> keys
- Schema doesn't break your pipeline: versioned and bumped on breaking change
- AI agents can self-filter sparse rows via fieldCompletenessScore
Your feedback
Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we'll fix it fast (typically within 48 hours).
Other Skootle actors you might want to check
- 📰 Hacker News Watchlist, track new HN stories matching your keywords
- 🟠 Reddit Subreddit Monitor, daily diff of new subreddit posts
- 🍎 App Store Reviews, competitor app review monitoring
Support and contact
Issues: Issues tab. For other inquiries, contact via the Apify Store author profile.