Pricing

from $0.015 / entity scraped

Substack Insights Scraper

Scrape Substack publications, posts (HTML + markdown for AI/RAG), comments, Notes feeds, author profiles and contact info from bios. Input-time filters (paywall, reactions, words, keywords, subscribers). Historical subscriber-growth tracking across scheduled runs.

Pricing

from $0.015 / entity scraped

Rating

0.0

(0)

Developer

Yuliia Kulakova

Actor stats

Bookmarked

Total users

Monthly active users

2 months ago

Last modified

What you get


Publication metadata	Name, subdomain, custom domain, description, language, subscriber count (when public), payments enabled, primary author
Full post archive	Up to 500 posts per publication. Title, subtitle, body (HTML + Markdown), word count, reading time, cover image, audience (free/paid), tags, section, podcast attachment
Engagement metrics	Reactions (with per-emoji breakdown), comments, restacks, quotes
AI/RAG-ready markdown	Substack platform widgets (subscribe boxes, share buttons) stripped automatically. Optional export of per-post `.md` files with YAML front matter to the key-value store — drop straight into LlamaIndex / LangChain
Nested comments	Full reply tree per post (parent → child mapping, depth, reactions per comment)
Author profiles	Bio, photo, Twitter handle, all publications they author (cross-pub graph for KOL / influencer discovery)
Notes feed	Per-author Notes timeline including restacks (which posts the author has boosted, with full source attribution)
Global publication search	Discover newsletters by keyword across all of Substack
Lead-gen contact info	Emails, phones, payment links (PayPal, Cash App, Venmo), aggregators (Linktree, Beacons), website, and 20+ social platforms categorised
Engagement filters at input	Free-only / paid-only, min reactions / comments / restacks, min word count, min subscribers, language, date range, include / exclude keywords, has podcast. You pay only for entities that match — no other Substack scraper on Apify does this.
Analytics block	Per-publication engagement rate, virality index, average reactions / comments per post, average word count, publish cadence per week, paid-post ratio, momentum (when history is on)
Historical trend tracking	Persisted snapshots + automatic deltas (subscriber growth per day, posts added, paid-ratio change, trend up / down / flat) across scheduled runs
Multi-format export	JSON, CSV, Excel, XML, HTML, plus optional markdown files

Works out of the box — proxies included, no configuration required.

Use cases

AI / RAG builders — schedule daily scrape of niche newsletters, get clean markdown chunks straight into Pinecone / Weaviate / LlamaIndex with one .md file per post
Content marketers / agencies — track 15-30 competitor newsletters weekly, surface viral posts and topic gaps with the keyword + min-reactions filter
VC scouts / investors — snapshot 200 creators monthly, flag newsletters with subscriber growth > X / day using the history feature
Brand sponsors / ad buyers — filter publications by subscriber threshold + topic keywords to build a placement target list
Lead-gen / SDR — extract author emails, websites, socials from publication bios for cold outreach
Recruiters — find domain experts who write about specific topics (search by keyword in body, filter by min reactions to qualify)
Newsletter operators — weekly archive diff on competitors, identify "what's working" by sorting posts by reactions
Journalists / researchers — full-text public newsletter archives for citation, with provenance per post

Quick start

Paste a Substack URL or bare handle — that's the whole minimum input:

{
  "publicationHandles": ["noahpinion"]
}

You get one publication record (Noahpinion: 450,000 subscribers, custom domain noahpinion.blog, payments enabled) plus 25 most-recent posts with full body, reactions, comments, tags, and analytics.

Common inputs

Multiple publications in one run:

{
  "publicationHandles": ["piratewires", "platformer", "noahpinion"],
  "maxPostsPerPublication": 10
}

AI / RAG corpus build — free posts only, high engagement, long-form, markdown output:

{
  "publicationHandles": ["noahpinion"],
  "maxPostsPerPublication": 50,
  "paywallFilter": "free-only",
  "minReactions": 100,
  "minWordCount": 500,
  "bodyFormat": "markdown",
  "outputMarkdownFiles": true
}

Each matching post is also written as <handle>__<slug>.md to the substack-markdown key-value store with YAML front matter — drop straight into LlamaIndex/LangChain.

Lead-gen sweep with contact extraction:

{
  "searchQueries": ["startups", "founders"],
  "maxSearchResults": 20,
  "minSubscribers": 5000,
  "includeContactInfo": true,
  "includePosts": false
}

Newsletter monitoring with subscriber-growth tracking — schedule daily:

{
  "publicationHandles": ["noahpinion", "platformer", "piratewires"],
  "enableHistory": true,
  "historyDatasetName": "my-newsletter-tracker"
}

From the second scheduled run onwards every publication carries a history.delta block with subscriberChange, subscriberGrowthPerDay, postsAdded, and a trend (up / down / flat).

Keyword-filtered competitor research:

{
  "publicationHandles": ["noahpinion"],
  "maxPostsPerPublication": 50,
  "keywordsInclude": ["china", "trade"],
  "dateFrom": "2026-01-01"
}

Single-author profile + Notes feed:

{
  "userHandles": ["casey", "noah"],
  "includeNotes": true,
  "maxNotesPerAuthor": 50
}

Global discovery — find newsletters by topic:

{
  "searchQueries": ["artificial intelligence", "macroeconomics", "biotech"],
  "maxSearchResults": 10
}

Inputs reference

Targeting

Field	Type	Default	Description
`urls`	array	—	Substack URLs (publications or posts) or `@username` form
`publicationHandles`	array	—	Bare publication handles (e.g. `noahpinion`)
`userHandles`	array	—	Bare user handles for author profile + Notes
`searchQueries`	array	—	Keywords for global publication search
`maxSearchResults`	int	10	Top N publications per search query
`includePosts`	boolean	true	Fetch the post archive per publication
`maxPostsPerPublication`	int	25	Cap on posts fetched per publication
`includeFullBody`	boolean	true	Extra request per post to fetch the full body
`bodyFormat`	enum	`both`	`both` / `html` / `markdown` / `none`
`includeComments`	boolean	false	Fetch nested comments per post
`includeNotes`	boolean	false	Fetch Notes / restacks feed per user handle

Engagement filters (paid only if matched)

Field	Default	Description
`paywallFilter`	`any`	`any` / `free-only` / `paid-only`
`minReactions`	0	Skip posts below this reaction count
`minComments`	0	Skip posts below this comment count
`minRestacks`	0	Skip posts below this restack count
`minWordCount`	0	Drop short posts (great for AI/RAG quality bar)
`maxWordCount`	0 (unlimited)	Drop overlong posts
`hasPodcast`	any	Keep only posts with (or without) a podcast attachment
`postType`	—	Allow-list (`newsletter`, `podcast`, `thread`)
`language`	—	Two-letter codes (e.g. `en`, `es`)
`dateFrom` / `dateTo`	—	ISO date or YYYY-MM-DD
`keywordsInclude`	—	Must contain at least one keyword in title / subtitle / body (case-insensitive)
`keywordsExclude`	—	Must NOT contain
`minSubscribers`	0	Publication-level: skip pubs below this subscriber count
`paymentsEnabledOnly`	false	Publication-level: keep only pubs that offer paid subscriptions

Enrichment

Field	Default	Description
`includeContactInfo`	true	Emails, phones, socials, payment links from bios
`includeAnalytics`	true	Engagement rate, virality, cadence, paid-post ratio
`enableHistory`	false	Persist snapshots, compute deltas across runs
`historyDatasetName`	`substack-history`	Named store reused across scheduled runs
`outputMarkdownFiles`	false	Per-post `.md` files written to the `substack-markdown` KV store with YAML front matter

Output

Every record carries type, id, handle (or publicationHandle for posts), url, scrapedAt. Type-specific fields are layered on top.

Publication record

{
  "type": "publication",
  "id": "12345",
  "handle": "noahpinion",
  "name": "Noahpinion",
  "url": "https://noahpinion.substack.com",
  "customDomain": "noahpinion.blog",
  "description": "Economics and other interesting stuff...",
  "subscriberCount": 450000,
  "subscriberCountString": "450,000",
  "paymentsEnabled": true,
  "language": "en",
  "primaryAuthor": { "id": "...", "handle": "...", "name": "..." },
  "topPosts": [ /* top 5 by reactions */ ],
  "analytics": { /* see below */ },
  "history": { /* when enableHistory */ },
  "contactInfo": { /* when includeContactInfo */ }
}

Post record

{
  "type": "post",
  "id": "...",
  "slug": "...",
  "url": "...",
  "publicationHandle": "noahpinion",
  "title": "...",
  "subtitle": "...",
  "publishedAt": "...",
  "audience": "everyone" /* or "only_paid" / "founding" */,
  "isPaywalled": false,
  "wordCount": 1234,
  "readingTimeMin": 6,
  "reactionCount": 256,
  "reactionsBreakdown": { "❤": 200, "🔥": 56 },
  "commentCount": 45,
  "restackCount": 12,
  "podcastUrl": null,
  "sectionName": null,
  "tags": ["..."],
  "bodyHtml": "...",
  "bodyMarkdown": "...",
  "coAuthors": [ /* with bio */ ]
}

Analytics block

{
  "sampleSize": 10,
  "sampleDateRange": { "from": "...", "to": "..." },
  "avgReactionsPerPost": 580,
  "avgCommentsPerPost": 65,
  "avgWordCount": 3200,
  "totalReactions": 5800,
  "topPostReactions": 937,
  "topPost": { "slug": "...", "title": "...", "reactionCount": 937 },
  "publishCadencePerWeek": 4.67,
  "paidPostRatio": 0.33,
  "engagementRate": 0.00129,
  "viralityIndex": 1.62,
  "momentum": { "subscriberGrowthPerDay": 250, "trend": "up" }
}

History block (with `enableHistory`)

{
  "firstSeen": "2026-06-01T00:00:00Z",
  "snapshotCount": 14,
  "previous": { "at": "...", "subscriberCount": 449750 },
  "delta": {
    "daysElapsed": 1.0,
    "subscriberChange": 250,
    "subscriberGrowthPerDay": 250,
    "postsAdded": 1,
    "trend": "up"
  },
  "series": [ /* recent points */ ]
}

Contact info block

{
  "emails": ["hello@example.com"],
  "phones": [],
  "website": "https://example.com",
  "socials": {
    "twitter": "...",
    "linkedin": "...",
    "medium": "...",
    "beehiiv": "..."
  },
  "paymentLinks": { "paypal": "..." },
  "aggregator": "https://linktr.ee/..."
}

Pricing

Simple pay-per-event:

Event	Cost
Actor start	$0.01
Each scraped entity (publication / post / comment / note / author)	$0.015

Example runs:

What you scrape	Calculation	Cost
1 publication + 25 posts	$0.01 + 26 × $0.015	$0.40
10 publications + 25 posts each	$0.01 + 260 × $0.015	$3.91
100 publications daily monitoring (no posts)	$1.51 / run × 30 days	~$45 / month
AI / RAG corpus: 50 newsletters × 100 posts	$0.01 + 5,050 × $0.015	$75.76
500 publications discovery sweep (search + metadata)	$0.01 + 500 × $0.015	$7.51

Compare: enterprise newsletter analytics tools start at $250–500 / month. We're per-use, no commitment.

FAQ

Does this work without configuration? Yes. Paste a Substack URL or bare handle and run. Proxies are included.

Why is subscriberCount sometimes null? About half of publications choose to hide their subscriber number in their public settings — when that's the case, no scraper (ours or anyone else's) can recover it without a logged-in session. When subscriberCountVisibility: "private" is set by the publication owner, we surface null. For the half that keep it public, we extract the real number (verified on Noahpinion 450k, Platformer 176k, etc.).

Are paywalled post bodies fully scraped? For free posts: full body is returned. For paid posts: Substack returns a short preview (the same the site shows to non-subscribers) — we surface what's exposed and flag isPaywalled: true. No scraper can extract paid bodies without an sp_dc-style subscriber cookie (and that requires a real paid subscription).

What's the markdown export mode? With outputMarkdownFiles: true, every scraped post body is written as a separate .md file to the substack-markdown key-value store. The file is named <handle>__<slug>.md and starts with YAML front matter (title, slug, publication, publishedAt, audience, reactionCount, commentCount, wordCount). Platform widgets (subscribe boxes, share buttons) are stripped. Ready for LlamaIndex / LangChain document loaders.

Can I get historical subscriber growth? Yes — with enableHistory: true plus a stable historyDatasetName. From the second scheduled run onwards every publication record carries a history.delta block with subscriberChange, subscriberGrowthPerDay, postsAdded, and a trend. This is the only Substack scraper on Apify that does this.

Will repeated runs over-charge me? Each run is metered independently. If you re-scrape the same publications daily, that's a billable entity per publication per run. The history store deduplicates time-series points so trend data stays clean.

Can I export to CSV / Excel? Yes. Apify automatically offers CSV, JSON, Excel, XML and HTML on every dataset.

Are the engagement filters applied before billing? Yes. If a post fails your minReactions / paywallFilter / language / date filter, it's never written to the dataset and you don't pay for it.

What about Notes vs restacks — what's the difference? A Substack Note is a short original post (like a tweet). A restack is an author boosting someone else's post into their feed. Both come back from the user's Notes feed. We tag each item with kind: "note" (original) or kind: "restack" (boost), so you can filter.

Does the cross-publication author graph work? Yes — for any user handle, we surface their full publications[] list (with role: admin / author / etc.). Find a Substack author who also writes on Beehiiv? contactInfo.socials.beehiiv will surface the link if it's in their bio.

How fresh is the data? Real-time per request — the actor reads Substack's public feed at run time.

What's the difference vs other Substack scrapers on Apify? Most return raw post dumps. We add input-time engagement filters (no charge for filtered entities), markdown export ready for AI/RAG, bio-contact extraction (lead-gen), per-publication analytics (engagement rate, virality, cadence, paid-post ratio), historical subscriber-growth tracking, and a cross-publication author graph. The 36 existing Substack scrapers on Apify offer at most one or two of these — never all.

Legal

This actor scrapes only publicly available Substack data. It is not affiliated with, endorsed by, or sponsored by Substack Inc. Use responsibly, respect Substack's Terms of Service and the original authors' copyright. For analytics, research, journalism, lead generation, and AI training corpora.

Maintained by brilliant_gum on Apify.

Substack Posts & Creator Scraper

sleek_waveform/substack-creator-scraper

Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.

Daniel Dimitrov

Substack Notes Scraper 🔍

easyapi/substack-notes-scraper

Extract notes and comments from Substack's search results with images, user info, and engagement metrics. Perfect for content analysis, user research, and tracking discussions around specific topics on Substack.

EasyApi

Substack Notes Scraper 🔍📥- Cheap

scrapestorm/substack-notes-scraper---cheap

🔍 Easily Scrape Substack Notes Enter a profile or keyword to collect real-time Notes from Substack 🗒️💬 Get insights like content, author, post date, reactions, restacks, replies & more 📊🧠 Seamlessly integrate with tools like Google Drive to automate community tracking & boost productivity ⚡🧩

Storm_Scraper

Substack Profile Scraper

getdataforme/substack-profile-scraper

The Substack Profile Scraper efficiently extracts detailed data from Substack profiles and posts for analysis, research, and content aggregation....

GetDataForMe

Substack Newsletter Scraper

dataharvest/substack-scraper

Scrape Substack newsletters, posts and comments.

Alex v

Substack Posts Scraper 📚

easyapi/substack-posts-scraper

Scrape Substack posts and articles by keywords. Extract comprehensive post data including title, author, publication details, podcast information, reactions, and more. Perfect for content analysis and research.

EasyApi

219

1.9

Substack Scraper

noximilian/substack-scraper

Scrape Substack newsletters — fetch post archives, individual posts, comments, recommendations, and publication metadata. Search Substack for publications and content. No auth required for public content.

Noximilian

Substack Scraper: Posts, Comments & Authors

doggo/substack-scraper-posts-comments-authors

Scrape any Substack publication: post archives, article text, comments, author profiles and subscriber signals. Search across newsletters and export structured data for research, monitoring and AI datasets. No browser. Output to CSV, JSON or Excel.