Scrape posts, engagement metrics, and author data from any Substack publication. Get title, author, publish date, likes, comments, paywall status, and full body in Markdown or HTML. Paginates the full archive automatically.
Root-cause fix for post data extraction: Substack now assigns window._preloads via JSON.parse("..."), where the full page payload (including post data, author, and body_html) is passed as a JSON-encoded string argument. The previous extraction logic found the first { inside that string, parsed only the config metadata (isEU, base_url, etc.), and returned early, leaving the post content unreachable. The new Attempt 1 uses a regex that captures the full string argument of JSON.parse("..."), decodes it via JSON.parse('"' + content + '"'), and then parses the resulting JSON. This recovers post.title, post.body_html, post.publishedBylines, post.comment_count, post.reactions, and all other post fields.
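The decode step above can be sketched roughly as follows. This is a minimal illustration of the Attempt 1 approach, not the Actor's actual code; the function name extractPreloads and the exact regex are assumptions.

```javascript
// Hypothetical sketch of Attempt 1: capture the string literal passed to
// JSON.parse("..."), decode the JS string escapes, then parse the JSON.
function extractPreloads(html) {
  // The payload contains escaped quotes (\"), so the capture group must
  // accept backslash-escaped pairs as well as plain non-quote characters.
  const match = html.match(/window\._preloads\s*=\s*JSON\.parse\("((?:\\.|[^"\\])*)"\)/);
  if (!match) return null;
  // First decode the string literal into raw JSON text, then parse it.
  const jsonText = JSON.parse('"' + match[1] + '"');
  return JSON.parse(jsonText);
}

// Example: a minimal page snippet shaped like Substack's newer format.
const html = '<script>window._preloads = JSON.parse("{\\"isEU\\":false,\\"post\\":{\\"title\\":\\"Hello\\"}}")</script>';
const payload = extractPreloads(html);
console.log(payload.post.title); // "Hello"
```

Note that JSON.parse('"…"') only decodes JSON-valid escapes, which is why a separate JS string-escape fallback (Attempt 2 in 1.1.4 below) is still useful for literals containing escapes JSON does not allow.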
[1.1.4] - 2026-03-09
Fixed
extractNextData now handles Substack's newer HTML format where window._preloads is a JSON-encoded string assignment (= "{\"...\"}") rather than a plain object. Added JS string-escape unescaping as Attempt 2.
Added __NEXT_DATA__ script-tag extraction as a fallback (Attempt 3) for resilience against future Substack HTML changes.
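The Attempt 3 fallback can be sketched as below. The function name and selector regex are illustrative assumptions; the Actor's actual extraction may differ.

```javascript
// Fallback sketch: read the JSON payload from the __NEXT_DATA__ script tag
// when the window._preloads assignment cannot be located or decoded.
function extractNextDataFallback(html) {
  const match = html.match(/<script[^>]*id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/);
  return match ? JSON.parse(match[1]) : null;
}

const page = '<script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"post":{"title":"Hi"}}}}</script>';
const nextData = extractNextDataFallback(page);
console.log(nextData.props.pageProps.post.title); // "Hi"
```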
[1.1.3] - 2026-03-09
Fixed
Archive API response normalisation: Substack returns a flat array ([{...}]), not { posts: [...] }. Updated ArchiveApiResponse type to a union and added normalisation logic (Array.isArray check) so both shapes are handled.
Pagination now falls back to page-fullness heuristic (≥12 items = more pages) when total_count is absent.
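The two fixes above can be sketched together as follows. The function names and the PAGE_SIZE constant are assumptions for illustration; the heuristic threshold of 12 items is taken from the entry above.

```javascript
// Sketch of the normalisation and pagination-heuristic logic described above.
const PAGE_SIZE = 12; // assumed archive page size, per the >=12 heuristic

function normalizeArchiveResponse(data) {
  // Substack may return a flat array of posts or { posts: [...] }.
  if (Array.isArray(data)) return { posts: data, totalCount: undefined };
  return { posts: data.posts ?? [], totalCount: data.total_count };
}

function hasMorePages({ posts, totalCount }, offset) {
  // Prefer the exact count; otherwise assume a full page means more follow.
  if (typeof totalCount === 'number') return offset + posts.length < totalCount;
  return posts.length >= PAGE_SIZE;
}

console.log(normalizeArchiveResponse([{ id: 1 }]).posts.length); // 1
console.log(hasMorePages({ posts: new Array(12).fill({}) }, 0)); // true
```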
[1.1.2] - 2026-03-09
Fixed
ARCHIVE_API handler: replaced $('body').text() with context.body (the raw response string). CheerioCrawler does not initialise Cheerio for application/json responses, which caused "$ is not a function" errors and prevented any posts from being scraped.
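A minimal sketch of the corrected logic, with an illustrative helper name (parseArchiveBody is not the Actor's actual identifier):

```javascript
// For JSON endpoints CheerioCrawler leaves $ uninitialised, so the handler
// must parse the raw response body rather than reading it through Cheerio.
function parseArchiveBody(body) {
  const text = typeof body === 'string' ? body : body.toString('utf-8');
  return JSON.parse(text);
}

// e.g. inside the ARCHIVE_API handler: const data = parseArchiveBody(context.body);
const posts = parseArchiveBody('[{"id":1,"title":"First"}]');
console.log(posts[0].title); // "First"
```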
[1.1.1] - 2026-03-09
Fixed
Added Substack detection guard in the DEFAULT handler — non-Substack URLs now log a clear error and exit cleanly instead of silently failing with a JSON parse error on the archive API endpoint.
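The guard might look roughly like this; the heuristics the Actor actually uses may differ, and looksLikeSubstack is a hypothetical name. The assumption here is that Substack publications either live on *.substack.com or embed a Substack-specific marker such as window._preloads in the HTML.

```javascript
// Illustrative detection guard: accept *.substack.com hosts, or custom
// domains whose HTML carries the window._preloads marker.
function looksLikeSubstack(url, html) {
  const { hostname } = new URL(url);
  if (hostname === 'substack.com' || hostname.endsWith('.substack.com')) return true;
  return html.includes('window._preloads');
}

console.log(looksLikeSubstack('https://example.substack.com/archive', '')); // true
console.log(looksLikeSubstack('https://example.com/', '<html></html>')); // false
```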
[1.1.0] - 2025-07-07
Added
webhookUrl optional input field — pass any HTTPS URL to receive a POST notification when the Actor run succeeds, fails, times out, or is aborted. Enables zero-config integration with Zapier, Make, n8n, and custom pipelines.
Webhook is registered via Actor.addWebhook() with an idempotency key (APIFY_ACTOR_RUN_ID) to prevent duplicate registrations on Actor restart.
New Integrations section in README covering webhook, Apify Integrations tab, and REST API workflows.
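The webhook registration above can be sketched as follows. buildWebhookOptions is a hypothetical helper; the event-type strings and option names are my assumption of how the Apify SDK's Actor.addWebhook is called here, not the Actor's verbatim code.

```javascript
// Sketch: build options for Actor.addWebhook covering the four terminal
// run states, keyed on the run ID so restarts don't register duplicates.
function buildWebhookOptions(webhookUrl, runId) {
  return {
    eventTypes: [
      'ACTOR.RUN.SUCCEEDED',
      'ACTOR.RUN.FAILED',
      'ACTOR.RUN.TIMED_OUT',
      'ACTOR.RUN.ABORTED',
    ],
    requestUrl: webhookUrl,
    // Re-registering with the same idempotency key is a no-op.
    idempotencyKey: runId,
  };
}

// Inside the Actor's main(), roughly:
// await Actor.addWebhook(buildWebhookOptions(input.webhookUrl, process.env.APIFY_ACTOR_RUN_ID));
```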
Fixed
Critical bug: ARCHIVE_API handler was parsing request.payload (always undefined for GET requests) instead of the actual API response body. Publication archive scraping now correctly uses $('body').text() to read the JSON response.
Free-tier local run cap: when running outside the Apify platform, runs are capped at 50 posts with a clear status message, preventing accidental large local jobs.
output_schema.json: Populated with full definitions for all 9 output fields (was empty {}).
README: Full rewrite to match Apify Store format — question-format H2 headers, no technical internals, no broken badges or placeholders, corrected pricing model (Pay Per Result).
package.json: Fixed "author" placeholder value.
[1.0.0] - 2026-03-09
Initial release of Substack Posts & Creator Scraper.
Implemented fast JSON extraction via Substack's embedded __NEXT_DATA__ payload.
Extracted post bodies, paywall statuses, and engagement metrics (likes/comments).