Substack Scraper — Publication Posts | $1.50/1K

Pricing

from $3.00 / 1,000 listings

Scrape any Substack newsletter's post list via the official Substack public API. No auth, no proxy. Title, subtitle, date, free/paid audience, type, reactions, restacks, podcast_url. Podcast posts billed at premium rate ($2.50/1K). Pay per post.

Developer

Vitalii Bondarev

Maintained by Community

Substack Scraper — Publication Posts & Metadata | $1.50/1K | No Auth, Official API

For newsletter researchers, content agencies, competitive intelligence teams, and AI pipelines that need Substack content at scale.

Pricing: $1.50 per 1,000 post records · $2.50 per 1,000 podcast posts (posts where type=podcast and podcast_url is present — audio file URL included). No monthly fees. No authentication required.

Scrape any Substack publication's post listing via the official public REST API — no authentication, no proxy, no browser required. Returns structured metadata for every post: title, subtitle, publish date, audience (free vs paid), post type, reactions, comments, restacks, cover image, wordcount, and canonical URL.

Pay per post returned (Apify pay-per-event, or PPE, pricing).


What you get

| Field | Description |
| --- | --- |
| title | Post title |
| subtitle | Deck / tagline |
| canonical_url | Full URL to the post |
| slug | URL slug |
| post_date | Published timestamp (ISO 8601 UTC) |
| audience | everyone (free) or only_paid (paywalled) |
| type | newsletter, podcast, video, etc. |
| podcast_url | Audio file URL (podcast posts only) |
| reactions_count | Total hearts |
| comment_count | Number of comments |
| restacks | Number of Substack reposts |
| cover_image | Cover image URL |
| wordcount | Approximate word count |
| publication_slug | Short publication identifier |
| parse_confidence | Data quality score 0–1 |
| warnings | List of missing-field codes |

Note: Post bodies (full text / HTML) are not returned by the listing API. Paywalled posts return metadata only — body content requires a paid subscription and is not scraped.


Pricing example

$1.50 per 1,000 newsletter posts · $2.50 per 1,000 podcast posts (posts where type=podcast and podcast_url is present). A 500-post archive = $0.75. Scraping 5 newsletters × 200 posts = $1.50. No per-run fee.
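
The arithmetic above can be reproduced with a small estimator. This is a minimal sketch, not the actor's billing code: the function name `estimate_cost` is hypothetical, and it assumes each record is charged at exactly one of the two rates (podcast posts at $2.50/1K, everything else at $1.50/1K).

```python
from decimal import Decimal

# Rates quoted on this page, in USD per 1,000 records.
POST_RATE = Decimal("1.50")
PODCAST_RATE = Decimal("2.50")


def estimate_cost(posts: int, podcast_posts: int = 0) -> Decimal:
    """Estimate the charge in USD for a run.

    `posts` counts standard (non-podcast) records and `podcast_posts`
    counts podcast records. Assumption: each record is billed once,
    at one of the two rates.
    """
    total = (POST_RATE * posts + PODCAST_RATE * podcast_posts) / 1000
    return total.quantize(Decimal("0.01"))


print(estimate_cost(500))       # the 500-post archive from the example
print(estimate_cost(5 * 200))   # five newsletters of 200 posts each
```

Using `Decimal` rather than floats keeps the cents exact when the counts get large.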

Sample output

{
  "title": "The Collapse of Web Scraping",
  "subtitle": "Why every major site now requires a browser — and what to do about it",
  "canonical_url": "https://on.substack.com/p/the-collapse-of-web-scraping",
  "post_date": "2026-05-18T14:00:00Z",
  "audience": "everyone",
  "type": "newsletter",
  "podcast_url": null,
  "reactions_count": 312,
  "comment_count": 47,
  "restacks": 89,
  "wordcount": 2100,
  "publication_slug": "on",
  "parse_confidence": 1.0,
  "scraped_at": "2026-06-05T09:00:00Z"
}

Frequently asked questions

Do I need a Substack account or API key? No. The actor uses the official Substack public listing API — no authentication required for public publications.

Do I need a proxy? No. The Substack API is open. Zero proxy cost to you.

What formats does the output come in? JSON, CSV, and Excel via the Apify dataset. Native integration with n8n, Make, Zapier.

What if the publication returns a 403 or empty results? Some publications restrict their API access (e.g. bankless.substack.com). The actor logs the error and exits cleanly — it pushes nothing and charges nothing. Switch to a different slug and try again.

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| publication | string | on | Slug (e.g. on), full URL (e.g. https://on.substack.com), or custom domain |
| maxPosts | integer | 100 | Max posts to return. 0 = no limit (fetch entire archive) |

Publication examples

on → on.substack.com
bankless → bankless.substack.com
https://on.substack.com → on.substack.com (same)
https://platformer.news → custom domain (supported)
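
The mapping above can be sketched as a small normalizer. The function name `publication_base_url` is hypothetical, and how the actor actually parses its input is not documented here. This only illustrates the three accepted forms (slug, full URL, custom domain).

```python
from urllib.parse import urlparse


def publication_base_url(publication: str) -> str:
    """Turn a slug, full URL, or bare custom domain into a base URL.

    Illustrative only: the actor's real input handling may differ.
    """
    value = publication.strip()
    if "://" in value:
        # Full URL: keep only the host part.
        host = urlparse(value).netloc
    elif "." in value:
        # Bare custom domain such as "platformer.news".
        host = value.split("/")[0]
    else:
        # Plain slug such as "on".
        host = f"{value}.substack.com"
    return f"https://{host}"


for example in ("on", "bankless", "https://on.substack.com", "https://platformer.news"):
    print(example, "->", publication_base_url(example))
```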

How it works

Uses the Substack per-publication public REST endpoint:

GET https://<publication>.substack.com/api/v1/posts?offset=0&limit=50

Paginates via offset until all posts are retrieved or maxPosts is reached. No auth headers needed. No proxy required for public publications.
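
The offset loop described above might look roughly like the following. This is a sketch under stated assumptions, not the actor's source: it assumes the endpoint returns a JSON array of post objects and an empty array once the archive is exhausted. The names `fetch_page` and `iter_posts` and the User-Agent string are hypothetical.

```python
import json
from typing import Callable, Iterator
from urllib.request import Request, urlopen

PAGE_SIZE = 50  # matches limit=50 in the request shown above


def fetch_page(base_url: str, offset: int, limit: int) -> list[dict]:
    """Fetch one page of posts. Assumes the body is a JSON array."""
    url = f"{base_url}/api/v1/posts?offset={offset}&limit={limit}"
    request = Request(url, headers={"User-Agent": "example-client/0.1"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_posts(
    base_url: str,
    max_posts: int = 100,
    fetch: Callable[[str, int, int], list[dict]] = fetch_page,
) -> Iterator[dict]:
    """Yield posts page by page until the archive ends or max_posts is hit.

    max_posts=0 means no limit, mirroring the maxPosts input.
    """
    offset = 0
    yielded = 0
    while max_posts == 0 or yielded < max_posts:
        page = fetch(base_url, offset, PAGE_SIZE)
        if not page:  # empty page: nothing left to read
            return
        for post in page:
            yield post
            yielded += 1
            if max_posts and yielded >= max_posts:
                return
        offset += len(page)
```

Passing the fetch function as a parameter keeps the loop testable without network access, for example `iter_posts("https://on.substack.com", max_posts=20)` against the live endpoint or a stub in tests.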


Our edge over incumbents

  • Reliable pagination — offset-based, not page-based; survives large archives.
  • Reactions normalized — raw reactions dict summed to reactions_count (compatible with publications adding new reaction types).
  • parse_confidence score — every record includes a data-quality score and warnings list so you can detect schema drift without re-running.
  • restacks field — Substack's repost count, absent from most competitor actors.
  • Custom domain support — not just *.substack.com slugs.
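
To illustrate two of the points above, here is a minimal sketch of summing a raw reactions mapping and of flagging records for review. The raw field name `reactions` and its shape (reaction type mapped to a count) are assumptions, and the helper names are hypothetical. Only `parse_confidence` and `warnings` come from the output schema on this page.

```python
def reactions_count(raw_post: dict) -> int:
    """Sum every reaction type into one total.

    Assumes raw_post["reactions"] maps a reaction type to a count;
    unknown or new reaction types are included automatically.
    """
    reactions = raw_post.get("reactions") or {}
    return sum(int(count) for count in reactions.values())


def needs_review(record: dict, threshold: float = 1.0) -> bool:
    """Flag a scraped record whose quality score is low or has warnings."""
    confidence = record.get("parse_confidence", 0.0)
    return confidence < threshold or bool(record.get("warnings"))


print(reactions_count({"reactions": {"heart": 300, "clap": 12}}))
print(needs_review({"parse_confidence": 0.8, "warnings": ["missing_subtitle"]}))
```

The warning code `missing_subtitle` in the usage line is made up for the example.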

Limitations

  • Post bodies not included — listing API returns metadata only. Full HTML/Markdown bodies require the individual post endpoint (not in this actor's scope).
  • Paywalled posts — metadata is returned for all posts, but body content is not accessible without a paid subscription.
  • Publications blocking API access — some publications (e.g. bankless.substack.com) return 403; this is a publication-level restriction, not a Substack platform restriction.

Competitor comparison

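| Feature | This scraper | Other Substack actors |
| --- | --- | --- |
| Official Substack API | Yes | partial |
| restacks field | Yes | |
| Custom domain support | Yes | |
| podcast_url field | Yes | |
| parse_confidence on every record | Yes | |
| No auth required | Yes | |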

Podcast use case

The podcast_url field makes this actor useful for extracting Substack podcast episode lists without an RSS parser. Filter records where type == "podcast" and podcast_url is non-null.

Podcast posts are billed at the podcast-post premium event ($2.50/1K, set in the Apify console) because they include a direct audio file URL — useful for media monitoring tools, podcast discovery platforms, and content aggregators that need the actual audio stream. Regular newsletter posts are billed at the standard post-item rate ($1.50/1K).
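
The filter described above is a one-liner once the dataset items are loaded. A minimal sketch, assuming `records` is a list of dicts shaped like the sample output (the function name is hypothetical):

```python
def podcast_episodes(records: list[dict]) -> list[dict]:
    """Keep records where type == "podcast" and podcast_url is non-null."""
    return [
        record
        for record in records
        if record.get("type") == "podcast" and record.get("podcast_url") is not None
    ]


sample = [
    {"title": "Essay", "type": "newsletter", "podcast_url": None},
    {"title": "Episode 1", "type": "podcast", "podcast_url": "https://example.com/ep1.mp3"},
    {"title": "Episode 2", "type": "podcast", "podcast_url": None},
]
print([record["title"] for record in podcast_episodes(sample)])
```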

Monitoring use case

Track a competitor's newsletter for new posts — run daily and filter by post_date to see only new content. Set maxPosts=20 for fast incremental runs.
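
The daily filter can be sketched as follows, assuming `post_date` is an ISO 8601 UTC string as in the sample output and that you store the time of your previous run yourself. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_post_date(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-05-18T14:00:00Z"."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_posts(records: list[dict], since: datetime) -> list[dict]:
    """Return records published strictly after `since` (timezone-aware)."""
    return [r for r in records if parse_post_date(r["post_date"]) > since]


last_run = datetime(2026, 5, 18, 0, 0, tzinfo=timezone.utc)
records = [
    {"title": "Old", "post_date": "2026-05-17T09:00:00Z"},
    {"title": "New", "post_date": "2026-05-18T14:00:00Z"},
]
print([r["title"] for r in new_posts(records, last_run)])
```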

Use with AI Agents (MCP)

This Substack scraper is callable as a tool by AI agents (Claude Desktop, Cursor, VS Code, n8n, or any MCP-compatible client) via Apify's hosted Model Context Protocol server.

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.apify.com/?tools=bovi/substack-publication",
        "--header",
        "Authorization: Bearer <YOUR_APIFY_TOKEN>"
      ]
    }
  }
}

Integrations

Built for newsletter researchers, content agencies, and AI-pipeline teams ingesting Substack post metadata at scale — the JSON/dataset output drops into the tools you already run, no glue code:

  • n8n / Make / Zapier: trigger a run or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code.
  • Webhooks: call your own endpoint the moment a run finishes to push results straight into your pipeline.
  • MCP server: expose this actor as a tool to Claude, Cursor, or any MCP client so an AI agent can pull this data mid-conversation.
  • API & SDKs: fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.