Substack Scraper — Publication Posts | $1.50/1K
Pricing
from $3.00 / 1,000 listings
Scrape any Substack newsletter's post list via the official Substack public API. No auth, no proxy. Title, subtitle, date, free/paid audience, type, reactions, restacks, podcast_url. Podcast posts billed at premium rate ($2.50/1K). Pay per post.
Developer
Vitalii Bondarev
Maintained by Community
Substack Scraper — Publication Posts & Metadata | $1.50/1K | No Auth, Official API
For newsletter researchers, content agencies, competitive intelligence teams, and AI pipelines that need Substack content at scale.
Pricing: $1.50 per 1,000 post records · $2.50 per 1,000 podcast posts (posts where type=podcast and podcast_url is present — audio file URL included). No monthly fees. No authentication required.
Scrape any Substack publication's post listing via the official public REST API — no authentication, no proxy, no browser required. Returns structured metadata for every post: title, subtitle, publish date, audience (free vs paid), post type, reactions, comments, restacks, cover image, wordcount, and canonical URL.
Pay per post returned (pay-per-event, or PPE, pricing).
What you get
| Field | Description |
|---|---|
| `title` | Post title |
| `subtitle` | Deck / tagline |
| `canonical_url` | Full URL to the post |
| `slug` | URL slug |
| `post_date` | Published timestamp (ISO 8601 UTC) |
| `audience` | `everyone` (free) or `only_paid` (paywalled) |
| `type` | `newsletter`, `podcast`, `video`, etc. |
| `podcast_url` | Audio file URL (podcast posts only) |
| `reactions_count` | Total hearts |
| `comment_count` | Number of comments |
| `restacks` | Number of Substack reposts |
| `cover_image` | Cover image URL |
| `wordcount` | Approximate word count |
| `publication_slug` | Short publication identifier |
| `parse_confidence` | Data quality score 0–1 |
| `warnings` | List of missing-field codes |
Note: Post bodies (full text / HTML) are not returned by the listing API. Paywalled posts return metadata only — body content requires a paid subscription and is not scraped.
Pricing example
$1.50 per 1,000 newsletter posts · $2.50 per 1,000 podcast posts (posts where type=podcast and podcast_url is present). A 500-post archive = $0.75. Scraping 5 newsletters × 200 posts = $1.50. No per-run fee.
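The arithmetic behind these figures can be reproduced with a short sketch. The function name `estimate_cost` is hypothetical and only illustrates the two per-1,000 rates quoted above; it is not part of the actor.

```python
from decimal import Decimal

# Rates quoted in this listing, in USD per 1,000 records.
POST_RATE = Decimal("1.50")
PODCAST_RATE = Decimal("2.50")


def estimate_cost(newsletter_posts: int, podcast_posts: int = 0) -> Decimal:
    """Estimate the run cost in USD for the given record counts."""
    cost = (newsletter_posts * POST_RATE + podcast_posts * PODCAST_RATE) / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(500))       # 0.75 (500-post archive)
print(estimate_cost(5 * 200))   # 1.50 (5 newsletters x 200 posts)
print(estimate_cost(800, 200))  # 1.70 (800 newsletter posts + 200 podcast posts)
```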
Sample output
    {
      "title": "The Collapse of Web Scraping",
      "subtitle": "Why every major site now requires a browser — and what to do about it",
      "canonical_url": "https://on.substack.com/p/the-collapse-of-web-scraping",
      "post_date": "2026-05-18T14:00:00Z",
      "audience": "everyone",
      "type": "newsletter",
      "podcast_url": null,
      "reactions_count": 312,
      "comment_count": 47,
      "restacks": 89,
      "wordcount": 2100,
      "publication_slug": "on",
      "parse_confidence": 1.0,
      "scraped_at": "2026-06-05T09:00:00Z"
    }
Frequently asked questions
Do I need a Substack account or API key? No. The actor uses the official Substack public listing API — no authentication required for public publications.
Do I need a proxy? No. The Substack API is open. Zero proxy cost to you.
What formats does the output come in? JSON, CSV, and Excel via the Apify dataset. Native integration with n8n, Make, Zapier.
What if the publication returns a 403 or empty results? Some publications restrict their API access (e.g. bankless.substack.com). The actor logs the error and exits cleanly — it pushes nothing and charges nothing. Switch to a different slug and try again.
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `publication` | string | `on` | Slug (e.g. `on`), full URL (e.g. `https://on.substack.com`), or custom domain |
| `maxPosts` | integer | `100` | Max posts to return. `0` = no limit (fetch entire archive) |
Publication examples
- `on` → on.substack.com
- `bankless` → bankless.substack.com
- `https://on.substack.com` → on.substack.com (same)
- `https://platformer.news` → custom domain (supported)
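One way to picture how the three input forms could map to a hostname is the sketch below. `resolve_host` is a hypothetical helper written for illustration, and treating any dotted value as a ready-made domain is an assumption; the actor's real parsing rules are not documented here.

```python
from urllib.parse import urlparse


def resolve_host(publication: str) -> str:
    """Map a slug, full URL, or custom domain to a hostname (illustrative only)."""
    value = publication.strip()
    if "://" in value:
        # Full URL: keep only the hostname, dropping scheme and path.
        return urlparse(value).hostname or ""
    if "." in value:
        # Bare domain such as "platformer.news" (assumed to be usable as-is).
        return value.split("/")[0].lower()
    # Plain slug: assume the default *.substack.com host.
    return f"{value.lower()}.substack.com"


for example in ("on", "bankless", "https://on.substack.com", "https://platformer.news"):
    print(example, "->", resolve_host(example))
```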
How it works
Uses the Substack per-publication public REST endpoint:
GET https://<publication>.substack.com/api/v1/posts?offset=0&limit=50
Paginates via offset until all posts are retrieved or maxPosts is reached. No auth headers needed. No proxy required for public publications.
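The offset loop described above might look roughly like the following sketch. It assumes the endpoint returns a JSON array of post objects and that a short or empty page marks the end of the archive; both are assumptions, and `fetch_page` and `fetch_posts` are hypothetical names rather than the actor's own code.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen

Post = Dict[str, object]


def fetch_page(host: str, offset: int, limit: int) -> List[Post]:
    """Fetch one page of the listing endpoint (performs a real HTTP request)."""
    url = f"https://{host}/api/v1/posts?offset={offset}&limit={limit}"
    with urlopen(url, timeout=30) as response:
        return json.load(response)  # assumed to be a JSON array of posts


def fetch_posts(get_page: Callable[[int, int], List[Post]],
                max_posts: int = 100, page_size: int = 50) -> List[Post]:
    """Collect posts page by page; max_posts=0 means no limit."""
    posts: List[Post] = []
    offset = 0
    while True:
        limit = page_size
        if max_posts:
            # Never request more than the remaining budget.
            limit = min(page_size, max_posts - len(posts))
        page = get_page(offset, limit)
        posts.extend(page)
        offset += len(page)
        # Stop on a short or empty page (archive exhausted) or once the cap is hit.
        if len(page) < limit or (max_posts and len(posts) >= max_posts):
            return posts
```

Passing the page getter as a callable keeps the loop testable without network access. A live call could be wired up as `fetch_posts(lambda o, l: fetch_page("on.substack.com", o, l), max_posts=20)`.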
Our edge over incumbents
- Reliable pagination: offset-based, not page-based; survives large archives.
- Reactions normalized: the raw `reactions` dict is summed to `reactions_count` (compatible with publications adding new reaction types).
- `parse_confidence` score: every record includes a data-quality score and a `warnings` list so you can detect schema drift without re-running.
- `restacks` field: Substack's repost count, absent from most competitor actors.
- Custom domain support: not just `*.substack.com` slugs.
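The reaction normalization can be sketched as follows. It assumes the raw `reactions` value is a mapping from reaction symbol to count, which is an assumption about the API payload; `sum_reactions` is a hypothetical name.

```python
from typing import Mapping, Optional


def sum_reactions(reactions: Optional[Mapping[str, int]]) -> int:
    """Sum every reaction type so newly added types are counted automatically."""
    if not reactions:
        # Missing or empty dict: treat as zero reactions.
        return 0
    return sum(int(count) for count in reactions.values())


# Made-up payload with two reaction types.
print(sum_reactions({"❤": 300, "👍": 12}))  # 312
print(sum_reactions(None))                 # 0
```

Summing over all values, rather than reading one fixed key, is what keeps the total stable if a publication starts using a new reaction type.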
Limitations
- Post bodies not included — listing API returns metadata only. Full HTML/Markdown bodies require the individual post endpoint (not in this actor's scope).
- Paywalled posts — metadata is returned for all posts, but body content is not accessible without a paid subscription.
- Publications blocking API access — some publications (e.g. bankless.substack.com) return 403; this is a publication-level restriction, not a Substack platform restriction.
Competitor comparison
| | This scraper | Other Substack actors |
|---|---|---|
| Official Substack API | ✓ | partial |
| `restacks` field | ✓ | ✗ |
| Custom domain support | ✓ | ✗ |
| `podcast_url` field | ✓ | ✗ |
| `parse_confidence` on every record | ✓ | ✗ |
| No auth required | ✓ | ✓ |
Podcast use case
The podcast_url field makes this actor useful for extracting Substack podcast episode lists without an RSS parser. Filter records where type == "podcast" and podcast_url is non-null.
Podcast posts are billed at the podcast-post premium event ($2.50/1K, set in the Apify console) because they include a direct audio file URL — useful for media monitoring tools, podcast discovery platforms, and content aggregators that need the actual audio stream. Regular newsletter posts are billed at the standard post-item rate ($1.50/1K).
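A minimal sketch of the podcast filter described above, applied to records already downloaded from the dataset. The sample records are made up for illustration, and `podcast_episodes` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def podcast_episodes(records: Iterable[Dict]) -> List[Dict]:
    """Keep records where type == "podcast" and podcast_url is non-null."""
    return [
        r for r in records
        if r.get("type") == "podcast" and r.get("podcast_url") is not None
    ]


# Made-up records using the field names from the output table.
records = [
    {"title": "Episode 1", "type": "podcast", "podcast_url": "https://example.com/ep1.mp3"},
    {"title": "Weekly letter", "type": "newsletter", "podcast_url": None},
    {"title": "Episode 2 (no audio)", "type": "podcast", "podcast_url": None},
]
print([r["title"] for r in podcast_episodes(records)])  # ['Episode 1']
```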
Monitoring use case
Track a competitor's newsletter for new posts — run daily and filter by post_date to see only new content. Set maxPosts=20 for fast incremental runs.
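The daily filter could be sketched like this, assuming `post_date` is an ISO 8601 UTC string as shown in the sample output. `new_posts` and `parse_timestamp` are hypothetical helpers, and the cutoff is expected to be a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-05-18T14:00:00Z."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_posts(records: Iterable[Dict], since: datetime) -> List[Dict]:
    """Return records published strictly after the given cutoff."""
    return [r for r in records if parse_timestamp(r["post_date"]) > since]


# Made-up records; in practice these come from the latest run's dataset.
records = [
    {"title": "Old post", "post_date": "2026-05-18T14:00:00Z"},
    {"title": "Fresh post", "post_date": "2026-06-04T08:30:00Z"},
]
cutoff = datetime(2026, 6, 1, tzinfo=timezone.utc)
print([r["title"] for r in new_posts(records, cutoff)])  # ['Fresh post']
```

Storing the newest `post_date` seen in each run and passing it as the next cutoff avoids reprocessing posts that were already handled.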
Use with AI Agents (MCP)
This Substack scraper is callable as a tool by AI agents (Claude Desktop, Cursor, VS Code, n8n, or any MCP-compatible client) via Apify's hosted Model Context Protocol server.
    {
      "mcpServers": {
        "apify": {
          "command": "npx",
          "args": [
            "mcp-remote",
            "https://mcp.apify.com/?tools=bovi/substack-publication",
            "--header",
            "Authorization: Bearer <YOUR_APIFY_TOKEN>"
          ]
        }
      }
    }
Integrations
Built for newsletter researchers, content agencies, and AI-pipeline teams ingesting Substack post metadata at scale — the JSON/dataset output drops into the tools you already run, no glue code:
- n8n / Make / Zapier: trigger a run or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code.
- Webhooks: call your own endpoint the moment a run finishes, to push results straight into your pipeline.
- MCP server: expose this actor as a tool to Claude, Cursor, or any MCP client so an AI agent can pull this data mid-conversation.
- API & SDKs — fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.
See all Apify integrations.