Pricing

from $0.60 / 1,000 posts extracted (Overview mode)

Substack Scraper — Posts, Authors & Newsletter Data

Substack newsletter scraper for any publication. Extract posts: title, subtitle, author, date, reactions, comments, restacks, word count, cover image — plus full article HTML in detail mode. Search by handle, subdomain or custom domain. Clean JSON/CSV, no-code, no API key needed.

Developer

SIÁN OÜ

Last modified

16 days ago

📋 Overview

Point it at any Substack publication and get every post as clean structured data — no account, no API key, no copy-pasting. Works with handles, *.substack.com subdomains, and custom domains alike.

Why professionals choose this scraper:

✅ Complete post data: title, subtitle, author(s), publish date, word count, cover image, and more — 30+ fields per post
⚡ Two modes, one tool: fast Overview for bulk archive metadata, or Detail for the full article HTML
🎯 Engagement built in: reactions, comments, restacks, and a combined engagement score on every row
💰 Pay-per-result: only pay for posts you actually extract — transparent and predictable
💎 Any publication: handle, subdomain, or custom domain — all supported automatically
✨ Clean export: JSON, CSV, or Excel straight from the Apify dataset

✨ Features

📰 Publication Archive Scraping: page through a newsletter's entire back catalog of posts
📄 Full Article HTML: pull the complete post body, ready for analysis or archiving
✍️ Author Extraction: every byline with name, handle, and avatar
❤️ Engagement Metrics: reactions (with emoji breakdown), comments, and restacks
🖼️ Image Capture: cover image plus every in-body image URL, deduplicated
🔓 Audience Tags: see which posts are free, paid, or founder-only at a glance
🔢 Smart Limits: cap posts per run and pages per publication for precise, predictable runs
🌐 Custom Domains: works seamlessly whether a writer uses *.substack.com or their own domain

🎬 Quick Start

Paste a publication handle, pick a mode, and run. Results land in your dataset within seconds.

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{"scrapeMode": "overview", "publications": ["bigtechnology"], "maxResults": 50}'

🚀 Getting Started (3 Simple Steps)

Step 1: Add your publications

Enter one or more Substack publications — a handle (bigtechnology), a subdomain (bigtechnology.substack.com), or a custom domain (www.bigtechnology.com).
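The three accepted forms can be told apart with a few lines of string handling. The sketch below is illustrative only: `normalize_publication` is a hypothetical helper, and the rule that a bare handle maps to `<handle>.substack.com` is an assumption inferred from the sample output, not a description of how the actor resolves publications internally.

```python
from urllib.parse import urlparse


def normalize_publication(value: str) -> str:
    """Return a bare hostname for a handle, subdomain, or custom domain.

    Hypothetical helper: assumes a bare handle lives at <handle>.substack.com.
    """
    value = value.strip()
    if not value:
        raise ValueError("publication must not be empty")
    # Accept full URLs as well as bare hosts by adding a scheme when missing.
    parsed = urlparse(value if "://" in value else f"https://{value}")
    host = (parsed.hostname or "").lower()
    if "." not in host:
        # No dot means a bare handle such as "bigtechnology".
        return f"{host}.substack.com"
    return host


for raw in ["bigtechnology", "bigtechnology.substack.com", "www.bigtechnology.com"]:
    print(raw, "->", normalize_publication(raw))
```

Normalizing on your side is optional, since the actor accepts all three forms directly. It is mainly useful for deduplicating a long input list before a run.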

Step 2: Choose a mode

Pick Overview for fast archive metadata, or Detail for the complete article HTML.

Step 3: Run and export

Start the actor and download your results as JSON, CSV, or Excel.

That's it! In under a minute, you'll have:

A structured table of posts with authors and dates
Engagement metrics for every post
In Detail mode, the full article HTML, ready to analyze
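Detail mode returns the article as HTML, so text analysis usually needs a conversion step first. The following is a minimal sketch using only the Python standard library; `html_to_text` and the choice of block-level tags are assumptions made for illustration, and real posts may need a more thorough HTML library.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "blockquote"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # start block elements on a new line

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(body_html):
    """Convert a body_html value into plain text, one line per block element."""
    parser = _TextExtractor()
    parser.feed(body_html or "")
    parser.close()
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<p>Hello <b>world</b></p><p>Second paragraph.</p>"))
```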

📥 Input Configuration

Field	Type	Required	Description
scrapeMode	string	No	`overview` (archive metadata) or `detail` (full article HTML)
publications	array	No	Publications to scrape — handle, subdomain, or custom domain
postUrls	array	No	Specific post URLs (`/p/<slug>`) to fetch in Detail mode
maxResults	integer	No	Max posts per run (FREE: 25, PAID: unlimited)
endpoint	string	No	`archive` (recommended) or `posts`
sort	string	No	`new` (newest first) or `top` (most popular)
maxPagesPerPublication	integer	No	Limit pages crawled per publication
useProxy	boolean	No	Route requests through residential proxies; optional, intended for very high volume

Example — Overview:

{
  "scrapeMode": "overview",
  "publications": ["bigtechnology", "www.thefp.com"],
  "sort": "new",
  "maxResults": 100
}

Example — Detail by URL:

{
  "scrapeMode": "detail",
  "postUrls": [
    "https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to"
  ]
}
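Checking an input locally before starting a run can catch typos in the enumerated fields. The sketch below validates against the values listed in the input table; `build_run_input` is a hypothetical helper, and two of its rules are assumptions rather than documented behavior: that limits must be positive integers, and that at least one of `publications` or `postUrls` must be present.

```python
ALLOWED = {
    "scrapeMode": {"overview", "detail"},
    "endpoint": {"archive", "posts"},
    "sort": {"new", "top"},
}


def build_run_input(**fields):
    """Validate fields against the documented options and return a run-input dict."""
    for key, allowed in ALLOWED.items():
        if key in fields and fields[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    for key in ("publications", "postUrls"):
        if key in fields and not isinstance(fields[key], list):
            raise TypeError(f"{key} must be a list")
    for key in ("maxResults", "maxPagesPerPublication"):
        # Assumption: limits are positive integers.
        if key in fields and (not isinstance(fields[key], int) or fields[key] < 1):
            raise ValueError(f"{key} must be a positive integer")
    # Assumption: a run needs something to scrape.
    if not fields.get("publications") and not fields.get("postUrls"):
        raise ValueError("provide at least one publication or post URL")
    return dict(fields)


run_input = build_run_input(
    scrapeMode="overview",
    publications=["bigtechnology", "www.thefp.com"],
    sort="new",
    maxResults=100,
)
print(run_input)
```

The returned dictionary can be passed as `run_input` to the Apify client or serialized as the JSON body of an API call.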

📤 Output

Results are saved to the Apify dataset with 30+ fields including:

Field	Type	Description
title	string	Post title
subtitle	string	Subtitle or social title
url	string	Canonical post URL
post_date	string	ISO 8601 publish date
audience	string	everyone / only_paid / founding
reaction_count	number	Total reactions
comment_count	number	Number of comments
restacks	number	Number of restacks
engagement_total	number	Reactions + comments + restacks
wordcount	number	Article word count
bylines	array	Authors with name, handle, avatar
cover_image	string	Cover image URL
body_html	string	Full article HTML (Detail mode)

Example:

{
  "id": 202753370,
  "title": "Greg Brockman On OpenAI's Plan To Win: Compute Rules All",
  "url": "https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to",
  "publication_host": "bigtechnology.substack.com",
  "post_date": "2026-06-19T18:39:54.460Z",
  "audience": "everyone",
  "reaction_count": 27,
  "comment_count": 3,
  "restacks": 2,
  "engagement_total": 32,
  "wordcount": 1017,
  "bylines": [{ "name": "Marty Swant", "handle": "martyswant" }],
  "thumbnail": "https://substackcdn.com/image/...",
  "source": "overview"
}
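Two of the fields above benefit from light post-processing: `post_date` is an ISO 8601 string, and `engagement_total` should equal the sum of the three engagement counts. A small sketch, using values copied from the example record; the helper names are hypothetical, and treating missing counts as zero is an assumption.

```python
from datetime import datetime, timezone

post = {
    "post_date": "2026-06-19T18:39:54.460Z",
    "reaction_count": 27,
    "comment_count": 3,
    "restacks": 2,
    "engagement_total": 32,
}


def parse_post_date(value: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def engagement_sum(item: dict) -> int:
    # Missing counts are treated as zero (an assumption, not documented behavior).
    return sum(item.get(k) or 0 for k in ("reaction_count", "comment_count", "restacks"))


published = parse_post_date(post["post_date"])
print(published.astimezone(timezone.utc).date())  # 2026-06-19
print(engagement_sum(post) == post["engagement_total"])  # True
```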

💼 Use Cases & Examples

Analysts mapping what a publication covers over time.
Input: A publication handle in Overview mode
Output: Every post with title, date, and engagement
Use: Spot themes, cadence, and top-performing topics

2. Competitor Monitoring

Marketers tracking rival newsletters.
Input: Several competitor publications
Output: A unified table of their recent posts and engagement
Use: Benchmark posting frequency and audience response

3. Author & Topic Analysis

Researchers studying writers across publications.
Input: Publications in Overview mode
Output: Bylines, word counts, and topics per post
Use: Identify prolific authors and trending subjects

4. Engagement Benchmarking

Growth teams measuring what resonates.
Input: A publication archive
Output: Reactions, comments, and restacks per post
Use: Find the format and length that drive engagement

5. Content Archiving

Teams preserving a newsletter's back catalog.
Input: A publication in Detail mode
Output: Full article HTML for every post
Use: Build a searchable internal archive

6. Building a Reading Dataset

Data teams creating a corpus for analysis.
Input: Multiple publications in Detail mode
Output: Clean post text plus metadata
Use: Feed downstream NLP, search, or summarization
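As one concrete illustration of the engagement benchmarking use case, the sketch below averages `engagement_total` by article length. The function name, the 500-word bucket size, and the sample rows are all made up for the example; only the field names come from the output schema.

```python
from collections import defaultdict
from statistics import mean


def engagement_by_length(items, bucket_size=500):
    """Average engagement_total per word-count bucket (bucket_size is arbitrary)."""
    buckets = defaultdict(list)
    for item in items:
        words = item.get("wordcount")
        if words is None:
            continue  # skip rows without a word count
        start = (words // bucket_size) * bucket_size
        buckets[start].append(item.get("engagement_total") or 0)
    return {
        f"{start}-{start + bucket_size - 1}": round(mean(values), 1)
        for start, values in sorted(buckets.items())
    }


# Invented sample rows that reuse the documented field names.
sample = [
    {"wordcount": 1017, "engagement_total": 32},
    {"wordcount": 1400, "engagement_total": 10},
    {"wordcount": 300, "engagement_total": 5},
]
print(engagement_by_length(sample))
```

With a real archive, the same grouping can be repeated per `audience` value or per byline to compare formats.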

🔗 Integration Examples

JavaScript/Node.js

import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('sian.agency/substack-scraper').call({
  scrapeMode: 'overview',
  publications: ['bigtechnology'],
  maxResults: 50
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);

Python

from apify_client import ApifyClient
client = ApifyClient('YOUR_TOKEN')

run = client.actor('sian.agency/substack-scraper').call(
    run_input={'scrapeMode': 'overview', 'publications': ['bigtechnology'], 'maxResults': 50}
)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{"scrapeMode": "detail", "publications": ["bigtechnology"], "maxResults": 20}'

Automation Workflows (N8N / Zapier / Make)

Trigger: Schedule or webhook
HTTP Request: Call the actor API
Process: Handle the JSON results
Action: Save, notify, or transform
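For a scripted version of the same four steps, the request and the processing step might look like the sketch below. It prepares the POST shown in the cURL example without sending it, and filters results down to posts newer than the last one seen. `build_run_request` and `posts_since` are hypothetical helpers, and comparing `post_date` values as text assumes every value uses the same UTC format as the example output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs"


def build_run_request(token: str, run_input: dict) -> Request:
    """Prepare (but do not send) the POST that starts an actor run."""
    return Request(
        f"{API_URL}?{urlencode({'token': token})}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def posts_since(items, last_seen_iso):
    """Keep only posts published after the previous run's newest post_date."""
    # ISO 8601 UTC strings in an identical format compare correctly as text.
    return [item for item in items if item.get("post_date", "") > last_seen_iso]


req = build_run_request(
    "YOUR_TOKEN", {"scrapeMode": "overview", "publications": ["bigtechnology"]}
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would start the run. A scheduler or webhook handler can then call `posts_since` on the dataset items before saving or sending notifications.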

📊 Performance & Pricing

FREE Tier (Try It Now)

25 posts per run — full feature access, same quality
No credit card required
Perfect for testing and small projects

PAID Tier (Production Ready)

Unlimited posts per run
Faster, uninterrupted processing
Pay-per-result: only charged for posts you actually extract

💰 Pay only for what you extract — predictable, transparent per-post pricing.
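A rough budget can be worked out from the listed starting price of $0.60 per 1,000 posts in Overview mode. The sketch below is plain arithmetic under that assumption; `estimate_cost` is a hypothetical helper, the rate for Detail mode is not stated here and may differ, and platform usage costs are not included.

```python
def estimate_cost(posts: int, rate_per_1000: float = 0.60) -> float:
    """Estimated charge in USD for a number of extracted posts."""
    if posts < 0:
        raise ValueError("posts must be non-negative")
    return round(posts * rate_per_1000 / 1000, 2)


for n in (1_000, 10_000, 50_000):
    print(f"{n:>6} posts -> ${estimate_cost(n):.2f}")
```

Check the current pricing page before relying on these figures, since "from" prices can vary by plan.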

🔗 View current pricing

❓ Frequently Asked Questions

Q: How many posts can I process? A: FREE tier: 25 per run. PAID tier: unlimited.

Q: Does it work with custom domains? A: Yes — handles, *.substack.com subdomains, and custom domains all work automatically.

Q: Can I get the full article text? A: Yes — use Detail mode to capture the complete article HTML for each post.

Q: What about paid or subscriber-only posts? A: Public metadata (title, author, engagement) is always available. Full body content for paid posts may be limited where the publication restricts it.

Q: What output formats are available? A: JSON, CSV, and Excel — export directly from the Apify dataset.

Q: Is this legal? A: It extracts only publicly available data. See the legal note below.

🐞 Troubleshooting

No posts returned

Check the publication handle/domain is spelled correctly
Try the archive endpoint with sort: new

Fewer posts than expected

Raise maxResults and/or maxPagesPerPublication
FREE tier is capped at 25 posts per run

Missing article body

Full HTML is only captured in Detail mode
Some paid posts limit body content
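To see which of the two causes applies to a given dataset, rows without `body_html` can be grouped by their `source` and `audience` fields. This is a sketch, not a diagnostic shipped with the actor: `missing_body_report` is a hypothetical helper, and it assumes Overview rows are marked with `"source": "overview"` as in the example output.

```python
def missing_body_report(items):
    """Group posts without body_html by the most likely cause."""
    report = {"overview_mode": [], "restricted": [], "unknown": []}
    for item in items:
        if item.get("body_html"):
            continue  # body present, nothing to report
        if item.get("source") == "overview":
            report["overview_mode"].append(item.get("url"))
        elif item.get("audience") in ("only_paid", "founding"):
            report["restricted"].append(item.get("url"))
        else:
            report["unknown"].append(item.get("url"))
    return report


# Invented rows that reuse the documented field names.
rows = [
    {"url": "https://example.com/p/a", "source": "overview", "audience": "everyone"},
    {"url": "https://example.com/p/b", "source": "detail", "audience": "only_paid"},
    {"url": "https://example.com/p/c", "source": "detail", "audience": "everyone",
     "body_html": "<p>Full text</p>"},
]
print(missing_body_report(rows))
```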

⚖️ Is it legal to scrape data?

Our actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly. We therefore believe that our actors, when used for ethical purposes by Apify users, are safe.

However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

You can also read Apify's blog post on the legality of web scraping.

Trademark notice: Substack is a trademark of Substack Inc. This actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Substack Inc. All product names, logos, and brands are property of their respective owners.

🤝 Support

Join our active support community

For issues or questions, open an issue in the actor's repository
Check the SIÁN Agency Store for more automation tools
📧 apify@sian-agency.online

Built by SIÁN Agency | More Tools
