Substack Scraper — Posts, Authors & Newsletter Data

Pricing

from $0.60 / 1,000 posts extracted (Overview mode)

Substack newsletter scraper for any publication. Extract posts: title, subtitle, author, date, reactions, comments, restacks, word count, cover image — plus full article HTML in detail mode. Search by handle, subdomain or custom domain. Clean JSON/CSV, no-code, no API key needed.

Rating

0.0 (0 ratings)

Developer

SIÁN OÜ

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago

Substack Scraper — Posts, Authors & Newsletter Data 🚀

🎉 Turn any Substack newsletter into a clean, structured dataset — metadata, engagement, authors, and full article text.

Built for content researchers, marketers, and analysts who need newsletter data at scale.


📋 Overview

Point it at any Substack publication and get every post as clean structured data — no account, no API key, no copy-pasting. Works with handles, *.substack.com subdomains, and custom domains alike.

Why professionals choose this scraper:

  • Complete post data: title, subtitle, author(s), publish date, word count, cover image, and more — 30+ fields per post
  • Two modes, one tool: fast Overview for bulk archive metadata, or Detail for the full article HTML
  • 🎯 Engagement built in: reactions, comments, restacks, and a combined engagement score on every row
  • 💰 Pay-per-result: only pay for posts you actually extract — transparent and predictable
  • 💎 Any publication: handle, subdomain, or custom domain — all supported automatically
  • Clean export: JSON, CSV, or Excel straight from the Apify dataset

✨ Features

  • 📰 Publication Archive Scraping: page through a newsletter's entire back catalog of posts
  • 📄 Full Article HTML: pull the complete post body, ready for analysis or archiving
  • ✍️ Author Extraction: every byline with name, handle, and avatar
  • ❤️ Engagement Metrics: reactions (with emoji breakdown), comments, and restacks
  • 🖼️ Image Capture: cover image plus every in-body image URL, deduplicated
  • 🔓 Audience Tags: see which posts are free, paid, or founder-only at a glance
  • 🔢 Smart Limits: cap posts per run and pages per publication for precise, predictable runs
  • 🌐 Custom Domains: works seamlessly whether a writer uses *.substack.com or their own domain

🎬 Quick Start

Paste a publication handle, pick a mode, and run. Results land in your dataset within seconds.

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"scrapeMode": "overview", "publications": ["bigtechnology"], "maxResults": 50}'

🚀 Getting Started (3 Simple Steps)

Step 1: Add your publications

Enter one or more Substack publications — a handle (bigtechnology), a subdomain (bigtechnology.substack.com), or a custom domain (www.bigtechnology.com).

Step 2: Choose a mode

Pick Overview for fast archive metadata, or Detail for the complete article HTML.

Step 3: Run and export

Start the actor and download your results as JSON, CSV, or Excel.

That's it! In under a minute, you'll have:

  • A structured table of posts with authors and dates
  • Engagement metrics for every post
  • (Detail mode) the full article text, ready to analyze
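
Step 1 accepts three input forms. To illustrate how they relate, the sketch below maps each form to a base URL. `normalize_publication` is a hypothetical helper, not part of the actor, and it assumes that a bare handle (no dot) always means `<handle>.substack.com`; the actor's own resolution logic may differ.

```python
from urllib.parse import urlparse


def normalize_publication(value: str) -> str:
    """Map a handle, *.substack.com subdomain, or custom domain to a base URL.

    Hypothetical helper for illustration; it is not the actor's own logic.
    """
    value = value.strip()
    # Accept pasted URLs such as "https://www.thefp.com/" by keeping only the host.
    if "://" in value:
        value = urlparse(value).netloc
    # Drop any trailing path and normalize case.
    host = value.split("/", 1)[0].lower()
    if "." not in host:
        # Assumption: a bare handle like "bigtechnology" lives on substack.com.
        return f"https://{host}.substack.com"
    # Subdomains and custom domains are already full hostnames.
    return f"https://{host}"


for raw in ["bigtechnology", "bigtechnology.substack.com", "www.bigtechnology.com"]:
    print(raw, "->", normalize_publication(raw))
```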

📥 Input Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| scrapeMode | string | No | overview (archive metadata) or detail (full article HTML) |
| publications | array | No | Publications to scrape: handle, subdomain, or custom domain |
| postUrls | array | No | Specific post URLs (/p/<slug>) to fetch in Detail mode |
| maxResults | integer | No | Max posts per run (FREE: 25, PAID: unlimited) |
| endpoint | string | No | archive (recommended) or posts |
| sort | string | No | new (newest first) or top (most popular) |
| maxPagesPerPublication | integer | No | Max pages crawled per publication |
| useProxy | boolean | No | Route requests through residential proxies for very high-volume runs |
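
The table implies a small set of allowed values, so a client-side pre-check can catch typos before a run starts. `validate_input` is a hypothetical helper, and two of its rules are assumptions rather than documented behavior: the limits must be positive integers, and at least one of `publications` or `postUrls` must be non-empty.

```python
ALLOWED_VALUES = {
    "scrapeMode": {"overview", "detail"},
    "endpoint": {"archive", "posts"},
    "sort": {"new", "top"},
}


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in a run input (empty list means OK)."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in run_input and run_input[field] not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    for field in ("maxResults", "maxPagesPerPublication"):
        value = run_input.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int) or value < 1):
            errors.append(f"{field} must be a positive integer")
    for field in ("publications", "postUrls"):
        if field in run_input and not isinstance(run_input[field], list):
            errors.append(f"{field} must be an array")
    if "useProxy" in run_input and not isinstance(run_input["useProxy"], bool):
        errors.append("useProxy must be a boolean")
    # Assumption: a run needs something to scrape.
    if not run_input.get("publications") and not run_input.get("postUrls"):
        errors.append("provide publications or postUrls")
    return errors


print(validate_input({"scrapeMode": "overview", "publications": ["bigtechnology"], "maxResults": 50}))
```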

Example — Overview:

{
  "scrapeMode": "overview",
  "publications": ["bigtechnology", "www.thefp.com"],
  "sort": "new",
  "maxResults": 100
}

Example — Detail by URL:

{
  "scrapeMode": "detail",
  "postUrls": [
    "https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to"
  ]
}
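
Detail mode expects post URLs in the /p/<slug> form. This standard-library sketch (the `post_slug` name is hypothetical) checks that shape and extracts the slug before a URL goes into `postUrls`:

```python
from urllib.parse import urlparse


def post_slug(url: str) -> str:
    """Return <slug> from a post URL shaped like https://<host>/p/<slug>."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if parsed.scheme not in ("http", "https") or len(segments) != 2 or segments[0] != "p":
        raise ValueError(f"not a /p/<slug> post URL: {url}")
    return segments[1]


print(post_slug("https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to"))
```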

📤 Output

Results are saved to the Apify dataset with 30+ fields including:

| Field | Type | Description |
| --- | --- | --- |
| title | string | Post title |
| subtitle | string | Subtitle or social title |
| url | string | Canonical post URL |
| post_date | string | ISO 8601 publish date |
| audience | string | everyone / only_paid / founding |
| reaction_count | number | Total reactions |
| comment_count | number | Number of comments |
| restacks | number | Number of restacks |
| engagement_total | number | Reactions + comments + restacks |
| wordcount | number | Article word count |
| bylines | array | Authors with name, handle, avatar |
| cover_image | string | Cover image URL |
| body_html | string | Full article HTML (Detail mode only) |

Example:

{
  "id": 202753370,
  "title": "Greg Brockman On OpenAI's Plan To Win: Compute Rules All",
  "url": "https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to",
  "publication_host": "bigtechnology.substack.com",
  "post_date": "2026-06-19T18:39:54.460Z",
  "audience": "everyone",
  "reaction_count": 27,
  "comment_count": 3,
  "restacks": 2,
  "engagement_total": 32,
  "wordcount": 1017,
  "bylines": [{ "name": "Marty Swant", "handle": "martyswant" }],
  "thumbnail": "https://substackcdn.com/image/...",
  "source": "overview"
}
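
The example row shows how `engagement_total` is derived: 27 reactions + 3 comments + 2 restacks = 32. A sketch that recomputes it from the three counters, treating missing or null values as zero (that fallback is our assumption):

```python
def engagement_total(post: dict) -> int:
    """Reactions + comments + restacks; missing or null counters count as zero."""
    return sum(post.get(key) or 0 for key in ("reaction_count", "comment_count", "restacks"))


row = {"reaction_count": 27, "comment_count": 3, "restacks": 2, "engagement_total": 32}
print(engagement_total(row), row["engagement_total"])  # 32 32
```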

💼 Use Cases & Examples

1. Newsletter Content Research

Analysts mapping what a publication covers over time.

  • Input: A publication handle in Overview mode
  • Output: Every post with title, date, and engagement
  • Use: Spot themes, cadence, and top-performing topics
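
For cadence analysis, a sketch that buckets dataset rows by calendar month using the ISO 8601 `post_date` field. It assumes the rows look like the output example above; `posts_per_month` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(items: list) -> dict:
    """Count posts per calendar month (UTC), oldest month first."""
    months = Counter()
    for item in items:
        # post_date looks like "2026-06-19T18:39:54.460Z"; fromisoformat needs "+00:00" before Python 3.11.
        published = datetime.fromisoformat(item["post_date"].replace("Z", "+00:00"))
        months[published.strftime("%Y-%m")] += 1
    return dict(sorted(months.items()))
```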

2. Competitor Monitoring

Marketers tracking rival newsletters.

  • Input: Several competitor publications
  • Output: A unified table of their recent posts and engagement
  • Use: Benchmark posting frequency and audience response
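
When several publications land in one dataset, a sketch that groups rows by `publication_host` and averages `engagement_total` per publication (hypothetical helper name; missing totals count as zero):

```python
from collections import defaultdict


def engagement_by_publication(items: list) -> dict:
    """Post count and average engagement_total per publication_host."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for item in items:
        host = item["publication_host"]
        counts[host] += 1
        sums[host] += item.get("engagement_total") or 0
    return {
        host: {"posts": counts[host], "avg_engagement": sums[host] / counts[host]}
        for host in sorted(counts)
    }
```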

3. Author & Topic Analysis

Researchers studying writers across publications.

  • Input: Publications in Overview mode
  • Output: Bylines, word counts, and topics per post
  • Use: Identify prolific authors and trending subjects
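
A sketch for counting posts per author from the `bylines` array. Crediting each author once per post and falling back to `name` when `handle` is missing are our own choices:

```python
from collections import Counter


def top_authors(items: list, n: int = 5) -> list:
    """Return the n most frequent authors as (handle_or_name, post_count) pairs."""
    counts = Counter()
    for item in items:
        # Credit each author at most once per post, preferring the handle over the display name.
        authors = {b.get("handle") or b.get("name") for b in item.get("bylines") or []}
        counts.update(a for a in authors if a)
    return counts.most_common(n)
```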

4. Engagement Benchmarking

Growth teams measuring what resonates.

  • Input: A publication archive
  • Output: Reactions, comments, and restacks per post
  • Use: Find the format and length that drive engagement
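
One way to compare posts of different lengths is to normalize engagement by word count. This sketch ranks posts by engagement per 1,000 words; the `min_words` cutoff that skips very short posts is an arbitrary assumption you may want to tune.

```python
def engagement_per_1k_words(items: list, min_words: int = 100) -> list:
    """Rank posts by engagement_total per 1,000 words, highest first."""
    ranked = []
    for item in items:
        words = item.get("wordcount") or 0
        if words < min_words:
            continue  # very short posts would inflate the ratio
        rate = (item.get("engagement_total") or 0) * 1000 / words
        ranked.append((round(rate, 1), item["title"]))
    return sorted(ranked, reverse=True)
```

With the example row above (32 engagements, 1,017 words), the rate is 32 × 1000 / 1017 ≈ 31.5.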

5. Content Archiving

Teams preserving a newsletter's back catalog.

  • Input: A publication in Detail mode
  • Output: Full article HTML for every post
  • Use: Build a searchable internal archive
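
A sketch that writes each Detail-mode `body_html` to `<out_dir>/<host>/<slug>.html` and skips rows without a body. The per-host directory layout is our own choice, made so that identical slugs from different publications do not overwrite each other.

```python
from pathlib import Path
from urllib.parse import urlparse


def archive_posts(items: list, out_dir: str) -> list:
    """Write each row's body_html to <out_dir>/<host>/<slug>.html; return written paths."""
    written = []
    for item in items:
        html = item.get("body_html")
        if not html:
            continue  # Overview rows, or paid posts whose body is restricted
        parsed = urlparse(item["url"])
        slug = parsed.path.rstrip("/").rsplit("/", 1)[-1]
        path = Path(out_dir) / parsed.netloc / f"{slug}.html"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(html, encoding="utf-8")
        written.append(path)
    return written
```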

6. Building a Reading Dataset

Data teams creating a corpus for analysis.

  • Input: Multiple publications in Detail mode
  • Output: Clean post text plus metadata
  • Use: Feed downstream NLP, search, or summarization
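
Before NLP work, `body_html` usually needs to become plain text. A standard-library sketch built on `html.parser.HTMLParser`: it drops `script` and `style` content, decodes entities, and breaks lines after a handful of common block tags. The tag list is an assumption and not exhaustive.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, adding line breaks after common block-level tags."""

    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote", "div"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) decodes entities such as &amp;
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(body_html: str) -> str:
    parser = TextExtractor()
    parser.feed(body_html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```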


🔗 Integration Examples

JavaScript/Node.js

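import { ApifyClient } from 'apify-client';

// Top-level await requires an ES module (for example, a .mjs file).
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('sian.agency/substack-scraper').call({
    scrapeMode: 'overview',
    publications: ['bigtechnology'],
    maxResults: 50,
});

// Read the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);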

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('sian.agency/substack-scraper').call(
    run_input={'scrapeMode': 'overview', 'publications': ['bigtechnology'], 'maxResults': 50}
)

# Iterate over every item in the run's default dataset.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{"scrapeMode": "detail", "publications": ["bigtechnology"], "maxResults": 20}'

Automation Workflows (N8N / Zapier / Make)

  1. Trigger: Schedule or webhook
  2. HTTP Request: Call the actor API
  3. Process: Handle the JSON results
  4. Action: Save, notify, or transform
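
For the HTTP Request step, Apify's API also offers a synchronous `run-sync-get-dataset-items` endpoint that starts a run, waits for it, and returns the dataset items in the response body, which saves a separate polling step. The sketch below only builds such a request with the Python standard library; `build_run_request` is a hypothetical helper, and you should check Apify's API reference for the time limit on synchronous runs before relying on this for large scrapes.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> Request:
    """Build (but do not send) a POST that runs the actor and returns its dataset items."""
    # The API path uses "~" instead of "/" between the username and the actor name.
    url = (
        f"{API_BASE}/acts/{actor_id.replace('/', '~')}"
        f"/run-sync-get-dataset-items?token={quote(token)}"
    )
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


req = build_run_request(
    "sian.agency/substack-scraper",
    "YOUR_TOKEN",
    {"scrapeMode": "overview", "publications": ["bigtechnology"], "maxResults": 50},
)
# To send it: urllib.request.urlopen(req), then json.load() the response for the item list.
```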

📊 Performance & Pricing

FREE Tier (Try It Now)

  • 25 posts per run, with full feature access and the same quality
  • No credit card required
  • Perfect for testing and small projects

PAID Tier

  • Unlimited posts per run
  • Faster, uninterrupted processing
  • Pay-per-result: only charged for posts you actually extract

💰 Pay only for what you extract — predictable, transparent per-post pricing.

🔗 View current pricing
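
As a worked estimate at the listed starting rate of $0.60 per 1,000 posts (Overview mode), 10,000 posts cost 10 × $0.60 = $6.00. Detail mode may be priced differently, so the sketch below treats the rate as a parameter and uses `Decimal` to avoid float rounding. `estimate_cost` is a hypothetical helper, and rounding to whole cents is for display only; it is not a statement about how billing rounds.

```python
from decimal import ROUND_HALF_UP, Decimal


def estimate_cost(posts: int, usd_per_1000: str = "0.60") -> Decimal:
    """Estimated USD cost for `posts` results at a per-1,000 rate, rounded to cents."""
    cost = Decimal(usd_per_1000) * posts / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(10_000))  # 6.00
```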


❓ Frequently Asked Questions

Q: How many posts can I process? A: FREE tier: 25 per run. PAID tier: unlimited.

Q: Does it work with custom domains? A: Yes — handles, *.substack.com subdomains, and custom domains all work automatically.

Q: Can I get the full article text? A: Yes — use Detail mode to capture the complete article HTML for each post.

Q: What about paid or subscriber-only posts? A: Public metadata (title, author, engagement) is always available. Full body content for paid posts may be limited where the publication restricts it.

Q: What output formats are available? A: JSON, CSV, and Excel — export directly from the Apify dataset.

Q: Is this legal? A: It extracts only publicly available data. See the legal note below.


🐞 Troubleshooting

No posts returned

  • Check the publication handle/domain is spelled correctly
  • Try the archive endpoint with sort: new

Fewer posts than expected

  • Raise maxResults and/or maxPagesPerPublication
  • FREE tier is capped at 25 posts per run

Missing article body

  • Full HTML is only captured in Detail mode
  • Some paid posts limit body content

⚖️ Legal Note

Our actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly. We therefore believe that our actors, when used for ethical purposes by Apify users, are safe.

However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

You can also read Apify's blog post on the legality of web scraping.

Trademark notice: Substack is a trademark of Substack Inc. This actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Substack Inc. All product names, logos, and brands are property of their respective owners.


🤝 Support

Telegram Support

Join our active support community


Built by SIÁN Agency | More Tools