Substack Scraper — Posts, Authors & Newsletter Data
Pricing
from $0.60 / 1,000 posts extracted (overview)
Substack newsletter scraper for any publication. Extract posts: title, subtitle, author, date, reactions, comments, restacks, word count, cover image — plus full article HTML in detail mode. Search by handle, subdomain or custom domain. Clean JSON/CSV, no-code, no API key needed.
Developer
SIÁN OÜ
Maintained by Community
Substack Scraper — Posts, Authors & Newsletter Data 🚀
🎉 Turn any Substack newsletter into a clean, structured dataset — metadata, engagement, authors, and full article text.
Built for content researchers, marketers, and analysts who need newsletter data at scale.
📋 Overview
Point it at any Substack publication and get every post as clean structured data — no account, no API key, no copy-pasting. Works with handles, *.substack.com subdomains, and custom domains alike.
Why professionals choose this scraper:
- ✅ Complete post data: title, subtitle, author(s), publish date, word count, cover image, and more — 30+ fields per post
- ⚡ Two modes, one tool: fast Overview for bulk archive metadata, or Detail for the full article HTML
- 🎯 Engagement built in: reactions, comments, restacks, and a combined engagement score on every row
- 💰 Pay-per-result: only pay for posts you actually extract — transparent and predictable
- 💎 Any publication: handle, subdomain, or custom domain — all supported automatically
- ✨ Clean export: JSON, CSV, or Excel straight from the Apify dataset
✨ Features
- 📰 Publication Archive Scraping: page through a newsletter's entire back catalog of posts
- 📄 Full Article HTML: pull the complete post body, ready for analysis or archiving
- ✍️ Author Extraction: every byline with name, handle, and avatar
- ❤️ Engagement Metrics: reactions (with emoji breakdown), comments, and restacks
- 🖼️ Image Capture: cover image plus every in-body image URL, deduplicated
- 🔓 Audience Tags: see which posts are free, paid, or founder-only at a glance
- 🔢 Smart Limits: cap posts per run and pages per publication for precise, predictable runs
- 🌐 Custom Domains: works seamlessly whether a writer uses *.substack.com or their own domain
🎬 Quick Start
Paste a publication handle, pick a mode, and run. Results land in your dataset within seconds.
curl -X POST 'https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"scrapeMode": "overview", "publications": ["bigtechnology"], "maxResults": 50}'
🚀 Getting Started (3 Simple Steps)
Step 1: Add your publications
Enter one or more Substack publications — a handle (bigtechnology), a subdomain (bigtechnology.substack.com), or a custom domain (www.bigtechnology.com).
Step 2: Choose a mode
Pick Overview for fast archive metadata, or Detail for the complete article HTML.
Step 3: Run and export
Start the actor and download your results as JSON, CSV, or Excel.
That's it! In under a minute, you'll have:
- A structured table of posts with authors and dates
- Engagement metrics for every post
- (Detail mode) the full article text, ready to analyze
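Step 1 accepts three identifier forms. As a rough client-side pre-check before a run, the sketch below sorts an input string into one of those forms. The function name `classify_publication` and its rules are illustrative assumptions, not the actor's own resolution logic.

```python
import re

# Assumption: a handle is a single label of letters, digits, and hyphens.
HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def classify_publication(identifier: str) -> str:
    """Return 'handle', 'subdomain', or 'custom_domain' for a publication string."""
    value = identifier.strip().lower()
    # Drop an optional scheme and any path so only the host part remains.
    value = re.sub(r"^https?://", "", value).split("/", 1)[0]
    if value.endswith(".substack.com"):
        return "subdomain"
    if "." in value:
        return "custom_domain"
    if HANDLE_RE.match(value):
        return "handle"
    raise ValueError(f"Unrecognized publication identifier: {identifier!r}")


for example in ["bigtechnology", "bigtechnology.substack.com", "www.bigtechnology.com"]:
    print(example, "->", classify_publication(example))
```

A check like this only catches obvious typos (spaces, empty strings) early; it cannot confirm that the publication exists.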
📥 Input Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| scrapeMode | string | No | overview (archive metadata) or detail (full article HTML) |
| publications | array | No | Publications to scrape — handle, subdomain, or custom domain |
| postUrls | array | No | Specific post URLs (/p/<slug>) to fetch in Detail mode |
| maxResults | integer | No | Max posts per run (FREE: 25, PAID: unlimited) |
| endpoint | string | No | archive (recommended) or posts |
| sort | string | No | new (newest first) or top (most popular) |
| maxPagesPerPublication | integer | No | Limit pages crawled per publication |
| useProxy | boolean | No | Optional residential routing for very high volume |
Example — Overview:
{
  "scrapeMode": "overview",
  "publications": ["bigtechnology", "www.thefp.com"],
  "sort": "new",
  "maxResults": 100
}
Example — Detail by URL:
{
  "scrapeMode": "detail",
  "postUrls": ["https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to"]
}
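If you assemble inputs in code rather than by hand, a small builder can catch invalid values before the run starts. This is a minimal sketch that uses the field names from the input table; the helper `build_input` is hypothetical, and the rule that at least one publication or post URL must be given is an assumption (the table marks both as optional).

```python
VALID_MODES = {"overview", "detail"}
VALID_SORTS = {"new", "top"}


def build_input(scrape_mode="overview", publications=None, post_urls=None,
                max_results=None, sort=None):
    """Assemble an actor input dict from the documented field names."""
    if scrape_mode not in VALID_MODES:
        raise ValueError(f"scrapeMode must be one of {sorted(VALID_MODES)}")
    if sort is not None and sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    # Assumption: a run needs at least one target to be useful.
    if not publications and not post_urls:
        raise ValueError("provide publications and/or postUrls")
    if max_results is not None and max_results < 1:
        raise ValueError("maxResults must be a positive integer")

    # Only include the fields that were actually set.
    payload = {"scrapeMode": scrape_mode}
    if publications:
        payload["publications"] = list(publications)
    if post_urls:
        payload["postUrls"] = list(post_urls)
    if sort:
        payload["sort"] = sort
    if max_results is not None:
        payload["maxResults"] = max_results
    return payload


print(build_input(publications=["bigtechnology", "www.thefp.com"],
                  sort="new", max_results=100))
```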
📤 Output
Results are saved to the Apify dataset with 30+ fields including:
| Field | Type | Description |
|---|---|---|
| title | string | Post title |
| subtitle | string | Subtitle or social title |
| url | string | Canonical post URL |
| post_date | string | ISO 8601 publish date |
| audience | string | everyone / only_paid / founding |
| reaction_count | number | Total reactions |
| comment_count | number | Number of comments |
| restacks | number | Number of restacks |
| engagement_total | number | Reactions + comments + restacks |
| wordcount | number | Article word count |
| bylines | array | Authors with name, handle, avatar |
| cover_image | string | Cover image URL |
| body_html | string | Full article HTML (Detail mode) |
Example:
{
  "id": 202753370,
  "title": "Greg Brockman On OpenAI's Plan To Win: Compute Rules All",
  "url": "https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to",
  "publication_host": "bigtechnology.substack.com",
  "post_date": "2026-06-19T18:39:54.460Z",
  "audience": "everyone",
  "reaction_count": 27,
  "comment_count": 3,
  "restacks": 2,
  "engagement_total": 32,
  "wordcount": 1017,
  "bylines": [{ "name": "Marty Swant", "handle": "martyswant" }],
  "thumbnail": "https://substackcdn.com/image/...",
  "source": "overview"
}
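The engagement score and the date format can be checked against the sample record. The sketch below recomputes `engagement_total` as reactions + comments + restacks and parses the ISO 8601 `post_date`; the helper names are illustrative, and the record is trimmed to the fields used here.

```python
from datetime import datetime

# Subset of the sample record shown above.
post = {
    "post_date": "2026-06-19T18:39:54.460Z",
    "reaction_count": 27,
    "comment_count": 3,
    "restacks": 2,
    "engagement_total": 32,
}


def engagement_total(record):
    """Reactions + comments + restacks, treating missing values as 0."""
    keys = ("reaction_count", "comment_count", "restacks")
    return sum(record.get(key) or 0 for key in keys)


def parse_post_date(value):
    """Parse the ISO 8601 timestamp into a timezone-aware datetime."""
    # Python versions before 3.11 do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


print(engagement_total(post) == post["engagement_total"])
print(parse_post_date(post["post_date"]).date())
```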
💼 Use Cases & Examples
1. Newsletter Content Research
Analysts mapping what a publication covers over time.
- Input: A publication handle in Overview mode
- Output: Every post with title, date, and engagement
- Use: Spot themes, cadence, and top-performing topics
2. Competitor Monitoring
Marketers tracking rival newsletters.
- Input: Several competitor publications
- Output: A unified table of their recent posts and engagement
- Use: Benchmark posting frequency and audience response
3. Author & Topic Analysis
Researchers studying writers across publications.
- Input: Publications in Overview mode
- Output: Bylines, word counts, and topics per post
- Use: Identify prolific authors and trending subjects
4. Engagement Benchmarking
Growth teams measuring what resonates.
- Input: A publication archive
- Output: Reactions, comments, and restacks per post
- Use: Find the format and length that drive engagement
5. Content Archiving
Teams preserving a newsletter's back catalog.
- Input: A publication in Detail mode
- Output: Full article HTML for every post
- Use: Build a searchable internal archive
6. Building a Reading Dataset
Data teams creating a corpus for analysis.
- Input: Multiple publications in Detail mode
- Output: Clean post text plus metadata
- Use: Feed downstream NLP, search, or summarization
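Several of these use cases reduce to ranking or grouping rows from the dataset. The sketch below shows one possible approach for engagement benchmarking and author analysis. The sample rows are made up for illustration, and the helper names are hypothetical; only the field names (`engagement_total`, `bylines`) come from the output table.

```python
from collections import Counter

# Made-up rows using the documented field names.
items = [
    {"title": "A", "engagement_total": 32, "bylines": [{"name": "Author One"}]},
    {"title": "B", "engagement_total": 5, "bylines": [{"name": "Author Two"}]},
    {"title": "C", "engagement_total": 80, "bylines": [{"name": "Author One"}]},
]


def top_posts(rows, n=3):
    """Return the n rows with the highest engagement_total."""
    return sorted(rows, key=lambda row: row.get("engagement_total") or 0,
                  reverse=True)[:n]


def posts_per_author(rows):
    """Count posts per byline name; co-authored posts count once per author."""
    counts = Counter()
    for row in rows:
        for byline in row.get("bylines") or []:
            counts[byline.get("name", "unknown")] += 1
    return counts


print([row["title"] for row in top_posts(items, n=2)])
print(posts_per_author(items).most_common())
```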
🔗 Integration Examples
JavaScript/Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('sian.agency/substack-scraper').call({
  scrapeMode: 'overview',
  publications: ['bigtechnology'],
  maxResults: 50
});

// Read the extracted posts from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);
Python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('sian.agency/substack-scraper').call(
    run_input={'scrapeMode': 'overview', 'publications': ['bigtechnology'], 'maxResults': 50}
)

# Iterate over the extracted posts in the run's default dataset.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
cURL
curl -X POST 'https://api.apify.com/v2/acts/sian.agency~substack-scraper/runs?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"scrapeMode": "detail", "publications": ["bigtechnology"], "maxResults": 20}'
Automation Workflows (N8N / Zapier / Make)
- Trigger: Schedule or webhook
- HTTP Request: Call the actor API
- Process: Handle the JSON results
- Action: Save, notify, or transform
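For a custom workflow outside those platforms, the HTTP Request step can be sketched with the Python standard library. The example builds the same POST request as the cURL call above without sending it; `build_run_request` is a hypothetical helper, and the trigger, processing, and action steps are left to your own code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ACTOR_ID = "sian.agency~substack-scraper"  # as used in the cURL example


def build_run_request(token, run_input, actor_id=ACTOR_ID):
    """Build (but do not send) the POST request that starts an actor run."""
    query = urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


request = build_run_request(
    "YOUR_TOKEN",
    {"scrapeMode": "overview", "publications": ["bigtechnology"], "maxResults": 50},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```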
📊 Performance & Pricing
FREE Tier (Try It Now)
- 25 posts per run — full feature access, same quality
- No credit card required
- Perfect for testing and small projects
PAID Tier (Production Ready)
- Unlimited posts per run
- Faster, uninterrupted processing
- Pay-per-result: only charged for posts you actually extract
💰 Pay only for what you extract — predictable, transparent per-post pricing.
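As a rough budgeting aid, per-post pricing is simple arithmetic. The sketch below assumes the "from $0.60 per 1,000 posts" Overview rate quoted in the listing; the rate that applies to your plan or to Detail mode may differ, so treat the default as a placeholder.

```python
def estimate_cost(posts, rate_per_1000=0.60):
    """Estimate the charge in USD for a given number of extracted posts."""
    if posts < 0:
        raise ValueError("posts must be non-negative")
    return round(posts * rate_per_1000 / 1000, 4)


for count in (25, 1000, 5000):
    print(f"{count} posts: ${estimate_cost(count):.2f}")
```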
❓ Frequently Asked Questions
Q: How many posts can I process?
A: FREE tier: 25 per run. PAID tier: unlimited.
Q: Does it work with custom domains?
A: Yes — handles, *.substack.com subdomains, and custom domains all work automatically.
Q: Can I get the full article text?
A: Yes, use Detail mode to capture the complete article HTML for each post.
Q: What about paid or subscriber-only posts?
A: Public metadata (title, author, engagement) is always available. Full body content for paid posts may be limited where the publication restricts it.
Q: What output formats are available?
A: JSON, CSV, and Excel. Export directly from the Apify dataset.
Q: Is this legal?
A: It extracts only publicly available data. See the legal note below.
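If you fetch items through the API and want a CSV with only a few columns, the standard `csv` module is enough. This is a minimal sketch: the column selection and the `authors` column (byline names joined with semicolons) are choices made for this example, not part of the actor's export.

```python
import csv
import io

COLUMNS = ["title", "url", "post_date", "audience", "engagement_total", "authors"]


def to_csv(items):
    """Serialize selected post fields to CSV text, flattening bylines."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that are not listed in COLUMNS.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for post in items:
        row = dict(post)
        row["authors"] = "; ".join(
            byline.get("name", "") for byline in post.get("bylines") or []
        )
        writer.writerow(row)
    return buffer.getvalue()


sample = [{
    "title": "Greg Brockman On OpenAI's Plan To Win: Compute Rules All",
    "url": "https://www.bigtechnology.com/p/greg-brockman-on-openais-plan-to",
    "post_date": "2026-06-19T18:39:54.460Z",
    "audience": "everyone",
    "engagement_total": 32,
    "bylines": [{"name": "Marty Swant", "handle": "martyswant"}],
}]
print(to_csv(sample))
```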
🐞 Troubleshooting
No posts returned
- Check the publication handle/domain is spelled correctly
- Try the archive endpoint with sort: new
Fewer posts than expected
- Raise maxResults and/or maxPagesPerPublication
- FREE tier is capped at 25 posts per run
Missing article body
- Full HTML is only captured in Detail mode
- Some paid posts limit body content
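To see whether missing bodies line up with paid posts, you can group the affected rows by audience after a Detail run. The sketch below uses made-up rows and a hypothetical helper, `missing_bodies`; it relies only on the `body_html`, `audience`, and `url` fields from the output table.

```python
def missing_bodies(items):
    """Map each audience value to the URLs of posts with no body_html."""
    report = {}
    for post in items:
        if not post.get("body_html"):
            audience = post.get("audience", "unknown")
            report.setdefault(audience, []).append(post.get("url"))
    return report


# Made-up rows for illustration.
rows = [
    {"url": "https://example.com/p/a", "audience": "everyone", "body_html": "<p>Hi</p>"},
    {"url": "https://example.com/p/b", "audience": "only_paid", "body_html": ""},
    {"url": "https://example.com/p/c", "audience": "only_paid"},
]
print(missing_bodies(rows))
```

If most of the empty rows have the only_paid or founding audience, the gap is more likely a publication restriction than a scraping failure.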
⚖️ Is it legal to scrape data?
Our actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly. We therefore believe that our actors, when used for ethical purposes by Apify users, are safe.
However, you should be aware that your results could contain personal data. Personal data is protected by the GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.
You can also read Apify's blog post on the legality of web scraping.
Trademark notice: Substack is a trademark of Substack Inc. This actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Substack Inc. All product names, logos, and brands are property of their respective owners.
🤝 Support
Join our active support community
- For issues or questions, open an issue in the actor's repository
- Check the SIÁN Agency Store for more automation tools
- 📧 apify@sian-agency.online
Built by SIÁN Agency | More Tools