Substack Scraper

Scrape Substack posts and comments without a login. Pull a publication's archive sorted by newest or top, a single post by URL, or full comment trees. It walks pagination up to your chosen limit. Built for newsletter research, content analysis and audience insight, with clean structured rows ready for export.

Pricing: from $0.50 / 1,000 results

Rating: 5.0 (1 rating)

Developer: Goutam Soni

Last modified: 10 days ago

What it does

Publication archive scraper. Give it a Substack username, address, or custom domain and it pulls the publication's posts, newest first or by top, paging automatically up to your chosen limit.
Single post scraper. Pass any post URL and get that post back with its full plain-text body.
Comment scraper. Optionally fetch the full comment tree for each post, flattened into clean rows with author, body, likes, and reply counts.
Rich metrics on every post. Like count, comment count, restacks, and word count, all in one row.
No login, no password, no API key. Just provide publications or post links.
Custom domains supported. Works with both *.substack.com addresses and publications on their own domain.
Tunable scale. Set how many posts per publication, how many comments per post, and how many publications to process in parallel.
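The comment scraper returns each tree flattened into rows. One common way to flatten a tree is a depth-first walk, sketched below. It assumes a hypothetical nested shape where each comment has `id`, `name`, `handle`, `body`, `reactions`, and a `children` list of direct replies; it illustrates the technique and is not the actor's own code. Here `replyCount` counts direct replies only.

```python
from typing import Any


def flatten_comments(tree: list[dict[str, Any]], post_id: int) -> list[dict[str, Any]]:
    """Depth-first walk turning a nested comment tree into flat rows.

    Hypothetical input shape: each node has id, name, handle, body,
    reactions, and a `children` list holding its direct replies.
    """
    rows: list[dict[str, Any]] = []
    stack = list(reversed(tree))  # reversed so the first top-level comment is popped first
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        rows.append({
            "type": "comment",
            "id": node["id"],
            "postId": post_id,
            "authorName": node.get("name"),
            "authorHandle": node.get("handle"),
            "likeCount": node.get("reactions", 0),
            "replyCount": len(children),  # direct replies only
            "body": node.get("body", ""),
        })
        stack.extend(reversed(children))  # replies are visited before later siblings
    return rows
```

Using an explicit stack instead of recursion keeps very deep reply chains from hitting Python's recursion limit.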

Use cases

Newsletter market research. Track what topics, formats, and headlines are getting the most likes, comments, and restacks across the newsletters in your niche.
Lead generation. Build lists of active writers and publications, with author names and post engagement, for outreach or partnerships.
Content monitoring. Watch a set of publications and capture every new post with its metrics on a schedule.
Competitive and trend analysis. Compare engagement and posting cadence across publications you follow.
Audience and sentiment research. Pull comment threads to understand what readers actually say under top posts.
Dataset building. Export thousands of posts to CSV, JSON, or Excel for analysis, dashboards, or model training.
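For content monitoring, two scheduled runs of the same input can be compared by post `id`. A minimal sketch, assuming each run's rows are already loaded as Python dicts in the shape shown under Output: posts whose `id` was absent from the previous run are new, and the like delta shows how engagement moved between the two snapshots. The `diff_runs` name is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def diff_runs(previous: list[Row], current: list[Row]) -> tuple[list[Row], dict[Any, int]]:
    """Return (posts not seen in the previous run, likeCount change per already-seen post)."""
    seen = {row["id"]: row for row in previous if row.get("type") == "post"}
    new_posts: list[Row] = []
    like_deltas: dict[Any, int] = {}
    for row in current:
        if row.get("type") != "post":
            continue  # comment rows are ignored here
        old = seen.get(row["id"])
        if old is None:
            new_posts.append(row)
        else:
            like_deltas[row["id"]] = row.get("likeCount", 0) - old.get("likeCount", 0)
    return new_posts, like_deltas
```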

Input

Field	Type	Description
`publications`	array	Publications to pull posts from. Use a username (`example`), a full address (`example.substack.com`), or a custom domain (`news.example.com`).
`postUrls`	array	Specific post links to fetch. The full post body is always included for these.
`maxItemsPerSource`	integer	Cap on posts returned per publication. The scraper pages through the archive until this cap is reached or the archive runs out. Default 50.
`sort`	string	Order of posts in a publication archive. `new` (newest first) or `top`. Default `new`.
`includeContent`	boolean	When on, each post carries a plain-text body. Default off.
`includeComments`	boolean	When on, the comment tree is also fetched for each post. Default off (keeps runs fast and cheap).
`maxCommentsPerPost`	integer	Cap on comments returned per post when comments are included. Default 100.
`concurrency`	integer	How many publications to process in parallel. Default 5.
`proxyConfig`	object	Proxy configuration. Residential is the default and recommended setting for the most reliable results.
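Before starting a run, it can help to assemble the input locally with the documented defaults filled in. The sketch below mirrors the field names and defaults from the table above. The `build_input` helper and its rule that at least one publication or post URL is required are assumptions for illustration, not part of the actor.

```python
import json
from typing import Any, Optional, Sequence

# Defaults copied from the Input table.
DEFAULTS: dict[str, Any] = {
    "maxItemsPerSource": 50,
    "sort": "new",
    "includeContent": False,
    "includeComments": False,
    "maxCommentsPerPost": 100,
    "concurrency": 5,
}


def build_input(
    publications: Optional[Sequence[str]] = None,
    post_urls: Optional[Sequence[str]] = None,
    **overrides: Any,
) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults and sanity-check them."""
    # Assumption: a run needs at least one publication or post URL to do anything.
    if not publications and not post_urls:
        raise ValueError("provide at least one publication or post URL")
    unknown = set(overrides) - set(DEFAULTS) - {"proxyConfig"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    if run_input["sort"] not in ("new", "top"):
        raise ValueError("sort must be 'new' or 'top'")
    for key in ("maxItemsPerSource", "maxCommentsPerPost", "concurrency"):
        value = run_input[key]
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    if publications:
        run_input["publications"] = list(publications)
    if post_urls:
        run_input["postUrls"] = list(post_urls)
    return run_input


if __name__ == "__main__":
    example = build_input(["example.substack.com", "news.example.com"],
                          maxItemsPerSource=200, includeContent=True)
    print(json.dumps(example, indent=2))
```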

Example input

{
  "publications": ["example.substack.com", "news.example.com"],
  "maxItemsPerSource": 200,
  "sort": "new",
  "includeContent": true,
  "includeComments": false,
  "proxyConfig": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}

Output

Each post is one clean row. Columns are ordered by importance: identity first, then engagement metrics, then content, then media, then metadata.

{
  "type": "post",
  "id": 123456789,
  "url": "https://example.substack.com/p/an-example-post",
  "slug": "an-example-post",
  "authorName": "Jane Doe",
  "authorHandle": "janedoe",
  "likeCount": 240,
  "commentCount": 18,
  "restackCount": 12,
  "wordCount": 1200,
  "title": "An example post",
  "subtitle": "A short standfirst.",
  "description": "A one line summary.",
  "tags": ["example", "writing"],
  "content": "Full plain-text body when includeContent is on.",
  "coverImage": "https://example.com/cover.png",
  "podcastUrl": null,
  "audience": "everyone",
  "postType": "newsletter",
  "publishedAt": "2026-06-01T12:00:00.000Z"
}
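A quick way to use the engagement columns is to rank posts. The score here is hypothetical (likes plus comments plus restacks), with a per-1,000-words variant that guards against a zero or missing `wordCount`. Choose whatever weighting suits your analysis.

```python
from typing import Any, Optional


def rank_by_engagement(
    rows: list[dict[str, Any]], top_n: int = 10
) -> list[tuple[str, int, Optional[float]]]:
    """Return (title, score, score per 1,000 words) for the top posts.

    The score is a hypothetical metric: likeCount + commentCount + restackCount.
    """
    ranked = []
    for row in rows:
        if row.get("type") != "post":
            continue  # skip comment rows
        score = row.get("likeCount", 0) + row.get("commentCount", 0) + row.get("restackCount", 0)
        words = row.get("wordCount") or 0
        per_1k = score * 1000 / words if words else None  # avoid dividing by zero
        ranked.append((row.get("title", ""), score, per_1k))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

For the example post row above, the score is 240 + 18 + 12 = 270, or 225 per 1,000 words.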

Each comment is also one row:

{
  "type": "comment",
  "id": 987654321,
  "postId": 123456789,
  "authorName": "Reader Example",
  "authorHandle": "readerexample",
  "likeCount": 5,
  "restackCount": 0,
  "replyCount": 2,
  "body": "Great piece, thanks for writing this.",
  "createdAt": "2026-06-01T14:30:00.000Z",
  "editedAt": null
}
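Comment rows point back to their post through `postId`, so they can be regrouped under each post after export. A minimal sketch, assuming all rows from one run sit in a single list. The added `comments` key is our own naming. The attached list may be shorter than the post's `commentCount` when `maxCommentsPerPost` caps the fetch.

```python
from collections import defaultdict
from typing import Any


def attach_comments(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest comment rows under their post rows by matching postId to id."""
    comments_by_post = defaultdict(list)
    posts = []
    for row in rows:
        if row.get("type") == "comment":
            comments_by_post[row["postId"]].append(row)
        elif row.get("type") == "post":
            posts.append(row)
    # .get() on a defaultdict does not create entries, so posts without comments get [].
    return [{**post, "comments": comments_by_post.get(post["id"], [])} for post in posts]
```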

Key fields: `type` distinguishes post rows from comment rows. `likeCount`, `commentCount`, and `restackCount` are the engagement metrics. `audience` is `everyone` for free posts or `only_paid` for paywalled ones. `publishedAt` is an ISO 8601 timestamp in UTC.
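Because `publishedAt` ends in `Z`, it parses into a timezone-aware datetime. The sketch below uses a hypothetical `summarize` helper to count posts published since a cutoff and split them by the two documented `audience` values. The `Z` is replaced with `+00:00` first because `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11 onward.

```python
from datetime import datetime, timezone
from typing import Any


def parse_iso(ts: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalise it first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize(rows: list[dict[str, Any]], since: datetime) -> dict[str, int]:
    """Count posts published at or after `since` (timezone-aware), split by audience."""
    recent = [
        r for r in rows
        if r.get("type") == "post" and parse_iso(r["publishedAt"]) >= since
    ]
    return {
        "recent": len(recent),
        "free": sum(1 for r in recent if r.get("audience") == "everyone"),
        "paid": sum(1 for r in recent if r.get("audience") == "only_paid"),
    }


if __name__ == "__main__":
    cutoff = datetime(2026, 1, 1, tzinfo=timezone.utc)
    sample = [{"type": "post", "publishedAt": "2026-06-01T12:00:00.000Z", "audience": "everyone"}]
    print(summarize(sample, cutoff))
```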

FAQ

Is it free? How is it priced? You pay only for what you scrape, billed per result. There is no separate start fee, so small test runs cost almost nothing and large pulls scale predictably.
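As a worked example of per-result billing at the listed starting rate of $0.50 per 1,000 results, the sketch below estimates an upper bound on cost. It assumes every dataset row, post or comment, counts as one result and that every cap is fully reached. The rate on your plan may differ from the listed starting price.

```python
from decimal import Decimal

# Listed starting price; the rate on your plan may differ.
PRICE_PER_1000 = Decimal("0.50")


def estimate_cost(publications: int, posts_per_pub: int, comments_per_post: int = 0) -> tuple[int, Decimal]:
    """Upper-bound (rows, dollars) if every cap is hit.

    Assumption: each dataset row (post or comment) is billed as one result.
    """
    posts = publications * posts_per_pub
    rows = posts + posts * comments_per_post
    cost = (PRICE_PER_1000 * rows / 1000).quantize(Decimal("0.01"))
    return rows, cost
```

Two publications capped at 200 posts each give `estimate_cost(2, 200) == (400, Decimal("0.20"))`. Turning on comments with the default cap of 100 per post raises the ceiling to 40,400 rows, about $20.20.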

Do I need a Substack login or API key? No. It reads only public data, so no account, password, or key is needed.

How many posts can it return per publication? As many as the publication has published. Set maxItemsPerSource to your target and the scraper pages through the archive until it reaches that number or runs out of posts.
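The paging behaviour described above can be modelled as a standard offset loop: request a page, keep what fits under the cap, and stop at the cap or at the first short page. In this sketch, `fetch_page(offset, limit)` is a hypothetical stand-in for whatever request the actor actually makes, and the page size of 12 is an arbitrary assumption.

```python
from typing import Callable

Row = dict
FetchPage = Callable[[int, int], list[Row]]


def collect(fetch_page: FetchPage, max_items: int, page_size: int = 12) -> list[Row]:
    """Offset pagination: stop at max_items or when a page comes back short."""
    items: list[Row] = []
    offset = 0
    while len(items) < max_items:
        limit = min(page_size, max_items - len(items))
        page = fetch_page(offset, limit)[:limit]  # never keep more than we asked for
        items.extend(page)
        if len(page) < limit:
            break  # a short or empty page means the archive has run out
        offset += len(page)
    return items
```

With a 30-post archive, a cap of 50 returns all 30 posts, while a cap of 20 stops after the first 20.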

Can it scrape paywalled (subscriber-only) post content? No. Subscriber-only posts return their public fields (title, author, metrics, description) but not the locked body text, because no login is used. Free posts return their full body when includeContent is on.

Does it work with custom domains? Yes. Enter the publication as a username, a *.substack.com address, or a custom domain. All three are accepted.
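One plausible way to normalise the three accepted forms to a base URL is shown below. The rules are assumptions for illustration, not the actor's documented parsing: a value without a dot is a username, a value with a dot is a host, and a full URL keeps only its host.

```python
from urllib.parse import urlparse


def publication_base_url(value: str) -> str:
    """Hypothetical normalisation of a username, *.substack.com address, or custom domain."""
    value = value.strip()
    if "://" in value:
        host = urlparse(value).netloc  # full URL: keep only the host
    elif "." in value:
        host = value.split("/")[0]  # bare address or custom domain, drop any path
    else:
        host = f"{value}.substack.com"  # bare username
    return f"https://{host.lower()}"
```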

How fast is it? Multiple publications are processed in parallel (set by concurrency), and each archive is paged efficiently. A few hundred posts typically complete in well under a minute.
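Processing a fixed number of publications at a time is commonly done with a semaphore. This asyncio sketch caps in-flight work at `concurrency`, mirroring the input field of that name. `scrape_one` is a hypothetical coroutine standing in for the per-publication work.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def scrape_all(
    publications: Iterable[str],
    scrape_one: Callable[[str], Awaitable[Any]],
    concurrency: int = 5,
) -> list[Any]:
    """Run scrape_one for every publication, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(publication: str) -> Any:
        async with semaphore:  # waits here while `concurrency` tasks are already running
            return await scrape_one(publication)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(p) for p in publications))
```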

Notes

Counts (likes, comments, restacks) reflect their values at scrape time.
Only public posts and comments are returned.
Output can be exported as JSON, CSV, Excel, or HTML, or pulled via the Apify API.
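Results can be pulled through the Apify API with a single synchronous request. The sketch below builds that request using only the standard library. The `run-sync-get-dataset-items` endpoint and the `username~actor-name` path form come from Apify's API, but the actor ID in the commented usage is a guess, so copy the real one from the Store page. Synchronous runs have a time limit, so very large pulls are better started as a regular run and read from the dataset afterwards.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its dataset items."""
    # The API path accepts "username~actor-name" in place of "username/actor-name".
    path_id = quote(actor_id.replace("/", "~"), safe="~")
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),  # the actor input goes in the JSON body
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


# To actually run it (network call; the actor ID here is a hypothetical guess):
# req = build_run_request("goat255/substack-scraper", "<APIFY_TOKEN>", {"publications": ["example"]})
# with urllib.request.urlopen(req, timeout=300) as resp:
#     rows = json.load(resp)
```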

Found this useful? Leave a quick review. It takes a few seconds and it genuinely helps other people find the actor.

Similar actors

Substack Posts Scraper — Newsletter Archive & Stats

darknezz/substack-posts-scraper

Scrape any Substack publication's post archive: titles, subtitles, publish dates, likes, comments, paywall status and (optionally) full post text. Works with custom domains. Perfect for newsletter research, content analysis and AI training data.

Oaida Adrian

Substack Newsletter Scraper

dataharvest/substack-scraper

Scrape Substack newsletters, posts and comments.

Alex v

Substack Scraper – Newsletter Posts, Engagement & Monitoring

bitofacoder/substack-scraper

Scrape any Substack newsletter's full post archive with engagement metadata (likes, comments, paywall status, word count, authors), fetch single posts, and monitor newsletters incrementally — via Substack's public JSON API. No login.

Bobby

Substack Scraper: Newsletter Posts, Archives & Subscribers

perconey/substack-scraper

Scrape any Substack publication: full post archive, single post detail with body, comment counts, reactions, paid/free audience, podcast metadata. No auth, no proxies, no cookies. Uses Substack official JSON API. Pay only per result.

Perconey

YouTube Comments Scraper

goat255/youtube-comments-scraper

Scrape comments from any public YouTube video without a login. Pass video URLs or IDs and get clean structured comment rows with author, like and reply counts, and text. Walks pagination up to your chosen limit.

Goutam Soni

Rating: 5.0

Lemmy Scraper

goat255/lemmy-scraper

Scrape Lemmy communities, posts, comments, and search results from any instance without a login. Pull a community's posts, a single post with its comment thread, or keyword search results. Walks pagination up to your chosen limit.

Goutam Soni

Substack Scraper — Posts, Comments & Newsletter Intelligence

cryptosignals/substack-scraper

Turn the newsletter economy into a dataset. Scrape any Substack publication — posts with full text, likes, comments, paywall status, reader comments, and metadata — or search by keyword across all of Substack. No login. For competitive intel, sponsor research, and media monitoring. $0.005/result.

Web Data Labs

Substack Posts Scraper 📚

scrapers-hub/substack-posts-scraper

Substack Posts scraper extracts publicly available newsletter posts, titles, authors, publication dates, tags, post URLs, and metadata 📰📊 Perfect for content research, trend analysis, competitive intelligence, and newsletter monitoring.

Scrapers Hub

Substack Newsletter Content Scraper

scraper_guru/substack-scraper

Scrape Substack newsletter posts, authors, dates, likes, comments, restacks, and article text. Built for content research, competitor tracking, and AI-ready datasets.

LIAICHI MUSTAPHA

Rating: 2.6

Substack Publication Scraper

parseforge/substack-publication-scraper

Pull every public post from any Substack publication with title, subtitle, body preview, author, publish date, podcast URL, audience type, comment count, and reactions. Filter by post type and date range. Export to JSON, CSV, or Excel for newsletter research and competitive intelligence.