Substack Scraper

Pricing: Pay per usage

Scrape Substack posts and comments without a login. Pull a publication's archive of posts by newest or top, a single post by URL, and full comment trees. Walks pagination up to your chosen limit.

Rating: 5.0 (1 rating)

Developer: Goutam Soni (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 5 days ago

Scrape Substack posts and comments from any publication, newsletter, or post URL with no login and no API key required. Extract titles, authors, like and comment counts, restacks, word counts, tags, cover images, full post text, and complete comment threads as clean structured data.

What it does

  • Publication archive scraper. Give it a Substack username, address, or custom domain and it pulls the publication's posts, newest first or by top, paging automatically up to your chosen limit.
  • Single post scraper. Pass any post URL and get that post back with its full plain-text body.
  • Comment scraper. Optionally fetch the full comment tree for each post, flattened into clean rows with author, body, likes, and reply counts.
  • Rich metrics on every post. Like count, comment count, restacks, and word count, all in one row.
  • No login, no password, no API key. Just provide publications or post links.
  • Custom domains supported. Works with both *.substack.com addresses and publications on their own domain.
  • Tunable scale. Set how many posts per publication, how many comments per post, and how many publications to process in parallel.

Use cases

  • Newsletter market research. Track what topics, formats, and headlines are getting the most likes, comments, and restacks across the newsletters in your niche.
  • Lead generation. Build lists of active writers and publications, with author names and post engagement, for outreach or partnerships.
  • Content monitoring. Watch a set of publications and capture every new post with its metrics on a schedule.
  • Competitive and trend analysis. Compare engagement and posting cadence across publications you follow.
  • Audience and sentiment research. Pull comment threads to understand what readers actually say under top posts.
  • Dataset building. Export thousands of posts to CSV, JSON, or Excel for analysis, dashboards, or model training.
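For the market-research and trend-analysis use cases, post rows can be ranked directly on their engagement columns. The sketch below assumes rows shaped like the post example in the Output section. The score (an unweighted sum of likeCount, commentCount, and restackCount) and the helper names `engagement` and `top_posts` are illustrative choices, not something the scraper computes.

```python
from typing import Any, Iterable


def engagement(row: dict[str, Any]) -> int:
    """Illustrative score: unweighted sum of the three engagement counts."""
    return sum(int(row.get(k) or 0) for k in ("likeCount", "commentCount", "restackCount"))


def top_posts(rows: Iterable[dict[str, Any]], n: int = 10) -> list[dict[str, Any]]:
    """Return the n post rows with the highest score, highest first (comment rows are skipped)."""
    posts = [r for r in rows if r.get("type") == "post"]
    return sorted(posts, key=engagement, reverse=True)[:n]


rows = [
    {"type": "post", "title": "A", "likeCount": 240, "commentCount": 18, "restackCount": 12},
    {"type": "post", "title": "B", "likeCount": 90, "commentCount": 40, "restackCount": 3},
    {"type": "comment", "body": "Great piece", "likeCount": 500},
]
for post in top_posts(rows, n=2):
    print(post["title"], engagement(post))  # prints "A 270", then "B 133"
```

A weighted score (for example, counting restacks more heavily than likes) is a one-line change to `engagement`.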

Input

  • publications (array): Publications to pull posts from. Use a username (example), a full address (example.substack.com), or a custom domain (news.example.com).
  • postUrls (array): Specific post links to fetch. The full post body is always included for these.
  • maxItemsPerSource (integer): Cap on posts returned per publication. Pagination is walked across pages until this is reached or the archive runs out. Default 50.
  • sort (string): Order of posts in a publication archive, either new (newest first) or top. Default new.
  • includeContent (boolean): When on, each post carries a plain-text body. Default off.
  • includeComments (boolean): When on, the comment tree is also fetched for each post. Default off (keeps runs fast and cheap).
  • maxCommentsPerPost (integer): Cap on comments returned per post when comments are included. Default 100.
  • concurrency (integer): How many publications to process in parallel. Default 5.
  • proxyConfig (object): Proxy configuration. Residential is the default and recommended setting for the most reliable results.
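If you assemble run inputs in code, it can help to mirror the documented defaults and catch typos before starting a run. The sketch below is a hypothetical helper (`build_input` is not part of the actor or of any Apify library). It assumes the actor applies the same defaults listed above, so sending them explicitly should not change behavior, and it treats a run with neither publications nor post URLs as an error, which is also an assumption.

```python
from typing import Any, Optional

# Defaults copied from the Input list above; the actor is assumed to apply the same ones.
DEFAULTS: dict[str, Any] = {
    "maxItemsPerSource": 50,
    "sort": "new",
    "includeContent": False,
    "includeComments": False,
    "maxCommentsPerPost": 100,
    "concurrency": 5,
}
ALLOWED_SORTS = ("new", "top")
COUNT_FIELDS = ("maxItemsPerSource", "maxCommentsPerPost", "concurrency")


def build_input(publications: Optional[list[str]] = None,
                post_urls: Optional[list[str]] = None,
                **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULTS) - {"proxyConfig"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input: dict[str, Any] = {**DEFAULTS, **overrides}
    run_input["publications"] = list(publications or [])
    run_input["postUrls"] = list(post_urls or [])
    # Assumption: a run with neither publications nor post URLs is treated as an error here.
    if not run_input["publications"] and not run_input["postUrls"]:
        raise ValueError("provide at least one publication or post URL")
    if run_input["sort"] not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {ALLOWED_SORTS}")
    for key in COUNT_FIELDS:
        value = run_input[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    return run_input


run_input = build_input(["example.substack.com", "news.example.com"],
                        maxItemsPerSource=200, includeContent=True)
```

The resulting dict can be serialized with `json.dumps` and pasted into the input editor, or sent as the run input through the Apify API.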

Example input

{
  "publications": ["example.substack.com", "news.example.com"],
  "maxItemsPerSource": 200,
  "sort": "new",
  "includeContent": true,
  "includeComments": false,
  "proxyConfig": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}

Output

Each post is one clean row. Columns are ordered by importance: identity first, then engagement metrics, then content, then media, then metadata.

{
  "type": "post",
  "id": 123456789,
  "url": "https://example.substack.com/p/an-example-post",
  "slug": "an-example-post",
  "authorName": "Jane Doe",
  "authorHandle": "janedoe",
  "likeCount": 240,
  "commentCount": 18,
  "restackCount": 12,
  "wordCount": 1200,
  "title": "An example post",
  "subtitle": "A short standfirst.",
  "description": "A one line summary.",
  "tags": ["example", "writing"],
  "content": "Full plain-text body when includeContent is on.",
  "coverImage": "https://example.com/cover.png",
  "podcastUrl": null,
  "audience": "everyone",
  "postType": "newsletter",
  "publishedAt": "2026-06-01T12:00:00.000Z"
}

Each comment is also one row:

{
  "type": "comment",
  "id": 987654321,
  "postId": 123456789,
  "authorName": "Reader Example",
  "authorHandle": "readerexample",
  "likeCount": 5,
  "restackCount": 0,
  "replyCount": 2,
  "body": "Great piece, thanks for writing this.",
  "createdAt": "2026-06-01T14:30:00.000Z",
  "editedAt": null
}

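Key fields: type distinguishes post rows from comment rows. postId on a comment row matches the id of the post it belongs to. likeCount, commentCount, and restackCount are the engagement metrics. audience is everyone for free posts or only_paid for paywalled ones. publishedAt (on posts) and createdAt (on comments) are ISO 8601 timestamps.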
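Because posts and comments land in the same dataset, a typical first step is to split rows on type and attach each comment to its post through postId. A minimal sketch, assuming the dataset has already been downloaded as a list of dicts (for example from the JSON export); `group_comments` is a hypothetical helper name.

```python
from typing import Any


def group_comments(rows: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Map each post id to {"post": post_row, "comments": [comment rows]}."""
    grouped: dict[Any, dict[str, Any]] = {}
    # First pass: index post rows by their id.
    for row in rows:
        if row.get("type") == "post":
            grouped[row["id"]] = {"post": row, "comments": []}
    # Second pass: attach comments via postId; unmatched comments get a None post.
    for row in rows:
        if row.get("type") == "comment":
            entry = grouped.setdefault(row["postId"], {"post": None, "comments": []})
            entry["comments"].append(row)
    return grouped


rows = [
    {"type": "post", "id": 123456789, "title": "An example post"},
    {"type": "comment", "id": 1, "postId": 123456789, "body": "Great piece."},
    {"type": "comment", "id": 2, "postId": 123456789, "body": "Agreed."},
    {"type": "post", "id": 555, "title": "A quiet post"},
]
grouped = group_comments(rows)
```

Two passes keep the result correct regardless of whether comment rows appear before or after their post in the dataset.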

FAQ

Is it free? How is it priced? You pay only for what you scrape, billed per result. There is no separate start fee, so small test runs cost almost nothing and large pulls scale predictably.
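Since billing is per result, the worst-case size of a run follows from the input caps. The back-of-envelope sketch below assumes every dataset row (post or comment) counts as one billed result; the per-result price is not stated here, so it is left out. `max_rows` is a hypothetical helper and ignores postUrls.

```python
def max_rows(num_publications: int, max_items_per_source: int,
             include_comments: bool, max_comments_per_post: int) -> int:
    """Upper bound on dataset rows from publication archives (postUrls not counted)."""
    posts = num_publications * max_items_per_source
    comments = posts * max_comments_per_post if include_comments else 0
    return posts + comments


print(max_rows(2, 200, False, 100))  # 400: 2 publications x 200 posts
print(max_rows(2, 200, True, 100))   # 40400: 400 posts + 400 x 100 comments
```

Real runs usually come in well below the bound, because archives run out and most posts have fewer comments than maxCommentsPerPost. The comment term dominates, which is why includeComments is off by default.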

Do I need a Substack login or API key? No. It reads only public data, so no account, password, or key is needed.

How many posts can it return per publication? As many as the publication has published. Set maxItemsPerSource to your target and the scraper pages through the archive until it reaches that number or runs out of posts.
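The paging behavior described here (keep requesting pages until the cap is reached or a page comes back short or empty) can be sketched generically. `fetch_page(offset, limit)` is a hypothetical stand-in for whatever request the scraper makes internally, and the page size of 20 is an assumption; this is an illustration of the pattern, not the actor's code.

```python
from typing import Any, Callable

Page = list[dict[str, Any]]


def collect_posts(fetch_page: Callable[[int, int], Page], max_items: int,
                  page_size: int = 20) -> Page:
    """Walk offset-based pages until max_items is reached or the archive runs out."""
    collected: Page = []
    offset = 0
    while len(collected) < max_items:
        limit = min(page_size, max_items - len(collected))
        page = fetch_page(offset, limit)
        collected.extend(page)
        if len(page) < limit:  # short or empty page: archive exhausted
            break
        offset += len(page)
    return collected[:max_items]


# Fake archive with 45 posts, standing in for a real publication.
archive = [{"id": i} for i in range(45)]


def fake_fetch(offset: int, limit: int) -> Page:
    return archive[offset:offset + limit]


print(len(collect_posts(fake_fetch, 200)))  # 45: archive runs out first
print(len(collect_posts(fake_fetch, 30)))   # 30: cap reached first
```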

Can it scrape paywalled (subscriber-only) post content? No. Subscriber-only posts return their public fields (title, author, metrics, description) but not the locked body text, because no login is used. Free posts return their full body when includeContent is on.

Does it work with custom domains? Yes. Enter the publication as a username, a *.substack.com address, or a custom domain. All three are accepted.
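Mapping the three accepted forms onto a base URL is straightforward. The rule below (a value with no dot is a substack.com username; anything with a dot is used as the host) is an assumption for illustration, not the actor's documented logic, and `publication_base_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def publication_base_url(value: str) -> str:
    """Hypothetical normalizer: username, *.substack.com address, or custom domain to base URL."""
    value = value.strip()
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).hostname or ""  # hostname is lowercased, path is dropped
    if not host:
        raise ValueError("empty publication")
    if "." not in host:  # assumption: a bare name is a Substack username
        host = f"{host}.substack.com"
    return f"https://{host}"


for item in ["example", "example.substack.com", "news.example.com"]:
    print(publication_base_url(item))
```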

How fast is it? Multiple publications are processed in parallel (set by concurrency), and each archive is paged efficiently. A few hundred posts typically complete in well under a minute.
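The concurrency setting caps how many publications are in flight at once. A common way to express such a cap is an `asyncio.Semaphore`; the sketch below uses a fake `scrape_publication` coroutine (hypothetical, it only sleeps) to show the pattern and to record the peak number of simultaneous tasks. It is not the actor's code.

```python
import asyncio


async def scrape_all(publications: list[str], concurrency: int = 5) -> tuple[list[str], int]:
    """Process publications with at most `concurrency` running at once; return results and peak."""
    semaphore = asyncio.Semaphore(concurrency)
    running = 0
    peak = 0

    async def scrape_publication(name: str) -> str:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real network work
            running -= 1
            return f"done:{name}"

    # gather returns results in the same order as the inputs.
    results = await asyncio.gather(*(scrape_publication(p) for p in publications))
    return list(results), peak


results, peak = asyncio.run(scrape_all([f"pub{i}" for i in range(12)], concurrency=5))
print(len(results), peak)  # 12 5
```

Raising concurrency shortens wall-clock time for many publications but sends more requests at once, which is one reason a proxy setting matters.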

Notes

  • Counts (likes, comments, restacks) reflect their values at scrape time.
  • Only public posts and comments are returned.
  • Output can be exported as JSON, CSV, Excel, or HTML, or pulled via the Apify API.
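Because counts reflect their values at scrape time, running the scraper on a schedule and comparing two runs shows how engagement changed per post. A sketch assuming two downloaded datasets whose post rows share stable ids across runs; `diff_snapshots` is a hypothetical helper.

```python
from typing import Any

METRICS = ("likeCount", "commentCount", "restackCount")


def diff_snapshots(old_rows: list[dict[str, Any]], new_rows: list[dict[str, Any]]):
    """Return ({post_id: {metric: change}}, [ids of posts not in the old run])."""
    old = {r["id"]: r for r in old_rows if r.get("type") == "post"}
    changes: dict[Any, dict[str, int]] = {}
    new_ids: list[Any] = []
    for row in new_rows:
        if row.get("type") != "post":
            continue
        prev = old.get(row["id"])
        if prev is None:
            new_ids.append(row["id"])  # post published since the previous run
            continue
        changes[row["id"]] = {m: (row.get(m) or 0) - (prev.get(m) or 0) for m in METRICS}
    return changes, new_ids


yesterday = [{"type": "post", "id": 1, "likeCount": 240, "commentCount": 18, "restackCount": 12}]
today = [
    {"type": "post", "id": 1, "likeCount": 260, "commentCount": 20, "restackCount": 12},
    {"type": "post", "id": 2, "likeCount": 5, "commentCount": 0, "restackCount": 0},
]
changes, new_ids = diff_snapshots(yesterday, today)
```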