Beehiiv Newsletter Scraper
Pricing
Pay per event
Beehiiv Newsletter Scraper
Scrape posts from any beehiiv-powered newsletter. Input publication domains — the actor discovers post URLs via sitemap and extracts title, author, publish date, excerpt, cover image, tags, and word count. Supports multi-newsletter fan-out in a single run.
Pricing
Pay per event
Rating
0.0
(0)
Developer
BowTiedRaccoon
Maintained by CommunityActor stats
0
Bookmarked
2
Total users
1
Monthly active users
4 days ago
Last modified
Categories
Share
Scrape posts from any beehiiv-powered newsletter. Input a list of publication domains or subdomains — the actor discovers post URLs via sitemap and extracts title, author, publish date, excerpt, cover image, tags, and word count. Supports multi-newsletter fan-out in a single run.
What it does
The actor accepts a list of beehiiv publication domains (e.g. readthepeak.com, discover.beehiiv.com) and for each domain:
- Fetches
<domain>/sitemap.xmlto discover all public post URLs matching the/p/<slug>pattern. - Crawls each post page and extracts structured data from the embedded JSON-LD Article schema.
- Yields one record per post with all metadata fields.
Publications that sit behind Cloudflare or other anti-bot measures are gracefully skipped with a warning. Free posts are scraped; paywalled posts (where isAccessibleForFree: false in JSON-LD) are automatically skipped.
Input
| Parameter | Type | Description |
|---|---|---|
domains | array | List of publication domains. Accepts bare domains (readthepeak.com), subdomains (mybrand.beehiiv.com), or full URLs (https://readthepeak.com). |
maxItems | integer | Maximum posts to scrape per publication (0 = unlimited). Default: 10. |
Example input:
{"domains": ["readthepeak.com", "discover.beehiiv.com"],"maxItems": 50}
Output
Each record contains:
| Field | Description |
|---|---|
publication_domain | Input domain (e.g. readthepeak.com) |
publication_name | Newsletter name from JSON-LD publisher |
post_url | Canonical post URL |
post_title | Post headline |
post_subtitle | Post subtitle / description |
author | Author name |
publish_date | ISO 8601 publish timestamp |
excerpt | Short description (up to 300 chars) |
cover_image_url | Cover image URL |
word_count | Estimated word count of post body |
tags | Comma-separated tags |
full_text | Full post body text (empty unless include_full_text is set) |
scraped_at | ISO 8601 scrape timestamp |
Example output record:
{"publication_domain": "readthepeak.com","publication_name": "The Peak","post_url": "https://www.readthepeak.com/p/canadian-universities-are-falling-behind","post_title": "Canadian universities are falling behind","post_subtitle": "Canada's post-secondary schools are losing their edge.","author": "Lucas Arender","publish_date": "2026-06-02T10:00:00.000Z","excerpt": "Canada's post-secondary schools are losing their edge.","cover_image_url": "https://beehiiv-images-production.s3.amazonaws.com/...","word_count": 291,"tags": "Water Cooler, Perspectives","full_text": "","scraped_at": "2026-06-02T20:39:48.116Z"}
Limitations
- Publications behind Cloudflare or PerimeterX (e.g. some high-traffic custom domains) will return a warning and be skipped. Use a different domain format if the publication has a
*.beehiiv.comsubdomain that is not CF-walled. - Paywalled posts (subscriber-only) are detected via JSON-LD and automatically skipped.
- Publications without a
sitemap.xmlor with no/p/posts in their sitemap are skipped. full_textextraction is best-effort — post body selectors may vary slightly across beehiiv themes.