Ghost Newsletter Scraper

Developed by BarriereFix (Maintained by Community)

Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.


Perfect For

  • Content Strategists - Monitor competitor newsletters and track content trends
  • Market Researchers - Analyze the creator economy and identify publishing patterns
  • Business Development Teams - Find sponsorship opportunities and track pricing changes
  • Newsletter Creators - Research successful strategies and analyze posting frequency
  • Agencies - Monitor multiple client newsletters and generate performance reports

Quick Start

The simplest way to get started - just add a newsletter URL:

{
  "startUrls": [
    { "url": "https://blog.ghost.org/" }
  ]
}

That's it! The scraper will automatically:

  • ✅ Detect that it's a Ghost site
  • ✅ Find all posts via RSS feed (fastest method)
  • ✅ Extract titles, authors, publish dates, and content metadata
  • ✅ Get site info like posting frequency and pricing
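
If you would rather start runs from code, here is a minimal sketch using the apify-client Python package. The actor ID below is a placeholder; copy the real one from the actor's page in Apify Console.

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Same input as the Quick Start example above.
run_input = {"startUrls": [{"url": "https://blog.ghost.org/"}]}

# Placeholder actor ID; replace with the ID shown in Apify Console.
run = client.actor("barrierefix/ghost-newsletter-scraper").call(run_input=run_input)

# Stream the scraped records from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["type"], item.get("title") or item.get("name"))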

What You'll Get

Newsletter Sites

Information about each newsletter:

  • Title and description
  • Publishing frequency (posts per 30 days)
  • Subscription pricing (if available)
  • Social media links
  • Last post date

Articles & Posts

Full metadata for each article:

  • Title, URL, and excerpt
  • Author(s) with profile links
  • Tags and categories
  • Publish and update dates
  • Word count and reading time estimate
  • OpenGraph and Twitter Card data

Writers & Contributors

Author profiles including:

  • Name and bio
  • Profile page URL
  • Social media links (Twitter, LinkedIn, GitHub, website)

Common Use Cases

1. Competitive Intelligence

Scenario: Track what your competitors are publishing

{
  "startUrls": [
    { "url": "https://competitor1.com" },
    { "url": "https://competitor2.com" }
  ],
  "lookbackDays": 7,
  "outputLevel": "posts"
}

Get only new posts from the last week. Combine with n8n to get Slack alerts when competitors publish.

2. Content Research

Scenario: Analyze successful newsletters in your niche

{
  "startUrls": [
    { "url": "https://popular-newsletter.com" }
  ],
  "limitPerSite": 100,
  "emitAuthors": true,
  "fetchPricingSignals": true
}

Get the last 100 posts, author info, and pricing to understand their strategy.

3. Sponsorship Prospecting

Scenario: Find newsletters that accept sponsors

{
  "domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],
  "fetchPricingSignals": true,
  "outputLevel": "publication",
  "limitPerSite": 1
}

Quickly scan multiple newsletters for pricing pages and subscription info.

4. Publishing Pattern Analysis

Scenario: Track how often newsletters in a category post

{
  "startUrls": [
    { "url": "https://tech-newsletter.com" }
  ],
  "lookbackDays": 90,
  "deltaCrawl": false
}

Get 3 months of data to analyze posting frequency and consistency.
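
Consistency is easy to quantify from the published_at timestamps in the output. A small plain-Python sketch (the record shape matches View 3 below):

from datetime import datetime, timedelta, timezone

def post_velocity(posts, window_days=30):
    """Count posts published in the last `window_days` days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
    count = 0
    for post in posts:
        # The "Z" suffix needs normalizing for fromisoformat on Python < 3.11.
        published = datetime.fromisoformat(post["published_at"].replace("Z", "+00:00"))
        if published >= cutoff:
            count += 1
    return count

posts = [{"published_at": "2025-09-15T08:00:00Z"}]  # sample record
print(post_velocity(posts, window_days=90))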

Input Options

Start URLs (Required)

Choose one method to specify which newsletters to scrape:

  • Start URLs: Full URLs like https://blog.ghost.org/
  • Domains: Just the domain like blog.ghost.org (https:// added automatically)

What to Extract

  • What to extract - Choose "Everything" (posts + site info), "Posts only", or "Site info only"
  • Extract author profiles - Get author bios and social links (recommended: ON)
  • Extract pricing - Find subscription prices from /subscribe pages (recommended: ON)
  • Extract tags - Get article tags/categories (experimental, slower)

Crawl Settings

  • How to find posts - RSS first (fastest), Sitemap first (most complete), or Hybrid (recommended: RSS first)
  • Max posts per newsletter - Stop after this many posts (recommended: 200)
  • Only posts from last X days - Filter by date; 0 = all posts (recommended: 0 or 30)
  • Skip already-seen posts - Delta crawling saves costs (recommended: ON)

Filters (Optional)

Use regex patterns to include/exclude specific URLs:

  • Include: [".*/tag/tech/.*"] - Only posts tagged "tech"
  • Exclude: [".*/author/.*"] - Skip author pages
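
Both lists are ordinary regular expressions matched against each discovered URL. If you want to sanity-check a pattern before a run, the filtering behaves roughly like this sketch (an illustration of the behavior, not the actor's internal code):

import re

include = [r".*/tag/tech/.*"]  # keep only URLs matching at least one include pattern
exclude = [r".*/author/.*"]    # drop any URL matching an exclude pattern

def keep(url):
    if include and not any(re.match(p, url) for p in include):
        return False
    return not any(re.match(p, url) for p in exclude)

print(keep("https://blog.ghost.org/tag/tech/new-release/"))  # True
print(keep("https://blog.ghost.org/author/jane/"))           # False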

Output Format

All data is saved to a dataset with multiple views for easy filtering:

View 1: All Records

All record types (publications, posts, and authors) in a single unfiltered list

View 2: Newsletter Sites

{
  "type": "publication",
  "domain": "blog.ghost.org",
  "title": "Ghost Blog",
  "post_velocity_30d": 12,
  "last_post_at": "2025-09-28T10:00:00Z",
  "pricing": {
    "has_subscribe": true,
    "plan_cards": [...]
  }
}

View 3: Articles & Posts

{
  "type": "post",
  "domain": "blog.ghost.org",
  "title": "How to Build a Newsletter",
  "url": "https://blog.ghost.org/how-to-build/",
  "authors": [{"name": "Jane Doe"}],
  "tags": ["guides", "tutorials"],
  "published_at": "2025-09-15T08:00:00Z",
  "word_count_est": 1240,
  "reading_time_min_est": 6
}

View 4: Writers & Contributors

{
  "type": "author",
  "name": "Jane Doe",
  "profile_url": "https://blog.ghost.org/author/jane/",
  "bio": "Writer and creator",
  "social": {
    "twitter": "https://twitter.com/jane",
    "website": "https://janedoe.com"
  }
}
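
Since every record carries a type field, the same split is easy to reproduce client-side. A minimal sketch with the apify-client Python package (the dataset ID is a placeholder):

from collections import defaultdict
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Bucket records by their "type" field: publication, post, or author.
by_type = defaultdict(list)
for item in client.dataset("<DATASET_ID>").iterate_items():
    by_type[item["type"]].append(item)

print(len(by_type["publication"]), "sites")
print(len(by_type["post"]), "posts")
print(len(by_type["author"]), "authors")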

Integrations

n8n Workflows

New Post Alert

  1. Trigger: Apify Dataset Item webhook (filter: type=post)
  2. OpenAI: Summarize the post
  3. Slack: Post to #content-monitoring
  4. Notion: Add to content calendar

Pricing Change Alert

  1. Trigger: Schedule (daily)
  2. Get latest dataset (filter: type=publication)
  3. Compare pricing with previous run
  4. Email: Alert team if prices changed

Weekly Digest

  1. Trigger: Schedule (Monday 9 AM)
  2. Get posts from last 7 days
  3. Group by newsletter
  4. Email: Send digest to team
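
Steps 2 and 3 of the digest reduce to a date filter and a group-by. A plain-Python sketch, using the field names from View 3 above:

from collections import defaultdict
from datetime import datetime, timedelta, timezone

def weekly_digest(posts):
    """Group posts from the last 7 days by newsletter domain."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    digest = defaultdict(list)
    for post in posts:
        published = datetime.fromisoformat(post["published_at"].replace("Z", "+00:00"))
        if published >= cutoff:
            digest[post["domain"]].append(post["title"])
    return digest

# `posts` would come from the dataset; one sample record is shown here.
posts = [{"domain": "blog.ghost.org", "title": "How to Build a Newsletter",
          "published_at": "2025-09-15T08:00:00Z"}]
for domain, titles in weekly_digest(posts).items():
    print(f"{domain}: {len(titles)} new post(s)")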

Make.com / Zapier

The actor works with any automation tool that supports webhooks or API calls. Use Apify's integration to trigger workflows when new data is found.

How It Works

  1. Detects Ghost - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
  2. Finds Posts - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
  3. Extracts Data - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
  4. Saves Results - Stores everything in a structured dataset with easy-to-use views
  5. Tracks Changes - Delta crawling means you only pay for new content (saves costs!)
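
Step 1 can be reproduced by hand if you are curious whether a site will pass detection. A simplified sketch of that kind of multi-signal check using requests (the actor's real detector is more thorough):

import requests

def looks_like_ghost(base_url):
    """Heuristic Ghost check: generator meta tag, Portal script, and RSS feed."""
    html = requests.get(base_url, timeout=10).text
    signals = 0
    if 'name="generator" content="Ghost' in html:  # Ghost's default generator meta tag
        signals += 1
    if "portal.min.js" in html:  # Ghost Portal membership script
        signals += 1
    rss = requests.get(base_url.rstrip("/") + "/rss/", timeout=10)
    if rss.ok and rss.text.lstrip().startswith("<?xml"):  # Ghost serves RSS at /rss/
        signals += 1
    return signals >= 2  # require two independent signals to limit false positives

print(looks_like_ghost("https://blog.ghost.org/"))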

Pricing & Performance

Pricing Model: Pay per item extracted (posts, authors, sites)

Typical Costs:

  • Small newsletter (20 posts) = ~25 items
  • 10 newsletters monitored daily = ~200 items/day
  • Delta crawling reduces repeat costs by 80%+

Speed:

  • RSS mode: ~5-10 seconds per newsletter
  • Sitemap mode: ~10-20 seconds per newsletter
  • 5-10x faster than headless-browser scrapers
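
On why delta crawling cuts repeat costs: the version history describes it as hash-based deduplication. Reduced to a sketch, the idea is to fingerprint each post and skip fingerprints already seen in earlier runs (a local JSON file stands in for the actor's persisted state, and updated_at is an assumed field name):

import hashlib
import json
from pathlib import Path

SEEN_FILE = Path("seen_hashes.json")

def fingerprint(post):
    """Hash the fields that change when a post changes."""
    key = f'{post["url"]}|{post.get("updated_at", "")}'
    return hashlib.sha256(key.encode()).hexdigest()

seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()

posts = [{"url": "https://blog.ghost.org/how-to-build/", "updated_at": "2025-09-16"}]
new_posts = [p for p in posts if fingerprint(p) not in seen]

seen.update(fingerprint(p) for p in new_posts)
SEEN_FILE.write_text(json.dumps(sorted(seen)))
print(f"{len(new_posts)} new post(s) this run")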

Ethical & Compliant

This scraper:

  • ✅ Only accesses public content
  • ✅ Respects robots.txt rules
  • ✅ Implements rate limiting
  • ✅ Identifies itself clearly
  • ❌ Never bypasses paywalls
  • ❌ Never accesses private content
  • ❌ Never logs in or authenticates

Advanced Settings

For power users, we offer:

  • Concurrency control - Adjust speed vs. politeness
  • Circuit breakers - Auto-stop on errors to save costs
  • Proxy support - Use Apify proxy for large-scale scraping
  • Browser mode - Enable Playwright for JavaScript-heavy sites (rarely needed)
  • Custom User-Agent - Identify your scraper however you want

Most users can ignore these - the defaults work great!

Limitations

  • Ghost only - Only works with Ghost-powered sites (use our detector or check for "Powered by Ghost")
  • Public content - Cannot access members-only or premium content
  • Static pricing - Pricing detection works on static pages only (not Portal overlay)
  • No authentication - Doesn't support logged-in scraping

Troubleshooting

"No posts found"

  • Check if the site has an RSS feed at /rss/ or /feed/
  • Try changing discovery mode to "Sitemap first" or "Hybrid"
  • Verify it's actually a Ghost site (look for "Powered by Ghost" footer)

"Site isn't Ghost"

  • The detector looks for Ghost-specific signals
  • Some Ghost sites are heavily customized - try anyway, it might still work
  • Turn off "Stop if site isn't Ghost" to scrape anyway

"Too slow"

  • Use "RSS first" mode (fastest)
  • Reduce "Max posts per newsletter"
  • Enable "Skip already-seen posts" for repeat runs

"Hitting rate limits"

  • Reduce "Max requests per site" (try 2)
  • Enable "Respect robots.txt"
  • Add delays by reducing concurrency

Support

  • Email: kontakt@barrierefix.de
  • Issues: Report bugs via Apify Console
  • Documentation: This README + input field tooltips

Version History

1.0.0 (2025-10-01)

  • Initial release
  • Ghost detection with multi-signal verification
  • RSS, sitemap, and HTML discovery modes
  • Post, site, and author extraction
  • Pricing detection for subscription newsletters
  • Delta crawling with hash-based deduplication
  • Circuit breakers and smart error handling
  • n8n integration ready

License

MIT - Use commercially, modify freely, no attribution required.


Made by BarriereFix - Building tools for the creator economy.