
# Ghost Newsletter Scraper

Extract structured data from any Ghost-powered newsletter: track posts, monitor pricing, analyze publishing patterns, and research the creator economy.
## Perfect For

- **Content Strategists** - Monitor competitor newsletters and track content trends
- **Market Researchers** - Analyze the creator economy and identify publishing patterns
- **Business Development Teams** - Find sponsorship opportunities and track pricing changes
- **Newsletter Creators** - Research successful strategies and analyze posting frequency
- **Agencies** - Monitor multiple client newsletters and generate performance reports
## Quick Start

The simplest way to get started is to add a single newsletter URL:

```json
{
  "startUrls": [{ "url": "https://blog.ghost.org/" }]
}
```
That's it! The scraper will automatically:
- ✅ Detect that it's a Ghost site
- ✅ Find all posts via RSS feed (fastest method)
- ✅ Extract titles, authors, publish dates, and content metadata
- ✅ Get site info like posting frequency and pricing
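You can also run the Actor programmatically. Here is a minimal sketch using the Apify Python client; the Actor ID below is a placeholder, so substitute the one shown on this Actor's page in Apify Console:

```python
import os

from apify_client import ApifyClient

# Authenticate with your Apify API token (Console -> Settings -> API tokens).
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start a run with the same input as above and wait for it to finish.
# "barrierefix/ghost-newsletter-scraper" is a placeholder Actor ID.
run = client.actor("barrierefix/ghost-newsletter-scraper").call(
    run_input={"startUrls": [{"url": "https://blog.ghost.org/"}]}
)

# Stream the extracted items from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("type"), item.get("title") or item.get("name"))
```

The sketches later in this README reuse `client` and `run` from this example.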
## What You'll Get

### Newsletter Sites

Information about each newsletter:

- Title and description
- Publishing frequency (posts per 30 days)
- Subscription pricing (if available)
- Social media links
- Last post date
### Articles & Posts

Full metadata for each article:

- Title, URL, and excerpt
- Author(s) with profile links
- Tags and categories
- Publish and update dates
- Word count and reading time estimate
- OpenGraph and Twitter Card data
### Writers & Contributors

Author profiles including:

- Name and bio
- Profile page URL
- Social media links (Twitter, LinkedIn, GitHub, website)
## Common Use Cases

### 1. Competitive Intelligence

**Scenario:** Track what your competitors are publishing.

```json
{
  "startUrls": [
    { "url": "https://competitor1.com" },
    { "url": "https://competitor2.com" }
  ],
  "lookbackDays": 7,
  "outputLevel": "posts"
}
```

Get only new posts from the last week. Combine with n8n to receive Slack alerts when competitors publish.
### 2. Content Research

**Scenario:** Analyze successful newsletters in your niche.

```json
{
  "startUrls": [{ "url": "https://popular-newsletter.com" }],
  "limitPerSite": 100,
  "emitAuthors": true,
  "fetchPricingSignals": true
}
```

Get the last 100 posts, author info, and pricing to understand their strategy.
### 3. Sponsorship Prospecting

**Scenario:** Find newsletters that accept sponsors.

```json
{
  "domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],
  "fetchPricingSignals": true,
  "outputLevel": "publication",
  "limitPerSite": 1
}
```

Quickly scan multiple newsletters for pricing pages and subscription info.
### 4. Publishing Pattern Analysis

**Scenario:** Track how often newsletters in a category post.

```json
{
  "startUrls": [{ "url": "https://tech-newsletter.com" }],
  "lookbackDays": 90,
  "deltaCrawl": false
}
```

Get 3 months of data to analyze posting frequency and consistency; the sketch below shows one way to chart it.
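Once that run finishes, a small script (reusing `client` and `run` from the Quick Start sketch) could bucket posts by ISO week to make consistency visible. The `published_at` field comes from the post records shown under Output Format below.

```python
from datetime import datetime

# Tally posts per ISO week to gauge publishing consistency.
weeks: dict[str, int] = {}
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") != "post":
        continue
    dt = datetime.fromisoformat(item["published_at"].replace("Z", "+00:00"))
    iso = dt.isocalendar()
    key = f"{iso.year}-W{iso.week:02d}"
    weeks[key] = weeks.get(key, 0) + 1

# Print a crude text histogram, one row per week.
for week, count in sorted(weeks.items()):
    print(week, "#" * count)
```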
## Input Options

### Start URLs (Required)

Choose one method to specify which newsletters to scrape:

- **Start URLs**: Full URLs like `https://blog.ghost.org/`
- **Domains**: Just the domain, like `blog.ghost.org` (`https://` is added automatically)
### What to Extract

| Option | What it does |
|---|---|
| What to extract | Choose "Everything" (posts + site info), "Posts only", or "Site info only" |
| Extract author profiles | Get author bios and social links (recommended: ON) |
| Extract pricing | Find subscription prices from `/subscribe` pages (recommended: ON) |
| Extract tags | Get article tags/categories (experimental, slower) |
### Crawl Settings

| Setting | Description | Recommended |
|---|---|---|
| How to find posts | RSS first (fastest), Sitemap first (most complete), or Hybrid | RSS first |
| Max posts per newsletter | Stop after this many posts | 200 |
| Only posts from last X days | Filter by date (0 = all posts) | 0 or 30 |
| Skip already-seen posts | Delta crawling saves costs | ON |
### Filters (Optional)

Use regex patterns to include or exclude specific URLs:

- **Include**: `[".*/tag/tech/.*"]` - only posts tagged "tech"
- **Exclude**: `[".*/author/.*"]` - skip author pages
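As a sketch, a filtered run input might look like the snippet below. The field names `includeUrlPatterns` and `excludeUrlPatterns` are assumptions; check the Actor's input schema in Apify Console for the exact names.

```python
# Assumed field names for the URL filters -- verify against the input
# schema in Apify Console before using them in production.
run_input = {
    "startUrls": [{"url": "https://blog.ghost.org/"}],
    "includeUrlPatterns": [".*/tag/tech/.*"],  # assumption: keep only "tech" posts
    "excludeUrlPatterns": [".*/author/.*"],    # assumption: skip author pages
}
```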
## Output Format

All data is saved to a dataset with multiple views for easy filtering.

### View 1: All Records

Complete data with every record type mixed together.

### View 2: Newsletter Sites

```json
{
  "type": "publication",
  "domain": "blog.ghost.org",
  "title": "Ghost Blog",
  "post_velocity_30d": 12,
  "last_post_at": "2025-09-28T10:00:00Z",
  "pricing": {
    "has_subscribe": true,
    "plan_cards": [...]
  }
}
```

### View 3: Articles & Posts

```json
{
  "type": "post",
  "domain": "blog.ghost.org",
  "title": "How to Build a Newsletter",
  "url": "https://blog.ghost.org/how-to-build/",
  "authors": [{ "name": "Jane Doe" }],
  "tags": ["guides", "tutorials"],
  "published_at": "2025-09-15T08:00:00Z",
  "word_count_est": 1240,
  "reading_time_min_est": 6
}
```

### View 4: Writers & Contributors

```json
{
  "type": "author",
  "name": "Jane Doe",
  "profile_url": "https://blog.ghost.org/author/jane/",
  "bio": "Writer and creator",
  "social": {
    "twitter": "https://twitter.com/jane",
    "website": "https://janedoe.com"
  }
}
```
## Integrations

### n8n Workflows

**New Post Alert**

- Trigger: Apify Dataset Item webhook (filter: `type=post`)
- OpenAI: Summarize the post
- Slack: Post to #content-monitoring
- Notion: Add to content calendar

**Pricing Change Alert**

- Trigger: Schedule (daily)
- Get the latest dataset (filter: `type=publication`)
- Compare pricing with the previous run
- Email: Alert the team if prices changed

**Weekly Digest** (a scripted version is sketched below)

- Trigger: Schedule (Monday 9 AM)
- Get posts from the last 7 days
- Group by newsletter
- Email: Send the digest to the team
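If you'd rather script the digest than build it in n8n, a minimal sketch of the grouping step (reusing `client` and `run` from the Quick Start sketch) might look like this:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Collect last week's posts, grouped by newsletter domain.
cutoff = datetime.now(timezone.utc) - timedelta(days=7)
digest: dict[str, list[str]] = defaultdict(list)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") != "post":
        continue
    published = datetime.fromisoformat(item["published_at"].replace("Z", "+00:00"))
    if published >= cutoff:
        digest[item["domain"]].append(item["title"])

# Hand the grouped titles to your email step of choice.
for domain, titles in sorted(digest.items()):
    print(f"{domain}: {len(titles)} new post(s)")
```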
### Make.com / Zapier

The Actor works with any automation tool that supports webhooks or API calls. Use Apify's integrations to trigger workflows when new data is found.
## How It Works

1. **Detects Ghost** - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
2. **Finds Posts** - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
3. **Extracts Data** - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
4. **Saves Results** - Stores everything in a structured dataset with easy-to-use views
5. **Tracks Changes** - Delta crawling means you only pay for new content (see the sketch below)
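The delta-crawling step can be pictured with a short conceptual sketch (an illustration of hash-based deduplication, not the Actor's actual code): fingerprint each post, persist the fingerprints between runs, and emit only unseen posts.

```python
import hashlib

def post_fingerprint(post: dict) -> str:
    """Hash the fields that identify a post version (an illustrative choice)."""
    key = f"{post['url']}|{post.get('updated_at', '')}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def filter_new(posts: list[dict], seen: set[str]) -> list[dict]:
    """Return only posts whose fingerprint no earlier run has stored."""
    fresh = []
    for post in posts:
        fp = post_fingerprint(post)
        if fp not in seen:
            seen.add(fp)  # persist `seen` between runs, e.g. in a key-value store
            fresh.append(post)
    return fresh
```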
## Pricing & Performance

**Pricing model:** Pay per item extracted (posts, authors, sites).

**Typical costs:**

- Small newsletter (20 posts) = ~25 items (20 posts + 1 publication record + a handful of author profiles)
- 10 newsletters monitored daily = ~200 items/day
- Delta crawling reduces repeat costs by 80%+

**Speed:**

- RSS mode: ~5-10 seconds per newsletter
- Sitemap mode: ~10-20 seconds per newsletter
- 5-10x faster than headless-browser scrapers
## Ethical & Compliant
This scraper:
- ✅ Only accesses public content
- ✅ Respects robots.txt rules
- ✅ Implements rate limiting
- ✅ Identifies itself clearly
- ❌ Never bypasses paywalls
- ❌ Never accesses private content
- ❌ Never logs in or authenticates
## Advanced Settings

For power users, we offer:

- **Concurrency control** - Adjust speed vs. politeness
- **Circuit breakers** - Auto-stop on repeated errors to save costs
- **Proxy support** - Use Apify Proxy for large-scale scraping
- **Browser mode** - Enable Playwright for JavaScript-heavy sites (rarely needed)
- **Custom User-Agent** - Identify your scraper however you want

Most users can ignore these - the defaults work great!
## Limitations

- **Ghost only** - Works only with Ghost-powered sites (use the built-in detector or look for "Powered by Ghost")
- **Public content** - Cannot access members-only or premium content
- **Static pricing** - Pricing detection works on static pages only (not the Portal overlay)
- **No authentication** - Does not support logged-in scraping
## Troubleshooting

### "No posts found"

- Check whether the site has an RSS feed at `/rss/` or `/feed/` (a quick probe is sketched below)
- Try changing the discovery mode to "Sitemap first" or "Hybrid"
- Verify it's actually a Ghost site (look for a "Powered by Ghost" footer)
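A quick manual probe of the common feed paths, sketched with the `requests` library (any HTTP client works):

```python
import requests

# A 200 response with an XML content type suggests RSS discovery will succeed.
for path in ("/rss/", "/feed/"):
    resp = requests.get("https://blog.ghost.org" + path, timeout=10)
    print(path, resp.status_code, resp.headers.get("content-type"))
```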
"Site isn't Ghost"
- The detector looks for Ghost-specific signals
- Some Ghost sites are heavily customized - try anyway, it might still work
- Turn off "Stop if site isn't Ghost" to scrape anyway
"Too slow"
- Use "RSS first" mode (fastest)
- Reduce "Max posts per newsletter"
- Enable "Skip already-seen posts" for repeat runs
"Hitting rate limits"
- Reduce "Max requests per site" (try 2)
- Enable "Respect robots.txt"
- Add delays by reducing concurrency
## Support

- **Email:** kontakt@barrierefix.de
- **Issues:** Report bugs via Apify Console
- **Documentation:** This README plus the input field tooltips
## Version History

### 1.0.0 (2025-10-01)
- Initial release
- Ghost detection with multi-signal verification
- RSS, sitemap, and HTML discovery modes
- Post, site, and author extraction
- Pricing detection for subscription newsletters
- Delta crawling with hash-based deduplication
- Circuit breakers and smart error handling
- n8n integration ready
## License
MIT - Use commercially, modify freely, no attribution required.
Made by Barrierefix - Building tools for the creator economy.