Ghost Newsletter Scraper

Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.
Perfect For
- Content Strategists - Monitor competitor newsletters and track content trends
- Market Researchers - Analyze the creator economy and identify publishing patterns
- Business Development Teams - Find sponsorship opportunities and track pricing changes
- Newsletter Creators - Research successful strategies and analyze posting frequency
- Agencies - Monitor multiple client newsletters and generate performance reports
Quick Start
The simplest way to get started - just add a newsletter URL:
{"startUrls": [{ "url": "https://blog.ghost.org/" }]}
That's it! The scraper will automatically:
- ✅ Detect that it's a Ghost site
- ✅ Find all posts via RSS feed (fastest method)
- ✅ Extract titles, authors, publish dates, and content metadata
- ✅ Get site info like posting frequency and pricing
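To start the same run programmatically, a minimal sketch with the Apify Python client could look like this. The API token and the actor ID (`your-username/ghost-newsletter-scraper`) are placeholders - substitute the values from your Apify Console.

```python
# Minimal sketch: start a run and read its results with the Apify Python client.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")  # placeholder token

run_input = {"startUrls": [{"url": "https://blog.ghost.org/"}]}

# Start the actor and wait for the run to finish.
run = client.actor("your-username/ghost-newsletter-scraper").call(run_input=run_input)

# Iterate over every item the run pushed to its default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["type"], item.get("title"))
```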
What You'll Get
Newsletter Sites
Information about each newsletter:
- Title and description
- Publishing frequency (posts per 30 days)
- Subscription pricing (if available)
- Social media links
- Last post date
Articles & Posts
Full metadata for each article:
- Title, URL, and excerpt
- Author(s) with profile links
- Tags and categories
- Publish and update dates
- Word count and reading time estimate
- OpenGraph and Twitter Card data
Writers & Contributors
Author profiles including:
- Name and bio
- Profile page URL
- Social media links (Twitter, LinkedIn, GitHub, website)
Common Use Cases
1. Competitive Intelligence
Scenario: Track what your competitors are publishing
{"startUrls": [{ "url": "https://competitor1.com" },{ "url": "https://competitor2.com" }],"lookbackDays": 7,"outputLevel": "posts"}
Get only new posts from the last week. Combine with n8n to get Slack alerts when competitors publish.
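If you'd rather skip the automation tool, a sketch along these lines can forward fresh posts to a Slack incoming webhook directly. The webhook URL, API token, and dataset ID are placeholders; the `type`, `title`, and `url` fields match the post records shown in the Output Format section below.

```python
# Sketch: forward newly scraped posts to a Slack incoming webhook.
import requests
from apify_client import ApifyClient

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
client = ApifyClient("YOUR_APIFY_TOKEN")  # placeholder

# Read the default dataset of the latest run (ID copied from the Apify Console).
for item in client.dataset("DATASET_ID").iterate_items():
    if item.get("type") == "post":
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"New post: {item['title']}\n{item['url']}"},
        )
```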
2. Content Research
Scenario: Analyze successful newsletters in your niche
{"startUrls": [{ "url": "https://popular-newsletter.com" }],"limitPerSite": 100,"emitAuthors": true,"fetchPricingSignals": true}
Get the last 100 posts, author info, and pricing to understand their strategy.
3. Sponsorship Prospecting
Scenario: Find newsletters that accept sponsors
{"domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],"fetchPricingSignals": true,"outputLevel": "publication","limitPerSite": 1}
Quickly scan multiple newsletters for pricing pages and subscription info.
4. Publishing Pattern Analysis
Scenario: Track how often newsletters in a category post
{"startUrls": [{ "url": "https://tech-newsletter.com" }],"lookbackDays": 90,"deltaCrawl": false}
Get 3 months of data to analyze posting frequency and consistency.
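Once the run finishes, posting cadence can be computed from the post items. A small sketch, assuming each post carries an ISO-8601 `published_at` field as shown in the "Articles & Posts" view further down:

```python
# Sketch: count posts per ISO week from scraped post items.
from collections import Counter
from datetime import datetime

def posts_per_week(items):
    weeks = Counter()
    for item in items:
        if item.get("type") != "post" or not item.get("published_at"):
            continue
        dt = datetime.fromisoformat(item["published_at"].replace("Z", "+00:00"))
        iso = dt.isocalendar()
        weeks[(iso[0], iso[1])] += 1  # key = (year, ISO week number)
    return weeks

# Example with two hypothetical items:
sample = [
    {"type": "post", "published_at": "2025-09-15T08:00:00Z"},
    {"type": "post", "published_at": "2025-09-17T09:30:00Z"},
]
print(posts_per_week(sample))  # Counter({(2025, 38): 2})
```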
Input Options
Start URLs (Required)
Choose one method to specify which newsletters to scrape:
- Start URLs: Full URLs like https://blog.ghost.org/
- Domains: Just the domain, like blog.ghost.org (https:// is added automatically)
What to Extract
| Option | What it does |
|---|---|
| What to extract | Choose "Everything" (posts + site info), "Posts only", or "Site info only" |
| Extract author profiles | Get author bios and social links (recommended: ON) |
| Extract pricing | Find subscription prices from /subscribe pages (recommended: ON) |
| Extract tags | Get article tags/categories (experimental, slower) |
Crawl Settings
| Setting | Description | Recommended |
|---|---|---|
| How to find posts | RSS first (fastest), Sitemap first (most complete), or Hybrid | RSS first |
| Max posts per newsletter | Stop after this many posts | 200 |
| Only posts from last X days | Filter by date (0 = all posts) | 0 or 30 |
| Skip already-seen posts | Delta crawling saves costs! | ON |
Filters (Optional)
Use regex patterns to include/exclude specific URLs:
- Include: [".*/tag/tech/.*"] - only posts tagged "tech"
- Exclude: [".*/author/.*"] - skip author pages
Output Format
All data is saved to a dataset with multiple views for easy filtering:
View 1: All Records
Complete data with everything mixed together
View 2: Newsletter Sites
{"type": "publication","domain": "blog.ghost.org","title": "Ghost Blog","post_velocity_30d": 12,"last_post_at": "2025-09-28T10:00:00Z","pricing": {"has_subscribe": true,"plan_cards": [...]}}
View 3: Articles & Posts
{"type": "post","domain": "blog.ghost.org","title": "How to Build a Newsletter","url": "https://blog.ghost.org/how-to-build/","authors": [{"name": "Jane Doe"}],"tags": ["guides", "tutorials"],"published_at": "2025-09-15T08:00:00Z","word_count_est": 1240,"reading_time_min_est": 6}
View 4: Writers & Contributors
{"type": "author","name": "Jane Doe","profile_url": "https://blog.ghost.org/author/jane/","bio": "Writer and creator","social": {"twitter": "https://twitter.com/jane","website": "https://janedoe.com"}}
Integrations
n8n Workflows
New Post Alert
- Trigger: Apify Dataset Item webhook (filter: type=post)
- OpenAI: Summarize the post
- Slack: Post to #content-monitoring
- Notion: Add to content calendar
Pricing Change Alert
- Trigger: Schedule (daily)
- Get latest dataset (filter: type=publication)
- Compare pricing with previous run
- Email: Alert team if prices changed
Weekly Digest
- Trigger: Schedule (Monday 9 AM)
- Get posts from last 7 days
- Group by newsletter
- Email: Send digest to team
Make.com / Zapier
The actor works with any automation tool that supports webhooks or API calls. Use Apify's integration to trigger workflows when new data is found.
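As a hedged sketch of the webhook side, a run-finished webhook can be registered via the Apify REST API so your automation tool gets notified automatically. The actor ID, token, and target URL are placeholders, and the field names below follow the Apify v2 webhooks API - verify them against the current API docs before relying on this:

```python
# Hedged sketch: register a webhook that calls your automation tool
# whenever a run of this actor succeeds.
import requests

APIFY_TOKEN = "YOUR_APIFY_TOKEN"   # placeholder
ACTOR_ID = "YOUR_ACTOR_ID"         # placeholder, from the Apify Console
TARGET_URL = "https://hook.your-tool.example.com/ghost-scraper"  # placeholder

resp = requests.post(
    "https://api.apify.com/v2/webhooks",
    params={"token": APIFY_TOKEN},
    json={
        "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
        "condition": {"actorId": ACTOR_ID},
        "requestUrl": TARGET_URL,
    },
)
resp.raise_for_status()
```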
How It Works
- Detects Ghost - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
- Finds Posts - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
- Extracts Data - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
- Saves Results - Stores everything in a structured dataset with easy-to-use views
- Tracks Changes - Delta crawling means you only pay for new content (saves costs!)
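For intuition, delta crawling with hash-based deduplication works roughly like the sketch below. This is illustrative only - not the actor's actual implementation - and the fields used for the fingerprint are an assumption:

```python
# Conceptual sketch of hash-based deduplication for delta crawling.
import hashlib

def content_hash(post: dict) -> str:
    """Stable fingerprint of the fields that matter for change detection."""
    key = f"{post['url']}|{post.get('title', '')}|{post.get('updated_at', '')}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

seen: set[str] = set()  # persisted between runs, e.g. in a key-value store

def is_new_or_changed(post: dict) -> bool:
    h = content_hash(post)
    if h in seen:
        return False  # already emitted in a previous run - skip it
    seen.add(h)
    return True
```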
Pricing & Performance
Pricing Model: This actor uses pay-per-item pricing - you only pay for successfully extracted items (posts, authors, sites). Delta crawling significantly reduces costs on repeat runs by avoiding duplicate processing.
See current pricing in the Apify Console when starting a run.
Speed:
- RSS mode: ~5-10 seconds per newsletter
- Sitemap mode: ~10-20 seconds per newsletter
- 5-10x faster than headless-browser scrapers
Ethical & Compliant
This scraper:
- ✅ Only accesses public content
- ✅ Respects robots.txt rules
- ✅ Implements rate limiting
- ✅ Identifies itself clearly
- ❌ Never bypasses paywalls
- ❌ Never accesses private content
- ❌ Never logs in or authenticates
Advanced Settings
For power users, we offer:
- Concurrency control - Adjust speed vs. politeness
- Circuit breakers - Auto-stop on errors to save costs
- Proxy support - Use Apify proxy for large-scale scraping
- Browser mode - Enable Playwright for JavaScript-heavy sites (rarely needed)
- Custom User-Agent - Identify your scraper however you want
Most users can ignore these - the defaults work great!
Limitations
- Ghost only - Only works with Ghost-powered sites (use our detector or check for "Powered by Ghost")
- Public content - Cannot access members-only or premium content
- Static pricing - Pricing detection works on static pages only (not Portal overlay)
- No authentication - Doesn't support logged-in scraping
Troubleshooting
"No posts found"
- Check if the site has an RSS feed at /rss/ or /feed/
- Try changing discovery mode to "Sitemap first" or "Hybrid"
- Verify it's actually a Ghost site (look for "Powered by Ghost" footer)
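A quick manual check with plain requests can confirm both points - whether the feed paths listed above respond and whether the homepage carries Ghost's generator meta tag. The site URL is an example, and the generator check is a rough string match rather than proper HTML parsing:

```python
# Quick manual check: RSS feed availability and Ghost generator tag.
import requests

site = "https://blog.ghost.org"

for path in ("/rss/", "/feed/"):
    r = requests.get(site + path, timeout=10)
    print(path, r.status_code, r.headers.get("content-type", ""))

home = requests.get(site, timeout=10)
# Ghost normally emits <meta name="generator" content="Ghost X.Y"> on the homepage.
print("Ghost generator tag found:", 'name="generator" content="Ghost' in home.text)
```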
"Site isn't Ghost"
- The detector looks for Ghost-specific signals
- Some Ghost sites are heavily customized - try anyway, it might still work
- Turn off "Stop if site isn't Ghost" to scrape anyway
"Too slow"
- Use "RSS first" mode (fastest)
- Reduce "Max posts per newsletter"
- Enable "Skip already-seen posts" for repeat runs
"Hitting rate limits"
- Reduce "Max requests per site" (try 2)
- Enable "Respect robots.txt"
- Add delays by reducing concurrency
Support
- Email: kontakt@barrierefix.de
- Issues: Report bugs via Apify Console
- Documentation: This README + input field tooltips
Version History
1.0.0 (2025-10-01)
- Initial release
- Ghost detection with multi-signal verification
- RSS, sitemap, and HTML discovery modes
- Post, site, and author extraction
- Pricing detection for subscription newsletters
- Delta crawling with hash-based deduplication
- Circuit breakers and smart error handling
- n8n integration ready
🔗 Explore More of Our Actors
📰 Content & Publishing
| Actor | Description |
|---|---|
| Notion Marketplace Scraper | Scrape Notion templates and marketplace listings |
| Farcaster Hub Scraper | Scrape Farcaster decentralized social network data |
| Google Play Reviews Scraper | Extract app reviews from Google Play Store |
💬 Social Media & Community
| Actor | Description |
|---|---|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |
License
MIT - Use commercially, modify freely, no attribution required.
Made by Barrierefix - Building tools for the creator economy.