Website To Rss

Pricing

Pay per usage

Convert any website into RSS, Atom, or JSON feeds. Auto-detects articles, tracks changes, and sends notifications. Works with WordPress, Ghost, Medium, Substack, and any blog.

Rating

0.0

(0)

Developer

Gabriel Antony Xaviour

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

7 days ago

Last modified

📡 Website to RSS

Transform any website into a standards-compliant RSS, Atom, or JSON feed with automatic article detection and change monitoring.

Features • Quick Start • Configuration • Output • Monitoring • Use Cases


What is Website to RSS?

Website to RSS converts any website into a subscribable feed, even if the site doesn't offer one. It automatically detects article patterns, extracts content using multiple strategies, and outputs valid RSS 2.0, Atom 1.0, or JSON Feed formats.

Key Benefits

| Feature | Description |
| --- | --- |
| 🔍 Auto-Discovery | Automatically detects site structure and article patterns |
| 🎯 Platform Presets | Optimized extraction for WordPress, Ghost, Medium, Substack, Hugo |
| 📊 Smart Extraction | Uses OpenGraph, JSON-LD, and semantic HTML for reliable content |
| 🔔 Change Detection | Tracks new posts and content changes between runs |
| 📋 Multiple Formats | Generates RSS 2.0, Atom 1.0, JSON Feed, and HTML preview |
| 🔗 Notifications | Sends webhooks or Slack alerts when new content is detected |

Features

Auto-Discovery Engine

The actor automatically analyzes websites to find articles:

  1. Platform Detection: Checks for WordPress, Ghost, and Medium signatures
  2. Structure Analysis: Identifies article patterns from page structure
  3. Link Filtering: Excludes author, tag, and category pages
  4. Article Scoring: Uses heuristics to identify real articles
  5. Content Extraction: Tries OpenGraph → JSON-LD → Semantic HTML → Fallback
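
The extraction order in step 5 can be sketched as a chain of fallbacks. The following is a minimal illustration using only the Python standard library, not the Actor's own code (which runs on CheerioCrawler); the names `_MetaCollector` and `extract_title` are hypothetical, and only the title field is handled.

```python
import json
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collects OpenGraph tags, JSON-LD blocks, and the first <h1>/<title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.json_ld = []
        self.text = {"h1": "", "title": ""}
        self._capture = None  # name of the element whose text is being captured

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("property") or "").startswith("og:"):
            self.og[a["property"][3:]] = a.get("content") or ""
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._capture = "json_ld"
            self.json_ld.append("")
        elif tag in self.text and not self.text[tag]:
            self._capture = tag

    def handle_data(self, data):
        if self._capture == "json_ld":
            self.json_ld[-1] += data
        elif self._capture in self.text:
            self.text[self._capture] += data

    def handle_endtag(self, tag):
        if tag in ("script", "h1", "title"):
            self._capture = None


def extract_title(html: str) -> tuple[str, str]:
    """Return (title, strategy) from the first strategy that yields a value."""
    p = _MetaCollector()
    p.feed(html)
    if p.og.get("title"):
        return p.og["title"], "opengraph"
    for raw in p.json_ld:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD: fall through to the next strategy
        if isinstance(data, dict) and data.get("headline"):
            return data["headline"], "json-ld"
    if p.text["h1"].strip():
        return p.text["h1"].strip(), "semantic-html"
    return p.text["title"].strip(), "fallback"
```

Returning the strategy name alongside the value makes it easy to see which source a field came from when debugging a site that extracts poorly.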

Platform Presets

Pre-configured extraction rules for popular platforms:

| Platform | What's Optimized |
| --- | --- |
| WordPress | Post selectors, date formats, category extraction |
| Ghost | Card content, member content handling |
| Medium | Story pages, clap counts, reading time |
| Substack | Newsletter posts, subscriber content |
| Hugo | Front matter, taxonomy handling |
| Generic | Universal patterns for unknown sites |

Output Formats

| Format | Content Type | Description |
| --- | --- | --- |
| RSS 2.0 | application/rss+xml | Most compatible feed format |
| Atom 1.0 | application/atom+xml | Modern feed standard |
| JSON Feed | application/json | Developer-friendly format |
| HTML | text/html | Visual preview page |
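
To make the RSS 2.0 row concrete, here is a rough sketch of how extracted items could be serialized into an RSS 2.0 document. It is an illustration under assumptions rather than the Actor's serializer: `build_rss` is a hypothetical helper, the channel description text is invented, and the item fields follow the dataset schema shown in the Output section.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime


def build_rss(feed_title: str, feed_url: str, items: list[dict]) -> str:
    """Serialize items (url, title, description, date) as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_url
    # Placeholder description; RSS 2.0 requires the element to be present.
    ET.SubElement(channel, "description").text = f"Generated feed for {feed_url}"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
        ET.SubElement(node, "guid").text = item["url"]
        ET.SubElement(node, "description").text = item.get("description", "")
        if item.get("date"):
            # RSS 2.0 uses RFC 822 dates, while the dataset uses ISO 8601.
            parsed = datetime.fromisoformat(item["date"].replace("Z", "+00:00"))
            ET.SubElement(node, "pubDate").text = format_datetime(parsed)
    return ET.tostring(rss, encoding="unicode")
```

ElementTree escapes characters such as `&` and `<` in titles, which is the main reason to build the XML with a library instead of string formatting.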

Quick Start

Basic Usage

Just provide a URL; the actor handles the rest:

{
  "websiteUrl": "https://blog.example.com",
  "maxItems": 20
}

With Platform Preset

Optimize extraction for known platforms:

{
  "websiteUrl": "https://my-ghost-blog.com",
  "platformPreset": "ghost",
  "maxItems": 20
}

With Manual Selectors

Full control for custom sites:

{
  "websiteUrl": "https://custom-site.com",
  "discoveryMode": "manual",
  "linkSelector": ".article-link",
  "titleSelector": ".article-title",
  "contentSelector": ".article-body",
  "maxItems": 20
}

With Change Monitoring

Get notified when new content appears:

{
  "websiteUrl": "https://blog.example.com",
  "stateStoreName": "my-blog-monitor",
  "monitorMode": "new_pages",
  "slackWebhook": "https://hooks.slack.com/services/..."
}

Input Configuration

Core Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| websiteUrl | string | required | URL of the website to convert |
| maxItems | number | 50 | Maximum items in the feed |
| outputFormats | array | ["rss", "json"] | Formats: rss, atom, json, html |

Discovery Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| discoveryMode | string | auto | auto, preset, or manual |
| platformPreset | string | auto | wordpress, ghost, medium, substack, hugo, generic |
| autoDetectArticles | boolean | true | Use heuristics to filter non-articles |

Manual Selectors

| Parameter | Type | Description |
| --- | --- | --- |
| linkSelector | string | CSS selector for article links |
| titleSelector | string | CSS selector for article title |
| contentSelector | string | CSS selector for article body |
| dateSelector | string | CSS selector for publication date |
| urlExcludePatterns | array | URL patterns to skip (e.g., /author/) |
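
As a rough picture of what urlExcludePatterns does to discovered links, the sketch below resolves links against the site URL, keeps same-host links, and drops excluded paths. It assumes patterns are matched as plain substrings of the URL path; the README does not say whether the Actor uses substrings, globs, or regular expressions, and `filter_links` is a hypothetical helper.

```python
from urllib.parse import urljoin, urlparse

# Assumed defaults mirroring the author, tag, and category pages mentioned above.
DEFAULT_EXCLUDES = ("/author/", "/tag/", "/category/")


def filter_links(base_url, hrefs, exclude_patterns=DEFAULT_EXCLUDES):
    """Resolve hrefs, keep same-host links, drop excluded paths and duplicates."""
    host = urlparse(base_url).netloc
    seen, kept = set(), []
    for href in hrefs:
        # Resolve relative links and ignore fragments such as #comments.
        url = urljoin(base_url, href).split("#", 1)[0]
        parsed = urlparse(url)
        if parsed.netloc != host:
            continue
        # Assumption: a pattern excludes a URL when it appears in the path.
        if any(pattern in parsed.path for pattern in exclude_patterns):
            continue
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```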

Monitoring Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| stateStoreName | string | (none) | Named store to persist state between runs |
| monitorMode | string | both | new_pages, content_changes, or both |
| webhookUrl | string | (none) | URL to POST to when new content is found |
| slackWebhook | string | (none) | Slack webhook for notifications |
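
When inputs are generated by a script, it can help to apply the documented defaults and check enumerated values before starting a run. The helper below is a hypothetical client-side sketch built from the tables above; the Actor performs its own validation, which may differ.

```python
ALLOWED_FORMATS = {"rss", "atom", "json", "html"}
ALLOWED_DISCOVERY = {"auto", "preset", "manual"}
ALLOWED_MONITOR = {"new_pages", "content_changes", "both"}

# Defaults copied from the option tables in this section.
DEFAULTS = {
    "maxItems": 50,
    "outputFormats": ["rss", "json"],
    "discoveryMode": "auto",
    "autoDetectArticles": True,
    "monitorMode": "both",
}


def build_input(website_url: str, **overrides) -> dict:
    """Merge documented defaults with overrides and reject unknown enum values."""
    if not website_url.startswith(("http://", "https://")):
        raise ValueError("websiteUrl must be an absolute http(s) URL")
    run_input = {"websiteUrl": website_url, **DEFAULTS, **overrides}
    # Copy the list so callers never mutate the shared default.
    run_input["outputFormats"] = list(run_input["outputFormats"])
    bad_formats = set(run_input["outputFormats"]) - ALLOWED_FORMATS
    if bad_formats:
        raise ValueError(f"unsupported outputFormats: {sorted(bad_formats)}")
    if run_input["discoveryMode"] not in ALLOWED_DISCOVERY:
        raise ValueError(f"unsupported discoveryMode: {run_input['discoveryMode']}")
    if run_input["monitorMode"] not in ALLOWED_MONITOR:
        raise ValueError(f"unsupported monitorMode: {run_input['monitorMode']}")
    return run_input
```

For example, `build_input("https://blog.example.com", maxItems=20)` yields a complete input with the remaining options at their documented defaults.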

Output

Key-Value Store Files

| Key | Content Type | Description |
| --- | --- | --- |
| feed.xml | application/rss+xml | RSS 2.0 feed |
| feed.atom | application/atom+xml | Atom 1.0 feed |
| feed.json | application/json | JSON Feed 1.1 |
| feed.html | text/html | HTML preview page |
| OUTPUT | application/json | Run summary and statistics |
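
Since the feeds are stored as key-value store records, a feed reader can be pointed at the record URL. The sketch below builds that URL with the Apify API's key-value store record endpoint; `feed_record_url` is a hypothetical helper, the store ID is a placeholder, and a store that is not publicly readable will also need an API token.

```python
from urllib.parse import quote

# Record keys taken from the table above.
FEED_KEYS = {
    "rss": "feed.xml",
    "atom": "feed.atom",
    "json": "feed.json",
    "html": "feed.html",
}


def feed_record_url(store_id: str, fmt: str) -> str:
    """Build the API URL of the feed record for one output format."""
    if fmt not in FEED_KEYS:
        raise ValueError(f"unknown format: {fmt}")
    return (
        "https://api.apify.com/v2/key-value-stores/"
        f"{quote(store_id, safe='~')}/records/{FEED_KEYS[fmt]}"
    )
```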

Dataset Item Schema

{
  "url": "https://blog.example.com/post-1",
  "title": "My First Post",
  "description": "A short description...",
  "content": "Full article content...",
  "date": "2024-01-15T10:00:00Z",
  "author": "John Doe",
  "image": "https://blog.example.com/image.jpg",
  "categories": ["tech", "tutorial"],
  "changeStatus": "new"
}

Run Summary (OUTPUT)

{
  "feedTitle": "My Blog",
  "feedUrl": "https://blog.example.com",
  "itemCount": 20,
  "newItems": 3,
  "changedItems": 1,
  "formats": ["rss", "json"],
  "crawlDurationSec": 45
}

Change Detection

Enable persistent monitoring to track content changes:

{
  "websiteUrl": "https://blog.example.com",
  "stateStoreName": "my-blog-monitor",
  "webhookUrl": "https://my-webhook.com/notify"
}

How It Works

  1. First Run: Captures all current items as baseline
  2. Subsequent Runs: Compares current items with stored state
  3. Detection: Identifies new pages and content changes
  4. Notification: Sends webhook/Slack with changes
  5. State Update: Stores new state for next run
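
The steps above can be sketched as a comparison of content fingerprints keyed by URL. This is one plausible implementation, not the Actor's documented internals: the hashed fields, the "changed" and "unchanged" status values, and the choice to report nothing on the first run are all assumptions, and both function names are hypothetical.

```python
import hashlib


def fingerprint(item: dict) -> str:
    """Hash the fields whose change should count as a content change."""
    payload = "\n".join([item.get("title", ""), item.get("content", "")])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(items: list[dict], previous_state: dict) -> tuple[list[dict], dict]:
    """Return (items tagged with changeStatus, state to store for the next run)."""
    tagged, next_state = [], {}
    for item in items:
        digest = fingerprint(item)
        next_state[item["url"]] = digest
        if item["url"] not in previous_state:
            status = "new"
        elif previous_state[item["url"]] != digest:
            status = "changed"  # assumed status name
        else:
            status = "unchanged"  # assumed status name
        tagged.append({**item, "changeStatus": status})
    return tagged, next_state


def items_to_report(tagged: list[dict], monitor_mode: str = "both", first_run: bool = False) -> list[dict]:
    """Select the items worth a notification for the given monitorMode."""
    if first_run:
        return []  # assumption: the baseline run does not notify
    wanted = {
        "new_pages": {"new"},
        "content_changes": {"changed"},
        "both": {"new", "changed"},
    }[monitor_mode]
    return [item for item in tagged if item["changeStatus"] in wanted]
```

Storing only a digest per URL keeps the persisted state small, at the cost of not being able to show what changed.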

Webhook Payload

{
  "event": "new_items",
  "feedTitle": "My Blog",
  "itemCount": 3,
  "items": [
    {
      "title": "New Post",
      "url": "https://blog.example.com/new-post",
      "date": "2024-01-15T10:00:00Z"
    }
  ]
}
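
A receiving backend might turn this payload into a chat message. The sketch below validates the fields shown above and builds the `{"text": ...}` body that Slack incoming webhooks accept; `to_slack_message` is a hypothetical helper, and the message layout is an assumption that need not match what the Actor sends through slackWebhook.

```python
def to_slack_message(payload: dict) -> dict:
    """Convert a webhook payload into a Slack incoming-webhook message body."""
    missing = {"event", "feedTitle", "itemCount", "items"} - payload.keys()
    if missing:
        raise ValueError(f"payload is missing fields: {sorted(missing)}")
    lines = [f"{payload['itemCount']} new item(s) in {payload['feedTitle']}"]
    for item in payload["items"]:
        # Slack renders <url|label> as a link.
        lines.append(f"• <{item['url']}|{item['title']}>")
    return {"text": "\n".join(lines)}
```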

Use Cases

| Use Case | Description |
| --- | --- |
| RSS for RSS-less Sites | Create feeds for websites that don't offer them |
| Content Monitoring | Track competitors, news sources, or blogs for updates |
| Aggregation | Combine multiple sites into monitoring workflows |
| Archiving | Capture content changes over time |
| Automation Triggers | Use webhooks to trigger downstream workflows |

Integrations

Schedule Regular Updates

Use Apify Scheduler for periodic monitoring:

| Update Frequency | Best For |
| --- | --- |
| Every hour | News sites, high-frequency publishers |
| Every 6 hours | Active blogs, company announcements |
| Daily | Personal blogs, slow-updating sites |

Feed Readers

Generated feeds work with any RSS reader:

  • Feedly
  • Inoreader
  • NewsBlur
  • Feedbin
  • Any RSS-compatible app

Automation Platforms

Connect via webhooks to:

  • Zapier
  • Make (Integromat)
  • n8n
  • Custom backends

Troubleshooting

No articles detected

| Problem | Solution |
| --- | --- |
| Site uses JavaScript | This actor uses HTTP requests, not a browser. Try a Playwright-based scraper |
| Custom structure | Use discoveryMode: "manual" with specific selectors |
| Site blocks requests | Enable proxy configuration |

Wrong content extracted

| Problem | Solution |
| --- | --- |
| Grabbing navigation | Provide a specific contentSelector for the article body |
| Missing dates | Add a dateSelector for date elements |
| Extra pages included | Add patterns to urlExcludePatterns |

FAQ

Q: Does this work with JavaScript-heavy sites?

This actor uses HTTP requests (CheerioCrawler), not a browser. For sites that require JavaScript rendering, consider a Playwright-based scraper.

Q: How often should I run monitoring?

It depends on the site's update frequency: roughly hourly for news sites and every 6-24 hours for blogs. Over-polling wastes resources and may trigger rate limits.

Q: Can I monitor multiple sites?

Run the actor separately for each site with different stateStoreName values. Use Apify's scheduling to orchestrate multiple monitors.
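
One way to keep the state stores apart is to derive each stateStoreName from the site's hostname. The helper below is a hypothetical sketch; the `monitor-` prefix and the slug rule are assumptions, and two URLs on the same host would need an extra distinguishing suffix.

```python
import re
from urllib.parse import urlparse


def monitor_inputs(urls: list[str], webhook_url: str) -> list[dict]:
    """Build one Actor input per site, each with its own state store name."""
    inputs = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Keep letters and digits, and collapse everything else into hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", host).strip("-")
        inputs.append({
            "websiteUrl": url,
            "stateStoreName": f"monitor-{slug}",
            "monitorMode": "both",
            "webhookUrl": webhook_url,
        })
    return inputs
```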

Q: What if the site changes its structure?

Auto-discovery adapts to many changes. For manual selectors, you'll need to update them if the site's HTML structure changes.


Support

  • Issues: Report bugs or request features on GitHub
  • Documentation: See full README in Actor source
  • API: Use Apify API to run this Actor programmatically
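
As a rough sketch of the API route, the function below prepares a request against the Apify API's run-Actor endpoint using only the standard library. The actor ID is a placeholder because this page does not state it, `build_run_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that starts an Actor run."""
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# "username~actor-name" is a placeholder; substitute the real Actor ID.
request = build_run_request(
    "username~actor-name",
    "YOUR_API_TOKEN",
    {"websiteUrl": "https://blog.example.com", "maxItems": 20},
)
```

Passing `request` to `urllib.request.urlopen` would start the run; the official Apify client libraries wrap the same endpoint and add retries and result helpers.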

License

Apache 2.0