Convert Any Website to RSS Feed

Pricing

from $2.00 / 1,000 results

Turn blogs, news pages, job boards, product listings, directories, and sitemaps into RSS feeds, JSON Feed, and structured datasets with change detection.

Developer

Inus Grobler

Maintained by Community

Last modified

5 days ago

Any Website to RSS Feed

Website to RSS Feed Generator for blogs, news pages, job boards, product listings, directories, and sitemaps.

Turn almost any public website into an RSS feed, JSON Feed, and structured Apify dataset.

Any Website to RSS Feed is a lightweight website to RSS feed generator for blogs, news pages, job boards, product listings, directories, category pages, and sitemap-backed websites. Use it when a site does not offer a clean RSS feed, when you want a normalized feed from many different websites, or when you need to detect new and changed items over time.

The Actor is designed to be cost-aware by default. It looks for existing RSS, Atom, and JSON feeds first, then tries static HTML extraction, and falls back to browser rendering or optional AI selector discovery only when needed.

SEO Title

Convert Any Website to RSS Feed | Blogs, News, Jobs

SEO Description

Create RSS feeds from blogs, news pages, job boards, product listings, directories, and sitemaps. Export JSON Feed and structured datasets with lightweight change detection on Apify.

Why People Use It

  • Turn websites without RSS into feeds for Slack, email, Zapier, Make, or custom automation
  • Monitor job boards, product catalogs, newsroom sections, and directory pages
  • Publish normalized RSS and JSON Feed outputs from inconsistent websites
  • Detect new and changed content across scheduled runs without building a custom scraper

What You Can Do With It

  • Create an RSS feed from almost any public website
  • Convert blogs, listings, directories, and job boards into structured feed items
  • Generate both RSS 2.0 and JSON Feed from the same run
  • Export feed items to an Apify dataset for automation and analysis
  • Detect new, changed, and unchanged items across repeated runs
  • Use custom CSS selectors for difficult websites
  • Use optional OpenRouter selector discovery when normal extraction is weak

Common Use Cases

  • Website to RSS feed conversion
  • Blog to RSS feed generation
  • Job board monitoring
  • Product listing monitoring
  • News and article aggregation
  • Directory and category page tracking
  • Sitemap-based content discovery
  • Lightweight website change monitoring for feed publishing
  • RSS feed generation for automation tools, newsletters, and dashboards

How It Works

For each URL you provide, the Actor works through these steps, starting with the cheapest:

  1. Finds existing RSS, Atom, or JSON feeds
  2. Checks common feed URLs such as /feed, /rss.xml, and /atom.xml
  3. Discovers sitemap URLs from robots.txt and sitemap.xml
  4. Extracts repeated cards or listing items from static HTML
  5. Uses your custom CSS selectors when provided
  6. Falls back to browser rendering when JavaScript is required
  7. Optionally uses OpenRouter to infer selectors for repeated listings
  8. Stores RSS, JSON Feed, dataset rows, and a run summary
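
Steps 1 to 3 can be illustrated with a short standard-library sketch. This is not the Actor's own code: the function names, the list of feed MIME types, and the parsing approach are assumptions made for illustration. It shows how feeds declared in a page's HTML, the common feed paths from step 2, and Sitemap entries in robots.txt might be discovered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds (assumed list, not the Actor's).
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/feed+json",
}

# The three paths named in step 2; a real implementation may probe more.
COMMON_FEED_PATHS = ["/feed", "/rss.xml", "/atom.xml"]


class FeedLinkParser(HTMLParser):
    """Collects absolute URLs from <link rel="alternate"> tags with a feed type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel_values and is_feed and attr.get("href"):
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def find_declared_feeds(html, base_url):
    """Step 1: feeds the page itself declares in its markup."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


def candidate_feed_urls(base_url):
    """Step 2: well-known feed locations to probe on the same host."""
    return [urljoin(base_url, path) for path in COMMON_FEED_PATHS]


def sitemaps_from_robots(robots_txt):
    """Step 3: sitemap URLs listed in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps
```

Only when checks like these find nothing would the more expensive steps, such as HTML extraction or browser rendering, need to run.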

Quick Start

Paste one or more public URLs into startUrls.

{
  "startUrls": [
    { "url": "https://example.com/blog" }
  ]
}

Best Starting Setup

For most websites, this is the best place to start:

{
  "startUrls": [
    { "url": "https://example.com/blog" }
  ],
  "mode": "auto",
  "maxPages": 25,
  "maxItems": 100
}

Use pageList for category pages, search results, job boards, and product grids. Use customSelectors only when a site has an unusual layout and automatic extraction is weak.

For a listing page, such as jobs, products, articles, or directory cards:

{
  "startUrls": [
    { "url": "https://example.com/jobs" }
  ],
  "mode": "pageList",
  "maxPages": 25,
  "maxItems": 100
}

For a page where you already know the selectors:

{
  "startUrls": [
    { "url": "https://example.com/products" }
  ],
  "mode": "customSelectors",
  "customSelectors": {
    "itemSelector": ".product-card",
    "titleSelector": ".product-title",
    "urlSelector": "a",
    "summarySelector": ".product-summary",
    "imageSelector": "img"
  }
}

Input Options

  • startUrls: public website URLs to convert into feed items
  • mode: choose automatic discovery, RSS-only discovery, sitemap mode, page-list extraction, or custom selectors
  • maxPages: maximum pages to fetch
  • maxItems: maximum items to return
  • renderJavaScript: use browser rendering for JavaScript-heavy pages
  • includeUrlPatterns: keep only URLs matching these patterns
  • excludeUrlPatterns: remove URLs matching these patterns
  • customSelectors: CSS selectors for manual extraction
  • changedItemPolicy: include changed items, exclude them, or output only changed items

Technical settings such as crawl depth, browser fallback, deduplication, and debug reporting use safe built-in defaults, which keeps the input simple.
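
To make the effect of includeUrlPatterns and excludeUrlPatterns concrete, here is a minimal sketch. It assumes patterns are matched as plain substrings, which fits the "/products/" example in this README but may not be how the Actor matches them (it could use globs or regular expressions). The function name filter_urls is hypothetical.

```python
def filter_urls(urls, include_patterns=None, exclude_patterns=None):
    """Keep URLs that match any include pattern and no exclude pattern.

    Assumption: a pattern matches when it appears anywhere in the URL.
    An empty include list means every URL is eligible.
    """
    include_patterns = include_patterns or []
    exclude_patterns = exclude_patterns or []
    kept = []
    for url in urls:
        if include_patterns and not any(p in url for p in include_patterns):
            continue  # not in any wanted section
        if any(p in url for p in exclude_patterns):
            continue  # explicitly excluded
        kept.append(url)
    return kept


urls = [
    "https://example.com/products/widget",
    "https://example.com/products/archive/old-widget",
    "https://example.com/about",
]
print(filter_urls(urls, ["/products/"], ["/archive/"]))
```

Narrowing a large site to one section this way also reduces how many pages count against maxPages.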

Output

The Actor writes feed items to the dataset and stores feed files in the run key-value store.

Dataset items include:

{
  "title": "Frontend Engineer",
  "url": "https://example.com/jobs/frontend-engineer",
  "canonicalUrl": "https://example.com/jobs/frontend-engineer",
  "summary": "Frontend role for a fast-growing product team.",
  "publishedAt": "2026-05-13T09:00:00.000Z",
  "updatedAt": null,
  "imageUrl": "https://example.com/images/frontend.png",
  "sourcePageUrl": "https://example.com/jobs",
  "sourceSite": "https://example.com",
  "contentHash": "sha256...",
  "isNew": true,
  "isChanged": false,
  "previousHash": null,
  "discoveryMethod": "html_repeated",
  "extractionMethod": "static_html",
  "confidence": 0.86,
  "scrapedAt": "2026-05-13T12:00:00.000Z"
}
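
The contentHash field is what makes change detection possible. The README does not say which fields the Actor hashes, so the sketch below is an assumption: it hashes a fixed set of content fields with SHA-256 and deliberately leaves out run-specific fields such as scrapedAt, so that an untouched item produces the same hash on every run.

```python
import hashlib
import json

# Assumed set of fields that define an item's content (hypothetical choice).
HASHED_FIELDS = ("title", "canonicalUrl", "summary", "imageUrl")


def content_hash(item):
    """Return a stable SHA-256 hex digest over the item's content fields."""
    # Missing or null fields are normalized to an empty string.
    payload = {field: (item.get(field) or "").strip() for field in HASHED_FIELDS}
    # sort_keys makes the serialization independent of dict ordering.
    encoded = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Comparing this digest with the value stored from the previous run is enough to tell whether the visible content of an item changed.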

Key-value store records:

  • RSS_XML: generated RSS 2.0 feed
  • JSON_FEED: generated JSON Feed
  • RUN_SUMMARY: totals, failed URLs, and extraction flags
  • DEBUG_REPORT: optional debug details when troubleshooting is enabled internally

RSS and JSON Feed

The generated RSS feed includes channel metadata and item fields such as title, link, guid, publication date, and description when available.
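
As a rough illustration of how dataset items map onto those RSS fields, the sketch below builds an RSS 2.0 document with the standard library. It is not the Actor's generator. The function name build_rss and the choice of the canonical URL as the guid are assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, channel_description, items):
    """Build an RSS 2.0 string from dataset-style items (illustrative only)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
        # Assumption: the canonical URL doubles as a permanent identifier.
        guid = ET.SubElement(node, "guid", isPermaLink="true")
        guid.text = item.get("canonicalUrl") or item["url"]
        if item.get("publishedAt"):
            # RSS expects RFC 822 dates; dataset items carry ISO 8601.
            iso = item["publishedAt"].replace("Z", "+00:00")
            published = datetime.fromisoformat(iso).astimezone(timezone.utc)
            ET.SubElement(node, "pubDate").text = format_datetime(published, usegmt=True)
        if item.get("summary"):
            # Descriptions are optional, so the tag is skipped when empty.
            ET.SubElement(node, "description").text = item["summary"]

    return ET.tostring(rss, encoding="unicode")
```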

The generated JSON Feed includes feed metadata and item fields such as id, URL, title, summary, publication date, and image when available.
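
The same items can be mapped onto JSON Feed in a few lines. This sketch assumes JSON Feed version 1.1 field names (id, url, title, summary, image, date_published, date_modified). The function name build_json_feed is hypothetical, and the Actor's real output may include more metadata.

```python
import json


def build_json_feed(title, home_page_url, items):
    """Build a JSON Feed 1.1 dict from dataset-style items (illustrative only)."""
    feed_items = []
    for item in items:
        entry = {
            "id": item.get("canonicalUrl") or item["url"],
            "url": item["url"],
            "title": item["title"],
        }
        optional = {
            "summary": item.get("summary"),
            "image": item.get("imageUrl"),
            "date_published": item.get("publishedAt"),
            "date_modified": item.get("updatedAt"),
        }
        # Only include optional fields that actually have a value.
        entry.update({key: value for key, value in optional.items() if value})
        feed_items.append(entry)

    return {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_page_url,
        "items": feed_items,
    }


feed = build_json_feed(
    "Example Jobs",
    "https://example.com/jobs",
    [{"title": "Frontend Engineer", "url": "https://example.com/jobs/frontend-engineer"}],
)
print(json.dumps(feed, indent=2))
```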

Descriptions and summaries are optional because many listing pages do not expose full excerpts. When possible, the Actor tries to enrich missing summaries from page metadata and detail pages while staying within your page limits.

New and Changed Items

The Actor can track items across runs using a named state store. This lets you tell whether an item is:

  • new
  • changed
  • unchanged

This is useful for scheduled RSS generation, content alerts, product monitoring, job board monitoring, and feed publishing workflows.
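
The three states follow from comparing each item's hash with the hash saved on the previous run. The sketch below shows that comparison under assumptions: the state is modeled as a plain dict from canonical URL to hash, and classify_items is a hypothetical name. The Actor's named state store may be organized differently.

```python
def classify_items(items, previous_hashes):
    """Mark each item as new, changed, or unchanged.

    previous_hashes maps an item key (assumed: canonical URL) to the
    contentHash recorded on the previous run.
    Returns the annotated items and the state to save for the next run.
    """
    classified = []
    next_state = dict(previous_hashes)
    for item in items:
        key = item.get("canonicalUrl") or item["url"]
        previous = previous_hashes.get(key)
        classified.append({
            **item,
            "isNew": previous is None,
            "isChanged": previous is not None and previous != item["contentHash"],
            "previousHash": previous,
        })
        next_state[key] = item["contentHash"]
    return classified, next_state
```

An item with isNew and isChanged both false is unchanged, which matches the flags shown in the dataset example.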

Example Saved Tasks

These are good starting points for production tasks in Apify.

Monitor a News Section

{
  "startUrls": [
    { "url": "https://example.com/news" }
  ],
  "mode": "auto",
  "maxPages": 20,
  "maxItems": 50,
  "changedItemPolicy": "include"
}

Track a Job Board

{
  "startUrls": [
    { "url": "https://example.com/jobs" }
  ],
  "mode": "pageList",
  "maxPages": 30,
  "maxItems": 100,
  "changedItemPolicy": "onlyChanged"
}

Follow a Product Listing Page

{
  "startUrls": [
    { "url": "https://example.com/products" }
  ],
  "mode": "pageList",
  "maxPages": 20,
  "maxItems": 100,
  "includeUrlPatterns": [
    "/products/"
  ]
}

Optional AI Selector Discovery

Some websites have unusual markup. When needed, the Actor can use OpenRouter to infer CSS selectors for repeated cards or listing items.

The AI is only used for selector discovery. It is not used to summarize content, rewrite text, or analyze the full website.

To use this feature, set OPENROUTER_API_KEY as a secret environment variable in Apify. Do not put the key in your input JSON.

Pricing and Resource Tips

  • Start with mode: "auto"
  • Keep renderJavaScript off unless the site requires it
  • Use maxPages and maxItems to control cost
  • Use includeUrlPatterns for large websites and sitemaps
  • Use customSelectors for pages with predictable layouts
  • Use OpenRouter only as a fallback unless a website clearly needs selector inference

Practical guidance:

  • Small blog or feed-backed website: usually maxPages: 10-25
  • News section or job board: usually maxPages: 20-40
  • Large sitemap or directory: start with maxPages: 25 and narrow with includeUrlPatterns
  • Turn on renderJavaScript only for sites that clearly depend on client-side rendering

If you are launching this as a scheduled production task, start conservative, review the first few runs, and only increase page limits when the extra items are worth the extra cost.
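
A quick back-of-the-envelope calculation can help when choosing maxItems for a schedule. The sketch below uses the listed rate of $2.00 per 1,000 results and assumes every run returns the full maxItems, so it gives a ceiling on result charges at that rate rather than a forecast. The listed price is a "from" price, so the actual rate and any other platform charges may differ, and the function name is hypothetical.

```python
def estimate_result_cost(max_items, runs_per_day=1, days=30, price_per_1000=2.00):
    """Return (cost per run, cost over the period) in dollars.

    Assumes each run returns exactly max_items results billed at
    price_per_1000 dollars per 1,000 results.
    """
    per_run = max_items / 1000 * price_per_1000
    return round(per_run, 2), round(per_run * runs_per_day * days, 2)


# 100 items per run, hourly schedule, 30 days.
print(estimate_result_cost(100, runs_per_day=24))
```

With change detection in place, scheduled runs may return fewer items than maxItems, so real result counts will often be lower than this ceiling.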

Limitations

  • Works on public pages only
  • Does not log in to websites
  • Does not perform screenshot comparison
  • Does not generate AI summaries
  • Does not crawl deeply by default
  • Some complex websites may require custom selectors

Troubleshooting

  • Empty dataset: try mode: "customSelectors" with manual selectors
  • Missing summaries: the source page may not expose descriptions or excerpts
  • JavaScript-heavy site: enable renderJavaScript
  • Large sitemap: reduce maxPages or use includeUrlPatterns
  • AI selector discovery not running: check that OPENROUTER_API_KEY is set as a secret environment variable

Launch Checklist

Before going live with a scheduled task:

  1. Run the Actor once with mode: "auto" and confirm the dataset items look correct.
  2. Check RSS_XML, JSON_FEED, and RUN_SUMMARY in the key-value store.
  3. Make sure maxPages and maxItems are low enough for the budget you want.
  4. Add includeUrlPatterns if the site is broad and you only want one section.
  5. If the page is JavaScript-heavy, enable renderJavaScript.
  6. If extraction is weak, switch to customSelectors before reaching for AI fallback.
  7. Schedule the task and review the second run to confirm isNew and isChanged behave as expected.
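
For step 7, a small helper can summarize the isNew and isChanged flags of a run's dataset items. This is a sketch with a hypothetical function name. It relies only on the two flags shown in the dataset example.

```python
from collections import Counter


def summarize_statuses(items):
    """Count dataset items by status using their isNew and isChanged flags."""
    counts = Counter()
    for item in items:
        if item.get("isNew"):
            counts["new"] += 1
        elif item.get("isChanged"):
            counts["changed"] += 1
        else:
            counts["unchanged"] += 1
    return {status: counts[status] for status in ("new", "changed", "unchanged")}


second_run = [
    {"title": "A", "isNew": False, "isChanged": False},
    {"title": "B", "isNew": False, "isChanged": True},
    {"title": "C", "isNew": True, "isChanged": False},
]
print(summarize_statuses(second_run))
```

If a second run against an unchanged site reports every item as new, the state from the first run was probably not found, which is worth checking before relying on the schedule.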

Python API Example

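Use the Apify API from Python with the apify-client package (imported as apify_client). The example reads your API token from the APIFY_API_TOKEN environment variable:

import os

from apify_client import ApifyClient

# Read the API token from the environment instead of hard-coding it.
client = ApifyClient(os.environ["APIFY_API_TOKEN"])

run_input = {
    "startUrls": [{"url": "https://example.com/blog"}],
    "mode": "auto",
    "maxPages": 25,
    "maxItems": 100,
}

# Start the Actor and wait for the run to finish.
run = client.actor("TheScrapeLab/any-website-to-rss-feed").call(run_input=run_input)
if run is None:
    raise RuntimeError("The Actor run did not return a result")

dataset = client.dataset(run["defaultDatasetId"])
store = client.key_value_store(run["defaultKeyValueStoreId"])


def read_record(key):
    """Return the value stored under key, or None if the record is missing."""
    record = store.get_record(key)
    return record["value"] if record else None


items = list(dataset.iterate_items())
rss_xml = read_record("RSS_XML")
json_feed = read_record("JSON_FEED")
run_summary = read_record("RUN_SUMMARY") or {}

print(f"Run ID: {run['id']}")
print(f"Items found: {len(items)}")
print(f"New items: {run_summary.get('newItems')}")
print(f"Changed items: {run_summary.get('changedItems')}")
print(items[0]["title"] if items else "No items found")
if rss_xml:
    print(rss_xml[:200])
if json_feed:
    print(json_feed["title"])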

Who This Actor Is For

This website to RSS feed generator is useful for marketers, growth teams, recruiters, publishers, analysts, developers, automation builders, and anyone who needs reliable feed data from public websites.