Convert Any Website to RSS Feed

Pricing

from $0.60 / 1,000 results

Turn blogs, news pages, job boards, product listings, directories, and sitemaps into RSS feeds, JSON Feed, and structured datasets with change detection.

Developer

Inus Grobler

Maintained by Community

Any Website to RSS Feed

At a glance: the Actor converts public websites into RSS, JSON Feed, and dataset items. Input is a list of public start URLs plus optional selectors. Output is one dataset row per feed item, plus RSS_XML, JSON_FEED, and RUN_SUMMARY records. Typical uses are monitoring and automation. Limitations, troubleshooting, and pricing notes are covered below.

Any Website to RSS Feed converts public blogs, news sections, job boards, product listings, directories, category pages, and sitemap-backed websites into RSS feeds, JSON Feed files, and structured Apify dataset items.

Use it when a site has no useful RSS feed, when you need a normalized feed from several websites, or when you want scheduled monitoring for new and changed content.

Main Use Cases

  • Create an RSS feed from a website that does not publish one
  • Convert blogs, newsroom pages, job boards, product grids, and directories into feed items
  • Monitor websites for new or changed posts, jobs, listings, products, or pages
  • Export structured content to automation tools, dashboards, newsletters, or data pipelines
  • Generate both RSS 2.0 and JSON Feed output from the same run
  • Use custom CSS selectors when automatic extraction needs help

What It Extracts

Each dataset item can include:

  • Title
  • URL and canonical URL
  • Summary or excerpt, when available
  • Published and updated dates, when available
  • Image URL, when available
  • Source page and source website
  • Content hash for change detection
  • New/changed status across repeated runs
  • Discovery method, extraction method, confidence score, and scrape timestamp
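
The content hash and the new/changed status can be pictured with a small sketch. This is an illustration only, not the Actor's actual implementation: the choice of hashed fields, the helper names content_hash and classify, and the in-memory previous_hashes dictionary are all assumptions.

```python
import hashlib
import json


def content_hash(item: dict) -> str:
    """Hash the fields that define an item's content (field choice is an assumption)."""
    payload = {key: (item.get(key) or "").strip() for key in ("title", "summary", "imageUrl")}
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(item: dict, previous_hashes: dict) -> dict:
    """Compare an item against the hash stored for its URL on an earlier run."""
    key = item.get("canonicalUrl") or item["url"]
    new_hash = content_hash(item)
    old_hash = previous_hashes.get(key)
    return {
        "contentHash": new_hash,
        "isNew": old_hash is None,
        "isChanged": old_hash is not None and old_hash != new_hash,
        "previousHash": old_hash,
    }
```

On a first run previous_hashes is empty, so every item comes back with isNew set and isChanged unset; a later run with an edited title would flip isChanged and report the earlier hash in previousHash.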

The Actor also stores:

  • RSS_XML: generated RSS 2.0 feed
  • JSON_FEED: generated JSON Feed
  • RUN_SUMMARY: totals, failed URLs, and extraction flags
  • DEBUG_REPORT: optional troubleshooting details when debug mode is enabled

How It Works

For each start URL, the Actor uses a cost-conscious extraction order:

  1. Look for existing RSS, Atom, and JSON feeds.
  2. Check common feed URLs such as /feed, /rss.xml, and /atom.xml.
  3. Extract repeated listing cards from static HTML.
  4. Read structured metadata such as JSON-LD and Open Graph tags.
  5. Follow selected same-site detail pages when page limits allow it.
  6. Use browser rendering only when you enable it.
  7. Use AI selector discovery only when you enable it.

High-confidence feeds are accepted without extra fallback work, which keeps routine feed-backed runs cheap.
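
Steps 1 and 2 of the order above can be sketched with the Python standard library. This is a hedged illustration, not the Actor's code: the FeedLinkParser class, the candidate_feed_urls helper, and the exact set of accepted MIME types are assumptions; only the three common paths come from the list above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feeds here (an assumption for this sketch).
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}
# Common feed paths named in step 2.
COMMON_FEED_PATHS = ("/feed", "/rss.xml", "/atom.xml")


class FeedLinkParser(HTMLParser):
    """Collect href values from <link rel="alternate"> tags that declare a feed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def candidate_feed_urls(page_url: str, html: str) -> list:
    """Return declared feed URLs first, then guessed common paths, without duplicates."""
    parser = FeedLinkParser()
    parser.feed(html)
    declared = [urljoin(page_url, href) for href in parser.hrefs]
    guessed = [urljoin(page_url, path) for path in COMMON_FEED_PATHS]
    ordered = []
    for url in declared + guessed:
        if url not in ordered:
            ordered.append(url)
    return ordered
```

Trying declared feeds before guessed paths mirrors the cost-conscious order: a feed the page advertises itself is the cheapest source to trust.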

Input Configuration

Most users only need four settings:

  • startUrls: The public website pages to convert into feed items.
  • preset: How deep the Actor should scan.
  • maxItems: The maximum number of feed items to return.
  • maxPages: The maximum number of pages to fetch.

Leave the other defaults alone for the cheapest first run.

Required

  • startUrls: One or more public web pages to convert into feed items.

Basic Options

  • preset: quick, balanced, deep, or javascript.
  • maxPages: Maximum pages to fetch. Default is 25.
  • maxItems: Maximum feed items to return. Default is 100.
  • includeUrlPatterns: Keep only item URLs matching these text or regex-like patterns.
  • excludeUrlPatterns: Drop item URLs matching these patterns.
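
One way to picture how the two pattern lists interact is the sketch below. It assumes each pattern is applied as a regular expression search against the full URL and that exclusions win over inclusions; the Actor's real matching rules may differ, and url_allowed is a hypothetical helper name.

```python
import re


def url_allowed(url, include_patterns=(), exclude_patterns=()):
    """Keep a URL if no exclude pattern matches and, when includes are given, one of them does."""
    if any(re.search(pattern, url) for pattern in exclude_patterns):
        return False
    if include_patterns:
        return any(re.search(pattern, url) for pattern in include_patterns)
    return True


# Example: keep job detail pages but skip archived ones.
print(url_allowed("https://example.com/jobs/frontend-engineer", ["/jobs/"], ["/archive/"]))
```

Under this assumption a plain fragment such as /jobs/ behaves like a substring match, while characters such as "." or "?" keep their regex meaning and may need escaping.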

Advanced Options

  • customSelectors: Manual CSS selectors for pages where automatic extraction needs help.

Older JSON/API inputs can still use advanced fields such as:

  • mode: auto, rssDiscoveryOnly, sitemap, pageList, or customSelectors.
  • crawlDepth: Internal link depth for page-list and sitemap modes. Default is 1.
  • renderJavaScript: Use browser rendering for JavaScript-heavy pages. Keep off unless needed.
  • playwrightFallback: Try one browser fallback if static extraction finds no useful items. Default is off.
  • useLLM: Optional AI selector discovery. Default is off.
  • changedItemPolicy: Include all items, exclude changed items, or output only changed items. On a first run, onlyChanged usually returns no dataset rows: there is no previous state to compare against, so every item counts as new rather than changed.
  • stateMaxItems: Maximum historical item fingerprints to keep per start URL. Default is 10000.
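
The effect of changedItemPolicy on the output rows can be sketched as a simple filter over the isChanged flag. Only the value onlyChanged appears in this documentation; the names all and excludeChanged, and the helper itself, are hypothetical placeholders for the other two behaviours.

```python
def apply_changed_item_policy(items, policy="all"):
    """Filter dataset rows by their isChanged flag ("all" and "excludeChanged" are assumed names)."""
    if policy == "all":
        return list(items)
    if policy == "excludeChanged":
        return [item for item in items if not item.get("isChanged")]
    if policy == "onlyChanged":
        return [item for item in items if item.get("isChanged")]
    raise ValueError(f"Unknown changedItemPolicy: {policy}")


# On a first run every item is new and none is changed, so onlyChanged yields nothing.
first_run = [{"url": "https://example.com/a", "isNew": True, "isChanged": False}]
print(apply_changed_item_policy(first_run, "onlyChanged"))
```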

Static and feed-backed runs are designed for low memory. Browser-rendered runs should be launched with 1024 MB memory.

Example Input

{
  "startUrls": [
    { "url": "https://example.com/blog" }
  ],
  "preset": "balanced",
  "maxItems": 100,
  "maxPages": 25
}

Listing Page Example

{
  "startUrls": [
    { "url": "https://example.com/jobs" }
  ],
  "preset": "balanced",
  "maxPages": 25,
  "maxItems": 100,
  "includeUrlPatterns": ["/jobs/"]
}

Custom Selectors Example

{
  "startUrls": [
    { "url": "https://example.com/products" }
  ],
  "preset": "balanced",
  "customSelectors": {
    "itemSelector": ".product-card",
    "titleSelector": ".product-title",
    "urlSelector": "a",
    "summarySelector": ".product-summary",
    "imageSelector": "img"
  }
}

Example Output

{
  "title": "Frontend Engineer",
  "url": "https://example.com/jobs/frontend-engineer",
  "canonicalUrl": "https://example.com/jobs/frontend-engineer",
  "summary": "Frontend role for a fast-growing product team.",
  "publishedAt": "2026-05-13T09:00:00.000Z",
  "updatedAt": null,
  "imageUrl": "https://example.com/images/frontend.png",
  "sourcePageUrl": "https://example.com/jobs",
  "sourceSite": "https://example.com",
  "contentHash": "sha256...",
  "isNew": true,
  "isChanged": false,
  "previousHash": null,
  "discoveryMethod": "html_repeated",
  "extractionMethod": "static_html",
  "confidence": 0.86,
  "scrapedAt": "2026-05-13T12:00:00.000Z"
}
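
To show how a dataset row like this maps onto RSS 2.0, here is a minimal sketch using only the Python standard library. It is not the Actor's generator: build_rss is a hypothetical helper, and the mapping of canonicalUrl to guid and summary to description is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime


def build_rss(channel_title, channel_link, items):
    """Render dataset rows as an RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Generated feed for {channel_link}"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
        ET.SubElement(node, "guid").text = item.get("canonicalUrl") or item["url"]
        if item.get("summary"):
            ET.SubElement(node, "description").text = item["summary"]
        if item.get("publishedAt"):
            # RSS 2.0 expects RFC 822 dates, so convert from the ISO 8601 timestamp.
            published = datetime.fromisoformat(item["publishedAt"].replace("Z", "+00:00"))
            ET.SubElement(node, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```

Optional fields are skipped when they are missing or null, which matches the "when available" wording used for summaries and dates.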

How to Run on Apify

  1. Open the Actor in Apify Console.
  2. Add one or more startUrls.
  3. Keep mode as auto for the first run.
  4. Set maxPages and maxItems to a small number for testing.
  5. Run the Actor.
  6. Review the dataset and the RSS_XML, JSON_FEED, and RUN_SUMMARY records.
  7. Increase limits only when the extra results are useful.

Exporting Results

After a run finishes:

  • Download dataset rows as JSON, CSV, Excel, XML, RSS, or HTML from the Dataset tab.
  • Open the Key-value store to copy or download RSS_XML and JSON_FEED.
  • Use the API to fetch dataset items and feed records programmatically.
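
For programmatic access, the records and dataset rows are reachable through the Apify API v2 endpoints for key-value store records and dataset items. The sketch below only builds the URLs; the helper names are hypothetical, and STORE_ID and DATASET_ID stand in for the IDs reported by your own run.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def record_url(store_id, key, token=None):
    """URL of one key-value store record, for example RSS_XML or JSON_FEED."""
    url = f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}/records/{quote(key, safe='')}"
    return f"{url}?{urlencode({'token': token})}" if token else url


def dataset_items_url(dataset_id, fmt="json", token=None):
    """URL of the dataset items in the requested export format."""
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{urlencode(params)}"


print(record_url("STORE_ID", "RSS_XML"))
print(dataset_items_url("DATASET_ID", fmt="csv"))
```

A token is only needed when the storage is not publicly readable; treat any URL that carries one as a secret.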

Pricing and Resource Tips

Recommended pricing model: pay per dataset result.

In test runs, static extraction at 256 MB had a low platform cost; browser rendering is opt-in and needs 1024 MB. For most users, a simple result-based price is easier to understand than charging separately for every internal fetch.

Recommended Store pricing:

  • Event: apify-default-dataset-item
  • User-facing event title: result
  • Suggested price: $0.001 per result for FREE/BRONZE users
  • Suggested discounts: $0.0008 for SILVER and $0.0006 for GOLD and higher
  • Keep the tiny Actor-start event only if needed to discourage empty spam runs
  • Include platform usage in the event price rather than asking users to reason about memory and compute units
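
As a quick worked example of the suggested per-result prices, the sketch below estimates the charge for a run. The tier table copies the suggested figures above; estimate_cost is a hypothetical helper, and tiers above GOLD would use the GOLD price but are not listed here.

```python
# Suggested price per dataset result, in USD, by subscription tier.
PRICE_PER_RESULT = {"FREE": 0.001, "BRONZE": 0.001, "SILVER": 0.0008, "GOLD": 0.0006}


def estimate_cost(results, tier="FREE"):
    """Estimated charge in USD for a number of dataset results on a given tier."""
    return round(results * PRICE_PER_RESULT[tier.upper()], 4)


print(estimate_cost(100))           # 100 results on FREE or BRONZE
print(estimate_cost(1000, "GOLD"))  # 1,000 results on GOLD
```

Because the charge scales with dataset rows, lowering maxItems is the most direct way to cap the cost of a test run.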

Cost control tips:

  • Use 256 MB for normal feed/static runs.
  • Use 1024 MB when renderJavaScript is enabled.
  • Keep renderJavaScript, playwrightFallback, and useLLM off unless needed.
  • Use includeUrlPatterns for broad websites.
  • Start with maxPages: 10-25 and increase gradually.
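
The memory guidance can be folded into a small helper that picks the run memory from the input. This is a sketch under assumptions: pick_memory_mbytes is a hypothetical name, and treating the javascript preset as a browser-rendered run is a guess rather than documented behaviour.

```python
def pick_memory_mbytes(run_input):
    """Return 1024 MB for browser-rendered runs and 256 MB for static or feed-backed runs."""
    uses_browser = bool(run_input.get("renderJavaScript")) or run_input.get("preset") == "javascript"
    return 1024 if uses_browser else 256


print(pick_memory_mbytes({"preset": "balanced"}))
print(pick_memory_mbytes({"preset": "balanced", "renderJavaScript": True}))
```

With the Python client, the result can be passed as the memory_mbytes argument when calling the Actor.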

Limits and Caveats

  • Works on public pages only.
  • Does not log in to websites.
  • Does not bypass paywalls or private content.
  • Does not guarantee every website can be converted automatically.
  • JavaScript-heavy websites may need browser rendering.
  • Some unusual layouts may need custom CSS selectors.
  • Very broad websites should be narrowed with URL patterns and page limits.
  • AI selector discovery uses an external AI provider only when you explicitly enable it.

Troubleshooting

  • Empty dataset: try pageList mode or add custom selectors.
  • Wrong section extracted: add includeUrlPatterns or excludeUrlPatterns.
  • Missing summaries: the source page may not expose excerpts; increase maxPages if detail-page enrichment is worth the cost.
  • JavaScript-heavy page: enable renderJavaScript and run with 1024 MB memory.
  • Too many irrelevant links: lower crawlDepth or switch to customSelectors.
  • Scheduled run shows no new items: check changedItemPolicy and the state store name.

Python API Example

import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

run_input = {
    "startUrls": [{"url": "https://example.com/blog"}],
    "mode": "auto",
    "maxPages": 25,
    "maxItems": 100,
}

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/any-website-to-rss-feed").call(run_input=run_input)

dataset = client.dataset(run["defaultDatasetId"])
store = client.key_value_store(run["defaultKeyValueStoreId"])

# Dataset rows are the feed items; the feeds and the summary live in the key-value store.
items = list(dataset.iterate_items())
rss_xml = store.get_record("RSS_XML")["value"]
json_feed = store.get_record("JSON_FEED")["value"]
summary = store.get_record("RUN_SUMMARY")["value"]

print(f"Run ID: {run['id']}")
print(f"Items: {len(items)}")
print(f"New items: {summary['newItems']}")
print(f"Changed items: {summary['changedItems']}")
print(items[0]["title"] if items else "No items found")
print(rss_xml[:200])
print(json_feed["title"])

FAQ

Can I create an RSS feed from any website?

You can create feeds from many public websites, especially blogs, news pages, listings, directories, and sitemap-backed sites. Some complex or protected websites may need custom selectors or may not be suitable.

Does this Actor find existing RSS feeds?

Yes. It checks feed links on the page and common RSS, Atom, and JSON Feed paths before doing page extraction.

Can it monitor job boards and product listings?

Yes. Use pageList mode with includeUrlPatterns that match the job or product URLs.

Does it detect changed pages?

Yes. It stores content fingerprints in a state store and marks each item as new, changed, or unchanged on later runs.

Should I enable browser rendering?

Only when static extraction does not see the content. Browser rendering costs more and should use 1024 MB memory.

Does it use AI?

Not by default. AI selector discovery is optional and only runs when you set useLLM to a non-off value.

How do I reduce cost?

Use the default static mode, keep browser and AI options off, lower maxPages, use URL patterns, and start at 256 MB memory.

Can I use the RSS output directly?

Yes. Open the RSS_XML key-value store record after the run, then use either the record's URL or its content in your feed reader or automation.

Suggested Keywords

website to RSS feed, RSS feed generator, create RSS from website, blog to RSS, news RSS scraper, job board monitoring, product listing monitor, sitemap to RSS, JSON Feed generator, Apify RSS scraper.