Pricing

Pay per event

📡 RSS & Atom Feed Extractor

Aggregate public RSS feeds into structured JSON datasets to discover fresh website URLs from blogs and newsrooms for downstream web scrapers.

Developer

naoki anzai

Last modified

2 months ago

📡 RSS Feed Aggregator

Parse, filter, and extract data from any public RSS or Atom feed to supply your web scraping pipelines. This actor is a discovery surface: it aggregates new URLs from known publisher blogs, company update logs, and tech newsrooms so that specialized downstream scrapers can process them. It needs no browser automation.

Instead of crawling entire site structures to find new pages, you provide a list of trusted feed URLs. The actor parses the XML, applies your keyword filters, and returns only the feed items that match. This makes it a lightweight feeder for daily or weekly monitoring of targeted sources.

Each output row includes the discovered URL, the source feed, the publication date, and the keywords that matched. Because the output is already structured and filtered, you can pass the links to an article content extractor or a generic web scraper to fetch the full DOM, text content, and metadata. Keeping URL discovery in a dedicated feed aggregator means downstream scrapers run only on relevant, fresh content, which lowers scraping costs and reduces the number of requests that can be blocked.

Store Quickstart

Start with Quickstart (2 publisher feeds) for a reliable first run.
Use Multi-Source Monitoring to watch several feeds with keyword filters.
Use RSS → Article Cleanup when the next step is article extraction.

Where this actor fits

Surface	Best for
RSS Feed Aggregator	Discover fresh URLs from known publishers and blogs
Google News Scraper	Discover fresh URLs from query-based Google News searches
Article Content Extractor	Clean discovered article/news/blog pages
Website Content Extractor	Clean discovered docs, pricing, policy, or product pages

Key Features

📡 Feed discovery — Aggregate multiple public RSS/Atom feeds in one run
🔍 Keyword filtering — Keep only the rows that match the themes you care about
🏷️ Match visibility — Returns matchedKeywords for filtered rows
🔄 Deduplication — Remove duplicate links across feeds
⚡ Low-friction first run — Great for recurring monitoring of known sources
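
The deduplication feature above can be pictured with a minimal sketch. The actor's real comparison rule is not documented here, so the normalization below (trim whitespace, drop a trailing slash) and the helper name `dedupe_by_link` are assumptions for illustration only.

```python
from typing import Iterable


def dedupe_by_link(items: Iterable[dict]) -> list[dict]:
    """Keep the first item seen for each link, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        # Assumption: two links are "the same" if they match after trimming
        # whitespace and a trailing slash. The actor may use a different rule.
        key = str(item.get("link") or "").strip().rstrip("/")
        if not key or key in seen:
            # Items without a link, or with a link already seen, are skipped.
            continue
        seen.add(key)
        unique.append(item)
    return unique
```

When two feeds carry the same story, the row from the feed listed first is the one that survives in this sketch.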

Use Cases

Who	Why
PR / comms teams	Track publisher and company newsroom feeds
Competitive intelligence	Watch competitor blogs and product update feeds
Content ops	Build filtered story queues from trusted sources
AI / RAG teams	Maintain a fresh URL stream before deeper extraction

Input

Field	Type	Default	Description
`feedUrls`	`string[]`	required	Public RSS/Atom URLs (max 50)
`keywords`	`string[]`	`[]`	Optional include-list filter
`maxItemsPerFeed`	`integer`	`25`	Max items to keep from each feed
`deduplicate`	`boolean`	`true`	Remove duplicate links across feeds
`timeoutMs`	`integer`	`15000`	Request timeout in milliseconds
`delivery`	`string`	`dataset`	`dataset` or `webhook`
`webhookUrl`	`string`	—	Webhook target when `delivery=webhook`
`dryRun`	`boolean`	`false`	Run without saving

Input Example

{
  "feedUrls": [
    "https://blog.google/rss/",
    "https://openai.com/news/rss.xml"
  ],
  "keywords": ["AI", "agents"],
  "maxItemsPerFeed": 10,
  "deduplicate": true
}
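
If you build the input programmatically, a small validation step can catch mistakes before a run starts. The sketch below is not part of the actor; `build_run_input` is a hypothetical helper that applies the documented 50-feed limit and the documented defaults, and the URL scheme check is an extra assumption.

```python
MAX_FEEDS = 50  # documented limit for feedUrls


def build_run_input(feed_urls, keywords=None, max_items_per_feed=25, deduplicate=True):
    """Return an input dict using the field names from the Input table."""
    urls = [u.strip() for u in feed_urls if u and u.strip()]
    if not urls:
        raise ValueError("feedUrls is required and must not be empty")
    if len(urls) > MAX_FEEDS:
        raise ValueError(f"feedUrls accepts at most {MAX_FEEDS} URLs, got {len(urls)}")
    for url in urls:
        # Assumption: only http(s) feed URLs are meaningful for a public feed.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {url!r}")
    return {
        "feedUrls": urls,
        "keywords": list(keywords or []),
        "maxItemsPerFeed": int(max_items_per_feed),
        "deduplicate": bool(deduplicate),
    }
```

The resulting dict can be pasted into the actor's input form as JSON or passed as the run input when starting the actor through the Apify API.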

Input Examples

Example: Single feed

{
  "feedUrls": [
    "https://example.com/feed.xml"
  ]
}

Example: Multi-feed daily digest

{
  "feedUrls": [
    "https://hnrss.org/frontpage",
    "https://www.theverge.com/rss/index.xml"
  ],
  "maxItemsPerFeed": 50
}

Example: Delta-only run

{
  "feedUrls": [
    "https://example.com/feed.xml"
  ],
  "onlyNewSinceLastRun": true,
  "snapshotKey": "example-feed-state"
}

Output

Field	Type	Description
`source`	string	Feed URL that produced the row
`title`	string	Feed item title
`link`	string	Item URL for downstream extraction
`pubDate`	string	Original feed date
`pubDateISO`	string	ISO timestamp version of `pubDate`
`description`	string	Summary text from the feed
`content`	string	Encoded content when available
`categories`	array	Categories / tags from the feed
`matchedKeywords`	array	Keywords that matched the row

Output Example

{
  "source": "https://openai.com/news/rss.xml",
  "title": "The next evolution of the Agents SDK",
  "link": "https://openai.com/index/the-next-evolution-of-the-agents-sdk",
  "pubDate": "Wed, 15 Apr 2026 10:00:00 GMT",
  "pubDateISO": "2026-04-15T10:00:00.000Z",
  "description": "OpenAI updates the Agents SDK with native sandbox execution...",
  "matchedKeywords": ["ai", "agents"]
}
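
The relationship between `pubDate` and `pubDateISO` in the example above can be reproduced with Python's standard library. This is a sketch of one way to do the conversion, not the actor's own code; it handles RFC 822 style dates as used in RSS, and treating a date without a time zone as UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_iso_utc(pub_date: str) -> str:
    """Convert an RFC 822 style date string to an ISO 8601 UTC timestamp."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # Assumption: a date without an offset is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # Millisecond precision with a trailing "Z", matching the example row.
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_iso_utc("Wed, 15 Apr 2026 10:00:00 GMT"))  # 2026-04-15T10:00:00.000Z
```

Atom feeds use a different date format (RFC 3339), which this helper does not parse.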

First-run buyer experience

Run Quickstart (2 publisher feeds).
Confirm the actor returns recent item URLs plus matchedKeywords.
Send article/news/blog links to Article Content Extractor.
Send docs/product/policy links to Website Content Extractor.
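
The last two steps above amount to routing each `link` to one of two downstream actors. The sketch below shows one possible routing rule. The path segments in `NON_ARTICLE_SEGMENTS` and the two target labels are hypothetical; the listing does not define how to tell article pages from other pages, so adjust the rule to your own sources.

```python
from urllib.parse import urlparse

# Hypothetical hints: first path segments that suggest a non-article page.
NON_ARTICLE_SEGMENTS = {"docs", "pricing", "policy", "product"}


def route_link(link: str) -> str:
    """Return a label for the downstream extractor that should get the link."""
    segments = [s for s in urlparse(link).path.lower().split("/") if s]
    first = segments[0] if segments else ""
    if first in NON_ARTICLE_SEGMENTS:
        return "website-content-extractor"
    return "article-content-extractor"


def split_by_target(rows: list[dict]) -> dict[str, list[str]]:
    """Group the links from output rows by their routing label."""
    groups: dict[str, list[str]] = {}
    for row in rows:
        link = row.get("link")
        if link:
            groups.setdefault(route_link(link), []).append(link)
    return groups
```

Each group can then be used as the URL list for the matching extractor run.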

Tips & Limitations

Start with a small set of high-trust feeds.
Keyword filtering is OR-based; any matched keyword keeps the item.
This actor is a feed discovery layer, not a full-content extractor.
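
To make the OR-based rule concrete, here is a minimal sketch of such a filter. It is an illustration rather than the actor's implementation: the case-insensitive substring match, the choice of `title`, `description`, and `content` as searched fields, and the treatment of an empty keyword list as "no filtering" are all assumptions.

```python
def match_keywords(item: dict, keywords: list[str]) -> list[str]:
    """Return the lowercased keywords that occur in the item's text fields."""
    # Assumption: title, description, and content are the fields searched.
    text = " ".join(
        str(item.get(field) or "") for field in ("title", "description", "content")
    ).lower()
    return [kw.lower() for kw in keywords if kw.lower() in text]


def filter_items(items: list[dict], keywords: list[str]) -> list[dict]:
    """Keep items that match at least one keyword (OR semantics)."""
    if not keywords:
        # Assumption: an empty include-list disables filtering.
        return list(items)
    kept = []
    for item in items:
        matched = match_keywords(item, keywords)
        if matched:  # any single match is enough to keep the row
            kept.append({**item, "matchedKeywords": matched})
    return kept
```

With substring matching, a short keyword such as `AI` also matches inside longer words like "OpenAI" or "maintain", so short keywords tend to keep more rows than expected.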

FAQ

How is this different from Google News Scraper?

Use RSS Feed Aggregator when you already know the publishers you trust. Use Google News Scraper when you want broader query-based discovery.

Can I see why an item matched?

Yes — filtered rows include a matchedKeywords array.

Can I get full article text here?

No. Use Article Content Extractor or Website Content Extractor on the returned links.

Content Intelligence Pack handoffs:

📰 Article Content Extractor — clean discovered article/news/blog pages
📄 Website Content Extractor — clean discovered non-article pages
📰 Google News Scraper — query-based discovery when you do not have feed URLs yet

Cost

Pay Per Event:

actor-start: $0.01
dataset-item: $0.002 per output item
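
As a worked example of the two prices above, the sketch below estimates the cost of a single run. It assumes one `actor-start` event per run and one `dataset-item` event per output row, and it ignores any other platform charges; `estimate_run_cost` is a hypothetical helper.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.01")    # charged once per run (assumed)
DATASET_ITEM_USD = Decimal("0.002")  # charged per output item


def estimate_run_cost(output_items: int) -> Decimal:
    """Estimate the pay-per-event cost of one run in USD."""
    if output_items < 0:
        raise ValueError("output_items must not be negative")
    return ACTOR_START_USD + DATASET_ITEM_USD * output_items


# Two feeds at the default maxItemsPerFeed of 25 give at most 50 items:
# 0.01 + 50 * 0.002 = 0.11 USD.
print(estimate_run_cost(50))
```

Keyword filters and deduplication reduce the number of output items, so the estimate for a filtered run is an upper bound.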

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store.
