📡 RSS & Atom Feed Extractor

Pricing

Pay per event

Aggregate public RSS feeds into structured JSON datasets to discover fresh website URLs from blogs and newsrooms for downstream web scrapers.

Developer

太郎 山田

Maintained by Community

Last modified

4 days ago

📡 RSS Feed Aggregator

Parse, filter, and extract data from any public RSS or Atom XML feed to supply your web scraping pipelines with URLs. This RSS and Atom Feed Extractor is a discovery surface: it aggregates new URLs from known publisher blogs, company update logs, and tech newsrooms, then hands them to specialized downstream scrapers. Developers and data engineers can use it to build content intelligence workflows without browser automation.

Instead of crawling an entire website to find new pages, you provide a list of trusted feed URLs. The actor parses the XML, applies your keyword filters, and returns only the items that match your themes. This makes it a lightweight feeder for daily or weekly monitoring of targeted web sources.

The resulting dataset includes the discovered URLs, the originating feed, publication dates, and the keywords that matched each item. Because the output is already structured and filtered, you can pass the links directly to an article content extractor or a generic web scraper to fetch the full DOM, text content, and metadata. Isolating URL discovery in a dedicated feed aggregator means the more expensive extractors only visit pages the feeds have already surfaced, which lowers scraping costs, sends fewer requests that could be blocked, and keeps your pipelines focused on relevant, fresh content.

Store Quickstart

  • Start with Quickstart (2 publisher feeds) for a reliable first run.
  • Use Multi-Source Monitoring to watch several feeds with keyword filters.
  • Use RSS → Article Cleanup when the next step is article extraction.

Where this actor fits

| Surface | Best for |
| --- | --- |
| RSS Feed Aggregator | Discover fresh URLs from known publishers and blogs |
| Google News Scraper | Discover fresh URLs from query-based Google News searches |
| Article Content Extractor | Clean discovered article/news/blog pages |
| Website Content Extractor | Clean discovered docs, pricing, policy, or product pages |

Key Features

  • 📡 Feed discovery — Aggregate multiple public RSS/Atom feeds in one run
  • 🔍 Keyword filtering — Keep only the rows that match the themes you care about
  • 🏷️ Match visibility — Returns matchedKeywords for filtered rows
  • 🔄 Deduplication — Remove duplicate links across feeds
  • Low-friction first run — Great for recurring monitoring of known sources
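
The feed discovery and deduplication features above can be illustrated with a minimal sketch. It is not the actor's implementation: it assumes well-formed RSS 2.0 or Atom input, takes the first Atom link element as the item URL, and uses hypothetical helper names (parse_feed, dedupe_by_link) and example.com sample feeds.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text: str, source: str) -> list[dict]:
    """Return one row per feed entry, for RSS 2.0 or Atom input."""
    root = ET.fromstring(xml_text)
    rows = []
    # RSS 2.0 layout: <rss><channel><item><title/><link/></item></channel></rss>
    for item in root.iter("item"):
        rows.append({
            "source": source,
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    # Atom layout: <feed><entry><title/><link href="..."/></entry></feed>
    # Simplification: the first <link> element is taken as the item URL.
    for entry in root.iter(f"{ATOM_NS}entry"):
        link = entry.find(f"{ATOM_NS}link")
        rows.append({
            "source": source,
            "title": (entry.findtext(f"{ATOM_NS}title") or "").strip(),
            "link": link.get("href", "") if link is not None else "",
        })
    return rows


def dedupe_by_link(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each link; rows without a link are dropped."""
    seen = set()
    unique = []
    for row in rows:
        if row["link"] and row["link"] not in seen:
            seen.add(row["link"])
            unique.append(row)
    return unique


RSS_SAMPLE = """<rss version="2.0"><channel>
<item><title>Post A</title><link>https://example.com/a</link></item>
<item><title>Post B</title><link>https://example.com/b</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><title>Post A (mirror)</title><link href="https://example.com/a"/></entry>
<entry><title>Post C</title><link href="https://example.com/c"/></entry>
</feed>"""

rows = (parse_feed(RSS_SAMPLE, "https://example.com/rss.xml")
        + parse_feed(ATOM_SAMPLE, "https://example.com/atom.xml"))
unique = dedupe_by_link(rows)
print([row["link"] for row in unique])
```

Here the link https://example.com/a appears in both sample feeds, so only the first occurrence (from the RSS feed) survives deduplication.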

Use Cases

| Who | Why |
| --- | --- |
| PR / comms teams | Track publisher and company newsroom feeds |
| Competitive intelligence | Watch competitor blogs and product update feeds |
| Content ops | Build filtered story queues from trusted sources |
| AI / RAG teams | Maintain a fresh URL stream before deeper extraction |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| feedUrls | string[] | required | Public RSS/Atom URLs (max 50) |
| keywords | string[] | [] | Optional include-list filter |
| maxItemsPerFeed | integer | 25 | Max items to keep from each feed |
| deduplicate | boolean | true | Remove duplicate links across feeds |
| timeoutMs | integer | 15000 | Request timeout (ms) |
| delivery | string | dataset | dataset or webhook |
| webhookUrl | string | | Webhook target when delivery=webhook |
| dryRun | boolean | false | Run without saving |
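
As a rough illustration of how the documented defaults and constraints combine, here is a local pre-flight check you could run before starting the actor. The function name normalize_input and the error messages are hypothetical; the actor's own validation behavior is not documented here and may differ.

```python
# Defaults copied from the Input table; feedUrls and webhookUrl have none.
DEFAULTS = {
    "keywords": [],
    "maxItemsPerFeed": 25,
    "deduplicate": True,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
}
MAX_FEEDS = 50  # documented limit on feedUrls


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and check the documented constraints."""
    feed_urls = raw.get("feedUrls")
    if not feed_urls:
        raise ValueError("feedUrls is required")
    if len(feed_urls) > MAX_FEEDS:
        raise ValueError(f"feedUrls accepts at most {MAX_FEEDS} URLs")
    merged = {**DEFAULTS, **raw}
    # Copy the list so callers never share the module-level default.
    merged["keywords"] = list(merged["keywords"])
    if merged["delivery"] not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if merged["delivery"] == "webhook" and not merged.get("webhookUrl"):
        raise ValueError("webhookUrl is required when delivery is 'webhook'")
    return merged


config = normalize_input({"feedUrls": ["https://example.com/feed.xml"]})
print(config["maxItemsPerFeed"], config["delivery"])
```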

Input Example

{
  "feedUrls": [
    "https://blog.google/rss/",
    "https://openai.com/news/rss.xml"
  ],
  "keywords": ["AI", "agents"],
  "maxItemsPerFeed": 10,
  "deduplicate": true
}

Input Examples

Example: Single feed

{
  "feedUrls": [
    "https://example.com/feed.xml"
  ]
}

Example: Multi-feed daily digest

{
  "feedUrls": [
    "https://hnrss.org/frontpage",
    "https://www.theverge.com/rss/index.xml"
  ],
  "maxItemsPerFeed": 50
}

Example: Delta-only run

{
  "feedUrls": [
    "https://example.com/feed.xml"
  ],
  "onlyNewSinceLastRun": true,
  "snapshotKey": "example-feed-state"
}

Output

| Field | Type | Description |
| --- | --- | --- |
| source | string | Feed URL that produced the row |
| title | string | Feed item title |
| link | string | Item URL for downstream extraction |
| pubDate | string | Original feed date |
| pubDateISO | string | ISO timestamp version of pubDate |
| description | string | Summary text from the feed |
| content | string | Encoded content when available |
| categories | array | Categories / tags from the feed |
| matchedKeywords | array | Keywords that matched the row |

Output Example

{
  "source": "https://openai.com/news/rss.xml",
  "title": "The next evolution of the Agents SDK",
  "link": "https://openai.com/index/the-next-evolution-of-the-agents-sdk",
  "pubDate": "Wed, 15 Apr 2026 10:00:00 GMT",
  "pubDateISO": "2026-04-15T10:00:00.000Z",
  "description": "OpenAI updates the Agents SDK with native sandbox execution...",
  "matchedKeywords": ["ai", "agents"]
}
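
The relationship between pubDate and pubDateISO in the example above can be reproduced with the Python standard library. This is a sketch of one plausible conversion, not the actor's code: it assumes RFC 822 style dates as used by RSS, treats a missing time zone as UTC, and does not handle Atom's RFC 3339 timestamps. The helper name pub_date_to_iso is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def pub_date_to_iso(pub_date: str) -> str:
    """Convert an RSS-style date string to a UTC ISO 8601 timestamp."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # Assumption: a date without zone information is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # Millisecond precision with a trailing Z, matching the example output.
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(pub_date_to_iso("Wed, 15 Apr 2026 10:00:00 GMT"))
# prints 2026-04-15T10:00:00.000Z
```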

First-run buyer experience

  1. Run Quickstart (2 publisher feeds).
  2. Confirm the actor returns recent item URLs plus matchedKeywords.
  3. Send article/news/blog links to Article Content Extractor.
  4. Send docs/product/policy links to Website Content Extractor.

Tips & Limitations

  • Start with a small set of high-trust feeds.
  • Keyword filtering is OR-based; any matched keyword keeps the item.
  • This actor is a feed discovery layer, not a full-content extractor.
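
The OR-based keyword rule can be sketched as follows. The matching details are assumptions rather than documented behavior: this version does case-insensitive substring matching over title and description and reports keywords in lowercase, which is consistent with the Output Example (input "AI" appears as "ai") but is not confirmed by it. The function names are hypothetical.

```python
def match_keywords(row: dict, keywords: list[str]) -> list[str]:
    """Return the lowercased keywords found in the row's title or description."""
    haystack = " ".join(
        str(row.get(field, "")) for field in ("title", "description")
    ).lower()
    matched = []
    for keyword in keywords:
        needle = keyword.lower()
        if needle and needle in haystack and needle not in matched:
            matched.append(needle)
    return matched


def filter_rows(rows: list[dict], keywords: list[str]) -> list[dict]:
    """OR semantics: keep a row when at least one keyword matches."""
    if not keywords:
        return rows  # an empty include-list means no filtering
    kept = []
    for row in rows:
        matched = match_keywords(row, keywords)
        if matched:
            kept.append({**row, "matchedKeywords": matched})
    return kept


sample_rows = [
    {
        "title": "The next evolution of the Agents SDK",
        "description": "OpenAI updates the Agents SDK with native sandbox execution...",
    },
    {"title": "Quarterly earnings call", "description": "Financial results."},
]
filtered = filter_rows(sample_rows, ["AI", "agents"])
print([row["matchedKeywords"] for row in filtered])
```

Note that substring matching is broad: "ai" matches inside "OpenAI" here. If you need whole-word matches, post-filter the dataset on your side.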

FAQ

How is this different from Google News Scraper?

Use RSS Feed Aggregator when you already know the publishers you trust. Use Google News Scraper when you want broader query-based discovery.

Can I see why an item matched?

Yes — filtered rows include a matchedKeywords array.

Can I get full article text here?

No. Use Article Content Extractor or Website Content Extractor on the returned links.

Content Intelligence Pack handoffs:

Cost

Pay Per Event:

  • actor-start: $0.01
  • dataset-item: $0.002 per output item
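
For budgeting, the two event prices above combine into a simple estimate. The sketch assumes one dataset-item event per output row and one actor-start event per run; whether dry runs or webhook delivery are billed differently is not stated here. The function names are hypothetical.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.01")   # price per actor-start event, in USD
DATASET_ITEM = Decimal("0.002")  # price per dataset-item event, in USD


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Estimated cost in USD for a number of output items across runs."""
    return ACTOR_START * runs + DATASET_ITEM * items


def max_items(feed_count: int, max_items_per_feed: int) -> int:
    """Upper bound on output rows before filtering and deduplication."""
    return feed_count * max_items_per_feed


# The Input Example above: 2 feeds with maxItemsPerFeed = 10.
upper_bound = estimate_cost(max_items(2, 10))
print(f"${upper_bound:.2f}")  # at most 20 items in one run
```

Keyword filtering and deduplication can only reduce the number of output rows, so this figure is an upper bound for a single run under the stated assumptions.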

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store.