📡 RSS & Atom Feed Extractor
Aggregate public RSS feeds into structured JSON datasets to discover fresh website URLs from blogs and newsrooms for downstream web scrapers.
Pricing
Pay per event
Developer
太郎 山田
Maintained by Community
Actor stats
Bookmarked: 1
Total users: 6
Monthly active users: 2
Last modified: 4 days ago
📡 RSS Feed Aggregator
Parse, filter, and extract data from any public RSS or Atom XML feed to supply your web scraping pipelines. This RSS and Atom Feed Extractor is a discovery surface: it aggregates new URLs from known publisher blogs, company update logs, and tech newsrooms, then hands them to specialized downstream scrapers. It needs no browser automation.

Instead of crawling an entire website to find new pages, you provide a list of trusted feed URLs. The actor parses the XML, applies your keyword filters, and returns only the items that match your themes. This makes it a lightweight feeder for daily or weekly monitoring of targeted sources.

The resulting dataset includes the discovered URLs, the originating feed, publication dates, and the keywords that matched each item. Because the output is already structured and filtered, you can pass it straight to an article content extractor or a generic web scraper to fetch the full DOM, text content, and metadata. Isolating URL discovery in a dedicated feed aggregator means downstream scrapers fetch only relevant, fresh pages, which lowers scraping costs and reduces the number of requests that can get blocked.
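To make the "parse the XML" step concrete, here is a minimal sketch of reading RSS 2.0 and Atom items with Python's standard library. It is an illustration only, not the actor's actual implementation: the function name `parse_feed`, the sample feeds, and the choice of Atom `updated` and `summary` as the date and description are assumptions.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Shipping AI agents</title>
      <link>https://example.com/posts/ai-agents</link>
      <pubDate>Wed, 15 Apr 2026 10:00:00 GMT</pubDate>
      <description>How we ship agents.</description>
    </item>
  </channel>
</rss>"""

SAMPLE_ATOM = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Updates</title>
  <entry>
    <title>Release notes</title>
    <link rel="alternate" href="https://example.com/updates/1"/>
    <updated>2026-04-15T10:00:00Z</updated>
    <summary>Bug fixes.</summary>
  </entry>
</feed>"""


def parse_feed(xml_text, source):
    """Return one dict per feed item, using the output field names from this README."""
    root = ET.fromstring(xml_text)
    rows = []
    if root.tag == ATOM_NS + "feed":
        # Atom: entries are namespaced and the URL lives in the link's href attribute.
        for entry in root.findall(ATOM_NS + "entry"):
            link_el = entry.find(ATOM_NS + "link")
            rows.append({
                "source": source,
                "title": entry.findtext(ATOM_NS + "title", default=""),
                "link": link_el.get("href", "") if link_el is not None else "",
                "pubDate": entry.findtext(ATOM_NS + "updated", default=""),
                "description": entry.findtext(ATOM_NS + "summary", default=""),
            })
    else:
        # RSS 2.0: items are un-namespaced children of <channel>.
        for item in root.iter("item"):
            rows.append({
                "source": source,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "pubDate": item.findtext("pubDate", default=""),
                "description": item.findtext("description", default=""),
            })
    return rows
```

Real feeds vary more than these samples (multiple Atom links, `content:encoded`, missing dates), so treat this as a starting point rather than a complete parser.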
Store Quickstart
- Start with Quickstart (2 publisher feeds) for a reliable first run.
- Use Multi-Source Monitoring to watch several feeds with keyword filters.
- Use RSS → Article Cleanup when the next step is article extraction.
Where this actor fits
| Surface | Best for |
|---|---|
| RSS Feed Aggregator | Discover fresh URLs from known publishers and blogs |
| Google News Scraper | Discover fresh URLs from query-based Google News searches |
| Article Content Extractor | Clean discovered article/news/blog pages |
| Website Content Extractor | Clean discovered docs, pricing, policy, or product pages |
Key Features
- 📡 Feed discovery — Aggregate multiple public RSS/Atom feeds in one run
- 🔍 Keyword filtering — Keep only the rows that match the themes you care about
- 🏷️ Match visibility — Returns `matchedKeywords` for filtered rows
- 🔄 Deduplication — Remove duplicate links across feeds
- ⚡ Low-friction first run — Great for recurring monitoring of known sources
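The deduplication feature can be pictured as a first-seen-wins pass over the combined rows. The sketch below is an assumption about how such a pass might work, not the actor's documented behavior; in particular, the normalization (trimming whitespace and dropping URL fragments) and the helper name `deduplicate` are hypothetical.

```python
from urllib.parse import urldefrag


def deduplicate(rows):
    """Keep the first row seen for each link, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        # Assumed normalization: trim whitespace and ignore "#fragment" suffixes.
        key = urldefrag(row["link"].strip()).url
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique
```

First-seen-wins keeps the `source` of whichever feed listed the link first, which matters when the same story is syndicated across several feeds.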
Use Cases
| Who | Why |
|---|---|
| PR / comms teams | Track publisher and company newsroom feeds |
| Competitive intelligence | Watch competitor blogs and product update feeds |
| Content ops | Build filtered story queues from trusted sources |
| AI / RAG teams | Maintain a fresh URL stream before deeper extraction |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `feedUrls` | string[] | required | Public RSS/Atom URLs (max 50) |
| `keywords` | string[] | `[]` | Optional include-list filter |
| `maxItemsPerFeed` | integer | `25` | Max items to keep from each feed |
| `deduplicate` | boolean | `true` | Remove duplicate links across feeds |
| `timeoutMs` | integer | `15000` | Request timeout (ms) |
| `delivery` | string | `dataset` | `dataset` or `webhook` |
| `webhookUrl` | string | — | Webhook target when `delivery=webhook` |
| `dryRun` | boolean | `false` | Run without saving |
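If you build inputs programmatically, a small pre-flight check can catch mistakes before a run is started. The sketch below applies the defaults and limits from the table above; the function name `normalize_input` and the specific error messages are hypothetical, and the actor may validate differently.

```python
# Defaults copied from the Input table above.
DEFAULTS = {
    "keywords": [],
    "maxItemsPerFeed": 25,
    "deduplicate": True,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
}
MAX_FEEDS = 50  # "max 50" per the feedUrls description


def normalize_input(raw):
    """Return a copy of raw with table defaults applied; raise ValueError on bad input."""
    feed_urls = raw.get("feedUrls")
    if not feed_urls:
        raise ValueError("feedUrls is required")
    if len(feed_urls) > MAX_FEEDS:
        raise ValueError(f"feedUrls accepts at most {MAX_FEEDS} URLs")

    merged = {**DEFAULTS, **raw}
    merged["keywords"] = list(merged["keywords"])  # avoid sharing the default list

    if merged["delivery"] not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if merged["delivery"] == "webhook" and not merged.get("webhookUrl"):
        raise ValueError("webhookUrl is required when delivery is 'webhook'")
    return merged
```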
Input Example
{"feedUrls": ["https://blog.google/rss/","https://openai.com/news/rss.xml"],"keywords": ["AI", "agents"],"maxItemsPerFeed": 10,"deduplicate": true}
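One way to submit this input from code is the `apify-client` Python package. The following is a sketch under stated assumptions: `ACTOR_ID` is a placeholder because this page does not state the actor's ID, and the helper name `run_feed_extractor` is hypothetical.

```python
# Placeholder: replace with the actor ID shown on its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"

# Same values as the Input Example above.
RUN_INPUT = {
    "feedUrls": ["https://blog.google/rss/", "https://openai.com/news/rss.xml"],
    "keywords": ["AI", "agents"],
    "maxItemsPerFeed": 10,
    "deduplicate": True,
}


def run_feed_extractor(token, actor_id=ACTOR_ID, run_input=RUN_INPUT):
    """Start the actor, wait for it to finish, and return its dataset rows."""
    # Imported here so the module can be loaded without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_feed_extractor(token)` with your own API token should return rows shaped like the Output section describes, assuming `delivery` is left at `dataset`.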
Input Examples
Example: Single feed
{"feedUrls": ["https://example.com/feed.xml"]}
Example: Multi-feed daily digest
{"feedUrls": ["https://hnrss.org/frontpage","https://www.theverge.com/rss/index.xml"],"maxItemsPerFeed": 50}
Example: Delta-only run (onlyNewSinceLastRun and snapshotKey are not listed in the Input table; confirm they are supported before relying on them)
{"feedUrls": ["https://example.com/feed.xml"],"onlyNewSinceLastRun": true,"snapshotKey": "example-feed-state"}
Output
| Field | Type | Description |
|---|---|---|
| `source` | string | Feed URL that produced the row |
| `title` | string | Feed item title |
| `link` | string | Item URL for downstream extraction |
| `pubDate` | string | Original feed date |
| `pubDateISO` | string | ISO timestamp version of `pubDate` |
| `description` | string | Summary text from the feed |
| `content` | string | Encoded content when available |
| `categories` | array | Categories / tags from the feed |
| `matchedKeywords` | array | Keywords that matched the row |
Output Example
{"source": "https://openai.com/news/rss.xml","title": "The next evolution of the Agents SDK","link": "https://openai.com/index/the-next-evolution-of-the-agents-sdk","pubDate": "Wed, 15 Apr 2026 10:00:00 GMT","pubDateISO": "2026-04-15T10:00:00.000Z","description": "OpenAI updates the Agents SDK with native sandbox execution...","matchedKeywords": ["ai", "agents"]}
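The relationship between `pubDate` and `pubDateISO` in this example can be reproduced with the standard library. This is a sketch of one possible conversion, not necessarily the one the actor uses; the helper name `to_iso_utc` is hypothetical, and treating zone-less dates as UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_iso_utc(pub_date):
    """Convert an RSS-style (RFC 822) date string to an ISO 8601 UTC timestamp."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # Assumption: dates without a usable zone are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # Millisecond precision with a trailing "Z", matching the example above.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


print(to_iso_utc("Wed, 15 Apr 2026 10:00:00 GMT"))  # 2026-04-15T10:00:00.000Z
```

Atom feeds already use RFC 3339 timestamps, so they would need a different parsing path than this RSS-oriented helper.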
First-run buyer experience
- Run Quickstart (2 publisher feeds).
- Confirm the actor returns recent item URLs plus `matchedKeywords`.
- Send article/news/blog links to Article Content Extractor.
- Send docs/product/policy links to Website Content Extractor.
Tips & Limitations
- Start with a small set of high-trust feeds.
- Keyword filtering is OR-based; any matched keyword keeps the item.
- This actor is a feed discovery layer, not a full-content extractor.
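OR-based filtering means an item is kept as soon as any one keyword matches. The sketch below illustrates that rule under several assumptions that this page does not confirm: matching is a case-insensitive substring test, it looks at `title` and `description` only, and matched keywords are reported in lowercase (as the Output Example suggests). The helper names are hypothetical.

```python
def match_keywords(row, keywords):
    """Return the lowercased keywords that occur in the row's title or description."""
    haystack = " ".join([row.get("title", ""), row.get("description", "")]).lower()
    return [kw.lower() for kw in keywords if kw.lower() in haystack]


def filter_rows(rows, keywords):
    """Keep rows where at least one keyword matches (OR semantics)."""
    if not keywords:
        # An empty include-list is treated as "no filter".
        return list(rows)
    kept = []
    for row in rows:
        matched = match_keywords(row, keywords)
        if matched:  # any single match is enough
            kept.append({**row, "matchedKeywords": matched})
    return kept
```

With substring matching, a short keyword such as "AI" also matches inside longer words such as "OpenAI", so longer or more specific keywords give cleaner results if the actor behaves this way.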
FAQ
How is this different from Google News Scraper?
Use RSS Feed Aggregator when you already know the publishers you trust. Use Google News Scraper when you want broader query-based discovery.
Can I see why an item matched?
Yes — filtered rows include a matchedKeywords array.
Can I get full article text here?
No. Use Article Content Extractor or Website Content Extractor on the returned links.
Related Actors
Content Intelligence Pack handoffs:
- 📰 Article Content Extractor — clean discovered article/news/blog pages
- 📄 Website Content Extractor — clean discovered non-article pages
- 📰 Google News Scraper — query-based discovery when you do not have feed URLs yet
Cost
Pay Per Event:
- actor-start: $0.01
- dataset-item: $0.002 per output item
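As a worked example of these prices, a run that outputs 100 items would cost $0.01 + 100 × $0.002 = $0.21. The sketch below generalizes that arithmetic; it assumes exactly one `actor-start` event per run and no other charges, which this page does not state explicitly. The function name `estimate_run_cost` is hypothetical.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.01")    # charged once per run (assumed)
DATASET_ITEM_USD = Decimal("0.002")  # charged per output item


def estimate_run_cost(output_items):
    """Estimate the cost in USD of one run that produces output_items rows."""
    if output_items < 0:
        raise ValueError("output_items must be non-negative")
    return ACTOR_START_USD + DATASET_ITEM_USD * output_items
```

Because billing is per output item, tighter keyword filters and a lower `maxItemsPerFeed` directly reduce the cost of each run.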
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store.