RSS & News Feed Aggregator — Multi-Source Article Scraper
Pricing
Pay per usage
Aggregate and parse RSS/Atom feeds from any source. Extract articles with titles, descriptions, authors, dates, images. Optionally fetch full article content. Perfect for news monitoring and AI pipelines. $0.0005/article.
Developer: Ken Digital
Aggregate and parse multiple RSS 2.0 and Atom feeds into clean, structured data. Built for monitoring, curation, and intelligence workflows.
Features
- Multi-format support — RSS 2.0, RSS 1.0 (RDF), and Atom feeds
- Namespace handling — Parses `media:`, `dc:`, `content:encoded`, and standard Atom namespaces
- Full content extraction — Optionally follows article links and strips HTML to plain text
- Robust parsing — Handles encoding issues, CDATA blocks, malformed dates, and missing fields
- Structured output — Consistent schema across all feed types
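The namespace handling above can be sketched with the same stdlib parser the actor uses (`xml.etree.ElementTree`). The feed snippet and field mapping below are illustrative only, not the actor's actual code:

```python
import xml.etree.ElementTree as ET

# A hypothetical minimal RSS item using dc: and content: namespaces.
RSS_ITEM = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel><item>
    <title>Example headline</title>
    <dc:creator>Jane Doe</dc:creator>
    <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
  </item></channel>
</rss>"""

# Prefix-to-URI map passed to find/findtext so namespaced tags resolve.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

item = ET.fromstring(RSS_ITEM).find("channel/item")
title = item.findtext("title")
author = item.findtext("dc:creator", namespaces=NS)
body = item.findtext("content:encoded", namespaces=NS)  # CDATA arrives as plain text
print(title, author, body)
```

Note that `ElementTree` unwraps CDATA automatically, so `content:encoded` comes back as a ready-to-strip HTML string.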
Output Schema
Each article in the dataset contains:
| Field | Type | Description |
|---|---|---|
| feedUrl | string | Source feed URL |
| feedTitle | string | Feed/channel title |
| title | string | Article title |
| link | string | Article permalink |
| description | string | Article summary (HTML stripped, max 5000 chars) |
| author | string | Author name |
| publishedDate | string | ISO 8601 publication date |
| categories | string[] | Tags/categories from the feed |
| imageUrl | string | Thumbnail or featured image URL |
| guid | string | Unique identifier (GUID or permalink) |
| fullContent | string | Full article text (only when fetchFullContent is enabled) |
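As a rough illustration of how the description field's processing works (HTML stripped, `script`/`style`/`noscript` blocks removed, capped at 5000 characters), here is a stdlib-only sketch. The actor's actual stripping logic is not published, so treat this as an approximation:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content, skipping <script>/<style>/<noscript> blocks."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # >0 while inside a skipped block

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def strip_html(raw: str, max_chars: int = 5000) -> str:
    parser = TextExtractor()
    parser.feed(raw)
    # Collapse runs of whitespace, then enforce the length cap.
    text = " ".join("".join(parser.parts).split())
    return text[:max_chars]

desc = strip_html("<p>Breaking: <b>news</b></p><script>track()</script>")
print(desc)
```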
Input Parameters
```json
{
  "feedUrls": [
    "https://feeds.bbci.co.uk/news/rss.xml",
    "https://rss.nytimes.com/services/xml/rss/index.xml",
    "https://hnrss.org/frontpage"
  ],
  "maxArticles": 100,
  "fetchFullContent": false
}
```
- feedUrls (required) — Array of RSS/Atom feed URLs to aggregate
- maxArticles (default: 100) — Maximum articles to output. Set to 0 for unlimited.
- fetchFullContent (default: false) — Follow article links and extract full text content
Use Cases
📡 News Monitoring & Media Intelligence
Track coverage across dozens of news outlets. Monitor specific topics by aggregating topic-specific RSS feeds from major publishers. Feed results into sentiment analysis or trend detection pipelines.
📋 Content Curation & Newsletters
Aggregate content from niche blogs, industry publications, and thought leaders into a single dataset. Use as the data source for automated newsletter generation or content recommendation systems.
🔍 Competitive Intelligence
Subscribe to competitor blogs, press release feeds, and industry news. Get structured alerts when new content is published. Combine with keyword filtering for targeted monitoring.
📊 Research & Dataset Building
Build timestamped article datasets for NLP research, media studies, or training data collection. The consistent schema makes downstream processing straightforward.
🤖 AI Pipeline Input
Use as a data source for LLM-powered summarization, classification, or knowledge base updates. The structured output integrates directly with vector databases and RAG pipelines.
⏰ Scheduled Monitoring
Run on a schedule (hourly, daily) with Apify's scheduling feature. Combine with deduplication logic downstream to maintain a continuously updated article database.
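A minimal downstream deduplication pass, keyed on the guid field from the output schema (falling back to link when guid is missing), might look like the sketch below. The batch data is made up for illustration:

```python
def dedupe(articles):
    """Keep the first occurrence of each article, keyed by guid
    (falling back to link when guid is absent)."""
    seen = set()
    unique = []
    for article in articles:
        key = article.get("guid") or article.get("link")
        if key and key not in seen:
            seen.add(key)
            unique.append(article)
    return unique

# Hypothetical batch from two scheduled runs with one overlapping item.
batch = [
    {"guid": "a1", "title": "First"},
    {"guid": "a1", "title": "First (repeat from previous run)"},
    {"guid": "a2", "title": "Second"},
]
unique = dedupe(batch)
print([a["guid"] for a in unique])
```

In a real pipeline you would persist the seen-key set (e.g. in a key-value store) between scheduled runs rather than rebuilding it per batch.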
Technical Notes
- Uses Python stdlib `xml.etree.ElementTree` for XML parsing (no lxml dependency)
- HTTP requests via `httpx` with async support and configurable timeouts
- Handles BOM-prefixed feeds and common encoding edge cases
- Date parsing supports RFC 822 (RSS) and ISO 8601 (Atom) formats
- Full content extraction removes `<script>`, `<style>`, and `<noscript>` blocks before stripping HTML
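Both date formats mentioned above can be normalized to ISO 8601 with the standard library alone. This helper is a sketch of that idea, not the actor's implementation:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_feed_date(raw: str) -> str:
    """Normalize an RFC 822 (RSS pubDate) or ISO 8601 (Atom) date
    string to an ISO 8601 timestamp in UTC."""
    try:
        dt = parsedate_to_datetime(raw)  # RFC 822, e.g. "Mon, 06 Sep 2021 ..."
    except (TypeError, ValueError):
        # Fall back to ISO 8601; map trailing "Z" for pre-3.11 fromisoformat.
        dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC for naive dates
    return dt.astimezone(timezone.utc).isoformat()

print(parse_feed_date("Mon, 06 Sep 2021 12:00:00 GMT"))
print(parse_feed_date("2021-09-06T12:00:00Z"))
```

A real parser would also need fallbacks for the malformed dates the actor advertises handling; those heuristics are feed-specific and omitted here.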
Pricing
$0.0005 per article parsed and pushed to the dataset.
Example Feeds to Get Started
| Feed | URL |
|---|---|
| BBC News | https://feeds.bbci.co.uk/news/rss.xml |
| Hacker News | https://hnrss.org/frontpage |
| TechCrunch | https://techcrunch.com/feed/ |
| ArXiv CS.AI | http://arxiv.org/rss/cs.AI |
| Reddit r/technology | https://www.reddit.com/r/technology/.rss |
🔗 More Scrapers by Ken Digital
| Scraper | What it does | Price |
|---|---|---|
| YouTube Channel Scraper | Videos, stats, metadata | $0.001/video |
| France Job Scraper | WTTJ + France Travail + Hellowork | $0.005/job |
| France Real Estate Scraper | 5 sources + DVF price analysis | $0.008/listing |
| Website Content Crawler | HTML → Markdown for AI/RAG | $0.001/page |
| Google Trends Scraper | Keywords, regions, related queries | $0.002/keyword |
| GitHub Repo Scraper | Stars, forks, languages, topics | $0.002/repo |
| RSS News Aggregator | Multi-source feed parsing | $0.0005/article |
| Instagram Profile Scraper | Followers, bio, posts | $0.0015/profile |
| Google Maps Scraper | Businesses, reviews, contacts | $0.002/result |
| TikTok Scraper | Videos, likes, shares | $0.001/video |
| Google SERP Scraper | Search results, PAA, snippets | $0.003/search |
| Trustpilot Scraper | Reviews, ratings, sentiment | $0.001/review |
🔗 Quick Integration
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("joyouscam35875/rss-news-aggregator").call(run_input={...})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('joyouscam35875/rss-news-aggregator').call({...});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
```
No-code: Make / Zapier / n8n
Search for this actor in the Apify connector. No code needed.