RSS & News Feed Aggregator — Multi-Source Article Scraper
Pricing
Pay per usage
Aggregate and parse RSS/Atom feeds from any source. Extract articles with titles, descriptions, authors, dates, images. Optionally fetch full article content. Perfect for news monitoring and AI pipelines. $0.0005/article.
Developer
Ken Digital
Aggregate and parse multiple RSS 2.0, RSS 1.0 (RDF), and Atom feeds into clean, structured data. Built for monitoring, curation, and intelligence workflows.
Features
- Multi-format support — RSS 2.0, RSS 1.0 (RDF), and Atom feeds
- Namespace handling — Parses `media:`, `dc:`, `content:encoded`, and standard Atom namespaces
- Full content extraction — Optionally follows article links and strips HTML to plain text
- Robust parsing — Handles encoding issues, CDATA blocks, malformed dates, and missing fields
- Structured output — Consistent schema across all feed types
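As a rough illustration of what "consistent schema across all feed types" can mean in practice, the sketch below maps RSS 2.0 items and Atom entries onto one record shape with the standard library. `parse_feed` and `_text` are hypothetical helpers, not the actor's actual code. RSS 1.0 (RDF) and the `media:` and `content:encoded` fields are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for prefixes used below (standard, publicly documented URIs).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
}
ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


def _text(node, path):
    """Return the stripped text of the first match, or '' when the field is missing."""
    found = node.find(path, NS)
    return (found.text or "").strip() if found is not None else ""


def parse_feed(xml_text, feed_url):
    """Hypothetical helper: map RSS 2.0 items or Atom entries onto one record shape."""
    root = ET.fromstring(xml_text)
    articles = []
    if root.tag == "rss":  # RSS 2.0: <rss><channel><item>
        channel = root.find("channel")
        if channel is None:
            return articles
        feed_title = _text(channel, "title")
        for item in channel.findall("item"):
            link = _text(item, "link")
            articles.append({
                "feedUrl": feed_url,
                "feedTitle": feed_title,
                "title": _text(item, "title"),
                "link": link,
                # Fall back to dc:creator when <author> is absent.
                "author": _text(item, "author") or _text(item, "dc:creator"),
                "guid": _text(item, "guid") or link,
            })
    elif root.tag == ATOM_FEED_TAG:  # Atom: <feed><entry>
        feed_title = _text(root, "atom:title")
        for entry in root.findall("atom:entry", NS):
            link_node = entry.find("atom:link", NS)
            href = link_node.get("href", "") if link_node is not None else ""
            articles.append({
                "feedUrl": feed_url,
                "feedTitle": feed_title,
                "title": _text(entry, "atom:title"),
                "link": href,
                "author": _text(entry, "atom:author/atom:name"),
                "guid": _text(entry, "atom:id") or href,
            })
    return articles
```

Missing fields come back as empty strings here; that is a choice made for the sketch, not documented behaviour of the actor.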
Output Schema
Each article in the dataset contains:
| Field | Type | Description |
|---|---|---|
| feedUrl | string | Source feed URL |
| feedTitle | string | Feed/channel title |
| title | string | Article title |
| link | string | Article permalink |
| description | string | Article summary (HTML stripped, max 5000 chars) |
| author | string | Author name |
| publishedDate | string | ISO 8601 publication date |
| categories | string[] | Tags/categories from the feed |
| imageUrl | string | Thumbnail or featured image URL |
| guid | string | Unique identifier (GUID or permalink) |
| fullContent | string | Full article text (only when fetchFullContent is enabled) |
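For downstream Python code, the field table can be mirrored as a typed record. The following is a minimal sketch: the `Article` class and the `missing_fields` helper are illustrative names, and whether the actor omits absent fields or emits empty values is not stated here, so the check treats every field except `fullContent` as expected.

```python
from typing import List, TypedDict


class Article(TypedDict, total=False):
    """One dataset item, mirroring the field table."""
    feedUrl: str
    feedTitle: str
    title: str
    link: str
    description: str
    author: str
    publishedDate: str
    categories: List[str]
    imageUrl: str
    guid: str
    fullContent: str  # present only when fetchFullContent is enabled


def missing_fields(item: dict) -> List[str]:
    """List schema fields absent from an item, ignoring the optional fullContent."""
    expected = [name for name in Article.__annotations__ if name != "fullContent"]
    return [name for name in expected if name not in item]
```

A check like `missing_fields(item)` can run right after reading the dataset to catch schema drift before items reach a database or model.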
Input Parameters
```json
{
  "feedUrls": [
    "https://feeds.bbci.co.uk/news/rss.xml",
    "https://rss.nytimes.com/services/xml/rss/index.xml",
    "https://hnrss.org/frontpage"
  ],
  "maxArticles": 100,
  "fetchFullContent": false
}
```
- feedUrls (required) — Array of RSS/Atom feed URLs to aggregate
- maxArticles (default: 100) — Maximum articles to output. Set to 0 for unlimited.
- fetchFullContent (default: false) — Follow article links and extract full text content
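The three parameters can be assembled and sanity-checked before a run. This is a sketch only: `build_input` and `apply_cap` are hypothetical helpers, and the validation rules (non-empty URL list, no negative cap) are assumptions rather than documented checks. `apply_cap` simply illustrates the stated rule that 0 means unlimited.

```python
from itertools import islice


def build_input(feed_urls, max_articles=100, fetch_full_content=False):
    """Assemble the actor input from the three documented parameters."""
    if not feed_urls:
        # Assumption: an empty list is treated as invalid because feedUrls is required.
        raise ValueError("feedUrls is required and must not be empty")
    if max_articles < 0:
        raise ValueError("maxArticles must be 0 (unlimited) or a positive integer")
    return {
        "feedUrls": list(feed_urls),
        "maxArticles": max_articles,
        "fetchFullContent": fetch_full_content,
    }


def apply_cap(articles, max_articles):
    """Illustrate the documented rule: 0 means unlimited, otherwise truncate."""
    if max_articles == 0:
        return list(articles)
    return list(islice(articles, max_articles))
```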
Use Cases
📡 News Monitoring & Media Intelligence
Track coverage across dozens of news outlets. Monitor specific topics by aggregating topic-specific RSS feeds from major publishers. Feed results into sentiment analysis or trend detection pipelines.
📋 Content Curation & Newsletters
Aggregate content from niche blogs, industry publications, and thought leaders into a single dataset. Use as the data source for automated newsletter generation or content recommendation systems.
🔍 Competitive Intelligence
Subscribe to competitor blogs, press release feeds, and industry news. Get structured alerts when new content is published. Combine with keyword filtering for targeted monitoring.
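Keyword filtering happens downstream of the actor. One possible sketch, assuming items follow the output schema and that matching on `title` and `description` is enough (`matches_keywords` is a hypothetical helper):

```python
import re


def matches_keywords(article, keywords):
    """True when any keyword occurs as a whole word in the title or description."""
    if not keywords:
        return False
    text = f"{article.get('title', '')} {article.get('description', '')}"
    # Escape each keyword so punctuation is matched literally.
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Word-boundary matching avoids hits inside longer words, but it will miss keywords that begin or end with punctuation (for example "C++"); a plain substring test may suit those better.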
📊 Research & Dataset Building
Build timestamped article datasets for NLP research, media studies, or training data collection. The consistent schema makes downstream processing straightforward.
🤖 AI Pipeline Input
Use as a data source for LLM-powered summarization, classification, or knowledge base updates. The structured output integrates directly with vector databases and RAG pipelines.
⏰ Scheduled Monitoring
Run on a schedule (hourly, daily) with Apify's scheduling feature. Combine with deduplication logic downstream to maintain a continuously updated article database.
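The downstream deduplication step could look like the sketch below. It assumes `guid` (falling back to `link`) identifies an article across runs; `dedupe` is a hypothetical helper, and persisting the `seen` set between scheduled runs (file, key-value store, database) is left to the caller.

```python
def dedupe(articles, seen=None):
    """Return (fresh_articles, seen), keeping the first article per guid or link."""
    seen = set() if seen is None else seen
    fresh = []
    for article in articles:
        key = article.get("guid") or article.get("link")
        if key and key in seen:
            continue  # already stored by this or an earlier run
        if key:
            seen.add(key)
        # Articles without any identifier are kept, since they cannot be compared.
        fresh.append(article)
    return fresh, seen
```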
Technical Notes
- Uses Python stdlib `xml.etree.ElementTree` for XML parsing (no lxml dependency)
- HTTP requests via `httpx` with async support and configurable timeouts
- Handles BOM-prefixed feeds and common encoding edge cases
- Date parsing supports RFC 822 (RSS) and ISO 8601 (Atom) formats
- Full content extraction removes `<script>`, `<style>`, and `<noscript>` blocks before stripping HTML
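Two of the notes above, date normalization and HTML stripping, can be sketched with the standard library alone. This is an illustration under assumptions, not the actor's implementation: `to_iso8601` and `html_to_text` are hypothetical names, dates without a timezone are treated as UTC, and the stripping is regex-based, which is adequate for a sketch but less robust than a real HTML parser.

```python
import html
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

_BLOCKS = re.compile(
    r"<(script|style|noscript)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
_TAGS = re.compile(r"<[^>]+>")


def to_iso8601(raw):
    """Normalize an RFC 822 or ISO 8601 date string to ISO 8601 in UTC, or '' on failure."""
    raw = (raw or "").strip()
    if not raw:
        return ""
    try:
        parsed = parsedate_to_datetime(raw)  # RFC 822 style, as used by RSS
    except (TypeError, ValueError):
        try:
            # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
            parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            return ""
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return parsed.astimezone(timezone.utc).isoformat()


def html_to_text(markup):
    """Drop script/style/noscript blocks, then remaining tags, then collapse whitespace."""
    without_blocks = _BLOCKS.sub(" ", markup)
    without_tags = _TAGS.sub(" ", without_blocks)
    return re.sub(r"\s+", " ", html.unescape(without_tags)).strip()
```

Removing the three block types first matters: stripping tags alone would leave inline JavaScript and CSS in the extracted text.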
Pricing
$0.0005 per article parsed and pushed to the dataset.
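For budgeting, the per-article price turns into a simple multiplication. A small sketch (`estimate_cost` is a hypothetical helper, and it covers only the per-article charge stated here):

```python
from decimal import Decimal

PRICE_PER_ARTICLE = Decimal("0.0005")  # USD per article, from the pricing line


def estimate_cost(articles_per_run, runs=1):
    """Estimated per-article charge in USD for a number of runs."""
    return PRICE_PER_ARTICLE * articles_per_run * runs
```

For example, `estimate_cost(100, runs=24 * 30)` models an hourly run of 100 articles over 30 days: 72,000 articles, or $36.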
Example Feeds to Get Started
| Feed | URL |
|---|---|
| BBC News | https://feeds.bbci.co.uk/news/rss.xml |
| Hacker News | https://hnrss.org/frontpage |
| TechCrunch | https://techcrunch.com/feed/ |
| ArXiv CS.AI | http://arxiv.org/rss/cs.AI |
| Reddit r/technology | https://www.reddit.com/r/technology/.rss |
🔗 More Scrapers by Ken Digital
| Scraper | What it does | Price |
|---|---|---|
| YouTube Channel Scraper | Videos, stats, metadata | $0.001/video |
| France Job Scraper | WTTJ + France Travail + Hellowork | $0.005/job |
| France Real Estate Scraper | 5 sources + DVF price analysis | $0.008/listing |
| Website Content Crawler | HTML → Markdown for AI/RAG | $0.001/page |
| Google Trends Scraper | Keywords, regions, related queries | $0.002/keyword |
| GitHub Repo Scraper | Stars, forks, languages, topics | $0.002/repo |
| RSS News Aggregator | Multi-source feed parsing | $0.0005/article |
| Instagram Profile Scraper | Followers, bio, posts | $0.0015/profile |
| Google Maps Scraper | Businesses, reviews, contacts | $0.002/result |
| TikTok Scraper | Videos, likes, shares | $0.001/video |
| Google SERP Scraper | Search results, PAA, snippets | $0.003/search |
| Trustpilot Scraper | Reviews, ratings, sentiment | $0.001/review |
🔗 Quick Integration
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
# Replace {...} with the input object described under Input Parameters
run = client.actor("joyouscam35875/rss-news-aggregator").call(run_input={...})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
// Replace {...} with the input object described under Input Parameters
const run = await client.actor('joyouscam35875/rss-news-aggregator').call({...});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
```
No-code: Make / Zapier / n8n
Search for this actor in the Apify connector. No code needed.