RSS & News Feed Aggregator — Multi-Source Article Scraper

Pricing: Pay per usage

Aggregates and parses RSS/Atom feeds from any number of sources, extracting each article's title, description, author, publication date, and image. Can optionally fetch the full article content. Suited to news monitoring and AI pipelines. $0.0005 per article.

Developer: Ken Digital
Maintained by Community

Aggregate and parse multiple RSS 2.0 and Atom feeds into clean, structured data. Built for monitoring, curation, and intelligence workflows.

Features

  • Multi-format support — RSS 2.0, RSS 1.0 (RDF), and Atom feeds
  • Namespace handling — Parses media:, dc:, content:encoded, and standard Atom namespaces
  • Full content extraction — Optionally follows article links and strips HTML to plain text
  • Robust parsing — Handles encoding issues, CDATA blocks, malformed dates, and missing fields
  • Structured output — Consistent schema across all feed types
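The multi-format and namespace handling listed above can be illustrated with the standard library's xml.etree.ElementTree, which the Technical Notes say the actor uses. This is a minimal sketch under that assumption, not the actor's code: `parse_feed` and its helpers are hypothetical names, and the sketch only extracts title, link, and author, choosing a branch from the root element of each format.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the RSS 1.0, RDF, Atom and Dublin Core specifications
ATOM = "{http://www.w3.org/2005/Atom}"
RSS1 = "{http://purl.org/rss/1.0/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"


def _text(el, path: str) -> str:
    """findtext() that never returns None and trims surrounding whitespace."""
    return (el.findtext(path) or "").strip()


def _atom_link(entry) -> str:
    for link in entry.findall(ATOM + "link"):
        # RFC 4287: a link without a rel attribute is treated as rel="alternate"
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return ""


def parse_feed(xml_text: str) -> list[dict]:
    """Hypothetical parser: return title/link/author for each item or entry."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":  # RSS 2.0: <rss><channel><item>
        return [
            {
                "title": _text(item, "title"),
                "link": _text(item, "link"),
                # <author> is often absent; many feeds use dc:creator instead
                "author": _text(item, "author") or _text(item, DC + "creator"),
            }
            for item in root.iter("item")
        ]
    if root.tag == RDF + "RDF":  # RSS 1.0: <item> elements sit beside <channel>
        return [
            {
                "title": _text(item, RSS1 + "title"),
                "link": _text(item, RSS1 + "link"),
                "author": _text(item, DC + "creator"),
            }
            for item in root.iter(RSS1 + "item")
        ]
    if root.tag == ATOM + "feed":  # Atom: <feed><entry>
        entries = []
        for entry in root.iter(ATOM + "entry"):
            author = entry.find(ATOM + "author")
            entries.append({
                "title": _text(entry, ATOM + "title"),
                "link": _atom_link(entry),
                "author": _text(author, ATOM + "name") if author is not None else "",
            })
        return entries
    raise ValueError(f"unrecognized feed root element: {root.tag}")
```

Each format keeps its items in a different place and namespace, so checking the root tag first keeps the three code paths separate while producing the same dict shape.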

Output Schema

Each article in the dataset contains:

| Field | Type | Description |
|---|---|---|
| feedUrl | string | Source feed URL |
| feedTitle | string | Feed/channel title |
| title | string | Article title |
| link | string | Article permalink |
| description | string | Article summary (HTML stripped, max 5000 chars) |
| author | string | Author name |
| publishedDate | string | ISO 8601 publication date |
| categories | string[] | Tags/categories from the feed |
| imageUrl | string | Thumbnail or featured image URL |
| guid | string | Unique identifier (GUID or permalink) |
| fullContent | string | Full article text (only when fetchFullContent is enabled) |
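For downstream code, the schema above can be mirrored as a Python TypedDict. The sketch also shows two rules the table states: descriptions are capped at 5000 characters, and guid falls back to the permalink. `Article` and `to_article` are hypothetical names, and the input dict `raw` is assumed to already have its HTML stripped; this is not the actor's implementation.

```python
from typing import TypedDict

MAX_DESCRIPTION = 5000  # limit stated in the Output Schema table


class Article(TypedDict, total=False):
    feedUrl: str
    feedTitle: str
    title: str
    link: str
    description: str
    author: str
    publishedDate: str
    categories: list[str]
    imageUrl: str
    guid: str
    fullContent: str  # only present when fetchFullContent is enabled


def to_article(feed_url: str, feed_title: str, raw: dict) -> Article:
    """Map a hypothetical, already-cleaned item dict onto the documented schema."""
    link = raw.get("link", "")
    article: Article = {
        "feedUrl": feed_url,
        "feedTitle": feed_title,
        "title": raw.get("title", ""),
        "link": link,
        "description": raw.get("description", "")[:MAX_DESCRIPTION],
        "author": raw.get("author", ""),
        "publishedDate": raw.get("publishedDate", ""),
        "categories": list(raw.get("categories", [])),
        "imageUrl": raw.get("imageUrl", ""),
        # Fall back to the permalink when the feed provides no GUID
        "guid": raw.get("guid") or link,
    }
    if "fullContent" in raw:
        article["fullContent"] = raw["fullContent"]
    return article
```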

Input Parameters

    {
      "feedUrls": [
        "https://feeds.bbci.co.uk/news/rss.xml",
        "https://rss.nytimes.com/services/xml/rss/index.xml",
        "https://hnrss.org/frontpage"
      ],
      "maxArticles": 100,
      "fetchFullContent": false
    }

  • feedUrls (required) — Array of RSS/Atom feed URLs to aggregate
  • maxArticles (default: 100) — Maximum number of articles to output. Set to 0 for unlimited.
  • fetchFullContent (default: false) — Follow each article link and extract its full text into the fullContent field (one extra request per article)
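When generating run inputs from code, a small helper keeps the parameter names and defaults in one place. This is a sketch: `build_run_input` and `limit_articles` are hypothetical, and rejecting an empty feedUrls list or a negative maxArticles is an assumption about validation, not documented behavior. `limit_articles` illustrates the documented rule that 0 means unlimited.

```python
from itertools import islice
from typing import Iterable


def build_run_input(feed_urls: Iterable[str], max_articles: int = 100,
                    fetch_full_content: bool = False) -> dict:
    """Assemble a run input using the documented parameter names and defaults."""
    urls = list(feed_urls)
    if not urls:
        raise ValueError("feedUrls is required")  # assumption: empty list rejected
    if max_articles < 0:
        raise ValueError("maxArticles must be >= 0")  # assumption
    return {"feedUrls": urls, "maxArticles": max_articles,
            "fetchFullContent": fetch_full_content}


def limit_articles(articles: Iterable[dict], max_articles: int) -> list[dict]:
    """Apply the maxArticles rule to a stream of articles: 0 means no limit."""
    if max_articles == 0:
        return list(articles)
    return list(islice(articles, max_articles))
```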

Use Cases

📡 News Monitoring & Media Intelligence

Track coverage across dozens of news outlets. Monitor specific topics by aggregating topic-specific RSS feeds from major publishers. Feed results into sentiment analysis or trend detection pipelines.

📋 Content Curation & Newsletters

Aggregate content from niche blogs, industry publications, and thought leaders into a single dataset. Use as the data source for automated newsletter generation or content recommendation systems.

🔍 Competitive Intelligence

Subscribe to competitor blogs, press release feeds, and industry news. Each run outputs the feeds' current items as structured records that can drive alerts when new content appears. Apply keyword filtering downstream for targeted monitoring.
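Keyword filtering is not one of the actor's input parameters, so it would run on the dataset items afterwards. A minimal sketch, assuming you match whole words in the title and description fields case-insensitively; `filter_by_keywords` is a hypothetical helper.

```python
import re
from typing import Iterable


def filter_by_keywords(items: Iterable[dict], keywords: list[str]) -> list[dict]:
    """Keep items whose title or description mentions any keyword as a whole word."""
    items = list(items)
    if not keywords:
        return items
    alternatives = "|".join(re.escape(k) for k in keywords)
    # (?<!\w) and (?!\w) act as word boundaries that also work for "C++" or ".NET"
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    return [
        item for item in items
        if pattern.search(f"{item.get('title') or ''} {item.get('description') or ''}")
    ]
```

Whole-word matching avoids false hits such as "main" inside "Maintenance"; drop the lookarounds if you want substring matches instead.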

📊 Research & Dataset Building

Build timestamped article datasets for NLP research, media studies, or training data collection. The consistent schema makes downstream processing straightforward.

🤖 AI Pipeline Input

Use as a data source for LLM-powered summarization, classification, or knowledge base updates. The consistent, structured output is straightforward to chunk and load into vector databases and RAG pipelines.
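One way to prepare the output for an embedding step is to turn each item into id/text/metadata records. This is a sketch under stated assumptions: `to_documents` and `chunk_text` are hypothetical, and the fixed-size character chunking, the 1000/100 defaults, and the `guid#n` id format are illustrative choices, not requirements of any particular vector database.

```python
from typing import Iterable


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows overlapping by `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def to_documents(items: Iterable[dict], size: int = 1000, overlap: int = 100) -> list[dict]:
    """Turn dataset items into id/text/metadata records ready for embedding."""
    docs = []
    for item in items:
        # Prefer the full text when fetchFullContent was enabled
        body = item.get("fullContent") or item.get("description") or ""
        text = f"{item.get('title') or ''}\n\n{body}".strip()
        if not text:
            continue
        key = item.get("guid") or item.get("link")
        for n, piece in enumerate(chunk_text(text, size, overlap)):
            docs.append({
                "id": f"{key}#{n}",
                "text": piece,
                "metadata": {
                    "feedTitle": item.get("feedTitle"),
                    "link": item.get("link"),
                    "publishedDate": item.get("publishedDate"),
                },
            })
    return docs
```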

⏰ Scheduled Monitoring

Run on a schedule (hourly, daily) with Apify's scheduling feature. Combine with deduplication logic downstream to maintain a continuously updated article database.
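Because each scheduled run returns whatever the feeds currently contain, the downstream deduplication can key on guid, which the schema describes as a unique identifier. A minimal sketch: `dedupe`, `load_seen`, and `save_seen` are hypothetical, and a local JSON file stands in for whatever storage you keep between runs.

```python
import json
from pathlib import Path
from typing import Iterable


def dedupe(items: Iterable[dict], seen: set[str]) -> list[dict]:
    """Return items not seen before, keyed on guid (falling back to link).

    `seen` is updated in place so it can be persisted between scheduled runs.
    Items with neither field are passed through because they cannot be keyed.
    """
    fresh = []
    for item in items:
        key = item.get("guid") or item.get("link")
        if not key:
            fresh.append(item)
            continue
        if key in seen:
            continue
        seen.add(key)
        fresh.append(item)
    return fresh


def load_seen(path: Path) -> set[str]:
    """Read previously seen keys; an absent file means a first run."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_seen(path: Path, seen: set[str]) -> None:
    path.write_text(json.dumps(sorted(seen)))
```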

Technical Notes

  • Uses Python stdlib xml.etree.ElementTree for XML parsing (no lxml dependency)
  • HTTP requests via httpx with async support and configurable timeouts
  • Handles BOM-prefixed feeds and common encoding edge cases
  • Date parsing supports RFC 822 (RSS) and ISO 8601 (Atom) formats
  • Full content extraction removes <script>, <style>, and <noscript> blocks before stripping HTML
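The date normalization and HTML stripping described in these notes can be approximated with the standard library alone. This is a sketch of the techniques, not the actor's code: `to_iso8601` and `html_to_text` are hypothetical names, and treating dates without a timezone as UTC and returning None for unparseable dates are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from typing import Optional


def to_iso8601(raw: str) -> Optional[str]:
    """Normalize an RFC 822 (RSS) or ISO 8601 (Atom) date string to ISO 8601 in UTC."""
    raw = raw.strip()
    try:
        dt = parsedate_to_datetime(raw)  # e.g. "Tue, 10 Jun 2025 14:30:00 GMT"
    except (TypeError, ValueError):  # Python < 3.10 raises TypeError here
        try:
            dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            return None  # malformed date: leave the field empty
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return dt.astimezone(timezone.utc).isoformat()


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside script/style/noscript."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags, drop script/style/noscript blocks, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Note that joining text nodes with spaces can split a word that straddles an inline tag (for example `<b>wor</b>ld`); a production extractor would treat inline and block elements differently.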

Pricing

$0.0005 per article parsed and pushed to the dataset.
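Cost scales linearly with the number of articles pushed. For example, an hourly schedule with the default maxArticles of 100 pushes at most 24 × 100 = 2,400 articles a day, or 2,400 × $0.0005 = $1.20. Deduplicating downstream does not lower this, since the charge applies to articles pushed to the dataset. The sketch counts only the per-article price stated above; `estimate_cost` is a hypothetical helper, and Decimal avoids floating-point rounding on sub-cent prices.

```python
from decimal import Decimal

PRICE_PER_ARTICLE = Decimal("0.0005")  # USD, from the Pricing section


def estimate_cost(articles: int) -> Decimal:
    """Estimated charge in USD for a given number of pushed articles."""
    return PRICE_PER_ARTICLE * articles
```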

Example Feeds to Get Started

| Feed | URL |
|---|---|
| BBC News | https://feeds.bbci.co.uk/news/rss.xml |
| Hacker News | https://hnrss.org/frontpage |
| TechCrunch | https://techcrunch.com/feed/ |
| arXiv cs.AI | http://arxiv.org/rss/cs.AI |
| Reddit r/technology | https://www.reddit.com/r/technology/.rss |

🔗 More Scrapers by Ken Digital

| Scraper | What it does | Price |
|---|---|---|
| YouTube Channel Scraper | Videos, stats, metadata | $0.001/video |
| France Job Scraper | WTTJ + France Travail + Hellowork | $0.005/job |
| France Real Estate Scraper | 5 sources + DVF price analysis | $0.008/listing |
| Website Content Crawler | HTML → Markdown for AI/RAG | $0.001/page |
| Google Trends Scraper | Keywords, regions, related queries | $0.002/keyword |
| GitHub Repo Scraper | Stars, forks, languages, topics | $0.002/repo |
| RSS News Aggregator | Multi-source feed parsing | $0.0005/article |
| Instagram Profile Scraper | Followers, bio, posts | $0.0015/profile |
| Google Maps Scraper | Businesses, reviews, contacts | $0.002/result |
| TikTok Scraper | Videos, likes, shares | $0.001/video |
| Google SERP Scraper | Search results, PAA, snippets | $0.003/search |
| Trustpilot Scraper | Reviews, ratings, sentiment | $0.001/review |

🔗 Quick Integration

Python

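    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")
    # Replace {...} with your run input, e.g. the JSON under Input Parameters
    run = client.actor("joyouscam35875/rss-news-aggregator").call(run_input={...})
    # Stream every article the run stored in its default dataset
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)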

Node.js

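    // Top-level await requires an ES module (e.g. a .mjs file)
    import { ApifyClient } from 'apify-client';
    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
    // Replace {...} with your run input, e.g. the JSON under Input Parameters
    const run = await client.actor('joyouscam35875/rss-news-aggregator').call({...});
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);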

No-code: Make / Zapier / n8n

Search for this actor in the Apify connector of your automation platform and configure the same input fields there.