RSS & News Feed Aggregator — Multi-Source Article Scraper

Pricing

Pay per usage

Aggregate and parse RSS/Atom feeds from any source. Extract articles with titles, descriptions, authors, dates, images. Optionally fetch full article content. Perfect for news monitoring and AI pipelines. $0.0005/article.

Rating: 0.0 (0 reviews)

Developer: Ken Digital (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 6 hours ago

Aggregate and parse multiple RSS 2.0 and Atom feeds into clean, structured data. Built for monitoring, curation, and intelligence workflows.

Features

  • Multi-format support — RSS 2.0, RSS 1.0 (RDF), and Atom feeds
  • Namespace handling — Parses media:, dc:, content:encoded, and standard Atom namespaces
  • Full content extraction — Optionally follows article links and strips HTML to plain text
  • Robust parsing — Handles encoding issues, CDATA blocks, malformed dates, and missing fields
  • Structured output — Consistent schema across all feed types
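The feature list above says three feed formats are mapped onto one schema. The sketch below shows one way to do that with the standard library's ElementTree. It is not the actor's source code. The helper names (`parse_feed`, `_text`, `_media_url`) and the fallback order between fields are assumptions. Dates are returned raw and descriptions are not HTML-stripped here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Atom, RSS 1.0/RDF and the common extension modules
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "rss1": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "media": "http://search.yahoo.com/mrss/",
}


def _text(el, path):
    """Stripped text of the first element matching `path`, or None."""
    found = el.find(path, NS)
    if found is None or found.text is None:
        return None
    return found.text.strip() or None


def _media_url(el):
    """URL from <media:content> or <media:thumbnail>, if either is present."""
    # Compare with None explicitly: an Element with no children is falsy.
    for tag in ("media:content", "media:thumbnail"):
        found = el.find(tag, NS)
        if found is not None and found.get("url"):
            return found.get("url")
    return None


def parse_feed(xml_bytes, feed_url):
    """Hypothetical helper: normalize RSS 2.0, RSS 1.0 (RDF) or Atom entries."""
    root = ET.fromstring(xml_bytes)
    articles = []
    if root.tag == f"{{{NS['atom']}}}feed":
        feed_title = _text(root, "atom:title")
        for entry in root.findall("atom:entry", NS):
            # Prefer rel="alternate"; a link without rel also means alternate
            link_el = entry.find("atom:link[@rel='alternate']", NS)
            if link_el is None:
                link_el = entry.find("atom:link", NS)
            articles.append({
                "feedUrl": feed_url,
                "feedTitle": feed_title,
                "title": _text(entry, "atom:title"),
                "link": link_el.get("href") if link_el is not None else None,
                "description": _text(entry, "atom:summary") or _text(entry, "atom:content"),
                "author": _text(entry, "atom:author/atom:name"),
                "publishedDate": _text(entry, "atom:published") or _text(entry, "atom:updated"),
                "categories": [c.get("term") for c in entry.findall("atom:category", NS) if c.get("term")],
                "imageUrl": _media_url(entry),
                "guid": _text(entry, "atom:id"),
            })
        return articles

    # RSS 2.0: <rss><channel><item>; RSS 1.0: <rdf:RDF> with namespaced items
    if root.tag == "rss":
        prefix = ""
        feed_title = _text(root, "channel/title")
        items = root.findall("channel/item")
    else:
        prefix = "rss1:"
        feed_title = _text(root, "rss1:channel/rss1:title")
        items = root.findall("rss1:item", NS)
    for item in items:
        link = _text(item, f"{prefix}link")
        articles.append({
            "feedUrl": feed_url,
            "feedTitle": feed_title,
            "title": _text(item, f"{prefix}title"),
            "link": link,
            "description": _text(item, f"{prefix}description") or _text(item, "content:encoded"),
            "author": _text(item, "author") or _text(item, "dc:creator"),
            "publishedDate": _text(item, "pubDate") or _text(item, "dc:date"),
            "categories": [c.text.strip() for c in item.findall("category") if c.text],
            "imageUrl": _media_url(item),
            "guid": _text(item, "guid") or link,
        })
    return articles
```

Using explicit `is None` checks matters here. Relying on truthiness would silently skip childless elements such as `<media:content url="..."/>`.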

Output Schema

Each article in the dataset contains:

| Field | Type | Description |
|---|---|---|
| feedUrl | string | Source feed URL |
| feedTitle | string | Feed/channel title |
| title | string | Article title |
| link | string | Article permalink |
| description | string | Article summary (HTML stripped, max 5000 chars) |
| author | string | Author name |
| publishedDate | string | ISO 8601 publication date |
| categories | string[] | Tags/categories from the feed |
| imageUrl | string | Thumbnail or featured image URL |
| guid | string | Unique identifier (GUID or permalink) |
| fullContent | string | Full article text (only when fetchFullContent is enabled) |
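For downstream Python code, the schema can be written as a `TypedDict`. The `finalize` helper below is hypothetical. It mirrors two rules stated in the table: descriptions are capped at 5000 characters, and `guid` falls back to the permalink. It is not taken from the actor's code.

```python
from typing import List, TypedDict

MAX_DESCRIPTION_CHARS = 5000  # cap stated for the description field


class Article(TypedDict, total=False):
    """One dataset item; field names and types mirror the schema table."""
    feedUrl: str
    feedTitle: str
    title: str
    link: str
    description: str
    author: str
    publishedDate: str  # ISO 8601 string
    categories: List[str]
    imageUrl: str
    guid: str
    fullContent: str  # present only when fetchFullContent is enabled


def finalize(article: Article) -> Article:
    """Hypothetical post-processing: cap the description, fall back guid to link."""
    out = article.copy()
    if "description" in out:
        out["description"] = out["description"][:MAX_DESCRIPTION_CHARS]
    if not out.get("guid") and out.get("link"):
        out["guid"] = out["link"]
    return out
```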

Input Parameters

{
  "feedUrls": [
    "https://feeds.bbci.co.uk/news/rss.xml",
    "https://rss.nytimes.com/services/xml/rss/index.xml",
    "https://hnrss.org/frontpage"
  ],
  "maxArticles": 100,
  "fetchFullContent": false
}
  • feedUrls (required) — Array of RSS/Atom feed URLs to aggregate
  • maxArticles (default: 100) — Maximum articles to output. Set to 0 for unlimited.
  • fetchFullContent (default: false) — Follow article links and extract full text content
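The sketch below builds an input object with the documented parameter names and defaults. It also illustrates the stated `maxArticles` rule, where 0 means no cap. The helper names are hypothetical. Rejecting negative values is an assumption, because the actor's behavior for them is not documented.

```python
from itertools import islice


def build_run_input(feed_urls, max_articles=100, fetch_full_content=False):
    """Assemble an actor input using the documented names and defaults."""
    if not feed_urls:
        raise ValueError("feedUrls is required and must not be empty")
    if max_articles < 0:
        # Assumption: negative limits are treated as invalid
        raise ValueError("maxArticles must be >= 0 (0 means unlimited)")
    return {
        "feedUrls": list(feed_urls),
        "maxArticles": max_articles,
        "fetchFullContent": fetch_full_content,
    }


def limit_articles(articles, max_articles):
    """Hypothetical illustration of the cap: 0 returns everything."""
    if max_articles == 0:
        return list(articles)
    return list(islice(articles, max_articles))
```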

Use Cases

📡 News Monitoring & Media Intelligence

Track coverage across dozens of news outlets. Monitor specific topics by aggregating topic-specific RSS feeds from major publishers. Feed results into sentiment analysis or trend detection pipelines.

📋 Content Curation & Newsletters

Aggregate content from niche blogs, industry publications, and thought leaders into a single dataset. Use as the data source for automated newsletter generation or content recommendation systems.

🔍 Competitive Intelligence

Subscribe to competitor blogs, press release feeds, and industry news. Get structured alerts when new content is published. Combine with keyword filtering for targeted monitoring.
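Keyword filtering is not described as an actor option, so it would run on the dataset items after the run finishes. The function below is a hypothetical sketch. It does a case-insensitive whole-word match over `title` and `description`. Keywords that end in punctuation, such as `C++`, do not fit `\b` word boundaries and would need a different pattern.

```python
import re


def filter_by_keywords(items, keywords, fields=("title", "description")):
    """Keep items whose chosen fields mention any keyword (case-insensitive)."""
    if not keywords:
        return list(items)  # no keywords: nothing to filter on
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    matched = []
    for item in items:
        # Missing or null fields are treated as empty strings
        text = " ".join(item.get(f) or "" for f in fields)
        if pattern.search(text):
            matched.append(item)
    return matched
```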

📊 Research & Dataset Building

Build timestamped article datasets for NLP research, media studies, or training data collection. The consistent schema makes downstream processing straightforward.

🤖 AI Pipeline Input

Use as a data source for LLM-powered summarization, classification, or knowledge base updates. Every item has the same fields, so articles can be embedded and loaded into vector databases for RAG pipelines without per-feed parsing.

⏰ Scheduled Monitoring

Run on a schedule (hourly, daily) with Apify's scheduling feature. Combine with deduplication logic downstream to maintain a continuously updated article database.
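Downstream deduplication for scheduled runs could key each article on `guid`, then `link`, with a hash of feed URL and title as a last resort. That fallback order is an assumption. Persisting the `seen` set between runs, for example in a key-value store, is left out of this sketch.

```python
import hashlib


def dedupe_key(item):
    """Prefer guid, then link; hash feedUrl + title as a last resort."""
    key = item.get("guid") or item.get("link")
    if key:
        return key
    raw = f"{item.get('feedUrl', '')}|{item.get('title', '')}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def new_articles(items, seen):
    """Return items whose key is not yet in `seen`, recording new keys.

    `seen` is any mutable set; duplicates within one run are dropped too.
    """
    fresh = []
    for item in items:
        key = dedupe_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```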

Technical Notes

  • Uses Python stdlib xml.etree.ElementTree for XML parsing (no lxml dependency)
  • HTTP requests via httpx with async support and configurable timeouts
  • Handles BOM-prefixed feeds and common encoding edge cases
  • Date parsing supports RFC 822 (RSS) and ISO 8601 (Atom) formats
  • Full content extraction removes <script>, <style>, and <noscript> blocks before stripping HTML
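The notes above cover three small normalization steps. The helpers below show one standard-library way to do each: strip a UTF-8 BOM, normalize RFC 822 and ISO 8601 dates, and drop script, style, and noscript blocks before removing tags. They are hypothetical and not the actor's implementation. Treating naive dates as UTC is an assumption. Regex-based HTML stripping is a simplification that a real HTML parser would handle more robustly.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html import unescape

_BLOCK_RE = re.compile(r"<(script|style|noscript)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark, if present."""
    return data[3:] if data.startswith(b"\xef\xbb\xbf") else data


def to_iso8601(value):
    """Normalize an RFC 822 (RSS) or ISO 8601 (Atom) date to UTC ISO 8601, else None."""
    if not value:
        return None
    value = value.strip()
    try:
        dt = parsedate_to_datetime(value)  # RFC 822/2822, e.g. RSS <pubDate>
    except (TypeError, ValueError):
        try:
            # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
            dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None  # malformed date: leave the field empty
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    return dt.astimezone(timezone.utc).isoformat()


def html_to_text(html):
    """Remove script/style/noscript blocks, then tags, then decode entities."""
    without_blocks = _BLOCK_RE.sub(" ", html)
    text = _TAG_RE.sub(" ", without_blocks)
    return _WS_RE.sub(" ", unescape(text)).strip()
```

Removing the script and style blocks first keeps their contents out of the text. Stripping tags alone would leave the JavaScript and CSS behind.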

Pricing

$0.0005 per article parsed and pushed to the dataset.
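To estimate spend, multiply the per-article price by the number of articles pushed. The sketch uses `Decimal` to avoid floating-point rounding. It counts only the per-article price stated here and assumes every article counts toward the charge. Other platform charges that may apply are not included. For example, 100 articles per run, hourly for 30 days, is 72,000 articles, or $36.

```python
from decimal import Decimal

PRICE_PER_ARTICLE = Decimal("0.0005")  # USD per article, from the pricing above


def estimate_cost(articles_per_run, runs):
    """Estimated per-article charge in USD across `runs` runs."""
    return PRICE_PER_ARTICLE * articles_per_run * runs


# maxArticles=100 caps one run at $0.05; hourly for 30 days is 720 runs
monthly = estimate_cost(100, 24 * 30)
```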

Example Feeds to Get Started

| Feed | URL |
|---|---|
| BBC News | https://feeds.bbci.co.uk/news/rss.xml |
| Hacker News | https://hnrss.org/frontpage |
| TechCrunch | https://techcrunch.com/feed/ |
| arXiv cs.AI | http://arxiv.org/rss/cs.AI |
| Reddit r/technology | https://www.reddit.com/r/technology/.rss |

🔗 More Scrapers by Ken Digital

| Scraper | What it does | Price |
|---|---|---|
| YouTube Channel Scraper | Videos, stats, metadata | $0.001/video |
| France Job Scraper | WTTJ + France Travail + Hellowork | $0.005/job |
| France Real Estate Scraper | 5 sources + DVF price analysis | $0.008/listing |
| Website Content Crawler | HTML → Markdown for AI/RAG | $0.001/page |
| Google Trends Scraper | Keywords, regions, related queries | $0.002/keyword |
| GitHub Repo Scraper | Stars, forks, languages, topics | $0.002/repo |
| RSS News Aggregator | Multi-source feed parsing | $0.0005/article |
| Instagram Profile Scraper | Followers, bio, posts | $0.0015/profile |
| Google Maps Scraper | Businesses, reviews, contacts | $0.002/result |
| TikTok Scraper | Videos, likes, shares | $0.001/video |
| Google SERP Scraper | Search results, PAA, snippets | $0.003/search |
| Trustpilot Scraper | Reviews, ratings, sentiment | $0.001/review |

👉 View all scrapers

🔗 Quick Integration

Python

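from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run_input = {"feedUrls": ["https://hnrss.org/frontpage"], "maxArticles": 100}
# Start the actor and wait for the run to finish
run = client.actor("joyouscam35875/rss-news-aggregator").call(run_input=run_input)
# Stream every article from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)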

Node.js

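import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const input = { feedUrls: ['https://hnrss.org/frontpage'], maxArticles: 100 };
// Start the actor and wait for the run to finish
const run = await client.actor('joyouscam35875/rss-news-aggregator').call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);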

No-code: Make / Zapier / n8n

Search for this actor in the Apify connector. No code needed.