AI-Powered RSS Aggregator & Summarizer

Pricing

from $0.50 / 1,000 results

Enterprise-grade RSS aggregator with AI-powered summarization. Collects, filters, and processes feeds from any source. Ideal for content analysis, news monitoring, and AI training. Features keyword filtering, metadata extraction, and structured output in JSON/CSV. Built with Hugging Face.

Rating: 5.0 (1)

Developer: PrimeParse

Maintained by Community

Actor stats: 0 bookmarked, 7 total users, 1 monthly active user, last modified 2 months ago

🌐 RSS Aggregator: AI-Powered RSS Aggregator & Summarizer

High-quality RSS Feed Aggregator & Processor for Content Teams, Researchers, and AI Engineers

Automatically aggregates RSS feeds, filters entries by keyword, extracts summaries, and optionally generates AI-powered summaries. The output is clean, structured, and ready for analysis or AI pipelines.

Built for:

  • Content aggregators & news monitoring teams
  • Researchers tracking academic papers and publications
  • AI/ML engineers building content datasets
  • Marketing teams monitoring industry trends
  • Data analysts processing feed data

✅ Smart keyword filtering
✅ AI-powered summarization (Hugging Face transformers)
✅ Multiple feed support (1-5 feeds recommended)
✅ Rich metadata extraction (date, author, tags, description)
✅ Rate limiting & respectful crawling
✅ AI-ready structured output

👉 Runs on Apify • No code required

🚀 Why This Aggregator

✔ Purpose-Built for RSS Processing
Intelligently aggregates and processes RSS feeds from any source: news sites, academic journals, blogs, corporate feeds.

✔ AI Summarization Ready
Optional integration with Hugging Face transformers (BART, Pegasus) for advanced AI-powered summarization of feed entries.

✔ Clean & Structured Output
Extracts only meaningful content (title, link, summary, author, tags, publication date), ready for analysis.

✔ Smart Keyword Filtering
Filter entries by custom keywords (case-insensitive) across title, summary, and tags for relevance.

✔ AI & ML Ready
Structured JSON/CSV output suited to RAG systems, LLM fine-tuning, or training datasets.

✔ Fast & Efficient
Powered by feedparser, which handles both RSS and Atom feeds. Lightweight and fast processing.

✔ Safe & Controlled Processing
Configurable rate limiting, entry limits per feed, and graceful error handling.
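
The keyword filtering described above can be pictured with a minimal sketch. It assumes plain substring matching over the title, summary, and tags; the Actor's actual matching rule is not documented here, and the `matches_keywords` helper and sample entries are hypothetical.

```python
from typing import Iterable


def matches_keywords(entry: dict, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs in the entry's title, summary, or tags.

    Assumption: case-insensitive substring matching. The Actor may match differently.
    """
    wanted = [k.casefold() for k in keywords if k.strip()]
    if not wanted:
        # An empty keyword list keeps every entry.
        return True
    searchable = " ".join(
        [entry.get("title", ""), entry.get("summary", ""), *entry.get("tags", [])]
    ).casefold()
    return any(k in searchable for k in wanted)


# Hypothetical entries, shaped like the Actor's output records.
entries = [
    {"title": "New AI chip announced", "summary": "", "tags": ["hardware"]},
    {"title": "Gardening tips", "summary": "Spring planting", "tags": ["garden"]},
    {"title": "Quarterly results", "summary": "", "tags": ["Machine Learning"]},
]
kept = [e for e in entries if matches_keywords(e, ["ai", "machine learning"])]
print([e["title"] for e in kept])
```

Substring matching is simple but loose: a short keyword such as "ai" would also match words like "training". A word-boundary match is a reasonable alternative if precision matters more than recall.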

💼 Use Cases

  • News monitoring: Track industry news and trends from multiple sources
  • Academic research: Aggregate papers from arXiv, PubMed, and other academic feeds
  • Content curation: Collect and filter relevant content for newsletters or blogs
  • AI training data: Generate clean datasets for LLM fine-tuning or RAG systems
  • Competitive intelligence: Monitor competitor blogs and news feeds
  • Market research: Track product announcements and industry updates

📊 Supported Sources

  • News feeds: TechCrunch, Reuters, BBC, Guardian, etc.
  • Academic feeds: arXiv, PubMed, academic journals
  • Blog feeds: Medium, WordPress, custom blog RSS
  • Corporate feeds: Company blogs, press releases, announcements
  • Any RSS/Atom feed: Standard-compliant feeds

⚙️ How It Works

  1. Provide RSS feed URLs (1-5 feeds recommended)
  2. Set custom keywords and processing options
  3. Optionally enable AI summarization
  4. Run the Actor
  5. Download clean, structured RSS datasets

🧩 Input Configuration

Example JSON Input

{
  "rssFeeds": [
    "https://arxiv.org/rss/cs.AI",
    "https://techcrunch.com/feed/"
  ],
  "maxEntriesPerFeed": 10,
  "keywords": [
    "AI",
    "machine learning",
    "artificial intelligence"
  ],
  "enableSummarization": true,
  "enableAISummarization": true,
  "aiModelName": "facebook/bart-large-cnn",
  "aiMaxLength": 1024,
  "aiMinLength": 50,
  "aiMaxSummaryLength": 150,
  "delayBetweenFeeds": 1.0
}

Key Options

  • rssFeeds: List of RSS feed URLs to aggregate (required, 1-5 recommended)
  • maxEntriesPerFeed: Maximum entries per feed (0 = unlimited, default: 10)
  • keywords: Custom keywords for filtering entries (case-insensitive, empty = all entries)
  • enableSummarization: Extract summary/description from feeds (default: true)
  • enableAISummarization: Use Hugging Face AI for advanced summarization (default: false)
  • aiModelName: Hugging Face model identifier (default: "facebook/bart-large-cnn")
  • aiMaxLength: Maximum input length for AI model (default: 1024 tokens)
  • aiMinLength: Minimum summary length (default: 50 tokens)
  • aiMaxSummaryLength: Maximum summary length (default: 150 tokens)
  • delayBetweenFeeds: Delay in seconds between feeds for rate limiting (default: 1.0)
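
For anyone generating the input programmatically, the options above can be assembled with a small helper. This is a sketch only: `build_input` is a hypothetical function, the empty default for `keywords` is an assumption (the list above only says that empty means all entries), and the length check is an added sanity check rather than documented Actor behavior.

```python
import copy
import json

# Defaults as listed under Key Options. The empty keywords default is an assumption.
DEFAULTS = {
    "maxEntriesPerFeed": 10,
    "keywords": [],
    "enableSummarization": True,
    "enableAISummarization": False,
    "aiModelName": "facebook/bart-large-cnn",
    "aiMaxLength": 1024,
    "aiMinLength": 50,
    "aiMaxSummaryLength": 150,
    "delayBetweenFeeds": 1.0,
}


def build_input(rss_feeds, **overrides):
    """Merge user overrides onto the documented defaults and return the input dict."""
    if not rss_feeds:
        raise ValueError("rssFeeds is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    config = {"rssFeeds": list(rss_feeds), **copy.deepcopy(DEFAULTS), **overrides}
    # Sanity check added for this sketch, not a documented rule of the Actor.
    if config["aiMinLength"] > config["aiMaxSummaryLength"]:
        raise ValueError("aiMinLength must not exceed aiMaxSummaryLength")
    return config


run_input = build_input(
    ["https://techcrunch.com/feed/"],
    keywords=["AI"],
    enableAISummarization=True,
)
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the Actor's input editor.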

📂 Output Dataset

All entries are stored in the default Apify dataset with the following structure:

Example Output Record

{
  "title": "Adobe hit with proposed class-action, accused of misusing authors' work in AI training",
  "link": "https://techcrunch.com/2025/12/17/adobe-hit-with-proposed-class-action-accused-of-misusing-authors-work-in-ai-training/",
  "published": "2025-12-18T00:44:55",
  "summary": "The lawsuit is just the latest in a string of copyright-related legal complaints aimed at the AI industry.",
  "feedTitle": "TechCrunch",
  "feedUrl": "https://techcrunch.com/feed/",
  "author": "Lucas Ropek",
  "tags": [
    "AI",
    "Adobe",
    "Anthropic",
    "artificial intelligence"
  ]
}

With AI Summarization

When enableAISummarization is set to true, the summary field contains the AI-generated summary:

{
  "title": "Breakthrough in Quantum Computing",
  "link": "https://example.com/quantum-breakthrough",
  "published": "2025-12-15T10:30:00",
  "summary": "Researchers achieve significant milestone in quantum error correction, bringing practical quantum computing closer to reality. The new method reduces error rates by 50%...",
  "feedTitle": "Science News",
  "feedUrl": "https://example.com/feed.xml",
  "author": "Dr. Jane Smith",
  "tags": ["quantum computing", "research", "technology"]
}
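
Records in this shape are straightforward to post-process once exported as JSON. The sketch below sorts records newest first and flattens the tags list into a single CSV column. The `records_to_csv` helper, the column selection, and the "; " tag separator are illustrative choices; the two records are shortened copies of the examples above.

```python
import csv
import io
from datetime import datetime

# Shortened copies of the two example records shown above.
records = [
    {
        "title": "Breakthrough in Quantum Computing",
        "link": "https://example.com/quantum-breakthrough",
        "published": "2025-12-15T10:30:00",
        "feedTitle": "Science News",
        "author": "Dr. Jane Smith",
        "tags": ["quantum computing", "research", "technology"],
    },
    {
        "title": "Adobe hit with proposed class-action, accused of misusing authors' work in AI training",
        "link": "https://techcrunch.com/2025/12/17/adobe-hit-with-proposed-class-action-accused-of-misusing-authors-work-in-ai-training/",
        "published": "2025-12-18T00:44:55",
        "feedTitle": "TechCrunch",
        "author": "Lucas Ropek",
        "tags": ["AI", "Adobe", "Anthropic", "artificial intelligence"],
    },
]

CSV_FIELDS = ["published", "feedTitle", "title", "author", "link", "tags"]


def records_to_csv(items):
    """Return CSV text with one row per record, newest first, tags joined by '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    newest_first = sorted(
        items, key=lambda r: datetime.fromisoformat(r["published"]), reverse=True
    )
    for record in newest_first:
        row = dict(record)
        row["tags"] = "; ".join(record.get("tags", []))
        writer.writerow(row)
    return buffer.getvalue()


print(records_to_csv(records))
```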

🤖 AI Summarization Models

Supported Hugging Face models for summarization:

  • facebook/bart-large-cnn (default): BART fine-tuned on CNN/DailyMail; suited to news articles and general content
  • google/pegasus-xsum: Pegasus fine-tuned on XSum (BBC articles); produces very short, often single-sentence news summaries
  • Any summarization model: Compatible with Hugging Face transformers

The Actor automatically falls back to basic extraction if AI summarization fails or is unavailable.
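
The fallback behavior can be illustrated with a short sketch. It is not the Actor's code: `summarize_entry` is a hypothetical helper, and the summarizer is passed in as a plain callable so the example runs without any model installed.

```python
from typing import Callable, Optional


def summarize_entry(
    description: str,
    ai_summarizer: Optional[Callable[[str], str]] = None,
) -> str:
    """Return an AI summary when possible, otherwise the feed's own description."""
    if ai_summarizer is not None:
        try:
            summary = ai_summarizer(description)
            if summary and summary.strip():
                return summary.strip()
        except Exception:
            # Model loading or inference failed: fall through to basic extraction.
            pass
    return description.strip()


# With transformers installed, a summarizer could be wired in roughly like this
# (an assumption: not run here, and availability depends on the installed version):
#   from transformers import pipeline
#   pipe = pipeline("summarization", model="facebook/bart-large-cnn")
#   ai = lambda text: pipe(text, max_length=150, min_length=50)[0]["summary_text"]


def broken_summarizer(text: str) -> str:
    raise RuntimeError("model unavailable")


print(summarize_entry("  Plain description from the feed.  ", broken_summarizer))
```

Catching every exception is deliberately broad here, so that one failing entry does not abort the whole run.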

🏁 Getting Started

Quick Start on Apify

  1. Click "Try for free" on Apify
  2. Paste RSS feed URLs (e.g., https://techcrunch.com/feed/)
  3. Customize keywords and options
  4. Optionally enable AI summarization
  5. Run and download your dataset

📈 Performance

  • Processing Speed: ~1-2 seconds per feed (depending on the number of entries)
  • Rate Limiting: Configurable delay between feeds (default: 1s)
  • Memory Efficient: Processes feeds sequentially
  • Scalability: Handles 1-5 feeds optimally (can process more)
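
Sequential processing with a delay between feeds and a per-feed entry cap might look like the following sketch. The `process_feeds` function and its injected `fetch_entries` callable are hypothetical. In the Actor the fetch step is presumably feedparser (for example `feedparser.parse(url).entries`), but that is an assumption.

```python
import time
from typing import Callable, Iterable, List


def process_feeds(
    feed_urls: Iterable[str],
    fetch_entries: Callable[[str], Iterable],
    max_entries_per_feed: int = 10,
    delay_between_feeds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Fetch feeds one at a time, pausing between them and capping entries per feed."""
    results: List = []
    for index, url in enumerate(feed_urls):
        if index > 0 and delay_between_feeds > 0:
            sleep(delay_between_feeds)  # rate limiting between consecutive feeds
        try:
            entries = list(fetch_entries(url))
        except Exception as exc:
            # A failing feed is skipped so the remaining feeds still get processed.
            print(f"Skipping {url}: {exc}")
            continue
        if max_entries_per_feed > 0:  # 0 means unlimited
            entries = entries[:max_entries_per_feed]
        results.extend(entries)
    return results


# Stand-in fetcher so the sketch runs offline.
FAKE_FEEDS = {"feed-a": ["a1", "a2", "a3"], "feed-b": ["b1", "b2"]}


def fake_fetch(url: str):
    return FAKE_FEEDS[url]  # raises KeyError for unknown feeds


print(process_feeds(["feed-a", "missing", "feed-b"], fake_fetch,
                    max_entries_per_feed=2, delay_between_feeds=0))
```

Processing one feed at a time keeps memory use flat, at the cost of total run time growing linearly with the number of feeds and the configured delay.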

🔧 Advanced Configuration

Custom AI Models

You can use any Hugging Face summarization model:

{
  "enableAISummarization": true,
  "aiModelName": "google/pegasus-xsum",
  "aiMaxLength": 2048,
  "aiMinLength": 100,
  "aiMaxSummaryLength": 200
}

📧 Support

Tags: RSS, feed aggregator, content processing, AI summarization, Hugging Face, news aggregation, feed parser, content analysis, RAG, LLM training, data extraction


Built with ❤️ on Apify