Multi-Source News & Content Scraper

Pricing

from $10.00 / 1,000 results

Rating: 0.0 (0)

Developer: Jamshaid Arif

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

📰 RSS Feeds: Multi-Source News & Content Scraper (Apify Actor)

Aggregates articles from multiple RSS/Atom feeds simultaneously. Includes 60+ pre-built news source presets and supports custom feed URLs. No API key required.

Features

  • 60+ preset feeds: TechCrunch, BBC, Reuters, Wired, NYT, The Guardian, Hacker News, arXiv, NASA, ESPN, Reddit, and many more.
  • Custom feed URLs: add any RSS/Atom feed alongside presets.
  • Keyword filtering: include articles matching specific keywords, or exclude unwanted terms.
  • Date filtering: only articles from the last N hours, or published after a specific date.
  • Category/tag filtering: filter by RSS categories or tags.
  • Deduplication: removes duplicate articles appearing across multiple feeds.
  • 3 output formats: enriched, minimal, raw.
  • Auto-retry: each feed is retried up to 2 times on failure.
  • Per-feed stats: the summary shows fetched/kept counts and status per source.
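
The deduplication feature could work along the following lines. This is a minimal sketch, not the Actor's actual logic: it assumes articles are dicts carrying the `link` and `guid` fields of the enriched output, and that two articles count as duplicates when their normalized links match. The names `dedupe_key` and `dedupe_articles` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def dedupe_key(article: dict) -> str:
    """Build a comparison key from the link, falling back to the guid (assumed rule)."""
    link = (article.get("link") or "").strip()
    if not link:
        return (article.get("guid") or "").strip().lower()
    parts = urlsplit(link)
    # Lowercase scheme and host, drop the trailing slash and the fragment, keep the query.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def dedupe_articles(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each key, preserving input order."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        key = dedupe_key(article)
        if key and key in seen:
            continue  # Same story already collected from another feed.
        seen.add(key)
        unique.append(article)
    return unique


sample = [
    {"title": "Story A", "link": "https://Example.com/post/"},
    {"title": "Story A (syndicated)", "link": "https://example.com/post#comments"},
    {"title": "Story B", "link": "https://example.com/other"},
]
unique_articles = dedupe_articles(sample)
```

Keeping the first occurrence means the feed listed earliest wins when the same story appears in several sources.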

Preset Categories

| Category | Sources |
| --- | --- |
| Tech News | TechCrunch, Wired, Ars Technica, The Verge, Mashable, Engadget, ZDNet, VentureBeat, MIT Tech Review |
| World News | BBC World, Reuters, Al Jazeera, NPR, CNN, NYT, The Guardian, WSJ, The Economist |
| Developer | Hacker News, Lobsters, DEV.to, HackerNoon, CSS-Tricks, Smashing Magazine, freeCodeCamp |
| Company Blogs | GitHub, AWS, Google AI, OpenAI, Anthropic |
| Languages | Python Insider, Go Blog, Rust Blog |
| Science | Nature, Science Daily, Phys.org, NASA, Space.com, arXiv CS/AI |
| Sports | ESPN, BBC Sport |
| Business | Forbes, Bloomberg Markets, CoinDesk, The Block |
| Social | Reddit r/technology, r/programming, r/worldnews, Product Hunt, Medium |

Input Examples

AI News Aggregator

{
  "presetFeeds": ["techcrunch", "wired", "mit_tech_review", "openai_blog", "anthropic_blog", "google_ai_blog", "arxiv_cs_ai"],
  "keywordFilter": "AI, artificial intelligence, LLM, GPT, machine learning",
  "maxAgeHours": 72,
  "sortBy": "date_desc",
  "outputFormat": "enriched"
}

Developer News Feed

{
  "presetFeeds": ["hacker_news", "lobsters", "dev_to", "github_blog", "python_insider", "rust_blog"],
  "excludeKeywords": "sponsored, hiring",
  "maxArticlesPerFeed": 20,
  "sortBy": "date_desc"
}

World News Monitor

{
  "presetFeeds": ["bbc_world", "reuters_world", "al_jazeera", "nyt_world", "guardian_world"],
  "maxAgeHours": 24,
  "deduplicate": true,
  "sortBy": "date_desc"
}

Custom Feeds + Presets

{
  "presetFeeds": ["techcrunch"],
  "customFeedUrls": "https://example.com/feed.xml\nhttps://myblog.com/rss",
  "outputFormat": "enriched"
}
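
When the input is assembled programmatically, note that the example above passes custom feeds as a single newline-separated string, not a JSON array. The sketch below builds such an input from a Python list. The helper name `build_actor_input` is hypothetical, and only field names that appear in the examples above are used; how the input is then submitted to the Actor is not shown.

```python
import json


def build_actor_input(preset_feeds, custom_feed_urls=(), **options):
    """Assemble an input dict using the field names from the examples above."""
    actor_input = {"presetFeeds": list(preset_feeds)}
    if custom_feed_urls:
        # The examples pass custom feeds as one newline-separated string.
        actor_input["customFeedUrls"] = "\n".join(custom_feed_urls)
    actor_input.update(options)
    return actor_input


actor_input = build_actor_input(
    ["techcrunch"],
    custom_feed_urls=["https://example.com/feed.xml", "https://myblog.com/rss"],
    outputFormat="enriched",
)
print(json.dumps(actor_input, indent=2))
```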

Enriched Output Fields

| Field | Example |
| --- | --- |
| id | 1 |
| title | OpenAI Announces GPT-5 with Reasoning |
| link | https://techcrunch.com/2025/… |
| domain | techcrunch.com |
| source | TechCrunch |
| source_category | Tech |
| author | Jane Smith |
| published | 2025-04-05T10:30:00+00:00 |
| published_date | 2025-04-05 |
| published_time | 10:30:00 |
| summary | OpenAI has unveiled its latest… |
| content_preview | The full article text preview… |
| categories | AI, Machine Learning, OpenAI |
| image_url | https://…/header.jpg |
| word_count | 342 |
| guid | https://techcrunch.com/?p=12345 |
| feed_url | https://techcrunch.com/feed/ |
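
Several enriched fields can be derived from others: `domain` from a URL, and `published_date` and `published_time` from `published`. The sketch below shows one plausible derivation using only the standard library. It is an assumption about how these fields relate, not the Actor's code, and `derive_fields` is a hypothetical name.

```python
from datetime import datetime
from urllib.parse import urlparse


def derive_fields(link: str, published: str) -> dict:
    """Derive domain, date and time parts from a URL and an ISO 8601 timestamp."""
    published_dt = datetime.fromisoformat(published)
    return {
        "domain": urlparse(link).netloc,
        "published_date": published_dt.date().isoformat(),
        "published_time": published_dt.time().isoformat(),
    }


# Uses the feed_url and published values from the table above as sample data.
derived = derive_fields("https://techcrunch.com/feed/", "2025-04-05T10:30:00+00:00")
```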

Minimal Format

Each article contains only these fields: title, link, source, category, published.

Raw Format

The full feedparser entry structure, with _source, _source_category, and _feed_url added.

Filtering

Keywords (Include)

Comma-separated, case-insensitive. Articles kept if title OR summary contains ANY keyword.

"AI, machine learning, GPT" → keeps articles mentioning any of these.

Keywords (Exclude)

Articles removed if title OR summary contains ANY excluded keyword.

"sponsored, advertisement, hiring" → removes promotional content.

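The include and exclude rules above can be sketched as a single predicate. This is an illustration under stated assumptions, not the Actor's implementation: matching is done by case-insensitive substring search over the title and summary, and the function names `parse_terms` and `passes_keyword_filters` are hypothetical.

```python
def parse_terms(raw: str) -> list[str]:
    """Split a comma-separated string into lowercase, non-empty terms."""
    return [term.strip().lower() for term in raw.split(",") if term.strip()]


def passes_keyword_filters(article: dict, keyword_filter: str = "", exclude_keywords: str = "") -> bool:
    """Return True if the article survives both the include and the exclude filter."""
    text = f"{article.get('title') or ''} {article.get('summary') or ''}".lower()
    include = parse_terms(keyword_filter)
    exclude = parse_terms(exclude_keywords)
    # Include: keep only if title OR summary contains ANY keyword.
    if include and not any(term in text for term in include):
        return False
    # Exclude: drop if title OR summary contains ANY excluded keyword.
    if any(term in text for term in exclude):
        return False
    return True


article = {"title": "New LLM released", "summary": "Sponsored post about GPT"}
kept_by_include = passes_keyword_filters(article, keyword_filter="AI, LLM")
kept_after_exclude = passes_keyword_filters(article, "AI, LLM", "sponsored, hiring")
```

With plain substring matching, a short keyword such as "AI" also matches inside longer words like "said". Whether the Actor matches on word boundaries is not stated.
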
Date Filtering

  • Max Age Hours: only articles from the last N hours (e.g. 24 = last day)
  • Published After: only articles published after a given date (e.g. 2025-01-01)
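
Both date rules can be sketched as one check. The sketch assumes that `published` is a timezone-aware ISO 8601 string as in the enriched output, and that a Published After date is interpreted as midnight UTC of that day; neither detail is confirmed by the text above. The function name `is_recent_enough` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_recent_enough(published, now=None, max_age_hours=None, published_after=None):
    """Apply the Max Age Hours and Published After rules to one timestamp."""
    published_dt = datetime.fromisoformat(published)
    now = now or datetime.now(timezone.utc)
    # Max Age Hours: drop anything older than N hours before `now`.
    if max_age_hours is not None and published_dt < now - timedelta(hours=max_age_hours):
        return False
    # Published After: drop anything before midnight UTC of the given date (assumption).
    if published_after is not None:
        cutoff = datetime.fromisoformat(published_after).replace(tzinfo=timezone.utc)
        if published_dt < cutoff:
            return False
    return True


fixed_now = datetime(2025, 4, 5, 12, 0, tzinfo=timezone.utc)
fresh = is_recent_enough("2025-04-05T10:30:00+00:00", now=fixed_now, max_age_hours=24)
stale = is_recent_enough("2025-04-03T10:30:00+00:00", now=fixed_now, max_age_hours=24)
```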

Category Filter

Comma-separated RSS tags/categories. Articles kept if tagged with ANY matching category.
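
A minimal sketch of this rule, assuming an article's categories are available as a list of strings and that matching is exact but case-insensitive (the text does not say whether case matters). `matches_category` is a hypothetical name.

```python
def matches_category(article_categories: list[str], category_filter: str) -> bool:
    """Return True if the article carries ANY of the wanted categories."""
    wanted = {c.strip().lower() for c in category_filter.split(",") if c.strip()}
    if not wanted:
        return True  # No filter configured: keep everything.
    tags = {c.strip().lower() for c in article_categories}
    return not wanted.isdisjoint(tags)


tagged_ai = matches_category(["AI", "Machine Learning", "OpenAI"], "ai, robotics")
```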

Run Summary

After each run, a summary is saved to the key-value store under the key summary:

{
  "total_feeds": 7,
  "total_articles": 142,
  "feeds_status": {
    "TechCrunch": {"fetched": 30, "kept": 12, "status": "ok"},
    "Wired": {"fetched": 25, "kept": 8, "status": "ok"},
    "OpenAI Blog": {"fetched": 0, "kept": 0, "status": "failed"}
  },
  "source_breakdown": {"TechCrunch": 12, "Wired": 8, ...},
  "category_breakdown": {"Tech": 45, "AI": 28, ...}
}
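
Once the summary has been loaded into a Python dict, it can be inspected for failing sources and per-feed keep ratios. The sketch below uses only the `feeds_status` entries shown above as sample data. The helper names are hypothetical, and treating every status other than "ok" as a failure is an assumption, since only "ok" and "failed" appear in the example.

```python
def failed_feeds(summary: dict) -> list[str]:
    """List feed names whose status is anything other than 'ok'."""
    return [
        name
        for name, stats in summary.get("feeds_status", {}).items()
        if stats.get("status") != "ok"
    ]


def keep_ratios(summary: dict) -> dict[str, float]:
    """Share of fetched articles that survived filtering, per feed."""
    return {
        name: (stats["kept"] / stats["fetched"] if stats["fetched"] else 0.0)
        for name, stats in summary.get("feeds_status", {}).items()
    }


summary = {
    "feeds_status": {
        "TechCrunch": {"fetched": 30, "kept": 12, "status": "ok"},
        "Wired": {"fetched": 25, "kept": 8, "status": "ok"},
        "OpenAI Blog": {"fetched": 0, "kept": 0, "status": "failed"},
    }
}
broken = failed_feeds(summary)
ratios = keep_ratios(summary)
```

A feed with a low keep ratio is not necessarily broken; it may simply publish few articles that match the configured filters.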