News Intelligence Scraper — AI Agent Real-Time News API
Pricing: from $3.50 / 1,000 results
Multi-source real-time news aggregator for AI agents: Google News, Bing News and DuckDuckGo News merged, deduplicated, source-ranked and sentiment-scored. One topic or company in, one clean structured news feed out. No API key, no browser.
Developer: Logiover (maintained by Community)
Multi-source real-time news aggregator for AI agents. Drop in a topic, company name or keyword and get back a clean, deduplicated, sentiment-scored news feed merged from Google News, Bing News and DuckDuckGo News — all in one Apify Actor run. No API key, no headless browser, no per-source scrapers to maintain.
Built for the new wave of AI agents that need fresh, grounded information — analyst agents tracking a market, brand-monitoring agents watching sentiment, research agents summarizing "what's happening with X this week", and RAG pipelines that must cite current sources instead of relying on training-data knowledge with a cutoff date.
🎯 What this Actor is for
Large language models don't know what happened yesterday. When an AI agent is asked "what's the latest on OpenAI?" or "summarize this week's electric-vehicle news", it needs structured, current, multi-source news — not a single publisher's RSS or a raw HTML page to re-parse. news-intelligence-scraper is that real-time grounding layer:
- One topic → many sources. A single query is fanned out to Google News, Bing News and DuckDuckGo News RSS feeds in parallel, then merged.
- Deduplicated. The same story syndicated across outlets (or the same wire story on multiple aggregators) is collapsed into one row with a `duplicateCount` and the list of source feeds that carried it.
- Sentiment-scored. A lightweight lexicon model tags each headline + snippet with a `-1..+1` score and a `positive`/`negative`/`neutral` label. It has no ML dependencies, is fast, and is good enough for trend signals.
- Time-filtered. Keep only the last N days; sort newest-first.
- AI-agent friendly schema. Predictable fields, ISO dates, nullable values, per-item source attribution. Drop straight into a prompt or a vector store.
- Batch by default. Feed 50 topics and get 50 merged feeds back in one run — perfect for monitoring dashboards and trend reports.
- No keys, no browser. Pure HTTP + RSS parsing on a small Node 20 container. Cheap, fast, resilient.
✨ Key features
- 🌐 Three source feeds: Google News RSS, Bing News RSS, DuckDuckGo News RSS, fetched in parallel per query. Pick any subset.
- 🔀 Cross-source deduplication: exact key dedup (source-domain + normalized title) plus fuzzy token-Jaccard title similarity (>0.72) to catch syndicated/wire copies across outlets.
- 📈 Sentiment scoring: AFINN-style lexicon (~250 weighted terms + negation handling) producing a normalized `-1..+1` score and a `positive`/`negative`/`neutral` label on every item.
- 📅 Time filtering: `daysBack` keeps only items published within a window (0 = no filter). Items without a parseable date are kept (sorted last).
- 🏷️ Source attribution: every item carries the outlet name, the source domain, and the list of feeds (`sourceFeeds`) that surfaced it. `duplicateCount` shows how many raw copies were merged.
- 🌍 Localization: Google News `hl`/`gl` params for language + country targeting (en-US, tr-TR, de-DE, fr-FR, …).
- 📰 Company mode: pass a company name or domain to track brand news specifically.
- 📚 Bulk mode: many topics in one run, each producing its own merged feed, tagged with the originating `query`.
- 🌐 Proxy-aware: Apify datacenter proxy by default to avoid per-IP rate limits on news RSS endpoints.
- 💰 Pay-per-result: charged per saved news item, not per run. Empty results (no matches) are free.
🤖 Why AI agents need this
News is one of the highest-value grounding tasks for agentic systems. The reasons are simple: news is time-sensitive (yesterday's answer is wrong today), fragmented (no single source has everything), and noisy (the same story is republished dozens of times). An agent that browses one publisher gets a biased, partial view. An agent that hits a single news API gets rate-limited or charged per call. news-intelligence-scraper solves all three at once:
- Brand & reputation monitoring. A comms agent watches a company name across three feeds, deduplicates syndications, and surfaces the sentiment trend over 30 days.
- Market intelligence. An analyst agent queries a basket of 20 industry keywords weekly and builds a sentiment-weighted news index.
- Event grounding. A research agent answering "why did X stock move?" pulls this week's deduped news for the ticker's company, sorted by sentiment, and summarizes the negative cluster.
- Competitor tracking. A GTM agent monitors competitor names and surfaces only the genuinely new items (dedup kills the wire echo chamber).
- RAG freshness. A support/analyst agent embeds the latest N news items per topic into a vector store so its answers cite current events instead of stale training data.
- Crisis detection. A monitoring agent runs every hour on a watchlist and alerts when the negative-sentiment item count crosses a threshold.
Each of these is one Actor call (or a scheduled run). The output is a clean table of articles ready for an LLM to read, summarize, or cite.
📦 What you get (output schema)
Every run streams one news article per row to the default dataset. An item looks like:
```json
{
  "query": "openai",
  "title": "OpenAI announces new reasoning model",
  "url": "https://techcrunch.com/2026/07/01/openai-...",
  "snippet": "The company said the new model improves on... (first 500 chars)",
  "source": "TechCrunch",
  "sourceDomain": "techcrunch.com",
  "sourceFeeds": ["googleNews", "bingNews"],
  "publishedAt": "Wed, 01 Jul 2026 14:30:00 GMT",
  "publishedDate": "2026-07-01",
  "language": "en-US",
  "sentimentScore": 0.42,
  "sentimentLabel": "positive",
  "duplicateCount": 3,
  "scrapedAt": "2026-07-02T12:00:00.000Z"
}
```
Use the Overview view to scan all items newest-first with sentiment, or the By query view to pivot on the originating topic.
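Because the fields are flat and predictable, turning a dataset into prompt context takes only a few lines. The following is a minimal sketch, not part of the Actor: `formatForPrompt` and its `maxItems` parameter are hypothetical names, and the layout of each entry is just one reasonable choice.

```javascript
// Hypothetical helper (not shipped with the Actor): render dataset items as a
// numbered, citation-friendly context block for an LLM prompt.
function formatForPrompt(items, maxItems = 10) {
  return items
    .slice(0, maxItems)
    .map((item, i) => {
      // Fields are nullable in the output schema, so fall back to placeholders.
      const date = item.publishedDate ?? 'undated';
      const sentiment = item.sentimentLabel ?? 'unscored';
      const lines = [`[${i + 1}] ${item.title} (${item.source}, ${date}, ${sentiment})`];
      if (item.snippet) lines.push(`    ${item.snippet}`);
      lines.push(`    ${item.url}`);
      return lines.join('\n');
    })
    .join('\n\n');
}
```

Numbering the entries lets the model cite a source as `[1]`, `[2]` and so on, and the URL on the last line of each entry keeps the attribution verifiable.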
🚀 How to use
1. Aggregate news for one topic
```json
{
  "mode": "topic",
  "query": "openai",
  "sources": ["googleNews", "bingNews", "duckduckgoNews"],
  "maxPerSource": 50,
  "maxResults": 100,
  "daysBack": 7,
  "sentiment": true
}
```
2. Track a company's news
```json
{
  "mode": "company",
  "query": "stripe.com",
  "daysBack": 30,
  "maxResults": 200
}
```
3. Bulk: many topics in one run
```json
{
  "mode": "bulk",
  "queries": ["openai", "anthropic", "mistral ai", "electric vehicles", "AI regulation"],
  "daysBack": 7,
  "maxResults": 40
}
```
From code (Apify API client for JavaScript)
```javascript
import { ApifyClient } from 'apify-client';

// Authenticate with the API token from the environment.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the Actor and wait for the run to finish.
const run = await client.actor('logiover/news-intelligence-scraper').call({
  mode: 'topic',
  query: 'openai',
  daysBack: 7,
  sentiment: true,
});

// Read the run's default dataset and count the positive items.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const positive = items.filter((i) => i.sentimentLabel === 'positive');
console.log(`${positive.length} positive items of ${items.length}`);
```
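For bulk runs, the `items` array fetched above mixes all topics, each tagged with its originating `query`. A small sketch of pivoting on that field is shown here; `summarizeByQuery` and the shape of its result are illustrative assumptions, not something the Actor provides.

```javascript
// Hypothetical helper: group bulk-mode items by `query` and summarize each group.
function summarizeByQuery(items) {
  const groups = new Map();
  for (const item of items) {
    const group = groups.get(item.query)
      ?? { query: item.query, count: 0, negative: 0, scoreSum: 0, scored: 0 };
    group.count += 1;
    if (item.sentimentLabel === 'negative') group.negative += 1;
    // sentimentScore may be null (for example when sentiment is disabled).
    if (typeof item.sentimentScore === 'number') {
      group.scoreSum += item.sentimentScore;
      group.scored += 1;
    }
    groups.set(item.query, group);
  }
  return [...groups.values()].map(({ query, count, negative, scoreSum, scored }) => ({
    query,
    count,
    negative,
    meanSentiment: scored > 0 ? scoreSum / scored : null,
  }));
}
```

Calling `summarizeByQuery(items)` yields one row per topic, which is usually enough for a watchlist dashboard or for picking which topics to hand to an LLM for a closer read.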
As an MCP tool for AI agents
Wrap this Actor in an MCP server. An agent calls the tool with a topic and receives a clean, deduplicated, sentiment-tagged news feed in its context — no browsing, no HTML parsing, no per-source API juggling on the agent side.
🔧 Input fields
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | enum | `topic` | `topic` (one topic), `company` (company news), `bulk` (many topics). |
| `query` | string | — | Topic/keyword/company for topic & company modes. Quoted phrases respected. |
| `queries` | array | — | Topics for bulk mode. |
| `sources` | array | all | Which feeds to aggregate: `googleNews`, `bingNews`, `duckduckgoNews`. |
| `maxPerSource` | int | 50 | Cap pulled from each source per query (1–200). |
| `maxResults` | int | 200 | Final cap on deduplicated items saved per query (1–2000). |
| `daysBack` | int | 30 | Keep only items within N days. 0 = no filter (0–365). |
| `language` | string | `en-US` | Google News `hl` (e.g. `tr-TR`, `de-DE`). |
| `country` | string | `US` | Google News `gl` (e.g. `GB`, `DE`). |
| `dedup` | bool | true | Merge near-duplicates across sources (URL + title similarity). |
| `sentiment` | bool | true | Run lexicon sentiment on title+snippet. |
| `useApifyProxy` | bool | true | Route through Apify datacenter proxy. |
🧩 How it works
- Build feed URLs. For the query, construct the RSS URL for each enabled source: Google News (`/rss/search?q=…&hl=…&gl=…&ceid=…`), Bing News (`/news/search?q=…&format=rss`), DuckDuckGo News (`/?q=…&iar=news&format=rss`).
- Fetch in parallel. All sources for one query are fetched concurrently over the Apify proxy with a browser-like User-Agent and a retry/backoff policy for transient errors.
- Parse RSS. A source-agnostic regex parser extracts `<item>` blocks and reads `title`, `link`, `description`, `pubDate`/`dc:date`, and `<source>` (name + url). HTML entities and tags are stripped.
- Normalize. Each item is mapped to a flat record with `title`, `url`, `snippet`, `source` (outlet name), `sourceDomain`, `sourceFeed`, `publishedAt`.
- Exact dedup. Items are keyed by `sourceDomain + normalized-title-prefix`. Collisions merge: `duplicateCount` increments, `sourceFeeds` unions, the richer snippet/earliest date wins.
- Fuzzy dedup. Across different domains, a token-Jaccard similarity on normalized titles (>0.72) collapses syndicated/wire copies (e.g. the same AP story on 12 outlets) into one row.
- Time filter. If `daysBack > 0`, items with a parseable date older than the cutoff are dropped; items without a date are kept (sorted last).
- Sort. Newest-first by `publishedAt`.
- Sentiment. The title + snippet are tokenized; each token is looked up in the lexicon (with negation handling), and the score is normalized to `-1..+1` and bucketed into `positive`/`negative`/`neutral`.
- Stream. Each item is pushed to the dataset and one `result` event is charged.
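The first step, building one RSS URL per enabled source, can be sketched as follows. This is an illustration under stated assumptions rather than the Actor's code: the hostnames, the `ceid` value format (`COUNTRY:language`), and the names `FEED_BUILDERS` and `buildFeedUrls` are not given in this README; only the path templates are.

```javascript
// Assumed hosts and ceid format; the README only lists the path templates.
const FEED_BUILDERS = {
  googleNews: (q, { language, country }) => {
    const params = new URLSearchParams({
      q,
      hl: language,
      gl: country,
      ceid: `${country}:${language.split('-')[0]}`, // e.g. "US:en" (assumption)
    });
    return `https://news.google.com/rss/search?${params}`;
  },
  bingNews: (q) =>
    `https://www.bing.com/news/search?${new URLSearchParams({ q, format: 'rss' })}`,
  duckduckgoNews: (q) =>
    `https://duckduckgo.com/?${new URLSearchParams({ q, iar: 'news', format: 'rss' })}`,
};

// Returns one { source, url } pair per enabled source for a single query.
function buildFeedUrls(query, options = {}) {
  const {
    sources = Object.keys(FEED_BUILDERS),
    language = 'en-US',
    country = 'US',
  } = options;
  return sources.map((source) => ({
    source,
    url: FEED_BUILDERS[source](query, { language, country }),
  }));
}

console.log(buildFeedUrls('mistral ai', { sources: ['bingNews'] }));
// → [ { source: 'bingNews', url: 'https://www.bing.com/news/search?q=mistral+ai&format=rss' } ]
```

`URLSearchParams` handles the query encoding, so quoted phrases and non-ASCII topics do not need manual escaping. The resulting list is what a parallel fetch step (for example `Promise.allSettled` over the URLs) would consume.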
💡 Tips & best practices
- Use all three sources for coverage. Google News is broadest; Bing and DuckDuckGo catch outlets Google deprioritizes. The dedup step makes more sources strictly better (up to your `maxResults`).
- Set `daysBack` for freshness. For dashboards, 7 days; for trend reports, 30; for historical deep-dives, raise `maxResults` and widen the window.
- Bulk mode for watchlists. Pass 20–50 topics and let the Actor loop. Each topic's items are tagged with `query` so you can pivot downstream.
- Sentiment is a signal, not a verdict. Lexicon sentiment is fast and cheap but misses sarcasm and context. Use it to rank and filter, not to make final judgments; let the LLM read the items for nuance.
- Localize for non-English markets. Set `language: "de-DE"`, `country: "DE"` for German news; `tr-TR`/`TR` for Turkish, etc. The sentiment lexicon is English-centric, so consider disabling `sentiment` for non-English or treating labels as approximate.
- Schedule recurring runs. News changes hourly. Schedule a run every few hours over your watchlist and diff datasets to detect new items.
- Combine with related Actors. Pair with `company-deep-research-scraper` (for company context), `discussion-intelligence-scraper` (for social/forum opinion), and `bulk-rss-feed-reader` (for direct publisher feeds).
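Diffing two scheduled runs to find new items can be as simple as a set lookup. The sketch below assumes `url` is a stable enough identity across runs, which is an assumption: aggregator links can vary, in which case a key built from `sourceDomain` and `title` may be more robust. `findNewItems` and `keyOf` are hypothetical names.

```javascript
// Hypothetical helper: return items from the current run that were not present
// in the previous run. Identity is the item's `url` by default (an assumption).
function findNewItems(previousItems, currentItems, keyOf = (item) => item.url) {
  const seen = new Set(previousItems.map(keyOf));
  return currentItems.filter((item) => !seen.has(keyOf(item)));
}
```

To switch identity, pass a different key function, for example ``(item) => `${item.sourceDomain}|${item.title.toLowerCase()}` ``.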
❓ FAQ
Does this Actor need any API keys?
No. It reads public RSS feeds from Google News, Bing News and DuckDuckGo News. Just an Apify account.
Why three sources instead of just Google News?
Single-source news is biased and incomplete. Different aggregators surface different outlets and different rankings. Merging three and deduplicating gives broader coverage and a duplicateCount signal (how widely a story was syndicated) that's itself useful.
How does deduplication work?
Two stages: (1) exact key dedup on source-domain + normalized-title-prefix catches the same article republished; (2) fuzzy token-Jaccard title similarity (>0.72) catches wire/syndicated stories phrased slightly differently across outlets. duplicateCount records how many raw copies merged into the saved row.
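The two stages can be sketched in a few functions. This is an illustrative reimplementation under assumptions, not the Actor's source: the normalization rules, the 80-character title prefix, and starting `duplicateCount` at 1 for a unique item are guesses; only the key shape, the Jaccard measure, the 0.72 threshold and the merge rules come from the description above.

```javascript
// Lowercase, replace punctuation with spaces, collapse whitespace (assumed rules).
const normalizeTitle = (title) =>
  title.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, ' ').replace(/\s+/g, ' ').trim();

// Token-level Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of title words.
function jaccard(a, b) {
  const setA = new Set(a.split(' ').filter(Boolean));
  const setB = new Set(b.split(' ').filter(Boolean));
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared += 1;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Input: normalized records carrying a single `sourceFeed`.
// Output: merged rows with `sourceFeeds` and `duplicateCount`.
function dedupe(items, threshold = 0.72, prefixLength = 80) {
  const kept = [];
  for (const item of items) {
    const norm = normalizeTitle(item.title);
    const key = `${item.sourceDomain}|${norm.slice(0, prefixLength)}`;
    // Stage 1: exact key match. Stage 2: fuzzy title match across domains.
    const match = kept.find((k) =>
      k.key === key ||
      (k.row.sourceDomain !== item.sourceDomain && jaccard(k.norm, norm) > threshold));
    if (!match) {
      const { sourceFeed, ...rest } = item;
      kept.push({ key, norm, row: { ...rest, sourceFeeds: [sourceFeed], duplicateCount: 1 } });
      continue;
    }
    const row = match.row;
    row.duplicateCount += 1;
    if (!row.sourceFeeds.includes(item.sourceFeed)) row.sourceFeeds.push(item.sourceFeed);
    // Richer (longer) snippet wins; earliest parseable date wins.
    if ((item.snippet ?? '').length > (row.snippet ?? '').length) row.snippet = item.snippet;
    const incoming = Date.parse(item.publishedAt);
    const current = Date.parse(row.publishedAt);
    if (!Number.isNaN(incoming) && (Number.isNaN(current) || incoming < current)) {
      row.publishedAt = item.publishedAt;
    }
  }
  return kept.map((k) => k.row);
}
```

The linear scan over `kept` makes this quadratic in the number of items, which is acceptable at the documented per-query caps (a few hundred items) but would need an index for much larger batches.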
Is the sentiment accurate?
It's a fast lexicon model (~250 weighted terms + negation), not a transformer. It's good for trends and ranking (e.g. "show me the most negative items"), less reliable on sarcasm or domain-specific jargon. For production-grade sentiment, post-process the items with an LLM.
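To make the trade-off concrete, here is a toy version of lexicon scoring with negation handling. Everything in it is an assumption for illustration: the twelve-word `LEXICON`, the `NEGATORS` list, the normalization (mean matched weight divided by the maximum weight) and the neutral band are stand-ins, since the Actor's actual term list and formula are not published in this README.

```javascript
// Tiny illustrative lexicon (the real one is described as ~250 weighted terms).
const LEXICON = new Map([
  ['surge', 2], ['growth', 2], ['win', 2], ['record', 1], ['improves', 2], ['good', 2],
  ['lawsuit', -2], ['crash', -3], ['fraud', -3], ['layoffs', -2], ['bad', -2], ['fails', -2],
]);
const NEGATORS = new Set(['not', 'no', 'never', 'without', "isn't", "don't", "won't"]);
const MAX_WEIGHT = 3;

function scoreSentiment(text, neutralBand = 0.05) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let sum = 0;
  let hits = 0;
  tokens.forEach((token, i) => {
    const weight = LEXICON.get(token);
    if (weight === undefined) return;
    // A negator immediately before a scored word flips its sign ("not bad").
    const negated = i > 0 && NEGATORS.has(tokens[i - 1]);
    sum += negated ? -weight : weight;
    hits += 1;
  });
  // Assumed normalization: mean matched weight scaled into -1..+1.
  const score = hits === 0 ? 0 : sum / (hits * MAX_WEIGHT);
  let label = 'neutral';
  if (score > neutralBand) label = 'positive';
  else if (score < -neutralBand) label = 'negative';
  return { score, label };
}

console.log(scoreSentiment('Shares crash after fraud lawsuit').label); // → negative
console.log(scoreSentiment('Results were not bad').label);            // → positive
```

The sketch also shows why sarcasm and jargon slip through: a word only counts if it is in the list, and context beyond the single preceding token is ignored.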
How far back can I get news?
The RSS feeds return recent items (typically the last few days to weeks depending on the source and query volume). daysBack filters within that window. For months/years of history, combine with wayback-machine-url-extractor or a dedicated archive Actor.
Why do some items have no publishedDate?
Some feeds omit <pubDate>. Those items are kept (they may still be relevant) but sorted last. The publishedAt raw string is always preserved when available.
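The keep-undated, sort-last behavior can be sketched as below. `filterAndSort` is a hypothetical name, and relying on `Date.parse` for RSS-style date strings is an assumption that holds in Node but is implementation-dependent in general.

```javascript
// Drop dated items older than the cutoff, keep undated ones, and sort
// newest-first with undated items at the end.
function filterAndSort(items, daysBack, now = Date.now()) {
  const cutoff = daysBack > 0 ? now - daysBack * 24 * 60 * 60 * 1000 : -Infinity;
  return items
    .map((item) => ({ item, ts: Date.parse(item.publishedAt ?? '') })) // NaN if unparseable
    .filter(({ ts }) => Number.isNaN(ts) || ts >= cutoff)
    .sort((a, b) => {
      if (Number.isNaN(a.ts) && Number.isNaN(b.ts)) return 0;
      if (Number.isNaN(a.ts)) return 1;  // undated sorts after dated
      if (Number.isNaN(b.ts)) return -1;
      return b.ts - a.ts;                // newest first
    })
    .map(({ item }) => item);
}
```

Passing `daysBack = 0` disables the cutoff but still applies the ordering, matching the "0 = no filter" convention of the input field.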
Can I get the full article text?
This Actor returns title + snippet (the RSS <description>). For full article bodies, pass the url field into a content extractor like website-text-markdown-crawler.
How is this priced?
Pay-per-result: one result event per saved (deduplicated) news item. Runs that yield zero items (no matches) are free.
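For budgeting, the charge scales linearly with saved items. A back-of-the-envelope sketch, assuming the listed "from $3.50 / 1,000 results" rate applies flat (the actual rate may differ by plan) and using hypothetical helper names:

```javascript
// Assumes a flat rate; 3.5 USD per 1,000 results is the listed starting price.
function estimateCostUsd(savedItems, pricePerThousand = 3.5) {
  return (savedItems * pricePerThousand) / 1000;
}

// Upper bound for a bulk run: every topic fills its maxResults cap.
function estimateBulkCeilingUsd(topicCount, maxResults, pricePerThousand = 3.5) {
  return estimateCostUsd(topicCount * maxResults, pricePerThousand);
}

console.log(estimateBulkCeilingUsd(50, 40)); // → 7
```

So a 50-topic watchlist capped at 40 items per topic costs at most about $7.00 per run at that rate, and less when deduplication or the time filter leaves fewer items.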
Will I get rate-limited?
The Actor uses the Apify datacenter proxy and polite delays. News RSS endpoints are lenient. For very high-frequency runs, lower maxPerSource and increase the delay between bulk queries.
Can AI agents call this directly?
Yes. Expose it through an MCP server or Apify tool integration; the agent passes a topic and gets a clean JSON news feed back. This is the primary design target.
🔗 Related Actors
- company-deep-research-scraper — company dossier (tech stack, socials, contacts) for context.
- discussion-intelligence-scraper — Reddit + Hacker News + Product Hunt + Stack Exchange opinion.
- bulk-rss-feed-reader — read specific publisher RSS feeds directly.
- substack-newsletter-scraper — Substack newsletter posts.
- google-news-scraper — single-source Google News.
- website-text-markdown-crawler — extract full article body from a news URL.
- hacker-news-search-scraper — HN-specific search.
- reddit-subreddit-scraper / reddit-search-scraper — Reddit-specific.
📝 Changelog
2026-07-02 — v1.0
- Initial release.
- 3 modes: `topic`, `company`, `bulk`.
- 3 sources: Google News, Bing News, DuckDuckGo News (any subset).
- Two-stage dedup (exact key + fuzzy title Jaccard).
- Lexicon sentiment (-1..+1, positive/negative/neutral).
- Time filtering (`daysBack`), localization (`hl`/`gl`).
- Apify datacenter proxy default.
- Pay-per-result (`result` event per saved item).
⚖️ Disclaimer
This Actor reads publicly available RSS feeds. It does not authenticate, bypass access controls, or scrape behind paywalls. News content is owned by the respective publishers; respect their Terms of Service. Use for monitoring, research and AI-agent grounding on data that is already public.