Universal News Article Intelligence Agent

Pricing

$25.00 / 1,000 successful research extractions

High-fidelity news normalization for AI & Agentic RAG. Extract clean Markdown, full text, and metadata from premium domains (Bloomberg, Wall Street Journal, Financial Times, New York Times, Washington Post, etc.). Success-only billing: you pay only when the full text is verified.

Rating: 5.0 (11)

Developer: WorkHard3000 (Maintained by Community)

Actor stats: 11 bookmarked, 14 total users, 6 monthly active users, last modified 2 days ago

Universal News Article Intelligence Agent — High-Fidelity RAG Content Connector

Retrieve structured metadata and normalized full-text content from high-complexity global news domains. Optimized for LLMs, Agentic RAG, market research pipelines, and automated intelligence.

What does this Agent do?

This Actor is a professional-grade Content Normalization Agent designed to bridge the gap between complex web architectures and AI systems. It transforms unstructured data from premium financial and global news domains into clean, standardized Markdown, ready for immediate use in RAG (Retrieval-Augmented Generation) pipelines and LLMs.

Using a proprietary multi-step extraction engine, this Agent ensures that you receive the full research-grade text required for deep analysis, rather than the truncated snippets or "Subscription Required" notices returned by standard scrapers.

Input: A list of article URLs (one or many).

Output: Structured JSON with title, author, date, full text, cleaned Markdown, and high-resolution metadata.


Success-Only Pricing (Verified Research Extraction)

We operate on a Quality-First billing model. You are only billed when we successfully deliver research-ready data.

| Scenario | Actor Fee |
| --- | --- |
| Verified Research Extraction (Full text, 500+ characters) | $0.025 |
| Incomplete Retrieval (Formatting error, blocked, or snippet) | $0.00 |
| Insufficient Content (Under 500 characters) | $0.00 |

The Math of Value: Standard "pay-per-result" tools charge their full markup for every item, even if it's a 403 error or a paywall snippet. With this Agent, if the extraction is not successful, you pay $0.00 in Actor fees, incurring only the nominal Apify platform usage cost for the compute time (typically less than a penny).
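The billing rule above can be sketched as a small estimator for budgeting purposes. This is only an approximation under stated assumptions: the Actor decides what is actually billed, the helper names `is_billable` and `estimate_actor_fee` are hypothetical, and the `"blocked"` error value is made up for illustration. Platform compute costs are not included.

```python
from decimal import Decimal

FEE_PER_SUCCESS = Decimal("0.025")  # $25.00 per 1,000 verified extractions
MIN_CHARS = 500                     # threshold taken from the pricing table


def is_billable(item: dict) -> bool:
    """Guess whether a dataset item would count as a verified extraction."""
    if item.get("error"):           # failed items are described as carrying an error field
        return False
    return len(item.get("text") or "") >= MIN_CHARS


def estimate_actor_fee(items: list) -> Decimal:
    """Sum the per-success fee over the items that look billable."""
    return FEE_PER_SUCCESS * sum(is_billable(item) for item in items)


# Illustrative items only; the URLs and the error value are placeholders.
items = [
    {"url": "https://example.com/a", "text": "x" * 1200},  # full text
    {"url": "https://example.com/b", "text": "x" * 120},   # snippet under 500 characters
    {"url": "https://example.com/c", "error": "blocked"},  # failed retrieval
]
print(estimate_actor_fee(items))  # 0.025
```

Using `Decimal` rather than floats keeps the running total exact when the count grows into the thousands.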


Strategic Capabilities

  • High-Fidelity Content Retrieval: Optimized for high-complexity research domains (Bloomberg, WSJ, Financial Times, The Economist, NYT, and more).
  • AI-Ready Markdown: Automatically normalizes content by removing non-essential elements (ads, nav-bars, scripts), reducing LLM token consumption by up to 80%.
  • Market Intelligence Ready: Parses structured metadata (Byline, ISO Date, Featured Images) for immediate database ingestion.
  • Real-Time Stream Support: Results are pushed to the dataset as they complete, making it ideal for 24/7 monitoring pipelines.
  • Automated Resilience: Advanced internal logic handles difficult-to-render architectures to ensure consistent delivery.

Enterprise Use Cases

Financial Intelligence & Quantitative Analysis

Feed high-fidelity market news directly into sentiment models or trading algorithms. Monitor global financial publications with zero maintenance overhead.

RAG & Knowledge Base Construction

Build a high-quality "News Memory" for AI Agents. Our clean Markdown output ensures your vector database contains only the core analysis, saving costs and improving accuracy.

Competitive Intelligence

Track industry shifts across multiple premium publications with a single API key. Standardize all sources into one unified JSON schema for cross-platform comparison.


What data can you extract?

| Field | Description | Example |
| --- | --- | --- |
| url | Original article URL | https://www.bloomberg.com/news/articles/... |
| title | Article headline | "What to Watch as China's Leaders Hash Out Plan" |
| domain | Source domain | bloomberg.com |
| byline | Author name(s) | "Jennifer Schuessler" |
| publishedDate | ISO 8601 publication date | "2026-03-07T10:03:00.000Z" |
| text | Full article as clean plain text | "The National Endowment for the Humanities..." |
| markdown | Full article as Markdown | "# Article Title\n\nFull text here..." |
| excerpt | Article summary/description | "The agency used AI to flag grants..." |
| image | Featured/OG image URL | "https://static01.nyt.com/images/..." |
| siteName | Publication name | "bloomberg.com" |
| elapsedMs | Extraction time in milliseconds | 5090 |
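For downstream code, the fields listed above can be mapped onto a typed record. The sketch below is an assumption-laden example rather than an official schema: the `Article` class and `parse_article` helper are hypothetical names, only a subset of fields is modeled, and treating `byline` and `publishedDate` as optional is a guess.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    url: str
    domain: str
    title: str
    text: str
    markdown: str
    byline: Optional[str]           # assumed optional
    published: Optional[datetime]   # parsed from publishedDate, assumed optional
    elapsed_ms: int


def parse_article(item: dict) -> Article:
    """Convert one dataset item into an Article record."""
    raw_date = item.get("publishedDate")
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also accepts the value on Python versions before 3.11.
    published = (
        datetime.fromisoformat(raw_date.replace("Z", "+00:00")) if raw_date else None
    )
    return Article(
        url=item["url"],
        domain=item["domain"],
        title=item["title"],
        text=item["text"],
        markdown=item["markdown"],
        byline=item.get("byline"),
        published=published,
        elapsed_ms=int(item["elapsedMs"]),
    )


sample = {
    "url": "https://www.bloomberg.com/news/articles/...",
    "domain": "bloomberg.com",
    "title": "What to Watch as China's Leaders Hash Out Plan",
    "text": "China's annual legislative meetings are set to kick off this week...",
    "markdown": "# What to Watch as China's Leaders Hash Out Plan\n\n...",
    "byline": "Bloomberg News",
    "publishedDate": "2026-03-03T08:00:00.000Z",
    "elapsedMs": 3973,
}
article = parse_article(sample)
print(article.published.isoformat())  # 2026-03-03T08:00:00+00:00
```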

Verified High-Complexity Research Domains

This Agent features specialized extraction logic for the following global institutions (tested March 2026). It also supports hundreds of additional news domains via its universal normalization engine.

Financial & Market Intelligence: Bloomberg, Wall Street Journal (WSJ), Financial Times (FT), Australian Financial Review, Handelsblatt.

Global Policy & Analysis: The Economist, New York Times, Washington Post, Foreign Affairs, Politico, The Hill.

Innovation & Strategy: Wired, MIT Technology Review, Harvard Business Review, Fortune, Time.

International Perspectives: Le Monde, Der Spiegel, Nikkei Asia, South China Morning Post, Japan Times, The Straits Times, El Pais, Corriere della Sera, Haaretz, Irish Times.

Commonwealth & UK: The Telegraph, The Times, The Guardian, The Independent, New Statesman, The Australian, Globe and Mail.

US Regional & Culture: Los Angeles Times, Chicago Tribune, Boston Globe, SF Chronicle, Seattle Times, The Atlantic, The New Yorker, Vanity Fair, Business Insider, Salon, Slate, The Daily Beast.


How to Use

  1. Input URLs: Paste your target research links into the articleUrls field.
  2. Execute: Click Start. The Agent will begin high-fidelity extraction.
  3. Export: Download your data as JSON or CSV, or feed it to your AI pipeline via webhook.

API Implementation

curl -X POST "https://api.apify.com/v2/acts/workhard3000~news-intelligence-rag-extractor/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"articleUrls": ["https://www.bloomberg.com/news/articles/..."]}'
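The same call can be prepared from Python with only the standard library. This is a minimal sketch that mirrors the curl command above; `build_run_request` is a hypothetical helper, `YOUR_TOKEN` is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "workhard3000~news-intelligence-rag-extractor"  # taken from the curl example


def build_run_request(token: str, article_urls: list) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps({"articleUrls": article_urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_request("YOUR_TOKEN", ["https://www.bloomberg.com/news/articles/..."])
print(req.get_method(), req.full_url)

# Sending the request needs network access and a real token:
# with urllib.request.urlopen(req) as resp:
#     run = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect or log before any billable run is started.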

Output Example

{
  "url": "https://www.bloomberg.com/news/articles/2026-03-03/what-to-watch-as-china-s-leaders-hash-out-plan-for-economic-path",
  "domain": "bloomberg.com",
  "title": "What to Watch as China's Leaders Hash Out Plan for Economic Path",
  "text": "China's annual legislative meetings are set to kick off this week...",
  "markdown": "# What to Watch as China's Leaders Hash Out Plan\n\nChina's annual legislative meetings...",
  "excerpt": "The National People's Congress opens amid uncertainty over trade tensions.",
  "byline": "Bloomberg News",
  "publishedDate": "2026-03-03T08:00:00.000Z",
  "image": "https://assets.bwbx.io/images/...",
  "siteName": "bloomberg.com",
  "extractedAt": "2026-03-08T15:30:00.000Z",
  "elapsedMs": 3973
}

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| articleUrls | Array of strings | required | List of article URLs to extract |
| autoArchive | Boolean | true | Try web archives as a last resort if direct extraction fails |
| maxRetries | Integer | 3 | Number of retry attempts per URL (1–10) |
| proxyConfiguration | Object | Residential | Proxy settings; residential proxies are used by default |
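A client-side check of these parameters before starting a run might look like the sketch below. The `build_input` helper is hypothetical, and whether the Actor itself rejects out-of-range values is not stated here. `proxyConfiguration` is left out on purpose so the documented default applies, since its object shape is not described in this section.

```python
def build_input(article_urls, auto_archive=True, max_retries=3):
    """Assemble the Actor input, applying the documented defaults and limits."""
    # Drop empty entries and surrounding whitespace from pasted URL lists.
    urls = [u.strip() for u in article_urls if u and u.strip()]
    if not urls:
        raise ValueError("articleUrls is required and must not be empty")
    if not 1 <= max_retries <= 10:
        raise ValueError("maxRetries must be between 1 and 10")
    return {
        "articleUrls": urls,
        "autoArchive": bool(auto_archive),
        "maxRetries": int(max_retries),
    }


payload = build_input([" https://www.bloomberg.com/news/articles/... ", ""], max_retries=5)
print(payload)
```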

Integrations

Results are available via the Apify API and can be connected to:

  • Webhooks — trigger downstream processing when a run completes
  • Google Sheets — export results directly to a spreadsheet
  • Slack / Email — get notifications with extracted article summaries
  • Zapier / Make — connect to 5,000+ apps
  • Amazon S3 / Google Cloud Storage — store results in your cloud bucket
  • Custom API — fetch results programmatically via the Apify dataset API
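For the last option in the list above, results can be read from the run's dataset. The sketch below assumes the standard Apify dataset items endpoint and that the dataset ID comes from the run object returned when the run is started; `dataset_items_url` is a hypothetical helper, and `DATASET_ID` and `YOUR_TOKEN` are placeholders. Only the two export formats named in this document are allowed here.

```python
import urllib.parse

ALLOWED_FORMATS = {"json", "csv"}  # the formats mentioned in "How to Use"


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Build the URL for downloading a dataset's items in the given format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    query = urllib.parse.urlencode({"format": fmt, "token": token})
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"https://api.apify.com/v2/datasets/{safe_id}/items?{query}"


print(dataset_items_url("DATASET_ID", "YOUR_TOKEN"))
# https://api.apify.com/v2/datasets/DATASET_ID/items?format=json&token=YOUR_TOKEN
```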

  • Research Intent: This tool is a technical instrument intended for authorized academic research, internal data analysis, and interoperability testing between web formats and AI systems.
  • Content Neutrality: This Actor does not host, cache, or redistribute copyrighted material. It acts as a format converter (HTML to Markdown) to facilitate data portability for research environments.
  • User Responsibility: Users are solely responsible for ensuring their data acquisition complies with the source's Terms of Service and local laws. Use of this tool constitutes agreement that the developer is not liable for any third-party misuse.

FAQ

Why is the price $0.025 for these specific sites?

High-complexity domains like Bloomberg and the Financial Times require significant compute resources to normalize into clean Markdown. We only trigger this premium charge when the full text is successfully retrieved, ensuring you never pay for an incomplete or blocked request.

What if the content cannot be retrieved?

If the Agent encounters a page it cannot normalize to our quality standards, it returns an error field and you are not charged. You only pay for successful, full-text delivery.

Is this safe for real-time monitoring?

Yes. Since there is no "Base Fee," you can schedule this Actor to check for new links frequently. You will only be billed when the Agent successfully delivers a new, full-text article.

Can I extract articles in languages other than English?

Yes. The Agent successfully normalizes French (Le Monde), German (Der Spiegel, Handelsblatt), Japanese (Nikkei Asia, Japan Times), Spanish (El Pais), Italian (Corriere della Sera), Hebrew (Haaretz), and Chinese (SCMP) content. The extraction engine is language-agnostic.

How fast is the extraction?

Most articles are extracted in 2–8 seconds. Some sites with aggressive protection may take 15–40 seconds due to retry logic. The elapsedMs field in the output tells you exactly how long each article took.
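The elapsedMs values can be summarized to see how often a run lands in the slower range. This is a small sketch with a hypothetical `timing_summary` helper; the 15,000 ms cut-off is simply the lower end of the 15–40 second range quoted above, and the 22,000 ms sample value is made up for illustration.

```python
from statistics import median

SLOW_MS = 15_000  # lower bound of the slower range quoted in the FAQ


def timing_summary(items: list) -> dict:
    """Report the count, median time, and number of slow extractions."""
    times = [item["elapsedMs"] for item in items if "elapsedMs" in item]
    if not times:
        return {"count": 0, "median_ms": None, "slow": 0}
    return {
        "count": len(times),
        "median_ms": median(times),
        "slow": sum(t >= SLOW_MS for t in times),
    }


# 3973 and 5090 appear in this document; 22000 is an invented slow case.
results = [{"elapsedMs": 3973}, {"elapsedMs": 5090}, {"elapsedMs": 22000}]
print(timing_summary(results))  # {'count': 3, 'median_ms': 5090, 'slow': 1}
```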