Pricing

from $10.00 / 1,000 successful research extractions

Universal News Article Intelligence Agent

High-fidelity news normalization for AI & Agentic RAG. Extract clean Markdown, full text, and metadata from premium domains (Bloomberg, Wall Street Journal, Financial Times, New York Times, Washington Post, etc.). Success-only billing: you pay only when the full text is verified.

Rating: 5.0 (11)

Developer: WorkHard3000

Issues response: 2.1 days

Last modified: 14 days ago

Universal News Article Intelligence Agent — High-Fidelity RAG Content Connector

Retrieve structured metadata and normalized full-text content from high-complexity global news domains. Optimized for LLMs, Agentic RAG, market research pipelines, and automated intelligence.

What does this Agent do?

This Actor is a professional-grade Content Normalization Agent designed to bridge the gap between complex web architectures and AI systems. It transforms unstructured data from premium financial and global news domains into clean, standardized Markdown, ready for immediate use in RAG (Retrieval-Augmented Generation) pipelines and LLMs.

Using a proprietary multi-step extraction engine, this Agent ensures that you receive the full research-grade text required for deep analysis, rather than the truncated snippets or "Subscription Required" notices returned by standard scrapers.

Input: A list of article URLs (one or many).

Output: Structured JSON with title, author, date, full text, cleaned Markdown, and high-resolution metadata.

Success-Only Pricing (Verified Research Extraction)

We operate on a Quality-First billing model. You are only billed when we successfully deliver research-ready data. Higher Apify subscription plans unlock progressively lower per-extraction rates.

Scenario	FREE	BRONZE	SILVER	GOLD
Verified Research Extraction (Full text, 500+ chars)	$0.025	$0.020	$0.015	$0.010
Per 1,000 extractions	$25.00	$20.00	$15.00	$10.00
Incomplete Retrieval (blocked or snippet)	$0.00	$0.00	$0.00	$0.00
Insufficient Content (Under 500 characters)	$0.00	$0.00	$0.00	$0.00

Discount tiers are determined by your Apify subscription plan: Free ($0/mo), Starter/BRONZE ($29/mo), Scale/SILVER ($199/mo), Business/GOLD ($999/mo).

The Math of Value: Standard "pay-per-result" tools charge their full markup for every item, even if it's a 403 error or a paywall snippet. With this Agent, if the extraction is not successful, you pay $0.00 in Actor fees, incurring only the nominal Apify platform usage cost for the compute time (typically less than a penny).
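As a rough illustration of the billing model above, the following Python sketch estimates Actor fees for a batch. The rates are copied from the pricing table; the function names `estimate_actor_fees` and `savings_vs_free` are hypothetical helpers, and Apify platform usage costs are deliberately left out because the page does not give a figure for them.

```python
from decimal import Decimal

# Per-extraction Actor rates in USD, copied from the pricing table above.
RATES = {
    "FREE": Decimal("0.025"),
    "BRONZE": Decimal("0.020"),
    "SILVER": Decimal("0.015"),
    "GOLD": Decimal("0.010"),
}


def estimate_actor_fees(successful: int, failed: int, tier: str) -> Decimal:
    """Estimate Actor fees only (platform compute cost is not included).

    Failed, blocked, or under-500-character results are billed at $0.00,
    so `failed` is accepted for clarity but adds nothing to the total.
    """
    if successful < 0 or failed < 0:
        raise ValueError("counts must be non-negative")
    return RATES[tier.upper()] * successful


def savings_vs_free(tier: str) -> Decimal:
    """Percentage saved per extraction relative to the FREE rate."""
    free = RATES["FREE"]
    return (free - RATES[tier.upper()]) / free * 100
```

For example, `estimate_actor_fees(1000, 250, "GOLD")` evaluates to 10 USD: the 250 unsuccessful URLs do not change the Actor fee.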

Strategic Capabilities

High-Fidelity Content Retrieval: Optimized for high-complexity research domains (Bloomberg, WSJ, Financial Times, The Economist, NYT, and more).
AI-Ready Markdown: Automatically normalizes content by removing non-essential elements (ads, nav-bars, scripts), reducing LLM token consumption by up to 80%.
Market Intelligence Ready: Parses structured metadata (Byline, ISO Date, Featured Images) for immediate database ingestion.
Real-Time Stream Support: Results are pushed to the dataset as they complete, making it ideal for 24/7 monitoring pipelines.
Automated Resilience: Advanced internal logic handles difficult-to-render architectures to ensure consistent delivery.

Enterprise Use Cases

Financial Intelligence & Quantitative Analysis

Feed high-fidelity market news directly into sentiment models or trading algorithms. Monitor global financial publications with zero maintenance overhead.

RAG & Knowledge Base Construction

Build a high-quality "News Memory" for AI Agents. Our clean Markdown output ensures your vector database contains only the core analysis, saving costs and improving accuracy.

Competitive Intelligence

Track industry shifts across multiple premium publications with a single API key. Standardize all sources into one unified JSON schema for cross-platform comparison.

What data can you extract?

Field	Description	Example
`url`	Original article URL	`https://www.bloomberg.com/news/articles/...`
`title`	Article headline	`"What to Watch as China's Leaders Hash Out Plan"`
`domain`	Source domain	`bloomberg.com`
`byline`	Author name(s)	`"Jennifer Schuessler"`
`publishedDate`	ISO 8601 publication date	`"2026-03-07T10:03:00.000Z"`
`text`	Full article as clean plain text	`"The National Endowment for the Humanities..."`
`markdown`	Full article as Markdown	`"# Article Title\n\nFull text here..."`
`excerpt`	Article summary/description	`"The agency used AI to flag grants..."`
`image`	Featured/OG image URL	`"https://static01.nyt.com/images/..."`
`siteName`	Publication name	`"bloomberg.com"`
`extractedAt`	ISO 8601 extraction timestamp	`"2026-03-08T15:30:00.000Z"`
`elapsedMs`	Extraction time in milliseconds	`5090`

Verified High-Complexity Research Domains

This Agent features specialized extraction logic for the following global institutions (tested March 2026). It also supports hundreds of additional news domains via its universal normalization engine.

Financial & Market Intelligence: Bloomberg, Wall Street Journal (WSJ), Financial Times (FT), Australian Financial Review, Handelsblatt.

Global Policy & Analysis: The Economist, New York Times, Washington Post, Foreign Affairs, Politico, The Hill.

Innovation & Strategy: Wired, MIT Technology Review, Harvard Business Review, Fortune, Time.

International Perspectives: Le Monde, Der Spiegel, Nikkei Asia, South China Morning Post, Japan Times, The Straits Times, El Pais, Corriere della Sera, Haaretz, Irish Times.

Commonwealth & UK: The Telegraph, The Times, The Guardian, The Independent, New Statesman, The Australian, Globe and Mail.

US Regional & Culture: Los Angeles Times, Chicago Tribune, Boston Globe, SF Chronicle, Seattle Times, The Atlantic, The New Yorker, Vanity Fair, Business Insider, Salon, Slate, The Daily Beast.

How to Use

Input URLs: Paste your target research links into the articleUrls field.
Execute: Click Start. The Agent will begin high-fidelity extraction.
Export: Download your data in JSON, CSV, or feed it via Webhook to your AI pipeline.

API Implementation

curl -X POST "https://api.apify.com/v2/acts/workhard3000~news-intelligence-rag-extractor/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"articleUrls": ["https://www.bloomberg.com/news/articles/..."]}'
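
The same request can be sketched in Python using only the standard library. The endpoint, Actor ID, and `articleUrls` field are taken from the curl command above; the helper names `build_run_request` and `start_run` are hypothetical, and the shape of the response body is not documented on this page, so it is returned as parsed JSON without further interpretation.

```python
import json
import urllib.parse
import urllib.request

# Actor ID exactly as it appears in the curl example above.
ACTOR_ID = "workhard3000~news-intelligence-rag-extractor"


def build_run_request(token: str, article_urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps({"articleUrls": article_urls}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(token: str, article_urls: list[str]) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_run_request(token, article_urls)) as resp:
        return json.load(resp)
```

Separating request construction from sending makes it possible to inspect or log the payload before spending a run.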

Output Example

{
  "url": "https://www.bloomberg.com/news/articles/2026-03-03/what-to-watch-as-china-s-leaders-hash-out-plan-for-economic-path",
  "domain": "bloomberg.com",
  "title": "What to Watch as China's Leaders Hash Out Plan for Economic Path",
  "text": "China's annual legislative meetings are set to kick off this week...",
  "markdown": "# What to Watch as China's Leaders Hash Out Plan\n\nChina's annual legislative meetings...",
  "excerpt": "The National People's Congress opens amid uncertainty over trade tensions.",
  "byline": "Bloomberg News",
  "publishedDate": "2026-03-03T08:00:00.000Z",
  "image": "https://assets.bwbx.io/images/...",
  "siteName": "bloomberg.com",
  "extractedAt": "2026-03-08T15:30:00.000Z",
  "elapsedMs": 3973
}
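
A record like the one above can be loaded into a typed structure before it enters a pipeline. This is a minimal sketch: the `Article` class and `to_article` helper are hypothetical, and treating `byline` and `publishedDate` as optional is an assumption, since the page does not say which fields are always present.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-03-03T08:00:00.000Z."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Article:
    url: str
    domain: str
    title: str
    markdown: str
    byline: Optional[str]
    published: Optional[datetime]
    elapsed_ms: int


def to_article(record: dict) -> Article:
    """Convert one dataset record into an Article."""
    published = record.get("publishedDate")
    return Article(
        url=record["url"],
        domain=record["domain"],
        title=record["title"],
        markdown=record["markdown"],
        byline=record.get("byline"),  # assumed optional
        published=parse_iso(published) if published else None,
        elapsed_ms=int(record.get("elapsedMs", 0)),
    )
```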

Input Parameters

Parameter	Type	Default	Description
`articleUrls`	Array of strings	required	List of article URLs to extract
`autoArchive`	Boolean	`true`	Try web archives as a last resort if direct extraction fails
`maxRetries`	Integer	`3`	Number of retry attempts per URL (1–10)
`proxyConfiguration`	Object	Residential	Proxy settings — residential proxies are used by default
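
The parameters above can be checked client-side before a run is started. The sketch below uses a hypothetical `build_input` helper; it enforces the documented 1–10 range for `maxRetries`, assumes an empty URL list is not useful, and omits `proxyConfiguration` so that the documented residential default applies.

```python
from urllib.parse import urlparse


def build_input(article_urls, auto_archive=True, max_retries=3):
    """Build the Actor input dict, validating values against the table above."""
    if not article_urls:
        # Assumption: "required" also means the list must not be empty.
        raise ValueError("articleUrls is required and must not be empty")
    for url in article_urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not 1 <= max_retries <= 10:
        raise ValueError("maxRetries must be between 1 and 10")
    # proxyConfiguration is left out on purpose so the default is used.
    return {
        "articleUrls": list(article_urls),
        "autoArchive": auto_archive,
        "maxRetries": max_retries,
    }
```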

Integrations

Results are available via the Apify API and can be connected to:

Webhooks — trigger downstream processing when a run completes
Google Sheets — export results directly to a spreadsheet
Slack / Email — get notifications with extracted article summaries
Zapier / Make — connect to 5,000+ apps
Amazon S3 / Google Cloud Storage — store results in your cloud bucket
Custom API — fetch results programmatically via the Apify dataset API
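
For the Custom API option, results are read from the run's dataset. The sketch below only builds the URL for the Apify dataset items endpoint; `dataset_items_url` is a hypothetical helper, the dataset ID must come from your own run, and the format check is limited to the two export formats this page mentions.

```python
from urllib.parse import urlencode

# Export formats named on this page; the API may accept others.
SUPPORTED_FORMATS = ("json", "csv")


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Return the URL for downloading a dataset's items."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt, "token": token})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```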

Compliance & Legal Disclaimer

Research Intent: This tool is a technical instrument intended for authorized academic research, internal data analysis, and interoperability testing between web formats and AI systems.
Content Neutrality: This Actor does not host, cache, or redistribute copyrighted material. It acts as a format converter (HTML to Markdown) to facilitate data portability for research environments.
User Responsibility: Users are solely responsible for ensuring their data acquisition complies with the source's Terms of Service and local laws. Use of this tool constitutes agreement that the developer is not liable for any third-party misuse.

FAQ

How does the tiered pricing work?

High-complexity domains like Bloomberg and the Financial Times require significant compute resources to normalize into clean Markdown. We only charge when the full text is successfully retrieved, ensuring you never pay for an incomplete or blocked request. Your per-extraction rate is determined by your Apify subscription plan: FREE ($0.025), BRONZE ($0.020), SILVER ($0.015), or GOLD ($0.010). Higher plans unlock up to 60% savings.

What if the content cannot be retrieved?

If the Agent encounters a page it cannot normalize to our quality standards, it returns an error field and you are not charged. You only pay for successful, full-text delivery.
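
Downstream code may want to separate delivered articles from failures before ingestion. The following sketch makes two assumptions that this page does not confirm: that the error field is named `error`, and that the 500-character threshold from the pricing table is measured on the `text` field. The `split_results` helper is hypothetical.

```python
MIN_CHARS = 500  # threshold from the pricing table


def split_results(items):
    """Split dataset items into (delivered, failed) lists."""
    delivered, failed = [], []
    for item in items:
        text = item.get("text") or ""
        # Assumed failure signals: an `error` key or text under the threshold.
        if item.get("error") or len(text) < MIN_CHARS:
            failed.append(item)
        else:
            delivered.append(item)
    return delivered, failed
```

The failed list can then be logged or queued for a later retry without affecting what is written to the vector store.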

Is this safe for real-time monitoring?

Yes. Since there is no "Base Fee," you can schedule this Actor to check for new links frequently. You will only be billed when the Agent successfully delivers a new, full-text article.

Can I extract articles in languages other than English?

Yes. The Agent successfully normalizes French (Le Monde), German (Der Spiegel, Handelsblatt), Japanese (Nikkei Asia, Japan Times), Spanish (El Pais), Italian (Corriere della Sera), Hebrew (Haaretz), and Chinese (SCMP) content. The extraction engine is language-agnostic.

How fast is the extraction?

Most articles are extracted in 2–8 seconds. Some sites with aggressive protection may take 15–40 seconds due to retry logic. The elapsedMs field in the output tells you exactly how long each article took.
