📰 Article Extractor
Pricing
Pay per event
📰 Article Extractor
Extract clean article content with title, author, publish date, images from news and blog pages. Output as text or markdown. Great for media monitoring, content aggregation, AI pipelines.
Extract clean article content with title, author, publish date, images from news and blog pages. Great for media monitoring and AI pipelines.
Store Quickstart
Start with the Quickstart template (3 demo articles). For media monitoring, use News Monitoring with 100 article URLs.
Key Features
- 📰 Article-optimized extraction — Tuned for news, blog, magazine articles
- 👤 Author detection — From byline, JSON-LD, OpenGraph
- 📅 Publish date extraction — From multiple signals (datePublished, time tags, meta)
- 🖼️ Image extraction — Main article image + inline content images
- 🌍 Language detection — Per-article language identification
- 📊 Word count — Reading-time estimation data
Use Cases
| Who | Why |
|---|---|
| Media monitoring firms | Daily bulk extraction of news articles for PR tracking |
| Content aggregators | Build news reader apps with clean content |
| NLP researchers | News corpora for sentiment/topic analysis |
| PR agencies | Track client mentions with extracted article context |
| Financial analysts | Gather news articles about tracked companies |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (required) | Article URLs (max 300) |
| includeImages | boolean | true | Extract article images |
| outputFormat | string | text | text or markdown |
Input Example
{"urls": ["https://news.example.com/story-1", "https://blog.example.com/post"],"includeImages": true,"outputFormat": "markdown"}
Output Example
{"url": "https://news.example.com/story-1","title": "Breaking: Major Announcement","author": "John Smith","publishedDate": "2026-04-05T10:30:00Z","content": "TOKYO — In a surprising move...","images": ["https://news.example.com/hero.jpg"],"siteName": "Example News","language": "en","wordCount": 850}
FAQ
What article formats work best?
Standard news/blog CMSs with JSON-LD or OpenGraph metadata work best. Custom layouts may need fine-tuning.
Does it remove paywalls?
No. Paywalled content returns partial or teaser text only.
Can it handle non-English articles?
Yes. Content extraction is language-agnostic; author/date detection works across locales.
How is this different from website-content-extractor?
This is tuned specifically for news/blog articles with richer metadata (author, pubDate). Use this for articles; website-content-extractor for general pages.
Related Actors
News & Content cluster — explore related Apify tools:
- 📰 Google News Scraper — Scrape Google News articles for any search query via official RSS feed.
- 📄 Website Content Extractor — Extract clean main content from any webpage as text, markdown, or HTML.
- 📡 RSS Feed Aggregator — Aggregate multiple RSS and Atom feeds with keyword filtering and deduplication.
- 📰 Hacker News Scraper — Fetch Hacker News top, new, best, ask, show, job stories via official Firebase API.
Cost
Pay Per Event:
actor-start: $0.01 (flat fee per run)dataset-item: $0.005 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.005) = $5.01
No subscription required — you only pay for what you use.