📰 Article Extractor

Pricing

Pay per event

Extract clean article content with title, author, publish date, and images from news and blog pages. Output as text or markdown. Great for media monitoring, content aggregation, and AI pipelines.

Rating

0.0

(0)

Developer

太郎 山田

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 2
  • Last modified: 3 days ago

Store Quickstart

Start with the Quickstart template (3 demo articles). For media monitoring, use News Monitoring with 100 article URLs.

Key Features

  • 📰 Article-optimized extraction — Tuned for news, blog, magazine articles
  • 👤 Author detection — From byline, JSON-LD, OpenGraph
  • 📅 Publish date extraction — From multiple signals (datePublished, time tags, meta)
  • 🖼️ Image extraction — Main article image + inline content images
  • 🌍 Language detection — Per-article language identification
  • 📊 Word count — Reading-time estimation data

Use Cases

| Who | Why |
|---|---|
| Media monitoring firms | Daily bulk extraction of news articles for PR tracking |
| Content aggregators | Build news reader apps with clean content |
| NLP researchers | News corpora for sentiment/topic analysis |
| PR agencies | Track client mentions with extracted article context |
| Financial analysts | Gather news articles about tracked companies |

Input

| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (required) | Article URLs (max 300) |
| includeImages | boolean | true | Extract article images |
| outputFormat | string | text | text or markdown |

Input Example

{
  "urls": ["https://news.example.com/story-1", "https://blog.example.com/post"],
  "includeImages": true,
  "outputFormat": "markdown"
}
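
The limits in the Input table can be checked before a run is started. The following is a minimal sketch and not part of the Actor: `build_run_inputs`, `MAX_URLS_PER_RUN`, and `ALLOWED_FORMATS` are hypothetical names, and splitting longer lists into batches of 300 is only an assumption about how one might stay within the documented maximum.

```python
MAX_URLS_PER_RUN = 300  # "max 300" from the Input table
ALLOWED_FORMATS = {"text", "markdown"}  # allowed outputFormat values


def build_run_inputs(urls, include_images=True, output_format="text"):
    """Return a list of input dicts, each with at most MAX_URLS_PER_RUN URLs.

    Hypothetical helper: validates the documented fields and batches the URLs.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")

    # Strip whitespace, drop empty entries, and de-duplicate while keeping order.
    cleaned = [u.strip() for u in urls if u and u.strip()]
    unique = list(dict.fromkeys(cleaned))
    if not unique:
        raise ValueError("urls is required and must contain at least one URL")

    return [
        {
            "urls": unique[i:i + MAX_URLS_PER_RUN],
            "includeImages": include_images,
            "outputFormat": output_format,
        }
        for i in range(0, len(unique), MAX_URLS_PER_RUN)
    ]


if __name__ == "__main__":
    batches = build_run_inputs(
        ["https://news.example.com/story-1", "https://blog.example.com/post"],
        output_format="markdown",
    )
    print(batches)
```

Each batch would be submitted as its own run, so the flat actor-start fee listed under Cost would apply once per batch.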

Output Example

{
  "url": "https://news.example.com/story-1",
  "title": "Breaking: Major Announcement",
  "author": "John Smith",
  "publishedDate": "2026-04-05T10:30:00Z",
  "content": "TOKYO — In a surprising move...",
  "images": ["https://news.example.com/hero.jpg"],
  "siteName": "Example News",
  "language": "en",
  "wordCount": 850
}
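
An output item like the one above can be post-processed with the standard library alone. The sketch below is illustrative: `summarize_item` is a hypothetical helper, the 200 words-per-minute reading speed is an assumed value (the Actor only reports `wordCount`), and it assumes `publishedDate` is an ISO 8601 string as in the example.

```python
from datetime import datetime

WORDS_PER_MINUTE = 200  # assumed reading speed, not specified by the Actor


def summarize_item(item):
    """Derive a parsed date, reading time, and image count from one output item."""
    raw_date = item.get("publishedDate")
    # Replace the trailing "Z" so fromisoformat also works on older Python versions.
    published = (
        datetime.fromisoformat(raw_date.replace("Z", "+00:00")) if raw_date else None
    )

    words = item.get("wordCount") or 0
    # Ceiling division: any partial minute counts as a full minute.
    reading_minutes = -(-words // WORDS_PER_MINUTE)

    return {
        "title": item.get("title"),
        "author": item.get("author"),
        "published": published,
        "reading_minutes": reading_minutes,
        "image_count": len(item.get("images") or []),
    }


if __name__ == "__main__":
    sample = {
        "title": "Breaking: Major Announcement",
        "author": "John Smith",
        "publishedDate": "2026-04-05T10:30:00Z",
        "images": ["https://news.example.com/hero.jpg"],
        "wordCount": 850,
    }
    print(summarize_item(sample))
```

Missing fields are tolerated because author or date detection may not succeed on every page.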

FAQ

What article formats work best?

Standard news/blog CMSs with JSON-LD or OpenGraph metadata work best. Custom layouts may need fine-tuning.

Does it remove paywalls?

No. Paywalled content returns partial or teaser text only.

Can it handle non-English articles?

Yes. Content extraction is language-agnostic; author/date detection works across locales.

How is this different from website-content-extractor?

This Actor is tuned specifically for news and blog articles and returns richer metadata (author, publishedDate). Use it for articles and website-content-extractor for general pages.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.005 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.005) = $5.01

No subscription required — you only pay for what you use.
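
The cost example above can be reproduced as a small calculation. This is a sketch using the two fees listed in this section; `estimate_cost` and its `runs` parameter are hypothetical names, and `Decimal` is used only to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run (actor-start)
DATASET_ITEM_FEE = Decimal("0.005")  # fee per output item (dataset-item)


def estimate_cost(items, runs=1):
    """Estimate the pay-per-event cost in USD for a number of items and runs."""
    if items < 0 or runs < 1:
        raise ValueError("items must be >= 0 and runs must be >= 1")
    return runs * ACTOR_START_FEE + items * DATASET_ITEM_FEE


if __name__ == "__main__":
    # Matches the example: $0.01 + (1,000 x $0.005) = $5.01
    print(f"${estimate_cost(1000):.2f}")
```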