Pricing

from $8.00 / 1,000 results

Article Content Extractor & Reader Scraper

Extracts article content from news, blog, and press URLs. Returns the article body, byline, publish date, excerpt, and hero image. It strips cookie banners, navigation, and share buttons more aggressively than off-the-shelf readability libraries.

Rating

0.0

(0)

Developer

naoki anzai

Last modified

6 days ago

Article Content Extractor

After this run

To turn this Actor's output into a capped paid report, run Website RAG Readiness Audit Report. It is meant for AI builders, documentation teams, support teams, and technical marketers who need to decide whether public website pages are clean and complete enough for RAG ingestion.

First report: $9 / website_rag_snapshot_report; set maxChargeUsd to $9.
Deeper report: $29 / website_rag_readiness_report; use only when the first result calls for competitor-level or action-level depth.
This is an internal Apify flow aid. It is not revenue proof until accounted paid usage appears.

Content teams, researchers, SEO teams, and AI dataset builders use this actor to turn public article URLs that they supply into a clean dataset for the Site QA & Content Intelligence Pack. Provide focused source inputs, keep the first run small, and expand only once the output shape proves useful. Each emitted row includes source context, timestamps, and fields designed for monitoring, QA, research, or workflow handoff.

Store Quickstart

Start with a small list of article URLs, review body extraction quality, then schedule recurring publisher checks.

Recommended first run:

{
  "urls": [
    "https://example.com/news/example"
  ],
  "includeImages": true,
  "limit": 10,
  "delivery": "dataset",
  "dryRun": false
}
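
The first-run input above could be assembled programmatically along these lines. This is a minimal sketch: the helper name `build_run_input` is hypothetical, the field names are taken from the example above, and the de-duplication and trimming to `limit` are assumptions about a sensible small first run, not documented actor behaviour.

```python
import json


def build_run_input(urls, limit=10, include_images=True, dry_run=False):
    """Assemble a small first-run input (hypothetical helper, not part of the actor)."""
    # Drop blanks and duplicates while keeping the original order.
    unique_urls = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    return {
        "urls": unique_urls[:limit],  # assumption: send no more URLs than the limit
        "includeImages": include_images,
        "limit": limit,
        "delivery": "dataset",
        "dryRun": dry_run,
    }


run_input = build_run_input(["https://example.com/news/example"])
print(json.dumps(run_input, indent=2))
```

Keeping the list short on the first run makes it easy to review body extraction quality by hand before scheduling recurring checks.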

Input examples

Article URLs

{
  "urls": [
    "https://example.com/news/example"
  ],
  "includeImages": true,
  "limit": 10,
  "delivery": "dataset",
  "dryRun": false
}

Press pages

{
  "urls": [
    "https://example.com/press/release"
  ],
  "includeImages": false,
  "limit": 10,
  "delivery": "dataset",
  "dryRun": false
}

Research webhook

{
  "urls": [
    "https://example.com/blog/post"
  ],
  "delivery": "webhook",
  "webhookUrl": "https://example.com/webhook",
  "dryRun": false
}
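
Before submitting any of the inputs above, a quick local check can catch obvious mistakes. The sketch below is illustrative only: `check_run_input` is a hypothetical helper, and the rules it applies (http(s) URLs only, `dataset` as the default delivery, `webhookUrl` required for webhook delivery) are assumptions inferred from the examples, not the actor's documented validation.

```python
from urllib.parse import urlparse


def _is_http_url(value):
    """True when value is a string that parses as an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_run_input(run_input):
    """Return a list of problems found in an input dict (empty list means OK)."""
    problems = []
    urls = run_input.get("urls")
    if not isinstance(urls, list) or not urls:
        problems.append("urls must be a non-empty list")
    else:
        problems += [f"not an http(s) URL: {u!r}" for u in urls if not _is_http_url(u)]
    # Assumption: delivery defaults to "dataset" when omitted.
    delivery = run_input.get("delivery", "dataset")
    # Assumption: webhook delivery needs a target URL.
    if delivery == "webhook" and not _is_http_url(run_input.get("webhookUrl")):
        problems.append("webhookUrl must be an http(s) URL when delivery is 'webhook'")
    return problems
```

Calling `check_run_input` on the "Research webhook" example returns an empty list, while the same input without `webhookUrl` returns one problem.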

Sample output

{
  "meta": {
    "actorName": "article-content-extractor",
    "actorTitle": "Article Content Extractor",
    "bundle": "Site QA & Content Intelligence Pack",
    "fetchedAt": "2026-05-06T00:00:00.000Z",
    "totalRows": 1
  },
  "rows": [
    {
      "actorName": "article-content-extractor",
      "rowType": "article",
      "url": "https://example.com/news/example",
      "headline": "Example Headline",
      "author": "Example Author",
      "publishedAt": "2026-05-06",
      "articleText": "Example article body.",
      "sourceUrl": "https://example.com/news/example",
      "fetchedAt": "2026-05-06T00:00:00.000Z"
    }
  ],
  "warnings": []
}

Output fields

rowType
url
headline
author
publishedAt
articleText
excerpt
heroImage
sourceUrl

Rows also include source URLs, fetch timestamps, warnings when a source is partial, and stable IDs when the workflow supports recurring change detection.
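
For downstream code, the fields listed above can be mapped onto a typed record. The following is a sketch under stated assumptions: `ArticleRow` and `parse_rows` are hypothetical names, every field other than `rowType` and `url` is treated as optional because the sample output omits `excerpt` and `heroImage`, and `heroImage` is assumed to be a URL string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArticleRow:
    """One article row; optionality of fields is an assumption, not a schema."""
    row_type: str
    url: str
    headline: Optional[str] = None
    author: Optional[str] = None
    published_at: Optional[str] = None
    article_text: Optional[str] = None
    excerpt: Optional[str] = None
    hero_image: Optional[str] = None  # assumption: a URL string
    source_url: Optional[str] = None


def parse_rows(payload: dict) -> List[ArticleRow]:
    """Convert the 'rows' array of an output payload into ArticleRow records."""
    return [
        ArticleRow(
            row_type=raw.get("rowType", ""),
            url=raw.get("url", ""),
            headline=raw.get("headline"),
            author=raw.get("author"),
            published_at=raw.get("publishedAt"),
            article_text=raw.get("articleText"),
            excerpt=raw.get("excerpt"),
            hero_image=raw.get("heroImage"),
            source_url=raw.get("sourceUrl"),
        )
        for raw in payload.get("rows", [])
    ]
```

Missing fields become `None` rather than raising, so partial rows that arrive with a warning can still be inspected.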

Pricing and no-change runs

Pricing is $0.001 per actor start plus $0.008 per useful article row. Failed or no-content rows should stay out of the default dataset.

The default dataset is the billable surface. Dry runs, validation-only runs, missing-key warnings, and unchanged recurring polls should not write payable default-dataset rows.
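
The arithmetic behind these prices can be sketched as follows. The two rates come from the text above; the function name `estimate_run_cost` is hypothetical, and charging the start fee on a dry run is an assumption, since the text only says dry runs should not write payable rows.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.001")  # per actor start, from the pricing text
PER_ROW_USD = Decimal("0.008")      # per useful article row, from the pricing text


def estimate_run_cost(useful_rows, dry_run=False):
    """Estimate the charge for one run in USD (illustrative, not a billing API)."""
    # Dry runs should write no payable rows, so only the start fee is counted.
    # Assumption: the start fee still applies to a dry run.
    billable_rows = 0 if dry_run else useful_rows
    return ACTOR_START_USD + PER_ROW_USD * billable_rows


print(estimate_run_cost(10))    # 0.081
print(estimate_run_cost(1000))  # 8.001
```

At 1,000 useful rows the row charges come to $8.00, which matches the "from $8.00 / 1,000 results" headline price, plus the $0.001 start fee.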

Compliance guardrails

Fetch public article pages supplied by the user.
Do not imply content ownership transfer or publisher endorsement.
Use output for research, QA, and internal workflows.
Do not use provider emblems or wording that implies approval by an upstream data provider.

⭐ Was Article Content Extractor & Reader Scraper useful for your article body extraction?

If this actor saved you time, please leave a 5★ rating on Apify Store — it takes 10 seconds, helps other engineers and analysts discover it, and keeps updates free.

Have a feature request, bug, or sample workflow you'd like to share? Open an issue — we read every one and use them to prioritise the next release.
