Webpage Text Extractor (Readability)

Pricing

$30.00 / 1,000 pages extracted

Extract the clean main article content of any URL as plain text and markdown, with nav, ads, and footers stripped. The reader API that agents need for RAG.

Developer

Anthony Snider

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Turn any article URL into clean main-content text and markdown — nav, ads, sidebars, and footers stripped. The reader your AI agent needs for RAG, summarization, and content pipelines. No API key, pay per page.

▶ Live on the Apify Store — run it instantly, or call it as an agent tool via Apify MCP.

Why

LLM agents waste tokens on boilerplate. This returns just the readable article — as portable markdown (absolute links/images) and plain text — plus word count, reading time, and a readability score.

What you get (per page)

  • markdown — clean GitHub-flavored markdown of the main content
  • text — plain readable text
  • title, byline, publishedAt, lang, excerpt
  • wordCount, readingTimeMin, fleschReadingEase
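To make the three numeric fields concrete, here is a minimal sketch of how such metrics can be computed from plain text. It is not the Actor's actual code: the syllable counter is a rough vowel-group estimate, and the 200 words-per-minute reading speed is an assumption (it happens to agree with the sample output, where 1240 words gives 6 minutes). Only the Flesch Reading Ease formula itself is standard.

```python
import re

# Assumed reading speed; the Actor's real value is not documented.
WORDS_PER_MINUTE = 200


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent when the word has another vowel group.
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_metrics(text: str) -> dict:
    """Return word count, reading time, and Flesch Reading Ease for text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_count = len(words)
    if word_count == 0 or not sentences:
        return {"wordCount": 0, "readingTimeMin": 0, "fleschReadingEase": None}
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch Reading Ease formula.
    score = (
        206.835
        - 1.015 * (word_count / len(sentences))
        - 84.6 * (syllables / word_count)
    )
    return {
        "wordCount": word_count,
        "readingTimeMin": max(1, round(word_count / WORDS_PER_MINUTE)),
        "fleschReadingEase": round(score, 1),
    }
```

Higher Flesch scores mean easier text, so short sentences made of short words score highest.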

Input

{ "url": "https://example.com/some-article", "outputFormat": "both" }

or bulk:

{ "urls": ["https://a.com/post", "https://b.com/post"], "maxUrls": 25 }
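The two input shapes above can be produced from one helper. The sketch below is illustrative only: `build_run_input` is a hypothetical name, the URL validation is an added convenience, and passing `outputFormat` alongside `urls` is an assumption, since the listing only shows it with a single `url`.

```python
from typing import Optional, Sequence


def build_run_input(
    urls: Sequence[str],
    output_format: Optional[str] = None,
    max_urls: Optional[int] = None,
) -> dict:
    """Build the single-URL input for one URL, the bulk input otherwise."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    for u in cleaned:
        if not u.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {u!r}")

    if len(cleaned) == 1:
        run_input = {"url": cleaned[0]}
    else:
        run_input = {"urls": cleaned}
        if max_urls is not None:
            run_input["maxUrls"] = max_urls

    # Assumption: outputFormat is accepted in both input shapes.
    if output_format is not None:
        run_input["outputFormat"] = output_format
    return run_input
```

The returned dictionary can be serialized with `json.dumps` and supplied as the run input through whichever Apify client or console you use.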

Output

{
  "url": "https://example.com/some-article",
  "title": "How web scraping works",
  "byline": "Jane Doe",
  "lang": "en",
  "markdown": "# How web scraping works\n\nWeb scraping is ...",
  "text": "How web scraping works. Web scraping is ...",
  "wordCount": 1240,
  "readingTimeMin": 6,
  "fleschReadingEase": 58.2
}
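When consuming records like the one above, it may help to load them into a typed structure. The sketch below assumes that every field except `url` can be absent (the sample output, for instance, omits `publishedAt` and `excerpt`) and that the types are as the sample suggests. `ExtractedPage` is a hypothetical name, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedPage:
    """One output record; field optionality and types are assumptions."""

    url: str
    title: Optional[str] = None
    byline: Optional[str] = None
    published_at: Optional[str] = None
    lang: Optional[str] = None
    excerpt: Optional[str] = None
    markdown: Optional[str] = None
    text: Optional[str] = None
    word_count: Optional[int] = None
    reading_time_min: Optional[int] = None
    flesch_reading_ease: Optional[float] = None

    @classmethod
    def from_record(cls, record: dict) -> "ExtractedPage":
        # Missing keys become None instead of raising KeyError.
        return cls(
            url=record["url"],
            title=record.get("title"),
            byline=record.get("byline"),
            published_at=record.get("publishedAt"),
            lang=record.get("lang"),
            excerpt=record.get("excerpt"),
            markdown=record.get("markdown"),
            text=record.get("text"),
            word_count=record.get("wordCount"),
            reading_time_min=record.get("readingTimeMin"),
            flesch_reading_ease=record.get("fleschReadingEase"),
        )
```

For example, `ExtractedPage.from_record(json.loads(line))` would turn one JSON line of dataset output into an object whose `markdown` field can be passed on to a chunker or summarizer.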

Notes

The extractor uses a readability heuristic (semantic containers plus text-density scoring) instead of a headless browser, so it is fast and cheap and works on most articles and blogs. It returns only the public content of the URL you provide.
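As an illustration of the general idea, and not the Actor's actual implementation, the sketch below uses only Python's standard `html.parser`. It ignores text inside boilerplate containers, gives semantic containers a bonus, and scores each candidate by its text length discounted by link density, a simplified stand-in for text-density scoring. The tag sets and weights are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Assumed tag sets and weights, chosen for illustration only.
SKIP = {"nav", "header", "footer", "aside", "script", "style", "form"}
CANDIDATES = {"article": 2.0, "main": 1.5, "section": 1.0, "div": 1.0}
VOID = {"br", "img", "hr", "input", "meta", "link", "source", "wbr"}


class _Node:
    def __init__(self, tag):
        self.tag = tag
        self.text = []       # text chunks in this element and its children
        self.chars = 0       # total text characters
        self.link_chars = 0  # text characters that sit inside <a> elements


class MainContentParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []          # currently open elements
        self.skip_depth = 0      # > 0 while inside a boilerplate container
        self.link_depth = 0      # > 0 while inside a link
        self.best = (0.0, "")    # (score, text) of the best candidate so far

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if tag in SKIP:
            self.skip_depth += 1
        if tag == "a":
            self.link_depth += 1
        self.stack.append(_Node(tag))

    def handle_endtag(self, tag):
        if tag in VOID or not any(n.tag == tag for n in self.stack):
            return  # ignore stray end tags
        # Pop until the matching element, tolerating unclosed children.
        while self.stack:
            node = self.stack.pop()
            self.close_node(node)
            if node.tag == tag:
                break

    def handle_data(self, data):
        chunk = " ".join(data.split())
        if not chunk or self.skip_depth or not self.stack:
            return
        node = self.stack[-1]
        node.text.append(chunk)
        node.chars += len(chunk)
        if self.link_depth:
            node.link_chars += len(chunk)

    def close_node(self, node):
        if node.tag in SKIP:
            self.skip_depth -= 1
        if node.tag == "a":
            self.link_depth -= 1
        if node.tag in CANDIDATES and self.skip_depth == 0 and node.chars:
            link_density = node.link_chars / node.chars
            score = CANDIDATES[node.tag] * node.chars * (1 - link_density)
            if score > self.best[0]:
                self.best = (score, " ".join(node.text))
        if self.stack:  # roll the child's totals up into its parent
            parent = self.stack[-1]
            parent.text.extend(node.text)
            parent.chars += node.chars
            parent.link_chars += node.link_chars


def extract_main_text(html: str) -> str:
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    while parser.stack:  # close anything left open at end of input
        parser.close_node(parser.stack.pop())
    return parser.best[1]
```

Because nothing is rendered, a scorer like this only sees the HTML the server returns; pages that build their article body with client-side JavaScript would need a different approach.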