Pricing

from $1.00 / 1,000 articles extracted

Article to Text Extractor (for TTS/LLMs)

Extract the core readable text of any article or blog post, stripping out boilerplate. Perfect for Text-to-Speech or AI summaries.

Developer

Andok

Last modified

4 months ago

Article Text Extractor for TTS & AI

Extract clean, readable article text from any web page, stripped of navigation, ads, and boilerplate. Feed the output directly into text-to-speech engines, summarization models, or LLM pipelines without wasting tokens on HTML noise. Bulk-process hundreds of URLs in parallel.

Features

Readability engine — uses Mozilla Readability to isolate the main article content from page clutter
Plain text output — returns clean text ready for TTS APIs like ElevenLabs or OpenAI TTS
Bulk processing — extract articles from hundreds of URLs in a single run
Metadata extraction — captures title, author byline, and excerpt alongside the article text
Redirect tracking — follows HTTP redirects and records the final URL
Configurable concurrency — process 1 to 50 URLs in parallel
Backwards compatible — accepts both a `urls` array and a single `url` field

Input

Field	Type	Required	Default	Description
`urls`	`array`	No	—	List of webpage URLs to extract article text from
`url`	`string`	No	—	Single URL for backwards compatibility (use `urls` for bulk)
`timeoutSeconds`	`integer`	No	`15`	Maximum seconds to wait for each URL response
`concurrency`	`integer`	No	`10`	Number of URLs to process in parallel (1-50)

Input Example

{
  "urls": [
    "https://crawlee.dev",
    "https://blog.apify.com/what-is-web-scraping/"
  ],
  "timeoutSeconds": 15,
  "concurrency": 10
}
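
The documented input constraints can be checked on the client side before a run is started. The following is a minimal sketch and not part of the actor: `build_input` is a hypothetical helper, and rejecting out-of-range values, requiring at least one URL, and requiring a positive timeout are assumptions about sensible client-side behavior rather than documented actor behavior.

```python
from typing import Any


def build_input(urls: list[str], timeout_seconds: int = 15, concurrency: int = 10) -> dict[str, Any]:
    """Build a run input dict using the documented field names (hypothetical helper)."""
    # Drop blank entries and duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(u.strip() for u in urls if u and u.strip()))
    if not cleaned:
        # Assumption: a run with no URLs is not useful, so fail early.
        raise ValueError("at least one URL is required")
    # The documented range for concurrency is 1-50.
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")
    # Assumption: a zero or negative timeout is never intended.
    if timeout_seconds <= 0:
        raise ValueError("timeoutSeconds must be positive")
    return {"urls": cleaned, "timeoutSeconds": timeout_seconds, "concurrency": concurrency}


payload = build_input(["https://crawlee.dev", "https://blog.apify.com/what-is-web-scraping/"])
```

With the defaults, `payload` has the same shape as the input example above and can be serialized with `json.dumps` before being passed to a run.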

Output

Each URL produces one dataset item containing the extracted plain text and metadata.

Key output fields:

inputUrl (string) — the original URL provided
finalUrl (string) — the URL after following redirects
status (number) — HTTP status code
pageTitle (string) — extracted article title
byline (string) — author name if available
excerpt (string) — short summary of the article
textContent (string) — the full article text, cleaned and ready for TTS or AI processing
error (string) — error message if extraction failed, otherwise null
checkedAt (string) — ISO 8601 timestamp of when the extraction was performed
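
Because every URL yields an item even when extraction fails, downstream code usually needs to separate usable items from failed ones. The sketch below relies only on the `error` and `textContent` fields listed above; `split_results` is a hypothetical helper, and treating whitespace-only text as a failure is an assumption.

```python
from typing import Any


def split_results(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate usable dataset items from failed ones (hypothetical helper)."""
    ok: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        text = (item.get("textContent") or "").strip()
        # Usable means: `error` is null and some non-blank text was extracted.
        if item.get("error") is None and text:
            ok.append(item)
        else:
            failed.append(item)
    return ok, failed
```

The failed list can be logged or retried, while only the first list is passed on to a TTS or LLM step.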

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "finalUrl": "https://crawlee.dev/",
  "status": 200,
  "pageTitle": "Crawlee - Build reliable crawlers. Fast.",
  "byline": null,
  "excerpt": "Crawlee is a web scraping and browser automation library for Node.js.",
  "textContent": "Crawlee\n\nBuild reliable crawlers. Fast.\n\nCrawlee is a web scraping and browser automation library that helps you build reliable crawlers...",
  "error": null,
  "checkedAt": "2025-01-15T10:30:00.000Z"
}
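
TTS APIs and LLM context windows typically cap the size of a single request, so long `textContent` values often need to be split first. The sketch below splits on the blank-line paragraph breaks visible in the example above. `chunk_text` is a hypothetical helper, and the right `max_chars` value depends on the downstream service, so it is left to the caller.

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text on paragraph breaks into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that is longer than the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole paragraphs together where possible tends to give more natural pauses in synthesized speech than cutting at a fixed character offset.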

Pricing

Event	Cost
Article Extracted	Pay-per-event (see actor pricing page)

The actor respects the per-run max charge limit. Processing stops automatically when the spending cap is reached.
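
To choose a spending cap, it helps to estimate how many extractions a given budget covers. The sketch below is simple arithmetic under stated assumptions: it uses the listed starting rate of $1.00 per 1,000 articles extracted, charges one event per article, and ignores any other costs. The actual rate may differ, so check the actor pricing page. `max_articles_for_budget` is a hypothetical helper.

```python
from decimal import Decimal


def max_articles_for_budget(budget_usd: str, price_per_1000_usd: str = "1.00") -> int:
    """Estimate how many 'Article Extracted' events fit under a spending cap."""
    # Decimal avoids floating-point rounding surprises with currency amounts.
    budget = Decimal(budget_usd)
    price = Decimal(price_per_1000_usd)
    if budget < 0 or price <= 0:
        raise ValueError("budget must be non-negative and price must be positive")
    # Floor division: a partially paid article does not count.
    return int(budget * 1000 // price)


print(max_articles_for_budget("5.00"))  # 5000 at the assumed $1.00 / 1,000 rate
```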

Use Cases

Podcast generation — turn blog posts and news articles into clean text payloads for TTS APIs
LLM summarization — feed distraction-free article text into GPT, Claude, or other models
Content monitoring — track article changes over time with clean text snapshots
Accessibility tools — extract readable text for screen readers and assistive technology
Newsletter curation — pull article text from multiple sources for digest generation
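
For the content monitoring use case, one simple approach is to store a hash of each article's text and compare it on the next run. This is a minimal sketch rather than a feature of the actor: `text_fingerprint` and `changed_urls` are hypothetical helpers, and normalizing whitespace before hashing is an assumption meant to ignore cosmetic spacing changes.

```python
import hashlib
from typing import Any


def text_fingerprint(text: str) -> str:
    """Hash whitespace-normalized text so spacing-only changes are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_urls(previous: dict[str, str], items: list[dict[str, Any]]) -> list[str]:
    """Return inputUrl values whose text differs from the stored fingerprint."""
    changed: list[str] = []
    for item in items:
        # Skip failed extractions instead of reporting them as changes.
        if item.get("error") is not None or not item.get("textContent"):
            continue
        fingerprint = text_fingerprint(item["textContent"])
        # URLs not seen before have no stored fingerprint and count as changed.
        if previous.get(item["inputUrl"]) != fingerprint:
            changed.append(item["inputUrl"])
    return changed
```

Here `previous` maps each `inputUrl` to the fingerprint saved from an earlier run. Where it is persisted (a file, a key-value store, a database) is left open.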

Actor	What it adds
Web Page to Markdown Converter for LLMs	Markdown-formatted output with heading structure preserved
PDF to Text Converter for AI & RAG	Extend text extraction to PDF documents
RSS Feed Parser & Reader	Discover article URLs automatically from RSS feeds
