Article to Text Extractor (for TTS/LLMs)
Pricing
from $1.00 / 1,000 dataset items
Extract the core readable text of any article or blog post, stripping out boilerplate. Perfect for Text-to-Speech or AI summaries.
Developer
Andok
Text-to-Speech Page Reader (Bulk)
Extracts the core readable article text from a list of URLs, stripping out navigation, ads, and sidebars, preparing the content for TTS (Text-to-Speech) pipelines.
What it does
For each input URL, the actor downloads the HTML and uses Mozilla's Readability engine to:
- Extract the main article text (plain text).
- Extract the main title.
- Extract byline/author and excerpt.
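To make the extraction step concrete, here is a rough sketch of what "keep the article, drop the boilerplate" means. It is not Mozilla's Readability and not the actor's code: it is a much simpler stand-in built on Python's standard library that keeps the page title and paragraph text and skips a fixed set of tags. The names SKIP_TAGS, NaiveArticleParser, and extract_article are hypothetical, and byline and excerpt are left out.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as boilerplate in this simplified stand-in.
# Readability itself uses content scoring, not a fixed tag list.
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer", "form"}


class NaiveArticleParser(HTMLParser):
    """Collects <title> and <p> text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._skip_depth = 0   # > 0 while inside a boilerplate element
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract_article(html):
    """Return the title and the paragraph text joined by blank lines."""
    parser = NaiveArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "text": "\n\n".join(parser.paragraphs)}
```

Calling extract_article on a page whose navigation and footer also contain paragraphs returns only the paragraphs outside those elements, which is the behaviour a TTS pipeline wants.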
Typical uses
- Podcast generation: turn blog posts and articles into clean text payloads for TTS APIs (like ElevenLabs or OpenAI TTS).
- Summarization: feed the clean text into an LLM without wasting tokens on HTML boilerplate.
Input
- urls (required): list of article URLs to process.
- timeoutSeconds (default 15)
- concurrency (default 10)
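As an illustration, a run input could be assembled like this. The field names and defaults come from the list above; the helper build_input and its client-side URL check are assumptions added for the sketch, and the actor may validate its input differently.

```python
import json
from urllib.parse import urlparse


def build_input(urls, timeout_seconds=15, concurrency=10):
    """Return a run-input dict using the documented field names and defaults."""
    cleaned = []
    for url in urls:
        url = url.strip()
        parsed = urlparse(url)
        # Assumption: only absolute http(s) URLs are useful to the actor.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        cleaned.append(url)
    if not cleaned:
        raise ValueError("urls is required and must not be empty")
    return {
        "urls": cleaned,
        "timeoutSeconds": timeout_seconds,
        "concurrency": concurrency,
    }


# example.com is a placeholder URL, not a real article.
print(json.dumps(build_input(["https://example.com/blog/post-1"]), indent=2))
```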
Output
Writes one dataset item per input URL containing the clean article text and metadata.
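Many TTS services limit how much text a single request may carry, so the extracted article text usually needs to be split before synthesis. The sketch below shows one way to do that on the consumer side. The function chunk_text and the 1000-character default are hypothetical; the real limit depends on the TTS provider, and this step is not part of the actor.

```python
import re


def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The next sentence does not fit, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent to the TTS API as its own request and the audio segments concatenated in order.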
Monetization + safety
This actor is designed for Pay-Per-Event pricing (one dataset item = one unit of work) and respects the per-run maximum charge limit.
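A quick way to budget a run is sketched below. It assumes the listed "from $1.00 / 1,000 dataset items" price applies as-is, that each input URL yields exactly one charged item, and that the run simply stops charging once the limit is reached. The function estimate_run is hypothetical, and actual billing may differ.

```python
from decimal import Decimal

# Listed "from" price; the price actually charged may be different.
PRICE_PER_ITEM = Decimal("1.00") / Decimal("1000")


def estimate_run(url_count, max_charge_usd):
    """Return (items_covered, cost_usd) for a run capped at max_charge_usd."""
    # Number of whole items the charge limit can pay for.
    affordable = int(Decimal(str(max_charge_usd)) / PRICE_PER_ITEM)
    items = min(url_count, affordable)
    return items, items * PRICE_PER_ITEM
```

For example, estimate_run(5000, 2) indicates that a $2 limit covers 2,000 of 5,000 URLs under these assumptions, so the remaining URLs would need a second run or a higher limit.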