Agent Ready Data Cleaner
Clean and token-optimize HTML, JSON, scraped text, or URLs for LLM pipelines. Strip boilerplate, chunk by semantics, and get token counts, so your agents receive clean data, not nav bars.
Pricing: from $0.10 / actor start
Developer: Les
Maintained by Community
Agent-Ready Data Cleaner
Apify Actor that transforms noisy URL/HTML/JSON/text inputs into clean, token-optimized chunks for LLM pipelines.
Features
- URL input: fetch the page (10 s timeout, redirects followed), then clean it as HTML
- HTML cleanup with configurable boilerplate removal
- JSON flattening with null/empty removal
- Text sanitization (control chars + blank-line dedupe)
- Semantic/fixed/none chunking
- Token counting via gpt-tokenizer (per chunk + total)
- Metadata extraction + cleanliness scoring
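To make two of the features above concrete, here is a minimal sketch of JSON flattening with null/empty removal and of text sanitization (control characters plus blank-line dedupe). The function names `flattenJson` and `sanitizeText`, the dot-path key format, and the exact rules for what counts as "empty" are assumptions for illustration; the Actor's own implementation may differ.

```javascript
// Hypothetical helpers: names and exact rules are assumptions, not the Actor's code.

// Flatten nested objects/arrays into dot-path keys, skipping null, undefined,
// empty strings, and empty containers (which contribute no keys at all).
function flattenJson(value, prefix = '', out = {}) {
  if (value === null || value === undefined || value === '') return out;
  if (typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      flattenJson(child, prefix ? `${prefix}.${key}` : key, out);
    }
    return out;
  }
  out[prefix] = value; // a top-level primitive ends up under the empty-string key
  return out;
}

// Remove control characters (keeping tab and newline) and collapse runs of
// blank lines into a single blank line.
function sanitizeText(text) {
  return text
    .replace(/\r\n?/g, '\n') // normalize line endings first
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '')
    .replace(/\n[ \t]*(?:\n[ \t]*)+/g, '\n\n')
    .trim();
}

console.log(flattenJson({ user: { name: 'Ada', bio: '', tags: [] }, id: 7 }));
// { 'user.name': 'Ada', id: 7 }
console.log(JSON.stringify(sanitizeText('Hi\u0007 there\n\n\n\nBye')));
// "Hi there\n\nBye"
```

Dropping empty values before flattening keeps keys that carry no information out of the token budget, which is the point of the cleanup step.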
Input
See INPUT_SCHEMA.json.
Output
One dataset item per input record with:
- chunks[] with tokenCount and chunkType
- totalTokens
- size/compression stats
- optional metadata (title, description, canonicalUrl, wordCount, cleanlinessScore)
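As a rough picture of what one dataset item might look like, the sketch below builds an example object from the field names listed above. All values are made up. The `text` field, the `stats` object and its key names, the `chunkType` value, and the scale of `cleanlinessScore` are assumptions, since the README does not spell them out. The small `totalsAgree` check also assumes that `totalTokens` is the sum of the per-chunk counts.

```javascript
// Illustrative item only: values are invented, and anything marked
// "hypothetical" is not confirmed by the README.
const exampleItem = {
  chunks: [
    { text: 'Intro paragraph', tokenCount: 42, chunkType: 'semantic' }, // `text` is hypothetical
    { text: 'Second section', tokenCount: 58, chunkType: 'semantic' },
  ],
  totalTokens: 100,
  // Hypothetical names for the size/compression stats.
  stats: { originalBytes: 20480, cleanedBytes: 4096 },
  // Optional; field names are from the README, values and score scale are assumed.
  metadata: {
    title: 'Example page',
    description: 'A sample description',
    canonicalUrl: 'https://example.com/',
    wordCount: 75,
    cleanlinessScore: 0.9,
  },
};

// Assumption: totalTokens equals the sum of the per-chunk tokenCount values.
function totalsAgree(item) {
  const sum = item.chunks.reduce((acc, chunk) => acc + chunk.tokenCount, 0);
  return sum === item.totalTokens;
}

console.log(totalsAgree(exampleItem)); // true
```

A check like this can be useful when budgeting context windows downstream, but verify it against real output before relying on it.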
Pricing hint
- $0.10 per start
- $0.003 per result item
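The two prices above can be combined into a quick cost estimate. This sketch assumes the charges are simply additive (one start fee per run plus a fee per result item) and ignores any other platform costs; `estimateRunCostUsd` is a hypothetical helper, not part of the Actor.

```javascript
// Prices from the pricing hint, in thousandths of a dollar so the
// arithmetic stays in integers until the final division.
const START_PRICE_MILLS = 100; // $0.10 per start
const ITEM_PRICE_MILLS = 3; // $0.003 per result item

// Assumption: total = starts * start price + result items * item price.
function estimateRunCostUsd(resultItems, starts = 1) {
  return (starts * START_PRICE_MILLS + resultItems * ITEM_PRICE_MILLS) / 1000;
}

console.log(estimateRunCostUsd(100)); // 0.4
console.log(estimateRunCostUsd(1000)); // 3.1
```

Under these assumptions the start fee dominates small runs: a single-item run costs $0.103, while 1,000 items in one run cost $3.10.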
Local test
npm install
node test-local.js