Agent Ready Data Cleaner

Pricing

from $0.10 / actor start

Clean and token-optimize HTML, JSON, scraped text, or URLs for LLM pipelines. Strip boilerplate, chunk by semantics, and get token counts. Feed your agents clean data, not nav bars.

Developer

Les

Maintained by Community

Last modified

a month ago

Agent-Ready Data Cleaner

Apify Actor that transforms noisy URL/HTML/JSON/text inputs into clean, token-optimized chunks for LLM pipelines.

Features

  • URL inputs are fetched (10 s timeout, redirects followed), then cleaned as HTML
  • HTML cleanup with configurable boilerplate removal
  • JSON flattening with null/empty removal
  • Text sanitization (control-character removal + blank-line dedupe)
  • Chunking modes: semantic, fixed, or none
  • Token counting via gpt-tokenizer (per chunk + total)
  • Metadata extraction + cleanliness scoring
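
To make two of the features above concrete, here is a minimal sketch of JSON flattening with null/empty removal and of text sanitization. The function names `flattenJson` and `sanitizeText`, the dot-path key format, and the exact rules for what counts as "empty" are assumptions for illustration; the Actor's actual implementation may differ.

```javascript
// Hypothetical helpers; not taken from the Actor's source.

// Flatten nested JSON into dot-path keys, dropping null, undefined,
// empty strings, empty arrays, and empty objects along the way.
function flattenJson(value, prefix = '', out = {}) {
  if (value === null || value === undefined || value === '') return out;
  if (typeof value !== 'object') {
    out[prefix] = value; // primitive leaf: keep it under its path
    return out;
  }
  // Arrays and objects: recurse; empty ones contribute no keys.
  for (const [key, child] of Object.entries(value)) {
    flattenJson(child, prefix ? `${prefix}.${key}` : key, out);
  }
  return out;
}

// Remove control characters (keeping tab and newline) and collapse
// runs of blank lines into a single blank line.
function sanitizeText(text) {
  return text
    .replace(/\r\n?/g, '\n') // normalize line endings first
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '')
    .replace(/\n[ \t]*(?:\n[ \t]*)+/g, '\n\n')
    .trim();
}

const flat = flattenJson({ a: { b: 1, c: null }, d: '', e: [], f: ['x', null] });
console.log(flat); // { 'a.b': 1, 'f.0': 'x' }
console.log(JSON.stringify(sanitizeText('a\u0000b\r\n\r\n\r\n\nc '))); // "ab\n\nc"
```

Dot-path keys keep the flattened record easy to serialize line by line, which tends to cost fewer tokens than nested braces, though the format the Actor actually emits is not stated here.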

Input

See INPUT_SCHEMA.json.

Output

One dataset item per input record with:

  • chunks[] with tokenCount and chunkType
  • totalTokens
  • size/compression stats
  • optional metadata (title, description, canonicalUrl, wordCount, cleanlinessScore)
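
As a rough illustration of the item shape described above, the sketch below builds one output item using a simple fixed-size chunker. Only `chunks`, `tokenCount`, `chunkType`, and `totalTokens` come from the list above; the `text` field, the `'fixed'` value, and the stat names `originalChars`, `cleanedChars`, and `compressionRatio` are hypothetical. The token estimate is a stand-in (about four characters per token), not gpt-tokenizer, so the counts will not match the Actor's.

```javascript
// Stand-in token estimate (~4 characters per token); NOT gpt-tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Split text into chunks of at most `maxWords` words (a simple "fixed" strategy).
function fixedChunks(text, maxWords) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunkText = words.slice(i, i + maxWords).join(' ');
    chunks.push({
      text: chunkText, // hypothetical field name
      tokenCount: estimateTokens(chunkText),
      chunkType: 'fixed', // assumed value
    });
  }
  return chunks;
}

// Assemble one dataset item from the raw input and its cleaned text.
function buildItem(rawInput, cleanedText, maxWords = 200) {
  const chunks = fixedChunks(cleanedText, maxWords);
  return {
    chunks,
    totalTokens: chunks.reduce((sum, c) => sum + c.tokenCount, 0),
    // Hypothetical names for the size/compression stats.
    originalChars: rawInput.length,
    cleanedChars: cleanedText.length,
    compressionRatio: rawInput.length ? cleanedText.length / rawInput.length : 0,
  };
}

const item = buildItem('<p>one two three</p>', 'one two three', 2);
console.log(item.chunks.length, item.totalTokens); // 2 4
```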

Pricing hint

  • $0.10 per start
  • $0.003 per result item
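
A quick way to estimate a run's cost from the two rates above, assuming one start per run and that every dataset item is billed as a result item (the function name `estimateRunCostUsd` is hypothetical):

```javascript
// Work in thousandths of a dollar to avoid floating-point drift.
function estimateRunCostUsd(resultItems, starts = 1) {
  const START_MILLS = 100; // $0.10 per start
  const ITEM_MILLS = 3; // $0.003 per result item
  return (starts * START_MILLS + resultItems * ITEM_MILLS) / 1000;
}

console.log(estimateRunCostUsd(100)); // 0.4
```

So a single run that produces 100 items would come to about $0.40 under these assumptions.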

Local test

npm install
node test-local.js