Pricing

$4.00 / 1,000 page reads

Skim Clean Web Reader — URL to Markdown for LLMs & RAG

Feed your LLM roughly a quarter of the input for the same content. Skim turns any URL into clean, LLM-ready Markdown plus metadata in about a second, with ads, nav, and boilerplate stripped. Flat $4 per 1,000 successful page reads, no compute billing. Built for AI agents, RAG pipelines, and no-code automations.

Developer

Karilyn Colegrove

Last modified: 7 days ago

Skim — Clean Web Reader (URL to Markdown)

Turn any URL into clean, agent-ready Markdown — no ads, no nav, no boilerplate — in about a second.

This Actor is the official Apify integration for Skim, the canonical x402 clean reader API. Give it a list of URLs; it returns each page as clean Markdown plus structured metadata (title, byline, published date, language, excerpt).

See it before you wire it: try Skim free in your browser at freeskims.skim402.com, with 10 free skims a day, no wallet, and no signup. Paste a URL and see exactly what you get back.

Why this Actor

Fast. Skim returns most pages in about a second because it reads and cleans the HTML instead of spinning up a browser farm. In production tests it ran 2.3x faster than Firecrawl on the same pages (see the published benchmarks at skim402.com).
Clean. Output is readable Markdown roughly 4x smaller than the raw HTML — ideal for feeding LLMs without paying token costs for junk.
Simple. One URL in, one clean document out. No crawling configuration, no selectors, no proxies to manage.

If you need deep multi-page crawling with browser rendering, a heavyweight crawler is the right tool. If you need this page, clean, now — that's Skim.

Input

{
  "urls": [
    "https://en.wikipedia.org/wiki/HTTP_402",
    "https://example.com/article"
  ],
  "includePlainText": false
}

urls — up to 500 URLs per run.
includePlainText — also include a plain-text version of each page.
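If you have more than 500 URLs, you need several runs. The sketch below shows one way to split a list into valid run inputs; the helper name build_run_inputs and the de-duplication step are illustrative choices, not part of the Actor, and only the urls / includePlainText fields and the 500-URL limit come from this README.

```python
from typing import Iterator

MAX_URLS_PER_RUN = 500  # per-run limit stated in this README


def build_run_inputs(urls: list[str], include_plain_text: bool = False) -> Iterator[dict]:
    """Hypothetical helper: yield one run_input dict per batch of at most 500 URLs."""
    # Drop duplicate URLs while keeping the original order,
    # so the same page is not requested twice across batches.
    unique = list(dict.fromkeys(urls))
    for start in range(0, len(unique), MAX_URLS_PER_RUN):
        yield {
            "urls": unique[start:start + MAX_URLS_PER_RUN],
            "includePlainText": include_plain_text,
        }


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(1200)]
    print([len(batch["urls"]) for batch in build_run_inputs(urls)])  # [500, 500, 200]
```

Each yielded dict can be passed as run_input to a separate Actor call.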

Output

One dataset item per URL:

{
  "url": "https://en.wikipedia.org/wiki/HTTP_402",
  "ok": true,
  "markdown": "# HTTP 402\n\n...",
  "metadata": {
    "title": "HTTP 402",
    "byline": null,
    "lang": "en",
    "excerpt": "..."
  },
  "elapsedMs": 1050
}

Failed URLs come back with ok: false and an error message — you are only charged for successful reads.
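Because only successful reads are billed, you can estimate a run's read charge from its dataset items. This is a minimal sketch assuming items shaped like the example above; the name of the error-message field is not shown in this README, so the "error" key used here is an assumption.

```python
PRICE_PER_1000_USD = 4.00  # flat price from this README; failed reads are not charged


def summarize(items: list[dict]) -> dict:
    """Hypothetical helper: split items into successes and failures and estimate the read charge."""
    ok = [item for item in items if item.get("ok")]
    failed = [item for item in items if not item.get("ok")]
    return {
        "succeeded": len(ok),
        # "error" is an assumed field name for the failure message.
        "failed": [(item["url"], item.get("error")) for item in failed],
        # Only successful reads count toward the bill.
        "estimated_cost_usd": round(len(ok) * PRICE_PER_1000_USD / 1000, 4),
    }


if __name__ == "__main__":
    items = [
        {"url": "https://a.example", "ok": True, "markdown": "# A"},
        {"url": "https://b.example", "ok": False, "error": "HTTP 404"},
    ]
    print(summarize(items))
```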

Use it from your stack

Call the Actor from anywhere Apify runs — your own code, n8n, Make, Zapier, or LangChain via the Apify integration.

Python:

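from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")
# Start a run and wait for it to finish.
run = client.actor("jessiejanie/skim-clean-reader").call(run_input={
    "urls": ["https://en.wikipedia.org/wiki/HTTP_402"]
})
# One dataset item per URL; failed reads come back with ok: false, so skip them.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if not item.get("ok"):
        print("failed:", item["url"])
        continue
    print(item["metadata"]["title"])
    print(item["markdown"][:500])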

JavaScript:

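import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "<YOUR_APIFY_TOKEN>" });
// Start a run and wait for it to finish (top-level await requires an ES module).
const run = await client.actor("jessiejanie/skim-clean-reader").call({
  urls: ["https://en.wikipedia.org/wiki/HTTP_402"],
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
// Skip failed reads (ok: false) before using the Markdown.
for (const item of items) {
  if (item.ok) console.log(item.markdown);
}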

FAQ

Does it render JavaScript? Not in this Actor — it reads server-rendered HTML, which covers most articles, docs, and blog pages, and is why it is fast and cheap. If a page returns nearly empty content, it is likely a client-rendered app.
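One way to catch likely client-rendered pages automatically is to flag successful reads whose Markdown is nearly empty. This is a rough heuristic sketch; the function name and the 200-character threshold are hypothetical and should be tuned for your sources.

```python
MIN_CHARS = 200  # hypothetical threshold; tune for your pages


def looks_client_rendered(item: dict, min_chars: int = MIN_CHARS) -> bool:
    """Heuristic: True if a successful read produced almost no text content."""
    if not item.get("ok"):
        return False  # failures are already reported via ok: false
    text = item.get("markdown") or ""
    # Count only letters and digits, so heading markers, link syntax,
    # and whitespace do not make an empty shell look like real content.
    content_chars = sum(1 for ch in text if ch.isalnum())
    return content_chars < min_chars
```

Flagged URLs are candidates for a browser-rendering crawler instead.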

Can I feed the output straight to an LLM? Yes — that is the point. The Markdown is boilerplate-free and roughly 4x smaller than the raw HTML, so you stop paying token prices for nav bars and cookie banners.
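To check the size reduction on your own pages, compare the raw HTML (fetched separately, since this Actor returns only the cleaned output) with the returned Markdown. The sketch below uses the common rule of thumb of about 4 characters per token for English text; that ratio is an assumption, and real tokenizers vary.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; real tokenizers vary


def estimate_savings(raw_html: str, markdown: str) -> dict:
    """Compare raw HTML and cleaned Markdown by characters and rough token counts."""
    html_chars, md_chars = len(raw_html), len(markdown)
    return {
        "reduction_ratio": html_chars / md_chars if md_chars else float("inf"),
        "approx_html_tokens": html_chars // CHARS_PER_TOKEN,
        "approx_markdown_tokens": md_chars // CHARS_PER_TOKEN,
    }
```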

What happens on a failed page? You get the item back with ok: false and an error message, and you are not charged for it.

How many URLs per run? Up to 500. URLs are processed one at a time, most in about a second each — a 100-page run typically finishes in a couple of minutes.

Is there a way to try it without paying? Yes — freeskims.skim402.com runs the same engine in your browser, 10 free skims a day, no signup.

Pricing: one flat number, nothing else

$4.00 per 1,000 successful page reads. That is the entire bill.

No per-run start fees.
No separate compute, memory, or proxy charges — platform usage is included in the price.
Failed pages are never charged.
No subscription, no rental, no minimum.

Most tools in this niche look cheap until you read the fine print. Here is what the fine print typically adds:

Cost	Skim	Typical alternatives in this niche
Per page read	$4.00 / 1,000	$3–$50 / 1,000
Per-run start fee	none	$0.0015–$0.09 every run
Compute / memory billing	included	"free" actors bill raw platform usage — commonly $0.50–$4+ / 1,000 pages, varies with settings
Monthly rental	none	some charge a flat monthly fee on top
Failed pages	free	often billed like successes

Start fees are the quiet one: agent workloads are typically many small runs. At a $4 per 1,000 per-page rate, a 10-page run costs $0.04 in reads; add a $0.09 start fee and it costs $0.13, roughly three times what the per-page price suggests. With Skim, a 10-page run costs $0.04, exactly what the headline says.
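The arithmetic above can be written as a tiny cost model: per-page charges plus any fixed per-run start fee. This sketch ignores compute, rental, and failed-page charges, which the table lists separately; the function name is illustrative.

```python
def run_cost_usd(pages: int, price_per_1000: float, start_fee: float = 0.0) -> float:
    """Cost of one run: per-page charges plus an optional fixed start fee."""
    return pages * price_per_1000 / 1000 + start_fee


if __name__ == "__main__":
    skim = run_cost_usd(10, 4.00)                      # no start fee
    with_fee = run_cost_usd(10, 4.00, start_fee=0.09)  # same per-page rate plus a $0.09 start fee
    print(f"{skim:.2f} {with_fee:.2f} {with_fee / skim:.2f}x")  # prints: 0.04 0.13 3.25x
```

The smaller the run, the more a fixed start fee dominates the total.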

Built on a real API, not just an Actor

Skim is a standalone product with its own infrastructure at skim402.com — a production clean-reader API serving AI agents directly, with public docs, published benchmarks, an MCP server, and connectors for LangChain, LlamaIndex, CrewAI, and Haystack. This Actor is its official Apify integration: the same engine, the same clean output, with normal Apify billing.

You can try it free in your browser before running a single paid read.

Agents that carry their own crypto wallets can also call Skim directly over the x402 protocol — pay per call in USDC, no account: skim402.com/docs. This Actor exists so Apify users get the same reads with no wallet required.

Similar Actors

AI Web Extractor: URL → Clean Markdown + JSON for LLM/RAG

boxbox10/ai-web-extractor

Turn any URL into clean, LLM-ready Markdown + structured JSON (title, headings, main content, links, metadata, token count). Perfect for RAG pipelines, AI agents, and LLM context.

Marvin Eguilos

Website to Markdown - Clean LLM-Ready Content

ambitious_door/web-to-markdown

Convert any webpage into clean markdown stripped of navigation, ads, and boilerplate. Perfect for RAG pipelines, LLM context, and content extraction. Token counts included.

C. K.

LLM-Ready Web Extractor: URL to Clean Markdown & JSON

f0rty7even/llm-web-extractor

Turn any web page or site into clean, LLM-ready Markdown and structured JSON for RAG, agents, and fine-tuning. Strips nav/ads/boilerplate; returns main content + metadata.

Michael Yousrie

URL to Markdown — Clean Web Reader for AI Agents

logiover/url-to-markdown

Turn any URL into clean, LLM-ready Markdown in one call. Keyless Firecrawl / Jina Reader alternative: strips nav, ads and boilerplate, returns article Markdown + metadata. No API key, no browser.

Logiover

Webpage to Markdown

technicaldost/webpage-to-clean-markdown

Convert any web page into clean, LLM-ready Markdown. Strips ads, nav and boilerplate, keeping headings, links, tables and code. Perfect for RAG pipelines and AI agents.

Technical Dost Solutions

Website to Markdown

cool_ya/website-to-markdown

Convert any web page into clean, LLM-ready Markdown. Strips nav, ads and boilerplate and returns the main article text plus title, description and word count. Perfect for RAG and AI pipelines.

Y A

Web Page to Markdown & Text - URL Reader for LLMs & RAG

entranced_gelato/ai-web-page-reader

Read any web page as clean text + Markdown for LLMs and automations. Strips ads, nav, and scripts; returns the main content, metadata (title, author, date, word count), and an optional AI TL;DR + key points. The web-reading primitive for AI agents, RAG pipelines, and no-code flows.

AIDevs

Website to Markdown for RAG & LLMs

hereditary_model/website-to-markdown

Crawls a website and converts every page into clean, LLM-ready Markdown for RAG pipelines, vector databases, and AI agents. Removes nav, ads, and boilerplate. Predictable pricing: $0.004 per page converted.

Aaron Marxsen

AI Web Content Crawler - Markdown for LLMs

intelscrape/ai-web-content-crawler

Crawl any website and extract clean Markdown optimized for LLM training, RAG pipelines, and AI knowledge bases - removes boilerplate and outputs structured JSON with URL, title, markdown, and metadata.