Pricing

from $2.00 / 1,000 results

Website to Markdown – Clean LLM & RAG Content Extractor

Convert any public web page to clean, LLM-ready Markdown with metadata — by URL, a list of URLs, or a whole-site crawl. Strips nav/ads/boilerplate, keeps headings/lists/tables/code. Respects robots.txt. No API key.

Developer

Daniel Brenner

Last modified

15 days ago

Convert any public web page to clean, LLM-ready Markdown — by URL, a list of URLs, or a whole-site crawl. No API key, no browser to manage. The tool fetches the page, strips navigation, ads and boilerplate, and returns tidy Markdown plus structured metadata — ideal for feeding LLMs, building RAG pipelines, archiving articles, or training datasets.

It's the no-fuss answer to "how do I turn a website into Markdown for ChatGPT / a vector database / an AI agent?" — give it URLs, get Markdown back as JSON, CSV or Excel.

What it does

HTML → clean Markdown, with headings, lists, tables, code blocks and links preserved (GitHub-flavored).
Main-content extraction — removes menus, headers, footers, sidebars, cookie banners and ads so the Markdown is just the article, not the chrome. (Or keep the full page if you prefer.)
Structured metadata per page — title, description, author, published date, language, site name, canonical URL.
RAG-ready extras — optional token-count estimate and overlapping chunks for retrieval-augmented generation.
Three modes — a single page, a list of pages, or crawl a site (same-domain link following, depth- and page-limited).
Respects robots.txt and sends a descriptive User-Agent. Public pages only — no logins, no paywalls.
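
The listing doesn't document the actor's extraction rules. As a rough illustration of main-content extraction, here is a minimal sketch built on Python's standard-library `html.parser`. It drops everything inside an assumed set of "chrome" tags (`nav`, `header`, `footer`, `aside`, scripts and styles) and emits headings, paragraphs and list items as Markdown. Real extractors score content blocks and also handle tables, code and links; this sketch does not.

```python
from html.parser import HTMLParser

# Assumed set of boilerplate tags; everything inside them is dropped.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}
HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}


class MainContentToMarkdown(HTMLParser):
    """Very small HTML -> Markdown converter that skips boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.lines = []      # finished Markdown blocks
        self.buf = []        # text of the block currently being built
        self.prefix = ""     # Markdown prefix ("# ", "- ", ...) for that block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag in HEADINGS:
                self._flush()
                self.prefix = HEADINGS[tag] + " "
            elif tag == "li":
                self._flush()
                self.prefix = "- "
            elif tag == "p":
                self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and (tag in HEADINGS or tag in {"li", "p"}):
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text

    def _flush(self):
        # Collapse whitespace and store the block with its Markdown prefix.
        text = " ".join("".join(self.buf).split())
        if text:
            self.lines.append(self.prefix + text)
        self.buf, self.prefix = [], ""


def html_to_markdown(html: str) -> str:
    parser = MainContentToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Note that a page using `<header>` inside its `<article>` for the article title would lose that title here, which is one reason production extractors rely on scoring rather than a fixed tag list.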

What you get (per page)

field	description
`url` / `finalUrl`	requested URL, and the final URL after redirects (if different)
`title`, `description`, `author`, `publishedDate`, `lang`, `siteName`	page metadata (`null` when not present)
`canonicalUrl`	the page's canonical link, if declared
`markdown`	the clean, LLM-ready Markdown
`wordCount`, `tokenEstimate`	word count, plus an approximate LLM token count (a heuristic estimate, not exact)
`chunks`	optional RAG chunks (`{index, text}`) when "Chunk for RAG" is on
`links`, `images`	optional lists of absolute links / image URLs on the page
`httpStatus`, `fetchedAt`, `robotsAllowed`	response status, fetch time, robots check result

Anything a page doesn't expose comes back as null — never guessed.
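
`tokenEstimate` is described only as a heuristic. A common rule of thumb for English text is roughly four characters per token. The sketch below assumes that ratio (the actor's actual formula isn't documented), pairs it with a simple regex word count, and returns a dict shaped like the two output fields.

```python
import math
import re


def word_count(markdown: str) -> int:
    # Count runs of letters/digits/underscores; Markdown markers like '#' or '-' don't count.
    return len(re.findall(r"\w+", markdown))


def token_estimate(markdown: str, chars_per_token: float = 4.0) -> int:
    # Assumed rule of thumb: about 4 characters per token for English text.
    # Real tokenizers vary by model and language, so treat this as a ballpark.
    return math.ceil(len(markdown) / chars_per_token)


def page_stats(markdown: str) -> dict:
    # Shaped like the `wordCount` / `tokenEstimate` output fields.
    return {"wordCount": word_count(markdown), "tokenEstimate": token_estimate(markdown)}
```

For exact counts, run the Markdown through the tokenizer of the model you actually use.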

How to use it

Single page or a list:

{ "startUrls": ["https://example.com/blog/post"], "mode": "single" }

{ "startUrls": ["https://a.com/p1", "https://a.com/p2"], "mode": "list" }

Crawl a site for a RAG dataset:

{
  "startUrls": ["https://docs.example.com/"],
  "mode": "crawl",
  "maxCrawlDepth": 2,
  "maxPages": 100,
  "sameDomainOnly": true,
  "chunkForRag": true,
  "chunkSize": 2000,
  "chunkOverlap": 200
}
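
The listing doesn't say whether `chunkSize` and `chunkOverlap` count characters or tokens, or whether chunks snap to paragraph boundaries. Assuming plain characters, a fixed-size sliding window that advances by `chunkSize - chunkOverlap` each step produces output shaped like the `chunks` field (`{index, text}`):

```python
def chunk_for_rag(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[dict]:
    """Split text into overlapping windows shaped like the `chunks` output field.

    Assumption: sizes are measured in characters; the actor may count differently
    and may prefer to cut on paragraph or heading boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append({"index": len(chunks), "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With the settings in the crawl example (2000 / 200), each chunk starts 1,800 characters after the previous one, so consecutive chunks share 200 characters of context.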

Set `contentMode` to `main` (the default: just the article) or `full` (the whole page). Toggle `includeLinks` / `includeImages` to also collect each page's links and images.
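
The listing says `links` and `images` are absolute URLs but not how they're resolved. A plausible approach, sketched here with Python's standard library, is to resolve each `href`/`src` against the page URL with `urljoin`, drop `#fragments` from links, skip non-HTTP link schemes such as `mailto:`, and de-duplicate. This sketch ignores `<base href>` and `srcset`, which a real extractor would also need to handle.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkImageCollector(HTMLParser):
    """Collect absolute link and image URLs, roughly like includeLinks / includeImages."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL and drop #fragments.
            url, _ = urldefrag(urljoin(self.base_url, attrs["href"]))
            if url.startswith(("http://", "https://")) and url not in self.links:
                self.links.append(url)
        elif tag == "img" and attrs.get("src"):
            url = urljoin(self.base_url, attrs["src"])
            if url not in self.images:
                self.images.append(url)


def collect_links_and_images(html: str, base_url: str) -> tuple[list[str], list[str]]:
    parser = LinkImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images
```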

Why this tool

No API key, no headless browser to babysit — give it URLs, get Markdown.
Clean output — boilerplate-stripped main content, not a raw HTML dump.
LLM-first — Markdown + metadata + optional chunking and token estimate, the exact shape RAG and fine-tuning pipelines want.
Polite & transparent — respects robots.txt, identifies itself, fetches only public pages one at a time.

Pricing is pay-per-result: $2 per 1,000 pages — you only pay for pages successfully converted. Export as JSON, CSV or Excel.
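
As a worked example of the pay-per-result rate: $2 per 1,000 pages is $0.002 per page, so a crawl in which 95 pages convert successfully would cost 95 × $0.002 = $0.19. The helper below does the same arithmetic with `Decimal` to avoid float rounding. It assumes the flat $2.00 rate shown on the listing; the listing says "from", so your actual rate may differ.

```python
from decimal import Decimal

# Rate taken from the listing; "from $2.00" suggests it may vary.
PRICE_PER_1000_PAGES = Decimal("2.00")


def estimated_cost(pages_converted: int) -> Decimal:
    # Only successfully converted pages are billed, per the pricing note.
    return (PRICE_PER_1000_PAGES * pages_converted / 1000).quantize(Decimal("0.001"))
```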

Responsible use

This tool reformats the public pages you point it at — the same content a browser would load. You're responsible for how you use content from sites you don't own (respect each site's terms and applicable copyright). The tool honors robots.txt and never bypasses logins or paywalls.
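
If you want to reproduce the robots.txt check in your own pipeline, Python's standard-library `urllib.robotparser` covers the basics. This is not the actor's code: the User-Agent string below is a hypothetical placeholder, and robots.txt is passed in as text (rather than fetched with `set_url()` and `read()`) so the sketch runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical placeholder; the actor's real User-Agent isn't given in the listing.
USER_AGENT = "ExampleMarkdownBot/1.0"


def robots_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    # Parse already-downloaded robots.txt text and ask whether this agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```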

FAQ

Do I need an API key?

No. Give it one or more URLs and run it — no key, no quota.

Can it convert a whole website, not just one page?

Yes — use crawl mode with a start URL. It follows same-domain links up to the depth and page limits you set, converting each page to Markdown.
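
Crawl mode is described only as same-domain link following with depth and page limits. A breadth-first traversal is one natural way to honor both limits. The sketch below assumes the start URL is depth 0, counts every visited page toward `maxPages`, compares hostnames exactly for the same-domain rule, and takes a hypothetical `get_links` callback in place of real fetching so it runs without network access. The actor's real traversal order may differ.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical: returns a page's absolute links
    max_crawl_depth: int = 2,
    max_pages: int = 100,
    same_domain_only: bool = True,
) -> list[str]:
    """Return pages in breadth-first order, bounded by depth and page count."""
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); the start URL is depth 0
    visited: list[str] = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would convert this page to Markdown here
        if depth >= max_crawl_depth:
            continue  # don't follow links beyond the depth limit
        for link in get_links(url):
            if same_domain_only and urlparse(link).netloc != start_host:
                continue  # skip off-site links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Breadth-first order means that when `maxPages` cuts a crawl short, the pages you do get are the ones closest to the start URL.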

Is the Markdown ready for an LLM / RAG?

Yes — that's the point. You get clean Markdown plus metadata, an approximate token count, and optional overlapping chunks for retrieval-augmented generation.

How much does it cost?

Pay-per-result: $2 per 1,000 pages — you only pay for the pages you actually get.

⭐ Found this useful? A quick rating on this actor's Store page helps others discover it — and if something is off or you wish it did more, open an issue on the actor. I read every one.

Similar actors

Website to Markdown for LLMs & RAG — Content Extractor

runlayer/website-to-markdown

Turn any URL or whole site into clean, LLM-ready Markdown, text, or JSON. Strips nav/ads/boilerplate; keeps headings, links, tables, code. Sitemap-aware concurrent crawl, URL filters, robots.txt respected, rich metadata. Charged only per page extracted — no startup fee.

Runlayer

LLM-Ready Web Extractor — URL to Clean Markdown & JSON

f0rty7even/llm-web-extractor

Turn any web page or site into clean, LLM-ready Markdown and structured JSON for RAG, agents, and fine-tuning. Strips nav/ads/boilerplate; returns main content + metadata.

F0rty7even

Website to Markdown

cool_ya/website-to-markdown

Convert any web page into clean, LLM-ready Markdown. Strips nav, ads and boilerplate and returns the main article text plus title, description and word count. Perfect for RAG and AI pipelines.

Y A

AI Web to Markdown - LLM-Ready Extractor

wiry_kingdom/ai-web-to-markdown

Convert any URL into clean LLM-ready markdown. Strips ads, nav, footer. Preserves headings, lists, tables, code blocks. Returns token count. Perfect for RAG, fine-tuning, AI agents. 10x cheaper than Firecrawl.

Mohieldin Mohamed

Site to Markdown — any site to clean, LLM-ready markdown

topsail/site-to-markdown

Scrape any website to clean, LLM-ready markdown — a compliant Firecrawl alternative for RAG ingestion, robots.txt always on.

Connor Teskey

Web to Markdown — AI-Ready Text from Any URL

wsgcjj/web-to-markdown

Convert any web page URL to clean Markdown format. Perfect for LLM training data, RAG pipelines, and AI content processing. Extracts main content, strips ads/nav/footers.

陈俊杰

Website to Markdown - Clean LLM-Ready Content

ambitious_door/web-to-markdown

Convert any webpage into clean markdown stripped of navigation, ads, and boilerplate. Perfect for RAG pipelines, LLM context, and content extraction. Token counts included.

C. K.

Article Extractor — Clean Web Content to Markdown/Text

omao/article-extractor

Extract the main article from any web page into clean Markdown or text, with title, author, date and description. Strips nav, ads and boilerplate. Fast, no setup.

Marouane Oulabass

Smart Web Content Extractor for AI & LLM

project_bbb/smart-web-content-extractor

Crawl any website and extract clean, structured content optimized for LLM consumption. Outputs Markdown, plain text, or HTML with metadata. Removes nav, ads, and boilerplate automatically.

BBB & Company

URL to Markdown — Clean Web Reader for AI Agents

logiover/url-to-markdown

Turn any URL into clean, LLM-ready Markdown in one call. Keyless Firecrawl / Jina Reader alternative: strips nav, ads and boilerplate, returns article Markdown + metadata. No API key, no browser.