Pricing

Pay per usage

Website to Markdown - Clean LLM-Ready Content

Convert any webpage into clean markdown stripped of navigation, ads, and boilerplate. Perfect for RAG pipelines, LLM context, and content extraction. Token counts included.

Developer

C. K.

Last modified

21 days ago

Website to Markdown — Clean, LLM-Ready Content Extraction

Convert any webpage or website into clean markdown, stripped of navigation, ads, sidebars, and boilerplate. Output drops straight into any RAG pipeline, LLM context window, or vector store without cleanup. Token counts are included so you can plan your embedding budget.

What it does

Most web scrapers give you raw HTML or a wall of unstructured text. You then spend hours cleaning, reformatting, and fixing broken context. This Actor eliminates that step.

Give it a URL. It crawls the site, strips all page chrome (navigation, sidebars, footers, cookie banners), and converts each page to clean markdown, preserving headings, code blocks, tables, lists, and links. Every page includes a token count (cl100k_base encoding), so you can see up front how much it will cost to embed the page or send it to an LLM.

Output format

Field	Type	Description
`url`	string	Source URL of the page
`title`	string	Page title
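`content`	string	Clean page content (markdown, or plain text when `outputFormat` is `"plain_text"`)
`token_count`	integer	Token count (cl100k_base encoding); included when `includeMetadata` is `true`
`content_length`	integer	Character count
`meta_description`	string	Page meta description, if available; included when `includeMetadata` is `true`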
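Because every item carries `token_count`, budgeting embeddings comes down to a sum over the dataset. A minimal sketch, assuming the items have already been downloaded as dicts with the field names above. The sample records and the `price_per_million_tokens` rate are hypothetical placeholders, not values the Actor returns.

```python
from typing import Iterable, TypedDict


class PageRecord(TypedDict, total=False):
    """One dataset item, using the field names from the output table."""
    url: str
    title: str
    content: str
    token_count: int
    content_length: int
    meta_description: str


def embedding_budget(records: Iterable[PageRecord], price_per_million_tokens: float) -> tuple[int, float]:
    """Return (total tokens, estimated cost) across all records.

    Records without `token_count` (e.g. when includeMetadata is false)
    are skipped rather than guessed.
    """
    total_tokens = sum(r["token_count"] for r in records if "token_count" in r)
    estimated_cost = total_tokens / 1_000_000 * price_per_million_tokens
    return total_tokens, estimated_cost


# Hypothetical records for illustration; real values come from the Actor's dataset.
sample: list[PageRecord] = [
    {"url": "https://example.com/page-1", "content": "# Intro", "token_count": 1200},
    {"url": "https://example.com/page-2", "content": "# Setup", "token_count": 800},
    {"url": "https://example.com/page-3", "content": "# FAQ"},  # no token_count
]
tokens, cost = embedding_budget(sample, price_per_million_tokens=0.02)
print(tokens, cost)
```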

Input parameters

Parameter	Type	Default	Description
`startUrl`	string	—	URL to start crawling from
`urls`	array	—	List of specific URLs to convert (batch mode)
`maxPages`	integer	`50`	Maximum pages to convert
`crawlSameDomain`	boolean	`true`	Stay within the start URL's domain
`pathPrefix`	string	`""`	Only crawl paths starting with this prefix
`outputFormat`	string	`"markdown"`	`"markdown"` or `"plain_text"`
`includeMetadata`	boolean	`true`	Include token count and meta description
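The way `startUrl`, `crawlSameDomain`, `pathPrefix`, and `maxPages` interact can be pictured as a breadth-first crawl with deduplication. This is a minimal offline sketch, not the Actor's code. It walks a hypothetical in-memory link graph instead of fetching pages, and it assumes three things: `pathPrefix` is matched against the URL path, URL fragments are ignored when deduplicating, and the start page is always visited so its links can be discovered.

```python
from collections import deque
from urllib.parse import urldefrag, urlparse


def in_scope(url: str, start_url: str, same_domain: bool, path_prefix: str) -> bool:
    """Apply the crawlSameDomain and pathPrefix rules to one candidate URL."""
    parsed, start = urlparse(url), urlparse(start_url)
    if same_domain and parsed.netloc != start.netloc:
        return False
    return parsed.path.startswith(path_prefix)  # "" matches every path


def plan_crawl(start_url: str, links: dict[str, list[str]], max_pages: int = 50,
               same_domain: bool = True, path_prefix: str = "") -> list[str]:
    """Return the URLs a BFS crawl would visit, in order, up to max_pages.

    `links` maps a page URL to the URLs found on it (a stand-in for fetching).
    The start page is always visited (an assumption of this sketch).
    """
    start = urldefrag(start_url).url
    seen = {start}
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in links.get(url, []):
            link = urldefrag(link).url  # page#a and page#b count as one page
            if link not in seen and in_scope(link, start_url, same_domain, path_prefix):
                seen.add(link)
                queue.append(link)
    return visited
```

In this sketch, with a start URL of `https://example.com/` and `pathPrefix` `/tutorial/`, a link to `/blog/post` is never queued, and `/tutorial/a#intro` is treated as the same page as `/tutorial/a`.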

Example usage

Single page

```json
{
    "startUrl": "https://docs.python.org/3/library/asyncio.html",
    "maxPages": 1
}
```

Batch conversion

```json
{
    "urls": [
        "https://example.com/page-1",
        "https://example.com/page-2",
        "https://example.com/page-3"
    ],
    "maxPages": 3
}
```

Full site crawl

```json
{
    "startUrl": "https://fastapi.tiangolo.com/",
    "maxPages": 100,
    "pathPrefix": "/tutorial/"
}
```
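One way to run these inputs from Python is the official `apify-client` package. The sketch below assumes a placeholder Actor ID of the form `<username>/website-to-markdown` (copy the real ID from the Actor's page) and an API token in a hypothetical `APIFY_TOKEN` environment variable. The client is passed in as a parameter so the function can be exercised with a stub.

```python
import os
from typing import Any, Iterator

# Placeholder ID: replace with the Actor's real "<username>/<actor-name>" ID.
ACTOR_ID = "<username>/website-to-markdown"


def convert_pages(client: Any, actor_id: str, run_input: dict) -> Iterator[dict]:
    """Start an Actor run, wait for it to finish, then yield its dataset items.

    `client` is expected to be an apify_client.ApifyClient, or any object with
    the same .actor(id).call(run_input=...) and .dataset(id).iterate_items().
    """
    run = client.actor(actor_id).call(run_input=run_input)  # blocks until the run ends
    if run is None:
        raise RuntimeError("the Actor run did not return run details")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


def main() -> None:
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # hypothetical env var name
    run_input = {"startUrl": "https://docs.python.org/3/library/asyncio.html", "maxPages": 1}
    for item in convert_pages(client, ACTOR_ID, run_input):
        print(item["url"], item.get("token_count"))
```

Because `convert_pages` is a generator, the run starts only when you begin iterating over it.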

Pricing

This Actor uses the pay-per-event model: you are charged for each page that is successfully converted. Pages that are skipped (for example, empty or non-content pages) are not charged.
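Under this model the charge for a run is the number of converted pages multiplied by the per-page price. A minimal sketch, assuming you have a per-page outcome of either `"converted"` or `"skipped"`. The `price_per_page` value is a hypothetical placeholder; check the Actor's pricing tab for the real figure. `Decimal` avoids floating-point rounding in money arithmetic.

```python
from collections import Counter
from decimal import Decimal
from typing import Iterable


def estimate_charge(outcomes: Iterable[str], price_per_page: Decimal) -> Decimal:
    """Bill only pages whose outcome is "converted"; "skipped" pages cost nothing."""
    counts = Counter(outcomes)
    return counts["converted"] * price_per_page


# Hypothetical run: 100 pages visited, 8 skipped as empty or non-content.
outcomes = ["converted"] * 92 + ["skipped"] * 8
print(estimate_charge(outcomes, Decimal("0.001")))
```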

How it works

Crawl — Crawlee handles the URL queue, deduplication, rate limiting, and robots.txt compliance.
Clean — Strips navigation, sidebars, footers, cookie banners, and boilerplate using curated selectors. Falls back to `<article>`, `<main>`, or `<body>`.
Convert — Transforms clean HTML to structured markdown, preserving headings, code blocks, tables, lists, and links.
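Count — Uses the cl100k_base encoding (used by GPT-4, GPT-3.5-turbo, and OpenAI's text-embedding-ada-002 and text-embedding-3 models) for token counts. Counts are exact for models that use this encoding and approximate for models with other tokenizers.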
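The Clean and Convert steps can be approximated with Python's standard-library `html.parser`. This is a swapped-in stand-in, not the Actor's implementation (which uses BeautifulSoup with curated selectors). It ignores everything inside a hypothetical set of boilerplate tags, prefers `<article>`, then `<main>`, then `<body>` as the content root (that order is an assumption), and emits markdown only for headings, paragraphs, list items, and `<pre>` code blocks. Tables, links, and nested lists are left out to keep it short.

```python
from html.parser import HTMLParser
from typing import Optional

FENCE = "`" * 3
# Hypothetical boilerplate tags; the Actor's curated selectors are not listed here.
SKIP_TAGS = {"nav", "aside", "header", "footer", "form", "script", "style", "noscript"}
# Content roots in the preference order this sketch assumes.
CONTAINERS = ("article", "main", "body")
# Markdown prefix for each block-level tag this sketch converts.
BLOCK_PREFIX = {**{f"h{i}": "#" * i + " " for i in range(1, 7)}, "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Collect markdown blocks per content root while ignoring boilerplate."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any SKIP_TAGS element
        self.open_roots: list[str] = []
        self.blocks: dict[str, list[str]] = {name: [] for name in CONTAINERS}
        self.current: Optional[list[str]] = None  # text chunks of the block being built
        self.prefix = ""
        self.in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self.open_roots.append(tag)
        elif tag in BLOCK_PREFIX or tag == "pre":
            self.current, self.prefix = [], BLOCK_PREFIX.get(tag, "")
            self.in_pre = tag == "pre"

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CONTAINERS:
            if tag in self.open_roots:
                self.open_roots.remove(tag)
        elif (tag in BLOCK_PREFIX or tag == "pre") and self.current is not None:
            text = "".join(self.current)
            if text.strip():
                if self.in_pre:  # keep code whitespace, wrap it in a fenced block
                    code = text.strip("\n")
                    block = f"{FENCE}\n{code}\n{FENCE}"
                else:  # collapse runs of whitespace in prose
                    block = self.prefix + " ".join(text.split())
                for root in self.open_roots:  # record the block under every open root
                    self.blocks[root].append(block)
            self.current, self.in_pre = None, False

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    """Return markdown from the first non-empty root: article, then main, then body."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    for root in CONTAINERS:
        if parser.blocks[root]:
            return "\n\n".join(parser.blocks[root])
    return ""
```

For the Count step, `len(tiktoken.get_encoding("cl100k_base").encode(markdown))` gives the cl100k_base token count of the result. Note that tiktoken downloads the encoding data on first use.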

Responsible use

This Actor respects robots.txt by default (enforced by Crawlee).
Crawlee's built-in autoscaling keeps request rates reasonable.
You are responsible for ensuring your use complies with the target site's Terms of Service.
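Respecting robots.txt means checking each URL against the site's rules before fetching it. Crawlee performs this inside the Actor. The sketch below only illustrates the rule check itself with Python's standard-library `urllib.robotparser`, using a hypothetical robots.txt and a hypothetical user agent `MyCrawler`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler fetches https://<host>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can answer can_fetch queries."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = build_rules(ROBOTS_TXT)
for url in ("https://example.com/docs/intro", "https://example.com/private/report"):
    print(url, "allowed" if rules.can_fetch("MyCrawler", url) else "disallowed")
print("crawl delay (seconds):", rules.crawl_delay("MyCrawler"))
```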

Built with

Crawlee for reliable crawling
BeautifulSoup for HTML parsing
tiktoken for token counting

Similar Actors

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds — perfect for AI training data, RAG pipelines, and content archiving.

SmartApi

Webpage To Markdown

kawsar/webpage-to-markdown

Convert any webpage into clean, structured, LLM-ready Markdown. Handles JavaScript-rendered sites, strips ads and navigation clutter, and outputs metadata alongside content built for RAG pipelines, AI training, SEO audits, and content archiving.

Kawsar

Website to Markdown

cool_ya/website-to-markdown

Convert any web page into clean, LLM-ready Markdown. Strips nav, ads and boilerplate and returns the main article text plus title, description and word count. Perfect for RAG and AI pipelines.

Y A

Website Content Crawler - Markdown & Text for LLM / RAG

pear_fight/website-content-crawler-markdown-text-for-llm-rag

Crawl any website and extract clean article text and Markdown, ready to feed into LLMs, ChatGPT, vector databases and RAG pipelines. Removes navigation, ads and boilerplate. Configurable crawl depth and page limits. Export to JSON, CSV, Excel.

Harald

Web Content Extractor - Clean Markdown for AI

geekguymj/web-content-extractor

Extract clean, readable markdown content from any web page. Removes navigation, ads, footers, and boilerplate — outputs structured markdown optimized for LLM training, RAG pipelines, and AI agents. Pay-per-event pricing. $0.002/page.

Matthew Jenkins

LLM Markdown Crawler

sleek_waveform/llm-markdown-crawler

Crawl any website and extract clean, boilerplate-free Markdown optimized for LLMs, RAG pipelines, and AI training datasets. Uses Mozilla Readability to strip navigation and ads, then converts to clean Markdown. No browser required — fast and cheap.

Daniel Dimitrov

Website to Markdown MCP Server

quodlibetical_buffalo/website-to-markdown-mcp

Convert any webpage to clean Markdown. MCP server for AI agents and LLM pipelines.

Marek Pommier

LLM-Ready Web Extractor — URL to Clean Markdown & JSON

f0rty7even/llm-web-extractor

Turn any web page or site into clean, LLM-ready Markdown and structured JSON for RAG, agents, and fine-tuning. Strips nav/ads/boilerplate; returns main content + metadata.

F0rty7even

Web-to-Markdown Generator for AI & RAG Pipelines

profitstack/web-to-markdown-generator-for-ai-rag-pipelines

Convert any website into clean, LLM-ready Markdown with heading-based chunking for RAG and AI agents.

Manas Mantri

Website to Markdown for LLM and RAG

jeweled_jockstrap/my-actor-3

Convert any URL to clean Markdown text for AI applications. Strips HTML and extracts content for LLM training, RAG pipelines, and vector databases. Free Firecrawl alternative.