Crawl4AI Web to Markdown — URL to Clean Markdown for LLM & RAG
Pricing
from $1.00 / 1,000 pages converted
Convert any URL, sitemap, or whole website into clean Markdown for LLMs, RAG pipelines, and AI agents. Powered by the open-source Crawl4AI engine. Pay per page ($1/1,000), failed pages never charged. MCP-ready — call it from Claude or Cursor.
Developer
Bikram
Maintained by Community
Convert any URL to clean, LLM-ready Markdown — without installing or hosting anything. This Actor is a hosted Crawl4AI: it wraps the popular open-source Crawl4AI crawler (the most-starred LLM-friendly web crawler on GitHub) and runs it on Apify's infrastructure with a real Chromium browser, so JavaScript-heavy pages render correctly. Point it at a URL, a sitemap, or a whole site, and get back boilerplate-free Markdown ready for RAG pipelines, vector databases, fine-tuning datasets, or direct pasting into an LLM context window.
Features
- URL to Markdown in one call: single pages, full sitemaps, or breadth-first site crawls (up to 1,000 pages per run)
- Built on Crawl4AI: the same AsyncWebCrawler + pruning content filter you'd run locally, with zero setup
- Boilerplate removal: navigation menus, footers, cookie banners, and sidebars are stripped, leaving "fit markdown" optimized for token budgets
- Real browser rendering: Chromium via Playwright, so SPAs and JavaScript-rendered content convert correctly
- Three output formats: Markdown only, Markdown + cleaned HTML, or Markdown + metadata/links JSON
- RAG-friendly dataset output: each page is one dataset item with url, title, markdown, wordCount, crawledAt; export as JSON, CSV, or via API
- Respects robots.txt by default (configurable)
- Fair pay-per-event pricing: you are charged only for pages that convert successfully; failed pages are free
- MCP-ready: callable as a tool from Claude, Cursor, or any MCP client via Apify's MCP server
Input example
{"startUrls": [{ "url": "https://docs.crawl4ai.com" }],"crawlMode": "crawl","maxPages": 50,"includeLinks": false,"outputFormat": "markdown","removeBoilerplate": true,"respectRobotsTxt": true}
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | (required) | URLs to convert |
| crawlMode | string | single | single (only listed URLs), sitemap (pages from sitemap.xml), crawl (follow same-domain links) |
| maxPages | integer | 10 | Max pages per run (1–1000) |
| includeLinks | boolean | false | Keep hyperlinks in the Markdown |
| outputFormat | string | markdown | markdown, markdown+html, or markdown+json |
| removeBoilerplate | boolean | true | Strip navigation/footer/cookie-banner noise ("fit markdown") |
| respectRobotsTxt | boolean | true | Skip pages disallowed by robots.txt (not charged) |
| proxyConfiguration | object | none | Optional Apify Proxy / custom proxy settings |
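The field constraints in the table can be checked client-side before a run is started. The following is a minimal sketch, not part of the Actor itself: `build_input` and its parameter names are hypothetical helpers, and only the field names, defaults, and allowed values come from the table.

```python
# Allowed values as listed in the input table.
ALLOWED_CRAWL_MODES = {"single", "sitemap", "crawl"}
ALLOWED_OUTPUT_FORMATS = {"markdown", "markdown+html", "markdown+json"}


def build_input(urls, crawl_mode="single", max_pages=10, include_links=False,
                output_format="markdown", remove_boilerplate=True,
                respect_robots_txt=True):
    """Build an Actor input dict (hypothetical helper), rejecting values
    outside the documented ranges."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if crawl_mode not in ALLOWED_CRAWL_MODES:
        raise ValueError(f"crawlMode must be one of {sorted(ALLOWED_CRAWL_MODES)}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_OUTPUT_FORMATS)}")
    if not 1 <= max_pages <= 1000:
        raise ValueError("maxPages must be between 1 and 1000")
    return {
        "startUrls": [{"url": u} for u in urls],
        "crawlMode": crawl_mode,
        "maxPages": max_pages,
        "includeLinks": include_links,
        "outputFormat": output_format,
        "removeBoilerplate": remove_boilerplate,
        "respectRobotsTxt": respect_robots_txt,
    }
```

Calling `build_input(["https://docs.crawl4ai.com"], crawl_mode="crawl", max_pages=50)` reproduces the input example shown earlier; the optional proxyConfiguration field is left out of this sketch.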
Output example
Each successfully converted page becomes one dataset item:
{"url": "https://docs.crawl4ai.com/core/quickstart/","title": "Quick Start - Crawl4AI Documentation","markdown": "# Getting Started with Crawl4AI\n\nWelcome to Crawl4AI, an open-source LLM-friendly Web Crawler & Scraper...","wordCount": 1183,"crawledAt": "2026-06-13T10:42:07.512345+00:00"}
With outputFormat: "markdown+json", items additionally contain metadata (description, og tags, etc.) and links.internal / links.external arrays. With markdown+html, items contain the html field with cleaned HTML.
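For RAG ingestion, each item's markdown usually needs to be split into smaller pieces before embedding. Here is one possible sketch of a paragraph-based splitter. The `chunk_item` function, the `max_words` limit of 200, and the output keys `chunk` and `text` are all assumptions chosen for illustration; only `url`, `title`, and `markdown` come from the dataset item described above.

```python
def chunk_item(item, max_words=200):
    """Split one dataset item's markdown into chunks of roughly max_words words.

    Paragraphs (separated by blank lines) are kept whole; a single paragraph
    longer than max_words becomes its own oversized chunk.
    """
    chunks, current, count = [], [], 0
    for para in item["markdown"].split("\n\n"):
        words = len(para.split())
        if words == 0:
            continue  # skip empty paragraphs left by repeated blank lines
        if current and count + words > max_words:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    # Carry the source URL and title along so each chunk can be cited later.
    return [
        {"url": item["url"], "title": item["title"], "chunk": i, "text": text}
        for i, text in enumerate(chunks)
    ]
```

A heading-aware or token-based splitter may suit some corpora better; word counts are used here only because they need no extra dependencies.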
Pricing — about $1 per 1,000 pages
This Actor uses Apify's pay-per-event model with one simple event:
| Event | Price | When it's charged |
|---|---|---|
| page-converted | $0.001 | Once per page successfully converted to Markdown |
That's $1 per 1,000 pages, plus standard Apify platform usage for your runs (compute, proxy if enabled). Pages that fail to load, return an HTTP error, time out, or are blocked by robots.txt are never charged. You can also set a maximum cost per run in Apify Console — the Actor stops gracefully when your limit is reached.
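The event pricing is simple enough to estimate ahead of a run. A small sketch of the arithmetic follows, using the $0.001 page-converted price stated above; the function names are hypothetical, and standard platform usage (compute, proxy) is deliberately not modelled.

```python
from decimal import Decimal

# page-converted event price in USD, as stated in the pricing table.
PRICE_PER_PAGE = Decimal("0.001")


def estimate_event_cost(converted_pages):
    """Return the page-converted charge in USD (platform usage not included)."""
    if converted_pages < 0:
        raise ValueError("converted_pages must be non-negative")
    return PRICE_PER_PAGE * converted_pages


def pages_within_budget(budget_usd):
    """Largest number of converted pages whose event charges fit the budget."""
    # Go through str() so float budgets such as 2.5 convert exactly.
    return int(Decimal(str(budget_usd)) // PRICE_PER_PAGE)
```

For example, 1,000 converted pages come to $1 in event charges, and a $2.50 budget covers 2,500 converted pages. Failed pages do not count toward `converted_pages`.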
Comparable webpage-to-markdown Actors on Apify Store charge up to $0.05 per page for the same job.
Use from Claude, Cursor & other AI agents (MCP)
This Actor works as a tool over the Model Context Protocol. Add Apify's MCP server to your client and your agent can convert URLs to Markdown on demand:
{"mcpServers": {"apify": {"url": "https://mcp.apify.com/sse?actors=bikram07/web-to-markdown-crawl4ai","headers": {"Authorization": "Bearer YOUR_APIFY_TOKEN"}}}}
Then ask your agent things like: "Fetch https://example.com/blog as Markdown and summarize it" — the agent calls this Actor, gets clean Markdown back, and works with it directly. This is ideal for agentic RAG: the agent decides what to read, this Actor handles rendering, extraction, and cleanup.
You can also call it from code via the Apify API:
curl -X POST "https://api.apify.com/v2/acts/bikram07~web-to-markdown-crawl4ai/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" -H "Content-Type: application/json" -d '{"startUrls": [{"url": "https://example.com"}], "crawlMode": "single"}'
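The same call can be made from Python with only the standard library. This is a sketch that mirrors the curl command above: the endpoint and Actor ID are copied from it, while the `build_request` and `run_actor` helpers and the 300-second timeout are assumptions rather than documented values.

```python
import json
import urllib.parse
import urllib.request

# Actor ID as it appears in the curl example.
ACTOR_ID = "bikram07~web-to-markdown-crawl4ai"


def build_request(token, actor_input):
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR_ID}"
        f"/run-sync-get-dataset-items?{query}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token, actor_input, timeout=300):
    """Send the request and return the parsed JSON response.

    The timeout is an assumed value; long crawls may need more time.
    """
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `run_actor("YOUR_APIFY_TOKEN", {"startUrls": [{"url": "https://example.com"}], "crawlMode": "single"})` would then return the dataset items for that run. Building the request separately from sending it keeps the URL and payload easy to inspect without making a network call.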
Hosted vs. self-hosted Crawl4AI
Crawl4AI is open source — you can absolutely run it yourself. Self-hosting means managing a Python environment, Playwright browser binaries, OS dependencies, memory for Chromium, retries, and a server that's always on. This Actor is for the cases where that overhead isn't worth it: you pay roughly $1 per 1,000 pages, get an HTTPS API + MCP endpoint immediately, scale to parallel runs without provisioning anything, and your results land in queryable dataset storage. If you're converting millions of pages a month on dedicated hardware, self-hosting can be cheaper; for everything from prototypes to production RAG ingestion at moderate volume, hosted is simpler.
FAQ
How do I convert a website to Markdown for an LLM?
Add the site URL to startUrls, pick crawl mode (or sitemap if the site has a sitemap.xml), set maxPages, and run. Each page becomes a dataset item with clean Markdown you can chunk and embed for RAG.
Does it handle JavaScript-rendered pages and SPAs?
Yes. Pages are rendered in headless Chromium via Playwright before conversion, so client-side rendered content is included, unlike simple HTML-to-markdown converters that only see the initial HTML.
What's the difference between this and running Crawl4AI locally?
The conversion engine is the same library. The difference is operational: no Python/Playwright setup, no server to maintain, an instant REST API and MCP endpoint, parallel scaling, and dataset storage with JSON/CSV export. See the comparison section above.
Am I charged for pages that fail?
No. The page-converted event is only charged for pages that successfully convert. Timeouts, HTTP errors, and robots.txt-blocked pages are logged and free. You can also cap the maximum total cost per run in Apify Console.
Can I keep links and raw HTML in the output?
Yes. Set includeLinks: true to preserve hyperlinks in the Markdown, and outputFormat: "markdown+html" or "markdown+json" to additionally get cleaned HTML or metadata + link lists per page.
Related searches this Actor answers
crawl4ai hosted · url to markdown · website to markdown for LLM · web scraping for RAG · html to markdown converter API · convert webpage to markdown for vector database · LLM-ready web content extraction
Built on Crawl4AI (Apache 2.0). This Actor is not affiliated with the Crawl4AI project; it packages the library as a hosted service.