URL to Markdown (JustHTML) - Clean Markdown Extractor
Pricing
Pay per usage
Convert webpages to clean Markdown for RAG and archiving. Uses JustHTML and supports optional Cloudflare/Turnstile bypass plus CSS selector extraction.
Developer: Anass Seb
Link to Markdown (JustHTML + Cloudflare Bypass)
🔗 URL → 🧼 Clean Markdown • 🛡️ Optional bypass • 🎯 CSS selector
Convert web links into clean Markdown for RAG, archiving, content pipelines, and AI agents.
This Actor fetches a URL, optionally bypasses Cloudflare challenges using the same Camoufox-based open-source bypass approach used in this repository, and converts the resulting HTML to Markdown with JustHTML (a pure-Python HTML5 parser with built-in safe output).
Keywords
link to markdown, html to markdown, webpage to markdown, url to markdown, cloudflare bypass, turnstile, anti-bot, RAG, LLM, AI agent, markdown extractor
Why this Actor (SEO)
If you need a dependable URL → Markdown converter for RAG pipelines, you usually hit three problems:
- Broken or messy HTML that produces garbage Markdown
- Heavy JavaScript pages that hide the real content
- Anti-bot / Cloudflare interstitials that block simple fetchers
This Actor is built to be a practical extractor for AI agents, vector databases, knowledge bases, and content archiving workflows.
Common use cases
- Convert product docs pages into Markdown for RAG
- Build internal knowledge base snapshots from URLs
- Extract “article” content with a CSS selector (`main`, `article`, `.content`)
- Prepare clean Markdown for embedding/search indexing
Tips for better extraction
- Set `selector` to target the content container (`article`, `main`, `.markdown-body`)
- Use `includeHtml=true` only when debugging extraction
- Keep `safe=true` when ingesting untrusted pages into downstream systems
What you get
- Markdown output per URL (optionally for a specific CSS selector like `article`, `main`, or `.markdown-body`)
- Safe-by-default sanitization for untrusted HTML
- Optional Cloudflare challenge bypass fallback when normal fetching fails
- Dataset output suitable for exporting to JSON/CSV
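The bypass fallback listed above can be pictured as a small piece of control flow: try a plain fetch first and switch to the bypass path only when the response looks blocked. The sketch below is illustrative and is not the Actor's actual code. `FetchResult`, `looks_blocked`, `fetch_with_fallback`, and the two fetcher callables are hypothetical names, and the block-detection heuristic is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical fetcher signature: takes a URL, returns (status_code, html).
Fetcher = Callable[[str], Tuple[int, str]]


@dataclass
class FetchResult:
    html: str
    status_code: int
    bypassed: bool  # mirrors the `bypassed` field of the dataset items


def looks_blocked(status_code: int, html: str) -> bool:
    """Rough heuristic (an assumption, not the Actor's real check)."""
    markers = ("challenges.cloudflare.com", "Just a moment...")
    return status_code in (403, 429, 503) or any(m in html for m in markers)


def fetch_with_fallback(
    url: str, plain_fetch: Fetcher, bypass_fetch: Optional[Fetcher] = None
) -> FetchResult:
    """Use the plain fetcher; fall back to the bypass fetcher if blocked."""
    try:
        status, html = plain_fetch(url)
    except OSError:
        # A network-level failure also counts as "normal fetching failed".
        if bypass_fetch is None:
            raise
    else:
        if bypass_fetch is None or not looks_blocked(status, html):
            return FetchResult(html, status, bypassed=False)
    status, html = bypass_fetch(url)
    return FetchResult(html, status, bypassed=True)
```

Passing `bypass_fetch=None` corresponds to running with `useCloudflareBypass` set to false: a blocked response is returned as-is instead of being retried.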
Input
- `urls` (array) or `url` (string)
- `selector` (string, optional)
- `safe` (boolean, default: true)
- `useCloudflareBypass` (boolean, default: true)
- `bypassCache` (boolean, default: false)
- `proxyUrl` (string, optional)
- `includeHtml` (boolean, default: false)
- `maxConcurrency` (int, default: 2)
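When the input is assembled programmatically, it can help to apply the documented defaults and catch typos in field names before starting a run. The helper below is a minimal sketch: `build_input` and `DEFAULTS` are hypothetical names, the defaults are copied from the field list above, and the URL check is an assumption rather than the Actor's own validation.

```python
from typing import Any, Dict, List, Optional, Union
from urllib.parse import urlparse

# Defaults copied from the documented input fields.
DEFAULTS: Dict[str, Any] = {
    "safe": True,
    "useCloudflareBypass": True,
    "bypassCache": False,
    "includeHtml": False,
    "maxConcurrency": 2,
}


def build_input(
    urls: Union[str, List[str]],
    selector: Optional[str] = None,
    proxy_url: Optional[str] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Return an input dict with defaults applied and basic checks done."""
    url_list = [urls] if isinstance(urls, str) else list(urls)
    if not url_list:
        raise ValueError("at least one URL is required")
    for u in url_list:
        parts = urlparse(u)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {u!r}")

    # Reject misspelled option names instead of silently ignoring them.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    run_input: Dict[str, Any] = {"urls": url_list, **DEFAULTS, **overrides}
    if selector:
        run_input["selector"] = selector
    if proxy_url:
        run_input["proxyUrl"] = proxy_url
    return run_input
```

For example, `build_input("https://github.com/EmilStenstrom/justhtml", selector=".markdown-body")` produces a dict that can be serialized with `json.dumps` and pasted into the Actor's input editor.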
Output (dataset items)
Each item contains:
- `url`, `finalUrl`
- `status` (`success` or `failed`)
- `title`
- `markdown`
- `statusCode`, `contentType`
- `bypassed` (boolean)
- `error` (string, if failed)
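A typical next step is to write each successful item's Markdown to disk and collect the failures for a retry. The sketch below assumes the dataset has already been exported as a list of dicts with the fields listed above; `slugify` and `save_items` are hypothetical helper names, and the file-naming scheme is only one possible choice.

```python
import re
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple


def slugify(url: str) -> str:
    """Turn a URL into a filesystem-friendly name."""
    slug = re.sub(r"^https?://", "", url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", slug).strip("-")
    return slug[:80] or "page"


def save_items(
    items: Iterable[Dict[str, Any]], out_dir: str
) -> Tuple[List[Path], List[Tuple[Optional[str], Optional[str]]]]:
    """Write one .md file per successful item; return (paths, failures)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    failures: List[Tuple[Optional[str], Optional[str]]] = []
    for item in items:
        if item.get("status") != "success":
            # Keep the URL and error message so failed pages can be retried.
            failures.append((item.get("url"), item.get("error")))
            continue
        # Prefer the post-redirect URL when naming the file.
        name = slugify(item.get("finalUrl") or item["url"])
        path = out / f"{name}.md"
        path.write_text(item.get("markdown") or "", encoding="utf-8")
        written.append(path)
    return written, failures
```

Note that two URLs mapping to the same slug would overwrite each other in this sketch; a real pipeline might add a hash suffix to the file name.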
Example input
{
  "urls": ["https://github.com/EmilStenstrom/justhtml"],
  "selector": ".markdown-body",
  "safe": true,
  "useCloudflareBypass": true
}
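To send this input from a script rather than the Apify Console, one option is Apify's synchronous run endpoint, which starts a run and returns the dataset items in a single response. The sketch below uses only the standard library. The Actor ID is not stated in this README, so `ACTOR_ID` is a placeholder to replace, and `build_run_request` and `run_actor` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "<username>~<actor-name>"  # placeholder: not given in this README


def build_run_request(
    actor_id: str, token: str, run_input: Dict[str, Any]
) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(
    actor_id: str, token: str, run_input: Dict[str, Any], timeout: float = 300
) -> List[Dict[str, Any]]:
    """Send the request and parse the JSON array of dataset items."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


example_input = {
    "urls": ["https://github.com/EmilStenstrom/justhtml"],
    "selector": ".markdown-body",
    "safe": True,
    "useCloudflareBypass": True,
}
# items = run_actor(ACTOR_ID, "<your-api-token>", example_input)
```

The synchronous endpoint suits short runs; for long URL lists, starting the run asynchronously and reading the dataset afterwards is likely the safer choice.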
Deploy to Apify
- Install Apify CLI and log in
- From this Actor directory, run:
$ apify push
Then publish from the Apify Console with a title/description similar to this README for strong discoverability:
- Keywords: link to markdown, html to markdown, justhtml, cloudflare bypass, turnstile, RAG
Licensing
- This Actor’s code in this repository follows the repository’s license.
- JustHTML is vendored in this repository and distributed under its own license (see its LICENSE file).