URL to Markdown (JustHTML) - Clean Markdown Extractor

Convert webpages to clean Markdown for RAG and archiving. Uses JustHTML and supports optional Cloudflare/Turnstile bypass plus CSS selector extraction.

Pricing: Pay per usage
Rating: 5.0 (1)
Developer: Anass
Last modified: 2 months ago
Link to Markdown (JustHTML + Cloudflare Bypass)

🔗 URL → 🧼 Clean Markdown • 🛡️ Optional bypass • 🎯 CSS selector

Convert web links into clean Markdown for RAG, archiving, content pipelines, and AI agents.

This Actor fetches a URL, optionally bypasses Cloudflare challenges using the same Camoufox-based open-source bypass approach used in this repository, and converts the resulting HTML to Markdown with JustHTML (a pure-Python HTML5 parser with built-in safe output).

Keywords

link to markdown, html to markdown, webpage to markdown, url to markdown, cloudflare bypass, turnstile, anti-bot, RAG, LLM, AI agent, markdown extractor

Why this Actor

If you need a dependable URL → Markdown converter for RAG pipelines, you usually hit three problems:

Broken or messy HTML that produces garbage Markdown
Heavy JavaScript pages that hide the real content
Anti-bot / Cloudflare interstitials that block simple fetchers

This Actor is built to be a practical extractor for AI agents, vector databases, knowledge bases, and content archiving workflows.

Common use cases

Convert product docs pages into Markdown for RAG
Build internal knowledge base snapshots from URLs
Extract “article” content with a CSS selector (main, article, .content)
Prepare clean Markdown for embedding/search indexing

Tips for better extraction

Set selector to target the content container (article, main, .markdown-body)
Use includeHtml=true only when debugging extraction
Keep safe=true when ingesting untrusted pages into downstream systems

What you get

Markdown output per URL (optionally for a specific CSS selector like article, main, or .markdown-body)
Safe-by-default sanitization for untrusted HTML
Optional Cloudflare challenge bypass fallback when normal fetching fails
Dataset output suitable for exporting to JSON/CSV

Input

urls (array) or url (string)
selector (string, optional)
safe (boolean, default: true)
useCloudflareBypass (boolean, default: true)
bypassCache (boolean, default: false)
proxyUrl (string, optional)
includeHtml (boolean, default: false)
maxConcurrency (int, default: 2)
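
The input fields above can be assembled programmatically. The following is a minimal sketch, not the Actor's own code: `build_run_input` is a hypothetical helper, and merging `url` into `urls` is an assumption made here for convenience. Only the field names and default values come from the list above.

```python
from typing import Any, Dict, List, Optional

# Defaults copied from the documented input list.
DEFAULTS: Dict[str, Any] = {
    "safe": True,
    "useCloudflareBypass": True,
    "bypassCache": False,
    "includeHtml": False,
    "maxConcurrency": 2,
}

# Optional fields that have no documented default.
OPTIONAL_FIELDS = {"selector", "proxyUrl"}


def build_run_input(
    url: Optional[str] = None,
    urls: Optional[List[str]] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect URLs and fill in the documented defaults."""
    targets = list(urls or [])
    if url:
        targets.append(url)
    if not targets:
        raise ValueError("provide at least one URL via `url` or `urls`")

    # Reject field names that are not in the documented input list.
    unknown = set(options) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    # Explicit options override the defaults.
    return {"urls": targets, **DEFAULTS, **options}
```

For example, `build_run_input(url="https://example.com", selector="article")` returns a dictionary with `urls`, `selector`, and every default filled in, ready to be serialized as the Actor input.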

Output (dataset items)

Each item contains:

url, finalUrl
status (success or failed)
title
markdown
statusCode, contentType
bypassed (boolean)
error (string, if failed)
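
Downstream code usually needs to separate successful items from failed ones before indexing. The sketch below assumes the items have already been loaded as a list of dictionaries with the fields listed above. `split_results` and `to_document` are hypothetical helper names, and the HTML comment used to record the source URL is an illustrative choice, not part of the Actor's output.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def split_results(items: List[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate dataset items by their `status` field."""
    succeeded: List[Item] = []
    failed: List[Item] = []
    for item in items:
        if item.get("status") == "success":
            succeeded.append(item)
        else:
            failed.append(item)
    return succeeded, failed


def to_document(item: Item) -> str:
    """Prefix the Markdown with its source URL (an illustrative convention)."""
    # Prefer the URL after redirects; fall back to the requested URL.
    source = item.get("finalUrl") or item.get("url", "")
    body = (item.get("markdown") or "").strip()
    return f"<!-- source: {source} -->\n\n{body}\n"
```

Failed items keep their `error` string, so they can be logged or retried separately, for instance with `useCloudflareBypass` enabled or a different `proxyUrl`.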

Example input

{
  "urls": [
    "https://github.com/EmilStenstrom/justhtml"
  ],
  "selector": ".markdown-body",
  "safe": true,
  "useCloudflareBypass": true
}
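
One way to submit this input from Python is through the Apify API's synchronous "run Actor and get dataset items" endpoint. The sketch below only builds the request with the standard library. `build_sync_request` is a hypothetical helper, and the Actor ID and token shown in the usage note are placeholders, since this page does not state the Actor's ID.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"


def build_sync_request(
    actor_id: str, token: str, run_input: Dict[str, Any]
) -> urllib.request.Request:
    """Build a POST request that runs an Actor and returns its dataset items."""
    # In API paths, Actor IDs take the form "username~actor-name".
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the response body with `json.load`, for example `build_sync_request("<username>~<actor-name>", "<APIFY_TOKEN>", run_input)`. Synchronous endpoints are time-limited, so large URL batches may be better served by starting a run asynchronously and reading the dataset afterwards.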

Deploy to Apify

1. Install the Apify CLI and log in.
2. From this Actor directory, run:

$ apify push

Then publish from the Apify Console with a title/description similar to this README for strong discoverability:

Keywords: link to markdown, html to markdown, justhtml, cloudflare bypass, turnstile, RAG

Licensing

This Actor’s code in this repository follows the repository’s license.
JustHTML is vendored and distributed under its own license (see its LICENSE file).
