Webpage to Markdown Converter
Convert any webpage URL to clean Markdown format. Preserves headings, lists, tables, links, and code blocks. Optimized for LLM consumption, RAG pipelines, and vector database ingestion.
Webpage to Markdown
What is Webpage to Markdown?
Webpage to Markdown is an Apify actor that converts any webpage into clean, well-structured Markdown format. It fetches HTML from one or more URLs, strips away scripts, styles, navigation, headers, footers, and other non-content elements, then converts the remaining content into proper Markdown with headings, lists, tables, code blocks, bold, italic, links, and images. The output includes the page title, word count, character count, and a timestamp alongside the Markdown content. This actor is ideal for building RAG (Retrieval-Augmented Generation) pipelines, content archiving systems, knowledge bases, and any application where you need structured, LLM-ready text from web pages.
Unlike simple text extraction, Webpage to Markdown preserves the document structure. Headings remain as headings, tables remain as tables, code blocks keep their formatting, and lists maintain their hierarchy. This structural preservation is critical for LLMs that benefit from understanding document organization and for downstream applications that need to render or further process the content.
Why use Webpage to Markdown?
- Preserves document structure -- Headings (H1-H6), lists (ordered and unordered), tables, code blocks, blockquotes, bold, and italic formatting are all correctly converted to Markdown syntax.
- Clean content extraction -- Scripts, styles, navigation, headers, footers, forms, buttons, hidden elements, and iframes are all automatically removed before conversion.
- Batch processing -- Process multiple URLs in a single run. Just provide a list of URLs and the actor handles them sequentially.
- Configurable output -- Choose whether to include images and hyperlinks in the Markdown output. Control maximum content length to stay within LLM token limits.
- LLM-ready output -- The clean Markdown format is directly consumable by Claude, GPT, and other LLMs for question answering, summarization, and analysis.
- Full table support -- HTML tables are converted to pipe-delimited Markdown tables with proper header separation.
- Error resilience -- Failed URLs produce error records in the dataset rather than crashing the entire run, so batch processing always completes.
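To illustrate the table support: a two-column HTML table with header cells "Name" and "Role" and one data row (a hypothetical example, not from the actor's docs) would be converted to a pipe-delimited Markdown table with a header separator row:

```markdown
| Name | Role |
| --- | --- |
| Ada  | Engineer |
```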
How to use Webpage to Markdown
- Go to the actor page on Apify and click "Start".
- Add your URLs to the `urls` input field. You can add one URL or dozens.
- Configure options: toggle image inclusion, link preservation, and set the maximum content length.
- Run the actor and download results from the dataset in JSON, CSV, or Excel format.
Using the Apify API
```shell
curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://en.wikipedia.org/wiki/Web_scraping", "https://docs.apify.com"],
    "includeImages": false,
    "includeLinks": true,
    "maxContentLength": 50000
  }'
```
Using the Apify SDK
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('YOUR_ACTOR_ID').call({
  urls: ['https://en.wikipedia.org/wiki/Web_scraping'],
  includeLinks: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].markdown);
```
Input configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | String[] | `["https://example.com"]` | List of webpage URLs to convert to Markdown format |
| `includeImages` | Boolean | `false` | Whether to include image references (`![alt](src)`) in the Markdown output |
| `includeLinks` | Boolean | `true` | Whether to preserve hyperlinks (`[text](href)`) in the Markdown output |
| `maxContentLength` | Integer | `50000` | Maximum number of characters in the output Markdown content. Content exceeding this limit is truncated with a notice. |
Output data
Each processed URL produces one record in the dataset with the following fields:
```json
{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping - Wikipedia",
  "markdown": "# Web scraping\n\nWeb scraping, web harvesting, or web data extraction is [data scraping](https://en.wikipedia.org/wiki/Data_scraping) used for extracting data from websites...\n\n## Techniques\n\n- Human copy-and-paste\n- Text pattern matching\n- HTTP programming\n- DOM parsing...",
  "wordCount": 4523,
  "charCount": 28190,
  "timestamp": "2026-03-03T12:00:00.000Z"
}
```
| Field | Type | Description |
|---|---|---|
| `url` | String | The original URL that was processed |
| `title` | String | The page title extracted from the HTML `<title>` tag |
| `markdown` | String | The converted Markdown content with proper formatting |
| `wordCount` | Integer | Number of words in the Markdown output |
| `charCount` | Integer | Number of characters in the Markdown output |
| `timestamp` | String | ISO 8601 timestamp of when the conversion was performed |
| `error` | String | Error message if the URL could not be processed (only present on failures) |
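Because failed URLs still produce records (with an `error` field and empty content), it is worth separating successes from failures before downstream use. A minimal sketch, using hypothetical dataset items shaped like the records above:

```javascript
// Hypothetical items, shaped like those returned by
// client.dataset(run.defaultDatasetId).listItems()
const items = [
  { url: 'https://example.com', markdown: '# Example', wordCount: 1, charCount: 9 },
  { url: 'https://bad.example', markdown: '', error: 'Request timed out' },
];

// Keep only successful conversions before feeding them to an LLM or vector store.
const succeeded = items.filter((item) => !item.error && item.markdown);
const failed = items.filter((item) => item.error);

console.log(`${succeeded.length} ok, ${failed.length} failed`);
```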
Cost of usage
Webpage to Markdown is a lightweight actor that uses minimal compute resources. Each URL typically processes in 1-3 seconds depending on page size. With default memory (2048 MB), processing a batch of 10 URLs costs approximately $0.01-$0.03. For large-scale operations processing 1000 URLs, expect costs around $1-$3. The actor uses no external paid APIs or browser automation, keeping costs minimal. The per-event pricing is $0.05 per actor run plus $0.001 per result.
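Using the per-event prices quoted above, the event charges can be estimated directly (this sketch covers only the per-run and per-result events; compute costs vary with page size and come on top):

```javascript
// Event-based cost estimate: $0.05 per actor run plus $0.001 per result,
// per the pricing described above. Compute (memory-seconds) is extra.
function estimateEventCost(runs, results) {
  return runs * 0.05 + results * 0.001;
}

// One run converting 1000 URLs: $0.05 + $1.00 in event charges.
console.log(estimateEventCost(1, 1000).toFixed(2));
```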
Tips and tricks
- Set includeLinks to true for RAG: When building RAG pipelines, keeping hyperlinks in the Markdown helps LLMs provide source attribution and follow-up references.
- Set includeImages to false for text-only LLMs: If your LLM cannot process images, disable image inclusion to keep the Markdown clean and reduce token usage.
- Adjust maxContentLength for your LLM context window: If you are using a model with a 4K token limit, set maxContentLength to around 12000 characters. For 128K context models, the default 50000 is usually fine.
- Batch URLs for efficiency: Processing multiple URLs in one run is more efficient than starting separate runs for each URL due to reduced actor startup overhead.
- Handle errors gracefully: URLs that fail (404, timeout, etc.) produce records with an `error` field and empty content. Filter these out in your downstream processing.
- Combine with chunking: If you need text chunks for vector database ingestion, pair this actor with the URL to Clean Text actor, which supports automatic text chunking with configurable overlap.
- Export as CSV: The Apify dataset can be exported as CSV, making it easy to import into spreadsheets or databases for further analysis.
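If you prefer to chunk the Markdown yourself rather than using a separate actor, a minimal character-based chunker with overlap can look like this (an illustrative sketch; the chunk size and overlap values are arbitrary, and the URL to Clean Text actor's own chunking options may differ):

```javascript
// Split text into overlapping character chunks for vector database ingestion.
// Each chunk is at most `chunkSize` characters; consecutive chunks share
// `overlap` characters so that sentences cut at a boundary appear in both.
function chunkText(text, chunkSize = 1000, overlap = 100) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// 2500 characters with 1000-char chunks and 100-char overlap -> 3 chunks.
console.log(chunkText('a'.repeat(2500)).length);
```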