Markdownify MCP Server

Pricing

$10.00 / 1,000 results

Convert any webpage to clean, well-formatted Markdown that is ready for AI consumption. Ideal for building knowledge bases, documentation scrapers, and content migration tools.

Rating

5.0 (5)

Developer

Crawler Bros

Maintained by Community

Last modified

2 months ago

Features

✅ Convert any webpage to Markdown - Clean, formatted output
✅ CSS Selector Support - Include/exclude specific sections
✅ JavaScript Rendering - Optional Playwright support for dynamic content
✅ Authentication Support - HTTP Basic Auth for restricted content
✅ Customizable Output - Configure heading styles, strip tags, etc.
✅ Error Handling - Graceful failures with detailed error messages
✅ MCP Server Ready - Structured output for AI consumption

How It Works

Input - Provide URL(s) and optional configuration
Fetch - Download webpage content (HTTP or Playwright)
Extract - Apply include/exclude selectors
Convert - Transform HTML to clean Markdown
Output - Save to Apify dataset with metadata
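
The stages above can be illustrated with a small standard-library sketch. It is not the Actor's actual implementation: `SimpleMarkdownParser` and `html_to_markdown` are hypothetical names, the HTML is passed in as a string instead of being fetched, selector filtering is left out, and only headings and paragraphs are converted.

```python
from html.parser import HTMLParser

# Mirrors the documented stripTags default.
STRIP_TAGS = {"script", "style", "iframe", "noscript"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SimpleMarkdownParser(HTMLParser):
    """Toy converter: handles <h1>-<h6> and <p>, drops content inside STRIP_TAGS."""

    def __init__(self, heading_style="ATX"):
        super().__init__()
        self.heading_style = heading_style
        self.blocks = []       # finished Markdown blocks
        self._tag = None       # block-level tag currently being collected
        self._text = []        # text fragments of the current block
        self._skip_depth = 0   # greater than 0 while inside a stripped tag

    def handle_starttag(self, tag, attrs):
        if tag in STRIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and (tag == "p" or tag in HEADINGS):
            self._tag, self._text = tag, []

    def handle_endtag(self, tag):
        if tag in STRIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._tag:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.blocks.append(self._render(tag, text))
            self._tag = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._tag is not None:
            self._text.append(data)

    def _render(self, tag, text):
        if tag == "p":
            return text
        level = int(tag[1])
        # Setext underlines exist only for levels 1 and 2; deeper levels fall back to ATX.
        if self.heading_style == "SETEXT" and level <= 2:
            underline = "=" if level == 1 else "-"
            return f"{text}\n{underline * len(text)}"
        return f"{'#' * level} {text}"


def html_to_markdown(html, heading_style="ATX"):
    """Convert an HTML string to Markdown using the toy parser above."""
    parser = SimpleMarkdownParser(heading_style)
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


html = (
    "<html><body><h1>Example Domain</h1><script>track()</script>"
    "<p>This domain is for use in <a href='#'>examples</a>.</p></body></html>"
)
markdown = html_to_markdown(html)
print(markdown)
```

A real converter also has to deal with links, lists, tables, and code blocks, which is why dedicated HTML-to-Markdown libraries are normally used for the Convert stage.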

Input Parameters

Required

urls (array of strings) - List of webpage URLs to convert

Optional

includeSelectors (array of strings) - CSS selectors to include specific sections
Example: ["article", ".main-content", "#documentation"]
excludeSelectors (array of strings) - CSS selectors to exclude
Example: ["nav", "footer", ".advertisement", "script", "style"]
useJavaScript (boolean) - Enable Playwright for JavaScript-heavy pages
Default: false
headingStyle (string) - Markdown heading style
Options: "ATX" (# Heading) or "SETEXT" (Heading\n=======)
Default: "ATX"
stripTags (array of strings) - HTML tags to completely remove
Default: ["script", "style", "iframe", "noscript"]
auth (object) - HTTP Basic Authentication credentials
Example: {"username": "user", "password": "pass"}
timeout (integer) - Request timeout in seconds
Default: 30, Range: 10-120
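
The defaults and ranges listed above can be checked on the client side before starting a run. The helper below is a hypothetical sketch (`build_input` is not part of the Actor or the Apify client), and treating an empty `urls` list as an error is an assumption; the Actor's own input schema remains the authority.

```python
import copy

# Defaults as documented in the parameter list above.
DEFAULTS = {
    "useJavaScript": False,
    "headingStyle": "ATX",
    "stripTags": ["script", "style", "iframe", "noscript"],
    "timeout": 30,
}


def build_input(urls, **options):
    """Merge options with the documented defaults and check the documented constraints."""
    if not urls or not all(isinstance(u, str) and u for u in urls):
        raise ValueError("urls must be a non-empty list of strings")

    run_input = {**copy.deepcopy(DEFAULTS), **options, "urls": list(urls)}

    if run_input["headingStyle"] not in ("ATX", "SETEXT"):
        raise ValueError('headingStyle must be "ATX" or "SETEXT"')

    timeout = run_input["timeout"]
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 10 <= timeout <= 120:
        raise ValueError("timeout must be an integer between 10 and 120 seconds")

    auth = run_input.get("auth")
    if auth is not None and not {"username", "password"} <= set(auth):
        raise ValueError("auth must contain username and password")

    return run_input


run_input = build_input(
    ["https://example.com"],
    excludeSelectors=["nav", "footer"],
    timeout=60,
)
print(run_input)
```

The returned dictionary can be passed as the run input in the API examples further down.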

Input Example

{
  "urls": ["https://apify.com/docs", "https://en.wikipedia.org/wiki/Markdown"],
  "excludeSelectors": ["nav", "footer", ".advertisement"],
  "useJavaScript": false,
  "headingStyle": "ATX",
  "timeout": 30
}

Output Format

Each converted page is saved as a separate record in the dataset:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "markdown": "# Example Domain\n\nThis domain is for use...",
  "markdown_length": 1234,
  "success": true,
  "error": null,
  "scraped_at": "2025-10-24T10:30:00.000Z",
  "meta": {
    "method": "http",
    "heading_style": "ATX",
    "stripped_tags": ["script", "style"],
    "used_include_selectors": false,
    "used_exclude_selectors": true
  }
}
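
Because every record carries `success` and `error`, a run can be summarized without inspecting the Markdown itself. The sketch below assumes that a failed page is stored with `success` set to false and a message in `error`; that shape is inferred from the fields above, and the sample records, including the error text, are made up for illustration.

```python
def summarize(items):
    """Split dataset records into successes and failures and total the output size."""
    ok = [item for item in items if item.get("success")]
    failed = [item for item in items if not item.get("success")]
    return {
        "converted": len(ok),
        "failed": len(failed),
        "total_markdown_chars": sum(item.get("markdown_length") or 0 for item in ok),
        "errors": {item["url"]: item.get("error") for item in failed},
    }


# Illustrative records; real ones come from the run's dataset.
items = [
    {
        "url": "https://example.com",
        "markdown": "# Example Domain",
        "markdown_length": 16,
        "success": True,
        "error": None,
    },
    {
        "url": "https://example.org/missing",
        "markdown": None,
        "markdown_length": 0,
        "success": False,
        "error": "HTTP 404",
    },
]

summary = summarize(items)
print(summary)
```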

Use Cases

📚 Build AI-Ready Knowledge Bases

Convert documentation, wikis, and help centers into Markdown for AI training or retrieval-augmented generation (RAG) systems.

📝 Content Migration

Migrate existing web content to Markdown for static site generators (Jekyll, Hugo, etc.).
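
As a rough sketch of such a migration, a dataset record can be turned into a Markdown file with YAML front matter, which both Jekyll and Hugo read. The helper names `slugify` and `to_post` and the `source_url` key are hypothetical choices; adjust the front matter to whatever your site's theme expects.

```python
import json
import re


def slugify(title):
    """Lowercase the title and replace runs of non-alphanumerics with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "page"


def to_post(item):
    """Return (filename, file_content) for one successfully converted record."""
    # json.dumps produces double-quoted strings, which are also valid YAML scalars.
    front_matter = "\n".join([
        "---",
        f"title: {json.dumps(item['title'])}",
        f"source_url: {json.dumps(item['url'])}",
        f"date: {item['scraped_at']}",
        "---",
    ])
    return f"{slugify(item['title'])}.md", f"{front_matter}\n\n{item['markdown']}\n"


item = {
    "url": "https://example.com",
    "title": "Example Domain",
    "markdown": "# Example Domain",
    "scraped_at": "2025-10-24T10:30:00.000Z",
}
filename, content = to_post(item)
print(filename)
print(content)
```

Writing `content` to `filename` inside the generator's content directory (for example `_posts/` for Jekyll or `content/` for Hugo) completes the step.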

🤖 AI Agent Integration

Enable AI agents to consume web content in a clean, structured format.

📄 Documentation Scraping

Extract and format technical documentation from multiple sources.

🔄 Content Synchronization

Keep Markdown versions of web pages up-to-date automatically.
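
One possible way to do this is to hash each page's Markdown and rewrite only the pages whose hash changed since the previous run. The functions below are a hypothetical client-side sketch; how the hashes are stored between runs (a JSON file, a key-value store, a database) is left open.

```python
import hashlib


def content_hash(markdown):
    """Stable fingerprint of a page's Markdown."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def changed_pages(items, known_hashes):
    """Return (changed_items, updated_hashes) for the successful records in items."""
    updated = dict(known_hashes)  # do not mutate the caller's mapping
    changed = []
    for item in items:
        if not item.get("success"):
            continue  # keep the previous version when a page failed to convert
        digest = content_hash(item["markdown"])
        if updated.get(item["url"]) != digest:
            changed.append(item)
            updated[item["url"]] = digest
    return changed, updated


first_run = [
    {"url": "https://example.com/a", "markdown": "# A", "success": True},
    {"url": "https://example.com/b", "markdown": "# B", "success": True},
]
changed, hashes = changed_pages(first_run, {})
print([item["url"] for item in changed])

second_run = [
    {"url": "https://example.com/a", "markdown": "# A", "success": True},
    {"url": "https://example.com/b", "markdown": "# B (edited)", "success": True},
]
changed_again, hashes = changed_pages(second_run, hashes)
print([item["url"] for item in changed_again])
```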

API Integration

JavaScript/Node.js

const { ApifyClient } = require("apify-client");

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const input = {
  urls: ["https://example.com"],
  excludeSelectors: ["nav", "footer"],
};

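// CommonJS modules do not support top-level await, so run inside an async function.
async function main() {
  // Start the Actor and wait for the run to finish.
  const run = await client.actor("YOUR_ACTOR_ID").call(input);
  // Read the converted pages from the run's default dataset.
  const { items } = await client.dataset(run.defaultDatasetId).listItems();

  items.forEach((item) => {
    console.log(`Title: ${item.title}`);
    console.log(`Markdown length: ${item.markdown_length}`);
    console.log(item.markdown);
  });
}

main().catch(console.error);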

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

input_data = {
    'urls': ['https://example.com'],
    'excludeSelectors': ['nav', 'footer']
}

run = client.actor('YOUR_ACTOR_ID').call(run_input=input_data)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"Title: {item['title']}")
    print(f"Markdown length: {item['markdown_length']}")
    print(item['markdown'])

cURL

curl -X POST https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "excludeSelectors": ["nav", "footer"]
  }'

Tips & Best Practices

🚀 Performance

Keep useJavaScript: false for static pages; plain HTTP fetching avoids launching a browser and is much faster
Enable useJavaScript: true only for pages that render their content client-side
Use includeSelectors to extract only what you need
Batch multiple URLs in a single run

🎯 Accuracy

Test selectors in browser DevTools first
Use specific includeSelectors for precise extraction
Combine include and exclude for best results
Add common noise elements to excludeSelectors

🔧 Troubleshooting

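Empty markdown? Check that includeSelectors match elements on the page and that excludeSelectors are not removing the content you want
Missing content? Try enabling useJavaScript so dynamically rendered content is loaded
Timeout errors? Increase the timeout value (maximum 120 seconds)
Authentication issues? Verify the auth username and password; only HTTP Basic Auth is supported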
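
Some of these checks can be automated after a run. The sketch below builds a follow-up input that retries failed or empty pages with JavaScript rendering and a longer timeout. `build_retry_input` is a hypothetical helper, and retrying everything with Playwright is only one possible strategy: it will not help when the real cause is a wrong selector or bad credentials.

```python
MAX_TIMEOUT = 120  # upper bound of the documented timeout range


def build_retry_input(items, previous_input):
    """Return a new run input for pages that failed or came back empty, or None."""
    retry_urls = [
        item["url"]
        for item in items
        if not item.get("success") or not item.get("markdown")
    ]
    if not retry_urls:
        return None

    retry = dict(previous_input)  # keep selectors, auth, and other settings
    retry["urls"] = retry_urls
    retry["useJavaScript"] = True
    retry["timeout"] = min(previous_input.get("timeout", 30) * 2, MAX_TIMEOUT)
    return retry


previous_input = {
    "urls": ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    "excludeSelectors": ["nav"],
    "timeout": 90,
}
items = [
    {"url": "https://example.com/a", "markdown": "# A", "success": True},
    {"url": "https://example.com/b", "markdown": None, "success": False},
    {"url": "https://example.com/c", "markdown": "", "success": True},
]
retry_input = build_retry_input(items, previous_input)
print(retry_input)
```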

Development

Local Testing

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Run locally
python -m src

Project Structure

markdownify-mcp/
├── .actor/
│   ├── actor.json          # Actor configuration
│   ├── input_schema.json   # Input validation
│   └── output_schema.json  # Output structure
├── src/
│   ├── __main__.py         # Main entry point
│   ├── fetcher.py          # HTTP & Playwright fetchers
│   ├── extractor.py        # Content extraction
│   └── converter.py        # HTML to Markdown
├── Dockerfile              # Docker configuration
├── requirements.txt        # Python dependencies
└── README.md              # This file

License

Apache 2.0

Support

For issues, questions, or feature requests, please contact support or open an issue in the repository.

Made with ❤️ for the AI community
