Markdownify MCP Server
Pricing
$10.00 / 1,000 results
Last modified
2 days ago
Convert any webpage to clean, formatted Markdown perfect for AI consumption. This Actor is ideal for building knowledge bases, documentation scrapers, and content migration tools.
Features
✅ Convert any webpage to Markdown - Clean, formatted output
✅ CSS Selector Support - Include/exclude specific sections
✅ JavaScript Rendering - Optional Playwright support for dynamic content
✅ Authentication Support - HTTP Basic Auth for restricted content
✅ Customizable Output - Configure heading styles, strip tags, etc.
✅ Error Handling - Graceful failures with detailed error messages
✅ MCP Server Ready - Structured output for AI consumption
How It Works
- Input - Provide URL(s) and optional configuration
- Fetch - Download webpage content (HTTP or Playwright)
- Extract - Apply include/exclude selectors
- Convert - Transform HTML to clean Markdown
- Output - Save to Apify dataset with metadata
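The Extract and Convert steps can be pictured with a small standard-library sketch. This is not the Actor's own code, which is not shown here: TinyMarkdownConverter and html_to_markdown are hypothetical names, and only headings and paragraphs are handled. It illustrates how stripped tags are skipped and how the two heading styles differ.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Illustrative only: converts headings and paragraphs, ignores other markup."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self, heading_style="ATX", strip_tags=("script", "style")):
        super().__init__()
        self.heading_style = heading_style
        self.strip_tags = set(strip_tags)
        self.blocks = []      # finished Markdown blocks
        self.buffer = []      # text of the block currently being built
        self.skip_depth = 0   # greater than 0 while inside a stripped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.strip_tags:
            self.skip_depth += 1
        elif tag in self.HEADINGS or tag == "p":
            self._flush()  # close any text collected before this block

    def handle_endtag(self, tag):
        if tag in self.strip_tags:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS:
            self._flush(level=self.HEADINGS[tag])
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)

    def _flush(self, level=None):
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        self.buffer = []
        if not text:
            return
        if level is None:
            self.blocks.append(text)
        elif self.heading_style == "SETEXT" and level <= 2:
            # Setext headings exist only for levels 1 and 2
            underline = ("=" if level == 1 else "-") * len(text)
            self.blocks.append(f"{text}\n{underline}")
        else:
            self.blocks.append("#" * level + " " + text)


def html_to_markdown(html, heading_style="ATX", strip_tags=("script", "style")):
    parser = TinyMarkdownConverter(heading_style, strip_tags)
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text
    return "\n\n".join(parser.blocks)


page = "<h1>Title</h1><script>track()</script><p>Hello <b>world</b></p>"
print(html_to_markdown(page))
print(html_to_markdown(page, heading_style="SETEXT"))
```

A real converter also has to handle links, lists, tables, and code blocks, which this sketch deliberately leaves out.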
Input Parameters
Required
- urls (array of strings) - List of webpage URLs to convert
Optional
- includeSelectors (array of strings) - CSS selectors to include specific sections
  Example: ["article", ".main-content", "#documentation"]
- excludeSelectors (array of strings) - CSS selectors to exclude
  Example: ["nav", "footer", ".advertisement", "script", "style"]
- useJavaScript (boolean) - Enable Playwright for JavaScript-heavy pages
  Default: false
- headingStyle (string) - Markdown heading style
  Options: "ATX" (# Heading) or "SETEXT" (Heading\n=======)
  Default: "ATX"
- stripTags (array of strings) - HTML tags to completely remove
  Default: ["script", "style", "iframe", "noscript"]
- auth (object) - HTTP Basic Authentication credentials
  Example: {"username": "user", "password": "pass"}
- timeout (integer) - Request timeout in seconds
  Default: 30, Range: 10-120
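To catch mistakes before starting a paid run, the documented parameters can be checked client-side. The sketch below is a hypothetical helper (build_input is not part of the Actor or of the Apify client); it applies the defaults and ranges listed above and assumes that an empty urls list should be rejected, which the parameter list does not state explicitly.

```python
ALLOWED_OPTIONS = {
    "includeSelectors", "excludeSelectors", "useJavaScript",
    "headingStyle", "stripTags", "auth", "timeout",
}


def build_input(urls, **options):
    """Return a run input dict with the documented defaults, or raise ValueError."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    # Assumption: an empty list is treated as invalid.
    if not isinstance(urls, list) or not urls or not all(isinstance(u, str) for u in urls):
        raise ValueError("urls must be a non-empty list of strings")

    run_input = {
        "urls": urls,
        "useJavaScript": False,
        "headingStyle": "ATX",
        "stripTags": ["script", "style", "iframe", "noscript"],
        "timeout": 30,
    }
    run_input.update(options)

    if run_input["headingStyle"] not in ("ATX", "SETEXT"):
        raise ValueError('headingStyle must be "ATX" or "SETEXT"')

    timeout = run_input["timeout"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 10 <= timeout <= 120:
        raise ValueError("timeout must be an integer between 10 and 120")

    auth = run_input.get("auth")
    if auth is not None:
        if not isinstance(auth, dict) or set(auth) != {"username", "password"}:
            raise ValueError("auth must be an object with username and password")

    return run_input


print(build_input(["https://example.com"], excludeSelectors=["nav", "footer"]))
```

The returned dict can be passed as the run input in the API examples further down.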
Input Example
{
  "urls": [
    "https://apify.com/docs",
    "https://en.wikipedia.org/wiki/Markdown"
  ],
  "excludeSelectors": ["nav", "footer", ".advertisement"],
  "useJavaScript": false,
  "headingStyle": "ATX",
  "timeout": 30
}
Output Format
Each converted page is saved as a separate record in the dataset:
{
  "url": "https://example.com",
  "title": "Example Domain",
  "markdown": "# Example Domain\n\nThis domain is for use...",
  "markdown_length": 1234,
  "success": true,
  "error": null,
  "scraped_at": "2025-10-24T10:30:00.000Z",
  "meta": {
    "method": "http",
    "heading_style": "ATX",
    "stripped_tags": ["script", "style"],
    "used_include_selectors": false,
    "used_exclude_selectors": true
  }
}
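Because every page produces its own record with success and error fields, a run can be summarized after the fact. The following sketch uses only the field names shown in the record above; summarize is a hypothetical helper and the sample records are made up for illustration.

```python
def summarize(items):
    """Split dataset records into successes and failures and report totals."""
    ok = [r for r in items if r.get("success")]
    failed = [r for r in items if not r.get("success")]
    return {
        "converted": len(ok),
        "failed": len(failed),
        "total_markdown_chars": sum(r.get("markdown_length", 0) for r in ok),
        "errors": {r["url"]: r.get("error") for r in failed},
    }


# Made-up records in the documented shape (meta omitted for brevity)
sample = [
    {"url": "https://example.com", "success": True, "error": None,
     "markdown": "# Example Domain", "markdown_length": 16},
    {"url": "https://example.org/missing", "success": False,
     "error": "HTTP 404", "markdown": "", "markdown_length": 0},
]
print(summarize(sample))
```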
Use Cases
📚 Build AI-Ready Knowledge Bases
Convert documentation, wikis, and help centers into Markdown for AI training or RAG systems.
📝 Content Migration
Migrate existing web content to Markdown for static site generators (Jekyll, Hugo, etc.).
🤖 AI Agent Integration
Enable AI agents to consume web content in a clean, structured format.
📄 Documentation Scraping
Extract and format technical documentation from multiple sources.
🔄 Content Synchronization
Keep Markdown versions of web pages up-to-date automatically.
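One way to approach synchronization is to fingerprint the Markdown from each run and only act on pages whose fingerprint changed. The sketch below is an assumption about how a consumer might do this, not a feature of the Actor; fingerprint and changed_urls are hypothetical names, and storing the fingerprints between runs is left to the caller.

```python
import hashlib


def fingerprint(markdown):
    """Stable digest of a page's Markdown."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def changed_urls(previous, items):
    """Compare a run's records with fingerprints saved from the last run.

    previous: dict mapping url -> fingerprint.
    Returns (list of changed or new urls, updated fingerprint dict).
    """
    updated = dict(previous)
    changed = []
    for item in items:
        if not item.get("success"):
            continue  # keep the old fingerprint when a page failed to convert
        digest = fingerprint(item["markdown"])
        if previous.get(item["url"]) != digest:
            changed.append(item["url"])
        updated[item["url"]] = digest
    return changed, updated


first_run = [{"url": "https://example.com", "success": True, "markdown": "# v1"}]
changed, state = changed_urls({}, first_run)
print(changed)  # the page is new, so it is reported as changed
```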
API Integration
JavaScript/Node.js
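const { ApifyClient } = require("apify-client");

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const input = {
  urls: ["https://example.com"],
  excludeSelectors: ["nav", "footer"],
};

// Top-level await is not available in CommonJS, so wrap the calls in an async function
async function main() {
  // Start the Actor and wait for the run to finish
  const run = await client.actor("YOUR_ACTOR_ID").call(input);

  // Read the converted pages from the run's default dataset
  const { items } = await client.dataset(run.defaultDatasetId).listItems();

  items.forEach((item) => {
    console.log(`Title: ${item.title}`);
    console.log(`Markdown length: ${item.markdown_length}`);
    console.log(item.markdown);
  });
}

main().catch(console.error);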
Python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

input_data = {
    'urls': ['https://example.com'],
    'excludeSelectors': ['nav', 'footer']
}

run = client.actor('YOUR_ACTOR_ID').call(run_input=input_data)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"Title: {item['title']}")
    print(f"Markdown length: {item['markdown_length']}")
    print(item['markdown'])
cURL
curl -X POST https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"], "excludeSelectors": ["nav", "footer"]}'
Tips & Best Practices
🚀 Performance
- Use useJavaScript: false for static pages (much faster)
- Only enable useJavaScript: true for dynamic content
- Use includeSelectors to extract only what you need
- Batch multiple URLs in a single run
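For long URL lists, batching can be done with a small helper before calling the Actor. This is a sketch under assumptions: batch_urls is a hypothetical name, the batch size of 50 is arbitrary rather than a documented limit, and duplicates are dropped on the assumption that each URL produces one billed result.

```python
def batch_urls(urls, batch_size=50):
    """Deduplicate URLs (keeping order) and split them into run-sized batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    unique = list(dict.fromkeys(urls))  # dict preserves insertion order
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


pages = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",  # duplicate, dropped
    "https://example.com/c",
]
for batch in batch_urls(pages, batch_size=2):
    print({"urls": batch, "useJavaScript": False})
```

Each printed dict is a candidate run input; static and dynamic pages would go into separate batches so that useJavaScript stays off wherever possible.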
🎯 Accuracy
- Test selectors in browser DevTools first
- Use specific includeSelectors for precise extraction
- Combine include and exclude for best results
- Add common noise elements to excludeSelectors
🔧 Troubleshooting
- Empty markdown? Check if selectors are correct
- Missing content? Try enabling useJavaScript
- Timeout errors? Increase timeout value
- Authentication issues? Verify auth credentials
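The checklist above can be turned into a rough triage function for individual dataset records. This is a heuristic sketch, not the Actor's behavior: diagnose is a hypothetical helper, and it assumes that timeout failures mention "timeout" or "timed out" in the error field, which the output format does not guarantee.

```python
MAX_TIMEOUT = 120  # upper bound of the documented timeout range


def diagnose(run_input, record):
    """Return (hint, overrides) for one record; hint is None if nothing stands out."""
    error = (record.get("error") or "").lower()

    if not record.get("success"):
        # Assumption: timeout failures are recognizable from the error text.
        if "timeout" in error or "timed out" in error:
            current = run_input.get("timeout", 30)
            return "increase timeout", {"timeout": min(current * 2, MAX_TIMEOUT)}
        if run_input.get("auth"):
            return "verify auth credentials", {}
        return "inspect error message", {}

    if record.get("markdown_length", 0) == 0:
        if run_input.get("includeSelectors") or run_input.get("excludeSelectors"):
            return "check selectors", {}
        if not run_input.get("useJavaScript", False):
            return "try useJavaScript", {"useJavaScript": True}

    return None, {}


hint, overrides = diagnose(
    {"timeout": 30},
    {"url": "https://example.com", "success": False, "error": "Request timeout after 30s"},
)
print(hint, overrides)
```

The overrides dict can be merged into the original input for a retry of just the affected URL; hints with empty overrides need a manual look.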
Development
Local Testing
# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Run locally
python -m src
Project Structure
markdownify-mcp/
├── .actor/
│   ├── actor.json           # Actor configuration
│   ├── input_schema.json    # Input validation
│   └── output_schema.json   # Output structure
├── src/
│   ├── __main__.py          # Main entry point
│   ├── fetcher.py           # HTTP & Playwright fetchers
│   ├── extractor.py         # Content extraction
│   └── converter.py         # HTML to Markdown
├── Dockerfile               # Docker configuration
├── requirements.txt         # Python dependencies
└── README.md                # This file
License
Apache 2.0
Support
For issues, questions, or feature requests, please contact support or open an issue in the repository.
Made with ❤️ for the AI community