Website to Markdown Converter for LLM Training
Pricing
Pay per usage
Convert any web page to clean Markdown. The converter strips navigation, ads, scripts, and styling while preserving headings, lists, tables, code blocks, and links. It suits LLM training data, RAG pipelines, content migration, documentation archival, and text analysis. Bulk processing is supported, and each page is reported with word, link, and image counts.
Developer
Ava Torres
3 days ago
Last modified
Website to Markdown Converter for LLM Training | $0.002/page
Convert any web page to clean, structured Markdown optimized for LLM training, RAG pipelines, and content processing. Process hundreds of URLs in a single run with concurrent fetching.
What it does
- Fetches web pages and extracts the main readable content
- Strips navigation, ads, footers, scripts, styles, and other non-content elements
- Converts HTML to Markdown preserving structure: headings, paragraphs, lists, tables, code blocks, blockquotes, and inline formatting
- Optionally includes links and images as Markdown references
- Reports metadata per page: word count, character count, link count, image count
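The steps above can be illustrated with a minimal sketch built on Python's standard `html.parser`. This is not the actor's implementation: the set of skipped tags, the `MiniMarkdown` class, and the `html_to_markdown` helper are assumptions made for illustration, and only headings, paragraphs, and list items are handled.

```python
from html.parser import HTMLParser

# Assumed set of non-content containers; the actor's real rules are not documented here.
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}
# Markdown prefix emitted for each supported block-level tag.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MiniMarkdown(HTMLParser):
    """Hypothetical converter: keeps headings, paragraphs, and list items only."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished (prefix, text) pairs
        self.prefix = None   # prefix of the block being read, or None
        self.parts = []      # text fragments of the block being read
        self.skip_depth = 0  # > 0 while inside a skipped container

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX:
            self.prefix, self.parts = BLOCK_PREFIX[tag], []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX and self.prefix is not None:
            # Collapse runs of whitespace so inline tags do not leave gaps.
            content = " ".join("".join(self.parts).split())
            if content:
                self.blocks.append((self.prefix, content))
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.parts.append(data)


def html_to_markdown(html):
    """Return Markdown plus word and character counts for one HTML string."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    markdown = "\n\n".join(prefix + text for prefix, text in parser.blocks)
    return {
        "markdown": markdown,
        "wordCount": sum(len(text.split()) for _, text in parser.blocks),
        "characterCount": len(markdown),
    }
```

A production converter would also need tables, code blocks, blockquotes, links, and images, which this sketch leaves out to stay short.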
Use cases
- LLM training data -- Convert web content to clean text for fine-tuning datasets and RAG retrieval pipelines
- AI knowledge bases -- Build structured Markdown knowledge bases from documentation sites, wikis, and help centers
- Content migration -- Move content between CMS platforms (WordPress, Ghost, Notion) in portable Markdown format
- Documentation archival -- Archive web-based docs as version-controlled Markdown files
- SEO content auditing -- Extract and analyze competitor content structure and word counts at scale
- Text analysis -- Clean text extraction for NLP processing, sentiment analysis, and topic modeling
- Research data collection -- Scrape articles, blog posts, and papers into a structured dataset
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | required | List of URLs to convert |
| maxResults | integer | 100 | Maximum pages to process |
| includeLinks | boolean | true | Keep hyperlinks in the Markdown output |
| includeImages | boolean | false | Keep image references in the Markdown output |
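As a rough sketch, a run input using the fields and defaults from the table could be assembled like this. The `build_run_input` helper and its validation rules are hypothetical; only the field names and defaults come from the table.

```python
def build_run_input(urls, max_results=100, include_links=True, include_images=False):
    """Build a run-input dict with the documented field names and defaults."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("urls is required and must contain at least one URL")
    # Assumption: only absolute http(s) URLs are meaningful inputs.
    bad = [u for u in cleaned if not u.startswith(("http://", "https://"))]
    if bad:
        raise ValueError(f"not an http(s) URL: {bad[0]}")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    return {
        "urls": cleaned,
        "maxResults": int(max_results),
        "includeLinks": bool(include_links),
        "includeImages": bool(include_images),
    }
```

Calling `build_run_input(["https://en.wikipedia.org/wiki/Web_scraping"], include_images=True)` yields a dict that can be serialized to JSON and passed as the actor input.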
Output fields
Each result in the dataset includes:
| Field | Description |
|---|---|
| url | Source page URL |
| title | Page title |
| markdown | Full Markdown content with structure preserved |
| wordCount | Number of words in the extracted content |
| characterCount | Total characters in the Markdown output |
| linkCount | Number of hyperlinks found |
| imageCount | Number of images found |
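When consuming the dataset downstream, it can help to load each item into a typed record and fail early if a field is missing. The sketch below assumes every item carries all seven fields listed above; the `PageResult` class and its snake_case attribute names are illustrative, not part of the actor.

```python
from dataclasses import dataclass

# Field names exactly as listed in the output table.
FIELDS = ("url", "title", "markdown", "wordCount",
          "characterCount", "linkCount", "imageCount")


@dataclass(frozen=True)
class PageResult:
    """Hypothetical typed view of one dataset item."""
    url: str
    title: str
    markdown: str
    word_count: int
    character_count: int
    link_count: int
    image_count: int

    @classmethod
    def from_item(cls, item):
        missing = [key for key in FIELDS if key not in item]
        if missing:
            raise KeyError(f"missing fields: {', '.join(missing)}")
        return cls(
            url=item["url"],
            title=item["title"],
            markdown=item["markdown"],
            word_count=int(item["wordCount"]),
            character_count=int(item["characterCount"]),
            link_count=int(item["linkCount"]),
            image_count=int(item["imageCount"]),
        )
```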
Example output
    {
      "url": "https://en.wikipedia.org/wiki/Web_scraping",
      "title": "Web scraping - Wikipedia",
      "markdown": "## Web scraping\n\nWeb scraping is data scraping used for extracting data from websites...",
      "wordCount": 3847,
      "characterCount": 24102,
      "linkCount": 156,
      "imageCount": 2
    }
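For RAG pipelines, the `markdown` field is typically split into smaller passages before indexing. One simple approach, sketched here as an assumption rather than a feature of the actor, is to cut at Markdown headings and keep the source URL with each chunk. The `chunk_by_heading` helper is hypothetical, and it does not special-case `#` lines inside code blocks.

```python
import re

# ATX headings: one to six '#' characters followed by whitespace and the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _make_chunk(url, heading, body_lines):
    """Return a chunk dict, or None when the section has no text."""
    text = "\n".join(body_lines).strip()
    if not text:
        return None
    return {"url": url, "heading": heading, "text": text}


def chunk_by_heading(markdown, source_url):
    """Split Markdown into one chunk per heading-delimited section."""
    chunks, heading, body = [], None, []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous section.
            chunk = _make_chunk(source_url, heading, body)
            if chunk:
                chunks.append(chunk)
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    chunk = _make_chunk(source_url, heading, body)
    if chunk:
        chunks.append(chunk)
    return chunks
```

Text that appears before the first heading is kept as a chunk whose `heading` is `None`.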
Pricing
$0.002 per page converted. Convert 500 pages for $1.
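The arithmetic can be checked with a short sketch. It assumes the flat $0.002 per-page rate is the only charge, and the helper names are illustrative.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.002")  # USD per converted page, from the pricing line


def estimate_cost(pages):
    """Cost in USD for converting the given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_PAGE * pages


def pages_for_budget(budget_usd):
    """Largest whole number of pages that fits within a USD budget."""
    return int(Decimal(str(budget_usd)) // PRICE_PER_PAGE)
```

`Decimal` is used instead of floats so that small per-page amounts add up exactly.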
Integrations
Works with the Apify API, webhooks, and the Apify MCP server. Chain it with other actors to build automated content pipelines.
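A minimal sketch of calling the actor from Python follows. It assumes the `apify-client` package, where `client.actor(...).call(run_input=...)` returns a run dict containing `defaultDatasetId`; the actor ID and token shown in the comments are placeholders, since this page does not state them.

```python
def run_converter(client, actor_id, run_input):
    """Start a run, wait for it to finish, and return the dataset items as a list.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (placeholders, not real values):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_converter(client, "<ACTOR_ID>", {"urls": ["https://example.com"]})
#   for item in items:
#       print(item["url"], item["wordCount"])
```

Passing the client in as a parameter keeps the function easy to test with a stub and leaves token handling to the caller.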