Website to Markdown Converter for LLM Training

Pricing

Pay per usage

Convert any web page to clean Markdown. Strips nav, ads, scripts, styling. Preserves headings, lists, tables, code blocks, links. Perfect for LLM training data, RAG pipelines, content migration, documentation archival, and text analysis. Bulk processing with word/link/image counts.

Developer

Ava Torres

Maintained by Community

Website to Markdown Converter for LLM Training | $0.002/page

Convert any web page to clean, structured Markdown optimized for LLM training, RAG pipelines, and content processing. Process hundreds of URLs in a single run with concurrent fetching.

What it does

  • Fetches web pages and extracts the main readable content
  • Strips navigation, ads, footers, scripts, styles, and other non-content elements
  • Converts HTML to Markdown preserving structure: headings, paragraphs, lists, tables, code blocks, blockquotes, and inline formatting
  • Optionally includes links and images as Markdown references
  • Reports metadata per page: word count, character count, link count, image count
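The steps above can be illustrated with a minimal sketch built on Python's standard `html.parser`. This is not the Actor's implementation, which is not published here. The set of skipped tags, the class name `MarkdownSketch`, and the function `html_to_markdown` are assumptions made for illustration, and only headings, paragraphs, flat lists, and links are handled.

```python
import re
from html.parser import HTMLParser

# Assumption: subtrees under these tags count as non-content for this sketch.
SKIP_TAGS = {"head", "script", "style", "noscript", "nav", "header", "footer", "aside"}
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class MarkdownSketch(HTMLParser):
    """Collects Markdown fragments while skipping non-content subtrees."""

    def __init__(self, include_links=True):
        super().__init__(convert_charrefs=True)
        self.include_links = include_links
        self.skip_depth = 0   # > 0 while inside a skipped subtree
        self.href = None      # target of the link currently open, if any
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.parts.append("\n\n" + "#" * HEADINGS[tag] + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a" and self.include_links:
            self.href = dict(attrs).get("href")
            if self.href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a" and self.href:
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace the way a browser would render them.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html, include_links=True):
    parser = MarkdownSketch(include_links)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)   # trim spaces around newlines
    return re.sub(r"\n{3,}", "\n\n", text).strip()  # at most one blank line


SAMPLE = (
    "<html><head><title>T</title><style>p{}</style></head><body>"
    '<nav><a href="/">Home</a></nav><h1>Title</h1>'
    '<p>See <a href="https://example.com">docs</a> now.</p>'
    "<ul><li>one</li><li>two</li></ul><script>x()</script>"
    "<footer>f</footer></body></html>"
)

if __name__ == "__main__":
    print(html_to_markdown(SAMPLE))
```

Tracking a skip depth rather than a single flag keeps nested non-content elements (for example a `style` tag inside `head`) from ending the skip early. Tables, code blocks, blockquotes, and images would need additional handlers.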

Use cases

  • LLM training data -- Convert web content to clean text for fine-tuning datasets and RAG pipelines
  • AI knowledge bases -- Build structured Markdown knowledge bases from documentation sites, wikis, and help centers
  • Content migration -- Move content between CMS platforms (WordPress, Ghost, Notion) in portable Markdown format
  • Documentation archival -- Archive web-based docs as version-controlled Markdown files
  • SEO content auditing -- Extract and analyze competitor content structure and word counts at scale
  • Text analysis -- Clean text extraction for NLP processing, sentiment analysis, and topic modeling
  • Research data collection -- Scrape articles, blog posts, and papers into a structured dataset

Input

| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | required | List of URLs to convert |
| maxResults | integer | 100 | Maximum pages to process |
| includeLinks | boolean | true | Keep hyperlinks in the Markdown output |
| includeImages | boolean | false | Keep image references in the Markdown output |
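
As a hedged illustration of the input fields, the sketch below builds a run input with the documented defaults and shows one way it might be passed to the Actor through the `apify-client` Python package (1.x style). The helper names `build_run_input` and `run_converter` are hypothetical, the `actor_id` value must be supplied by the caller because the Actor's ID is not given here, and the http/https check is an assumption rather than a documented rule.

```python
def build_run_input(urls, max_results=100, include_links=True, include_images=False):
    """Return an input dict using the field names and defaults from the table."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("urls is required and must contain at least one URL")
    for url in cleaned:
        # Assumption: only absolute http(s) URLs make sense as input.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    return {
        "urls": cleaned,
        "maxResults": max_results,
        "includeLinks": include_links,
        "includeImages": include_images,
    }


def run_converter(token, actor_id, run_input):
    """Run the Actor and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # third-party; imported lazily

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    print(build_run_input(["https://en.wikipedia.org/wiki/Web_scraping"], max_results=10))
```

Validating the input locally catches an empty `urls` list before a run is started.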

Output fields

Each result in the dataset includes:

| Field | Description |
|---|---|
| url | Source page URL |
| title | Page title |
| markdown | Full Markdown content with structure preserved |
| wordCount | Number of words in the extracted content |
| characterCount | Total characters in the Markdown output |
| linkCount | Number of hyperlinks found |
| imageCount | Number of images found |

Example output

{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "title": "Web scraping - Wikipedia",
  "markdown": "## Web scraping\n\nWeb scraping is data scraping used for extracting data from websites...",
  "wordCount": 3847,
  "characterCount": 24102,
  "linkCount": 156,
  "imageCount": 2
}
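
Results shaped like the example above could be post-processed into a training corpus. The sketch below is one possible approach, not part of the Actor: it drops very short pages and writes one JSON object per line. The function name `to_jsonl`, the `min_words` threshold, and the output keys `source`, `title`, and `text` are all assumptions chosen for illustration.

```python
import json


def to_jsonl(items, min_words=50):
    """Serialize dataset items as JSON Lines, skipping pages below min_words."""
    lines = []
    for item in items:
        if item.get("wordCount", 0) < min_words:
            continue  # likely an error page or a near-empty extraction
        record = {
            "source": item["url"],
            "title": item.get("title", ""),
            "text": item["markdown"],
        }
        # ensure_ascii=False keeps non-ASCII text readable in the output file
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {
            "url": "https://en.wikipedia.org/wiki/Web_scraping",
            "title": "Web scraping - Wikipedia",
            "markdown": "## Web scraping\n\nWeb scraping is data scraping...",
            "wordCount": 3847,
        },
        {"url": "https://example.com/empty", "markdown": "", "wordCount": 0},
    ]
    print(to_jsonl(sample))
```

Filtering on `wordCount` uses the metadata the Actor already reports, so no text needs to be re-tokenized to discard thin pages.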

Pricing

$0.002 per page converted. Convert 500 pages for $1.
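
The per-page arithmetic can be sketched as follows. The helper names are hypothetical, and the estimate covers only the stated $0.002 per converted page; any other platform charges are outside what this page specifies.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.002")  # USD, as stated above


def estimate_cost(pages):
    """Cost in USD for converting the given number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_PAGE * pages


def pages_for_budget(budget_usd):
    """Largest whole number of pages that fits within the budget."""
    return int(Decimal(str(budget_usd)) // PRICE_PER_PAGE)


if __name__ == "__main__":
    print(estimate_cost(500))     # 500 pages at $0.002 each
    print(pages_for_budget("1"))  # pages covered by a $1 budget
```

`Decimal` is used instead of floats so that small per-page prices do not accumulate rounding error over large runs.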

Integrations

Works with the Apify API, webhooks, and the Apify MCP server. Chain it with other Actors to build automated content pipelines.