HTML to Markdown — clean conversion, boilerplate stripping

Pricing

from $2.00 / 1,000 results

Convert scraped HTML into clean Markdown and plain text: headings, nested lists, links, images, code blocks, blockquotes, and tables. Drops scripts, styles, and structural boilerplate (nav/footer/aside) so only content remains. Pure parsing, no LLM cost.

Rating

0.0

(0)

Developer

Shinobu Otani

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 6 days ago

HTML to Markdown

Convert scraped HTML into clean Markdown and plain text — pure parsing, no LLM cost. Pairs well with crawlers upstream and with Doc Structure Extractor or RAG Text Chunker downstream.

What it does

  • Converts headings, paragraphs, nested lists, links, images, emphasis, inline code, fenced code blocks, blockquotes, simple tables, and horizontal rules.
  • Always drops <script>, <style> and other non-content tags; drops structural boilerplate (nav, footer, aside, form) by default so only the article content remains.
  • Extracts the page title (<title>, falling back to the first <h1>).
  • Also returns a plain-text rendering and basic stats.
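The dropping and title-fallback behaviour listed above can be illustrated with a minimal sketch built on Python's standard html.parser. This is not the actor's implementation: the class name TextExtractor, the function extract, and the way text fragments are joined with spaces are all assumptions made for illustration, and the sketch produces plain text only, not Markdown.

```python
from html.parser import HTMLParser

# Tag names come from the feature list; grouping them into sets is this sketch's choice.
ALWAYS_DROP = {"script", "style"}
BOILERPLATE = {"nav", "footer", "aside", "form"}


class TextExtractor(HTMLParser):
    """Collect visible text, skip dropped subtrees, and track the page title."""

    def __init__(self, drop_boilerplate=True):
        super().__init__()
        self.dropped = ALWAYS_DROP | (BOILERPLATE if drop_boilerplate else set())
        self.skip_depth = 0    # > 0 while inside a dropped subtree
        self.current = None    # "title" or "h1" while inside that tag
        self.title = None
        self.first_h1 = None
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.dropped:
            self.skip_depth += 1
        elif tag in ("title", "h1") and self.skip_depth == 0:
            self.current = tag

    def handle_endtag(self, tag):
        if tag in self.dropped and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == self.current:
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        if self.current == "title":
            self.title = self.title or text
            return  # <title> text is metadata, not body content
        if self.current == "h1" and self.first_h1 is None:
            self.first_h1 = text
        self.chunks.append(text)


def extract(html, drop_boilerplate=True):
    """Return the title (<title>, else first <h1>) and a plain-text rendering."""
    parser = TextExtractor(drop_boilerplate)
    parser.feed(html)
    parser.close()
    return {"title": parser.title or parser.first_h1, "text": " ".join(parser.chunks)}
```

Tracking a depth counter rather than a boolean keeps nested dropped elements (for example a form inside a nav) from re-enabling output too early.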

Input

{
  "documents": ["<html><body><h1>Guide</h1><p>Hello <strong>world</strong></p></body></html>"],
  "drop_boilerplate": true,
  "include_links": true,
  "include_images": true
}
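When the HTML comes from code rather than a hand-written JSON file, the input can be assembled programmatically. The helper below is a hypothetical convenience, not part of the actor; it uses only the four field names shown in the example and sets every flag explicitly, since the listing documents a default only for drop_boilerplate.

```python
import json


def build_input(documents, drop_boilerplate=True, include_links=True, include_images=True):
    """Assemble the actor input; build_input is a hypothetical helper name."""
    if isinstance(documents, str):
        documents = [documents]  # accept a single HTML string for convenience
    return {
        "documents": list(documents),
        "drop_boilerplate": drop_boilerplate,
        "include_links": include_links,
        "include_images": include_images,
    }


payload = build_input("<html><body><h1>Guide</h1><p>Hello <strong>world</strong></p></body></html>")
payload_json = json.dumps(payload, indent=2)  # ready to paste into the actor's input editor
```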

Output (one dataset item per document)

{
  "title": "Guide",
  "markdown": "# Guide\n\nHello **world**",
  "text": "Guide\n\nHello world",
  "stats": {"blocks": 2, "characters": 26, "words": 3},
  "document_index": 0
}
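Because each item carries a document_index, results can be matched back to the HTML that produced them. The sketch below assumes document_index is the zero-based position of the document in the input "documents" array, which the single-document example suggests but does not state; pair_results is a hypothetical helper name.

```python
def pair_results(documents, items):
    """Return (source_html, item) pairs ordered by document_index.

    Assumes document_index is the zero-based position in the input list.
    """
    by_index = {item["document_index"]: item for item in items}
    missing = [i for i in range(len(documents)) if i not in by_index]
    if missing:
        raise ValueError(f"no output item for document index(es): {missing}")
    return [(documents[i], by_index[i]) for i in range(len(documents))]
```

Raising on a missing index makes a partially failed run visible instead of silently shifting results onto the wrong source documents.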

Usage

Feed it raw HTML from any crawler run, then chunk the resulting Markdown for RAG, index the plain text for search, or store the Markdown directly.
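As one illustration of the chunking step, the "markdown" field can be split at headings before embedding. This is a minimal sketch under stated assumptions, not the RAG Text Chunker's algorithm: chunk_by_heading is a hypothetical function, it recognises only ATX-style "#" headings, and it treats any line starting with three backticks as a code-fence toggle so that "#" comments inside code do not start a new chunk.

```python
import re

HEADING = re.compile(r"^#{1,6} ")  # ATX headings only, e.g. "## Title"
FENCE = "`" * 3                    # fenced code block delimiter


def chunk_by_heading(markdown):
    """Split Markdown into chunks, each starting at a heading line."""
    chunks, current = [], []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        if HEADING.match(line) and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Text that appears before the first heading becomes its own chunk, so nothing from the document is lost.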