Website to Markdown

Pricing

from $1.050005 / actor start

Convert any web page into clean, LLM-ready Markdown. Strips nav, ads and boilerplate and returns the main article text plus title, description and word count. Perfect for RAG and AI pipelines.

Rating: 0.0 (0)

Developer: Y A

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 13 hours ago

Turn any web page into clean, LLM-ready Markdown: no boilerplate, no HTML noise.

Feed this Actor a list of URLs and get back the main readable content of each page as tidy Markdown, with navigation, ads, scripts, and footers stripped out. Built for RAG pipelines, AI agents, LLM ingestion, content archiving, and research.

What it does

  • ๐Ÿ“ Clean Markdown โ€” headings, lists, tables, blockquotes, and code blocks preserved.
  • ๐Ÿงน Boilerplate removal โ€” drops nav, headers, footers, sidebars, forms, scripts, and ads.
  • ๐ŸŽฏ Smart main-content detection โ€” finds <article>/<main>/content containers automatically.
  • ๐Ÿท๏ธ Metadata โ€” page title, meta/OpenGraph description, and word count.
  • โšก HTTP-only & fast โ€” no headless browser, so it's cheap even at scale.
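To make the boilerplate-removal idea in the list above concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the Actor's implementation: the `SKIP_TAGS` set, the `MarkdownExtractor` class, and the `html_to_markdown` function are illustrative assumptions. The sketch handles only headings, paragraphs, and list items, and does not attempt main-content detection, tables, blockquotes, or code blocks.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is treated as boilerplate. This set is an
# assumption for illustration, not the Actor's real rule set.
SKIP_TAGS = {"nav", "header", "footer", "aside", "form", "script", "style"}

# Markdown prefix for each block-level tag the sketch understands.
PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Collects headings, paragraphs, and list items as Markdown blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate subtree
        self.prefix = None    # prefix of the currently open block, if any
        self.buffer = []      # text fragments of the open block
        self.blocks = []      # finished Markdown blocks
        self.word_count = 0   # words in the kept text, excluding markup

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in PREFIX:
            self.flush()
            self.prefix = PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in PREFIX:
            self.flush()

    def handle_data(self, data):
        # Keep text only when it sits inside a known block and outside boilerplate.
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.buffer).split())
        if self.prefix is not None and text:
            self.blocks.append(self.prefix + text)
            self.word_count += len(text.split())
        self.prefix, self.buffer = None, []


def html_to_markdown(html: str) -> dict:
    """Return the kept text as Markdown plus a word count."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return {"markdown": "\n\n".join(parser.blocks), "wordCount": parser.word_count}
```

Tracking a skip depth rather than a boolean keeps nested boilerplate (for example a `<form>` inside a `<footer>`) from re-enabling output too early. Inline tags such as `<em>` or `<a>` are ignored here, so their text is kept but their formatting is lost.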

Input

| Field | Type | Description |
| --- | --- | --- |
| startUrls | array | Page URLs to convert. |
| url | string | Convenience field for a single page. |
| minWords | integer | Skip pages with fewer than N words (0 = keep all). |

Example input

{
  "startUrls": ["https://example.com/some-article"],
  "minWords": 100
}
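The page does not say how `url` and `startUrls` are combined when both are supplied, or what `minWords` defaults to. The sketch below shows one plausible way to normalize such an input before a run. The `normalize_input` helper, the merge order, the de-duplication, and the default of `0` are all assumptions for illustration, not documented Actor behavior.

```python
def normalize_input(actor_input: dict) -> dict:
    """Merge `startUrls` and `url` into one ordered, de-duplicated URL list."""
    urls = []
    for entry in actor_input.get("startUrls") or []:
        # Accept plain strings; also tolerate {"url": ...} objects (assumption).
        url = entry.get("url") if isinstance(entry, dict) else entry
        if isinstance(url, str) and url.strip():
            urls.append(url.strip())

    single = actor_input.get("url")
    if isinstance(single, str) and single.strip():
        urls.append(single.strip())

    # dict.fromkeys keeps the first occurrence of each URL and preserves order.
    urls = list(dict.fromkeys(urls))
    if not urls:
        raise ValueError("Provide at least one URL via 'startUrls' or 'url'.")

    # Assumed default: 0, which the field description defines as "keep all".
    min_words = actor_input.get("minWords", 0)
    if isinstance(min_words, bool) or not isinstance(min_words, int) or min_words < 0:
        raise ValueError("'minWords' must be a non-negative integer.")

    return {"urls": urls, "minWords": min_words}
```

Validating on the client side like this can catch an empty or malformed input before a run is started.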

Output

One record per page:

{
  "url": "https://example.com/some-article",
  "title": "An Interesting Article",
  "description": "A short summary from the page meta tags.",
  "markdown": "# An Interesting Article\n\nThe cleaned body text...",
  "wordCount": 842
}

Pipe the markdown field straight into your vector DB, LLM prompt, or knowledge base.
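Before embedding, long pages are usually split into smaller chunks. The following is a minimal sketch of heading-aware chunking over one output record. The `chunk_record` function, the `max_words` limit, and the chunk dictionary shape are hypothetical choices, not part of the Actor. The sketch also treats any line starting with `#` as a heading, so a `#` comment inside a code block would be misread.

```python
import re

# One to six '#' characters followed by a space marks a Markdown heading line.
HEADING = re.compile(r"^#{1,6} ")


def chunk_record(record: dict, max_words: int = 200) -> list:
    """Split a record's Markdown into chunks of at most `max_words` words.

    A new chunk starts at every heading, and each chunk remembers the
    heading it appeared under so that context survives retrieval.
    """
    chunks = []
    heading = ""
    words = []

    def flush():
        # Emit the words collected so far as one chunk, then reset.
        if words:
            chunks.append({
                "url": record["url"],
                "heading": heading,
                "text": " ".join(words),
            })
            words.clear()

    for line in record["markdown"].splitlines():
        if HEADING.match(line):
            flush()
            heading = line.lstrip("#").strip()
            continue
        for word in line.split():
            words.append(word)
            if len(words) >= max_words:
                flush()

    flush()
    return chunks
```

Each chunk can then be embedded and stored with its `url` and `heading` as metadata. The word limit is a rough stand-in for a token budget and would need tuning for a specific embedding model.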

Pricing

Billed per page successfully converted. Failed/empty pages return an error record and are effectively free.

Notes

  • Works best on article/blog/documentation pages. JavaScript-only single-page apps that render content client-side may return limited text (this Actor is HTTP-only by design, for speed and low cost).
  • Identifies itself with a clear User-Agent.
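Because client-rendered pages may come back with very little text, it can help to triage results by `wordCount` after a run. The sketch below separates records that look usable from those worth re-checking with a browser-based tool. The `split_thin_records` helper and the threshold of 50 words are assumptions, and since the shape of error records is not documented here, any record without an integer `wordCount` is simply treated as suspect.

```python
def split_thin_records(records: list, min_words: int = 50) -> tuple:
    """Partition records into (usable, suspect) by their `wordCount` field."""
    usable, suspect = [], []
    for record in records:
        count = record.get("wordCount")
        # Missing or non-integer counts (for example error records) are suspect.
        if isinstance(count, int) and count >= min_words:
            usable.append(record)
        else:
            suspect.append(record)
    return usable, suspect
```

Running with `minWords` set to `0` and triaging afterwards keeps the thin records visible, whereas a higher `minWords` in the input skips those pages during the run.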