Website to Markdown
Pricing
from $1.050005 / actor start
Convert any web page into clean, LLM-ready Markdown. Strips nav, ads and boilerplate and returns the main article text plus title, description and word count. Perfect for RAG and AI pipelines.
Developer: Y A
Maintained by Community
Turn any web page into clean, LLM-ready Markdown: no boilerplate, no HTML noise.
Feed this Actor a list of URLs and get back the main readable content of each page as tidy Markdown, with navigation, ads, scripts, and footers stripped out. Built for RAG pipelines, AI agents, LLM ingestion, content archiving, and research.
What it does
- Clean Markdown: headings, lists, tables, blockquotes, and code blocks preserved.
- Boilerplate removal: drops nav, headers, footers, sidebars, forms, scripts, and ads.
- Smart main-content detection: finds `<article>`/`<main>`/content containers automatically.
- Metadata: page title, meta/OpenGraph description, and word count.
- HTTP-only & fast: no headless browser, so it's cheap even at scale.
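The boilerplate-removal idea can be illustrated with a minimal sketch built on Python's standard `html.parser`. This is not the Actor's implementation, which is not documented here. The tag list, the class name `MainTextExtractor`, and the function `html_to_markdown` are assumptions made for illustration. The sketch only drops a fixed set of tags and handles headings and paragraphs; it does not attempt ad detection or `<article>`/`<main>` container scoring.

```python
from html.parser import HTMLParser

# Assumed boilerplate tags, loosely mirroring the list above.
SKIP_TAGS = {"nav", "header", "footer", "aside", "form", "script", "style"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}


class MainTextExtractor(HTMLParser):
    """Collects text outside SKIP_TAGS as Markdown headings and paragraphs."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate element
        self.prefix = ""      # Markdown prefix of the block being built
        self.buffer = []      # raw text fragments of the current block
        self.blocks = []      # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in HEADINGS or tag == "p":
            self.flush()
            self.prefix = HEADINGS.get(tag, "")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in HEADINGS or tag == "p":
            self.flush()

    def handle_data(self, data):
        # Text inside boilerplate elements is ignored entirely.
        if self.skip_depth == 0:
            self.buffer.append(data)

    def flush(self):
        # Join fragments and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(f"{self.prefix} {text}".strip())
        self.buffer, self.prefix = [], ""


def html_to_markdown(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text
    return "\n\n".join(parser.blocks)
```

A production converter would also need to handle lists, tables, links, and code blocks, which this sketch leaves out.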
Input
| Field | Type | Description |
|---|---|---|
| `startUrls` | array | Page URLs to convert. |
| `url` | string | Convenience field for a single page. |
| `minWords` | integer | Skip pages with fewer than N words (0 = keep all). |
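To make the `minWords` boundary concrete, here is a small sketch of the filter as described in the table: a page is skipped when it has fewer than N words, so a page with exactly N words is kept. Counting words as whitespace-separated tokens is an assumption, as are the helper names `count_words` and `keep_page`; the Actor's exact counting rule is not stated.

```python
def count_words(markdown: str) -> int:
    """Count whitespace-separated tokens (assumed counting rule)."""
    return len(markdown.split())


def keep_page(markdown: str, min_words: int = 0) -> bool:
    """Return True if the page passes the minWords filter.

    min_words == 0 keeps every page; otherwise pages with fewer
    than min_words words are skipped.
    """
    return min_words <= 0 or count_words(markdown) >= min_words
```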
Example input
{
  "startUrls": ["https://example.com/some-article"],
  "minWords": 100
}
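One way to send this input from Python is the official `apify-client` package. The sketch below is hedged: `ACTOR_ID` is a placeholder because the real Actor ID is not given here, the `APIFY_TOKEN` environment variable is an assumed convention, and `build_run_input` and `run_actor` are hypothetical helper names. The client calls follow the documented `ApifyClient` usage pattern as far as I know; check them against the client version you install.

```python
import os

# Placeholder: replace with the Actor's real ID from its store page.
ACTOR_ID = "<username>/website-to-markdown"


def build_run_input(urls, min_words=0):
    """Build the input object from the fields in the Input table."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {"startUrls": cleaned, "minWords": int(min_words)}


def run_actor(run_input, token=None):
    """Run the Actor and return its dataset items.

    Requires `pip install apify-client`; imported lazily so that
    build_run_input can be used without the package installed.
    """
    from apify_client import ApifyClient

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Typical usage would be `run_actor(build_run_input(["https://example.com/some-article"], min_words=100))`, which returns one record per page.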
Output
One record per page:
{
  "url": "https://example.com/some-article",
  "title": "An Interesting Article",
  "description": "A short summary from the page meta tags.",
  "markdown": "# An Interesting Article\n\nThe cleaned body text...",
  "wordCount": 842
}
Pipe the `markdown` field straight into your vector DB, LLM prompt, or knowledge base.
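Before embedding, long pages are usually split into smaller chunks. The following is a minimal sketch of one approach, assuming records shaped like the example above: split at Markdown headings, then cap each chunk at a word limit. The function name `chunk_record` and the default `max_words` are illustrative choices, not part of the Actor.

```python
import re


def chunk_record(record: dict, max_words: int = 200) -> list[dict]:
    """Split a record's markdown at headings, then cap chunks at max_words.

    Whitespace inside each chunk is normalized to single spaces. Lines in
    code blocks that start with "# " are also treated as headings, which
    is a known limitation of this simple split.
    """
    # Zero-width split before every line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", record["markdown"])
    chunks = []
    for section in sections:
        words = section.split()
        for start in range(0, len(words), max_words):
            chunks.append({
                "url": record["url"],
                "title": record["title"],
                "text": " ".join(words[start:start + max_words]),
            })
    return chunks
```

Each chunk keeps the source `url` and `title`, so retrieved passages can be traced back to their page.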
Pricing
Billed per page successfully converted. Failed/empty pages return an error record and are effectively free.
Notes
- Works best on article/blog/documentation pages. JavaScript-only single-page apps that render content client-side may return limited text (this Actor is HTTP-only by design, for speed and low cost).
- Identifies itself with a clear User-Agent.