Webpage to Markdown Converter for LLMs
Pricing
from $1.00 / 1,000 pages converted
Convert any URL into clean Markdown text. Remove ads and navbars to perfectly format web content for AI and RAG ingestion.
Developer
Andok
Web Page to Markdown Converter for LLMs
Convert any webpage into clean, structured Markdown optimized for LLMs and RAG pipelines. Stop wasting tokens on HTML boilerplate — get only the core content with metadata, ready for AI ingestion. Process hundreds of URLs in a single run with configurable concurrency.
Features
- Readability cleaning — strips ads, navigation, sidebars, and footers using Mozilla Readability
- Markdown formatting — converts article HTML to well-structured Markdown with ATX headings and fenced code blocks
- Bulk processing — convert hundreds of URLs in a single run
- Metadata extraction — captures page title, author byline, and excerpt alongside the Markdown content
- Redirect handling — follows HTTP redirects and reports the final URL
- Configurable concurrency — control parallel processing from 1 to 50 simultaneous requests
- Pay-per-event pricing — pay only for pages successfully converted
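
The configurable concurrency feature can be pictured with a short Python sketch. This is not the actor's implementation; it only illustrates bounded parallelism with a semaphore. `convert_all` and `fake_convert` are hypothetical names, and `fake_convert` stands in for the real fetch, Readability, and Markdown steps.

```python
import asyncio


async def convert_all(urls, convert_one, concurrency=10):
    """Run convert_one(url) for every URL with at most `concurrency` in flight."""
    # Clamp to the documented 1-50 range.
    limit = max(1, min(50, concurrency))
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            try:
                return await convert_one(url)
            except Exception as exc:
                # One failing URL should not abort the whole batch.
                return {"inputUrl": url, "error": str(exc)}

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))


async def fake_convert(url):
    # Hypothetical stand-in for fetch + cleaning + Markdown conversion.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError("conversion failed")
    return {"inputUrl": url, "error": None}


results = asyncio.run(
    convert_all(
        ["https://example.com/a", "https://example.com/bad"],
        fake_convert,
        concurrency=2,
    )
)
```

Lower concurrency is gentler on the target sites, while higher values shorten large runs.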
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | array | Yes | - | List of webpage URLs to convert to Markdown |
| `timeoutSeconds` | integer | No | 15 | Maximum seconds to wait for each URL response |
| `concurrency` | integer | No | 10 | Number of URLs to process in parallel (1-50) |
Input Example
{
  "urls": [
    "https://crawlee.dev",
    "https://docs.apify.com/academy/web-scraping-for-beginners"
  ],
  "timeoutSeconds": 15,
  "concurrency": 10
}
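
Before starting a run, it can help to build and check the input locally. The following is a minimal sketch in Python. `build_input` is a hypothetical helper, not part of the actor. The non-empty `urls` check and the positive-timeout check are assumptions; only the 1-50 concurrency range and the defaults come from the input table.

```python
import json
from urllib.parse import urlparse


def build_input(urls, timeout_seconds=15, concurrency=10):
    """Return an input dict shaped like the input table, or raise ValueError."""
    if not urls:
        # Assumption: an empty list is treated as invalid.
        raise ValueError("urls is required and must not be empty")
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")
    if timeout_seconds <= 0:
        # Assumption: the timeout must be a positive number of seconds.
        raise ValueError("timeoutSeconds must be positive")
    return {
        "urls": list(urls),
        "timeoutSeconds": timeout_seconds,
        "concurrency": concurrency,
    }


run_input = build_input(["https://crawlee.dev"])
print(json.dumps(run_input, indent=2))
```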
Output
Each URL produces one dataset item containing the converted Markdown and extracted metadata.
Key output fields:
- `inputUrl` (string): the original URL provided
- `finalUrl` (string): the URL after following redirects
- `status` (number): HTTP status code
- `pageTitle` (string): extracted article title
- `markdown` (string): the full article content converted to Markdown
- `excerpt` (string): short summary or description of the article
- `byline` (string): author name if available
- `error` (string): error message if conversion failed, otherwise `null`
Output Example
{
  "inputUrl": "https://crawlee.dev",
  "finalUrl": "https://crawlee.dev/",
  "status": 200,
  "pageTitle": "Crawlee - Build reliable crawlers. Fast.",
  "markdown": "# Crawlee\n\nBuild reliable crawlers. Fast.\n\nCrawlee is a web scraping and browser automation library...",
  "excerpt": "Crawlee is a web scraping and browser automation library for Node.js.",
  "byline": null,
  "error": null
}
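
Once the dataset items are downloaded, a consumer usually separates failures from successes and attaches the metadata to the Markdown. The sketch below shows one way to do that in Python. `split_results` and `to_document` are hypothetical helpers, and the shape of the failed item (null `markdown`, a text `error`) is an assumption based on the field list.

```python
def split_results(items):
    """Separate successfully converted items from failed ones."""
    ok, failed = [], []
    for item in items:
        if item.get("error") is None and item.get("markdown"):
            ok.append(item)
        else:
            failed.append(item)
    return ok, failed


def to_document(item):
    """Prefix the Markdown with a small metadata header for indexing."""
    header = [f"Source: {item.get('finalUrl') or item['inputUrl']}"]
    if item.get("pageTitle"):
        header.append(f"Title: {item['pageTitle']}")
    if item.get("byline"):
        header.append(f"Author: {item['byline']}")
    return "\n".join(header) + "\n\n" + item["markdown"]


items = [
    {
        "inputUrl": "https://crawlee.dev",
        "finalUrl": "https://crawlee.dev/",
        "status": 200,
        "pageTitle": "Crawlee - Build reliable crawlers. Fast.",
        "markdown": "# Crawlee\n\nBuild reliable crawlers. Fast.",
        "excerpt": "Crawlee is a web scraping and browser automation library for Node.js.",
        "byline": None,
        "error": None,
    },
    {
        # Assumed shape of a failed conversion.
        "inputUrl": "https://example.com/missing",
        "finalUrl": None,
        "status": 404,
        "pageTitle": None,
        "markdown": None,
        "excerpt": None,
        "byline": None,
        "error": "HTTP 404",
    },
]

ok, failed = split_results(items)
documents = [to_document(item) for item in ok]
```

Keeping the failed items around makes it easy to retry them in a later run with a longer `timeoutSeconds`.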
Pricing
| Event | Cost |
|---|---|
| Page Converted | Pay-per-event (see actor pricing page) |
The actor respects the per-run max charge limit. Processing stops automatically when the spending cap is reached.
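
The arithmetic behind a spending cap can be sketched in a few lines of Python. The figure of $1.00 per 1,000 pages is the listing's "from" price and is used here as an assumption; the actual rate is on the actor pricing page. The function names are illustrative only.

```python
from decimal import Decimal

# Assumption: the starting price shown in the listing.
PRICE_PER_1000 = Decimal("1.00")


def estimate_cost(pages, price_per_1000=PRICE_PER_1000):
    """Estimated charge in USD for a number of successfully converted pages."""
    return price_per_1000 * pages / 1000


def max_pages(cap_usd, price_per_1000=PRICE_PER_1000):
    """Largest number of pages that fits under a per-run max charge."""
    return int(Decimal(str(cap_usd)) * 1000 / price_per_1000)


cost_for_250 = estimate_cost(250)
pages_for_half_dollar = max_pages("0.50")
```

Under that assumed price, 250 converted pages cost about $0.25, and a $0.50 cap allows roughly 500 pages before processing stops.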
Use Cases
- RAG pipeline ingestion — convert documentation sites and knowledge bases into Markdown for vector database indexing
- LLM context preparation — clean web content for ChatGPT, Claude, or other LLM context windows without HTML noise
- Documentation migration — bulk-convert web pages to Markdown files for static site generators
- Content archiving — save readable article snapshots in a portable, version-control-friendly format
- AI training data — prepare clean text corpora from web sources for fine-tuning or evaluation
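
For the RAG use case, the Markdown output is typically split into chunks before embedding. The following Python sketch splits on ATX headings and ignores `#` lines inside fenced code blocks. It is one possible downstream step, not something the actor does, and `chunk_by_heading` is a hypothetical helper.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # built programmatically to keep this example easy to embed


def chunk_by_heading(markdown):
    """Split Markdown into chunks, starting a new chunk at each ATX heading."""
    chunks = []
    title, lines, in_fence = None, [], False

    def flush():
        body = "\n".join(lines).strip()
        if title is not None or body:
            chunks.append({"heading": title, "text": body})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fenced code block are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


sample = "\n".join([
    "# Crawlee",
    "",
    "Build reliable crawlers.",
    "",
    "## Install",
    "",
    FENCE + "bash",
    "# not a heading",
    "npm install crawlee",
    FENCE,
])
chunks = chunk_by_heading(sample)
```

Heading-based chunks keep each section's title next to its text, which tends to help retrieval; very long sections may still need a secondary size-based split.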
Related Actors
| Actor | What it adds |
|---|---|
| Article Text Extractor for TTS & AI | Plain text output optimized for text-to-speech and summarization |
| PDF to Text Converter for AI & RAG | Extend your pipeline to extract text from PDF documents |
| YouTube Transcript Scraper for AI & RAG | Add video transcript extraction to your content pipeline |