Webpage to Markdown Converter for LLMs

Pricing

from $1.00 / 1,000 pages converted

Convert any URL into clean Markdown text. Remove ads and navbars to perfectly format web content for AI and RAG ingestion.

Rating

0.0

(0)

Developer

Andok

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 19 days ago

Web Page to Markdown Converter for LLMs

Convert any webpage into clean, structured Markdown optimized for LLMs and RAG pipelines. Stop wasting tokens on HTML boilerplate — get only the core content with metadata, ready for AI ingestion. Process hundreds of URLs in a single run with configurable concurrency.

Features

  • Readability cleaning — strips ads, navigation, sidebars, and footers using Mozilla Readability
  • Markdown formatting — converts article HTML to well-structured Markdown with ATX headings and fenced code blocks
  • Bulk processing — convert hundreds of URLs in a single run
  • Metadata extraction — captures page title, author byline, and excerpt alongside the Markdown content
  • Redirect handling — follows HTTP redirects and reports the final URL
  • Configurable concurrency — control parallel processing from 1 to 50 simultaneous requests
  • Pay-per-event pricing — pay only for pages successfully converted

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | | List of webpage URLs to convert to Markdown |
| timeoutSeconds | integer | No | 15 | Maximum seconds to wait for each URL response |
| concurrency | integer | No | 10 | Number of URLs to process in parallel (1-50) |

Input Example

{
    "urls": [
        "https://crawlee.dev",
        "https://docs.apify.com/academy/web-scraping-for-beginners"
    ],
    "timeoutSeconds": 15,
    "concurrency": 10
}
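
The constraints in the input table can be checked before a run is started. The following is a minimal sketch in Python. `build_run_input` is a hypothetical helper, not part of the actor or of any Apify SDK; the field names, defaults, and the 1-50 range come from the table above, while the URL check and the positive-timeout check are assumptions.

```python
from urllib.parse import urlparse


def build_run_input(urls, timeout_seconds=15, concurrency=10):
    """Build the actor input dict, enforcing the documented constraints.

    Hypothetical helper: field names and defaults follow the input table;
    the extra validation rules are assumptions.
    """
    if not urls:
        raise ValueError("urls is required and must not be empty")
    for url in urls:
        parsed = urlparse(url)
        # Assumption: only absolute http(s) URLs make sense as input.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if not 1 <= concurrency <= 50:
        raise ValueError("concurrency must be between 1 and 50")
    if timeout_seconds <= 0:
        raise ValueError("timeoutSeconds must be positive")
    return {
        "urls": list(urls),
        "timeoutSeconds": timeout_seconds,
        "concurrency": concurrency,
    }
```

Calling `build_run_input(["https://crawlee.dev"])` yields a dict with the documented defaults filled in, which can then be serialized as the run input.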

Output

Each URL produces one dataset item containing the converted Markdown and extracted metadata.

Key output fields:

  • inputUrl (string) — the original URL provided
  • finalUrl (string) — the URL after following redirects
  • status (number) — HTTP status code
  • pageTitle (string) — extracted article title
  • markdown (string) — the full article content converted to Markdown
  • excerpt (string) — short summary or description of the article
  • byline (string) — author name if available
  • error (string) — error message if conversion failed, otherwise null

Output Example

{
    "inputUrl": "https://crawlee.dev",
    "finalUrl": "https://crawlee.dev/",
    "status": 200,
    "pageTitle": "Crawlee - Build reliable crawlers. Fast.",
    "markdown": "# Crawlee\n\nBuild reliable crawlers. Fast.\n\nCrawlee is a web scraping and browser automation library...",
    "excerpt": "Crawlee is a web scraping and browser automation library for Node.js.",
    "byline": null,
    "error": null
}
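
Because every URL produces one item, and `error` is null on success, downstream code will usually want to separate converted pages from failures. A minimal sketch, assuming the dataset items have already been loaded as Python dicts; `split_results` is a hypothetical helper and the failure record below is invented for illustration.

```python
def split_results(items):
    """Separate converted pages from failures using the `error` field."""
    converted, failed = [], []
    for item in items:
        # Per the field list, `error` is null (None in Python) on success.
        if item.get("error") is None and item.get("markdown"):
            converted.append(item)
        else:
            failed.append(item)
    return converted, failed


items = [
    {"inputUrl": "https://crawlee.dev", "status": 200,
     "markdown": "# Crawlee\n\nBuild reliable crawlers. Fast.", "error": None},
    # Illustrative failure record; the URL and error text are made up.
    {"inputUrl": "https://example.com/missing", "status": 404,
     "markdown": None, "error": "example error message"},
]
converted, failed = split_results(items)
print(len(converted), len(failed))  # 1 1
```

Treating an item with an empty `markdown` value as a failure is a design choice of this sketch, not documented actor behavior.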

Pricing

| Event | Cost |
| --- | --- |
| Page Converted | Pay-per-event (see actor pricing page) |

The actor respects the per-run max charge limit. Processing stops automatically when the spending cap is reached.
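
For budgeting, the relationship between a spending cap and the number of pages can be estimated with simple arithmetic. This sketch assumes the listing's "from $1.00 / 1,000" rate applies as a flat per-page price; the actual rate may differ, so treat the constant as a placeholder. Both function names are hypothetical.

```python
from decimal import Decimal

# Assumption: the listing's "from $1.00 / 1,000" rate; the real rate may differ.
PRICE_PER_1000_PAGES = Decimal("1.00")


def estimated_cost(pages, price_per_1000=PRICE_PER_1000_PAGES):
    """Estimated charge in USD for `pages` successfully converted pages."""
    return Decimal(pages) * price_per_1000 / 1000


def pages_within_cap(max_charge_usd, price_per_1000=PRICE_PER_1000_PAGES):
    """Largest number of pages whose estimated charge stays within the cap."""
    return int(Decimal(str(max_charge_usd)) * 1000 // price_per_1000)


print(estimated_cost(250))   # 0.25
print(pages_within_cap(5))   # 5000
```

`Decimal` is used instead of floats so that small per-page amounts do not accumulate rounding error.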

Use Cases

  • RAG pipeline ingestion — convert documentation sites and knowledge bases into Markdown for vector database indexing
  • LLM context preparation — clean web content for ChatGPT, Claude, or other LLM context windows without HTML noise
  • Documentation migration — bulk-convert web pages to Markdown files for static site generators
  • Content archiving — save readable article snapshots in a portable, version-control-friendly format
  • AI training data — prepare clean text corpora from web sources for fine-tuning or evaluation
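
For the RAG ingestion use case, the `markdown` field is typically split into smaller chunks before indexing. One simple approach, sketched here under the assumption that the output uses ATX headings and fenced code blocks as the feature list states, is to split at headings while ignoring `#` lines inside code fences. `split_by_headings` is a hypothetical helper and the sample text is illustrative.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown):
    """Split Markdown into (heading, body) chunks at ATX headings.

    Lines inside fenced code blocks are never treated as headings.
    Text before the first heading gets a heading of None.
    """
    chunks = []
    heading, body = None, []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if heading is not None or any(part.strip() for part in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if heading is not None or any(part.strip() for part in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


sample = "# Title\n\nIntro text.\n\n## Section\n\nSection text."
print(split_by_headings(sample))
# [('Title', 'Intro text.'), ('Section', 'Section text.')]
```

Heading-based chunks keep each section's text together with its title; a real pipeline would likely also cap chunk size.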

| Actor | What it adds |
| --- | --- |
| Article Text Extractor for TTS & AI | Plain text output optimized for text-to-speech and summarization |
| PDF to Text Converter for AI & RAG | Extend your pipeline to extract text from PDF documents |
| YouTube Transcript Scraper for AI & RAG | Add video transcript extraction to your content pipeline |