Pricing: Pay per usage

Universal Web to Markdown (Bulk & AI-Ready)

Bulk-convert any list of website URLs into clean Markdown for AI & LLMs. A universal scraper that removes ads, scripts, and clutter. Optimized for RAG, ChatGPT, Claude, and LangChain. Fast, async, and API-ready.

Rating: 0.0 (0)

Developer: kalthireddy Abhishek

Last modified: 2 months ago

🚀 Universal Web to Markdown (AI-Ready)

Turn any website into clean, noise-free Markdown. The perfect data feeder for LLMs, RAG, and AI Agents.

🤖 Why use this Actor?

Large Language Models (like those behind ChatGPT, Claude, and Gemini) struggle with raw HTML: the markup consumes too many tokens, and scripts, styles, and ads distract the model from the actual content.

This Actor solves that. It visits URLs, strips away the junk (ads, navbars, footers), and converts the core content into clean Markdown.
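The cleaning step described above can be sketched with nothing but Python's standard library. This is not the Actor's actual code: the list of "noise" tags and the small set of supported elements (headings, paragraphs, list items) are assumptions, and `MarkdownExtractor` and `html_to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is treated as noise and dropped (assumed list).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside", "noscript"}
# Content tags mapped to the Markdown prefix of the line they produce.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### ",
                "h5": "##### ", "h6": "###### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Collects text from content tags and emits one Markdown line per block."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.skip_depth = 0   # > 0 while inside a noise subtree
        self.prefix = None    # Markdown prefix of the open block, or None
        self.buffer = []      # text fragments of the open block
        self.lines = []       # finished Markdown lines

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_PREFIX and self.skip_depth == 0:
            self.prefix, self.buffer = BLOCK_PREFIX[tag], []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_PREFIX and self.prefix is not None and self.skip_depth == 0:
            # Collapse internal whitespace and newlines into single spaces.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        # Keep text only when it sits inside a content block outside any noise.
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

A production converter would also handle links, tables, and code blocks, and usually relies on a readability-style heuristic rather than a fixed tag list.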

✨ Features

⚡ Fast & Async: Built on httpx for high-speed non-blocking extraction.
📦 Bulk Processing: Add 1 or 100 URLs at once; the Actor handles the queue for you.
🧹 Smart Cleaning: Automatically removes ads, scripts, sidebars, and popups.
🧠 AI Optimized: Output is formatted specifically for RAG (Retrieval-Augmented Generation) pipelines.
🛡️ Anti-Bot Bypass: Sends browser-like request headers so it can read sites that block basic bots.
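The async bulk fetching and browser-header features above can be sketched as follows. The Actor itself is built on httpx; to keep this example dependency-free it swaps in `urllib` run through `asyncio.to_thread`, with an `asyncio.Semaphore` capping how many requests run at once. The header values, `fetch_html`, `fetch_all`, and the default limit of 10 are assumptions for illustration.

```python
import asyncio
import urllib.request

# Browser-like headers (assumed values); some sites reject Python's default User-Agent.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Blocking fetch using only the standard library."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


async def fetch_all(urls, fetch=fetch_html, max_concurrency: int = 10) -> dict:
    """Fetch many URLs concurrently, at most `max_concurrency` at a time.

    Returns {url: html or Exception} so one failing page does not abort the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            try:
                # Run the blocking fetch in a worker thread so the event loop stays free.
                return url, await asyncio.to_thread(fetch, url)
            except Exception as exc:
                return url, exc

    pairs = await asyncio.gather(*(worker(u) for u in urls))
    return dict(pairs)
```

With httpx installed, `fetch_html` could be replaced by a coroutine that uses one shared `httpx.AsyncClient`, which removes the thread hop while keeping the same semaphore-based limit.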

📥 Input

You can provide a single URL or a list of URLs to scrape.

Example Input (JSON):

```json
{
  "startUrls": [
    { "url": "https://en.wikipedia.org/wiki/Artificial_intelligence" },
    { "url": "https://www.example.com" }
  ]
}
```
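If your URLs come from a plain list (a spreadsheet column, a sitemap export), a small helper can produce the `startUrls` shape shown above. `build_run_input` is a hypothetical client-side helper; the de-duplication and the http(s) check are assumptions, not documented Actor behavior.

```python
import json
from urllib.parse import urlparse


def build_run_input(urls) -> dict:
    """Turn plain URL strings into the {"startUrls": [{"url": ...}]} input shape.

    Blank entries and duplicates are skipped; non-http(s) URLs raise ValueError.
    """
    seen = set()
    start_urls = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {raw!r}")
        seen.add(url)
        start_urls.append({"url": url})
    return {"startUrls": start_urls}


if __name__ == "__main__":
    run_input = build_run_input([
        "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "https://www.example.com",
        "https://www.example.com",  # duplicate, dropped
    ])
    print(json.dumps(run_input, indent=2))
```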

📤 Output

The Actor stores results in the default dataset. You can download it in JSON, CSV, Excel, or XML.

Sample JSON Output:

```json
[
  {
    "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "title": "Artificial intelligence - Wikipedia",
    "markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is intelligence demonstrated by machines..."
  },
  {
    "url": "https://www.example.com",
    "title": "Example Domain",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents..."
  }
]
```
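Because each dataset item carries `url`, `title`, and `markdown`, the output can be split into heading-based chunks before embedding it for RAG. This is a minimal sketch: `chunk_by_heading` and the 2,000-character limit are hypothetical choices, not something the Actor does for you.

```python
import re

# A Markdown ATX heading line: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(item: dict, max_chars: int = 2000) -> list:
    """Split one dataset item's Markdown into chunks that start at headings.

    Each chunk keeps url/title/heading metadata for a vector store. Sections
    longer than max_chars are split further at blank-line paragraph breaks.
    """
    # 1. Group lines into (heading, lines) sections; text before the first
    #    heading is attributed to the page title.
    sections = []
    heading, lines = item.get("title", ""), []
    for line in item["markdown"].splitlines():
        match = HEADING.match(line)
        if match:
            if any(ln.strip() for ln in lines):
                sections.append((heading, lines))
            heading, lines = match.group(2).strip(), [line]
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        sections.append((heading, lines))

    # 2. Greedily pack each section's paragraphs into pieces of <= max_chars.
    #    A single paragraph longer than max_chars is kept whole, not cut.
    chunks = []
    for heading, lines in sections:
        pieces, piece = [], ""
        for para in "\n".join(lines).strip().split("\n\n"):
            candidate = f"{piece}\n\n{para}" if piece else para
            if piece and len(candidate) > max_chars:
                pieces.append(piece)
                piece = para
            else:
                piece = candidate
        if piece:
            pieces.append(piece)
        for text in pieces:
            chunks.append({"url": item["url"], "title": item.get("title", ""),
                           "heading": heading, "text": text})
    return chunks
```

Lines starting with `#` inside fenced code in the Markdown would be misread as headings here; a real pipeline should use a proper Markdown parser or a framework splitter.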

🔌 API Example (Python)

Easily integrate this into your own AI agent:

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Run the Actor with multiple URLs and wait for it to finish
run = client.actor("YOUR_USERNAME/web-to-markdown-converter").call(run_input={
    "startUrls": [
        {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"},
        {"url": "https://www.example.com"},
    ]
})

# Get results from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"])
    print(item["markdown"][:100])  # Print first 100 chars
```
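If you prefer not to install `apify_client`, Apify's REST API also offers a run-sync-get-dataset-items endpoint that starts a run and returns its dataset items in a single request. The sketch below builds that request with the standard library only; `build_sync_request` and `run_and_get_items` are hypothetical names, and `YOUR_USERNAME/web-to-markdown-converter` is the same placeholder Actor ID used above (in the URL path the slash becomes `~`).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST to the run-sync-get-dataset-items endpoint (not sent yet)."""
    # "username/actor-name" is written as "username~actor-name" in the URL path.
    path_id = urllib.parse.quote(actor_id.replace("/", "~"), safe="~")
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_and_get_items(actor_id: str, token: str, run_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the returned list of dataset items."""
    request = build_sync_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A synchronous run only waits a limited time for the Actor to finish, so for large batches the `call()` pattern above, which waits for the run before reading the dataset, is the safer choice.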

Universal Markdown Scraper for LLMs

botflowtech/universal-markdown-scraper-for-llms

Universal Markdown Scraper for LLMs

BotFlowTech

Web-to-Markdown Generator for AI & RAG Pipelines

profitstack/web-to-markdown-generator-for-ai-rag-pipelines

Convert any website into clean, LLM-ready Markdown with heading-based chunking for RAG and AI agents.

Manas Mantri

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds — perfect for AI training data, RAG pipelines, and content archiving.

SmartApi

5.0

Universal RAG Web Scraper

express_kingfisher/rag-web-scraper

Turn any website into clean, LLM-ready Markdown. Automatically strips ads, navigation, and noise using Mozilla Readability. Perfect for feeding data to ChatGPT, Claude, or Vector Databases (RAG).

Prince Raj

AI RAG Feeder V2

mickeywmoore/ai-rag-feeder-v2

Turn any website into AI-ready Markdown. Scrapes entire domains, removes ads/clutter, and formats text specifically for RAG pipelines and LLM training data.

Mickey Moore

Docs Markdown Rag Ready Crawler

devwithbobby/docs-markdown-rag-ready-crawler

Turn any documentation site or website into clean, structured markdown—ready for RAG, embeddings, and AI agents.

Dev with Bobby

Webpage to Markdown Converter for LLMs

andok/markdown-extractor

Convert any URL into clean Markdown text. Remove ads and navbars to perfectly format web content for AI and RAG ingestion.

Andok

Web Scraper RAG Ready

traorealexy/Web-Sraper-RAG-Ready

Turn any website into clean, token-efficient Markdown ready for RAG and LLM pipelines. Removes boilerplate, handles JavaScript rendering, and outputs structured JSON for LangChain, LlamaIndex, and vector databases.

Alexy Traore

Website to Clean Markdown (AI & RAG Ready)

ahmed_jasarevic/website-to-clean-markdown-ai-rag-ready

Convert any website into clean, noise-free Markdown. Perfect for training LLMs, building Custom GPTs, and RAG pipelines. Save 80% on OpenAI tokens by stripping HTML junk.

Ahmed Jasarevic

AI Markdown Maker

onescales/bulk-ai-markdown-maker

Convert any web page into clean, AI-ready Markdown in seconds. Perfect for feeding content to AI models, creating documentation, or archiving web content in a portable format. In addition, it intelligently parses web content, removing ads, navigation, and other clutter. Generate Markdown Today!