Universal Web to Markdown (Bulk & AI-Ready)

Pricing

Pay per usage

Bulk convert any website URLs to clean Markdown for AI & LLMs. Universal scraper that removes ads, scripts, and clutter. Optimized for RAG, ChatGPT, Claude, and LangChain. Fast, async, and API-ready.

Developer

kalthireddy Abhishek

Maintained by Community

🚀 Universal Web to Markdown (AI-Ready)

Turn any website into clean, noise-free Markdown. The perfect data feeder for LLMs, RAG, and AI Agents.


🤖 Why use this Actor?

Large Language Models (like ChatGPT, Claude, and Gemini) struggle with raw HTML. The markup consumes too many tokens, and the scripts, styles, and ads around the content confuse the model.

This Actor solves that. It visits URLs, strips away the junk (ads, navbars, footers), and converts the core content into clean Markdown.
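
As a rough illustration of the "strip the junk, keep the content" idea, here is a minimal sketch using only Python's standard library. It is not the Actor's actual implementation, which is not shown on this page: the list of skipped tags, the class name `MarkdownSketch`, and the function `html_to_markdown` are all assumptions made for this example, and only headings and paragraphs are converted.

```python
from html.parser import HTMLParser

# Tags whose entire contents are dropped (assumed list, for illustration only).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside", "noscript"}
# Markdown prefixes for the heading levels this sketch handles.
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownSketch(HTMLParser):
    """Collects headings and paragraphs as Markdown, ignoring SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.prefix = None    # Markdown prefix of the block being read
        self.buffer = []      # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and (tag in HEADINGS or tag == "p"):
            self.prefix = HEADINGS.get(tag, "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (self.skip_depth == 0 and self.prefix is not None
              and (tag in HEADINGS or tag == "p")):
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Return the headings and paragraphs of `html` as Markdown text."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Feeding it a page with a `<nav>`, a `<script>`, and a `<footer>` around an `<h1>` and a `<p>` returns only the heading and the paragraph. A real converter would also need to handle links, lists, tables, and code blocks.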

✨ Features

  • ⚡ Fast & Async: Built on httpx for high-speed non-blocking extraction.
  • 📦 Bulk Processing: Add 1 or 100 URLs at once; the Actor handles the queue for you.
  • 🧹 Smart Cleaning: Automatically removes ads, scripts, sidebars, and popups.
  • 🧠 AI Optimized: Output is formatted specifically for RAG (Retrieval-Augmented Generation) pipelines.
  • 🛡️ Anti-Bot Bypass: Uses browser headers to read sites that block basic bots.
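
The async, bulk, and browser-header features above can be pictured with a small `asyncio` sketch. This is an assumption about the general pattern, not the Actor's source code: `fetch_all`, `BROWSER_HEADERS`, and the concurrency limit are hypothetical names and values. To keep the example runnable offline, the HTTP call is passed in as a coroutine; with httpx, that coroutine could wrap `httpx.AsyncClient.get(url, headers=headers)`.

```python
import asyncio
from typing import Awaitable, Callable

# Browser-like headers (illustrative values, not the Actor's actual ones).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
}

# A coroutine that takes (url, headers) and returns the page HTML.
Fetch = Callable[[str, dict], Awaitable[str]]


async def fetch_all(urls: list[str], fetch: Fetch,
                    max_concurrency: int = 10) -> list[dict]:
    """Fetch every URL concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> dict:
        async with semaphore:
            try:
                return {"url": url, "html": await fetch(url, BROWSER_HEADERS)}
            except Exception as exc:
                # One failing URL should not stop the rest of the batch.
                return {"url": url, "error": str(exc)}

    # gather() preserves the order of the input URLs in its results.
    return await asyncio.gather(*(fetch_one(u) for u in urls))
```

The semaphore is what turns "1 or 100 URLs" into a bounded queue: every URL gets a task, but only a fixed number of requests are in flight at once.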

📥 Input

You can provide a single URL or a list of URLs to scrape.

Example Input (JSON):

{
  "startUrls": [
    { "url": "https://en.wikipedia.org/wiki/Artificial_intelligence" },
    { "url": "https://www.example.com" }
  ]
}
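
If your URLs live in a plain list or a text file, a small helper can build the input object in the shape shown above. This is a minimal sketch: `build_run_input` is a hypothetical helper written for this example, and the filtering rules (skip blanks, duplicates, and non-HTTP(S) entries) are assumptions, not documented behavior of the Actor.

```python
from urllib.parse import urlparse


def build_run_input(urls):
    """Return a {"startUrls": [...]} dict from an iterable of URL strings.

    Blank entries, duplicates, and anything that is not http(s) are skipped.
    """
    seen = set()
    start_urls = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        start_urls.append({"url": url})
    return {"startUrls": start_urls}
```

The returned dictionary can be serialized with `json.dumps` for the Apify Console or passed directly as `run_input` when calling the Actor from Python.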

📤 Output

The Actor stores results in the default dataset. You can download it in JSON, CSV, Excel, or XML.

Sample JSON Output:

[
  {
    "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "title": "Artificial intelligence - Wikipedia",
    "markdown": "# Artificial intelligence\n\nArtificial intelligence (AI) is intelligence demonstrated by machines..."
  },
  {
    "url": "https://www.example.com",
    "title": "Example Domain",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents..."
  }
]
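
For RAG pipelines, each item's `markdown` field usually needs to be split into chunks before embedding. One possible approach, sketched below, splits on Markdown headings and hard-wraps oversized sections. The function name `chunk_markdown` and the `max_chars` limit are assumptions for illustration; only the `url`, `title`, and `markdown` fields come from the output format above.

```python
import re


def chunk_markdown(item, max_chars=1000):
    """Split one dataset item into heading-delimited chunks of at most max_chars."""
    # Split before every line that starts with 1-6 '#' characters and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", item["markdown"])
    chunks = []
    for section in sections:
        section = section.strip()
        # Hard-wrap oversized sections so every chunk respects max_chars.
        for start in range(0, len(section), max_chars):
            chunks.append({
                "url": item["url"],
                "title": item["title"],
                "text": section[start:start + max_chars],
            })
    return chunks
```

Each chunk keeps the source `url` and `title`, so retrieved passages can be cited back to their page. Note that this simple split also treats a `# ` line inside a fenced code block as a heading, which a production chunker would want to avoid.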

🔌 API Example (Python)

Easily integrate this into your own AI agent:

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Run the Actor with multiple URLs and wait for it to finish
run = client.actor("YOUR_USERNAME/web-to-markdown-converter").call(run_input={
    "startUrls": [
        {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"},
        {"url": "https://www.example.com"},
    ]
})

# Get results from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"])
    print(item["markdown"][:100])  # Print first 100 chars