universal-web-to-markdown

Pricing

from $0.50 / 1,000 results

High-performance tool for AI & RAG pipelines. Converts web pages to clean Markdown by removing noise and fixing relative URLs. Built with Cheerio for extreme speed and low cost ($0.50/1k pages). Perfect for feeding clean data to LLMs.

Rating

0.0 (0)

Developer

JI JUN

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: a day ago

🚀 Universal Web-to-Markdown (RAG-Ready API)

Universal Web-to-Markdown is a high-performance, cost-efficient tool designed specifically for AI developers and RAG (Retrieval-Augmented Generation) pipelines. It transforms any web page into clean, structured Markdown.

✨ Key Features

  • Pure Content Extraction: Removes ads, navbars, and footers automatically.
  • RAG-Optimized: Converts all relative links and images into absolute URLs for seamless LLM integration.
  • Low-Cost Engine: Built with Cheerio, which parses static HTML without launching a browser, for high speed and low compute usage.
  • Branded Metadata: Includes source tracking for data lineage.
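
To make the "relative links become absolute URLs" feature concrete, here is a minimal sketch of the idea using Python's standard library. It is not the Actor's own code (the Actor is built with Cheerio); the `absolutize` helper and the page URL are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin


def absolutize(page_url: str, link: str) -> str:
    """Resolve a possibly relative link against the URL of the page it came from."""
    return urljoin(page_url, link)


page = "https://example.com/docs/intro"  # hypothetical page URL

print(absolutize(page, "/images/ai.png"))           # root-relative path
print(absolutize(page, "../blog/ai-future"))        # path relative to the page
print(absolutize(page, "https://other.example/x"))  # already absolute, returned as-is
```

An LLM that only sees the Markdown has no base URL to resolve `/images/ai.png` against, which is why the resolution has to happen at conversion time.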

🛠 How to Use

  1. Enter the Start URLs you want to crawl.
  2. (Optional) Set Max Depth to follow links.
  3. Run the Actor. Each result is a JSON item whose markdown field holds the converted page.

💰 Pricing

  • Actor Start: $0.01 (One-time event)
  • Per Result: $0.50 per 1,000 pages (Only $0.0005 per page!)
  • Platform Usage: Free (Included)
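
For budgeting, the prices above can be turned into a rough per-run estimate. This is a sketch under two assumptions that the listing does not spell out: the Actor Start event is charged once per run, and every page counts as exactly one result. The function name `estimate_run_cost` is made up for this example.

```python
from decimal import Decimal

START_FEE = Decimal("0.01")       # "Actor Start" event, assumed to be charged once per run
PRICE_PER_1000 = Decimal("0.50")  # "Per Result" price for 1,000 pages


def estimate_run_cost(pages: int) -> Decimal:
    """Estimated charge in USD for one run that returns `pages` results."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return START_FEE + PRICE_PER_1000 * pages / 1000


for n in (1, 1_000, 50_000):
    print(n, estimate_run_cost(n))
```

`Decimal` is used instead of `float` so that small per-page amounts add up without rounding drift.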

Developed by hachi-dev

🔍 Before & After (Why it's perfect for RAG)

Stop feeding your LLM noisy HTML. The converted output keeps only the content and uses absolute URLs:

✅ After (Clean Markdown by hachi-dev):

## What is AI?
![AI diagram](https://example.com/images/ai.png)
Read more on our [blog](https://example.com/blog/ai-future).
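
The listing does not show the "Before" input, so the HTML below is a hypothetical page fragment that would plausibly lead to the Markdown above. The converter is a toy built on Python's standard library to illustrate the two steps (dropping noise elements and resolving URLs); it is not the Actor's Cheerio implementation, and the class and function names are invented for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

NOISE_TAGS = {"nav", "footer", "script", "style"}  # elements treated as noise

# Hypothetical "Before": navigation, relative URLs, and a footer around the content.
BEFORE_HTML = """<nav><a href="/">Home</a> | <a href="/pricing">Pricing</a></nav>
<h2>What is AI?</h2>
<img src="/images/ai.png" alt="AI diagram">
<p>Read more on our <a href="/blog/ai-future">blog</a>.</p>
<footer>(c) Example Inc.</footer>"""


class MiniMarkdown(HTMLParser):
    """Handles only h2, img, a, and p; everything inside noise tags is dropped."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.out = []        # Markdown fragments in document order
        self.skip_depth = 0  # greater than 0 while inside a noise element
        self.href = None     # target of the <a> that is currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in NOISE_TAGS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag == "h2":
            self.out.append("## ")
        elif tag == "img":
            src = urljoin(self.base_url, attrs.get("src") or "")
            self.out.append(f"![{attrs.get('alt') or ''}]({src})\n")
        elif tag == "a":
            self.href = urljoin(self.base_url, attrs.get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a" and self.href is not None:
            self.out.append(f"]({self.href})")
            self.href = None
        elif tag in ("h2", "p"):
            self.out.append("\n")

    def handle_data(self, data):
        # Ignore text inside noise elements and whitespace between block elements.
        if not self.skip_depth and data.strip():
            self.out.append(data)


def html_to_markdown(html: str, base_url: str) -> str:
    parser = MiniMarkdown(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(html_to_markdown(BEFORE_HTML, "https://example.com/"))
```

Run against the hypothetical fragment with `https://example.com/` as the base URL, this should reproduce the three "After" lines: the navigation and footer are gone, and both the image and the link point to absolute URLs.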

💻 Quick Start Code Snippets

Copy one of these snippets into your project and replace YOUR_API_TOKEN with your Apify API token to start extracting data.

Python (apify-client)

from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for it to finish
run = client.actor("hachi-dev/universal-web-to-markdown").call(
    run_input={"startUrls": [{"url": "https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/article"}]}
)

# Fetch and print the Markdown results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("markdown"))
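
Once you have the Markdown, a common next step in a RAG pipeline is to split it into chunks before embedding. The sketch below splits on ATX headings (`#` to `######`); it is one possible approach, not something the Actor does for you, and `split_by_headings` is a hypothetical helper. It does not special-case fenced code blocks, so a line such as `# comment` inside a code sample would be treated as a heading.

```python
import re


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at every ATX heading."""
    chunks = []
    heading, lines = None, []

    def flush():
        # Keep a chunk only if it has a heading or some non-blank text.
        text = "\n".join(lines).strip()
        if heading is not None or text:
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            flush()
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


sample = "Intro\n\n## What is AI?\nBody A\n\n## History\nBody B\n"
for chunk in split_by_headings(sample):
    print(chunk)
```

Text that appears before the first heading is kept as a chunk whose `heading` is `None`, so no content is silently dropped.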

Node.js / JavaScript

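import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for it to finish
const run = await client.actor("hachi-dev/universal-web-to-markdown").call({
    startUrls: [{ url: "https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/article" }]
});

// Fetch the results and print the Markdown of the first page (undefined if the run returned nothing)
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]?.markdown);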