Pricing

from $0.50 / 1,000 results

AI / RAG Web Crawler

Crawl any website and extract clean, LLM-ready Markdown chunks to feed AI agents, chatbots, and RAG pipelines. One row per embeddable chunk.

Rating

0.0

(0)

Developer

Group Oject

Last modified: a month ago

[1.0.0] — 2026-06-15

Added

Initial release.
HTTP crawler (Crawlee CheerioCrawler) with link discovery, depth + page limits, same-domain crawling, include/exclude URL globs.
Main-content extraction: strips nav/header/footer/sidebar/scripts/ads, prefers <article>/<main>, converts to clean Markdown (node-html-markdown — no headless browser).
RAG chunking: paragraph-aware, overlapping chunks sized for embeddings; one dataset row per chunk, or one row per page when chunking is off.
Per-page error isolation, configurable concurrency, optional Apify Proxy, optional cleaned-HTML output.
Outputs: chunk rows in the dataset + SUMMARY key-value record.
Apify input / dataset / output schemas (all validated), README, and Store listing copy.
Vitest suite covering boilerplate stripping, Markdown conversion, link preservation, and chunking (size, overlap, metadata).
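The RAG chunking entry describes paragraph-aware, overlapping chunks. The TypeScript sketch below illustrates that idea and is not the actor's code. It makes several assumptions: sizes are measured in characters rather than tokens, paragraphs are separated by blank lines, and `ChunkOptions`, `maxChars`, `overlapChars` and `chunkMarkdown` are hypothetical names. A paragraph that exceeds the limit on its own is hard-split. The overlap shrinks when necessary so that no chunk grows past `maxChars`.

```typescript
// Hypothetical options; the actor's real input names and defaults are not shown here.
interface ChunkOptions {
  maxChars: number; // upper bound per chunk, in characters (assumption: not tokens)
  overlapChars: number; // characters carried over from the end of the previous chunk
}

interface Chunk {
  index: number;
  text: string;
}

// Paragraphs are assumed to be separated by blank lines; overlong ones are hard-split.
function splitParagraphs(markdown: string, maxChars: number): string[] {
  const out: string[] = [];
  for (const raw of markdown.split(/\n\s*\n/)) {
    const para = raw.trim();
    for (let i = 0; i < para.length; i += maxChars) {
      out.push(para.slice(i, i + maxChars));
    }
  }
  return out;
}

function chunkMarkdown(markdown: string, opts: ChunkOptions): Chunk[] {
  const { maxChars, overlapChars } = opts;
  // Also guarantees maxChars >= 1, so the hard-split loop always advances.
  if (overlapChars < 0 || overlapChars >= maxChars) {
    throw new Error("overlapChars must be in [0, maxChars)");
  }
  const chunks: Chunk[] = [];
  let current = "";
  for (const para of splitParagraphs(markdown, maxChars)) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars) {
      current = candidate; // paragraph fits: keep growing the chunk
      continue;
    }
    // Paragraph would overflow: emit the current chunk, then start a new one.
    chunks.push({ index: chunks.length, text: current });
    // Carry the tail of the emitted chunk as overlap, shrunk so that
    // tail + "\n\n" + para still fits within maxChars.
    const tailLen = Math.min(overlapChars, Math.max(0, maxChars - para.length - 2));
    const tail = tailLen > 0 ? current.slice(Math.max(0, current.length - tailLen)) : "";
    current = tail ? `${tail}\n\n${para}` : para;
  }
  if (current) chunks.push({ index: chunks.length, text: current });
  return chunks;
}
```

Character counts are only a rough proxy for embedding tokens. A production pipeline would more likely size chunks using the target model's tokenizer.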
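The crawler entry lists link discovery, depth and page limits, same-domain crawling, and include/exclude URL globs. The sketch below shows one way these rules could combine in a breadth-first frontier. `planCrawl`, `globToRegExp` and the `CrawlScope` fields are hypothetical. Links come from a caller-supplied `linksOf` function rather than real HTTP requests. "Same domain" is assumed to mean an exact hostname match. The glob syntax is deliberately simplified: `**` matches across `/`, `*` does not, and `?` is treated as a literal character because URLs use it for query strings.

```typescript
// Hypothetical crawl settings; these are not the actor's actual input field names.
interface CrawlScope {
  startUrl: string;
  maxDepth: number; // 0 = fetch only the start page
  maxPages: number; // hard cap on pages visited
  includeGlobs: string[]; // empty = no include filter
  excludeGlobs: string[];
}

// Simplified glob: "**" matches any characters including "/", "*" matches any
// characters except "/", and every other character (including "?") is literal.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      pattern += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      pattern += "[^/]*";
    } else if ("\\^$.|?+()[]{}".indexOf(ch) !== -1) {
      pattern += "\\" + ch; // escape regex metacharacters
    } else {
      pattern += ch;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Breadth-first frontier; `linksOf` stands in for fetching a page and reading its links.
function planCrawl(scope: CrawlScope, linksOf: (url: string) => string[]): string[] {
  const start = new URL(scope.startUrl);
  start.hash = "";
  const host = start.hostname; // "same domain" assumed to mean identical hostname
  const include = scope.includeGlobs.map(globToRegExp);
  const exclude = scope.excludeGlobs.map(globToRegExp);
  const allowed = (href: string): boolean =>
    new URL(href).hostname === host &&
    (include.length === 0 || include.some((re) => re.test(href))) &&
    !exclude.some((re) => re.test(href));

  const seen = new Set<string>([start.href]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start.href, depth: 0 }];
  const visited: string[] = [];
  while (queue.length > 0 && visited.length < scope.maxPages) {
    const { url, depth } = queue.shift()!;
    visited.push(url);
    if (depth >= scope.maxDepth) continue; // depth limit: do not expand further
    for (const raw of linksOf(url)) {
      const next = new URL(raw, url); // resolve relative links against the current page
      next.hash = ""; // "#section" links point at the same page
      if (!seen.has(next.href) && allowed(next.href)) {
        seen.add(next.href);
        queue.push({ url: next.href, depth: depth + 1 });
      }
    }
  }
  return visited;
}
```

The actor itself most likely delegates this filtering to Crawlee, whose `enqueueLinks()` accepts glob patterns directly. This sketch simply makes the order of the checks explicit.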
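A small worker pool can illustrate per-page error isolation with configurable concurrency. The sketch below is hypothetical: `crawlIsolated`, `PageOutcome` and the summary shape are invented for illustration and are not the actor's `SUMMARY` format. Each page runs inside its own `try`/`catch`, so a failing page is recorded as an outcome instead of rejecting the whole run, and at most `concurrency` handlers are in flight at any moment.

```typescript
interface PageOutcome<T> {
  url: string;
  ok: boolean;
  value?: T;
  error?: string;
}

// Run `handler` over `urls` with at most `concurrency` calls in flight.
// A failure on one page is recorded and never aborts the others.
async function crawlIsolated<T>(
  urls: string[],
  concurrency: number,
  handler: (url: string) => Promise<T>,
): Promise<{ outcomes: PageOutcome<T>[]; summary: { succeeded: number; failed: number } }> {
  const outcomes: PageOutcome<T>[] = new Array(urls.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  const worker = async (): Promise<void> => {
    while (next < urls.length) {
      const i = next++;
      const url = urls[i];
      try {
        outcomes[i] = { url, ok: true, value: await handler(url) };
      } catch (err) {
        outcomes[i] = { url, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  };

  const poolSize = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  const failed = outcomes.filter((o) => !o.ok).length;
  return { outcomes, summary: { succeeded: urls.length - failed, failed } };
}
```

Because JavaScript runs this single-threaded, `next++` needs no lock: a worker claims its index synchronously before it reaches an `await`.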

Website to RAG Markdown Crawler

knotted_tussock/rag-markdown-crawler

Crawl any website or docs site and export clean Markdown plus JSONL-style chunks for RAG, LLM apps, and AI agents.

Ralph T

Web-to-Markdown Generator for AI & RAG Pipelines

profitstack/web-to-markdown-generator-for-ai-rag-pipelines

Convert any website into clean, LLM-ready Markdown with heading-based chunking for RAG and AI agents.

Manas Mantri

Website Content Scraper: Clean Markdown for AI and RAG

scrapemint/website-content-scraper

Crawl any website and get clean Markdown, text, or HTML per page, ready for RAG pipelines, chatbots, and LLM fine-tuning. Plain HTTP, no browser, no API key. Pay per page.

Ken M

AI-Ready Website Crawler

optimus-fulcria/ai-ready-website-crawler

Crawl websites and convert them to clean Markdown for AI/RAG, LLM fine-tuning, and document pipelines.

Fulcria Labs

Website Content Crawler — Text, Markdown & HTML for AI/LLM

hichemdev/website-content-crawler

Crawl any website and extract clean text, Markdown, and HTML from every page — ready for LLM, RAG, and AI ingestion.

Hichem Ben Moussa

AI Web Extractor: URL → Clean Markdown + JSON for LLM/RAG

boxbox10/ai-web-extractor

Turn any URL into clean, LLM-ready Markdown + structured JSON (title, headings, main content, links, metadata, token count). Perfect for RAG pipelines, AI agents, and LLM context.

Marvin Eguilos

RAG Web Browser

travelmonitorlab/rag-web-browser

Search Google & extract clean Markdown from any URL — built for AI agents, RAG pipelines & LLM apps. Structured JSON output. API + MCP ready. $0.003/page.

Travel Monitor Lab

Website to Markdown Crawler for LLM & RAG

logiover/website-text-markdown-crawler

Crawl any website and convert it to clean Markdown and plain text for LLM training and RAG. HTML to Markdown, no API or login. Export website text to CSV or JSON.

Logiover

RAG Web Extractor — Clean Markdown, HTML & Chunks

junipr/rag-web-extractor

Extract clean website content for RAG and AI search. Crawl pages, remove boilerplate, preserve structure, and export Markdown, HTML, text, JSON, and chunks.