Docs Markdown Rag Ready Crawler
Pricing
from $5.00 / 1,000 results
Turn any documentation site or website into clean, structured markdown—ready for RAG, embeddings, and AI agents.
Domain to crawl (e.g., https://docs.example.com)
Maximum number of pages to crawl
Maximum crawl depth from start URLs
Generate RAG-ready chunks with stable IDs and hashes
Extraction strategy optimized for different site types
Which datasets to generate
Also crawl subdomains of the main domain
Follow robots.txt rules for crawling
Crawling engine to use. Use 'playwright' for SPAs/JavaScript-heavy sites, 'cheerio' for static HTML sites (faster)
CSS selectors to remove from content
[ "nav", "aside", "header", "footer", ".toc", "#TableOfContents", ".table-of-contents"]
Regex patterns for URLs to exclude
[ ".*(\\?|&)utm_.*", ".*(\\?|&)(ref|source|campaign)=.*"]
Target number of characters per chunk
Maximum number of characters per chunk
Minimum number of characters per chunk
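
To make the URL-exclusion and chunking options above more concrete, here is a minimal Python sketch of how such settings could behave. It is an illustration under stated assumptions, not the crawler's actual implementation: the function names (`is_excluded`, `chunk_text`, `chunk_record`), the paragraph-based packing strategy, and the SHA-256 ID scheme are all hypothetical. Only the two exclusion patterns come from the listing (shown here with the JSON escaping removed).

```python
import hashlib
import re

# Default exclusion patterns from the listing, with JSON escaping removed.
EXCLUDE_PATTERNS = [r".*(\?|&)utm_.*", r".*(\?|&)(ref|source|campaign)=.*"]


def is_excluded(url: str) -> bool:
    """Return True if the URL matches any exclusion pattern (hypothetical helper)."""
    return any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)


def chunk_text(text: str, target: int, maximum: int, minimum: int) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly `target` characters.

    Assumes minimum <= target <= maximum. This packing strategy is an
    assumption; the crawler may split on headings or sentences instead.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any single paragraph that exceeds the maximum.
        pieces = [para[i:i + maximum] for i in range(0, len(para), maximum)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if current and len(candidate) > target:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        # Fold a too-short tail into the previous chunk when it still fits.
        fits = bool(chunks) and len(chunks[-1]) + 2 + len(current) <= maximum
        if len(current) < minimum and fits:
            chunks[-1] = f"{chunks[-1]}\n\n{current}"
        else:
            chunks.append(current)
    return chunks


def chunk_record(url: str, index: int, text: str) -> dict:
    """Build a chunk record with a position-based ID and a content hash.

    Hypothetical scheme: the ID depends only on the URL and chunk position,
    so it stays the same across re-crawls, while the hash changes whenever
    the chunk text changes.
    """
    chunk_id = hashlib.sha256(f"{url}#{index}".encode("utf-8")).hexdigest()[:16]
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"id": chunk_id, "url": url, "index": index, "hash": content_hash, "text": text}
```

In a scheme like this, a downstream RAG pipeline could compare each record's `hash` against the previously stored value for the same `id` and re-embed only the chunks whose content changed.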