Website to Markdown Crawler — RAG / AI Data
Crawl any website into clean markdown and RAG chunks for AI and LLM apps. Fast, CPU-only. Structured export.
Website to Text & Markdown — AI / RAG Content Crawler
inexhaustible_glass/rag-website-crawler
Input
Start URLs (required): https://docs.apify.com
Max pages: 50
Max crawl depth: 3
Stay on the same domain: true
Allow subdomains: true
Crawl linked documents (PDF/Word/Excel): true
Discover URLs from sitemap.xml: false
Only crawl URLs matching (glob)
Skip URLs matching (glob)
Chunk size (tokens): 500
Chunk overlap (tokens): 50
Respect robots.txt: true
Delay between requests (seconds): 1
Request timeout (seconds): 25
Max page size (MB): 5
Use Apify Proxy (anti-block): false
Proxy groups
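The input form above could also be supplied programmatically. The following is a minimal sketch only: the page does not show the Actor's input schema, so every key name in the dictionary (startUrls, maxPages, chunkSize, and so on) is a hypothetical stand-in for the form labels, and the real names should be checked on the Actor's input tab. The run helper assumes the apify-client package for Python, with its dict-style run result, and an APIFY_TOKEN environment variable.

```python
import os


def build_run_input(start_url: str, max_pages: int = 50) -> dict:
    """Mirror the form values above as a dict. Every key name is an assumption."""
    return {
        "startUrls": [{"url": start_url}],  # assumed shape for "Start URLs"
        "maxPages": max_pages,              # "Max pages"
        "maxCrawlDepth": 3,                 # "Max crawl depth"
        "sameDomain": True,                 # "Stay on the same domain"
        "allowSubdomains": True,            # "Allow subdomains"
        "crawlDocuments": True,             # "Crawl linked documents"
        "useSitemap": False,                # "Discover URLs from sitemap.xml"
        "chunkSize": 500,                   # "Chunk size (tokens)"
        "chunkOverlap": 50,                 # "Chunk overlap (tokens)"
        "respectRobotsTxt": True,           # "Respect robots.txt"
        "requestDelaySecs": 1,              # "Delay between requests (seconds)"
        "requestTimeoutSecs": 25,           # "Request timeout (seconds)"
        "maxPageSizeMb": 5,                 # "Max page size (MB)"
        "useApifyProxy": False,             # "Use Apify Proxy (anti-block)"
    }


def run_crawler(run_input: dict) -> list:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("inexhaustible_glass/rag-website-crawler").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires network access and a valid token):
#   items = run_crawler(build_run_input("https://docs.apify.com"))
```

Keeping the input builder separate from the network call makes it possible to inspect or version the configuration without starting a run.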
Output fields
URL
Title
Words
Tokens
Chunks
Doc?
Depth
AI Summary
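To illustrate how a chunk size of 500 tokens and an overlap of 50 tokens might interact, here is a sketch of a plain sliding-window splitter. It is not the Actor's actual chunking code, which the page does not describe. The function name chunk_tokens is hypothetical, and the input is assumed to be an already tokenized list.

```python
def chunk_tokens(tokens: list, size: int = 500, overlap: int = 50) -> list:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each new window starts this many tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Integers stand in for real tokens here.
chunks = chunk_tokens(list(range(1200)))
print([len(c) for c in chunks])
```

With these settings each window advances by 450 tokens, so the last 50 tokens of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from either chunk.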
01. Sign up on Apify
Create your Apify account to access the Website to Text & Markdown — AI / RAG Content Crawler.
02. Start the run
The Actor starts automatically with the input you provide.
03. Receive the output
Monitor progress in real time. You are notified as soon as your dataset is complete and ready for review.
04. Integrate into your workflow
The final output is available in JSON, CSV, or Excel format, ready to plug into your workflow.
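As one way to plug a JSON export into a workflow, the sketch below totals token counts and writes a reduced CSV using only the Python standard library. The record keys url, title, and tokens are assumptions based on the output field labels listed on this page, and the sample records are made up for illustration. The actual export may use different key names.

```python
import csv
import io
import json


def summarize(items: list) -> dict:
    """Count pages and total the assumed `tokens` field."""
    return {
        "pages": len(items),
        "tokens": sum(item.get("tokens", 0) for item in items),
    }


def to_csv(items: list, fields: list) -> str:
    """Render only the chosen fields as CSV text, ignoring any other keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Made-up sample standing in for a downloaded JSON export.
sample = json.loads(
    '[{"url": "https://docs.apify.com", "title": "Docs", "tokens": 120},'
    ' {"url": "https://docs.apify.com/platform", "title": "Platform", "tokens": 80}]'
)
print(summarize(sample))
print(to_csv(sample, ["url", "tokens"]))
```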
