Website to Markdown Crawler — RAG / AI Data

Crawl any website into clean markdown and RAG chunks for AI and LLM apps. Fast, CPU-only. Structured export.

Website to Text & Markdown — AI / RAG Content Crawler (inexhaustible_glass/rag-website-crawler)

Input

Start URLs (required)
url: https://docs.apify.com
Max pages: 50
Max crawl depth: 3
Stay on the same domain: true
Allow subdomains: true
Crawl linked documents (PDF/Word/Excel): true
Discover URLs from sitemap.xml: false
Only crawl URLs matching (glob)
Skip URLs matching (glob)
Chunk size (tokens): 500
Chunk overlap (tokens): 50
Respect robots.txt: true
Delay between requests (seconds): 1
Request timeout (seconds): 25
Max page size (MB): 5
Use Apify Proxy (anti-block): false
Proxy groups
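
The scope settings above (same domain, subdomains, include and skip globs) can be read as a filter applied to every discovered link. The following is a minimal sketch of such a filter, not the Actor's actual implementation: the function name `in_scope` and its parameters are hypothetical, and it assumes that "subdomain" means a subdomain of the start URL's host, whereas the Actor may compare registered domains instead.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def in_scope(url, start_url, allow_subdomains=True,
             include_globs=(), exclude_globs=()):
    """Return True if a discovered link should be crawled (hypothetical helper)."""
    host = (urlparse(url).hostname or "").lower()
    start_host = (urlparse(start_url).hostname or "").lower()

    same_host = host == start_host
    # "Allow subdomains": accept hosts that end with ".<start host>" (assumption).
    is_subdomain = allow_subdomains and host.endswith("." + start_host)
    if not (same_host or is_subdomain):
        return False

    # "Skip URLs matching (glob)" takes priority over the include list.
    if any(fnmatch(url, pattern) for pattern in exclude_globs):
        return False
    # An empty include list is treated as "no restriction".
    if include_globs and not any(fnmatch(url, p) for p in include_globs):
        return False
    return True
```

For example, `in_scope("https://docs.apify.com/api/v2", "https://docs.apify.com", exclude_globs=["*/api/*"])` rejects the link because the skip pattern matches.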

Output fields

URL
Title
Words
Tokens
Chunks
Doc?
Depth
AI Summary
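
The Tokens and Chunks fields relate to the chunk size (500) and chunk overlap (50) inputs. One common way to produce such chunks is a sliding window over the token sequence; the sketch below illustrates that idea. It is an assumption about the general technique, not the Actor's code, and `chunk_tokens` is a hypothetical name. The tokenizer the Actor uses is not stated on this page, so the function accepts any pre-tokenized list.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, a 1,200-token page yields three chunks starting at tokens 0, 450, and 900, and each chunk begins with the last 50 tokens of the previous one.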

How it works

1. Sign up on Apify

Create your Apify account to access the Website to Text & Markdown — AI / RAG Content Crawler.

2. Start the run

The Actor starts automatically and runs with the input you provide.

3. Receive the output

Monitor the progress in real-time. You will be notified as soon as your dataset is complete and ready for review.

4. Integrate into your workflow

The final output is delivered in JSON, CSV, or Excel format, ready to be plugged into your workflow.
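
To fetch the finished dataset in one of these formats programmatically, the Apify API exposes a dataset items endpoint that takes a `format` query parameter. The helper below is a small sketch that only builds the download URL; `dataset_items_url` is a hypothetical name, and the dataset ID is a placeholder you would take from your own run. Private datasets also require an API token, which is not shown here.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# The three formats named above; Excel corresponds to "xlsx" in the API.
SUPPORTED_FORMATS = {"json", "csv", "xlsx"}


def dataset_items_url(dataset_id, fmt="json"):
    """Build the download URL for a dataset in the requested format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

Calling `dataset_items_url("<DATASET_ID>", "csv")` gives a link that can be passed to any HTTP client or spreadsheet import.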


Integrate Actor directly into your workflow

Choose one of the 100+ integration options we provide, or integrate via the API.
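
As an illustration of the API route, the sketch below prepares a request to the Apify endpoint that runs an Actor synchronously and returns its dataset items. It uses only the Python standard library. The JSON input keys (`startUrls`, `maxPages`, and so on) are hypothetical: this page shows only the display labels, so check the Actor's input schema for the real names. `build_run_request` and `run_crawler` are likewise illustrative names.

```python
import json
import urllib.request

# API form of the store slug: "/" is written as "~" in API paths.
ACTOR_ID = "inexhaustible_glass~rag-website-crawler"

# Hypothetical key names mirroring the labels in the Input section.
EXAMPLE_INPUT = {
    "startUrls": [{"url": "https://docs.apify.com"}],
    "maxPages": 50,
    "maxCrawlDepth": 3,
    "chunkSize": 500,
    "chunkOverlap": 50,
}


def build_run_request(token, run_input):
    """Prepare (but do not send) a synchronous run request."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_crawler(token, run_input, timeout=300):
    """Send the request and return the dataset items as parsed JSON."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the sketch easy to inspect without network access; long crawls may exceed the synchronous endpoint's time limit, in which case starting the run asynchronously and polling is the safer choice.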

Webhook
n8n
Make
Zapier
Airbyte
Keboola
IFTTT
Hubspot
GDrive
Gmail
Apify MCP
GitHub
Slack
LangChain
LlamaIndex
Flowise
Pinecone
OpenAI
Mastra
Clay