# Website to Markdown Crawler — RAG / AI Data

**Use case:** Crawl any website into clean Markdown and RAG-ready chunks for AI and LLM applications. The crawler is fast, runs on CPU only, and exports structured output.

## Input

```json
{
  "startUrls": [
    {
      "url": "https://docs.apify.com"
    }
  ],
  "maxPages": 50,
  "maxDepth": 3,
  "sameDomainOnly": true,
  "allowSubdomains": true,
  "includeDocuments": true,
  "crawlSitemaps": false,
  "includeUrlGlobs": [],
  "excludeUrlGlobs": [],
  "chunkSizeTokens": 500,
  "chunkOverlapTokens": 50,
  "respectRobotsTxt": true,
  "crawlDelaySeconds": 1,
  "requestTimeoutSecs": 25,
  "maxPageSizeMb": 5,
  "useProxy": false,
  "proxyGroups": [],
  "aiProvider": ""
}
```
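The `chunkSizeTokens` and `chunkOverlapTokens` settings suggest a sliding-window split in which consecutive chunks share a few tokens. The sketch below illustrates that idea only; it is not the Actor's actual implementation. The function name `chunk_tokens` and the placeholder tokens are hypothetical, and a real tokenizer would replace the generated list.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of `chunk_size` that share `overlap` tokens.

    Assumption: a plain sliding window. The Actor may chunk differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Placeholder tokens standing in for real tokenizer output.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, chunk_size=500, overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

With the values from the input above, each window advances by 450 tokens, so the last 50 tokens of one chunk reappear at the start of the next. A larger overlap preserves more context across chunk boundaries at the cost of more chunks per page.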

## Output

```json
{
  "url": {
    "label": "URL",
    "format": "link"
  },
  "title": {
    "label": "Title"
  },
  "word_count": {
    "label": "Words"
  },
  "token_count": {
    "label": "Tokens"
  },
  "chunk_count": {
    "label": "Chunks"
  },
  "is_document": {
    "label": "Doc?"
  },
  "depth": {
    "label": "Depth"
  },
  "ai_summary": {
    "label": "AI Summary"
  }
}
```
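The block above maps each output field to its display label. As a rough illustration of how records with those fields might be consumed downstream, the following sketch aggregates a few of them. The sample records, their values, and the assumed field types (integers for counts, a boolean for `is_document`) are made up for this example; `summarize` is a hypothetical helper.

```python
def summarize(items):
    """Aggregate basic statistics over crawl records.

    Assumption: count fields are integers and `is_document` is a boolean.
    """
    return {
        "records": len(items),
        "documents": sum(1 for item in items if item.get("is_document")),
        "tokens": sum(item.get("token_count", 0) for item in items),
        "chunks": sum(item.get("chunk_count", 0) for item in items),
        "max_depth": max((item.get("depth", 0) for item in items), default=0),
    }


# Made-up records that use the field names listed above.
sample_items = [
    {"url": "https://example.com/", "title": "Home", "word_count": 400,
     "token_count": 520, "chunk_count": 2, "is_document": False, "depth": 0},
    {"url": "https://example.com/guide.pdf", "title": "Guide", "word_count": 1000,
     "token_count": 1300, "chunk_count": 3, "is_document": True, "depth": 1},
]

summary = summarize(sample_items)
print(summary)
```

A summary like this can serve as a quick sanity check that `maxPages` and `maxDepth` limited the crawl as intended before the chunks are loaded into a vector store.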

## About this Actor

This example demonstrates how to use [Website to Text & Markdown — AI / RAG Content Crawler](https://apify.com/inexhaustible_glass/rag-website-crawler) with a specific input configuration. Visit the [Actor detail page](https://apify.com/inexhaustible_glass/rag-website-crawler) to learn more, explore other use cases, and run it yourself.
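For running the Actor programmatically, a minimal sketch using only the Python standard library and the Apify API's synchronous `run-sync-get-dataset-items` endpoint is shown below. The helper names `build_request` and `run_actor`, the token placeholder, and the timeout value are assumptions for illustration. Calling `run_actor` requires a valid Apify API token and network access, so the example only builds and prints the request.

```python
import json
import urllib.request

# In API URLs the "/" between username and Actor name is written as "~".
ACTOR_ID = "inexhaustible_glass~rag-website-crawler"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"

# Same values as the Input section above.
RUN_INPUT = {
    "startUrls": [{"url": "https://docs.apify.com"}],
    "maxPages": 50, "maxDepth": 3,
    "sameDomainOnly": True, "allowSubdomains": True,
    "includeDocuments": True, "crawlSitemaps": False,
    "includeUrlGlobs": [], "excludeUrlGlobs": [],
    "chunkSizeTokens": 500, "chunkOverlapTokens": 50,
    "respectRobotsTxt": True, "crawlDelaySeconds": 1,
    "requestTimeoutSecs": 25, "maxPageSizeMb": 5,
    "useProxy": False, "proxyGroups": [], "aiProvider": "",
}


def build_request(token, run_input):
    """Build the POST request that starts a run and returns its dataset items."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token, run_input, timeout=330):
    """Send the request and return the parsed list of dataset items."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)


request = build_request("<YOUR_APIFY_TOKEN>", RUN_INPUT)
print(request.get_method(), request.full_url)
```

The synchronous endpoint waits for the run to finish before responding, so larger crawls may be better served by starting the run asynchronously and reading the dataset afterwards.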