# RAG Knowledge Base Ingestion

**Use case:** 

Prepare documentation for vector databases by crawling docs, excluding noisy paths, and outputting clean markdown chunks.

## Input

```json
{
  "startUrl": "https://docs.apify.com/platform",
  "maxPages": 150,
  "includeCodeBlocks": false,
  "chunkMode": "heading",
  "maxChunkWords": 350,
  "linkSelector": "",
  "excludePatterns": [
    "*/changelog/*",
    "*/blog/*",
    "*/release-notes/*"
  ],
  "waitForSelector": ""
}
```

## Output

```json
{
  "chunkId": {
    "label": "Chunk ID",
    "format": "text"
  },
  "title": {
    "label": "Title",
    "format": "text"
  },
  "url": {
    "label": "URL",
    "format": "link"
  },
  "wordCount": {
    "label": "Words",
    "format": "number"
  },
  "charCount": {
    "label": "Characters",
    "format": "number"
  },
  "tokenEstimate": {
    "label": "Est. tokens",
    "format": "number"
  },
  "headingHierarchy": {
    "label": "Breadcrumb",
    "format": "text"
  },
  "content": {
    "label": "Markdown",
    "format": "text"
  }
}
```

## About this Actor

This example demonstrates how to use [Docs-to-RAG Crawler](https://apify.com/automation-lab/docs-rag-crawler) with a specific input configuration. Visit the [Actor detail page](https://apify.com/automation-lab/docs-rag-crawler) to learn more, explore other use cases, and run it yourself.