# llms.txt + llms-full.txt Generator for Any Docs Site (`ianymu/llms-txt-converter`) Actor

Crawl any documentation site and emit the 2026 GEO-standard llms.txt + llms-full.txt files. Makes your docs machine-friendly for ChatGPT / Claude / Perplexity / Gemini retrieval.

- **URL**: https://apify.com/ianymu/llms-txt-converter.md
- **Developed by:** [Yanlong Mu](https://apify.com/ianymu) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## llms.txt + llms-full.txt Generator

> **The 2026 GEO standard, generated for any documentation site in seconds.**

### What does this Actor do?

This Actor crawls any documentation website you give it and produces two files:

- **`llms.txt`** — a concise, machine-friendly index of the site's pages, headings, and descriptions
- **`llms-full.txt`** — the full text content of every page concatenated into a single retrieval-friendly Markdown file

These files follow the `llms.txt` convention, which is designed to help AI assistants such as **ChatGPT, Claude, Perplexity, Gemini, and Bing Copilot** find and cite your documentation. Without them, an assistant has to work from your raw HTML pages. With them, it gets a clean, structured source it can cite when users ask questions.

### Why use this Actor?

Almost no documentation sites have shipped `llms.txt` yet. The first 1,000 sites to publish them are going to dominate the AI citation share for their topic. This Actor lets you ship them in minutes instead of hand-writing them.

**Business use cases:**

- **SaaS vendors** — make your API docs cited by ChatGPT when users ask "how do I do X with Y?"
- **Open-source maintainers** — get your project recommended over competitors when a user asks Claude for "the best library for X"
- **Internal knowledge bases** — feed your company's wiki into LLM-powered tools that respect `llms.txt`
- **AI-first agencies** — generate `llms.txt` for your clients as a productized service ($50-500 per site)

### How to use

1. Paste your docs root URL in the **Documentation root URL** field (e.g. `https://docs.example.com`)
2. Set **Max pages** (start at 100 for a smoke test, raise to 1000+ for full sites)
3. Click **Start**
4. Wait for the crawl to finish (typically 1-10 min for 100 pages)
5. Download `llms.txt` and `llms-full.txt` from the **Storage** tab → **Key-Value Store**
6. Upload both files to your site's root (e.g., `https://yoursite.com/llms.txt`)
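
Step 5 can also be scripted. The following is a minimal sketch, assuming the Actor stores the two files in the run's default key-value store under the record keys `llms.txt` and `llms-full.txt`. Those key names are an assumption based on the file names above, so confirm them in the Storage tab. The helper names `record_url` and `download_outputs` are illustrative; the URL follows the Apify REST API's get-record endpoint.

```python
import urllib.parse
import urllib.request
from pathlib import Path

API_BASE = "https://api.apify.com/v2"
# Assumed record keys; confirm them in the run's Key-Value Store tab.
OUTPUT_KEYS = ("llms.txt", "llms-full.txt")


def record_url(store_id: str, key: str, token: str) -> str:
    """Build the REST API URL for one key-value store record."""
    path = (
        f"/key-value-stores/{urllib.parse.quote(store_id)}"
        f"/records/{urllib.parse.quote(key)}"
    )
    return f"{API_BASE}{path}?{urllib.parse.urlencode({'token': token})}"


def download_outputs(store_id: str, token: str, dest: Path) -> list[Path]:
    """Download both generated files into `dest` and return their paths."""
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for key in OUTPUT_KEYS:
        # Network call: fetches the raw record body for this key.
        with urllib.request.urlopen(record_url(store_id, key, token)) as resp:
            target = dest / key
            target.write_bytes(resp.read())
            saved.append(target)
    return saved
```

Here `store_id` is the run's `defaultKeyValueStoreId`, which is part of the run object returned by the API. After downloading, upload the two files to your site's root as described in step 6.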

### Input

- **`startUrl`** *(required)* — the root URL of the documentation site
- **`maxPages`** — stop after this many pages (default 100, max 5000)
- **`sameDomainOnly`** — restrict crawl to the same hostname (default true)
- **`includeFullContent`** — include page bodies in `llms-full.txt` (default true; set false for outline-only)
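
As an illustration, the fields above can be assembled and sanity-checked before a run is started. This is a sketch only: `build_input` is a hypothetical helper, the defaults mirror the ones listed above, and the 1 to 5000 range for `maxPages` is taken from the Actor's input schema. The Actor performs its own validation regardless.

```python
from urllib.parse import urlparse


def build_input(start_url: str, max_pages: int = 100,
                same_domain_only: bool = True,
                include_full_content: bool = True) -> dict:
    """Return an input dict for the Actor, rejecting obviously invalid values."""
    parsed = urlparse(start_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"startUrl must be an absolute http(s) URL: {start_url!r}")
    if not 1 <= max_pages <= 5000:
        raise ValueError(f"maxPages must be between 1 and 5000, got {max_pages}")
    return {
        "startUrl": start_url,
        "maxPages": max_pages,
        "sameDomainOnly": same_domain_only,
        "includeFullContent": include_full_content,
    }
```

For example, `build_input("https://docs.example.com", max_pages=1000, include_full_content=False)` describes an outline-only crawl of up to 1,000 pages.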

### Output

The Actor produces two outputs:

1. **Key-value store** — `llms.txt` and `llms-full.txt` as downloadable Markdown
2. **Dataset** — a single row summarizing the run (pages crawled, byte counts, root URL)

Example `llms.txt` produced for `https://docs.example.com`:

```markdown
## Example Docs

> A comprehensive guide to the Example platform.

This is a machine-friendly summary of docs.example.com for AI agents and LLMs.
Full content of every page is in `llms-full.txt`.

### getting-started

- [Quickstart](https://docs.example.com/getting-started/quickstart) — Get up and running in 5 minutes.
- [Authentication](https://docs.example.com/getting-started/auth) — Setup API keys and OAuth.

### api

- [Reference](https://docs.example.com/api/reference) — Full REST API surface.
- [Webhooks](https://docs.example.com/api/webhooks) — Real-time event delivery.
````
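
To check a generated index before publishing it, the link lines can be read back with a few lines of Python. This is a rough sketch that assumes the layout shown in the example above (a heading per section, then `- [Title](URL)` lines with an optional description after a dash or colon). The real output may differ, and `parse_llms_txt` is an illustrative name, not part of the Actor.

```python
import re

# Matches "- [Title](url)", optionally followed by a separator and a description.
# \u2014 is the long dash used in the example above; ":" and "-" are accepted too.
LINK_LINE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*[\u2014:-]\s*(?P<description>.*))?$"
)


def parse_llms_txt(text: str) -> list[dict]:
    """Return one dict per link line, tagged with the nearest preceding heading."""
    entries = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Remember the most recent heading as the current section name.
            section = stripped.lstrip("#").strip()
            continue
        match = LINK_LINE.match(stripped)
        if match:
            entries.append({"section": section, **match.groupdict()})
    return entries
```

Each entry carries `section`, `title`, `url`, and `description` (which is `None` when a link has no description), so a quick loop over the result can flag missing descriptions or unexpected URLs.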

### Pricing

This Actor uses the Apify Pay-Per-Event model — you pay per page crawled.

- **First 100 pages**: free trial
- **Per-page rate**: $0.005 per page after that
- **Cost estimate**: at the per-page rate, 100 pages cost $0.50 and 1,000 pages cost $5, before the free first 100 pages are deducted
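
The arithmetic behind these figures can be sketched as follows. This is an estimate only: it assumes the free trial simply removes the first `free_pages` pages from the bill, and how the trial is applied (per run or per account) is not stated here. `estimate_cost` is a hypothetical helper, and Apify platform usage charges are not included.

```python
from decimal import Decimal

PER_PAGE_USD = Decimal("0.005")  # per-page rate quoted above


def estimate_cost(pages: int, free_pages: int = 0) -> Decimal:
    """Estimate the per-page charge in USD for a crawl of `pages` pages."""
    if pages < 0 or free_pages < 0:
        raise ValueError("pages and free_pages must be non-negative")
    # Pages covered by the free allowance are not billed.
    billable = max(pages - free_pages, 0)
    return billable * PER_PAGE_USD
```

With no free allowance, `estimate_cost(1000)` gives 5 USD. With the first 100 pages free, `estimate_cost(1000, free_pages=100)` gives 4.50 USD.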

### Tips

- **Big sites**: start with `maxPages: 100` to validate the crawl works on your domain, then bump up
- **JavaScript-heavy docs sites**: if `llms-full.txt` looks empty, the site may require browser rendering — open an issue and we'll add a Playwright variant
- **Auth-walled docs**: not supported yet; will refuse with a clear error
- **Image-heavy sites**: images are not included in the text output (use a separate image scraper if needed)

### FAQ

#### What is `llms.txt`?

`llms.txt` is a proposed standard (analogous to `robots.txt`) that tells LLMs how to consume your site. See https://llmstxt.org for the spec. Because it is still a proposal, how model providers use the file varies, but publishing it gives any tool that looks for `llms.txt` a clean, structured entry point to your docs.

#### Why does this matter for GEO (Generative Engine Optimization)?

When ChatGPT/Claude/Perplexity answer a question, they cite sources. Sites with `llms.txt` are easier to cite (clean structure, no scraping noise), so they get cited more often. More citations = more authority in the AI's training/retrieval corpus = more traffic over time.

#### How is this different from a regular sitemap.xml?

`sitemap.xml` is for crawlers indexing search results. `llms.txt` is for LLMs answering questions. Different consumers, different format (Markdown vs XML), different content (full text vs just URLs).

#### Does this comply with the target site's terms of service?

The Actor only crawls publicly available pages (no auth bypass, no rate-limit evasion). You're responsible for ensuring you have the right to crawl the target site — typically true for your own docs, your employer's docs, or open-source project docs.

### Support

Issues / feature requests: open in the **Issues** tab on the Apify console for this Actor.

Built by Ian Mu — github.com/ianymu — also author of [`verify-before-stop`](https://github.com/ianymu/claude-verify-before-stop), the open-source Claude Code Stop hook against "lies of completion".

# Actor input Schema

## `startUrl` (type: `string`):

The root URL of the documentation site to crawl. Examples: https://docs.anthropic.com, https://nextjs.org/docs, https://vercel.com/docs

## `maxPages` (type: `integer`):

Stop after this many pages. Larger docs sites can run into the thousands — start at 100 for a smoke test.

## `sameDomainOnly` (type: `boolean`):

Restrict crawl to the same hostname as startUrl. Recommended.

## `includeFullContent` (type: `boolean`):

If false, llms-full.txt only contains titles + headings (smaller, faster).

## Actor input object example

```json
{
  "startUrl": "https://docs.anthropic.com",
  "maxPages": 100,
  "sameDomainOnly": true,
  "includeFullContent": true
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrl": "https://docs.anthropic.com"
};

// Run the Actor and wait for it to finish
const run = await client.actor("ianymu/llms-txt-converter").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrl": "https://docs.anthropic.com" }

# Run the Actor and wait for it to finish
run = client.actor("ianymu/llms-txt-converter").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrl": "https://docs.anthropic.com"
}' |
apify call ianymu/llms-txt-converter --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ianymu/llms-txt-converter",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "llms.txt + llms-full.txt Generator for Any Docs Site",
        "description": "Crawl any documentation site and emit the 2026 GEO-standard llms.txt + llms-full.txt files. Makes your docs machine-friendly for ChatGPT / Claude / Perplexity / Gemini retrieval.",
        "version": "0.0",
        "x-build-id": "UPrM2dkovAHawFdNO"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ianymu~llms-txt-converter/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ianymu-llms-txt-converter",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ianymu~llms-txt-converter/runs": {
            "post": {
                "operationId": "runs-sync-ianymu-llms-txt-converter",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ianymu~llms-txt-converter/run-sync": {
            "post": {
                "operationId": "run-sync-ianymu-llms-txt-converter",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrl"
                ],
                "properties": {
                    "startUrl": {
                        "title": "Documentation root URL",
                        "type": "string",
                        "description": "The root URL of the documentation site to crawl. Examples: https://docs.anthropic.com, https://nextjs.org/docs, https://vercel.com/docs"
                    },
                    "maxPages": {
                        "title": "Max pages to crawl",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Stop after this many pages. Larger docs sites can run into the thousands — start at 100 for a smoke test.",
                        "default": 100
                    },
                    "sameDomainOnly": {
                        "title": "Same-domain only",
                        "type": "boolean",
                        "description": "Restrict crawl to the same hostname as startUrl. Recommended.",
                        "default": true
                    },
                    "includeFullContent": {
                        "title": "Include full page content in llms-full.txt",
                        "type": "boolean",
                        "description": "If false, llms-full.txt only contains titles + headings (smaller, faster).",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
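
For projects that call the REST API directly rather than through a client library, the `run-sync-get-dataset-items` path from the specification above can be used with nothing but the Python standard library. This is a minimal sketch under that specification: the token is passed as the `token` query parameter and the input as a JSON body. The helper names `build_run_request` and `run_and_get_items` are illustrative.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/ianymu~llms-txt-converter/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    url = f"{API_BASE}{ACTOR_PATH}?{urllib.parse.urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, actor_input: dict) -> list:
    """Send the request and decode the returned dataset items (network call)."""
    with urllib.request.urlopen(build_run_request(token, actor_input)) as resp:
        return json.load(resp)
```

A call such as `run_and_get_items("<YOUR_API_TOKEN>", {"startUrl": "https://docs.example.com", "maxPages": 100})` blocks until the run finishes. For long crawls, the asynchronous `/runs` path in the same specification may be a better fit.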
