# Site to llms.txt Generator (`devsef/site-to-llms-txt`) Actor

Generate a complete llms.txt file for any website in one run. Crawls up to 200 same-origin pages, extracts titles and meta descriptions, and outputs a clean, spec-compliant llms.txt that makes your site readable for AI assistants and agents.

- **URL**: https://apify.com/devsef/site-to-llms-txt.md
- **Developed by:** [Steffano van Hoven](https://apify.com/devsef) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### What does Site to llms.txt do?

**Site to llms.txt** crawls any website and generates a [llms.txt](https://llmstxt.org) file in one run. The llms.txt standard gives AI assistants a structured, machine-readable overview of your site — so tools like Claude, ChatGPT, and Perplexity can accurately answer questions about your content. Point the Actor at your docs, marketing site, or product pages, and receive a ready-to-publish `llms.txt` within minutes.

### Why use Site to llms.txt?

- **AI discoverability** — LLMs increasingly respect `llms.txt` the way search engines respect `robots.txt`. A well-structured file improves how AI tools cite and represent your content.
- **Zero setup** — no code, no CLI, no configuration files. Paste a URL and run.
- **Same-origin crawl** — only pages on your own domain are collected, so you stay in control.
- **Runs on Apify** — full API access, scheduling, webhook notifications, and run history out of the box.
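
The README does not spell out how the Actor decides that a link is same-origin. As a minimal sketch, assuming the standard web definition (same scheme, host, and port), the check could look like the following. The helper names `origin` and `is_same_origin` are hypothetical and not part of the Actor.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so https://example.com and https://example.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def is_same_origin(start_url: str, candidate_url: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    return origin(start_url) == origin(candidate_url)
```

Under this definition, `https://docs.apify.com/academy` is in scope for a crawl that starts at `https://docs.apify.com`, while `https://apify.com/store` is not, because the host differs.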

### How to use Site to llms.txt

1. Open the Actor in the Apify Console and click **Try for free**.
2. Enter the **Website URL** (e.g. `https://docs.yoursite.com`).
3. Optionally set **Max pages** (default 30, maximum 200).
4. Click **Start** and wait for the run to finish (typically under 2 minutes for 30 pages).
5. Download your `llms.txt` from the **Key-Value Store** output tab.

### Input

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `url` | string | yes | — | Start URL of the website to crawl |
| `maxPages` | integer | no | 30 | Maximum pages to crawl (1–200) |
| `siteName` | string | no | hostname | Overrides the H1 heading in llms.txt |
| `summary` | string | no | meta description | One-line site summary in llms.txt |

**Example input:**

```json
{
  "url": "https://docs.apify.com",
  "maxPages": 10
}
````
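
If you build the input in code, it can help to check it against the documented limits before starting a run. The sketch below mirrors the input table above (required `url`, `maxPages` between 1 and 200 with a default of 30). The function name `build_input` is hypothetical, and the requirement that `url` is an absolute http(s) URL is an assumption; the schema only says it is a string.

```python
from urllib.parse import urlsplit


def build_input(url, max_pages=30, site_name=None, summary=None):
    """Validate values against the documented limits and return the input dict."""
    parts = urlsplit(url)
    # Assumption: the Actor expects an absolute http(s) start URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    if isinstance(max_pages, bool) or not isinstance(max_pages, int):
        raise TypeError("maxPages must be an integer")
    if not 1 <= max_pages <= 200:
        raise ValueError("maxPages must be between 1 and 200")

    actor_input = {"url": url, "maxPages": max_pages}
    # Optional fields are left out when unset so the Actor applies its own defaults.
    if site_name:
        actor_input["siteName"] = site_name
    if summary:
        actor_input["summary"] = summary
    return actor_input
```

Calling `build_input("https://docs.apify.com", 10)` returns the same object as the example input above.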

### Output

The Actor produces two outputs:

**Key-Value Store — `llms.txt` (text/plain):** The generated file, ready to publish at `https://yoursite.com/llms.txt`.

**Dataset:** One row per run with `url`, `pagesCrawled`, and `llmsTxt` fields.
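
To fetch the generated file without the Console, you can read the record through the Apify API's key-value store record endpoint. This is a rough sketch using only the Python standard library. It assumes the record key is `llms.txt` (as the output description suggests) and that `store_id` is the run's `defaultKeyValueStoreId`; `record_url` and `download_llms_txt` are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, key: str = "llms.txt") -> str:
    """Build the Apify API URL of one key-value store record."""
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(key, safe='')}"
    )


def download_llms_txt(store_id: str, token: str, dest: str = "llms.txt") -> str:
    """Download the record, write it to `dest`, and return the file content."""
    request = Request(
        record_url(store_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30) as response:
        text = response.read().decode("utf-8")
    with open(dest, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text
```

The same text is also available in the `llmsTxt` field of the dataset row, so either route should give you the file.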

**Example `llms.txt` output (first 10 lines from a real run on docs.apify.com):**

```
# docs.apify.com

> Overview of docs.apify.com

## academy

- [Apify Academy | Academy | Apify Documentation](https://docs.apify.com/academy): Learn everything about web scraping and automation with our free courses that will turn you into an expert scraper developer.

## api

- [Apify API | Apify Documentation](https://docs.apify.com/api/v2): The Apify API (version 2) provides programmatic access to the Apify
```
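
If you want to post-process the file, for example to check which pages were included, the link lines are straightforward to parse. The sketch below assumes each entry has the form `- [title](url): description` as in the sample above, and it groups entries under the nearest preceding heading regardless of heading depth. `Entry` and `parse_entries` are hypothetical names, not part of the Actor.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    section: str
    title: str
    url: str
    description: str


HEADING = re.compile(r"^#+\s+(.*\S)\s*$")
LINK = re.compile(
    r"^-\s+\[(?P<title>.+?)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_entries(llms_txt: str) -> List[Entry]:
    """Return every link entry together with the heading it appears under."""
    entries = []
    section = ""
    for raw in llms_txt.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            # Any heading starts a new group; the site title is simply the first one.
            section = heading.group(1)
            continue
        link = LINK.match(line)
        if link:
            description = (link["desc"] or "").strip()
            entries.append(Entry(section, link["title"], link["url"], description))
    return entries
```

Lines that are neither headings nor link entries, such as the `>` summary line, are skipped.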

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

### Pricing

This Actor uses **pay-per-event** pricing: **1 event is charged per successfully generated llms.txt file**, regardless of how many pages were crawled. You are not charged for failed runs or runs that produced no output.

Check the Apify pricing page for the current cost per event. For most sites, a single run costs less than $0.01.

### Limitations

- **Same-origin only** — links to external domains are not followed.
- **Maximum 200 pages** — for larger sites, crawl sections separately and merge the results.
- **No JavaScript rendering** — pages that require JavaScript to load their content will return empty or partial data. Use a Playwright-based Actor for JS-heavy sites.
- **Meta description as summary** — if the homepage has no `<meta name="description">`, the summary falls back to a generic `Overview of <hostname>` line. Override it with the `summary` input field for a better result.
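
For sites above the 200-page limit, the per-section outputs need to be merged by hand. The following is one possible sketch, not the Actor's own logic: it keeps the title and summary of the first file, groups link lines by their section heading, and drops entries whose URL was already seen. The function name `merge_llms_txt` is hypothetical, and it assumes the first heading of each file is the site title.

```python
import re
from typing import Dict, List

LINK_URL = re.compile(r"^-\s+\[.+?\]\(([^)\s]+)\)")


def merge_llms_txt(files: List[str]) -> str:
    """Merge several llms.txt texts into one, de-duplicating entries by URL."""
    header: List[str] = []               # title and summary lines of the first file
    sections: Dict[str, List[str]] = {}  # section heading line -> link lines
    seen_urls = set()

    for index, text in enumerate(files):
        current = None       # heading line of the section being read
        title_seen = False   # the first heading in a file is treated as the title
        for raw in text.splitlines():
            line = raw.rstrip()
            if line.startswith("#"):
                if not title_seen:
                    title_seen = True
                    if index == 0:
                        header.append(line)
                else:
                    current = line
                    sections.setdefault(current, [])
                continue
            match = LINK_URL.match(line)
            if match and current is not None:
                if match.group(1) not in seen_urls:
                    seen_urls.add(match.group(1))
                    sections[current].append(line)
            elif index == 0 and current is None and line:
                header.append(line)  # e.g. the "> summary" line

    parts = ["\n\n".join(header)]
    for heading, links in sections.items():
        if links:
            parts.append(heading + "\n\n" + "\n".join(links))
    return "\n\n".join(parts) + "\n"
```

Sections keep the order in which they first appear, so passing the files in the order you want them published is enough to control the layout.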

### FAQ and support

**Is this legal?** The Actor only crawls pages your web server already serves publicly. It respects server-imposed limits (timeouts, connection errors). Always verify you have the right to crawl the target site.

**The summary says "Overview of ..." — why?** Your homepage does not have a `<meta name="description">` tag, so the Actor used its generic fallback. Set the `summary` input field to provide a better one manually.

**Can I automate this?** Yes — use the Apify scheduler to regenerate your `llms.txt` weekly, or trigger it via webhook whenever your docs are published.

For bugs or feature requests, open an issue in the Issues tab. For a custom enterprise solution, contact Apify support.

# Actor input Schema

## `url` (type: `string`):

Start URL of the website to generate llms.txt for, e.g. https://example.com

## `maxPages` (type: `integer`):

Maximum number of pages to crawl (1-200)

## `siteName` (type: `string`):

Overrides the H1 in llms.txt. Defaults to the hostname.

## `summary` (type: `string`):

One-line site summary. Defaults to the homepage meta description.

## Actor input object example

```json
{
  "url": "https://docs.apify.com",
  "maxPages": 30
}
```

# Actor output Schema

## `llmsTxtFile` (type: `string`):

No description

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://docs.apify.com"
};

// Run the Actor and wait for it to finish
const run = await client.actor("devsef/site-to-llms-txt").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "url": "https://docs.apify.com" }

# Run the Actor and wait for it to finish
run = client.actor("devsef/site-to-llms-txt").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://docs.apify.com"
}' |
apify call devsef/site-to-llms-txt --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=devsef/site-to-llms-txt",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Site to llms.txt Generator",
        "description": "Generate a complete llms.txt file for any website in one run. Crawls up to 200 same-origin pages, extracts titles and meta descriptions, and outputs a clean, spec-compliant llms.txt that makes your site readable for AI assistants and agents.",
        "version": "0.0",
        "x-build-id": "cMnuxUXrRsdbtU82H"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/devsef~site-to-llms-txt/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-devsef-site-to-llms-txt",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/devsef~site-to-llms-txt/runs": {
            "post": {
                "operationId": "runs-sync-devsef-site-to-llms-txt",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/devsef~site-to-llms-txt/run-sync": {
            "post": {
                "operationId": "run-sync-devsef-site-to-llms-txt",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "url"
                ],
                "properties": {
                    "url": {
                        "title": "Website URL",
                        "type": "string",
                        "description": "Start URL of the website to generate llms.txt for, e.g. https://example.com"
                    },
                    "maxPages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum number of pages to crawl (1-200)",
                        "default": 30
                    },
                    "siteName": {
                        "title": "Site name (optional)",
                        "type": "string",
                        "description": "Overrides the H1 in llms.txt. Defaults to the hostname."
                    },
                    "summary": {
                        "title": "Summary (optional)",
                        "type": "string",
                        "description": "One-line site summary. Defaults to the homepage meta description."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
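
For projects without a client library, the paths in the specification above can be called directly. The sketch below builds the `run-sync-get-dataset-items` URL and posts the input using only the Python standard library. The helper names `endpoint_url` and `run_sync` and the 300-second timeout are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "devsef/site-to-llms-txt"


def endpoint_url(token: str, endpoint: str = "run-sync-get-dataset-items") -> str:
    """Build the endpoint URL; the API path uses '~' instead of '/' in the Actor ID."""
    actor_path = ACTOR_ID.replace("/", "~")
    query = urlencode({"token": token})
    return f"{API_BASE}/acts/{actor_path}/{endpoint}?{query}"


def run_sync(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """POST the input and return the dataset items once the run has finished."""
    request = Request(
        endpoint_url(token),
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `run_sync("<YOUR_API_TOKEN>", {"url": "https://docs.apify.com"})` should return the dataset rows described in the Output section, with the generated text in the `llmsTxt` field.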
