# 📄 llms.txt Generator & Validator — AI Search Visibility (`upstanding_biobot/llms-txt-generator`) Actor

Generate llms.txt files for AI search visibility. Fetches sitemap, validates structure, checks AI bot access (OAI-SearchBot, PerplexityBot, ClaudeBot, Google-Extended). 2 blue ocean competitors only.

- **URL**: https://apify.com/upstanding_biobot/llms-txt-generator.md
- **Developed by:** [Alexander Maksimchuk](https://apify.com/upstanding_biobot) (community)
- **Categories:** SEO tools, AI, Developer tools
- **Stats:** 2 total users, 1 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 📄 llms.txt Generator & Validator

Generate `llms.txt` files from your sitemap and validate AI search engine visibility. Checks robots.txt for AI bot access (OAI-SearchBot, PerplexityBot, ClaudeBot, Google-Extended), scores your AI discoverability 0–100, and produces a ready-to-deploy `llms.txt` file.

### 📋 Table of Contents

- [What is this?](#-what-is-this)
- [How to use](#-how-to-use)
- [Input](#-input)
- [Output](#-output)
- [JSON output example](#-json-output-example)
- [How much does it cost?](#-how-much-does-it-cost)
- [Use cases](#-use-cases)
- [Integrations](#-integrations)
- [Is it legal?](#-is-it-legal)
- [Troubleshooting](#-troubleshooting)
- [FAQ](#-faq)
- [Tech stack](#-tech-stack)
- [Feedback & issues](#-feedback--issues)

### 🤔 What is this?

`llms.txt` is a new standard (llmstxt.org) that tells AI search engines — ChatGPT Search, Perplexity, Claude, Gemini — what your website is about and which pages matter most. It's like `robots.txt` but for AI citations.

This Actor:
1. **Generates** `llms.txt` from your sitemap — fetches all URLs, extracts page titles/descriptions, categorizes into sections, writes spec-compliant file
2. **Validates** existing `llms.txt` — checks structure, H1/blockquote presence, line count, link depth
3. **Checks** AI bot access in `robots.txt` — verifies OAI-SearchBot, PerplexityBot, ClaudeBot, Google-Extended are allowed (not blocked)
4. **Scores** AI discoverability 0–100 across 8 categories
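
The structural checks in item 2 can be pictured with a minimal sketch. This is not the Actor's actual validator: the function name `validate_llms_txt`, the issue strings, and the `max_lines` limit of 500 are all assumptions made for illustration, and the link-depth check is left out.

```python
import re


def validate_llms_txt(text: str, max_lines: int = 500) -> list[str]:
    """Return a list of issues; an empty list means nothing was flagged.

    `max_lines` is a hypothetical limit, not a value taken from the Actor.
    """
    issues = []
    lines = text.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]

    # The first non-empty line should be the H1 with the site name.
    if not non_empty or not non_empty[0].startswith("# "):
        issues.append("missing H1 title on first line")

    # A blockquote line ("> ...") should carry the site description.
    if not any(ln.startswith("> ") for ln in non_empty):
        issues.append("missing blockquote description")

    # Links are expected to be grouped under H2 sections.
    if not any(ln.startswith("## ") for ln in non_empty):
        issues.append("no H2 sections")
    if not re.search(r"\[[^\]]+\]\([^)]+\)", text):
        issues.append("no markdown links")

    if len(lines) > max_lines:
        issues.append(f"file has {len(lines)} lines (limit {max_lines})")
    return issues
```

A file that has the H1, blockquote, one H2 section, and one link returns an empty list.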

### 🚀 How to use

#### Step 1: Enter your URL

Provide your website root URL (e.g., `https://example.com`). The Actor auto-discovers your sitemap.
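
How the auto-discovery works internally is not documented here. As a rough sketch, assuming it looks at `Sitemap:` directives in robots.txt and then at the two conventional paths named in the Troubleshooting table, the candidate list could be built like this (`candidate_sitemap_urls` is a hypothetical helper, not part of the Actor):

```python
from urllib.parse import urljoin


def candidate_sitemap_urls(root_url: str, robots_txt: str = "") -> list[str]:
    """List sitemap URLs to try, in order, for a given site root."""
    candidates = []

    # Sitemap directives in robots.txt take priority when present.
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())

    # Fall back to the conventional locations.
    for path in ("/sitemap.xml", "/sitemap_index.xml"):
        url = urljoin(root_url, path)
        if url not in candidates:
            candidates.append(url)
    return candidates
```

The function only builds the list; fetching each candidate and stopping at the first valid XML response is left to the caller.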

#### Step 2: Provide site name + description

Enter a human-readable site name and one-sentence description. These become the H1 header and blockquote in your `llms.txt`.
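
To make the mapping concrete, here is a small sketch of how a site name, description, and categorized links could be assembled into the H1, blockquote, and H2 layout. The function `build_llms_txt` and its `sections` argument are illustrative assumptions, not the Actor's code.

```python
def build_llms_txt(site_name: str, description: str, sections: dict) -> str:
    """Assemble llms.txt text.

    `sections` maps a section title to a list of (link_title, url, note)
    tuples; `note` may be an empty string.
    """
    parts = [f"# {site_name}", "", f"> {description}"]
    for title, links in sections.items():
        parts += ["", f"## {title}", ""]
        for link_title, url, note in links:
            suffix = f": {note}" if note else ""
            parts.append(f"- [{link_title}]({url}){suffix}")
    return "\n".join(parts) + "\n"
```

For example, `build_llms_txt("Example Site", "A website about...", {"Home": [("Home", "https://example.com", "Main page")]})` yields a file whose first line is `# Example Site` and whose third line is `> A website about...`.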

#### Step 3: Run

Click **Start**. The Actor:
- Fetches + parses your sitemap
- Extracts page metadata via headless browser
- Generates `llms.txt` with categorized sections
- Checks robots.txt for AI bot access
- Checks for AI discovery endpoints (`/.well-known/ai.txt`, `/ai/summary.json`)
- Scores your AI visibility 0–100
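
The robots.txt step can be approximated locally with Python's standard `urllib.robotparser`. This is a sketch under the assumption that "allowed" means the bot may fetch the site root; the Actor may apply different rules. `check_ai_bot_access` is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ("OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended")


def check_ai_bot_access(robots_txt: str, url: str = "/") -> dict[str, bool]:
    """Map each AI bot name to True if robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}
```

Passing the text of a robots.txt that disallows `/` for `PerplexityBot` and allows everything for `*` marks only `PerplexityBot` as blocked.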

#### Step 4: Download

- **Key-value store: `llms.txt`** — Ready-to-deploy file
- **Key-value store: `OUTPUT`** — Full analysis JSON with score + recommendations
- **Dataset** — Summary row with score, grade, issues
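
The two key-value store records can also be read programmatically. The sketch below assumes a client object shaped like the official Python `ApifyClient` (a `key_value_store(id)` accessor whose `get_record(key)` returns a dict with a `value` entry, or `None` when the record is missing) and a finished run dict containing `defaultKeyValueStoreId`. `fetch_outputs` is a hypothetical helper.

```python
def fetch_outputs(client, run: dict) -> dict:
    """Read the `llms.txt` and `OUTPUT` records from a run's default store."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    results = {}
    for key in ("llms.txt", "OUTPUT"):
        record = store.get_record(key)  # None when the key does not exist
        results[key] = record["value"] if record else None
    return results
```

With the official client this would be called as `fetch_outputs(ApifyClient("<YOUR_API_TOKEN>"), run)`, where `run` is the value returned by `.call(...)`.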

### 📥 Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | **required** | Root URL (e.g., `https://example.com`) |
| `siteName` | string | hostname | Human-readable site name |
| `siteDescription` | string | `"Website at..."` | One-sentence description |
| `maxUrls` | integer | 50 | Max URLs in llms.txt (5–500, keep under 200) |
| `validateOnly` | boolean | false | Only check existing llms.txt + AI bot access. Don't generate. |
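
When building the input in code, it can help to check the documented constraints before starting a run. The helper below is a sketch: `build_input` is a hypothetical name, and the `http(s)` prefix check is an added assumption rather than a documented rule. The 5 to 500 range for `maxUrls` comes from the table above.

```python
def build_input(url, site_name=None, site_description=None,
                max_urls=50, validate_only=False) -> dict:
    """Build an Actor input dict, rejecting values outside documented bounds."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if not 5 <= max_urls <= 500:
        raise ValueError("maxUrls must be between 5 and 500")

    run_input = {"url": url, "maxUrls": max_urls, "validateOnly": validate_only}
    # Optional fields are omitted so the Actor applies its own defaults.
    if site_name:
        run_input["siteName"] = site_name
    if site_description:
        run_input["siteDescription"] = site_description
    return run_input
```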

### 📤 Output

| Field | Type | Description |
|-------|------|-------------|
| `llmsTxt` | string | Generated llms.txt content |
| `llmsFullTxt` | string | Same as llmsTxt (downloadable) |
| `robotsAnalysis` | object | AI bot access: which bots allowed/blocked |
| `existingLlmsTxt` | object | Analysis of existing llms.txt (if found) |
| `aiDiscovery` | object | AI discovery endpoints check |
| `sitemapEntries` | array | URLs with title, description, section |
| `score` | integer | AI visibility score 0–100 |
| `grade` | string | Critical / Foundation / Good / Excellent |
| `recommendations` | array | Prioritized action items |

### 📄 JSON output example

```json
{
  "websiteUrl": "https://example.com",
  "generatedAt": "2026-07-04T04:30:00.000Z",
  "llmsTxt": "# Example Site\n\n> A website about...\n\n## Home\n\n- [Home](https://example.com): Main page\n\n## Blog\n\n- [Latest Posts](https://example.com/blog): Blog index\n\n## Links\n\n- [XML Sitemap](https://example.com/sitemap.xml)\n- [robots.txt](https://example.com/robots.txt)",
  "sitemapUrlsFound": 47,
  "sitemapUrlsIncluded": 47,
  "robotsAnalysis": {
    "aiBots": {
      "OAI-SearchBot": true,
      "PerplexityBot": true,
      "ClaudeBot": true,
      "Google-Extended": true,
      "GPTBot": false,
      "anthropic-ai": false
    },
    "blockingTraining": ["GPTBot", "anthropic-ai"],
    "allowingCitation": ["OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]
  },
  "existingLlmsTxt": {
    "exists": false,
    "issues": ["llms.txt not found"]
  },
  "aiDiscovery": {
    "hasAiTxt": false,
    "hasSummaryJson": false,
    "hasFaqJson": false
  },
  "score": 50,
  "grade": "Foundation",
  "recommendations": [
    "Create llms.txt file — this Actor can generate one for you",
    "Add /.well-known/ai.txt for AI agent discovery",
    "Add /ai/summary.json for AI summary access"
  ]
}
````
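
A result shaped like the example above can be condensed into a short report. This sketch relies only on the field names shown in that example (`score`, `grade`, `robotsAnalysis.aiBots`, `recommendations`); `summarize` is a hypothetical helper.

```python
def summarize(output: dict) -> str:
    """Turn an OUTPUT-style dict into a few lines of plain text."""
    bots = output.get("robotsAnalysis", {}).get("aiBots", {})
    blocked = sorted(name for name, allowed in bots.items() if not allowed)

    lines = [
        f"Score: {output.get('score')} ({output.get('grade')})",
        f"Blocked bots: {', '.join(blocked) or 'none'}",
    ]
    lines += [f"- {rec}" for rec in output.get("recommendations", [])]
    return "\n".join(lines)
```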

### 💰 How much does it cost?

**$2 per website.** Includes sitemap fetch, page metadata extraction, llms.txt generation, robots.txt analysis, AI discovery check, and scoring. No subscription.

#### Cost examples

| Use case | Sites | Cost |
|----------|-------|------|
| Single site llms.txt | 1 | $2 |
| Agency (10 clients) | 10 | $20 |
| SEO audit portfolio (50 sites) | 50 | $100 |

### 🎯 Use cases

- **AI SEO preparation** — Get your site ready for ChatGPT Search, Perplexity, and Google AI Overviews citations
- **Periodic audits** — Track your AI visibility score over time
- **Agency deliverable** — Generate llms.txt for clients as part of SEO package
- **Competitive analysis** — Run on competitor sites to see their AI visibility setup
- **Pre-launch check** — Verify AI bot access before deploying a new site

### 🔗 Integrations

#### AI agents (MCP)

Works with Apify MCP server (`https://mcp.apify.com`). AI agents can run this Actor to check any website's AI visibility.

#### Chain with ADA Scanner

Run alongside our [ADA Compliance Checker](https://apify.com/upstanding_biobot/ada-wcag-compliance-scan) for a full AI+accessibility audit.

#### Automation

- **Zapier** — Generate llms.txt on new site deployment → push to GitHub
- **Make** — Schedule monthly AI visibility audits → email report
- **Custom** — Use Apify webhooks to trigger CI/CD llms.txt deployment

### ⚖️ Is it legal?

**Yes.** This Actor fetches public web pages (sitemap.xml, robots.txt) and generates a static text file. It does not modify, hack, or breach any website. `llms.txt` is an open standard (llmstxt.org).

### 🔧 Troubleshooting

| Problem | Cause | Fix |
|---------|-------|-----|
| **0 URLs found** | Sitemap not found or empty | Check for sitemap at `/sitemap.xml` or `/sitemap_index.xml`. Ensure robots.txt has Sitemap directive |
| **Score is 0** | No AI bot access + no llms.txt | Allow AI citation bots in robots.txt. Generate llms.txt |
| **Slow generation** | Many URLs to extract | Reduce `maxUrls` (default 50). Each URL requires browser page load for metadata |
| **Empty llms.txt** | Sitemap found but no URLs | Check sitemap XML is valid. Try `validateOnly: true` first |

### ❓ FAQ

<details>
<summary><strong>What is llms.txt?</strong></summary>

`llms.txt` is a text file at your website root (like robots.txt) that tells AI search engines what your site is about and which pages matter. Spec: llmstxt.org. Structure: H1 (site name) → blockquote (description) → H2 sections with descriptive links.

</details>

<details>
<summary><strong>Which AI bots should I allow?</strong></summary>

**Citation bots** (allow these): `OAI-SearchBot` (ChatGPT Search), `PerplexityBot` (Perplexity), `ClaudeBot` (Claude), `Google-Extended` (Gemini AI Overviews).

**Training bots** (block if desired): `GPTBot` (OpenAI training), `anthropic-ai` (Anthropic training). Block these to prevent your content being used for model training while still allowing AI search citations.

</details>
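
As an illustration of the split described in the answer above, the following sketch emits robots.txt groups that allow the citation bots and, optionally, block the training bots. The bot lists are copied from that answer; `robots_rules` is a hypothetical helper, and the output should be reviewed against your existing robots.txt before deploying.

```python
CITATION_BOTS = ("OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended")
TRAINING_BOTS = ("GPTBot", "anthropic-ai")


def robots_rules(block_training: bool = True) -> str:
    """Return robots.txt groups for the AI bots named in this README."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in CITATION_BOTS]
    if block_training:
        groups += [f"User-agent: {bot}\nDisallow: /" for bot in TRAINING_BOTS]
    return "\n\n".join(groups) + "\n"
```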

<details>
<summary><strong>Does this generate /ai/summary.json?</strong></summary>

No — this Actor checks for its existence but doesn't create it. `/ai/summary.json` is a separate standard for providing AI agents a structured summary of your site. This Actor focuses on `llms.txt` generation + AI bot access validation.

</details>

### 🛠️ Tech stack

- **[Apify SDK](https://docs.apify.com/sdk/js/)** — Platform integration
- **[Playwright](https://playwright.dev/)** — Headless Chromium for page metadata extraction
- **[llmstxt.org](https://llmstxt.org)** — llms.txt specification

### 💬 Feedback & issues

- **Apify Console** — Issues tab on this Actor's page
- **Review** — Rate on Store page

# Actor input Schema

## `url` (type: `string`):

Root URL of the website (e.g., https://example.com). Sitemap will be auto-discovered.

## `siteName` (type: `string`):

Human-readable site name for llms.txt header.

## `siteDescription` (type: `string`):

One-sentence description of what the site does.

## `maxUrls` (type: `integer`):

Maximum number of URLs to include. Keep under 200 for best results.

## `validateOnly` (type: `boolean`):

If true, only check for existing llms.txt and AI bot access. Don't generate new file.

## Actor input object example

```json
{
  "url": "https://example.com",
  "siteName": "Example Site",
  "siteDescription": "A website about...",
  "maxUrls": 50,
  "validateOnly": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://example.com",
    "siteName": "Example Site",
    "siteDescription": "A website about..."
};

// Run the Actor and wait for it to finish
const run = await client.actor("upstanding_biobot/llms-txt-generator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://example.com",
    "siteName": "Example Site",
    "siteDescription": "A website about...",
}

# Run the Actor and wait for it to finish
run = client.actor("upstanding_biobot/llms-txt-generator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://example.com",
  "siteName": "Example Site",
  "siteDescription": "A website about..."
}' |
apify call upstanding_biobot/llms-txt-generator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=upstanding_biobot/llms-txt-generator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "📄 llms.txt Generator & Validator — AI Search Visibility",
        "description": "Generate llms.txt files for AI search visibility. Fetches sitemap, validates structure, checks AI bot access (OAI-SearchBot, PerplexityBot, ClaudeBot, Google-Extended). 2 blue ocean competitors only.",
        "version": "1.0",
        "x-build-id": "jCD31ZwNDEVDdBkqs"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/upstanding_biobot~llms-txt-generator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-upstanding_biobot-llms-txt-generator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/upstanding_biobot~llms-txt-generator/runs": {
            "post": {
                "operationId": "runs-sync-upstanding_biobot-llms-txt-generator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/upstanding_biobot~llms-txt-generator/run-sync": {
            "post": {
                "operationId": "run-sync-upstanding_biobot-llms-txt-generator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "url"
                ],
                "properties": {
                    "url": {
                        "title": "Website URL",
                        "type": "string",
                        "description": "Root URL of the website (e.g., https://example.com). Sitemap will be auto-discovered."
                    },
                    "siteName": {
                        "title": "Site name",
                        "type": "string",
                        "description": "Human-readable site name for llms.txt header."
                    },
                    "siteDescription": {
                        "title": "Site description",
                        "type": "string",
                        "description": "One-sentence description of what the site does."
                    },
                    "maxUrls": {
                        "title": "Max URLs in llms.txt",
                        "minimum": 5,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of URLs to include. Keep under 200 for best results.",
                        "default": 50
                    },
                    "validateOnly": {
                        "title": "Validate only (don't generate)",
                        "type": "boolean",
                        "description": "If true, only check for existing llms.txt and AI bot access. Don't generate new file.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
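
For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with the standard library, using the server URL, path, and `token` query parameter from the spec; `build_run_sync_request` is a hypothetical helper, and the shape of the response body is not defined by the spec.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "upstanding_biobot~llms-txt-generator"


def build_run_sync_request(token: str, run_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response with `json.load`. Synchronous runs can take a while, so a generous timeout is advisable.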
