Pricing

Pay per usage

llms.txt + llms-full.txt Generator for Any Docs Site

Crawl any documentation site and emit the 2026 GEO-standard llms.txt + llms-full.txt files. Makes your docs machine-friendly for ChatGPT / Claude / Perplexity / Gemini retrieval.

Developer

Yanlong Mu

Last modified

5 hours ago

llms.txt + llms-full.txt Generator

The 2026 GEO standard, generated for any documentation site in seconds.

What does this Actor do?

This Actor crawls any documentation website you give it and produces two files:

llms.txt — a concise, machine-friendly index of the site's pages, headings, and descriptions
llms-full.txt — the full text content of every page concatenated into a single retrieval-friendly Markdown file

These files follow the llms.txt proposal (https://llmstxt.org), which is designed to help AI assistants such as ChatGPT, Claude, Perplexity, Gemini, and Bing Copilot find and cite your documentation. Without them, assistants have to work from raw HTML. With them, your docs are easier to retrieve and cite as an authoritative source when users ask questions.

Why use this Actor?

Few documentation sites have shipped llms.txt yet, so early adopters stand to gain a head start in AI citations for their topic. This Actor lets you ship both files in minutes instead of hand-writing them.

Business use cases:

SaaS vendors — make your API docs cited by ChatGPT when users ask "how do I do X with Y?"
Open-source maintainers — get your project recommended over competitors when a user asks Claude for "the best library for X"
Internal knowledge bases — feed your company's wiki into LLM-powered tools that respect llms.txt
AI-first agencies — generate llms.txt for your clients as a productized service ($50-500 per site)

How to use

1. Paste your docs root URL in the Documentation root URL field (e.g., https://docs.example.com)
2. Set Max pages (start at 100 for a smoke test, raise to 1000+ for full sites)
3. Click Start
4. Wait for the crawl to finish (typically 1-10 min for 100 pages)
5. Download llms.txt and llms-full.txt from the Storage tab → Key-Value Store
6. Upload both files to your site's root (e.g., https://yoursite.com/llms.txt)
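
For the upload step, a small Python sketch can compute where the two files are expected to live once published. The helper name `published_urls` is hypothetical and not part of the Actor; it only assumes the files sit at the site root, as in the example URL above.

```python
from urllib.parse import urlsplit, urlunsplit


def published_urls(site_url: str) -> dict[str, str]:
    """Return the root-level URLs for llms.txt and llms-full.txt.

    Hypothetical helper: any path on the site is accepted, and only the
    scheme and host are kept, because both files belong at the site root.
    """
    parts = urlsplit(site_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {site_url!r}")
    root = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    return {name: f"{root}/{name}" for name in ("llms.txt", "llms-full.txt")}


print(published_urls("https://yoursite.com/docs/intro"))
```

Opening both URLs in a browser after uploading is a quick way to confirm the files are served from the root rather than from a subdirectory.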

Input

startUrl (required) — the root URL of the documentation site
maxPages — stop after this many pages (default 100, max 5000)
sameDomainOnly — restrict crawl to the same hostname (default true)
includeFullContent — include page bodies in llms-full.txt (default true; set false for outline-only)
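
As an illustration, the fields above can be assembled into an input object like this. It is a minimal sketch: the field names and defaults come from the list above, while the `build_input` helper and the lower bound of 1 for `maxPages` are assumptions made for the example.

```python
import json
from urllib.parse import urlsplit

MAX_PAGES_LIMIT = 5000  # documented upper bound for maxPages


def build_input(start_url: str, max_pages: int = 100,
                same_domain_only: bool = True,
                include_full_content: bool = True) -> dict:
    """Build an input object using the documented field names and defaults."""
    parts = urlsplit(start_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"startUrl must be an absolute http(s) URL: {start_url!r}")
    # The lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= max_pages <= MAX_PAGES_LIMIT:
        raise ValueError(f"maxPages must be between 1 and {MAX_PAGES_LIMIT}")
    return {
        "startUrl": start_url,
        "maxPages": max_pages,
        "sameDomainOnly": same_domain_only,
        "includeFullContent": include_full_content,
    }


print(json.dumps(build_input("https://docs.example.com", max_pages=1000), indent=2))
```

The printed JSON can be pasted into the Actor's input editor in place of filling in the form fields one by one.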

Output

The Actor produces two outputs:

Key-value store — llms.txt and llms-full.txt as downloadable Markdown
Dataset — a single row summarizing the run (pages crawled, byte counts, root URL)

Example llms.txt produced for https://docs.example.com:

# Example Docs

> A comprehensive guide to the Example platform.

This is a machine-friendly summary of docs.example.com for AI agents and LLMs.
Full content of every page is in `llms-full.txt`.

## getting-started

- [Quickstart](https://docs.example.com/getting-started/quickstart) — Get up and running in 5 minutes.
- [Authentication](https://docs.example.com/getting-started/auth) — Set up API keys and OAuth.

## api

- [Reference](https://docs.example.com/api/reference) — Full REST API surface.
- [Webhooks](https://docs.example.com/api/webhooks) — Real-time event delivery.
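
To make the structure of this example concrete, here is a rough Python sketch that renders the same shape from a list of pages. It is not the Actor's actual implementation: the `render_llms_txt` function, the rule of grouping by the first URL path segment, and the `root` fallback section are assumptions inferred from the example, and the intro paragraph is omitted for brevity.

```python
from collections import defaultdict
from urllib.parse import urlsplit

SEPARATOR = " \u2014 "  # the dash style used between link and description above


def render_llms_txt(site_title, summary, pages):
    """Render an llms.txt-style index from (title, url, description) tuples."""
    sections = defaultdict(list)
    for title, url, description in pages:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # Assumed grouping rule: the first path segment names the section;
        # top-level pages fall into a hypothetical "root" section.
        section = segments[0] if len(segments) > 1 else "root"
        sections[section].append(f"- [{title}]({url}){SEPARATOR}{description}")

    lines = [f"# {site_title}", "", f"> {summary}", ""]
    for section, entries in sections.items():  # insertion order is preserved
        lines += [f"## {section}", "", *entries, ""]
    return "\n".join(lines)


pages = [
    ("Quickstart", "https://docs.example.com/getting-started/quickstart",
     "Get up and running in 5 minutes."),
    ("Reference", "https://docs.example.com/api/reference",
     "Full REST API surface."),
]
print(render_llms_txt("Example Docs",
                      "A comprehensive guide to the Example platform.", pages))
```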

Pricing

This Actor uses the Apify Pay-Per-Event model — you pay per page crawled.

First 100 pages: free trial
Per-page rate: $0.005 per page after that
Cost estimate at the per-page rate, not counting the free pages: $0.50 for a 100-page site, $5 for a 1,000-page site
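
The arithmetic behind these estimates can be sketched as follows. The per-page rate and the 100 free pages come from the list above; how the free pages are applied (treated here as an allowance that may or may not still be available) is an assumption, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

PER_PAGE_RATE = Decimal("0.005")  # USD per crawled page, from the pricing list
FREE_TRIAL_PAGES = 100            # free pages, from the pricing list


def estimate_cost(pages: int, free_pages_remaining: int = 0) -> Decimal:
    """Estimate the cost in USD of crawling `pages` pages.

    Assumption: free pages are subtracted before billing; pass 0 once the
    free trial has been used up.
    """
    billable = max(0, pages - free_pages_remaining)
    return billable * PER_PAGE_RATE


print(estimate_cost(100))                                          # 0.500
print(estimate_cost(1000))                                         # 5.000
print(estimate_cost(1000, free_pages_remaining=FREE_TRIAL_PAGES))  # 4.500
```

`Decimal` is used instead of `float` so that fractions of a cent do not pick up rounding noise.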

Tips

Big sites: start with maxPages: 100 to validate the crawl works on your domain, then bump up
JavaScript-heavy docs sites: if llms-full.txt looks empty, the site may require browser rendering — open an issue and we'll add a Playwright variant
Auth-walled docs: not supported yet; the Actor refuses them with a clear error
Image-heavy sites: images are not included in the text output (use a separate image scraper if needed)

FAQ

What is `llms.txt`?

llms.txt is a proposed standard (analogous to robots.txt) that tells LLMs how to consume your site. See https://llmstxt.org for the spec. Because it is still a proposal, support differs between model providers; publishing the files gives those that read them a clean, structured source to answer from.

Why does this matter for GEO (Generative Engine Optimization)?

When ChatGPT/Claude/Perplexity answer a question, they cite sources. Sites with llms.txt are easier to cite (clean structure, no scraping noise), so they get cited more often. More citations = more authority in the AI's training/retrieval corpus = more traffic over time.

How is this different from a regular sitemap.xml?

sitemap.xml is for crawlers indexing pages for search results. llms.txt is for LLMs answering questions. They differ in consumer, in format (XML vs. Markdown), and in content (just URLs vs. titles and descriptions, plus full page text in llms-full.txt).
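
The difference shows up when you parse the two formats. The sketch below uses only the Python standard library; the sample documents are made up for illustration, and the link pattern assumes the one-line `- [title](url)` entries shown in the example llms.txt earlier.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Matches "- [Title](url)" followed by an optional ":", "-" or long-dash
# separator and a free-text description.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?:\s*[:\u2014-]\s*)?(?P<desc>.*)$"
)


def sitemap_urls(xml_text: str) -> list[str]:
    """A sitemap yields bare URLs and nothing else."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def llms_txt_links(md_text: str) -> list[tuple[str, str, str]]:
    """An llms.txt entry also carries a title and a description."""
    matches = (LINK_RE.match(line) for line in md_text.splitlines())
    return [(m["title"], m["url"], m["desc"]) for m in matches if m]


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/api/reference</loc></url>
</urlset>"""
LLMS_TXT = ("## api\n\n- [Reference](https://docs.example.com/api/reference)"
            " \u2014 Full REST API surface.\n")

print(sitemap_urls(SITEMAP))
print(llms_txt_links(LLMS_TXT))
```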

Does this comply with the target site's terms of service?

The Actor only crawls publicly available pages (no auth bypass, no rate-limit evasion). You're responsible for ensuring you have the right to crawl the target site — typically true for your own docs, your employer's docs, or open-source project docs.

Support

Issues / feature requests: open in the Issues tab on the Apify console for this Actor.

Built by Ian Mu — github.com/ianymu — also author of verify-before-stop, the open-source Claude Code Stop hook against "lies of completion".
