llms.txt + llms-full.txt Generator for Any Docs Site

Crawl any documentation site and emit the 2026 GEO-standard llms.txt + llms-full.txt files, making your docs machine-friendly for ChatGPT / Claude / Perplexity / Gemini retrieval.

Pricing

Pay per usage

Developer

Yanlong Mu

Maintained by Community

llms.txt + llms-full.txt Generator

The 2026 GEO standard, generated for any documentation site in minutes.

What does this Actor do?

This Actor crawls any documentation website you give it and produces two files:

  • llms.txt — a concise, machine-friendly index of the site's pages, headings, and descriptions
  • llms-full.txt — the full text content of every page concatenated into a single retrieval-friendly Markdown file

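llms.txt is a proposed convention (see https://llmstxt.org) that AI assistants such as ChatGPT, Claude, Perplexity, Gemini, and Bing Copilot can use to find and cite your documentation. Without these files, assistants have to work from raw HTML, which is noisier and harder to cite. With them, your docs are available in a clean form that is easier to treat as an authoritative source when users ask questions.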
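
The two files described above could be assembled from crawled pages roughly as follows. This is a minimal sketch, not the Actor's actual implementation: the `Page` record, the grouping of links by first URL path segment, and the `---` separator between pages are all assumptions made for illustration. Link lines use the `- [title](url): description` form from the llmstxt.org proposal.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Page:
    """One crawled page (hypothetical record shape, not the Actor's schema)."""
    url: str
    title: str
    description: str
    body: str  # page content, assumed to be already converted to Markdown


def section_of(url: str) -> str:
    """Name a section after the first URL path segment (an assumption)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "root"


def render_llms_txt(site_title: str, summary: str, pages: list[Page]) -> str:
    """Build the concise index: title, summary, then links grouped by section."""
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(section_of(page.url), []).append(page)

    lines = [f"# {site_title}", "", f"> {summary}", ""]
    for name, members in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        for page in members:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def render_llms_full_txt(pages: list[Page]) -> str:
    """Concatenate every page body into one Markdown document."""
    chunks = [
        f"# {p.title}\n\nSource: {p.url}\n\n{p.body.strip()}\n" for p in pages
    ]
    return "\n---\n\n".join(chunks)
```

Keeping the index and the full-text file as two separate renderers reflects the split in purpose: the index stays small enough to scan quickly, while the full-text file carries the content itself.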

Why use this Actor?

Few documentation sites have shipped llms.txt yet, so sites that publish early are well placed to gain AI citation share for their topic. This Actor lets you ship both files in minutes instead of hand-writing them.

Business use cases:

  • SaaS vendors — make your API docs cited by ChatGPT when users ask "how do I do X with Y?"
  • Open-source maintainers — get your project recommended over competitors when a user asks Claude for "the best library for X"
  • Internal knowledge bases — feed your company's wiki into LLM-powered tools that respect llms.txt
  • AI-first agencies — generate llms.txt for your clients as a productized service ($50-500 per site)

How to use

  1. Paste your docs root URL in the Documentation root URL field (e.g. https://docs.example.com)
  2. Set Max pages (start at 100 for a smoke test, raise to 1000+ for full sites)
  3. Click Start
  4. Wait for the crawl to finish (typically 1-10 min for 100 pages)
  5. Download llms.txt and llms-full.txt from the Storage tab → Key-Value Store
  6. Upload both files to your site's root (e.g., https://yoursite.com/llms.txt)

Input

  • startUrl (required) — the root URL of the documentation site
  • maxPages — stop after this many pages (default 100, max 5000)
  • sameDomainOnly — restrict crawl to the same hostname (default true)
  • includeFullContent — include page bodies in llms-full.txt (default true; set false for outline-only)
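
The input fields above can be checked before starting a run. The sketch below is illustrative only: the field names and defaults come from the list above, while the helper names `build_run_input` and `same_host`, the lower bound of 1 for `maxPages`, and the exact hostname comparison are assumptions, not the Actor's own validation logic.

```python
from urllib.parse import urlparse

MAX_PAGES_LIMIT = 5000  # documented maximum for maxPages


def build_run_input(start_url, max_pages=100, same_domain_only=True,
                    include_full_content=True):
    """Return an input dict using the documented field names and defaults."""
    parsed = urlparse(start_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("startUrl must be an absolute http(s) URL")
    # The lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= max_pages <= MAX_PAGES_LIMIT:
        raise ValueError(f"maxPages must be between 1 and {MAX_PAGES_LIMIT}")
    return {
        "startUrl": start_url,
        "maxPages": max_pages,
        "sameDomainOnly": same_domain_only,
        "includeFullContent": include_full_content,
    }


def same_host(start_url, candidate_url):
    """Approximate the sameDomainOnly rule by comparing hostnames."""
    return urlparse(start_url).hostname == urlparse(candidate_url).hostname
```

For example, `build_run_input("https://docs.example.com", max_pages=100)` yields the smoke-test configuration suggested in the steps above, and `same_host` shows why a link to a blog on a different subdomain would be skipped when `sameDomainOnly` is true.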

Output

The Actor produces two outputs:

  1. Key-value store — llms.txt and llms-full.txt as downloadable Markdown
  2. Dataset — a single row summarizing the run (pages crawled, byte counts, root URL)

Example llms.txt produced for https://docs.example.com:

# Example Docs

> A comprehensive guide to the Example platform.

This is a machine-friendly summary of docs.example.com for AI agents and LLMs.
Full content of every page is in `llms-full.txt`.

## getting-started

- [Quickstart](https://docs.example.com/getting-started/quickstart) — Get up and running in 5 minutes.
- [Authentication](https://docs.example.com/getting-started/auth) — Set up API keys and OAuth.

## api

- [Reference](https://docs.example.com/api/reference) — Full REST API surface.
- [Webhooks](https://docs.example.com/api/webhooks) — Real-time event delivery.
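
A consumer could read a file shaped like the example above with a small parser. The following is a rough sketch under stated assumptions: `parse_llms_txt` is a hypothetical helper, and it accepts either the dash separator shown in the example or the colon separator used in the llmstxt.org proposal. It only extracts the title, the summary, and the links under each `##` heading.

```python
import re

# "- [Title](url)" optionally followed by a dash or colon and a description.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)"
    r"(?:\s*[\u2014:]\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Extract the title, summary, and per-section links from llms.txt text."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:].strip()
        else:
            match = LINK_RE.match(line)
            # Links that appear before any "##" heading are ignored here.
            if match and current is not None:
                result["sections"][current].append({
                    "title": match.group("title"),
                    "url": match.group("url"),
                    "description": match.group("desc") or "",
                })
    return result
```

Running this over the downloaded file is also a quick way to confirm that every section contains the links you expect before uploading it.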

Pricing

This Actor uses the Apify Pay-Per-Event model — you pay per page crawled.

  • First 100 pages: free trial
  • Per-page rate: $0.005 per page after that
  • Cost estimate: at the per-page rate, $0.50 for 100 billable pages and $5 for 1,000 billable pages (pages covered by the free trial are not charged)
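
The pricing arithmetic can be worked through in a few lines. This is a small sketch, assuming the $0.005 per-page rate listed above. How the free trial is applied (per run or once per account) is not stated here, so `free_pages` is left as a caller-supplied assumption that defaults to zero.

```python
from decimal import Decimal

PER_PAGE_RATE = Decimal("0.005")  # USD per page, from the pricing list


def estimate_cost(pages, free_pages=0):
    """Estimate the charge in USD for a crawl of `pages` pages.

    `free_pages` is the number of pages assumed to be covered by the
    free trial; it is an assumption of this sketch, not platform behavior.
    """
    if pages < 0 or free_pages < 0:
        raise ValueError("page counts must be non-negative")
    billable = max(pages - free_pages, 0)
    return billable * PER_PAGE_RATE
```

With no free allowance, `estimate_cost(100)` comes to $0.50 and `estimate_cost(1000)` to $5.00. If 100 free pages still apply, `estimate_cost(1000, free_pages=100)` comes to $4.50.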

Tips

  • Big sites: start with maxPages: 100 to validate the crawl works on your domain, then bump up
  • JavaScript-heavy docs sites: if llms-full.txt looks empty, the site may require browser rendering — open an issue and we'll add a Playwright variant
  • Auth-walled docs: not supported yet; the Actor stops with a clear error
  • Image-heavy sites: images are not included in the text output (use a separate image scraper if needed)
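
For the JavaScript-heavy case above, "looks empty" can be turned into a rough automatic check. The sketch below is a heuristic under stated assumptions: `looks_unrendered` and the 200-character threshold are hypothetical, and the page count is taken to be the figure reported in the run's dataset row.

```python
def looks_unrendered(full_text, pages_crawled, min_chars_per_page=200):
    """Guess whether llms-full.txt is suspiciously thin for the crawl size.

    Returns True when the average amount of non-whitespace text per page
    falls below `min_chars_per_page` (an arbitrary illustrative threshold).
    """
    if pages_crawled <= 0:
        return True
    visible_chars = len("".join(full_text.split()))
    return visible_chars / pages_crawled < min_chars_per_page
```

A True result is only a hint that the site may need browser rendering; short but legitimate pages can trigger it too, so the threshold should be tuned per site.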

FAQ

What is llms.txt?

llms.txt is a proposed standard (analogous to robots.txt) that tells LLMs how to consume your site. See https://llmstxt.org for the spec. Because it is a proposal rather than a ratified standard, how much weight a given model provider gives it when answering user questions can vary.

Why does this matter for GEO (Generative Engine Optimization)?

When ChatGPT/Claude/Perplexity answer a question, they cite sources. Sites with llms.txt are easier to cite (clean structure, no scraping noise), so they are more likely to be cited. More citations can translate into more authority in the AI's retrieval results and more traffic over time.

How is this different from a regular sitemap.xml?

sitemap.xml is for crawlers indexing search results. llms.txt is for LLMs answering questions. Different consumers, different format (Markdown vs XML), different content (full text vs just URLs).

Does this comply with the target site's terms of service?

The Actor only crawls publicly available pages (no auth bypass, no rate-limit evasion). You're responsible for ensuring you have the right to crawl the target site — typically true for your own docs, your employer's docs, or open-source project docs.

Support

Issues / feature requests: open in the Issues tab on the Apify console for this Actor.

Built by Ian Mu — github.com/ianymu — also author of verify-before-stop, the open-source Claude Code Stop hook against "lies of completion".