llms.txt + llms-full.txt Generator for Any Docs Site
Pricing
Pay per usage
Crawl any documentation site and emit the 2026 GEO-standard llms.txt + llms-full.txt files. Makes your docs machine-friendly for ChatGPT / Claude / Perplexity / Gemini retrieval.
Developer
Yanlong Mu
Maintained by Community
llms.txt + llms-full.txt Generator
The proposed llms.txt standard for GEO, generated for any documentation site in minutes.
What does this Actor do?
This Actor crawls any documentation website you give it and produces two files:
- llms.txt: a concise, machine-friendly index of the site's pages, headings, and descriptions
- llms-full.txt: the full text content of every page concatenated into a single retrieval-friendly Markdown file
These files follow the llms.txt proposal (https://llmstxt.org), a convention meant to help AI assistants such as ChatGPT, Claude, Perplexity, Gemini, and Bing Copilot find and cite your documentation. Without them, an assistant typically has to work from raw HTML, which is noisier to parse. With them, your docs are easier to retrieve and quote when users ask questions.
Why use this Actor?
Few documentation sites have published llms.txt so far, so publishing early may help your docs stand out as a citable source for your topic. This Actor lets you ship both files in minutes instead of writing them by hand.
Business use cases:
- SaaS vendors: make your API docs cited by ChatGPT when users ask "how do I do X with Y?"
- Open-source maintainers: get your project recommended over competitors when a user asks Claude for "the best library for X"
- Internal knowledge bases: feed your company's wiki into LLM-powered tools that respect llms.txt
- AI-first agencies: generate llms.txt for your clients as a productized service ($50-500 per site)
How to use
- Paste your docs root URL in the Documentation root URL field (e.g. https://docs.example.com)
- Set Max pages (start at 100 for a smoke test, raise to 1000+ for full sites)
- Click Start
- Wait for the crawl to finish (typically 1-10 min for 100 pages)
- Download llms.txt and llms-full.txt from the Storage tab → Key-Value Store
- Upload both files to your site's root (e.g., https://yoursite.com/llms.txt)
Input
- startUrl (required): the root URL of the documentation site
- maxPages: stop after this many pages (default 100, max 5000)
- sameDomainOnly: restrict crawl to the same hostname (default true)
- includeFullContent: include page bodies in llms-full.txt (default true; set false for outline-only)
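The inputs above can also be assembled and checked in code before a run. The following is a minimal sketch, not the Actor's own implementation: the field names and the 5000-page cap come from the list above, while the lower bound of 1 page, the helper names `build_run_input` and `run_and_download`, and the `actor_id` argument are assumptions for illustration (this README does not state the Actor's ID). The second function assumes the third-party `apify-client` package is installed.

```python
from urllib.parse import urlparse

MAX_PAGES_LIMIT = 5000  # upper bound stated in the Input section


def build_run_input(start_url, max_pages=100, same_domain_only=True,
                    include_full_content=True):
    """Return a run-input dict using the field names from the Input section."""
    parsed = urlparse(start_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"startUrl must be an absolute http(s) URL: {start_url!r}")
    # The lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= max_pages <= MAX_PAGES_LIMIT:
        raise ValueError(f"maxPages must be between 1 and {MAX_PAGES_LIMIT}")
    return {
        "startUrl": start_url,
        "maxPages": max_pages,
        "sameDomainOnly": same_domain_only,
        "includeFullContent": include_full_content,
    }


def run_and_download(token, actor_id, run_input):
    """Run the Actor and return the two output files keyed by name.

    actor_id is a placeholder the caller must supply (hypothetical here).
    """
    from apify_client import ApifyClient  # imported lazily: optional dependency

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    files = {}
    for key in ("llms.txt", "llms-full.txt"):
        record = store.get_record(key)
        files[key] = record["value"] if record else None
    return files


if __name__ == "__main__":
    print(build_run_input("https://docs.example.com", max_pages=100))
```

Validating locally first avoids starting a paid run with a malformed URL or an out-of-range page limit.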
Output
The Actor produces two outputs:
- Key-value store: llms.txt and llms-full.txt as downloadable Markdown
- Dataset: a single row summarizing the run (pages crawled, byte counts, root URL)
Example llms.txt produced for https://docs.example.com:
    # Example Docs

    > A comprehensive guide to the Example platform.

    This is a machine-friendly summary of docs.example.com for AI agents and LLMs.
    Full content of every page is in `llms-full.txt`.

    ## getting-started

    - [Quickstart](https://docs.example.com/getting-started/quickstart) — Get up and running in 5 minutes.
    - [Authentication](https://docs.example.com/getting-started/auth) — Setup API keys and OAuth.

    ## api

    - [Reference](https://docs.example.com/api/reference) — Full REST API surface.
    - [Webhooks](https://docs.example.com/api/webhooks) — Real-time event delivery.
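To make the structure of that file concrete, here is a rough sketch of how such an index could be rendered from a list of crawled pages. It is an illustration under assumptions, not the Actor's actual code: the function name `render_llms_txt`, the grouping by first URL path segment, and the catch-all `docs` section are all hypothetical. It separates link and description with a colon, as in the link-list format described at llmstxt.org, whereas the sample output above uses a dash.

```python
from collections import defaultdict
from urllib.parse import urlparse


def render_llms_txt(site_title, summary, pages):
    """Render an llms.txt-style index.

    pages: iterable of (title, url, description) tuples.
    """
    sections = defaultdict(list)
    for title, url, description in pages:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Assumption: pages directly under the root go into a "docs" section.
        section = segments[0] if len(segments) > 1 else "docs"
        sections[section].append((title, url, description))

    lines = [f"# {site_title}", "", f"> {summary}", ""]
    for section, entries in sections.items():  # keeps first-seen section order
        lines.append(f"## {section}")
        lines.append("")
        for title, url, description in entries:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    demo_pages = [
        ("Quickstart", "https://docs.example.com/getting-started/quickstart",
         "Get up and running in 5 minutes."),
        ("Reference", "https://docs.example.com/api/reference",
         "Full REST API surface."),
    ]
    print(render_llms_txt("Example Docs",
                          "A comprehensive guide to the Example platform.",
                          demo_pages))
```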
Pricing
This Actor uses the Apify Pay-Per-Event model — you pay per page crawled.
- First 100 pages: free trial
- Per-page rate: $0.005 per page after that
- Cost estimate at the per-page rate, not counting the free trial pages: $0.50 for a 100-page site, $5 for a 1,000-page site
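The arithmetic behind these figures is a single multiplication, sketched below. The listed estimates correspond to the plain per-page rate; how the 100 free pages are applied (per run or per account) is not stated here, so the `free_pages` parameter is an assumption you can set once you know how the trial is counted. The helper name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

PER_PAGE_RATE = Decimal("0.005")  # USD per page, from the Pricing section


def estimate_cost(pages, free_pages=0):
    """Estimated cost in USD for crawling `pages` pages.

    free_pages: pages assumed to be covered by the free trial (assumption).
    """
    billable = max(pages - free_pages, 0)
    return billable * PER_PAGE_RATE


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, "pages:", estimate_cost(n), "USD at the plain rate;",
              estimate_cost(n, free_pages=100), "USD if 100 pages are free")
```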
Tips
- Big sites: start with maxPages: 100 to validate the crawl works on your domain, then bump up
- JavaScript-heavy docs sites: if llms-full.txt looks empty, the site may require browser rendering; open an issue and we'll add a Playwright variant
- Auth-walled docs: not supported yet; will refuse with a clear error
- Image-heavy sites: images are not included in the text output (use a separate image scraper if needed)
FAQ
What is llms.txt?
llms.txt is a proposed standard (analogous to robots.txt) that tells LLMs how to consume your site. See https://llmstxt.org for the spec. Because it is a proposal, support varies: each model provider decides whether and how its crawlers and retrieval tools read llms.txt.
Why does this matter for GEO (Generative Engine Optimization)?
When ChatGPT/Claude/Perplexity answer a question, they cite sources. Sites with llms.txt are easier to cite (clean structure, no scraping noise), which can lead to more citations. More citations can mean more visibility in AI retrieval results and more traffic over time.
How is this different from a regular sitemap.xml?
sitemap.xml is for crawlers indexing search results. llms.txt is for LLMs answering questions. Different consumers, different format (Markdown vs XML), different content (full text vs just URLs).
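The format difference can be seen by converting one into the other. The sketch below reads the `<loc>` entries of a standard sitemap and emits Markdown link lines of the kind used in llms.txt. It is only an illustration: the function name `sitemap_to_links` is hypothetical, and the titles are guessed from the URL slug, whereas a real generator would fetch each page to get its title and description.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_links(sitemap_xml):
    """Turn sitemap <loc> entries into Markdown link lines."""
    root = ET.fromstring(sitemap_xml)
    lines = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue
        # Guess a title from the last path segment; fall back to the URL.
        slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
        title = slug.replace("-", " ").capitalize() or url
        lines.append(f"- [{title}]({url})")
    return lines


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://docs.example.com/getting-started/quickstart</loc></url>"
        "<url><loc>https://docs.example.com/api/api-reference</loc></url>"
        "</urlset>"
    )
    print("\n".join(sitemap_to_links(sample)))
```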
Does this comply with the target site's terms of service?
The Actor only crawls publicly available pages (no auth bypass, no rate-limit evasion). You're responsible for ensuring you have the right to crawl the target site — typically true for your own docs, your employer's docs, or open-source project docs.
Support
Issues / feature requests: open in the Issues tab on the Apify console for this Actor.
Built by Ian Mu — github.com/ianymu — also author of verify-before-stop, the open-source Claude Code Stop hook against "lies of completion".