llms.txt + llms-full.txt Generator for Any Docs Site
Developer
Yanlong Mu
llms.txt + llms-full.txt Generator
llms-txt-converter is an Apify Actor that crawls any documentation site and outputs the two standard files (llms.txt and llms-full.txt) that ChatGPT, Claude, Perplexity, Gemini, and Bing Copilot use to find and cite your docs. It runs in minutes, costs cents, and produces clean Markdown — no hand-writing required.
Best for / Not for
| ✓ Best for | ✗ Not for |
|---|---|
| SaaS / API docs teams who want ChatGPT to cite them when users ask "how do I do X with Y?" | Auth-walled docs sites (no login support — public pages only) |
| Open-source maintainers shipping llms.txt so Claude recommends their library | Image-heavy or video-heavy content (text-only extraction) |
| Agencies productizing llms.txt generation as a $50–$500 per-site offering | Sites where you don't have the right to crawl (verify the ToS first) |
Example input → output
Input:

    {
      "startUrl": "https://docs.example.com",
      "maxPages": 100,
      "sameDomainOnly": true,
      "includeFullContent": true
    }

Output (llms.txt, truncated):

    # Example Docs

    > A comprehensive guide to the Example platform.

    This is a machine-friendly summary of docs.example.com for AI agents and LLMs.
    Full content of every page is in `llms-full.txt`.

    ## getting-started

    - [Quickstart](https://docs.example.com/getting-started/quickstart) — Get up and running in 5 minutes.
    - [Authentication](https://docs.example.com/getting-started/auth) — Setup API keys and OAuth.

    ## api

    - [Reference](https://docs.example.com/api/reference) — Full REST API surface.
    - [Webhooks](https://docs.example.com/api/webhooks) — Real-time event delivery.

A second file, llms-full.txt, contains the full body text of every page concatenated together.
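To make the index layout concrete, here is a minimal Python sketch that renders an llms.txt-style file from a list of pages. It is not the Actor's code: the function name `build_llms_txt`, the rule of grouping by the first URL path segment, and the fallback section name `docs` are all assumptions made for illustration. It also uses the `: ` separator from the llmstxt.org link format rather than the dash shown in the sample output.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_llms_txt(title, summary, pages):
    """Render an llms.txt-style index from (url, page_title, description) tuples."""
    sections = defaultdict(list)
    for url, page_title, description in pages:
        # Assumption: the section name is the first path segment of the URL.
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = segments[0] if len(segments) > 1 else "docs"
        sections[section].append(f"- [{page_title}]({url}): {description}")

    # H1 title, blockquote summary, then one H2 section per group of links.
    lines = [f"# {title}", "", f"> {summary}", ""]
    for section, links in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        lines.extend(links)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example_pages = [
    ("https://docs.example.com/getting-started/quickstart",
     "Quickstart", "Get up and running in 5 minutes."),
    ("https://docs.example.com/api/reference",
     "Reference", "Full REST API surface."),
]
example_index = build_llms_txt("Example Docs", "A guide to the Example platform.", example_pages)
```

Sections appear in the order their first page was seen, because Python dictionaries preserve insertion order.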
FAQ
Does this work with Claude Code?
Yes. Drop the generated llms.txt at your site root (e.g. https://yoursite.com/llms.txt) and any Claude Code session that touches your docs URL can read it directly via WebFetch. No special integration needed.
Is it free?
The first 100 pages of your first run are a free trial. After that the Actor uses Apify's pay-per-event model at $0.005 per page (so $0.50 for a 100-page site, $5 for a 1,000-page site).
Does it scan private repos or auth-walled docs?
No. The Actor only crawls publicly reachable URLs. There is no auth bypass, no rate-limit evasion. If your docs are behind a login, this Actor will refuse them with a clear error.
Can I run it in CI?
Yes — Apify exposes the Actor as a REST API endpoint. Trigger it from GitHub Actions, GitLab CI, or any cron after a doc publish to keep llms.txt fresh automatically.
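As a rough sketch of what a CI step might do, the Python below builds and sends a request to Apify's "run Actor" endpoint (`POST /v2/acts/{actorId}/runs`). The helper names and the idea of reading the token and Actor ID from CI secrets are assumptions; substitute the real Actor ID from the Apify Console.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, actor_input):
    """Build (but do not send) a POST request that starts an Actor run."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(request):
    """Send the request and return the run object from the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]


# Hypothetical values: in CI these would come from secrets, not literals.
request = build_run_request(
    actor_id="USERNAME~llms-txt-converter",  # placeholder, not the real ID
    token="APIFY_TOKEN_PLACEHOLDER",
    actor_input={"startUrl": "https://docs.example.com", "maxPages": 100},
)
```

Calling `start_run(request)` with real credentials returns the run object. Its `defaultKeyValueStoreId` field identifies the store that holds the generated files.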
Output format?
Two Markdown files in the run's Key-Value Store: llms.txt (index) and llms-full.txt (full text). Plus a JSON dataset row summarizing the run (pages crawled, byte counts, root URL). All downloadable via Apify Console or REST API.
Rate limits?
The Actor itself enforces a polite crawl rate (configurable concurrency) to avoid hammering the target. On the Apify side, default plans permit dozens of concurrent runs — sufficient for any single-site use case.
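How the Actor throttles requests internally is not documented here. As a generic illustration of a concurrency-capped crawl, the following Python sketch uses an `asyncio.Semaphore`. The function name, the `fetch` callback, and the default values are all hypothetical.

```python
import asyncio


async def crawl_politely(urls, fetch, max_concurrency=5, delay_seconds=0.0):
    """Fetch every URL with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            result = await fetch(url)
            # Optional pause before releasing the slot to slow the crawl further.
            await asyncio.sleep(delay_seconds)
            return result

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

Lowering `max_concurrency` or raising `delay_seconds` trades crawl speed for a lighter load on the target site.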
What is llms.txt and why does it matter for GEO?
llms.txt is a proposed standard (analogous in spirit to robots.txt) that gives LLMs a concise, Markdown-formatted map of your site. See https://llmstxt.org for the spec. It is still a proposal and support among model providers varies, so treat it as a low-cost way to make your docs easier for AI tools to read and cite rather than a guarantee of citations. Few sites have shipped one yet.
How is this different from a regular sitemap.xml?
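sitemap.xml is for search-engine crawlers building an index. llms.txt is for LLMs answering questions. They differ in consumer, in format (Markdown vs XML), and in content: llms.txt is a curated index of links with short descriptions, with full page text in llms-full.txt, while a sitemap is a bare list of URLs.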
Does this comply with the target site's terms of service?
The Actor only crawls publicly available pages (no auth bypass, no rate-limit evasion). You're responsible for ensuring you have the right to crawl the target site — typically true for your own docs, your employer's docs, or open-source project docs.
How to use
- Paste your docs root URL in the Documentation root URL field (e.g. `https://docs.example.com`)
- Set Max pages (start at 100 for a smoke test, raise to 1000+ for full sites)
- Click Start
- Wait for the crawl to finish (typically 1-10 min for 100 pages)
- Download `llms.txt` and `llms-full.txt` from the Storage tab → Key-Value Store
- Upload both files to your site's root (e.g., `https://yoursite.com/llms.txt`)
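Before uploading, it can be worth sanity-checking the downloaded index. The Python sketch below is an illustrative check, not part of the Actor. The function name and the two rules it applies (an H1 title on the first non-empty line and at least one Markdown link entry) are assumptions based on the sample output.

```python
import re

# Matches list entries of the form "- [Title](https://...)".
LINK_RE = re.compile(r"^- \[([^\]]+)\]\((https?://[^)\s]+)\)")


def check_llms_txt(text):
    """Return (links, problems) for the text of an llms.txt file."""
    lines = [line for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line should be an H1 title")
    links = [m.group(2) for line in lines if (m := LINK_RE.match(line))]
    if not links:
        problems.append("no link entries found")
    return links, problems
```

An empty `problems` list means the file passed both checks. The returned `links` can then be spot-checked against the live site.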
Input
- `startUrl` (required): the root URL of the documentation site
- `maxPages`: stop after this many pages (default 100, max 5000)
- `sameDomainOnly`: restrict the crawl to the same hostname (default true)
- `includeFullContent`: include page bodies in `llms-full.txt` (default true; set false for outline-only)
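When the input is built programmatically, a small validation layer can catch mistakes before a run is started. The Python sketch below mirrors the documented fields and defaults. The class name `ActorInput`, the lower bound of 1 for `maxPages`, and the http/https check are assumptions, not documented behavior of the Actor.

```python
from dataclasses import asdict, dataclass
from urllib.parse import urlparse

MAX_PAGES_LIMIT = 5000  # documented maximum for maxPages


@dataclass(frozen=True)
class ActorInput:
    # Field names match the Actor's JSON input keys.
    startUrl: str
    maxPages: int = 100
    sameDomainOnly: bool = True
    includeFullContent: bool = True

    def __post_init__(self):
        parsed = urlparse(self.startUrl)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"startUrl must be an absolute http(s) URL: {self.startUrl!r}")
        # Assumption: at least one page must be requested.
        if not 1 <= self.maxPages <= MAX_PAGES_LIMIT:
            raise ValueError(f"maxPages must be between 1 and {MAX_PAGES_LIMIT}")


payload = asdict(ActorInput("https://docs.example.com"))
```

`payload` is a plain dictionary that can be serialized with `json.dumps` and sent as the run input.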
Output
The Actor produces two outputs:
- Key-value store: `llms.txt` and `llms-full.txt` as downloadable Markdown
- Dataset: a single row summarizing the run (pages crawled, byte counts, root URL)
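For scripted downloads, the files can be fetched through Apify's key-value store record endpoint (`GET /v2/key-value-stores/{storeId}/records/{recordKey}`). The Python sketch below assumes the record keys are exactly `llms.txt` and `llms-full.txt`; confirm the real keys in the Storage tab. The helper names are hypothetical.

```python
import os
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"
OUTPUT_KEYS = ("llms.txt", "llms-full.txt")  # assumed record keys


def record_url(store_id, key):
    """Build the URL of one record in a key-value store."""
    return f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}/records/{quote(key, safe='')}"


def download_outputs(store_id, token, dest_dir="."):
    """Download both output files into dest_dir and return their paths."""
    paths = []
    for key in OUTPUT_KEYS:
        request = urllib.request.Request(
            record_url(store_id, key),
            headers={"Authorization": f"Bearer {token}"},
        )
        path = os.path.join(dest_dir, key)
        with urllib.request.urlopen(request) as response, open(path, "wb") as out:
            out.write(response.read())
        paths.append(path)
    return paths
```

The store ID is the `defaultKeyValueStoreId` value of the run object returned when the run is started.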
Pricing
This Actor uses the Apify Pay-Per-Event model — you pay per page crawled.
- First 100 pages: free trial
- Per-page rate: $0.005 per page after that
- Cost estimate: $0.50 for a 100-page site, $5 for a 1,000-page site
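The arithmetic behind these estimates is simple enough to script. The Python sketch below follows the FAQ's statement that the free trial covers the first 100 pages of the first run only. The function name and the `first_run` flag are illustrative, and actual billing is determined by Apify.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.005")  # USD per page, from the pricing above
FREE_TRIAL_PAGES = 100             # free pages on the first run


def estimate_cost(pages, first_run=False):
    """Estimate the USD cost of crawling the given number of pages."""
    billable = max(0, pages - FREE_TRIAL_PAGES) if first_run else pages
    return billable * PRICE_PER_PAGE
```

For example, `estimate_cost(1000)` equals 5 dollars for a regular run, while `estimate_cost(1000, first_run=True)` equals 4.50 dollars because the first 100 pages are free. `Decimal` avoids floating-point rounding in the half-cent rate.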
Tips
- Big sites: start with `maxPages: 100` to validate that the crawl works on your domain, then bump it up
- JavaScript-heavy docs sites: if `llms-full.txt` looks empty, the site may require browser rendering; open an issue and we'll add a Playwright variant
- Auth-walled docs: not supported; the Actor will refuse with a clear error
- Image-heavy sites: images are not included in the text output (use a separate image scraper if needed)
Support
Issues / feature requests: open in the Issues tab on the Apify console for this Actor.
Built by Ian Mu — github.com/ianymu — also author of verify-before-stop, the open-source Claude Code Stop hook against "lies of completion".