llms.txt Generator
Generate an llms.txt file from any website sitemap. This Actor crawls your sitemap, extracts page titles and meta descriptions, and creates a Markdown-formatted file that helps LLMs understand your website's content.
What is llms.txt?
The llms.txt file is a standardized way to provide LLMs (Large Language Models) with information about your website. Just like robots.txt helps search engines, llms.txt helps AI assistants understand your site's structure and content.
The format follows a simple Markdown structure:
```
# Website Name

Brief description of the website.

## Pages

- [Page Title](https://example.com/page): Page description
- [Another Page](https://example.com/another): Another description
```
Features
- Automatic sitemap crawling - Parses XML sitemaps, including nested sitemap index files
- Smart content extraction - Extracts titles from `<title>` tags and descriptions from meta tags (sketched below)
- Flexible URL filtering - Include or exclude URLs using glob patterns
- Robots.txt compliance - Optionally respects `robots.txt` directives
- Progress tracking - Real-time status updates during the crawl
- Direct download link - Get a shareable URL to your generated `llms.txt` file
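If you are curious what the extraction step looks like, here is a minimal sketch in TypeScript using the cheerio library. This is an illustration only; the Actor's actual implementation may use a different parser or fallback order.

```typescript
import * as cheerio from 'cheerio';

// Extract a page title and description as described above:
// take the <title> tag, then fall back through common meta tags.
function extractPageInfo(html: string): { title: string; description?: string } {
  const $ = cheerio.load(html);
  const title = $('title').first().text().trim();
  const description =
    $('meta[name="description"]').attr('content')?.trim() ||
    $('meta[property="og:description"]').attr('content')?.trim();
  return { title, description };
}
```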
Use cases
- AI-ready documentation - Make your website content easily accessible to AI assistants like ChatGPT, Claude, and others
- Content inventory - Generate a structured overview of all pages on your website
- SEO auditing - Review titles and descriptions across your entire site
- Site migration - Create a comprehensive list of pages before migrating to a new platform
- Developer documentation - Help AI coding assistants understand your project documentation
How to use
- Find your sitemap URL - Usually located at `https://yoursite.com/sitemap.xml`
- Enter the sitemap URL in the input field
- Configure filters (optional) - Use glob patterns to include/exclude specific URLs
- Run the Actor - Click "Start" and wait for the crawl to complete (or run it via the API, as sketched below)
- Download your file - Get the `llms.txt` file from the output or use the direct download link
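Besides the Apify Console, you can run the Actor programmatically. Here is a minimal sketch using the apify-client package; the Actor ID below is a placeholder, so substitute the ID shown on this Actor's page.

```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Placeholder Actor ID -- replace with this Actor's actual ID.
const run = await client.actor('<username>/llms-txt-file-generator').call({
  sitemapUrl: 'https://yoursite.com/sitemap.xml',
  maxRequestsPerCrawl: 50, // start small, as recommended in the tips below
});

// The generated file lives in the run's default Key-Value Store.
const record = await client
  .keyValueStore(run.defaultKeyValueStoreId)
  .getRecord('llms.txt');
console.log(record?.value);
```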
Input
| Field | Type | Description |
|---|---|---|
| `sitemapUrl` | string | URL of the XML sitemap to crawl (required) |
| `maxConcurrency` | integer | Maximum concurrent requests, 1-50 (default: 5) |
| `maxRequestsPerCrawl` | integer | Maximum pages to crawl, 0 = unlimited (default: 1000) |
| `respectRobotsTxt` | boolean | Honor robots.txt restrictions (default: true) |
| `includeUrlPatterns` | array | Glob patterns for URLs to include (default: `["**"]`) |
| `excludeUrlPatterns` | array | Glob patterns for URLs to exclude (default: `[]`) |
Input example
{"sitemapUrl": "https://docs.apify.com/sitemap.xml","maxConcurrency": 10,"maxRequestsPerCrawl": 500,"includeUrlPatterns": ["**/academy/**", "**/platform/**"],"excludeUrlPatterns": ["**/api-reference/**"]}
URL pattern examples
| Pattern | Matches |
|---|---|
| `**` | All URLs |
| `**/blog/**` | URLs containing /blog/ |
| `**/docs/*` | Direct children of /docs/ |
| `**/*.html` | URLs ending with .html |
| `!**/tag/**` | Exclude URLs containing /tag/ |
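The patterns follow standard glob semantics. If you want to sanity-check a pattern before running a full crawl, the minimatch package behaves similarly (an assumption for illustration; the Actor's exact matcher may differ):

```typescript
import { minimatch } from 'minimatch';

console.log(minimatch('https://example.com/blog/post-1', '**/blog/**')); // true
console.log(minimatch('https://example.com/docs/intro', '**/docs/*'));   // true
console.log(minimatch('https://example.com/docs/a/b', '**/docs/*'));     // false (not a direct child)
console.log(minimatch('https://example.com/tag/news', '!**/tag/**'));    // false (negated pattern)
```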
Output
The Actor produces two outputs:
1. llms.txt file
The generated llms.txt file is stored in the Key-Value Store and can be downloaded directly:
```
https://api.apify.com/v2/key-value-stores/{storeId}/records/llms.txt
```
You can find the direct link in the run output after completion.
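For example, you can fetch the file directly once you have the store ID from your run (the store ID below is a placeholder):

```typescript
// Fetch the generated file from the Key-Value Store (Node 18+ has global fetch).
const storeId = 'abc123'; // placeholder -- copy the store ID from your run's output
const res = await fetch(
  `https://api.apify.com/v2/key-value-stores/${storeId}/records/llms.txt`,
);
const llmsTxt = await res.text();
console.log(llmsTxt);
```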
2. Crawl statistics
The Dataset contains crawl results with detailed statistics:
{"llmsTxtUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/llms.txt","statistics": {"totalDiscovered": 150,"totalAfterFiltering": 120,"successCount": 118,"errorCount": 2,"startedAt": "2024-01-15T10:00:00.000Z","finishedAt": "2024-01-15T10:01:30.000Z","durationMs": 90000}}
Cost estimation
This Actor uses minimal compute resources. Typical costs:
| Pages crawled | Estimated cost |
|---|---|
| 100 pages | ~$0.01 |
| 500 pages | ~$0.05 |
| 1,000 pages | ~$0.10 |
| 5,000 pages | ~$0.50 |
Costs may vary based on page size and response times.
Tips for best results
- Start small - Test with `maxRequestsPerCrawl: 50` first to verify the output
- Use filters wisely - Exclude tag pages, archives, and other low-value URLs
- Check your sitemap - Ensure your sitemap is valid and up-to-date
- Review the output - Some pages may have missing or poor descriptions
FAQ
Q: My sitemap has nested sitemaps. Will this work?
A: Yes! The Actor automatically handles sitemap index files that reference other sitemaps.
Q: Can I filter out certain pages?
A: Yes, use `excludeUrlPatterns` with glob patterns like `**/tag/**` or `**/author/**`.
Q: What if a page has no description?
A: The Actor tries multiple sources (meta description, `og:description`). If none are found, the page is listed without a description.
Q: How do I use the generated file?
A: Download the file and place it at the root of your website (e.g., `https://yoursite.com/llms.txt`).
Q: Does this respect robots.txt?
A: Yes, by default. You can disable this with `respectRobotsTxt: false` if needed.
Support
If you have questions or encounter issues:
- Open an issue on GitHub
- Contact support through the Apify platform
This Actor is open source and licensed under Apache-2.0.