LLMs.txt File Generator

Pricing: Pay per usage
Generate an llms.txt file from a website sitemap. Crawls all URLs, extracts titles and meta descriptions, and creates a Markdown-formatted file following the llms.txt specification. Then upload the generated file directly to your website (Webflow, WordPress, etc.).

Rating: 0.0 (0)

Developer: Benoit Eveillard (Maintained by Community)

Actor stats: 0 bookmarks · 4 total users · 3 monthly active users · last modified 19 days ago

llms.txt Generator

Generate an llms.txt file from any website sitemap. This Actor crawls your sitemap, extracts page titles and meta descriptions, and creates a Markdown-formatted file that helps LLMs understand your website's content.

What is llms.txt?

The llms.txt file is a standardized way to provide LLMs (Large Language Models) with information about your website. Just like robots.txt helps search engines, llms.txt helps AI assistants understand your site's structure and content.

The format follows a simple Markdown structure:

```
# Website Name
Brief description of the website.

## Pages
- [Page Title](https://example.com/page): Page description
- [Another Page](https://example.com/another): Another description
```
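
The structure above can be rendered from a list of crawled pages with a few lines of code. This is a minimal sketch, not the Actor's actual implementation; the `build_llms_txt` helper and the sample page data are invented for illustration:

```python
def build_llms_txt(site_name, site_description, pages):
    """Render a list of {url, title, description} dicts as llms.txt Markdown."""
    lines = [f"# {site_name}", site_description, "", "## Pages"]
    for page in pages:
        entry = f"- [{page['title']}]({page['url']})"
        # Pages without a description are listed as a bare link.
        if page.get("description"):
            entry += f": {page['description']}"
        lines.append(entry)
    return "\n".join(lines) + "\n"

pages = [
    {"url": "https://example.com/page", "title": "Page Title",
     "description": "Page description"},
    {"url": "https://example.com/another", "title": "Another Page",
     "description": None},
]
print(build_llms_txt("Website Name", "Brief description of the website.", pages))
```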

Features

  • Automatic sitemap crawling - Parses XML sitemaps including nested sitemap index files
  • Smart content extraction - Extracts titles from <title> tags and descriptions from meta tags
  • Flexible URL filtering - Include or exclude URLs using glob patterns
  • Robots.txt compliance - Optionally respects robots.txt directives
  • Progress tracking - Real-time status updates during crawl
  • Direct download link - Get a shareable URL to your generated llms.txt file
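
The nested-sitemap handling in the first feature can be sketched as a small recursive parser. This is a simplified stdlib illustration assuming standard sitemap XML, not the Actor's real crawler; the `fetch` callback stands in for an HTTP request:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(xml_text, fetch):
    """Return page URLs from a sitemap, recursing into sitemap index files.

    `fetch` is a callable that takes a URL and returns its XML body.
    """
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        # A sitemap index references child sitemaps; crawl each one.
        for loc in root.iter(f"{SITEMAP_NS}loc"):
            urls.extend(extract_urls(fetch(loc.text.strip()), fetch))
    else:
        # A plain <urlset> lists page URLs directly.
        for loc in root.iter(f"{SITEMAP_NS}loc"):
            urls.append(loc.text.strip())
    return urls
```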

Use cases

  • AI-ready documentation - Make your website content easily accessible to AI assistants like ChatGPT, Claude, and others
  • Content inventory - Generate a structured overview of all pages on your website
  • SEO auditing - Review titles and descriptions across your entire site
  • Site migration - Create a comprehensive list of pages before migrating to a new platform
  • Developer documentation - Help AI coding assistants understand your project documentation

How to use

  1. Find your sitemap URL - Usually located at https://yoursite.com/sitemap.xml
  2. Enter the sitemap URL in the input field
  3. Configure filters (optional) - Use glob patterns to include/exclude specific URLs
  4. Run the Actor - Click "Start" and wait for the crawl to complete
  5. Download your file - Get the llms.txt file from the output or use the direct download link

Input

| Field | Type | Description |
| --- | --- | --- |
| sitemapUrl | string | URL of the XML sitemap to crawl (required) |
| maxConcurrency | integer | Maximum concurrent requests, 1-50 (default: 5) |
| maxRequestsPerCrawl | integer | Maximum pages to crawl, 0 = unlimited (default: 1000) |
| respectRobotsTxt | boolean | Honor robots.txt restrictions (default: true) |
| includeUrlPatterns | array | Glob patterns for URLs to include (default: `["**"]`) |
| excludeUrlPatterns | array | Glob patterns for URLs to exclude (default: `[]`) |

Input example

```json
{
  "sitemapUrl": "https://docs.apify.com/sitemap.xml",
  "maxConcurrency": 10,
  "maxRequestsPerCrawl": 500,
  "includeUrlPatterns": ["**/academy/**", "**/platform/**"],
  "excludeUrlPatterns": ["**/api-reference/**"]
}
```
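
Fields omitted from the input fall back to the defaults in the table above. A sketch of that merge (the `with_defaults` helper is hypothetical; the field names and default values come from the input table):

```python
INPUT_DEFAULTS = {
    "maxConcurrency": 5,
    "maxRequestsPerCrawl": 1000,
    "respectRobotsTxt": True,
    "includeUrlPatterns": ["**"],
    "excludeUrlPatterns": [],
}

def with_defaults(user_input):
    """Merge user-supplied input over the documented defaults."""
    if "sitemapUrl" not in user_input:
        raise ValueError("sitemapUrl is required")
    # User-supplied keys win; everything else takes the default.
    return {**INPUT_DEFAULTS, **user_input}
```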

URL pattern examples

| Pattern | Matches |
| --- | --- |
| `**` | All URLs |
| `**/blog/**` | URLs containing /blog/ |
| `**/docs/*` | Direct children of /docs/ |
| `**/*.html` | URLs ending with .html |
| `!**/tag/**` | Exclude URLs containing /tag/ |
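
These patterns follow the usual glob convention: `**` crosses path separators, `*` stays within one segment, and a leading `!` negates the match. A rough sketch of how they might be evaluated (the Actor most likely uses an existing glob library, so edge-case behavior may differ):

```python
import re

def glob_to_regex(pattern):
    """Translate a minimatch-style glob: ** crosses '/', * stays in a segment."""
    i, out = 0, []
    while i < len(pattern):
        if pattern[i:i + 2] == "**":
            out.append(".*")       # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # any characters within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

def url_matches(url, pattern):
    """True if the URL matches the pattern; a leading '!' inverts the result."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    matched = bool(glob_to_regex(pattern).match(url))
    return not matched if negate else matched
```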

Output

The Actor produces two outputs:

1. llms.txt file

The generated llms.txt file is stored in the Key-Value Store and can be downloaded directly:

https://api.apify.com/v2/key-value-stores/{storeId}/records/llms.txt

You can find the direct link in the run output after completion.

2. Crawl statistics

The Dataset contains crawl results with detailed statistics:

```json
{
  "llmsTxtUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/llms.txt",
  "statistics": {
    "totalDiscovered": 150,
    "totalAfterFiltering": 120,
    "successCount": 118,
    "errorCount": 2,
    "startedAt": "2024-01-15T10:00:00.000Z",
    "finishedAt": "2024-01-15T10:01:30.000Z",
    "durationMs": 90000
  }
}
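
Here `durationMs` is simply the gap between the two ISO-8601 timestamps. A stdlib sketch of the same arithmetic (the `Z` suffix is rewritten to `+00:00` for compatibility with older Python versions of `fromisoformat`):

```python
from datetime import datetime

def duration_ms(started_at, finished_at):
    """Compute run duration in milliseconds from ISO-8601 timestamps."""
    parse = lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))
    return int((parse(finished_at) - parse(started_at)).total_seconds() * 1000)

duration_ms("2024-01-15T10:00:00.000Z", "2024-01-15T10:01:30.000Z")  # → 90000
```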

Cost estimation

This Actor uses minimal compute resources. Typical costs:

| Pages crawled | Estimated cost |
| --- | --- |
| 100 pages | ~$0.01 |
| 500 pages | ~$0.05 |
| 1,000 pages | ~$0.10 |
| 5,000 pages | ~$0.50 |

Costs may vary based on page size and response times.
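
The table works out to roughly $0.0001 per page. A back-of-the-envelope helper (an approximation only; as noted above, real costs vary with page size and response times):

```python
# Implied by the table above: ~$0.10 per 1,000 pages.
COST_PER_PAGE_USD = 0.0001

def estimated_cost(pages):
    """Rough cost estimate in USD for a crawl of the given size."""
    return pages * COST_PER_PAGE_USD

for pages in (100, 500, 1000, 5000):
    print(f"{pages} pages ≈ ${estimated_cost(pages):.2f}")
```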

Tips for best results

  1. Start small - Test with maxRequestsPerCrawl: 50 first to verify the output
  2. Use filters wisely - Exclude tag pages, archives, and other low-value URLs
  3. Check your sitemap - Ensure your sitemap is valid and up-to-date
  4. Review the output - Some pages may have missing or poor descriptions

FAQ

Q: My sitemap has nested sitemaps. Will this work? A: Yes! The Actor automatically handles sitemap index files that reference other sitemaps.

Q: Can I filter out certain pages? A: Yes, use excludeUrlPatterns with glob patterns like **/tag/** or **/author/**.

Q: What if a page has no description? A: The Actor tries multiple sources (meta description, og:description). If none are found, the page is listed without a description.
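
That fallback order could look roughly like the following stdlib sketch (not the Actor's real extractor; the `MetaExtractor` class and `describe` helper are invented for illustration):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <title>, meta description, and og:description from a page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and "content" in attrs:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def describe(html):
    """Return (title, description); description falls back from meta to og:description."""
    p = MetaExtractor()
    p.feed(html)
    desc = p.meta.get("description") or p.meta.get("og:description")
    return p.title.strip(), desc
```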

Q: How do I use the generated file? A: Download the file and place it at the root of your website (e.g., https://yoursite.com/llms.txt).

Q: Does this respect robots.txt? A: Yes, by default. You can disable this with respectRobotsTxt: false if needed.

Support

If you have questions or encounter issues:

  • Open an issue on GitHub
  • Contact support through the Apify platform

This Actor is open source and licensed under Apache-2.0.