LLMs.txt File Generator

Pricing: Pay per usage


Generate an llms.txt file from a website sitemap. Crawls all URLs, extracts titles and meta descriptions, and creates a Markdown-formatted file following the llms.txt specification. Then upload the generated file directly to your website (Webflow, WordPress, etc.).


Developer: Benoit Eveillard

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago


llms.txt Generator

Generate an llms.txt file from any website sitemap. This Apify Actor crawls all URLs from a sitemap, extracts page titles and meta descriptions, and creates a Markdown-formatted file that helps LLMs understand your website's content.

What is llms.txt?

The llms.txt file is a standardized way to provide LLMs (Large Language Models) with information about your website. It follows a simple Markdown format:

# Website Name
Brief description of the website.
## Pages
- [Page Title](https://example.com/page): Page description
- [Another Page](https://example.com/another): Another description
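Assembling this format from crawled pages is a simple string-building exercise. The following TypeScript sketch is purely illustrative: the `PageRecord` shape and `buildLlmsTxt` helper are assumed names, not the Actor's actual `llms-txt.ts` implementation.

```typescript
// Illustrative sketch only: build llms.txt content from crawl results.
// PageRecord and buildLlmsTxt are hypothetical names, not the Actor's API.
interface PageRecord {
  url: string;
  title: string;
  description: string;
}

function buildLlmsTxt(siteName: string, siteDescription: string, pages: PageRecord[]): string {
  const lines = [`# ${siteName}`, siteDescription, "", "## Pages"];
  for (const page of pages) {
    // One bullet per page: - [Title](url): description
    lines.push(`- [${page.title}](${page.url}): ${page.description}`);
  }
  return lines.join("\n");
}

console.log(buildLlmsTxt("Website Name", "Brief description of the website.", [
  { url: "https://example.com/page", title: "Page Title", description: "Page description" },
]));
```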

Features

  • Crawls XML sitemaps (including sitemap index files with nested sitemaps)
  • Extracts page titles from <title> tags
  • Extracts descriptions from <meta name="description"> or <meta property="og:description">
  • Supports glob patterns for URL filtering (include/exclude)
  • Respects robots.txt directives
  • Configurable concurrency and request limits
  • Progress tracking with status updates
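To give a feel for what sitemap crawling involves, here is a standalone regex-based sketch of pulling `<loc>` URLs out of a sitemap. It is illustrative only: the Actor's `sitemap.ts` uses proper crawler tooling, and a real parser uses an XML parser and also follows nested `<sitemap>` index entries.

```typescript
// Illustrative only: extract <loc> URLs from sitemap XML with a regex.
// A production sitemap loader uses real XML parsing, not regexes.
function extractLocs(xml: string): string[] {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page</loc></url>
  <url><loc>https://example.com/another</loc></url>
</urlset>`;

console.log(extractLocs(sitemap));
```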

Input

Field               | Type    | Required | Default | Description
--------------------|---------|----------|---------|---------------------------------------
sitemapUrl          | string  | Yes      | -       | URL of the XML sitemap to crawl
maxConcurrency      | integer | No       | 5       | Maximum concurrent requests (1-50)
maxRequestsPerCrawl | integer | No       | 1000    | Maximum pages to crawl (0 = unlimited)
respectRobotsTxt    | boolean | No       | true    | Honor robots.txt restrictions
includeUrlPatterns  | array   | No       | ["**"]  | Glob patterns for URLs to include
excludeUrlPatterns  | array   | No       | []      | Glob patterns for URLs to exclude

Example Input

{
  "sitemapUrl": "https://example.com/sitemap.xml",
  "maxConcurrency": 5,
  "maxRequestsPerCrawl": 500,
  "includeUrlPatterns": ["**/blog/**", "**/docs/**"],
  "excludeUrlPatterns": ["**/tag/**", "**/author/**"],
  "respectRobotsTxt": true
}

URL Pattern Examples

  • ** - Match all URLs
  • **/blog/** - Match URLs containing /blog/
  • **/docs/* - Match direct children of /docs/
  • **/*.html - Match URLs ending with .html
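These patterns follow common glob conventions: `**` spans path separators, while `*` stays within a single path segment. As a sketch of how such matching can work (the Actor's `url-filter.ts` may rely on a glob library whose edge-case behavior differs), here is a minimal glob-to-regex approach:

```typescript
// Illustrative sketch: convert Actor-style glob patterns to RegExp.
// "**" matches anything (including "/"); "*" matches within one path segment.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters except "*", which carries glob meaning.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000")  // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")     // "*"  = anything within one segment
    .replace(/\u0000/g, ".*");   // "**" = anything, across segments
  return new RegExp(`^${pattern}$`);
}

function matchesPatterns(url: string, include: string[], exclude: string[]): boolean {
  const hit = (p: string) => globToRegExp(p).test(url);
  return include.some(hit) && !exclude.some(hit);
}

console.log(matchesPatterns(
  "https://example.com/blog/post-1",
  ["**/blog/**"],
  ["**/tag/**"],
)); // true
```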

Output

The Actor produces two outputs:

1. llms.txt File (Key-Value Store)

The generated llms.txt file is stored in the default Key-Value Store and can be downloaded directly:

https://api.apify.com/v2/key-value-stores/{storeId}/records/llms.txt

2. Crawl Results (Dataset)

The Dataset contains a single item with crawl statistics:

{
  "llmsTxtUrl": "https://api.apify.com/v2/key-value-stores/{storeId}/records/llms.txt",
  "statistics": {
    "totalDiscovered": 150,
    "totalAfterFiltering": 120,
    "successCount": 118,
    "errorCount": 2,
    "robotsSkippedCount": 0,
    "limitSkippedCount": 0,
    "startedAt": "2024-01-15T10:00:00.000Z",
    "finishedAt": "2024-01-15T10:01:30.000Z",
    "durationMs": 90000
  },
  "errors": [
    { "url": "https://example.com/broken", "message": "404 Not Found" }
  ]
}
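For consumers of this dataset item, the shape can be typed roughly as follows. These interface names are hypothetical, for illustration only; the Actor's `types.ts` may differ. Note that `durationMs` is simply `finishedAt` minus `startedAt` in milliseconds.

```typescript
// Hypothetical typing of the dataset item; interface names are illustrative.
interface CrawlStatistics {
  totalDiscovered: number;
  totalAfterFiltering: number;
  successCount: number;
  errorCount: number;
  robotsSkippedCount: number;
  limitSkippedCount: number;
  startedAt: string;  // ISO 8601 timestamp
  finishedAt: string; // ISO 8601 timestamp
  durationMs: number; // finishedAt - startedAt, in milliseconds
}

interface CrawlError {
  url: string;
  message: string;
}

interface CrawlOutput {
  llmsTxtUrl: string;
  statistics: CrawlStatistics;
  errors: CrawlError[];
}

// The example above spans 90 seconds, i.e. 90000 ms.
const durationMs =
  Date.parse("2024-01-15T10:01:30.000Z") - Date.parse("2024-01-15T10:00:00.000Z");
console.log(durationMs); // 90000
```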

Local Development

Prerequisites

  • Node.js 18+
  • npm

Setup

# Install dependencies
npm install
# Create input file
mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json << 'EOF'
{
  "sitemapUrl": "https://crawlee.dev/sitemap.xml",
  "maxConcurrency": 5,
  "maxRequestsPerCrawl": 50
}
EOF
# Run locally
npm run start:dev

Available Scripts

Command           | Description
------------------|----------------------------
npm run start:dev | Run Actor locally with tsx
npm run build     | Compile TypeScript
npm run lint      | Run ESLint
npm run lint:fix  | Fix ESLint issues
npm test          | Run tests

Project Structure

.actor/
  actor.json                  # Actor configuration
  input_schema.json           # Input validation schema
  output_schema.json          # Output schema definition
  dataset_schema.json         # Dataset structure
  key_value_store_schema.json # KV store structure
src/
  main.ts                     # Entry point and orchestration
  types.ts                    # TypeScript interfaces
  services/
    crawler.ts                # CheerioCrawler configuration
    sitemap.ts                # Sitemap loading utilities
    url-filter.ts             # Glob-based URL filtering
    llms-txt.ts               # llms.txt generation
  utils/
    constants.ts              # Default values and config
storage/                      # Local storage (dev only)

Deploy to Apify

Using Git

  1. Push your code to a Git repository
  2. Go to Apify Console
  3. Click "Link Git Repository"
  4. Select your repository

Using CLI

# Login to Apify
apify login
# Deploy
apify push


License

Apache-2.0