LLMs.txt File Generator
Developer: Benoit Eveillard
llms.txt Generator
Generate an llms.txt file from any website sitemap. This Apify Actor crawls all URLs from a sitemap, extracts page titles and meta descriptions, and creates a Markdown-formatted file that helps LLMs understand your website's content.
What is llms.txt?
The llms.txt file is a standardized way to provide LLMs (Large Language Models) with information about your website. It follows a simple Markdown format:
```markdown
# Website Name

Brief description of the website.

## Pages

- [Page Title](https://example.com/page): Page description
- [Another Page](https://example.com/another): Another description
```
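As an illustrative sketch (not the Actor's actual source), assembling that format from crawled page data can look like this; the `PageEntry` shape and `buildLlmsTxt` helper are assumptions for the example:

```typescript
// Illustrative only: build an llms.txt body from crawled page metadata.
interface PageEntry {
    title: string;
    url: string;
    description: string;
}

function buildLlmsTxt(siteName: string, siteDescription: string, pages: PageEntry[]): string {
    const lines = [
        `# ${siteName}`,
        '',
        siteDescription,
        '',
        '## Pages',
        // One bullet per crawled page, following the llms.txt link format
        ...pages.map((p) => `- [${p.title}](${p.url}): ${p.description}`),
    ];
    return lines.join('\n');
}
```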
Features
- Crawls XML sitemaps (including sitemap index files with nested sitemaps)
- Extracts page titles from `<title>` tags
- Extracts descriptions from `<meta name="description">` or `<meta property="og:description">`
- Supports glob patterns for URL filtering (include/exclude)
- Respects `robots.txt` directives
- Configurable concurrency and request limits
- Progress tracking with status updates
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `sitemapUrl` | string | Yes | - | URL of the XML sitemap to crawl |
| `maxConcurrency` | integer | No | 5 | Maximum concurrent requests (1-50) |
| `maxRequestsPerCrawl` | integer | No | 1000 | Maximum pages to crawl (0 = unlimited) |
| `respectRobotsTxt` | boolean | No | true | Honor robots.txt restrictions |
| `includeUrlPatterns` | array | No | `["**"]` | Glob patterns for URLs to include |
| `excludeUrlPatterns` | array | No | `[]` | Glob patterns for URLs to exclude |
Example Input
```json
{
  "sitemapUrl": "https://example.com/sitemap.xml",
  "maxConcurrency": 5,
  "maxRequestsPerCrawl": 500,
  "includeUrlPatterns": ["**/blog/**", "**/docs/**"],
  "excludeUrlPatterns": ["**/tag/**", "**/author/**"],
  "respectRobotsTxt": true
}
```
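Fields omitted from the input fall back to the defaults in the table above. A minimal sketch of that merge (the `withDefaults` helper is an assumption for illustration, not the Actor's code):

```typescript
// Input fields as documented in the table above; only sitemapUrl is required.
interface ActorInput {
    sitemapUrl: string;
    maxConcurrency?: number;
    maxRequestsPerCrawl?: number;
    respectRobotsTxt?: boolean;
    includeUrlPatterns?: string[];
    excludeUrlPatterns?: string[];
}

// Defaults taken from the input table.
const DEFAULTS = {
    maxConcurrency: 5,
    maxRequestsPerCrawl: 1000,
    respectRobotsTxt: true,
    includeUrlPatterns: ['**'],
    excludeUrlPatterns: [] as string[],
};

// Fields present in the input override the defaults.
function withDefaults(input: ActorInput) {
    return { ...DEFAULTS, ...input };
}
```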
URL Pattern Examples
- `**` - Match all URLs
- `**/blog/**` - Match URLs containing `/blog/`
- `**/docs/*` - Match direct children of `/docs/`
- `**/*.html` - Match URLs ending with `.html`
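The Actor uses picomatch for this matching. To show the include/exclude semantics without pulling in that dependency, here is a self-contained sketch with a hand-rolled glob-to-regex converter (an approximation of picomatch's behavior, not its implementation):

```typescript
// Convert a simple glob to a RegExp: `**` crosses path separators,
// a single `*` stops at them. Approximates picomatch for these examples.
function globToRegExp(glob: string): RegExp {
    const escaped = glob
        .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
        .replace(/\*\*/g, '\u0000')            // placeholder for "match anything"
        .replace(/\*/g, '[^/]*')               // single * stops at path separators
        .replace(/\u0000/g, '.*');             // ** crosses path separators
    return new RegExp(`^${escaped}$`);
}

// A URL is crawled when it matches any include pattern and no exclude pattern.
function shouldCrawl(url: string, include: string[], exclude: string[]): boolean {
    const hits = (patterns: string[]) => patterns.some((p) => globToRegExp(p).test(url));
    return hits(include) && !hits(exclude);
}
```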
Output
The Actor produces two outputs:
1. llms.txt File (Key-Value Store)
The generated llms.txt file is stored in the default Key-Value Store and can be downloaded directly:
https://api.apify.com/v2/key-value-stores/{storeId}/records/llms.txt
2. Crawl Results (Dataset)
The Dataset contains a single item with crawl statistics:
```json
{
  "llmsTxtUrl": "https://api.apify.com/v2/key-value-stores/{storeId}/records/llms.txt",
  "statistics": {
    "totalDiscovered": 150,
    "totalAfterFiltering": 120,
    "successCount": 118,
    "errorCount": 2,
    "robotsSkippedCount": 0,
    "limitSkippedCount": 0,
    "startedAt": "2024-01-15T10:00:00.000Z",
    "finishedAt": "2024-01-15T10:01:30.000Z",
    "durationMs": 90000
  },
  "errors": [
    { "url": "https://example.com/broken", "message": "404 Not Found" }
  ]
}
```
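If you post-process these results, a derived metric such as the success rate is straightforward to compute. This helper is hypothetical; only the field names come from the statistics object above:

```typescript
// Subset of the statistics object produced by the Actor.
interface CrawlStatistics {
    totalAfterFiltering: number;
    successCount: number;
    errorCount: number;
    durationMs: number;
}

// Fraction of filtered URLs that were crawled successfully.
function successRate(stats: CrawlStatistics): number {
    if (stats.totalAfterFiltering === 0) return 0;
    return stats.successCount / stats.totalAfterFiltering;
}
```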
Local Development
Prerequisites
- Node.js 18+
- npm
Setup
```bash
# Install dependencies
npm install

# Create input file
mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json << 'EOF'
{
  "sitemapUrl": "https://crawlee.dev/sitemap.xml",
  "maxConcurrency": 5,
  "maxRequestsPerCrawl": 50
}
EOF

# Run locally
npm run start:dev
```
Available Scripts
| Command | Description |
|---|---|
| `npm run start:dev` | Run Actor locally with tsx |
| `npm run build` | Compile TypeScript |
| `npm run lint` | Run ESLint |
| `npm run lint:fix` | Fix ESLint issues |
| `npm test` | Run tests |
Project Structure
```
.actor/
  actor.json                   # Actor configuration
  input_schema.json            # Input validation schema
  output_schema.json           # Output schema definition
  dataset_schema.json          # Dataset structure
  key_value_store_schema.json  # KV store structure
src/
  main.ts                      # Entry point and orchestration
  types.ts                     # TypeScript interfaces
  services/
    crawler.ts                 # CheerioCrawler configuration
    sitemap.ts                 # Sitemap loading utilities
    url-filter.ts              # Glob-based URL filtering
    llms-txt.ts                # llms.txt generation
  utils/
    constants.ts               # Default values and config
storage/                       # Local storage (dev only)
```
Deploy to Apify
Using Git
- Push your code to a Git repository
- Go to Apify Console
- Click "Link Git Repository"
- Select your repository
Using CLI
```bash
# Login to Apify
apify login

# Deploy
apify push
```
Technologies
- Apify SDK - Actor lifecycle and storage
- Crawlee - Web scraping framework
- CheerioCrawler - Fast HTML crawler
- picomatch - Glob pattern matching
License
Apache-2.0