# Sitemap Generator
Automatically crawl a website and generate an SEO-ready sitemap in XML, HTML, or TXT format. Supports crawl depth limits, URL include/exclude patterns, and optional merging with an existing sitemap.xml. Ideal for SEO audits, site migrations, and automation.
A powerful Apify Actor that automatically generates sitemaps for websites by crawling and discovering all accessible pages. Supports multiple output formats (XML, HTML, Text) and can merge with existing sitemaps.
## Features

### Automatic Page Discovery
- Intelligently crawls websites following internal links and navigation patterns
- Discovers all accessible pages automatically
- Only follows links from the same domain to prevent crawling external sites
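The Actor's source is not shown here, but the same-domain discovery described above maps naturally onto crawlee's `CheerioCrawler`. A minimal sketch, with an illustrative start URL and handler body rather than the Actor's actual code:

```ts
import { CheerioCrawler } from 'crawlee';

// Sketch: visit every reachable page on one domain and record its URL and <title>.
const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        await pushData({ url: request.loadedUrl, title: $('title').text().trim() });
        // The 'same-domain' strategy keeps the crawl on the start URL's domain;
        // links to external sites are never enqueued.
        await enqueueLinks({ strategy: 'same-domain' });
    },
});

await crawler.run(['https://example.com']);
```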
### Customizable Crawling
- Crawl Depth Control: Set maximum depth of crawling (0 = homepage only, 1 = homepage + direct links, etc.)
- URL Filtering: Include or exclude specific page types or directories using glob patterns
- Request Limits: Control the maximum number of pages to crawl
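Depth limits and glob filtering can be expressed the same way. The sketch below assumes a recent crawlee version where `enqueueLinks` accepts `globs` and `exclude`; the constants mirror the Actor's `maxCrawlDepth`, `includePatterns`, `excludePatterns`, and `maxRequestsPerCrawl` inputs but are otherwise illustrative:

```ts
import { CheerioCrawler } from 'crawlee';

const MAX_DEPTH = 3; // analogous to maxCrawlDepth

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 500, // analogous to the maxRequestsPerCrawl input
    async requestHandler({ request, enqueueLinks }) {
        const depth = (request.userData.depth as number) ?? 0;
        if (depth >= MAX_DEPTH) return; // do not enqueue anything beyond the depth limit
        await enqueueLinks({
            globs: ['https://example.com/blog/**', 'https://example.com/products/**'], // includePatterns
            exclude: ['https://example.com/admin/**', 'https://example.com/**/*.pdf'], // excludePatterns
            userData: { depth: depth + 1 }, // children are one level deeper
        });
    },
});

await crawler.run([{ url: 'https://example.com', userData: { depth: 0 } }]);
```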
### Multiple Sitemap Formats
- XML: Standard XML sitemap format compliant with Google and Bing specifications
- HTML: User-friendly HTML sitemap for website visitors
- Text: Plain text format, one URL per line
### Sitemap Merging (XML Only)
- Fetch and merge with existing sitemap.xml files
- Preserves existing URLs while adding newly discovered ones
- New crawl data takes precedence over existing sitemap metadata
### Built-in Validation
- Ensures sitemaps comply with Google and Bing specifications
- Proper priority settings based on page depth
- ISO 8601 date format for last-modified dates
- Validates XML structure and warns if exceeding 50,000 URLs (Google's limit)
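For illustration only, the kind of checks listed above can be expressed as a small helper; the function name and entry shape are assumptions, not the Actor's API:

```ts
// Hypothetical helper mirroring the validation rules described above.
const GOOGLE_URL_LIMIT = 50_000;

interface ValidatedEntry {
    url: string;
    lastModified: string; // expected to be ISO 8601
}

function validateEntries(entries: ValidatedEntry[]): string[] {
    const warnings: string[] = [];
    if (entries.length > GOOGLE_URL_LIMIT) {
        warnings.push(`Sitemap contains ${entries.length} URLs; the recommended maximum per file is ${GOOGLE_URL_LIMIT}.`);
    }
    for (const { url, lastModified } of entries) {
        // A valid ISO 8601 string parses to a real timestamp.
        if (Number.isNaN(Date.parse(lastModified))) {
            warnings.push(`Invalid lastModified for ${url}: ${lastModified}`);
        }
    }
    return warnings;
}
```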
## Input Parameters

### Required
| Field | Type | Description |
|---|---|---|
| `websiteUrl` | string | The URL of the website you want to generate a sitemap for. Example: https://example.com |
### Optional
| Field | Type | Default | Description |
|---|---|---|---|
| `sitemapUrl` | string | - | URL to an existing sitemap.xml file. Only used when the format is XML. If provided, the Actor will fetch and merge existing URLs with newly discovered ones. Ignored for HTML and Text formats. Example: https://example.com/sitemap.xml |
| `sitemapFormat` | string | `"xml"` | The file format for the generated sitemap. Options: `"xml"`, `"html"`, `"text"` |
| `maxCrawlDepth` | integer | `10` | Maximum depth of crawling. 0 = only the start URL, 1 = the start URL + all links from it, etc. Range: 0-50 |
| `includePatterns` | array | `[]` | Array of glob patterns for URLs to include. If empty, all URLs are included. Example: `["/blog/*", "/products/*"]` |
| `excludePatterns` | array | `[]` | Array of glob patterns for URLs to exclude. Example: `["/admin/*", "*.pdf", "/private/*"]` |
| `maxRequestsPerCrawl` | integer | `1000` | Maximum number of requests that can be made by this crawler. |
### Example Input

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "xml",
  "maxCrawlDepth": 3,
  "excludePatterns": ["/admin/*", "*.pdf"],
  "maxRequestsPerCrawl": 500
}
```
## Output Data

### Key-Value Store
The Actor saves the generated sitemap file to the Key-Value Store:
- XML Format: `sitemap.xml` (Content-Type: `application/xml`)
- HTML Format: `sitemap.html` (Content-Type: `text/html`)
- Text Format: `sitemap.txt` (Content-Type: `text/plain`)
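One way to download the generated file afterwards, again using `apify-client` with a placeholder Actor ID:

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Fetch the sitemap produced by the most recent run of the Actor.
const record = await client
    .actor('<ACTOR_ID>') // placeholder
    .lastRun()
    .keyValueStore()
    .getRecord('sitemap.xml'); // use 'sitemap.html' or 'sitemap.txt' for the other formats

if (record) console.log(record.value);
```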
### Dataset
The Actor also saves detailed metadata for each discovered URL to the Dataset:
| Field | Type | Description |
|---|---|---|
| `url` | string | The URL of the page |
| `title` | string | The title of the page (extracted from the `<title>` tag) |
| `lastModified` | string | ISO 8601 date when the page was last modified (crawl timestamp) |
| `priority` | string | Priority value for the sitemap (0.0 to 1.0). Calculated based on depth: homepage = 1.0, each level deeper decreases by 0.1 |
| `depth` | integer | Crawl depth of the page (0 = homepage, 1 = first level, etc.) |
### Example Dataset Entry

```json
{
  "url": "https://example.com/about",
  "title": "About Us - Example.com",
  "lastModified": "2025-12-31T12:00:00.000Z",
  "priority": "0.9",
  "depth": 1
}
```
## Sitemap Format Details

### XML Format

Standard XML sitemap compliant with the sitemaps.org protocol:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-12-31T12:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- More URLs... -->
</urlset>
```
Features:
- Valid XML structure with proper namespace
- Priority values (0.1 to 1.0) based on page depth
- ISO 8601 date format for last-modified dates
- Change frequency set to "weekly" for all URLs
- Validated against Google/Bing specifications
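As a rough sketch of how dataset-style entries translate into the XML above (not the Actor's implementation; XML escaping of `<loc>` values is omitted for brevity):

```ts
interface XmlEntry {
    url: string;
    lastModified: string;
    priority: string;
}

// Build a sitemaps.org-style <urlset> from entry objects.
// Real code should XML-escape the URL before inserting it into <loc>.
function buildXmlSitemap(entries: XmlEntry[]): string {
    const urls = entries
        .map((e) =>
            [
                '  <url>',
                `    <loc>${e.url}</loc>`,
                `    <lastmod>${e.lastModified}</lastmod>`,
                '    <changefreq>weekly</changefreq>',
                `    <priority>${e.priority}</priority>`,
                '  </url>',
            ].join('\n'),
        )
        .join('\n');
    return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}
```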
### HTML Format
User-friendly HTML sitemap with clean styling:
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Sitemap</title>
    <!-- Styling included -->
  </head>
  <body>
    <h1>Sitemap</h1>
    <p>Total pages: 150</p>
    <ul>
      <li><a href="https://example.com/">Homepage</a></li>
      <!-- More links... -->
    </ul>
  </body>
</html>
```
Features:
- Responsive design
- Clickable links with page titles
- Total page count displayed
- Clean, readable format
### Text Format
Simple plain text format:
```text
https://example.com/
https://example.com/about
https://example.com/contact
```
Features:
- One URL per line
- Simple and easy to parse
- Sorted by depth, then alphabetically
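The "sorted by depth, then alphabetically" ordering amounts to a simple two-key sort; a sketch with an assumed entry shape:

```ts
interface TextEntry {
    url: string;
    depth: number;
}

// Order URLs by crawl depth first, then alphabetically within each depth, one per line.
function toTextSitemap(entries: TextEntry[]): string {
    return [...entries]
        .sort((a, b) => a.depth - b.depth || a.url.localeCompare(b.url))
        .map((e) => e.url)
        .join('\n');
}
```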
## Sitemap Merging (XML Only)

When `sitemapFormat` is `"xml"` and `sitemapUrl` is provided:

1. The Actor crawls the website and discovers new URLs
2. Fetches the existing sitemap from the provided URL
3. Parses all URLs from the existing sitemap
4. Merges the URLs:
   - New URLs from the crawl are added with fresh metadata
   - Existing URLs that are re-discovered take the new crawl metadata (newer lastModified, updated priority)
   - Existing URLs that are not re-discovered are preserved with their original metadata
5. Generates an updated sitemap with all URLs
Note: Sitemap merging only works with direct sitemap files (not sitemap index files). If a sitemap index is detected, a warning is logged.
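The merge rule described above boils down to a map keyed by URL in which freshly crawled entries overwrite older ones; a sketch with an assumed entry shape:

```ts
interface MergeEntry {
    url: string;
    lastModified: string;
    priority: string;
}

// Keep every URL from the existing sitemap, but let new crawl data win on conflicts.
function mergeSitemaps(existing: MergeEntry[], crawled: MergeEntry[]): MergeEntry[] {
    const byUrl = new Map<string, MergeEntry>();
    for (const entry of existing) byUrl.set(entry.url, entry); // preserved if not re-discovered
    for (const entry of crawled) byUrl.set(entry.url, entry);  // overrides existing metadata
    return [...byUrl.values()];
}
```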
## Use Cases
- SEO Optimization: Generate comprehensive sitemaps to improve search engine indexing
- Website Maintenance: Automatically update sitemaps when new pages are added
- E-commerce Sites: Create sitemaps for large product catalogs
- Content Management: Keep sitemaps synchronized with website content
- Multi-format Support: Generate different formats for different needs (XML for search engines, HTML for users)
- Sitemap Updates: Merge new discoveries with existing sitemaps without losing old URLs
## Example Scenarios

### Basic Sitemap Generation

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "xml"
}
```

### Generate HTML Sitemap with Limited Depth

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "html",
  "maxCrawlDepth": 2,
  "maxRequestsPerCrawl": 100
}
```

### Update Existing Sitemap

```json
{
  "websiteUrl": "https://example.com",
  "sitemapUrl": "https://example.com/sitemap.xml",
  "sitemapFormat": "xml",
  "maxCrawlDepth": 5
}
```

### Exclude Specific Paths

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "xml",
  "excludePatterns": ["/admin/*", "/private/*", "*.pdf", "*.zip"]
}
```
## Technical Details

- Crawler: Uses `CheerioCrawler` for fast HTML parsing (typically an order of magnitude faster than browser-based crawlers, since no browser is launched)
- Domain Filtering: Automatically filters to only crawl links from the same domain
- Priority Calculation: Homepage (depth 0) = 1.0, each level deeper decreases by 0.1, minimum = 0.1
- Validation: Built-in validation ensures compliance with Google and Bing sitemap specifications
- Performance: Optimized for large websites with configurable request limits
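The priority rule stated above can be written as a one-line formula; this is illustrative, matching the description rather than quoting the Actor's code:

```ts
// 1.0 at depth 0, minus 0.1 per additional level, never below 0.1.
function priorityForDepth(depth: number): string {
    return Math.max(1.0 - 0.1 * depth, 0.1).toFixed(1);
}

// priorityForDepth(0) => '1.0', priorityForDepth(1) => '0.9', priorityForDepth(12) => '0.1'
```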
## Notes
- The Actor only follows internal links (same domain) to prevent crawling external websites
- For very large websites, consider using `maxRequestsPerCrawl` to limit the crawl scope
- XML sitemaps with more than 50,000 URLs will generate a warning (Google's recommended limit)
- Sitemap merging is only available for XML format
- The Actor respects the crawl depth setting, so deeper pages may not be discovered if depth is too low