# Sitemap Generator
Automatically crawl a website and generate an SEO-ready sitemap in XML, HTML, or TXT format. Supports crawl depth limits, URL include/exclude patterns, and optional merging with an existing sitemap.xml. Ideal for SEO audits, site migrations, and automation.
A powerful Apify Actor that automatically generates sitemaps for websites by crawling and discovering all accessible pages. Supports multiple output formats (XML, HTML, Text) and can merge with existing sitemaps.
## Features

### Automatic Page Discovery
- Intelligently crawls websites following internal links and navigation patterns
- Discovers all accessible pages automatically
- Only follows links from the same domain to prevent crawling external sites
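The Actor's source is not shown here, but the same-domain discovery described above maps naturally onto crawlee's `CheerioCrawler`. A minimal sketch, with an illustrative start URL and handler body rather than the Actor's actual code:

```ts
import { CheerioCrawler } from 'crawlee';

// Sketch: visit every reachable page on one domain and record its URL and <title>.
const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        await pushData({ url: request.loadedUrl, title: $('title').text().trim() });
        // The 'same-domain' strategy keeps the crawl on the start URL's domain;
        // links to external sites are never enqueued.
        await enqueueLinks({ strategy: 'same-domain' });
    },
});

await crawler.run(['https://example.com']);
```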
### Customizable Crawling
- Crawl Depth Control: Set maximum depth of crawling (0 = homepage only, 1 = homepage + direct links, etc.)
- URL Filtering: Include or exclude specific page types or directories using glob patterns
- Request Limits: Control the maximum number of pages to crawl
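Depth limits and glob filtering can be expressed the same way. The sketch below assumes a recent crawlee version where `enqueueLinks` accepts `globs` and `exclude`; the constants mirror the Actor's `maxCrawlDepth`, `includePatterns`, `excludePatterns`, and `maxRequestsPerCrawl` inputs but are otherwise illustrative:

```ts
import { CheerioCrawler } from 'crawlee';

const MAX_DEPTH = 3; // analogous to maxCrawlDepth

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 500, // analogous to the maxRequestsPerCrawl input
    async requestHandler({ request, enqueueLinks }) {
        const depth = (request.userData.depth as number) ?? 0;
        if (depth >= MAX_DEPTH) return; // do not enqueue anything beyond the depth limit
        await enqueueLinks({
            globs: ['https://example.com/blog/**', 'https://example.com/products/**'], // includePatterns
            exclude: ['https://example.com/admin/**', 'https://example.com/**/*.pdf'], // excludePatterns
            userData: { depth: depth + 1 }, // children are one level deeper
        });
    },
});

await crawler.run([{ url: 'https://example.com', userData: { depth: 0 } }]);
```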
### Multiple Sitemap Formats
- XML: Standard XML sitemap format compliant with Google and Bing specifications
- HTML: User-friendly HTML sitemap for website visitors
- Text: Plain text format, one URL per line
### Sitemap Merging (XML Only)
- Fetch and merge with existing sitemap.xml files
- Preserves existing URLs while adding newly discovered ones
- New crawl data takes precedence over existing sitemap metadata
### Built-in Validation
- Ensures sitemaps comply with Google and Bing specifications
- Proper priority settings based on page depth
- ISO 8601 date format for last-modified dates
- Validates XML structure and warns if exceeding 50,000 URLs (Google's limit)
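For illustration only, the kind of checks listed above can be expressed as a small helper; the function name and entry shape are assumptions, not the Actor's API:

```ts
// Hypothetical helper mirroring the validation rules described above.
const GOOGLE_URL_LIMIT = 50_000;

interface ValidatedEntry {
    url: string;
    lastModified: string; // expected to be ISO 8601
}

function validateEntries(entries: ValidatedEntry[]): string[] {
    const warnings: string[] = [];
    if (entries.length > GOOGLE_URL_LIMIT) {
        warnings.push(`Sitemap contains ${entries.length} URLs; the recommended maximum per file is ${GOOGLE_URL_LIMIT}.`);
    }
    for (const { url, lastModified } of entries) {
        // A valid ISO 8601 string parses to a real timestamp.
        if (Number.isNaN(Date.parse(lastModified))) {
            warnings.push(`Invalid lastModified for ${url}: ${lastModified}`);
        }
    }
    return warnings;
}
```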
## Input Parameters

### Required
| Field | Type | Description |
|---|---|---|
| `websiteUrl` | string | The URL of the website you want to generate a sitemap for. Example: https://example.com |
### Optional
| Field | Type | Default | Description |
|---|---|---|---|
| `sitemapUrl` | string | - | URL to an existing sitemap.xml file. Only used when the format is XML. If provided, the Actor will fetch and merge existing URLs with newly discovered ones. Ignored for HTML and Text formats. Example: https://example.com/sitemap.xml |
| `sitemapFormat` | string | `"xml"` | The file format for the generated sitemap. Options: `"xml"`, `"html"`, `"text"` |
| `maxCrawlDepth` | integer | `10` | Maximum depth of crawling. 0 = only the start URL, 1 = the start URL + all links from it, etc. Range: 0-50 |
| `includePatterns` | array | `[]` | Array of glob patterns for URLs to include. If empty, all URLs are included. Example: `["/blog/*", "/products/*"]` |
| `excludePatterns` | array | `[]` | Array of glob patterns for URLs to exclude. Example: `["/admin/*", "*.pdf", "/private/*"]` |
| `maxRequestsPerCrawl` | integer | `1000` | Maximum number of requests that can be made by this crawler. |
### Example Input

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "xml",
  "maxCrawlDepth": 3,
  "excludePatterns": ["/admin/*", "*.pdf"],
  "maxRequestsPerCrawl": 500
}
```
## Output Data

### Key-Value Store
The Actor saves the generated sitemap file to the Key-Value Store:
- XML Format: `sitemap.xml` (Content-Type: `application/xml`)
- HTML Format: `sitemap.html` (Content-Type: `text/html`)
- Text Format: `sitemap.txt` (Content-Type: `text/plain`)
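One way to download the generated file afterwards, again using `apify-client` with a placeholder Actor ID:

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Fetch the sitemap produced by the most recent run of the Actor.
const record = await client
    .actor('<ACTOR_ID>') // placeholder
    .lastRun()
    .keyValueStore()
    .getRecord('sitemap.xml'); // use 'sitemap.html' or 'sitemap.txt' for the other formats

if (record) console.log(record.value);
```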
### Dataset
The Actor also saves detailed metadata for each discovered URL to the Dataset:
| Field | Type | Description |
|---|---|---|
| `url` | string | The URL of the page |
| `title` | string | The title of the page (extracted from the `<title>` tag) |
| `lastModified` | string | ISO 8601 date when the page was last modified (crawl timestamp) |
| `priority` | string | Priority value for the sitemap (0.0 to 1.0). Calculated based on depth: homepage = 1.0, each level deeper decreases by 0.1 |
| `depth` | integer | Crawl depth of the page (0 = homepage, 1 = first level, etc.) |
### Example Dataset Entry

```json
{
  "url": "https://example.com/about",
  "title": "About Us - Example.com",
  "lastModified": "2025-12-31T12:00:00.000Z",
  "priority": "0.9",
  "depth": 1
}
```
## Sitemap Format Details

### XML Format

Standard XML sitemap compliant with the sitemaps.org protocol:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-12-31T12:00:00.000Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- More URLs... -->
</urlset>
```
Features:
- Valid XML structure with proper namespace
- Priority values (0.1 to 1.0) based on page depth
- ISO 8601 date format for last-modified dates
- Change frequency set to "weekly" for all URLs
- Validated against Google/Bing specifications
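As a rough sketch of how dataset-style entries translate into the XML above (not the Actor's implementation; XML escaping of `<loc>` values is omitted for brevity):

```ts
interface XmlEntry {
    url: string;
    lastModified: string;
    priority: string;
}

// Build a sitemaps.org-style <urlset> from entry objects.
// Real code should XML-escape the URL before inserting it into <loc>.
function buildXmlSitemap(entries: XmlEntry[]): string {
    const urls = entries
        .map((e) =>
            [
                '  <url>',
                `    <loc>${e.url}</loc>`,
                `    <lastmod>${e.lastModified}</lastmod>`,
                '    <changefreq>weekly</changefreq>',
                `    <priority>${e.priority}</priority>`,
                '  </url>',
            ].join('\n'),
        )
        .join('\n');
    return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}
```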
### HTML Format
User-friendly HTML sitemap with clean styling:
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Sitemap</title>
    <!-- Styling included -->
  </head>
  <body>
    <h1>Sitemap</h1>
    <p>Total pages: 150</p>
    <ul>
      <li><a href="https://example.com/">Homepage</a></li>
      <!-- More links... -->
    </ul>
  </body>
</html>
```
Features:
- Responsive design
- Clickable links with page titles
- Total page count displayed
- Clean, readable format
### Text Format
Simple plain text format:
```text
https://example.com/
https://example.com/about
https://example.com/contact
```
Features:
- One URL per line
- Simple and easy to parse
- Sorted by depth, then alphabetically
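The "sorted by depth, then alphabetically" ordering amounts to a simple two-key sort; a sketch with an assumed entry shape:

```ts
interface TextEntry {
    url: string;
    depth: number;
}

// Order URLs by crawl depth first, then alphabetically within each depth, one per line.
function toTextSitemap(entries: TextEntry[]): string {
    return [...entries]
        .sort((a, b) => a.depth - b.depth || a.url.localeCompare(b.url))
        .map((e) => e.url)
        .join('\n');
}
```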
## Sitemap Merging (XML Only)

When `sitemapFormat` is `"xml"` and `sitemapUrl` is provided:

1. The Actor crawls the website and discovers new URLs
2. Fetches the existing sitemap from the provided URL
3. Parses all URLs from the existing sitemap
4. Merges the URLs:
   - New URLs from the crawl are added with fresh metadata
   - Existing URLs that are re-discovered take the new crawl metadata (newer lastModified, updated priority)
   - Existing URLs that are not re-discovered are preserved with their original metadata
5. Generates an updated sitemap with all URLs
Note: Sitemap merging only works with direct sitemap files (not sitemap index files). If a sitemap index is detected, a warning is logged.
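The merge rule described above boils down to a map keyed by URL in which freshly crawled entries overwrite older ones; a sketch with an assumed entry shape:

```ts
interface MergeEntry {
    url: string;
    lastModified: string;
    priority: string;
}

// Keep every URL from the existing sitemap, but let new crawl data win on conflicts.
function mergeSitemaps(existing: MergeEntry[], crawled: MergeEntry[]): MergeEntry[] {
    const byUrl = new Map<string, MergeEntry>();
    for (const entry of existing) byUrl.set(entry.url, entry); // preserved if not re-discovered
    for (const entry of crawled) byUrl.set(entry.url, entry);  // overrides existing metadata
    return [...byUrl.values()];
}
```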
## Use Cases
- SEO Optimization: Generate comprehensive sitemaps to improve search engine indexing
- Website Maintenance: Automatically update sitemaps when new pages are added
- E-commerce Sites: Create sitemaps for large product catalogs
- Content Management: Keep sitemaps synchronized with website content
- Multi-format Support: Generate different formats for different needs (XML for search engines, HTML for users)
- Sitemap Updates: Merge new discoveries with existing sitemaps without losing old URLs
## Example Scenarios

### Basic Sitemap Generation

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "xml"
}
```

### Generate HTML Sitemap with Limited Depth

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "html",
  "maxCrawlDepth": 2,
  "maxRequestsPerCrawl": 100
}
```

### Update Existing Sitemap

```json
{
  "websiteUrl": "https://example.com",
  "sitemapUrl": "https://example.com/sitemap.xml",
  "sitemapFormat": "xml",
  "maxCrawlDepth": 5
}
```

### Exclude Specific Paths

```json
{
  "websiteUrl": "https://example.com",
  "sitemapFormat": "xml",
  "excludePatterns": ["/admin/*", "/private/*", "*.pdf", "*.zip"]
}
```
## Technical Details

- Crawler: Uses `CheerioCrawler` for fast HTML parsing (typically an order of magnitude faster than browser-based crawlers, since no browser is launched)
- Domain Filtering: Automatically filters to only crawl links from the same domain
- Priority Calculation: Homepage (depth 0) = 1.0, each level deeper decreases by 0.1, minimum = 0.1
- Validation: Built-in validation ensures compliance with Google and Bing sitemap specifications
- Performance: Optimized for large websites with configurable request limits
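The priority rule stated above can be written as a one-line formula; this is illustrative, matching the description rather than quoting the Actor's code:

```ts
// 1.0 at depth 0, minus 0.1 per additional level, never below 0.1.
function priorityForDepth(depth: number): string {
    return Math.max(1.0 - 0.1 * depth, 0.1).toFixed(1);
}

// priorityForDepth(0) => '1.0', priorityForDepth(1) => '0.9', priorityForDepth(12) => '0.1'
```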
## Notes
- The Actor only follows internal links (same domain) to prevent crawling external websites
- For very large websites, consider using `maxRequestsPerCrawl` to limit the crawl scope
- XML sitemaps with more than 50,000 URLs will generate a warning (Google's recommended limit)
- Sitemap merging is only available for XML format
- The Actor respects the crawl depth setting, so deeper pages may not be discovered if depth is too low