Sitemap Generator
Pricing: Pay per event
Crawl any website and automatically generate a standards-compliant XML sitemap. Discovers all internal pages, extracts last-modified dates and page titles, and lets you download sitemap.xml directly. Configurable crawl depth and URL filters. Ideal for SEO audits and site migrations.
Developer: Stas Persiianenko
Generate a complete XML sitemap for any website in minutes — no technical setup required.
Automatically crawl every page of a website and produce a standards-compliant XML sitemap ready for Google Search Console, Bing Webmaster Tools, and any other SEO platform. Each URL is also saved to a structured dataset for further analysis or integration.
What Does It Do?
Sitemap Generator crawls your website starting from the URL you provide. It follows every internal link it discovers, collects page metadata (title, content type, last-modified date), and compiles all of this into a well-formed XML sitemap that follows the Sitemaps protocol.
Key capabilities:
- Crawls up to 10,000 pages per run
- Generates XML sitemap saved directly to the key-value store for download
- Optionally includes image URLs as `<image:image>` elements for Google Image Search
- Extracts page titles and last-modified dates from HTTP headers
- Tracks crawl depth per URL
- Stays strictly within the same hostname (never follows external links)
- Normalizes URLs to avoid duplicates
- Skips non-page assets: PDFs, images, CSS, JS, zip files, and more
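The normalization and asset-skipping behavior above can be sketched in Python. This is an illustrative approximation, not the actor's actual implementation; the extension list and helper names are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative list of non-page asset extensions (not the actor's exact set).
SKIPPED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".zip"}

def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, and trim trailing
    slashes so equivalent URLs collapse to one canonical form."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def is_crawlable(url: str, start_host: str) -> bool:
    """Keep only same-host URLs that don't look like static assets."""
    parts = urlsplit(url)
    if parts.netloc.lower() != start_host:
        return False  # never follow external links
    dot = parts.path.rfind(".")
    ext = parts.path[dot:].lower() if dot != -1 else ""
    return ext not in SKIPPED_EXTENSIONS
```

With these rules, `https://Example.COM/about/#team` and `https://example.com/about` would be treated as the same page.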
Who Is It For?
SEO professionals who need a fresh sitemap to submit to search engines after a site rebuild or migration.
Developers who want to generate a sitemap for a new site quickly without installing any software.
Content managers who want an inventory of all pages on their site for auditing purposes.
Digital agencies managing multiple client websites who need to automate sitemap generation at scale.
Why Use This Actor?
- Zero setup — no software to install, no command line required
- Fast — uses lightweight HTTP crawling (no browser), processes up to 10 parallel requests
- Standards-compliant — produces valid XML following the Sitemaps 0.9 protocol
- Downloadable output — sitemap.xml is saved to the key-value store for direct download
- Structured data — every URL is also saved to a dataset for analysis in spreadsheets or databases
- Pay per use — charged per page crawled, not a flat fee. Small sites cost fractions of a cent.
- Image sitemaps — optional Google Image Sitemap extension support
Data Fields
Each URL found during the crawl produces one record in the dataset:
| Field | Type | Description |
|---|---|---|
| `url` | string | The full, normalized URL of the page |
| `title` | string / null | The HTML `<title>` of the page |
| `statusCode` | number | HTTP status code returned (e.g., 200) |
| `depth` | number | Crawl depth from the start URL (0 = start page) |
| `changefreq` | string | The change frequency set for this URL |
| `priority` | number | The priority value (start URL always 1.0, others configurable) |
| `lastmod` | string / null | Date in YYYY-MM-DD format, taken from the Last-Modified header or defaulting to today's date |
| `contentType` | string / null | MIME type of the page (e.g., text/html) |
| `images` | string[] | List of image URLs found on the page (only if includeImages is enabled) |
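Put together, a single dataset record might look like the following (values are illustrative, not taken from a real run):

```python
# Illustrative dataset record for one crawled page.
record = {
    "url": "https://example.com/about",
    "title": "About Us",
    "statusCode": 200,
    "depth": 1,                   # one link away from the start URL
    "changefreq": "weekly",
    "priority": 0.5,
    "lastmod": "2024-01-15",
    "contentType": "text/html",
    "images": [],                 # populated only when includeImages is enabled
}
```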
Pricing
This actor uses Pay Per Event billing:
| Event | Cost |
|---|---|
| Run started | $0.01 (one-time per run) |
| Per page crawled | $0.001 |
Example costs:
- 50-page site: $0.01 + 50 × $0.001 = $0.06
- 500-page site: $0.01 + 500 × $0.001 = $0.51
- 5,000-page site: $0.01 + 5,000 × $0.001 = $5.01
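The examples above follow directly from the event schedule and can be checked with a tiny helper (a hypothetical estimator, useful for budgeting before a large crawl):

```python
def run_cost(pages: int, start_fee: float = 0.01, per_page: float = 0.001) -> float:
    """Estimated cost of one run: a one-time start fee plus a per-page charge."""
    return round(start_fee + pages * per_page, 2)
```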
How to Use
Step 1: Enter your website URL
In the Website URL field, enter the full URL of the site you want to crawl, including the protocol:
https://example.com
Step 2: Set the page limit
The Max pages to crawl field controls how many pages the actor visits. The default is 100. For large sites, increase this up to 10,000.
Step 3: Configure sitemap options
- Include image URLs — Enable this if you want Google Image Search indexing support. Each page's images will be listed in the sitemap.
- Change frequency — Choose how often your pages typically change. This hint helps search engines schedule re-crawling.
- URL priority — Set the relative importance of pages (0.0–1.0). The start URL is always set to 1.0 automatically.
Step 4: Run the actor
Click Start. The actor will begin crawling immediately. You can watch progress in the logs.
Step 5: Download your sitemap
When the run completes:
- Click Storage in the run details
- Go to Key-value store
- Find `sitemap.xml` and click to download
You can also access it directly via the Apify API.
Input Configuration
```json
{
  "startUrl": "https://example.com",
  "maxPages": 100,
  "includeImages": false,
  "changefreq": "weekly",
  "priority": 0.5
}
```
Input fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `startUrl` | string | Yes | — | Full website URL to crawl |
| `maxPages` | integer | No | 100 | Maximum pages to crawl (1–10,000) |
| `includeImages` | boolean | No | false | Include image URLs in sitemap |
| `changefreq` | string | No | weekly | Change frequency for all URLs |
| `priority` | number | No | 0.5 | Default URL priority (0–1) |
changefreq options: always, hourly, daily, weekly, monthly, yearly, never
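The constraints in the table can be expressed as a small pre-flight check. This is a hypothetical helper mirroring the input schema above, not part of the actor itself:

```python
ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def validate_input(run_input: dict) -> dict:
    """Apply defaults and reject out-of-range values before starting a run."""
    if not run_input.get("startUrl"):
        raise ValueError("startUrl is required")
    changefreq = run_input.get("changefreq", "weekly")
    if changefreq not in ALLOWED_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {changefreq}")
    max_pages = int(run_input.get("maxPages", 100))
    if not 1 <= max_pages <= 10_000:
        raise ValueError("maxPages must be between 1 and 10,000")
    priority = float(run_input.get("priority", 0.5))
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0 and 1")
    return {**run_input, "changefreq": changefreq,
            "maxPages": max_pages, "priority": priority}
```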
Output
Key-value store: sitemap.xml
A complete XML sitemap following the Sitemaps 0.9 protocol:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```
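For context, a file in this shape can be assembled from crawl records with nothing but Python's standard library. This sketch is an illustration of the format, not the actor's actual implementation:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(records: list) -> str:
    """Assemble a Sitemaps 0.9 <urlset> document from crawl records."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for rec in records:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = rec["url"]
        if rec.get("lastmod"):
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = rec["lastmod"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = rec.get("changefreq", "weekly")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = str(rec.get("priority", 0.5))
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```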
Dataset
Every URL is also stored as a structured record in the dataset. You can export this to JSON, CSV, or Excel directly from the Apify platform.
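If you prefer to post-process the records yourself rather than use the platform export, flattening them to CSV is straightforward. The fetch shown in the comment uses the `apify-client` library as in the API examples below; the helper itself is a hypothetical sketch:

```python
import csv
import io

def records_to_csv(records: list) -> str:
    """Flatten dataset records into CSV text for spreadsheet import."""
    fields = ["url", "title", "statusCode", "depth",
              "changefreq", "priority", "lastmod", "contentType"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# With apify-client, records can be fetched from a finished run like:
#   records = client.dataset(run["defaultDatasetId"]).list_items().items
```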
Tips and Best Practices
Start with a small limit first. Run with maxPages: 20 to verify the crawler is finding the right pages before committing to a full crawl.
Enable image sitemaps for e-commerce and photography sites. If you sell products with images or run a portfolio site, enabling includeImages can boost image search visibility.
Set changefreq based on your publishing cadence. A news site updates daily; a brochure site might be monthly. Accurate hints help search engines prioritize.
Submit your sitemap to Search Console. After downloading sitemap.xml, go to Google Search Console → Sitemaps and paste your sitemap URL, or upload the file.
For very large sites, check the run timeout. The default 600-second timeout is usually sufficient because the actor processes pages in parallel (10 concurrent requests), but if a run with thousands of pages hits the limit, raise the timeout in the run options.
The start URL always gets priority 1.0. All other pages use the priority value you set. You do not need to manually set the homepage priority.
Robots.txt is respected by default. The underlying CheerioCrawler respects robots.txt directives, so the actor will not crawl pages blocked by robots.txt.
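The kind of robots.txt check a polite crawler performs before queueing a URL can be demonstrated with Python's standard library (this illustrates the concept; it is not CheerioCrawler's internal code):

```python
from urllib.robotparser import RobotFileParser

# Parse a minimal robots.txt that blocks the /admin/ section.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

print(rules.can_fetch("*", "https://example.com/about"))   # allowed -> True
print(rules.can_fetch("*", "https://example.com/admin/"))  # blocked -> False
```

A crawler that respects these rules simply drops any discovered URL for which `can_fetch` returns False.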
Integrations
Zapier / Make
Use the Apify integration to trigger sitemap generation on a schedule (weekly, monthly) and automatically download or email the resulting sitemap.xml file.
Google Search Console API
Combine this actor with the Search Console API to automatically submit your freshly generated sitemap after every crawl.
Apify Scheduler
Create a scheduled run in Apify to regenerate your sitemap automatically — for example, every Monday morning.
API Usage
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/sitemap-generator').call({
    startUrl: 'https://example.com',
    maxPages: 100,
    includeImages: false,
    changefreq: 'weekly',
    priority: 0.5,
});

console.log('Run finished:', run.status);
console.log('Dataset ID:', run.defaultDatasetId);

// Download the sitemap
const store = client.keyValueStore(run.defaultKeyValueStoreId);
const sitemap = await store.getRecord('sitemap.xml');
console.log(sitemap.value);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/sitemap-generator').call(run_input={
    'startUrl': 'https://example.com',
    'maxPages': 100,
    'includeImages': False,
    'changefreq': 'weekly',
    'priority': 0.5,
})

print(f"Run finished: {run['status']}")
print(f"Dataset ID: {run['defaultDatasetId']}")

# Download the sitemap
store = client.key_value_store(run['defaultKeyValueStoreId'])
record = store.get_record('sitemap.xml')
print(record['value'])
```
cURL
```shell
# Start a run
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~sitemap-generator/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrl": "https://example.com", "maxPages": 100, "includeImages": false, "changefreq": "weekly", "priority": 0.5}'

# Get run status (replace RUN_ID)
curl "https://api.apify.com/v2/actor-runs/RUN_ID?token=YOUR_API_TOKEN"

# Download sitemap.xml (replace STORE_ID)
curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/sitemap.xml?token=YOUR_API_TOKEN"
```
Use with Claude AI (MCP)
This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
Setup for Claude Code
```shell
claude mcp add --transport http apify "https://mcp.apify.com"
```
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```
Example prompts
- "Generate an XML sitemap for example.com crawling up to 200 pages and give me the download link."
- "Create a sitemap for my e-commerce site at shop.example.com with image URLs included for Google Image Search."
- "Crawl https://docs.example.com and generate a sitemap with weekly change frequency for all pages found."
Learn more in the Apify MCP documentation.
Legality
Web crawling for sitemap generation is generally considered legitimate use and is commonly practiced by SEO tools, search engines, and webmasters. The actor:
- Respects `robots.txt` directives by default
- Crawls only the site you specify (stays on-domain)
- Uses standard HTTP requests without bypassing login walls or paywalls
- Does not store any personal data
Always ensure you have permission to crawl a website you do not own. Some websites' terms of service restrict automated crawling.
FAQ
Q: Why doesn't the actor find all pages on my site?
A: The actor discovers pages by following links. If your site uses JavaScript-rendered navigation (single-page apps), links may not appear in the HTML. Consider using a JavaScript-enabled crawling approach for SPAs.
Q: Can I crawl multiple websites in one run?
A: No, each run is scoped to one starting URL and hostname. Run the actor multiple times for multiple sites.
Q: The sitemap only has a few URLs — my site has many more.
A: Increase maxPages and run again. The actor stops when it reaches the page limit, not when the site is fully crawled.
Q: Does it follow redirects?
A: Yes. The actor follows HTTP redirects automatically and records the final URL in the sitemap.
Q: Can I set different priority values per page?
A: Not currently — the actor applies a uniform priority to all pages except the start URL (always 1.0). Priority per-page is a planned future feature.
Q: Is the XML output valid?
A: Yes. The output follows the Sitemaps 0.9 protocol and includes the correct XML namespace declarations.
Related Tools
- Broken Link Checker — Find 404 errors and broken links across your site
- Canonical URL Checker — Audit canonical tags and detect duplicate content issues
- Robots & Sitemap Analyzer — Analyze your robots.txt and existing sitemap configuration
- Accessibility Checker — Check pages for WCAG accessibility compliance