# Sitemap Validator (`maximedupre/sitemap-validator`) Actor

Validate XML sitemaps and sitemap indexes. Check listed URLs for HTTP status, redirects, final URL, response time, malformed URLs, and sitemap metadata.

- **URL**: https://apify.com/maximedupre/sitemap-validator.md
- **Developed by:** [Maxime Dupré](https://apify.com/maximedupre) (community)
- **Categories:** Developer tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

$0.90 / 1,000 checked URLs

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### 🗺️ Sitemap validator for URL health checks

Sitemap Validator checks public XML sitemaps, sitemap indexes, website roots, bare domains, and `robots.txt` files. Add a target such as [apify.com/sitemap.xml](https://apify.com/sitemap.xml), and the Actor parses the sitemap, follows child sitemap indexes within your depth limit, checks the listed URLs, and saves one row per checked URL.

Use this sitemap validator when you need a fast technical SEO check before a migration, release, crawl-budget review, client audit, or broken-link cleanup. Each row keeps the page URL, source sitemap URL, parent sitemap index, HTTP status, final URL after redirects, redirect count, response time, sitemap metadata, and a plain issue category when something needs attention.

For a quick first run, keep the prefilled Apify sitemap target and the default `Maximum checked URLs` value. You will get a focused dataset you can inspect in Apify, export as JSON, CSV, Excel, XML, RSS, or HTML, or consume through the Apify API, schedules, webhooks, and integrations.

### ✅ What this Actor does

- Accepts direct sitemap URLs, sitemap-index URLs, website roots, bare domains, and `robots.txt` URLs.
- Discovers sitemap files from `robots.txt` and common sitemap paths when you submit a website root.
- Parses XML sitemap URL sets, XML sitemap indexes, plain-text sitemaps, and gzipped sitemap responses.
- Follows nested sitemap indexes up to your `Maximum index depth`.
- Checks sitemap-listed URLs for HTTP status, redirects, final URL, response time, malformed URLs, and network issues.
- Preserves sitemap-native `lastmod`, `changefreq`, and `priority` values when the source sitemap provides them.
- Saves one dataset row per checked sitemap-listed URL.
- Logs empty or unreachable targets without saving placeholder rows.
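
The parsing step in the list above can be illustrated with a minimal sketch. This is not the Actor's implementation; it only shows how a sitemap index and a URL set differ, using Python's standard `xml.etree.ElementTree` and the namespace from the sitemaps.org protocol. The function name `parse_sitemap` and the returned dictionary shape are assumptions chosen to mirror the dataset fields described in this README.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return ("index", [child sitemap URLs]) or ("urlset", [entry dicts]).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)

    # A sitemap index lists child sitemaps instead of pages.
    if root.tag.endswith("sitemapindex"):
        locs = [
            el.text.strip()
            for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
            if el.text
        ]
        return "index", locs

    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):

        def text(tag):
            # Missing or empty optional elements become None.
            el = url_el.find(f"sm:{tag}", SITEMAP_NS)
            return el.text.strip() if el is not None and el.text else None

        priority = text("priority")
        entries.append(
            {
                "pageUrl": text("loc"),
                "sitemapLastmod": text("lastmod"),
                "changefreq": text("changefreq"),
                "priority": float(priority) if priority is not None else None,
            }
        )
    return "urlset", entries
```

Plain-text and gzipped sitemaps, which the Actor also accepts, are outside the scope of this sketch.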

This Actor validates URLs that are already listed in public sitemap files. It does not crawl arbitrary internal links, scrape page content, generate sitemaps, submit sitemaps to search engines, or check whether search engines have indexed a URL.

### 📊 Data you get

Each dataset item represents one checked URL from a parsed sitemap. Rows include:

- `pageUrl` - URL listed in the sitemap.
- `host` - host parsed from the listed URL.
- `sourceSitemapUrl` - sitemap file that declared the URL.
- `parentSitemapIndexUrl` - sitemap index that linked to the source sitemap, or `null`.
- `indexDepth` - depth of the source sitemap below the submitted or discovered target.
- `sitemapLastmod`, `changefreq`, and `priority` - sitemap metadata when present.
- `urlStatus` - `ok`, `redirect`, `broken`, `timeout`, or `malformed`.
- `httpStatus` - observed HTTP status, or `null` when no response was available.
- `finalUrl` - final URL after redirects, or `null` when unavailable.
- `redirectCount` - number of redirects followed.
- `responseTimeMs` - elapsed time for the URL check.
- `issue` - issue category and message, or `null` for healthy URLs.
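
Once the rows are downloaded, the fields above are enough for a quick health summary. The following is a minimal sketch that assumes `rows` is a list of dataset items shaped as described in this section; the `summarize` helper is hypothetical and not part of any Apify client.

```python
from collections import Counter


def summarize(rows):
    """Count rows per urlStatus and list the URLs that carry an issue."""
    counts = Counter(row["urlStatus"] for row in rows)
    needs_review = [row["pageUrl"] for row in rows if row["issue"] is not None]
    return dict(counts), needs_review


# Example with hand-written rows (not real Actor output).
rows = [
    {"pageUrl": "https://example.com/", "urlStatus": "ok", "issue": None},
    {
        "pageUrl": "https://example.com/old",
        "urlStatus": "redirect",
        "issue": {"category": "redirect", "message": "Redirects elsewhere."},
    },
]
counts, needs_review = summarize(rows)
```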

### 🚀 How to run it

1. Open the Input tab.
2. Add one or more sitemap, website, domain, or `robots.txt` targets.
3. Keep `Maximum checked URLs` small for your first run, then raise it when the output looks right.
4. Use `Maximum index depth` to control nested sitemap-index expansion. Use `0` to check only the submitted sitemap target.
5. Run the Actor and open the dataset.

No cookies, login credentials, source API key, or custom proxy settings are needed from you. Targets must expose public sitemap assets over `http` or `https`.

### ✍️ Input example

```json
{
	"targets": [
		"https://apify.com/sitemap.xml",
		"https://apify.com",
		"example.com/robots.txt"
	],
	"maxCheckedUrls": 550,
	"maxIndexDepth": 2
}
````

`Sitemap or website targets` is the only required input. You can mix known sitemap URLs, sitemap indexes, website roots, bare domains, and `robots.txt` URLs in the same run.

`Maximum checked URLs` caps how many sitemap-listed URLs are checked across all targets. Large sitemap indexes can contain thousands of URLs, so this limit keeps first runs predictable.

`Maximum index depth` controls how many sitemap-index levels are followed. A value of `2` covers common sitemap index structures. A value of `0` keeps validation to the submitted or directly discovered sitemap.
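
If you build the input programmatically, it can help to check it before starting a run. The sketch below applies the limits from the input schema in the OpenAPI specification later in this document (1 to 100 unique targets of 1 to 2,000 characters, `maxCheckedUrls` from 1 to 100,000, `maxIndexDepth` from 0 to 10). The `build_input` helper is a hypothetical convenience, not part of the Actor or the Apify clients.

```python
def build_input(targets, max_checked_urls=450, max_index_depth=2):
    """Validate values against the documented schema limits and return the input dict."""
    # Trim whitespace and drop duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(t.strip() for t in targets if t.strip()))

    if not 1 <= len(cleaned) <= 100:
        raise ValueError("targets must contain between 1 and 100 unique items")
    if any(len(t) > 2000 for t in cleaned):
        raise ValueError("each target must be at most 2000 characters long")

    for name, value, low, high in (
        ("maxCheckedUrls", max_checked_urls, 1, 100_000),
        ("maxIndexDepth", max_index_depth, 0, 10),
    ):
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    return {
        "targets": cleaned,
        "maxCheckedUrls": max_checked_urls,
        "maxIndexDepth": max_index_depth,
    }
```

The returned dictionary can be passed as the Actor input in either client.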

### 📦 Output example

```json
{
	"pageUrl": "https://apify.com/actors",
	"host": "apify.com",
	"sourceSitemapUrl": "https://apify.com/sitemap/pages.xml",
	"parentSitemapIndexUrl": "https://apify.com/sitemap.xml",
	"indexDepth": 1,
	"sitemapLastmod": "2026-06-20T15:31:00.000Z",
	"changefreq": "weekly",
	"priority": 0.8,
	"urlStatus": "redirect",
	"httpStatus": 301,
	"finalUrl": "https://apify.com/store",
	"redirectCount": 1,
	"responseTimeMs": 184,
	"issue": {
		"category": "redirect",
		"message": "Sitemap URL redirects to a different final URL."
	}
}
```

Healthy URLs use `urlStatus: "ok"` and `issue: null`. Redirects, broken responses, timeouts, network issues, and malformed sitemap-listed URLs are still saved as validation results because they are the rows you need to review.
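
For review in a spreadsheet, the nested `issue` object can be flattened into columns. This is a minimal sketch using only the Python standard library; the column names `issueCategory` and `issueMessage` and the `issues_to_csv` helper are assumptions made for the example.

```python
import csv
import io

COLUMNS = ["pageUrl", "urlStatus", "httpStatus", "finalUrl", "issueCategory", "issueMessage"]


def issues_to_csv(rows):
    """Return CSV text containing only the rows whose `issue` is not null."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        issue = row.get("issue")
        if issue is None:
            continue  # Healthy URL, nothing to review.
        writer.writerow(
            {
                "pageUrl": row.get("pageUrl"),
                "urlStatus": row.get("urlStatus"),
                "httpStatus": row.get("httpStatus"),
                "finalUrl": row.get("finalUrl"),
                "issueCategory": issue.get("category"),
                "issueMessage": issue.get("message"),
            }
        )
    return buffer.getvalue()
```

Apify can also export the full dataset as CSV directly; a helper like this is only useful when you want a filtered, flattened view.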

### 💳 Pricing

This Actor uses pay-per-event pricing. You are charged once for each sitemap-listed URL checked and saved to the dataset. The pricing event is called `Checked URL`.

Failed target discovery, unreachable sitemap files, empty sitemaps, and invalid submitted targets are logged and skipped instead of being saved as charged output rows.
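
Because only checked and saved URLs are charged, `Maximum checked URLs` also acts as a cost ceiling. The sketch below works that out from the price listed on this page ($0.90 per 1,000 checked URLs); the price is copied from this document and may change, and `estimate_max_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Price as listed in the Pricing section of this page; treat it as an assumption.
PRICE_PER_1000_CHECKED_URLS = Decimal("0.90")


def estimate_max_cost(max_checked_urls):
    """Upper bound in USD for a run capped at `max_checked_urls` checked URLs."""
    if max_checked_urls < 0:
        raise ValueError("max_checked_urls must not be negative")
    return PRICE_PER_1000_CHECKED_URLS * max_checked_urls / 1000
```

With the schema default of 450 checked URLs, the ceiling works out to 450 × 0.90 / 1,000 = $0.405.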

### ⚠️ Limits and caveats

- Sitemap files must be publicly reachable over `http` or `https`.
- The Actor checks URLs listed in sitemaps. It does not crawl pages that are not listed in a sitemap.
- Sitemap metadata is only as complete as the source file. Missing `lastmod`, `changefreq`, or `priority` values are returned as `null`.
- Very large sitemap indexes can contain many child sitemaps and URLs. Use `Maximum checked URLs` and `Maximum index depth` to keep runs bounded.
- HTTP status and response time are observed at run time and can change as the source site changes.
- The Actor reports URL health signals. It does not prove that Google, Bing, or another search engine has indexed the URL.

### ❓ FAQ

#### 🔐 Do I need login credentials or an API key?

No. Sitemap Validator reads public sitemap assets and checks public URLs. You do not need to provide cookies, login credentials, a source API key, or custom proxy settings.

#### 🧭 Can it crawl my whole website?

No. It checks URLs found in sitemap files. If you need a rendered page crawl and link map, use Website URL Crawler.

#### 🧩 Can it validate sitemap indexes?

Yes. The Actor parses sitemap indexes and follows child sitemaps up to your `Maximum index depth`.

#### 📉 Why did my run save no rows?

The submitted target may not expose a public sitemap, the sitemap may be empty, or the target may be unreachable at run time. Those cases are logged and skipped instead of creating placeholder dataset rows.
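
To check up front whether a site declares any sitemaps, you can inspect its `robots.txt` yourself. The sketch below extracts `Sitemap:` directives from already-downloaded `robots.txt` text. It is an illustration of the general convention, not the Actor's discovery logic, and `sitemaps_from_robots` is a hypothetical name.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared in `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; this simple approach also cuts URL fragments.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Directive names are matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

An empty result only means `robots.txt` declares nothing; a sitemap may still exist at a common path such as `/sitemap.xml`.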

### 📝 Changelog

- 0.0: Initial release.

### 🆘 Support

For issues, questions, or feature requests, [file a ticket](https://console.apify.com/actors/maximedupre~sitemap-validator/issues) and I'll fix or implement it in less than 24h 🫡

### 🔗 Other Actors

- [Sitemap Sniffer ↗](https://apify.com/maximedupre/sitemap-sniffer) - Find sitemap files and export sitemap URL inventory before validation.
- [Website URL Crawler ↗](https://apify.com/maximedupre/website-url-crawler) - Crawl rendered public pages and export a website link map.
- [Webpage Text Extractor ↗](https://apify.com/maximedupre/webpage-text-extractor) - Extract clean text or Markdown from public web pages.
- [SSL Certificate Checker ↗](https://apify.com/maximedupre/ssl-certificate-checker) - Check public HTTPS certificates, expiry, trust, and TLS details.
- [Robots.txt Generator ↗](https://apify.com/maximedupre/robots-txt-generator) - Generate deployable `robots.txt` files with sitemap directives.

**Made with ❤️ by Maxime Dupré**

# Actor input Schema

## `targets` (type: `array`):

Add the public sitemaps, sitemap indexes, websites, domains, or robots.txt URLs to validate.

## `maxCheckedUrls` (type: `integer`):

Limit how many sitemap-listed URLs are checked across all targets.

## `maxIndexDepth` (type: `integer`):

Set how many nested sitemap-index levels to follow. Use 0 to check only the submitted sitemap.

## Actor input object example

```json
{
  "targets": [
    "https://apify.com/sitemap.xml",
    "https://docs.apify.com/sitemap.xml"
  ],
  "maxCheckedUrls": 450,
  "maxIndexDepth": 2
}
```

# Actor output Schema

## `results` (type: `string`):

Open the dataset with one row for each sitemap-listed URL checked by the run.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "targets": [
        "https://apify.com/sitemap.xml",
        "https://docs.apify.com/sitemap.xml"
    ],
    "maxCheckedUrls": 450,
    "maxIndexDepth": 2
};

// Run the Actor and wait for it to finish
const run = await client.actor("maximedupre/sitemap-validator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "targets": [
        "https://apify.com/sitemap.xml",
        "https://docs.apify.com/sitemap.xml",
    ],
    "maxCheckedUrls": 450,
    "maxIndexDepth": 2,
}

# Run the Actor and wait for it to finish
run = client.actor("maximedupre/sitemap-validator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "targets": [
    "https://apify.com/sitemap.xml",
    "https://docs.apify.com/sitemap.xml"
  ],
  "maxCheckedUrls": 450,
  "maxIndexDepth": 2
}' |
apify call maximedupre/sitemap-validator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=maximedupre/sitemap-validator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Sitemap Validator",
        "description": "Validate XML sitemaps and sitemap indexes. Check listed URLs for HTTP status, redirects, final URL, response time, malformed URLs, and sitemap metadata.",
        "version": "0.0",
        "x-build-id": "Mbeiv7jzNNuqKRuOv"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/maximedupre~sitemap-validator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-maximedupre-sitemap-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/maximedupre~sitemap-validator/runs": {
            "post": {
                "operationId": "runs-sync-maximedupre-sitemap-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/maximedupre~sitemap-validator/run-sync": {
            "post": {
                "operationId": "run-sync-maximedupre-sitemap-validator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "targets"
                ],
                "properties": {
                    "targets": {
                        "title": "Sitemap or website targets",
                        "minItems": 1,
                        "maxItems": 100,
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Add the public sitemaps, sitemap indexes, websites, domains, or robots.txt URLs to validate.",
                        "items": {
                            "type": "string",
                            "minLength": 1,
                            "maxLength": 2000
                        }
                    },
                    "maxCheckedUrls": {
                        "title": "Maximum checked URLs",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Limit how many sitemap-listed URLs are checked across all targets.",
                        "default": 450
                    },
                    "maxIndexDepth": {
                        "title": "Maximum index depth",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Set how many nested sitemap-index levels to follow. Use 0 to check only the submitted sitemap.",
                        "default": 2
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
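
If you prefer not to install a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library alone. This is a minimal sketch under stated assumptions: the helper names and the 300-second timeout are illustrative, and the response is assumed to be a JSON array of dataset items, as the endpoint summary suggests.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/maximedupre~sitemap-validator/run-sync-get-dataset-items"


def build_request(token, actor_input):
    """Build the POST request; the token goes in the query string, as in the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token, actor_input, timeout=300):
    """Send the request and decode the JSON response body (performs a real, billable run)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_and_fetch("<YOUR_API_TOKEN>", {"targets": ["https://apify.com/sitemap.xml"]})` blocks until the run finishes, so for long runs the asynchronous `/runs` path may be a better fit.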
