Pricing

$4.99/month + usage

XML Sitemap Validator

XML sitemap validator that crawls every URL in your sitemap and flags broken links, redirect chains, and structural errors — so SEO teams can audit sitemap health in seconds.

Developer

ZeroBreak

Last modified: 4 months ago

XML Sitemap Validator — Find Broken Links, Redirects & Errors in Any Sitemap

XML Sitemap Validator is an Apify actor that fetches any XML sitemap, checks the HTTP status of every listed URL, and produces a detailed per-URL report. It works like the validation tools at xml-sitemaps.com or seoptimer.com's sitemap checker, but is fully automated and exportable. Point it at a sitemap URL and get back a structured dataset showing which pages are accessible, which are broken (404), which redirect, and how quickly each one responds.

Whether you're running an SEO audit on a large e-commerce site, validating a sitemap before a site migration, or monitoring URL health on a weekly schedule, this actor handles it all — including sitemap index files with nested child sitemaps.

Use Cases

SEO auditing: Automatically detect broken links and redirect chains that harm your search rankings before Google finds them
Pre-launch validation: Crawl your sitemap after a redesign or CMS migration to confirm every URL returns 200 OK
Sitemap index support: Validate large sites like BBC or Shopify that split their sitemaps across dozens of child sitemap files
Response time monitoring: Flag slow-responding pages (high responseTimeMs) that may affect Core Web Vitals
Redirect chain detection: Identify sitemap URLs that still point to old addresses that have since been permanently moved
Scheduled health checks: Run on a cron trigger and pipe results to Google Sheets or Slack to monitor sitemap health over time

Input

Parameter	Type	Default	Description
`sitemapUrl`	string	—	Required. URL of the XML sitemap to validate. Supports standard sitemaps and sitemap index files.
`sitemapUrls`	array	`[]`	Additional sitemap URLs to validate in the same run.
`checkUrls`	boolean	`true`	Fetch each listed URL to verify HTTP status. Disable to validate XML structure only.
`followRedirects`	boolean	`true`	Follow HTTP redirects and record the final destination URL.
`concurrency`	integer	`10`	Number of URLs to check in parallel. Higher values are faster but may trigger rate limiting.
`maxUrls`	integer	`100`	Maximum number of URLs to process per run. Set to `0` for no limit.
`timeoutSecs`	integer	`300`	Total actor runtime limit in seconds.
`requestTimeoutSecs`	integer	`30`	Per-URL request timeout in seconds. URLs exceeding this are flagged as timeouts.
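
As an illustration of how the parameters above fit together, the following Python sketch merges caller overrides onto the documented defaults before a run is started. The helper name `build_input` and its validation rules (absolute http/https URL, positive timeouts and concurrency) are assumptions made for this example; the actor's own input validation may differ.

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "sitemapUrls": [],
    "checkUrls": True,
    "followRedirects": True,
    "concurrency": 10,
    "maxUrls": 100,
    "timeoutSecs": 300,
    "requestTimeoutSecs": 30,
}


def build_input(sitemap_url: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    # Assumption: the required sitemapUrl must be an absolute http(s) URL.
    if not sitemap_url.startswith(("http://", "https://")):
        raise ValueError("sitemapUrl must be an absolute http(s) URL")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    merged = {"sitemapUrl": sitemap_url, **DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    merged["sitemapUrls"] = list(merged["sitemapUrls"])
    # Assumption: these three values only make sense when positive.
    for key in ("concurrency", "timeoutSecs", "requestTimeoutSecs"):
        if merged[key] < 1:
            raise ValueError(f"{key} must be at least 1")
    # The table documents 0 as "no limit"; negative values are rejected here.
    if merged["maxUrls"] < 0:
        raise ValueError("maxUrls must be 0 (no limit) or a positive integer")
    return merged
```

The returned dictionary has the same shape as the example inputs in this section and can be serialized to JSON as-is.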

Example Input — Validate a Single Sitemap

{
    "sitemapUrl": "https://www.shopify.com/sitemap.xml",
    "checkUrls": true,
    "concurrency": 15,
    "maxUrls": 200,
    "requestTimeoutSecs": 20
}

Example Input — Validate Multiple Sitemaps at Once

{
    "sitemapUrl": "https://www.bbc.com/sitemap.xml",
    "sitemapUrls": [
        "https://techcrunch.com/news-sitemap.xml",
        "https://www.smashingmagazine.com/sitemap_index.xml"
    ],
    "checkUrls": true,
    "followRedirects": true,
    "concurrency": 10,
    "maxUrls": 500
}

Example Input — Structure-Only Validation (No URL Requests)

{
    "sitemapUrl": "https://www.theverge.com/sitemap.xml",
    "checkUrls": false
}

What Data Does This Actor Extract?

The actor stores one record per URL found in the sitemap. Each entry contains:

{
    "sitemapUrl": "https://www.shopify.com/sitemap.xml",
    "url": "https://www.shopify.com/blog/what-is-shopify",
    "lastmod": "2024-11-15",
    "changefreq": "weekly",
    "priority": 0.8,
    "httpStatus": 200,
    "isAccessible": true,
    "finalUrl": "https://www.shopify.com/blog/what-is-shopify",
    "isRedirected": false,
    "responseTimeMs": 312,
    "isValidUrl": true,
    "issue": "",
    "checkedAt": "2025-03-01T10:22:05.412Z"
}

Example — Broken Link Detected

{
    "sitemapUrl": "https://www.shopify.com/sitemap.xml",
    "url": "https://www.shopify.com/blog/old-post-removed",
    "lastmod": "2022-06-01",
    "changefreq": "monthly",
    "priority": 0.5,
    "httpStatus": 404,
    "isAccessible": false,
    "finalUrl": "https://www.shopify.com/blog/old-post-removed",
    "isRedirected": false,
    "responseTimeMs": 198,
    "isValidUrl": true,
    "issue": "Broken link — page returned 404 Not Found",
    "checkedAt": "2025-03-01T10:22:11.093Z"
}

Example — Redirect Detected

{
    "sitemapUrl": "https://www.bbc.com/sitemap.xml",
    "url": "http://www.bbc.com/news/technology",
    "lastmod": "2024-12-01",
    "changefreq": "hourly",
    "priority": 0.9,
    "httpStatus": 301,
    "isAccessible": false,
    "finalUrl": "https://www.bbc.com/news/technology",
    "isRedirected": true,
    "responseTimeMs": 145,
    "isValidUrl": true,
    "issue": "Redirect — HTTP 301 to a different URL",
    "checkedAt": "2025-03-01T10:22:08.774Z"
}

Field	Type	Description
`sitemapUrl`	string	Source sitemap the URL was found in
`url`	string	Page URL as declared in the sitemap
`lastmod`	string	Last modified date from the sitemap
`changefreq`	string	Crawl frequency hint (daily, weekly, monthly, etc.)
`priority`	number	Sitemap priority value between 0.0 and 1.0
`httpStatus`	integer	HTTP status code returned (200, 301, 404, 500, etc.)
`isAccessible`	boolean	`true` if the URL returned a 2xx response
`finalUrl`	string	Destination URL after following redirects
`isRedirected`	boolean	`true` if the request was redirected
`responseTimeMs`	integer	Server response time in milliseconds
`isValidUrl`	boolean	`true` if the URL is a well-formed absolute URL
`issue`	string	Human-readable description of any detected problem
`checkedAt`	string	ISO 8601 timestamp of when the URL was checked
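
Once the dataset is exported, the fields above are enough to build a quick health summary. The Python sketch below groups rows by outcome and reports the slowest response. The function name `summarize` and the bucket names are hypothetical choices for this example and are not part of the actor's output.

```python
from collections import Counter
from typing import Any


def summarize(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Group exported dataset rows by outcome using only documented fields."""
    buckets = Counter()
    for row in records:
        status = row.get("httpStatus")
        if not row.get("isValidUrl", True):
            buckets["malformed"] += 1
        elif row.get("isAccessible"):
            buckets["ok"] += 1
        elif row.get("isRedirected"):
            buckets["redirect"] += 1
        elif isinstance(status, int) and 400 <= status < 500:
            buckets["client_error"] += 1
        elif isinstance(status, int) and status >= 500:
            buckets["server_error"] += 1
        else:
            # Timeouts or rows without a status code end up here.
            buckets["other"] += 1
    times = [
        row["responseTimeMs"]
        for row in records
        if isinstance(row.get("responseTimeMs"), int)
    ]
    return {"counts": dict(buckets), "slowest_ms": max(times, default=None)}
```

Feeding it the three example records shown earlier would yield one row each in the `ok`, `client_error`, and `redirect` buckets.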

How It Works

Fetch the sitemap — The actor downloads the XML sitemap from the provided URL, handling both <urlset> (standard sitemap) and <sitemapindex> (index file with child sitemaps) formats
Parse all URLs — Every <loc> entry is extracted along with optional metadata: <lastmod>, <changefreq>, and <priority>
Recursively expand sitemap indexes — If the root sitemap is an index file, child sitemaps are fetched and parsed up to 3 levels deep, as seen on large sites like BBC and Shopify
Validate URLs — Each URL is checked for correct format (absolute http/https URL)
Check HTTP status — When checkUrls is enabled, the actor sends a HEAD request (falling back to GET for servers that reject HEAD) to each URL and records the status code, final URL, and response time
Report issues — Broken links (404), server errors (5xx), timeouts, redirects, and malformed URLs are flagged with a plain-English issue description
Push results — Each URL is stored as a separate dataset row for easy filtering, sorting, and export
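
The fetch, parse, and expand steps above can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the actor's source code: `collect_entries` and its `fetch` callback are hypothetical names, and the reading of "3 levels deep" as three nested index hops is a guess at the intended behaviour.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_entries(
    sitemap_url: str,
    fetch: Callable[[str], str],
    depth: int = 0,
    max_depth: int = 3,
) -> list[dict[str, Optional[str]]]:
    """Return one entry per <url>, expanding <sitemapindex> files recursively.

    `fetch` is a caller-supplied function that returns the XML for a URL,
    which keeps this sketch independent of any particular HTTP client.
    """
    root = ET.fromstring(fetch(sitemap_url))

    if root.tag == f"{NS}sitemapindex":
        if depth >= max_depth:
            return []  # assumed cut-off: stop expanding nested indexes
        entries = []
        for loc in root.iterfind(f"{NS}sitemap/{NS}loc"):
            child_url = (loc.text or "").strip()
            if child_url:
                entries.extend(
                    collect_entries(child_url, fetch, depth + 1, max_depth)
                )
        return entries

    # Standard <urlset>: read <loc> plus the optional metadata elements.
    entries = []
    for node in root.iterfind(f"{NS}url"):
        entry: dict[str, Optional[str]] = {"sitemapUrl": sitemap_url}
        for field in ("loc", "lastmod", "changefreq", "priority"):
            text = node.findtext(f"{NS}{field}")
            key = "url" if field == "loc" else field
            entry[key] = text.strip() if text else None
        entries.append(entry)
    return entries
```

Passing the fetcher in as a parameter makes the recursion easy to exercise against in-memory XML before pointing it at a live site.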

Integrations

Connect XML Sitemap Validator with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.

For example, run the actor on a weekly schedule, pipe broken links directly into a Google Sheet, and send a Slack notification whenever new 404 errors are found — fully automated sitemap monitoring without writing a single line of glue code.

FAQ

Can this actor validate sitemap index files (nested sitemaps)?
Yes. If the provided sitemap URL points to a <sitemapindex> document, the actor automatically fetches and validates all child sitemaps listed inside it, up to 3 levels deep. This covers large sites like BBC, Shopify, and TechCrunch that split their sitemaps across many files.

What is the difference between isAccessible and httpStatus?
isAccessible is a boolean convenience field: it is true only when httpStatus is in the 200–299 range. httpStatus gives you the exact HTTP code so you can distinguish between a 301 permanent redirect and a 302 temporary redirect, or a 404 Not Found and a 410 Gone.
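
The relationship between the two fields can be expressed in a few lines of Python. This is a sketch of the rule as described in the answer above; the function names are hypothetical, and the reason phrases come from Python's `http.HTTPStatus` table rather than from the actor.

```python
from http import HTTPStatus


def is_accessible(http_status: int) -> bool:
    """Mirror the documented rule: true only for 2xx responses."""
    return 200 <= http_status <= 299


def status_label(http_status: int) -> str:
    """Attach the standard reason phrase so similar codes are easy to tell apart."""
    try:
        phrase = HTTPStatus(http_status).phrase
    except ValueError:
        phrase = "Unknown"  # non-standard code not in the stdlib table
    return f"{http_status} {phrase}"
```

For example, `status_label(301)` and `status_label(302)` produce different labels even though `is_accessible` is false for both.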

How many URLs can the actor check in a single run?
The maxUrls input caps the number of URLs processed per run (default 100, maximum 10,000). For very large sitemaps, increase maxUrls and consider raising timeoutSecs to give the actor enough time to complete.

Why does the actor use HEAD requests instead of GET requests?
HEAD requests are faster and cheaper because they retrieve HTTP headers (including status code and redirect location) without downloading the full page body. The actor automatically falls back to GET if a server returns 405 Method Not Allowed for HEAD, which some servers do.
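
A minimal Python sketch of the HEAD-then-GET pattern described above, using only the standard library. The names `send` and `check_url` and the extra `method` key are hypothetical and exist only for this example. Note that `urllib` follows redirects on its own, so the status returned here is that of the final response, which may not match how the actor records redirects.

```python
import time
import urllib.error
import urllib.request
from typing import Any, Callable


def send(method: str, url: str, timeout: float = 30.0) -> int:
    """Issue one request with the standard library and return its status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx; the code is still useful


def check_url(url: str, sender: Callable[[str, str], int] = send) -> dict[str, Any]:
    """Try HEAD first and retry with GET only when the server answers 405."""
    started = time.monotonic()
    method = "HEAD"
    status = sender(method, url)
    if status == 405:  # Method Not Allowed: this server rejects HEAD
        method = "GET"
        status = sender(method, url)
    elapsed_ms = round((time.monotonic() - started) * 1000)
    return {
        "url": url,
        "httpStatus": status,
        "method": method,  # hypothetical field, not part of the actor's output
        "responseTimeMs": elapsed_ms,
    }
```

The `sender` parameter lets the fallback logic be tested with a stub instead of real network traffic.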

Can I use this for sitemap validation before a website migration?
Absolutely. Run the actor against your current sitemap before migration, export the results to CSV or Google Sheets, then run it again after migration and compare to ensure all URLs still return 200 OK and no new broken links were introduced.
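
The before-and-after comparison can be automated once both runs are exported. The Python sketch below assumes each export is a list of records with the documented `url`, `isAccessible`, and `httpStatus` fields; `compare_runs` is a hypothetical helper written for this example.

```python
from typing import Any


def compare_runs(
    before: list[dict[str, Any]], after: list[dict[str, Any]]
) -> dict[str, list]:
    """List URLs that were accessible before the migration but are not now."""
    after_by_url = {row["url"]: row for row in after}
    regressions = []  # (url, new status) pairs that stopped returning 2xx
    missing = []      # URLs that no longer appear in the new sitemap
    for row in before:
        if not row.get("isAccessible"):
            continue  # already broken before the migration, not a regression
        current = after_by_url.get(row["url"])
        if current is None:
            missing.append(row["url"])
        elif not current.get("isAccessible"):
            regressions.append((row["url"], current.get("httpStatus")))
    return {"regressions": regressions, "missing": missing}
```

An empty result for both keys suggests that every previously healthy URL survived the migration.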
