Xml Sitemap Validator

Pricing

$4.99/month + usage

XML sitemap validator that crawls every URL in your sitemap and flags broken links, redirect chains, and structural errors — so SEO teams can audit sitemap health in seconds.

Developer

ZeroBreak

Maintained by Community

XML Sitemap Validator — Find Broken Links, Redirects & Errors in Any Sitemap

XML Sitemap Validator is an Apify actor that fetches any XML sitemap, checks the HTTP status of every listed URL, and produces a detailed per-URL report. It covers the same ground as the validation tools at xml-sitemaps.com or seoptimer.com's sitemap checker, but runs fully automated and produces exportable results. Point it at a sitemap URL and get back a structured dataset showing which pages are accessible, which are broken (for example, 404), which redirect, and how long each one takes to respond.

Whether you're running an SEO audit on a large e-commerce site, validating a sitemap before a site migration, or monitoring URL health on a weekly schedule, this actor handles it all — including sitemap index files with nested child sitemaps.

Use Cases

  • SEO auditing: Automatically detect broken links and redirect chains that harm your search rankings before Google finds them
  • Pre-launch validation: Crawl your sitemap after a redesign or CMS migration to confirm every URL returns 200 OK
  • Sitemap index support: Validate large sites like BBC or Shopify that split their sitemaps across dozens of child sitemap files
  • Response time monitoring: Flag slow-responding pages (high responseTimeMs) that may affect Core Web Vitals
  • Redirect chain detection: Identify sitemap URLs that still point to old addresses which have since moved permanently
  • Scheduled health checks: Run on a cron trigger and pipe results to Google Sheets or Slack to monitor sitemap health over time

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sitemapUrl | string | (none) | Required. URL of the XML sitemap to validate. Supports standard sitemaps and sitemap index files. |
| sitemapUrls | array | [] | Additional sitemap URLs to validate in the same run. |
| checkUrls | boolean | true | Fetch each listed URL to verify HTTP status. Disable to validate XML structure only. |
| followRedirects | boolean | true | Follow HTTP redirects and record the final destination URL. |
| concurrency | integer | 10 | Number of URLs to check in parallel. Higher values are faster but may trigger rate limiting. |
| maxUrls | integer | 100 | Maximum number of URLs to process per run. Set to 0 for no limit. |
| timeoutSecs | integer | 300 | Total actor runtime limit in seconds. |
| requestTimeoutSecs | integer | 30 | Per-URL request timeout in seconds. URLs exceeding this are flagged as timeouts. |

Example Input — Validate a Single Sitemap

{
  "sitemapUrl": "https://www.shopify.com/sitemap.xml",
  "checkUrls": true,
  "concurrency": 15,
  "maxUrls": 200,
  "requestTimeoutSecs": 20
}

Example Input — Validate Multiple Sitemaps at Once

{
  "sitemapUrl": "https://www.bbc.com/sitemap.xml",
  "sitemapUrls": [
    "https://techcrunch.com/news-sitemap.xml",
    "https://www.smashingmagazine.com/sitemap_index.xml"
  ],
  "checkUrls": true,
  "followRedirects": true,
  "concurrency": 10,
  "maxUrls": 500
}

Example Input — Structure-Only Validation (No URL Requests)

{
  "sitemapUrl": "https://www.theverge.com/sitemap.xml",
  "checkUrls": false
}
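
The defaults from the parameter table can also be applied programmatically before starting a run. The following is a minimal Python sketch; `build_input` and `DEFAULTS` are hypothetical helper names that are not part of the actor, and the values simply mirror the table above.

```python
import json

# Defaults copied from the input parameter table. This helper is illustrative
# and not part of the actor itself.
DEFAULTS = {
    "sitemapUrls": [],
    "checkUrls": True,
    "followRedirects": True,
    "concurrency": 10,
    "maxUrls": 100,
    "timeoutSecs": 300,
    "requestTimeoutSecs": 30,
}


def build_input(sitemap_url, **overrides):
    """Return an input object for one run, rejecting unknown parameter names."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"Unknown input parameters: {unknown}")
    if not sitemap_url.startswith(("http://", "https://")):
        raise ValueError("sitemapUrl must be an absolute http(s) URL")
    run_input = {"sitemapUrl": sitemap_url, **DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    run_input["sitemapUrls"] = list(run_input["sitemapUrls"])
    return run_input


if __name__ == "__main__":
    example = build_input(
        "https://www.shopify.com/sitemap.xml",
        concurrency=15,
        maxUrls=200,
        requestTimeoutSecs=20,
    )
    print(json.dumps(example, indent=2))
```

The resulting dictionary serializes to the same JSON shape as the examples above, so it can be pasted into the actor's input editor or passed along as run input.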

What Data Does This Actor Extract?

The actor stores one record per URL found in the sitemap. Each entry contains:

{
  "sitemapUrl": "https://www.shopify.com/sitemap.xml",
  "url": "https://www.shopify.com/blog/what-is-shopify",
  "lastmod": "2024-11-15",
  "changefreq": "weekly",
  "priority": 0.8,
  "httpStatus": 200,
  "isAccessible": true,
  "finalUrl": "https://www.shopify.com/blog/what-is-shopify",
  "isRedirected": false,
  "responseTimeMs": 312,
  "isValidUrl": true,
  "issue": "",
  "checkedAt": "2025-03-01T10:22:05.412Z"
}

{
  "sitemapUrl": "https://www.shopify.com/sitemap.xml",
  "url": "https://www.shopify.com/blog/old-post-removed",
  "lastmod": "2022-06-01",
  "changefreq": "monthly",
  "priority": 0.5,
  "httpStatus": 404,
  "isAccessible": false,
  "finalUrl": "https://www.shopify.com/blog/old-post-removed",
  "isRedirected": false,
  "responseTimeMs": 198,
  "isValidUrl": true,
  "issue": "Broken link — page returned 404 Not Found",
  "checkedAt": "2025-03-01T10:22:11.093Z"
}

Example — Redirect Detected

{
  "sitemapUrl": "https://www.bbc.com/sitemap.xml",
  "url": "http://www.bbc.com/news/technology",
  "lastmod": "2024-12-01",
  "changefreq": "hourly",
  "priority": 0.9,
  "httpStatus": 301,
  "isAccessible": false,
  "finalUrl": "https://www.bbc.com/news/technology",
  "isRedirected": true,
  "responseTimeMs": 145,
  "isValidUrl": true,
  "issue": "Redirect — HTTP 301 to a different URL",
  "checkedAt": "2025-03-01T10:22:08.774Z"
}

| Field | Type | Description |
| --- | --- | --- |
| sitemapUrl | string | Source sitemap the URL was found in |
| url | string | Page URL as declared in the sitemap |
| lastmod | string | Last modified date from the sitemap |
| changefreq | string | Crawl frequency hint (daily, weekly, monthly, etc.) |
| priority | number | Sitemap priority value between 0.0 and 1.0 |
| httpStatus | integer | HTTP status code returned (200, 301, 404, 500, etc.) |
| isAccessible | boolean | true if the URL returned a 2xx response |
| finalUrl | string | Destination URL after following redirects |
| isRedirected | boolean | true if the request was redirected |
| responseTimeMs | integer | Server response time in milliseconds |
| isValidUrl | boolean | true if the URL is a well-formed absolute URL |
| issue | string | Human-readable description of any detected problem |
| checkedAt | string | ISO 8601 timestamp of when the URL was checked |
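
Because every record carries the same fields, an exported dataset is easy to summarize. The sketch below is one possible approach in Python, assuming the records have been loaded as a list of dictionaries; `summarize` and the `slow_ms` threshold are hypothetical and not something the actor provides.

```python
from collections import Counter


def summarize(records, slow_ms=1000):
    """Count records per category using only the documented output fields.

    slow_ms is an arbitrary illustrative threshold, not an actor setting.
    """
    counts = Counter(ok=0, redirected=0, broken=0, malformed=0, slow=0)
    for record in records:
        if not record.get("isValidUrl", True):
            counts["malformed"] += 1
        elif record.get("isRedirected"):
            counts["redirected"] += 1
        elif record.get("isAccessible"):
            counts["ok"] += 1
        else:
            counts["broken"] += 1
        # Slowness is tracked independently of the status category.
        if (record.get("responseTimeMs") or 0) > slow_ms:
            counts["slow"] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"httpStatus": 200, "isAccessible": True, "isRedirected": False,
         "isValidUrl": True, "responseTimeMs": 312},
        {"httpStatus": 404, "isAccessible": False, "isRedirected": False,
         "isValidUrl": True, "responseTimeMs": 198},
    ]
    print(summarize(sample, slow_ms=300))
```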

How It Works

  1. Fetch the sitemap: The actor downloads the XML sitemap from the provided URL, handling both <urlset> (standard sitemap) and <sitemapindex> (index file with child sitemaps) formats
  2. Parse all URLs: Every <loc> entry is extracted along with optional metadata: <lastmod>, <changefreq>, and <priority>
  3. Recursively expand sitemap indexes: If the root sitemap is an index file, child sitemaps are fetched and parsed up to 3 levels deep. Large sites like BBC and Shopify organize their sitemaps this way
  4. Validate URLs: Each URL is checked for correct format (absolute http/https URL)
  5. Check HTTP status: When checkUrls is enabled, the actor sends a HEAD request (falling back to GET for servers that reject HEAD) to each URL and records the status code, final URL, and response time
  6. Report issues: Broken links (404), server errors (5xx), timeouts, redirects, and malformed URLs are flagged with a plain-English issue description
  7. Push results: Each URL is stored as a separate dataset row for easy filtering, sorting, and export
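
Steps 1 to 4 can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the actor's source code: the function names are hypothetical, `fetch` is any caller-supplied function that returns the XML text for a URL, and treating the root sitemap as level 1 is one possible reading of "3 levels deep".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # sitemap protocol namespace
MAX_DEPTH = 3  # assumed reading of the documented limit: root sitemap = level 1


def is_valid_url(url):
    """True for absolute http/https URLs with a host part."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_sitemap(xml_text):
    """Return ("index", [child sitemap URLs]) or ("urlset", [entry dicts])."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        locs = [(node.findtext(NS + "loc") or "").strip()
                for node in root.iter(NS + "sitemap")]
        return "index", [loc for loc in locs if loc]
    entries = []
    for node in root.iter(NS + "url"):
        loc = (node.findtext(NS + "loc") or "").strip()
        priority = node.findtext(NS + "priority")
        entries.append({
            "url": loc,
            "lastmod": node.findtext(NS + "lastmod"),
            "changefreq": node.findtext(NS + "changefreq"),
            "priority": float(priority) if priority else None,
            "isValidUrl": is_valid_url(loc),
        })
    return "urlset", entries


def collect_urls(sitemap_url, fetch, depth=1):
    """Expand index files recursively and tag each entry with its source sitemap."""
    kind, items = parse_sitemap(fetch(sitemap_url))
    if kind == "urlset":
        return [dict(entry, sitemapUrl=sitemap_url) for entry in items]
    if depth >= MAX_DEPTH:
        return []  # Do not descend below the assumed nesting limit.
    rows = []
    for child_url in items:
        rows.extend(collect_urls(child_url, fetch, depth + 1))
    return rows
```

Passing `fetch` in as a parameter keeps the parsing logic independent of any particular HTTP client, so it can be exercised with canned XML strings.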

Integrations

Connect XML Sitemap Validator with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.

For example, run the actor on a weekly schedule, pipe broken links directly into a Google Sheet, and send a Slack notification whenever new 404 errors are found — fully automated sitemap monitoring without writing a single line of glue code.

FAQ

Can this actor validate sitemap index files (nested sitemaps)?
Yes. If the provided sitemap URL points to a <sitemapindex> document, the actor automatically fetches and validates all child sitemaps listed inside it, up to 3 levels deep. This covers large sites like BBC, Shopify, and TechCrunch that split their sitemaps across many files.

What is the difference between isAccessible and httpStatus?
isAccessible is a boolean convenience field: it is true only when httpStatus is in the 200–299 range. httpStatus gives you the exact HTTP code so you can distinguish between a 301 permanent redirect and a 302 temporary redirect, or between a 404 Not Found and a 410 Gone.
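
The relationship between the two fields can be expressed in a few lines of Python. This is a sketch only; `describe_status` is a hypothetical helper, and the reason phrases come from Python's standard `http.HTTPStatus` enum rather than from the actor.

```python
from http import HTTPStatus


def describe_status(code):
    """Return (is_accessible, label) for an HTTP status code."""
    is_accessible = 200 <= code <= 299
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # Non-standard codes have no registered phrase.
    return is_accessible, f"{code} {phrase}"


if __name__ == "__main__":
    for code in (200, 301, 302, 404, 410):
        print(describe_status(code))
```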

How many URLs can the actor check in a single run?
The maxUrls input caps the number of URLs processed per run (default 100, maximum 10,000). For very large sitemaps, increase maxUrls and consider raising the timeoutSecs to give the actor enough time to complete.
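
A rough upper bound can help when choosing timeoutSecs. The sketch below assumes that URLs are checked by a fixed pool of `concurrency` workers, that each URL needs a single request bounded by requestTimeoutSecs, and that the time spent downloading the sitemaps themselves is negligible. None of these assumptions are confirmed by the actor's documentation, and `worst_case_check_secs` is a hypothetical name.

```python
import math


def worst_case_check_secs(url_count, concurrency=10, request_timeout_secs=30):
    """Upper bound on URL-checking time if every request runs to its timeout."""
    if url_count < 0 or concurrency < 1:
        raise ValueError("url_count must be >= 0 and concurrency >= 1")
    # Each worker handles at most ceil(url_count / concurrency) URLs in turn.
    return math.ceil(url_count / concurrency) * request_timeout_secs


if __name__ == "__main__":
    print(worst_case_check_secs(100))    # documented defaults
    print(worst_case_check_secs(10000))  # documented maximum for maxUrls
```

Under these assumptions the documented defaults (100 URLs, concurrency 10, 30-second request timeout) give a bound of 300 seconds, which equals the default timeoutSecs. Real runs are normally much faster because most URLs respond well before the timeout.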

Why does the actor use HEAD requests instead of GET requests?
HEAD requests are faster and cheaper: they retrieve HTTP headers (including status code and redirect location) without downloading the full page body. Some servers do not support HEAD and answer with 405 Method Not Allowed; in that case the actor automatically falls back to GET.
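
The HEAD-then-GET pattern described in this answer can be sketched with Python's `urllib`. This is an illustration, not the actor's implementation: `check_url` and its `opener` parameter are hypothetical, the `method` key is added for clarity and is not an actor output field, and because `urllib` follows redirects automatically the recorded status is that of the final response.

```python
import time
import urllib.error
import urllib.request


def check_url(url, timeout_secs=30, opener=urllib.request.urlopen):
    """Check one URL with HEAD, retrying with GET only when HEAD returns 405."""
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        started = time.perf_counter()
        try:
            with opener(request, timeout=timeout_secs) as response:
                status, final_url = response.status, response.url
        except urllib.error.HTTPError as error:
            if method == "HEAD" and error.code == 405:
                continue  # Server rejects HEAD; repeat the check with GET.
            status, final_url = error.code, error.url
        except OSError:
            # Covers URLError, DNS failures, and socket timeouts.
            status, final_url = None, url
        elapsed_ms = round((time.perf_counter() - started) * 1000)
        return {
            "url": url,
            "method": method,  # illustrative only, not an actor field
            "httpStatus": status,
            "isAccessible": status is not None and 200 <= status <= 299,
            "finalUrl": final_url,
            "isRedirected": final_url != url,
            "responseTimeMs": elapsed_ms,
        }
```

The `opener` argument defaults to `urllib.request.urlopen` and exists so the function can be exercised with a stub instead of live network traffic.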

Can I use this for sitemap validation before a website migration?
Absolutely. Run the actor against your current sitemap before migration, export the results to CSV or Google Sheets, then run it again after migration and compare to ensure all URLs still return 200 OK and no new broken links were introduced.
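
The before-and-after comparison can also be done in code once both result sets are exported. The following Python sketch assumes each run has been loaded as a list of records with the documented `url` and `isAccessible` fields; `compare_runs` and its result keys are hypothetical names.

```python
def compare_runs(before, after):
    """Compare two result sets keyed by the documented `url` field."""
    before_by_url = {row["url"]: row for row in before}
    after_by_url = {row["url"]: row for row in after}
    # URLs that were accessible before the migration but are not afterwards.
    newly_broken = sorted(
        url
        for url, row in after_by_url.items()
        if not row["isAccessible"]
        and url in before_by_url
        and before_by_url[url]["isAccessible"]
    )
    return {
        "newlyBroken": newly_broken,
        "removedFromSitemap": sorted(before_by_url.keys() - after_by_url.keys()),
        "addedToSitemap": sorted(after_by_url.keys() - before_by_url.keys()),
    }


if __name__ == "__main__":
    before = [
        {"url": "https://example.com/a", "isAccessible": True},
        {"url": "https://example.com/b", "isAccessible": True},
    ]
    after = [
        {"url": "https://example.com/a", "isAccessible": False},
        {"url": "https://example.com/c", "isAccessible": True},
    ]
    print(compare_runs(before, after))
```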