Sitemap Validator
Validate XML sitemaps and sitemap indexes. Check listed URLs for HTTP status, redirects, final URL, response time, malformed URLs, and sitemap metadata.
Pricing
$0.90 / 1,000 checked URLs
Developer
Maxime Dupré
Maintained by Community
🗺️ Sitemap validator for URL health checks
Sitemap Validator checks public XML sitemaps, sitemap indexes, website roots, bare domains, and robots.txt files. Add a target such as apify.com/sitemap.xml, and the Actor parses the sitemap, follows child sitemap indexes within your depth limit, checks the listed URLs, and saves one row per checked URL.
Use this sitemap validator when you need a fast technical SEO check before a migration, release, crawl-budget review, client audit, or broken-link cleanup. Each row keeps the page URL, source sitemap URL, parent sitemap index, HTTP status, final URL after redirects, redirect count, response time, sitemap metadata, and a plain issue category when something needs attention.
For a quick first run, keep the prefilled Apify sitemap target and the default Maximum checked URLs value. You will get a focused dataset you can inspect in Apify, export as JSON, CSV, Excel, XML, RSS, or HTML, or consume through the Apify API, schedules, webhooks, and integrations.
✅ What this Actor does
- Accepts direct sitemap URLs, sitemap-index URLs, website roots, bare domains, and robots.txt URLs.
- Discovers sitemap files from robots.txt and common sitemap paths when you submit a website root.
- Parses XML sitemap URL sets, XML sitemap indexes, plain-text sitemaps, and gzipped sitemap responses.
- Follows nested sitemap indexes up to your Maximum index depth.
- Checks sitemap-listed URLs for HTTP status, redirects, final URL, response time, malformed URLs, and network issues.
- Preserves sitemap-native lastmod, changefreq, and priority values when the source sitemap provides them.
- Saves one dataset row per checked sitemap-listed URL.
- Logs empty or unreachable targets without saving placeholder rows.
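The parsing step in the list above can be sketched with Python's standard library. This is a minimal illustration, not the Actor's own code: the function name `parse_sitemap` and the returned dictionary shape are assumptions made for this example, and the sketch handles only XML text that has already been fetched and decoded (no gzip, plain-text sitemaps, or network access).

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _to_float(text):
    """Return text as a float, or None when it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def parse_sitemap(xml_text: str) -> dict:
    """Classify a sitemap document and extract its entries.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # Tags look like "{namespace}urlset"; keep only the local name.
    kind = root.tag.rsplit("}", 1)[-1]

    if kind == "sitemapindex":
        children = [
            loc.text.strip()
            for loc in root.findall("sm:sitemap/sm:loc", NS)
            if loc.text
        ]
        return {"kind": "index", "children": children, "urls": []}

    if kind == "urlset":
        urls = []
        for url in root.findall("sm:url", NS):
            loc = url.findtext("sm:loc", default=None, namespaces=NS)
            if not loc:
                continue  # skip entries without a location
            urls.append({
                "pageUrl": loc.strip(),
                # Optional metadata becomes None when the sitemap omits it.
                "sitemapLastmod": url.findtext("sm:lastmod", default=None, namespaces=NS) or None,
                "changefreq": url.findtext("sm:changefreq", default=None, namespaces=NS) or None,
                "priority": _to_float(url.findtext("sm:priority", default=None, namespaces=NS)),
            })
        return {"kind": "urlset", "children": [], "urls": urls}

    raise ValueError(f"Not a sitemap document: <{kind}>")
```

Distinguishing the root element first keeps index expansion and URL checking as separate steps, which matches how the depth limit and the checked-URL limit are described as separate controls.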
This Actor validates URLs that are already listed in public sitemap files. It does not crawl arbitrary internal links, scrape page content, generate sitemaps, submit sitemaps to search engines, or check whether search engines have indexed a URL.
📊 Data you get
Each dataset item represents one checked URL from a parsed sitemap. Rows include:
- pageUrl - URL listed in the sitemap.
- host - host parsed from the listed URL.
- sourceSitemapUrl - sitemap file that declared the URL.
- parentSitemapIndexUrl - sitemap index that linked to the source sitemap, or null.
- indexDepth - depth of the source sitemap below the submitted or discovered target.
- sitemapLastmod, changefreq, and priority - sitemap metadata when present.
- urlStatus - ok, redirect, broken, timeout, or malformed.
- httpStatus - observed HTTP status, or null when no response was available.
- finalUrl - final URL after redirects, or null when unavailable.
- redirectCount - number of redirects followed.
- responseTimeMs - elapsed time for the URL check.
- issue - issue category and message, or null for healthy URLs.
🚀 How to run it
- Open the Input tab.
- Add one or more sitemap, website, domain, or robots.txt targets.
- Keep Maximum checked URLs small for your first run, then raise it when the output looks right.
- Use Maximum index depth to control nested sitemap-index expansion. Use 0 to check only the submitted sitemap target.
- Run the Actor and open the dataset.
No cookies, login credentials, source API key, or custom proxy settings are needed from you. Targets must expose public sitemap assets over http or https.
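If you would rather start a run from code than from the Input tab, the request can be sketched as below. This is a hedged example, not an official snippet: it assumes the Apify API's synchronous `run-sync-get-dataset-items` endpoint, and the Actor ID and token passed in are placeholders you must replace with your own values. The helper names `build_run_request` and `fetch_rows` are made up for this sketch.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an Actor and returns its dataset items.

    actor_id is a placeholder such as "<username>~<actor-name>";
    this document does not state the real Actor ID.
    """
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_rows(request: urllib.request.Request) -> list:
    """Send the request and decode the JSON array of dataset rows."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `fetch_rows(build_run_request(actor_id, token, {"targets": ["https://apify.com/sitemap.xml"], "maxCheckedUrls": 50, "maxIndexDepth": 2}))` should return one dictionary per checked URL. Synchronous runs are time-limited by the platform, so larger runs are probably better started asynchronously and read from the dataset afterwards.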
✍️ Input example
{
  "targets": [
    "https://apify.com/sitemap.xml",
    "https://apify.com",
    "example.com/robots.txt"
  ],
  "maxCheckedUrls": 550,
  "maxIndexDepth": 2
}
Sitemap or website targets is the only required input. You can mix known sitemap URLs, sitemap indexes, website roots, bare domains, and robots.txt URLs in the same run.
Maximum checked URLs caps how many sitemap-listed URLs are checked across all targets. Large sitemap indexes can contain thousands of URLs, so this limit keeps first runs predictable.
Maximum index depth controls how many sitemap-index levels are followed. A value of 2 covers common sitemap index structures. A value of 0 keeps validation to the submitted or directly discovered sitemap.
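To make the interaction of the two limits concrete, here is a small sketch that walks an in-memory sitemap tree. It is an assumption about how the limits combine, not the Actor's actual traversal: the function `collect_urls` and the `sitemaps` dictionary layout are invented for illustration, and the submitted target is treated as depth 0.

```python
from collections import deque


def collect_urls(target, sitemaps, max_checked_urls, max_index_depth):
    """Collect sitemap-listed URLs breadth-first, bounded by both limits.

    sitemaps maps a sitemap URL to {"children": [...], "urls": [...]}
    (a hypothetical stand-in for fetched and parsed sitemap files).
    """
    rows = []
    seen = set()
    # Each queue entry: (sitemap URL, parent index URL, depth below target).
    queue = deque([(target, None, 0)])

    while queue and len(rows) < max_checked_urls:
        sitemap_url, parent, depth = queue.popleft()
        if sitemap_url in seen or sitemap_url not in sitemaps:
            continue  # already visited, or unreachable: skip without a row
        seen.add(sitemap_url)
        doc = sitemaps[sitemap_url]

        for page_url in doc["urls"]:
            if len(rows) >= max_checked_urls:
                break  # the global cap applies across all sitemaps
            rows.append({
                "pageUrl": page_url,
                "sourceSitemapUrl": sitemap_url,
                "parentSitemapIndexUrl": parent,
                "indexDepth": depth,
            })

        # Only expand child sitemaps while under the depth limit.
        if depth < max_index_depth:
            for child in doc["children"]:
                queue.append((child, sitemap_url, depth + 1))

    return rows
```

Under these assumptions, a depth of 0 on a sitemap index yields no rows because its children are never opened, while a depth of 2 reaches sitemaps nested two indexes deep.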
📦 Output example
{
  "pageUrl": "https://apify.com/actors",
  "host": "apify.com",
  "sourceSitemapUrl": "https://apify.com/sitemap/pages.xml",
  "parentSitemapIndexUrl": "https://apify.com/sitemap.xml",
  "indexDepth": 1,
  "sitemapLastmod": "2026-06-20T15:31:00.000Z",
  "changefreq": "weekly",
  "priority": 0.8,
  "urlStatus": "redirect",
  "httpStatus": 301,
  "finalUrl": "https://apify.com/store",
  "redirectCount": 1,
  "responseTimeMs": 184,
  "issue": {
    "category": "redirect",
    "message": "Sitemap URL redirects to a different final URL."
  }
}
Healthy URLs use urlStatus: "ok" and issue: null. Redirects, broken responses, timeouts, network issues, and malformed sitemap-listed URLs are still saved as validation results because they are the rows you need to review.
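Once the dataset is exported as JSON, a short script can separate healthy rows from the ones that need review. The sketch below relies only on the urlStatus and issue fields documented above; the function name `summarize` is an assumption for this example.

```python
from collections import Counter


def summarize(rows):
    """Count rows per urlStatus and collect the rows that carry an issue."""
    status_counts = Counter(row["urlStatus"] for row in rows)
    # Healthy rows have issue set to null, which loads as None in Python.
    needs_review = [row for row in rows if row.get("issue") is not None]
    return status_counts, needs_review
```

Loading an exported file with `json.load` and passing the resulting list to `summarize` gives a quick per-status tally plus the subset of rows to fix first.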
💳 Pricing
This Actor uses pay-per-event pricing. You are charged once for each sitemap-listed URL checked and saved to the dataset. The pricing event is called Checked URL.
Failed target discovery, unreachable sitemap files, empty sitemaps, and invalid submitted targets are logged and skipped instead of being saved as charged output rows.
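As a worked example of the Checked URL charge, the listed price of $0.90 per 1,000 checked URLs can be turned into a per-run estimate. This sketch covers only that event price; the helper name `estimate_cost` is an assumption, and any other charges that may apply to your account are outside its scope.

```python
from decimal import Decimal

# Listed price: $0.90 per 1,000 checked URLs.
PRICE_PER_1000 = Decimal("0.90")


def estimate_cost(checked_urls: int) -> Decimal:
    """Estimate the Checked URL charge in USD for a number of saved rows."""
    return (PRICE_PER_1000 * checked_urls / 1000).quantize(Decimal("0.0001"))
```

For the input example above, a run that reaches its cap of 550 checked URLs comes to 550 × $0.90 / 1,000 = $0.495, and targets that are skipped add nothing because no rows are saved for them.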
⚠️ Limits and caveats
- Sitemap files must be publicly reachable over http or https.
- The Actor checks URLs listed in sitemaps. It does not crawl pages that are not listed in a sitemap.
- Sitemap metadata is only as complete as the source file. Missing lastmod, changefreq, or priority values are returned as null.
- Very large sitemap indexes can contain many child sitemaps and URLs. Use Maximum checked URLs and Maximum index depth to keep runs bounded.
- HTTP status and response time are observed at run time and can change as the source site changes.
- The Actor reports URL health signals. It does not prove that Google, Bing, or another search engine has indexed the URL.
❓ FAQ
🔐 Do I need login credentials or an API key?
No. Sitemap Validator reads public sitemap assets and checks public URLs. You do not need to provide cookies, login credentials, a source API key, or custom proxy settings.
🧭 Can it crawl my whole website?
No. It checks URLs found in sitemap files. If you need a rendered page crawl and link map, use Website URL Crawler.
🧩 Can it validate sitemap indexes?
Yes. The Actor parses sitemap indexes and follows child sitemaps up to your Maximum index depth.
📉 Why did my run save no rows?
The submitted target may not expose a public sitemap, the sitemap may be empty, or the target may be unreachable at run time. Those cases are logged and skipped instead of creating placeholder dataset rows.
📝 Changelog
- 0.0: Initial release.
🆘 Support
For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡
🔗 Other actors
- Sitemap Sniffer - Find sitemap files and export sitemap URL inventory before validation.
- Website URL Crawler - Crawl rendered public pages and export a website link map.
- Webpage Text Extractor - Extract clean text or Markdown from public web pages.
- SSL Certificate Checker - Check public HTTPS certificates, expiry, trust, and TLS details.
- Robots.txt Generator - Generate deployable robots.txt files with sitemap directives.
Made with ❤️ by Maxime Dupré