Sitemap Finder & Checker Tool
Pricing
$4.99/month + usage
Find, validate, and audit XML sitemaps for any website. Deep-checks accessibility, XML validity, response time, file size, and encoding, and assigns a health score.
Developer
ZeroBreak
The Sitemap Finder & Checker Tool is an advanced Apify actor that automatically discovers, validates, and audits XML sitemaps for any website. Whether you are running an SEO audit, monitoring sitemap health across multiple domains, or building a web crawling pipeline, this tool gives you a complete sitemap analysis in one run.
It goes beyond simple detection: it deep-checks each sitemap for accessibility, valid XML structure, correct content type, response speed, file size, encoding, and redirect behavior, then assigns a health score (0–100) so you can spot issues at a glance.
Why Use This Tool?
XML sitemaps are critical for search engine optimization (SEO). A broken, misconfigured, or missing sitemap can prevent Google, Bing, and other search engines from crawling and indexing your pages. This tool helps you:
- Find hidden sitemaps that are not listed in `robots.txt`
- Validate XML structure to ensure crawlers can parse your sitemap correctly
- Detect misconfigurations like wrong content types, missing UTF-8 encoding, or unnecessary redirects
- Measure performance with response time and file size metrics
- Audit at scale — check hundreds of websites in a single bulk run
- Get actionable health scores to prioritize which sitemaps need fixing
Features
- Discovers sitemaps from `robots.txt` and 14+ common fallback paths (including WordPress, news, image, video sitemaps)
- Classifies sitemap type: `url_sitemap`, `sitemap_index`, `news_sitemap`, `image_sitemap`, `video_sitemap`, `text_sitemap`
- Validates XML structure and UTF-8 encoding declaration
- Checks HTTP status, content type header, and XML content type
- Counts `<url>` entries and `<sitemap>` sub-sitemap entries separately
- Measures response time in milliseconds
- Reports file size in human-readable format (B / KB / MB)
- Detects redirects and reports the final resolved URL
- Reports `Last-Modified` header for freshness analysis
- Calculates a 0–100 health score per sitemap
- Outputs a flat list — one row per sitemap for easy export to CSV, Google Sheets, or databases
- Supports single URL or bulk URL input
- Async and lightweight — processes multiple sites concurrently
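The discovery step listed above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the `FALLBACK_PATHS` list is a hypothetical subset (the full set of 14+ paths is not documented here), and the function names are invented for the example.

```python
from urllib.parse import urljoin

# Hypothetical subset of fallback paths; the actor's real list is not documented here.
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml", "/news-sitemap.xml"]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return URLs declared on `Sitemap:` lines, in order, without duplicates."""
    found: list[str] = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URL containing '#').
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found


def candidate_sitemaps(base_url: str, robots_txt: str) -> list[tuple[str, bool]]:
    """Pair each candidate URL with a flag: was it declared in robots.txt?"""
    declared = sitemaps_from_robots(robots_txt)
    candidates = [(url, True) for url in declared]
    for path in FALLBACK_PATHS:
        url = urljoin(base_url, path)
        if url not in declared:
            candidates.append((url, False))
    return candidates
```

Each candidate would then be fetched and checked; the boolean corresponds to the `found_in_robots_txt` output field.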
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | No* | A single website URL to audit |
| `urls` | array | No* | Multiple website URLs (one per line) for bulk audit |
| `timeout` | integer | No | Request timeout per HTTP call in seconds (default: 30) |
*At least one of `url` or `urls` must be provided.
URLs are normalized automatically: inputs such as `example.com`, `www.example.com`, or `http://example.com` all work.
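As an illustration of what such normalization might look like, here is a small sketch. The rules are assumptions (default to HTTPS when no scheme is given, keep an explicit scheme, drop the path); the actor's real behavior may differ, and `normalize_website` is a hypothetical name.

```python
from urllib.parse import urlsplit


def normalize_website(raw: str) -> str:
    """Turn a bare domain or full URL into scheme://host form (assumed rules)."""
    raw = raw.strip()
    if "://" not in raw:
        # Assumption: default to HTTPS when the input has no scheme.
        raw = "https://" + raw
    parts = urlsplit(raw)
    # Keep only scheme and host; any path, query, or fragment is discarded.
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}"
```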
Example Input
{
  "url": "example.com",
  "timeout": 30
}
Output
Each discovered sitemap produces one row in the output dataset — a flat list structure that is easy to filter, sort, and export.
| Field | Type | Description |
|---|---|---|
| `source_website` | string | The normalized website URL that was audited |
| `sitemap_url` | string | The discovered sitemap URL |
| `health_score` | number | Overall health score from 0 to 100 |
| `is_accessible` | boolean | Whether the sitemap returned HTTP 200 |
| `http_status` | number | The HTTP status code returned by the server |
| `is_valid_xml` | boolean | Whether the content is a valid XML sitemap |
| `sitemap_type` | string | Type: `url_sitemap`, `sitemap_index`, `news_sitemap`, etc. |
| `content_type` | string | The `Content-Type` header from the server response |
| `is_xml_content_type` | boolean | Whether `Content-Type` contains "xml" |
| `has_utf8_encoding` | boolean | Whether the XML declaration specifies UTF-8 encoding |
| `url_count` | number | Number of `<url>` entries found |
| `sitemap_count` | number | Number of `<sitemap>` sub-sitemap entries found |
| `total_entries` | number | Total entries (`url_count` + `sitemap_count`) |
| `file_size` | string | Human-readable file size (e.g., "45.2 KB") |
| `response_time_ms` | number | Server response time in milliseconds |
| `last_modified` | string | `Last-Modified` header value (if provided by server) |
| `is_redirected` | boolean | Whether the sitemap URL was redirected |
| `final_url` | string | The final resolved URL after redirects |
| `found_in_robots_txt` | boolean | Whether this sitemap was declared in `robots.txt` |
| `robots_txt_has_sitemap` | boolean | Whether the site's `robots.txt` contains any sitemap entries |
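To make the XML-related fields concrete, the sketch below shows one way `is_valid_xml`, `sitemap_type`, `has_utf8_encoding`, and the entry counts could be derived from a response body using Python's standard library. It is an assumption about the approach, not the actor's implementation: `inspect_sitemap` is a hypothetical helper, and detecting news, image, and video sitemaps by the namespace of their extension tags is a guess at the classification rule.

```python
import xml.etree.ElementTree as ET

# Namespace fragments of the Google sitemap extensions (assumed classification rule).
EXTENSION_TYPES = {
    "sitemap-news": "news_sitemap",
    "sitemap-image": "image_sitemap",
    "sitemap-video": "video_sitemap",
}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def inspect_sitemap(body: bytes) -> dict:
    result = {"is_valid_xml": False, "sitemap_type": None, "has_utf8_encoding": False,
              "url_count": 0, "sitemap_count": 0, "total_entries": 0}
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return result  # not well-formed XML
    root_name = local_name(root.tag)
    if root_name not in ("urlset", "sitemapindex"):
        return result  # well-formed XML, but not a sitemap document
    children = [local_name(child.tag) for child in root]
    result["url_count"] = children.count("url")
    result["sitemap_count"] = children.count("sitemap")
    result["total_entries"] = result["url_count"] + result["sitemap_count"]
    result["is_valid_xml"] = True
    # Look for encoding="utf-8" inside the XML declaration only.
    head = body[:200].decode("ascii", errors="ignore").lower().lstrip()
    declaration = head.split("?>", 1)[0] if head.startswith("<?xml") else ""
    result["has_utf8_encoding"] = ('encoding="utf-8"' in declaration
                                   or "encoding='utf-8'" in declaration)
    sitemap_type = "sitemap_index" if root_name == "sitemapindex" else "url_sitemap"
    if sitemap_type == "url_sitemap":
        all_tags = " ".join(element.tag for element in root.iter())
        for fragment, name in EXTENSION_TYPES.items():
            if fragment in all_tags:
                sitemap_type = name
                break
    result["sitemap_type"] = sitemap_type
    return result
```

Plain-text sitemaps (`text_sitemap`) are not XML and would need a separate check, which this sketch leaves out.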
Example Output
{
  "source_website": "https://example.com",
  "sitemap_url": "https://example.com/sitemap.xml",
  "health_score": 95,
  "is_accessible": true,
  "http_status": 200,
  "is_valid_xml": true,
  "sitemap_type": "sitemap_index",
  "content_type": "application/xml; charset=UTF-8",
  "is_xml_content_type": true,
  "has_utf8_encoding": true,
  "url_count": 0,
  "sitemap_count": 12,
  "total_entries": 12,
  "file_size": "3.4 KB",
  "response_time_ms": 187,
  "last_modified": "Mon, 17 Feb 2025 10:00:00 GMT",
  "is_redirected": false,
  "final_url": "https://example.com/sitemap.xml",
  "found_in_robots_txt": true,
  "robots_txt_has_sitemap": true
}
If no sitemaps are found, a single row is returned with sitemap_url set to "None found" and health_score set to 0.
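When post-processing the dataset yourself, the placeholder row is easy to filter out before export. The following sketch assumes the rows have already been loaded as a list of dictionaries; `rows_to_csv` is a hypothetical helper name, not part of the actor.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize dataset rows to CSV text, skipping 'None found' placeholder rows."""
    real_rows = [row for row in rows if row.get("sitemap_url") != "None found"]
    if not real_rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the first row; unexpected extra keys are ignored.
    writer = csv.DictWriter(buffer, fieldnames=list(real_rows[0].keys()),
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(real_rows)
    return buffer.getvalue()
```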
Health Score Breakdown
The health score (0–100) is calculated based on these criteria:
| Check | Points |
|---|---|
| Sitemap is accessible (HTTP 200) | +25 |
| Valid XML structure | +25 |
| Correct XML content type | +15 |
| Listed in robots.txt | +15 |
| UTF-8 encoding declared | +10 |
| Has at least 1 entry | +10 |
| Slow response (> 3s) | -10 |
| Redirected URL | -5 |
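The table translates into a simple additive calculation. The sketch below applies it to one output row; it is an approximation under stated assumptions, not the actor's code. Clamping the result to the 0–100 range and treating "> 3s" as more than 3000 ms in `response_time_ms` are assumptions, and `health_score` is a hypothetical function name. Note that the Example Output row above reports 95 although these rules alone would give it 100, so the real scoring may weigh additional factors.

```python
def health_score(row: dict) -> int:
    """Score one output row using the points from the breakdown table."""
    score = 0
    if row.get("is_accessible"):
        score += 25
    if row.get("is_valid_xml"):
        score += 25
    if row.get("is_xml_content_type"):
        score += 15
    if row.get("found_in_robots_txt"):
        score += 15
    if row.get("has_utf8_encoding"):
        score += 10
    if row.get("total_entries", 0) >= 1:
        score += 10
    if row.get("response_time_ms", 0) > 3000:  # slow response penalty
        score -= 10
    if row.get("is_redirected"):
        score -= 5
    # Assumption: keep the result inside the documented 0-100 range.
    return max(0, min(100, score))
```

For example, a sitemap that passes every check but is served through a redirect and takes four seconds to respond would score 100 - 10 - 5 = 85.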
Use Cases
- SEO Auditing — Quickly check if client websites have properly configured sitemaps before starting an SEO campaign.
- Competitor Analysis — Discover how competitor sites structure their sitemaps, what types they use, and how many pages they index.
- Site Migration Monitoring — Verify sitemaps still work correctly after a domain migration or CMS change.
- Bulk Website Monitoring — Feed hundreds of URLs and get a CSV-ready report of sitemap health across all properties.
- Web Crawling Pipelines — Use the output to feed sitemap URLs into downstream crawlers or scrapers.
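For the crawling-pipeline use case, a downstream step might pick only the sitemaps worth fetching. The sketch below is one possible filter over the output rows; the `min_score` threshold of 70 is an arbitrary illustrative value, and `crawlable_sitemaps` is a hypothetical helper.

```python
def crawlable_sitemaps(rows: list[dict], min_score: int = 70) -> list[str]:
    """Return de-duplicated sitemap URLs that are accessible, valid, and healthy enough."""
    seen: set[str] = set()
    selected: list[str] = []
    for row in rows:
        if not (row.get("is_accessible") and row.get("is_valid_xml")):
            continue
        if row.get("health_score", 0) < min_score:
            continue
        # Prefer the resolved URL so redirected duplicates collapse into one entry.
        url = row.get("final_url") or row.get("sitemap_url")
        if url and url not in seen:
            seen.add(url)
            selected.append(url)
    return selected
```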