Sitemap Finder & Checker Tool

Pricing: $4.99/month + usage

Find, validate, and audit XML sitemaps for any website. Deep-checks accessibility, XML validity, response time, file size, and encoding, then calculates a health score.

Rating: 0.0 (0 reviews)

Developer: ZeroBreak

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 7 days ago

The Sitemap Finder & Checker Tool is an advanced Apify actor that automatically discovers, validates, and audits XML sitemaps for any website. Whether you are running an SEO audit, monitoring sitemap health across multiple domains, or building a web crawling pipeline, this tool gives you a complete sitemap analysis in one run.

It goes beyond simple detection: each sitemap is checked for accessibility, valid XML structure, an XML Content-Type header, response speed, file size, a UTF-8 encoding declaration, and redirect behavior, and is then assigned a health score (0–100) so you can spot issues at a glance.


Why Use This Tool?

XML sitemaps are critical for search engine optimization (SEO). A broken, misconfigured, or missing sitemap can prevent Google, Bing, and other search engines from crawling and indexing your pages. This tool helps you:

  • Find hidden sitemaps that are not listed in robots.txt
  • Validate XML structure to ensure crawlers can parse your sitemap correctly
  • Detect misconfigurations such as wrong content types, a missing UTF-8 encoding declaration, or unnecessary redirects
  • Measure performance with response time and file size metrics
  • Audit at scale — check hundreds of websites in a single bulk run
  • Get actionable health scores to prioritize which sitemaps need fixing

Features

  • Discovers sitemaps from robots.txt and 14+ common fallback paths (including WordPress, news, image, video sitemaps)
  • Classifies sitemap type: url_sitemap, sitemap_index, news_sitemap, image_sitemap, video_sitemap, text_sitemap
  • Validates XML structure and UTF-8 encoding declaration
  • Checks the HTTP status code and whether the Content-Type header declares an XML type
  • Counts <url> entries and <sitemap> sub-sitemap entries separately
  • Measures response time in milliseconds
  • Reports file size in human-readable format (B / KB / MB)
  • Detects redirects and reports the final resolved URL
  • Reports Last-Modified header for freshness analysis
  • Calculates a 0–100 health score per sitemap
  • Outputs a flat list — one row per sitemap for easy export to CSV, Google Sheets, or databases
  • Supports single URL or bulk URL input
  • Async and lightweight — processes multiple sites concurrently
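The discovery step described above (robots.txt first, then common fallback paths) can be sketched roughly as follows. This is a minimal illustration, not the actor's code: the fallback list is a hypothetical subset, since the actor's full list of 14+ paths is not published here, and the helper names `sitemaps_from_robots` and `candidate_sitemaps` are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical subset of fallback locations; the actor's full list of 14+ paths is not published here.
FALLBACK_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/wp-sitemap.xml",      # WordPress core sitemap (WordPress 5.5+)
    "/sitemap-news.xml",
    "/sitemap-images.xml",
    "/sitemap-video.xml",
    "/sitemap.txt",
]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return every URL declared on a 'Sitemap:' line (the directive name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_sitemaps(base_url: str, robots_txt: str) -> list[tuple[str, bool]]:
    """Merge robots.txt declarations with fallback paths, de-duplicated and in order.

    Each item is (sitemap_url, found_in_robots_txt).
    """
    seen: set[str] = set()
    candidates: list[tuple[str, bool]] = []
    declared = [(url, True) for url in sitemaps_from_robots(robots_txt)]
    fallbacks = [(urljoin(base_url, path), False) for path in FALLBACK_PATHS]
    for url, in_robots in declared + fallbacks:
        if url not in seen:
            seen.add(url)
            candidates.append((url, in_robots))
    return candidates


robots = """User-agent: *
Disallow: /admin
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # news feed
"""
for url, in_robots in candidate_sitemaps("https://example.com", robots):
    print(in_robots, url)
```

Listing robots.txt declarations first means a sitemap that is both declared and found at a fallback path keeps `found_in_robots_txt = True`; `robots_txt_has_sitemap` would simply be `bool(sitemaps_from_robots(robots))`.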

Input

Parameter | Type | Required | Description
url | string | No* | A single website URL to audit
urls | array | No* | Multiple website URLs (one per line) for bulk audit
timeout | integer | No | Request timeout per HTTP call, in seconds (default: 30)

*At least one of url or urls must be provided.

URLs are automatically normalized — partial domains like example.com, www.example.com, or http://example.com all work.
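The input rules above (merge `url` and `urls`, require at least one, normalize partial domains) can be sketched as below. This is a minimal sketch under stated assumptions: a missing scheme defaults to `https`, an explicit scheme is kept, and `www.` is preserved, because the actor's exact normalization rules are not documented. `normalize_url` and `collect_targets` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def normalize_url(raw: str) -> str:
    """Reduce 'example.com', 'www.example.com' or 'http://example.com/page' to a site root URL.

    Assumptions: a missing scheme defaults to https, an explicit scheme is kept,
    and 'www.' is preserved. The actor's exact rules are not documented.
    """
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}"


def collect_targets(run_input: dict) -> list[str]:
    """Merge 'url' and 'urls', require at least one, and de-duplicate after normalization."""
    raw = [run_input["url"]] if run_input.get("url") else []
    raw += [u for u in (run_input.get("urls") or []) if u and u.strip()]
    if not raw:
        raise ValueError("Provide at least one of 'url' or 'urls'.")
    targets: list[str] = []
    for item in raw:
        normalized = normalize_url(item)
        if normalized not in targets:
            targets.append(normalized)
    return targets


print(collect_targets({"url": "example.com", "urls": ["https://example.com/", "www.example.org"], "timeout": 30}))
# ['https://example.com', 'https://www.example.org']
```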

Example Input

{
  "url": "example.com",
  "timeout": 30
}

Output

Each discovered sitemap produces one row in the output dataset — a flat list structure that is easy to filter, sort, and export.

Field | Type | Description
source_website | string | The normalized website URL that was audited
sitemap_url | string | The discovered sitemap URL
health_score | number | Overall health score from 0 to 100
is_accessible | boolean | Whether the sitemap returned HTTP 200
http_status | number | The HTTP status code returned by the server
is_valid_xml | boolean | Whether the content is a valid XML sitemap
sitemap_type | string | Type: url_sitemap, sitemap_index, news_sitemap, etc.
content_type | string | The Content-Type header from the server response
is_xml_content_type | boolean | Whether Content-Type contains "xml"
has_utf8_encoding | boolean | Whether the XML declaration specifies UTF-8 encoding
url_count | number | Number of <url> entries found
sitemap_count | number | Number of <sitemap> sub-sitemap entries found
total_entries | number | Total entries (url_count + sitemap_count)
file_size | string | Human-readable file size (e.g., "45.2 KB")
response_time_ms | number | Server response time in milliseconds
last_modified | string | Last-Modified header value (if provided by the server)
is_redirected | boolean | Whether the sitemap URL was redirected
final_url | string | The final resolved URL after redirects
found_in_robots_txt | boolean | Whether this sitemap was declared in robots.txt
robots_txt_has_sitemap | boolean | Whether the site's robots.txt contains any sitemap entries
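Several of these fields (`is_valid_xml`, `sitemap_type`, `url_count`, `sitemap_count`, `total_entries`, `is_xml_content_type`, `has_utf8_encoding`) can be derived from the response body and Content-Type header alone. The sketch below shows one plausible way using Python's standard `xml.etree.ElementTree` and the standard sitemap and Google extension namespaces; the precedence between news, video, and image extensions and the `"unknown"` placeholder type are assumptions, not documented actor behavior.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
NEWS = "{http://www.google.com/schemas/sitemap-news/0.9}"
IMAGE = "{http://www.google.com/schemas/sitemap-image/1.1}"
VIDEO = "{http://www.google.com/schemas/sitemap-video/1.1}"


def analyze_sitemap(body: bytes, content_type: str) -> dict:
    """Derive a subset of the output fields from a fetched sitemap body (illustrative only)."""
    head = body[:200].decode("utf-8-sig", errors="replace").lstrip().lower()
    declaration = head.split("?>", 1)[0] if head.startswith("<?xml") else ""
    fields = {
        "is_xml_content_type": "xml" in content_type.lower(),
        "has_utf8_encoding": "utf-8" in declaration,
    }
    try:
        # For untrusted input, defusedxml.ElementTree.fromstring is a hardened drop-in.
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML: a plain list of http(s) URLs is treated as a text sitemap.
        text = body.decode("utf-8-sig", errors="replace")
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        is_text = bool(lines) and all(ln.startswith(("http://", "https://")) for ln in lines)
        count = len(lines) if is_text else 0
        return {**fields, "is_valid_xml": False,
                "sitemap_type": "text_sitemap" if is_text else "unknown",  # "unknown" is a placeholder
                "url_count": count, "sitemap_count": 0, "total_entries": count}

    url_count = len(root.findall(f"{SM}url"))          # direct <url> children
    sitemap_count = len(root.findall(f"{SM}sitemap"))  # direct <sitemap> children
    tags = {el.tag for el in root.iter() if isinstance(el.tag, str)}

    def uses(ns: str) -> bool:
        return any(tag.startswith(ns) for tag in tags)

    if root.tag == f"{SM}sitemapindex":
        kind = "sitemap_index"
    elif root.tag == f"{SM}urlset":
        # Heuristic precedence (assumed): news, then video, then image extensions.
        kind = ("news_sitemap" if uses(NEWS) else "video_sitemap" if uses(VIDEO)
                else "image_sitemap" if uses(IMAGE) else "url_sitemap")
    else:
        kind = "unknown"  # well-formed XML, but not a sitemap root element
    return {**fields, "is_valid_xml": kind != "unknown", "sitemap_type": kind,
            "url_count": url_count, "sitemap_count": sitemap_count,
            "total_entries": url_count + sitemap_count}
```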

Example Output

{
  "source_website": "https://example.com",
  "sitemap_url": "https://example.com/sitemap.xml",
  "health_score": 95,
  "is_accessible": true,
  "http_status": 200,
  "is_valid_xml": true,
  "sitemap_type": "sitemap_index",
  "content_type": "application/xml; charset=UTF-8",
  "is_xml_content_type": true,
  "has_utf8_encoding": true,
  "url_count": 0,
  "sitemap_count": 12,
  "total_entries": 12,
  "file_size": "3.4 KB",
  "response_time_ms": 187,
  "last_modified": "Mon, 17 Feb 2025 10:00:00 GMT",
  "is_redirected": false,
  "final_url": "https://example.com/sitemap.xml",
  "found_in_robots_txt": true,
  "robots_txt_has_sitemap": true
}

If no sitemaps are found, a single row is returned with sitemap_url set to "None found" and health_score set to 0.
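The human-readable `file_size` value and the "None found" placeholder row can be sketched as follows. Two assumptions are made here because the page does not state them: 1 KB is taken as 1024 bytes with one decimal place, and every field of the placeholder row other than `source_website`, `sitemap_url`, and `health_score` is set to `None`. The helper names are hypothetical.

```python
OUTPUT_FIELDS = [
    "source_website", "sitemap_url", "health_score", "is_accessible", "http_status",
    "is_valid_xml", "sitemap_type", "content_type", "is_xml_content_type", "has_utf8_encoding",
    "url_count", "sitemap_count", "total_entries", "file_size", "response_time_ms",
    "last_modified", "is_redirected", "final_url", "found_in_robots_txt", "robots_txt_has_sitemap",
]


def human_size(num_bytes: int) -> str:
    """Format a byte count as B / KB / MB (assumes 1 KB = 1024 bytes, one decimal place)."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    if num_bytes < 1024 ** 2:
        return f"{num_bytes / 1024:.1f} KB"
    return f"{num_bytes / 1024 ** 2:.1f} MB"


def none_found_row(source_website: str) -> dict:
    """Placeholder row for a site with no sitemap; unset fields default to None (assumption)."""
    row = dict.fromkeys(OUTPUT_FIELDS)
    row.update(source_website=source_website, sitemap_url="None found", health_score=0)
    return row


print(human_size(3482))  # 3.4 KB
print(none_found_row("https://example.com")["sitemap_url"])  # None found
```

Keeping every field present in the placeholder row preserves the flat, fixed-column shape, so CSV exports do not lose columns when a site has no sitemap.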


Health Score Breakdown

The health score (0–100) is calculated based on these criteria:

Check | Points
Sitemap is accessible (HTTP 200) | +25
Valid XML structure | +25
Correct XML content type | +15
Listed in robots.txt | +15
UTF-8 encoding declared | +10
Has at least 1 entry | +10
Slow response (> 3s) | -10
Redirected URL | -5
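The positive checks add up to exactly 100, and the two penalties subtract from that total. The function below is a minimal sketch that scores an output row using only the criteria in the table. It assumes the result is clamped to the 0 to 100 range so that penalties cannot push an inaccessible sitemap below zero; the page does not say how the actor handles that case.

```python
def health_score(row: dict) -> int:
    """Score a sitemap row using only the criteria in the table above (sketch).

    Assumption: the total is clamped to between 0 and 100.
    """
    score = 0
    score += 25 if row.get("is_accessible") else 0
    score += 25 if row.get("is_valid_xml") else 0
    score += 15 if row.get("is_xml_content_type") else 0
    score += 15 if row.get("found_in_robots_txt") else 0
    score += 10 if row.get("has_utf8_encoding") else 0
    score += 10 if (row.get("total_entries") or 0) >= 1 else 0
    score -= 10 if (row.get("response_time_ms") or 0) > 3000 else 0  # slow: more than 3 s
    score -= 5 if row.get("is_redirected") else 0
    return max(0, min(100, score))


slow_redirected = {
    "is_accessible": True, "is_valid_xml": True, "is_xml_content_type": True,
    "found_in_robots_txt": False, "has_utf8_encoding": True, "total_entries": 120,
    "response_time_ms": 3400, "is_redirected": True,
}
print(health_score(slow_redirected))  # 25 + 25 + 15 + 10 + 10 - 10 - 5 = 70
```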

Use Cases

  • SEO Auditing: Quickly check whether client websites have properly configured sitemaps before starting an SEO campaign.
  • Competitor Analysis: Discover how competitor sites structure their sitemaps, which sitemap types they use, and how many URLs they list.
  • Site Migration Monitoring: Verify that sitemaps still work correctly after a domain migration or CMS change.
  • Bulk Website Monitoring: Feed in hundreds of URLs and get a CSV-ready report of sitemap health across all properties.
  • Web Crawling Pipelines: Use the output to feed sitemap URLs into downstream crawlers or scrapers.
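For the bulk-monitoring and crawling use cases, the flat rows are straightforward to post-process. If you download the dataset items (for example as JSON) and handle them in your own pipeline, the sketch below writes them to CSV with the standard `csv` module and selects sitemap URLs to hand to a crawler. The `min_score` threshold of 80 and both helper names are hypothetical choices, not part of the actor.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize flat dataset rows to CSV text; column order follows the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


def crawlable_sitemaps(rows: list[dict], min_score: int = 80) -> list[str]:
    """Pick accessible, valid sitemaps at or above a hypothetical score threshold for a crawler."""
    return [
        row.get("final_url") or row["sitemap_url"]  # prefer the post-redirect URL
        for row in rows
        if row.get("is_accessible") and row.get("is_valid_xml")
        and (row.get("health_score") or 0) >= min_score
    ]


rows = [
    {"source_website": "https://example.com", "sitemap_url": "https://example.com/sitemap.xml",
     "health_score": 100, "is_accessible": True, "is_valid_xml": True,
     "final_url": "https://example.com/sitemap.xml"},
    {"source_website": "https://example.org", "sitemap_url": "None found",
     "health_score": 0, "is_accessible": None, "is_valid_xml": None, "final_url": None},
]
print(rows_to_csv(rows))
print(crawlable_sitemaps(rows))  # ['https://example.com/sitemap.xml']
```

Rows whose `sitemap_type` is `sitemap_index` point to further sitemaps, so a downstream crawler would typically expand those before fetching page URLs.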