Sitemap Audit

Get a Sitemap Health Score (0-100) for any website. Discover, parse, and validate XML sitemaps. Find 404s, redirects, canonical mismatches, noindex conflicts, hreflang issues, and missing pages, and estimate crawl budget waste.

Pricing: from $2.00 / 1,000 results
Developer: Andy Page (Maintained by Community)
Actor stats: 1 bookmarked, 2 total users, 1 monthly active user, last modified 6 days ago

Sitemap Audit: Health Score, SEO Validator & Missing Pages Finder

Audit one or many websites' XML sitemaps end-to-end. Get a Sitemap Health Score (0-100) for each domain, validate every URL for 404s, redirects, and robots.txt conflicts, detect canonical mismatches and noindex-in-sitemap conflicts, find pages missing from sitemaps, estimate crawl budget waste, and get prioritized SEO recommendations to fix everything.

Why Use This Actor?

  • Health Score — A single 0-100 metric that instantly tells you how healthy your sitemaps are, with error/warning counts and a detailed breakdown
  • Auto-discovery — Finds all sitemaps from robots.txt and 6 common sitemap paths automatically
  • Full URL validation — HEAD-checks every URL for 404s, 301/302 redirects, 5xx server errors, and robots.txt conflicts
  • Deep page inspection — Fetches a sample of pages to detect canonical tag mismatches, noindex directives on URLs in the sitemap, and hreflang inconsistencies
  • Missing pages finder — Crawls your site to find linked pages that aren't in any sitemap
  • Crawl budget analysis — Estimates how much of Google's crawl budget is wasted on broken, redirected, or blocked URLs
  • Prioritized recommendations — Get actionable fixes sorted by impact: critical → high → medium → low
  • Letter grade (A-F) — Instantly sort and filter sites by grade. Perfect for batch audits and cold outreach
  • Outreach-ready output — Pre-written executive summary and cold email snippet for every domain. Copy-paste into your outreach sequence
  • Sitemap freshness analysis — Detects stale sitemaps (e.g., "hasn't been updated in 247 days") and protocol mismatches (HTTP vs HTTPS)
  • Hreflang support — Full hreflang validation for international sites, including cross-checking XML annotations against on-page tags
  • CSV export — Flattened output for Google Sheets, Excel, or data pipelines

Features

Sitemap Health Score — 0-100 score measuring overall sitemap quality
Auto-Discovery — Finds sitemaps from robots.txt + 6 common paths
XML/Gzip Parsing — Handles urlset, sitemap index, gzipped, and image sitemaps
URL Validation — HEAD-checks every URL for status codes, redirects, and robots.txt conflicts
Duplicate Detection — Finds duplicate URLs (case-insensitive, trailing slash normalization)
Deep Page Inspection — Checks canonical tags, meta robots noindex, and hreflang on actual pages
Missing Pages Finder — Crawls the site to discover pages not in any sitemap
Crawl Budget Waste — Estimates wasted crawl budget from 404s, redirects, blocked, and noindexed URLs
Hreflang Validation — Cross-checks sitemap hreflang annotations against on-page hreflang tags
Error Classification — Categorizes issues as errors (critical) or warnings (non-critical)
Recommendations Engine — Prioritized, actionable SEO fix recommendations
URL Distribution — Shows URL breakdown by path prefix and priority value
Letter Grade (A-F) — Instant letter grade for quick sorting in spreadsheets
Executive Summary — 2-3 sentence audit summary, ready to paste into emails or reports
Outreach Snippet — Pre-written cold email paragraph for each domain
Top 3 Issues — Quick wins list with priority, count, effort, and impact
Issue Severity Counts — Critical/High/Medium/Low counts at top level for easy filtering
Sitemap Freshness — Days since last update, stale URL count, freshness verdict
Protocol Check — Detects HTTP/HTTPS mismatches in sitemap URLs
Proxy Support — Optional Apify proxy integration for rate-limited sites
CSV Export — Flattened output for spreadsheet analysis

Input Parameters

urls (string[], required) — Websites to audit (e.g., ["https://example.com", "https://test.org"]). One report per domain.
sitemapUrls (string[], default []) — Explicit sitemap URLs to audit (added to auto-discovered)
autoDiscover (boolean, default true) — Auto-discover sitemaps from robots.txt and common paths
validateUrls (boolean, default true) — HEAD-check every URL for status codes and redirects
deepInspection (boolean, default true) — Inspect pages for canonical, noindex, and hreflang issues
inspectionSampleSize (integer, default 100) — Pages to inspect (10-500, used when deepInspection is on)
crawlDepth (integer, default 3) — Crawl depth for missing pages discovery (0 = disabled)
maxPagesToCrawl (integer, default 500) — Max pages to crawl for missing pages (50-5000)
maxUrlsToValidate (integer, default 2000) — Max URLs to validate via HEAD requests per domain (100-50000). If the sitemap has more, a random sample is validated.
maxUrlsToParse (integer, default 50000) — Max URLs to collect during sitemap parsing (1000-500000). Prevents out-of-memory on enterprise-scale sitemaps.
maxConcurrency (integer, default 10) — Max concurrent HTTP requests (1-50)
outputFormat (string, default "json") — Output format: json or csv
proxyConfig (object, default Apify residential) — Proxy configuration for HTTP requests

Example Input

{
  "urls": ["https://example.com", "https://blog.example.com"],
  "validateUrls": true,
  "deepInspection": true,
  "inspectionSampleSize": 100,
  "crawlDepth": 3,
  "maxPagesToCrawl": 500,
  "maxConcurrency": 10,
  "outputFormat": "json"
}

Output Schema

Each domain produces one comprehensive audit report in the dataset (so auditing 5 domains produces 5 dataset items):

{
  "domain": "example.com",
  "url": "https://example.com",
  "auditDate": "2026-02-17T12:00:00.000Z",
  "healthScore": 78,
  "grade": "C",
  "gradeLabel": "Needs Improvement",
  "hasSitemap": true,
  "totalSitemaps": 3,
  "totalUrls": 1250,
  "executiveSummary": "example.com scored 78/100 (Grade: C) on its sitemap health audit with 1,250 URLs across its sitemaps. We found 33 errors, 37 warnings, 42 pages missing from the sitemap. This wastes approximately 340 crawl requests/month (5.2% of estimated budget).",
  "outreachSnippet": "I ran a quick technical SEO audit on example.com and found some issues with your XML sitemap that could be hurting your search rankings. Your sitemap scored 78/100 (Grade C). Top issues: remove dead URLs from sitemap, fix robots.txt conflicts, and update redirecting URLs in sitemap. This is wasting roughly 340 Google crawl requests per month on dead or misconfigured URLs. We also found 42 pages on your site that aren't in the sitemap at all.",
  "topIssues": [
    {
      "title": "Remove dead URLs from sitemap",
      "description": "25 URL(s) return 404/410 status. Remove them from the sitemap to stop wasting crawl budget.",
      "priority": "critical",
      "count": 25,
      "effort": "medium",
      "impact": "high"
    },
    {
      "title": "Fix robots.txt conflicts",
      "description": "3 URL(s) are blocked by robots.txt but included in the sitemap.",
      "priority": "critical",
      "count": 3,
      "effort": "low",
      "impact": "high"
    },
    {
      "title": "Add missing pages to sitemap",
      "description": "42 page(s) found on the site are missing from the sitemap.",
      "priority": "high",
      "count": 42,
      "effort": "medium",
      "impact": "medium"
    }
  ],
  "issueSeverityCounts": {
    "critical": 2,
    "high": 3,
    "medium": 4,
    "low": 2,
    "total": 11
  },
  "sitemapFreshness": {
    "hasLastmod": true,
    "urlsWithLastmod": 1100,
    "urlsWithoutLastmod": 150,
    "pctWithoutLastmod": 12,
    "newestLastmod": "2026-01-15T00:00:00.000Z",
    "oldestLastmod": "2024-03-01T00:00:00.000Z",
    "daysSinceLastUpdate": 33,
    "staleUrlCount": 200,
    "staleUrlPct": 16,
    "veryStaleUrlCount": 50,
    "freshnessVerdict": "Sitemap was updated 33 days ago. Consider more frequent updates."
  },
  "protocolIssues": {
    "httpUrlCount": 0,
    "httpsUrlCount": 1250,
    "hasMixedProtocol": false,
    "siteUsesHttps": true,
    "sampleHttpUrls": [],
    "verdict": "All sitemap URLs use HTTPS. ✓"
  },
  "sitemapsDiscovered": [
    "https://example.com/sitemap.xml",
    "https://example.com/post-sitemap.xml",
    "https://example.com/page-sitemap.xml"
  ],
  "sitemapDetails": [
    {
      "url": "https://example.com/sitemap.xml",
      "type": "index",
      "urlCount": 0,
      "imageCount": 0,
      "hreflangCount": 0,
      "byteLength": 1024,
      "gzipped": false,
      "error": null
    },
    {
      "url": "https://example.com/post-sitemap.xml",
      "type": "urlset",
      "urlCount": 850,
      "imageCount": 200,
      "hreflangCount": 0,
      "byteLength": 45000,
      "gzipped": false,
      "error": null
    }
  ],
  "urlValidation": {
    "total": 1250,
    "status2xx": 1180,
    "status3xx": 35,
    "status4xx": 25,
    "status5xx": 5,
    "blockedByRobots": 3,
    "duplicateCount": 2,
    "details": [
      {
        "url": "https://example.com/old-page",
        "statusCode": 404,
        "error": null
      },
      {
        "url": "https://example.com/moved-page",
        "statusCode": 301,
        "redirectUrl": "https://example.com/new-page",
        "isRedirect": true
      }
    ]
  },
  "deepInspection": {
    "sampleSize": 100,
    "totalUrls": 1250,
    "summary": {
      "canonicalMatch": 85,
      "canonicalMismatch": 8,
      "canonicalMissing": 7,
      "noindexCount": 3,
      "hreflangMissingOnPage": 0
    },
    "details": [
      {
        "url": "https://example.com/product/123",
        "canonical": {
          "status": "mismatch",
          "expected": "https://example.com/product/123",
          "found": "https://example.com/product/123?ref=home"
        },
        "noindex": { "hasNoindex": false },
        "hreflang": { "missingOnPage": [] }
      }
    ]
  },
  "missingPages": {
    "crawledCount": 500,
    "missingCount": 42,
    "pages": [
      "https://example.com/about",
      "https://example.com/contact",
      "https://example.com/blog/popular-post"
    ]
  },
  "errors": [
    {
      "type": "url_404",
      "severity": "error",
      "message": "URL returns 404",
      "url": "https://example.com/old-page",
      "statusCode": 404
    }
  ],
  "errorCount": 33,
  "warnings": [
    {
      "type": "url_redirect",
      "severity": "warning",
      "message": "URL redirects to another location",
      "url": "https://example.com/moved-page",
      "redirectUrl": "https://example.com/new-page"
    }
  ],
  "warningCount": 37,
  "crawlBudgetWaste": {
    "wastedUrls": 70,
    "wastePercentage": 5.6,
    "siteSizeTier": "medium",
    "breakdown": {
      "notFound": 25,
      "redirects": 35,
      "blocked": 3,
      "noindex": 7
    }
  },
  "urlDistribution": {
    "byPathPrefix": {
      "/blog": 400,
      "/product": 350,
      "/category": 200,
      "/page": 150,
      "/other": 150
    },
    "byPriority": {
      "1.0": 10,
      "0.8": 200,
      "0.5": 800,
      "none": 240
    }
  },
  "recommendations": [
    {
      "priority": "critical",
      "category": "broken_urls",
      "title": "Remove 25 URLs returning 404 from sitemaps",
      "description": "25 URLs in your sitemaps return HTTP 404 (Not Found). These waste crawl budget and signal poor sitemap maintenance to search engines.",
      "affectedUrls": 25
    },
    {
      "priority": "high",
      "category": "redirects",
      "title": "Update 35 redirecting URLs to final destinations",
      "description": "35 URLs redirect to other locations. Replace them with the final destination URLs to avoid wasting crawl budget on redirect chains.",
      "affectedUrls": 35
    },
    {
      "priority": "medium",
      "category": "missing_pages",
      "title": "Add 42 linked pages to your sitemap",
      "description": "42 pages are linked from your site but missing from all sitemaps. Adding them helps search engines discover and index these pages faster.",
      "affectedUrls": 42
    }
  ],
  "robotsTxtFound": true,
  "settings": {
    "validateUrls": true,
    "deepInspection": true,
    "crawlDepth": 3,
    "maxConcurrency": 10,
    "maxPagesToCrawl": 500,
    "maxUrlsToParse": 50000,
    "inspectionSampleSize": 100,
    "maxUrlsToValidate": 2000,
    "outputFormat": "json"
  },
  "elapsedMs": 45000
}

Sitemap Health Score & Grade

The Health Score is a 0-100 metric that measures how well-maintained your sitemaps are. Each score maps to a letter grade for instant readability in spreadsheets and outreach:

90-100 — A (Excellent): Sitemaps are clean and well-maintained
80-89 — B (Good): Minor issues to address
65-79 — C (Needs Improvement): Significant problems impacting crawl efficiency
45-64 — D (Poor): Serious issues, sitemaps need major cleanup
0-44 — F (Critical): Sitemaps are causing active SEO damage
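
The score-to-grade mapping above can be sketched as a small helper. This is illustrative only; the actor applies the same thresholds internally, but the function name here is made up for the example:

```javascript
// Map a 0-100 health score to the letter grade and label from the table above.
function gradeForScore(score) {
  if (score >= 90) return { grade: "A", label: "Excellent" };
  if (score >= 80) return { grade: "B", label: "Good" };
  if (score >= 65) return { grade: "C", label: "Needs Improvement" };
  if (score >= 45) return { grade: "D", label: "Poor" };
  return { grade: "F", label: "Critical" };
}

console.log(gradeForScore(78)); // → { grade: 'C', label: 'Needs Improvement' }
```

This matches the example report, where healthScore 78 yields grade "C" / "Needs Improvement".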

What Reduces the Score

URLs returning 404: -5 per URL, capped (error)
URLs returning 5xx: -5 per URL, capped (error)
URLs blocked by robots.txt: -5 per URL (error)
Canonical tag mismatches: -3 per URL (error)
Noindex pages in sitemap: -5 per URL (error)
Redirecting URLs (301/302): -2 per URL, capped (warning)
Duplicate URLs: -1 per URL (warning)
Missing lastmod metadata: -5 flat (warning)
Uniform/missing priority: -3 flat (warning)
Error rate above 10%: additional -10 penalty
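
Putting the penalty table together, the scoring logic looks roughly like the sketch below. The per-issue weights come from the table; the cap value (PENALTY_CAP) is an assumed placeholder, since the actual cap is internal to the actor:

```javascript
// Illustrative health-score sketch based on the penalty table above.
// PENALTY_CAP is an assumption, not a documented constant.
const PENALTY_CAP = 25;

function sketchHealthScore(counts) {
  const capped = (n, perUrl) => Math.min(n * perUrl, PENALTY_CAP);
  let penalty = 0;
  penalty += capped(counts.notFound, 5);       // 404s: -5 per URL (capped)
  penalty += capped(counts.serverErrors, 5);   // 5xx: -5 per URL (capped)
  penalty += counts.blockedByRobots * 5;       // robots.txt conflicts: -5 per URL
  penalty += counts.canonicalMismatch * 3;     // canonical mismatches: -3 per URL
  penalty += counts.noindex * 5;               // noindex in sitemap: -5 per URL
  penalty += capped(counts.redirects, 2);      // redirects: -2 per URL (capped)
  penalty += counts.duplicates;                // duplicates: -1 per URL
  if (counts.missingLastmod) penalty += 5;     // missing lastmod: -5 flat
  if (counts.uniformPriority) penalty += 3;    // uniform/missing priority: -3 flat
  if (counts.errorRate > 0.1) penalty += 10;   // >10% error rate: extra -10
  return Math.max(0, 100 - penalty);
}
```

A sitemap with no issues at all scores 100; penalties accumulate from there and the score floors at 0.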

Example Use Cases

1. Quick SEO Health Check

Get an instant health score for any site:

{
  "urls": ["https://example.com"]
}

2. Batch Audit Multiple Domains

Audit your site and all your competitors in a single run:

{
  "urls": [
    "https://mysite.com",
    "https://competitor1.com",
    "https://competitor2.com",
    "https://competitor3.com"
  ]
}

3. Parse-Only Audit (No Network Requests to URLs)

Fast audit that only discovers and parses sitemaps without validating URLs:

{
  "urls": ["https://example.com"],
  "validateUrls": false,
  "deepInspection": false,
  "crawlDepth": 0
}

4. Audit Specific Sitemaps

Audit specific sitemap URLs at non-standard locations:

{
  "urls": ["https://example.com"],
  "sitemapUrls": [
    "https://cdn.example.com/sitemaps/main.xml",
    "https://example.com/custom-sitemap.xml.gz"
  ],
  "autoDiscover": false
}

5. Deep Inspection for International Site

Full hreflang validation for a multi-language site:

{
  "urls": ["https://example.com"],
  "deepInspection": true,
  "inspectionSampleSize": 200,
  "crawlDepth": 0
}

6. Large Site Audit with High Throughput

Audit a large site with aggressive concurrency and generous parsing limits:

{
  "urls": ["https://large-site.com"],
  "maxConcurrency": 20,
  "maxPagesToCrawl": 2000,
  "maxUrlsToValidate": 5000,
  "maxUrlsToParse": 100000,
  "inspectionSampleSize": 300,
  "crawlDepth": 2
}

7. CSV Export for Reporting

Generate a CSV for spreadsheet analysis:

{
  "urls": ["https://example.com", "https://blog.example.com"],
  "outputFormat": "csv"
}

8. Agency Client Batch Audit

Audit all client sites in one run with minimal resource usage:

{
  "urls": [
    "https://client1.com",
    "https://client2.com",
    "https://client3.com",
    "https://client4.com",
    "https://client5.com"
  ],
  "deepInspection": false,
  "crawlDepth": 0,
  "maxConcurrency": 5
}

9. Cold Outreach Lead Qualification (SEO Agencies)

Run a batch of prospect websites, then export the CSV. Sort by grade (D/F = hottest leads), and use the outreachSnippet column directly in your email sequences:

{
  "urls": [
    "https://prospect1.com",
    "https://prospect2.com",
    "https://prospect3.com",
    "https://prospect4.com",
    "https://prospect5.com"
  ],
  "validateUrls": true,
  "deepInspection": false,
  "crawlDepth": 0,
  "maxConcurrency": 10,
  "outputFormat": "csv"
}

Outreach workflow:

  1. Load 50-500 prospect domains into the urls array
  2. Run the actor → export dataset as CSV
  3. Open in Google Sheets → sort by grade column (D/F first)
  4. Filter by issueSeverityCounts.critical > 0 for the strongest leads
  5. Copy the outreachSnippet column into your cold email tool — each snippet is a personalized paragraph citing the prospect's specific issues
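
The same triage can be done programmatically over the dataset items. The field names (grade, issueSeverityCounts.critical, outreachSnippet) come from the output schema; the ranking function itself is just a sketch:

```javascript
// Rank audited domains for outreach: keep only domains with critical issues,
// then sort worst grade first (F before D before C...).
const GRADE_RANK = { F: 0, D: 1, C: 2, B: 3, A: 4 };

function hottestLeads(items) {
  return items
    .filter((item) => (item.issueSeverityCounts?.critical ?? 0) > 0)
    .sort((a, b) => GRADE_RANK[a.grade] - GRADE_RANK[b.grade])
    .map((item) => ({
      domain: item.domain,
      grade: item.grade,
      snippet: item.outreachSnippet,
    }));
}
```

Feed this the dataset items from a batch run and the first entries are your strongest leads, each with its ready-to-paste snippet.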

Key outreach columns in the output:

grade — Letter grade A-F; sort by this to find the worst sites
hasSitemap — false means "You don't even have a sitemap", the easiest pitch
executiveSummary — 2-3 sentence summary for reports/Slack
outreachSnippet — Ready-to-paste cold email paragraph with specific issues
topIssues — Top 3 issues with titles, perfect for email bullet points
issueSeverityCounts.critical — Number of critical issues; higher means a hotter lead
sitemapFreshness.daysSinceLastUpdate — "Your sitemap hasn't been updated in X days"
sitemapFreshness.freshnessVerdict — Human-readable freshness assessment
protocolIssues.verdict — HTTP/HTTPS mismatch detection

How It Works

The actor runs a 7-phase pipeline:

  1. Discover — Fetches robots.txt, extracts Sitemap: directives, and probes 6 common sitemap paths (/sitemap.xml, /sitemap_index.xml, /sitemap.xml.gz, /sitemaps/sitemap.xml, /sitemap/sitemap.xml, /wp-sitemap.xml). Merges with any explicit sitemap URLs you provide.

  2. Parse — Fetches and parses each sitemap. Handles standard <urlset>, <sitemapindex> (recursively follows child sitemaps), gzipped sitemaps, <image:image> entries, and <xhtml:link> hreflang annotations. Uses fast-xml-parser for robust XML handling.

  3. Validate — Sends HEAD requests to every URL with configurable concurrency. Checks HTTP status codes, follows redirects, detects robots.txt conflicts using the parsed rules, and identifies duplicate URLs (case-insensitive, trailing-slash normalized).

  4. Inspect — Fetches the full HTML of a stratified sample of pages. Extracts <link rel="canonical">, <meta name="robots">, and <link rel="alternate" hreflang="..."> tags. Cross-checks canonicals against sitemap URLs, detects noindex directives, and validates hreflang consistency.

  5. Find Missing Pages — Uses Crawlee's CheerioCrawler to spider the site up to a configurable depth. Compares crawled URLs against the sitemap URL set to find pages that exist on the site but are missing from all sitemaps.

  6. Analyze — Classifies every issue as an error or warning, calculates the health score, estimates crawl budget waste, and computes URL distribution statistics.

  7. Recommend — Generates prioritized recommendations sorted by impact (critical → high → medium → low), with affected URL counts and actionable descriptions.
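
The URL normalization behind duplicate detection (step 3) and the sitemap-vs-crawl comparison (step 5) can be sketched as follows. This is a simplified version consistent with the described behavior (case-insensitive, trailing-slash normalized); the actor's exact rules may differ:

```javascript
// Normalize a URL for comparison: lowercase host and path, strip trailing slash.
function normalizeUrl(url) {
  const u = new URL(url);
  let path = u.pathname.toLowerCase();
  if (path.length > 1 && path.endsWith("/")) path = path.slice(0, -1);
  return `${u.protocol}//${u.hostname.toLowerCase()}${path}${u.search}`;
}

// Step 3: flag URLs whose normalized form was already seen.
function findDuplicates(urls) {
  const seen = new Set();
  const dupes = [];
  for (const url of urls) {
    const key = normalizeUrl(url);
    if (seen.has(key)) dupes.push(url);
    else seen.add(key);
  }
  return dupes;
}

// Step 5: crawled pages whose normalized form appears in no sitemap.
function findMissingPages(crawledUrls, sitemapUrls) {
  const inSitemap = new Set(sitemapUrls.map(normalizeUrl));
  return crawledUrls.filter((url) => !inSitemap.has(normalizeUrl(url)));
}
```

With this normalization, https://example.com/A/ and https://example.com/a count as duplicates, and a crawled page only counts as "missing" if no sitemap entry normalizes to the same key.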

Troubleshooting

No sitemaps found

  • Check robots.txt: The actor looks for Sitemap: directives in robots.txt first. If your sitemap is at a non-standard location, use the sitemapUrls input parameter.
  • WordPress: Most WordPress sites have sitemaps at /wp-sitemap.xml or /sitemap.xml — both are checked automatically.
  • Blocked by robots.txt: Some sites block the sitemap URL itself in robots.txt (uncommon but it happens).

Low health score but site seems fine

  • Redirects count: If your site recently migrated URLs, old URLs in the sitemap will be flagged as redirects (warnings). Update the sitemap with final URLs.
  • Staging content: Sometimes sitemaps include staging or draft URLs that return 404.
  • CDN issues: Some CDNs return different status codes for HEAD vs GET requests. The actor uses HEAD for efficiency.

Actor runs slowly

  • Large sitemaps: Sites with 50K+ URLs take longer. Increase maxConcurrency (up to 50) for faster validation, or reduce maxUrlsToValidate to validate a random sample instead of every URL.
  • Enterprise-scale sitemaps: Sites with 100K+ URLs across dozens of sub-sitemaps can be capped with maxUrlsToParse to prevent memory issues while still producing a useful audit.
  • Deep inspection: Reduce inspectionSampleSize if you don't need comprehensive page-level analysis.
  • Crawl depth: Set crawlDepth to 0 to skip the missing pages crawl entirely.

Running locally

Install dependencies and run:

cd actors/sitemap-audit
npm install
echo '{"urls":["https://example.com"]}' | npx apify-cli run --purge

Or run unit tests:

npm test

Limitations

  • Sequential domain processing — Multiple domains are audited one after another (not in parallel) to keep memory usage predictable. For massive batches (100+ domains), consider splitting into multiple runs.
  • HEAD request accuracy — Some servers handle HEAD differently than GET. Deep inspection (which uses GET) catches cases where HEAD returns incorrect status codes.
  • Dynamic content — Pages rendered entirely via JavaScript may not have proper canonical/noindex tags visible to the HTML parser. Consider using a browser-based crawler for heavy SPA sites.
  • Rate limiting — High concurrency on small servers may trigger rate limiting. Reduce maxConcurrency or enable proxy if you see many 429 responses.
  • Sitemap size limit — Individual sitemaps larger than 50 MB are skipped (the XML sitemap spec recommends max 50 MB uncompressed). Additionally, total URLs collected across all sub-sitemaps are capped at maxUrlsToParse (default 50,000) to prevent out-of-memory crashes on enterprise-scale sitemap indexes.
  • Missing pages accuracy — The missing pages finder only discovers pages reachable via internal links within the configured crawl depth. Orphan pages with no internal links won't be found.

FAQ

Q: How long does a typical run take? A: Depends on the site size and settings. A small site (< 1K URLs) with full validation takes 30-60 seconds. A large site (50K URLs) with maxConcurrency: 20 takes 5-15 minutes. Parse-only audits (validation/inspection/crawl disabled) complete in seconds.

Q: Do I need proxies? A: Usually not for sitemap auditing. Proxies help if the target site rate-limits or blocks datacenter IPs. The actor defaults to Apify residential proxies when running on the platform.

Q: What if the site has no sitemap? A: The actor reports totalSitemaps: 0 and generates a recommendation to create one. If crawlDepth > 0, it still crawls the site to show what pages exist that should be in a sitemap.

Q: What's the difference between errors and warnings? A: Errors are issues that directly harm SEO: 404 URLs, robots.txt conflicts, canonical mismatches, noindex pages in sitemap, server errors. Warnings are suboptimal but less damaging: redirects, duplicates, missing metadata.
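
That split can be expressed as a simple lookup. The url_404 and url_redirect type names appear in the output schema's errors/warnings arrays; the remaining type names here are illustrative guesses, not documented identifiers:

```javascript
// Error-vs-warning classification sketch. Only "url_404" and "url_redirect"
// are confirmed type names from the output schema; the rest are illustrative.
const SEVERITY = {
  url_404: "error",
  url_5xx: "error",
  blocked_by_robots: "error",
  canonical_mismatch: "error",
  noindex_in_sitemap: "error",
  url_redirect: "warning",
  duplicate_url: "warning",
  missing_lastmod: "warning",
};

function classify(issueType) {
  return SEVERITY[issueType] ?? "warning";
}
```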

Q: How is crawl budget waste calculated? A: It counts URLs that waste Googlebot's crawl budget: 404s, redirects, robots-blocked URLs, and noindexed pages in the sitemap. The waste percentage is wastedUrls / totalUrls × 100.
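
Using the breakdown fields from the output schema, that calculation is:

```javascript
// Crawl budget waste as described above: wastedUrls / totalUrls * 100.
// Field names match the crawlBudgetWaste.breakdown object in the output schema.
function crawlBudgetWaste({ notFound, redirects, blocked, noindex }, totalUrls) {
  const wastedUrls = notFound + redirects + blocked + noindex;
  const wastePercentage = Math.round((wastedUrls / totalUrls) * 1000) / 10;
  return { wastedUrls, wastePercentage };
}

// With the example report's breakdown (25 + 35 + 3 + 7 of 1,250 URLs):
console.log(crawlBudgetWaste({ notFound: 25, redirects: 35, blocked: 3, noindex: 7 }, 1250));
// → { wastedUrls: 70, wastePercentage: 5.6 }
```

This reproduces the wastedUrls: 70 / wastePercentage: 5.6 values from the example output above.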

Q: Can I audit a sitemap at a non-standard URL? A: Yes — use the sitemapUrls parameter to provide specific sitemap URLs. Set autoDiscover: false to skip the standard discovery process.

Q: Can I use this for cold outreach / lead generation? A: Yes — that's a primary use case. Load prospect domains into the urls array, export the CSV, sort by grade (D/F = hottest leads), and use the outreachSnippet column in your email sequences. Each snippet is a personalized paragraph citing the prospect's specific sitemap issues.

Q: What's the outreachSnippet field? A: A pre-written cold email paragraph customized for each domain. It mentions the specific health score, top issues, crawl budget waste, and missing pages — ready to paste into any cold email tool.

Q: Does this check hreflang tags? A: Yes. It reads hreflang annotations from the XML sitemap (<xhtml:link>) and, during deep inspection, cross-checks them against the <link rel="alternate" hreflang="..."> tags found on the actual pages.
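
The cross-check amounts to a set difference: sitemap hreflang annotations with no matching on-page tag. A minimal sketch, with object shapes mirroring the hreflang.missingOnPage field in the output schema:

```javascript
// Return sitemap hreflang annotations that have no matching
// <link rel="alternate" hreflang="..."> tag on the actual page.
function hreflangMissingOnPage(sitemapAnnotations, onPageTags) {
  const onPage = new Set(onPageTags.map((t) => `${t.hreflang}|${t.href}`));
  return sitemapAnnotations.filter((a) => !onPage.has(`${a.hreflang}|${a.href}`));
}
```

If the sitemap declares en and de alternates for a URL but the page only carries the en tag, the de annotation is reported as missing on page.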

Dataset Views

The Apify Console provides two pre-configured table views:

  1. Overview — Domain, Grade, Score, Has Sitemap, URLs, Critical/High issue counts, Errors, Warnings, Days Since Update, Budget Waste %, Executive Summary
  2. Outreach View — Domain, Grade, Score, Has Sitemap, Critical/High/Total issue counts, Freshness Verdict, Outreach Email Snippet

Switch between views in the Apify Console when viewing dataset results.

Changelog

v1.0.6 (February 2026)

  • maxUrlsToParse input — New safety cap (default 50,000) on total URLs collected during sitemap parsing. Prevents out-of-memory crashes on enterprise-scale sitemap indexes (e.g., sites with 100K+ URLs across dozens of sub-sitemaps). The actor now gracefully truncates and continues instead of crashing.
  • maxUrlsToValidate input — New cap (default 2,000) on URLs validated via HEAD requests. If a sitemap has more URLs than this limit, a random sample is validated. Prevents timeouts on very large sitemaps.
  • Improved large-site resilience — Tested against stripe.com, postman.com, shopify.com, cloudflare.com, and other enterprise-scale sitemaps without failures.

v1.0.3 (February 2026)

  • Letter grade (A-F) for instant sorting and outreach qualification
  • Executive summary — 2-3 sentence audit summary ready for emails/reports
  • Outreach snippet — Pre-written cold email paragraph per domain
  • Top 3 issues — Quick wins list with priority, count, effort, and impact
  • Issue severity counts — Critical/High/Medium/Low at top level for easy filtering
  • Sitemap freshness analysis — Days since last update, stale URL counts, freshness verdict
  • Protocol issue detection — HTTP vs HTTPS mismatch detection in sitemap URLs
  • Outreach dataset view in Apify Console optimized for agency cold outreach workflows

v1.0.0 (February 2026)

  • Initial public release
  • Sitemap Health Score (0-100) with detailed error/warning classification
  • Auto-discovery from robots.txt + 6 common paths
  • XML/gzip/sitemap index recursive parsing with fast-xml-parser
  • Full URL validation: HEAD checks, redirect detection, robots.txt conflict analysis
  • Deep page inspection: canonical tags, meta robots noindex, hreflang cross-validation
  • Missing pages finder using Crawlee CheerioCrawler
  • Crawl budget waste estimation with breakdown by issue type
  • Prioritized recommendations engine (critical → low)
  • URL distribution analysis by path prefix and priority
  • Hreflang support (XML + on-page cross-check)
  • CSV export to key-value store
  • Proxy support via Apify proxy configuration
  • State persistence for Actor migration
  • 239 unit tests across 43 test suites
  • 11 integration test scenarios

Support

  • Issues: Report bugs via GitHub issues or the Apify community forum
  • Feature requests: Contact us through Apify or open a GitHub issue
  • Enterprise: For large-scale sitemap monitoring, reach out for custom pricing

Built by A Page Ventures | Apify Store