Sitemap URL Status Auditor

Audit XML sitemaps for broken URLs, redirects, HTTP status codes, response timing, content type, canonical tags, and robots metadata.

Pricing: Pay per event

Developer: Stas Persiianenko

What does Sitemap URL Status Auditor do?

Sitemap URL Status Auditor starts from one or more XML sitemap URLs, sitemap indexes, website roots, or domains.

It downloads sitemap XML files, follows nested sitemap indexes, extracts <loc> URLs, deduplicates them, and checks each listed page URL.
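The parsing part of this step can be sketched with the standard library. This is a minimal illustration, not the actor's source code: it assumes sitemaps.org-style XML, tells a `<sitemapindex>` from a `<urlset>` by the root tag, and collects unique `<loc>` values in document order. The helper name `parse_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", unique <loc> values in document order)."""
    root = ET.fromstring(xml_text)
    # Root tags are usually namespaced, e.g. "{http://www.sitemaps.org/...}sitemapindex".
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    seen: set[str] = set()
    locs: list[str] = []
    for el in root.iter():
        # Accept both namespaced and bare <loc> elements.
        if el.tag == "loc" or el.tag.endswith("}loc"):
            url = (el.text or "").strip()
            if url and url not in seen:  # deduplicate, keeping first occurrence
                seen.add(url)
                locs.append(url)
    return kind, locs
```

An index yields child sitemap URLs to fetch next; a urlset yields page URLs to check.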

For each URL, it records status code, final URL, redirect count, redirect chain, response time, content type, content length, and a normalized error category.

Optionally, it can fetch page HTML to extract canonical URLs and robots meta tags.

Who is it for?

SEO specialists use it to catch broken URLs in sitemaps before search engines waste crawl budget.

Web QA teams use it after deployments to confirm that sitemap URLs still resolve.

Migration teams use it to check final URLs and redirect counts after domain, CMS, or URL-structure changes.

Agencies use it for recurring client health checks and exportable audit evidence.

Developers use it as a fast HTTP-only smoke test for public sitemap quality.

Why use this actor?

🗺️ It is sitemap-first, not just a generic URL checker.

🔁 It recursively expands sitemap indexes.

🧹 It deduplicates URLs before checking them.

🚦 It uses HEAD first with GET fallback for servers that reject HEAD.

📊 It returns one clean dataset table for export, dashboards, and alerts.

⚙️ It includes concurrency, caps, timeout, retry, and polite User-Agent controls.

What data can you extract?

Field	Description
`url`	Page URL discovered in the sitemap.
`sourceSitemap`	Sitemap XML file where the URL was found.
`sitemapDepth`	Recursion depth inside sitemap indexes.
`statusCode`	HTTP status code or null for network failures.
`ok`	True for 2xx and 3xx responses.
`method`	HEAD, GET, SITEMAP, or NONE.
`finalUrl`	Final URL after redirects.
`redirectCount`	Number of redirects followed.
`redirectChain`	Redirect URLs exposed by the HTTP client.
`contentType`	Content-Type response header.
`contentLength`	Content-Length header when available.
`responseTimeMs`	Request duration in milliseconds.
`errorCategory`	none, http_error, timeout, dns_error, tls_error, network_error, parse_error, blocked, or not_checked.
`errorMessage`	Human-readable error message.
`canonicalUrl`	Canonical link when metadata extraction is enabled.
`robotsMeta`	Robots meta tag when metadata extraction is enabled.
`xRobotsTag`	X-Robots-Tag response header.
`checkedAt`	ISO timestamp of the check.
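The `ok` rule and part of the `errorCategory` mapping can be sketched as follows. Only the 2xx/3xx rule comes from this page. The mapping from Python exception types to `timeout`, `dns_error`, `tls_error`, and `network_error` is an assumption for illustration, and `classify` is a hypothetical helper.

```python
import socket
import ssl
from typing import Optional


def is_ok(status_code: Optional[int]) -> bool:
    """ok is true only for 2xx and 3xx responses; a null status is never ok."""
    return status_code is not None and 200 <= status_code < 400


def classify(status_code: Optional[int] = None,
             exc: Optional[BaseException] = None) -> str:
    """Map a status code or a raised exception to an errorCategory value (assumed mapping)."""
    if exc is not None:
        if isinstance(exc, (socket.timeout, TimeoutError)):
            return "timeout"
        if isinstance(exc, socket.gaierror):  # hostname could not be resolved
            return "dns_error"
        if isinstance(exc, ssl.SSLError):  # handshake or certificate failure
            return "tls_error"
        return "network_error"  # any other connection-level failure
    return "none" if is_ok(status_code) else "http_error"
```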

How much does it cost to audit sitemap URL status?

This actor uses pay-per-event pricing.

There is a $0.005 start event for each run and a per-URL result event for each dataset row produced.

Plan tier	Per URL result (USD)
Free	$0.000029952
Starter / Bronze	$0.000026046
Scale / Silver	$0.000020316
Business / Gold	$0.000015627
Platinum	$0.000010418
Diamond	$0.000010000

Apart from the $0.005 start event, cost scales linearly with the number of dataset rows, which is roughly the number of sitemap URLs checked.

Use maxUrls for small first tests and increase it after you confirm the sitemap source is correct.
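A quick way to budget a run is to add the start event to the per-row price times the expected row count. The sketch below copies the prices from the pricing table; it assumes one dataset row per checked URL and ignores any extra sitemap error rows. `estimate_cost` is an illustrative helper, not part of the actor.

```python
START_EVENT_USD = 0.005

# Per-result prices in USD, copied from the pricing table.
PER_RESULT_USD = {
    "Free": 0.000029952,
    "Starter / Bronze": 0.000026046,
    "Scale / Silver": 0.000020316,
    "Business / Gold": 0.000015627,
    "Platinum": 0.000010418,
    "Diamond": 0.000010000,
}


def estimate_cost(rows: int, tier: str = "Free") -> float:
    """Estimated USD cost of one run: start event plus one event per dataset row."""
    return START_EVENT_USD + rows * PER_RESULT_USD[tier]


print(f"{estimate_cost(1_000):.6f}")  # 1,000 URLs on the Free tier → 0.034952
```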

How to use it

Open the actor on Apify.
Add one or more sitemap URLs, website roots, or domains.
Set maxUrls to the number of URLs you want to audit.
Keep headFirst enabled for faster status checks.
Enable includePageMetadata only when you need canonical and robots meta extraction.
Run the actor.
Export the dataset as JSON, CSV, or Excel, or connect it to your workflow.

Input settings

Sitemap URLs or websites

Use startUrls for XML sitemap URLs, sitemap index URLs, website roots, or domains.

Examples:

https://example.com/sitemap.xml
https://example.com/sitemap_index.xml
https://example.com/
example.com

Website roots and bare domains automatically resolve to /sitemap.xml.
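The root-to-sitemap rule can be sketched like this. It is an approximation of the behavior described above, not the actor's code: it assumes bare domains are tried over `https://`, treats an empty or `/` path as a website root, and keeps any other URL unchanged. `to_sitemap_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def to_sitemap_url(entry: str) -> str:
    """Turn a sitemap URL, website root, or bare domain into a sitemap URL (assumed rules)."""
    entry = entry.strip()
    if "://" not in entry:
        entry = "https://" + entry  # assumption: bare domains are fetched over HTTPS
    parts = urlsplit(entry)
    if parts.path in ("", "/"):
        # Website roots and bare domains resolve to /sitemap.xml.
        return urlunsplit((parts.scheme, parts.netloc, "/sitemap.xml", "", ""))
    return entry  # explicit sitemap or sitemap index URL: keep as given
```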

Additional domains

Use domains when you want a simple list of extra domains in addition to startUrls.

Each domain is converted to its /sitemap.xml URL, the same way bare domains in startUrls are.

Maximum URLs

maxUrls sets the maximum number of unique page URLs audited.

Start with 100 for a cheap test.

Increase to 1,000 or more for full-site checks.

Maximum sitemap files

maxSitemaps caps the number of sitemap files downloaded, so very large sitemap indexes cannot expand without limit.

Large ecommerce sites can have hundreds of sitemap files.

Maximum sitemap index depth

maxDepth controls how deeply nested sitemap indexes are followed.

The default of 3 is enough for normal sitemap structures.
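How maxUrls, maxSitemaps, and maxDepth interact can be sketched as a breadth-first walk. This is a sketch under stated assumptions: `fetch` is a hypothetical callable returning sitemap XML, the root sitemap counts as depth 0 (matching `sitemapDepth` in the output example), the `max_sitemaps` default of 50 is made up, and fetch or parse errors are not turned into SITEMAP error rows as the actor does.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable


def expand_sitemaps(start: str, fetch: Callable[[str], str],
                    max_depth: int = 3, max_sitemaps: int = 50,
                    max_urls: int = 100) -> list[tuple[str, str, int]]:
    """Return (url, sourceSitemap, sitemapDepth) tuples for unique page URLs."""
    queue = deque([(start, 0)])
    seen_sitemaps: set[str] = set()
    seen_urls: set[str] = set()
    rows: list[tuple[str, str, int]] = []
    while queue and len(seen_sitemaps) < max_sitemaps:
        sitemap_url, depth = queue.popleft()
        if sitemap_url in seen_sitemaps:
            continue
        seen_sitemaps.add(sitemap_url)
        root = ET.fromstring(fetch(sitemap_url))
        locs = [(el.text or "").strip() for el in root.iter()
                if el.tag == "loc" or el.tag.endswith("}loc")]
        if root.tag.endswith("sitemapindex"):
            if depth < max_depth:  # nested indexes beyond max_depth are not followed
                queue.extend((loc, depth + 1) for loc in locs if loc)
            continue
        for loc in locs:
            if loc and loc not in seen_urls:
                if len(rows) >= max_urls:
                    return rows  # URL cap reached
                seen_urls.add(loc)
                rows.append((loc, sitemap_url, depth))
    return rows
```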

Concurrency

concurrency controls how many URL checks run in parallel.

Use lower values for small or fragile sites.

Use higher values for robust sites when you want faster audits.
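Bounded parallelism of this kind is commonly implemented with a semaphore. The sketch below is one way to do it with `asyncio`, not necessarily how the actor works; `check` stands in for a hypothetical coroutine that performs a single URL check.

```python
import asyncio
from typing import Awaitable, Callable


async def check_all(urls: list[str], check: Callable[[str], Awaitable[dict]],
                    concurrency: int = 10) -> list[dict]:
    """Run check(url) for every URL with at most `concurrency` checks in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def limited(url: str) -> dict:
        async with semaphore:  # waits here while `concurrency` checks are running
            return await check(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(limited(u) for u in urls))
```

Lowering `concurrency` reduces the number of simultaneous requests a target server sees, which is why smaller values suit fragile sites.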

Request timeout and retries

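requestTimeoutSecs limits how long each request may take, and maxRetries sets how many times a failed request is retried; together they balance speed and reliability.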

Timeouts are recorded as dataset rows rather than crashing the whole run.
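Recording a timeout as a row instead of raising can be sketched as below. Field names follow the dataset table; everything else is an assumption: `request` is a hypothetical callable taking a URL and a timeout and returning a status code, only timeouts are retried, and there is no backoff between attempts.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Optional


def _row(url: str, status: Optional[int], category: str,
         message: Optional[str], started: float) -> dict:
    """Build a result row using field names from the dataset table."""
    return {
        "url": url,
        "statusCode": status,
        "ok": category == "none",
        "errorCategory": category,
        "errorMessage": message,
        "responseTimeMs": round((time.monotonic() - started) * 1000),
        "checkedAt": datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }


def check_with_retries(url: str, request: Callable[[str, float], int],
                       timeout_secs: float = 30.0, max_retries: int = 2) -> dict:
    """Try up to 1 + max_retries times; a final timeout becomes a row, not an exception."""
    started = time.monotonic()
    last_error: Optional[BaseException] = None
    for _ in range(max_retries + 1):
        try:
            status = request(url, timeout_secs)
        except TimeoutError as exc:
            last_error = exc  # only timeouts are retried in this sketch
            continue
        category = "none" if 200 <= status < 400 else "http_error"
        return _row(url, status, category, None, started)
    return _row(url, None, "timeout",
                f"Timed out after {max_retries + 1} attempts: {last_error}", started)
```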

Use HEAD before GET

headFirst checks URLs with HEAD first and falls back to GET when needed.

This keeps most audits fast and lightweight.
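The HEAD-first pattern can be sketched as follows. Which statuses trigger the GET fallback is not documented here, so the set `{403, 405, 501}` is an assumption, and `request(method, url)` is a hypothetical callable that returns a status code.

```python
from typing import Callable

# Assumption: statuses that often mean "this server rejects HEAD", not "the page is broken".
HEAD_REJECTED = {403, 405, 501}


def head_then_get(url: str, request: Callable[[str, str], int]) -> tuple[str, int]:
    """Return (method used, status code), retrying with GET when HEAD looks rejected."""
    status = request("HEAD", url)
    if status in HEAD_REJECTED:
        return "GET", request("GET", url)
    return "HEAD", status
```

With a real client, `request` could wrap `urllib.request.urlopen(urllib.request.Request(url, method=method))`; note that `urlopen` raises `HTTPError` for 4xx and 5xx responses, so the wrapper would need to catch it and return its `.code`.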

Follow redirects

When followRedirects is enabled, the actor follows redirects and records the final URL and redirect count.

This is useful for migrations and canonicalization checks.
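To show how `finalUrl`, `redirectCount`, and `redirectChain` relate, here is a sketch that follows a simulated redirect map instead of live HTTP. The map, the `max_hops` limit, and the choice to record each hop's target URL in the chain are assumptions; the actor's chain reflects whatever its HTTP client exposes.

```python
from urllib.parse import urljoin


def follow(url: str, redirects: dict[str, str], max_hops: int = 10) -> dict:
    """Follow a source -> Location map and report finalUrl, redirectCount, redirectChain."""
    chain: list[str] = []
    current = url
    while current in redirects and len(chain) < max_hops:
        # Location headers may be relative, so resolve against the current URL.
        current = urljoin(current, redirects[current])
        chain.append(current)
    return {"finalUrl": current, "redirectCount": len(chain), "redirectChain": chain}
```

The `max_hops` cap keeps a redirect loop from running forever; a loop simply ends with `redirectCount` equal to the cap.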

Include page metadata

includePageMetadata fetches full HTML pages with GET and extracts canonical and robots meta tags.

Enable it for deeper SEO audits.

Leave it off for faster pure status checks.
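Extracting the canonical link and robots meta tag from fetched HTML can be sketched with the standard library's `html.parser`. This is an illustration, not the actor's parser: it keeps the first match of each tag and does not resolve relative canonical URLs. `MetaExtractor` and `extract_metadata` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaExtractor(HTMLParser):
    """Collect the first <link rel="canonical"> href and <meta name="robots"> content."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical_url: Optional[str] = None
        self.robots_meta: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical_url is None:
                self.canonical_url = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            if self.robots_meta is None:
                self.robots_meta = a.get("content")


def extract_metadata(html: str) -> dict:
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"canonicalUrl": parser.canonical_url, "robotsMeta": parser.robots_meta}
```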

User-Agent

The default User-Agent identifies the actor politely.

You can override it for internal policies or target-specific requirements.

Output example

{
  "url": "https://example.com/about/",
  "sourceSitemap": "https://example.com/sitemap.xml",
  "sitemapDepth": 0,
  "statusCode": 200,
  "ok": true,
  "method": "HEAD",
  "finalUrl": "https://example.com/about/",
  "redirectCount": 0,
  "redirectChain": [],
  "contentType": "text/html; charset=utf-8",
  "contentLength": 12345,
  "responseTimeMs": 184,
  "errorCategory": "none",
  "errorMessage": null,
  "canonicalUrl": null,
  "robotsMeta": null,
  "xRobotsTag": null,
  "checkedAt": "2026-06-27T00:00:00.000Z"
}

Common workflows

Broken sitemap URL audit

Run the actor with maxUrls set to your sitemap size.

Filter output where ok is false.

Review errorCategory and errorMessage.
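Once the dataset items are loaded as a list of dictionaries, the filter and a per-category summary take a few lines. The helpers below are hypothetical and only use fields from the dataset table.

```python
from collections import Counter


def broken_rows(items: list[dict]) -> list[dict]:
    """Keep rows whose ok field is false (HTTP errors, timeouts, network failures)."""
    return [row for row in items if not row.get("ok")]


def summarize(items: list[dict]) -> Counter:
    """Count broken rows per errorCategory for a quick triage view."""
    return Counter(row.get("errorCategory") for row in broken_rows(items))
```

With the Python apify-client, `items` could come from `client.dataset(run['defaultDatasetId']).list_items().items`.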

Redirect migration QA

Run before and after a migration.

Compare finalUrl, redirectCount, and statusCode.

Flag URLs with long redirect chains or unexpected final domains.
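A before/after comparison can be sketched as a join on `url`. The expected host and the threshold for a "long" chain (more than one redirect here) are assumptions to adjust per migration, and `migration_diff` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def migration_diff(before: list[dict], after: list[dict],
                   expected_host: str, max_redirects: int = 1) -> list[dict]:
    """Compare two runs by url and flag changes, long chains, or unexpected final hosts."""
    old = {row["url"]: row for row in before}
    flagged = []
    for row in after:
        reasons = []
        prev = old.get(row["url"])
        if prev is not None:
            for field in ("finalUrl", "redirectCount", "statusCode"):
                if prev.get(field) != row.get(field):
                    reasons.append(f"{field} changed")
        if (row.get("redirectCount") or 0) > max_redirects:
            reasons.append("long redirect chain")
        if row.get("finalUrl") and urlsplit(row["finalUrl"]).hostname != expected_host:
            reasons.append("unexpected final domain")
        if reasons:
            flagged.append({"url": row["url"], "reasons": reasons})
    return flagged
```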

Canonical and robots review

Enable includePageMetadata.

Filter rows where canonical URLs are missing, unexpected, or off-domain.

Review robotsMeta and xRobotsTag for accidental noindex directives.
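These checks can be sketched per row, assuming includePageMetadata was enabled (otherwise every row would look like it is missing a canonical). Treating any canonical that differs from the page URL as worth a look is an assumption, and `seo_flags` is a hypothetical helper.

```python
from urllib.parse import urljoin, urlsplit


def seo_flags(row: dict) -> list[str]:
    """Flag canonical and noindex issues on one dataset row."""
    page_url = row.get("finalUrl") or row["url"]
    flags = []
    canonical = row.get("canonicalUrl")
    if not canonical:
        flags.append("missing canonical")
    else:
        canonical = urljoin(page_url, canonical)  # canonical hrefs may be relative
        if urlsplit(canonical).hostname != urlsplit(page_url).hostname:
            flags.append("off-domain canonical")
        elif canonical != page_url:
            flags.append("canonical differs from page URL")
    # Check both the robots meta tag and the X-Robots-Tag header.
    directives = " ".join(filter(None, [row.get("robotsMeta"), row.get("xRobotsTag")])).lower()
    if "noindex" in directives:
        flags.append("noindex")
    return flags
```

Rows that need review are then `[row for row in items if seo_flags(row)]`.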

Release smoke test

Schedule a small run after deployment.

Use maxUrls to audit the most important sitemap subset.

Send failures into Slack, email, or a dashboard with Apify integrations.

Integrations

Connect the dataset to Google Sheets for SEO reports.

Use Apify webhooks to send failed URL rows to monitoring systems.

Pull results with the Apify API into Looker Studio, BigQuery, Snowflake, or your internal QA tools.

Run it from CI/CD after a website deployment.

Use recurring tasks for daily or weekly sitemap status monitoring.

API usage

Node.js

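import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token (run as an ES module for top-level await).
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/sitemap-url-status-auditor').call({
  startUrls: [{ url: 'https://apify.com/sitemap.xml' }],
  maxUrls: 100,
});
// Each audited URL is a row in the run's default dataset.
console.log(run.defaultDatasetId);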

Python

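import os

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient(os.environ['APIFY_TOKEN'])
# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/sitemap-url-status-auditor').call(run_input={
    'startUrls': [{'url': 'https://apify.com/sitemap.xml'}],
    'maxUrls': 100,
})
# Each audited URL is a row in the run's default dataset.
print(run['defaultDatasetId'])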

cURL

curl -X POST 'https://api.apify.com/v2/acts/automation-lab~sitemap-url-status-auditor/runs?token=YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"startUrls":[{"url":"https://apify.com/sitemap.xml"}],"maxUrls":100}'

MCP usage

Use this actor from MCP-compatible clients through the Apify MCP Server.

MCP server URL (Claude Desktop and Claude Code):

https://mcp.apify.com/?tools=automation-lab/sitemap-url-status-auditor

Claude Code setup command:

claude mcp add --transport http apify-sitemap-auditor "https://mcp.apify.com/?tools=automation-lab/sitemap-url-status-auditor"

Claude Desktop JSON config example:

{
  "mcpServers": {
    "apify-sitemap-auditor": {
      "url": "https://mcp.apify.com/?tools=automation-lab/sitemap-url-status-auditor"
    }
  }
}

Example prompts:

"Audit this sitemap and summarize broken URLs."
"Check redirect counts for URLs in this sitemap index."
"Find sitemap URLs that return 404, 500, timeout, or blocked responses."
"Run a canonical and robots metadata audit for this sitemap."

Tips for best results

Start with a small maxUrls value.

Use exact sitemap XML URLs when you know them.

Reduce concurrency when a site returns 429 or intermittent errors.

Enable metadata extraction only when canonical or robots tags matter.

Keep sitemap and URL caps aligned with your budget.

Review sitemap error rows; they often reveal invalid sitemap indexes or blocked XML files.

Troubleshooting

The actor says no `<loc>` URLs were found

Check that the input URL points to XML sitemap content, not an HTML page.

If you entered a website root, confirm /sitemap.xml exists.

Many URLs show blocked or 403

Lower concurrency and use a clear User-Agent.

Some sites reject automated HEAD requests; when HEAD returns a status that commonly signals this, the actor retries the URL with GET.

Metadata fields are empty

Canonical and robots meta fields require includePageMetadata to be enabled.

Headers such as xRobotsTag can still appear without metadata mode.

The run is slower than expected

Large sitemap indexes, slow target servers, metadata extraction, redirects, and retries increase runtime.

Lower maxUrls or increase concurrency carefully.

Data quality notes

HTTP status checks reflect the response seen during the run.

Target websites can rate-limit, geo-vary, or serve different responses to different clients.

The actor records those outcomes instead of hiding them.

Redirect chains depend on what the HTTP client exposes after following redirects.

Legality and ethics

This actor is designed for public XML sitemaps and public URLs.

Use it on websites you own, manage, audit, or are otherwise authorized to check.

Respect target website terms, rate limits, and robots guidance.

Reduce concurrency if a site appears stressed or rate-limited.

FAQ

Can this actor audit sitemap indexes?

Yes. It detects sitemap indexes and recursively expands nested sitemap files up to maxDepth and maxSitemaps.

Does it need a browser?

No. It is an HTTP-only actor for public XML and URL checks.

Can it audit password-protected staging sites?

Not in the default workflow. Public unauthenticated URLs are the intended use case.

Does it use proxies?

No proxy is required by default. It performs plain HTTP requests from the actor runtime.

Can I schedule recurring audits?

Yes. Use Apify tasks and schedules to run the same input daily, weekly, or after deployments.

Can I export results?

Yes. Apify datasets export to JSON, CSV, Excel, XML, RSS, and HTML table formats.

How do I audit more than one domain?

Add multiple sitemap URLs to startUrls or put additional domains in domains.

What counts as OK?

The ok field is true for HTTP 2xx and 3xx responses.

What happens when a sitemap URL is broken?

The actor emits an error row for the sitemap itself with method SITEMAP and a normalized error category.

What happens when a page URL times out?

The actor emits a row for that URL with statusCode null, ok false, and errorCategory set to timeout.

You might also like

Sitemap Broken Link Auditor

snapperwapper/sitemap-broken-link-auditor

Audit XML sitemaps and sitemap indexes for broken URLs, redirects, status codes, response times, duplicate entries, and structured errors.

snapperwapper

Bulk URL Status Checker

fetch_cat/bulk-url-status-checker

Check large URL lists for status codes, redirects, broken links, response timing, headers, titles, canonical URLs, and robots meta.

Hanna Nosova

Bulk URL Status Checker

automation-lab/bulk-url-status-checker

Bulk check URLs for status codes, redirects, broken links, response times, canonical tags, robots meta, headers, and final destinations.

Stas Persiianenko

XML Sitemap Validator

maximedupre/sitemap-validator

Validate XML sitemaps and sitemap indexes. Check listed URLs for HTTP status, redirects, response time, errors, and sitemap metadata.

Maxime Dupré

Sitemap SEO Auditor

213x/sitemap-seo-auditor

Audit every URL in a sitemap for SEO metadata, status codes, titles, descriptions, H1s, canonical tags, noindex, word count, and common issues.

Sitemap & Robots SEO Index Auditor

glowing_glove/sitemap-robots-index-auditor

Audit robots.txt, sitemap discovery, indexability signals, canonical tags, and SEO crawl readiness for business websites.

Ushba Khan

Website Status Metadata API

intimate_pacu/website-status-metadata-api

Check public website status, redirects, title, meta description, canonical URL, and basic response metadata for monitoring and agent workflows.

Intimate PAcu

Bulk URL Status Checker - Redirect & Broken Link Audit

webdata_labs/bulk-url-status-checker

[💵 $1.00 / 1K] Check URLs in bulk for HTTP status codes, broken links, redirects, response times, final URLs, and redirect chains. Built for SEO audits, migrations, QA, and monitoring. CSV/JSON.

WebData Labs

URL Status Batch Checker

mahogany_songbird/url-status-batch-checker

HTTP status codes and response times for URL lists.

Britton Furness

Sitemap URL Extractor & Auditor (index-aware, status checks)

plotbench/sitemap-url-auditor

Extract every URL from XML sitemaps — sitemap-index and gzip aware — with lastmod/changefreq/priority, plus optional robots-respecting HTTP status sampling for broken-page detection.

Plotbench Studio
