Sitemap Xml Monitor

Pricing

from $0.005 / sitemap compare

Monitor sitemap.xml files for structural, availability, and content changes. Detect critical SEO issues like URL removals, broken sitemaps, index changes, and formatting errors with severity-based alerts.

Developer

DatawinderLabs

Maintained by Community

Sitemap.xml Monitor

Stateful sitemap.xml monitoring Actor with baseline awareness, diff-based detection, and severity-classified alerts.

This Actor is designed for monitoring, not validation or SEO auditing.
It reports only meaningful changes over time and avoids noisy false positives.

Because the Actor is stateful, alerts are emitted only once a baseline snapshot exists, that is, from the second run onward.


Snapshot Contract

This Actor uses a versioned, stable snapshot schema.

  • Snapshot version: v1
  • Schema changes require explicit migration
  • Downstream consumers may rely on field names and severity semantics

What this Actor monitors

  • sitemap.xml availability (HTTP reachability)
  • Sitemap type changes (index vs urlset)
  • Large-scale URL removals (mass deletion protection)
  • New URL additions
  • Metadata changes (lastmod regressions, priority updates)
  • Formatting-only edits (comments / whitespace)

The Actor stores a baseline snapshot on first run and compares all subsequent runs against it.
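
As a rough illustration of diff-based detection, the sketch below compares two snapshots represented as plain `{loc: metadata}` dictionaries. The function name `diff_snapshots` and the dictionary layout are assumptions made for this example, not the Actor's internal format.

```python
def diff_snapshots(baseline: dict, current: dict) -> dict:
    """Compare two {loc: metadata} mappings and report what changed.

    Hypothetical helper: the real snapshot schema may differ.
    """
    old_urls, new_urls = set(baseline), set(current)
    # URLs present in both runs whose metadata (lastmod, priority, ...) differs.
    changed = sorted(
        loc for loc in old_urls & new_urls if baseline[loc] != current[loc]
    )
    return {
        "added": sorted(new_urls - old_urls),
        "removed": sorted(old_urls - new_urls),
        "metadata_changed": changed,
    }


baseline = {
    "https://example.com/a": {"lastmod": "2024-01-01"},
    "https://example.com/b": {},
}
current = {
    "https://example.com/a": {"lastmod": "2024-02-01"},
    "https://example.com/c": {},
}
result = diff_snapshots(baseline, current)
```

Here `result` reports one added URL, one removed URL, and one URL whose metadata changed.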


Alert Semantics (Severity Contract)

This Actor follows a strict severity contract.

Each severity level has a clear operational meaning so you can safely wire alerts without alert fatigue.

Severity levels

🔴 Critical

Meaning: Access restriction, structural breakage, or mass data loss.

You should act immediately if this affects your SEO coverage.

Triggered when:

  • sitemap.xml becomes unreachable (HTTP error or network failure)
  • Sitemap type changes unexpectedly (e.g., urlset → unknown)
  • Mass removal of URLs (≥ 30%) or Sitemap Index entries (≥ 50%)

Critical alerts are intentionally rare.
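
The mass-removal thresholds above can be expressed as a small classifier. This is a minimal sketch: the function name `removal_severity` is hypothetical, and treating below-threshold removals of Sitemap Index entries as warnings is an assumption (the README states that rule only for individual URLs).

```python
from typing import Optional

URL_REMOVAL_PERCENT = 30    # urlset entries, threshold taken from the list above
INDEX_REMOVAL_PERCENT = 50  # Sitemap Index entries, threshold taken from the list above


def removal_severity(baseline_count: int, removed_count: int, is_index: bool) -> Optional[str]:
    """Classify a removal event as 'critical', 'warning', or None (nothing removed)."""
    if removed_count <= 0 or baseline_count <= 0:
        return None
    threshold = INDEX_REMOVAL_PERCENT if is_index else URL_REMOVAL_PERCENT
    # Integer comparison avoids floating-point edge cases right at the threshold.
    if removed_count * 100 >= baseline_count * threshold:
        return "critical"
    return "warning"
```

Comparing `removed_count * 100` against `baseline_count * threshold` keeps the boundary case (exactly 30% or 50%) on the critical side, matching the "≥" in the rule.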


🟠 Warning

Meaning: Potential quality issues or minor regressions.

Triggered when:

  • Individual URLs are removed
  • lastmod timestamps move backwards (regression)
  • Sitemap becomes unparseable but still reachable
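
A lastmod regression check might look like the sketch below. The helper names are hypothetical, and treating date-only or offset-less values as UTC is an assumption; reduced-precision values such as `2024-05` are not handled and raise `ValueError`.

```python
from datetime import datetime, timezone


def parse_lastmod(value: str) -> datetime:
    """Parse a lastmod value such as '2024-05-01' or '2024-05-01T10:00:00Z'."""
    # Python versions before 3.11 reject a trailing 'Z', so rewrite it as an explicit offset.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Assumption: values without a timezone are interpreted as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def lastmod_regressed(previous: str, current: str) -> bool:
    """Return True when the current lastmod is earlier than the previous one."""
    return parse_lastmod(current) < parse_lastmod(previous)
```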

🔵 Info

Meaning: Operational visibility and growth tracking.

Triggered when:

  • New URLs are added
  • Metadata changes (changefreq, priority)
  • Service recovers from an outage
  • Formatting-only changes detected

First Run (Baseline)

On the first execution:

  • sitemap.xml is fetched
  • A normalized snapshot is stored
  • No diff or alerts are emitted
  • unchanged is null

This behavior is intentional. Monitoring begins on the second run onward.
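
The first-run behavior can be sketched with an in-memory dictionary standing in for the KV store. `run_monitor` and `site_key` are hypothetical names, and overwriting the stored snapshot on every run is an assumption of this example.

```python
def run_monitor(store: dict, site_key: str, snapshot: dict) -> dict:
    """Return the baseline/unchanged flags for one run against a dict-backed store."""
    previous = store.get(site_key)
    # Assumption: the latest snapshot always replaces the stored one.
    store[site_key] = snapshot
    if previous is None:
        # First run: nothing to compare against, so no diff and no alerts.
        return {"baseline": True, "unchanged": None}
    return {"baseline": False, "unchanged": previous == snapshot}


store = {}
first = run_monitor(store, "example.com", {"type": "urlset"})
second = run_monitor(store, "example.com", {"type": "urlset"})
```

In this sketch `first` carries `unchanged` as `None` (the JSON `null` described above), while `second` is the first run that can report a real comparison.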


Output Contract

Each run produces:

  • One snapshot stored in a KV store (per monitored site)
  • One dataset row summarizing the run
  • A structured OUTPUT object containing:
    • baseline
    • unchanged
    • summary (critical / warning / info counts)
    • changes[]

This makes the Actor safe for:

  • Scheduling
  • Webhooks
  • Alert automation
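
For downstream consumers, the OUTPUT object could be modeled roughly as follows. The four top-level field names come from the list above; the Python types (for example `baseline` as a boolean) and the opaque shape of each `changes` item are assumptions.

```python
from typing import Any, Dict, List, Optional, TypedDict


class Summary(TypedDict):
    critical: int
    warning: int
    info: int


class Output(TypedDict):
    baseline: bool              # assumed boolean: True on the first (baseline) run
    unchanged: Optional[bool]   # None (JSON null) on the baseline run
    summary: Summary
    changes: List[Dict[str, Any]]  # item fields are not specified here


# What a baseline run might look like under these assumptions.
first_run_output: Output = {
    "baseline": True,
    "unchanged": None,
    "summary": {"critical": 0, "warning": 0, "info": 0},
    "changes": [],
}
```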

Fetch Failure Semantics

  • httpStatus = 0 indicates a network error or timeout
  • Fetch timeouts are treated as unreachable
  • Output is still produced even on failure
  • Snapshots are still stored for continuity
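
The failure semantics above can be approximated with the standard library. This is a sketch only: `fetch_sitemap` and the `reachable` and `body` keys are hypothetical, and the Actor's actual HTTP client is not documented here.

```python
import urllib.error
import urllib.request


def fetch_sitemap(url: str, timeout: float = 10.0) -> dict:
    """Fetch a sitemap and always return a result, even on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return {"httpStatus": resp.status, "reachable": True, "body": resp.read()}
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx / 5xx).
        return {"httpStatus": err.code, "reachable": False, "body": None}
    except OSError:
        # URLError, connection failures, and timeouts are all OSError subclasses:
        # report them as httpStatus = 0, i.e. unreachable.
        return {"httpStatus": 0, "reachable": False, "body": None}
```

Returning a result dictionary instead of raising keeps the "output is still produced even on failure" property: the caller can store a snapshot and emit a dataset row on every run.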

Deliberately Ignored Changes

The following do NOT trigger rule-level alerts:

  • Attribute order changes
  • Whitespace differences
  • Tag reordering (normalized by parsing)
  • Namespace prefix changes

These may still appear as formatting_only info events.
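
To see why such edits can be ignored, the sketch below parses two differently formatted documents into the same normalized structure. The helper names are hypothetical and the Actor's real normalizer may work differently.

```python
import xml.etree.ElementTree as ET


def normalized_entries(xml_text: str) -> dict:
    """Map each <loc> to its sibling metadata, ignoring layout and prefixes."""
    root = ET.fromstring(xml_text)  # comments are dropped by the default parser
    entries = {}
    for child in root:
        # ElementTree expands prefixes to '{namespace-uri}name', so 'sm:loc'
        # and 'loc' become identical once the '{...}' part is stripped.
        fields = {el.tag.rsplit("}", 1)[-1]: (el.text or "").strip() for el in child}
        loc = fields.pop("loc", None)
        if loc:
            entries[loc] = fields
    return entries


def is_formatting_only(old_xml: str, new_xml: str) -> bool:
    """True when the raw text differs but the normalized content does not."""
    return old_xml != new_xml and normalized_entries(old_xml) == normalized_entries(new_xml)


PLAIN = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>"
    "</urlset>"
)
PREFIXED = """<!-- regenerated -->
<sm:urlset xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sm:url>
    <sm:lastmod>2024-01-01</sm:lastmod>
    <sm:loc>  https://example.com/  </sm:loc>
  </sm:url>
</sm:urlset>"""
```

`PREFIXED` adds a comment, whitespace, a namespace prefix, and reordered tags, yet both documents normalize to the same entries, so `is_formatting_only(PLAIN, PREFIXED)` holds.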


Design Philosophy

  • Stateful, not stateless
  • Monitoring, not auditing
  • Low noise over high sensitivity
  • Safe to run indefinitely
  • Clear alert meaning

If you wire alerts:

  • Page on critical
  • Notify on warning
  • Log info

Recommended usage:

  • Run daily or hourly
  • Combine with robots.txt and URL monitors
  • Use Apify webhooks for alerting
  • Treat sitemap.xml as a coverage signal, not a static file
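
The page / notify / log routing suggested above might be wired like this. The function `route_alerts` and the action names are illustrative assumptions; only the `summary` counts come from the output contract.

```python
def route_alerts(output: dict) -> list:
    """Translate summary counts into the actions a receiver should take."""
    summary = output.get("summary") or {}
    actions = []
    if summary.get("critical", 0) > 0:
        actions.append("page")    # wake someone up
    if summary.get("warning", 0) > 0:
        actions.append("notify")  # e.g. post to a team channel
    if summary.get("info", 0) > 0:
        actions.append("log")     # record for visibility only
    return actions
```

A webhook receiver could call `route_alerts` on each run's OUTPUT and dispatch to whatever paging, chat, or logging tools are already in place.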

🤖 Instructions for AI Agents & LLMs

This Actor operates under a versioned, stable schema contract (v1) and a structured severity protocol, which keeps noise low and makes its output predictable for automated pipelines and multi-agent deployments.

Capability Summary

  • Primary Function: Compares the current sitemap.xml or sitemap index against the stored baseline snapshot and reports the differences.
  • Input Parameters: The URL of a sitemap.xml file or sitemap index.
  • Normalization: Formatting-only edits, tag reordering, and namespace prefix changes are normalized away so they do not raise rule-level alerts.

Operational Guardrails for Autonomous Systems

  • Critical Breaker: If summary.critical > 0, the sitemap is unreachable, its type changed unexpectedly, or entries were removed en masse. Automated workers should alert administrators immediately or halt downstream processing.
  • Ingestion Signal: Filter the changes array for newly added URLs and feed only those to RAG pipelines or vector indexes.
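
Both guardrails can be combined in one consumer-side helper. This is a sketch under stated assumptions: the `kind` and `url` fields inside each `changes` item, the `"url_added"` value, and the exception class are all hypothetical, since the item schema is not spelled out here.

```python
class CriticalSitemapChange(RuntimeError):
    """Raised so an automated worker stops instead of ingesting suspect data."""


def urls_to_ingest(output: dict) -> list:
    """Return newly added URLs, or raise if the run reported critical changes."""
    summary = output.get("summary") or {}
    if summary.get("critical", 0) > 0:
        raise CriticalSitemapChange("critical sitemap change: halt and alert")
    return [
        change["url"]
        for change in output.get("changes", [])
        # Hypothetical field names: adjust to the actual changes[] item schema.
        if change.get("kind") == "url_added" and "url" in change
    ]
```

Raising on critical runs makes the breaker hard to ignore: a pipeline that forgets to check `summary` still cannot proceed with ingestion.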

Cost Profile

  • Task Fee: A flat $0.025 per snapshot execution.
  • Companion Actors: Combine with the Datawinder Labs robots-txt-monitor and broken-url-monitor for unified web index tracking.