Pricing

from $10.00 / 1,000 events (1 event = 1,000 URLs processed)

Sitemap Inventory & Diff - URL Extractor with Change Detection

Extract every URL from a site's sitemaps, then diff against the previous run: pages added, removed, or updated since last check. Built for SEO monitoring, RAG freshness, and competitor watching.

Developer

Jimmy A

Last modified

a month ago

What it does

Discovers sitemaps from robots.txt (falls back to /sitemap.xml and common index paths)
Follows sitemap index files recursively, including gzipped (.xml.gz) sitemaps
Extracts every URL with its lastmod date
Saves a snapshot per domain, then on the next run reports added / removed / changed URLs
Outputs a clean summary to the dataset; optionally the full URL inventory
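
The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the function name `parse_sitemap` and its return shape are hypothetical, and it assumes files follow the standard sitemaps.org XML namespace.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(raw: bytes):
    """Return (kind, entries) for one sitemap file (hypothetical helper).

    kind is "index" or "urlset"; entries is a list of (loc, lastmod) tuples,
    where lastmod is None when the element is absent.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    child = "sm:sitemap" if kind == "index" else "sm:url"
    entries = []
    for node in root.findall(child, NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
        if loc:
            entries.append((loc, lastmod.strip() if lastmod else None))
    return kind, entries
```

An "index" result lists child sitemaps to fetch next; a "urlset" result lists page URLs with their lastmod values.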

Use cases

SEO monitoring: catch when a competitor publishes new landing pages, kills old ones, or refreshes content
RAG and AI pipelines: keep a vector index fresh by re-crawling only the URLs that changed instead of the whole site
Content watch: see when a publisher, government site, or documentation portal adds pages on a topic
Site audits: instant URL inventory for any domain, exportable as JSON or CSV
Index bloat checks: compare what a site publishes in sitemaps over time

Input

{
  "domains": ["competitor.com", "docs.example.com"],
  "computeDiff": true,
  "outputInventory": false
}

You can also pass exact sitemap URLs via sitemapUrls if a site keeps them in a non-standard place.

Output

One summary item per domain:

{
  "type": "summary",
  "domain": "competitor.com",
  "sitemapFiles": 7,
  "urlCount": 3741,
  "diff": {
    "previousRunFound": true,
    "added": 12,
    "removed": 3,
    "changed": 41,
    "addedUrls": ["https://competitor.com/new-feature", "..."],
    "removedUrls": ["..."],
    "changedUrls": ["..."]
  }
}

Set outputInventory: true to also get one item per URL (url, lastmod, domain).

The first run for a domain saves the baseline snapshot; diffs start with the second run. Snapshots persist between runs, so a weekly schedule gives you a weekly change report.
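
To make the diff semantics concrete, here is a minimal sketch of comparing two snapshots. It assumes a snapshot is a mapping of URL to lastmod and that "changed" means the lastmod value differs between runs; both are assumptions for illustration, as is the helper name `diff_snapshots` and the shape returned when no baseline exists.

```python
def diff_snapshots(previous, current):
    """Compare two {url: lastmod} snapshots and build a summary-style diff."""
    if previous is None:
        # Assumption: a first run reports no diff, only that no baseline existed.
        return {"previousRunFound": False}
    prev_urls, curr_urls = set(previous), set(current)
    added = sorted(curr_urls - prev_urls)
    removed = sorted(prev_urls - curr_urls)
    # Assumption: "changed" = present in both runs with a different lastmod.
    changed = sorted(u for u in prev_urls & curr_urls if previous[u] != current[u])
    return {
        "previousRunFound": True,
        "added": len(added),
        "removed": len(removed),
        "changed": len(changed),
        "addedUrls": added,
        "removedUrls": removed,
        "changedUrls": changed,
    }
```

The returned keys mirror the `diff` object in the summary item shown earlier.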

Scheduling

Pair this actor with an Apify Schedule (for example weekly per domain). Each scheduled run compares against the previous snapshot automatically. Use the snapshotGroup input to track the same domain on two independent schedules without the snapshots interfering.

API / Standby mode for AI agents

The actor also runs as an HTTP endpoint (Standby). Agents and integrations can call:

GET /?domain=example.com&diff=true

and receive the summary JSON synchronously. Works as an MCP-style tool for agent frameworks that support Apify actors.
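
A client call might look like the sketch below. The base URL is a placeholder you would replace with the actor's own Standby URL, and passing the Apify token as a Bearer header is an assumption about how your account authenticates; only the `domain` and `diff` query parameters come from the request shown above.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_standby_url(base_url, domain, diff=True):
    """Build the GET URL for the Standby endpoint (base_url is a placeholder)."""
    query = urlencode({"domain": domain, "diff": "true" if diff else "false"})
    return f"{base_url.rstrip('/')}/?{query}"


def fetch_summary(base_url, domain, token, diff=True):
    """Call the endpoint and decode the summary JSON (performs a network request)."""
    request = urllib.request.Request(
        build_standby_url(base_url, domain, diff),
        # Assumption: the token is accepted as a Bearer credential.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Keeping URL construction separate from the network call makes the request easy to log or test before any paid API call is made.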

Pricing

Pay per event - you only pay for what the run actually does:

Event	Price
Actor start	$0.0001
Per 1,000 URLs extracted	$0.01
Diff computed (per domain)	$0.02
API call (standby mode)	$0.01

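A weekly check of a 10,000-URL site costs about $0.12 per run ($0.0001 start + 10 × $0.01 extraction + $0.02 diff), or roughly $0.52 per month at 52 runs a year.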
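
The event prices in the table can be combined into a per-run estimate. The sketch below is illustrative only: it assumes extraction is billed per started block of 1,000 URLs and that the diff event is charged once per domain, and the function name `estimate_run_cost` is hypothetical. Standby API calls are not included.

```python
from decimal import Decimal
from math import ceil

# Event prices copied from the pricing table, in USD.
PRICES = {
    "start": Decimal("0.0001"),
    "per_1000_urls": Decimal("0.01"),
    "diff": Decimal("0.02"),
}


def estimate_run_cost(url_count, domains=1, compute_diff=True):
    """Estimate the cost of one run in USD."""
    # Assumption: a partial block of 1,000 URLs is billed as a full block.
    blocks = ceil(url_count / 1000)
    cost = PRICES["start"] + blocks * PRICES["per_1000_urls"]
    if compute_diff:
        cost += domains * PRICES["diff"]
    return cost
```

For a single 10,000-URL domain with diff enabled, `estimate_run_cost(10_000)` gives 0.1201 per run; multiply by the number of runs in the billing period for a monthly figure.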

FAQ

How is this different from a sitemap URL extractor? Extractors give you the URL list. This actor also remembers the last run and tells you what changed: that is the part you actually want on a schedule.

Does it work on sites without robots.txt? Yes. It falls back to /sitemap.xml, /sitemap_index.xml, and /sitemap-index.xml, and you can pass exact sitemap URLs.
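
The discovery order described here could be sketched like this. It is a simplified illustration: the helper name `candidate_sitemaps` is hypothetical, and the fallback URLs assume the site is served over https.

```python
# Fallback paths listed in the answer above.
FALLBACK_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")


def candidate_sitemaps(domain, robots_txt):
    """Return sitemap URLs from robots.txt, or fallback guesses if none exist.

    robots_txt is the file's text, or None when the site has no robots.txt.
    """
    found = []
    for line in (robots_txt or "").splitlines():
        # Split on the first colon only, so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    if found:
        return found
    # Assumption: https is used when guessing fallback locations.
    return [f"https://{domain}{path}" for path in FALLBACK_PATHS]
```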

Does it handle huge sites? Yes. Sitemap indexes are followed recursively up to 500 sitemap files per domain, with a configurable URL cap (default 100,000).
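
One way to enforce limits like these is a bounded traversal. The sketch below uses an iterative breadth-first queue rather than literal recursion, and takes a caller-supplied `load` function that returns `("index" | "urlset", [(loc, lastmod), ...])` for a sitemap URL. The names and the exact cut-off behaviour are assumptions for illustration, not the actor's implementation.

```python
from collections import deque

MAX_SITEMAP_FILES = 500    # per-domain file limit stated above
DEFAULT_URL_CAP = 100_000  # default URL cap stated above


def collect_urls(start_sitemaps, load, max_files=MAX_SITEMAP_FILES,
                 url_cap=DEFAULT_URL_CAP):
    """Walk sitemap indexes breadth-first and return ({url: lastmod}, files_read)."""
    queue = deque(start_sitemaps)
    seen = set()  # sitemap files already read, to avoid loops
    urls = {}
    while queue and len(seen) < max_files and len(urls) < url_cap:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        kind, entries = load(sitemap_url)
        for loc, lastmod in entries:
            if kind == "index":
                queue.append(loc)    # child sitemap, read later
            elif len(urls) < url_cap:
                urls[loc] = lastmod  # page URL, kept until the cap is hit
    return urls, len(seen)
```

Tracking visited files also guards against an index that references itself.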

Does it crawl pages? No. It only reads sitemap files, which makes it fast, cheap, and gentle on the target site. If a URL is not in the sitemaps, it will not appear.

Can I get the result as CSV? Yes. Every Apify dataset can be exported as CSV, JSON, or Excel, or fetched via the API.
