Sitemap Inventory & Diff - URL Extractor with Change Detection

Extract every URL from a site's sitemaps, then diff against the previous run: pages added, removed, or updated since last check. Built for SEO monitoring, RAG freshness, and competitor watching.

Developer

Jimmy A

Maintained by Community

Extract every URL from a website's sitemaps and find out what changed since your last check: pages added, pages removed, pages updated. One run gives you the full URL inventory. Scheduled runs give you a change feed for any site on the internet.

No browser, no proxies, no login. It reads the same sitemap.xml files sites publish for Google.

What it does

  1. Discovers sitemaps from robots.txt (falls back to /sitemap.xml and common index paths)
  2. Follows sitemap index files recursively, including gzipped (.xml.gz) sitemaps
  3. Extracts every URL with its lastmod date
  4. Saves a snapshot per domain, then on the next run reports added / removed / changed URLs
  5. Outputs a clean summary to the dataset; optionally the full URL inventory
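
Steps 2 and 3 can be illustrated with a minimal Python sketch that parses one sitemap file. This is not the actor's source code: the function name `parse_sitemap` and the returned shape are assumptions made for illustration. It handles gzipped payloads and distinguishes a sitemap index from a plain URL set.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(raw: bytes):
    """Return (kind, entries); kind is 'index' or 'urlset' (hypothetical helper)."""
    # Gzipped sitemaps start with the gzip magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    child_tag = "sitemap" if kind == "index" else "url"
    entries = []
    for node in root.findall(SITEMAP_NS + child_tag):
        loc = node.findtext(SITEMAP_NS + "loc")
        if not loc:
            continue  # skip entries without a location
        lastmod = node.findtext(SITEMAP_NS + "lastmod")
        entries.append({
            "url": loc.strip(),
            "lastmod": lastmod.strip() if lastmod else None,
        })
    return kind, entries


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

kind, entries = parse_sitemap(SAMPLE)
```

For an index file, the entries would be child sitemap URLs to fetch next rather than page URLs.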

Use cases

  • SEO monitoring: catch when a competitor publishes new landing pages, kills old ones, or refreshes content
  • RAG and AI pipelines: keep a vector index fresh by re-crawling only the URLs that changed instead of the whole site
  • Content watch: see when a publisher, government site, or documentation portal adds pages on a topic
  • Site audits: instant URL inventory for any domain, exportable as JSON or CSV
  • Index bloat checks: compare what a site publishes in sitemaps over time

Input

{
  "domains": ["competitor.com", "docs.example.com"],
  "computeDiff": true,
  "outputInventory": false
}

You can also pass exact sitemap URLs via sitemapUrls if a site keeps them in a non-standard place.

Output

One summary item per domain:

{
  "type": "summary",
  "domain": "competitor.com",
  "sitemapFiles": 7,
  "urlCount": 3741,
  "diff": {
    "previousRunFound": true,
    "added": 12,
    "removed": 3,
    "changed": 41,
    "addedUrls": ["https://competitor.com/new-feature", "..."],
    "removedUrls": ["..."],
    "changedUrls": ["..."]
  }
}

Set outputInventory: true to also get one item per URL (url, lastmod, domain).
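
As a usage illustration, the sketch below turns dataset items into one-line change reports. It relies only on the field names shown in the sample summary above. The function name `format_change_report` is hypothetical, and how per-URL inventory items are tagged is an assumption: anything whose `type` is not `"summary"` is skipped.

```python
def format_change_report(items):
    """Build one human-readable line per summary item (illustrative helper)."""
    lines = []
    for item in items:
        if item.get("type") != "summary":
            continue  # assumed: per-URL inventory items are not summaries
        diff = item.get("diff") or {}
        head = f"{item['domain']}: {item['urlCount']} URLs"
        if not diff.get("previousRunFound"):
            lines.append(f"{head} (baseline, no diff yet)")
            continue
        lines.append(
            f"{head}, +{diff['added']} added, "
            f"-{diff['removed']} removed, ~{diff['changed']} changed"
        )
    return lines


report = format_change_report([
    {"type": "summary", "domain": "competitor.com", "urlCount": 3741,
     "diff": {"previousRunFound": True, "added": 12, "removed": 3, "changed": 41}},
    {"url": "https://competitor.com/new-feature", "lastmod": None,
     "domain": "competitor.com"},
])
```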

The first run for a domain saves the baseline snapshot; diffs start with the second run. Snapshots persist between runs, so a weekly schedule gives you a weekly change report.
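
The snapshot comparison can be sketched as a set difference over URLs. This is a minimal illustration, not the actor's implementation. It assumes a snapshot is a mapping from URL to `lastmod`, and that "changed" means the `lastmod` value differs between runs; the shape returned when no previous snapshot exists is also an assumption.

```python
def diff_snapshots(previous, current):
    """Compare two {url: lastmod} mappings (hypothetical snapshot format)."""
    if previous is None:
        # First run: only the baseline exists, so there is nothing to compare.
        return {"previousRunFound": False}
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # Assumption: a URL counts as changed when its lastmod differs.
    changed = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return {
        "previousRunFound": True,
        "added": len(added),
        "removed": len(removed),
        "changed": len(changed),
        "addedUrls": added,
        "removedUrls": removed,
        "changedUrls": changed,
    }


last_week = {"https://example.com/a": "2024-01-01", "https://example.com/b": "2024-01-01"}
this_week = {"https://example.com/a": "2024-01-08", "https://example.com/c": "2024-01-07"}
result = diff_snapshots(last_week, this_week)
```

Under this assumption, a site that omits `lastmod` would only ever report added and removed URLs.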

Scheduling

Pair this actor with an Apify Schedule (for example weekly per domain). Each scheduled run compares against the previous snapshot automatically. Use the snapshotGroup input to track the same domain on two independent schedules without the snapshots interfering.
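
One way to picture how snapshotGroup keeps schedules apart is a store keyed by group and domain together. The in-memory class below is purely illustrative: the class name, the `"default"` group name, and the storage layout are assumptions, not the actor's actual persistence mechanism.

```python
class SnapshotStore:
    """In-memory stand-in for persistent snapshot storage (illustrative only)."""

    def __init__(self):
        self._data = {}

    def load(self, domain, snapshot_group="default"):
        # Returns None when this group has no snapshot for the domain yet.
        return self._data.get((snapshot_group, domain))

    def save(self, domain, snapshot, snapshot_group="default"):
        # Keying on (group, domain) keeps schedules from overwriting each other.
        self._data[(snapshot_group, domain)] = dict(snapshot)


store = SnapshotStore()
store.save("competitor.com", {"https://competitor.com/a": "2024-01-01"}, "weekly")
weekly = store.load("competitor.com", "weekly")
daily = store.load("competitor.com", "daily")  # no baseline yet for this group
```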

API / Standby mode for AI agents

The actor also runs as an HTTP endpoint (Standby). Agents and integrations can call:

GET /?domain=example.com&diff=true

and receive the summary JSON synchronously. Works as an MCP-style tool for agent frameworks that support Apify actors.
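
A client would build that request URL roughly as follows. The base URL here is a placeholder, since the text does not give the Standby hostname; authentication is also not described and is left out. Passing `diff=false` to skip the diff is an assumption.

```python
from urllib.parse import urlencode


def build_standby_url(base_url, domain, diff=True):
    """Compose the GET URL for the Standby endpoint (base_url is a placeholder)."""
    query = urlencode({"domain": domain, "diff": "true" if diff else "false"})
    return f"{base_url.rstrip('/')}/?{query}"


# Hypothetical host; substitute the actor's real Standby URL.
url = build_standby_url("https://standby.example.invalid", "example.com")
```

The resulting URL can be passed to any HTTP client; the response body is the summary JSON described under Output.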

Pricing

Pay per event - you only pay for what the run actually does:

| Event | Price |
| --- | --- |
| Actor start | $0.0001 |
| Per 1,000 URLs extracted | $0.01 |
| Diff computed (per domain) | $0.02 |
| API call (standby mode) | $0.01 |

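A check of a 10,000-URL site with diffing enabled costs about $0.12 per run ($0.10 for the URLs, $0.02 for the diff, plus the start fee), so a weekly schedule comes to roughly $0.50 per month.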
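
The per-run arithmetic can be checked with a small estimator built from the event prices listed above. It assumes URL charges scale linearly with the URL count (the text does not say whether billing rounds to whole blocks of 1,000) and it ignores the standby API-call event. The function name is hypothetical.

```python
PRICE_ACTOR_START = 0.0001      # per run
PRICE_PER_1000_URLS = 0.01      # per 1,000 URLs extracted
PRICE_DIFF_PER_DOMAIN = 0.02    # per domain when a diff is computed


def estimate_run_cost(url_count, domains=1, compute_diff=True):
    """Estimate the cost in USD of one scheduled run (linear proration assumed)."""
    cost = PRICE_ACTOR_START + (url_count / 1000) * PRICE_PER_1000_URLS
    if compute_diff:
        cost += domains * PRICE_DIFF_PER_DOMAIN
    return round(cost, 4)


per_run = estimate_run_cost(10_000)  # one 10,000-URL domain with diffing
```

Multiplying the per-run figure by the number of scheduled runs in a month gives the monthly estimate.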

FAQ

How is this different from a sitemap URL extractor? Extractors give you the URL list. This actor also remembers the last run and tells you what changed: that is the part you actually want on a schedule.

Does it work on sites without robots.txt? Yes. It falls back to /sitemap.xml, /sitemap_index.xml, and /sitemap-index.xml, and you can pass exact sitemap URLs.
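
The discovery order described here can be sketched as a pure function over the robots.txt text. The function name and the choice of `https://` for fallback URLs are assumptions; the fallback paths are the three listed above.

```python
FALLBACK_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")


def candidate_sitemaps(domain, robots_txt=None):
    """Return sitemap URLs to try for a domain (illustrative helper)."""
    found = []
    for line in (robots_txt or "").splitlines():
        # "Sitemap: <url>" directives are matched case-insensitively.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    if found:
        return found
    # No robots.txt or no Sitemap lines: fall back to common locations.
    return [f"https://{domain}{path}" for path in FALLBACK_PATHS]


robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap_main.xml\n"
from_robots = candidate_sitemaps("example.com", robots)
from_fallback = candidate_sitemaps("example.com")
```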

Does it handle huge sites? Yes. Sitemap indexes are followed recursively up to 500 sitemap files per domain, with a configurable URL cap (default 100,000).
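
A bounded traversal of this kind might look like the sketch below. It is an illustration under stated assumptions, not the actor's code: `load` is a hypothetical callable that fetches and parses one sitemap and returns `("index", child_sitemap_urls)` or `("urlset", page_urls)`. The default limits mirror the numbers in the answer above.

```python
from collections import deque


def collect_urls(start_sitemaps, load, max_sitemaps=500, max_urls=100_000):
    """Walk sitemap indexes breadth-first, stopping at either cap."""
    queue = deque(start_sitemaps)
    seen = set()
    urls = []
    while queue and len(seen) < max_sitemaps and len(urls) < max_urls:
        sitemap = queue.popleft()
        if sitemap in seen:
            continue  # guards against indexes that reference each other
        seen.add(sitemap)
        kind, locs = load(sitemap)
        if kind == "index":
            queue.extend(locs)
        else:
            # Take only as many URLs as the cap still allows.
            urls.extend(locs[: max_urls - len(urls)])
    return urls, len(seen)


# Fake loader standing in for real HTTP fetching and XML parsing.
FAKE = {
    "root": ("index", ["a", "b", "root"]),
    "a": ("urlset", ["u1", "u2"]),
    "b": ("urlset", ["u3"]),
}
all_urls, files_read = collect_urls(["root"], FAKE.__getitem__)
```

An explicit queue instead of recursive calls keeps deep index chains from hitting Python's recursion limit.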

Does it crawl pages? No. It only reads sitemap files, which makes it fast, cheap, and gentle on the target site. If a URL is not in the sitemaps, it will not appear.

Can I get the result as CSV? Yes, every Apify dataset exports as CSV, JSON, Excel, or via API.