Sitemap Inventory & Diff - URL Extractor with Change Detection
Pricing
from $10.00 / 1,000 × 1,000 URLs processed (that is, $0.01 per 1,000 URLs)
Extract every URL from a site's sitemaps, then diff against the previous run: pages added, removed, or updated since last check. Built for SEO monitoring, RAG freshness, and competitor watching.
Developer
Jimmy A
Maintained by Community
Extract every URL from a website's sitemaps and find out what changed since your last check: pages added, pages removed, pages updated. One run gives you the full URL inventory. Scheduled runs give you a change feed for any site on the internet.
No browser, no proxies, no login. It reads the same sitemap.xml files sites publish for Google.
What it does
- Discovers sitemaps from robots.txt (falls back to /sitemap.xml and common index paths)
- Follows sitemap index files recursively, including gzipped (.xml.gz) sitemaps
- Extracts every URL with its lastmod date
- Saves a snapshot per domain, then on the next run reports added / removed / changed URLs
- Outputs a clean summary to the dataset; optionally the full URL inventory
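To make the extraction step concrete, here is a minimal sketch of how one sitemap file could be parsed with the Python standard library. It is not the actor's actual code: the function name `parse_sitemap` and its return shape are assumptions for illustration. It detects gzip by its magic bytes, tells a sitemap index from a URL set, and returns each location with its lastmod (or None when absent).

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
GZIP_MAGIC = b"\x1f\x8b"


def parse_sitemap(raw: bytes):
    """Hypothetical helper: return (kind, entries) for one sitemap file.

    kind is "index" or "urlset"; entries is a list of (loc, lastmod)
    tuples, with lastmod set to None when the file omits it.
    """
    # Gzipped sitemaps start with the gzip magic bytes, whatever the file name.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child_tag = "index", SITEMAP_NS + "sitemap"
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child_tag = "urlset", SITEMAP_NS + "url"
    else:
        raise ValueError(f"not a sitemap: {root.tag}")
    entries = []
    for node in root.iter(child_tag):
        loc = node.findtext(SITEMAP_NS + "loc")
        if not loc:
            continue  # skip entries without a location
        lastmod = node.findtext(SITEMAP_NS + "lastmod")
        entries.append((loc.strip(), lastmod.strip() if lastmod else None))
    return kind, entries
```

Entries of kind "index" point at further sitemap files to fetch; entries of kind "urlset" are the page URLs that end up in the inventory.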
Use cases
- SEO monitoring: catch when a competitor publishes new landing pages, kills old ones, or refreshes content
- RAG and AI pipelines: keep a vector index fresh by re-crawling only the URLs that changed instead of the whole site
- Content watch: see when a publisher, government site, or documentation portal adds pages on a topic
- Site audits: instant URL inventory for any domain, exportable as JSON or CSV
- Index bloat checks: compare what a site publishes in sitemaps over time
Input
{"domains": ["competitor.com", "docs.example.com"],"computeDiff": true,"outputInventory": false}
You can also pass exact sitemap URLs via sitemapUrls if a site keeps them in a non-standard place.
Output
One summary item per domain:
{
  "type": "summary",
  "domain": "competitor.com",
  "sitemapFiles": 7,
  "urlCount": 3741,
  "diff": {
    "previousRunFound": true,
    "added": 12,
    "removed": 3,
    "changed": 41,
    "addedUrls": ["https://competitor.com/new-feature", "..."],
    "removedUrls": ["..."],
    "changedUrls": ["..."]
  }
}
Set outputInventory: true to also get one item per URL (url, lastmod, domain).
The first run for a domain saves the baseline snapshot; diffs start with the second run. Snapshots persist between runs, so a weekly schedule gives you a weekly change report.
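The comparison between two snapshots can be sketched as plain set arithmetic. This is an illustration, not the actor's implementation: it assumes a snapshot is a mapping from URL to lastmod, and that "changed" means a URL present in both runs whose lastmod value differs. The function name `diff_snapshots` is hypothetical; the result keys mirror the diff object shown in the output example.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two {url: lastmod} snapshots (assumed snapshot shape)."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # Assumption: a URL counts as changed when its lastmod value differs.
    changed = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return {
        "added": len(added),
        "removed": len(removed),
        "changed": len(changed),
        "addedUrls": added,
        "removedUrls": removed,
        "changedUrls": changed,
    }


last_week = {
    "https://example.com/": "2024-05-01",
    "https://example.com/old": "2024-01-10",
    "https://example.com/pricing": "2024-04-02",
}
this_week = {
    "https://example.com/": "2024-05-01",
    "https://example.com/pricing": "2024-05-20",
    "https://example.com/new-feature": "2024-05-21",
}
report = diff_snapshots(last_week, this_week)
```

In this example `report` lists one added URL (/new-feature), one removed URL (/old), and one changed URL (/pricing). A site that never updates lastmod would show no changed URLs under this definition.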
Scheduling
Pair this actor with an Apify Schedule (for example weekly per domain). Each scheduled run compares against the previous snapshot automatically. Use the snapshotGroup input to track the same domain on two independent schedules without the snapshots interfering.
API / Standby mode for AI agents
The actor also runs as an HTTP endpoint (Standby). Agents and integrations can call:
GET /?domain=example.com&diff=true
and receive the summary JSON synchronously. Works as an MCP-style tool for agent frameworks that support Apify actors.
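A small sketch of how a client might build that request URL in Python. The base URL below is a placeholder, not the actor's real Standby address, and the helper name `standby_url` is hypothetical. Authentication is not described here, so it is left out, and passing diff=false is an assumption about how the flag is disabled.

```python
from urllib.parse import urlencode


def standby_url(base_url: str, domain: str, diff: bool = True) -> str:
    """Build the GET URL for the Standby endpoint (base_url is a placeholder)."""
    query = urlencode({"domain": domain, "diff": "true" if diff else "false"})
    return f"{base_url.rstrip('/')}/?{query}"


# Placeholder host; substitute the Standby URL shown for the actor.
url = standby_url("https://standby.example.invalid", "example.com")
```

The resulting URL can be fetched with any HTTP client, and the response body parsed as the summary JSON.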
Pricing
Pay per event - you only pay for what the run actually does:
| Event | Price |
|---|---|
| Actor start | $0.0001 |
| Per 1,000 URLs extracted | $0.01 |
| Diff computed (per domain) | $0.02 |
| API call (standby mode) | $0.01 |
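A weekly check of a 10,000-URL site costs about $0.12 per run ($0.0001 for the start, 10 × $0.01 for the URLs, $0.02 for the diff), which comes to roughly $0.50 per month.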
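The arithmetic behind an estimate like this can be checked with a few lines of Python. The prices come from the table above; the function names are hypothetical, and the sketch assumes URL charges scale proportionally with the URL count (how partial blocks of 1,000 are billed is not stated).

```python
ACTOR_START = 0.0001      # per run
PER_1000_URLS = 0.01      # per 1,000 URLs extracted
DIFF_PER_DOMAIN = 0.02    # per domain when a diff is computed


def run_cost(url_count: int, domains: int = 1, diff: bool = True) -> float:
    """Estimated cost of one run in USD (assumes proportional URL billing)."""
    cost = ACTOR_START + (url_count / 1000) * PER_1000_URLS
    if diff:
        cost += DIFF_PER_DOMAIN * domains
    return cost


def monthly_cost(url_count: int, runs_per_year: int = 52) -> float:
    """Average monthly cost for one domain on a fixed schedule."""
    return run_cost(url_count) * runs_per_year / 12


per_run = run_cost(10_000)        # 0.0001 + 0.10 + 0.02
per_month = monthly_cost(10_000)  # weekly schedule, averaged over a year
```

For a 10,000-URL site this gives about $0.12 per run and about $0.52 per month on a weekly schedule. Standby API calls are billed separately and are not included.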
FAQ
How is this different from a sitemap URL extractor?
Extractors give you the URL list. This actor also remembers the last run and tells you what changed: that is the part you actually want on a schedule.
Does it work on sites without robots.txt?
Yes. It falls back to /sitemap.xml, /sitemap_index.xml, and /sitemap-index.xml, and you can pass exact sitemap URLs.
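The discovery order described in this answer can be sketched as follows. This is an illustration under assumptions, not the actor's code: the helper name `sitemap_candidates` is hypothetical, it assumes HTTPS for the fallback URLs, and it takes the robots.txt body as a string (or None when the file is missing) so that no network access is needed.

```python
from typing import List, Optional

# Fallback paths named in the answer above.
FALLBACK_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")


def sitemap_candidates(domain: str, robots_txt: Optional[str]) -> List[str]:
    """Return sitemap URLs to try for a domain (hypothetical helper)."""
    found = []
    for line in (robots_txt or "").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        # The Sitemap directive is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    if found:
        return found
    # No robots.txt, or no Sitemap lines in it: try the common paths.
    return [f"https://{domain}{path}" for path in FALLBACK_PATHS]


robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemaps/main.xml\n"
from_robots = sitemap_candidates("example.com", robots)
from_fallback = sitemap_candidates("example.com", None)
```

Here `from_robots` holds the single URL declared in robots.txt, and `from_fallback` holds the three fallback URLs in the order listed.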
Does it handle huge sites?
Yes. Sitemap indexes are followed recursively up to 500 sitemap files per domain, with a configurable URL cap (default 100,000).
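One way such limits can be enforced is sketched below. It uses a breadth-first queue rather than literal recursion, which is a substitution made here for simplicity and not necessarily what the actor does. The names `collect_urls`, `load`, `max_files`, and `max_urls` are hypothetical; `load` stands in for whatever fetches and parses one sitemap file and returns (kind, entries).

```python
from collections import deque


def collect_urls(start_sitemaps, load, max_files=500, max_urls=100_000):
    """Follow sitemap indexes until a file cap or URL cap is reached.

    load(url) must return (kind, entries), where kind is "index" or
    "urlset" and entries is a list of (loc, lastmod) tuples.
    """
    queue = deque(start_sitemaps)
    seen = set()   # sitemap files already read, guards against loops
    urls = {}      # page URL -> lastmod
    while queue and len(seen) < max_files and len(urls) < max_urls:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        kind, entries = load(sitemap_url)
        if kind == "index":
            queue.extend(loc for loc, _ in entries)
        else:
            for loc, lastmod in entries:
                if len(urls) >= max_urls:
                    break
                urls[loc] = lastmod
    return urls, len(seen)


# In-memory stand-in for fetching; the index also links to itself.
files = {
    "index.xml": ("index", [("a.xml", None), ("b.xml", None), ("index.xml", None)]),
    "a.xml": ("urlset", [("https://example.com/1", "2024-01-01"), ("https://example.com/2", None)]),
    "b.xml": ("urlset", [("https://example.com/3", None)]),
}
all_urls, files_read = collect_urls(["index.xml"], files.__getitem__)
```

With the sample data all three URLs are collected from three sitemap files, and the self-referencing index entry is skipped instead of looping forever.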
Does it crawl pages?
No. It only reads sitemap files, which makes it fast, cheap, and gentle on the target site. If a URL is not in the sitemaps, it will not appear.
Can I get the result as CSV?
Yes, every Apify dataset exports as CSV, JSON, Excel, or via API.