Site Governance Monitor | Robots, Sitemap & Schema
Pricing
Pay per usage
Recurring robots.txt monitor, sitemap monitor, schema validator monitor, and release QA site monitor for homepage/pricing/docs drift, with one monitored domain summary per checked domain.
Developer: 太郎 山田
Last modified: 4 days ago
Catch robots.txt, sitemap, schema, and homepage/pricing/docs governance drift in one run.
This actor turns abstract "AI discoverability governance" into four concrete recurring checks buyers can immediately understand:
- robots.txt monitor — watch missing robots.txt files, AI crawler allow/block rules, and policy drift
- sitemap monitor — catch missing, stale, or undeclared XML sitemaps before discoverability drops
- schema validator monitor — validate JSON-LD / Microdata on homepage, pricing, docs, and other key templates
- release QA site monitor — compare snapshots over time so launches and template edits do not quietly break site governance
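To make the robots.txt check concrete, here is a minimal sketch of classifying AI-crawler posture from a robots.txt body. The crawler list, the `classifyAiPosture` helper name, and the `explicit`/`blocked` labels are illustrative assumptions, not the actor's actual implementation:

```javascript
// Illustrative list of AI crawler user agents (an assumption, not the actor's real list).
const AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"];

// Parse a robots.txt body and report, per known AI crawler, whether it is
// addressed explicitly and whether it is blocked from the whole site.
function classifyAiPosture(robotsTxt) {
  const rulesByAgent = {};
  let agents = [];
  let inGroupHeader = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim();
    const match = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!match) continue;
    const field = match[1].toLowerCase();
    const value = match[2].trim();
    if (field === "user-agent") {
      if (!inGroupHeader) agents = []; // a rule line ended the previous group
      agents.push(value.toLowerCase());
      inGroupHeader = true;
    } else if (field === "allow" || field === "disallow") {
      inGroupHeader = false;
      for (const agent of agents) {
        (rulesByAgent[agent] ||= []).push({ field, value });
      }
    }
  }
  const posture = {};
  for (const bot of AI_CRAWLERS) {
    const rules = rulesByAgent[bot.toLowerCase()] || rulesByAgent["*"] || [];
    posture[bot] = {
      explicit: bot.toLowerCase() in rulesByAgent, // named in its own group?
      blocked: rules.some((r) => r.field === "disallow" && r.value === "/"),
    };
  }
  return posture;
}
```

A crawler with no group of its own falls back to the `*` group, which is exactly the "implicit vs explicit AI policy" distinction the monitor flags.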
This is not a generic website audit. It is a summary-first site-governance monitor that keeps one monitored-domain summary per checked domain, even when that domain has multiple alerts, warnings, and drift signals. That same monitored-domain summary remains the store-facing pricing unit.
Who this actor is for
- Agencies that need one recurring summary row and action queue per client site
- Platform teams that need the main site, docs, support, developer, and status properties to stay aligned
- Release QA teams that need a lightweight pre/post-release check for homepage, pricing, and docs templates
- SEO, content, and discoverability owners that want concrete monitoring for robots.txt, sitemaps, and schema markup
First successful run
Start with one real site and the three paths most buyers recognize first:
```json
{
  "domains": ["vercel.com"],
  "samplePaths": ["/", "/pricing", "/docs"],
  "delivery": "dataset",
  "snapshotKey": "site-governance-homepage-pricing-docs",
  "checkAiBots": true,
  "checkSchema": true,
  "checkSitemap": true
}
```
That single run gives you one monitored-domain summary that answers four practical questions:
- Is `robots.txt` present and are AI crawler rules explicit?
- Is the sitemap reachable, fresh, and declared in `robots.txt`?
- Do homepage, pricing, and docs pages still publish valid schema markup?
- Did anything drift since the last release or weekly checkpoint?
Store quickstart
- Start with `store-input.example.json` for a concrete first run against `vercel.com` across `/`, `/pricing`, and `/docs`.
- When that matches your workflow, switch to `store-input.templates.json` and choose one of: Quickstart: Homepage, Pricing & Docs; Agency Portfolio Site Monitor; Release QA Site Monitor; Platform Site Governance Watch; Robots.txt + Sitemap + Schema Monitor.
Dataset delivery is the best first proof. Webhook delivery becomes the next step once you want the action-needed queue in release QA, platform ops, or agency reporting.
What this actor does
For each domain or web property, the actor combines three machine-readable monitors plus drift detection into one monitored-domain summary:
- Robots.txt monitor: parses `robots.txt`, evaluates known AI crawler groups, classifies posture, and flags policy drift
- Sitemap monitor: discovers sitemap surfaces, expands sitemap indexes, evaluates freshness and lastmod coverage, and flags missing or stale inventories
- Schema validator monitor: samples the supplied `samplePaths[]`, extracts JSON-LD and Microdata, validates the markup, and detects coverage regressions
- Release QA / site-governance drift detection: compares the current run with prior snapshots so post-release changes are easy to spot
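The freshness side of the sitemap monitor can be sketched as a small pure function. This is a minimal illustration, assuming the `lastmod` values have already been collected from the sitemap; the 90-day staleness threshold is an assumption, not the actor's documented default:

```javascript
// Given the <lastmod> values found in a sitemap and the total URL count,
// report lastmod coverage, the newest entry, and whether the inventory
// looks stale relative to a maximum-age threshold.
function evaluateSitemapFreshness(lastmods, totalUrls, now = new Date(), maxAgeDays = 90) {
  const parsed = lastmods
    .map((value) => new Date(value))
    .filter((d) => !Number.isNaN(d.getTime()));
  const newest = parsed.length
    ? new Date(Math.max(...parsed.map((d) => d.getTime())))
    : null;
  const ageDays = newest ? (now - newest) / 86_400_000 : null; // ms per day
  return {
    lastmodCoverage: totalUrls ? parsed.length / totalUrls : 0,
    newestLastmod: newest ? newest.toISOString() : null,
    stale: newest === null || ageDays > maxAgeDays,
  };
}
```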
The actor then turns those signals into:
- one `governanceScore`
- one ranked `alerts[]` list
- one set of `recommendedActions`
- one portfolio-style `executiveSummary` and `actionNeededDigest`
That monitored-domain summary is the flagship output and billing-safe unit. Multiple alerts, warnings, changes, and component details stay nested under that single per-domain summary.
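The roll-up from component alerts to one per-domain summary can be sketched as follows. The severity weights and grade bands here are illustrative assumptions; the actor's real scoring model is not documented in this README:

```javascript
// Illustrative severity weights (an assumption, not the actor's model).
const SEVERITY_WEIGHT = { high: 30, medium: 15, low: 5 };

// Collapse all component alerts for a domain into one summary unit with
// a ranked alert list and a single governance score.
function summarizeDomain(domain, alerts) {
  const ranked = [...alerts].sort(
    (a, b) => SEVERITY_WEIGHT[b.severity] - SEVERITY_WEIGHT[a.severity]
  );
  const penalty = ranked.reduce((sum, a) => sum + SEVERITY_WEIGHT[a.severity], 0);
  const total = Math.max(0, 100 - penalty);
  const grade =
    total >= 90 ? "A" : total >= 75 ? "B" : total >= 60 ? "C" : total >= 50 ? "D" : "F";
  return { domain, alertCount: ranked.length, alerts: ranked, governanceScore: { total, grade } };
}
```

Whatever the exact weights, the point is the shape: many findings in, exactly one summary row out per domain.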
Flagship recurring templates
| Template | Best for | What it sharpens |
|---|---|---|
| Quickstart: Homepage, Pricing & Docs | First success / solution engineers | Proves homepage, pricing, and docs drift detection in one summary row |
| Agency Portfolio Site Monitor | Agencies / consultancies | Recurring multi-client sweeps with one summary per client site |
| Release QA Site Monitor | Release QA / web teams | Pre/post-release checks on homepage, pricing, docs, and product templates |
| Platform Site Governance Watch | Platform governance / web ops | Action-needed webhook for main site, docs, support, status, and developer properties |
| Robots.txt + Sitemap + Schema Monitor | SEO / discoverability owners | Ongoing monitoring for the three machine-readable surfaces buyers care about |
Why this is better than separate utilities
Running robotstxt-ai-checker, sitemap-analyzer, and structured-data-validator separately creates operational noise:
- three actors
- three schedules
- three payloads to reconcile
- no shared governance score
- no single action-needed queue
site-governance-monitor is the flagship combined lane:
- one recurring task
- one schedule
- one dataset or webhook payload
- one governance score per domain
- one ranked list of domains that need attention first
That makes it a better fit for agencies, portfolio operators, platform governance owners, release QA teams, and discoverability leads that want one recurring signal instead of a bundle of disconnected utilities.
Input example
```json
{
  "domains": ["vercel.com"],
  "samplePaths": ["/", "/pricing", "/docs"],
  "delivery": "dataset",
  "snapshotKey": "site-governance-homepage-pricing-docs",
  "checkAiBots": true,
  "checkSchema": true,
  "checkSitemap": true,
  "concurrency": 1,
  "batchDelayMs": 250,
  "requestTimeoutSecs": 15,
  "maxSitemapUrls": 5000
}
```
Output example
```json
{
  "domain": "client-release.example",
  "status": "changed",
  "severity": "high",
  "alertCount": 2,
  "brief": "2 alert(s): No reachable XML sitemap was found for this domain.",
  "governanceScore": { "total": 46, "grade": "F" },
  "recommendedActions": [
    "Publish a reachable XML sitemap for the domain and keep it updated.",
    "Publish a robots.txt file so the robots.txt monitor can confirm which AI crawlers you allow or block."
  ]
}
```
A fuller payload is available in `sample-output.example.json`. When `samplePaths` includes `/`, `/pricing`, and `/docs`, the full output also shows which release-sensitive pages lost schema coverage, plus portfolio-level `executiveSummary` and `actionNeededDigest` fields for webhook delivery.
Delivery modes
- `dataset`: saves one monitored-domain summary row per checked domain to the actor dataset
- `webhook`: sends the full governance payload (`meta`, `alerts`, `results`) to your webhook URL, with one monitored-domain summary in `results[]` for each checked domain
Dataset delivery is best for first proof, recurring QA evidence, and agency reporting. Webhook delivery is best when you want platform or release teams to work from an action-needed queue.
Recommended recurring workflows
| Workflow | Why |
|---|---|
| Homepage / pricing / docs release QA | Catch schema, robots.txt, and sitemap drift on the pages buyers check first |
| Agency portfolio site monitor | Catch robots.txt, sitemap, and schema drift across multiple clients or brands |
| Platform site governance watch | Keep docs, support, developer, and status properties aligned with the main site |
| Robots.txt + sitemap + schema monitoring | Track whether crawlability and machine-readable discoverability stay intentional over time |
Cost profile
Store pricing is aligned to the monitored-domain summary, not raw alert or event volume. One checked domain produces one summary unit, regardless of how many underlying governance findings are attached to it.
The actor uses built-in Node.js networking and public site surfaces. That keeps maintenance cost low and avoids browser or proxy requirements for the core checks.
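In that spirit, a dependency-free HTTP check with a per-request timeout (matching the `requestTimeoutSecs` input) can be written with the global `fetch` and `AbortSignal.timeout` available in Node.js 18+. This is a sketch of the approach, not the actor's actual HTTP layer:

```javascript
// Fetch a public site surface (robots.txt, sitemap.xml, a sample path)
// with a hard timeout, using only built-in Node.js networking.
async function fetchWithTimeout(url, requestTimeoutSecs = 15) {
  const signal = AbortSignal.timeout(requestTimeoutSecs * 1000);
  const response = await fetch(url, { signal, redirect: "follow" });
  return { status: response.status, body: await response.text() };
}
```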
Commercial ops
Set up `.env` first:

```
cp -n .env.example .env
```

Configure the Apify task and schedule when you are ready for cloud ops:

```
npm run apify:cloud:setup
```

Local validation for this repository version:

```
npm test
```
Related actors
- `robotstxt-ai-checker`: standalone robots.txt monitor
- `sitemap-analyzer`: standalone sitemap monitor
- `structured-data-validator`: standalone schema validator monitor
- `domain-trust-monitor`: broader bundled monitor for domain trust posture