🤖 Release QA Site Monitor
Pricing
from $9.00 / 1,000 results
Track website release drift automatically by scraping pages, monitoring schema regressions, and extracting robots.txt details for reliable deployments.
Developer: 太郎 山田
Last modified: a day ago
Site Governance Monitor | Robots, Sitemap & Schema
Recurring robots.txt monitor, sitemap monitor, schema validator, and release QA site monitor for homepage/pricing/docs drift, producing one summary row per monitored domain.
Store Quickstart
- Start with `store-input.example.json` for a concrete homepage-first run against `vercel.com`.
- When that matches your workflow, switch to `store-input.templates.json` and choose one of:
  - Quickstart: Homepage Governance Check (Starter Baseline)
  - Agency Portfolio Site Monitor (Advanced Recurring)
  - Release QA Site Monitor (Schema Regression Watch)
  - Platform Site Governance Watch (Advanced Delivery)
  - Robots.txt + Sitemap + Schema Monitor (Recurring Discoverability)
Key Features
- 🔗 URL-first workflow — Bulk-process thousands of URLs per run with parallel fetching
- 📊 Structured output — Every URL returns consistent, dataset-ready rows for downstream use
- 🛡️ Rate-limit aware — Exponential backoff and concurrency throttling keep you off block lists
- 📡 Webhook delivery — Push results to Slack, Discord, or any HTTP endpoint for real-time alerts
- 💰 No external APIs — reads only public data, so there are zero API-key costs and no vendor lock-in
Use Cases
| Who | Why |
|---|---|
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| domains | array | prefilled | Starter quickstart: begin with 1-3 sites for a lightweight first success. Homepage-first runs stay intentionally small. |
| samplePaths | array | prefilled | Path-only routes to validate on every domain. Keep the starter quickstart homepage-first with ["/"], then add /pricing and /docs as you expand. |
| delivery | string | "dataset" | Starter path: dataset keeps the first run low-friction and still writes the full summary-first payload to OUTPUT. Advanced runs can switch to webhook. |
| webhookUrl | string | — | Advanced delivery only: required when delivery is webhook. Must be a valid http(s) URL. The payload includes the executive summary. |
| snapshotKey | string | "site-governance-monitor-snapshots" | Keep this stable when you move from the homepage-first quickstart to recurring release-QA, portfolio, or webhook workflows. |
| checkAiBots | boolean | true | Monitor robots.txt for missing files, AI crawler allow/block rules, and drift after releases. |
| checkSchema | boolean | true | Validate JSON-LD and Microdata on homepage, pricing, docs, and other release-sensitive templates. |
| checkSitemap | boolean | true | Monitor sitemap.xml reachability, freshness, robots.txt declarations, and URL inventory drift. |
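When delivery is set to webhook, the actor POSTs the run payload to webhookUrl. A minimal stdlib-only sketch of a receiver, assuming the payload carries the alerts array shown in the Output example below; the handler class name and port are illustrative, not part of the actor:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def high_severity_alerts(payload):
    """Pull messages for high-severity alerts out of a delivered payload."""
    return [a["message"] for a in payload.get("alerts", []) if a.get("severity") == "high"]

class GovernanceWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body the actor delivers.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for message in high_severity_alerts(payload):
            print("ALERT:", message)  # forward to Slack/PagerDuty here
        self.send_response(200)
        self.end_headers()

# To serve locally and point webhookUrl at it:
# HTTPServer(("", 8000), GovernanceWebhook).serve_forever()
```

Returning 200 quickly and doing any heavy processing afterward keeps the actor's webhook delivery from timing out.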
Input Example
{
  "domains": ["vercel.com"],
  "samplePaths": ["/"],
  "delivery": "dataset",
  "snapshotKey": "site-governance-homepage-quickstart",
  "checkAiBots": true,
  "checkSchema": true,
  "checkSitemap": true,
  "concurrency": 1,
  "batchDelayMs": 250,
  "requestTimeoutSecs": 15,
  "maxSitemapUrls": 5000
}
Output
| Field | Type | Description |
|---|---|---|
| meta | object | Run metadata: executive summary, run profile, upgrade suggestions, and next-workflow hints |
| alerts | array | Flat list of alerts raised across all monitored domains |
| results | array | One summary row per monitored domain |
| alerts[].domain | string | Domain the alert applies to |
| alerts[].severity | string | Alert severity (e.g. high) |
| alerts[].component | string | Check that raised the alert (e.g. sitemapHealth) |
| alerts[].type | string | Machine-readable alert type (e.g. sitemap_missing) |
| alerts[].message | string | Human-readable alert description |
Output Example
{
  "meta": {
    "executiveSummary": { "overallStatus": "attention_needed", "recommendedCadence": "daily" },
    "runProfile": { "tier": "starter", "label": "Starter first-success path" },
    "upgradeSuggestions": [
      { "type": "webhook", "templateId": "action_needed_webhook", "title": "Route action-needed domains to your endpoint" }
    ],
    "nextWorkflow": { "type": "same_actor_template", "id": "action_needed_webhook", "title": "Next best step: Action-Needed Webhook Handoff" }
  },
  "alerts": [
    { "domain": "client-release.example", "severity": "high", "component": "sitemapHealth", "type": "sitemap_missing", "message": "No reachable XML sitemap was found for this domain." }
  ],
  "results": [
    {
      "domain": "client-release.example",
      "status": "changed",
      "severity": "high",
      "brief": "3 alert(s): No reachable XML sitemap was found for this domain.",
      "recommendedActions": [
        "Publish a reachable XML sitemap for the domain and keep it updated.",
        "Publish a robots.txt file so the robots.txt monitor can confirm which AI crawlers you allow or block."
      ]
    }
  ]
}
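The alerts array lends itself to quick triage before acting on a run. A small sketch using only field names that appear in the example above:

```python
def triage(payload):
    """Count alerts per domain by severity, for quick prioritization."""
    counts = {}
    for alert in payload.get("alerts", []):
        per_domain = counts.setdefault(alert["domain"], {})
        per_domain[alert["severity"]] = per_domain.get(alert["severity"], 0) + 1
    return counts
```

Sorting the resulting dict by high-severity count surfaces the domains that need attention first.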
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
curl -X POST "https://api.apify.com/v2/acts/taroyamada~site-governance-monitor/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domains": ["vercel.com"],
    "samplePaths": ["/"],
    "delivery": "dataset",
    "snapshotKey": "site-governance-homepage-quickstart",
    "checkAiBots": true,
    "checkSchema": true,
    "checkSitemap": true,
    "concurrency": 1,
    "batchDelayMs": 250,
    "requestTimeoutSecs": 15,
    "maxSitemapUrls": 5000
  }'
Python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/site-governance-monitor").call(run_input={
    "domains": ["vercel.com"],
    "samplePaths": ["/"],
    "delivery": "dataset",
    "snapshotKey": "site-governance-homepage-quickstart",
    "checkAiBots": True,
    "checkSchema": True,
    "checkSitemap": True,
    "concurrency": 1,
    "batchDelayMs": 250,
    "requestTimeoutSecs": 15,
    "maxSitemapUrls": 5000,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
JavaScript / Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/site-governance-monitor').call({
    "domains": ["vercel.com"],
    "samplePaths": ["/"],
    "delivery": "dataset",
    "snapshotKey": "site-governance-homepage-quickstart",
    "checkAiBots": true,
    "checkSchema": true,
    "checkSitemap": true,
    "concurrency": 1,
    "batchDelayMs": 250,
    "requestTimeoutSecs": 15,
    "maxSitemapUrls": 5000
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
Tips & Limitations
- Keep concurrency ≤ 5 when auditing production sites to avoid WAF rate-limit triggers.
- Use webhook delivery for recurring cron runs — push only deltas to downstream systems.
- Enable `dryRun` for cheap validation before committing to a paid cron schedule.
- Results are dataset-first; use the Apify API `run-sync-get-dataset-items` endpoint for instant JSON in CI pipelines.
- Run a tiny URL count first, review the sample, then scale up — pay-per-event means you only pay for what you use.
FAQ
Is there a rate limit?
Built-in concurrency throttling keeps requests polite. For most public APIs this actor can run 1–10 parallel requests without issues.
What happens when the input URL is unreachable?
The actor records an error row with the failure reason — successful URLs keep processing.
Can I schedule recurring runs?
Yes — use Apify Schedules to run this actor on a cron (hourly, daily, weekly). Combine with webhook delivery for change alerts.
Does this actor respect robots.txt?
Yes — requests use a standard User-Agent and honor site rate limits. For aggressive audits, set a higher concurrency only on your own properties.
Can I integrate with Google Sheets or Airtable?
Use webhook delivery with a Zapier/Make/n8n catcher, or call the Apify REST API from Apps Script / Airtable automations.
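For a Sheets or Airtable import without middleware, the result rows can also be flattened to CSV first. A sketch using the row fields from the Output example; to_csv is an illustrative helper, not part of the actor:

```python
import csv
import io

def to_csv(items):
    """Flatten per-domain result rows into CSV for a Sheets/Airtable import."""
    fields = ["domain", "status", "severity", "brief"]
    buf = io.StringIO()
    # extrasaction="ignore" drops nested fields like recommendedActions.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()
```

Write the string to a file and import it, or paste it into a sheet directly.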
Related Actors
URL/Link Tools cluster — explore related Apify tools:
- 🔗 URL Health Checker — Bulk-check HTTP status codes, redirects, SSL validity, and response times for thousands of URLs.
- 🔗 Broken Link Checker — Crawl websites to find broken links, 404 errors, and dead URLs.
- 🔗 URL Unshortener — Expand bit.ly and other shortened URLs to their final destinations.
- 🏷️ Meta Tag Analyzer — Analyze meta tags, Open Graph, Twitter Cards, JSON-LD, and hreflang for any URL.
- 📚 Wayback Machine Checker — Check if URLs are archived on the Wayback Machine and find closest snapshots by date.
- Sitemap Analyzer API | sitemap.xml SEO Audit — Analyze sitemap.xml files for SEO issues.
- Schema.org Validator API | JSON-LD + Microdata — Validate JSON-LD and Microdata across multiple pages, score markup quality, and flag missing or malformed Schema.org markup.
- RDAP Domain Monitor API | Ownership + Expiry — Monitor domain registration data via RDAP and track expiry, registrar, nameserver, and ownership changes in structured rows.
- Domain Security Audit API | SSL Expiry, DMARC, Domain Expiry — Summary-first portfolio monitor for SSL expiry, DMARC/SPF/DKIM, domain expiry/ownership, and security headers with remediation-ready outputs.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
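The same arithmetic as a tiny helper, using the event prices listed above:

```python
def run_cost(items, actor_start=0.01, per_item=0.003):
    """Pay-per-event total: flat start fee plus a per-dataset-item fee."""
    return round(actor_start + items * per_item, 2)

# run_cost(1000) reproduces the $3.01 example above.
```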