Robots.txt Monitor
Stateful robots.txt monitoring Actor with baseline awareness, diff-based detection, and severity-classified alerts.
This Actor is designed for monitoring, not validation or SEO auditing.
It reports only meaningful changes over time and avoids noisy false positives.
Snapshot Contract
This Actor uses a versioned, stable snapshot schema.
- Snapshot version: v1
- Schema changes require explicit migration
- Downstream consumers may rely on field names and severity semantics
What this Actor monitors
- robots.txt availability (HTTP reachability)
- User-agent rule changes
- Allow / Disallow directive changes
- Crawl-delay and request-rate changes
- Sitemap directive changes
- Formatting-only edits (comments / whitespace)
The Actor stores a baseline snapshot on first run and compares all subsequent runs against it.
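For reference, here is a minimal robots.txt (example values only) that exercises each monitored directive:

User-agent: *
Disallow: /private/
Allow: /public/
Crawl-delay: 10

User-agent: googlebot
Request-rate: 1/10

Sitemap: https://example.com/sitemap.xml
# Comments and whitespace edits are treated as formatting-only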
Alert Semantics (Severity Contract)
This Actor follows a strict severity contract.
Each severity level has a clear operational meaning so you can safely wire alerts without alert fatigue.
Severity levels
🔴 Critical
Meaning: Access restriction or loss of reachability.
You should act immediately if this affects your crawl policy or availability.
Triggered when:
- robots.txt becomes unreachable (HTTP error or network failure)
- New Disallow rules are added under User-agent: *
- Existing Allow rules are removed and crawling becomes more restrictive
Critical alerts are intentionally rare.
🟠 Warning
Meaning: Policy change requiring review.
Review when convenient.
Triggered when:
- Disallow rules added for specific (non-global) user-agents
- User-agent blocks are removed
- Crawl-delay or request-rate values change
- Sitemap directives are removed
- All sitemap directives disappear
Warnings indicate policy changes, not outages.
🔵 Info
Meaning: Non-blocking or informational change.
No action required.
Triggered when:
- robots.txt recovers after being unreachable
- New user-agent blocks are added
- New sitemap directives are added
- Formatting-only changes (comments, whitespace, ordering)
Info events exist for traceability and audits.
Examples
Example 1 — robots.txt becomes unreachable
{"type": "robots_txt_unreachable","severity": "critical","description": "robots.txt became unreachable"}
Example 2 — New global disallow added
User-agent: *
Disallow: /private/

{
  "type": "disallow_added",
  "severity": "critical",
  "description": "Disallow added for *: /private/"
}
Example 3 — Crawl-delay changed
{"type": "crawl_delay_changed","severity": "warning","description": "Crawl-delay changed for googlebot"}
Example 4 — Sitemap removed
{"type": "sitemap_removed","severity": "warning","description": "Sitemap removed: https://example.com/sitemap.xml"}
Example 5 — robots.txt formatting-only change
{"type": "formatting_only","severity": "info","description": "Formatting-only changes detected"}
First Run (Baseline)
On the first execution:
- robots.txt is fetched
- A normalized snapshot is stored
- No diff or alerts are emitted
- unchanged is null
This behavior is intentional. Monitoring begins from the second run onward.
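As a rough sketch (field names beyond those listed under Output Contract below are illustrative, not guaranteed), a first-run OUTPUT could look like:

{
  "baseline": true,
  "unchanged": null,
  "summary": { "critical": 0, "warning": 0, "info": 0 },
  "changes": []
}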
Output Contract
Each run produces:
- One snapshot stored in a KV store (per monitored site)
- One dataset row summarizing the run
- A structured OUTPUT object containing:
- baseline
- unchanged
- summary (critical / warning / info counts)
- changes[]
This makes the Actor safe for:
- Scheduling
- Webhooks
- Alert automation
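As a hedged sketch of the shape (only the keys listed above are part of the contract; the exact layout may differ), a later run that detects a single warning could produce something like:

{
  "baseline": false,
  "unchanged": false,
  "summary": { "critical": 0, "warning": 1, "info": 0 },
  "changes": [
    {
      "type": "crawl_delay_changed",
      "severity": "warning",
      "description": "Crawl-delay changed for googlebot"
    }
  ]
}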
Fetch Failure Semantics
- httpStatus = 0 indicates a network error or timeout
- Fetch timeouts are treated as unreachable
- Output is still produced even on failure
- Snapshots are still stored for continuity
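Illustration only (where httpStatus sits relative to the other fields is an assumption here): a failed fetch could be summarized roughly as:

{
  "httpStatus": 0,
  "summary": { "critical": 1, "warning": 0, "info": 0 },
  "changes": [
    {
      "type": "robots_txt_unreachable",
      "severity": "critical",
      "description": "robots.txt became unreachable"
    }
  ]
}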
Deliberately Ignored Changes
The following do NOT trigger rule-level alerts:
- Comment-only changes
- Whitespace differences
- Line reordering
- Unknown or unsupported directives
These may still appear as formatting_only info events.
Design Philosophy
- Stateful, not stateless
- Monitoring, not auditing
- Low noise over high sensitivity
- Safe to run indefinitely
- Clear alert meaning
If you wire alerts:
- Page on critical
- Notify on warning
- Log info
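A minimal Python sketch of that routing, assuming you already have the run's OUTPUT object in hand (page, notify, and log below are stand-ins for your own alerting hooks):

def page(message: str) -> None:
    print(f"[PAGE]   {message}")   # stand-in for a paging integration

def notify(message: str) -> None:
    print(f"[NOTIFY] {message}")   # stand-in for chat or email notification

def log(message: str) -> None:
    print(f"[LOG]    {message}")   # stand-in for structured logging

def route_alerts(output: dict) -> None:
    # Route each detected change according to the severity contract above.
    for change in output.get("changes") or []:
        description = change.get("description") or change.get("type", "unknown change")
        severity = change.get("severity")
        if severity == "critical":
            page(description)
        elif severity == "warning":
            notify(description)
        else:
            log(description)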
Recommended Usage
- Run daily or hourly
- Combine with sitemap and URL monitors
- Use Apify webhooks for alerting
- Treat robots.txt as a policy signal, not a static file
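As one possible wiring, the sketch below uses the apify-client Python package to trigger a run and read its OUTPUT object. The token, Actor ID, and run input are placeholders; the actual input keys are defined by this Actor's input schema.

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start a run and wait for it to finish (replace with this Actor's real ID).
run = client.actor("<ACTOR_ID>").call(run_input={})  # fill run_input per the Actor's input schema

# Read the structured OUTPUT object from the run's default key-value store.
record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
output = record["value"] if record else None

if output:
    print(output.get("summary"))  # e.g. feed into route_alerts(output) from the sketch above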