Pricing

Pay per event

Robots.txt Monitor

Stateful robots.txt monitoring with baseline awareness and severity-classified alerts. Detects meaningful policy changes over time — not noisy diffs.

Developer

Datawinder

Last modified: a month ago

Snapshot Contract

This Actor uses a versioned, stable snapshot schema.

Snapshot version: v1
Schema changes require explicit migration
Downstream consumers may rely on field names and severity semantics

What this Actor monitors

robots.txt availability (HTTP reachability)
User-agent rule changes
Allow / Disallow directive changes
Crawl-delay and request-rate changes
Sitemap directive changes
Formatting-only edits (comments / whitespace)

The Actor stores a baseline snapshot on first run and compares all subsequent runs against it.

Alert Semantics (Severity Contract)

This Actor follows a strict severity contract.

Each severity level has a clear operational meaning so you can safely wire alerts without alert fatigue.

Severity levels

🔴 Critical

Meaning: Access restriction or loss of reachability.

You should act immediately if this affects your crawl policy or availability.

Triggered when:

robots.txt becomes unreachable (HTTP error or network failure)
New Disallow rules are added under User-agent: *
Existing Allow rules are removed and crawling becomes more restrictive

Critical alerts are intentionally rare.

🟠 Warning

Meaning: Policy change requiring review.

Review when convenient.

Triggered when:

Disallow rules added for specific (non-global) user-agents
User-agent blocks are removed
Crawl-delay or request-rate values change
Sitemap directives are removed
All sitemap directives disappear

Warnings indicate policy changes, not outages.

🔵 Info

Meaning: Non-blocking or informational change.

No action required.

Triggered when:

robots.txt recovers after being unreachable
New user-agent blocks are added
New sitemap directives are added
Formatting-only changes (comments, whitespace, ordering)

Info events exist for traceability and audits.
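
The contract above can be read as a lookup from change type to severity, with one special case for Disallow additions. The following is a minimal Python sketch, not the Actor's implementation. It assumes the type names that appear in this document's examples; the names marked hypothetical, and the `classify` helper itself, are illustrative only and are not taken from the Actor's schema.

```python
from enum import Enum
from typing import Optional


class Severity(str, Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"


# Names marked "example" appear in this document's examples; the rest are
# hypothetical placeholders for the triggers listed in the contract.
SEVERITY_BY_TYPE = {
    "robots_txt_unreachable": Severity.CRITICAL,  # example
    "allow_removed": Severity.CRITICAL,           # hypothetical name
    "crawl_delay_changed": Severity.WARNING,      # example
    "sitemap_removed": Severity.WARNING,          # example
    "user_agent_removed": Severity.WARNING,       # hypothetical name
    "robots_txt_recovered": Severity.INFO,        # hypothetical name
    "user_agent_added": Severity.INFO,            # hypothetical name
    "sitemap_added": Severity.INFO,               # hypothetical name
    "formatting_only": Severity.INFO,             # example
}


def classify(change_type: str, user_agent: Optional[str] = None) -> Severity:
    """Return the severity for a change, following the contract above."""
    if change_type == "disallow_added":
        # Global rules are critical; agent-specific rules are warnings.
        return Severity.CRITICAL if user_agent == "*" else Severity.WARNING
    return SEVERITY_BY_TYPE[change_type]
```

Keeping the mapping in one table makes it easy to audit against the contract when wiring alerts.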

Examples

Example 1 — robots.txt becomes unreachable

{
  "type": "robots_txt_unreachable",
  "severity": "critical",
  "description": "robots.txt became unreachable"
}

Example 2 — New global disallow added

User-agent: *
Disallow: /private/

{
  "type": "disallow_added",
  "severity": "critical",
  "description": "Disallow added for *: /private/"
}

Example 3 — Crawl-delay changed

{
  "type": "crawl_delay_changed",
  "severity": "warning",
  "description": "Crawl-delay changed for googlebot"
}

Example 4 — Sitemap removed

{
  "type": "sitemap_removed",
  "severity": "warning",
  "description": "Sitemap removed: https://example.com/sitemap.xml"
}

Example 5 — robots.txt formatting-only change

{
  "type": "formatting_only",
  "severity": "info",
  "description": "Formatting-only changes detected"
}
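
An event like the one in Example 2 can be derived with a set difference over parsed rules. The sketch below assumes the rules have already been parsed into a mapping from user-agent to a set of Disallow paths. `diff_disallows` is a hypothetical helper, and how the Actor treats Disallow rules that arrive inside a brand-new user-agent block is not specified here, so the sketch simply reports them.

```python
from typing import Dict, List, Set


def diff_disallows(
    old: Dict[str, Set[str]], new: Dict[str, Set[str]]
) -> List[dict]:
    """Emit one disallow_added event per Disallow path present only in `new`."""
    events = []
    for agent in sorted(new):
        added = new[agent] - old.get(agent, set())
        for path in sorted(added):
            events.append({
                "type": "disallow_added",
                # Global rules are critical, agent-specific ones are warnings.
                "severity": "critical" if agent == "*" else "warning",
                "description": f"Disallow added for {agent}: {path}",
            })
    return events
```

Sorting agents and paths keeps the event order stable between runs, which helps when events are compared or deduplicated downstream.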

First Run (Baseline)

On the first execution:

robots.txt is fetched
A normalized snapshot is stored
No diff or alerts are emitted
unchanged is null

This behavior is intentional. Monitoring begins on the second run onward.
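
The first-run behavior can be sketched as a single branch on whether a stored snapshot exists. This is an illustration under stated assumptions: a plain dict stands in for the key-value store, `run_once` is a hypothetical name, and `baseline` is assumed to be a boolean flag. Whether later runs replace the stored snapshot is not specified above, so the sketch leaves it untouched.

```python
from typing import Any, Dict


def run_once(
    store: Dict[str, Any], site: str, snapshot: Dict[str, Any]
) -> Dict[str, Any]:
    """Store a baseline on the first run; compare against it afterwards."""
    previous = store.get(site)
    if previous is None:
        # First run: persist the snapshot, emit no diff and no alerts.
        store[site] = snapshot
        return {"baseline": True, "unchanged": None, "changes": []}
    # Later runs: report only whether anything differs from the baseline.
    return {"baseline": False, "unchanged": previous == snapshot}
```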

Output Contract

Each run produces:

One snapshot stored in a KV store (per monitored site)
One dataset row summarizing the run
A structured OUTPUT object containing:
- baseline
- unchanged
- summary (critical / warning / info counts)
- changes[]

This makes the Actor safe for:

Scheduling
Webhooks
Alert automation
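
A downstream consumer may want to recompute the summary counts from changes[] as a sanity check. The sketch below assumes each change carries a `severity` field, as in the examples, and that the summary uses the keys `critical`, `warning`, and `info`; those key names are inferred from the description above rather than confirmed.

```python
from collections import Counter
from typing import Dict, List

LEVELS = ("critical", "warning", "info")


def summarize(changes: List[dict]) -> Dict[str, int]:
    """Count changes per severity level, always returning all three keys."""
    counts = Counter(change["severity"] for change in changes)
    return {level: counts.get(level, 0) for level in LEVELS}
```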

Fetch Failure Semantics

httpStatus = 0 indicates a network error or timeout
Fetch timeouts are treated as unreachable
Output is still produced even on failure
Snapshots are still stored for continuity
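
A fetch routine with these semantics might look like the following sketch, which uses only the Python standard library. The `fetch_robots` name, the `reachable` and `body` fields, and the injectable `opener` parameter are assumptions made for illustration. Only `httpStatus = 0` for network errors and timeouts comes from the text above.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Dict


def fetch_robots(
    url: str,
    timeout: float = 10.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    """Fetch robots.txt and always return a result, even on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return {"httpStatus": response.status, "reachable": True, "body": body}
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; keep the real status code.
        return {"httpStatus": exc.code, "reachable": False, "body": None}
    except OSError:
        # URLError and timeouts are OSError subclasses: report status 0.
        return {"httpStatus": 0, "reachable": False, "body": None}
```

Because the function returns a dict on every path, the caller can still store a snapshot and produce output when the fetch fails.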

Deliberately Ignored Changes

The following do NOT trigger rule-level alerts:

Comment-only changes
Whitespace differences
Line reordering
Unknown or unsupported directives

These may still appear as formatting_only info events.
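
One way to detect a formatting-only edit is to normalize both versions and compare the results: if the raw text differs but the normalized forms match, only comments, whitespace, ordering, or unknown directives changed. The sketch below is an assumption about how such normalization could work, not the Actor's actual snapshot format. `normalize`, `is_formatting_only`, and the `GROUP_FIELDS` set are hypothetical.

```python
from typing import Any, Dict

# Directives tracked inside a user-agent group (per "What this Actor monitors").
GROUP_FIELDS = {"allow", "disallow", "crawl-delay", "request-rate"}


def normalize(text: str) -> Dict[str, Any]:
    """Reduce robots.txt to order-independent sets of known directives."""
    groups: Dict[str, set] = {}
    sitemaps = set()
    current, in_rules = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line was seen, so this starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), set())
        elif field == "sitemap":
            sitemaps.add(value)
        elif field in GROUP_FIELDS:
            in_rules = True
            for agent in current:
                groups[agent].add((field, value))
        # Any other directive is unknown and deliberately ignored.
    return {"groups": groups, "sitemaps": sitemaps}


def is_formatting_only(old: str, new: str) -> bool:
    """True when the text changed but the normalized rules did not."""
    return old != new and normalize(old) == normalize(new)
```

Rules are kept per user-agent group rather than sorted globally, so moving a Disallow line from one group to another still counts as a real change.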

Design Philosophy

Stateful, not stateless
Monitoring, not auditing
Low noise over high sensitivity
Safe to run indefinitely
Clear alert meaning

If you wire alerts:

Page on critical
Notify on warning
Log info
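
The page / notify / log split can be wired as a small dispatcher over the changes in a run's output. This is a sketch only: `route_changes` and the three callbacks are hypothetical stand-ins for whatever paging, chat, or logging integrations you use, and the `changes` and `severity` field names are assumed from the output contract and examples.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


def route_changes(
    output: Dict[str, List[dict]],
    page: Handler,
    notify: Handler,
    log: Handler,
) -> None:
    """Send each change to the handler that matches its severity."""
    handlers = {"critical": page, "warning": notify}
    for change in output.get("changes", []):
        # Anything that is not critical or warning is only logged.
        handlers.get(change["severity"], log)(change)
```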

Recommended Usage

Run daily or hourly
Combine with sitemap and URL monitors
Use Apify webhooks for alerting
Treat robots.txt as a policy signal, not a static file
