robots.txt AI Policy Monitor | GPTBot ClaudeBot
Detect GPTBot, ClaudeBot, Google-Extended, and other AI crawler policies in robots.txt, then monitor policy shifts over time.
Pricing
from $11.00 / 1,000 results
Developer
太郎 山田
Store Quickstart
- Start with `store-input.example.json`. It uses `demoMode=true`, so the first Store run is safe, cheap, and easy to understand.
- If the compact output is useful, switch to `store-input.templates.json` and pick one of:
  - **Demo Quickstart** for a trial run
  - **Production Monitor** for recurring dataset snapshots
  - **Webhook Alert** for policy-change notifications
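For orientation, a demo-mode input in the same shape as the Input Example below might look like the following sketch (the shipped `store-input.example.json` is the authoritative version):

```json
{
  "domains": ["openai.com"],
  "delivery": "dataset",
  "demoMode": true
}
```

With `demoMode` enabled, only the first domain is checked and webhook/snapshot writes are disabled, so this run is safe to repeat while exploring the output.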
Key Features
- 🛡️ Compliance-first — Produces audit-ready reports mapping findings to standards (WCAG, GDPR, SOC2)
- 🔒 Non-invasive scanning — Uses only observable public signals — no intrusive probing
- 📊 Severity-scored output — Each finding rated for criticality with remediation guidance
- 📡 Delta-alerting — Flag new findings since last run via webhook delivery
- 📋 Evidence export — Raw headers/responses captured for compliance documentation
Use Cases
| Who | Why |
|---|---|
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| domains | array | prefilled | List of domains to analyze robots.txt for AI crawler policies. Max 500. |
| delivery | string | "dataset" | How to deliver results. 'dataset' saves to Apify Dataset, 'webhook' sends to a URL. In demoMode, delivery is always 'dataset'. |
| webhookUrl | string | — | Webhook URL to send results to (only used when delivery is 'webhook'). Works with Slack, Discord, or any HTTP endpoint. |
| snapshotKey | string | "robotstxt-snapshots" | Key name for storing snapshots (used for change detection between runs). |
| concurrency | integer | 5 | Maximum number of parallel requests. Higher = faster but may trigger rate limits. |
| dryRun | boolean | false | If true, runs without saving results or sending webhooks. Useful for testing. |
| demoMode | boolean | false | If true, checks only 1 domain, returns compact policy fields, and disables webhook/snapshot writes. |
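The constraints in the table above (500-domain cap, delivery modes, webhook requirement) can be enforced client-side before a run. A minimal sketch, using only the field names documented here; the error messages and function name are this sketch's own convention:

```python
def build_input(domains, delivery="dataset", webhook_url=None,
                concurrency=5, demo_mode=False):
    """Assemble a run-input dict, enforcing the documented limits.

    Field names and the 500-domain cap come from the Input table;
    the ValueError messages are this sketch's own convention.
    """
    if len(domains) > 500:
        raise ValueError("domains is capped at 500 entries")
    if delivery not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if delivery == "webhook" and not webhook_url:
        raise ValueError("webhookUrl is required when delivery='webhook'")
    run_input = {
        "domains": list(domains),
        "delivery": delivery,
        "concurrency": concurrency,
        "demoMode": demo_mode,
    }
    if webhook_url:
        run_input["webhookUrl"] = webhook_url
    return run_input
```

The resulting dict can be passed directly as `run_input` to the Python client call shown in the API Usage section.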
Input Example
```json
{
  "domains": ["google.com", "github.com", "nytimes.com", "openai.com"],
  "delivery": "dataset",
  "snapshotKey": "robotstxt-snapshots",
  "concurrency": 5,
  "dryRun": false,
  "demoMode": false
}
```
Output
| Field | Type | Description |
|---|---|---|
| meta | object | Run-level metadata: timestamps, totals, demo limits |
| results | array | One entry per checked domain |
| results[].domain | string | Domain that was checked |
| results[].status | string | Check outcome, e.g. "ok" |
| results[].summary | object | Counts of blocked / partially blocked / allowed crawlers |
| results[].aiPolicies | array | Per-crawler policy details |
| results[].changes | array | Policy changes detected since the last snapshot |
| results[].checkedAt | timestamp | When the domain was checked |
| results[].demoApplied | boolean | Whether demo-mode limits were applied |
| results[].detailsMasked | boolean | Whether policy details were masked (demo mode) |
| results[].error | string \| null | Error message, or null on success |
Output Example
```json
{
  "meta": {
    "generatedAt": "2026-02-22T17:50:20.909Z",
    "totals": {
      "total": 1,
      "requestedDomains": 2,
      "processedDomains": 1,
      "withRobotsTxt": 1,
      "noRobotsTxt": 0,
      "invalidDomains": 0,
      "blockingAi": 0,
      "errors": 0
    },
    "demoApplied": true,
    "limits": {
      "maxDomains": 1,
      "compactPolicies": true,
      "webhookEnabled": false,
      "snapshotWriteEnabled": false
    },
    "upgradeHint": "Demo mode checks 1 domain, disables webhook delivery, and returns a compact policy view. Set demoMode=false to unlock bulk checks and full policy details."
  },
  "results": [
    {
      "domain": "openai.com",
      "status": "ok",
      "summary": {
        "totalCrawlers": 16,
        "blocked": 0,
        "partialBlock": 16,
        "allowed": 0,
        "changed": 0
      },
      "aiPolicies": [
        {
          "crawler": "GPTBot",
          "company": "OpenAI",
          "blocked": false,
          "partialBlock": true,
          "allowed": false
        }
      ]
    }
  ]
}
```
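Dataset items of this shape can be post-processed directly. A small sketch that buckets domains by their AI-crawler posture, relying only on the summary counters shown above; the bucket names are this sketch's own convention:

```python
def classify(results):
    """Bucket each result by its AI-crawler posture.

    Uses only the summary counters from the Output Example;
    the bucket names ("blocking", "partial", "open") are local
    to this sketch, not part of the actor's output.
    """
    buckets = {"blocking": [], "partial": [], "open": []}
    for r in results:
        s = r.get("summary", {})
        if s.get("blocked", 0) > 0:
            buckets["blocking"].append(r["domain"])
        elif s.get("partialBlock", 0) > 0:
            buckets["partial"].append(r["domain"])
        else:
            buckets["open"].append(r["domain"])
    return buckets
```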
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```shell
curl -X POST "https://api.apify.com/v2/acts/taroyamada~robotstxt-ai-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domains": ["google.com", "github.com", "nytimes.com", "openai.com"],
    "delivery": "dataset",
    "snapshotKey": "robotstxt-snapshots",
    "concurrency": 5,
    "dryRun": false,
    "demoMode": false
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/robotstxt-ai-checker").call(run_input={
    "domains": ["google.com", "github.com", "nytimes.com", "openai.com"],
    "delivery": "dataset",
    "snapshotKey": "robotstxt-snapshots",
    "concurrency": 5,
    "dryRun": False,
    "demoMode": False,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/robotstxt-ai-checker').call({
    domains: ['google.com', 'github.com', 'nytimes.com', 'openai.com'],
    delivery: 'dataset',
    snapshotKey: 'robotstxt-snapshots',
    concurrency: 5,
    dryRun: false,
    demoMode: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
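When `delivery` is set to `'webhook'`, your endpoint receives the results as JSON. A minimal parsing helper, assuming the payload mirrors the Output schema above (a top-level `results` array with per-domain `changes` lists); this is a sketch, not the actor's documented webhook contract:

```python
import json

def changed_domains(payload: bytes) -> list:
    """Extract domains with policy changes from a webhook body.

    Assumes the webhook payload mirrors the Output schema
    (a "results" array whose entries carry a "changes" list).
    """
    data = json.loads(payload)
    return [r["domain"] for r in data.get("results", []) if r.get("changes")]
```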
Tips & Limitations
- Schedule weekly runs against your production domains to catch config drift.
- Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
- For CI integration, block releases on `critical`-severity findings using exit codes.
- Combine with `ssl-certificate-monitor` for layered cert + headers coverage.
- Findings include links to official remediation docs — share with dev teams via the webhook payload.
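The CI-gating tip above can be sketched as a small script that returns a non-zero exit code when any monitored domain reports changes or errors. Field names follow the Output table; the gating rule itself is this sketch's own convention:

```python
import sys

def ci_gate(results, fail_on_changes=True):
    """Return a process exit code for a CI pipeline.

    Non-zero if any domain reports an error or (optionally) policy
    changes, so a release job can block on drift. Field names follow
    the Output table; the gating rule is this sketch's convention.
    """
    failing = [
        r["domain"] for r in results
        if r.get("error") or (fail_on_changes and r.get("changes"))
    ]
    if failing:
        print("policy drift or errors on:", ", ".join(failing), file=sys.stderr)
        return 1
    return 0
```

In a pipeline you would call `sys.exit(ci_gate(items))` after fetching the run's dataset items.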
FAQ
Is running this against a third-party site legal?
Passive public-header scanning is generally permitted, but follow your own compliance policies. Only scan sites you have authorization for.
How often should I scan?
Weekly for production domains; daily if you have high config-change velocity.
Can I export to a compliance tool?
Use webhook delivery or Dataset API — formats map well to Drata, Vanta, OneTrust import templates.
Is this a penetration test?
No — this actor performs passive compliance scanning only. No exploitation, fuzzing, or auth bypass.
Does this qualify as a SOC2 control?
This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.
Related Actors
Security & Compliance cluster — explore related Apify tools:
- Privacy & Cookie Compliance Scanner | GDPR / CCPA Banner Audit — Scan public privacy pages and cookie banners for GDPR/CCPA compliance signals.
- Security Headers Checker API | OWASP Audit — Bulk-audit websites for OWASP security headers, grade each response, and monitor header changes between runs.
- SSL Certificate Monitor API | Expiry + Issuer Changes — Check SSL/TLS certificates in bulk, detect expiry and issuer changes, and emit alert-ready rows for ops and SEO teams.
- DNS / SPF / DKIM / DMARC Audit API — Bulk-audit domains for SPF, DKIM, DMARC, MX, and email-auth posture with grades and fix-ready recommendations.
- Data Breach Disclosure Monitor | HIPAA Breach Watch — Monitor the HHS OCR Breach Portal for new HIPAA data breach disclosures.
- WCAG Accessibility Checker API | ADA & EAA Compliance Audit — Audit websites for WCAG 2.
- 📜 Open-Source License & Dependency Audit API — Audit npm packages for license risk, dependency depth, maintainer activity, and compliance posture.
- Trust Center & Subprocessor Monitor API — Monitor vendor trust centers, subprocessor lists, DPA updates, and security posture changes.
Cost
Pay Per Event:
- `actor-start`: $0.01 (flat fee per run)
- `dataset-item`: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
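The pay-per-event pricing above reduces to simple arithmetic; a quick sketch for estimating a run's cost from the two fees listed:

```python
ACTOR_START_FEE = 0.01   # flat fee per run (from the Cost section)
PER_ITEM_FEE = 0.003     # per output dataset item

def run_cost(items: int) -> float:
    """Estimated cost in USD of one run producing `items` dataset items."""
    return ACTOR_START_FEE + items * PER_ITEM_FEE
```

For example, `run_cost(1000)` reproduces the $3.01 figure from the worked example above.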