robots.txt AI Policy Monitor | GPTBot ClaudeBot

Pricing

from $11.00 / 1,000 results

Detect GPTBot, ClaudeBot, Google-Extended, and other AI crawler policies in robots.txt, then monitor policy shifts over time.
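To make the idea of a per-crawler policy concrete, here is a minimal sketch of how a robots.txt file could be sorted into the three states that appear in the output (`blocked`, `partialBlock`, `allowed`). It is not the Actor's actual logic: the `classify` function, the exact-match handling of user-agent names, and the rule that "any Disallow short of a bare `/` is a partial block" are all assumptions made for illustration.

```python
def classify(robots_txt: str, crawler: str) -> str:
    """Return 'blocked', 'partialBlock', or 'allowed' for one crawler (illustrative rules)."""
    groups = {}            # lowercase user-agent -> list of (directive, path)
    current_agents = []    # user-agents of the group currently being read
    reading_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a new User-agent line after rules starts a new group
                current_agents, reading_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))

    # Assumption: exact name match, falling back to the wildcard group.
    rules = groups.get(crawler.lower(), groups.get("*", []))
    disallows = [path for kind, path in rules if kind == "disallow" and path]
    allows = [path for kind, path in rules if kind == "allow" and path]
    if not disallows:
        return "allowed"
    if "/" in disallows and not allows:
        return "blocked"
    return "partialBlock"


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Disallow:
"""

for name in ("GPTBot", "ClaudeBot", "Google-Extended"):
    print(name, classify(SAMPLE, name))
```

In this sample, Google-Extended has no group of its own, so it falls back to the `*` group, whose empty `Disallow:` permits everything.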

Rating: 0.0 (0)

Developer: 太郎 山田

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 19 hours ago

Store Quickstart

  • Start with store-input.example.json. It uses demoMode=true so the first Store run is safe, cheap, and easy to understand.
  • If the compact output is useful, switch to store-input.templates.json and pick one of:
      • Demo Quickstart for a trial run
      • Production Monitor for recurring dataset snapshots
      • Webhook Alert for policy-change notifications

Key Features

  • 🛡️ Compliance-first — Produces audit-ready reports mapping findings to standards (WCAG, GDPR, SOC2)
  • 🔒 Non-invasive scanning — Uses only observable public signals — no intrusive probing
  • 📊 Severity-scored output — Each finding rated for criticality with remediation guidance
  • 📡 Delta-alerting — Flag new findings since last run via webhook delivery
  • 📋 Evidence export — Raw headers/responses captured for compliance documentation

Use Cases

| Who | Why |
| --- | --- |
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| domains | array | prefilled | List of domains to analyze robots.txt for AI crawler policies. Max 500. |
| delivery | string | "dataset" | How to deliver results. 'dataset' saves to Apify Dataset, 'webhook' sends to a URL. In demoMode, delivery is always data |
| webhookUrl | string | | Webhook URL to send results to (only used when delivery is 'webhook'). Works with Slack, Discord, or any HTTP endpoint. |
| snapshotKey | string | "robotstxt-snapshots" | Key name for storing snapshots (used for change detection between runs). |
| concurrency | integer | 5 | Maximum number of parallel requests. Higher = faster but may trigger rate limits. |
| dryRun | boolean | false | If true, runs without saving results or sending webhooks. Useful for testing. |
| demoMode | boolean | false | If true, checks only 1 domain, returns compact policy fields, and disables webhook/snapshot writes. |
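Checking an input object locally before starting a paid run can catch simple mistakes. The sketch below encodes the constraints listed in the table (at most 500 domains, the two delivery modes, an integer concurrency). The `validate_input` helper is hypothetical, and treating `webhookUrl` as required for webhook delivery is an assumption, since the table only says when the field is used.

```python
def validate_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means the input looks fine."""
    errors = []

    domains = run_input.get("domains", [])
    if not isinstance(domains, list):
        errors.append("domains must be an array")
    elif len(domains) > 500:
        errors.append("domains accepts at most 500 entries")

    delivery = run_input.get("delivery", "dataset")
    if delivery not in ("dataset", "webhook"):
        errors.append("delivery must be 'dataset' or 'webhook'")
    # Assumption: a webhook run without a URL is a mistake worth flagging.
    if delivery == "webhook" and not run_input.get("webhookUrl"):
        errors.append("webhookUrl is missing for webhook delivery")

    concurrency = run_input.get("concurrency", 5)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) or concurrency < 1:
        errors.append("concurrency must be a positive integer")

    return errors


print(validate_input({"domains": ["openai.com"], "delivery": "webhook"}))
```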

Input Example

{
  "domains": [
    "google.com",
    "github.com",
    "nytimes.com",
    "openai.com"
  ],
  "delivery": "dataset",
  "snapshotKey": "robotstxt-snapshots",
  "concurrency": 5,
  "dryRun": false,
  "demoMode": false
}

Output

| Field | Type | Description |
| --- | --- | --- |
| meta | object | |
| results | array | |
| results[].domain | string | |
| results[].status | string | |
| results[].summary | object | |
| results[].aiPolicies | array | |
| results[].changes | array | |
| results[].checkedAt | timestamp | |
| results[].demoApplied | boolean | |
| results[].detailsMasked | boolean | |
| results[].error | null | |
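As a rough illustration of consuming this structure, the sketch below groups crawlers by policy state for each domain. It assumes the payload has the `meta`/`results` shape from the table and that each `aiPolicies` entry carries the `crawler`, `blocked`, `partialBlock`, and `allowed` fields shown in the Output Example. The `crawlers_by_state` helper is hypothetical.

```python
def crawlers_by_state(payload: dict) -> dict:
    """Map each domain to the crawler names found in each policy state."""
    report = {}
    for result in payload.get("results", []):
        states = {"blocked": [], "partialBlock": [], "allowed": []}
        # aiPolicies may be missing or null for failed domains, so fall back to an empty list.
        for policy in result.get("aiPolicies") or []:
            for state in states:
                if policy.get(state):
                    states[state].append(policy["crawler"])
        report[result["domain"]] = states
    return report


# Sample payload built from the fields shown in the Output Example.
payload = {
    "results": [
        {
            "domain": "openai.com",
            "status": "ok",
            "aiPolicies": [
                {"crawler": "GPTBot", "company": "OpenAI",
                 "blocked": False, "partialBlock": True, "allowed": False},
            ],
        }
    ]
}

print(crawlers_by_state(payload))
```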

Output Example

{
  "meta": {
    "generatedAt": "2026-02-22T17:50:20.909Z",
    "totals": {
      "total": 1,
      "requestedDomains": 2,
      "processedDomains": 1,
      "withRobotsTxt": 1,
      "noRobotsTxt": 0,
      "invalidDomains": 0,
      "blockingAi": 0,
      "errors": 0
    },
    "demoApplied": true,
    "limits": {
      "maxDomains": 1,
      "compactPolicies": true,
      "webhookEnabled": false,
      "snapshotWriteEnabled": false
    },
    "upgradeHint": "Demo mode checks 1 domain, disables webhook delivery, and returns a compact policy view. Set demoMode=false to unlock bulk checks and full policy details."
  },
  "results": [
    {
      "domain": "openai.com",
      "status": "ok",
      "summary": {
        "totalCrawlers": 16,
        "blocked": 0,
        "partialBlock": 16,
        "allowed": 0,
        "changed": 0
      },
      "aiPolicies": [
        {
          "crawler": "GPTBot",
          "company": "OpenAI",
          "blocked": false,
          "partialBlock": true,
          "allowed": false

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~robotstxt-ai-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "domains": [ "google.com", "github.com", "nytimes.com", "openai.com" ], "delivery": "dataset", "snapshotKey": "robotstxt-snapshots", "concurrency": 5, "dryRun": false, "demoMode": false }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("taroyamada/robotstxt-ai-checker").call(run_input={
    "domains": [
        "google.com",
        "github.com",
        "nytimes.com",
        "openai.com",
    ],
    "delivery": "dataset",
    "snapshotKey": "robotstxt-snapshots",
    "concurrency": 5,
    "dryRun": False,    # Python booleans are capitalized
    "demoMode": False,
})

# Print every item the run stored in its default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/robotstxt-ai-checker').call({
    "domains": [
        "google.com",
        "github.com",
        "nytimes.com",
        "openai.com"
    ],
    "delivery": "dataset",
    "snapshotKey": "robotstxt-snapshots",
    "concurrency": 5,
    "dryRun": false,
    "demoMode": false
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Schedule weekly runs against your production domains to catch config drift.
  • Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
  • For CI integration, block releases on critical severity findings using exit codes.
  • Combine with ssl-certificate-monitor for layered cert + headers coverage.
  • Findings include links to official remediation docs — share with dev teams via the webhook payload.
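The CI tip above can be sketched as a small gate script. The documented output shows no severity field, so this example keys off `summary.changed` instead, which is an assumption about what a pipeline would want to block on. The `gate_exit_code` helper and the choice of exit code 1 are hypothetical.

```python
def gate_exit_code(payload: dict) -> int:
    """Return 1 if any domain reports a changed policy since the last snapshot, else 0."""
    for result in payload.get("results", []):
        summary = result.get("summary") or {}
        if summary.get("changed", 0) > 0:
            return 1  # hypothetical convention: non-zero fails the CI job
    return 0


# Sample payload using the summary fields shown in the Output Example.
sample = {
    "results": [
        {"domain": "openai.com", "status": "ok", "summary": {"changed": 0}},
        {"domain": "github.com", "status": "ok", "summary": {"changed": 2}},
    ]
}

print(gate_exit_code(sample))  # prints 1
```

In a pipeline, the return value would typically be passed to `sys.exit()` after loading the run's output, so that a changed policy fails the job.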

FAQ

Is running this against a third-party site legal?

Passive public-header scanning is generally permitted, but follow your own compliance policies. Only scan sites you have authorization for.

How often should I scan?

Weekly for production domains; daily if you have high config-change velocity.

Can I export to a compliance tool?

Use webhook delivery or Dataset API — formats map well to Drata, Vanta, OneTrust import templates.

Is this a penetration test?

No — this actor performs passive compliance scanning only. No exploitation, fuzzing, or auth bypass.

Does this qualify as a SOC2 control?

This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
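The same arithmetic can be written as a small helper for estimating other run sizes. The two fees come from the list above; the `estimate_run_cost` function name and the rounding to three decimals are illustrative choices.

```python
ACTOR_START_FEE = 0.01    # USD, flat fee per run
DATASET_ITEM_FEE = 0.003  # USD per output item


def estimate_run_cost(items: int) -> float:
    """Estimate the pay-per-event cost in USD of one run that yields `items` items."""
    return round(ACTOR_START_FEE + items * DATASET_ITEM_FEE, 3)


print(estimate_run_cost(1000))  # matches the worked example: 3.01
```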

No subscription required — you only pay for what you use.