Robots.txt Audit Actor (AI Crawler Edition)
Audit robots.txt files for AI crawler management. Get an AI Readiness Score (0-100) for every domain, see which AI systems — ChatGPT, Claude, Perplexity, Gemini, and 60+ others — can access your content, detect syntax errors, flag security concerns, and get actionable recommendations to maximize your AI search visibility.
Why Use This Actor?
- AI Readiness Score — A single 0-100 headline metric that instantly tells you how AI-discoverable your site is, with a letter grade (A-F) and detailed breakdown
- 60+ AI crawlers tracked — The most comprehensive AI crawler database available, covering AI Agents, AI Assistants, AI Search, and AI Training bots
- Instant visibility — Know in seconds if ChatGPT, Claude, Perplexity, Brave, or Gemini can cite your content
- Subdomain scanning — Discover robots.txt blind spots on www, blog, shop, api, docs, and 8 other common subdomains
- Batch auditing — Audit hundreds of domains in a single run with configurable concurrency
- Actionable recommendations — Get prioritized fixes with ready-to-paste robots.txt snippets
- Security scanning — Detect sensitive path disclosures that actually help attackers
- Competitor comparison — See how your AI crawler strategy compares to competitors
Features
| Feature | Description |
|---|---|
| AI Readiness Score | 0-100 score with A-F grade measuring AI discoverability |
| AI Crawler Detection | Analyzes 60+ AI crawlers across AI Agent, AI Assistant, AI Search, AI Data Scraper, and AI Training categories |
| Subdomain Scanning | Checks 13 common subdomains for separate robots.txt files |
| Search Engine Analysis | Checks access for Googlebot, Bingbot, DuckDuckBot, Yandex, Baidu, and more |
| Syntax Validation | Detects typos, malformed lines, non-standard directives, and orphaned rules |
| Security Audit | Flags sensitive paths disclosed in robots.txt (admin panels, .env, .git, etc.) |
| Sitemap Validation | Verifies declared sitemaps and discovers undeclared sitemaps at common locations |
| Strategy Classification | Classifies your AI posture as open, restrictive, mixed, or undefined |
| Competitor Comparison | Side-by-side AI crawler strategy comparison across domains |
| Proxy Support | Optional Apify proxy integration for fetching from restricted networks |
| CSV Export | Flattened output for spreadsheet analysis |
AI Crawlers Tracked (60+)
AI Agents (5)
| Crawler | Company | Importance |
|---|---|---|
| ChatGPT-Agent | OpenAI | High |
| GoogleAgent-Mariner | Google | Medium |
| NovaAct | Amazon | Medium |
| AmazonBuyForMe | Amazon | Low |
| Manus-User | Butterfly Effect | Low |
AI Assistants (10)
| Crawler | Company | Importance |
|---|---|---|
| Gemini-Deep-Research | Google | High |
| Google-NotebookLM | Google | Medium |
| MistralAI-User | Mistral | Medium |
| PhindBot | Phind | Low |
| Amzn-User | Amazon | Low |
| kagi-fetcher | Kagi | Low |
| Ai2Bot-DeepResearchEval | AI2 | Low |
| Devin | Cognition | Low |
| TavilyBot | Tavily | Low |
| LinerBot | Liner | Low |
AI Search (20)
| Crawler | Company | Importance |
|---|---|---|
| GPTBot | OpenAI | Critical |
| ChatGPT-User | OpenAI | Critical |
| OAI-SearchBot | OpenAI | Critical |
| ClaudeBot | Anthropic | High |
| Claude-SearchBot | Anthropic | High |
| Claude-User | Anthropic | High |
| Claude-Web | Anthropic | High |
| PerplexityBot | Perplexity AI | High |
| Perplexity-User | Perplexity AI | High |
| Bravebot | Brave | Medium |
| AzureAI-SearchBot | Microsoft | Medium |
| DuckAssistBot | DuckDuckGo | Medium |
| Amazonbot | Amazon | Medium |
| meta-webindexer | Meta | Medium |
| FacebookBot | Meta | Low |
| Amzn-SearchBot | Amazon | Low |
| YouBot | You.com | Low |
| PetalBot | Huawei | Low |
| Cloudflare-AutoRAG | Cloudflare | Low |
| AddSearchBot / Anomura / atlassian-bot / Channel3Bot / LinkupBot / ZanistaBot | Various | Low |
AI Data Scrapers (9)
| Crawler | Company | Importance |
|---|---|---|
| GoogleOther | Google | Medium |
| imageSpider | ByteDance | Low |
| cohere-training-data-crawler | Cohere | Low |
| ChatGLM-Spider | Zhipu AI | Low |
| PanguBot | Huawei | Low |
| Timpibot | Timpi | Low |
| webzio-extended | Webz.io | Low |
| Kangaroo Bot | Kangaroo LLM | Low |
| VelenPublicWebCrawler | Velen/Hunter | Low |
AI Training (10)
| Crawler | Company | Importance |
|---|---|---|
| Google-Extended | Google | High |
| anthropic-ai | Anthropic | High |
| Google-CloudVertexBot | Google | Medium |
| Applebot-Extended | Apple | Medium |
| CCBot | Common Crawl | Medium |
| Bytespider | ByteDance/TikTok | Low |
| meta-externalagent | Meta | Low |
| Meta-ExternalFetcher | Meta | Low |
| Diffbot | Diffbot | Low |
| Omgilibot / cohere-ai / AI2Bot | Various | Low |
Plus 7 search engine crawlers, 7 SEO tool crawlers, and 8 social media crawlers.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `domains` | string[] | Yes | — | Domains to audit (e.g., `["example.com", "github.com"]`) |
| `checkCompetitors` | string[] | No | `[]` | Competitor domains for side-by-side comparison |
| `includeSitemap` | boolean | No | `true` | Validate sitemap URLs declared in robots.txt |
| `scanSubdomains` | boolean | No | `false` | Scan 13 common subdomains (www, blog, shop, api, dev, staging, app, docs, help, support, cdn, mail, admin) for separate robots.txt files |
| `maxConcurrency` | integer | No | `10` | Max domains to process in parallel (1-50) |
| `outputFormat` | string | No | `"json"` | Output format: `json` or `csv` |
| `proxyConfig` | object | No | Apify residential | Proxy configuration for fetching robots.txt files |
Example Input
{"domains": ["nytimes.com", "github.com", "shopify.com"],"checkCompetitors": ["medium.com"],"includeSitemap": true,"scanSubdomains": false,"maxConcurrency": 10,"outputFormat": "json"}
Output Schema
Each domain produces a comprehensive audit report:
{"summary": {"domain": "nytimes.com","robotsTxtExists": true,"robotsTxtUrl": "https://nytimes.com/robots.txt","httpStatus": 200,"lastModified": "2026-01-15T00:00:00Z","syntaxValid": true,"totalRules": 12,"responseTimeMs": 245,"aiCrawlerStatus": {"allowedAI": 5,"blockedAI": 8,"unspecifiedAI": 48,"defaultPolicy": "allow"},"aiReadinessScore": 72,"aiReadinessGrade": "B"},"aiReadinessScore": {"score": 72,"grade": "B","breakdown": [{ "factor": "Critical AI crawlers", "points": 35, "maxPoints": 35, "detail": "GPTBot: allowed (+12); ChatGPT-User: allowed (+12); OAI-SearchBot: allowed (+12)" },{ "factor": "High-importance AI crawlers", "points": 15, "maxPoints": 20, "detail": "ClaudeBot: allowed (+5); Claude-SearchBot: allowed (+5); PerplexityBot: allowed (+5); Perplexity-User: BLOCKED (+0)" },{ "factor": "Sitemap declared", "points": 10, "maxPoints": 10, "detail": "2 sitemap(s) declared and accessible" },{ "factor": "Syntax quality", "points": 7, "maxPoints": 10, "detail": "1 high-severity syntax issue(s) found" },{ "factor": "Search engines accessible", "points": 10, "maxPoints": 10, "detail": "All critical search engines allowed" },{ "factor": "Wildcard policy", "points": 10, "maxPoints": 10, "detail": "Wildcard allows crawling (or no wildcard rule)" },{ "factor": "Security posture", "points": 5, "maxPoints": 5, "detail": "No high-severity security path disclosures" }]},"aiCrawlers": {"allowed": [{"bot": "GPTBot","company": "OpenAI","purpose": "ChatGPT responses & training","status": "explicitly_allowed","impact": "Your content CAN be accessed by OpenAI."}],"blocked": [{"bot": "CCBot","company": "Common Crawl","status": "explicitly_blocked","impact": "Your content CANNOT be accessed by Common Crawl."}],"unspecified": [{"bot": "Bravebot","company": "Brave","status": "not_mentioned","defaultBehavior": "allowed"}]},"searchEngines": {"Googlebot": { "status": "allowed", "restrictions": [], "crawlDelay": null },"Bingbot": { "status": "partially_restricted", "restrictions": ["/admin"], "crawlDelay": null }},"syntaxIssues": [],"securityConcerns": [],"sitemaps": [{ "url": "https://nytimes.com/sitemap.xml", "accessible": true, "httpStatus": 200, "source": "robots_txt" }],"recommendations": [{"priority": "high","category": "ai_visibility","title": "Unblock Perplexity-User for AI search visibility","description": "Perplexity-User is blocked.","implementation": "User-agent: Perplexity-User\nAllow: /"}],"aiStrategyInsights": {"currentPosture": "mixed","description": "Mixed strategy: Allowing AI search but blocking AI training.","suggestedStrategy": "Good approach. Review periodically as new AI crawlers emerge."},"competitorComparison": [],"subdomainAudits": []}
AI Readiness Score
The AI Readiness Score is a 0-100 metric that measures how discoverable your site is to AI systems. It provides a single, comparable number with a letter grade (A-F) and a detailed breakdown showing exactly where points are gained or lost.
Scoring Breakdown
| Factor | Max Points | How It Works |
|---|---|---|
| Critical AI crawlers allowed | 35 | +12 each for GPTBot, ChatGPT-User, OAI-SearchBot |
| High-importance AI crawlers | 20 | +5 each for ClaudeBot, Claude-SearchBot, PerplexityBot, Perplexity-User |
| Sitemap declared | 10 | 10 if declared in robots.txt, 5 if found at /sitemap.xml, 0 if none |
| No syntax errors | 10 | 10 if clean, -3 per high-severity issue |
| Search engines not blocked | 10 | 10 if Googlebot + Bingbot allowed, -5 per blocked |
| No wildcard Disallow: / | 10 | 10 if wildcard allows, 0 if wildcard blocks root |
| No security path disclosures | 5 | 5 minus 1 per high-severity concern |
Grades
| Grade | Score Range | Meaning |
|---|---|---|
| A | 90-100 | Excellent AI discoverability |
| B | 75-89 | Good, minor improvements possible |
| C | 55-74 | Fair, significant gaps in AI access |
| D | 35-54 | Poor, most AI crawlers blocked or no robots.txt |
| F | 0-34 | Critical, site is largely invisible to AI |
Special Cases
- No robots.txt (404): Score = 50 (D) — all crawlers allowed by default but no intentional control
- Access restricted (403/401): Score = 40 (D) — crawlers default to allow-all but situation is abnormal
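The sketch below shows how the factor points and grade bands above fit together, assuming the score is simply the sum of the factor points and the cutoffs match the Grades table; the actor's internal scoring may differ in its details.

```typescript
// Assumption: the score is the plain sum of factor points (max 100 total).
interface ScoreFactor {
  factor: string;
  points: number;
  maxPoints: number;
}

// Grade bands taken from the Grades table above.
function toGrade(score: number): 'A' | 'B' | 'C' | 'D' | 'F' {
  if (score >= 90) return 'A';
  if (score >= 75) return 'B';
  if (score >= 55) return 'C';
  if (score >= 35) return 'D';
  return 'F';
}

function aiReadiness(breakdown: ScoreFactor[]) {
  const score = breakdown.reduce((sum, f) => sum + f.points, 0);
  return { score, grade: toGrade(score) };
}

// Special cases described above map to fixed scores:
console.log(toGrade(50)); // "D" -- no robots.txt (404)
console.log(toGrade(40)); // "D" -- access restricted (403/401)
```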
Example Use Cases
1. Agency Client Audit
Audit all client domains for AI visibility issues:
{"domains": ["client1.com","client2.com","client3.com"],"includeSitemap": true,"maxConcurrency": 10}
2. Competitive Analysis
Compare your AI crawler strategy against competitors:
{"domains": ["mycompany.com"],"checkCompetitors": ["competitor1.com", "competitor2.com", "competitor3.com"],"includeSitemap": false}
3. Enterprise Bulk Scan
Scan hundreds of domains for a portfolio audit:
{"domains": ["site1.com", "site2.com", "...up to 1000 domains"],"maxConcurrency": 25,"outputFormat": "csv"}
4. Subdomain Audit
Scan a domain and its common subdomains (www, blog, shop, api, docs, etc.):
{"domains": ["example.com"],"scanSubdomains": true,"includeSitemap": true}
5. Quick Single-Domain Check
Fast check for one domain:
{"domains": ["example.com"]}
Output Formats
JSON (Default)
Standard Apify dataset output. Each domain produces one JSON object in the dataset. Use the Overview dataset view for a quick summary table including the AI Readiness Score.
CSV
Flattened audit data saved to the key-value store as `output-csv`. Ideal for importing into Google Sheets or Excel for reporting.
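A minimal sketch of pulling that CSV down with `apify-client`; it assumes the run was started with `outputFormat: "csv"` and that you have the run's default key-value store ID (`run.defaultKeyValueStoreId`).

```typescript
import { writeFileSync } from 'node:fs';
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function downloadCsv(keyValueStoreId: string) {
  // The actor stores the flattened CSV under the key "output-csv".
  const record = await client.keyValueStore(keyValueStoreId).getRecord('output-csv');
  if (!record) throw new Error('No output-csv record found for this run');

  writeFileSync('robots-audit.csv', String(record.value));
}
```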
How It Works
- Fetch — Retrieves `/robots.txt` via HTTPS (falls back to HTTP). Handles 404, 403, 5xx, redirects, and HTML catch-all responses. Retries transient failures.
- Parse — Extracts User-agent groups, Allow/Disallow rules, Crawl-delay, Sitemap declarations, and Host directives.
- Categorize — Maps every user-agent to our database of 80+ known crawlers across AI, search engines, SEO tools, and social media bots.
- Analyze — Determines access status for each of 60+ AI crawlers and 7 search engines. Identifies security concerns from sensitive path disclosures.
- Validate — Checks syntax for typos, malformed lines, orphaned directives, and non-standard extensions.
- Recommend — Generates prioritized, actionable recommendations with copy-paste robots.txt snippets.
- Score — Calculates the AI Readiness Score (0-100) with a detailed factor-by-factor breakdown.
- Subdomain Scan — Optionally checks 13 common subdomains for separate robots.txt files and runs the full audit pipeline on each.
- Compare — If competitors are provided, adds side-by-side AI strategy comparison.
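To make the Parse and Analyze steps concrete, here is a simplified sketch of how a crawler's access status can be resolved from parsed user-agent groups. The types and function are illustrative only, not the actor's internal API; real robots.txt matching also handles longest-match path precedence, which is omitted here.

```typescript
// Illustrative data model for parsed robots.txt rule groups.
interface RuleGroup {
  userAgents: string[]; // e.g. ['GPTBot'] or ['*']
  allow: string[];      // path prefixes
  disallow: string[];   // path prefixes
}

type AccessStatus = 'explicitly_allowed' | 'explicitly_blocked' | 'not_mentioned';

function resolveStatus(bot: string, groups: RuleGroup[]): AccessStatus {
  const target = bot.toLowerCase();
  const specific = groups.find(g =>
    g.userAgents.some(ua => ua.toLowerCase() === target),
  );

  // No bot-specific group: the bot inherits the wildcard (*) or default-allow policy.
  if (!specific) return 'not_mentioned';

  // A blanket "Disallow: /" without a counteracting "Allow: /" blocks the bot;
  // anything more granular is treated as allowed in this simplified sketch.
  const blocksRoot = specific.disallow.includes('/') && !specific.allow.includes('/');
  return blocksRoot ? 'explicitly_blocked' : 'explicitly_allowed';
}
```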
Troubleshooting
Empty results or fetch errors
- DNS failures: The domain may not exist or may be unreachable. Check the `interpretation` field in the output for details.
- Timeouts: Some servers are slow. The actor retries transient failures automatically (2 retries with exponential backoff).
- No robots.txt (404): This is a valid result — it means all crawlers are allowed by default. The actor still generates recommendations and an AI Readiness Score of 50.
All AI crawlers show as "unspecified"
This means the robots.txt has no AI-specific rules. All AI crawlers fall through to the wildcard (*) or default allow policy. The recommendations will suggest adding explicit rules.
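For example, rules like the following (modelled on the snippets the recommendations generate) make the policy explicit instead of inherited; which bots to allow or block depends on your own strategy:

```txt
# Explicitly allow AI search crawlers
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Explicitly block an AI training crawler
User-agent: CCBot
Disallow: /
```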
Running locally
Install dependencies and run:
```bash
cd actors/robots-txt-audit
npm install
echo '{"domains":["example.com"]}' | npx apify-cli run --purge
```
Or run tests:
```bash
npm test
```
Limitations
- robots.txt only — This actor analyzes `/robots.txt` files. It does not crawl pages, check meta robots tags, or verify X-Robots-Tag headers.
- Static analysis — The audit reflects the robots.txt content at fetch time. It does not monitor changes over time (planned for V2).
- No Wayback Machine integration — Historical robots.txt analysis is planned for V2.
- Standard compliance — The parser follows the Google robots.txt specification. Non-standard extensions (Crawl-delay, Host) are detected but flagged as non-standard.
- Response size limit — robots.txt files larger than 1 MB are skipped to prevent memory issues (this is extremely rare).
FAQ
Q: How long does a typical run take?
A: About 5-10 seconds for 10 domains, and about 30-60 seconds for 100 domains at concurrency 10. Runs are fast because the actor only fetches small text files. Subdomain scanning increases run time proportionally.
Q: Do I need proxies?
A: Usually not — robots.txt files are public and lightweight. Proxies are available if you're scanning from a datacenter IP that gets rate-limited.
Q: What if a domain has no robots.txt?
A: The actor reports `robotsTxtExists: false` with the interpretation "all crawlers are allowed by default", generates a recommendation to create one, and assigns an AI Readiness Score of 50 (Grade D).
Q: How often is the AI crawler database updated?
A: We track 60+ AI crawlers as of February 2026, sourced from Dark Visitors and our own research. The database is updated with each actor release as new AI crawlers emerge.
Q: What is the AI Readiness Score?
A: It's a 0-100 metric that measures how AI-discoverable your site is. A score of 90+ (Grade A) means your site is fully optimized for AI search engines. Below 35 (Grade F) means most AI systems can't access your content.
Q: Can I use this to generate a robots.txt file?
A: The recommendations include ready-to-paste robots.txt snippets. A full robots.txt generator is planned for V2.
Changelog
v1.1.0 (February 2026)
- AI Readiness Score: New 0-100 headline metric with A-F grades and detailed 7-factor breakdown
- 60+ AI crawlers: Expanded from 26 to 61 AI crawlers, now covering AI Agents (ChatGPT-Agent, NovaAct, Manus), AI Assistants (Gemini-Deep-Research, Devin, MistralAI-User), AI Data Scrapers (GoogleOther, ChatGLM-Spider), and AI Search (Bravebot, AzureAI-SearchBot, meta-webindexer)
- Subdomain scanning: Optionally scan 13 common subdomains for separate robots.txt files
- Expanded non-AI crawlers: Added rogerbot, Screaming Frog, SiteAuditBot (SEO) and Discordbot, WhatsApp, TelegramBot, Pinterestbot (social)
- 267 unit tests
v1.0.1 (February 2026)
- Added 7 AI crawlers (ClaudeBot, Claude-SearchBot, Claude-User, Google-CloudVertexBot, DuckAssistBot, Perplexity-User, Meta-ExternalFetcher)
- 403/401 HTTP status handling with specific recommendations
- Sitemap auto-discovery at /sitemap.xml when not declared in robots.txt
- Crawl-delay detection and wildcard crawl-delay warnings
- Output deduplication (removed knownBots redundancy, cleaned up wildcard-inherited search engine restrictions)
- 218 unit tests
v1.0.0 (February 2026)
- Initial public release
- 19 AI crawlers tracked
- 7 search engine crawlers, 4 SEO tool crawlers, 3 social media crawlers
- Syntax validation with typo detection
- Security concern identification
- Prioritized recommendations with implementation snippets
- Competitor comparison
- Sitemap accessibility validation
- CSV export
- Proxy support via Apify proxy configuration
- Retry logic with exponential backoff
- 184 unit tests
Support
- Issues: Report bugs via GitHub issues or the Apify community forum
- Feature requests: Contact us through Apify or open a GitHub issue
- Enterprise: For bulk scanning (10K+ domains/month), reach out for custom pricing
Built by A Page Ventures | Apify Store