Public AI Crawler Policy Signal Agent

Pricing: Pay per event

Analyze public robots.txt and llms.txt files for AI crawler allow/block policy evidence, LLM guidance files, stable hashes, and useful-result pricing.

Developer: jack su

Maintained by Community

Analyze public site-level policy files for AI crawler and LLM-agent guidance: explicit AI crawler rules in robots.txt, public llms.txt and llms-full.txt files, stable policy hashes, diagnostics, and change-aware billing that charges only for useful results.

What It Reads

  • One public site origin, or one direct robots.txt, llms.txt, or llms-full.txt URL, per siteUrls entry.
  • robots.txt user-agent blocks for known AI crawler tokens such as GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, CCBot, and similar public bot names.
  • Public llms.txt files and, when includeLlmsFullTxt is true, llms-full.txt files, read for headings, links, topics, preview text, and evidence URLs.
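The sketch below shows one way explicit AI crawler rules could be pulled out of robots.txt. It uses Python's standard-library urllib.robotparser as a stand-in, because the actor's own parser is not published. AI_BOT_TOKENS, explicit_agents, and ai_crawler_signals are hypothetical names.

A token is reported only when it appears in its own user-agent line. This keeps explicit AI rules separate from the wildcard (*) policy. The sketch checks only the site root path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of AI crawler tokens; the actor's real list is broader.
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Google-Extended", "Applebot-Extended", "CCBot",
]


def explicit_agents(robots_txt: str) -> set:
    """Collect every user-agent token named in robots.txt, lowercased."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def ai_crawler_signals(robots_txt: str) -> dict:
    """Map each explicitly named AI bot to 'allow' or 'block' for the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = explicit_agents(robots_txt)
    signals = {}
    for token in AI_BOT_TOKENS:
        if token.lower() not in named:
            continue  # wildcard-only coverage is not an explicit AI rule
        signals[token] = "allow" if parser.can_fetch(token, "/") else "block"
    return signals


robots = """User-agent: GPTBot
Disallow: /

User-agent: CCBot
Allow: /

User-agent: *
Disallow: /private
"""
print(ai_crawler_signals(robots))  # {'GPTBot': 'block', 'CCBot': 'allow'}
```

The standard-library parser matches user-agent names by substring, so results for tokens that overlap other names deserve a manual look.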

What It Does Not Do

  • It does not crawl pages, parse sitemap URLs, audit SEO metadata, run a browser, execute JavaScript, log in, fetch private pages, or inspect account areas.
  • It rejects input URLs that contain private-network hosts, query strings, fragments, credentials, path parameters, sensitive account paths, or token-like path segments.
  • It does not determine whether crawling or reuse is legally permitted. It only returns public policy evidence that a human or downstream agent can review.
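The input rejections above can be sketched as a single validation function. This is a minimal sketch under stated assumptions: SENSITIVE_SEGMENTS, TOKEN_LIKE, and validate_input_url are hypothetical, and the actor's real deny-lists are not published.

It checks only literal IP addresses and localhost names, with no DNS lookup. A production check would probably also resolve hostnames before fetching.

```python
import ipaddress
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical deny-lists; the actor's real rules are not published.
SENSITIVE_SEGMENTS = {"account", "login", "admin", "settings", "billing"}
TOKEN_LIKE = re.compile(r"^[A-Za-z0-9_-]{32,}$")  # long opaque path segments


def validate_input_url(url: str) -> Optional[str]:
    """Return a rejection reason, or None if the URL looks acceptable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "not an http(s) URL"
    if parts.username or parts.password:
        return "credentials in URL"
    if parts.query:
        return "query string"
    if parts.fragment:
        return "fragment"
    if ";" in parts.path:  # urlsplit leaves ;params inside the path
        return "path parameters"
    host = parts.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return "private-network host"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a regular hostname, not an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return "private-network host"
    for segment in parts.path.split("/"):
        if segment.lower() in SENSITIVE_SEGMENTS:
            return "sensitive account path"
        if TOKEN_LIKE.match(segment):
            return "token-like path segment"
    return None


print(validate_input_url("https://openai.com/"))             # None
print(validate_input_url("https://example.com/llms.txt?x=1"))  # query string
```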

Pricing Events

  • apify-actor-start: a single low-cost event charged when a run starts, if configured in Apify.
  • useful-ai-crawler-policy-result: charged only for useful AI crawler policy evidence that is new or has changed.

Generic robots.txt files that have no AI-specific user-agent blocks and no llms.txt guidance are written as partial records and do not trigger the useful-ai-crawler-policy-result event. Unchanged hashes, invalid inputs, failed fetches, and missing policy evidence are also not charged.

The apify-default-dataset-item event is intentionally not used, so records written to the dataset are not charged per item.
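The charging rules above combine two checks: whether a record carries useful AI-specific evidence, and whether its hash differs from the previously stored one. A minimal sketch of that decision follows, with several assumptions.

- The hash is SHA-256 over canonical JSON.
- The status values "new", "changed", and "unchanged" are assumed.
- Previous hashes are assumed to be stored somewhere between runs.

The field names come from the output list, but policy_hash and billing_decision are hypothetical.

```python
import hashlib
import json
from typing import Optional, Tuple


def policy_hash(evidence: dict) -> str:
    """Stable SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def billing_decision(evidence: dict, previous_hash: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (changeStatus, billableEventName); event is None when nothing is charged."""
    current = policy_hash(evidence)
    if previous_hash is None:
        status = "new"
    elif previous_hash == current:
        status = "unchanged"
    else:
        status = "changed"
    # Generic robots.txt with no AI-specific rules and no llms.txt guidance is a partial record.
    useful = bool(evidence.get("aiCrawlerPolicySignals")) or bool(evidence.get("llmsTxtSignals"))
    if useful and status != "unchanged":
        return status, "useful-ai-crawler-policy-result"
    return status, None


evidence = {"aiCrawlerPolicySignals": {"GPTBot": "block"}, "llmsTxtSignals": {}}
print(billing_decision(evidence, None))  # ('new', 'useful-ai-crawler-policy-result')
```

Sorting keys before hashing keeps the hash stable when the same evidence is assembled in a different order. This is what lets unchanged results go uncharged.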

Example Input

{
  "siteUrls": ["https://openai.com/"],
  "includeLlmsFullTxt": true,
  "requestTimeoutSecs": 15
}

Output Highlights

  • policyType
  • aiCrawlerPolicySignals
  • knownAiProviders
  • knownAiBotUserAgents
  • wildcardRobotsPolicy
  • llmsTxtSignals
  • riskLabels
  • diagnostics
  • aiCrawlerPolicyHash
  • changeStatus
  • billableEventName
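The exact shape of llmsTxtSignals is not documented here. As a rough illustration, the sketch below extracts the headings, links, and preview text mentioned under What It Reads from a Markdown-style llms.txt file.

The regexes, the preview length, and the names llms_txt_signals, HEADING, and LINK are hypothetical. Topic detection and evidence URLs are left out.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")                  # Markdown ATX headings
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")        # [label](http(s) URL)


def llms_txt_signals(text: str, preview_chars: int = 200) -> dict:
    """Collect headings, absolute links, and a whitespace-collapsed preview."""
    headings, links = [], []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append({"level": len(match.group(1)), "text": match.group(2)})
        for label, url in LINK.findall(line):
            links.append({"label": label, "url": url})
    preview = " ".join(text.split())[:preview_chars]
    return {"headings": headings, "links": links, "preview": preview}


doc = "# Example\n\n> Short summary.\n\n## Docs\n- [API](https://example.com/api): reference\n"
print(llms_txt_signals(doc)["headings"])
# [{'level': 1, 'text': 'Example'}, {'level': 2, 'text': 'Docs'}]
```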