Site QA Indexability AI Crawler Report Scraper

Pricing

from $30.00 / 1,000 AI crawler policy checks

Unofficially audit user-supplied public pages, robots.txt, and llms.txt signals for AI crawler indexability issues and source-linked report rows.

Rating

0.0 (0)

Developer

naoki anzai

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 21 days ago

Site owners, SEO agencies, and content teams use this actor to audit public indexability and AI crawler access signals. Provide public URLs and optional AI crawler user-agent names. The actor returns source-linked policy observations, indexability issues, reports, and export rows.

Store Quickstart

Run with dryRun=false and public URLs that you own or are allowed to audit.

{
  "urls": ["https://example.com/?siteQaCanary=indexability-ai-crawler-v1"],
  "aiCrawlerUserAgents": ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"],
  "checkRobotsTxt": true,
  "checkLlmsTxt": true,
  "authorizedUseConfirmed": true,
  "generateReport": true,
  "emitUnchanged": false,
  "dryRun": false
}
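
The same input can also be submitted from a script. The following is a minimal sketch, assuming the `apify-client` Python package is installed and an API token is available in the `APIFY_TOKEN` environment variable. `ACTOR_ID` is a hypothetical placeholder, since this listing does not show the actor's full store ID, and the helper names `build_run_input` and `run_audit` are illustrative only.

```python
import os

# Hypothetical placeholder: replace with the actor's real "username/actor-name" ID.
ACTOR_ID = "<username>/site-qa-indexability-ai-crawler-report-scraper"


def build_run_input(urls, user_agents):
    """Build the Quickstart input for the given public URLs and crawler names."""
    return {
        "urls": list(urls),
        "aiCrawlerUserAgents": list(user_agents),
        "checkRobotsTxt": True,
        "checkLlmsTxt": True,
        "authorizedUseConfirmed": True,
        "generateReport": True,
        "emitUnchanged": False,
        "dryRun": False,
    }


def run_audit(run_input, token=None):
    """Start the actor, wait for it to finish, and return its dataset rows."""
    # Imported lazily so the input builder works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_audit(build_run_input(["https://example.com/"], ["GPTBot", "ClaudeBot"]))` would start a billable, non-dry run, so it should only be pointed at sites you own or are allowed to audit.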

Input Examples

Audit one page and origin policies

{
  "urls": ["https://example.com/blog/launch"],
  "aiCrawlerUserAgents": ["GPTBot", "ClaudeBot"],
  "checkRobotsTxt": true,
  "checkLlmsTxt": true,
  "authorizedUseConfirmed": true,
  "dryRun": false
}

Batch audit a site section

{
  "urls": [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/docs"
  ],
  "maxPages": 25,
  "emitPageRows": false,
  "generateReport": true,
  "authorizedUseConfirmed": true,
  "dryRun": false
}

Generate a handoff export

{
  "urls": ["https://example.com/landing-page"],
  "aiCrawlerUserAgents": ["GPTBot", "Google-Extended", "PerplexityBot"],
  "emitExport": true,
  "emitUnchanged": false,
  "authorizedUseConfirmed": true,
  "dryRun": false
}

Sample Output

{
  "actorName": "site-qa-indexability-ai-crawler-report-scraper",
  "rowType": "indexability_issue",
  "billingEventName": "indexability-issue-detected",
  "issueType": "ai_crawler_disallowed_by_robots",
  "severity": "high",
  "sourceUrl": "https://example.com/?siteQaCanary=indexability-ai-crawler-v1"
}

Output Fields

  • rowType: ai_crawler_policy_observation, indexability_issue, ai_crawler_indexability_report, or indexability_export.
  • billingEventName: PAY_PER_EVENT event name used for the row.
  • sourceUrl: public URL or policy file that supports the row.
  • issueType: detected source-linked issue when applicable.
  • blockedUserAgents: crawler names with broad robots.txt blocks when detected.
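
To make the blockedUserAgents field more concrete, the sketch below shows one way a broad robots.txt block can be detected with Python's standard `urllib.robotparser`. It is not the actor's own detection logic, which this listing does not describe. The robots.txt content and the helper name `blocked_user_agents` are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_user_agents(robots_txt, user_agents, url):
    """Return the crawler names that robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in user_agents if not parser.can_fetch(ua, url)]


blocked = blocked_user_agents(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "ClaudeBot"],
    "https://example.com/blog/launch",
)
print(blocked)  # ['GPTBot']
```

Here GPTBot matches its own `Disallow: /` group, while ClaudeBot falls through to the wildcard group and stays allowed.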

Pricing And No-Change Runs

  • ai-crawler-policy-checked: $0.030 per public robots.txt or llms.txt policy observation.
  • indexability-issue-detected: $0.120 per source-linked indexability issue.
  • ai-crawler-indexability-report: $6.000 per site-level report.
  • indexability-export-generated: $8.000 per generated export.

When emitUnchanged=false, repeated unchanged runs emit zero dataset rows and zero charges after state is saved.
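
The per-event prices above can be combined into a rough estimate before a run. The following is a minimal sketch that multiplies each listed price by an expected event count. The counts in the example are hypothetical, and the actual number of events depends on what the actor finds.

```python
from decimal import Decimal

# Per-event prices in USD, copied from the pricing list above.
EVENT_PRICES = {
    "ai-crawler-policy-checked": Decimal("0.030"),
    "indexability-issue-detected": Decimal("0.120"),
    "ai-crawler-indexability-report": Decimal("6.000"),
    "indexability-export-generated": Decimal("8.000"),
}


def estimate_cost(event_counts):
    """Sum price * count over the events a run is expected to emit."""
    unknown = set(event_counts) - set(EVENT_PRICES)
    if unknown:
        raise ValueError(f"Unknown event names: {sorted(unknown)}")
    return sum(
        (EVENT_PRICES[name] * count for name, count in event_counts.items()),
        Decimal("0"),
    )


# Hypothetical run: 6 policy observations, 3 issues, 1 report, no export.
total = estimate_cost({
    "ai-crawler-policy-checked": 6,
    "indexability-issue-detected": 3,
    "ai-crawler-indexability-report": 1,
})
print(f"${total:.2f}")  # $6.54
```

An unchanged repeat run with emitUnchanged=false would correspond to an empty count dictionary, for which the estimate is zero.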

Compliance Guardrails

  • Public pages, robots.txt, and llms.txt only.
  • No login, paywall, CAPTCHA, private session, credentialed API, or bypass behavior.
  • Non-dry runs require authorizedUseConfirmed=true; use this only for sites you own, manage, or are allowed to audit.
  • This is an unofficial audit tool and is not affiliated with any crawler, search engine, or AI provider.
  • No ranking guarantee, AI citation guarantee, legal conclusion, or compliance certification.
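
Some of these guardrails can be checked on the caller's side before a run is submitted. The sketch below is a hypothetical pre-flight check, not part of the actor. It assumes that a missing dryRun means a non-dry run, which is the stricter reading because the listing does not state the default.

```python
from urllib.parse import urlparse


def validate_run_input(run_input):
    """Return a list of problems that conflict with the guardrails above."""
    problems = []
    urls = run_input.get("urls") or []
    if not urls:
        problems.append("urls must contain at least one public URL")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"not a public http(s) URL: {url}")
        elif parsed.username or parsed.password:
            # Credentialed access is out of scope for this actor.
            problems.append(f"URL must not embed credentials: {url}")
    # Assumption: treat a missing dryRun as a non-dry run.
    is_dry_run = bool(run_input.get("dryRun", False))
    if not is_dry_run and run_input.get("authorizedUseConfirmed") is not True:
        problems.append("non-dry runs require authorizedUseConfirmed=true")
    return problems
```

An empty list means the input passed these local checks. It does not establish that you are actually authorized to audit the site.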
