# Site QA Indexability AI Crawler Report Scraper (`taroyamada/site-qa-indexability-ai-crawler-report-scraper`) Actor

Unofficially audit user-supplied public pages, robots.txt, and llms.txt signals for AI crawler indexability issues and source-linked report rows.

- **URL**: https://apify.com/taroyamada/site-qa-indexability-ai-crawler-report-scraper.md
- **Developed by:** [naoki anzai](https://apify.com/taroyamada) (community)
- **Categories:** Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $30.00 / 1,000 AI crawler policy checks

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Site QA Indexability AI Crawler Report Scraper

Site owners, SEO agencies, and content teams use this Actor to audit public indexability and AI crawler access signals.
Provide public URLs and, optionally, AI crawler user-agent names.
The Actor returns source-linked policy observations, indexability issues, reports, and export rows.

### Store Quickstart

Run with `dryRun=false` and public URLs that you own or are allowed to audit.

```json
{
  "urls": ["https://example.com/?siteQaCanary=indexability-ai-crawler-v1"],
  "aiCrawlerUserAgents": ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"],
  "checkRobotsTxt": true,
  "checkLlmsTxt": true,
  "authorizedUseConfirmed": true,
  "generateReport": true,
  "emitUnchanged": false,
  "dryRun": false
}
````

### Input Examples

#### Audit one page and origin policies

```json
{
  "urls": ["https://example.com/blog/launch"],
  "aiCrawlerUserAgents": ["GPTBot", "ClaudeBot"],
  "checkRobotsTxt": true,
  "checkLlmsTxt": true,
  "authorizedUseConfirmed": true,
  "dryRun": false
}
```

#### Batch audit a site section

```json
{
  "urls": [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/docs"
  ],
  "maxPages": 25,
  "emitPageRows": false,
  "generateReport": true,
  "authorizedUseConfirmed": true,
  "dryRun": false
}
```

#### Generate a handoff export

```json
{
  "urls": ["https://example.com/landing-page"],
  "aiCrawlerUserAgents": ["GPTBot", "Google-Extended", "PerplexityBot"],
  "emitExport": true,
  "emitUnchanged": false,
  "authorizedUseConfirmed": true,
  "dryRun": false
}
```

### Sample Output

```json
{
  "actorName": "site-qa-indexability-ai-crawler-report-scraper",
  "rowType": "indexability_issue",
  "billingEventName": "indexability-issue-detected",
  "issueType": "ai_crawler_disallowed_by_robots",
  "severity": "high",
  "sourceUrl": "https://example.com/?siteQaCanary=indexability-ai-crawler-v1"
}
```

### Output Fields

- `rowType`: `ai_crawler_policy_observation`, `indexability_issue`, `ai_crawler_indexability_report`, or `indexability_export`.
- `billingEventName`: `PAY_PER_EVENT` event name used for the row.
- `sourceUrl`: public URL or policy file that supports the row.
- `issueType`: detected source-linked issue when applicable.
- `blockedUserAgents`: crawler names with broad robots.txt blocks when detected.
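
Once rows are downloaded, the fields above are enough to triage a run. The following is a minimal sketch, not part of the Actor: `summarize_rows` is a hypothetical helper, the first row is the documented sample, and the second row is an assumed shape for a policy observation.

```python
from collections import Counter


def summarize_rows(rows: list[dict]) -> tuple[dict[str, int], list[str]]:
    """Count rows per rowType and list source URLs of high-severity issues."""
    counts = Counter(row.get("rowType", "unknown") for row in rows)
    high_severity_sources = [
        row.get("sourceUrl", "")
        for row in rows
        if row.get("rowType") == "indexability_issue" and row.get("severity") == "high"
    ]
    return dict(counts), high_severity_sources


rows = [
    {  # the sample row from this README
        "rowType": "indexability_issue",
        "billingEventName": "indexability-issue-detected",
        "issueType": "ai_crawler_disallowed_by_robots",
        "severity": "high",
        "sourceUrl": "https://example.com/?siteQaCanary=indexability-ai-crawler-v1",
    },
    {  # hypothetical observation row; its exact fields are an assumption
        "rowType": "ai_crawler_policy_observation",
        "billingEventName": "ai-crawler-policy-checked",
        "sourceUrl": "https://example.com/robots.txt",
    },
]

counts, high = summarize_rows(rows)
print(counts)
print(high)
```

Severity values other than `high` are not documented here, so the filter only matches that one value.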

### Pricing And No-Change Runs

- `ai-crawler-policy-checked`: $0.030 per public robots.txt or llms.txt policy observation.
- `indexability-issue-detected`: $0.120 per source-linked indexability issue.
- `ai-crawler-indexability-report`: $6.000 per site-level report.
- `indexability-export-generated`: $8.000 per generated export.

When `emitUnchanged=false`, repeated unchanged runs emit zero dataset rows and zero charges after state is saved.
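
For budgeting, the per-event prices listed above can be multiplied by expected event counts. This is a rough sketch: `estimate_cost` is a hypothetical helper, the event counts are made up for illustration, and actual charges may differ from the list prices.

```python
from decimal import Decimal

# Per-event list prices in USD, copied from the list above.
EVENT_PRICES_USD = {
    "ai-crawler-policy-checked": Decimal("0.030"),
    "indexability-issue-detected": Decimal("0.120"),
    "ai-crawler-indexability-report": Decimal("6.000"),
    "indexability-export-generated": Decimal("8.000"),
}


def estimate_cost(event_counts: dict[str, int]) -> Decimal:
    """Sum price * count over the given events; unknown event names raise KeyError."""
    return sum(
        (EVENT_PRICES_USD[name] * count for name, count in event_counts.items()),
        Decimal("0"),
    )


# Illustrative run: 2 policy observations, 3 issues, 1 report.
example_counts = {
    "ai-crawler-policy-checked": 2,
    "indexability-issue-detected": 3,
    "ai-crawler-indexability-report": 1,
}
print(estimate_cost(example_counts))  # 0.060 + 0.360 + 6.000 = 6.420
```

An unchanged repeat run with `emitUnchanged=false` corresponds to an empty `event_counts`, which yields zero.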

### Compliance Guardrails

- Public pages, robots.txt, and llms.txt only.
- No login, paywall, CAPTCHA, private session, credentialed API, or bypass behavior.
- Non-dry runs require `authorizedUseConfirmed=true`; use this only for sites you own, manage, or are allowed to audit.
- This is an unofficial audit tool and is not affiliated with any crawler, search engine, or AI provider.
- No ranking guarantee, AI citation guarantee, legal conclusion, or compliance certification.
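
A client-side preflight can catch guardrail violations before a run is started. The sketch below is an assumption about how a caller might do this, not the Actor's own validation; `preflight_errors` is a hypothetical helper, and `dryRun` is treated as `true` when omitted, matching the schema default.

```python
from urllib.parse import urlsplit


def preflight_errors(run_input: dict) -> list[str]:
    """Return human-readable problems found in the input; an empty list means OK."""
    errors = []
    dry_run = run_input.get("dryRun", True)
    if not dry_run and run_input.get("authorizedUseConfirmed") is not True:
        errors.append("authorizedUseConfirmed must be true for non-dry runs")
    for url in run_input.get("urls", []):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            errors.append(f"not a public http(s) URL: {url}")
        elif parts.username or parts.password:
            # Credentialed access is outside the documented guardrails.
            errors.append(f"URL carries credentials: {url}")
    return errors


print(preflight_errors({"urls": ["https://example.com/"], "dryRun": False}))
```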

### Bundle Paths

- Site QA / AI Crawler Readiness: run a low-cost policy check first, then generate `ai-crawler-indexability-report` or `indexability-export-generated` for client delivery.
- Pair with [Site QA Content Report Scraper](https://apify.com/taroyamada/site-qa-content-report-scraper) for page content issue reports.
- Pair with [Site QA Broken Link Report Scraper](https://apify.com/taroyamada/site-qa-broken-link-report-scraper) for link health reports.

### See Also

- [Site QA Content Report Scraper](https://apify.com/taroyamada/site-qa-content-report-scraper) for content QA issue reports.
- [Site QA Broken Link Report Scraper](https://apify.com/taroyamada/site-qa-broken-link-report-scraper) for broken link reports.
- [SaaS Pricing Changelog Battlecard Watch Scraper](https://apify.com/taroyamada/saas-pricing-changelog-battlecard-watch-scraper) for SaaS competitive page monitoring.

# Actor input Schema

## `urls` (type: `array`):

Public URLs that you are allowed to audit.

## `aiCrawlerUserAgents` (type: `array`):

Crawler user-agent names to inspect in robots.txt.
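
To illustrate what inspecting user-agent names in robots.txt means, here is a local sketch using Python's standard `urllib.robotparser`. It is not the Actor's implementation; `blocked_user_agents` and the sample robots.txt text are hypothetical, and the standard-library parser may interpret edge cases differently from real crawlers.

```python
from urllib.robotparser import RobotFileParser


def blocked_user_agents(robots_txt: str, user_agents: list[str], url: str) -> list[str]:
    """Return the user agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [ua for ua in user_agents if not parser.can_fetch(ua, url)]


SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_user_agents(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"],
    "https://example.com/blog/launch",
)
print(blocked)  # ['GPTBot']
```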

## `maxPages` (type: `integer`):

Maximum number of input pages to check.

## `checkRobotsTxt` (type: `boolean`):

Fetch and inspect robots.txt at each site origin.

## `checkLlmsTxt` (type: `boolean`):

Fetch and inspect llms.txt at each site origin.
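
Both policy files are looked up per site origin rather than per page. A minimal sketch of that mapping follows; `policy_urls` is a hypothetical helper, and placing both files at the origin root (`/robots.txt`, `/llms.txt`) is an assumption based on the descriptions above.

```python
from urllib.parse import urlsplit


def policy_urls(page_urls: list[str], check_robots_txt: bool = True,
                check_llms_txt: bool = True) -> list[str]:
    """Map page URLs to the de-duplicated policy file URLs of their origins."""
    origins = []
    for url in page_urls:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in origins:  # keep first-seen order, drop repeats
            origins.append(origin)
    file_names = []
    if check_robots_txt:
        file_names.append("robots.txt")
    if check_llms_txt:
        file_names.append("llms.txt")
    return [f"{origin}/{name}" for origin in origins for name in file_names]


print(policy_urls([
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/docs",
]))
```

Three pages on one origin therefore map to at most two policy files, which is why a batch audit of one site section needs few policy checks.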

## `authorizedUseConfirmed` (type: `boolean`):

Required for non-dry runs. Confirms each URL is owned by you, your client, or otherwise authorized for this audit.

## `emitPageRows` (type: `boolean`):

Emit optional public page indexability snapshot rows.

## `generateReport` (type: `boolean`):

Generate a site-level AI crawler indexability report row.

## `emitExport` (type: `boolean`):

Generate an export row for handoff workflows.

## `emitUnchanged` (type: `boolean`):

When false, repeated unchanged runs emit zero rows and zero charges.

## `dryRun` (type: `boolean`):

Emit local sample rows without charging.

## `initialRunMode` (type: `string`):

Deployment canary control used by automation: `emit_backfill` allows first-run proof rows; `baseline_only` is used for no-change proof.

## `snapshotKey` (type: `string`):

Optional state namespace for canary and recurring no-change proof runs.

## Actor input object example

```json
{
  "urls": [
    "https://example.com/?siteQaCanary=indexability-ai-crawler-v1"
  ],
  "aiCrawlerUserAgents": [
    "GPTBot",
    "Google-Extended",
    "PerplexityBot",
    "ClaudeBot"
  ],
  "maxPages": 10,
  "checkRobotsTxt": true,
  "checkLlmsTxt": true,
  "authorizedUseConfirmed": false,
  "emitPageRows": false,
  "generateReport": true,
  "emitExport": false,
  "emitUnchanged": false,
  "dryRun": true,
  "initialRunMode": "emit_backfill",
  "snapshotKey": ""
}
```
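
When building inputs in code, it can help to start from the example object above and override only what changes. The sketch below is one possible approach: `build_input` is a hypothetical helper, the defaults mirror the example object, and the `maxPages` range (1 to 100) and `initialRunMode` values come from the OpenAPI schema in this document.

```python
import copy

# Defaults copied from the input object example above.
INPUT_DEFAULTS = {
    "urls": ["https://example.com/?siteQaCanary=indexability-ai-crawler-v1"],
    "aiCrawlerUserAgents": ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"],
    "maxPages": 10,
    "checkRobotsTxt": True,
    "checkLlmsTxt": True,
    "authorizedUseConfirmed": False,
    "emitPageRows": False,
    "generateReport": True,
    "emitExport": False,
    "emitUnchanged": False,
    "dryRun": True,
    "initialRunMode": "emit_backfill",
    "snapshotKey": "",
}


def build_input(overrides: dict) -> dict:
    """Merge overrides onto the defaults and check the documented constraints."""
    unknown = set(overrides) - set(INPUT_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    merged = {**copy.deepcopy(INPUT_DEFAULTS), **overrides}
    if type(merged["maxPages"]) is not int or not 1 <= merged["maxPages"] <= 100:
        raise ValueError("maxPages must be an integer from 1 to 100")
    if merged["initialRunMode"] not in ("emit_backfill", "baseline_only"):
        raise ValueError("initialRunMode must be emit_backfill or baseline_only")
    return merged


run_input = build_input({"urls": ["https://example.com/docs"], "maxPages": 25})
print(run_input["maxPages"], run_input["dryRun"])  # 25 True
```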

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

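// Prepare Actor input (a dry run using the documented example values, so nothing is charged)
const input = {
    urls: ['https://example.com/?siteQaCanary=indexability-ai-crawler-v1'],
    aiCrawlerUserAgents: ['GPTBot', 'Google-Extended', 'PerplexityBot', 'ClaudeBot'],
    checkRobotsTxt: true,
    checkLlmsTxt: true,
    dryRun: true,
};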

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/site-qa-indexability-ai-crawler-report-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

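# Prepare the Actor input (a dry run using the documented example values, so nothing is charged)
run_input = {
    "urls": ["https://example.com/?siteQaCanary=indexability-ai-crawler-v1"],
    "aiCrawlerUserAgents": ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot"],
    "checkRobotsTxt": True,
    "checkLlmsTxt": True,
    "dryRun": True,
}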

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/site-qa-indexability-ai-crawler-report-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"urls": ["https://example.com/?siteQaCanary=indexability-ai-crawler-v1"], "dryRun": true}' |
apify call taroyamada/site-qa-indexability-ai-crawler-report-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/site-qa-indexability-ai-crawler-report-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Site QA Indexability AI Crawler Report Scraper",
        "description": "Unofficially audit user-supplied public pages, robots.txt, and llms.txt signals for AI crawler indexability issues and source-linked report rows.",
        "version": "0.1",
        "x-build-id": "fQyeImsuzgFKAOp8c"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~site-qa-indexability-ai-crawler-report-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-site-qa-indexability-ai-crawler-report-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~site-qa-indexability-ai-crawler-report-scraper/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-site-qa-indexability-ai-crawler-report-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~site-qa-indexability-ai-crawler-report-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-site-qa-indexability-ai-crawler-report-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Public page URLs",
                        "type": "array",
                        "description": "Public URLs that you are allowed to audit.",
                        "default": [
                            "https://example.com/?siteQaCanary=indexability-ai-crawler-v1"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "aiCrawlerUserAgents": {
                        "title": "AI crawler user agents",
                        "type": "array",
                        "description": "Crawler user-agent names to inspect in robots.txt.",
                        "default": [
                            "GPTBot",
                            "Google-Extended",
                            "PerplexityBot",
                            "ClaudeBot"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of input pages to check.",
                        "default": 10
                    },
                    "checkRobotsTxt": {
                        "title": "Check robots.txt",
                        "type": "boolean",
                        "description": "Fetch and inspect robots.txt at each site origin.",
                        "default": true
                    },
                    "checkLlmsTxt": {
                        "title": "Check llms.txt",
                        "type": "boolean",
                        "description": "Fetch and inspect llms.txt at each site origin.",
                        "default": true
                    },
                    "authorizedUseConfirmed": {
                        "title": "Authorized use confirmed",
                        "type": "boolean",
                        "description": "Required for non-dry runs. Confirms each URL is owned by you, your client, or otherwise authorized for this audit.",
                        "default": false
                    },
                    "emitPageRows": {
                        "title": "Emit page snapshot rows",
                        "type": "boolean",
                        "description": "Emit optional public page indexability snapshot rows.",
                        "default": false
                    },
                    "generateReport": {
                        "title": "Generate report row",
                        "type": "boolean",
                        "description": "Generate a site-level AI crawler indexability report row.",
                        "default": true
                    },
                    "emitExport": {
                        "title": "Generate export row",
                        "type": "boolean",
                        "description": "Generate an export row for handoff workflows.",
                        "default": false
                    },
                    "emitUnchanged": {
                        "title": "Emit unchanged rows",
                        "type": "boolean",
                        "description": "When false, repeated unchanged runs emit zero rows and zero charges.",
                        "default": false
                    },
                    "dryRun": {
                        "title": "Dry run",
                        "type": "boolean",
                        "description": "Emit local sample rows without charging.",
                        "default": true
                    },
                    "initialRunMode": {
                        "title": "Initial run mode",
                        "enum": [
                            "emit_backfill",
                            "baseline_only"
                        ],
                        "type": "string",
                        "description": "Deployment canary control used by automation; emit_backfill allows first-run proof rows, baseline_only is used for no-change proof.",
                        "default": "emit_backfill"
                    },
                    "snapshotKey": {
                        "title": "Snapshot key",
                        "type": "string",
                        "description": "Optional state namespace for canary and recurring no-change proof runs.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
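
For projects without a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly. The following is a sketch using only the Python standard library; `build_request` and `run_and_fetch_items` are hypothetical helpers, and treating the response body as JSON is an assumption, since the specification only declares a `200` response with no schema.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "taroyamada~site-qa-indexability-ai-crawler-report-scraper"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request; the token goes in the query string as the spec requires."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_and_fetch_items(token: str, run_input: dict, timeout: float = 300):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_and_fetch_items("<YOUR_API_TOKEN>", {"urls": ["https://example.com/"], "dryRun": True})` would start a synchronous run, so the timeout should be at least as long as the run is expected to take.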
