# Website RAG Readiness Audit Report (`taroyamada/website-rag-readiness-audit`) Actor

Turn public website URLs into a decision-ready RAG readiness audit with coverage, chunking risk, retrieval cleanup actions, source URLs, and no user API key requirement.

- **URL**: https://apify.com/taroyamada/website-rag-readiness-audit.md
- **Developed by:** [太郎 山田](https://apify.com/taroyamada) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Website RAG Readiness Audit Report

### Buyable first run

Use this Actor when AI builders, documentation teams, support teams, and technical marketers need to decide whether public website pages are clean and complete enough for RAG ingestion. It is positioned as a report, not a raw scraper.

- Entry report: $9 per `website_rag_snapshot_report` event. Checks public pages for volume, structure, noise, and basic RAG risk.
- Premium report: $29 per `website_rag_readiness_report` event. Adds chunking risk, retrieval QA actions, coverage gaps, and cleanup priorities.
- Only the entry and premium tiers are priced publicly. Higher-tier and watch events are held back until real paid proof exists.
- Safety cap: `maxChargeUsd` is the hard budget limit.
- Why it is worth paying for: it avoids embedding public website content that is too thin, noisy, or poorly structured for retrieval.

Recommended first paid run:

```json
{
  "demoMode": false,
  "dryRun": false,
  "reportTier": "snapshot",
  "maxChargeUsd": 9,
  "maxReports": 1,
  "maxPages": 2,
  "urls": [
    "https://docs.apify.com/platform/actors"
  ],
  "seedQuestions": [
    "Can this documentation answer onboarding and troubleshooting questions?",
    "What content cleanup is needed before embedding?"
  ]
}
````

This Actor does not promise rankings, revenue, conversion lifts, or sales outcomes. It returns source-backed summaries, warnings, and prioritized actions.
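
Before committing to the paid run, the same input can be previewed at no charge, since `dryRun` is listed as a no-charge mode. The sketch below only prepares the input; `as_dry_run` is a hypothetical helper name, and nothing here calls the Apify API.

```python
import copy
import json

# Recommended first paid input, copied from the JSON example above.
PAID_INPUT = {
    "demoMode": False,
    "dryRun": False,
    "reportTier": "snapshot",
    "maxChargeUsd": 9,
    "maxReports": 1,
    "maxPages": 2,
    "urls": ["https://docs.apify.com/platform/actors"],
    "seedQuestions": [
        "Can this documentation answer onboarding and troubleshooting questions?",
        "What content cleanup is needed before embedding?",
    ],
}


def as_dry_run(run_input: dict) -> dict:
    """Return a copy of run_input with dryRun enabled; the original is untouched."""
    preview = copy.deepcopy(run_input)
    preview["dryRun"] = True
    return preview


if __name__ == "__main__":
    # Print the no-charge variant so it can be pasted into the Actor input editor.
    print(json.dumps(as_dry_run(PAID_INPUT), indent=2))
```

Keeping the paid input unchanged and deriving the preview from it means the later paid run uses exactly the settings that were previewed.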

### What It Does

Website RAG Readiness Audit Report fetches public pages you provide, extracts visible text signals, and returns a decision-ready report for whether the pages are suitable for retrieval-augmented generation workflows.

It focuses on:

- content volume and thin-page risk
- navigation boilerplate and chunking risk
- source URL coverage and blocked pages
- missing answer coverage for your seed questions
- prioritized cleanup actions before embedding

### Pricing Events

- `website_rag_snapshot_report` - $9
- `website_rag_readiness_report` - $29

Use the listed report tiers for public runs. Recurring watch workflows should be created as Apify tasks from a successful paid input.

`demoMode`, `dryRun`, invalid URLs, blocked/private pages, no-content pages, source failures, and cap-limited groups are no-charge.
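
The input-level rules above can be checked locally before a run. The following is a rough sketch, not the Actor's own charging logic: `expected_charge` is a hypothetical helper, it assumes a single report per run, and it cannot predict fetch-time outcomes such as blocked or empty pages.

```python
from __future__ import annotations

# Event prices as listed under "Pricing Events".
TIER_PRICES_USD = {"snapshot": 9, "readiness": 29}


def expected_charge(run_input: dict) -> tuple[float, str]:
    """Estimate the charge for one report and give the reason.

    Assumption: one report per run, and the cap check is "price > maxChargeUsd".
    """
    if run_input.get("demoMode"):
        return 0, "demoMode is no-charge"
    if run_input.get("dryRun"):
        return 0, "dryRun is no-charge"

    tier = run_input.get("reportTier", "snapshot")  # schema default
    if tier not in TIER_PRICES_USD:
        raise ValueError(f"unknown reportTier: {tier!r}")

    price = TIER_PRICES_USD[tier]
    cap = run_input.get("maxChargeUsd", 9)  # schema default
    if price > cap:
        return 0, "limit_reached: tier price exceeds maxChargeUsd"
    return price, f"{tier} report fits within maxChargeUsd"


if __name__ == "__main__":
    print(expected_charge({"reportTier": "readiness", "maxChargeUsd": 9}))
    print(expected_charge({"reportTier": "readiness", "maxChargeUsd": 29}))
```

A readiness run with the default cap of 9 would, under these assumptions, come back as a no-charge `limit_reached` summary, so the cap needs raising to 29 for the premium tier.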

### Source Rules

Allowed: public website URLs, public docs, help pages, blogs, product pages, and pricing pages. Sitemap support is planned for a future version.

Blocked: login-only pages, private dashboards, paywalls, checkout/account portals, CAPTCHA/rate-limit bypass, personal data extraction, and unsupported business outcome claims.
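
A simple pre-filter can catch URLs that are likely to be skipped before they are submitted. This is only a heuristic sketch: `looks_public` and the `GATED_SEGMENTS` list are hypothetical and do not reflect how the Actor itself decides what to skip.

```python
from urllib.parse import urlparse

# Hypothetical path segments that often indicate gated pages; not the Actor's own list.
GATED_SEGMENTS = {"login", "signin", "checkout", "account", "dashboard"}


def looks_public(url: str) -> bool:
    """Return True for http(s) URLs whose path has no obviously gated segment."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    segments = {part.lower() for part in parsed.path.split("/") if part}
    return segments.isdisjoint(GATED_SEGMENTS)


if __name__ == "__main__":
    candidates = [
        "https://docs.apify.com/platform/actors",
        "https://example.com/account/billing",
    ]
    print([url for url in candidates if looks_public(url)])
```

The check is deliberately conservative and path-based only; a page can still turn out to be blocked or private when fetched.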

### Output

Each dataset row includes `status`, `chargedEvent`, `chargedUsd`, `reason`, `decisionSummary`, `score`, `prioritizedActions`, `sourceUrls`, `warnings`, and `errors`.
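
Once rows are downloaded, they can be condensed into a short summary. The sketch below uses the field names listed above, but the sample values (including the `status` strings) are invented, and it assumes `prioritizedActions` is a list of strings. `summarize_rows` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter


def summarize_rows(rows: list[dict]) -> dict:
    """Count statuses, total the charges, and collect actions across rows."""
    statuses = Counter(row.get("status", "unknown") for row in rows)
    total_charged = sum(row.get("chargedUsd") or 0 for row in rows)
    actions = [
        action
        for row in rows
        for action in (row.get("prioritizedActions") or [])
    ]
    return {
        "statuses": dict(statuses),
        "totalChargedUsd": total_charged,
        "actions": actions,
    }


# Hypothetical rows: field names come from the list above, values are made up.
sample_rows = [
    {
        "status": "ok",
        "chargedEvent": "website_rag_snapshot_report",
        "chargedUsd": 9,
        "prioritizedActions": ["Trim navigation boilerplate"],
        "sourceUrls": ["https://docs.apify.com/platform/actors"],
    },
    {
        "status": "skipped",
        "chargedEvent": None,
        "chargedUsd": 0,
        "reason": "blocked page",
        "prioritizedActions": [],
    },
]

if __name__ == "__main__":
    print(summarize_rows(sample_rows))
```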

# Actor input Schema

## `urls` (type: `array`):

Public website pages to audit. Use docs, pricing, help, blog, policy, or product pages. Login, paywall, and private dashboard URLs are skipped at no charge.

## `domain` (type: `string`):

Optional public domain. If provided without URLs, the Actor audits the homepage only in v1.

## `reportTier` (type: `string`):

Choose the public launch tier. Watch summaries are proof-gated and not selectable publicly.

## `seedQuestions` (type: `array`):

Optional buyer questions the RAG corpus should answer. Used for action recommendations only.

## `maxPages` (type: `integer`):

Maximum public pages to fetch in this run. `maxChargeUsd` is still checked before charging.

## `maxReports` (type: `integer`):

Maximum report groups to charge. Usually 1 for this Actor.

## `maxChargeUsd` (type: `number`):

Hard safety cap. If the selected report would exceed this cap, the Actor returns a no-charge `limit_reached` summary.

## `demoMode` (type: `boolean`):

Return a no-charge sample preview without fetching external pages.

## `dryRun` (type: `boolean`):

Return a no-charge `previewReport` and `nextRunInput`.

## `sourceDatasetId` (type: `string`):

Advanced only. Optional future hook for prepared URL rows; not needed for first run.

## Actor input object example

```json
{
  "urls": [],
  "domain": "",
  "reportTier": "snapshot",
  "seedQuestions": [],
  "maxPages": 3,
  "maxReports": 1,
  "maxChargeUsd": 9,
  "demoMode": false,
  "dryRun": false,
  "sourceDatasetId": ""
}
```
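
Inputs can be sanity-checked locally before a run. The sketch below mirrors the enum and the numeric bounds published in the OpenAPI specification (`maxPages` 1 to 25, `maxReports` 1 to 5, `maxChargeUsd` 0 to 100). `validate_input` is a hypothetical helper, and the rule that a non-demo run needs `urls` or `domain` is an assumption rather than something the schema enforces.

```python
from __future__ import annotations

ALLOWED_TIERS = {"snapshot", "readiness"}

# field -> (minimum, maximum, accepted Python types), taken from the OpenAPI schema.
BOUNDS = {
    "maxPages": (1, 25, (int,)),
    "maxReports": (1, 5, (int,)),
    "maxChargeUsd": (0, 100, (int, float)),
}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    tier = run_input.get("reportTier", "snapshot")
    if tier not in ALLOWED_TIERS:
        problems.append(f"reportTier must be one of {sorted(ALLOWED_TIERS)}")

    for field, (low, high, types) in BOUNDS.items():
        value = run_input.get(field)
        if value is None:
            continue  # omitted fields fall back to the schema defaults
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"{field} has the wrong type")
        elif not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}")

    # Assumption: outside demoMode, the Actor needs something to fetch.
    if not run_input.get("demoMode"):
        if not run_input.get("urls") and not run_input.get("domain"):
            problems.append("provide at least one entry in urls, or a domain")

    return problems


if __name__ == "__main__":
    print(validate_input({"urls": [], "domain": "", "maxPages": 30}))
```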

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/website-rag-readiness-audit").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/website-rag-readiness-audit").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call taroyamada/website-rag-readiness-audit --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/website-rag-readiness-audit",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Website RAG Readiness Audit Report",
        "description": "Turn public website URLs into a decision-ready RAG readiness audit with coverage, chunking risk, retrieval cleanup actions, source URLs, and no user API key requirement.",
        "version": "0.1",
        "x-build-id": "FISP3HvBdCUGjJLh9"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~website-rag-readiness-audit/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-website-rag-readiness-audit",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~website-rag-readiness-audit/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-website-rag-readiness-audit",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~website-rag-readiness-audit/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-website-rag-readiness-audit",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "Public URLs",
                        "type": "array",
                        "description": "Public website pages to audit. Use docs, pricing, help, blog, policy, or product pages. Login/paywall/private dashboard URLs are skipped as no-charge.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "domain": {
                        "title": "Domain",
                        "type": "string",
                        "description": "Optional public domain. If provided without URLs, the Actor audits the homepage only in v1.",
                        "default": ""
                    },
                    "reportTier": {
                        "title": "Report tier",
                        "enum": [
                            "snapshot",
                            "readiness"
                        ],
                        "type": "string",
                        "description": "Choose the public launch tier. Watch summaries are proof-gated and not selectable publicly.",
                        "default": "snapshot"
                    },
                    "seedQuestions": {
                        "title": "Seed questions",
                        "type": "array",
                        "description": "Optional buyer questions the RAG corpus should answer. Used for action recommendations only.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxPages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "maximum": 25,
                        "type": "integer",
                        "description": "Maximum public pages to fetch in this run. maxChargeUsd is still checked before charging.",
                        "default": 3
                    },
                    "maxReports": {
                        "title": "Max reports",
                        "minimum": 1,
                        "maximum": 5,
                        "type": "integer",
                        "description": "Maximum report groups to charge. Usually 1 for this Actor.",
                        "default": 1
                    },
                    "maxChargeUsd": {
                        "title": "Max charge USD",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "number",
                        "description": "Hard safety cap. If the selected report would exceed this cap, the Actor returns a no-charge limit_reached summary.",
                        "default": 9
                    },
                    "demoMode": {
                        "title": "Demo mode",
                        "type": "boolean",
                        "description": "Return a no-charge sample preview without fetching external pages.",
                        "default": false
                    },
                    "dryRun": {
                        "title": "Dry run",
                        "type": "boolean",
                        "description": "Return a no-charge previewReport and nextRunInput.",
                        "default": false
                    },
                    "sourceDatasetId": {
                        "title": "Advanced source dataset ID",
                        "type": "string",
                        "description": "Advanced only. Optional future hook for prepared URL rows; not needed for first run.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
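
For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with the Python standard library and does not send it; `build_sync_request` is a hypothetical helper, and the commented-out send step assumes the endpoint responds with a JSON array of dataset items.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Server URL and path as given in the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
SYNC_PATH = "/acts/taroyamada~website-rag-readiness-audit/run-sync-get-dataset-items"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request with the token as a query parameter and JSON input as body."""
    url = f"{API_BASE}{SYNC_PATH}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_sync_request("<YOUR_API_TOKEN>", {"demoMode": True})
    print(request.get_method(), request.full_url)
    # Sending is left out on purpose. With a real token it could look like:
    # with urllib.request.urlopen(request) as response:
    #     items = json.load(response)
```

Using `demoMode` for a first request keeps the call no-charge while the integration is being wired up.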
