# Privacy & Cookie Compliance Scanner | GDPR / CCPA Banner Audit (`taroyamada/privacy-cookie-compliance-scanner`) Actor

Scan public privacy pages and cookie banners for GDPR/CCPA compliance signals. Returns one clean compliance summary row per site with banner detection, consent framework identification, policy freshness, and recommended actions.

- **URL**: https://apify.com/taroyamada/privacy-cookie-compliance-scanner.md
- **Developed by:** [太郎 山田](https://apify.com/taroyamada) (community)
- **Categories:** Business, Automation, Developer tools
- **Stats:** 1 total user, 0 monthly users, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is billed per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Privacy & Cookie Compliance Scanner

Scan public privacy pages and cookie banners for GDPR/CCPA compliance signals. Returns **one clean compliance summary row per site** with cookie banner detection, consent framework identification, privacy policy freshness, snapshot-based drift detection, and recommended actions.

### What it does

For each site you provide, the Actor:

1. **Fetches the homepage** — detects cookie banners and consent management platform (CMP) signatures (OneTrust, Cookiebot, TrustArc, Osano, Didomi, iubenda, IAB TCF, and more)
2. **Fetches the privacy policy page** — confirms reachability and extracts the "Last Updated" date
3. **Fetches the cookie policy page** — confirms reachability (auto-discovered from homepage links if not supplied)
4. **Compares against the previous run snapshot** — flags changes in banner presence, policy URL, policy date, and consent signal set
5. **Produces one compliance summary row per site** with `complianceStatus`, `cookieBannerDetected`, `consentSignals`, `policyUpdatedAt`, `changedSinceLastRun`, `recommendedActions`, and raw `evidence`
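
The snapshot comparison in step 4 can be pictured as a field-by-field diff between the previous run and the current one. The sketch below is illustrative only: the function name `diff_snapshots` and the snapshot keys are assumptions chosen to mirror the four signals listed above, not the Actor's actual internals.

```python
from typing import Any, Optional

# Hypothetical snapshot keys mirroring the four compared signals.
TRACKED_FIELDS = (
    "cookieBannerDetected",
    "privacyPolicyUrl",
    "policyUpdatedAt",
    "consentSignals",
)


def diff_snapshots(previous: Optional[dict], current: dict) -> Optional[list]:
    """Return the tracked fields that changed, or None when no previous snapshot exists."""
    if previous is None:
        return None  # first run: nothing to compare against
    changed = []
    for field in TRACKED_FIELDS:
        old: Any = previous.get(field)
        new: Any = current.get(field)
        if field == "consentSignals":
            # Compare as sets so a different ordering is not reported as drift.
            old, new = set(old or []), set(new or [])
        if old != new:
            changed.append(field)
    return changed


before = {
    "cookieBannerDetected": True,
    "privacyPolicyUrl": "https://example.com/privacy",
    "policyUpdatedAt": "2025-01-08",
    "consentSignals": ["OneTrust", "IAB TCF"],
}
after = {**before, "cookieBannerDetected": False, "consentSignals": ["IAB TCF"]}
print(diff_snapshots(before, after))
```

Returning `None` for the first run matches the documented behaviour of `changedSinceLastRun`, which is `null` until a second run with the same `snapshotKey` exists.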

### Use cases

- **Recurring compliance monitoring** — schedule daily or weekly runs to catch banner removals, CMP changes, or policy date regressions before they become audit findings
- **Agency portfolio audits** — scan all client sites in one run, export results as a dataset for dashboards or handoff reports
- **Pre-launch checklists** — verify cookie banner, privacy policy, and cookie policy are in place before go-live
- **Post-deploy regression watch** — webhook delivery for immediate alerts when compliance signals change after a release

### Inputs

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `sites` | array | ✅ | — | List of sites to scan. Each entry needs `homepageUrl`; `privacyPolicyUrl` and `cookiePolicyUrl` are auto-discovered if omitted. |
| `sites[].homepageUrl` | string | ✅ | — | Full URL of the homepage to scan for the cookie banner. |
| `sites[].privacyPolicyUrl` | string | ❌ | auto | Direct URL of the privacy policy page. Auto-discovered from footer links if not supplied. |
| `sites[].cookiePolicyUrl` | string | ❌ | auto | Direct URL of the cookie policy page. Auto-discovered from footer links if not supplied. |
| `sites[].region` | string | ❌ | — | Expected region (`EU`, `US`, etc.) — used to apply stricter banner checks for GDPR jurisdictions. |
| `sites[].consentMode` | string | ❌ | — | Optional expected consent mode tag (e.g. `CCPA`) for evidence labelling. |
| `delivery` | string | ❌ | `dataset` | `dataset` writes results to the Apify dataset; `webhook` POSTs the payload to `webhookUrl`. |
| `webhookUrl` | string | ❌ | — | Required when `delivery` is `webhook`. |
| `snapshotKey` | string | ❌ | `privacy-cookie-compliance-snapshots` | Key for persisting run-to-run snapshots. Keep stable for recurring drift detection. |
| `concurrency` | integer | ❌ | `2` | Parallel site scans (1–10). |
| `batchDelayMs` | integer | ❌ | `500` | Pause between batches in milliseconds (0–5000). |
| `requestTimeoutSecs` | integer | ❌ | `20` | Per-request timeout (5–60s). |
| `followRedirects` | boolean | ❌ | `true` | Follow HTTP redirects before scanning. |
| `dryRun` | boolean | ❌ | `false` | Preview results without saving snapshots, dataset rows, or sending webhooks. |
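
Before starting a run, it can help to check an input against the constraints in the table above. The following is a minimal client-side sketch, assuming a hypothetical helper named `build_input`; it covers only a subset of the fields, and the Actor may apply its own validation independently.

```python
from urllib.parse import urlparse


def build_input(sites, delivery="dataset", webhook_url=None,
                concurrency=2, request_timeout_secs=20):
    """Pre-check values against the documented constraints and return an input dict."""
    if not sites or any(not site.get("homepageUrl") for site in sites):
        raise ValueError("each site needs a homepageUrl")
    if delivery not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if delivery == "webhook" and urlparse(webhook_url or "").scheme not in ("http", "https"):
        raise ValueError("webhookUrl must be an http(s) URL when delivery is 'webhook'")
    if not 1 <= concurrency <= 10:
        raise ValueError("concurrency must be between 1 and 10")
    if not 5 <= request_timeout_secs <= 60:
        raise ValueError("requestTimeoutSecs must be between 5 and 60")
    run_input = {
        "sites": sites,
        "delivery": delivery,
        "concurrency": concurrency,
        "requestTimeoutSecs": request_timeout_secs,
    }
    if delivery == "webhook":
        run_input["webhookUrl"] = webhook_url
    return run_input


example = build_input([{"homepageUrl": "https://vercel.com", "region": "EU"}])
print(example["delivery"])
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API section.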

### Output

One dataset row per site. Key fields:

| Field | Type | Description |
|-------|------|-------------|
| `siteUrl` | string | Homepage URL from input. |
| `complianceStatus` | string | `compliant` \| `partial` \| `non_compliant` \| `unknown` |
| `executiveSummary` | string | Plain-text one-line summary for dashboards and reports. |
| `cookieBannerDetected` | boolean | Whether a cookie consent banner or CMP was found on the homepage. |
| `privacyPolicyDetected` | boolean | Whether a reachable privacy policy page was found. |
| `cookiePolicyDetected` | boolean | Whether a reachable cookie policy page was found. |
| `policyUpdatedAt` | string \| null | "Last Updated" date extracted from the privacy or cookie policy page. |
| `consentSignals` | string[] | Named CMPs detected (e.g. `["OneTrust", "IAB TCF"]`). |
| `recommendedActions` | string[] | Prioritised, human-readable action items. |
| `changedSinceLastRun` | boolean \| null | `true` if any compliance signal changed; `null` on first run. |
| `evidence` | object | Raw evidence: status codes, discovered URLs, banner match details, region, consent mode. |
| `checkedAt` | string | ISO 8601 timestamp of when the scan ran. |
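
To show how these fields might be used downstream, here is a small triage sketch that flags rows needing review. The helper name `needs_attention`, the triage rule, and the sample rows are all hypothetical; only the field names and status values come from the table above.

```python
def needs_attention(row: dict) -> bool:
    """Flag rows that are not fully compliant or that changed since the previous run."""
    not_compliant = row.get("complianceStatus") in ("partial", "non_compliant")
    drifted = row.get("changedSinceLastRun") is True  # None (first run) is not drift
    return not_compliant or drifted


# Made-up rows for illustration; real rows come from the run's dataset.
rows = [
    {"siteUrl": "https://example.com", "complianceStatus": "compliant", "changedSinceLastRun": False},
    {"siteUrl": "https://example.org", "complianceStatus": "partial", "changedSinceLastRun": None},
    {"siteUrl": "https://example.net", "complianceStatus": "compliant", "changedSinceLastRun": True},
]

flagged = [row["siteUrl"] for row in rows if needs_attention(row)]
print(flagged)
```

Whether `unknown` rows should also be flagged depends on your process; this sketch leaves them out.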

### Quickstart

1. Open the Actor and click **Try for free**
2. Under **Sites to scan**, keep the default or paste your own site's `homepageUrl`
3. Click **Start** — you'll have a compliance summary row in seconds
4. Add a schedule to repeat the scan daily or weekly for drift detection

### Detected consent frameworks

OneTrust · Cookiebot · TrustArc · Osano · iubenda · CookieYes · Quantcast Choice · Didomi · Usercentrics · Sourcepoint · CIVIC Cookie Control · Cookiefirst · Termly · IAB TCF · CCPA USP API · Google Consent Mode · Generic banner keywords

### Notes

- The Actor uses lightweight HTTP fetching (no browser or JavaScript rendering). Dynamic SPAs that inject the cookie banner only after JavaScript runs may report `cookieBannerDetected: false` even when a banner is visible in a browser. For JS-rendered banners, supplement with a Playwright-based Actor.
- `changedSinceLastRun` requires at least two runs with the same `snapshotKey`.
- No external dependencies beyond Node.js 18+ built-in `fetch`.
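
The first note is easier to see with a toy example of signature matching on raw HTML. This is a sketch under stated assumptions: `CMP_MARKERS` and `detect_consent_signals` are hypothetical names, and the marker strings are a small illustrative set, not the Actor's real signature list.

```python
# Illustrative markers only; the Actor's actual signatures are not documented here.
CMP_MARKERS = {
    "OneTrust": ("cdn.cookielaw.org", "onetrust"),
    "Cookiebot": ("consent.cookiebot.com",),
    "IAB TCF": ("__tcfapi",),
    "CCPA USP API": ("__uspapi",),
}


def detect_consent_signals(html: str) -> list:
    """Return the names of CMPs whose markers appear in the fetched HTML source."""
    lowered = html.lower()
    return [
        name
        for name, markers in CMP_MARKERS.items()
        if any(marker in lowered for marker in markers)
    ]


# Server-rendered page: the CMP script tag is present in the raw HTML.
static_html = '<script src="https://cdn.cookielaw.org/scripttemplates/otSDKStub.js"></script>'
# SPA shell: any banner would be injected later by /app.js, so nothing is visible yet.
spa_html = '<div id="root"></div><script src="/app.js"></script>'

print(detect_consent_signals(static_html))
print(detect_consent_signals(spa_html))
```

The SPA shell yields no signals because the banner markup only exists after a browser executes the script, which a plain HTTP fetch never does.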

# Actor input Schema

## `sites` (type: `array`):

List of sites to scan. Each entry requires a homepageUrl; privacyPolicyUrl and cookiePolicyUrl are auto-discovered if omitted. Quickstart: begin with one site. Maximum 100 per run.
## `delivery` (type: `string`):

Starter path: dataset keeps the first run low-friction. Advanced path: webhook sends the same payload to your endpoint for compliance ops or agency handoff.
## `webhookUrl` (type: `string`):

Advanced delivery only: required when delivery is webhook. Must be a valid http(s) URL.
## `snapshotKey` (type: `string`):

Keep this stable when moving from the quickstart to recurring compliance monitoring so policy drift stays comparable run to run.
## `concurrency` (type: `integer`):

Parallel site checks. Keep at 1-2 for quickstart runs; increase for larger compliance portfolios.
## `batchDelayMs` (type: `integer`):

Pause between batches to keep scans polite and avoid rate limiting.
## `requestTimeoutSecs` (type: `integer`):

Per-request timeout for fetching homepage, privacy policy, and cookie policy pages.
## `followRedirects` (type: `boolean`):

Follow HTTP redirects before scanning pages so canonical URLs are evaluated correctly.
## `dryRun` (type: `boolean`):

Preview scan results without saving snapshots, dataset rows, or sending webhooks.

## Actor input object example

```json
{
  "sites": [
    {
      "homepageUrl": "https://vercel.com",
      "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
      "cookiePolicyUrl": "",
      "region": "EU",
      "consentMode": ""
    }
  ],
  "delivery": "dataset",
  "snapshotKey": "privacy-cookie-compliance-snapshots",
  "concurrency": 2,
  "batchDelayMs": 500,
  "requestTimeoutSecs": 20,
  "followRedirects": true,
  "dryRun": false
}
````

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "sites": [
        {
            "homepageUrl": "https://vercel.com",
            "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
            "cookiePolicyUrl": "",
            "region": "EU",
            "consentMode": ""
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/privacy-cookie-compliance-scanner").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "sites": [
        {
            "homepageUrl": "https://vercel.com",
            "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
            "cookiePolicyUrl": "",
            "region": "EU",
            "consentMode": "",
        }
    ]
}

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/privacy-cookie-compliance-scanner").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "sites": [
    {
      "homepageUrl": "https://vercel.com",
      "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
      "cookiePolicyUrl": "",
      "region": "EU",
      "consentMode": ""
    }
  ]
}' |
apify call taroyamada/privacy-cookie-compliance-scanner --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/privacy-cookie-compliance-scanner",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Privacy & Cookie Compliance Scanner | GDPR / CCPA Banner Audit",
        "description": "Scan public privacy pages and cookie banners for GDPR/CCPA compliance signals. Returns one clean compliance summary row per site with banner detection, consent framework identification, policy freshness, and recommended actions.",
        "version": "0.1",
        "x-build-id": "vo6xisdAwudF4gAqY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~privacy-cookie-compliance-scanner/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-privacy-cookie-compliance-scanner",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~privacy-cookie-compliance-scanner/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-privacy-cookie-compliance-scanner",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~privacy-cookie-compliance-scanner/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-privacy-cookie-compliance-scanner",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "sites"
                ],
                "properties": {
                    "sites": {
                        "title": "Sites to scan",
                        "type": "array",
                        "description": "List of sites to scan. Each entry requires a homepageUrl; privacyPolicyUrl and cookiePolicyUrl are auto-discovered if omitted. Quickstart: begin with one site. Maximum 100 per run."
                    },
                    "delivery": {
                        "title": "Delivery mode",
                        "enum": [
                            "dataset",
                            "webhook"
                        ],
                        "type": "string",
                        "description": "Starter path: dataset keeps the first run low-friction. Advanced path: webhook sends the same payload to your endpoint for compliance ops or agency handoff.",
                        "default": "dataset"
                    },
                    "webhookUrl": {
                        "title": "Webhook URL",
                        "type": "string",
                        "description": "Advanced delivery only: required when delivery is webhook. Must be a valid http(s) URL."
                    },
                    "snapshotKey": {
                        "title": "Snapshot key for recurring checks",
                        "type": "string",
                        "description": "Keep this stable when moving from the quickstart to recurring compliance monitoring so policy drift stays comparable run to run.",
                        "default": "privacy-cookie-compliance-snapshots"
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Parallel site checks. Keep at 1-2 for quickstart runs; increase for larger compliance portfolios.",
                        "default": 2
                    },
                    "batchDelayMs": {
                        "title": "Batch delay (ms)",
                        "minimum": 0,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Pause between batches to keep scans polite and avoid rate limiting.",
                        "default": 500
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 5,
                        "maximum": 60,
                        "type": "integer",
                        "description": "Per-request timeout for fetching homepage, privacy policy, and cookie policy pages.",
                        "default": 20
                    },
                    "followRedirects": {
                        "title": "Follow redirects",
                        "type": "boolean",
                        "description": "Follow HTTP redirects before scanning pages so canonical URLs are evaluated correctly.",
                        "default": true
                    },
                    "dryRun": {
                        "title": "Dry run",
                        "type": "boolean",
                        "description": "Preview scan results without saving snapshots, dataset rows, or sending webhooks.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
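
For projects that call the REST API directly rather than through a client library, the `run-sync-get-dataset-items` path from the specification above can be targeted with the Python standard library alone. The sketch below only builds the request and does not send it; the helper name `build_sync_request` is an assumption for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "taroyamada~privacy-cookie-compliance-scanner"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for run-sync-get-dataset-items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request(
    "<YOUR_API_TOKEN>",
    {"sites": [{"homepageUrl": "https://vercel.com"}]},
)
print(request.get_method(), request.full_url.split("?")[0])
# To send it, pass `request` to urllib.request.urlopen() and read the response body.
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before spending platform usage on a real run.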
