Pricing

Pay per usage

🔍 Privacy Policy Scraper

Extract compliance data from arbitrary websites to identify GDPR/CCPA violations. Generate audit reports to find leads lacking proper cookie banners.

Rating

0.0

(0)

Developer

太郎山田

Actor stats

Last modified

a day ago

Store Quickstart

Run this actor with your target input. Results appear in the Apify Dataset and can be piped to webhooks for real-time delivery. Use dryRun to validate before committing to a schedule.

Key Features

🛡️ Compliance-first — Produces audit-ready reports mapping findings to standards (WCAG, GDPR, SOC2)
🔒 Non-invasive scanning — Uses only observable public signals — no intrusive probing
📊 Severity-scored output — Each finding rated for criticality with remediation guidance
📡 Delta-alerting — Flag new findings since last run via webhook delivery
📋 Evidence export — Raw headers/responses captured for compliance documentation
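Delta-alerting comes down to comparing this run's findings with the previous run's. The sketch below shows that comparison under one stated assumption: a finding is identified by its `siteUrl` and `type`. The actor's real matching rules, and how it stores snapshots under `snapshotKey`, are not documented here. The `type` values in the demo are hypothetical.

```python
# Hypothetical sketch: which alerts are new since the previous run?
# Assumes an alert is identified by (siteUrl, type); the actor may key findings differently.
from typing import Iterable


def alert_key(alert: dict) -> tuple:
    """Identity of a finding for comparison purposes (assumed, not documented)."""
    return (alert.get("siteUrl"), alert.get("type"))


def new_alerts(previous: Iterable[dict], current: Iterable[dict]) -> list[dict]:
    """Return alerts present in `current` but not in `previous`, preserving order."""
    seen = {alert_key(a) for a in previous}
    return [a for a in current if alert_key(a) not in seen]


if __name__ == "__main__":
    # The "type" values below are made up for illustration.
    last_run = [{"siteUrl": "https://vercel.com", "type": "cookie_banner_missing", "severity": "high"}]
    this_run = [
        {"siteUrl": "https://vercel.com", "type": "cookie_banner_missing", "severity": "high"},
        {"siteUrl": "https://vercel.com", "type": "policy_changed", "severity": "watch"},
    ]
    for alert in new_alerts(last_run, this_run):
        print(alert["type"])  # prints: policy_changed
```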

Use Cases

Who	Why
Developers	Automate recurring data fetches without building custom scrapers
Data teams	Pipe structured output into analytics warehouses
Ops teams	Monitor changes via webhook alerts
Product managers	Track competitor/market signals without engineering time

Input

Field	Type	Default	Description
sites	array	prefilled	List of sites to scan. Each entry requires a homepageUrl; privacyPolicyUrl and cookiePolicyUrl are auto-discovered if omitted.
delivery	string	`"dataset"`	Starter path: dataset keeps the first run low-friction. Advanced path: webhook sends the same payload to your endpoint.
webhookUrl	string	—	Advanced delivery only: required when delivery is webhook. Must be a valid http(s) URL.
snapshotKey	string	`"privacy-cookie-compliance-snapshots"`	Keep this stable when moving from the quickstart to recurring compliance monitoring so policy drift stays comparable across runs.
concurrency	integer	`2`	Parallel site checks. Keep at 1-2 for quickstart runs; increase for larger compliance portfolios.
batchDelayMs	integer	`500`	Pause between batches to keep scans polite and avoid rate limiting.
requestTimeoutSecs	integer	`20`	Per-request timeout for fetching homepage, privacy policy, and cookie policy pages.
followRedirects	boolean	`true`	Follow HTTP redirects before scanning pages so canonical URLs are evaluated correctly.
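The concurrency and delay settings describe a batch-then-pause loop: check up to `concurrency` sites in parallel, then wait `batchDelayMs` before starting the next batch. Below is a minimal sketch of that pattern. It assumes this is how the two settings interact, because the actor's scheduler is not shown. `check_site` is a hypothetical stand-in for the real homepage and policy fetches.

```python
# Hypothetical sketch of batch-then-pause scanning driven by `concurrency` and `batchDelayMs`.
# `check_site` is a placeholder; the actor's real fetch logic is not shown in this document.
import time
from concurrent.futures import ThreadPoolExecutor


def check_site(site: dict) -> dict:
    """Placeholder per-site check: just echoes the homepage URL."""
    return {"siteUrl": site["homepageUrl"], "status": "ok"}


def scan_in_batches(sites: list[dict], concurrency: int = 2, batch_delay_ms: int = 500) -> list[dict]:
    """Check up to `concurrency` sites in parallel, pausing `batch_delay_ms` between batches."""
    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for start in range(0, len(sites), concurrency):
            if start > 0:
                time.sleep(batch_delay_ms / 1000)  # polite pause before the next batch
            batch = sites[start:start + concurrency]
            results.extend(pool.map(check_site, batch))  # map preserves input order
    return results


if __name__ == "__main__":
    demo = [{"homepageUrl": f"https://example.com/{i}"} for i in range(5)]
    print(len(scan_in_batches(demo, concurrency=2, batch_delay_ms=10)))  # prints: 5
```

Pausing between batches keeps the peak request rate bounded by `concurrency`. This is why the table suggests keeping it at 1-2 for quickstart runs.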

Input Example

{
  "sites": [
    {
      "homepageUrl": "https://vercel.com",
      "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
      "cookiePolicyUrl": "",
      "region": "EU",
      "consentMode": ""
    }
  ],
  "delivery": "dataset",
  "snapshotKey": "privacy-cookie-compliance-snapshots",
  "concurrency": 2,
  "batchDelayMs": 500,
  "requestTimeoutSecs": 20,
  "followRedirects": true,
  "dryRun": false
}
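Before spending a run, you can sanity-check the input locally against the rules stated in the Input table. This is a hypothetical helper, not the actor's own validation. The accepted `delivery` values and the numeric minimums are assumptions.

```python
# Hypothetical pre-flight check mirroring the rules stated in the Input table.
# It is not the actor's own validator; numeric minimums below are assumptions.
from urllib.parse import urlparse

# Assumed lower bounds for the integer settings.
INT_MINIMUMS = {"concurrency": 1, "batchDelayMs": 0, "requestTimeoutSecs": 1}


def is_http_url(value) -> bool:
    """True for strings with an http or https scheme and a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    sites = run_input.get("sites")
    if not isinstance(sites, list) or not sites:
        problems.append("sites must be a non-empty array")
    else:
        for i, site in enumerate(sites):
            if not isinstance(site, dict) or not is_http_url(site.get("homepageUrl")):
                problems.append(f"sites[{i}].homepageUrl must be an http(s) URL")
    delivery = run_input.get("delivery", "dataset")
    if delivery not in ("dataset", "webhook"):
        problems.append("delivery must be 'dataset' or 'webhook'")
    elif delivery == "webhook" and not is_http_url(run_input.get("webhookUrl")):
        problems.append("webhookUrl must be a valid http(s) URL when delivery is webhook")
    for field, minimum in INT_MINIMUMS.items():
        value = run_input.get(field)
        if value is None:
            continue  # omitted: the actor's default applies
        if isinstance(value, bool) or not isinstance(value, int) or value < minimum:
            problems.append(f"{field} must be an integer >= {minimum}")
    return problems
```

For the input example above, `validate_input` returns an empty list. Switching `delivery` to `"webhook"` without a `webhookUrl` produces one problem.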

Output

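Field	Type	Description
`meta`	object	Run metadata: generation timestamp, status totals, severity counts, alert count, and executive summary
`alerts`	array	Alerts raised in this run
`results`	array	Per-site scan results
`alerts[].siteUrl`	string (url)	Site the alert refers to
`alerts[].severity`	string	Severity level (the output example counts critical, high, watch, and info)
`alerts[].type`	string	Alert category
`alerts[].message`	string	Human-readable description of the finding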
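When reading the output, it often helps to triage `alerts[]` by severity. The following is a minimal sketch. It assumes each dataset item has the shape in the table above and that the severity levels are the four counted in `meta.severityCounts` in the output example. Any other severity value gets its own group at the end.

```python
# Hypothetical helper: group alerts[] from one output item by severity, most severe first.
# Levels come from meta.severityCounts in the output example; the ordering is our assumption.
SEVERITY_ORDER = ["critical", "high", "watch", "info"]


def alerts_by_severity(item: dict) -> dict[str, list[dict]]:
    """Return {severity: [alerts]} with known levels first, in SEVERITY_ORDER."""
    grouped: dict[str, list[dict]] = {level: [] for level in SEVERITY_ORDER}
    for alert in item.get("alerts", []):
        grouped.setdefault(alert.get("severity", "info"), []).append(alert)
    return grouped


if __name__ == "__main__":
    # The "type" value is made up for illustration.
    item = {"alerts": [
        {"siteUrl": "https://vercel.com", "severity": "watch", "type": "cookie_banner_changed",
         "message": "Cookie banner presence changed since last run."},
    ]}
    for level, alerts in alerts_by_severity(item).items():
        for alert in alerts:
            print(level, alert["siteUrl"], alert["message"])
```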

Output Example

{
  "meta": {
    "generatedAt": "2026-06-01T12:00:00.000Z",
    "totals": {
      "total": 2,
      "initial": 0,
      "ok": 1,
      "changed": 1,
      "error": 0,
      "compliant": 1,
      "partial": 1,
      "non_compliant": 1,
      "unknown": 0
    },
    "severityCounts": {
      "critical": 0,
      "high": 1,
      "watch": 1,
      "info": 0
    },
    "alertCount": 3,
    "executiveSummary": {
      "overallStatus": "attention_needed",
      "brief": "2 of 2 site(s) need compliance attention. Top issue: Cookie banner presence changed since last run.",
      "totals": {
        "total": 2,
        "initial": 0,
        "ok": 1,
        "changed": 1,
        "error": 0,
        "compliant": 1,
        "partial": 1,
        "non_compliant": 1,
        "unknown": 0
      },
      "severityCounts": {
        "critical": 0,
        "high": 1,
        "watch": 1,
        "info": 0

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~privacy-cookie-compliance-scanner/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "sites": [ { "homepageUrl": "https://vercel.com", "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy", "cookiePolicyUrl": "", "region": "EU", "consentMode": "" } ], "delivery": "dataset", "snapshotKey": "privacy-cookie-compliance-snapshots", "concurrency": 2, "batchDelayMs": 500, "requestTimeoutSecs": 20, "followRedirects": true, "dryRun": false }'
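If you would rather not install a client library, you can make the same cURL call with Python's standard library. The sketch below has two parts. `build_request` only assembles the request, using the same endpoint, `token` query parameter, and JSON body as the cURL example. `run_sync` sends it and parses the returned dataset items. The 300-second client-side timeout is an arbitrary choice, not a documented Apify limit.

```python
# Standard-library equivalent of the cURL call above (no apify-client dependency).
# build_request only constructs the request; run_sync sends it and parses the JSON reply.
import json
import urllib.parse
import urllib.request

ACTOR_ID = "taroyamada~privacy-cookie-compliance-scanner"  # "~" replaces "/" in API paths


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """POST request to the run-sync-get-dataset-items endpoint with a JSON body."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 300) -> list:
    """Run the actor synchronously and return the dataset items as parsed JSON."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `items = run_sync("YOUR_API_TOKEN", run_input)`, where `run_input` is the same dictionary passed in the Python example.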

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/privacy-cookie-compliance-scanner").call(run_input={
  "sites": [
    {
      "homepageUrl": "https://vercel.com",
      "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
      "cookiePolicyUrl": "",
      "region": "EU",
      "consentMode": ""
    }
  ],
  "delivery": "dataset",
  "snapshotKey": "privacy-cookie-compliance-snapshots",
  "concurrency": 2,
  "batchDelayMs": 500,
  "requestTimeoutSecs": 20,
  "followRedirects": true,
  "dryRun": false
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/privacy-cookie-compliance-scanner').call({
  "sites": [
    {
      "homepageUrl": "https://vercel.com",
      "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
      "cookiePolicyUrl": "",
      "region": "EU",
      "consentMode": ""
    }
  ],
  "delivery": "dataset",
  "snapshotKey": "privacy-cookie-compliance-snapshots",
  "concurrency": 2,
  "batchDelayMs": 500,
  "requestTimeoutSecs": 20,
  "followRedirects": true,
  "dryRun": false
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

Schedule weekly runs against your production domains to catch config drift.
Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
For CI integration, block releases on critical-severity findings using exit codes.
Combine with ssl-certificate-monitor for layered cert + headers coverage.
Findings include links to official remediation docs — share with dev teams via the webhook payload.
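The CI tip requires a small gate script on your side. The sketch below assumes two things: you have already fetched the run's dataset items (for example with the client code in API Usage), and your CI treats a non-zero exit code as a failed step. The `critical` level name comes from `meta.severityCounts` in the output example.

```python
# Hypothetical CI gate: return a non-zero exit code when any item contains a critical alert.
import sys


def critical_alerts(items: list[dict]) -> list[dict]:
    """Collect every alert with severity "critical" across all dataset items."""
    return [a for item in items for a in item.get("alerts", []) if a.get("severity") == "critical"]


def ci_exit_code(items: list[dict]) -> int:
    """Log critical alerts to stderr; 1 means "block the release", 0 means "pass"."""
    found = critical_alerts(items)
    for alert in found:
        print(f"CRITICAL {alert.get('siteUrl')}: {alert.get('message')}", file=sys.stderr)
    return 1 if found else 0
```

In a pipeline step, finish with `sys.exit(ci_exit_code(items))` so the job fails whenever a critical finding is present.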

FAQ

Is running this against a third-party site legal?

Passive public-header scanning is generally permitted, but follow your own compliance policies. Only scan sites you have authorization for.

How often should I scan?

Weekly for production domains; daily if you have high config-change velocity.

Can I export to a compliance tool?

Use webhook delivery or the Dataset API; the output formats map well to Drata, Vanta, and OneTrust import templates.
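If your compliance tool imports CSV, you can flatten the alerts first. Below is a minimal sketch using the `alerts[]` fields from the Output table. The column layout is our own, not an official Drata, Vanta, or OneTrust template, so map the headers to whatever your tool's importer expects.

```python
# Hypothetical export: flatten alerts[] into CSV rows for a compliance tool's import step.
import csv
import io

COLUMNS = ["siteUrl", "severity", "type", "message"]  # our own layout, not a vendor template


def alerts_to_csv(items: list[dict]) -> str:
    """Return CSV text with one row per alert; unknown alert keys are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for item in items:
        for alert in item.get("alerts", []):
            writer.writerow(alert)  # missing keys are written as empty cells
    return buffer.getvalue()
```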

Is this a penetration test?

No — this actor performs passive compliance scanning only. No exploitation, fuzzing, or auth bypass.

Does this qualify as a SOC2 control?

This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.

Security & Compliance cluster — explore related Apify tools:

Security Headers Checker API | OWASP Audit — Bulk-audit websites for OWASP security headers, grade each response, and monitor header changes between runs.
SSL Certificate Monitor API | Expiry + Issuer Changes — Check SSL/TLS certificates in bulk, detect expiry and issuer changes, and emit alert-ready rows for ops and SEO teams.
DNS / SPF / DKIM / DMARC Audit API — Bulk-audit domains for SPF, DKIM, DMARC, MX, and email-auth posture with grades and fix-ready recommendations.
robots.txt AI Policy Monitor | GPTBot ClaudeBot — Detect GPTBot, ClaudeBot, Google-Extended, and other AI crawler policies in robots.txt.
Data Breach Disclosure Monitor | HIPAA Breach Watch — Monitor the HHS OCR Breach Portal for new HIPAA data breach disclosures.
WCAG Accessibility Checker API | ADA & EAA Compliance Audit — Audit websites for WCAG 2.
📜 Open-Source License & Dependency Audit API — Audit npm packages for license risk, dependency depth, maintainer activity, and compliance posture.
Trust Center & Subprocessor Monitor API — Monitor vendor trust centers, subprocessor lists, DPA updates, and security posture changes.

Cost

Pay Per Event:

actor-start: $0.01 (flat fee per run)
dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01

No subscription required — you only pay for what you use.
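The worked example above generalizes to a simple formula: runs × $0.01 + items × $0.003. The small estimator below uses `Decimal` so cent amounts are not distorted by floating-point rounding.

```python
# Pay-per-event cost estimate using the two events listed above.
from decimal import Decimal

ACTOR_START_USD = Decimal("0.01")    # flat fee per run
DATASET_ITEM_USD = Decimal("0.003")  # per output item


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Cost in USD for `runs` runs producing `items` dataset items in total."""
    return runs * ACTOR_START_USD + items * DATASET_ITEM_USD


print(estimate_cost(1_000))  # 3.010
```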

Privacy Policy Detector

automation-lab/privacy-policy-detector

This actor scans websites for legal and compliance pages — privacy policy, terms of service, cookie policy, disclaimer, and accessibility statements. It also detects cookie consent banners, GDPR mentions, and CCPA references. Useful for compliance audits and vendor assessments.

Stas Persiianenko

GDPR & Privacy Cookie Scanner

andok/gdpr-cookie-scanner

Scan websites to identify tracking cookies and third-party scripts. Automate privacy compliance and GDPR audits.

Andok

The Privacy Policy Generator

anointment/the-privacy-policy-generator

Save $500 on legal fees. Generates a professional, GDPR & CCPA compliant Privacy Policy for your website or app in seconds. Returns clean Markdown/HTML ready to copy-paste.

Anointment

Privacy Compliance Analyzer V.1

actor_researcher.48/privacy-compliance-analyzer-v-1

Scan websites for privacy compliance issues. Detects trackers, checks GDPR/CCPA rights, finds privacy policies, and generates DSAR templates. Get actionable compliance scores and recommendations in an easy-to-read HTML report. Perfect for privacy audits and regulatory assessments.

ANIRBAN ROY

Gdpr Compliance API

vivid_astronaut/gdpr-compliance

Fabio Suizu

Twitter B2b Email Scraper

scrapier/twitter-b2b-email-scraper

🐦 Twitter/X B2B Email Scraper finds publicly available business emails from profiles, bios & linked sites—at scale. ⚡ Enrich leads, filter by keywords/locations, and export to CSV/CRM. ✅ Ideal for B2B sales, growth & recruiting. Compliance-first (GDPR/CCPA).

Scrapier

Cookie Consent Checker

pillowy_travel/cookie-consent-checker

Checks whether a website displays a cookie consent banner and basic GDPR compliance signals.

SAHIL KUMAR

Web Accessibility Checker

louisdeconinck/web-accessibility-checker

Audit your entire website for WCAG 2.1 & 2.2 compliance using the industry-standard axe-core engine. Automatically crawl pages, detect accessibility violations, and generate detailed actionable reports to ensure legal compliance and improve SEO.

Louis Deconinck

Privacy Stack

bikram786/privacy-stack

Privacy researcher & developer building production Apify actors for arXiv privacy research. Privacy Stack brings 1 5,00+ real arXiv privacy papers into one place, carefully verified with no fake URLs and no duplicates. Categories: Internet Privacy, Data Privacy, Crypto Privacy, Post-Quantum Privacy.