🔍 Privacy Policy Scraper
Pricing
Pay per usage
Extract compliance data from arbitrary websites to identify GDPR/CCPA violations. Generate audit reports to find leads lacking proper cookie banners.
Developer
太郎 山田
Privacy & Cookie Compliance Scanner | GDPR / CCPA Banner Audit
Scan public privacy pages and cookie banners for GDPR/CCPA compliance signals. Returns one clean compliance summary row per site with banner detection, consent framework identification, policy freshness, and recommended actions.
Store Quickstart
Run this actor with your target input. Results appear in the Apify Dataset and can be piped to webhooks for real-time delivery. Use dryRun to validate before committing to a schedule.
Key Features
- 🛡️ Compliance-first — Produces audit-ready reports mapping findings to standards (WCAG, GDPR, SOC2)
- 🔒 Non-invasive scanning — Uses only observable public signals — no intrusive probing
- 📊 Severity-scored output — Each finding rated for criticality with remediation guidance
- 📡 Delta-alerting — Flag new findings since last run via webhook delivery
- 📋 Evidence export — Raw headers/responses captured for compliance documentation
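The delta-alerting idea in the list above can be illustrated with a small comparison between two runs. This is only a sketch of the general technique, not the actor's internal logic: it assumes a finding is identified by its `siteUrl` and `type` fields, and the helper names `alert_key` and `new_alerts` as well as the sample `type` values are hypothetical.

```python
from typing import Iterable


def alert_key(alert: dict) -> tuple:
    """Identity of a finding (assumption: same site + same alert type)."""
    return (alert.get("siteUrl"), alert.get("type"))


def new_alerts(previous: Iterable[dict], current: Iterable[dict]) -> list[dict]:
    """Return alerts present in the current run but absent from the previous one."""
    seen = {alert_key(a) for a in previous}
    return [a for a in current if alert_key(a) not in seen]


# Hypothetical sample data; the "type" values are made up for illustration.
previous_run = [
    {"siteUrl": "https://example.com", "severity": "watch",
     "type": "policy_changed", "message": "Privacy policy text changed."},
]
current_run = previous_run + [
    {"siteUrl": "https://example.com", "severity": "high",
     "type": "banner_missing", "message": "No cookie banner detected."},
]

for alert in new_alerts(previous_run, current_run):
    print(alert["severity"], alert["siteUrl"], alert["message"])
```

Keying on site and type keeps a recurring finding from being re-reported on every run; a stricter key (for example, one that includes the message) would surface wording changes as new findings.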
Use Cases
| Who | Why |
|---|---|
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| sites | array | prefilled | List of sites to scan. Each entry requires a homepageUrl; privacyPolicyUrl and cookiePolicyUrl are auto-discovered if om |
| delivery | string | "dataset" | Starter path: dataset keeps the first run low-friction. Advanced path: webhook sends the same payload to your endpoint f |
| webhookUrl | string | — | Advanced delivery only: required when delivery is webhook. Must be a valid http(s) URL. |
| snapshotKey | string | "privacy-cookie-compliance-snapshots" | Keep this stable when moving from the quickstart to recurring compliance monitoring so policy drift stays comparable run |
| concurrency | integer | 2 | Parallel site checks. Keep at 1-2 for quickstart runs; increase for larger compliance portfolios. |
| batchDelayMs | integer | 500 | Pause between batches to keep scans polite and avoid rate limiting. |
| requestTimeoutSecs | integer | 20 | Per-request timeout for fetching homepage, privacy policy, and cookie policy pages. |
| followRedirects | boolean | true | Follow HTTP redirects before scanning pages so canonical URLs are evaluated correctly. |
Input Example
    {
      "sites": [
        {
          "homepageUrl": "https://vercel.com",
          "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
          "cookiePolicyUrl": "",
          "region": "EU",
          "consentMode": ""
        }
      ],
      "delivery": "dataset",
      "snapshotKey": "privacy-cookie-compliance-snapshots",
      "concurrency": 2,
      "batchDelayMs": 500,
      "requestTimeoutSecs": 20,
      "followRedirects": true,
      "dryRun": false
    }
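Before starting a run, the constraints stated in the input table can be checked locally. The sketch below covers only the rules the table spells out (each site needs a `homepageUrl`, and `webhookUrl` must be a valid http(s) URL when `delivery` is `webhook`). The helper names `is_http_url` and `validate_input` are hypothetical, and treating an empty `sites` list as an error is an assumption; the actor's own validation may differ.

```python
from urllib.parse import urlparse


def is_http_url(value) -> bool:
    """True if value is a string with an http or https scheme and a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []

    sites = run_input.get("sites")
    if not isinstance(sites, list) or not sites:
        # Assumption: an empty list is treated as an error here.
        errors.append("sites must be a non-empty array")
    else:
        for i, site in enumerate(sites):
            if not isinstance(site, dict) or not is_http_url(site.get("homepageUrl")):
                errors.append(f"sites[{i}].homepageUrl must be a valid http(s) URL")

    # "dataset" is the documented default for delivery.
    if run_input.get("delivery", "dataset") == "webhook":
        if not is_http_url(run_input.get("webhookUrl")):
            errors.append("webhookUrl must be a valid http(s) URL when delivery is webhook")

    return errors


print(validate_input({
    "sites": [{"homepageUrl": "https://vercel.com"}],
    "delivery": "webhook",
}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.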
Output
| Field | Type | Description |
|---|---|---|
| meta | object | |
| alerts | array | |
| results | array | |
| alerts[].siteUrl | string (url) | |
| alerts[].severity | string | |
| alerts[].type | string | |
| alerts[].message | string | |
Output Example
    {
      "meta": {
        "generatedAt": "2026-06-01T12:00:00.000Z",
        "totals": {"total": 2, "initial": 0, "ok": 1, "changed": 1, "error": 0, "compliant": 1, "partial": 1, "non_compliant": 1, "unknown": 0},
        "severityCounts": {"critical": 0, "high": 1, "watch": 1, "info": 0},
        "alertCount": 3,
        "executiveSummary": {
          "overallStatus": "attention_needed",
          "brief": "2 of 2 site(s) need compliance attention. Top issue: Cookie banner presence changed since last run.",
          "totals": {"total": 2, "initial": 0, "ok": 1, "changed": 1, "error": 0, "compliant": 1, "partial": 1, "non_compliant": 1, "unknown": 0},
          "severityCounts": {"critical": 0, "high": 1, "watch": 1, "info": 0
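Once a payload like the one above is available, the `alerts` array can be grouped per site for triage. This is a minimal sketch that relies only on the documented `alerts[].siteUrl` and `alerts[].severity` fields. It assumes alert severities use the same labels as `severityCounts` (`critical`, `high`, `watch`, `info`) and that this is also their order of urgency; the helper name `alerts_by_site` and the sample `type` values are hypothetical.

```python
from collections import defaultdict

# Assumed ordering, most urgent first; labels taken from severityCounts.
SEVERITY_ORDER = {"critical": 0, "high": 1, "watch": 2, "info": 3}


def alerts_by_site(payload: dict) -> dict[str, list[dict]]:
    """Group alerts by siteUrl, most severe first within each site."""
    grouped = defaultdict(list)
    for alert in payload.get("alerts", []):
        grouped[alert.get("siteUrl", "unknown")].append(alert)
    for alerts in grouped.values():
        # Unknown severity labels sort after all known ones.
        alerts.sort(key=lambda a: SEVERITY_ORDER.get(a.get("severity"), len(SEVERITY_ORDER)))
    return dict(grouped)


# Hypothetical sample payload containing only the documented alert fields.
sample = {
    "alerts": [
        {"siteUrl": "https://example.com", "severity": "watch",
         "type": "policy_changed", "message": "Privacy policy text changed."},
        {"siteUrl": "https://example.com", "severity": "high",
         "type": "banner_changed", "message": "Cookie banner presence changed since last run."},
        {"siteUrl": "https://example.org", "severity": "info",
         "type": "policy_checked", "message": "No change detected."},
    ]
}

for site, alerts in alerts_by_site(sample).items():
    print(site, [a["severity"] for a in alerts])
```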
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
    curl -X POST "https://api.apify.com/v2/acts/taroyamada~privacy-cookie-compliance-scanner/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "sites": [
          {
            "homepageUrl": "https://vercel.com",
            "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
            "cookiePolicyUrl": "",
            "region": "EU",
            "consentMode": ""
          }
        ],
        "delivery": "dataset",
        "snapshotKey": "privacy-cookie-compliance-snapshots",
        "concurrency": 2,
        "batchDelayMs": 500,
        "requestTimeoutSecs": 20,
        "followRedirects": true,
        "dryRun": false
      }'
Python
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("taroyamada/privacy-cookie-compliance-scanner").call(
        run_input={
            "sites": [
                {
                    "homepageUrl": "https://vercel.com",
                    "privacyPolicyUrl": "https://vercel.com/legal/privacy-policy",
                    "cookiePolicyUrl": "",
                    "region": "EU",
                    "consentMode": "",
                }
            ],
            "delivery": "dataset",
            "snapshotKey": "privacy-cookie-compliance-snapshots",
            "concurrency": 2,
            "batchDelayMs": 500,
            "requestTimeoutSecs": 20,
            "followRedirects": True,
            "dryRun": False,
        }
    )

    # Read the results from the run's default dataset.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)
JavaScript / Node.js
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('taroyamada/privacy-cookie-compliance-scanner').call({
        sites: [
            {
                homepageUrl: 'https://vercel.com',
                privacyPolicyUrl: 'https://vercel.com/legal/privacy-policy',
                cookiePolicyUrl: '',
                region: 'EU',
                consentMode: '',
            },
        ],
        delivery: 'dataset',
        snapshotKey: 'privacy-cookie-compliance-snapshots',
        concurrency: 2,
        batchDelayMs: 500,
        requestTimeoutSecs: 20,
        followRedirects: true,
        dryRun: false,
    });

    // Read the results from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
Tips & Limitations
- Schedule weekly runs against your production domains to catch config drift.
- Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
- For CI integration, block releases on `critical` severity findings using exit codes.
- Combine with `ssl-certificate-monitor` for layered cert + headers coverage.
- Findings include links to official remediation docs; share with dev teams via the webhook payload.
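The CI tip above can be approximated with a small wrapper script that turns a run's payload into a process exit code. This is a sketch under assumptions: it reads the documented `alerts[].severity` field and assumes blocking findings carry the label `critical`; the function name `ci_exit_code` and the sample payload are hypothetical.

```python
import sys


def ci_exit_code(payload: dict, blocking: tuple = ("critical",)) -> int:
    """Return 1 if any alert has a blocking severity, otherwise 0."""
    for alert in payload.get("alerts", []):
        if alert.get("severity") in blocking:
            return 1
    return 0


# Hypothetical payload; in CI this would be the actor's output for the run.
payload = {
    "alerts": [
        {"siteUrl": "https://example.com", "severity": "high",
         "type": "banner_changed", "message": "Cookie banner presence changed since last run."},
    ]
}

code = ci_exit_code(payload)
print("exit code:", code)
# In a real pipeline step, finish with: sys.exit(code)
```

Passing `blocking=("critical", "high")` would tighten the gate without changing the function.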
FAQ
Is running this against a third-party site legal?
Passive public-header scanning is generally permitted, but follow your own compliance policies. Only scan sites you have authorization for.
How often should I scan?
Weekly for production domains; daily if you have high config-change velocity.
Can I export to a compliance tool?
Use webhook delivery or Dataset API — formats map well to Drata, Vanta, OneTrust import templates.
Is this a penetration test?
No — this actor performs passive compliance scanning only. No exploitation, fuzzing, or auth bypass.
Does this qualify as a SOC2 control?
This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.
Related Actors
Security & Compliance cluster — explore related Apify tools:
- Security Headers Checker API | OWASP Audit — Bulk-audit websites for OWASP security headers, grade each response, and monitor header changes between runs.
- SSL Certificate Monitor API | Expiry + Issuer Changes — Check SSL/TLS certificates in bulk, detect expiry and issuer changes, and emit alert-ready rows for ops and SEO teams.
- DNS / SPF / DKIM / DMARC Audit API — Bulk-audit domains for SPF, DKIM, DMARC, MX, and email-auth posture with grades and fix-ready recommendations.
- robots.txt AI Policy Monitor | GPTBot ClaudeBot — Detect GPTBot, ClaudeBot, Google-Extended, and other AI crawler policies in robots.
- Data Breach Disclosure Monitor | HIPAA Breach Watch — Monitor the HHS OCR Breach Portal for new HIPAA data breach disclosures.
- WCAG Accessibility Checker API | ADA & EAA Compliance Audit — Audit websites for WCAG 2.
- 📜 Open-Source License & Dependency Audit API — Audit npm packages for license risk, dependency depth, maintainer activity, and compliance posture.
- Trust Center & Subprocessor Monitor API — Monitor vendor trust centers, subprocessor lists, DPA updates, and security posture changes.
Cost
Pay Per Event:
- `actor-start`: $0.01 (flat fee per run)
- `dataset-item`: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
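The pricing arithmetic above can be reproduced for any item count. The sketch below uses only the two listed event prices; the function name `estimate_run_cost` is hypothetical, and `Decimal` is used so the result is exact rather than subject to floating-point rounding.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run
DATASET_ITEM_FEE = Decimal("0.003")  # fee per output item


def estimate_run_cost(items: int) -> Decimal:
    """Estimated cost in USD of a single run that emits `items` dataset items."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return ACTOR_START_FEE + DATASET_ITEM_FEE * items


# Same figures as the 1,000-item example above.
print(estimate_run_cost(1000))
```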
