Pricing

from $10.00 / 1,000 results

⚖️ HHS Data Breach Scraper

Extract newly reported HIPAA security incidents from the HHS Wall of Shame. Target fresh legal leads and cybersecurity prospects using exact breach counts and entity details.

Developer

naoki anzai

Last modified

2 months ago

Data Breach Disclosure Monitor | HIPAA Breach Watch

Transform the Department of Health and Human Services (HHS) OCR "Wall of Shame" into a reliable, automated pipeline of actionable intelligence with this specialized data breach disclosure monitor. Legal professionals seeking class-action lawsuit opportunities, cybersecurity vendors looking for highly targeted outreach prospects, and risk analysts assessing healthcare market threats can instantly extract critical security incidents directly from official government pages. Instead of manually checking the portal for newly reported HIPAA violations, this scraper navigates the site to pull highly structured data, capturing affected entity names, precise breach counts, incident types, and comprehensive executive summaries.

Tracking these data breach disclosures by hand is tedious and error-prone, and it often leads to missed opportunities or delayed responses. With automated extraction, you can schedule daily or weekly runs to pick up fresh leads soon after they are publicly disclosed. The scraped data includes contact context and verifiable, audit-ready documentation for every incident returned. Whether you need to feed a legal CRM with newly breached healthcare providers or trigger automated alerts for your cybersecurity sales team, this scraper provides the details required to act quickly. Supporting evidence and raw response headers are returned alongside the incident summaries, so your outreach and risk models rest on verified government data.

Store Quickstart

Run this actor with your target input. Results appear in the Apify Dataset and can be piped to webhooks for real-time delivery. Use dryRun to validate before committing to a schedule.

Key Features

🛡️ Compliance-first — Produces audit-ready reports mapping findings to standards (WCAG, GDPR, SOC2)
🔒 Non-invasive scanning — Uses only observable public signals — no intrusive probing
📊 Severity-scored output — Each finding rated for criticality with remediation guidance
📡 Delta-alerting — Flag new findings since last run via webhook delivery
📋 Evidence export — Raw headers/responses captured for compliance documentation
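
Delta-alerting can be pictured as a comparison between the breach records seen in the previous run and those seen in the current one. The sketch below is an illustration only and is not the actor's code: the `breach_key` helper and the choice of entity name, state, and submission date as a record's identity are assumptions, using the field names listed for `topEntities` in the Output section.

```python
from typing import Iterable


def breach_key(record: dict) -> tuple:
    """Hypothetical identity for a breach record (an assumption, not the actor's rule)."""
    return (
        record["entityName"].strip().lower(),
        record["state"],
        record["submissionDate"],
    )


def new_breaches(previous: Iterable[dict], current: Iterable[dict]) -> list[dict]:
    """Return the records in `current` whose key did not appear in `previous`."""
    seen = {breach_key(r) for r in previous}
    return [r for r in current if breach_key(r) not in seen]


previous_run = [
    {"entityName": "Blue Cross of Texas", "state": "TX", "submissionDate": "2025-05-28"},
]
current_run = [
    {"entityName": "Blue Cross of Texas", "state": "TX", "submissionDate": "2025-05-28"},
    {"entityName": "UnitedHealth Group", "state": "MN", "submissionDate": "2025-06-10"},
]

fresh = new_breaches(previous_run, current_run)
print(len(fresh), fresh[0]["entityName"])
```

Keying on a tuple of fields rather than on list position keeps the comparison stable if the portal reorders its rows between runs.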

Use Cases

Who	Why
Developers	Automate recurring data fetches without building custom scrapers
Data teams	Pipe structured output into analytics warehouses
Ops teams	Monitor changes via webhook alerts
Product managers	Track competitor/market signals without engineering time

Input

Field	Type	Default	Description
lookbackDays	integer	`30`	How many days back from today to include breach submissions. Breaches submitted before this window are ignored. Default
entityKeywords	string	`""`	Comma-separated keywords to filter breaches by covered entity name. Case-insensitive substring match. Leave empty to inc
stateFilter	string	`""`	Comma-separated two-letter US state codes to filter breaches (e.g. 'CA,TX,NY'). Leave empty to include all states.
minIndividualsAffected	integer	`500`	Only include breaches affecting at least this many individuals. The HHS portal only lists breaches affecting 500+, so va
maxBreachesInEvidence	integer	`50`	Maximum number of individual breach records to include in the evidence array. Keeps output manageable.
watchTerms	string	`""`	Comma-separated terms that trigger actionNeeded=true when found in breach entity names or breach types. Use for competit
requestTimeoutSeconds	integer	`45`	HTTP timeout for the HHS portal request.
delivery	string	`"dataset"`	dataset: write digest to Apify dataset. webhook: POST digest to a URL.
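
To make the filter semantics in the table concrete, here is a minimal sketch of how they could be applied to a single breach record. It is not the actor's implementation: the function names are hypothetical, the record shape borrows the `topEntities` field names from the Output section, and case-insensitive matching for `watchTerms` is an assumption (the table only states it for `entityKeywords`).

```python
from datetime import date, timedelta


def split_csv(value: str) -> list[str]:
    """Split a comma-separated input string into trimmed, lowercased terms."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def matches_filters(record: dict, *, today: date, lookback_days: int = 30,
                    entity_keywords: str = "", state_filter: str = "",
                    min_individuals: int = 500) -> bool:
    """Apply the documented filters to one breach record (illustrative only)."""
    submitted = date.fromisoformat(record["submissionDate"])
    if submitted < today - timedelta(days=lookback_days):
        return False  # submitted before the lookback window
    if record["individualsAffected"] < min_individuals:
        return False
    states = [s.upper() for s in split_csv(state_filter)]
    if states and record["state"].upper() not in states:
        return False
    keywords = split_csv(entity_keywords)
    name = record["entityName"].lower()
    # An empty keyword list means "include everything".
    return not keywords or any(k in name for k in keywords)


def watch_hit(record: dict, watch_terms: str) -> bool:
    """True when a watch term occurs in the entity name or the breach type."""
    fields = (record["entityName"].lower(), record["breachType"].lower())
    return any(term in field for term in split_csv(watch_terms) for field in fields)


record = {
    "entityName": "UnitedHealth Group",
    "state": "MN",
    "individualsAffected": 54200,
    "breachType": "Hacking/IT Incident",
    "submissionDate": "2025-06-10",
}
today = date(2025, 6, 15)
print(matches_filters(record, today=today, state_filter="MN,TX"))
print(watch_hit(record, "unitedhealth"))
```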

Input Example

{
  "lookbackDays": 30,
  "entityKeywords": "",
  "stateFilter": "",
  "minIndividualsAffected": 500,
  "maxBreachesInEvidence": 50,
  "watchTerms": "",
  "requestTimeoutSeconds": 45,
  "delivery": "dataset",
  "datasetMode": "all",
  "notifyOnNoChange": false,
  "snapshotKey": "hhs-breach-monitor-snapshots",
  "dryRun": false
}
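
Because the input table describes `stateFilter`, `entityKeywords`, and `watchTerms` as comma-separated strings, it can be convenient to keep them as lists in your own code and join them just before starting a run. The helper below is a hypothetical convenience, not part of the actor or of the Apify client; it only uses field names from the input table.

```python
import json


def build_run_input(states=(), entity_keywords=(), watch_terms=(),
                    lookback_days=30, min_individuals=500):
    """Assemble an input object using the field names from the input table."""
    for code in states:
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter state code, got {code!r}")
    return {
        "lookbackDays": int(lookback_days),
        "entityKeywords": ",".join(k.strip() for k in entity_keywords),
        "stateFilter": ",".join(code.upper() for code in states),
        "minIndividualsAffected": int(min_individuals),
        "watchTerms": ",".join(t.strip() for t in watch_terms),
    }


run_input = build_run_input(states=["ca", "tx"], entity_keywords=["health", "insurance"])
print(json.dumps(run_input, indent=2))
```

Fields left out of the returned object fall back to the actor's defaults.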

Input Examples

Example: California recent breaches

{
  "stateFilter": "CA",
  "lookbackDays": 30
}

Example: Multi-state coverage

{
  "stateFilter": "CA,NY,MD,WA",
  "lookbackDays": 90
}

Example: Specific entity watchlist

{
  "stateFilter": "CA",
  "entityKeywords": "healthcare,insurance"
}

Output

Field	Type
`queryId`	string
`source`	string
`checkedAt`	timestamp
`windowStart`	string
`windowEnd`	string
`executiveSummary`	string
`status`	string
`actionNeeded`	boolean
`totalBreachCount`	number
`newBreachCount`	number
`totalIndividualsAffected`	number
`breachTypeSummary`	object
`topEntities`	array
`watchTermHits`	array
`recommendedActions`	array
`changedSinceLastRun`	boolean
`evidence`	array
`meta`	object
`topEntities[].entityName`	string
`topEntities[].state`	string
`topEntities[].individualsAffected`	number
`topEntities[].breachType`	string
`topEntities[].submissionDate`	string
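
As one way to consume these fields, the sketch below turns a digest into short alert lines for a chat channel or a CRM note. The `alert_lines` function and its `min_affected` threshold are hypothetical; only the field names come from the table above, and the sample values are taken from the output example in this document.

```python
def alert_lines(digest: dict, min_affected: int = 10_000) -> list[str]:
    """Turn a digest into short alert strings; returns [] when no action is needed."""
    if not digest.get("actionNeeded"):
        return []
    lines = [
        f'{digest["newBreachCount"]} new of {digest["totalBreachCount"]} breaches '
        f'({digest["windowStart"]} to {digest["windowEnd"]})'
    ]
    for entity in digest.get("topEntities", []):
        if entity["individualsAffected"] >= min_affected:
            lines.append(
                f'{entity["entityName"]} ({entity["state"]}): '
                f'{entity["individualsAffected"]:,} affected, {entity["breachType"]}'
            )
    return lines


digest = {
    "actionNeeded": True,
    "newBreachCount": 3,
    "totalBreachCount": 18,
    "windowStart": "2025-05-16",
    "windowEnd": "2025-06-15",
    "topEntities": [
        {"entityName": "UnitedHealth Group", "state": "MN",
         "individualsAffected": 54200, "breachType": "Hacking/IT Incident"},
        {"entityName": "Blue Cross of Texas", "state": "TX",
         "individualsAffected": 21000, "breachType": "Unauthorized Access/Disclosure"},
    ],
}

for line in alert_lines(digest, min_affected=30_000):
    print(line)
```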

Output Example

{
  "queryId": "hhs-breach-30d",
  "source": "hhs_ocr_breach_portal",
  "checkedAt": "2025-06-15T12:00:00.000Z",
  "windowStart": "2025-05-16",
  "windowEnd": "2025-06-15",
  "executiveSummary": "18 HIPAA breaches reported in the last 30 days affecting 142,853 individuals. 3 new since last run. 1 watch-term hit: UnitedHealth Group (54,200 affected).",
  "status": "action_needed",
  "actionNeeded": true,
  "totalBreachCount": 18,
  "newBreachCount": 3,
  "totalIndividualsAffected": 142853,
  "breachTypeSummary": {
    "Hacking/IT Incident": 12,
    "Unauthorized Access/Disclosure": 4,
    "Theft": 1,
    "Loss": 1
  },
  "topEntities": [
    {
      "entityName": "UnitedHealth Group",
      "state": "MN",
      "individualsAffected": 54200,
      "breachType": "Hacking/IT Incident",
      "submissionDate": "2025-06-10"
    },
    {
      "entityName": "Regional Medical Center of San Jose",
      "state": "CA",
      "individualsAffected": 32100,
      "breachType": "Hacking/IT Incident",
      "submissionDate": "2025-06-05"
    },
    {
      "entityName": "Blue Cross of Texas",
      "state": "TX",
      "individualsAffected": 21000,
      "breachType": "Unauthorized Access/Disclosure",
      "submissionDate": "2025-05-28"
    }

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~data-breach-disclosure-monitor/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "lookbackDays": 30, "entityKeywords": "", "stateFilter": "", "minIndividualsAffected": 500, "maxBreachesInEvidence": 50, "watchTerms": "", "requestTimeoutSeconds": 45, "delivery": "dataset", "datasetMode": "all", "notifyOnNoChange": false, "snapshotKey": "hhs-breach-monitor-snapshots", "dryRun": false }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/data-breach-disclosure-monitor").call(run_input={
  "lookbackDays": 30,
  "entityKeywords": "",
  "stateFilter": "",
  "minIndividualsAffected": 500,
  "maxBreachesInEvidence": 50,
  "watchTerms": "",
  "requestTimeoutSeconds": 45,
  "delivery": "dataset",
  "datasetMode": "all",
  "notifyOnNoChange": False,  # Python booleans are capitalized
  "snapshotKey": "hhs-breach-monitor-snapshots",
  "dryRun": False
})

# Print each item the run wrote to its default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/data-breach-disclosure-monitor').call({
  "lookbackDays": 30,
  "entityKeywords": "",
  "stateFilter": "",
  "minIndividualsAffected": 500,
  "maxBreachesInEvidence": 50,
  "watchTerms": "",
  "requestTimeoutSeconds": 45,
  "delivery": "dataset",
  "datasetMode": "all",
  "notifyOnNoChange": false,
  "snapshotKey": "hhs-breach-monitor-snapshots",
  "dryRun": false
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

Schedule daily or weekly runs to catch newly reported breaches soon after they appear on the HHS portal.
Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
For CI integration, block releases on critical severity findings using exit codes.
Combine with ssl-certificate-monitor for layered cert + headers coverage.
Findings include links to official remediation docs — share with dev teams via the webhook payload.

FAQ

Is running this against a third-party site legal?

Passive public-header scanning is generally permitted, but follow your own compliance policies. Only scan sites you have authorization for.

How often should I scan?

Weekly for production domains; daily if you have high config-change velocity.

Can I export to a compliance tool?

Use webhook delivery or Dataset API — formats map well to Drata, Vanta, OneTrust import templates.

Is this a penetration test?

No — this actor performs passive compliance scanning only. No exploitation, fuzzing, or auth bypass.

Does this qualify as a SOC2 control?

This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.

Security & Compliance cluster — explore related Apify tools:

Privacy & Cookie Compliance Scanner | GDPR / CCPA Banner Audit — Scan public privacy pages and cookie banners for GDPR/CCPA compliance signals.
Security Headers Checker API | OWASP Audit — Bulk-audit websites for OWASP security headers, grade each response, and monitor header changes between runs.
SSL Certificate Monitor API | Expiry + Issuer Changes — Check SSL/TLS certificates in bulk, detect expiry and issuer changes, and emit alert-ready rows for ops and SEO teams.
DNS / SPF / DKIM / DMARC Audit API — Bulk-audit domains for SPF, DKIM, DMARC, MX, and email-auth posture with grades and fix-ready recommendations.
robots.txt AI Policy Monitor | GPTBot ClaudeBot — Detect GPTBot, ClaudeBot, Google-Extended, and other AI crawler policies in robots.
WCAG Accessibility Checker API | ADA & EAA Compliance Audit — Audit websites for WCAG 2.
📜 Open-Source License & Dependency Audit API — Audit npm packages for license risk, dependency depth, maintainer activity, and compliance posture.
Trust Center & Subprocessor Monitor API — Monitor vendor trust centers, subprocessor lists, DPA updates, and security posture changes.

Cost

Pay Per Event:

actor-start: $0.01 (flat fee per run)
dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01

No subscription required — you only pay for what you use.
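
The pay-per-event arithmetic above can be wrapped in a small estimator. This is a sketch based only on the two fees listed in this section; the function name is hypothetical, and actual charges are whatever Apify bills for the run.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run, from the list above
DATASET_ITEM_FEE = Decimal("0.003")  # fee per output item, from the list above


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Estimated charge in USD for `runs` runs producing `items` items in total."""
    return runs * ACTOR_START_FEE + items * DATASET_ITEM_FEE


print(estimate_cost(1000))          # one run, 1,000 items
print(estimate_cost(30, runs=30))   # a month of daily runs, one item each
```

`Decimal` is used instead of floats so that fractional-cent fees add up exactly.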

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.
