⚖️ HHS Data Breach Scraper

Pricing

from $10.00 / 1,000 results

Extract newly reported HIPAA security incidents from the HHS Wall of Shame. Target fresh legal leads and cybersecurity prospects using exact breach counts and entity details.

Rating

0.0

(0)

Developer

太郎 山田

Maintained by Community

Actor stats

Bookmarked: 1
Total users: 5
Monthly active users: 3
Last modified: a day ago

Data Breach Disclosure Monitor | HIPAA Breach Watch

This specialized data breach disclosure monitor turns the Department of Health and Human Services (HHS) Office for Civil Rights (OCR) breach portal, commonly called the "Wall of Shame", into an automated pipeline of actionable intelligence. Legal professionals looking for class-action opportunities, cybersecurity vendors building targeted outreach lists, and risk analysts assessing threats in the healthcare market can extract security incidents directly from the official government pages. Instead of checking the portal by hand for newly reported HIPAA breaches, the scraper navigates the site and returns structured data: affected entity names, breach counts, incident types, and executive summaries.

Tracking breach disclosures by hand is tedious and error-prone, and it often means missed opportunities or delayed responses. With extraction automated, you can schedule daily or weekly runs and pick up fresh leads shortly after they are publicly disclosed. The scraped data includes essential contact context and verifiable, audit-ready documentation for every incident. Whether you need to feed a legal CRM with newly compromised healthcare providers or trigger automated alerts for your cybersecurity sales team, the scraper provides the details required to act quickly. Evidence and raw response headers are returned alongside the incident summaries, so your outreach and risk models rest on verified government data.

Store Quickstart

Run this actor with your target input. Results appear in the Apify Dataset and can be piped to webhooks for real-time delivery. Use dryRun to validate before committing to a schedule.

Key Features

  • 🛡️ Compliance-first — Produces audit-ready reports mapping findings to standards (WCAG, GDPR, SOC2)
  • 🔒 Non-invasive scanning — Uses only observable public signals — no intrusive probing
  • 📊 Severity-scored output — Each finding rated for criticality with remediation guidance
  • 📡 Delta-alerting — Flag new findings since last run via webhook delivery
  • 📋 Evidence export — Raw headers/responses captured for compliance documentation

Use Cases

Who | Why
Developers | Automate recurring data fetches without building custom scrapers
Data teams | Pipe structured output into analytics warehouses
Ops teams | Monitor changes via webhook alerts
Product managers | Track competitor/market signals without engineering time

Input

Field | Type | Default | Description
lookbackDays | integer | 30 | How many days back from today to include breach submissions. Breaches submitted before this window are ignored. Default
entityKeywords | string | "" | Comma-separated keywords to filter breaches by covered entity name. Case-insensitive substring match. Leave empty to inc
stateFilter | string | "" | Comma-separated two-letter US state codes to filter breaches (e.g. 'CA,TX,NY'). Leave empty to include all states.
minIndividualsAffected | integer | 500 | Only include breaches affecting at least this many individuals. The HHS portal only lists breaches affecting 500+, so va
maxBreachesInEvidence | integer | 50 | Maximum number of individual breach records to include in the evidence array. Keeps output manageable.
watchTerms | string | "" | Comma-separated terms that trigger actionNeeded=true when found in breach entity names or breach types. Use for competit
requestTimeoutSeconds | integer | 45 | HTTP timeout for the HHS portal request.
delivery | string | "dataset" | dataset: write digest to Apify dataset. webhook: POST digest to a URL.
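
The filter options above can be read as a small predicate over each breach record. The sketch below is not the actor's actual code. It is an illustration that assumes records shaped like the topEntities entries in the Output section, assumes the lookback window includes its boundary day, and uses hypothetical helper names (split_csv, matches_filters, hits_watch_terms).

```python
from datetime import date, timedelta


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option into trimmed, lowercased terms."""
    return [term.strip().lower() for term in value.split(",") if term.strip()]


def matches_filters(
    breach: dict,
    *,
    today: date,
    lookback_days: int = 30,
    entity_keywords: str = "",
    state_filter: str = "",
    min_individuals: int = 500,
) -> bool:
    """Return True if a breach record passes every configured filter."""
    # lookbackDays: ignore breaches submitted before the window starts.
    cutoff = today - timedelta(days=lookback_days)
    if date.fromisoformat(breach["submissionDate"]) < cutoff:
        return False
    # minIndividualsAffected: lower bound on the breach size.
    if breach["individualsAffected"] < min_individuals:
        return False
    # stateFilter: an empty filter means all states.
    states = split_csv(state_filter)
    if states and breach["state"].lower() not in states:
        return False
    # entityKeywords: case-insensitive substring match on the entity name.
    keywords = split_csv(entity_keywords)
    name = breach["entityName"].lower()
    if keywords and not any(keyword in name for keyword in keywords):
        return False
    return True


def hits_watch_terms(breach: dict, watch_terms: str) -> list[str]:
    """Return the watch terms found in the entity name or breach type."""
    haystack = f'{breach["entityName"]} {breach["breachType"]}'.lower()
    return [term for term in split_csv(watch_terms) if term in haystack]
```

Under these assumptions, a non-empty result from hits_watch_terms corresponds to the case where the digest would set actionNeeded to true.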

Input Example

{
  "lookbackDays": 30,
  "entityKeywords": "",
  "stateFilter": "",
  "minIndividualsAffected": 500,
  "maxBreachesInEvidence": 50,
  "watchTerms": "",
  "requestTimeoutSeconds": 45,
  "delivery": "dataset",
  "datasetMode": "all",
  "notifyOnNoChange": false,
  "snapshotKey": "hhs-breach-monitor-snapshots",
  "dryRun": false
}

Input Examples

Example: California recent breaches

{
  "stateFilter": "CA",
  "lookbackDays": 30
}

Example: Multi-state coverage

{
  "stateFilter": "CA,NY,MD,WA",
  "lookbackDays": 90
}

Example: Specific entity watchlist

{
  "stateFilter": "CA",
  "entityKeywords": "healthcare,insurance"
}
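
Because the Input table defines stateFilter, entityKeywords, and watchTerms as comma-separated strings, callers that hold these values as lists need to join them before starting a run. The helper below is a minimal sketch of that step. build_run_input is a hypothetical name, not part of the actor or the Apify client, and the validation rules are assumptions drawn from the field descriptions above.

```python
def build_run_input(
    states=(),
    entity_keywords=(),
    watch_terms=(),
    lookback_days: int = 30,
    min_individuals: int = 500,
    dry_run: bool = False,
) -> dict:
    """Build an input object using the field names from the Input table."""
    if lookback_days < 1:
        raise ValueError("lookback_days must be at least 1")
    # stateFilter expects two-letter US state codes such as 'CA'.
    codes = [state.strip().upper() for state in states]
    invalid = [code for code in codes if len(code) != 2 or not code.isalpha()]
    if invalid:
        raise ValueError(f"not two-letter state codes: {invalid}")
    return {
        "lookbackDays": lookback_days,
        "entityKeywords": ",".join(k.strip() for k in entity_keywords),
        "stateFilter": ",".join(codes),
        "minIndividualsAffected": min_individuals,
        "watchTerms": ",".join(t.strip() for t in watch_terms),
        "dryRun": dry_run,
    }
```

The returned dictionary can be passed as run_input to the Python client call shown under API Usage. Fields left out here fall back to the defaults listed in the Input table.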

Output

Field | Type | Description
queryId | string |
source | string |
checkedAt | timestamp |
windowStart | string |
windowEnd | string |
executiveSummary | string |
status | string |
actionNeeded | boolean |
totalBreachCount | number |
newBreachCount | number |
totalIndividualsAffected | number |
breachTypeSummary | object |
topEntities | array |
watchTermHits | array |
recommendedActions | array |
changedSinceLastRun | boolean |
evidence | array |
meta | object |
topEntities[].entityName | string |
topEntities[].state | string |
topEntities[].individualsAffected | number |
topEntities[].breachType | string |
topEntities[].submissionDate | string |

Output Example

{
  "queryId": "hhs-breach-30d",
  "source": "hhs_ocr_breach_portal",
  "checkedAt": "2025-06-15T12:00:00.000Z",
  "windowStart": "2025-05-16",
  "windowEnd": "2025-06-15",
  "executiveSummary": "18 HIPAA breaches reported in the last 30 days affecting 142,853 individuals. 3 new since last run. 1 watch-term hit: UnitedHealth Group (54,200 affected).",
  "status": "action_needed",
  "actionNeeded": true,
  "totalBreachCount": 18,
  "newBreachCount": 3,
  "totalIndividualsAffected": 142853,
  "breachTypeSummary": {
    "Hacking/IT Incident": 12,
    "Unauthorized Access/Disclosure": 4,
    "Theft": 1,
    "Loss": 1
  },
  "topEntities": [
    {
      "entityName": "UnitedHealth Group",
      "state": "MN",
      "individualsAffected": 54200,
      "breachType": "Hacking/IT Incident",
      "submissionDate": "2025-06-10"
    },
    {
      "entityName": "Regional Medical Center of San Jose",
      "state": "CA",
      "individualsAffected": 32100,
      "breachType": "Hacking/IT Incident",
      "submissionDate": "2025-06-05"
    },
    {
      "entityName": "Blue Cross of Texas",
      "state": "TX",
      "individualsAffected": 21000,
      "breachType": "Unauthorized Access/Disclosure",
      "submissionDate": "2025-05-28"
    }
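
Once a digest like the one above arrives from the dataset or a webhook, a consumer usually reduces it to a few alert lines. The following is a rough sketch of that step, assuming the digest is already parsed into a dictionary with the fields from the Output table. summarize_digest and its min_affected threshold are hypothetical and not part of the actor.

```python
def summarize_digest(digest: dict, min_affected: int = 0) -> list[str]:
    """Turn one digest into short alert lines, largest breaches first."""
    lines = [
        f'{digest["status"]}: {digest["newBreachCount"]} new of '
        f'{digest["totalBreachCount"]} breaches'
    ]
    # Sort the top entities by size so the biggest breach leads the alert.
    entities = sorted(
        digest.get("topEntities", []),
        key=lambda entity: entity["individualsAffected"],
        reverse=True,
    )
    for entity in entities:
        if entity["individualsAffected"] >= min_affected:
            lines.append(
                f'{entity["entityName"]} ({entity["state"]}): '
                f'{entity["individualsAffected"]:,} affected, '
                f'{entity["breachType"]}'
            )
    return lines
```

A caller might post these lines to a chat channel or CRM only when the digest's actionNeeded field is true, which keeps quiet runs from generating noise.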

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~data-breach-disclosure-monitor/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "lookbackDays": 30, "entityKeywords": "", "stateFilter": "", "minIndividualsAffected": 500, "maxBreachesInEvidence": 50, "watchTerms": "", "requestTimeoutSeconds": 45, "delivery": "dataset", "datasetMode": "all", "notifyOnNoChange": false, "snapshotKey": "hhs-breach-monitor-snapshots", "dryRun": false }'

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/data-breach-disclosure-monitor").call(run_input={
    "lookbackDays": 30,
    "entityKeywords": "",
    "stateFilter": "",
    "minIndividualsAffected": 500,
    "maxBreachesInEvidence": 50,
    "watchTerms": "",
    "requestTimeoutSeconds": 45,
    "delivery": "dataset",
    "datasetMode": "all",
    "notifyOnNoChange": False,
    "snapshotKey": "hhs-breach-monitor-snapshots",
    "dryRun": False,
})

# Print every item the run wrote to its default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/data-breach-disclosure-monitor').call({
"lookbackDays": 30,
"entityKeywords": "",
"stateFilter": "",
"minIndividualsAffected": 500,
"maxBreachesInEvidence": 50,
"watchTerms": "",
"requestTimeoutSeconds": 45,
"delivery": "dataset",
"datasetMode": "all",
"notifyOnNoChange": false,
"snapshotKey": "hhs-breach-monitor-snapshots",
"dryRun": false
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Schedule weekly runs against your production domains to catch config drift.
  • Use webhook delivery to pipe findings into your SIEM (Splunk, Datadog, Elastic).
  • For CI integration, block releases on critical severity findings using exit codes.
  • Combine with ssl-certificate-monitor for layered cert + headers coverage.
  • Findings include links to official remediation docs — share with dev teams via the webhook payload.

FAQ

Is running this against a third-party site legal?

Passive public-header scanning is generally permitted, but follow your own compliance policies. Only scan sites you have authorization for.

How often should I scan?

Weekly for production domains; daily if you have high config-change velocity.

Can I export to a compliance tool?

Use webhook delivery or Dataset API — formats map well to Drata, Vanta, OneTrust import templates.

Is this a penetration test?

No — this actor performs passive compliance scanning only. No exploitation, fuzzing, or auth bypass.

Does this qualify as a SOC2 control?

This actor produces evidence artifacts suitable for SOC2 CC7.1 (continuous monitoring). It is not itself a SOC2 certification.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01

No subscription required — you only pay for what you use.
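
To budget for scheduled runs, the pay-per-event formula above can be wrapped in a small helper. This is a sketch based only on the two prices listed in this section. estimate_run_cost is a hypothetical name, and the number of dataset items a run actually produces depends on your input and is not specified here.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")   # flat fee per run (actor-start)
PER_ITEM_FEE = Decimal("0.003")     # fee per output item (dataset-item)


def estimate_run_cost(items: int, runs: int = 1) -> Decimal:
    """Estimate the cost in USD for `items` total output items over `runs` runs."""
    if items < 0 or runs < 1:
        raise ValueError("items must be >= 0 and runs must be >= 1")
    return ACTOR_START_FEE * runs + PER_ITEM_FEE * items


# Reproduces the example above: one run that emits 1,000 items.
print(estimate_run_cost(1000))  # 3.010
```

Decimal is used instead of float so that fractions of a cent add up exactly.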

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.