Pricing

from $0.02 per 1,000 breach reports saved

HHS Data Breach Scraper

Extract public HIPAA breach reports from the HHS OCR portal for compliance monitoring, cybersecurity research, and legal lead workflows.

Developer

Stas Persiianenko

Last modified

4 days ago

What does HHS Data Breach Scraper do?

HHS Data Breach Scraper collects rows from the public U.S. Department of Health and Human Services Office for Civil Rights breach portal. It turns the public HIPAA breach report table into clean JSON records for monitoring, compliance dashboards, legal lead generation, and cybersecurity research.

Who is it for?

🏥 Healthcare compliance teams monitoring newly reported HIPAA breaches.
🛡️ Cybersecurity vendors tracking healthcare incidents and affected organizations.
⚖️ Legal and insurance teams building breach-response lead lists.
📊 Data teams maintaining internal breach intelligence dashboards.
🧾 Consultants preparing recurring reports for covered entities and business associates.

Why use this actor?

The HHS OCR portal is public, but its data sits behind a JSF/PrimeFaces table that is tedious to page through by hand and awkward to query programmatically. This actor handles the session, the ViewState token, and report-table pagination, then emits typed records that are ready for export.
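The actor's source is not shown here, so the following is only a minimal sketch of the ViewState step. It assumes the page renders the token the way standard JSF does, as a hidden `<input>` named `javax.faces.ViewState` (newer Jakarta Faces releases use a `jakarta.` prefix). In the usual JSF pattern, a client reads the token from the page and sends it back with each postback, for example when requesting the next table page. `ViewStateParser` and `extract_view_state` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumption: the token lives in a hidden <input> with one of the standard JSF names.
VIEW_STATE_FIELDS = {"javax.faces.ViewState", "jakarta.faces.ViewState"}


class ViewStateParser(HTMLParser):
    """Records the value of the first hidden ViewState <input> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.view_state: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are routed here by handle_startendtag.
        if tag != "input" or self.view_state is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") in VIEW_STATE_FIELDS:
            self.view_state = attributes.get("value")


def extract_view_state(html: str) -> Optional[str]:
    """Return the ViewState token, or None if the page has none."""
    parser = ViewStateParser()
    parser.feed(html)
    parser.close()
    return parser.view_state
```

Parsing with the standard library keeps the sketch dependency-free. A production scraper would more likely use a full HTML parser and also carry the session cookie between requests.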

Data source

The actor uses the public HHS OCR Breach Portal:

https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf

No login, private account, or captcha is required for the public report table.

Data fields

Field	Description
`coveredEntity`	Name of the covered entity in the HHS table
`state`	State or territory abbreviation
`coveredEntityType`	Covered entity type such as Healthcare Provider or Business Associate
`individualsAffected`	Number of affected individuals as an integer
`breachSubmissionDate`	Submission date normalized to `YYYY-MM-DD`
`breachSubmissionDateRaw`	Original HHS `MM/DD/YYYY` date
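`typeOfBreach`	List of breach types, such as Hacking/IT Incident
`locationOfBreachedInformation`	List of locations where the breached information was held, such as Network Server
`businessAssociatePresent`	Whether a business associate was involved in the breach (boolean, read from a hidden HHS column)
`webDescription`	Optional web description text, included when HHS provides it and includeWebDescription is enabled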
`hhsBreachId`	HHS table row key
`sourceUrl`	HHS report page URL
`scrapedAt`	Timestamp when the row was saved
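This sketch shows one way to derive a subset of the normalized fields from raw table cells. It rests on two assumptions about the HHS markup: multi-value cells are comma-separated, and counts may contain thousands separators. `normalize_row` and its helpers are hypothetical names and do not come from the actor's code.

```python
from datetime import datetime
from typing import Optional


def normalize_date(raw: str) -> str:
    """Convert an HHS-style MM/DD/YYYY date to ISO YYYY-MM-DD."""
    return datetime.strptime(raw.strip(), "%m/%d/%Y").date().isoformat()


def parse_count(raw: str) -> Optional[int]:
    """Parse an affected-individuals cell, tolerating thousands separators."""
    digits = raw.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def split_list(raw: str) -> list[str]:
    """Split a multi-value cell (assumption: values are comma-separated)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def normalize_row(cells: dict) -> dict:
    """Map raw string cells to a subset of the typed output fields."""
    raw_date = cells["breachSubmissionDateRaw"]
    return {
        "coveredEntity": cells["coveredEntity"].strip(),
        "state": cells["state"].strip(),
        "individualsAffected": parse_count(cells["individualsAffected"]),
        "breachSubmissionDate": normalize_date(raw_date),
        "breachSubmissionDateRaw": raw_date,
        "typeOfBreach": split_list(cells["typeOfBreach"]),
        "locationOfBreachedInformation": split_list(cells["locationOfBreachedInformation"]),
    }
```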

How much does it cost to scrape HHS data breach reports?

The actor uses pay-per-event pricing. There is a small start fee for each run and a per-record fee for each breach report saved. Use a small maxItems value for quick checks and larger values for scheduled backfills.
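You can make a rough cost estimate from the listed rate. This sketch uses the advertised "from" price of $0.02 per 1,000 records as its default, although the actual rate may vary. `start_fee` is a placeholder: the start fee is not stated on this page, so fill it in from the actor's current pricing. `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(records: int, price_per_1000: float = 0.02, start_fee: float = 0.0) -> float:
    """Rough pay-per-event estimate: one start fee plus a per-record fee.

    price_per_1000 defaults to the listed "from" price; start_fee is a
    placeholder because the real start fee is not given here.
    """
    if records < 0:
        raise ValueError("records must be non-negative")
    return start_fee + records * price_per_1000 / 1000
```

For example, a 100-record check at the listed rate adds $0.002 in record fees on top of the start fee.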

Input options

maxItems — maximum number of breach rows to save.
startPage — zero-based HHS report page to start from.
state — optional state abbreviation filter.
coveredEntityQuery — optional case-insensitive covered-entity name filter.
includeWebDescription — include the hidden web description field when available.
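This sketch shows how the two filters might be applied to rows after they are fetched. It assumes `state` is an exact two-letter match and `coveredEntityQuery` is a case-insensitive substring match; this page does not document the actor's exact matching rule. Empty strings mean "no filter", as in the example input. `apply_filters` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Optional


def apply_filters(rows: Iterable[dict], state: Optional[str] = None,
                  covered_entity_query: Optional[str] = None) -> Iterator[dict]:
    """Yield rows that match an optional state code and entity-name substring."""
    state_code = (state or "").strip().upper()
    query = (covered_entity_query or "").strip().casefold()
    for row in rows:
        # Assumption: state is compared as an exact, case-insensitive code.
        if state_code and row.get("state", "").upper() != state_code:
            continue
        # Assumption: the entity filter is a case-insensitive substring match.
        if query and query not in row.get("coveredEntity", "").casefold():
            continue
        yield row
```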

Example input

{
  "maxItems": 100,
  "startPage": 0,
  "state": "",
  "coveredEntityQuery": "",
  "includeWebDescription": true
}

Example output

{
  "coveredEntity": "JASON R EGBERT OD PC",
  "state": "WA",
  "coveredEntityType": "Healthcare Provider",
  "individualsAffected": 1225,
  "breachSubmissionDate": "2026-06-02",
  "breachSubmissionDateRaw": "06/02/2026",
  "typeOfBreach": ["Hacking/IT Incident"],
  "locationOfBreachedInformation": ["Network Server"],
  "businessAssociatePresent": true,
  "webDescription": null,
  "hhsBreachId": "1453895",
  "sourceUrl": "https://ocrportal.hhs.gov/ocr/breach/breach_report_hip.jsf",
  "scrapedAt": "2026-06-21T03:04:29.531Z"
}

How to run

Open the actor on Apify.
Set maxItems to the number of breach rows you need.
Optionally add a state or coveredEntityQuery filter.
Start the run.
Export the dataset as JSON, CSV, Excel, or via API.

Monitoring workflow

Schedule the actor daily or weekly with maxItems set to 100 or 200. Compare new hhsBreachId values against your previous dataset to detect newly disclosed breach reports.
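The comparison step amounts to a set difference on `hhsBreachId`. This is a minimal sketch that assumes you have already loaded the current and previous datasets as lists of records. `find_new_reports` is a hypothetical name.

```python
def find_new_reports(current_rows: list[dict], previous_rows: list[dict]) -> list[dict]:
    """Return rows whose hhsBreachId did not appear in the previous dataset."""
    seen = {row["hhsBreachId"] for row in previous_rows}
    new_rows = []
    for row in current_rows:
        breach_id = row["hhsBreachId"]
        if breach_id not in seen:
            new_rows.append(row)
            seen.add(breach_id)  # also drops duplicates within the current run
    return new_rows
```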

Compliance workflow

Compliance teams can use the output to enrich internal registers with affected-count totals, breach type, covered entity type, and submission date. The normalized fields reduce manual cleanup before loading the data into spreadsheets or BI tools.
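Apify's own CSV export is usually enough. If you want each list field in a single spreadsheet cell, though, you can flatten the records locally. This sketch joins list values with `"; "` and keeps a chosen column subset. The separator and column choice are arbitrary, and `rows_to_csv` is a hypothetical helper.

```python
import csv
import io

LIST_FIELDS = ("typeOfBreach", "locationOfBreachedInformation")
COLUMNS = ("coveredEntity", "state", "coveredEntityType", "individualsAffected",
           "breachSubmissionDate", "typeOfBreach", "locationOfBreachedInformation")


def rows_to_csv(rows: list[dict]) -> str:
    """Render records as CSV, joining list fields into one cell each."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields not in COLUMNS (e.g. hhsBreachId, scrapedAt).
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        for field in LIST_FIELDS:
            flat[field] = "; ".join(row.get(field) or [])
        writer.writerow(flat)
    return buffer.getvalue()
```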

Cybersecurity workflow

Security vendors can monitor healthcare breach disclosures, prioritize incidents by affected individuals, and identify covered entities that may need response services.
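To prioritize by affected individuals, you can apply a threshold and sort in descending order. In this sketch, the threshold, the top-N cutoff, and the name `prioritize` are illustrative choices. Records with a missing count are treated as 0.

```python
def prioritize(rows: list[dict], min_affected: int = 0, top: int = 10) -> list[dict]:
    """Return up to `top` rows with at least `min_affected` individuals, largest first."""
    def affected(row: dict) -> int:
        return row.get("individualsAffected") or 0  # missing counts sort as 0

    eligible = [row for row in rows if affected(row) >= min_affected]
    return sorted(eligible, key=affected, reverse=True)[:top]
```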

Lead generation workflow

Legal, insurance, and consulting teams can filter by state or entity name, then combine the results with CRM enrichment and outreach tools.

Tips

Start with maxItems: 100 for the newest portal page.
Use startPage for older pages when backfilling.
Keep scheduled runs modest in frequency and size; HHS is a public government portal.
Use hhsBreachId to de-duplicate records across runs.
Use breachSubmissionDate for chronological sorting.
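The last two tips combine naturally when you merge several runs. This sketch keeps one record per `hhsBreachId`, letting later runs overwrite earlier ones. It then sorts newest first, which works because ISO `YYYY-MM-DD` strings sort chronologically as plain strings. `merge_runs` is a hypothetical helper.

```python
def merge_runs(*runs: list[dict]) -> list[dict]:
    """Combine datasets from several runs, de-duplicated by hhsBreachId, newest first."""
    by_id: dict[str, dict] = {}
    for run in runs:
        for row in run:
            by_id[row["hhsBreachId"]] = row  # later runs win
    # breachSubmissionDate is ISO-formatted, so string order is date order.
    return sorted(by_id.values(), key=lambda r: r["breachSubmissionDate"], reverse=True)
```

Pass runs oldest first so that the most recent scrape of each report is the one kept.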

Limitations

The actor extracts the public report table as provided by HHS. If HHS changes JSF component names or the table structure, the actor may need an update. Filters are applied after rows are fetched from a portal page, so a very narrow filter can match few rows per page; you may need a higher maxItems value or a startPage-based backfill to collect enough matches.

Integrations

Export JSON to a data lake for breach intelligence.
Send CSV output to a compliance analyst.
Trigger alerts when a new hhsBreachId appears.
Join by coveredEntity with enrichment providers.
Use the Apify API to feed dashboards.

API usage with Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('automation-lab/hhs-data-breach-scraper').call({
  maxItems: 100,
  includeWebDescription: true
});
console.log(run.defaultDatasetId);

API usage with Python

from apify_client import ApifyClient
import os

client = ApifyClient(os.environ['APIFY_TOKEN'])
run = client.actor('automation-lab/hhs-data-breach-scraper').call(run_input={
    'maxItems': 100,
    'includeWebDescription': True,
})
print(run['defaultDatasetId'])

API usage with cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~hhs-data-breach-scraper/runs?token=$APIFY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"maxItems":100,"includeWebDescription":true}'

MCP usage

Use this actor from Apify MCP with:

https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper

Claude Code setup:

claude mcp add --transport http apify-hhs-breaches 'https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper'

Claude Desktop JSON config:

{
  "mcpServers": {
    "apify-hhs-breaches": {
      "url": "https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper"
    }
  }
}

Example prompts:

"Run the HHS data breach scraper for the newest 100 reports and summarize the largest incidents."
"Find California HIPAA breach reports from the latest HHS OCR page."
"Compare today's HHS breach IDs with yesterday's dataset."

Dataset exports

Apify datasets can be downloaded as JSON, CSV, Excel, XML, RSS, or HTML. For recurring monitoring, use the dataset API and store the latest hhsBreachId values in your own system.
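The dataset API lets you download the items of any run's dataset in a chosen format through `GET https://api.apify.com/v2/datasets/{datasetId}/items` with a `format` query parameter (`xlsx` for Excel). This sketch only builds that URL. To authenticate, add your token as in the cURL example or send it in an Authorization header. `dataset_export_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"
# Format values for the export types listed above.
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "rss", "html"}


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the URL for downloading a dataset's items in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{API_BASE}/datasets/{quote(dataset_id)}/items?{urlencode({'format': fmt})}"
```

Use the `defaultDatasetId` returned by the API examples above as `dataset_id`.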

Legality and responsible use

This actor collects publicly available government records from the HHS OCR Breach Portal. Always use the data responsibly and follow applicable privacy, compliance, and outreach rules. The actor does not bypass access controls or collect private account data.

Troubleshooting

If a run returns fewer items than expected, increase maxItems or remove narrow filters. If HHS changes its JSF table, open an issue with the run ID and logs so the extractor can be updated.

Related scrapers

Automation Lab also builds public-data and compliance-focused Apify actors. Use this actor alongside future security-header, trust-center, privacy, and government-record scrapers for broader risk monitoring.

FAQ

Does this actor need proxies?

No proxy is required for the public HHS OCR report table in normal operation.

Can it scrape all historical rows?

Yes, use a higher maxItems value. The actor paginates the PrimeFaces report table in 100-row batches.
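To plan a backfill, you can work out which portal pages a run touches from the page size. This sketch assumes 100 rows per page and zero-based `startPage` values, and it ignores filters, which can require reading extra pages. `page_range` is a hypothetical helper.

```python
import math

PAGE_SIZE = 100  # assumption: rows per PrimeFaces batch, as stated in this FAQ


def page_range(max_items: int, start_page: int = 0, page_size: int = PAGE_SIZE) -> range:
    """Zero-based portal pages needed to read max_items unfiltered rows."""
    pages = math.ceil(max_items / page_size)
    return range(start_page, start_page + pages)
```

For example, `page_range(250, start_page=3)` covers pages 3, 4, and 5.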

Can I filter by state?

Yes. Set state to a two-letter abbreviation such as CA or TX.

Can I monitor only new breaches?

Yes. Schedule the actor and compare new runs against previously stored hhsBreachId values.

Is this official HHS data?

The actor extracts the public HHS OCR breach report table, but the actor itself is not affiliated with or endorsed by HHS.

Changelog

Initial version: HTTP-only JSF extraction for the public HHS OCR HIPAA breach report table.

You might also like

⚖️ HHS Data Breach Scraper

taroyamada/data-breach-disclosure-monitor

Extract newly reported HIPAA security incidents from the HHS Wall of Shame. Target fresh legal leads and cybersecurity prospects using exact breach counts and entity details.

naoki anzai

HHS OCR HIPAA Breach Delta Monitor

zentrafoundry/hhs-ocr-hipaa-breach-delta-monitor

Track new and updated HIPAA breaches by entity, business associate, vendor, state, breach type, and affected-individual threshold.

Zentra

Texas Data Breach Reports Scraper

defenestrator/texas-data-breach-reports

Scrape the official public Texas Attorney General Data Security Breach Reports listing into structured rows for compliance monitoring, cybersecurity research, and public-record analysis. Unofficial; not affiliated with the Texas Attorney General or State of Texas.

Defenestrator

Have I Been Pwned Breaches Catalog Scraper

parseforge/hibp-breaches-catalog-scraper

Pull the entire Have I Been Pwned breach catalog with company logos, breach dates, account counts, and the categories of data exposed like email addresses, passwords, and IP addresses. Filter by domain or fetch one breach by name. Built for breach awareness and security research.

ParseForge

Email Data Breach Checker

lofomachines/email-breach-checker

Check if your email has been compromised in a data breach. Scan one or hundreds of emails in bulk to find leaked passwords, exposed accounts, and stolen credentials. Get a full risk score, breach history, and affected sites — fast, accurate, and affordable.

Lofomachines

HHS OIG Exclusions List Scraper

automation-lab/hhs-oig-exclusions-list-scraper

Download official HHS OIG LEIE exclusion records and monthly supplements for healthcare compliance, credentialing, and vendor screening.

Stas Persiianenko

CA Data Breach Notification Scraper

jungle_synthesizer/ca-oag-data-breach-notification-scraper

Scrapes the California Attorney General's SB 24 data-breach notification registry. Returns all reported breaches (organization, breach date, reported date, report URL) with optional detail-page enrichment for consumer notice letter text and PDF links.

BowTiedRaccoon

Hibp Breaches Scraper

velvety_bedbug/hibp-breaches-scraper

Scrape the public data breach database from HaveIBeenPwned. Returns breach name, domain, date, number of accounts compromised, and exposed data types. Filter by domain. Free public API, no auth required.

Peters Bugs

Security Breach Scraper (Saad Belcaid)

belcaidsaad/security-breach-scraper-saad-belcaid

A company can hide budget, hide priorities, but cannot hide a HIPAA breach once it hits that portal. So this scraper turns a public compliance record into a live demand signal.

Saad Belcaid

Credential Breach Checker

clearcheck.io/credential-breach-checker

Check emails, phones, passwords, names, and social IDs against breach and leak intelligence sources. Useful for security reviews, fraud prevention, KYC support, and investigative workflows. Results are decision-support signals.

Clearcheck Labs
