HHS Data Breach Scraper

Pricing

from $0.02 / 1,000 breach reports saved

Extract public HIPAA breach reports from the HHS OCR portal for compliance monitoring, cybersecurity research, and legal lead workflows.

Developer

Stas Persiianenko

Maintained by Community

Extract public HIPAA breach report rows from the HHS OCR Breach Portal.

What does HHS Data Breach Scraper do?

HHS Data Breach Scraper collects rows from the public U.S. Department of Health and Human Services Office for Civil Rights breach portal. It turns the public HIPAA breach report table into clean JSON records for monitoring, compliance dashboards, legal lead generation, and cybersecurity research.

Who is it for?

  • ๐Ÿฅ Healthcare compliance teams monitoring newly reported HIPAA breaches.
  • ๐Ÿ›ก๏ธ Cybersecurity vendors tracking healthcare incidents and affected organizations.
  • โš–๏ธ Legal and insurance teams building breach-response lead lists.
  • ๐Ÿ“Š Data teams maintaining internal breach intelligence dashboards.
  • ๐Ÿงพ Consultants preparing recurring reports for covered entities and business associates.

Why use this actor?

The HHS OCR portal is public, but the data is exposed through a JSF/PrimeFaces table that is inconvenient to automate manually. This actor handles the session, ViewState token, and report-table pagination, then emits typed records that are ready for export.
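To illustrate what handling the ViewState token involves, here is a minimal sketch of pulling the token out of a JSF page. It assumes the standard JSF hidden field name `javax.faces.ViewState` and that the `name` attribute appears before `value`; the portal's exact markup, the helper name `extract_view_state`, and the sample HTML are assumptions, not the actor's actual code.

```python
import re
from html import unescape
from typing import Optional

# Standard JSF hidden-field name. The real portal markup may differ (assumption).
VIEWSTATE_RE = re.compile(
    r'<input[^>]*name="javax\.faces\.ViewState"[^>]*value="([^"]*)"',
    re.IGNORECASE,
)


def extract_view_state(html: str) -> Optional[str]:
    """Return the ViewState token from a JSF page, or None if it is absent."""
    match = VIEWSTATE_RE.search(html)
    # Attribute values are HTML-escaped in the page source, so unescape them.
    return unescape(match.group(1)) if match else None


# Made-up sample page used only to exercise the helper.
SAMPLE_HTML = (
    '<form id="ocrForm">'
    '<input type="hidden" name="javax.faces.ViewState" '
    'id="j_id1:javax.faces.ViewState:0" value="-123:456" />'
    '</form>'
)

token = extract_view_state(SAMPLE_HTML)
```

In a JSF application the token generally has to be sent back with each subsequent POST in the same session, which is why pagination cannot be done with independent stateless requests.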

Data source

The actor uses the public HHS OCR Breach Portal:

https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf

No login, private account, or captcha is required for the public report table.

Data fields

Field                           Description
coveredEntity                   Name of the covered entity in the HHS table
state                           State or territory abbreviation
coveredEntityType               Covered entity type such as Healthcare Provider or Business Associate
individualsAffected             Number of affected individuals as an integer
breachSubmissionDate            Submission date normalized to YYYY-MM-DD
breachSubmissionDateRaw         Original HHS MM/DD/YYYY date
typeOfBreach                    Breach type list
locationOfBreachedInformation   Breached information location list
businessAssociatePresent        Boolean value from the HHS hidden column
webDescription                  Optional web description column when HHS provides it
hhsBreachId                     HHS table row key
sourceUrl                       HHS report page URL
scrapedAt                       Timestamp when the row was saved
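For typed consumers, the fields above could be modeled roughly as follows. This is a sketch: the type names are inferred from the field descriptions and the example output, and `BreachRecord` and `normalize_submission_date` are illustrative names rather than part of the actor.

```python
from datetime import datetime
from typing import List, Optional, TypedDict


class BreachRecord(TypedDict):
    """One saved row; types are inferred from the documented fields."""
    coveredEntity: str
    state: str
    coveredEntityType: str
    individualsAffected: int
    breachSubmissionDate: str      # YYYY-MM-DD
    breachSubmissionDateRaw: str   # MM/DD/YYYY as shown by HHS
    typeOfBreach: List[str]
    locationOfBreachedInformation: List[str]
    businessAssociatePresent: bool
    webDescription: Optional[str]
    hhsBreachId: str
    sourceUrl: str
    scrapedAt: str                 # ISO 8601 timestamp


def normalize_submission_date(raw: str) -> str:
    """Convert an HHS MM/DD/YYYY date into YYYY-MM-DD."""
    return datetime.strptime(raw, "%m/%d/%Y").date().isoformat()
```

The date helper shows how `breachSubmissionDateRaw` relates to `breachSubmissionDate`; it raises `ValueError` on input that is not a valid MM/DD/YYYY date.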

How much does it cost to scrape HHS data breach reports?

The actor uses pay-per-event pricing. There is a small start fee for each run and a per-record fee for each breach report saved. Use a small maxItems value for quick checks and larger values for scheduled backfills.
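As a rough budgeting aid, the arithmetic can be sketched like this. The per-record rate uses the listed "from $0.02 / 1,000" price, which may vary by plan. The start fee is not stated on this page, so `start_fee_usd` is a placeholder to fill in from the pricing tab.

```python
def estimate_run_cost(
    records: int,
    per_1000_usd: float = 0.02,   # listed "from" price; your tier may differ
    start_fee_usd: float = 0.0,   # placeholder: the start fee is not stated here
) -> float:
    """Estimate one run's cost as start fee plus per-record charges."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return start_fee_usd + (records / 1000) * per_1000_usd


quick_check = estimate_run_cost(100)     # small maxItems for a quick check
backfill = estimate_run_cost(50_000)     # larger scheduled backfill
```

At the listed rate, the per-record part of a 100-row run is a fraction of a cent, and a 50,000-row backfill is about one dollar before the start fee.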

Input options

  • maxItems: maximum number of breach rows to save.
  • startPage: zero-based HHS report page to start from.
  • state: optional state abbreviation filter.
  • coveredEntityQuery: optional case-insensitive covered-entity name filter.
  • includeWebDescription: include the hidden web description field when available.

Example input

{
    "maxItems": 100,
    "startPage": 0,
    "state": "",
    "coveredEntityQuery": "",
    "includeWebDescription": true
}

Example output

{
    "coveredEntity": "JASON R EGBERT OD PC",
    "state": "WA",
    "coveredEntityType": "Healthcare Provider",
    "individualsAffected": 1225,
    "breachSubmissionDate": "2026-06-02",
    "breachSubmissionDateRaw": "06/02/2026",
    "typeOfBreach": ["Hacking/IT Incident"],
    "locationOfBreachedInformation": ["Network Server"],
    "businessAssociatePresent": true,
    "webDescription": null,
    "hhsBreachId": "1453895",
    "sourceUrl": "https://ocrportal.hhs.gov/ocr/breach/breach_report_hip.jsf",
    "scrapedAt": "2026-06-21T03:04:29.531Z"
}

How to run

  1. Open the actor on Apify.
  2. Set maxItems to the number of breach rows you need.
  3. Optionally add a state or coveredEntityQuery filter.
  4. Start the run.
  5. Export the dataset as JSON, CSV, Excel, or via API.

Monitoring workflow

Schedule the actor daily or weekly with maxItems set to 100 or 200. Compare new hhsBreachId values against your previous dataset to detect newly disclosed breach reports.
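The comparison step might look like the sketch below. It assumes you keep the set of previously seen `hhsBreachId` values in your own storage; `find_new_reports` and the second sample row are illustrative.

```python
from typing import Dict, Iterable, List, Set


def find_new_reports(current_rows: Iterable[Dict], seen_ids: Set[str]) -> List[Dict]:
    """Return rows whose hhsBreachId was not seen before, preserving order."""
    return [row for row in current_rows if row["hhsBreachId"] not in seen_ids]


# IDs stored after an earlier run (illustrative values).
previous_ids = {"1453895", "1453001"}

todays_rows = [
    {"hhsBreachId": "1453895", "coveredEntity": "JASON R EGBERT OD PC"},
    {"hhsBreachId": "1454002", "coveredEntity": "Example Clinic"},  # made-up row
]

new_rows = find_new_reports(todays_rows, previous_ids)

# Remember the new IDs so the next scheduled run does not report them again.
previous_ids.update(row["hhsBreachId"] for row in new_rows)
```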

Compliance workflow

Compliance teams can use the output to enrich internal registers with affected-count totals, breach type, covered entity type, and submission date. The normalized fields reduce manual cleanup before loading the data into spreadsheets or BI tools.
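One possible aggregation for such a register is the affected-count total per breach type. In this sketch a row that lists several breach types is counted under each of them, which is a design choice and not something the actor prescribes; the sample rows are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def affected_by_breach_type(rows: Iterable[Dict]) -> Dict[str, int]:
    """Sum individualsAffected per breach type."""
    totals: Counter = Counter()
    for row in rows:
        # typeOfBreach is a list, so a row can contribute to several types.
        for breach_type in row["typeOfBreach"]:
            totals[breach_type] += row["individualsAffected"]
    return dict(totals)


sample_rows = [
    {"typeOfBreach": ["Hacking/IT Incident"], "individualsAffected": 1225},
    {"typeOfBreach": ["Hacking/IT Incident", "Theft"], "individualsAffected": 500},
]

totals = affected_by_breach_type(sample_rows)
```

Because multi-type rows are counted more than once, the per-type totals can add up to more than the overall number of affected individuals.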

Cybersecurity workflow

Security vendors can monitor healthcare breach disclosures, prioritize incidents by affected individuals, and identify covered entities that may need response services.
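Prioritizing by affected individuals can be as simple as picking the largest rows from a run. The helper name and sample data below are illustrative.

```python
import heapq
from typing import Dict, Iterable, List


def largest_incidents(rows: Iterable[Dict], n: int = 10) -> List[Dict]:
    """Return the n rows with the highest individualsAffected, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["individualsAffected"])


sample_rows = [
    {"coveredEntity": "Example Clinic A", "individualsAffected": 800},
    {"coveredEntity": "Example Clinic B", "individualsAffected": 120_000},
    {"coveredEntity": "Example Clinic C", "individualsAffected": 1225},
]

top_two = largest_incidents(sample_rows, n=2)
```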

Lead generation workflow

Legal, insurance, and consulting teams can filter by state or entity name, then combine the results with CRM enrichment and outreach tools.

Tips

  • Start with maxItems: 100 for the newest portal page.
  • Use startPage for older pages when backfilling.
  • Keep scheduled runs conservative; HHS is a public government portal.
  • Use hhsBreachId to de-duplicate records across runs.
  • Use breachSubmissionDate for chronological sorting.
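The last two tips can be combined when merging several runs. This sketch keeps one row per `hhsBreachId` (the most recently supplied copy wins) and sorts newest first; because `breachSubmissionDate` is in YYYY-MM-DD form, plain string sorting is chronological. `merge_runs` and the sample rows are illustrative.

```python
from typing import Dict, Iterable, List


def merge_runs(*runs: Iterable[Dict]) -> List[Dict]:
    """De-duplicate rows across runs by hhsBreachId, newest submission first."""
    by_id: Dict[str, Dict] = {}
    for run in runs:
        for row in run:
            # Later runs overwrite earlier copies of the same report.
            by_id[row["hhsBreachId"]] = row
    return sorted(
        by_id.values(),
        key=lambda row: row["breachSubmissionDate"],
        reverse=True,
    )


monday = [{"hhsBreachId": "1", "breachSubmissionDate": "2026-05-30"}]
tuesday = [
    {"hhsBreachId": "1", "breachSubmissionDate": "2026-05-30"},
    {"hhsBreachId": "2", "breachSubmissionDate": "2026-06-02"},
]

merged = merge_runs(monday, tuesday)
```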

Limitations

The actor extracts the public report table as provided by HHS. If HHS changes JSF component names or the table structure, the actor may need an update. Filters are applied after fetching rows from the portal page, so very narrow filters may require a higher maxItems or startPage strategy.
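The effect of post-fetch filtering is easier to see with a small simulation. The sketch assumes `state` is an exact match on the abbreviation and `coveredEntityQuery` is a case-insensitive substring match; the actor's exact matching rules are not documented here, and the page contents are made up.

```python
from typing import Dict, Iterable, List


def apply_filters(
    rows: Iterable[Dict],
    state: str = "",
    covered_entity_query: str = "",
) -> List[Dict]:
    """Filter already-fetched rows; empty filter values match everything."""
    wanted_state = state.strip().upper()
    query = covered_entity_query.strip().lower()
    return [
        row
        for row in rows
        if (not wanted_state or row["state"] == wanted_state)
        and (not query or query in row["coveredEntity"].lower())
    ]


# One simulated 100-row portal page with a single Washington entry.
page = [{"state": "WA", "coveredEntity": "JASON R EGBERT OD PC"}] + [
    {"state": "TX", "coveredEntity": f"Example Clinic {i}"} for i in range(99)
]

matches = apply_filters(page, state="wa")
```

Here a full page of 100 fetched rows yields one match, which is why a narrow filter can need many fetched pages to produce a useful number of rows.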

Integrations

  • Export JSON to a data lake for breach intelligence.
  • Send CSV output to a compliance analyst.
  • Trigger alerts when a new hhsBreachId appears.
  • Join by coveredEntity with enrichment providers.
  • Use the Apify API to feed dashboards.

API usage with Node.js

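import { ApifyClient } from 'apify-client';

// Read the API token from the environment.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/hhs-data-breach-scraper').call({
    maxItems: 100,
    includeWebDescription: true,
});

// The run's default dataset holds the saved breach records.
console.log(run.defaultDatasetId);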

API usage with Python

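import os

from apify_client import ApifyClient

# Read the API token from the environment.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/hhs-data-breach-scraper').call(run_input={
    'maxItems': 100,
    'includeWebDescription': True,
})

# The run's default dataset holds the saved breach records.
print(run['defaultDatasetId'])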
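The Python example above prints only the dataset ID. To read the saved rows, a small helper along these lines could be used. It relies on the `dataset(...).iterate_items()` call from the `apify-client` package and takes the client as a parameter so it can be tried without credentials; `fetch_breach_rows` is an illustrative name.

```python
from typing import Any, Dict, List


def fetch_breach_rows(client: Any, dataset_id: str) -> List[Dict]:
    """Load all items of a dataset into a list.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    return list(client.dataset(dataset_id).iterate_items())
```

With the variables from the example above, this would be called as `rows = fetch_breach_rows(client, run['defaultDatasetId'])`. For very large backfills, iterating instead of building a list keeps memory use down.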

API usage with cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~hhs-data-breach-scraper/runs?token=$APIFY_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{"maxItems":100,"includeWebDescription":true}'

MCP usage

Use this actor from Apify MCP with:

https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper

Claude Code setup:

$ claude mcp add apify-hhs-breaches "https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper"

Claude Desktop JSON config:

{
    "mcpServers": {
        "apify-hhs-breaches": {
            "url": "https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper"
        }
    }
}

Example prompts:

  • "Run the HHS data breach scraper for the newest 100 reports and summarize the largest incidents."
  • "Find California HIPAA breach reports from the latest HHS OCR page."
  • "Compare today's HHS breach IDs with yesterday's dataset."

Dataset exports

Apify datasets can be downloaded as JSON, CSV, Excel, XML, RSS, or HTML. For recurring monitoring, use the dataset API and store the latest hhsBreachId values in your own system.
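For scripted downloads, the export URL for a dataset can be built as sketched below, using the Apify API's dataset items endpoint and its `format` query parameter. The helper name is illustrative, and a private dataset would additionally need your API token.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

# Format codes for the exports listed above (Excel is requested as "xlsx").
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "rss", "html"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the download URL for a dataset in the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


csv_url = dataset_items_url("abc123", "csv")  # "abc123" is a placeholder ID
```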

Legality and responsible use

This actor collects publicly available government records from the HHS OCR Breach Portal. Always use the data responsibly and follow applicable privacy, compliance, and outreach rules. The actor does not bypass access controls or collect private account data.

Troubleshooting

If a run returns fewer items than expected, increase maxItems or remove narrow filters. If HHS changes its JSF table, open an issue with the run ID and logs so the extractor can be updated.

Automation Lab also builds public-data and compliance-focused Apify actors. Use this actor alongside future security-header, trust-center, privacy, and government-record scrapers for broader risk monitoring.

FAQ

Does this actor need proxies?

No proxy is required for the public HHS OCR report table in normal operation.

Can it scrape all historical rows?

Yes. Use a higher maxItems value. The actor paginates the PrimeFaces report table in 100-row batches.
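As a planning aid, the relationship between maxItems, startPage, and the 100-row batches can be sketched as follows. It assumes no filters are set and every page is full, so the real number of pages fetched may differ; `pages_to_fetch` is an illustrative helper.

```python
import math
from typing import List

PAGE_SIZE = 100  # batch size stated in this FAQ


def pages_to_fetch(max_items: int, start_page: int = 0) -> List[int]:
    """List the zero-based page indexes needed to collect max_items rows."""
    if max_items <= 0:
        return []
    page_count = math.ceil(max_items / PAGE_SIZE)
    return list(range(start_page, start_page + page_count))


first_250 = pages_to_fetch(250)             # pages 0, 1, 2
resume = pages_to_fetch(100, start_page=5)  # page 5 only
```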

Can I filter by state?

Yes. Set state to a two-letter abbreviation such as CA or TX.

Can I monitor only new breaches?

Yes. Schedule the actor and compare new runs against previously stored hhsBreachId values.

Is this official HHS data?

The actor extracts the public HHS OCR breach report table, but the actor itself is not affiliated with or endorsed by HHS.

Changelog

  • Initial version: HTTP-only JSF extraction for the public HHS OCR HIPAA breach report table.