HHS Data Breach Scraper
Pricing
from $0.02 / 1,000 breach reports saved
Extract public HIPAA breach reports from the HHS OCR portal for compliance monitoring, cybersecurity research, and legal lead workflows.
Developer
Stas Persiianenko
Maintained by Community
Extract public HIPAA breach report rows from the HHS OCR Breach Portal.
What does HHS Data Breach Scraper do?
HHS Data Breach Scraper collects rows from the public U.S. Department of Health and Human Services Office for Civil Rights breach portal. It turns the public HIPAA breach report table into clean JSON records for monitoring, compliance dashboards, legal lead generation, and cybersecurity research.
Who is it for?
- Healthcare compliance teams monitoring newly reported HIPAA breaches.
- Cybersecurity vendors tracking healthcare incidents and affected organizations.
- Legal and insurance teams building breach-response lead lists.
- Data teams maintaining internal breach intelligence dashboards.
- Consultants preparing recurring reports for covered entities and business associates.
Why use this actor?
The HHS OCR portal is public, but the data is exposed through a JSF/PrimeFaces table that is inconvenient to automate manually. This actor handles the session, ViewState token, and report-table pagination, then emits typed records that are ready for export.
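To make the ViewState step concrete, here is a minimal sketch of pulling a JSF ViewState token out of a page. It is not the actor's own code. It assumes the portal uses the conventional JSF hidden input named `javax.faces.ViewState`, and the HTML fragment below is a made-up sample rather than a copy of the HHS page.

```python
from html.parser import HTMLParser
from typing import Optional


class ViewStateParser(HTMLParser):
    """Collects the value of the first hidden javax.faces.ViewState input."""

    def __init__(self) -> None:
        super().__init__()
        self.view_state: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <input> elements can carry the token; stop after the first hit.
        if tag != "input" or self.view_state is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "javax.faces.ViewState":
            self.view_state = attributes.get("value")


def extract_view_state(html: str) -> Optional[str]:
    """Return the ViewState token found in the HTML, or None if absent."""
    parser = ViewStateParser()
    parser.feed(html)
    return parser.view_state


# Hypothetical page fragment used only for illustration.
SAMPLE_HTML = """
<form id="ocrForm" method="post">
  <input type="hidden" name="javax.faces.ViewState" value="abc123:456" />
</form>
"""

view_state = extract_view_state(SAMPLE_HTML)
print(view_state)
```

In a JSF application the token usually has to be sent back with each pagination POST in the same session, which is why a plain GET loop is not enough for this kind of table.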
Data source
The actor uses the public HHS OCR Breach Portal:
https://ocrportal.hhs.gov/ocr/breach/breach_report.jsf
No login, private account, or captcha is required for the public report table.
Data fields
| Field | Description |
|---|---|
| coveredEntity | Name of the covered entity in the HHS table |
| state | State or territory abbreviation |
| coveredEntityType | Covered entity type such as Healthcare Provider or Business Associate |
| individualsAffected | Number of affected individuals as an integer |
| breachSubmissionDate | Submission date normalized to YYYY-MM-DD |
| breachSubmissionDateRaw | Original HHS MM/DD/YYYY date |
| typeOfBreach | Breach type list |
| locationOfBreachedInformation | Breached information location list |
| businessAssociatePresent | Boolean value from the HHS hidden column |
| webDescription | Optional web description column when HHS provides it |
| hhsBreachId | HHS table row key |
| sourceUrl | HHS report page URL |
| scrapedAt | Timestamp when the row was saved |
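The normalized fields in the table can be illustrated with a short sketch. The helper names are hypothetical, and the comma-separated cell format for list fields and counts is an assumption, not something confirmed by the actor's source.

```python
from datetime import datetime


def normalize_date(raw: str) -> str:
    """Convert an MM/DD/YYYY string to YYYY-MM-DD."""
    return datetime.strptime(raw.strip(), "%m/%d/%Y").date().isoformat()


def parse_count(raw: str) -> int:
    """Parse a count such as '1,225' into an integer (assumed cell format)."""
    return int(raw.replace(",", "").strip())


def split_list(raw: str) -> list[str]:
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


print(normalize_date("06/02/2026"))
print(parse_count("1,225"))
print(split_list("Hacking/IT Incident, Theft"))
```

Keeping both `breachSubmissionDate` and `breachSubmissionDateRaw` lets you sort on the ISO value while still being able to check it against the original portal text.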
How much does it cost to scrape HHS data breach reports?
The actor uses pay-per-event pricing.
There is a small start fee for each run and a per-record fee for each breach report saved.
Use a small maxItems value for quick checks and larger values for scheduled backfills.
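A rough budget check for the pricing model above might look like the sketch below. The per-record rate reuses the "from $0.02 / 1,000" figure shown on the listing, which is a lower bound. The start fee is a placeholder value, since this page does not state it.

```python
def estimate_run_cost(records: int, start_fee_usd: float, per_1000_usd: float) -> float:
    """Start fee plus the per-record fee, rounded to 4 decimal places."""
    return round(start_fee_usd + records / 1000 * per_1000_usd, 4)


# start_fee_usd=0.01 is a hypothetical placeholder; check the actor's pricing tab.
estimate = estimate_run_cost(5000, start_fee_usd=0.01, per_1000_usd=0.02)
print(f"Estimated cost for 5,000 records: ${estimate}")
```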
Input options
- `maxItems`: maximum number of breach rows to save.
- `startPage`: zero-based HHS report page to start from.
- `state`: optional state abbreviation filter.
- `coveredEntityQuery`: optional case-insensitive covered-entity name filter.
- `includeWebDescription`: include the hidden web description field when available.
Example input
    {
      "maxItems": 100,
      "startPage": 0,
      "state": "",
      "coveredEntityQuery": "",
      "includeWebDescription": true
    }
Example output
    {
      "coveredEntity": "JASON R EGBERT OD PC",
      "state": "WA",
      "coveredEntityType": "Healthcare Provider",
      "individualsAffected": 1225,
      "breachSubmissionDate": "2026-06-02",
      "breachSubmissionDateRaw": "06/02/2026",
      "typeOfBreach": ["Hacking/IT Incident"],
      "locationOfBreachedInformation": ["Network Server"],
      "businessAssociatePresent": true,
      "webDescription": null,
      "hhsBreachId": "1453895",
      "sourceUrl": "https://ocrportal.hhs.gov/ocr/breach/breach_report_hip.jsf",
      "scrapedAt": "2026-06-21T03:04:29.531Z"
    }
How to run
- Open the actor on Apify.
- Set `maxItems` to the number of breach rows you need.
- Optionally add a `state` or `coveredEntityQuery` filter.
- Start the run.
- Export the dataset as JSON, CSV, Excel, or via API.
Monitoring workflow
Schedule the actor daily or weekly with maxItems set to 100 or 200.
Compare new hhsBreachId values against your previous dataset to detect newly disclosed breach reports.
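The comparison step can be sketched as a small set difference. The function name and the sample records are illustrative only. The sketch assumes you keep the previous run's `hhsBreachId` values somewhere you control.

```python
def find_new_reports(previous_ids, current_items):
    """Return items whose hhsBreachId was not seen before, preserving order."""
    seen = set(previous_ids)
    fresh = []
    for item in current_items:
        breach_id = item.get("hhsBreachId")
        # Skip rows without an ID and rows already seen in this or earlier runs.
        if breach_id and breach_id not in seen:
            seen.add(breach_id)
            fresh.append(item)
    return fresh


# Hypothetical data for illustration.
previous_ids = {"1001", "1002"}
current_items = [
    {"hhsBreachId": "1002", "coveredEntity": "Example Clinic A"},
    {"hhsBreachId": "1003", "coveredEntity": "Example Clinic B"},
    {"hhsBreachId": "1003", "coveredEntity": "Example Clinic B"},
]

new_reports = find_new_reports(previous_ids, current_items)
print([item["hhsBreachId"] for item in new_reports])
```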
Compliance workflow
Compliance teams can use the output to enrich internal registers with affected-count totals, breach type, covered entity type, and submission date. The normalized fields reduce manual cleanup before loading the data into spreadsheets or BI tools.
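As one possible way to build the affected-count totals mentioned above, the sketch below groups exported records by breach type. The function name and sample rows are hypothetical. A record that lists several breach types is counted under each of them, so the per-type totals can add up to more than the overall total.

```python
from collections import defaultdict


def summarize_by_breach_type(items):
    """Count reports and sum individualsAffected for each breach type."""
    totals = defaultdict(lambda: {"reports": 0, "individualsAffected": 0})
    for item in items:
        affected = item.get("individualsAffected") or 0
        # Records without a breach type are grouped under "Unknown".
        for breach_type in item.get("typeOfBreach") or ["Unknown"]:
            totals[breach_type]["reports"] += 1
            totals[breach_type]["individualsAffected"] += affected
    return dict(totals)


# Hypothetical records for illustration.
sample_items = [
    {"typeOfBreach": ["Hacking/IT Incident"], "individualsAffected": 1225},
    {"typeOfBreach": ["Hacking/IT Incident", "Theft"], "individualsAffected": 500},
    {"typeOfBreach": [], "individualsAffected": None},
]

summary = summarize_by_breach_type(sample_items)
print(summary)
```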
Cybersecurity workflow
Security vendors can monitor healthcare breach disclosures, prioritize incidents by affected individuals, and identify covered entities that may need response services.
Lead generation workflow
Legal, insurance, and consulting teams can filter by state or entity name, then combine the results with CRM enrichment and outreach tools.
Tips
- Start with `maxItems: 100` for the newest portal page.
- Use `startPage` for older pages when backfilling.
- Keep scheduled runs conservative; HHS is a public government portal.
- Use `hhsBreachId` to de-duplicate records across runs.
- Use `breachSubmissionDate` for chronological sorting.
Limitations
The actor extracts the public report table as provided by HHS.
If HHS changes JSF component names or the table structure, the actor may need an update.
Filters are applied after rows are fetched from the portal page, so a very narrow filter may match only a few rows per page. To collect more matches, raise maxItems or run additional pages with startPage.
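The effect of post-fetch filtering can be shown with a small sketch. This is not the actor's implementation. The function names are hypothetical, and the sketch only illustrates why a page of fetched rows can yield very few saved rows once a narrow filter is applied.

```python
def matches(item, state="", covered_entity_query=""):
    """Apply an exact state match and a case-insensitive name substring match."""
    if state and (item.get("state") or "").upper() != state.upper():
        return False
    if covered_entity_query:
        name = (item.get("coveredEntity") or "").casefold()
        if covered_entity_query.casefold() not in name:
            return False
    return True


def filter_rows(rows, max_items, **filters):
    """Keep matching rows from already-fetched pages, up to max_items."""
    kept = []
    for row in rows:
        if matches(row, **filters):
            kept.append(row)
            if len(kept) >= max_items:
                break
    return kept


# Hypothetical fetched page for illustration.
page_rows = [
    {"coveredEntity": "Example Health System", "state": "CA"},
    {"coveredEntity": "Sample Dental Group", "state": "TX"},
    {"coveredEntity": "EXAMPLE Eye Care", "state": "CA"},
]

california = filter_rows(page_rows, max_items=100, state="ca")
named = filter_rows(page_rows, max_items=100, covered_entity_query="example", state="TX")
print(len(california), len(named))
```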
Integrations
- Export JSON to a data lake for breach intelligence.
- Send CSV output to a compliance analyst.
- Trigger alerts when a new `hhsBreachId` appears.
- Join by `coveredEntity` with enrichment providers.
- Use the Apify API to feed dashboards.
API usage with Node.js
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
    const run = await client.actor('automation-lab/hhs-data-breach-scraper').call({
        maxItems: 100,
        includeWebDescription: true,
    });
    console.log(run.defaultDatasetId);
API usage with Python
    import os

    from apify_client import ApifyClient

    client = ApifyClient(os.environ['APIFY_TOKEN'])
    run = client.actor('automation-lab/hhs-data-breach-scraper').call(run_input={
        'maxItems': 100,
        'includeWebDescription': True,
    })
    print(run['defaultDatasetId'])
API usage with cURL
    curl -X POST "https://api.apify.com/v2/acts/automation-lab~hhs-data-breach-scraper/runs?token=$APIFY_TOKEN" \
      -H 'Content-Type: application/json' \
      -d '{"maxItems":100,"includeWebDescription":true}'
MCP usage
Use this actor from Apify MCP with:
https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper
Claude Code setup:
    claude mcp add apify-hhs-breaches "https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper"
Claude Desktop JSON config:
    {
      "mcpServers": {
        "apify-hhs-breaches": {
          "url": "https://mcp.apify.com/?tools=automation-lab/hhs-data-breach-scraper"
        }
      }
    }
Example prompts:
- "Run the HHS data breach scraper for the newest 100 reports and summarize the largest incidents."
- "Find California HIPAA breach reports from the latest HHS OCR page."
- "Compare today's HHS breach IDs with yesterday's dataset."
Dataset exports
Apify datasets can be downloaded as JSON, CSV, Excel, XML, RSS, or HTML.
For recurring monitoring, use the dataset API and store the latest hhsBreachId values in your own system.
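One way to read a finished run's dataset for recurring monitoring is the Apify dataset items endpoint, sketched below with the standard library only. The helper names and the dataset ID `abc` are hypothetical. The `format` and `fields` query parameters are part of the Apify dataset items API, but check the current API reference before relying on this sketch.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "rss", "html"}


def dataset_items_url(dataset_id: str, fmt: str = "json", **params) -> str:
    """Build the dataset items URL for a given export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    query = urlencode({"format": fmt, **params})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def fetch_breach_ids(dataset_id: str, token: str) -> set:
    """Download only the hhsBreachId field and return the IDs as a set."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, fields="hhsBreachId"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        items = json.load(response)
    return {item["hhsBreachId"] for item in items if item.get("hhsBreachId")}


# "abc" is a placeholder dataset ID; no network request is made here.
csv_url = dataset_items_url("abc", "csv")
print(csv_url)
```

Storing the set returned by `fetch_breach_ids` after each run gives you the baseline for the next comparison.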
Legality and responsible use
This actor collects publicly available government records from the HHS OCR Breach Portal. Always use the data responsibly and follow applicable privacy, compliance, and outreach rules. The actor does not bypass access controls or collect private account data.
Troubleshooting
If a run returns fewer items than expected, increase maxItems or remove narrow filters.
If HHS changes its JSF table, open an issue with the run ID and logs so the extractor can be updated.
Related scrapers
Automation Lab also builds public-data and compliance-focused Apify actors. Use this actor alongside future security-header, trust-center, privacy, and government-record scrapers for broader risk monitoring.
FAQ
Does this actor need proxies?
No proxy is required for the public HHS OCR report table in normal operation.
Can it scrape all historical rows?
Yes. Use a higher maxItems value. The actor paginates the PrimeFaces report table in 100-row batches.
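The 100-row batch size makes backfill planning simple arithmetic. The sketch below splits a large backfill into several run inputs. It assumes no filters are set and that a run started at `startPage` p begins at row p × 100, which this page does not state explicitly, so treat it as a planning aid rather than documented behavior.

```python
import math

ROWS_PER_PAGE = 100  # batch size stated in the FAQ answer above


def pages_needed(max_items: int) -> int:
    """Number of 100-row portal pages needed to cover max_items rows."""
    return math.ceil(max_items / ROWS_PER_PAGE)


def backfill_inputs(total_rows: int, rows_per_run: int) -> list[dict]:
    """Split a backfill into run inputs that start at successive pages."""
    if rows_per_run <= 0 or rows_per_run % ROWS_PER_PAGE != 0:
        raise ValueError("rows_per_run must be a positive multiple of 100")
    inputs = []
    for offset in range(0, total_rows, rows_per_run):
        inputs.append({
            "startPage": offset // ROWS_PER_PAGE,
            "maxItems": min(rows_per_run, total_rows - offset),
        })
    return inputs


plan = backfill_inputs(450, 200)
print(pages_needed(450), plan)
```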
Can I filter by state?
Yes. Set state to a two-letter abbreviation such as CA or TX.
Can I monitor only new breaches?
Yes. Schedule the actor and compare new runs against previously stored hhsBreachId values.
Is this official HHS data?
The actor extracts the public HHS OCR breach report table, but the actor itself is not affiliated with or endorsed by HHS.
Changelog
- Initial version: HTTP-only JSF extraction for the public HHS OCR HIPAA breach report table.