CA Data Breach Notification Scraper
Pricing
Pay per event
Scrapes the California Attorney General's SB 24 data-breach notification registry. Returns all reported breaches (organization, breach date, reported date, report URL) with optional detail-page enrichment for consumer notice letter text and PDF links.
Developer
BowTiedRaccoon
Maintained by Community
Scrapes the California Attorney General's SB 24 Data Breach Notification Registry, the authoritative public feed of California data breach notices under Cal. Civ. Code § 1798.82. The registry holds over 5,000 breach filings dating back to 2012 and is updated as organizations file new SB 24 notices.
What it does
The AG breach list is a server-rendered Drupal Views table. All records are delivered in a single HTML response — no pagination, no JavaScript required. This actor:
- Fetches the listing page and extracts all breach records (organization, breach date(s), reported date, report URL)
- Optionally enriches each record by fetching the individual report detail page for the consumer notification letter text and the sample notice PDF link
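The listing step can be sketched with the standard library alone. This is a minimal illustration, not the actor's actual parser: the column order, the `views-table` markup, the `/ecrime/databreach/reports/` path, and the `BreachTableParser` and `parse_listing` names are all assumptions, and `SAMPLE_HTML` is made-up data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://oag.ca.gov"  # host named in the output fields; paths below are assumed


class BreachTableParser(HTMLParser):
    """Collects one dict per table row that contains <td> cells."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._cells = None     # cell texts of the current row, None outside a row
        self._in_cell = False
        self._href = None      # first link found in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")
        elif tag == "a" and self._in_cell and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            cells = [" ".join(c.split()) for c in self._cells]
            if len(cells) >= 3:  # header rows use <th>, so they yield no cells
                self.records.append({
                    "organization_name": cells[0],
                    "breach_dates": cells[1],
                    "reported_date": cells[2],
                    "report_url": urljoin(BASE_URL, self._href) if self._href else None,
                })
            self._cells = None


def parse_listing(html):
    parser = BreachTableParser()
    parser.feed(html)
    return parser.records


# Made-up sample row; the real page markup may differ.
SAMPLE_HTML = """
<table class="views-table">
  <thead><tr><th>Organization Name</th><th>Date(s) of Breach</th><th>Reported Date</th></tr></thead>
  <tbody>
    <tr>
      <td><a href="/ecrime/databreach/reports/sb24-625166">Example Corp</a></td>
      <td>01/02/2024, 01/05/2024</td>
      <td>02/01/2024</td>
    </tr>
  </tbody>
</table>
"""

records = parse_listing(SAMPLE_HTML)
```

Because the whole registry arrives in one response, a single pass like this would cover every record without any paging logic.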
Output fields
| Field | Description |
|---|---|
| `organization_name` | Company or entity that filed the breach notification |
| `breach_dates` | Date(s) of the breach; comma-separated when multiple dates are reported |
| `reported_date` | Date the notice was posted to the AG list |
| `report_url` | Full URL to the report detail page on oag.ca.gov |
| `report_id` | SB 24 report identifier (e.g. `sb24-625166`) |
| `notice_letter_text` | Consumer notification letter body text (requires `fetch_details: true`) |
| `sample_notice_pdf_url` | URL to the sample notice PDF (requires `fetch_details: true`) |
| `affected_individuals` | Number of affected individuals, derived from notice text |
| `breach_type` | Derived keyword: `ransomware`, `phishing`, `vendor`, `unauthorized_access`, etc. |
| `first_seen` | ISO-8601 timestamp when this record was first scraped |
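The derived fields could plausibly be produced along the following lines. The actor's real rules are not documented here, so the keyword table, the regular expressions, and the function names are hypothetical; only the label names and the `sb24-625166` identifier format come from the table above.

```python
import re

# Hypothetical keyword table, checked in order; the first match wins.
BREACH_KEYWORDS = [
    ("ransomware", ("ransomware",)),
    ("phishing", ("phishing",)),
    ("vendor", ("vendor", "third-party", "service provider")),
    ("unauthorized_access", ("unauthorized access", "unauthorized party")),
]


def report_id_from_url(report_url):
    """Return the trailing 'sb24-<digits>' segment of a report URL, if present."""
    match = re.search(r"(sb24-\d+)/?$", report_url)
    return match.group(1) if match else None


def classify_breach(notice_text):
    """Return the first label whose keywords appear in the notice text."""
    text = notice_text.lower()
    for label, needles in BREACH_KEYWORDS:
        if any(needle in text for needle in needles):
            return label
    return None


def extract_affected_individuals(notice_text):
    """Return the first number that directly precedes a people-like noun."""
    match = re.search(
        r"(\d[\d,]*)\s+(?:individuals|people|residents|customers)",
        notice_text,
        re.IGNORECASE,
    )
    return int(match.group(1).replace(",", "")) if match else None
```

Keyword matching of this kind is coarse: a notice that mentions both a vendor and ransomware receives only the first matching label, so the ordering of the table is itself a design decision.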
Input options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxItems` | integer | 10 | Maximum number of breach records to return. Leave empty to return all records (5,000+) |
| `fetch_details` | boolean | false | When true, fetches each report's detail page for notice text and PDF links |
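As a small sketch of how a caller might assemble this input, the helper below builds a dict from the two documented parameters. `build_run_input` is a hypothetical name, and omitting `maxItems` to request all records is an interpretation of "leave empty" above rather than confirmed behavior.

```python
def build_run_input(max_items=10, fetch_details=False):
    """Build an actor input dict from the two documented parameters.

    Passing max_items=None omits maxItems entirely (assumed to mean "all records").
    """
    if max_items is not None:
        if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
            raise ValueError("max_items must be a positive integer or None")
    run_input = {"fetch_details": bool(fetch_details)}
    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input


default_input = build_run_input()
full_input = build_run_input(max_items=None, fetch_details=True)
```

With the `apify-client` package, a dict like this could be passed as `run_input` to `ApifyClient(token).actor(actor_id).call(...)`; the actor ID is not stated in this document, so it is left out here.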
Use cases
- Cyber-insurance underwriting — weekly delta runs to identify newly reported breaches for risk modeling
- Breach litigation — CA is the #1 breach class-action jurisdiction; plaintiff/defense firms track new filings
- Threat intelligence — identify breach patterns by type (ransomware, phishing, vendor) and affected sector
- Compliance monitoring — enterprises monitoring whether their vendors have filed breach notices
- Journalism & research — structured access to the complete regulatory breach history
Scheduling
The CA AG list updates whenever a new SB 24 notice is filed, typically several times per week. For delta monitoring, run the actor daily or weekly. Use the `first_seen` field as a change-tracking cursor: keep only records whose `first_seen` is later than the timestamp of your previous run.
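A cursor filter of this kind might look like the sketch below. The function names and sample records are illustrative, and treating timestamps without an offset as UTC is an assumption, since the exact `first_seen` format is not specified beyond ISO-8601.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO-8601 timestamp; a trailing 'Z' is read as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are UTC.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def new_since(records, last_run):
    """Keep records whose first_seen is strictly later than last_run."""
    cutoff = parse_ts(last_run)
    return [r for r in records if parse_ts(r["first_seen"]) > cutoff]


# Made-up records for illustration.
sample_records = [
    {"report_id": "sb24-000001", "first_seen": "2024-05-01T08:00:00Z"},
    {"report_id": "sb24-000002", "first_seen": "2024-05-08T08:00:00Z"},
]
fresh = new_since(sample_records, "2024-05-03T00:00:00Z")
```

The comparison is strict, so a record stamped exactly at the previous run time is not reported twice.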
Technical notes
- No proxy required — CA gov site is datacenter-accessible with no anti-bot protection
- Full listing response is ~3.4 MB (5,000+ rows in one HTML page)
- Detail fetch mode runs at concurrency 5 with rate limiting enabled
- Memory: 512 MB (sufficient for the full listing parse)
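The detail-fetch behavior noted above (five concurrent requests with rate limiting) can be approximated with an `asyncio.Semaphore`. This is a sketch of the general technique, not the actor's code: the fixed per-request delay stands in for whatever rate limiting the actor really applies, `fake_fetch` replaces real HTTP calls, and the URLs are placeholders.

```python
import asyncio

CONCURRENCY = 5        # value stated in the technical notes
DELAY_SECONDS = 0.01   # hypothetical pause per request; the real limit is not documented


async def fetch_all(urls, fetch, concurrency=CONCURRENCY, delay=DELAY_SECONDS):
    """Run fetch(url) for every URL with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url):
        async with semaphore:
            result = await fetch(url)
            await asyncio.sleep(delay)  # hold the slot briefly to space out requests
            return result

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(u) for u in urls))


# Demo with a fake fetcher that tracks how many calls overlap.
stats = {"in_flight": 0, "peak": 0}


async def fake_fetch(url):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.005)  # simulated network latency
    stats["in_flight"] -= 1
    return f"<html>{url}</html>"


# Placeholder URLs for illustration only.
urls = [f"https://oag.ca.gov/ecrime/databreach/reports/sb24-{n}" for n in range(20)]
pages = asyncio.run(fetch_all(urls, fake_fetch))
```

Holding the semaphore through the delay caps both the number of simultaneous requests and the overall request rate, which is a simple way to stay polite to a public government site.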