CA Data Breach Notification Scraper

Pricing

Pay per event

Scrapes the California Attorney General's SB 24 data-breach notification registry. Returns all reported breaches (organization, breach date, reported date, report URL) with optional detail-page enrichment for consumer notice letter text and PDF links.

Developer

BowTiedRaccoon

Maintained by Community

Scrapes the California Attorney General's SB 24 Data Breach Notification Registry — the authoritative public feed of California data breach notices under Cal. Civ. Code § 1798.82. Over 5,000 breach filings since 2012, updated continuously as companies file new SB 24 notices.

What it does

The AG breach list is a server-rendered Drupal Views table. All records are delivered in a single HTML response — no pagination, no JavaScript required. This actor:

  1. Fetches the listing page and extracts all breach records (organization, breach date(s), reported date, report URL)
  2. Optionally enriches each record by fetching the individual report detail page for the consumer notification letter text and the sample notice PDF link
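As a rough illustration of step 1, the sketch below pulls rows out of an HTML table using only the Python standard library. It is not the actor's code: the column order, the `BASE_URL` constant, the link path in `SAMPLE_HTML`, and the sample organization and dates are all assumptions made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://oag.ca.gov"  # assumption: root used to resolve relative links


class BreachTableParser(HTMLParser):
    """Collects one record per <tr> that has at least three <td> cells and a link."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._cells = None  # cell texts of the row being parsed, or None outside a row
        self._text = None   # text fragments of the cell being parsed, or None outside a cell
        self._href = None   # first link found in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._text = []
        elif tag == "a" and self._text is not None and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            # Collapse internal whitespace so multi-line cells become one line.
            self._cells.append(" ".join("".join(self._text).split()))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            if len(self._cells) >= 3 and self._href:
                org, breach_dates, reported = self._cells[:3]
                self.records.append({
                    "organization_name": org,
                    "breach_dates": breach_dates,
                    "reported_date": reported,
                    "report_url": urljoin(BASE_URL, self._href),
                })
            self._cells = None
            self._text = None


# Made-up sample markup; the real page structure may differ.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Organization Name</th><th>Date(s) of Breach</th><th>Reported Date</th></tr></thead>
  <tbody>
    <tr>
      <td><a href="/ecrime/databreach/reports/sb24-625166">Example Corp</a></td>
      <td>01/02/2024, 01/05/2024</td>
      <td>02/01/2024</td>
    </tr>
  </tbody>
</table>
"""

parser = BreachTableParser()
parser.feed(SAMPLE_HTML)
records = parser.records
```

Because the whole listing arrives in one response, a single `feed()` call over the page body would be enough in this sketch; header rows are skipped because they contain `<th>` cells rather than `<td>` cells.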

Output fields

| Field | Description |
| --- | --- |
| organization_name | Company or entity that filed the breach notification |
| breach_dates | Date(s) of the breach, comma-separated when multiple dates are reported |
| reported_date | Date the notice was posted to the AG list |
| report_url | Full URL to the report detail page on oag.ca.gov |
| report_id | SB 24 report identifier (e.g. sb24-625166) |
| notice_letter_text | Consumer notification letter body text (requires fetch_details: true) |
| sample_notice_pdf_url | URL to the sample notice PDF (requires fetch_details: true) |
| affected_individuals | Number of affected individuals, derived from notice text |
| breach_type | Derived keyword: ransomware, phishing, vendor, unauthorized_access, etc. |
| first_seen | ISO-8601 timestamp when this record was first scraped |
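To make the derived fields more concrete, here is a minimal sketch of how report_id and breach_type could be computed. The actor's actual rules are not documented in this listing, so the regular expression, the `BREACH_KEYWORDS` map, and both function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword map, checked in insertion order; the actor's real rules may differ.
BREACH_KEYWORDS = {
    "ransomware": ("ransomware",),
    "phishing": ("phishing",),
    "vendor": ("vendor", "third-party", "service provider"),
    "unauthorized_access": ("unauthorized access",),
}


def report_id_from_url(report_url: str) -> Optional[str]:
    """Return the trailing 'sb24-<digits>' segment of a report URL, if present."""
    match = re.search(r"(sb24-\d+)/?$", report_url)
    return match.group(1) if match else None


def classify_breach(notice_text: str) -> Optional[str]:
    """Return the first label whose keywords appear in the notice text, else None."""
    text = notice_text.lower()
    for label, keywords in BREACH_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return None
```

A first-match keyword scan like this is simple to audit, but it labels a notice with only one category even when several apply, so downstream analysis may want to treat breach_type as a coarse hint.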

Input options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum number of breach records to return. Leave empty for all records (~5,000+) |
| fetch_details | boolean | false | When true, fetches each report's detail page for notice text and PDF links |
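The sketch below shows one way to assemble this input and start a run with the Apify Python client (`pip install apify-client`). The helper names are hypothetical, the actor ID must be supplied by the caller, and treating "leave empty" as "omit the maxItems key" is an assumption.

```python
from typing import Optional


def build_run_input(max_items: Optional[int] = 10, fetch_details: bool = False) -> dict:
    """Build the actor input. Passing max_items=None omits maxItems entirely
    (assumed here to mean 'return all records')."""
    if max_items is not None and max_items < 1:
        raise ValueError("max_items must be a positive integer or None")
    run_input = {"fetch_details": fetch_details}
    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_actor(token, actor_id, build_run_input(max_items=None, fetch_details=True))` would request the full registry with detail enrichment, which issues one extra request per record.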

Use cases

  • Cyber-insurance underwriting — weekly delta runs to identify newly reported breaches for risk modeling
  • Breach litigation — CA is the #1 breach class-action jurisdiction; plaintiff/defense firms track new filings
  • Threat intelligence — identify breach patterns by type (ransomware, phishing, vendor) and affected sector
  • Compliance monitoring — enterprises monitoring whether their vendors have filed breach notices
  • Journalism & research — structured access to the complete regulatory breach history

Scheduling

The CA AG list updates whenever a new SB 24 notice is filed (typically several per week). Recommended run cadence for delta monitoring: daily or weekly. Use the first_seen field as a change-tracking cursor — filter for records where first_seen is after your last run timestamp.
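A minimal sketch of the first_seen cursor described above, assuming the records are plain dictionaries and that timestamps without a UTC offset are in UTC. The function names and the sample timestamps are illustrative only.

```python
from datetime import datetime, timezone


def parse_iso(timestamp: str) -> datetime:
    """Parse an ISO-8601 timestamp into a timezone-aware datetime."""
    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(timestamp)
    # Assumption: values without an offset are UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def new_since(records, last_run: str):
    """Return the records whose first_seen is strictly after the last run timestamp."""
    cutoff = parse_iso(last_run)
    return [r for r in records if parse_iso(r["first_seen"]) > cutoff]


# Illustrative records, not real registry data.
sample = [
    {"report_id": "sb24-1", "first_seen": "2024-02-01T08:00:00Z"},
    {"report_id": "sb24-2", "first_seen": "2024-02-03T08:00:00Z"},
]
fresh = new_since(sample, "2024-02-02T00:00:00Z")
```

Storing the start time of each successful run and passing it as `last_run` on the next one keeps the comparison independent of how long a run takes.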

Technical notes

  • No proxy required — CA gov site is datacenter-accessible with no anti-bot protection
  • Full listing response is ~3.4 MB (5,000+ rows in one HTML page)
  • Detail fetch mode runs at concurrency 5 with rate limiting enabled
  • Memory: 512 MB (sufficient for the full listing parse)
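For readers unfamiliar with bounded concurrency, the sketch below shows the general pattern of capping in-flight detail requests at 5 with an asyncio semaphore. It is not the actor's implementation: `fetch_detail` is a stand-in that sleeps instead of making an HTTP request, the `SIMULATED_LATENCY` value is arbitrary, and the URLs are placeholders.

```python
import asyncio

CONCURRENCY = 5            # matches the concurrency stated in the notes above
SIMULATED_LATENCY = 0.01   # arbitrary stand-in for network time, in seconds


async def fetch_detail(url: str) -> dict:
    """Stand-in for fetching one detail page; replace with a real HTTP client."""
    await asyncio.sleep(SIMULATED_LATENCY)
    return {"report_url": url, "notice_letter_text": None}


async def fetch_all(urls, concurrency: int = CONCURRENCY):
    """Fetch every URL with at most `concurrency` requests in flight.

    Returns the results in input order plus the observed peak concurrency.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_detail(url)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(u) for u in urls))
    return results, peak


# Placeholder URLs on a reserved, non-resolvable domain.
urls = [f"https://example.invalid/report/{i}" for i in range(12)]
results, peak = asyncio.run(fetch_all(urls))
```

The semaphore only caps parallelism; an actual rate limiter would also space requests over time, which this sketch does not attempt.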