# California DCA Professional License Scraper (`haketa/california-dca-license-scraper`) Actor

Download and parse California professional license data from DCA's public Box folder. 3.3M+ active licenses across 36 boards — pharmacy, nursing, medical, dental, engineering and more. Bulk data refreshed monthly, with no per-search browser scraping.

- **URL**: https://apify.com/haketa/california-dca-license-scraper.md
- **Developed by:** [Haketa](https://apify.com/haketa) (community)
- **Categories:** Developer tools, Automation, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% of runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.50 / 1,000 results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price gets lower the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## California DCA Professional License Scraper — Bulk License Lookup for CSLB, BRN, Medical Board, Pharmacy, Dental, Accountancy & 30+ More Boards

> **The most comprehensive California Department of Consumer Affairs (DCA) license extractor on Apify.** Download structured records for **3.3M+ active California professional licenses** across **36 state boards and bureaus** — pharmacists, registered nurses, physicians, dentists, contractors, engineers, accountants, real estate appraisers, cosmetologists and more — from DCA's official monthly bulk data drop on Box. No CAPTCHA juggling, no per-search rate limit, no per-board scraper rewrites.

[![Apify Actor](https://img.shields.io/badge/Apify-Actor-blue)](https://apify.com/haketa/california-dca-license-scraper)
![Monthly Refresh](https://img.shields.io/badge/Data-Monthly%20Bulk%20Drop-orange)
![Engine](https://img.shields.io/badge/Engine-Box%20Folder%20%2B%20Playwright-green)
![Coverage](https://img.shields.io/badge/Records-3.3M%2B-purple)
![Boards](https://img.shields.io/badge/Boards-36-informational)
![Pay Per Event](https://img.shields.io/badge/Pricing-Pay%20Per%20Event-yellow)
![State](https://img.shields.io/badge/State-California-red)
![No Auth](https://img.shields.io/badge/Authentication-None%20Required-success)

---

### What This Actor Does

The **California DCA Professional License Scraper** is a production-grade Apify Actor that downloads, normalises, filters, and serves the **complete public licensing dataset** published by the **California Department of Consumer Affairs (DCA)** — the umbrella agency that regulates **professional licensing for the State of California** across health care, building trades, engineering, accounting, personal care, legal support and more.

DCA publishes its **license database as a monthly bulk drop** on a public **Box.com folder** ([dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9](https://dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9)), with one **pipe-delimited file per board or bureau**. Header layouts drift between boards, some files arrive zipped, and the folder itself sits behind a JavaScript-rendered Box page that resists naive HTTP fetchers. This actor handles all of that automatically and returns a **single unified JSON schema** across every board, so a downstream pipeline does not have to special-case Pharmacy vs. CSLB vs. Medical vs. Dental.

In a typical run you receive structured records covering:

- **Health care licensees** — Registered Pharmacists, Pharmacy Technicians, Registered Nurses (BRN), Vocational Nurses (BVNPT), Physicians & Surgeons (MBC), Physician Assistants, Dentists & RDAs (DBC), Optometrists, Psychologists, Veterinarians, Chiropractors, Acupuncturists, Naturopathic Doctors, Podiatric Physicians, Respiratory Therapists, Occupational Therapists, Physical Therapists, Behavioral Sciences (LMFT/LCSW/LPCC), Speech-Language Pathologists, Dietitians, Hearing Aid Dispensers
- **Building & engineering trades** — California Contractors State License Board (CSLB) general and specialty contractors, Professional Engineers, Land Surveyors, Professional Geologists, Architects, Landscape Architects
- **Business & finance** — California Board of Accountancy (CPAs and CPA firms), Cemetery & Funeral Bureau, real-estate-adjacent appraisers
- **Personal care** — Board of Barbering & Cosmetology (BBC) licensees and establishments
- **Legal & specialty** — Court Reporters Board, Guide Dogs for the Blind, Athletic Commission, Pilots

Every record carries the issuing **agency code, license type, license number, current status, original issue date, expiration date, full mailing address, county, state, ZIP and licensee or organisation name** — the building blocks for compliance, sales, credentialing, recruiting, due-diligence, location intelligence and research workflows.

---

### Why scrape California DCA yourself when this exists?

DCA publishes the data freely, but actually consuming it at scale is its own engineering project. Common pain points the actor solves out of the box:

- **Box.com folder is JavaScript-rendered** — a plain `curl` against the share URL returns an empty shell. The actor talks to the Box Shared Items API and falls back to a real Chromium download flow when needed.
- **Per-board file naming drifts** — files include spaces, dates, version suffixes and the occasional spelling variant. The actor uses fuzzy keyword matching to find the right file for the board you asked for.
- **Some boards ship ZIPs, some ship raw pipe-delimited text** — large files (CSLB, BRN, BBC) arrive as `.zip` archives. The actor detects the `PK` magic header, extracts the largest non-summary entry, and returns the underlying text without OOM-ing on multi-hundred-megabyte payloads.
- **Pipe-delimited files with shifting headers** — column counts and header spellings vary per board. The actor fuzzy-matches header names, and falls back to **positional mapping** when DCA temporarily ships a file without the canonical header row.
- **CSLB alone has 280K+ active contractors** — through the public search UI you would need tens of thousands of paginated lookups; the bulk file covers them all in one download.
- **Status fields are inconsistent** (`CURRENT`, `Current`, `current`) — the actor normalises everything to lower-case canonical values that drive a clean enum filter.
- **No NPI, no SSN, no DOB** — the bulk dataset is licensing-only, so no GLBA/HIPAA exposure if you treat it as the public record it is.
- **No daily update API** — DCA refreshes only monthly, so you must re-download everything each cycle. The actor makes a fresh pull cheap (no browser-per-search overhead).
- **Address blocks differ** — some boards split `ADDRESS_LINE_1 / ADDRESS_LINE_2`, others ship a single combined block. The actor exposes both fields and never silently drops content.
- **Out-of-state licensees** — mail-order pharmacies, telehealth doctors and out-of-state contractors all hold California licenses with non-CA addresses. The actor exposes a `stateFilter` so you can choose to include or exclude them.
- **No incremental diff** — DCA does not publish change-deltas. Schedule the actor monthly, archive the dataset, and diff yourself.
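
DCA publishes no change-deltas, so the monthly comparison has to happen on your side. Below is a minimal sketch, not part of the actor, assuming two archived runs exported as lists of records in the output schema documented further down, and assuming `(agencyCode, licenseNumber)` identifies a license (license numbers are only unique within a board). The helper name `diff_snapshots` is hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[str, str]


def _index(records: List[Record]) -> Dict[Key, Record]:
    # Assumption: a license is identified by its board code plus license number.
    return {(str(r["agencyCode"]), str(r["licenseNumber"])): r for r in records}


def diff_snapshots(previous: List[Record], current: List[Record]) -> Dict[str, List[Key]]:
    """Compare two monthly pulls and report added, removed and status-changed licenses."""
    old, new = _index(previous), _index(current)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    status_changed = sorted(
        key
        for key in new.keys() & old.keys()
        if new[key].get("licenseStatus") != old[key].get("licenseStatus")
    )
    return {"added": added, "removed": removed, "status_changed": status_changed}


if __name__ == "__main__":
    april = [
        {"agencyCode": "PHA", "licenseNumber": "1", "licenseStatus": "current"},
        {"agencyCode": "PHA", "licenseNumber": "2", "licenseStatus": "current"},
    ]
    may = [
        {"agencyCode": "PHA", "licenseNumber": "2", "licenseStatus": "delinquent"},
        {"agencyCode": "CSLB", "licenseNumber": "1", "licenseStatus": "current"},
    ]
    print(diff_snapshots(april, may))
```

Run both snapshots with identical input and `maxRecords: 0`; otherwise a license missing from the newer run may simply have been filtered or capped out rather than dropped by DCA.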

This actor encapsulates roughly **eight to sixteen hours of one-off engineering** — Box auth quirks, ZIP extraction, header normalisation, positional fallback, status canonicalisation, county filtering — into a single Actor run.

---

### Quick Start

#### One-Click Run (Apify UI)

1. Open the actor page → click **"Try for free"**.
2. In the **Boards** field, type one or more board names (e.g. `Board of Pharmacy`, `Contractors State License Board`, `Board of Registered Nursing`). Leave it empty to attempt every board.
3. Optionally narrow by **License Type**, **License Status**, **County** or **State Filter**, then set a sensible **Max Records** cap (default 1,000 — full bulk runs can return millions).
4. Hit **Start**. Within a few minutes your dataset is ready as JSON, CSV, Excel, XML, RSS or HTML.

#### API Run (Python)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("haketa/california-dca-license-scraper").call(run_input={
    "boards": ["Board of Pharmacy", "Board of Registered Nursing"],
    "licenseStatus": "current",
    "counties": ["Los Angeles", "San Diego", "Orange"],
    "stateFilter": "CA",
    "maxRecords": 5000,
})

for record in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        record["agencyName"],
        record["licenseNumber"],
        record.get("lastName") or record.get("organizationName"),
        record.get("city"),
        record["licenseStatus"],
    )
````

#### API Run (Node.js / TypeScript)

```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/california-dca-license-scraper').call({
    boards: ['Contractors State License Board'],
    licenseStatus: 'current',
    counties: ['Los Angeles', 'Orange', 'Riverside', 'San Bernardino'],
    stateFilter: 'CA',
    maxRecords: 10000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} active SoCal CSLB contractors`);
```

#### API Run (cURL)

```bash
curl -X POST "https://api.apify.com/v2/acts/haketa~california-dca-license-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "boards": ["Medical Board of California"],
    "licenseStatus": "current",
    "counties": ["San Francisco", "San Mateo", "Alameda", "Santa Clara"],
    "maxRecords": 2000
  }'
```

#### API Run (Apify CLI)

```bash
apify call haketa/california-dca-license-scraper --input='{
  "boards": ["Board of Accountancy"],
  "licenseStatus": "current",
  "stateFilter": "CA",
  "maxRecords": 0
}'
```

***

### How It Works

#### Source of truth

DCA's public bulk-data folder lives at:

```
https://dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9
```

It contains **one file per board or bureau**, refreshed roughly every month. Most files are **pipe-delimited (`|`) text**; a handful (notably CSLB, BRN, BBC) are shipped as **ZIP archives** containing the same pipe-delimited data plus a small Counts / summary file.
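
The ZIP handling described here (stages 4 and 5 in the table below) can be sketched with Python's standard `zipfile` module. This is an illustration under stated assumptions, not the actor's code: the actor uses `adm-zip` and extracts through `/tmp/`, whereas this sketch reads the chosen entry into memory. `load_board_text` and `pick_data_entry` are hypothetical names; the skip-words mirror the stage 5 description.

```python
import io
import zipfile

ZIP_MAGIC = b"PK"  # first two bytes of a ZIP local file header (0x50 0x4B)
SKIP_WORDS = ("count", "summary", "readme")  # companion files to ignore


def pick_data_entry(archive: zipfile.ZipFile) -> zipfile.ZipInfo:
    """Return the largest entry whose name does not look like a counts/summary file."""
    candidates = [
        info
        for info in archive.infolist()
        if not info.is_dir()
        and not any(word in info.filename.lower() for word in SKIP_WORDS)
    ]
    if not candidates:
        raise ValueError("archive contains no data entry")
    return max(candidates, key=lambda info: info.file_size)


def load_board_text(payload: bytes) -> str:
    """Decode a downloaded board file, unzipping it first when it starts with PK."""
    if payload[:2] == ZIP_MAGIC:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            entry = pick_data_entry(archive)
            # Reads the entry into memory; stream to disk instead for very large boards.
            payload = archive.read(entry)
    return payload.decode("utf-8", errors="replace")
```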

#### Engineering pipeline

| Stage | Technique | Notes |
|---|---|---|
| 1. Folder listing | **Box Shared Items API** with the `BoxApi: shared_link=…` header | Returns each file's `id`, `name`, `size`. Falls back to parsing `Box.prefetchedData` embedded in the share page, then to regex extraction of `{type:"file"}` blobs. |
| 2. File matching | Fuzzy keyword match against the file name | Strips `board of` / `bureau of` / `committee` from the user's input and matches the first remaining token against the file name. |
| 3. Download | **Playwright (Chromium, headless)** navigating the Box folder UI | Box's download endpoint is signed, single-use, and JS-gated; a real browser session is the only reliable extraction path. Cookie consent banners are dismissed automatically. |
| 4. Format detection | First two bytes of the buffer | `0x50 0x4B` triggers ZIP extraction via `adm-zip`; everything else is treated as UTF-8 text. |
| 5. ZIP extraction | `adm-zip` `extractEntryTo` to `/tmp/dca_extract_<ts>/` | Picks the **largest entry that is not `*count*` / `*summary*` / `*readme*`** — that is consistently the actual data file. |
| 6. Header detection | Pipe / tab / comma sniff on first row | Real DCA files use `|`; the parser tolerates the alternatives so historical exports still parse. |
| 7. Header mapping | Fuzzy match on `AGENCY_CODE`, `LICENSE_NUMBER`, `ORG/LAST_NAME`, `ZIP_CODE` etc. | Falls back to **positional mapping** (canonical 21-column DCA layout) when fewer than five header tokens are recognised. |
| 8. Record normalisation | Trim, strip wrapping quotes, null-coalesce empties | Output schema is identical across every board. |
| 9. Category tagging | Keyword-based board → category mapping | Buckets each record into `healthcare`, `engineering`, `business`, `personal_care`, `legal` or `other`. |
| 10. Filtering | Board / license-type / status / county / state in-memory passes | All filters are case-insensitive partial matches except `state`, which is an exact uppercase compare. |
| 11. Dataset push | `Dataset.pushData(record)` | Progress is logged every 500 records. Run terminates cleanly when `maxRecords` is hit. |
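
Stages 6 to 8 are where most of the per-board variation is absorbed. A minimal sketch of the delimiter sniff, the five-token header threshold and the positional fallback follows. `HEADER_ALIASES` and `POSITIONAL_LAYOUT` are hypothetical placeholders: the real DCA header spellings and the canonical 21-column order are not listed in this README, so the layout here simply reuses the 21 data fields of the output schema in schema order.

```python
from typing import Dict, List, Optional

# Hypothetical positional order: the 21 data fields of the output schema, in schema order.
POSITIONAL_LAYOUT: List[Optional[str]] = [
    "agencyCode", "agencyName", "licenseTypeCode", "licenseTypeName", "licenseNumber",
    "individualOrOrg", "lastName", "firstName", "middleName", "suffix",
    "organizationName", "addressLine1", "addressLine2", "city", "county", "state",
    "zip", "country", "originalIssueDate", "expirationDate", "licenseStatus",
]

# Hypothetical header spellings; real DCA headers vary by board.
HEADER_ALIASES: Dict[str, str] = {
    "AGENCY_CODE": "agencyCode",
    "LICENSE_NUMBER": "licenseNumber",
    "LICENSE_TYPE": "licenseTypeName",
    "CITY": "city",
    "COUNTY": "county",
    "STATE": "state",
    "ZIP_CODE": "zip",
    "LICENSE_STATUS": "licenseStatus",
}

MIN_RECOGNISED = 5  # below this, the first row is not trusted as a header


def sniff_delimiter(first_line: str) -> str:
    """Pick whichever of pipe, tab or comma occurs most often; ties favour the pipe."""
    return max(("|", "\t", ","), key=first_line.count)


def map_header(cells: List[str]) -> Optional[List[Optional[str]]]:
    """Map header cells to schema fields, or return None if too few are recognised."""
    mapped = [HEADER_ALIASES.get(cell.strip().strip('"').upper()) for cell in cells]
    return mapped if sum(m is not None for m in mapped) >= MIN_RECOGNISED else None


def parse_rows(text: str) -> List[Dict[str, Optional[str]]]:
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    delim = sniff_delimiter(lines[0])
    header = map_header(lines[0].split(delim))
    if header is None:
        # Positional fallback: every line, including the first, is data.
        header, data = POSITIONAL_LAYOUT, lines
    else:
        data = lines[1:]
    rows = []
    for line in data:
        # Trim, strip wrapping quotes, and turn empty cells into None.
        cells = [cell.strip().strip('"') or None for cell in line.split(delim)]
        rows.append({field: value for field, value in zip(header, cells) if field is not None})
    return rows
```

The plain `split` assumes delimiters never appear inside quoted cells, which suits pipe-delimited exports but not arbitrary CSV; the standard `csv` module is the safer choice there.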

#### Notable behaviours

- **Memory:** the actor is configured with **2 GB minimum, 8 GB maximum** because large boards (CSLB, BRN, BBC) ship 50–250 MB files; ZIP extraction goes through disk (`/tmp/`) rather than RAM to keep the runtime stable.
- **Proxy:** Apify Proxy is supported but **not required** — Box's public share endpoints do not rate-limit modest crawler traffic from datacentre IPs. Enable proxy only if you are running many parallel jobs from the same Apify region.
- **Playwright fallback:** if Playwright cannot be imported (e.g. on a stripped-down build), the actor falls back to direct file URL patterns such as `https://data.dca.ca.gov/DCALicenseData/<Board>.csv` for boards that happen to expose them.
- **Deduplication:** DCA's files are already keyed by license number; the actor does not collapse cross-board duplicates because a person can legitimately hold licenses across multiple boards (e.g. an RN who is also an RPh).

***

### Input Parameters

```json
{
  "boards": ["Board of Pharmacy", "Board of Registered Nursing"],
  "licenseTypes": ["Registered Pharmacist", "Registered Nurse"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "San Diego", "Orange"],
  "stateFilter": "CA",
  "maxRecords": 5000,
  "proxyConfiguration": { "useApifyProxy": true }
}
```

#### Parameter reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `boards` | `array<string>` | `["Board of Pharmacy"]` (prefilled) | Board / bureau names to download. **Case-insensitive partial match** against the file name. Leave empty to attempt every file in the Box folder. Examples: `Board of Pharmacy`, `Medical Board of California`, `Contractors State License Board`, `Board of Accountancy`, `Dental Board of California`, `Board of Barbering and Cosmetology`. |
| `licenseTypes` | `array<string>` | `[]` | Substring filter on `licenseTypeName`. Examples: `Registered Pharmacist`, `Pharmacy Technician`, `Registered Nurse`, `Vocational Nurse`, `Physician and Surgeon`, `Certified Public Accountant`, `Class A General Engineering Contractor`. Empty array = no type filter. |
| `licenseStatus` | `string` enum | `all` | One of `all`, `current`, `delinquent`, `inactive`, `cancelled`, `expired`. See the Status Reference table below for the meaning of each value. |
| `counties` | `array<string>` | `[]` | Filter by California county. Case-insensitive partial match. Examples: `Los Angeles`, `San Francisco`, `San Diego`, `Orange`, `Sacramento`, `Alameda`, `Santa Clara`, `Riverside`, `San Bernardino`, `Fresno`. |
| `stateFilter` | `string` | `""` (empty = all states) | Two-letter state code. Set to `CA` to exclude out-of-state mail-order, telehealth or out-of-state contractor licensees. |
| `maxRecords` | `integer` | `1000` | Cap total output across all boards. `0` = unlimited. The default 1,000 is a sane testing limit because a single full bulk pull can return millions of records and consume non-trivial compute. |
| `proxyConfiguration` | `object` | `{ "useApifyProxy": true }` (prefilled) | Standard Apify proxy config. Optional — Box does not rate-limit the share endpoint for typical workloads. |
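
The filter semantics in this table (case-insensitive partial matches, an exact upper-case `stateFilter` compare per stage 10 of the pipeline, and `maxRecords: 0` meaning no cap) can be expressed as a small generator over output records. This is a sketch, not the actor's source; it assumes `licenseStatus` is compared exactly against the normalised value, and it leaves out `boards` because that filter selects which Box files are downloaded rather than which rows are kept. `apply_filters` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Optional[str]]


def _contains_any(value: Optional[str], needles: List[str]) -> bool:
    # Empty needle list means "no filter"; otherwise any case-insensitive substring hit passes.
    if not needles:
        return True
    hay = (value or "").lower()
    return any(needle.lower() in hay for needle in needles)


def apply_filters(records: Iterable[Record], run_input: Dict) -> Iterator[Record]:
    types = run_input.get("licenseTypes") or []
    counties = run_input.get("counties") or []
    status = (run_input.get("licenseStatus") or "all").lower()
    state = (run_input.get("stateFilter") or "").strip().upper()
    limit = int(run_input.get("maxRecords", 1000))  # 0 means unlimited
    emitted = 0
    for rec in records:
        if not _contains_any(rec.get("licenseTypeName"), types):
            continue
        if status != "all" and (rec.get("licenseStatus") or "").lower() != status:
            continue
        if not _contains_any(rec.get("county"), counties):
            continue
        if state and (rec.get("state") or "").upper() != state:
            continue
        yield rec
        emitted += 1
        if limit and emitted >= limit:
            return
```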

***

### Output Schema

Every record uses the **same flat JSON schema** across every board so downstream consumers do not need per-board branching. Both individual and organisation records share the same envelope; organisation-only or individual-only fields are simply `null` where they do not apply.

#### Field reference

| Field | Type | Always present | Description |
|---|---|---|---|
| `agencyCode` | `string` | yes | Short DCA-issued board / bureau code (e.g. `PHA`, `RN`, `MBC`, `DBC`, `CBA`, `CSLB`). |
| `agencyName` | `string` | yes | Full board / bureau name as published by DCA (e.g. `Board of Pharmacy`, `Medical Board of California`). |
| `licenseTypeCode` | `string` | usually | Short type code (e.g. `RPH`, `RN`, `PHY`, `DDS`, `CPA`, `B`). |
| `licenseTypeName` | `string` | usually | Full license-type label (e.g. `Registered Pharmacist`, `Class B General Building Contractor`). |
| `licenseNumber` | `string` | yes | Board-issued license / registration number. **Not always numeric** — some boards prefix the type code. |
| `individualOrOrg` | `string` | usually | `I` for individuals, `O` for organisations (firms, facilities, partnerships). |
| `lastName` | `string` or `null` | individuals | Family name for individual licensees. |
| `firstName` | `string` or `null` | individuals | Given name for individual licensees. |
| `middleName` | `string` or `null` | individuals | Middle name / initial. |
| `suffix` | `string` or `null` | individuals | Suffix such as `JR`, `SR`, `II`, `MD`, `DO`. |
| `organizationName` | `string` or `null` | organisations | DBA / facility / firm name. |
| `addressLine1` | `string` | usually | First line of mailing address as reported to DCA. |
| `addressLine2` | `string` or `null` | sometimes | Suite, unit, building. |
| `city` | `string` | usually | City. |
| `county` | `string` or `null` | CA-resident records | California county. `null` for out-of-state licensees. |
| `state` | `string` | usually | Two-letter US state abbreviation. |
| `zip` | `string` | usually | 5- or 9-digit ZIP. |
| `country` | `string` | usually | Country abbreviation (`USA` for almost all CA licensees). |
| `originalIssueDate` | `string` | usually | Original license issue date. |
| `expirationDate` | `string` | usually | Current expiration date. |
| `licenseStatus` | `string` | yes | Status — see Status Reference below. |
| `licenseCategory` | `string` | yes | Derived bucket: `healthcare`, `engineering`, `business`, `personal_care`, `legal`, `other`. |
| `scrapedAt` | `string` (ISO-8601) | yes | UTC timestamp when this record was emitted. |
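
Because individual and organisation rows share one envelope, downstream code typically needs a single display name that falls back from the person fields to `organizationName`. A sketch using a `TypedDict` over a subset of the fields above; `LicenseRecord` and `display_name` are hypothetical names, and the `LAST, FIRST MIDDLE SUFFIX` format is a choice made for this example.

```python
from typing import Optional, TypedDict


class LicenseRecord(TypedDict, total=False):
    # Subset of the output schema; in practice any of these keys may be absent or None.
    agencyCode: str
    licenseNumber: str
    individualOrOrg: Optional[str]
    lastName: Optional[str]
    firstName: Optional[str]
    middleName: Optional[str]
    suffix: Optional[str]
    organizationName: Optional[str]


def display_name(rec: LicenseRecord) -> str:
    """'LAST, FIRST MIDDLE SUFFIX' for individuals, the firm name otherwise."""
    if rec.get("individualOrOrg") == "O" or not rec.get("lastName"):
        return rec.get("organizationName") or ""
    parts = (rec.get("firstName"), rec.get("middleName"), rec.get("suffix"))
    given = " ".join(p for p in parts if p)
    return f"{rec['lastName']}, {given}" if given else rec["lastName"]
```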

#### Example — Registered Pharmacist (Board of Pharmacy)

```json
{
  "agencyCode": "PHA",
  "agencyName": "Board of Pharmacy",
  "licenseTypeCode": "RPH",
  "licenseTypeName": "Registered Pharmacist",
  "licenseNumber": "99999",
  "individualOrOrg": "I",
  "lastName": "RODRIGUEZ",
  "firstName": "MARIA",
  "middleName": "T",
  "suffix": null,
  "organizationName": null,
  "addressLine1": "350 S GRAND AVE",
  "addressLine2": "STE 1200",
  "city": "Los Angeles",
  "county": "Los Angeles",
  "state": "CA",
  "zip": "90071",
  "country": "USA",
  "originalIssueDate": "2014-05-22",
  "expirationDate": "2026-09-30",
  "licenseStatus": "current",
  "licenseCategory": "healthcare",
  "scrapedAt": "2026-05-16T09:00:00.000Z"
}
```

#### Example — CSLB general contractor (Contractors State License Board)

```json
{
  "agencyCode": "CSLB",
  "agencyName": "Contractors State License Board",
  "licenseTypeCode": "B",
  "licenseTypeName": "Class B - General Building Contractor",
  "licenseNumber": "999999",
  "individualOrOrg": "O",
  "lastName": null,
  "firstName": null,
  "middleName": null,
  "suffix": null,
  "organizationName": "PACIFIC COAST BUILDERS INC",
  "addressLine1": "1450 HARBOR BLVD",
  "addressLine2": null,
  "city": "San Diego",
  "county": "San Diego",
  "state": "CA",
  "zip": "92101",
  "country": "USA",
  "originalIssueDate": "2009-11-04",
  "expirationDate": "2026-11-30",
  "licenseStatus": "current",
  "licenseCategory": "engineering",
  "scrapedAt": "2026-05-16T09:00:00.000Z"
}
```

***

### License Status Reference

DCA boards do not share a single status vocabulary. The actor normalises everything to lower-case canonical values and maps each record to one of the five statuses below; the `licenseStatus` input accepts these five values plus `all`.
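
A sketch of that normalisation step. Only the `CURRENT` / `Current` / `current` case is documented in this README; the entries in `STATUS_ALIASES` (such as treating `active` as `current`) are assumptions for illustration, and returning `None` for unrecognised values is a design choice of this sketch rather than documented actor behaviour.

```python
from typing import Optional

CANONICAL_STATUSES = {"current", "delinquent", "inactive", "cancelled", "expired"}

# Hypothetical raw spellings; extend as new board-specific variants turn up.
STATUS_ALIASES = {
    "active": "current",
    "canceled": "cancelled",
}


def normalise_status(raw: Optional[str]) -> Optional[str]:
    """Trim, lower-case and map a raw status onto the five canonical values."""
    value = (raw or "").strip().lower()
    value = STATUS_ALIASES.get(value, value)
    return value if value in CANONICAL_STATUSES else None
```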

#### Statuses that signal the licensee may legally operate

| Status | Meaning |
|---|---|
| `current` | License is active, paid up, and in good standing. The default for almost every working California professional. |

#### Statuses that mean the licensee may NOT operate

| Status | Meaning |
|---|---|
| `delinquent` | Renewal payment lapsed; licensee has a grace window to cure but cannot legally practice in the meantime. |
| `inactive` | Voluntarily placed in an inactive bucket — common for retired CPAs, snowbird physicians, and pharmacists in non-practising roles. |
| `cancelled` | License terminated administratively or by board action. |
| `expired` | License expired and was not renewed within the cure window. |

> **Tip:** Use `licenseStatus: "current"` to receive only practising licensees. The bulk file also contains historical statuses, so a request without a status filter returns the broader population for trend analytics.

***

### DCA Boards & Bureaus Covered

The Box folder ships one file per board / bureau. The actor's category-tagger maps each board to a high-level bucket so downstream stacks can group records without parsing names.
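
Stage 9 of the pipeline describes this tagger as a keyword-based board → category mapping. A minimal sketch of that approach follows; the keyword lists are assumptions derived from the board tables in this section, not the actor's rules, and the first matching category wins. Boards with no keyword hit (the State Athletic Commission, in this sketch) fall through to `other`.

```python
from typing import List, Tuple

# Hypothetical ordered (category, keywords) pairs; the first category with a hit wins.
CATEGORY_KEYWORDS: List[Tuple[str, Tuple[str, ...]]] = [
    ("healthcare", ("pharmacy", "nursing", "medical", "dental", "physician", "psycholog",
                    "optometry", "veterinary", "chiropractic", "acupuncture", "naturopathic",
                    "respiratory", "therapy", "behavioral", "speech", "hearing", "dietetic")),
    ("engineering", ("contractors", "engineer", "architect", "surveyor", "geolog")),
    ("business", ("accountancy", "cemetery", "funeral")),
    ("personal_care", ("barbering", "cosmetology")),
    ("legal", ("court reporters",)),
]


def categorise(agency_name: str) -> str:
    """Return the first category whose keyword appears in the board name."""
    name = agency_name.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in name for keyword in keywords):
            return category
    return "other"
```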

#### Health care

| Board / Bureau | Code | Examples of license types |
|---|---|---|
| Board of Pharmacy | `PHA` | Registered Pharmacist, Pharmacy Technician, Pharmacy, Wholesaler, Compounding facility |
| Board of Registered Nursing (BRN) | `RN` | Registered Nurse, Nurse Practitioner, CNS, CNM, CRNA, Public Health Nurse |
| Board of Vocational Nursing & Psychiatric Technicians (BVNPT) | `LVN` | Licensed Vocational Nurse, Psychiatric Technician |
| Medical Board of California (MBC) | `MBC` | Physician & Surgeon (MD), Podiatric Physician, Midwife, Polysomnographic Tech |
| Physician Assistant Board | `PAB` | Physician Assistant |
| Osteopathic Medical Board | `OMB` | Osteopathic Physician & Surgeon (DO) |
| Dental Board of California (DBC) | `DBC` | Dentist (DDS), Registered Dental Assistant, Oral & Maxillofacial Surgeon |
| Dental Hygiene Board | `DHBC` | Registered Dental Hygienist, RDH in Alternative Practice |
| State Board of Optometry | `OPT` | Optometrist, Spectacle Lens Dispenser |
| Board of Psychology | `PSY` | Licensed Psychologist, Registered Psychology Assistant |
| Veterinary Medical Board | `VMB` | Veterinarian, Registered Veterinary Technician, Veterinary Premises |
| Board of Chiropractic Examiners | `CHIRO` | Doctor of Chiropractic |
| Acupuncture Board | `ACU` | Licensed Acupuncturist |
| Naturopathic Medicine Committee | `NMC` | Naturopathic Doctor |
| Respiratory Care Board | `RCB` | Respiratory Care Practitioner |
| Occupational Therapy Board | `OTB` | Occupational Therapist, COTA |
| Physical Therapy Board | `PTB` | Physical Therapist, PTA |
| Board of Behavioral Sciences (BBS) | `BBS` | LMFT, LCSW, LPCC, LEP, plus their registered associates and trainees |
| Speech-Language Pathology Board | `SLPAB` | SLP, Audiologist, Hearing Aid Dispenser |
| Hearing Aid Dispensers Bureau | `HAD` | Hearing Aid Dispenser |
| Dietetics & Nutrition Board | `DIET` | Registered Dietitian (where state-credentialed) |

#### Building, engineering & design

| Board / Bureau | Code | Examples |
|---|---|---|
| Contractors State License Board (CSLB) | `CSLB` | Class A General Engineering, Class B General Building, Class C specialty contractors (electrical C-10, plumbing C-36, HVAC C-20, roofing C-39, etc.) |
| Board of Professional Engineers, Land Surveyors & Geologists (BPELSG) | `BPELSG` | Civil, Electrical, Mechanical, Structural, Fire Protection, Geotechnical engineers; Land Surveyors; Geologists; Geophysicists |
| California Architects Board | `CAB` | Licensed Architect |
| Landscape Architects Technical Committee | `LATC` | Licensed Landscape Architect |

#### Business & finance

| Board / Bureau | Code | Examples |
|---|---|---|
| California Board of Accountancy (CBA) | `CBA` | Certified Public Accountant (individual & firm) |
| Cemetery & Funeral Bureau | `CFB` | Cemetery brokers, funeral directors, embalmers, crematories |

#### Personal care

| Board / Bureau | Code | Examples |
|---|---|---|
| Board of Barbering & Cosmetology (BBC) | `BBC` | Cosmetologist, Barber, Esthetician, Manicurist, Electrologist, Establishment |

#### Legal & specialty

| Board / Bureau | Code | Examples |
|---|---|---|
| Court Reporters Board | `CRB` | Certified Shorthand Reporter |
| State Athletic Commission | `SAC` | Boxers, MMA fighters, promoters, matchmakers, seconds |
| Guide Dogs for the Blind Board | `GDB` | Guide dog trainers, schools |

> The exact set of files in the Box folder shifts month to month as DCA reorganises its bureaus. The actor always reflects whatever DCA currently publishes.

***

### Use Cases

#### Healthcare staffing, locum tenens & travel nursing

California is the largest healthcare labour market in the United States. Travel nursing, locum physician, locum pharmacist and allied-health agencies use this dataset to:

- **Verify a candidate's CA license** before sending a credentialing packet to a hospital system or PBM.
- **Source candidates by metro** — every active RN in Los Angeles, every active RPh in San Francisco, every active LMFT in San Diego.
- **Refresh expiration dates monthly** so credentials never lapse mid-assignment and JCAHO / DNV audits stay clean.
- **Filter out cancelled and expired licensees** automatically with `licenseStatus: "current"`.
- **Cross-board match** — find professionals licensed by more than one board (e.g. RN + LMFT or RN + RPh) for high-tier assignments.
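
For the monthly expiration refresh above, a sketch that flags `current` licenses expiring within a chosen window. It assumes `expirationDate` is an ISO `YYYY-MM-DD` string, as in the example records; the field reference only types it as a string, so unparseable values are skipped rather than guessed. `expiring_within` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]


def expiring_within(records: Iterable[Record], days: int,
                    today: Optional[date] = None) -> List[Record]:
    """Return current licenses whose expirationDate falls between today and today + days."""
    today = today or date.today()
    horizon = today + timedelta(days=days)
    hits = []
    for rec in records:
        if rec.get("licenseStatus") != "current":
            continue
        try:
            expires = date.fromisoformat(rec.get("expirationDate") or "")
        except ValueError:
            continue  # missing or non-ISO date: skip instead of guessing the format
        if today <= expires <= horizon:
            hits.append(rec)
    return hits
```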

#### Compliance, credentialing & primary-source verification (PSV)

Hospital systems, PBMs, MSOs, telehealth platforms and Medicaid managed-care plans use bulk DCA data to:

- **Automate monthly PSV** for every CA-licensed prescriber, dispenser or therapist on payroll.
- **Catch status changes** (`current` → `delinquent` / `cancelled`) within the monthly refresh window.
- **Maintain audit-ready logs** with the `scrapedAt` timestamp on every record.
- **Replace expensive per-lookup verification subscriptions** that bill per query.
- **Document due diligence** for The Joint Commission, URAC, NCQA, DMHC and DEA inspector audits.
- **Detect dual practice** — a pharmacist also showing up as a CSLB contractor is a red flag worth a closer look.
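
The bulk data carries no NPI, SSN or other person identifier, so any cross-board match is heuristic. A sketch that keys individual licensees on upper-cased last name, first name and 5-digit ZIP and keeps keys seen under more than one `agencyCode`. This key is an assumption: it will miss people who moved and merge different people who share a name and ZIP, so treat results as leads for manual review rather than verified matches. `multi_board_people` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]
PersonKey = Tuple[str, str, str]


def person_key(rec: Record) -> Optional[PersonKey]:
    # Heuristic identity: last name, first name and 5-digit ZIP, all upper-cased.
    if rec.get("individualOrOrg") != "I" or not rec.get("lastName") or not rec.get("firstName"):
        return None
    zip5 = (rec.get("zip") or "")[:5]
    return (rec["lastName"].strip().upper(), rec["firstName"].strip().upper(), zip5)


def multi_board_people(records: Iterable[Record]) -> Dict[PersonKey, List[str]]:
    """Map each heuristic person key to the sorted board codes, if more than one."""
    boards: Dict[PersonKey, Set[str]] = defaultdict(set)
    for rec in records:
        key = person_key(rec)
        if key is not None and rec.get("agencyCode"):
            boards[key].add(rec["agencyCode"])
    return {key: sorted(codes) for key, codes in boards.items() if len(codes) > 1}
```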

#### B2B sales & California-focused lead generation

Pharma reps, medical-device vendors, EHR / EMR companies, pharmacy management software, contractor SaaS, accounting tooling and PoS providers use the dataset to:

- **Build city- or county-targeted CA lead lists** filtered by board and license type.
- **Identify newly issued licenses** by diffing this month's run against last month's — fresh contractors, fresh CPAs, fresh dispensaries to onboard.
- **Route territory assignments** by ZIP, county or DMA.
- **Enrich CRM records** (Salesforce, HubSpot, Pipedrive, Apollo) with current license status, expiration and category.
- **Power direct-mail and door-knocker campaigns** with verified business mailing addresses.

#### Construction tech & contractor lead generation (CSLB)

The Contractors State License Board (CSLB) regulates **~280,000 active California contractors**. Construction-tech founders, materials suppliers, payment-app vendors, lien-management platforms and insurance brokers use the dataset to:

- **Find every active Class C-10 electrical contractor in Los Angeles County**, every active Class C-36 plumber in Orange County, every active Class B general in the Bay Area.
- **Map contractor density** by ZIP for last-mile sales territory planning.
- **Track newly licensed contractors** — a freshly minted Class B in Sacramento is a perfect-fit prospect for a starter ERP, payment terminal or insurance package.
- **Spot licenses nearing their `expirationDate`** for renewal-cycle marketing.
- **Validate sub-contractors** before a GC adds them to a bid.

#### Real-estate, mortgage, title & insurance underwriting

Insurers, title companies and lenders use license validity as an underwriting signal:

- **Verify contractor licenses** before bonding a project or writing a builder's-risk policy.
- **Confirm appraisers, surveyors and engineers** are in `current` standing before relying on their attestations.
- **Adjust pricing automatically** for `delinquent`, `cancelled` or `expired` status.
- **Monitor portfolio risk** — flag insureds whose status flips mid-policy.
- **Geocode by county and ZIP** to feed catastrophe-risk and wildfire-zone models.

#### M&A, due diligence & investor research

Private equity, family offices, search funds and corporate development teams use DCA data when underwriting California acquisitions:

- **Roll-up sourcing** — every dental practice, every CPA firm, every Class B GC by county becomes a structured target list.
- **Pre-LOI verification** — confirm the target's listed principals actually hold the licenses they claim.
- **Continuity diligence** — for healthcare or contractor targets, check the responsible practitioner / qualifier has a clean status.
- **Market-sizing models** — count active licensees by category to back into TAM for a SaaS thesis.
- **Post-close monitoring** — watch the portfolio company's roster monthly for status drift.

#### Recruiting, sales-ops & talent sourcing

Recruiting platforms and outbound sales-ops teams use the dataset as a structured CA "people directory" for licensed professions:

- Build candidate pipelines for CPA firms, hospital systems, dental DSOs, contractor roll-ups and law-adjacent (court reporter) businesses.
- Match LinkedIn profiles against authoritative license data for outreach trust signals.
- Enrich ATS records with current credential status.
- Identify newly licensed professionals as warm leads for first-job recruiting.

#### Academic, public-health & policy research

Universities, state agencies and think tanks use DCA bulk data to:

- **Quantify healthcare-worker supply** by county, ZIP and license type.
- **Map healthcare deserts** — counties with low RN-per-capita, low RPh-per-capita or low MD-per-capita ratios.
- **Track licensure pipelines** over time as DCA refreshes monthly.
- **Study disciplinary patterns** — combine with each board's separate disciplinary roster.
- **Inform workforce-policy proposals** with hard empirical data.
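
Per-capita comparisons need a population denominator that DCA does not publish. A sketch computing licensees per 10,000 residents by county, assuming you supply your own county population figures (for example from US Census estimates) keyed by the same county spelling the dataset uses; `per_10k_by_county` and the per-10,000 scale are choices made for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def per_10k_by_county(records: Iterable[Dict[str, Optional[str]]],
                      population: Dict[str, int]) -> Dict[str, float]:
    """Licensees per 10,000 residents for each county in `population`.

    County names must match the dataset's spelling exactly; records with no
    county (e.g. out-of-state licensees) are ignored.
    """
    counts = Counter(rec["county"] for rec in records if rec.get("county"))
    return {
        county: round(counts.get(county, 0) * 10_000 / pop, 2)
        for county, pop in population.items()
        if pop > 0
    }
```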

#### Investigative journalism & data reporting

Reporters covering healthcare, construction, finance, beauty industry and consumer protection use DCA data to:

- **Verify credentials of profile subjects** before publication.
- **Map the geography of trades** — pharmacy deserts, contractor concentration in fire-rebuild zones, CPA density in tax-prep season stories.
- **Cross-reference government contractor awards** against CSLB records to spot mismatches.
- **Build interactive maps** of licensed professionals for public-interest reporting.

#### Legal discovery & expert-witness vetting

Plaintiff and defense firms use the dataset to:

- **Confirm expert credentials** before engagement.
- **Build chronologies** of an individual's license history when combined with archived runs.
- **Identify every active dentist / pharmacist / contractor at a given address** for litigation discovery.
- **Validate party allegations** about license status during pleadings.

***

### Sample Queries & Recipes

#### Recipe 1 — Every active CA pharmacist in Los Angeles County

```json
{
  "boards": ["Board of Pharmacy"],
  "licenseTypes": ["Registered Pharmacist"],
  "licenseStatus": "current",
  "counties": ["Los Angeles"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```

#### Recipe 2 — All active CSLB Class B general contractors in Southern California

```json
{
  "boards": ["Contractors State License Board"],
  "licenseTypes": ["Class B"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "Orange", "San Diego", "Riverside", "San Bernardino", "Ventura"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```

#### Recipe 3 — Every active Bay Area physician (Medical Board of California)

```json
{
  "boards": ["Medical Board of California"],
  "licenseTypes": ["Physician and Surgeon"],
  "licenseStatus": "current",
  "counties": ["San Francisco", "San Mateo", "Alameda", "Santa Clara", "Contra Costa", "Marin"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```

#### Recipe 4 — Active RNs in the Sacramento metro for travel-nursing recruiting

```json
{
  "boards": ["Board of Registered Nursing"],
  "licenseTypes": ["Registered Nurse"],
  "licenseStatus": "current",
  "counties": ["Sacramento", "Placer", "El Dorado", "Yolo"],
  "stateFilter": "CA",
  "maxRecords": 50000
}
```

#### Recipe 5 — Every active CPA firm in California (Board of Accountancy)

```json
{
  "boards": ["Board of Accountancy"],
  "licenseTypes": ["Public Accountancy Corporation", "Partnership"],
  "licenseStatus": "current",
  "stateFilter": "CA",
  "maxRecords": 0
}
```

#### Recipe 6 — Dentists in Long Beach and Oakland for DSO M\&A sourcing

```json
{
  "boards": ["Dental Board of California"],
  "licenseTypes": ["Dentist"],
  "licenseStatus": "current",
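  "counties": ["Los Angeles", "Alameda"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```

The `counties` filter works at county level, so this returns dentists across all of Los Angeles and Alameda counties. Narrow to Long Beach and Oakland on the `city` field downstream. Setting `maxRecords` to `0` lifts the 1,000-record default.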

#### Recipe 7 — Compliance sweep: every delinquent or cancelled licensee statewide

```json
{
  "licenseStatus": "delinquent",
  "stateFilter": "CA",
  "maxRecords": 0
}
```

`licenseStatus` takes a single value per run, so run the same input again with `cancelled` and concatenate the two datasets downstream.
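The two-run sweep can be scripted. This is a minimal sketch and is not part of the actor. `sweep_statuses` is a hypothetical helper that accepts any function returning a run's dataset items, which lets you test it offline. The commented wiring uses the same `ApifyClient` calls as the API examples further down. The helper de-duplicates on `(agencyCode, licenseNumber)` in case the two runs overlap.

```python
from typing import Callable, Iterable, List

# Statuses to sweep; `licenseStatus` is a single string, so each one is its own run.
DEFAULT_STATUSES = ("delinquent", "cancelled")


def sweep_statuses(
    run_actor: Callable[[dict], List[dict]],
    statuses: Iterable[str] = DEFAULT_STATUSES,
) -> List[dict]:
    """Run the actor once per status and concatenate the dataset items.

    Rows are de-duplicated on (agencyCode, licenseNumber), keeping the
    first occurrence, in case the same license shows up in two runs.
    """
    combined: List[dict] = []
    seen = set()
    for status in statuses:
        run_input = {"licenseStatus": status, "stateFilter": "CA", "maxRecords": 0}
        for item in run_actor(run_input):
            key = (item.get("agencyCode"), item.get("licenseNumber"))
            if key not in seen:
                seen.add(key)
                combined.append(item)
    return combined


# Wiring to the official client (needs an API token and network access):
#
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
#
# def run_actor(run_input):
#     run = client.actor("haketa/california-dca-license-scraper").call(run_input=run_input)
#     return list(client.dataset(run["defaultDatasetId"]).iterate_items())
#
# rows = sweep_statuses(run_actor)
```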

#### Recipe 8 — Cosmetology establishments in Fresno County

```json
{
  "boards": ["Board of Barbering and Cosmetology"],
  "licenseTypes": ["Establishment"],
  "licenseStatus": "current",
  "counties": ["Fresno"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```

#### Recipe 9 — Quick 50-record sample for a new pipeline build

```json
{
  "boards": ["Board of Pharmacy"],
  "maxRecords": 50
}
```

#### Recipe 10 — Out-of-state mail-order pharmacies licensed in California

```json
{
  "boards": ["Board of Pharmacy"],
  "licenseTypes": ["Nonresident Pharmacy"],
  "licenseStatus": "current",
  "stateFilter": "",
  "maxRecords": 0
}
```

Leave `stateFilter` empty (or omit it) and the run includes pharmacies physically located outside CA but holding a CA permit.

***

### Integration Examples

#### Google Sheets (via Apify Integration)

1. Set up an Apify **schedule** running the actor on the 5th of each month at 06:00 PT (DCA's bulk drop usually lands in the first week).
2. Attach the **"Export to Google Sheets"** integration to the schedule.
3. Receive a fresh CA license tab in your Sheet every month — ready for filtering, pivoting, or distribution to sales reps.

#### Make.com / Zapier / n8n

Use the **Apify** native connector. Trigger downstream automations on:

- New records (current run minus previous run) → push to Slack or a CRM.
- Status changes (`current` → `cancelled`) → open a Salesforce Case.
- Address changes (relocations) → update HubSpot Company records.
- Newly issued licenses by category → trigger an outbound email cadence.
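Each of these triggers starts from a diff of two monthly snapshots. The sketch below makes a few assumptions:

* You already have the dataset items from the previous run and the current run as lists of dicts.
* Licenses are keyed on `(agencyCode, licenseNumber)`.
* `addressLine1` may be missing, because it is not listed in the output schema below. For that reason every field is read with `.get()`.

The function name and return shape are illustrative only. Licenses that disappear between runs are not reported.

```python
from typing import Dict, List, Tuple

# Address fields compared for relocation; addressLine1 may be absent from items.
ADDRESS_FIELDS = ("addressLine1", "city", "county", "state", "zip")


def license_key(item: dict) -> Tuple:
    """Identify one license across runs."""
    return (item.get("agencyCode"), item.get("licenseNumber"))


def diff_runs(previous: List[dict], current: List[dict]) -> Dict[str, list]:
    """Compare two snapshots: new licenses, status changes, relocations."""
    prev = {license_key(i): i for i in previous}
    curr = {license_key(i): i for i in current}

    # Sort keys (as strings, to tolerate None parts) for stable output.
    new = [curr[k] for k in sorted(curr.keys() - prev.keys(), key=str)]
    status_changes, relocations = [], []
    for key in sorted(curr.keys() & prev.keys(), key=str):
        old, now = prev[key], curr[key]
        if old.get("licenseStatus") != now.get("licenseStatus"):
            status_changes.append((key, old.get("licenseStatus"), now.get("licenseStatus")))
        if any(old.get(f) != now.get(f) for f in ADDRESS_FIELDS):
            relocations.append(key)
    return {"new": new, "status_changes": status_changes, "relocations": relocations}
```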

#### Power BI / Tableau / Looker / Metabase

Add Apify's REST API as a data source. Refresh on schedule. Build dashboards covering:

- Active licensee count by metro, county, ZIP, board.
- CSLB contractor density per neighbourhood.
- Healthcare-worker supply heat maps (RN, RPh, MD, DDS) per California county.
- Monthly churn (newly cancelled vs. newly issued) by board.
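Most of these tiles come down to grouped counts. The sketch below assumes the items are already loaded from the dataset. It lower-cases the status before comparing, because the output schema documents capitalised values (`Current`) while the input filter uses `current`. `active_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Sequence


def active_counts(items: Iterable[dict], by: Sequence[str] = ("agencyName", "county")) -> Counter:
    """Count licenses whose status is 'current', grouped by the given fields."""
    counts: Counter = Counter()
    for item in items:
        # Compare case-insensitively: "Current" and "current" both count.
        if str(item.get("licenseStatus", "")).lower() == "current":
            counts[tuple(item.get(field) for field in by)] += 1
    return counts
```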

#### Postgres / Snowflake / BigQuery / Redshift

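Use the [Apify webhook integration](https://docs.apify.com/platform/integrations/webhooks) to notify a warehouse ingestion endpoint when a run succeeds. The webhook payload identifies the run, and the endpoint then loads the items from the run's default dataset. Suggested table layout: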

```sql
CREATE TABLE ca_dca_licenses (
    agency_code           text,
    agency_name           text,
    license_type_code     text,
    license_type_name     text,
    license_number        text,
    individual_or_org     text,
    last_name             text,
    first_name            text,
    middle_name           text,
    suffix                text,
    organization_name     text,
    address_line1         text,
    address_line2         text,
    city                  text,
    county                text,
    state                 text,
    zip                   text,
    country               text,
    original_issue_date   date,
    expiration_date       date,
    license_status        text,
    license_category      text,
    scraped_at            timestamptz,
    PRIMARY KEY (agency_code, license_number, scraped_at)
);
```
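The dataset emits camelCase field names, while the table above uses snake_case columns. This minimal mapping sketch assumes each column is the snake_case form of an output field, which holds for every field in the output schema. Any column with no matching field in an item loads as `NULL`. Date strings pass through unchanged, so check their format before casting to `date`. Both helpers are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Zero-width position between a lowercase letter/digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_column(field: str) -> str:
    """Convert an output field name like 'addressLine1' to 'address_line1'."""
    return _BOUNDARY.sub("_", field).lower()


def to_row(item: dict, columns: List[str]) -> Dict[str, Optional[object]]:
    """Map one dataset item onto the table's columns; missing fields become None."""
    snake = {to_column(k): v for k, v in item.items()}
    return {col: snake.get(col) for col in columns}
```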

#### Salesforce / HubSpot / Pipedrive CRM enrichment

Trigger an Apify run monthly, then upsert against Account / Contact records keyed on `agency_code + license_number`. Status-change events can create Tasks, open Cases, or post to a #compliance Slack channel automatically.

#### Webhooks & event triggers

Have the built-in Apify webhook call an HTTP endpoint when each run finishes; the endpoint then fetches that run's dataset items. Use the `licenseCategory` field to route healthcare records to a credentialing service and engineering records to a contractor-onboarding service from the same run.
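A receiver for that webhook might look like the sketch below. It rests on two assumptions:

* The webhook uses Apify's default payload template, in which `resource` holds the run object, including `defaultDatasetId`.
* `licenseCategory` values follow the output schema (`healthcare`, `engineering`, `business`).

The `uncategorized` bucket is a hypothetical fallback.

```python
from collections import defaultdict
from typing import Dict, List


def dataset_id_from_webhook(payload: dict) -> str:
    """Read the run's dataset ID, assuming the default payload template."""
    return payload["resource"]["defaultDatasetId"]


def route_by_category(items: List[dict]) -> Dict[str, List[dict]]:
    """Group dataset items by licenseCategory for per-service delivery."""
    routes: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        # Records without a category go to a fallback bucket instead of being dropped.
        routes[item.get("licenseCategory") or "uncategorized"].append(item)
    return dict(routes)
```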

#### Esri ArcGIS / Mapbox / Kepler

Use `state`, `county`, `city`, `zip`, `addressLine1` and `addressLine2` as the geocode key. Each record becomes a point on a state-wide licensee map. Combine with U.S. Census tract data to study healthcare-access disparities.
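One way to build that key is a single-line address string. This sketch assumes the record carries `addressLine1` and `addressLine2` as named above; neither is listed in the output schema. It leaves `county` out of the string and keeps it for sanity-checking the geocoder's result instead. That split is a design choice, not a requirement of any particular geocoder.

```python
def geocode_query(item: dict) -> str:
    """Build a one-line address ('street, city, state zip') for a geocoder.

    Blank parts are skipped. County is deliberately left out of the string;
    keep it alongside the result to sanity-check the geocoded point.
    """
    street = " ".join(
        part.strip()
        for part in (item.get("addressLine1"), item.get("addressLine2"))
        if part and part.strip()
    )
    state_zip = " ".join(p for p in (item.get("state"), item.get("zip")) if p)
    parts = (street, item.get("city") or "", state_zip)
    return ", ".join(p for p in parts if p)
```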

***

### Major California Metros at a Glance

| Metro Area | Primary Counties | Population | Notable for licensing data |
|---|---|---|---|
| Los Angeles | Los Angeles, Orange | 13.2M | Largest CA healthcare market; ~3,000 pharmacies, 80K+ RNs, 30K+ MDs, ~50K CSLB contractors |
| San Diego | San Diego | 3.3M | Biotech, defense, large hospital systems, dense dental market |
| San Francisco Bay Area | San Francisco, San Mateo, Alameda, Santa Clara, Contra Costa, Marin | 7.7M | Highest CPA density in CA; Kaiser, UCSF, Stanford |
| San Jose | Santa Clara | 2.0M | Engineering-heavy: highest BPELSG licensees per capita in the state |
| Sacramento | Sacramento, Placer, Yolo, El Dorado | 2.4M | State-capital concentration of regulators, government health programs, BRN headquarters |
| Fresno | Fresno, Madera, Tulare | 1.2M | Central Valley healthcare hub, agriculture-adjacent contractor density |
| Long Beach | Los Angeles | 0.5M | Port of Long Beach logistics, dense cosmetology and dental markets |
| Oakland | Alameda | 0.4M | Kaiser HQ, dense BBC and BBS markets |
| Riverside / San Bernardino (Inland Empire) | Riverside, San Bernardino | 4.7M | Booming residential construction → CSLB density |
| Bakersfield | Kern | 0.9M | Oil-and-gas adjacent engineering and contractor licenses |
| Anaheim | Orange | 0.4M | Hospitality, dental, cosmetology |
| Santa Ana | Orange | 0.3M | Healthcare, dental, contractors |
| Stockton | San Joaquin | 0.3M | Logistics-driven CSLB activity |
| Modesto | Stanislaus | 0.6M | Central Valley healthcare and ag-services contractors |

***

### Cost & Performance

| Metric | Value |
|---|---|
| Engine | Box Shared Items API + Playwright (Chromium) fallback for downloads |
| Runtime (single small board, e.g. Pharmacy) | ~1–3 minutes |
| Runtime (large board, e.g. CSLB, BRN, BBC) | ~5–15 minutes per file |
| Runtime (full bulk pull, all 36 boards) | 30–90 minutes, dominated by ZIP downloads |
| Cost per run | Scales with records delivered at the listed $0.50 per 1,000 results (Store discounts vary by plan): a 50-record sample costs a few cents, while a full ~3.3M-record pull comes to roughly $1,650 at that rate |
| Pricing model | Pay-per-event (transparent line-item billing on Apify) |
| Data freshness | **Monthly** — DCA refreshes the Box folder roughly once a month |
| Auth required | None (Box folder is public) |
| Proxy required | No — supported but not needed |
| Concurrency | Safe to run multiple board-scoped configurations in parallel |
| Memory footprint | 2 GB minimum, 8 GB recommended for full-board pulls due to ZIP extraction |
| Storage temp footprint | ZIP extraction writes to `/tmp/dca_extract_<ts>/` and cleans up after parsing |

***

### Compliance, Privacy & Legal Notes

- **Public data only.** Every field in this dataset is published by the California Department of Consumer Affairs at [data.dca.ca.gov](https://data.dca.ca.gov/) and the public Box folder under the California Public Records Act (Gov. Code §§ 7920.000 et seq.).
- **No PHI.** The dataset contains no patient health information; it is licensing data, not clinical data. HIPAA does not apply.
- **No SSNs, no DOBs, no financial accounts.** Only public license-related information is published.
- **Addresses are the address of record reported to DCA**, typically a business or practice address. For solo practitioners and small contractors the address of record can be a home address; data consumers must apply judgement before mailing or door-knocking.
- **No email addresses, and phone numbers are rare.** DCA does not publish licensee emails in the bulk file; a phone number is occasionally present for facility-type records.
- **CCPA / GDPR** — California licensing data is on the public record, but consumer-facing use of the data (B2C marketing, profiling) must comply with the CCPA / CPRA, and EU-resident usage must comply with GDPR. Compliance is the responsibility of the data consumer.
- **CAN-SPAM / TCPA.** The dataset does not include emails. If you append emails from other sources, CAN-SPAM applies to commercial messages; if you append phone numbers, TCPA/DNC compliance applies.
- **DCA Terms of Use** — the actor accesses DCA's *intended public publication* on Box (which DCA explicitly distributes for re-use). Do not attempt to use it for unlawful purposes including identity fraud, stalking, harassment, or impersonation.

> **Important:** California license data may not be used as a substitute for the legally required disciplinary lookup on each board's primary verification portal where a board-mandated check is required (e.g. CSLB lien purposes, Joint Commission credentialing). Use this dataset to scale routine workflows; defer to each board's primary verification UI when statute requires it.

***

### Frequently Asked Questions

#### How fresh is the data?

DCA refreshes its Box bulk-data folder roughly **once a month**. The actor downloads the latest available file on each run, so worst-case staleness is the gap between the last DCA publication and your run.

#### Why monthly instead of daily?

DCA's bulk publication cadence is monthly. The boards' own search portals are real-time but rate-limited and CAPTCHA-protected. The actor optimises for **scale** (millions of records cheaply) rather than minute-by-minute freshness; combine with a per-license verification call on critical workflows if you need real-time confirmation.

#### How many records will I get?

A full unfiltered pull across all 36 boards returns roughly **3.3 million records**. The largest contributors are CSLB (~280K active contractors), BRN (~500K nurse licenses, active and historical), BBC (~700K cosmetology licenses including establishments), and the various healthcare boards. Pre-filter heavily for targeted runs.

#### Does the actor need a Box account or login?

No. The folder is a public Box share. The Box API path works anonymously via the `BoxApi: shared_link=…` header; the Playwright path navigates to the public folder URL.

#### Do I need an Apify residential proxy?

No. Typical workloads stay within Box's limits on the public share endpoint. Apify Proxy is supported but not required; enable it only for very heavy parallel scheduling.

#### Why is `maxRecords` defaulted to 1,000?

So a first-time user does not accidentally trigger a multi-million-record pull. Set `maxRecords: 0` for unlimited once you are confident in your filter set.

#### Does this scraper cover California real-estate licenses?

**No.** California real-estate license data is regulated by the **Department of Real Estate (DRE)**, which is *not* part of DCA and publishes its data separately. This actor covers the 36 boards under the DCA umbrella. DRE is on the roadmap as a sibling actor.

#### Does this cover State Bar of California attorney data?

**No.** Attorney licensing is regulated by the **State Bar of California**, which is independent of DCA. This actor does not include attorney records.

#### Does the dataset include disciplinary action history?

The bulk file shows current **license status** (`current`, `delinquent`, `cancelled`, etc.) but does **not** include the full disciplinary action narrative. For full disciplinary text, consult each board's public disciplinary documents, which DCA publishes separately. The `cancelled` and `delinquent` status values are a practical first filter for records that need further review.

#### Can I get NPI numbers for CA healthcare licensees?

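NPI is issued federally by **CMS / NPPES**, not by DCA. To enrich, join license records to the [NPPES NPI Registry](https://npiregistry.cms.hhs.gov/). Matching on `lastName + firstName + state` works but can collide on common names. NPPES records also list state license numbers in their taxonomy entries, so matching on license number is more precise wherever that field is populated.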
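A join on license number can be sketched as follows. The NPPES record shape used here is modelled on the NPI Registry API response and should be checked against a live response: `number` holds the NPI, and each entry in `taxonomies` has `state` and `license`. The helpers are hypothetical.

Every DCA license is a California license, so the code pairs each license number with `CA` rather than with the licensee's mailing state. A bare license number may repeat across boards, so confirm that the NPPES taxonomy matches the board on any match you rely on.

```python
import re
from typing import Dict, List, Optional, Tuple


def norm_license(number: Optional[str]) -> str:
    """Uppercase and strip punctuation/spaces so 'A-12345' and 'a12345' match."""
    return re.sub(r"[^A-Z0-9]", "", (number or "").upper())


def index_nppes(providers: List[dict]) -> Dict[Tuple[str, str], str]:
    """Map (license state, normalised license number) to NPI.

    Assumed provider shape, modelled on the NPI Registry API:
    {"number": "<NPI>", "taxonomies": [{"state": "CA", "license": "..."}]}
    """
    index: Dict[Tuple[str, str], str] = {}
    for provider in providers:
        for taxonomy in provider.get("taxonomies", []):
            key = (taxonomy.get("state"), norm_license(taxonomy.get("license")))
            if key[1]:
                # Keep the first NPI seen for a key; collisions need manual review.
                index.setdefault(key, provider.get("number"))
    return index


def attach_npi(items: List[dict], index: Dict[Tuple[str, str], str]) -> List[dict]:
    """Return copies of DCA items with an 'npi' field (None when unmatched).

    The state is always 'CA' because DCA issues California licenses,
    regardless of the licensee's mailing address.
    """
    return [
        {**item, "npi": index.get(("CA", norm_license(item.get("licenseNumber"))))}
        for item in items
    ]
```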

#### Why does my CSLB run take longer than my Pharmacy run?

CSLB's data file is shipped as a large ZIP archive (often 50–150 MB) and contains 280K+ active contractors plus historical records. Extraction + parsing dominates runtime, and is much heavier than the Pharmacy file (~30 MB).

#### Does the actor deduplicate across boards?

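No. A person may legitimately hold licenses from multiple boards (e.g. an RN who is also a pharmacist, or a contractor who is also an architect), and each board's record is preserved. The pair `(agencyCode, licenseNumber)` identifies one license, so deduplicate on it when you concatenate several runs. The output has no cross-board person ID; collapsing to one row per person requires your own matching on name and address.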
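When concatenating monthly runs, one row per license usually means keeping the newest snapshot. This sketch assumes all `scrapedAt` values share one ISO-8601 format, so that comparing them as strings orders them chronologically. `dedupe_licenses` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def dedupe_licenses(items: List[dict]) -> List[dict]:
    """Keep one row per (agencyCode, licenseNumber): the one with the latest scrapedAt.

    Assumes scrapedAt values share one ISO-8601 format, so string comparison
    orders them chronologically. Output keeps first-seen key order.
    """
    latest: Dict[Tuple, dict] = {}
    for item in items:
        key = (item.get("agencyCode"), item.get("licenseNumber"))
        kept = latest.get(key)
        if kept is None or (item.get("scrapedAt") or "") > (kept.get("scrapedAt") or ""):
            latest[key] = item
    return list(latest.values())
```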

#### Are out-of-state licensees included?

Yes — for boards that license out-of-state professionals (e.g. nonresident pharmacies, telehealth physicians, out-of-state contractors). Set `stateFilter: "CA"` to exclude them.

#### What if DCA changes the file format?

The actor fuzzy-matches header names AND falls back to **positional mapping** on the canonical 21-column DCA layout. Past header reformats have not broken the actor. If a future change does, file an issue on the Apify Store page and a patch will follow.
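The general technique, fuzzy header matching with a positional fallback, can be sketched with the standard library's `difflib`. This is an illustration, not the actor's code. `CANONICAL` is a hypothetical six-field subset in an assumed order, because the real 21-column layout is not reproduced here. The 0.8 cutoff is an arbitrary starting point.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical subset of canonical fields, in an assumed column order.
CANONICAL = ["agencyName", "licenseTypeName", "licenseNumber", "lastName", "firstName", "city"]


def _norm(name: str) -> str:
    """Lowercase and drop spaces/punctuation: 'License Number' -> 'licensenumber'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_headers(headers: List[str], cutoff: float = 0.8) -> Dict[str, int]:
    """Map each canonical field to a column index.

    Try fuzzy name matching first; if any field stays unmatched and the row
    is wide enough, fall back to positional mapping in canonical order.
    """
    normalised = [_norm(h) for h in headers]
    mapping: Dict[str, int] = {}
    for field in CANONICAL:
        hit = difflib.get_close_matches(_norm(field), normalised, n=1, cutoff=cutoff)
        if hit:
            mapping[field] = normalised.index(hit[0])
    if len(mapping) < len(CANONICAL) and len(headers) >= len(CANONICAL):
        mapping = {field: i for i, field in enumerate(CANONICAL)}
    return mapping
```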

#### Can I schedule this on the Apify free plan?

Yes. The actor runs on the free tier, though pay-per-event charges still apply to each run. Set a monthly Apify schedule for the 5th–7th of the month.

#### What export formats are supported?

JSON, CSV, Excel (XLSX), HTML, XML, RSS, and JSON Lines — directly from the Apify dataset view or the REST API.
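For scripted downloads, the dataset items endpoint of the Apify API serves the same formats through its `format` query parameter. This minimal sketch only builds the URL; `export_url` is a hypothetical helper. Passing the token in the query string works, but an `Authorization: Bearer` header keeps it out of server logs.

```python
from urllib.parse import urlencode

# Values accepted by the dataset items endpoint's `format` parameter.
FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def export_url(dataset_id: str, fmt: str = "csv", token: str = "") -> str:
    """Build a download URL for a run's dataset in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```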

#### Will this work for other US states?

Not this actor — DCA is California-specific. We maintain separate actors for Texas, Arizona, Washington, Virginia, Colorado, Minnesota, Ohio, Illinois, North Carolina, and federal sources. See **Related Apify Actors** below.

#### How do I report a bug or request a board that is missing?

Open an issue on the actor's Apify Store page or contact the developer directly through the Apify Console. Board additions usually ship within a release cycle.

#### What happens if a board temporarily ships an empty or corrupt file?

The actor logs the failure, skips the file, and continues with the next board. You receive partial output for the boards that succeeded. Re-run after DCA reposts the corrected file.

#### Does the actor write to disk?

Only `/tmp/` for ZIP extraction (immediately cleaned up after parsing). All output goes to the Apify dataset; nothing is persisted locally beyond the run lifetime.

***

### Related Apify Actors by Haketa

If you need licensing data from other US states or related regulatory bodies, the catalog below pairs naturally with this actor:

- [Texas Pharmacy License Scraper — TSBP](https://apify.com/haketa/tsbp-license-scraper) — Texas State Board of Pharmacy
- [Arizona ROC Contractor License Scraper](https://apify.com/haketa/az-roc-contractor-license-scraper) — Arizona Registrar of Contractors
- [Washington L\&I Contractor License Scraper](https://apify.com/haketa/washington-li-contractor-license-scraper) — Washington Department of Labor & Industries
- [NC Licensing Board for General Contractors Scraper](https://apify.com/haketa/nc-licensing-board-for-general-contractors-scraper) — North Carolina general contractors
- [Colorado Professional License Scraper](https://apify.com/haketa/colorado-professional-license-scraper) — Colorado DORA
- [Virginia DPOR Professional License Scraper](https://apify.com/haketa/virginia-dpor-license-scraper) — Virginia Dept. of Professional & Occupational Regulation
- [Minnesota DLI Professional License Scraper](https://apify.com/haketa/minnesota-dli-license-scraper) — Minnesota Dept. of Labor & Industry
- [Ohio eLicense Scraper](https://apify.com/haketa/ohio-elicense-scraper) — Ohio professional licenses
- [Illinois IDFPR License Scraper](https://apify.com/haketa/illinois-idfpr-license-scraper) — Illinois Dept. of Financial & Professional Regulation
- [TTB Alcohol Permittee Scraper](https://apify.com/haketa/ttb-alcohol-permittee-scraper) — federal alcohol permittees
- [SAM.gov Federal Contractor Entity Scraper](https://apify.com/haketa/sam-gov-federal-contractor-scraper) — federal contractor registry
- [BBB Business Scraper](https://apify.com/haketa/bbb-scraper) — Better Business Bureau profiles
- [Care.com Caregiver Scraper](https://apify.com/haketa/care-com-caregiver-scraper) — companion to BBS records for caregiver workforce research
- [WhatClinic.com Clinic Scraper](https://apify.com/haketa/whatclinic-scraper) — global clinic directory data

***

### Comparison vs. Alternatives

| Approach | Setup time | Data freshness | Cost (10K records) | Schema normalisation | Cross-board coverage |
|---|---|---|---|---|---|
| **This actor** | < 1 minute | Monthly bulk drop | ~$5 at the listed $0.50 / 1,000-result rate | Built-in | 36 boards in one tool |
| Manual Box download | 10–20 min/board/month | Monthly | Free | None | Per board, manual |
| Per-board search UI scraping | 4–16 hours dev / board | Real-time but CAPTCHA-gated | Slow + IP-cost | Per board | Build N scrapers |
| Custom Python + Playwright build | 8–16 hours dev | Monthly | Free + infra cost | DIY | DIY |
| Paid PSV verification API | Hours setup | Real-time | $100–500+/mo | Yes | Limited |
| DCA public-records request | Days–weeks | Stale by issue | Free / variable | None | Single response |

***

### Why Pay-Per-Event Pricing?

This actor uses **pay-per-event** pricing rather than a flat monthly subscription or per-Compute-Unit charge:

- You pay only when the actor runs — no idle-month bills.
- Charges scale with the number of records delivered, so a 50-record sample costs about 2.5 cents at the listed $0.50 / 1,000-result rate.
- Transparent, line-item billing inside the Apify console.
- No monthly minimums and no commitment.
- Cheap to evaluate: run a `maxRecords: 50` sample before committing to a full board pull.
- Plays well with monthly cadence — DCA refreshes monthly, so you pay roughly 12 times a year for full freshness.
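To budget a run before launching it, multiply the expected record count by the per-result price. This sketch assumes the listed Store price of $0.50 per 1,000 results is the only billable event. Your plan's discount tier changes the actual rate.

```python
from decimal import Decimal

# Listed Store price per 1,000 results; the actual rate depends on your plan's discount.
LISTED_RATE_PER_1000 = Decimal("0.50")


def estimate_cost(records: int, rate_per_1000: Decimal = LISTED_RATE_PER_1000) -> Decimal:
    """Estimated charge in USD for a run that delivers `records` results."""
    return rate_per_1000 * records / 1000
```

At that rate, a 50-record sample comes to $0.025, 10,000 records to $5, and a full ~3.3M-record pull to about $1,650.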

***

### Changelog

| Version | Date | Notes |
|---|---|---|
| 1.0.0 | 2026-05 | Initial public release — Box folder + Shared Items API ingestion, Playwright download fallback, ZIP extraction, fuzzy + positional header mapping, six-value status enum, per-board / per-county / per-state filtering, category tagging across 36 boards. |

***

### Keywords

California DCA license lookup · California Department of Consumer Affairs scraper · CA professional license verification · CSLB scraper · California Contractors State License Board data · California contractor license search · CSLB Class B general contractor lookup · CSLB Class A general engineering contractor data · California medical board lookup · MBC physician verification · California physician license scraper · BRN nursing license scraper · California registered nurse data · LVN BVNPT license lookup · California pharmacy license data · Board of Pharmacy California scraper · Registered Pharmacist California verification · pharmacy technician California · California dental board scraper · DDS license California · CA real estate appraiser license · California Board of Accountancy CPA lookup · CBA CPA firm directory · California cosmetology license data · Board of Barbering and Cosmetology BBC scraper · California optometrist license data · veterinary license California · LMFT LCSW LPCC California verification · California Board of Behavioral Sciences scraper · CA professional engineer license BPELSG · California architect license search · landscape architect California lookup · California court reporter license · acupuncturist California license · chiropractor California license verification · CA license bulk download · DCA bulk data Box.com · monthly California licensee dataset · California license compliance automation · California credentialing PSV · California license API · CA license CSV download · Los Angeles pharmacist database · San Diego physician database · San Francisco CPA directory · San Jose engineer database · Sacramento nurse directory · Fresno contractor database · Long Beach dental directory · Oakland behavioral sciences directory · California pharmacy real estate accountant license data · CA contractor lead generation · California healthcare workforce research · CA licensed professional B2B leads

***

### Support

- **Bug reports:** open an issue on the actor's Apify Store page.
- **Feature requests / new board additions:** same place — please describe the board, the use case, and link to the source file if known.
- **Direct contact:** reach the developer through the Apify developer profile.

If this actor saves you time, a **5-star rating** on the Apify Store helps other California compliance, recruiting, sales, construction-tech and research teams discover it. Thank you.

# Actor input Schema

## `boards` (type: `array`):

Board/agency names to download. Leave empty for all boards. Examples: 'Board of Pharmacy', 'Board of Registered Nursing', 'Medical Board of California', 'Dental Board of California'. Case-insensitive partial match. See README for full list.

## `licenseTypes` (type: `array`):

License type names to filter. Leave empty for all types. Examples: 'Registered Pharmacist', 'Registered Nurse', 'Physician', 'Dentist'. Case-insensitive partial match.

## `licenseStatus` (type: `string`):

Filter by license status, e.g. `current`, `delinquent` or `cancelled`; `all` applies no status filter.

## `counties` (type: `array`):

Filter by California county. Leave empty for all counties. Examples: 'Los Angeles', 'San Francisco', 'San Diego', 'Orange', 'Sacramento'. Case-insensitive partial match.

## `stateFilter` (type: `string`):

Filter by state. Default 'CA' for California only. Set empty for all states (includes out-of-state licensees).

## `maxRecords` (type: `integer`):

Maximum total records to output. Set 0 for unlimited. Warning: some boards have 500K+ records — set a limit for testing.

## `proxyConfiguration` (type: `object`):

Proxy settings. Box.com public files work without proxy.

## Actor input object example

```json
{
  "boards": [
    "Board of Pharmacy"
  ],
  "licenseTypes": [],
  "licenseStatus": "all",
  "counties": [],
  "stateFilter": "",
  "maxRecords": 1000,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `agencyCode` (type: `string`):

DCA agency/board code

## `agencyName` (type: `string`):

Full board/bureau name

## `licenseTypeCode` (type: `string`):

License type abbreviation

## `licenseTypeName` (type: `string`):

Full license type name

## `licenseNumber` (type: `string`):

License number

## `individualOrOrg` (type: `string`):

Individual or Organization

## `lastName` (type: `string`):

Licensee last name

## `firstName` (type: `string`):

Licensee first name

## `city` (type: `string`):

City

## `county` (type: `string`):

California county

## `state` (type: `string`):

State abbreviation

## `zip` (type: `string`):

ZIP code

## `originalIssueDate` (type: `string`):

Original license issue date

## `expirationDate` (type: `string`):

License expiration date

## `licenseStatus` (type: `string`):

Current/Delinquent/Inactive etc.

## `licenseCategory` (type: `string`):

healthcare/engineering/business

## `scrapedAt` (type: `string`):

ISO timestamp

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "boards": [
        "Board of Pharmacy"
    ],
    "licenseTypes": [],
    "counties": [],
    "stateFilter": "",
    "maxRecords": 1000,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("haketa/california-dca-license-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "boards": ["Board of Pharmacy"],
    "licenseTypes": [],
    "counties": [],
    "stateFilter": "",
    "maxRecords": 1000,
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("haketa/california-dca-license-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "boards": [
    "Board of Pharmacy"
  ],
  "licenseTypes": [],
  "counties": [],
  "stateFilter": "",
  "maxRecords": 1000,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call haketa/california-dca-license-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=haketa/california-dca-license-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "California DCA Professional License Scraper",
        "description": "Download and parse California professional license data from DCA's public Box folder. 3.3M+ active licenses across 36 boards — pharmacy, nursing, medical, dental, engineering and more. Monthly updated bulk data, no browser needed.",
        "version": "0.0",
        "x-build-id": "EZvX3eDXv9p8SXp5B"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/haketa~california-dca-license-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-haketa-california-dca-license-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/haketa~california-dca-license-scraper/runs": {
            "post": {
                "operationId": "runs-sync-haketa-california-dca-license-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/haketa~california-dca-license-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-haketa-california-dca-license-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "boards": {
                        "title": "Boards",
                        "type": "array",
                        "description": "Board/agency names to download. Leave empty for all boards. Examples: 'Board of Pharmacy', 'Board of Registered Nursing', 'Medical Board of California', 'Dental Board of California'. Case-insensitive partial match. See the README for the full list.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "licenseTypes": {
                        "title": "License Types",
                        "type": "array",
                        "description": "License type names to filter. Leave empty for all types. Examples: 'Registered Pharmacist', 'Registered Nurse', 'Physician', 'Dentist'. Case-insensitive partial match.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "licenseStatus": {
                        "title": "License Status",
                        "enum": [
                            "all",
                            "current",
                            "delinquent",
                            "inactive",
                            "cancelled",
                            "expired"
                        ],
                        "type": "string",
                        "description": "Filter by license status.",
                        "default": "all"
                    },
                    "counties": {
                        "title": "Counties",
                        "type": "array",
                        "description": "Filter by California county. Leave empty for all counties. Examples: 'Los Angeles', 'San Francisco', 'San Diego', 'Orange', 'Sacramento'. Case-insensitive partial match.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "stateFilter": {
                        "title": "State Filter",
                        "type": "string",
                        "description": "Filter by state, e.g. 'CA' for California only. Leave empty (the default) to include all states, including out-of-state licensees.",
                        "default": ""
                    },
                    "maxRecords": {
                        "title": "Max Records",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum total records to output. Set 0 for unlimited. Warning: some boards have 500K+ records — set a limit for testing.",
                        "default": 1000
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional proxy settings. The public Box.com files can be downloaded without a proxy."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
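
The `inputSchema` above accepts only strings in `boards`, `licenseTypes`, and `counties`, one of six `licenseStatus` values, and a non-negative integer `maxRecords`. The following is a minimal sketch, assuming you call the REST API directly with Python's standard library instead of the official client. It checks those constraints locally and builds (without sending) the POST request for the `/runs` endpoint, passing the token as the `token` query parameter that the spec declares. The base URL `https://api.apify.com/v2` is Apify's public API root and is not stated in this spec. The helper names `build_input` and `make_run_request` are hypothetical, and the Actor may validate its input differently.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"  # assumed: Apify public API v2 root
ACTOR_PATH = "/acts/haketa~california-dca-license-scraper/runs"

# Allowed values copied from the licenseStatus enum in inputSchema.
LICENSE_STATUSES = {"all", "current", "delinquent", "inactive", "cancelled", "expired"}


def build_input(boards=None, license_types=None, license_status="all",
                counties=None, state_filter="", max_records=1000):
    """Hypothetical helper: return an input dict that satisfies inputSchema."""
    if license_status not in LICENSE_STATUSES:
        raise ValueError(f"licenseStatus must be one of {sorted(LICENSE_STATUSES)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_records, bool) or not isinstance(max_records, int) or max_records < 0:
        raise ValueError("maxRecords must be a non-negative integer (0 = unlimited)")
    payload = {
        "boards": list(boards or []),
        "licenseTypes": list(license_types or []),
        "licenseStatus": license_status,
        "counties": list(counties or []),
        "stateFilter": state_filter,
        "maxRecords": max_records,
    }
    # Array fields in the schema are declared as arrays of strings.
    for key in ("boards", "licenseTypes", "counties"):
        if not all(isinstance(item, str) for item in payload[key]):
            raise ValueError(f"{key} must be a list of strings")
    return payload


def make_run_request(token, payload):
    """Hypothetical helper: build, but do not send, the POST that starts a run."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To start a run, pass the request to `urllib.request.urlopen(...)` and parse the JSON body. This endpoint returns as soon as the run is created, so the body follows `runsResponseSchema` rather than containing the scraped records.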

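A `runsResponseSchema` body describes the run, not its results. A client typically reads the run ID and status, then downloads the records from the run's `defaultDatasetId` once the run has finished. The sketch below assumes Apify's `GET /v2/datasets/{datasetId}/items` endpoint with its `format` and `limit` query parameters and Apify's standard terminal run statuses. None of these appear in this spec, and `summarize_run` and `dataset_items_url` are hypothetical helper names rather than part of this Actor.

```python
import json
import urllib.parse

API_BASE = "https://api.apify.com/v2"  # assumed: Apify public API v2 root

# Assumed: Apify run statuses after which a run no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def summarize_run(response_body):
    """Hypothetical helper: extract key fields from a runsResponseSchema body."""
    data = json.loads(response_body)["data"]
    return {
        "run_id": data["id"],
        "status": data["status"],
        "finished": data["status"] in TERMINAL_STATUSES,
        "dataset_id": data.get("defaultDatasetId"),
        "cost_usd": data.get("usageTotalUsd", 0.0),
    }


def dataset_items_url(dataset_id, token, fmt="json", limit=None):
    """Hypothetical helper: URL for the records stored in the run's default dataset."""
    params = {"token": token, "format": fmt}
    if limit is not None:
        params["limit"] = limit  # cap the download while testing large boards
    return f"{API_BASE}/datasets/{dataset_id}/items?{urllib.parse.urlencode(params)}"
```

Wait until `finished` is true before downloading. A dataset read while the run is still `RUNNING` may contain only part of the records.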