California DCA Professional License Scraper

Pricing

from $0.50 / 1,000 results

Download and parse California professional license data from DCA's public Box folder. 3.3M+ active licenses across 36 boards — pharmacy, nursing, medical, dental, engineering and more. Monthly updated bulk data, no browser needed.

Developer

Haketa

Maintained by Community

California DCA Professional License Scraper — Bulk License Lookup for CSLB, BRN, Medical Board, Pharmacy, Dental, Accountancy & 30+ More Boards

The most comprehensive California Department of Consumer Affairs (DCA) license extractor on Apify. Download structured records for 3.3M+ active California professional licenses across 36 state boards and bureaus — pharmacists, registered nurses, physicians, dentists, contractors, engineers, accountants, real estate appraisers, cosmetologists and more — from DCA's official monthly bulk data drop on Box. No CAPTCHA juggling, no per-search rate limit, no per-board scraper rewrites.


What This Actor Does

The California DCA Professional License Scraper is a production-grade Apify Actor that downloads, normalises, filters, and serves the complete public licensing dataset published by the California Department of Consumer Affairs (DCA) — the umbrella agency that regulates professional licensing for the State of California across health care, building trades, engineering, accounting, personal care, legal support and more.

DCA publishes its license database as a monthly bulk drop on a public Box.com folder (dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9) — one pipe-delimited file per board or bureau. Each file is shipped with a non-trivial header that drifts between boards, sometimes ships zipped, and lives behind a JavaScript-rendered Box page that resists naive HTTP fetchers. This actor handles all of that automatically and returns a single unified JSON schema across every board, so a downstream pipeline does not have to special-case Pharmacy vs. CSLB vs. Medical vs. Dental.

In a typical run you receive structured records covering:

  • Health care licensees — Registered Pharmacists, Pharmacy Technicians, Registered Nurses (BRN), Vocational Nurses (BVNPT), Physicians & Surgeons (MBC), Physician Assistants, Dentists & RDAs (DBC), Optometrists, Psychologists, Veterinarians, Chiropractors, Acupuncturists, Naturopathic Doctors, Podiatric Physicians, Respiratory Therapists, Occupational Therapists, Physical Therapists, Behavioral Sciences (LMFT/LCSW/LPCC), Speech-Language Pathologists, Dietitians, Hearing Aid Dispensers
  • Building & engineering trades — California Contractors State License Board (CSLB) general and specialty contractors, Professional Engineers, Land Surveyors, Professional Geologists, Architects, Landscape Architects
  • Business & finance — California Board of Accountancy (CPAs and CPA firms), Cemetery & Funeral Bureau, real-estate-adjacent appraisers
  • Personal care — Board of Barbering & Cosmetology (BBC) licensees and establishments
  • Legal & specialty — Court Reporters Board, Guide Dogs for the Blind, Athletic Commission, Pilots

Every record carries the issuing agency code, license type, license number, current status, original issue date, expiration date, full mailing address, county, state, ZIP and licensee or organisation name — the building blocks for compliance, sales, credentialing, recruiting, due-diligence, location intelligence and research workflows.


Why scrape California DCA yourself when this exists?

DCA publishes the data freely, but actually consuming it at scale is its own engineering project. Common pain points the actor solves out of the box:

  • Box.com folder is JavaScript-rendered — a plain curl against the share URL returns an empty shell. The actor talks to the Box Shared Items API and falls back to a real Chromium download flow when needed.
  • Per-board file naming drifts — files include spaces, dates, version suffixes and the occasional spelling variant. The actor uses fuzzy keyword matching to find the right file for the board you asked for.
  • Some boards ship ZIPs, some ship raw pipe-delimited text — large files (CSLB, BRN, BBC) arrive as .zip archives. The actor detects the PK magic header, extracts the largest non-summary entry, and returns the underlying text without OOM-ing on multi-hundred-megabyte payloads.
  • Pipe-delimited files with shifting headers — column counts and header spellings vary per board. The actor fuzzy-matches header names, and falls back to positional mapping when DCA temporarily ships a file without the canonical header row.
  • CSLB alone has 280K+ active contractors — at the search UI you would need tens of thousands of paginated lookups; the bulk file solves it in one download.
  • Status fields are inconsistent (CURRENT, Current, current) — the actor normalises everything to lower-case canonical values that drive a clean enum filter.
  • No NPI, no SSN, no DOB — the bulk dataset is licensing-only, so no GLBA/HIPAA exposure if you treat it as the public record it is.
  • No daily update API — DCA only refreshes monthly, so you must redownload everything. The actor makes a fresh pull cheap (no browser-per-search overhead).
  • Address blocks differ — some boards split ADDRESS_LINE_1 / ADDRESS_LINE_2, others ship a single combined block. The actor exposes both fields and never silently drops content.
  • Out-of-state licensees — mail-order pharmacies, telehealth doctors and out-of-state contractors all hold California licenses with non-CA addresses. The actor exposes a stateFilter so you can choose to include or exclude them.
  • No incremental diff — DCA does not publish change-deltas. Schedule the actor monthly, archive the dataset, and diff yourself.

This actor encapsulates roughly eight to sixteen hours of one-off engineering (Box auth quirks, ZIP extraction, header normalisation, positional fallback, status canonicalisation, county filtering) into a single actor run.


Quick Start

One-Click Run (Apify UI)

  1. Open the actor page → click "Try for free".
  2. In the Boards field, type one or more board names (e.g. Board of Pharmacy, Contractors State License Board, Board of Registered Nursing). Leave it empty to attempt every board.
  3. Optionally narrow by License Type, License Status, County or State Filter, then set a sensible Max Records cap (default 1,000 — full bulk runs can return millions).
  4. Hit Start. Within a few minutes your dataset is ready as JSON, CSV, Excel, XML, RSS or HTML.

API Run (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and block until the run finishes.
run = client.actor("haketa/california-dca-license-scraper").call(run_input={
    "boards": ["Board of Pharmacy", "Board of Registered Nursing"],
    "licenseStatus": "current",
    "counties": ["Los Angeles", "San Diego", "Orange"],
    "stateFilter": "CA",
    "maxRecords": 5000,
})

# Stream the dataset items produced by the run.
for record in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        record["agencyName"],
        record["licenseNumber"],
        record.get("lastName") or record.get("organizationName"),
        record.get("city"),  # city is usually, but not always, present
        record["licenseStatus"],
    )

API Run (Node.js / TypeScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/california-dca-license-scraper').call({
    boards: ['Contractors State License Board'],
    licenseStatus: 'current',
    counties: ['Los Angeles', 'Orange', 'Riverside', 'San Bernardino'],
    stateFilter: 'CA',
    maxRecords: 10000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} active SoCal CSLB contractors`);

API Run (cURL)

curl -X POST "https://api.apify.com/v2/acts/haketa~california-dca-license-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "boards": ["Medical Board of California"],
    "licenseStatus": "current",
    "counties": ["San Francisco", "San Mateo", "Alameda", "Santa Clara"],
    "maxRecords": 2000
  }'

API Run (Apify CLI)

apify call haketa/california-dca-license-scraper --input='{
  "boards": ["Board of Accountancy"],
  "licenseStatus": "current",
  "stateFilter": "CA",
  "maxRecords": 0
}'


How It Works

Source of truth

DCA's public bulk-data folder lives at:

https://dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9

It contains one file per board or bureau, refreshed roughly every month. Most files are pipe-delimited (|) text; a handful (notably CSLB, BRN, BBC) are shipped as ZIP archives containing the same pipe-delimited data plus a small Counts / summary file.
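The ZIP-versus-text split described above can be sketched in Python as follows. This is a minimal illustration, not the actor's own code: the actor itself is described as using adm-zip and extracting through disk, whereas this sketch reads the archive in memory for brevity. The function name `extract_license_text` and the skip list are assumptions made for the example.

```python
import io
import zipfile

# Entry names containing these hints are treated as summary files (assumed list).
SKIP_HINTS = ("count", "summary", "readme")


def extract_license_text(payload: bytes) -> str:
    """Return pipe-delimited text from a raw download, unzipping when needed."""
    # ZIP archives start with the two-byte "PK" magic header.
    if payload[:2] != b"PK":
        return payload.decode("utf-8", errors="replace")

    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        candidates = [
            info
            for info in archive.infolist()
            if not info.is_dir()
            and not any(hint in info.filename.lower() for hint in SKIP_HINTS)
        ]
        if not candidates:
            raise ValueError("archive contains no data entry")
        # The largest remaining entry is assumed to be the data file.
        largest = max(candidates, key=lambda info: info.file_size)
        return archive.read(largest).decode("utf-8", errors="replace")
```

For multi-hundred-megabyte archives, extracting to a temporary directory and streaming the file line by line would be the safer variant.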

Engineering pipeline

| Stage | Technique | Notes |
| --- | --- | --- |
| 1. Folder listing | Box Shared Items API with the BoxApi: shared_link=… header | Returns each file's id, name, size. Falls back to parsing Box.prefetchedData embedded in the share page, then to regex extraction of {type:"file"} blobs. |
| 2. File matching | Fuzzy keyword match against the file name | Strips board of / bureau of / committee from the user's input and matches the first remaining token against the file name. |
| 3. Download | Playwright (Chromium, headless) navigating the Box folder UI | Box's download endpoint is signed, single-use, and JS-gated; a real browser session is the only reliable extraction path. Cookie consent banners are dismissed automatically. |
| 4. Format detection | First two bytes of the buffer | 0x50 0x4B triggers ZIP extraction via adm-zip; everything else is treated as UTF-8 text. |
| 5. ZIP extraction | adm-zip extractEntryTo to /tmp/dca_extract_<ts>/ | Picks the largest entry that is not *count* / *summary* / *readme*; that is consistently the actual data file. |
| 6. Header detection | Pipe / tab / comma sniff on first row | Real DCA files use the pipe character (`\|`). |
| 7. Header mapping | Fuzzy match on AGENCY_CODE, LICENSE_NUMBER, ORG/LAST_NAME, ZIP_CODE etc. | Falls back to positional mapping (canonical 21-column DCA layout) when fewer than five header tokens are recognised. |
| 8. Record normalisation | Trim, strip wrapping quotes, null-coalesce empties | Output schema is identical across every board. |
| 9. Category tagging | Keyword-based board → category mapping | Buckets each record into healthcare, engineering, business, personal_care, legal or other. |
| 10. Filtering | Board / license-type / status / county / state in-memory passes | All filters are case-insensitive partial matches except state, which is an exact uppercase compare. |
| 11. Dataset push | Dataset.pushData(record) | Progress is logged every 500 records. Run terminates cleanly when maxRecords is hit. |
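Stages 6 to 8 (delimiter parsing, header mapping with a positional fallback, and value normalisation) might look roughly like the Python sketch below. It is a simplified illustration under stated assumptions: the alias table and the six-column positional layout are hypothetical stand-ins for the actor's fuzzy matcher and its 21-column layout, which the source does not spell out. Only the threshold of five recognised header tokens comes from the table above.

```python
from __future__ import annotations

import csv
import io

# Hypothetical alias table: output field -> accepted header spellings (squashed).
ALIASES = {
    "agencyCode": {"AGENCYCODE", "AGENCY"},
    "licenseNumber": {"LICENSENUMBER", "LICENSENO"},
    "lastName": {"LASTNAME", "LNAME"},
    "city": {"CITY"},
    "zip": {"ZIPCODE", "ZIP"},
    "licenseStatus": {"LICENSESTATUS", "STATUS"},
}
# Hypothetical positional layout, used only when no header row is recognised.
POSITIONAL = ["agencyCode", "licenseNumber", "lastName", "city", "zip", "licenseStatus"]
MIN_RECOGNISED = 5  # threshold stated in the pipeline table


def squash(token: str) -> str:
    """Upper-case a header token and keep only letters and digits."""
    return "".join(ch for ch in token.upper() if ch.isalnum())


def map_header(first_row: list[str]) -> tuple[list[str | None], bool]:
    """Return (field per column, True if first_row is a real header row)."""
    lookup = {alias: field for field, names in ALIASES.items() for alias in names}
    fields = [lookup.get(squash(token)) for token in first_row]
    if sum(field is not None for field in fields) >= MIN_RECOGNISED:
        return fields, True
    padding = [None] * max(0, len(first_row) - len(POSITIONAL))
    return POSITIONAL[: len(first_row)] + padding, False


def parse(text: str) -> list[dict[str, str | None]]:
    """Parse pipe-delimited text into records with trimmed, null-coalesced values."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="|") if row]
    if not rows:
        return []
    fields, has_header = map_header(rows[0])
    records = []
    for row in rows[1:] if has_header else rows:
        record = {}
        for field, value in zip(fields, row):
            if field is not None:
                record[field] = value.strip().strip('"') or None
        records.append(record)
    return records
```

The same `parse` call handles both a file with a header row and a headerless file, which is the behaviour the positional fallback is meant to provide.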

Notable behaviours

  • Memory: the actor is configured with 2 GB minimum, 8 GB maximum because large boards (CSLB, BRN, BBC) ship 50–250 MB files; ZIP extraction goes through disk (/tmp/) rather than RAM to keep the runtime stable.
  • Proxy: Apify Proxy is supported but not required — Box's public share endpoints do not rate-limit modest crawler traffic from datacentre IPs. Enable proxy only if you are running many parallel jobs from the same Apify region.
  • Playwright fallback: if Playwright cannot be imported (e.g. on a stripped-down build), the actor falls back to direct file URL patterns such as https://data.dca.ca.gov/DCALicenseData/<Board>.csv for boards that happen to expose them.
  • Deduplication: DCA's files are already keyed by license number; the actor does not collapse cross-board duplicates because a person can legitimately hold licenses across multiple boards (e.g. an RN who is also an RPh).

Input Parameters

{
  "boards": ["Board of Pharmacy", "Board of Registered Nursing"],
  "licenseTypes": ["Registered Pharmacist", "Registered Nurse"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "San Diego", "Orange"],
  "stateFilter": "CA",
  "maxRecords": 5000,
  "proxyConfiguration": { "useApifyProxy": true }
}

Parameter reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| boards | array<string> | ["Board of Pharmacy"] (prefilled) | Board / bureau names to download. Case-insensitive partial match against the file name. Leave empty to attempt every file in the Box folder. Examples: Board of Pharmacy, Medical Board of California, Contractors State License Board, Board of Accountancy, Dental Board of California, Board of Barbering and Cosmetology. |
| licenseTypes | array<string> | [] | Substring filter on licenseTypeName. Examples: Registered Pharmacist, Pharmacy Technician, Registered Nurse, Vocational Nurse, Physician and Surgeon, Certified Public Accountant, Class A General Engineering Contractor. Empty array = no type filter. |
| licenseStatus | string enum | all | One of all, current, delinquent, inactive, cancelled, expired. See the Status Reference table below for the meaning of each value. |
| counties | array<string> | [] | Filter by California county. Case-insensitive partial match. Examples: Los Angeles, San Francisco, San Diego, Orange, Sacramento, Alameda, Santa Clara, Riverside, San Bernardino, Fresno. |
| stateFilter | string | "" (empty = all states) | Two-letter state code. Set to CA to exclude out-of-state mail-order, telehealth or out-of-state contractor licensees. |
| maxRecords | integer | 1000 | Cap total output across all boards. 0 = unlimited. The default 1,000 is a sane testing limit because a single full bulk pull can return millions of records and consume non-trivial compute. |
| proxyConfiguration | object | { "useApifyProxy": true } (prefilled) | Standard Apify proxy config. Optional: Box does not rate-limit the share endpoint for typical workloads. |
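The filter semantics in the table (case-insensitive partial matches everywhere except the state, which is an exact uppercase comparison) can be approximated with the Python sketch below. The function name `matches` is hypothetical, and matching `boards` against the record's agencyName is an assumption made so the example works on output records; the actor itself is described as matching board names against file names.

```python
def matches(record, *, boards=(), license_types=(), status="all", counties=(), state=""):
    """Return True when a record passes every supplied filter."""

    def contains_any(value, needles):
        # An empty filter list means "no filter".
        if not needles:
            return True
        haystack = (value or "").lower()
        return any(needle.lower() in haystack for needle in needles)

    if not contains_any(record.get("agencyName"), boards):
        return False
    if not contains_any(record.get("licenseTypeName"), license_types):
        return False
    if not contains_any(record.get("county"), counties):
        return False
    # "all" disables the status filter; other values are partial matches.
    if status != "all" and not contains_any(record.get("licenseStatus"), [status]):
        return False
    # State is the one exact comparison, done in upper case.
    if state and (record.get("state") or "").upper() != state.upper():
        return False
    return True
```

Because the county filter is a partial match, a value such as "San" would match San Diego, San Francisco and San Mateo alike, so full county names are the safer input.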

Output Schema

Every record uses the same flat JSON schema across every board so downstream consumers do not need per-board branching. Both individual and organisation records share the same envelope; organisation-only or individual-only fields are simply null where they do not apply.

Field reference

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| agencyCode | string | yes | DCA-issued short board / bureau code (e.g. PHA, RN, MBC, DBC, CBA). |
| agencyName | string | yes | Full board / bureau name as published by DCA (e.g. Board of Pharmacy, Medical Board of California). |
| licenseTypeCode | string | usually | Short type code (e.g. RPH, RN, PHY, DDS, CPA, B). |
| licenseTypeName | string | usually | Full license-type label (e.g. Registered Pharmacist, Class B General Building Contractor). |
| licenseNumber | string | yes | Board-issued license / registration number. Not always numeric; some boards prefix the type code. |
| individualOrOrg | string | usually | I for individuals, O for organisations (firms, facilities, partnerships). |
| lastName | string \| null | individuals | Family name for individual licensees. |
| firstName | string \| null | individuals | Given name for individual licensees. |
| middleName | string \| null | individuals | Middle name / initial. |
| suffix | string \| null | individuals | Suffix such as JR, SR, II, MD, DO. |
| organizationName | string \| null | organisations | DBA / facility / firm name. |
| addressLine1 | string | usually | First line of mailing address as reported to DCA. |
| addressLine2 | string \| null | sometimes | Suite, unit, building. |
| city | string | usually | City. |
| county | string \| null | CA-resident records | California county. null for out-of-state licensees. |
| state | string | usually | Two-letter US state abbreviation. |
| zip | string | usually | 5- or 9-digit ZIP. |
| country | string | usually | Country abbreviation (USA for almost all CA licensees). |
| originalIssueDate | string | usually | Original license issue date. |
| expirationDate | string | usually | Current expiration date. |
| licenseStatus | string | yes | Status; see Status Reference below. |
| licenseCategory | string | yes | Derived bucket: healthcare, engineering, business, personal_care, legal, other. |
| scrapedAt | string (ISO-8601) | yes | UTC timestamp when this record was emitted. |

Example — Registered Pharmacist (Board of Pharmacy)

{
  "agencyCode": "PHA",
  "agencyName": "Board of Pharmacy",
  "licenseTypeCode": "RPH",
  "licenseTypeName": "Registered Pharmacist",
  "licenseNumber": "99999",
  "individualOrOrg": "I",
  "lastName": "RODRIGUEZ",
  "firstName": "MARIA",
  "middleName": "T",
  "suffix": null,
  "organizationName": null,
  "addressLine1": "350 S GRAND AVE",
  "addressLine2": "STE 1200",
  "city": "Los Angeles",
  "county": "Los Angeles",
  "state": "CA",
  "zip": "90071",
  "country": "USA",
  "originalIssueDate": "2014-05-22",
  "expirationDate": "2026-09-30",
  "licenseStatus": "current",
  "licenseCategory": "healthcare",
  "scrapedAt": "2026-05-16T09:00:00.000Z"
}

Example — CSLB general contractor (Contractors State License Board)

{
  "agencyCode": "CSLB",
  "agencyName": "Contractors State License Board",
  "licenseTypeCode": "B",
  "licenseTypeName": "Class B - General Building Contractor",
  "licenseNumber": "999999",
  "individualOrOrg": "O",
  "lastName": null,
  "firstName": null,
  "middleName": null,
  "suffix": null,
  "organizationName": "PACIFIC COAST BUILDERS INC",
  "addressLine1": "1450 HARBOR BLVD",
  "addressLine2": null,
  "city": "San Diego",
  "county": "San Diego",
  "state": "CA",
  "zip": "92101",
  "country": "USA",
  "originalIssueDate": "2009-11-04",
  "expirationDate": "2026-11-30",
  "licenseStatus": "current",
  "licenseCategory": "engineering",
  "scrapedAt": "2026-05-16T09:00:00.000Z"
}
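Because individual and organisation records share one envelope, a consumer usually needs a single display name per record. The helper below is one possible way to derive it in Python; `display_name` is a hypothetical name, not something the actor provides.

```python
def display_name(record: dict) -> str:
    """Return the organisation name for 'O' records, else the assembled person name."""
    organisation = record.get("organizationName")
    if record.get("individualOrOrg") == "O" and organisation:
        return organisation
    parts = [
        record.get("firstName"),
        record.get("middleName"),
        record.get("lastName"),
        record.get("suffix"),
    ]
    person = " ".join(part for part in parts if part)
    # Fall back to the organisation name when no person fields are populated.
    return person or organisation or ""
```

Applied to the two example records above, this yields MARIA T RODRIGUEZ and PACIFIC COAST BUILDERS INC respectively.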

License Status Reference

DCA boards do not share a single status vocabulary. The actor normalises everything to lower-case canonical values, then maps them to the five statuses below. The licenseStatus input enum adds all as a sixth value, meaning no status filter.

Statuses that signal the licensee may legally operate

| Status | Meaning |
| --- | --- |
| current | License is active, paid up, and in good standing. The default for almost every working California professional. |

Statuses that mean the licensee may NOT operate

| Status | Meaning |
| --- | --- |
| delinquent | Renewal payment lapsed; licensee has a grace window to cure but cannot legally practice in the meantime. |
| inactive | Voluntarily placed in an inactive bucket; common for retired CPAs, snow-bird physicians, and pharmacists in non-practising roles. |
| cancelled | License terminated administratively or by board action. |
| expired | License expired and was not renewed within the cure window. |

Tip: Use licenseStatus: "current" to receive only practising licensees. The bulk file also contains historical statuses, so a request without a status filter returns the broader population for trend analytics.
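For consumers that ingest raw DCA values themselves, the status canonicalisation and the may-operate split above can be sketched in Python as follows. The variant table is a hypothetical example of a spelling alias, and passing unknown values through unchanged is an assumption; the source does not say how the actor treats them.

```python
CANONICAL = ("current", "delinquent", "inactive", "cancelled", "expired")

# Hypothetical spelling variants folded into the canonical vocabulary.
VARIANTS = {"canceled": "cancelled"}


def normalise_status(raw):
    """Lower-case and trim a raw status; unknown values pass through unchanged."""
    value = " ".join((raw or "").split()).lower()
    return VARIANTS.get(value, value)


def may_operate(status) -> bool:
    """Only 'current' signals that the licensee may legally operate."""
    return normalise_status(status) == "current"
```

With this, CURRENT, Current and current all collapse to the same value, and anything outside `CANONICAL` can be logged for review rather than silently bucketed.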


DCA Boards & Bureaus Covered

The Box folder ships one file per board / bureau. The actor's category-tagger maps each board to a high-level bucket so downstream stacks can group records without parsing names.
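A keyword-based tagger of the kind described here could be sketched in Python as below. The keyword lists are assumptions chosen to cover the board names in the tables that follow; the actor's real keyword set is not published in this document, so treat the mapping as illustrative only.

```python
# Hypothetical keyword lists; checked in order, first hit wins.
CATEGORY_KEYWORDS = [
    ("healthcare", ("pharmacy", "nursing", "medical", "dental", "optometry",
                    "psychology", "chiropractic", "acupuncture", "therapy",
                    "behavioral", "physician", "respiratory")),
    ("engineering", ("contractors", "engineers", "architects")),
    ("business", ("accountancy", "cemetery", "funeral")),
    ("personal_care", ("barbering", "cosmetology")),
    ("legal", ("court reporters",)),
]


def categorise(board_name: str) -> str:
    """Map a board or bureau name to a high-level category bucket."""
    name = board_name.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in name for keyword in keywords):
            return category
    return "other"
```

Ordering matters in a first-hit-wins design: a board whose name contains keywords from two buckets lands in whichever bucket is listed first.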

Health care

| Board / Bureau | Code | Examples of license types |
| --- | --- | --- |
| Board of Pharmacy | PHA | Registered Pharmacist, Pharmacy Technician, Pharmacy, Wholesaler, Compounding facility |
| Board of Registered Nursing (BRN) | RN | Registered Nurse, Nurse Practitioner, CNS, CNM, CRNA, Public Health Nurse |
| Board of Vocational Nursing & Psychiatric Technicians (BVNPT) | LVN | Licensed Vocational Nurse, Psychiatric Technician |
| Medical Board of California (MBC) | MBC | Physician & Surgeon (MD), Podiatric Physician, Midwife, Polysomnographic Tech |
| Physician Assistant Board | PAB | Physician Assistant |
| Osteopathic Medical Board | OMB | Osteopathic Physician & Surgeon (DO) |
| Dental Board of California (DBC) | DBC | Dentist (DDS), Registered Dental Assistant, Oral & Maxillofacial Surgeon |
| Dental Hygiene Board | DHBC | Registered Dental Hygienist, RDH in Alternative Practice |
| State Board of Optometry | OPT | Optometrist, Spectacle Lens Dispenser |
| Board of Psychology | PSY | Licensed Psychologist, Registered Psychology Assistant |
| Veterinary Medical Board | VMB | Veterinarian, Registered Veterinary Technician, Veterinary Premises |
| Board of Chiropractic Examiners | CHIRO | Doctor of Chiropractic |
| Acupuncture Board | ACU | Licensed Acupuncturist |
| Naturopathic Medicine Committee | NMC | Naturopathic Doctor |
| Respiratory Care Board | RCB | Respiratory Care Practitioner |
| Occupational Therapy Board | OTB | Occupational Therapist, COTA |
| Physical Therapy Board | PTB | Physical Therapist, PTA |
| Board of Behavioral Sciences (BBS) | BBS | LMFT, LCSW, LPCC, LEP, plus their registered associates and trainees |
| Speech-Language Pathology Board | SLPAB | SLP, Audiologist, Hearing Aid Dispenser |
| Hearing Aid Dispensers Bureau | HAD | Hearing Aid Dispenser |
| Dietetics & Nutrition Board | DIET | Registered Dietitian (where state-credentialed) |

Building, engineering & design

| Board / Bureau | Code | Examples |
| --- | --- | --- |
| Contractors State License Board (CSLB) | CSLB | Class A General Engineering, Class B General Building, Class C specialty contractors (electrical C-10, plumbing C-36, HVAC C-20, roofing C-39, etc.) |
| Board of Professional Engineers, Land Surveyors & Geologists (BPELSG) | BPELSG | Civil, Electrical, Mechanical, Structural, Fire Protection, Geotechnical engineers; Land Surveyors; Geologists; Geophysicists |
| California Architects Board | CAB | Licensed Architect |
| Landscape Architects Technical Committee | LATC | Licensed Landscape Architect |

Business & finance

| Board / Bureau | Code | Examples |
| --- | --- | --- |
| California Board of Accountancy (CBA) | CBA | Certified Public Accountant (individual & firm) |
| Cemetery & Funeral Bureau | CFB | Cemetery brokers, funeral directors, embalmers, crematories |

Personal care

| Board / Bureau | Code | Examples |
| --- | --- | --- |
| Board of Barbering & Cosmetology (BBC) | BBC | Cosmetologist, Barber, Esthetician, Manicurist, Electrologist, Establishment |

Legal & specialty

| Board / Bureau | Code | Examples |
| --- | --- | --- |
| Court Reporters Board | CRB | Certified Shorthand Reporter |
| State Athletic Commission | SAC | Boxers, MMA fighters, promoters, matchmakers, seconds |
| Guide Dogs for the Blind Board | GDB | Guide dog trainers, schools |

The exact set of files in the Box folder shifts month to month as DCA reorganises its bureaus. The actor always reflects whatever DCA currently publishes.


Use Cases

Healthcare staffing, locum tenens & travel nursing

California is the largest healthcare labour market in the United States. Travel nursing, locum physician, locum pharmacist and allied-health agencies use this dataset to:

  • Verify a candidate's CA license before sending a credentialing packet to a hospital system or PBM.
  • Source candidates by metro — every active RN in Los Angeles, every active RPh in San Francisco, every active LMFT in San Diego.
  • Refresh expiration dates monthly so credentials never lapse mid-assignment and JCAHO / DNV audits stay clean.
  • Filter out cancelled and expired licensees automatically with licenseStatus: "current".
  • Cross-board match — find professionals dual-licensed as RN + LMFT or RPh + PharmD for high-tier assignments.

Compliance, credentialing & primary-source verification (PSV)

Hospital systems, PBMs, MSOs, telehealth platforms and Medicaid managed-care plans use bulk DCA data to:

  • Automate monthly PSV for every CA-licensed prescriber, dispenser or therapist on payroll.
  • Catch status changes (current → delinquent / cancelled) within the monthly refresh window.
  • Maintain audit-ready logs with the scrapedAt timestamp on every record.
  • Replace expensive per-lookup verification subscriptions that bill per query.
  • Document due diligence for The Joint Commission, URAC, NCQA, DMHC and DEA inspector audits.
  • Detect dual practice — a pharmacist also showing up as a CSLB contractor is a red flag worth a closer look.

B2B sales & California-focused lead generation

Pharma reps, medical-device vendors, EHR / EMR companies, pharmacy management software, contractor SaaS, accounting tooling and PoS providers use the dataset to:

  • Build city- or county-targeted CA lead lists filtered by board, license type or facility size.
  • Identify newly issued licenses by diffing this month's run against last month's — fresh contractors, fresh CPAs, fresh dispensaries to onboard.
  • Route territory assignments by ZIP, county or DMA.
  • Enrich CRM records (Salesforce, HubSpot, Pipedrive, Apollo) with current license status, expiration and category.
  • Power direct-mail and door-knocker campaigns with verified business mailing addresses.

Construction tech & contractor lead generation (CSLB)

The Contractors State License Board (CSLB) regulates ~280,000 active California contractors. Construction-tech founders, materials suppliers, payment-app vendors, lien-management platforms and insurance brokers use the dataset to:

  • Find every active Class C-10 electrical contractor in Los Angeles County, every active Class C-36 plumber in Orange County, every active Class B general in the Bay Area.
  • Map contractor density by ZIP for last-mile sales territory planning.
  • Track newly licensed contractors — a freshly minted Class B in Sacramento is a perfect-fit prospect for a starter ERP, payment terminal or insurance package.
  • Spot expiring bonds and licenses for renewal-cycle marketing.
  • Validate sub-contractors before a GC adds them to a bid.

Real-estate, mortgage, title & insurance underwriting

Insurers, title companies and lenders use license validity as an underwriting signal:

  • Verify contractor licenses before bonding a project or writing a builder's-risk policy.
  • Confirm appraisers, surveyors and engineers are in current standing before relying on their attestations.
  • Adjust pricing for disciplinary or cancelled history automatically.
  • Monitor portfolio risk — flag insureds whose status flips mid-policy.
  • Geocode by county and ZIP to feed catastrophe-risk and wildfire-zone models.

M&A, due diligence & investor research

Private equity, family offices, search funds and corporate development teams use DCA data when underwriting California acquisitions:

  • Roll-up sourcing — every dental practice, every CPA firm, every Class B GC by county becomes a structured target list.
  • Pre-LOI verification — confirm the target's listed principals actually hold the licenses they claim.
  • Continuity diligence — for healthcare or contractor targets, check the responsible practitioner / qualifier has a clean status.
  • Market-sizing models — count active licensees by category to back into TAM for a SaaS thesis.
  • Post-close monitoring — watch the portfolio company's roster monthly for status drift.

Recruiting, sales-ops & talent sourcing

Recruiting platforms and outbound sales-ops teams use the dataset as a structured CA "people directory" for licensed professions:

  • Build candidate pipelines for CPA firms, hospital systems, dental DSOs, contractor roll-ups and law-adjacent (court reporter) businesses.
  • Match LinkedIn profiles against authoritative license data for outreach trust signals.
  • Enrich ATS records with current credential status.
  • Identify newly licensed professionals as warm leads for first-job recruiting.

Academic, public-health & policy research

Universities, state agencies and think tanks use DCA bulk data to:

  • Quantify healthcare-worker supply by county, ZIP and license type.
  • Map healthcare deserts — counties with low RN-per-capita, low RPh-per-capita or low MD-per-capita ratios.
  • Track licensure pipelines over time as DCA refreshes monthly.
  • Study disciplinary patterns — combine with each board's separate disciplinary roster.
  • Inform workforce-policy proposals with hard empirical data.

Investigative journalism & data reporting

Reporters covering healthcare, construction, finance, beauty industry and consumer protection use DCA data to:

  • Verify credentials of profile subjects before publication.
  • Map the geography of trades — pharmacy deserts, contractor concentration in fire-rebuild zones, CPA density in tax-prep season stories.
  • Cross-reference government contractor awards against CSLB records to spot mismatches.
  • Build interactive maps of licensed professionals for public-interest reporting.

Legal & litigation support

Plaintiff and defense firms use the dataset to:

  • Confirm expert credentials before engagement.
  • Build chronologies of an individual's license history when combined with archived runs.
  • Identify every active dentist / pharmacist / contractor at a given address for litigation discovery.
  • Validate party allegations about license status during pleadings.

Sample Queries & Recipes

Recipe 1 — Every active CA pharmacist in Los Angeles County

{
  "boards": ["Board of Pharmacy"],
  "licenseTypes": ["Registered Pharmacist"],
  "licenseStatus": "current",
  "counties": ["Los Angeles"],
  "stateFilter": "CA",
  "maxRecords": 0
}

Recipe 2 — All active CSLB Class B general contractors in Southern California

{
  "boards": ["Contractors State License Board"],
  "licenseTypes": ["Class B"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "Orange", "San Diego", "Riverside", "San Bernardino", "Ventura"],
  "stateFilter": "CA",
  "maxRecords": 0
}

Recipe 3 — Every active Bay Area physician (Medical Board of California)

{
  "boards": ["Medical Board of California"],
  "licenseTypes": ["Physician and Surgeon"],
  "licenseStatus": "current",
  "counties": ["San Francisco", "San Mateo", "Alameda", "Santa Clara", "Contra Costa", "Marin"],
  "stateFilter": "CA",
  "maxRecords": 0
}

Recipe 4 — Active RNs in the Sacramento metro for travel-nursing recruiting

{
  "boards": ["Board of Registered Nursing"],
  "licenseTypes": ["Registered Nurse"],
  "licenseStatus": "current",
  "counties": ["Sacramento", "Placer", "El Dorado", "Yolo"],
  "stateFilter": "CA",
  "maxRecords": 50000
}

Recipe 5 — Every active CPA firm in California (Board of Accountancy)

{
  "boards": ["Board of Accountancy"],
  "licenseTypes": ["Public Accountancy Corporation", "Partnership"],
  "licenseStatus": "current",
  "stateFilter": "CA",
  "maxRecords": 0
}

Recipe 6 — Dentists in Long Beach and Oakland for DSO M&A sourcing

{
  "boards": ["Dental Board of California"],
  "licenseTypes": ["Dentist"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "Alameda"],
  "stateFilter": "CA"
}
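Note that the counties filter works at county level, so this input returns every matching dentist in Los Angeles and Alameda counties, not only those in Long Beach and Oakland. Narrowing to the two cities is a downstream step on the city field; a minimal Python sketch follows, where `in_cities` is a hypothetical helper and case-insensitive exact matching on city is an assumption about how the data is spelled.

```python
def in_cities(records, cities):
    """Keep records whose city equals one of the given names, ignoring case."""
    wanted = {city.strip().casefold() for city in cities}
    return [
        record
        for record in records
        if (record.get("city") or "").strip().casefold() in wanted
    ]
```

For example, `in_cities(items, ["Long Beach", "Oakland"])` applied to the dataset items of this recipe leaves only the two target cities.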

Recipe 7 — Compliance sweep: every delinquent or cancelled licensee statewide

{
  "licenseStatus": "delinquent",
  "stateFilter": "CA",
  "maxRecords": 0
}

Combine with a second run for cancelled and concatenate downstream.

Recipe 8 — Cosmetology establishments in Fresno County

{
  "boards": ["Board of Barbering and Cosmetology"],
  "licenseTypes": ["Establishment"],
  "licenseStatus": "current",
  "counties": ["Fresno"],
  "stateFilter": "CA"
}

Recipe 9 — Quick 50-record sample for a new pipeline build

{
  "boards": ["Board of Pharmacy"],
  "maxRecords": 50
}

Recipe 10 — Out-of-state mail-order pharmacies licensed in California

{
  "boards": ["Board of Pharmacy"],
  "licenseTypes": ["Nonresident Pharmacy"],
  "licenseStatus": "current",
  "stateFilter": ""
}

Leave stateFilter empty (or omit it) and the run includes pharmacies physically located outside CA but holding a CA permit.


Integration Examples

Google Sheets (via Apify Integration)

  1. Set up an Apify schedule running the actor on the 5th of each month at 06:00 PT (DCA's bulk drop usually lands in the first week).
  2. Attach the "Export to Google Sheets" integration to the schedule.
  3. Receive a fresh CA license tab in your Sheet every month — ready for filtering, pivoting, or distribution to sales reps.

Make.com / Zapier / n8n

Use the Apify native connector. Trigger downstream automations on:

  • New records (current run minus previous run) → push to Slack or a CRM.
  • Status changes (current → cancelled) → open a Salesforce Case.
  • Address changes (relocations) → update HubSpot Company records.
  • Newly issued licenses by category → trigger an outbound email cadence.
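Since DCA publishes no change deltas, the triggers above depend on diffing two archived runs. The Python sketch below shows one way to do that, keyed on agencyCode plus licenseNumber (the same key suggested for CRM upserts). The function name and the shape of the returned dictionary are assumptions for illustration.

```python
def license_key(record):
    """Identity of a license across monthly snapshots."""
    return (record.get("agencyCode"), record.get("licenseNumber"))


def diff_snapshots(previous, current):
    """Compare two runs and report new, removed and status-changed licenses."""
    prev = {license_key(r): r for r in previous}
    curr = {license_key(r): r for r in current}
    return {
        "new": [curr[k] for k in curr if k not in prev],
        "removed": [prev[k] for k in prev if k not in curr],
        # Pairs of (old record, new record) whose status differs.
        "status_changed": [
            (prev[k], curr[k])
            for k in curr
            if k in prev
            and prev[k].get("licenseStatus") != curr[k].get("licenseStatus")
        ],
    }
```

Each bucket can then feed a different automation: `new` to an outbound cadence, `status_changed` to a compliance case queue.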

Power BI / Tableau / Looker / Metabase

Add Apify's REST API as a data source. Refresh on schedule. Build dashboards covering:

  • Active licensee count by metro, county, ZIP, board.
  • CSLB contractor density per neighbourhood.
  • Healthcare-worker supply heat maps (RN, RPh, MD, DDS) per California county.
  • Monthly churn (newly cancelled vs. newly issued) by board.

Postgres / Snowflake / BigQuery / Redshift

Use the Apify webhook integration to POST run results directly to a warehouse ingestion endpoint. Suggested table layout:

CREATE TABLE ca_dca_licenses (
    agency_code         text,
    agency_name         text,
    license_type_code   text,
    license_type_name   text,
    license_number      text,
    individual_or_org   text,
    last_name           text,
    first_name          text,
    middle_name         text,
    suffix              text,
    organization_name   text,
    address_line1       text,
    address_line2       text,
    city                text,
    county              text,
    state               text,
    zip                 text,
    country             text,
    original_issue_date date,
    expiration_date     date,
    license_status      text,
    license_category    text,
    scraped_at          timestamptz,
    PRIMARY KEY (agency_code, license_number, scraped_at)
);
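The actor emits camelCase field names while the suggested table uses snake_case columns, so an ingestion step needs a small mapping. The Python sketch below is one hedged way to do it; `to_row`, `COLUMNS` and `INSERT_SQL` are names invented for the example, and the `%s` placeholders assume a psycopg-style Postgres driver.

```python
import re

# Column order of the suggested ca_dca_licenses table.
COLUMNS = [
    "agency_code", "agency_name", "license_type_code", "license_type_name",
    "license_number", "individual_or_org", "last_name", "first_name",
    "middle_name", "suffix", "organization_name", "address_line1",
    "address_line2", "city", "county", "state", "zip", "country",
    "original_issue_date", "expiration_date", "license_status",
    "license_category", "scraped_at",
]


def to_snake(name: str) -> str:
    """Convert a camelCase field name to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_row(record: dict) -> tuple:
    """Order a record's values to match COLUMNS; missing fields become None."""
    by_column = {to_snake(key): value for key, value in record.items()}
    return tuple(by_column.get(column) for column in COLUMNS)


# Parameterised insert; placeholder style assumes a psycopg-like driver.
INSERT_SQL = (
    f"INSERT INTO ca_dca_licenses ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(['%s'] * len(COLUMNS))}) "
    "ON CONFLICT DO NOTHING"
)
```

Because scraped_at is part of the primary key, each monthly run inserts a fresh set of rows, which keeps the history needed for month-over-month diffs.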

Salesforce / HubSpot / Pipedrive CRM enrichment

Trigger an Apify run monthly, then upsert against Account / Contact records keyed on agency_code + license_number. Status-change events can create Tasks, open Cases, or post to a #compliance Slack channel automatically.

Webhooks & event triggers

Send each new run's results to an HTTP endpoint with the built-in Apify webhook. Use the licenseCategory field to route healthcare records to a credentialing service and engineering records to a contractor-onboarding service in the same run.
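
A receiving endpoint could split one run's records by category along these lines. This is a rough sketch: the category values ("healthcare", "engineering") and both URLs are hypothetical placeholders, so check them against the licenseCategory values the actor actually emits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical category values and destination URLs, for illustration only.
ROUTES = {
    "healthcare": "https://example.com/credentialing",
    "engineering": "https://example.com/contractor-onboarding",
}


def route_records(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by destination URL based on licenseCategory."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        category = str(rec.get("licenseCategory") or "").lower()
        url = ROUTES.get(category)
        if url:  # records with an unrouted category are skipped
            batches[url].append(rec)
    return dict(batches)
```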

Esri ArcGIS / Mapbox / Kepler

Use state, county, city, zip, addressLine1 and addressLine2 as the geocode key. Each record becomes a point on a state-wide licensee map. Combine with U.S. Census tract data to study healthcare-access disparities.
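
Most geocoders accept a single-line address string, so the fields above can be joined as in this minimal sketch. The function name is illustrative; county is left out of the string on the assumption that it is more useful for aggregation than for address matching.

```python
from typing import Iterable, Optional


def _join(values: Iterable[Optional[str]], sep: str) -> str:
    """Join the non-empty string values with sep, trimming whitespace."""
    return sep.join(v.strip() for v in values if isinstance(v, str) and v.strip())


def geocode_query(rec: dict) -> str:
    """Build a one-line address such as '100 Main St Suite 5, Fresno, CA 93721'."""
    street = _join([rec.get("addressLine1"), rec.get("addressLine2")], " ")
    state_zip = _join([rec.get("state"), rec.get("zip")], " ")
    return _join([street, rec.get("city"), state_zip], ", ")
```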


Major California Metros at a Glance

| Metro Area | Primary Counties | Population | Notable for licensing data |
| --- | --- | --- | --- |
| Los Angeles | Los Angeles, Orange | 13.2M | Largest CA healthcare market; ~3,000 pharmacies, 80K+ RNs, 30K+ MDs, ~50K CSLB contractors |
| San Diego | San Diego | 3.3M | Biotech, defense, large hospital systems, dense dental market |
| San Francisco Bay Area | San Francisco, San Mateo, Alameda, Santa Clara, Contra Costa, Marin | 7.7M | Highest CPA density in CA; Kaiser, UCSF, Stanford |
| San Jose | Santa Clara | 2.0M | Engineering-heavy: BPELSG licensees per capita is highest in the state |
| Sacramento | Sacramento, Placer, Yolo, El Dorado | 2.4M | State-capital concentration of regulators, government health programs, BRN headquarters |
| Fresno | Fresno, Madera, Tulare | 1.2M | Central Valley healthcare hub, agriculture-adjacent contractor density |
| Long Beach | Los Angeles | 0.5M | Port-of-LA logistics, dense cosmetology and dental markets |
| Oakland | Alameda | 0.4M | Kaiser HQ, dense BBC and BBS markets |
| Riverside / San Bernardino (Inland Empire) | Riverside, San Bernardino | 4.7M | Booming residential construction → CSLB density |
| Bakersfield | Kern | 0.9M | Oil-and-gas adjacent engineering and contractor licenses |
| Anaheim | Orange | 0.4M | Hospitality, dental, cosmetology |
| Santa Ana | Orange | 0.3M | Healthcare, dental, contractors |
| Stockton | San Joaquin | 0.3M | Logistics-driven CSLB activity |
| Modesto | Stanislaus | 0.6M | Central Valley healthcare and ag-services contractors |

Cost & Performance

| Metric | Value |
| --- | --- |
| Engine | Box Shared Items API + Playwright (Chromium) fallback for downloads |
| Runtime (single small board, e.g. Pharmacy) | ~1–3 minutes |
| Runtime (large board, e.g. CSLB, BRN, BBC) | ~5–15 minutes per file |
| Runtime (full bulk pull, all 36 boards) | 30–90 minutes, dominated by ZIP downloads |
| Cost per run | Varies: pay-per-event scales with records delivered; small targeted runs cost cents, full pulls are still cheap by industry standards |
| Pricing model | Pay-per-event (transparent line-item billing on Apify) |
| Data freshness | Monthly; DCA refreshes the Box folder roughly once a month |
| Auth required | None (Box folder is public) |
| Proxy required | No; supported but not needed |
| Concurrency | Safe to run multiple board-scoped configurations in parallel |
| Memory footprint | 2 GB minimum, 8 GB recommended for full-board pulls due to ZIP extraction |
| Storage temp footprint | ZIP extraction writes to /tmp/dca_extract_<ts>/ and cleans up after parsing |

  • Public data only. Every field in this dataset is published by the California Department of Consumer Affairs at data.dca.ca.gov and the public Box folder under the California Public Records Act (Gov. Code §§ 7920.000 et seq.).
  • No PHI. The dataset contains no patient health information; it is licensing data, not clinical data. HIPAA does not apply.
  • No SSNs, no DOBs, no financial accounts. Only public license-related information is published.
  • Addresses are the address of record reported to DCA, typically a business or practice address. For solo practitioners and small contractors the address of record can be a home address; data consumers must apply judgement before mailing or door-knocking.
  • No email addresses. DCA does not publish licensee emails in the bulk file. Phone numbers are generally absent as well, although one is occasionally present for facility-type records.
  • CCPA / GDPR — California licensing data is on the public record, but consumer-facing use of the data (B2C marketing, profiling) must comply with the CCPA / CPRA, and EU-resident usage must comply with GDPR. Compliance is the responsibility of the data consumer.
  • CAN-SPAM / TCPA — the dataset does not include emails; if you append phone numbers from other sources, TCPA/DNC compliance applies.
  • DCA Terms of Use — the actor accesses DCA's intended public publication on Box (which DCA explicitly distributes for re-use). Do not attempt to use it for unlawful purposes including identity fraud, stalking, harassment, or impersonation.

Important: California license data may not be used as a substitute for the legally required disciplinary lookup on each board's primary verification portal where a board-mandated check is required (e.g. CSLB lien purposes, Joint Commission credentialing). Use this dataset to scale routine workflows; defer to each board's primary verification UI when statute requires it.


Frequently Asked Questions

How fresh is the data?

DCA refreshes its Box bulk-data folder roughly once a month. The actor downloads the latest available file on each run, so worst-case staleness is the gap between the last DCA publication and your run.

Why monthly instead of daily?

DCA's bulk publication cadence is monthly. The boards' own search portals are real-time but rate-limited and CAPTCHA-protected. The actor optimises for scale (millions of records cheaply) rather than minute-by-minute freshness; combine with a per-license verification call on critical workflows if you need real-time confirmation.

How many records will I get?

A full unfiltered pull across all 36 boards returns roughly 3.3 million records, dominated by CSLB (~280K active contractors), BRN (~500K nurse licenses across active and historical), BBC (~700K cosmetology licenses including establishments), and the various health-care boards. Pre-filter heavily for targeted runs.

Does the actor need a Box account or login?

No. The folder is a public Box share. The Box API path works anonymously via the BoxApi: shared_link=… header; the Playwright path navigates to the public folder URL.

Do I need an Apify residential proxy?

No. Box does not rate-limit the public share endpoint for typical workloads. Apify Proxy is supported but not required; enable it only for very heavy parallel scheduling.

Why is maxRecords defaulted to 1,000?

So a first-time user does not accidentally trigger a multi-million-record pull. Set maxRecords: 0 for unlimited once you are confident in your filter set.

Does this scraper cover Board of Real Estate?

No. California real-estate licensing is regulated by the Department of Real Estate (DRE), which is not part of DCA and publishes its data separately. This actor covers the 36 boards under the DCA umbrella. DRE is on the roadmap as a sibling actor.

Does this cover State Bar of California attorney data?

No. Attorney licensing is regulated by the State Bar of California, which is independent of DCA. This actor does not include attorney records.

Does the dataset include disciplinary action history?

The bulk file shows current license status (current, delinquent, cancelled, etc.) but does not include the full disciplinary action narrative. For full disciplinary text, consult each board's public disciplinary documents, which DCA publishes separately. In the actor's output, the cancelled and delinquent status values are a useful first-pass filter for "needs further review".

Can I get NPI numbers for CA healthcare licensees?

NPI is issued federally by CMS / NPPES, not by DCA. Join license records to the NPPES NPI Registry on lastName + firstName + state (or by name + license-number lookup tables) to enrich.
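
The name-and-state join can be sketched as follows. The NPPES row shape used here (lastName, firstName, state, npi) is a simplified, hypothetical stand-in for whatever extract you load from the registry, and attach_npi is an illustrative name. Name-based matching is ambiguous for common names, so this sketch only attaches an NPI when exactly one candidate matches.

```python
from typing import Dict, Iterable, List, Tuple

NameKey = Tuple[str, str, str]


def name_key(last: str, first: str, state: str) -> NameKey:
    """Case- and whitespace-insensitive join key."""
    return (last.strip().upper(), first.strip().upper(), state.strip().upper())


def attach_npi(licenses: Iterable[dict], nppes_rows: Iterable[dict]) -> List[dict]:
    """Add 'npi' (only when unambiguous) and 'npiCandidates' to each license."""
    by_name: Dict[NameKey, List[str]] = {}
    for row in nppes_rows:  # hypothetical, simplified NPPES row shape
        key = name_key(row["lastName"], row["firstName"], row["state"])
        by_name.setdefault(key, []).append(row["npi"])

    enriched = []
    for lic in licenses:
        key = name_key(lic.get("lastName") or "", lic.get("firstName") or "", lic.get("state") or "")
        # Organisation records have no personal name, so they are never matched.
        matches = by_name.get(key, []) if key[0] and key[1] else []
        npi = matches[0] if len(matches) == 1 else None
        enriched.append({**lic, "npi": npi, "npiCandidates": len(matches)})
    return enriched
```

Records with npiCandidates greater than 1 are the ones worth resolving through a secondary signal such as city or license number.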

Why does my CSLB run take longer than my Pharmacy run?

CSLB's data file is shipped as a large ZIP archive (often 50–150 MB) and contains 280K+ active contractors plus historical records. Extraction + parsing dominates runtime, and is much heavier than the Pharmacy file (~30 MB).

Does the actor deduplicate across boards?

No. A person may legitimately hold licenses across multiple boards (e.g. an RN who is also a pharmacist, or a contractor who is also an architect). Each board's record is preserved. Deduplicate on (agencyCode, licenseNumber) if you want one row per license.
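
A minimal sketch of that deduplication, assuming records are plain dicts from the dataset. The function name is illustrative, and it keeps the last record seen for each key.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_licenses(records: Iterable[dict]) -> List[dict]:
    """Keep one record per (agencyCode, licenseNumber); the last one seen wins."""
    unique: Dict[Tuple[str, str], dict] = {}
    for rec in records:
        unique[(rec["agencyCode"], rec["licenseNumber"])] = rec
    return list(unique.values())
```

The same person holding licenses under two boards still produces two rows, because agencyCode is part of the key.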

Are out-of-state licensees included?

Yes — for boards that license out-of-state professionals (e.g. nonresident pharmacies, telehealth physicians, out-of-state contractors). Set stateFilter: "CA" to exclude them.

What if DCA changes the file format?

The actor fuzzy-matches header names AND falls back to positional mapping on the canonical 21-column DCA layout. Past header reformats have not broken the actor. If a future change does, file an issue on the Apify Store page and a patch will follow.
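
The two-stage idea can be illustrated with Python's standard difflib. This is a sketch of the general technique, not the actor's actual code: the canonical column order is an assumption drawn from the suggested warehouse table layout, and the 0.8 similarity cutoff is arbitrary.

```python
import difflib
import re
from typing import Dict, List

# Assumed canonical 21-column order (inferred, not confirmed by DCA).
CANONICAL = [
    "agencyCode", "agencyName", "licenseTypeCode", "licenseTypeName",
    "licenseNumber", "individualOrOrg", "lastName", "firstName", "middleName",
    "suffix", "organizationName", "addressLine1", "addressLine2", "city",
    "county", "state", "zip", "country", "originalIssueDate",
    "expirationDate", "licenseStatus",
]


def _norm(text: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


NORMALISED = {_norm(name): name for name in CANONICAL}


def map_headers(headers: List[str], cutoff: float = 0.8) -> Dict[int, str]:
    """Map column index -> canonical field, fuzzy first, positional as fallback."""
    fuzzy: Dict[int, str] = {}
    for i, header in enumerate(headers):
        hit = difflib.get_close_matches(_norm(header), list(NORMALISED), n=1, cutoff=cutoff)
        if hit:
            fuzzy[i] = NORMALISED[hit[0]]
    resolved_all = len(fuzzy) == len(headers) and len(set(fuzzy.values())) == len(headers)
    # Positional fallback only applies when the file has exactly the canonical width.
    if resolved_all or len(headers) != len(CANONICAL):
        return fuzzy
    return dict(enumerate(CANONICAL))
```

Files whose headers are unrecognisable but whose column count matches the canonical layout are mapped purely by position. Anything else returns whatever the fuzzy pass could resolve.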

Can I schedule this on the Apify free plan?

Yes. The actor itself runs on the free tier — set a monthly Apify schedule on the 5th–7th of the month.

What export formats are supported?

JSON, CSV, Excel (XLSX), HTML, XML, RSS, and JSON Lines — directly from the Apify dataset view or the REST API.

Will this work for other US states?

Not this actor — DCA is California-specific. We maintain separate actors for Texas, Arizona, Washington, Virginia, Colorado, Minnesota, Ohio, Illinois, North Carolina, and federal sources. See Related Apify Actors below.

How do I report a bug or request a board that is missing?

Open an issue on the actor's Apify Store page or contact the developer directly through the Apify Console. Board additions usually ship within a release cycle.

What happens if a board temporarily ships an empty or corrupt file?

The actor logs the failure, skips the file, and continues with the next board. You receive partial output for the boards that succeeded. Re-run after DCA reposts the corrected file.

Does the actor write to disk?

Only /tmp/ for ZIP extraction (immediately cleaned up after parsing). All output goes to the Apify dataset; nothing is persisted locally beyond the run lifetime.


If you need licensing data from other US states or related regulatory bodies, the catalog below pairs naturally with this actor:


Comparison vs. Alternatives

| Approach | Setup time | Data freshness | Cost (10K records) | Schema normalisation | Cross-board coverage |
| --- | --- | --- | --- | --- | --- |
| This actor | < 1 minute | Monthly bulk drop | Cents per run | Built-in | 36 boards in one tool |
| Manual Box download | 10–20 min/board/month | Monthly | Free | None | Per board, manual |
| Per-board search UI scraping | 4–16 hours dev / board | Real-time but CAPTCHA-gated | Slow + IP-cost | Per board | Build N scrapers |
| Custom Python + Playwright build | 8–16 hours dev | Monthly | Free + infra cost | DIY | DIY |
| Paid PSV verification API | Hours setup | Real-time | $100–500+/mo | Yes | Limited |
| DCA public-records request | Days–weeks | Stale by issue | Free / variable | None | Single response |

Why Pay-Per-Event Pricing?

This actor uses pay-per-event pricing rather than a flat monthly subscription or per-Compute-Unit charge:

  • You pay only when the actor runs — no idle-month bills.
  • Charges scale with how much data you actually consume — a 50-record sample is essentially free.
  • Transparent, line-item billing inside the Apify console.
  • No monthly minimums and no commitment.
  • Free to evaluate — sample with maxRecords: 50 for pennies before committing to a full board pull.
  • Plays well with monthly cadence — DCA refreshes monthly, so you pay roughly 12 times a year for full freshness.

Changelog

| Version | Date | Notes |
| --- | --- | --- |
| 1.0.0 | 2026-05 | Initial public release: Box folder + Shared Items API ingestion, Playwright download fallback, ZIP extraction, fuzzy + positional header mapping, six-value status enum, per-board / per-county / per-state filtering, category tagging across 36 boards. |

Keywords

California DCA license lookup · California Department of Consumer Affairs scraper · CA professional license verification · CSLB scraper · California Contractors State License Board data · California contractor license search · CSLB Class B general contractor lookup · CSLB Class A general engineering contractor data · California medical board lookup · MBC physician verification · California physician license scraper · BRN nursing license scraper · California registered nurse data · LVN BVNPT license lookup · California pharmacy license data · Board of Pharmacy California scraper · Registered Pharmacist California verification · pharmacy technician California · California dental board scraper · DDS license California · CA real estate appraiser license · California Board of Accountancy CPA lookup · CBA CPA firm directory · California cosmetology license data · Board of Barbering and Cosmetology BBC scraper · California optometrist license data · veterinary license California · LMFT LCSW LPCC California verification · California Board of Behavioral Sciences scraper · CA professional engineer license BPELSG · California architect license search · landscape architect California lookup · California court reporter license · acupuncturist California license · chiropractor California license verification · CA license bulk download · DCA bulk data Box.com · monthly California licensee dataset · California license compliance automation · California credentialing PSV · California license API · CA license CSV download · Los Angeles pharmacist database · San Diego physician database · San Francisco CPA directory · San Jose engineer database · Sacramento nurse directory · Fresno contractor database · Long Beach dental directory · Oakland behavioral sciences directory · California pharmacy real estate accountant license data · CA contractor lead generation · California healthcare workforce research · CA licensed professional B2B leads


Support

  • Bug reports: open an issue on the actor's Apify Store page.
  • Feature requests / new board additions: same place — please describe the board, the use case, and link to the source file if known.
  • Direct contact: reach the developer through the Apify developer profile.

If this actor saves you time, a 5-star rating on the Apify Store helps other California compliance, recruiting, sales, construction-tech and research teams discover it. Thank you.