California DCA Professional License Scraper
Pricing
from $0.50 / 1,000 results
Download and parse California professional license data from DCA's public Box folder. 3.3M+ active licenses across 36 boards — pharmacy, nursing, medical, dental, engineering and more. Monthly updated bulk data, no browser needed.
Developer
Haketa
Maintained by Community
California DCA Professional License Scraper — Bulk License Lookup for CSLB, BRN, Medical Board, Pharmacy, Dental, Accountancy & 30+ More Boards
The most comprehensive California Department of Consumer Affairs (DCA) license extractor on Apify. Download structured records for 3.3M+ active California professional licenses across 36 state boards and bureaus — pharmacists, registered nurses, physicians, dentists, contractors, engineers, accountants, real estate appraisers, cosmetologists and more — from DCA's official monthly bulk data drop on Box. No CAPTCHA juggling, no per-search rate limit, no per-board scraper rewrites.
What This Actor Does
The California DCA Professional License Scraper is a production-grade Apify Actor that downloads, normalises, filters, and serves the complete public licensing dataset published by the California Department of Consumer Affairs (DCA) — the umbrella agency that regulates professional licensing for the State of California across health care, building trades, engineering, accounting, personal care, legal support and more.
DCA publishes its license database as a monthly bulk drop on a public Box.com folder (dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9), with one pipe-delimited file per board or bureau. Header layouts drift between boards, some files arrive zipped, and all of them sit behind a JavaScript-rendered Box page that defeats naive HTTP fetchers. This actor handles all of that automatically and returns a single unified JSON schema across every board, so a downstream pipeline does not have to special-case Pharmacy vs. CSLB vs. Medical vs. Dental.
In a typical run you receive structured records covering:
- Health care licensees — Registered Pharmacists, Pharmacy Technicians, Registered Nurses (BRN), Vocational Nurses (BVNPT), Physicians & Surgeons (MBC), Physician Assistants, Dentists & RDAs (DBC), Optometrists, Psychologists, Veterinarians, Chiropractors, Acupuncturists, Naturopathic Doctors, Podiatric Physicians, Respiratory Therapists, Occupational Therapists, Physical Therapists, Behavioral Sciences (LMFT/LCSW/LPCC), Speech-Language Pathologists, Dietitians, Hearing Aid Dispensers
- Building & engineering trades — California Contractors State License Board (CSLB) general and specialty contractors, Professional Engineers, Land Surveyors, Professional Geologists, Architects, Landscape Architects
- Business & finance — California Board of Accountancy (CPAs and CPA firms), Cemetery & Funeral Bureau, real-estate-adjacent appraisers
- Personal care — Board of Barbering & Cosmetology (BBC) licensees and establishments
- Legal & specialty — Court Reporters Board, Guide Dogs for the Blind, Athletic Commission, Pilots
Every record carries the issuing agency code, license type, license number, current status, original issue date, expiration date, full mailing address, county, state, ZIP and licensee or organisation name — the building blocks for compliance, sales, credentialing, recruiting, due-diligence, location intelligence and research workflows.
Why scrape California DCA yourself when this exists?
DCA publishes the data freely, but actually consuming it at scale is its own engineering project. Common pain points the actor solves out of the box:
- Box.com folder is JavaScript-rendered — a plain `curl` against the share URL returns an empty shell. The actor talks to the Box Shared Items API and falls back to a real Chromium download flow when needed.
- Per-board file naming drifts — files include spaces, dates, version suffixes and the occasional spelling variant. The actor uses fuzzy keyword matching to find the right file for the board you asked for.
- Some boards ship ZIPs, some ship raw pipe-delimited text — large files (CSLB, BRN, BBC) arrive as `.zip` archives. The actor detects the `PK` magic header, extracts the largest non-summary entry, and returns the underlying text without OOM-ing on multi-hundred-megabyte payloads.
- Pipe-delimited files with shifting headers — column counts and header spellings vary per board. The actor fuzzy-matches header names and falls back to positional mapping when DCA temporarily ships a file without the canonical header row.
- CSLB alone has 280K+ active contractors — through the search UI you would need tens of thousands of paginated lookups; the bulk file covers them in one download.
- Status fields are inconsistent (`CURRENT`, `Current`, `current`) — the actor normalises everything to lower-case canonical values that drive a clean enum filter.
- No NPI, no SSN, no DOB — the bulk dataset is licensing-only, so no GLBA/HIPAA exposure if you treat it as the public record it is.
- No daily update API — DCA only refreshes monthly, so you must redownload everything. The actor makes a fresh pull cheap (no browser-per-search overhead).
- Address blocks differ — some boards split `ADDRESS_LINE_1` / `ADDRESS_LINE_2`, others ship a single combined block. The actor exposes both fields and never silently drops content.
- Out-of-state licensees — mail-order pharmacies, telehealth doctors and out-of-state contractors all hold California licenses with non-CA addresses. The actor exposes a `stateFilter` so you can choose to include or exclude them.
- No incremental diff — DCA does not publish change-deltas. Schedule the actor monthly, archive the dataset, and diff yourself.
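The status canonicalisation described in the list above can be sketched as follows. This is a minimal illustration, not the actor's code. It assumes that case, whitespace and stray quotes are the main sources of drift. The `STATUS_ALIASES` table, including its `canceled` and `active` entries, is a hypothetical example and not a documented DCA vocabulary.

```python
from typing import Optional

# The five canonical statuses listed in the License Status Reference.
CANONICAL_STATUSES = {"current", "delinquent", "inactive", "cancelled", "expired"}

# HYPOTHETICAL alias table: example spelling variants, not a documented
# DCA vocabulary. Extend it from the raw values you actually observe.
STATUS_ALIASES = {
    "canceled": "cancelled",
    "active": "current",
}


def canonical_status(raw: Optional[str]) -> Optional[str]:
    """Normalise a raw status cell to one of CANONICAL_STATUSES.

    Returns None for empty or unrecognised values so the caller can
    decide whether to keep or drop the record.
    """
    if raw is None:
        return None
    # Trim whitespace and wrapping quotes, then lower-case.
    value = raw.strip().strip('"').lower()
    value = STATUS_ALIASES.get(value, value)
    return value if value in CANONICAL_STATUSES else None
```

Returning `None` rather than guessing keeps unexpected raw values visible, so you can log them and extend the alias table.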
This actor packages roughly eight to sixteen hours of one-off engineering (Box auth quirks, ZIP extraction, header normalisation, positional fallback, status canonicalisation, county filtering) into a single actor run.
Quick Start
One-Click Run (Apify UI)
- Open the actor page → click "Try for free".
- In the Boards field, type one or more board names (e.g. `Board of Pharmacy`, `Contractors State License Board`, `Board of Registered Nursing`). Leave it empty to attempt every board.
- Optionally narrow by License Type, License Status, County or State Filter, then set a sensible Max Records cap (default 1,000 — full bulk runs can return millions).
- Hit Start. Within a few minutes your dataset is ready as JSON, CSV, Excel, XML, RSS or HTML.
API Run (Python)
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("haketa/california-dca-license-scraper").call(
    run_input={
        "boards": ["Board of Pharmacy", "Board of Registered Nursing"],
        "licenseStatus": "current",
        "counties": ["Los Angeles", "San Diego", "Orange"],
        "stateFilter": "CA",
        "maxRecords": 5000,
    }
)

# Stream every record from the run's default dataset.
for record in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(
        record["agencyName"],
        record["licenseNumber"],
        record.get("lastName") or record.get("organizationName"),
        record["city"],
        record["licenseStatus"],
    )
```
API Run (Node.js / TypeScript)
```typescript
import { ApifyClient } from 'apify-client';

// Top-level await requires an ES module (e.g. "type": "module" in package.json).
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/california-dca-license-scraper').call({
    boards: ['Contractors State License Board'],
    licenseStatus: 'current',
    counties: ['Los Angeles', 'Orange', 'Riverside', 'San Bernardino'],
    stateFilter: 'CA',
    maxRecords: 10000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} active SoCal CSLB contractors`);
```
API Run (cURL)
```bash
curl -X POST "https://api.apify.com/v2/acts/haketa~california-dca-license-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "boards": ["Medical Board of California"],
    "licenseStatus": "current",
    "counties": ["San Francisco", "San Mateo", "Alameda", "Santa Clara"],
    "maxRecords": 2000
  }'
```
API Run (Apify CLI)
```bash
apify call haketa/california-dca-license-scraper --input='{
  "boards": ["Board of Accountancy"],
  "licenseStatus": "current",
  "stateFilter": "CA",
  "maxRecords": 0
}'
```
How It Works
Source of truth
DCA's public bulk-data folder lives at:
https://dca.box.com/s/oss6hf8jys2bmgxqd2gdz7w4oepm2il9
It contains one file per board or bureau, refreshed roughly every month. Most files are pipe-delimited (|) text; a handful (notably CSLB, BRN, BBC) are shipped as ZIP archives containing the same pipe-delimited data plus a small Counts / summary file.
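Here is a rough Python sketch of format detection and ZIP handling (stages 4 and 5 in the pipeline table), using only the standard-library `zipfile` module; the actor itself uses `adm-zip`. The sketch checks the first two bytes for `PK`, skips entries whose names contain count, summary or readme, and decodes the largest remaining entry. The name `payload_to_text` is illustrative. Unlike the actor, this version reads the chosen entry into memory. For 250 MB files you would stream it to disk through `ZipFile.open` instead.

```python
import io
import zipfile

# Entries whose names contain these words are treated as side files, not data.
SKIP_WORDS = ("count", "summary", "readme")


def payload_to_text(buf: bytes) -> str:
    """Return the pipe-delimited text inside a DCA download.

    ZIP archives start with the bytes 0x50 0x4B ("PK"); anything else is
    decoded directly as UTF-8.
    """
    if buf[:2] != b"PK":
        return buf.decode("utf-8", errors="replace")

    with zipfile.ZipFile(io.BytesIO(buf)) as zf:
        candidates = [
            info
            for info in zf.infolist()
            if not info.is_dir()
            and not any(word in info.filename.lower() for word in SKIP_WORDS)
        ]
        if not candidates:
            raise ValueError("ZIP archive contains no data entry")
        # The largest remaining entry is taken to be the actual data file.
        largest = max(candidates, key=lambda info: info.file_size)
        return zf.read(largest).decode("utf-8", errors="replace")
```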
Engineering pipeline
| Stage | Technique | Notes |
|---|---|---|
| 1. Folder listing | Box Shared Items API with the BoxApi: shared_link=… header | Returns each file's id, name, size. Falls back to parsing Box.prefetchedData embedded in the share page, then to regex extraction of {type:"file"} blobs. |
| 2. File matching | Fuzzy keyword match against the file name | Strips board of / bureau of / committee from the user's input and matches the first remaining token against the file name. |
| 3. Download | Playwright (Chromium, headless) navigating the Box folder UI | Box's download endpoint is signed, single-use, and JS-gated; a real browser session is the only reliable extraction path. Cookie consent banners are dismissed automatically. |
| 4. Format detection | First two bytes of the buffer | 0x50 0x4B triggers ZIP extraction via adm-zip; everything else is treated as UTF-8 text. |
| 5. ZIP extraction | adm-zip extractEntryTo to /tmp/dca_extract_<ts>/ | Picks the largest entry that is not *count* / *summary* / *readme* — that is consistently the actual data file. |
| 6. Header detection | Pipe / tab / comma sniff on first row | Real DCA files use the pipe (`\|`) delimiter. |
| 7. Header mapping | Fuzzy match on AGENCY_CODE, LICENSE_NUMBER, ORG/LAST_NAME, ZIP_CODE etc. | Falls back to positional mapping (canonical 21-column DCA layout) when fewer than five header tokens are recognised. |
| 8. Record normalisation | Trim, strip wrapping quotes, null-coalesce empties | Output schema is identical across every board. |
| 9. Category tagging | Keyword-based board → category mapping | Buckets each record into healthcare, engineering, business, personal_care, legal or other. |
| 10. Filtering | Board / license-type / status / county / state in-memory passes | All filters are case-insensitive partial matches except state, which is an exact uppercase compare. |
| 11. Dataset push | Dataset.pushData(record) | Progress is logged every 500 records. Run terminates cleanly when maxRecords is hit. |
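The sketch below approximates stages 6 to 8 and is illustrative only. The `HEADER_ALIASES` entries are a hypothetical subset. The canonical 21-column DCA order is not documented here, so `POSITIONAL_FIELDS` uses the output schema's order as a stand-in. The five-token threshold comes from the table above.

```python
import re
from typing import Dict, List, Optional

# ASSUMPTION: stand-in for the canonical 21-column layout, reusing the
# output-schema order because the real DCA column order is not given here.
POSITIONAL_FIELDS = [
    "agencyCode", "agencyName", "licenseTypeCode", "licenseTypeName",
    "licenseNumber", "individualOrOrg", "lastName", "firstName",
    "middleName", "suffix", "organizationName", "addressLine1",
    "addressLine2", "city", "county", "state", "zip", "country",
    "originalIssueDate", "expirationDate", "licenseStatus",
]

# HYPOTHETICAL alias subset: normalised header token -> output field.
HEADER_ALIASES = {
    "AGENCYCODE": "agencyCode",
    "AGENCYNAME": "agencyName",
    "LICENSETYPE": "licenseTypeName",
    "LICENSENUMBER": "licenseNumber",
    "LASTNAME": "lastName",
    "ORGLASTNAME": "lastName",
    "FIRSTNAME": "firstName",
    "CITY": "city",
    "COUNTY": "county",
    "STATE": "state",
    "ZIPCODE": "zip",
    "EXPIRATIONDATE": "expirationDate",
    "LICENSESTATUS": "licenseStatus",
}


def _norm(token: str) -> str:
    # Upper-case and keep only letters and digits: "ORG/LAST_NAME" -> "ORGLASTNAME".
    return re.sub(r"[^A-Z0-9]", "", token.strip().strip('"').upper())


def map_columns(header: List[str]) -> Dict[int, str]:
    """Map column index -> output field, falling back to positional order.

    On fallback the caller should treat the first line as data, not a header.
    """
    mapping: Dict[int, str] = {}
    for i, token in enumerate(header):
        field = HEADER_ALIASES.get(_norm(token))
        if field and field not in mapping.values():  # first match wins
            mapping[i] = field
    if len(mapping) < 5:
        return dict(enumerate(POSITIONAL_FIELDS))
    return mapping


def parse_row(line: str, mapping: Dict[int, str]) -> Dict[str, Optional[str]]:
    """Split one pipe-delimited line, trim cells, turn empty cells into None."""
    cells = line.rstrip("\r\n").split("|")
    record: Dict[str, Optional[str]] = {}
    for i, field in mapping.items():
        value = cells[i].strip().strip('"') if i < len(cells) else ""
        record[field] = value or None
    return record
```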
Notable behaviours
- Memory: the actor is configured with 2 GB minimum, 8 GB maximum because large boards (CSLB, BRN, BBC) ship 50–250 MB files; ZIP extraction goes through disk (`/tmp/`) rather than RAM to keep the runtime stable.
- Proxy: Apify Proxy is supported but not required — Box's public share endpoints do not rate-limit modest crawler traffic from datacentre IPs. Enable proxy only if you are running many parallel jobs from the same Apify region.
- Playwright fallback: if Playwright cannot be imported (e.g. on a stripped-down build), the actor falls back to direct file URL patterns such as `https://data.dca.ca.gov/DCALicenseData/<Board>.csv` for boards that happen to expose them.
- Deduplication: DCA's files are already keyed by license number; the actor does not collapse cross-board duplicates because a person can legitimately hold licenses across multiple boards (e.g. an RN who is also an RPh).
Input Parameters
```json
{
  "boards": ["Board of Pharmacy", "Board of Registered Nursing"],
  "licenseTypes": ["Registered Pharmacist", "Registered Nurse"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "San Diego", "Orange"],
  "stateFilter": "CA",
  "maxRecords": 5000,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
Parameter reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| `boards` | `array<string>` | `["Board of Pharmacy"]` (prefilled) | Board / bureau names to download. Case-insensitive partial match against the file name. Leave empty to attempt every file in the Box folder. Examples: Board of Pharmacy, Medical Board of California, Contractors State License Board, Board of Accountancy, Dental Board of California, Board of Barbering and Cosmetology. |
| `licenseTypes` | `array<string>` | `[]` | Substring filter on `licenseTypeName`. Examples: Registered Pharmacist, Pharmacy Technician, Registered Nurse, Vocational Nurse, Physician and Surgeon, Certified Public Accountant, Class A General Engineering Contractor. Empty array = no type filter. |
| `licenseStatus` | string enum | `all` | One of `all`, `current`, `delinquent`, `inactive`, `cancelled`, `expired`. See the License Status Reference below for the meaning of each value. |
| `counties` | `array<string>` | `[]` | Filter by California county. Case-insensitive partial match. Examples: Los Angeles, San Francisco, San Diego, Orange, Sacramento, Alameda, Santa Clara, Riverside, San Bernardino, Fresno. |
| `stateFilter` | string | `""` (empty = all states) | Two-letter state code. Set to `CA` to exclude out-of-state mail-order, telehealth or out-of-state contractor licensees. |
| `maxRecords` | integer | `1000` | Cap on total output across all boards. `0` = unlimited. The default of 1,000 is a sensible testing limit because a single full bulk pull can return millions of records and consume non-trivial compute. |
| `proxyConfiguration` | object | `{ "useApifyProxy": true }` (prefilled) | Standard Apify proxy config. Optional; Box does not rate-limit the share endpoint for typical workloads. |
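The filter semantics in this table can be expressed as a small generator: case-insensitive partial matches, an exact upper-case state compare, and `0` meaning unlimited. This is a sketch under those stated rules, not the actor's implementation. Board selection is left out because it happens earlier, when files are matched by name. The status check assumes values were already canonicalised. `apply_filters` is an illustrative name.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]


def _matches_any(value: Optional[str], needles: List[str]) -> bool:
    """Case-insensitive substring match; an empty needle list matches all."""
    if not needles:
        return True
    haystack = (value or "").lower()
    return any(needle.lower() in haystack for needle in needles)


def apply_filters(records: Iterable[Record], params: Record) -> Iterator[Record]:
    status = params.get("licenseStatus") or "all"
    state = (params.get("stateFilter") or "").upper()
    max_records = params.get("maxRecords", 1000)
    emitted = 0
    for rec in records:
        if not _matches_any(rec.get("licenseTypeName"), params.get("licenseTypes") or []):
            continue
        # Status values are assumed to be canonicalised already.
        if status != "all" and rec.get("licenseStatus") != status:
            continue
        if not _matches_any(rec.get("county"), params.get("counties") or []):
            continue
        # State is an exact, upper-case comparison; "" means all states.
        if state and (rec.get("state") or "").upper() != state:
            continue
        yield rec
        emitted += 1
        if max_records and emitted >= max_records:  # 0 means unlimited
            return
```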
Output Schema
Every record uses the same flat JSON schema across every board so downstream consumers do not need per-board branching. Both individual and organisation records share the same envelope; organisation-only or individual-only fields are simply null where they do not apply.
Field reference
| Field | Type | Always present | Description |
|---|---|---|---|
| `agencyCode` | string | yes | Short DCA board / bureau code (e.g. PHA, RN, MBC, DBC, CBA). |
| `agencyName` | string | yes | Full board / bureau name as published by DCA (e.g. Board of Pharmacy, Medical Board of California). |
| `licenseTypeCode` | string | usually | Short type code (e.g. RPH, RN, PHY, DDS, CPA, B). |
| `licenseTypeName` | string | usually | Full license-type label (e.g. Registered Pharmacist, Class B General Building Contractor). |
| `licenseNumber` | string | yes | Board-issued license / registration number. Not always numeric; some boards prefix the type code. |
| `individualOrOrg` | string | usually | `I` for individuals, `O` for organisations (firms, facilities, partnerships). |
| `lastName` | `string \| null` | individuals | Family name for individual licensees. |
| `firstName` | `string \| null` | individuals | Given name for individual licensees. |
| `middleName` | `string \| null` | individuals | Middle name / initial. |
| `suffix` | `string \| null` | individuals | Suffix such as JR, SR, II, MD, DO. |
| `organizationName` | `string \| null` | organisations | DBA / facility / firm name. |
| `addressLine1` | string | usually | First line of mailing address as reported to DCA. |
| `addressLine2` | `string \| null` | sometimes | Suite, unit, building. |
| `city` | string | usually | City. |
| `county` | `string \| null` | CA-resident records | California county. `null` for out-of-state licensees. |
| `state` | string | usually | Two-letter US state abbreviation. |
| `zip` | string | usually | 5- or 9-digit ZIP. |
| `country` | string | usually | Country abbreviation (USA for almost all CA licensees). |
| `originalIssueDate` | string | usually | Original license issue date. |
| `expirationDate` | string | usually | Current expiration date. |
| `licenseStatus` | string | yes | Status; see the License Status Reference below. |
| `licenseCategory` | string | yes | Derived bucket: `healthcare`, `engineering`, `business`, `personal_care`, `legal`, `other`. |
| `scrapedAt` | string (ISO-8601) | yes | UTC timestamp when this record was emitted. |
Example — Registered Pharmacist (Board of Pharmacy)
```json
{
  "agencyCode": "PHA",
  "agencyName": "Board of Pharmacy",
  "licenseTypeCode": "RPH",
  "licenseTypeName": "Registered Pharmacist",
  "licenseNumber": "99999",
  "individualOrOrg": "I",
  "lastName": "RODRIGUEZ",
  "firstName": "MARIA",
  "middleName": "T",
  "suffix": null,
  "organizationName": null,
  "addressLine1": "350 S GRAND AVE",
  "addressLine2": "STE 1200",
  "city": "Los Angeles",
  "county": "Los Angeles",
  "state": "CA",
  "zip": "90071",
  "country": "USA",
  "originalIssueDate": "2014-05-22",
  "expirationDate": "2026-09-30",
  "licenseStatus": "current",
  "licenseCategory": "healthcare",
  "scrapedAt": "2026-05-16T09:00:00.000Z"
}
```
Example — CSLB general contractor (Contractors State License Board)
```json
{
  "agencyCode": "CSLB",
  "agencyName": "Contractors State License Board",
  "licenseTypeCode": "B",
  "licenseTypeName": "Class B - General Building Contractor",
  "licenseNumber": "999999",
  "individualOrOrg": "O",
  "lastName": null,
  "firstName": null,
  "middleName": null,
  "suffix": null,
  "organizationName": "PACIFIC COAST BUILDERS INC",
  "addressLine1": "1450 HARBOR BLVD",
  "addressLine2": null,
  "city": "San Diego",
  "county": "San Diego",
  "state": "CA",
  "zip": "92101",
  "country": "USA",
  "originalIssueDate": "2009-11-04",
  "expirationDate": "2026-11-30",
  "licenseStatus": "current",
  "licenseCategory": "engineering",
  "scrapedAt": "2026-05-16T09:00:00.000Z"
}
```
License Status Reference
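DCA boards do not share a single status vocabulary. The actor normalises everything to lower-case canonical values, then maps each record to one of the five statuses below. The `licenseStatus` input accepts these five values plus `all`, which disables the status filter.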
Statuses that signal the licensee may legally operate
| Status | Meaning |
|---|---|
current | License is active, paid up, and in good standing. The default for almost every working California professional. |
Statuses that mean the licensee may NOT operate
| Status | Meaning |
|---|---|
delinquent | Renewal payment lapsed; licensee has a grace window to cure but cannot legally practice in the meantime. |
inactive | Voluntarily placed in an inactive bucket — common for retired CPAs, snow-bird physicians, and pharmacists in non-practising roles. |
cancelled | License terminated administratively or by board action. |
expired | License expired and was not renewed within the cure window. |
Tip: Use `licenseStatus: "current"` to receive only practising licensees. The bulk file also contains historical statuses, so a request without a status filter returns the broader population for trend analytics.
DCA Boards & Bureaus Covered
The Box folder ships one file per board / bureau. The actor's category-tagger maps each board to a high-level bucket so downstream stacks can group records without parsing names.
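A keyword-based tagger of the kind described could look like the sketch below. The keyword lists are hypothetical, chosen to reproduce the groupings in the tables that follow; the actor's own mapping is not published here. Order matters: the first category with a matching keyword wins, and unmatched boards fall through to `other`.

```python
from typing import List, Tuple

# HYPOTHETICAL keyword table, checked in order; the first hit wins.
CATEGORY_KEYWORDS: List[Tuple[str, Tuple[str, ...]]] = [
    ("healthcare", ("pharmacy", "nursing", "medical", "dental", "optometry",
                    "psychology", "veterinary", "chiropractic", "acupuncture",
                    "naturopathic", "respiratory", "therapy", "behavioral",
                    "speech", "hearing", "dietetic", "physician", "osteopathic")),
    ("engineering", ("contractor", "engineer", "architect", "surveyor", "geolog")),
    ("business", ("accountancy", "cemetery", "funeral")),
    ("personal_care", ("barbering", "cosmetology")),
    ("legal", ("court reporter", "athletic", "guide dog")),
]


def categorize(agency_name: str) -> str:
    """Return the first category whose keyword appears in the board name."""
    name = agency_name.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in name for keyword in keywords):
            return category
    return "other"
```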
Health care
| Board / Bureau | Code | Examples of license types |
|---|---|---|
| Board of Pharmacy | PHA | Registered Pharmacist, Pharmacy Technician, Pharmacy, Wholesaler, Compounding facility |
| Board of Registered Nursing (BRN) | RN | Registered Nurse, Nurse Practitioner, CNS, CNM, CRNA, Public Health Nurse |
| Board of Vocational Nursing & Psychiatric Technicians (BVNPT) | LVN | Licensed Vocational Nurse, Psychiatric Technician |
| Medical Board of California (MBC) | MBC | Physician & Surgeon (MD), Podiatric Physician, Midwife, Polysomnographic Tech |
| Physician Assistant Board | PAB | Physician Assistant |
| Osteopathic Medical Board | OMB | Osteopathic Physician & Surgeon (DO) |
| Dental Board of California (DBC) | DBC | Dentist (DDS), Registered Dental Assistant, Oral & Maxillofacial Surgeon |
| Dental Hygiene Board | DHBC | Registered Dental Hygienist, RDH in Alternative Practice |
| State Board of Optometry | OPT | Optometrist, Spectacle Lens Dispenser |
| Board of Psychology | PSY | Licensed Psychologist, Registered Psychology Assistant |
| Veterinary Medical Board | VMB | Veterinarian, Registered Veterinary Technician, Veterinary Premises |
| Board of Chiropractic Examiners | CHIRO | Doctor of Chiropractic |
| Acupuncture Board | ACU | Licensed Acupuncturist |
| Naturopathic Medicine Committee | NMC | Naturopathic Doctor |
| Respiratory Care Board | RCB | Respiratory Care Practitioner |
| Occupational Therapy Board | OTB | Occupational Therapist, COTA |
| Physical Therapy Board | PTB | Physical Therapist, PTA |
| Board of Behavioral Sciences (BBS) | BBS | LMFT, LCSW, LPCC, LEP, plus their registered associates and trainees |
| Speech-Language Pathology Board | SLPAB | SLP, Audiologist, Hearing Aid Dispenser |
| Hearing Aid Dispensers Bureau | HAD | Hearing Aid Dispenser |
| Dietetics & Nutrition Board | DIET | Registered Dietitian (where state-credentialed) |
Building, engineering & design
| Board / Bureau | Code | Examples |
|---|---|---|
| Contractors State License Board (CSLB) | CSLB | Class A General Engineering, Class B General Building, Class C specialty contractors (electrical C-10, plumbing C-36, HVAC C-20, roofing C-39, etc.) |
| Board of Professional Engineers, Land Surveyors & Geologists (BPELSG) | BPELSG | Civil, Electrical, Mechanical, Structural, Fire Protection, Geotechnical engineers; Land Surveyors; Geologists; Geophysicists |
| California Architects Board | CAB | Licensed Architect |
| Landscape Architects Technical Committee | LATC | Licensed Landscape Architect |
Business & finance
| Board / Bureau | Code | Examples |
|---|---|---|
| California Board of Accountancy (CBA) | CBA | Certified Public Accountant (individual & firm) |
| Cemetery & Funeral Bureau | CFB | Cemetery brokers, funeral directors, embalmers, crematories |
Personal care
| Board / Bureau | Code | Examples |
|---|---|---|
| Board of Barbering & Cosmetology (BBC) | BBC | Cosmetologist, Barber, Esthetician, Manicurist, Electrologist, Establishment |
Legal & specialty
| Board / Bureau | Code | Examples |
|---|---|---|
| Court Reporters Board | CRB | Certified Shorthand Reporter |
| State Athletic Commission | SAC | Boxers, MMA fighters, promoters, matchmakers, seconds |
| Guide Dogs for the Blind Board | GDB | Guide dog trainers, schools |
The exact set of files in the Box folder shifts month to month as DCA reorganises its bureaus. The actor always reflects whatever DCA currently publishes.
Use Cases
Healthcare staffing, locum tenens & travel nursing
California is the largest healthcare labour market in the United States. Travel nursing, locum physician, locum pharmacist and allied-health agencies use this dataset to:
- Verify a candidate's CA license before sending a credentialing packet to a hospital system or PBM.
- Source candidates by metro — every active RN in Los Angeles, every active RPh in San Francisco, every active LMFT in San Diego.
- Refresh expiration dates monthly so credentials never lapse mid-assignment and Joint Commission / DNV audits stay clean.
- Filter out cancelled and expired licensees automatically with `licenseStatus: "current"`.
- Cross-board match — find professionals dual-licensed as RN + LMFT or RPh + PharmD for high-tier assignments.
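For the expiration-tracking workflow, a check like the following flags current licenses that expire within a chosen window. It assumes `expirationDate` is an ISO-8601 `YYYY-MM-DD` string, as in the example records. Dates in any other format are skipped rather than guessed. The function name `expiring_within` is illustrative.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]


def expiring_within(records: Iterable[Record], days: int, today: date) -> List[Record]:
    """Current licenses whose expirationDate falls in [today, today + days]."""
    horizon = today + timedelta(days=days)
    hits: List[Record] = []
    for rec in records:
        raw = rec.get("expirationDate")
        if rec.get("licenseStatus") != "current" or not raw:
            continue
        try:
            # ASSUMPTION: ISO-8601 YYYY-MM-DD, as in the example records.
            expires = date.fromisoformat(raw[:10])
        except ValueError:
            continue  # skip malformed dates rather than guessing a format
        if today <= expires <= horizon:
            hits.append(rec)
    return hits
```

Passing `today` explicitly keeps the function deterministic and easy to test; in a scheduled job you would pass `date.today()`.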
Compliance, credentialing & primary-source verification (PSV)
Hospital systems, PBMs, MSOs, telehealth platforms and Medicaid managed-care plans use bulk DCA data to:
- Automate monthly PSV for every CA-licensed prescriber, dispenser or therapist on payroll.
- Catch status changes (`current` → `delinquent` / `cancelled`) within the monthly refresh window.
- Maintain audit-ready logs with the `scrapedAt` timestamp on every record.
- Replace expensive per-lookup verification subscriptions that bill per query.
- Document due diligence for The Joint Commission, URAC, NCQA, DMHC and DEA inspector audits.
- Detect dual practice — a pharmacist also showing up as a CSLB contractor is a red flag worth a closer look.
B2B sales & California-focused lead generation
Pharma reps, medical-device vendors, EHR / EMR companies, pharmacy management software, contractor SaaS, accounting tooling and PoS providers use the dataset to:
- Build city- or county-targeted CA lead lists filtered by board, license type or facility size.
- Identify newly issued licenses by diffing this month's run against last month's — fresh contractors, fresh CPAs, fresh dispensaries to onboard.
- Route territory assignments by ZIP, county or DMA.
- Enrich CRM records (Salesforce, HubSpot, Pipedrive, Apollo) with current license status, expiration and category.
- Power direct-mail and door-knocker campaigns with verified business mailing addresses.
Construction tech & contractor lead generation (CSLB)
The Contractors State License Board (CSLB) regulates ~280,000 active California contractors. Construction-tech founders, materials suppliers, payment-app vendors, lien-management platforms and insurance brokers use the dataset to:
- Find every active Class C-10 electrical contractor in Los Angeles County, every active Class C-36 plumber in Orange County, every active Class B general in the Bay Area.
- Map contractor density by ZIP for last-mile sales territory planning.
- Track newly licensed contractors — a freshly minted Class B in Sacramento is a perfect-fit prospect for a starter ERP, payment terminal or insurance package.
- Spot expiring bonds and licenses for renewal-cycle marketing.
- Validate sub-contractors before a GC adds them to a bid.
Real-estate, mortgage, title & insurance underwriting
Insurers, title companies and lenders use license validity as an underwriting signal:
- Verify contractor licenses before bonding a project or writing a builder's-risk policy.
- Confirm appraisers, surveyors and engineers are in `current` standing before relying on their attestations.
- Adjust pricing for disciplinary or cancelled history automatically.
- Monitor portfolio risk — flag insureds whose status flips mid-policy.
- Geocode by county and ZIP to feed catastrophe-risk and wildfire-zone models.
M&A, due diligence & investor research
Private equity, family offices, search funds and corporate development teams use DCA data when underwriting California acquisitions:
- Roll-up sourcing — every dental practice, every CPA firm, every Class B GC by county becomes a structured target list.
- Pre-LOI verification — confirm the target's listed principals actually hold the licenses they claim.
- Continuity diligence — for healthcare or contractor targets, check the responsible practitioner / qualifier has a clean status.
- Market-sizing models — count active licensees by category to back into TAM for a SaaS thesis.
- Post-close monitoring — watch the portfolio company's roster monthly for status drift.
Recruiting, sales-ops & talent sourcing
Recruiting platforms and outbound sales-ops teams use the dataset as a structured CA "people directory" for licensed professions:
- Build candidate pipelines for CPA firms, hospital systems, dental DSOs, contractor roll-ups and law-adjacent (court reporter) businesses.
- Match LinkedIn profiles against authoritative license data for outreach trust signals.
- Enrich ATS records with current credential status.
- Identify newly licensed professionals as warm leads for first-job recruiting.
Academic, public-health & policy research
Universities, state agencies and think tanks use DCA bulk data to:
- Quantify healthcare-worker supply by county, ZIP and license type.
- Map healthcare deserts — counties with low RN-per-capita, low RPh-per-capita or low MD-per-capita ratios.
- Track licensure pipelines over time as DCA refreshes monthly.
- Study disciplinary patterns — combine with each board's separate disciplinary roster.
- Inform workforce-policy proposals with hard empirical data.
Investigative journalism & data reporting
Reporters covering healthcare, construction, finance, beauty industry and consumer protection use DCA data to:
- Verify credentials of profile subjects before publication.
- Map the geography of trades — pharmacy deserts, contractor concentration in fire-rebuild zones, CPA density in tax-prep season stories.
- Cross-reference government contractor awards against CSLB records to spot mismatches.
- Build interactive maps of licensed professionals for public-interest reporting.
Legal discovery & expert-witness vetting
Plaintiff and defense firms use the dataset to:
- Confirm expert credentials before engagement.
- Build chronologies of an individual's license history when combined with archived runs.
- Identify every active dentist / pharmacist / contractor at a given address for litigation discovery.
- Validate party allegations about license status during pleadings.
Sample Queries & Recipes
Recipe 1 — Every active CA pharmacist in Los Angeles County
```json
{
  "boards": ["Board of Pharmacy"],
  "licenseTypes": ["Registered Pharmacist"],
  "licenseStatus": "current",
  "counties": ["Los Angeles"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```
Recipe 2 — All active CSLB Class B general contractors in Southern California
```json
{
  "boards": ["Contractors State License Board"],
  "licenseTypes": ["Class B"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "Orange", "San Diego", "Riverside", "San Bernardino", "Ventura"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```
Recipe 3 — Every active Bay Area physician (Medical Board of California)
```json
{
  "boards": ["Medical Board of California"],
  "licenseTypes": ["Physician and Surgeon"],
  "licenseStatus": "current",
  "counties": ["San Francisco", "San Mateo", "Alameda", "Santa Clara", "Contra Costa", "Marin"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```
Recipe 4 — Active RNs in the Sacramento metro for travel-nursing recruiting
```json
{
  "boards": ["Board of Registered Nursing"],
  "licenseTypes": ["Registered Nurse"],
  "licenseStatus": "current",
  "counties": ["Sacramento", "Placer", "El Dorado", "Yolo"],
  "stateFilter": "CA",
  "maxRecords": 50000
}
```
Recipe 5 — Every active CPA firm in California (Board of Accountancy)
```json
{
  "boards": ["Board of Accountancy"],
  "licenseTypes": ["Public Accountancy Corporation", "Partnership"],
  "licenseStatus": "current",
  "stateFilter": "CA",
  "maxRecords": 0
}
```
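Recipe 6 — Dentists in Los Angeles and Alameda counties (Long Beach, Oakland) for DSO M&A sourcing
```json
{
  "boards": ["Dental Board of California"],
  "licenseTypes": ["Dentist"],
  "licenseStatus": "current",
  "counties": ["Los Angeles", "Alameda"],
  "stateFilter": "CA",
  "maxRecords": 0
}
```
The `counties` filter works at county level; narrow to Long Beach and Oakland on the `city` field downstream.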
Recipe 7 — Compliance sweep: every delinquent or cancelled licensee statewide
```json
{
  "licenseStatus": "delinquent",
  "stateFilter": "CA",
  "maxRecords": 0
}
```
Run it a second time with `"licenseStatus": "cancelled"` and concatenate the two datasets downstream.
Recipe 8 — Cosmetology establishments in Fresno County
```json
{
  "boards": ["Board of Barbering and Cosmetology"],
  "licenseTypes": ["Establishment"],
  "licenseStatus": "current",
  "counties": ["Fresno"],
  "stateFilter": "CA"
}
```
Recipe 9 — Quick 50-record sample for a new pipeline build
```json
{
  "boards": ["Board of Pharmacy"],
  "maxRecords": 50
}
```
Recipe 10 — Out-of-state mail-order pharmacies licensed in California
```json
{
  "boards": ["Board of Pharmacy"],
  "licenseTypes": ["Nonresident Pharmacy"],
  "licenseStatus": "current",
  "stateFilter": ""
}
```
Leave stateFilter empty (or omit it) and the run includes pharmacies physically located outside CA but holding a CA permit.
Integration Examples
Google Sheets (via Apify Integration)
- Set up an Apify schedule running the actor on the 5th of each month at 06:00 PT (DCA's bulk drop usually lands in the first week).
- Attach the "Export to Google Sheets" integration to the schedule.
- Receive a fresh CA license tab in your Sheet every month — ready for filtering, pivoting, or distribution to sales reps.
Make.com / Zapier / n8n
Use the Apify native connector. Trigger downstream automations on:
- New records (current run minus previous run) → push to Slack or a CRM.
- Status changes (`current` → `cancelled`) → open a Salesforce Case.
- Address changes (relocations) → update HubSpot Company records.
- Newly issued licenses by category → trigger an outbound email cadence.
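The new-record, status-change and address-change triggers all come down to diffing two monthly runs. Below is a minimal sketch that assumes both runs' items are already loaded as lists of dicts. It keys records on `agencyCode` + `licenseNumber`, the same key suggested for CRM upserts. If a board reuses numbers across license types, add `licenseTypeCode` to the key. `diff_runs` is an illustrative name.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]
ADDRESS_FIELDS = ("addressLine1", "addressLine2", "city", "state", "zip")


def _index(records: Iterable[dict]) -> Dict[Key, dict]:
    # One entry per (agencyCode, licenseNumber); later duplicates overwrite earlier ones.
    return {(r["agencyCode"], r["licenseNumber"]): r for r in records}


def diff_runs(previous: Iterable[dict], current: Iterable[dict]) -> Dict[str, List]:
    prev, curr = _index(previous), _index(current)
    # Keys present this month but not last month are new licenses.
    new = [curr[key] for key in curr.keys() - prev.keys()]
    status_changes: List[Tuple[Key, str, str]] = []
    address_changes: List[Key] = []
    for key in curr.keys() & prev.keys():
        old, now = prev[key], curr[key]
        if old.get("licenseStatus") != now.get("licenseStatus"):
            status_changes.append((key, old.get("licenseStatus"), now.get("licenseStatus")))
        if any(old.get(f) != now.get(f) for f in ADDRESS_FIELDS):
            address_changes.append(key)
    return {"new": new, "status_changes": status_changes, "address_changes": address_changes}
```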
Power BI / Tableau / Looker / Metabase
Add Apify's REST API as a data source. Refresh on schedule. Build dashboards covering:
- Active licensee count by metro, county, ZIP, board.
- CSLB contractor density per neighbourhood.
- Healthcare-worker supply heat maps (RN, RPh, MD, DDS) per California county.
- Monthly churn (newly cancelled vs. newly issued) by board.
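The county-by-board counts behind such dashboards are a single aggregation. Here is a sketch that assumes the records are already loaded. Records with a null `county` (out-of-state licensees) are grouped under a placeholder label; that is a choice made here, not actor behaviour. `active_counts` is an illustrative name.

```python
from collections import Counter
from typing import Iterable


def active_counts(records: Iterable[dict]) -> Counter:
    """Count current licensees per (county, agencyName) pair."""
    return Counter(
        (r.get("county") or "Out of state / unknown", r["agencyName"])
        for r in records
        if r.get("licenseStatus") == "current"
    )
```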
Postgres / Snowflake / BigQuery / Redshift
Use the Apify webhook integration to POST run results directly to a warehouse ingestion endpoint. Suggested table layout:
```sql
CREATE TABLE ca_dca_licenses (
    agency_code text,
    agency_name text,
    license_type_code text,
    license_type_name text,
    license_number text,
    individual_or_org text,
    last_name text,
    first_name text,
    middle_name text,
    suffix text,
    organization_name text,
    address_line1 text,
    address_line2 text,
    city text,
    county text,
    state text,
    zip text,
    country text,
    original_issue_date date,
    expiration_date date,
    license_status text,
    license_category text,
    scraped_at timestamptz,
    PRIMARY KEY (agency_code, license_number, scraped_at)
);
```
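To load dataset items into that table, the camelCase output fields have to line up with the snake_case columns. The sketch below assumes a Postgres DB-API driver that uses `%s` placeholders, such as psycopg. `to_row` and `INSERT_SQL` are illustrative names. `scraped_at` is part of the primary key, so each monthly run appends a new snapshot instead of overwriting the previous one.

```python
import re
from typing import Any, Dict, List, Tuple

# Output fields in the same order as the ca_dca_licenses columns.
FIELDS: List[str] = [
    "agencyCode", "agencyName", "licenseTypeCode", "licenseTypeName",
    "licenseNumber", "individualOrOrg", "lastName", "firstName", "middleName",
    "suffix", "organizationName", "addressLine1", "addressLine2", "city",
    "county", "state", "zip", "country", "originalIssueDate",
    "expirationDate", "licenseStatus", "licenseCategory", "scrapedAt",
]


def to_snake(name: str) -> str:
    # Insert "_" before each inner capital: addressLine1 -> address_line1.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


COLUMNS: List[str] = [to_snake(f) for f in FIELDS]

# ASSUMPTION: a DB-API driver with %s placeholders (e.g. psycopg).
INSERT_SQL = (
    f"INSERT INTO ca_dca_licenses ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(['%s'] * len(COLUMNS))})"
)


def to_row(record: Dict[str, Any]) -> Tuple[Any, ...]:
    """Order a dataset item to match COLUMNS; missing keys become None."""
    return tuple(record.get(f) for f in FIELDS)
```

With an open cursor, `cursor.executemany(INSERT_SQL, [to_row(r) for r in items])` would insert a whole run.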
Salesforce / HubSpot / Pipedrive CRM enrichment
Trigger an Apify run monthly, then upsert against Account / Contact records keyed on agency_code + license_number. Status-change events can create Tasks, open Cases, or post to a #compliance Slack channel automatically.
Webhooks & event triggers
Use the built-in Apify webhook to notify an HTTP endpoint whenever a run finishes; the endpoint then fetches that run's dataset. Use the licenseCategory field to route healthcare records to a credentialing service and engineering records to a contractor-onboarding service from the same run.
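A minimal routing sketch in Python. The endpoint URLs are placeholders, and the category labels "healthcare" and "engineering" are assumptions, since this README does not list the exact licenseCategory values. Records with an unknown category fall through to a catch-all endpoint instead of being dropped:

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical destinations keyed by (assumed) licenseCategory labels.
ROUTES: Dict[str, str] = {
    "healthcare": "https://credentialing.example.com/ingest",
    "engineering": "https://contractor-onboarding.example.com/ingest",
}
FALLBACK = "https://catch-all.example.com/ingest"


def route_by_category(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records into per-endpoint batches keyed by destination URL."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        category = (record.get("licenseCategory") or "").strip().lower()
        batches[ROUTES.get(category, FALLBACK)].append(record)
    return dict(batches)
```

Each batch can then be POSTed with any HTTP client.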
Esri ArcGIS / Mapbox / Kepler
Use state, county, city, zip, addressLine1 and addressLine2 as the geocode key. Each record becomes a point on a state-wide licensee map. Combine with U.S. Census tract data to study healthcare-access disparities.
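A sketch of turning those fields into a single geocoder query string (field names as listed above; the example address is made up). County is left out of the query because street, city and ZIP already locate the point; it is more useful as a sanity check on the geocoder's result:

```python
from typing import Dict, Optional


def geocode_query(record: Dict[str, Optional[str]]) -> str:
    """Build a one-line address, e.g. "100 Main St, Suite 2, Fresno, CA 93721"."""
    street = ", ".join(
        p.strip() for p in (record.get("addressLine1"), record.get("addressLine2")) if p and p.strip()
    )
    state_zip = " ".join(
        p.strip() for p in (record.get("state"), record.get("zip")) if p and p.strip()
    )
    parts = [street, (record.get("city") or "").strip(), state_zip]
    # Skip empty components so partial addresses still produce a clean query.
    return ", ".join(p for p in parts if p)
```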
Major California Metros at a Glance
| Metro Area | Primary Counties | Approx. population | Notable for licensing data |
|---|---|---|---|
| Los Angeles | Los Angeles, Orange | 13.2M | Largest CA healthcare market; ~3,000 pharmacies, 80K+ RNs, 30K+ MDs, ~50K CSLB contractors |
| San Diego | San Diego | 3.3M | Biotech, defense, large hospital systems, dense dental market |
| San Francisco Bay Area | San Francisco, San Mateo, Alameda, Santa Clara, Contra Costa, Marin | 7.7M | Highest CPA density in CA; Kaiser, UCSF, Stanford |
| San Jose | Santa Clara | 2.0M | Engineering-heavy: BPELSG licensees per capita are the highest in the state |
| Sacramento | Sacramento, Placer, Yolo, El Dorado | 2.4M | State-capital concentration of regulators, government health programs, BRN headquarters |
| Fresno | Fresno, Madera, Tulare | 1.2M | Central Valley healthcare hub, agriculture-adjacent contractor density |
| Long Beach | Los Angeles | 0.5M | Port of Long Beach logistics, dense cosmetology and dental markets |
| Oakland | Alameda | 0.4M | Kaiser HQ, dense BBC and BBS markets |
| Riverside / San Bernardino (Inland Empire) | Riverside, San Bernardino | 4.7M | Booming residential construction → CSLB density |
| Bakersfield | Kern | 0.9M | Oil-and-gas adjacent engineering and contractor licenses |
| Anaheim | Orange | 0.4M | Hospitality, dental, cosmetology |
| Santa Ana | Orange | 0.3M | Healthcare, dental, contractors |
| Stockton | San Joaquin | 0.3M | Logistics-driven CSLB activity |
| Modesto | Stanislaus | 0.6M | Central Valley healthcare and ag-services contractors |
Cost & Performance
| Metric | Value |
|---|---|
| Engine | Box Shared Items API + Playwright (Chromium) fallback for downloads |
| Runtime (single small board, e.g. Pharmacy) | ~1–3 minutes |
| Runtime (large board, e.g. CSLB, BRN, BBC) | ~5–15 minutes per file |
| Runtime (full bulk pull, all 36 boards) | 30–90 minutes, dominated by ZIP downloads |
| Cost per run | Scales with records delivered under pay-per-event pricing (listed from $0.50 per 1,000 results): a 1,000-record sample costs about $0.50, and a full 3.3M-record pull about $1,650 at that starting rate |
| Pricing model | Pay-per-event (transparent line-item billing on Apify) |
| Data freshness | Monthly — DCA refreshes the Box folder roughly once a month |
| Auth required | None (Box folder is public) |
| Proxy required | No — supported but not needed |
| Concurrency | Safe to run multiple board-scoped configurations in parallel |
| Memory footprint | 2 GB minimum, 8 GB recommended for full-board pulls due to ZIP extraction |
| Storage temp footprint | ZIP extraction writes to /tmp/dca_extract_<ts>/ and cleans up after parsing |
Compliance, Privacy & Legal Notes
- Public data only. Every field in this dataset is published by the California Department of Consumer Affairs at data.dca.ca.gov and the public Box folder under the California Public Records Act (Gov. Code §§ 7920.000 et seq.).
- No PHI. The dataset contains no patient health information; it is licensing data, not clinical data. HIPAA does not apply.
- No SSNs, no DOBs, no financial accounts. Only public license-related information is published.
- Addresses are the address of record reported to DCA, typically a business or practice address. For solo practitioners and small contractors the address of record can be a home address; data consumers must apply judgement before mailing or door-knocking.
- No email addresses; phone numbers only rarely. DCA does not publish licensee emails in the bulk file. A phone number is occasionally present for facility-type records.
- CCPA / GDPR — California licensing data is on the public record, but consumer-facing use of the data (B2C marketing, profiling) must comply with the CCPA / CPRA, and EU-resident usage must comply with GDPR. Compliance is the responsibility of the data consumer.
- CAN-SPAM / TCPA — the dataset does not include emails; if you append emails or phone numbers from other sources, CAN-SPAM and TCPA/DNC compliance apply, respectively.
- DCA Terms of Use — the actor accesses DCA's intended public publication on Box (which DCA explicitly distributes for re-use). Do not attempt to use it for unlawful purposes including identity fraud, stalking, harassment, or impersonation.
Important: this dataset is not a substitute for the legally required license and disciplinary lookup on each board's primary verification portal where a board-mandated check is required (e.g. CSLB lien purposes, Joint Commission credentialing). Use it to scale routine workflows, and defer to each board's primary verification UI when statute or an accreditor requires it.
Frequently Asked Questions
How fresh is the data?
DCA refreshes its Box bulk-data folder roughly once a month. The actor downloads the latest available file on each run, so worst-case staleness is the gap between the last DCA publication and your run.
Why monthly instead of daily?
DCA's bulk publication cadence is monthly. The boards' own search portals are real-time but rate-limited and CAPTCHA-protected. The actor optimises for scale (millions of records cheaply) rather than minute-by-minute freshness; combine with a per-license verification call on critical workflows if you need real-time confirmation.
How many records will I get?
A full unfiltered pull across all 36 boards returns roughly 3.3 million records, dominated by CSLB (~280K active contractors), BRN (~500K nurse licenses, active and historical), BBC (~700K cosmetology licenses including establishments), and the various health-care boards. Pre-filter heavily for targeted runs.
Does the actor need a Box account or login?
No. The folder is a public Box share. The Box API path works anonymously via the BoxApi: shared_link=… header; the Playwright path navigates to the public folder URL.
Do I need an Apify residential proxy?
No. Box does not rate-limit the public share endpoint for typical workloads. Apify Proxy is supported but not required; enable it only for very heavy parallel scheduling.
Why is maxRecords defaulted to 1,000?
So a first-time user does not accidentally trigger a multi-million-record pull. Set maxRecords: 0 for unlimited once you are confident in your filter set.
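One way to keep that two-step habit (sample first, then unlimited) is to build the input programmatically. A minimal sketch in Python; the keys mirror the Nonresident Pharmacy input example earlier in this README plus maxRecords, and the helper name build_run_input is hypothetical:

```python
from typing import List, Optional


def build_run_input(
    boards: List[str],
    license_status: str = "current",
    state_filter: str = "",
    max_records: int = 1000,
    license_types: Optional[List[str]] = None,
) -> dict:
    """Assemble actor input; max_records=0 means unlimited, per this README."""
    if max_records < 0:
        raise ValueError("max_records must be >= 0 (0 = unlimited)")
    run_input = {
        "boards": boards,
        "licenseStatus": license_status,
        "stateFilter": state_filter,
        "maxRecords": max_records,
    }
    if license_types:
        run_input["licenseTypes"] = license_types
    return run_input


# A 50-record sample first, then the unlimited pull once the filters look right.
sample = build_run_input(["Board of Pharmacy"], max_records=50)
full = build_run_input(["Board of Pharmacy"], state_filter="CA", max_records=0)
```

With the official apify-client package, the resulting dict can be passed as run_input to ApifyClient(token).actor(actor_id).call(run_input=...), where actor_id is this actor's Store ID.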
Does this scraper cover Board of Real Estate?
No. California real-estate license data is regulated by the Department of Real Estate (DRE), which is not part of DCA and publishes its data separately. This actor covers the 36 boards under the DCA umbrella. DRE is on the roadmap as a sibling actor.
Does this cover State Bar of California attorney data?
No. Attorney licensing is regulated by the State Bar of California, which is independent of DCA. This actor does not include attorney records.
Does the dataset include disciplinary action history?
The bulk file shows current license status (current, delinquent, cancelled, etc.) but not the full disciplinary action narrative. For full disciplinary text, consult each board's public disciplinary documents, which DCA publishes separately; the cancelled and delinquent values of licenseStatus are a reliable filter for "needs further review".
Can I get NPI numbers for CA healthcare licensees?
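NPI numbers are issued federally through CMS's NPPES, not by DCA. To enrich, join license records to the NPPES NPI Registry on lastName + firstName + state, or on license number + state where the NPPES record lists a state license number. Name-only joins are ambiguous for common names.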
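A minimal name-based join in Python. It normalises names (case, accents, punctuation) and attaches an NPI only when exactly one NPPES row matches, since many providers share a name. The NPPES-side keys (last_name, first_name, state, npi) are simplified, hypothetical names; the real NPPES file uses longer column headers, so map them first:

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def norm(value: Optional[str]) -> str:
    """Upper-case and drop accents and non-letters, so "O'Brien" matches "OBRIEN"."""
    text = unicodedata.normalize("NFKD", value or "")
    return "".join(ch for ch in text if ch.isalpha()).upper()


def name_key(last: Optional[str], first: Optional[str], state: Optional[str]) -> Tuple[str, str, str]:
    return (norm(last), norm(first), norm(state))


def attach_npi(licenses: List[dict], nppes_rows: List[dict]) -> List[dict]:
    """Add an "npi" field to licenses whose name + state maps to exactly one NPI."""
    index: Dict[Tuple[str, str, str], set] = defaultdict(set)
    for row in nppes_rows:
        index[name_key(row["last_name"], row["first_name"], row["state"])].add(row["npi"])
    enriched = []
    for lic in licenses:
        npis = index.get(name_key(lic.get("lastName"), lic.get("firstName"), lic.get("state")), set())
        # Ambiguous (2+) or missing matches are left unenriched rather than guessed.
        enriched.append({**lic, "npi": next(iter(npis)) if len(npis) == 1 else None})
    return enriched
```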
Why does my CSLB run take longer than my Pharmacy run?
CSLB's data file ships as a large ZIP archive (often 50–150 MB) containing 280K+ active contractors plus historical records. Extraction and parsing dominate runtime, and the archive is far heavier than the Pharmacy file (~30 MB).
Does the actor deduplicate across boards?
No. A person may legitimately hold licenses across multiple boards (e.g. an RN who is also a pharmacist, or a contractor who is also an architect), and each board's record is preserved. To keep one row per license, for example after appending several monthly runs into one table, deduplicate on (agencyCode, licenseNumber).
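If you append monthly runs into one table, a sketch like this keeps only the newest record per license. It assumes a scrapedAt field (the camelCase counterpart of the scraped_at column) holding uniformly formatted ISO-8601 UTC timestamps, so string comparison orders them chronologically:

```python
from typing import Dict, Iterable, List, Tuple


def latest_per_license(records: Iterable[dict]) -> List[dict]:
    """Keep one record per (agencyCode, licenseNumber): the one with the newest scrapedAt."""
    best: Dict[Tuple[str, str], dict] = {}
    for rec in records:
        key = (rec["agencyCode"], rec["licenseNumber"])
        # Uniform ISO-8601 UTC strings sort lexicographically in time order.
        if key not in best or rec.get("scrapedAt", "") > best[key].get("scrapedAt", ""):
            best[key] = rec
    return list(best.values())
```

The same license number under two different agency codes stays as two rows, matching the per-board behaviour described above.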
Are out-of-state licensees included?
Yes — for boards that license out-of-state professionals (e.g. nonresident pharmacies, telehealth physicians, out-of-state contractors). Set stateFilter: "CA" to exclude them.
What if DCA changes the file format?
The actor fuzzy-matches header names AND falls back to positional mapping on the canonical 21-column DCA layout. Past header reformats have not broken the actor. If a future change does, file an issue on the Apify Store page and a patch will follow.
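The header-matching strategy can be illustrated with Python's standard library. This is a sketch of the idea, not the actor's code: the canonical list assumes the 21 source columns follow the order of the ca_dca_licenses table in this README (minus license_category and scraped_at), and the 0.8 similarity cutoff and the "fewer than half matched" threshold are arbitrary choices:

```python
import csv
import difflib
import io
from typing import Dict, List

# Hypothetical canonical layout; the real DCA column order may differ.
CANONICAL = [
    "agency_code", "agency_name", "license_type_code", "license_type_name",
    "license_number", "individual_or_org", "last_name", "first_name",
    "middle_name", "suffix", "organization_name", "address_line1",
    "address_line2", "city", "county", "state", "zip", "country",
    "original_issue_date", "expiration_date", "license_status",
]


def normalise(header: str) -> str:
    """Normalise a header, e.g. 'License Number ' -> 'license_number'."""
    return "_".join(header.strip().lower().replace("-", " ").split())


def map_header(header: List[str], cutoff: float = 0.8) -> Dict[int, str]:
    """Map column index -> canonical name by fuzzy match, else fall back to position."""
    mapping: Dict[int, str] = {}
    for i, raw in enumerate(header):
        match = difflib.get_close_matches(normalise(raw), CANONICAL, n=1, cutoff=cutoff)
        if match and match[0] not in mapping.values():
            mapping[i] = match[0]
    # Too few recognisable names: trust the canonical positional layout instead.
    if len(mapping) < len(CANONICAL) // 2 and len(header) == len(CANONICAL):
        mapping = dict(enumerate(CANONICAL))
    return mapping


def parse_pipe_file(text: str) -> List[Dict[str, str]]:
    """Parse a pipe-delimited export into dicts keyed by canonical column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter="|"))
    if not rows:
        return []
    mapping = map_header(rows[0])
    return [{name: row[i] for i, name in mapping.items() if i < len(row)} for row in rows[1:]]
```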
Can I schedule this on the Apify free plan?
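Yes. The actor runs on the free tier; set a monthly Apify schedule for the 5th–7th of the month. Keep runs scoped by board or filters so pay-per-event charges stay within the plan's monthly usage credit.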
What export formats are supported?
JSON, CSV, Excel (XLSX), HTML, XML, RSS, and JSON Lines — directly from the Apify dataset view or the REST API.
Will this work for other US states?
Not this actor — DCA is California-specific. We maintain separate actors for Texas, Arizona, Washington, Virginia, Colorado, Minnesota, Ohio, Illinois, North Carolina, and federal sources. See Related Apify Actors below.
How do I report a bug or request a board that is missing?
Open an issue on the actor's Apify Store page or contact the developer directly through the Apify Console. Board additions usually ship within a release cycle.
What happens if a board temporarily ships an empty or corrupt file?
The actor logs the failure, skips the file, and continues with the next board. You receive partial output for the boards that succeeded. Re-run after DCA reposts the corrected file.
Does the actor write to disk?
Only /tmp/ for ZIP extraction (immediately cleaned up after parsing). All output goes to the Apify dataset; nothing is persisted locally beyond the run lifetime.
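The extract-then-clean-up pattern is straightforward with the standard library. A minimal sketch, assuming the archive's data files end in .txt; tempfile.TemporaryDirectory removes the directory on exit even if parsing raises, and its prefix only loosely mirrors the /tmp/dca_extract_<ts>/ naming (the suffix it adds is random, not a timestamp):

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import Dict


def read_zipped_exports(zip_bytes: bytes) -> Dict[str, str]:
    """Extract a ZIP into a throwaway temp dir, read every .txt member, then clean up."""
    contents: Dict[str, str] = {}
    with tempfile.TemporaryDirectory(prefix="dca_extract_") as tmp:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            archive.extractall(tmp)
        for path in Path(tmp).rglob("*.txt"):
            contents[path.name] = path.read_text(encoding="utf-8", errors="replace")
    # The directory and everything extracted into it are gone at this point.
    return contents
```

Reading whole files into memory keeps the sketch short; for a 150 MB CSLB archive a production parser would stream each member line by line instead.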
Related Apify Actors by Haketa
If you need licensing data from other US states or related regulatory bodies, the catalog below pairs naturally with this actor:
- Texas Pharmacy License Scraper — TSBP — Texas State Board of Pharmacy
- Arizona ROC Contractor License Scraper — Arizona Registrar of Contractors
- Washington L&I Contractor License Scraper — Washington Department of Labor & Industries
- NC Licensing Board for General Contractors Scraper — North Carolina general contractors
- Colorado Professional License Scraper — Colorado DORA
- Virginia DPOR Professional License Scraper — Virginia Dept. of Professional & Occupational Regulation
- Minnesota DLI Professional License Scraper — Minnesota Dept. of Labor & Industry
- Ohio eLicense Scraper — Ohio professional licenses
- Illinois IDFPR License Scraper — Illinois Dept. of Financial & Professional Regulation
- TTB Alcohol Permittee Scraper — federal alcohol permittees
- SAM.gov Federal Contractor Entity Scraper — federal contractor registry
- BBB Business Scraper — Better Business Bureau profiles
- Care.com Caregiver Scraper — companion to BBS records for caregiver workforce research
- WhatClinic.com Clinic Scraper — global clinic directory data
Comparison vs. Alternatives
| Approach | Setup time | Data freshness | Cost (10K records) | Schema normalisation | Cross-board coverage |
|---|---|---|---|---|---|
| This actor | < 1 minute | Monthly bulk drop | ~$5 (at the listed $0.50 / 1,000 results) | Built-in | 36 boards in one tool |
| Manual Box download | 10–20 min/board/month | Monthly | Free | None | Per board, manual |
| Per-board search UI scraping | 4–16 hours dev / board | Real-time but CAPTCHA-gated | Slow + IP-cost | Per board | Build N scrapers |
| Custom Python + Playwright build | 8–16 hours dev | Monthly | Free + infra cost | DIY | DIY |
| Paid PSV (primary-source verification) API | Hours setup | Real-time | $100–500+/mo | Yes | Limited |
| DCA public-records request | Days–weeks | Stale on delivery | Free / variable | None | Single response |
Why Pay-Per-Event Pricing?
This actor uses pay-per-event pricing rather than a flat monthly subscription or per-Compute-Unit charge:
- You pay only when the actor runs — no idle-month bills.
- Charges scale with how much data you actually consume — a 50-record sample is essentially free.
- Transparent, line-item billing inside the Apify console.
- No monthly minimums and no commitment.
- Cheap to evaluate: sample with maxRecords: 50 for a few cents before committing to a full board pull.
- Plays well with a monthly cadence: DCA refreshes monthly, so about 12 runs a year keep you fully current.
Changelog
| Version | Date | Notes |
|---|---|---|
| 1.0.0 | 2026-05 | Initial public release — Box folder + Shared Items API ingestion, Playwright download fallback, ZIP extraction, fuzzy + positional header mapping, six-value status enum, per-board / per-county / per-state filtering, category tagging across 36 boards. |
Keywords
California DCA license lookup · California Department of Consumer Affairs scraper · CA professional license verification · CSLB scraper · California Contractors State License Board data · California contractor license search · CSLB Class B general contractor lookup · CSLB Class A general engineering contractor data · California medical board lookup · MBC physician verification · California physician license scraper · BRN nursing license scraper · California registered nurse data · LVN BVNPT license lookup · California pharmacy license data · Board of Pharmacy California scraper · Registered Pharmacist California verification · pharmacy technician California · California dental board scraper · DDS license California · CA real estate appraiser license · California Board of Accountancy CPA lookup · CBA CPA firm directory · California cosmetology license data · Board of Barbering and Cosmetology BBC scraper · California optometrist license data · veterinary license California · LMFT LCSW LPCC California verification · California Board of Behavioral Sciences scraper · CA professional engineer license BPELSG · California architect license search · landscape architect California lookup · California court reporter license · acupuncturist California license · chiropractor California license verification · CA license bulk download · DCA bulk data Box.com · monthly California licensee dataset · California license compliance automation · California credentialing PSV · California license API · CA license CSV download · Los Angeles pharmacist database · San Diego physician database · San Francisco CPA directory · San Jose engineer database · Sacramento nurse directory · Fresno contractor database · Long Beach dental directory · Oakland behavioral sciences directory · California pharmacy real estate accountant license data · CA contractor lead generation · California healthcare workforce research · CA licensed professional B2B leads
Support
- Bug reports: open an issue on the actor's Apify Store page.
- Feature requests / new board additions: same place — please describe the board, the use case, and link to the source file if known.
- Direct contact: reach the developer through the Apify developer profile.
If this actor saves you time, a 5-star rating on the Apify Store helps other California compliance, recruiting, sales, construction-tech and research teams discover it. Thank you.