Arizona ROC Contractor License Scraper

Pricing

from $6.00 / 1,000 results

Scrape Arizona Registrar of Contractors (AZ ROC) license records. Search by license number, company name, qualifying party, city, license type and classification. Returns status, bond info, complaint history and full license details.

Developer

Haketa

Maintained by Community

21 days ago

Last modified

🏗 Arizona ROC Contractor License Scraper

Scrape the Arizona Registrar of Contractors (AZ ROC) public license database with full pagination support. Search by license number, company name, qualifying party, city, license type, or classification. Returns complete license details including bond status, complaint history, and personnel information.


What's new in v1.2

  • Correct pagination: Identifies Next/Prev buttons via their CSS classes (right-btn / left-btn) and SVG direction instead of brittle title="Next Page" attributes (which don't exist in the live DOM).
  • Correct page-size options: The native page-size <select> offers 10 / 20 / 50 (not 25 as previously assumed). Default is 50.
  • Proper rowspan handling: A single business may hold 2 or 3 licences in adjacent rows with rowspan=N on the Business Name / Name&Title / Address / Phone cells. The parser now correctly associates every child licence row with its business context.
  • Separator-row skipping: The grey <td colspan="9" data-label="line"> separator rows between business blocks are detected and ignored.
  • Client-side pagination confirmed: Pagination triggers no network request — clicks on Next simply re-render the DOM from an already-loaded result set. We use a staleness-based wait (first-row-text changes) to detect successful page flips.

Overview

The AZ ROC public portal is built on Salesforce Experience Cloud with Lightning Web Components. Search results are rendered in the browser as a paginated HTML table, so the actor drives a headless browser rather than parsing raw HTTP responses. This actor:

  1. Navigates to the search page using a headless Chromium browser (Playwright).
  2. Fills in the search term and optional Advanced Search filters.
  3. Sets page size to 50 using the native <select> element.
  4. Automatically paginates through all result pages by clicking Next until the button is disabled.
  5. For each result, optionally visits the individual contractor detail page to collect the full record.
  6. Pushes all records to the Apify Dataset.

Field Reference

Field | Description
licenseNumber | AZ ROC six-digit license number
licenseType | RESIDENTIAL, COMMERCIAL, DUAL, or Specialty Dual
licenseStatus | ACTIVE, SUSPENDED, EXPIRED, REVOKED, or CANCELLED
businessName | Registered business / company name
dbaName | Doing Business As name (if applicable)
qualifyingParty | Individual responsible for the trade qualifications
personnel | Array of all personnel with name and role
entityType | Legal entity type (e.g. Corporation, LLC)
primaryClassification | Primary classification code (e.g. B-1, R-11, C-37)
classificationDesc | Description of the primary classification
classifications | All classification codes and descriptions
city / state / zip | Business location
phone | Business phone number
issuedDate | License originally issued date (YYYY-MM-DD)
renewedThroughDate | License valid through date (YYYY-MM-DD)
bondType | Type of surety bond
bondStatus | ACTIVE or INACTIVE
bondAmount | Surety bond dollar amount
bondCompany | Surety bond company name
bondNumber | Bond policy / certificate number
bondEffectiveDate | Bond start date (YYYY-MM-DD)
bondExpirationDate | Bond expiration date (YYYY-MM-DD)
openCases | Number of currently open complaint cases
disciplinedCases | Number of disciplined complaint cases
resolvedCases | Number of resolved / settled cases
complaintCount | Total complaints (portal shows last 2 years)
complaints | Array of individual complaint records
profileUrl | Direct URL to the contractor's ROC detail page
scrapedAt | ISO 8601 timestamp of when the record was scraped

Search Modes

You can combine multiple search modes in a single run. Each mode enqueues an independent search job:

1. License Number Lookup (fastest)

Direct lookup by AZ ROC license number. Leading zeros are added automatically.

{ "licenseNumbers": ["123456", "789012"] }
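
The zero-padding described above can be sketched as follows. This is a minimal illustration in Python, not the actor's own code: the helper name `normalize_license_number` is hypothetical, the six-digit width is taken from the Field Reference, and stripping a leading "ROC" prefix is an assumption.

```python
import re

LICENSE_WIDTH = 6  # the Field Reference describes a six-digit license number


def normalize_license_number(raw: str) -> str:
    """Keep only the digits of the input and left-pad them with zeros.

    Hypothetical helper. Accepts inputs such as "1234", " 001234 " or
    "ROC 333239"; prefix handling is an assumption, not documented behavior.
    """
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no digits found in license number: {raw!r}")
    return digits.zfill(LICENSE_WIDTH)
```

With this sketch, an input of "1234" would be looked up as "001234".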

2. Company Name / Qualifying Party Search

Partial name matching is supported.

{ "companyNames": ["Acme Plumbing", "Desert HVAC"] }
{ "qualifyingPartyNames": ["John Smith", "Maria Garcia"] }

3. City Search

Large cities can yield many pages of results, and every page the portal returns is traversed. Very broad queries may still run into the result caps described under Limitations.

{
  "cities": ["Scottsdale", "Mesa"],
  "licenseStatus": "ACTIVE",
  "licenseType": "RESIDENTIAL"
}
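
To make the combination of search modes concrete, here is a rough sketch of how an input could be expanded into independent jobs, one per search term. The name `build_jobs` mirrors the `buildJobs()` step in the Technical Architecture section, but the body, the mode labels, and the choice to attach the same filters to every job are assumptions for illustration only.

```python
from typing import Any

# Input keys mapped to a search-mode label (labels are illustrative).
SEARCH_MODES = {
    "licenseNumbers": "licenseNumber",
    "companyNames": "companyName",
    "qualifyingPartyNames": "qualifyingParty",
    "cities": "city",
}
FILTER_KEYS = ("licenseType", "licenseStatus", "licenseClassification")


def build_jobs(actor_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand the actor input into one search job per non-empty term."""
    # Assumption: filters left at "ALL" or empty are simply omitted.
    filters = {
        key: actor_input[key]
        for key in FILTER_KEYS
        if actor_input.get(key) not in (None, "", "ALL")
    }
    jobs = []
    for key, mode in SEARCH_MODES.items():
        for term in actor_input.get(key) or []:
            term = term.strip()
            if term:
                jobs.append({"mode": mode, "term": term, "filters": dict(filters)})
    return jobs
```

Under these assumptions, an input with one license number and two cities would produce three jobs.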

Full Pagination Support

How it works (v1.2)

AZ ROC displays results in a Salesforce LWC table with 10, 20, or 50 rows per page and Next / Previous buttons. The buttons are identified by their class names and inner SVG direction — not by title or aria-label (which are absent):

<!-- Next button -->
<button type="button" class="slds-button slds-button_neutral right-btn">
  <svg data-key="right"></svg>
</button>
<!-- Prev button -->
<button type="button" class="slds-button slds-button_neutral left-btn">
  <svg data-key="left"></svg>
</button>

The scraper:

  1. Sets <select class="slds-select"> to 50 and dispatches a native change event to force Aura to rebind.
  2. Parses the current table's HTML into flat records (handling rowspan layouts).
  3. Locates the Next button via button.slds-button.right-btn.
  4. Checks disabled / aria-disabled / disabled class to know if we've reached the last page.
  5. If enabled, captures the first data row's text content as a staleness marker, clicks Next, and polls the DOM until the first row text changes.
  6. Repeats until the Next button is disabled or maxResultsPerSearch is reached.

Pagination does not trigger network requests — AZ ROC pre-loads the entire result set after search and pagination is purely DOM-level. This makes the click loop fast and reliable.
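
The click loop described above can be sketched without a browser. The example below is an illustration in Python, not the actor's implementation: `FakeResultsPage` is a hypothetical in-memory stand-in that, like the portal as described, holds the whole result set and only changes the visible slice when Next is clicked. The function names and the timeout values are also assumptions.

```python
import time


class FakeResultsPage:
    """In-memory stand-in for the browser page (illustration only)."""

    def __init__(self, rows: list[str], page_size: int = 50):
        self._rows, self._size, self._index = rows, page_size, 0

    def visible_rows(self) -> list[str]:
        start = self._index * self._size
        return self._rows[start:start + self._size]

    def next_button(self) -> dict:
        last = (self._index + 1) * self._size >= len(self._rows)
        return {"class": "slds-button slds-button_neutral right-btn",
                "disabled": last,
                "aria-disabled": "true" if last else None}

    def click_next(self) -> None:
        self._index += 1


def is_disabled(button: dict) -> bool:
    """Check the three signals from step 4: attribute, aria state, class."""
    return (bool(button.get("disabled"))
            or button.get("aria-disabled") == "true"
            or "disabled" in (button.get("class") or "").split())


def wait_for_page_change(page, marker: str, timeout_s: float = 10.0,
                         poll_s: float = 0.1) -> bool:
    """Poll until the first visible row differs from the staleness marker."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rows = page.visible_rows()
        if rows and rows[0] != marker:
            return True
        time.sleep(poll_s)
    return False


def collect_all_rows(page, max_results: int = 0) -> list[str]:
    """Walk every page; max_results=0 means unlimited."""
    collected: list[str] = []
    while True:
        rows = page.visible_rows()
        collected.extend(rows)
        if max_results and len(collected) >= max_results:
            return collected[:max_results]
        if not rows or is_disabled(page.next_button()):
            return collected
        marker = rows[0]  # staleness marker: first row before the click
        page.click_next()
        if not wait_for_page_change(page, marker):
            return collected  # page never flipped; stop instead of looping forever
```

Comparing the first row's text before and after the click is what lets the loop work without waiting on any network activity.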

Handling rowspan layouts

A single business may hold multiple licences, rendered with rowspan:

┌───────────────┬──────────────┬────────────┬───────┬────┬────────┬──────────┬──────────┬──────────┐
│ Business Name │ Name & Title │ License No │ Class │ QP │ Status │ Address  │ Phone    │ MoreInfo │
├───────────────┼──────────────┼────────────┼───────┼────┼────────┼──────────┼──────────┼──────────┤
│ (rowspan=2)   │ (rowspan=2)  │ ROC 333239 │ CR-61 │ …  │ Active │ City, AZ │ …        │ More     │ ← row 1
│               │              ├────────────┼───────┼────┼────────┤          │          ├──────────┤
│ (merged)      │ (merged)     │ ROC 335547 │ CR-3  │ …  │ Active │ (merged) │ (merged) │ More     │ ← row 2
└───────────────┴──────────────┴────────────┴───────┴────┴────────┴──────────┴──────────┴──────────┘
<td colspan="9" data-label="line"> ← grey separator (skipped)

The parser tracks the most recent business block context. Rows without a Business Name cell are treated as additional licences belonging to the most recently-seen business.
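
One way to implement this association is to carry each rowspan cell forward into the rows it covers. The sketch below uses Python's standard `html.parser` and is an illustration under stated assumptions, not the actor's parser: the column order follows the diagram above, the record key names are made up for the example, and `parse_results` is a hypothetical function name.

```python
from html.parser import HTMLParser

# Column order as in the diagram above; the key names are illustrative.
COLUMNS = ["businessName", "nameAndTitle", "licenseNo", "classification",
           "qualifyingParty", "status", "address", "phone", "moreInfo"]


class RowCollector(HTMLParser):
    """Collect each <tr> as a list of (text, rowspan, data_label) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = {"parts": [],
                          "rowspan": int(attributes.get("rowspan") or 1),
                          "label": attributes.get("data-label")}

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["parts"].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            text = " ".join("".join(self._cell["parts"]).split())
            self._row.append((text, self._cell["rowspan"], self._cell["label"]))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_results(table_html: str) -> list[dict]:
    """Flatten the table into one record per licence row."""
    collector = RowCollector()
    collector.feed(table_html)
    records = []
    carried = {}  # column index -> (text, rows still covered by the rowspan)
    for cells in collector.rows:
        if not cells:
            continue  # header rows contain only <th> cells
        if any(label == "line" for _, _, label in cells):
            carried.clear()  # grey separator: the business block has ended
            continue
        queue = list(cells)
        values = []
        for col in range(len(COLUMNS)):
            if col in carried:
                text, remaining = carried[col]
                values.append(text)
                if remaining == 1:
                    del carried[col]
                else:
                    carried[col] = (text, remaining - 1)
            elif queue:
                text, rowspan, _ = queue.pop(0)
                values.append(text)
                if rowspan > 1:
                    carried[col] = (text, rowspan - 1)
            else:
                values.append("")
        records.append(dict(zip(COLUMNS, values)))
    return records
```

Each child licence row then comes out as a complete record with the business name, personnel, address, and phone copied from the row that opened the block.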

Stopping early

Set maxResultsPerSearch to a positive integer to cap the number of records collected per search query. Set to 0 (default) for unlimited.


Input Parameters

Parameter | Type | Default | Description
licenseNumbers | string[] | [] | AZ ROC license numbers
companyNames | string[] | [] | Business names
qualifyingPartyNames | string[] | [] | Qualifying party names
cities | string[] | [] | Arizona city names
licenseType | string | ALL | ALL / RESIDENTIAL / COMMERCIAL / DUAL
licenseStatus | string | ALL | ALL / ACTIVE / SUSPENDED / EXPIRED / REVOKED / CANCELLED
licenseClassification | string | "" | Classification code filter (e.g. B-1, CR-39)
maxResultsPerSearch | integer | 0 | 0 = unlimited (paginates all pages)
resultsPerPage | integer | 50 | 10, 20, or 50
scrapeDetailPage | boolean | true | Visit each detail page for full data
scrapeComplaints | boolean | true | Include complaint history
proxyConfiguration | object | Apify Residential | Proxy settings
maxConcurrency | integer | 3 | Parallel browser tabs (1–10)
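
If you assemble inputs programmatically, a small pre-flight check can catch values outside the ranges in the table before a run starts. The following is a sketch only: `normalize_input` is a hypothetical helper, the defaults and allowed values are copied from the table, and both the decision to raise on invalid values and the requirement of at least one search term are assumptions (the actor may handle these cases differently). `proxyConfiguration` is passed through untouched.

```python
import copy
from typing import Any

# Defaults and allowed values copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "licenseNumbers": [], "companyNames": [], "qualifyingPartyNames": [], "cities": [],
    "licenseType": "ALL", "licenseStatus": "ALL", "licenseClassification": "",
    "maxResultsPerSearch": 0, "resultsPerPage": 50,
    "scrapeDetailPage": True, "scrapeComplaints": True, "maxConcurrency": 3,
}
PAGE_SIZES = (10, 20, 50)
LICENSE_TYPES = ("ALL", "RESIDENTIAL", "COMMERCIAL", "DUAL")
LICENSE_STATUSES = ("ALL", "ACTIVE", "SUSPENDED", "EXPIRED", "REVOKED", "CANCELLED")
TERM_KEYS = ("licenseNumbers", "companyNames", "qualifyingPartyNames", "cities")


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values."""
    cfg = {**copy.deepcopy(DEFAULTS), **(raw or {})}
    if cfg["resultsPerPage"] not in PAGE_SIZES:
        raise ValueError("resultsPerPage must be 10, 20, or 50")
    if not 1 <= cfg["maxConcurrency"] <= 10:
        raise ValueError("maxConcurrency must be between 1 and 10")
    if cfg["maxResultsPerSearch"] < 0:
        raise ValueError("maxResultsPerSearch must be 0 (unlimited) or positive")
    if cfg["licenseType"] not in LICENSE_TYPES:
        raise ValueError(f"unknown licenseType: {cfg['licenseType']!r}")
    if cfg["licenseStatus"] not in LICENSE_STATUSES:
        raise ValueError(f"unknown licenseStatus: {cfg['licenseStatus']!r}")
    # Assumption: a run without any search term has nothing to do.
    if not any(cfg[key] for key in TERM_KEYS):
        raise ValueError("provide at least one search term")
    return cfg
```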

Example: Full city scrape

{
  "cities": ["Phoenix"],
  "licenseType": "COMMERCIAL",
  "licenseStatus": "ACTIVE",
  "licenseClassification": "B-1",
  "maxResultsPerSearch": 0,
  "resultsPerPage": 50,
  "scrapeDetailPage": true,
  "scrapeComplaints": true,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Example: Batch license lookup

{
  "licenseNumbers": ["333282", "333239", "333681"],
  "scrapeDetailPage": true,
  "scrapeComplaints": false,
  "maxConcurrency": 3
}

Output Schema

Records are saved to the Apify Dataset as flat JSON objects. Arrays (personnel, classifications, complaints) are stored as JSON arrays.

Sample record (list-only, scrapeDetailPage: false)

{
  "licenseId": "a0o8y0000004CQAAA2",
  "licenseNumber": "333282",
  "businessName": "Grey Wolf Drywall LLC",
  "dbaName": null,
  "qualifyingParty": "Alfonso Lopez",
  "personnel": [
    { "name": "Alfonso Lopez", "role": "Member" },
    { "name": "Alfonso Lopez", "role": "Qualifying Party" }
  ],
  "primaryClassification": "CR-10",
  "classificationDesc": "Drywall",
  "licenseStatus": "Active",
  "city": "Waddell",
  "state": "AZ",
  "zip": "85355",
  "phone": "(602) 317-1895",
  "profileUrl": "https://azroc.my.site.com/AZRoc/s/contractor-search?licenseId=a0o8y0000004CQAAA2"
}

Sample record (with detail page, scrapeDetailPage: true)

Adds: entityType, issuedDate, renewedThroughDate, bondType, bondStatus, bondAmount, bondCompany, bondNumber, bondEffectiveDate, bondExpirationDate, openCases, disciplinedCases, resolvedCases, complaintCount, complaints, plus classifications (all licences for the business).


Proxy Recommendations

AZ ROC is hosted on Salesforce with standard bot protection. For production runs:

  • Residential proxies (Apify Residential) are strongly recommended.
  • Datacenter proxies may work for small runs but can trigger challenges.
  • For development and testing, running without a proxy may work but is not reliable.

Cost & Performance

Scenario | Approx. records/hr | Notes
License number lookups | 300-500 | No pagination; direct detail page
Company name search | 200-400 | Small result sets
City search (small) | 150-300 | 1-3 pages typically
City search (Phoenix ACTIVE) | 80-150 | Many pages + detail pages

Cost tips:

  • scrapeDetailPage: false → list-only, ~3× faster.
  • scrapeComplaints: false → skips complaint parsing.
  • resultsPerPage: 50 → fewest Next clicks.
  • Use narrow filters (classification, city, status) to stay within result caps.
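
For planning purposes, the throughput ranges above can be turned into a rough runtime and price estimate. The helper below is a back-of-the-envelope sketch: `estimate_run` is a hypothetical name, the default of $6.00 per 1,000 results is the starting price listed under Pricing, and the figure covers only that per-result price, not any other charges a run may incur.

```python
def estimate_run(records: int, records_per_hour: tuple[int, int],
                 usd_per_1000: float = 6.00) -> dict[str, float]:
    """Estimate runtime bounds (hours) and per-result cost (USD)."""
    slowest, fastest = records_per_hour
    return {
        "hours_min": records / fastest,
        "hours_max": records / slowest,
        "cost_usd": round(records / 1000 * usd_per_1000, 2),
    }


# 1,200 records at the "City search (Phoenix ACTIVE)" rate of 80-150 records/hr
print(estimate_run(1200, (80, 150)))
# {'hours_min': 8.0, 'hours_max': 15.0, 'cost_usd': 7.2}
```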

Limitations

  • Complaint history: The AZ ROC portal displays only complaints from the prior two years.
  • No official API: All data is parsed from rendered HTML and may break if Salesforce updates the DOM.
  • Pagination caps: Very broad queries (e.g. all Phoenix contractors, no filters) may match more than the roughly 1,000-1,500 records that can be collected per search; use filters to narrow them.
  • Bot protection: Salesforce may rate-limit requests; residential proxies and delays are used to mitigate this.

Technical Architecture

Actor Input
    │
    ▼
buildJobs() ── one job per search term
    │
    ▼
PlaywrightCrawler (Crawlee + fingerprint generator)
├─► SEARCH requests
│     │
│     ├─ goto SEARCH_URL
│     ├─ fill input + optional Advanced Search filters
│     ├─ click Search → wait for first License No row
│     ├─ setPageSize(50): dispatch native 'change' event on <select>
│     │
│     └─► PAGINATION LOOP
│           │
│           ├─ getTableHtml()
│           ├─ parseResultsFromHtml() ← handles rowspan + separator rows
│           ├─ getNextButton() ← via .right-btn class + disabled check
│           │     └─ if null, break (last page)
│           ├─ snapshot first row text
│           ├─ click Next, waitForPageChange()
│           └─ continue
└─► DETAIL requests (one per unique licenseId)
      ├─ goto profileUrl
      ├─ parseDetailPage()
      │     ├─ CONTRACTOR section → businessName, phone, status
      │     ├─ LICENSE section → type, classification, dates
      │     ├─ PERSONNEL section → qualifying party
      │     ├─ COMPLAINT section → case counts + complaint IDs
      │     └─ BOND section → bond status, amount, company
      └─ Actor.pushData()
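
The "one per unique" rule for DETAIL requests can be illustrated with a short de-duplication sketch. It is written in Python for illustration and is not the actor's code: `detail_requests` is a hypothetical helper, the base URL is taken from the `profileUrl` in the sample record, and the shape of the returned dictionaries is an assumption.

```python
from urllib.parse import urlencode

# Base URL as it appears in the sample record's profileUrl.
PROFILE_BASE = "https://azroc.my.site.com/AZRoc/s/contractor-search"


def detail_requests(records: list[dict]) -> list[dict]:
    """Return one detail request per unique licenseId, in first-seen order."""
    seen: set[str] = set()
    requests = []
    for record in records:
        license_id = record.get("licenseId")
        if not license_id or license_id in seen:
            continue  # skip rows without an ID and repeats of an ID already queued
        seen.add(license_id)
        url = record.get("profileUrl") or (
            f"{PROFILE_BASE}?{urlencode({'licenseId': license_id})}"
        )
        requests.append({"licenseId": license_id, "url": url})
    return requests
```

De-duplicating before enqueueing keeps overlapping searches (for example, a city search and a company-name search that return the same contractor) from visiting the same detail page twice.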

Changelog

v1.2.0 (current)

  • ✅ Fixed Next/Prev button detection (class-based, not title-based).
  • ✅ Fixed page-size options (10 / 20 / 50).
  • ✅ Fixed rowspan handling — correct association of child licences with their business context.
  • ✅ Separator rows (data-label="line") are skipped.
  • ✅ Native change event dispatched after page-size change.

v1.1.0

  • Full pagination support with staleness-based change detection.
  • resultsPerPage input parameter.
  • Multiple selector fallbacks for Next button.

v1.0.0

  • Initial release.