Arizona ROC Contractor License Scraper

Pricing

from $6.00 / 1,000 results

Developer

Haketa

Maintained by Community

🏗 Arizona ROC Contractor License Scraper

Scrape the Arizona Registrar of Contractors (AZ ROC) public license database with full pagination support. Search by license number, company name, qualifying party, city, license type, or classification. Returns complete license details including bond status, complaint history, and personnel information.


Overview

The AZ ROC public portal is built on Salesforce Experience Cloud with Lightning Web Components (LWC). It returns server-rendered HTML with a paginated results table. This actor:

  1. Navigates to the search page using a headless Chromium browser (Playwright).
  2. Fills in the search term and optional Advanced Search filters.
  3. Automatically paginates through all result pages until the last page is reached or the configured limit is hit.
  4. For each result, optionally visits the individual contractor detail page to collect the full record.
  5. Pushes all records to the Apify Dataset.

What Data Is Returned

Each scraped record contains:

| Field | Description |
| --- | --- |
| licenseNumber | AZ ROC six-digit license number |
| licenseType | RESIDENTIAL, COMMERCIAL, DUAL, or Specialty Dual |
| licenseStatus | ACTIVE, SUSPENDED, EXPIRED, REVOKED, or CANCELLED |
| businessName | Registered business / company name |
| dbaName | Doing Business As name (if applicable) |
| qualifyingParty | Individual responsible for the trade qualifications |
| personnel | Array of all personnel with name and position |
| entityType | Legal entity type (e.g. Corporation, LLC) |
| primaryClassification | Primary classification code (e.g. B-1, R-11, C-37) |
| classificationDesc | Description of the primary classification |
| classifications | All classification codes and descriptions |
| city / state / zip | Business location |
| phone | Business phone number |
| issuedDate | License originally issued date (YYYY-MM-DD) |
| renewedThroughDate | License valid through date (YYYY-MM-DD) |
| bondType | Type of surety bond |
| bondStatus | ACTIVE or INACTIVE |
| bondAmount | Surety bond dollar amount |
| bondCompany | Surety bond company name |
| bondNumber | Bond policy / certificate number |
| bondEffectiveDate | Bond start date (YYYY-MM-DD) |
| bondExpirationDate | Bond expiration date (YYYY-MM-DD) |
| openCases | Number of currently open complaint cases |
| disciplinedCases | Number of disciplined complaint cases |
| resolvedCases | Number of resolved / settled cases |
| complaintCount | Total complaints (portal shows last 2 years) |
| complaints | Array of individual complaint records |
| profileUrl | Direct URL to the contractor's ROC detail page |
| scrapedAt | ISO 8601 timestamp of when the record was scraped |

Search Modes

You can combine multiple search modes in a single run. Each mode enqueues an independent search job:

1. License Number Lookup (fastest)

Direct lookup by AZ ROC license number. Leading zeros are added automatically (e.g. 12345 → 012345).

{
  "licenseNumbers": ["123456", "789012"]
}
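
The zero-padding described above can be sketched as follows. This is an illustrative Python helper, not the actor's own code: the name `normalize_license_number` and the decision to strip non-digit characters before padding are assumptions.

```python
import re

LICENSE_DIGITS = 6  # this README describes AZ ROC license numbers as six digits


def normalize_license_number(raw: str) -> str:
    """Strip non-digits and left-pad with zeros to six digits (hypothetical helper)."""
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no digits in license number: {raw!r}")
    if len(digits) > LICENSE_DIGITS:
        raise ValueError(f"license number has more than {LICENSE_DIGITS} digits: {raw!r}")
    return digits.zfill(LICENSE_DIGITS)


print(normalize_license_number("12345"))  # 012345
```

Normalizing numbers before submitting them also makes it easier to match input values against the `licenseNumber` field in the output.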

2. Company Name Search

Search by registered business name. Partial name matching is supported.

{
  "companyNames": ["Acme Plumbing", "Desert HVAC"]
}

3. Qualifying Party Search

Search by the name of the individual responsible for the license.

{
  "qualifyingPartyNames": ["John Smith", "Maria Garcia"]
}

4. City Search

Returns all contractors registered in a given Arizona city. It works best combined with additional filters, because for large cities (Phoenix, Tucson) pagination can collect thousands of records.

{
  "cities": ["Scottsdale", "Mesa"],
  "licenseStatus": "ACTIVE",
  "licenseType": "RESIDENTIAL"
}
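
To illustrate how combined search modes could turn into independent jobs, here is a minimal Python sketch. The input field names come from this README; the `SearchJob` type, the mode labels, and the `build_jobs` function are hypothetical and are not taken from the actor's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchJob:
    mode: str  # hypothetical labels: "license", "company", "qualifyingParty", "city"
    term: str


# Input field -> job mode. The field names are the documented input parameters.
MODE_FIELDS = {
    "licenseNumbers": "license",
    "companyNames": "company",
    "qualifyingPartyNames": "qualifyingParty",
    "cities": "city",
}


def build_jobs(actor_input: dict) -> list[SearchJob]:
    """Create one job per non-empty search term across all four modes."""
    jobs = []
    for field, mode in MODE_FIELDS.items():
        for term in actor_input.get(field) or []:
            term = term.strip()
            if term:
                jobs.append(SearchJob(mode, term))
    return jobs


jobs = build_jobs({"cities": ["Scottsdale", "Mesa"], "companyNames": ["Acme Plumbing"]})
print(len(jobs))  # 3
```

Filters such as `licenseStatus` are not search terms, so in this sketch they create no jobs of their own; they would apply to each job.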

Full Pagination Support

Version 1.1 adds complete, automatic pagination through all results pages.

How it works

The AZ ROC portal displays results in a table with 10, 25, or 50 rows per page and Next Page / Previous Page buttons. Previous versions of the scraper read only the first page of results, which could miss dozens or hundreds of matching records.

The updated performSearch() function now:

  1. Sets the items-per-page selector to 50 (configurable via resultsPerPage) immediately after the first results load.
  2. Collects all rows from the current page.
  3. Detects whether the Next Page button is present and enabled using multiple CSS selector fallbacks:
    • button[title="Next Page"]
    • button[aria-label="Next Page"]
    • button.nextPage
    • a[title="Next Page"]
    • li.next button
    • button:has-text("Next")
  4. Before clicking Next, captures a staleness marker (the text content of the first row).
  5. Clicks the Next button and polls the DOM until the first row changes, indicating the new page has rendered.
  6. Repeats until either no Next button is found (last page) or maxResultsPerSearch is reached.

This approach is resilient to network delays and Salesforce's asynchronous re-rendering.
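
The loop above can be sketched in Python without a browser. This is an illustration under assumptions, not the actor's code: the `ResultsPage` interface is hypothetical (it stands in for the Playwright page and the Next-button selectors), and the function names and timeout values are made up for the example.

```python
import time
from typing import Optional, Protocol


class ResultsPage(Protocol):
    """Minimal page interface assumed by this sketch (not a Playwright API)."""

    def rows(self) -> list[str]: ...
    def has_next(self) -> bool: ...
    def click_next(self) -> None: ...


def wait_for_page_change(page: ResultsPage, marker: Optional[str],
                         timeout_s: float = 10.0, poll_s: float = 0.25) -> bool:
    """Poll until the first row differs from the staleness marker."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rows = page.rows()
        if rows and rows[0] != marker:
            return True
        time.sleep(poll_s)
    return False


def collect_all_rows(page: ResultsPage, max_results: int = 0,
                     timeout_s: float = 10.0, poll_s: float = 0.25) -> list[str]:
    """Collect rows page by page; max_results=0 means no cap."""
    collected: list[str] = []
    while True:
        rows = page.rows()
        collected.extend(rows)
        if max_results > 0 and len(collected) >= max_results:
            return collected[:max_results]
        if not page.has_next():
            return collected  # last page reached
        marker = rows[0] if rows else None  # staleness marker: first row's text
        page.click_next()
        if not wait_for_page_change(page, marker, timeout_s, poll_s):
            return collected  # page never re-rendered; stop instead of looping forever
```

Comparing the first row's text, rather than waiting on a network event, means the loop only advances once the table content has actually been replaced.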

Stopping early

Set maxResultsPerSearch to a positive integer to cap the number of records collected per search query. Set it to 0 (the default) for no limit; the scraper then traverses every page.


Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| licenseNumbers | string[] | [] | AZ ROC license numbers to look up |
| companyNames | string[] | [] | Business names to search |
| qualifyingPartyNames | string[] | [] | Qualifying party names to search |
| cities | string[] | [] | Arizona city names to search |
| licenseType | string | ALL | Filter: ALL, RESIDENTIAL, COMMERCIAL, DUAL |
| licenseStatus | string | ALL | Filter: ALL, ACTIVE, SUSPENDED, EXPIRED, REVOKED, CANCELLED |
| licenseClassification | string | "" | Filter by classification code (e.g. B-1, C-37) |
| maxResultsPerSearch | integer | 0 | Max records per search job (0 = unlimited, paginates all pages) |
| resultsPerPage | integer | 50 | Items per page: 10, 25, or 50 |
| scrapeDetailPage | boolean | true | Visit each contractor's detail page for full data |
| scrapeComplaints | boolean | true | Include complaint history (requires scrapeDetailPage) |
| proxyConfiguration | object | Apify Residential | Proxy settings |
| maxConcurrency | integer | 3 | Maximum parallel browser tabs (1–10) |
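
If you generate inputs programmatically, a client-side check against the table above can catch mistakes before a run starts. The following Python sketch is an assumption-laden illustration: `validate_input` is a hypothetical helper, the rule that at least one search term must be present is assumed, and silently switching off `scrapeComplaints` when `scrapeDetailPage` is false is one possible way to honor the documented dependency. `proxyConfiguration` is passed through unchecked.

```python
import copy

ALLOWED_PAGE_SIZES = {10, 25, 50}
ALLOWED_TYPES = {"ALL", "RESIDENTIAL", "COMMERCIAL", "DUAL"}
ALLOWED_STATUSES = {"ALL", "ACTIVE", "SUSPENDED", "EXPIRED", "REVOKED", "CANCELLED"}
SEARCH_FIELDS = ("licenseNumbers", "companyNames", "qualifyingPartyNames", "cities")

# Defaults as listed in the Input Parameters table.
DEFAULTS = {
    "licenseNumbers": [], "companyNames": [], "qualifyingPartyNames": [], "cities": [],
    "licenseType": "ALL", "licenseStatus": "ALL", "licenseClassification": "",
    "maxResultsPerSearch": 0, "resultsPerPage": 50,
    "scrapeDetailPage": True, "scrapeComplaints": True, "maxConcurrency": 3,
}


def validate_input(raw: dict) -> dict:
    """Merge documented defaults into raw input and reject out-of-range values."""
    cfg = {**copy.deepcopy(DEFAULTS), **raw}
    errors = []
    if not any(cfg[field] for field in SEARCH_FIELDS):
        errors.append("provide at least one search term")  # assumed rule
    if cfg["licenseType"] not in ALLOWED_TYPES:
        errors.append(f"unknown licenseType: {cfg['licenseType']}")
    if cfg["licenseStatus"] not in ALLOWED_STATUSES:
        errors.append(f"unknown licenseStatus: {cfg['licenseStatus']}")
    if cfg["resultsPerPage"] not in ALLOWED_PAGE_SIZES:
        errors.append("resultsPerPage must be 10, 25, or 50")
    if cfg["maxResultsPerSearch"] < 0:
        errors.append("maxResultsPerSearch must be 0 or a positive integer")
    if not 1 <= cfg["maxConcurrency"] <= 10:
        errors.append("maxConcurrency must be between 1 and 10")
    if errors:
        raise ValueError("; ".join(errors))
    # scrapeComplaints requires scrapeDetailPage; this sketch switches it off silently.
    cfg["scrapeComplaints"] = bool(cfg["scrapeComplaints"] and cfg["scrapeDetailPage"])
    return cfg
```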

Example: Full city scrape with all filters

{
  "cities": ["Phoenix"],
  "licenseType": "COMMERCIAL",
  "licenseStatus": "ACTIVE",
  "licenseClassification": "B-1",
  "maxResultsPerSearch": 0,
  "resultsPerPage": 50,
  "scrapeDetailPage": true,
  "scrapeComplaints": true,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Example: Batch license lookup

{
  "licenseNumbers": ["100001", "200002", "300003"],
  "scrapeDetailPage": true,
  "scrapeComplaints": false,
  "maxConcurrency": 3
}

Output Schema

Records are saved to the Apify Dataset. Each record is a single JSON object whose top-level fields are scalar values, except for personnel, classifications, and complaints, which are arrays of objects.

Sample record

{
  "licenseNumber": "123456",
  "licenseType": "Specialty Dual",
  "licenseStatus": "ACTIVE",
  "businessName": "Desert Star HVAC LLC",
  "dbaName": null,
  "qualifyingParty": "John Michael Smith",
  "personnel": [
    { "name": "John Michael Smith", "position": "Qualifying Party" },
    { "name": "Jane Smith", "position": "Member/Manager" }
  ],
  "entityType": "Limited Liability Company",
  "primaryClassification": "CR-39",
  "classificationDesc": "Air Conditioning and Refrigeration",
  "classifications": [
    { "code": "CR-39", "description": "Air Conditioning and Refrigeration" }
  ],
  "city": "Scottsdale",
  "state": "AZ",
  "zip": "85251",
  "phone": "(480) 555-0100",
  "issuedDate": "2010-03-15",
  "renewedThroughDate": "2026-03-31",
  "bondType": "Contractor License Bond",
  "bondStatus": "ACTIVE",
  "bondAmount": "$ 15,000",
  "bondCompany": "Western Surety Company",
  "bondNumber": "123456789",
  "bondEffectiveDate": "2024-04-01",
  "bondExpirationDate": "2026-03-31",
  "openCases": 0,
  "disciplinedCases": 0,
  "resolvedCases": 1,
  "complaintCount": 1,
  "complaints": [
    {
      "complaintId": "2023-00001",
      "type": "Workmanship",
      "outcome": "Resolved"
    }
  ],
  "profileUrl": "https://azroc.my.site.com/AZRoc/s/contractor-search?licenseId=a0o8y0000007D05AAE",
  "scrapedAt": "2024-11-15T14:32:01.000Z"
}
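
Because `bondAmount` arrives as a display string and dates as YYYY-MM-DD text, downstream code usually needs a small normalization step. The Python sketch below shows one way to do that for the sample record above; both helper names are hypothetical, and dropping anything after a decimal point is an assumption about how cents should be handled.

```python
from datetime import date
from typing import Optional


def parse_bond_amount(text: Optional[str]) -> Optional[int]:
    """Convert a string such as "$ 15,000" to whole dollars; None if absent."""
    if not text:
        return None
    whole = text.split(".")[0]  # assumption: ignore cents if present
    digits = "".join(ch for ch in whole if ch.isdigit())
    return int(digits) if digits else None


def days_until_bond_expires(record: dict, today: date) -> Optional[int]:
    """Days from `today` to bondExpirationDate; negative if already expired."""
    raw = record.get("bondExpirationDate")
    if not raw:
        return None
    return (date.fromisoformat(raw) - today).days


record = {"bondAmount": "$ 15,000", "bondExpirationDate": "2026-03-31"}
print(parse_bond_amount(record["bondAmount"]))            # 15000
print(days_until_bond_expires(record, date(2026, 3, 1)))  # 30
```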

Proxy Recommendations

The AZ ROC portal is hosted on Salesforce and uses Cloudflare / Akamai bot protection. To avoid detection and rate limiting:

  • Residential proxies (Apify Residential) are strongly recommended for production runs.
  • Datacenter proxies may trigger CAPTCHA challenges or return empty results.
  • For development and testing, running without a proxy may work, but it is not reliable for large runs.

Configure via the proxyConfiguration input parameter.


Cost & Performance

| Scenario | Approx. records/hr | Notes |
| --- | --- | --- |
| License number lookups | 300–500 | No pagination, direct detail page |
| Company name search | 200–400 | Small result sets, no pagination needed |
| City search (small city) | 150–300 | 1–3 pages typically |
| City search (Phoenix ACTIVE) | 80–150 | Many pages + detail pages |

Performance depends on proxy speed, portal response times, and maxConcurrency.
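
As a rough planning aid, the throughput ranges above can be turned into a run-time and cost estimate. The Python sketch below is a back-of-the-envelope calculation only: `estimate_run` is a hypothetical helper, and the default price of $6.00 per 1,000 results is the listed "from" price, so the actual charge may differ.

```python
def estimate_run(records: int, rate_low: float, rate_high: float,
                 price_per_1000: float = 6.00) -> tuple[float, float, float]:
    """Return (best-case hours, worst-case hours, result cost in USD)."""
    if records < 0 or rate_low <= 0 or rate_high < rate_low:
        raise ValueError("expected records >= 0 and 0 < rate_low <= rate_high")
    hours_best = records / rate_high
    hours_worst = records / rate_low
    cost = records / 1000 * price_per_1000
    return hours_best, hours_worst, cost


# 3,000 records at the "City search (Phoenix ACTIVE)" rate of 80-150 records/hr
best, worst, cost = estimate_run(3000, rate_low=80, rate_high=150)
print(best, worst, cost)  # 20.0 37.5 18.0
```

Under these assumptions, a 3,000-record Phoenix search would take roughly 20 to 37.5 hours and cost about $18 in result fees, which is why the tips below focus on skipping detail pages and capping large searches.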

Cost tips

  • Set scrapeDetailPage: false to collect list-only data — much faster and cheaper.
  • Set scrapeComplaints: false to skip complaint parsing if not needed.
  • Set resultsPerPage: 50 (default) to minimize page loads.
  • Use maxResultsPerSearch to cap large open-ended searches.

Limitations

  • Complaint history: The AZ ROC portal only displays complaints from the prior two years. Older complaints are not accessible via the public portal.
  • No official API: The portal does not expose a public API. All data is parsed from server-rendered HTML and may break if Salesforce updates the DOM structure.
  • Pagination cap: The portal may limit total results per search to 500–1000 records depending on the query. Use narrow filters (classification, city, status) to stay within limits.
  • Bot protection: Salesforce bot protection may challenge scrapers. Residential proxies and human-like delays are used to mitigate this.
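
One way to stay under the pagination cap is to split a broad search into several narrower inputs, for example one per classification code. The Python sketch below illustrates the idea; `split_by_classification` is a hypothetical helper, and the list of codes is something you supply yourself (the two codes shown are the examples used elsewhere in this README).

```python
from copy import deepcopy


def split_by_classification(base_input: dict, classifications: list[str]) -> list[dict]:
    """Return one actor input per classification code, leaving base_input untouched."""
    inputs = []
    for code in classifications:
        narrowed = deepcopy(base_input)
        narrowed["licenseClassification"] = code
        inputs.append(narrowed)
    return inputs


base = {"cities": ["Phoenix"], "licenseStatus": "ACTIVE"}
for run_input in split_by_classification(base, ["B-1", "C-37"]):
    print(run_input["licenseClassification"], run_input["cities"])
```

Each generated input can then be submitted as its own run, so that every search has a better chance of staying within the portal's per-query limit.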

Technical Architecture

Actor Input
    │
buildJobs()
    │  Creates one job per search term / license number / city / qualifying party
    │
PlaywrightCrawler (Crawlee)
    │
    ├─► SEARCH requests (one per job)
    │     │
    │     ├─ Navigate to SEARCH_URL
    │     ├─ Fill search input
    │     ├─ Apply Advanced Search filters (if any)
    │     ├─ Set items-per-page to 50
    │     │
    │     └─► performSearch(): PAGINATION LOOP
    │           │
    │           ├─ parseResultsPage()  ←── Page 1
    │           ├─ clickNext() + waitForPageChange()
    │           ├─ parseResultsPage()  ←── Page 2
    │           ├─ clickNext() + waitForPageChange()
    │           ├─ ...
    │           └─ Return all collected rows
    │
    └─► DETAIL requests (one per unique licenseId)
          │
          ├─ Navigate to profileUrl
          ├─ parseDetailPage()
          │     ├─ CONTRACTOR section → businessName, phone, status
          │     ├─ LICENSE section    → type, classification, dates
          │     ├─ PERSONNEL section  → qualifying party, all personnel
          │     ├─ COMPLAINT section  → case counts + complaint IDs
          │     └─ BOND section       → bond status, amount, company
          └─ Actor.pushData(record)

Key design decisions

  • Staleness-based pagination: Instead of relying on unreliable network events, the scraper captures the first row's text content before clicking Next and polls until it changes. This works reliably across Salesforce's asynchronous LWC re-rendering.
  • Multiple Next-button selectors: The Salesforce portal's CSS classes vary between versions. Multiple fallback selectors ensure pagination detection remains robust.
  • Regex-based HTML parsing: Section content is extracted with targeted regex patterns rather than a full DOM parser, which is faster and avoids issues with Salesforce's deeply nested shadow DOM.
  • Deduplication via seen Set: Records are deduplicated by licenseId or licenseNumber across all search jobs to prevent duplicate dataset entries when multiple search terms match the same contractor.
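
The deduplication idea in the last bullet can be sketched in a few lines of Python. This is an illustration rather than the actor's code: `dedupe_records` is a hypothetical name, and using `licenseId` with a fallback to `licenseNumber` as the key is an assumption about how the two identifiers are combined.

```python
def dedupe_records(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each licenseId (or licenseNumber as fallback)."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = record.get("licenseId") or record.get("licenseNumber")
        if key is None:
            unique.append(record)  # nothing to dedupe on; keep the record
            continue
        if key in seen:
            continue  # already collected by an earlier search job
        seen.add(key)
        unique.append(record)
    return unique


rows = [
    {"licenseNumber": "123456", "businessName": "Desert Star HVAC LLC"},
    {"licenseNumber": "123456", "businessName": "Desert Star HVAC LLC"},
    {"licenseNumber": "789012", "businessName": "Acme Plumbing"},
]
print(len(dedupe_records(rows)))  # 2
```

Sharing one `seen` set across all search jobs is what prevents a contractor matched by both a city search and a company-name search from being stored twice.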

Changelog

v1.1.0

  • ✅ Full pagination support: performSearch() now loops through all result pages automatically.
  • ✅ Added resultsPerPage input parameter (10 / 25 / 50).
  • ✅ maxResultsPerSearch default changed to 0 (unlimited).
  • ✅ Staleness-based page-change detection for reliable Salesforce LWC pagination.
  • ✅ Multiple CSS selector fallbacks for Next button detection.
  • ✅ Bond expiration date now extracted from detail page.
  • ✅ All classification codes extracted into classifications array.

v1.0.0

  • Initial release with single-page search, detail page scraping, and bond/complaint parsing.