Arizona ROC Contractor License Scraper
Pricing
from $6.00 / 1,000 results
Scrape Arizona Registrar of Contractors (AZ ROC) license records. Search by license number, company name, qualifying party, city, license type and classification. Returns status, bond info, complaint history and full license details.
Developer
Haketa
🏗 Arizona ROC Contractor License Scraper
Scrape the Arizona Registrar of Contractors (AZ ROC) public license database with full pagination support. Search by license number, company name, qualifying party, city, license type, or classification. Returns complete license details including bond status, complaint history, and personnel information.
Table of Contents
- Overview
- What Data Is Returned
- Search Modes
- Full Pagination Support
- Input Parameters
- Output Schema
- Usage Examples
- Proxy Recommendations
- Cost & Performance
- Limitations
- Technical Architecture
- Changelog
Overview
The AZ ROC public portal is built on Salesforce Experience Cloud with Lightning Web Components (LWC). It returns server-rendered HTML with a paginated results table. This actor:
- Navigates to the search page using a headless Chromium browser (Playwright).
- Fills in the search term and optional Advanced Search filters.
- Automatically paginates through all result pages until the last page is reached or the configured limit is hit.
- For each result, optionally visits the individual contractor detail page to collect the full record.
- Pushes all records to the Apify Dataset.
What Data Is Returned
Each scraped record contains:
| Field | Description |
|---|---|
| `licenseNumber` | AZ ROC six-digit license number |
| `licenseType` | RESIDENTIAL, COMMERCIAL, DUAL, or Specialty Dual |
| `licenseStatus` | ACTIVE, SUSPENDED, EXPIRED, REVOKED, or CANCELLED |
| `businessName` | Registered business / company name |
| `dbaName` | Doing Business As name (if applicable) |
| `qualifyingParty` | Individual responsible for the trade qualifications |
| `personnel` | Array of all personnel with name and position |
| `entityType` | Legal entity type (e.g. Corporation, LLC) |
| `primaryClassification` | Primary classification code (e.g. B-1, R-11, C-37) |
| `classificationDesc` | Description of the primary classification |
| `classifications` | All classification codes and descriptions |
| `city` / `state` / `zip` | Business location |
| `phone` | Business phone number |
| `issuedDate` | License originally issued date (YYYY-MM-DD) |
| `renewedThroughDate` | License valid through date (YYYY-MM-DD) |
| `bondType` | Type of surety bond |
| `bondStatus` | ACTIVE or INACTIVE |
| `bondAmount` | Surety bond dollar amount |
| `bondCompany` | Surety bond company name |
| `bondNumber` | Bond policy / certificate number |
| `bondEffectiveDate` | Bond start date (YYYY-MM-DD) |
| `bondExpirationDate` | Bond expiration date (YYYY-MM-DD) |
| `openCases` | Number of currently open complaint cases |
| `disciplinedCases` | Number of disciplined complaint cases |
| `resolvedCases` | Number of resolved / settled cases |
| `complaintCount` | Total complaints (portal shows last 2 years) |
| `complaints` | Array of individual complaint records |
| `profileUrl` | Direct URL to the contractor's ROC detail page |
| `scrapedAt` | ISO 8601 timestamp of when the record was scraped |
Search Modes
You can combine multiple search modes in a single run. Each mode enqueues an independent search job:
1. License Number Lookup (fastest)
Direct lookup by AZ ROC license number. Leading zeros are added automatically (e.g. 12345 → 012345).
```json
{
  "licenseNumbers": ["123456", "789012"]
}
```
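The padding rule can be pictured with a small Python sketch. This is an illustration only, not the actor's code: the function name `normalize_license_number` is hypothetical, and stripping non-digit characters before padding is an assumption that the text above does not confirm.

```python
def normalize_license_number(raw: str, width: int = 6) -> str:
    """Left-pad a license number with zeros to the six-digit AZ ROC format.

    Assumption: non-digit characters (spaces, a "ROC" prefix) are dropped
    before padding. The actor's real behavior is not documented here.
    """
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits found in license number: {raw!r}")
    if len(digits) > width:
        raise ValueError(f"license number longer than {width} digits: {raw!r}")
    return digits.zfill(width)


print(normalize_license_number("12345"))   # 012345
print(normalize_license_number("123456"))  # 123456
```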
2. Company / Business Name Search
Partial name matching is supported.
```json
{
  "companyNames": ["Acme Plumbing", "Desert HVAC"]
}
```
3. Qualifying Party Search
Search by the name of the individual responsible for the license.
```json
{
  "qualifyingPartyNames": ["John Smith", "Maria Garcia"]
}
```
4. City Search
Return all contractors registered in a given Arizona city. Works best with additional filters. For large cities (Phoenix, Tucson), pagination will collect thousands of records.
```json
{
  "cities": ["Scottsdale", "Mesa"],
  "licenseStatus": "ACTIVE",
  "licenseType": "RESIDENTIAL"
}
```
Full Pagination Support
Version 1.1 adds complete, automatic pagination through all result pages.
How it works
The AZ ROC portal displays results in a table with 10, 25, or 50 rows per page and Next Page / Previous Page buttons. Earlier versions of the scraper read only the first page of results, which could miss dozens or hundreds of matching records.
The updated performSearch() function now:
- Sets the items-per-page selector to `50` (configurable via `resultsPerPage`) immediately after the first results load.
- Collects all rows from the current page.
- Detects whether the Next Page button is present and enabled using multiple CSS selector fallbacks:
  - `button[title="Next Page"]`
  - `button[aria-label="Next Page"]`
  - `button.nextPage`
  - `a[title="Next Page"]`
  - `li.next button`
  - `button:has-text("Next")`
- Before clicking Next, captures a staleness marker (the text content of the first row).
- Clicks the Next button and polls the DOM until the first row changes, indicating the new page has rendered.
- Repeats until either no Next button is found (last page) or `maxResultsPerSearch` is reached.
This approach is resilient to network delays and Salesforce's asynchronous re-rendering.
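The loop described above can be modelled without a browser. The Python sketch below is an illustration under stated assumptions, not the actor's `performSearch()`: the page interface (`rows`, `has_next`, `click_next`), the `FakePage` stand-in, and the timeout values are all hypothetical. It shows only the staleness-marker idea: remember the first row, click Next, then poll until the first row differs.

```python
import time
from typing import List


def wait_for_page_change(page, marker: str, timeout_s: float = 10.0,
                         poll_s: float = 0.05) -> bool:
    """Poll until the first row differs from the staleness marker."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rows = page.rows()
        if rows and rows[0] != marker:
            return True
        time.sleep(poll_s)
    return False


def collect_all_rows(page, max_results: int = 0,
                     timeout_s: float = 10.0) -> List[str]:
    """Walk every results page; max_results=0 means no cap."""
    collected: List[str] = []
    while True:
        current = page.rows()
        collected.extend(current)
        if max_results > 0 and len(collected) >= max_results:
            return collected[:max_results]
        if not current or not page.has_next():
            return collected  # last page reached
        marker = current[0]  # staleness marker: first row before the click
        page.click_next()
        if not wait_for_page_change(page, marker, timeout_s):
            return collected  # page never changed; stop instead of looping


class FakePage:
    """Hypothetical in-memory page that re-renders only after a few polls."""

    def __init__(self, pages: List[List[str]], lag_polls: int = 2):
        self._pages, self._index = pages, 0
        self._pending, self._lag = 0, lag_polls

    def rows(self) -> List[str]:
        if self._pending > 0:
            self._pending -= 1
            if self._pending == 0:
                self._index += 1  # the "new page" finally renders
        return list(self._pages[self._index])

    def has_next(self) -> bool:
        return self._index < len(self._pages) - 1

    def click_next(self) -> None:
        self._pending = self._lag


pages = [["a", "b"], ["c", "d"], ["e"]]
print(collect_all_rows(FakePage(pages), timeout_s=1.0))
print(collect_all_rows(FakePage(pages), max_results=3, timeout_s=1.0))
```

One caveat of this design: if two consecutive pages happen to start with identical first-row text, the marker never changes and the loop stops at the timeout.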
Stopping early
Set maxResultsPerSearch to a positive integer to cap the number of records collected per search query. Set to 0 (default) for unlimited — the scraper will traverse every page.
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `licenseNumbers` | string[] | `[]` | AZ ROC license numbers to look up |
| `companyNames` | string[] | `[]` | Business names to search |
| `qualifyingPartyNames` | string[] | `[]` | Qualifying party names to search |
| `cities` | string[] | `[]` | Arizona city names to search |
| `licenseType` | string | `ALL` | Filter: ALL, RESIDENTIAL, COMMERCIAL, DUAL |
| `licenseStatus` | string | `ALL` | Filter: ALL, ACTIVE, SUSPENDED, EXPIRED, REVOKED, CANCELLED |
| `licenseClassification` | string | `""` | Filter by classification code (e.g. B-1, C-37) |
| `maxResultsPerSearch` | integer | `0` | Max records per search job (0 = unlimited, paginates all pages) |
| `resultsPerPage` | integer | `50` | Items per page: 10, 25, or 50 |
| `scrapeDetailPage` | boolean | `true` | Visit each contractor's detail page for full data |
| `scrapeComplaints` | boolean | `true` | Include complaint history (requires `scrapeDetailPage`) |
| `proxyConfiguration` | object | Apify Residential | Proxy settings |
| `maxConcurrency` | integer | `3` | Maximum parallel browser tabs (1–10) |
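To check an input object against the table before starting a run, something like the following Python sketch could be used. It is not the actor's own validation: `normalize_input` is a hypothetical helper, and the choices to clamp `maxConcurrency`, reject unknown enum values, and switch off `scrapeComplaints` when `scrapeDetailPage` is false are assumptions drawn from the descriptions above.

```python
import copy
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "licenseNumbers": [], "companyNames": [],
    "qualifyingPartyNames": [], "cities": [],
    "licenseType": "ALL", "licenseStatus": "ALL",
    "licenseClassification": "",
    "maxResultsPerSearch": 0, "resultsPerPage": 50,
    "scrapeDetailPage": True, "scrapeComplaints": True,
    "maxConcurrency": 3,
}
LICENSE_TYPES = ("ALL", "RESIDENTIAL", "COMMERCIAL", "DUAL")
LICENSE_STATUSES = ("ALL", "ACTIVE", "SUSPENDED", "EXPIRED", "REVOKED", "CANCELLED")
SEARCH_KEYS = ("licenseNumbers", "companyNames", "qualifyingPartyNames", "cities")


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and sanity-check the values."""
    cfg = {**copy.deepcopy(DEFAULTS), **raw}
    if cfg["licenseType"] not in LICENSE_TYPES:
        raise ValueError(f"unknown licenseType: {cfg['licenseType']!r}")
    if cfg["licenseStatus"] not in LICENSE_STATUSES:
        raise ValueError(f"unknown licenseStatus: {cfg['licenseStatus']!r}")
    if cfg["resultsPerPage"] not in (10, 25, 50):
        raise ValueError("resultsPerPage must be 10, 25, or 50")
    if cfg["maxResultsPerSearch"] < 0:
        raise ValueError("maxResultsPerSearch must be 0 (unlimited) or positive")
    if not any(cfg[key] for key in SEARCH_KEYS):
        raise ValueError("provide at least one search term")
    # Assumption: out-of-range concurrency is clamped to 1..10, not rejected.
    cfg["maxConcurrency"] = max(1, min(10, int(cfg["maxConcurrency"])))
    # Complaint history requires the detail page, per the table above.
    if not cfg["scrapeDetailPage"]:
        cfg["scrapeComplaints"] = False
    return cfg


cfg = normalize_input({"cities": ["Mesa"], "scrapeDetailPage": False,
                       "maxConcurrency": 50})
print(cfg["resultsPerPage"], cfg["scrapeComplaints"], cfg["maxConcurrency"])
```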
Example: Full city scrape with all filters
```json
{
  "cities": ["Phoenix"],
  "licenseType": "COMMERCIAL",
  "licenseStatus": "ACTIVE",
  "licenseClassification": "B-1",
  "maxResultsPerSearch": 0,
  "resultsPerPage": 50,
  "scrapeDetailPage": true,
  "scrapeComplaints": true,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Example: Batch license lookup
```json
{
  "licenseNumbers": ["100001", "200002", "300003"],
  "scrapeDetailPage": true,
  "scrapeComplaints": false,
  "maxConcurrency": 3
}
```
Output Schema
Records are saved to the Apify Dataset. Each record is a single JSON object with scalar fields at the top level; `personnel`, `classifications`, and `complaints` are stored as nested JSON arrays.
Sample record
```json
{
  "licenseNumber": "123456",
  "licenseType": "Specialty Dual",
  "licenseStatus": "ACTIVE",
  "businessName": "Desert Star HVAC LLC",
  "dbaName": null,
  "qualifyingParty": "John Michael Smith",
  "personnel": [
    { "name": "John Michael Smith", "position": "Qualifying Party" },
    { "name": "Jane Smith", "position": "Member/Manager" }
  ],
  "entityType": "Limited Liability Company",
  "primaryClassification": "CR-39",
  "classificationDesc": "Air Conditioning and Refrigeration",
  "classifications": [
    { "code": "CR-39", "description": "Air Conditioning and Refrigeration" }
  ],
  "city": "Scottsdale",
  "state": "AZ",
  "zip": "85251",
  "phone": "(480) 555-0100",
  "issuedDate": "2010-03-15",
  "renewedThroughDate": "2026-03-31",
  "bondType": "Contractor License Bond",
  "bondStatus": "ACTIVE",
  "bondAmount": "$ 15,000",
  "bondCompany": "Western Surety Company",
  "bondNumber": "123456789",
  "bondEffectiveDate": "2024-04-01",
  "bondExpirationDate": "2026-03-31",
  "openCases": 0,
  "disciplinedCases": 0,
  "resolvedCases": 1,
  "complaintCount": 1,
  "complaints": [
    {
      "complaintId": "2023-00001",
      "type": "Workmanship",
      "outcome": "Resolved"
    }
  ],
  "profileUrl": "https://azroc.my.site.com/AZRoc/s/contractor-search?licenseId=a0o8y0000007D05AAE",
  "scrapedAt": "2024-11-15T14:32:01.000Z"
}
```
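In the sample record, `bondAmount` is a formatted string (`"$ 15,000"`) and dates are `YYYY-MM-DD` strings, so downstream code usually converts them before doing arithmetic. The Python sketch below shows one way to do that; the helper names are hypothetical, and it assumes every amount follows the same format as the sample, which this README does not guarantee.

```python
import re
from datetime import date
from typing import Optional


def parse_bond_amount(text: Optional[str]) -> Optional[int]:
    """Convert a string such as "$ 15,000" to whole dollars (15000).

    Assumption: any cents after a decimal point are discarded.
    """
    if not text:
        return None
    digits = re.sub(r"\D", "", text.split(".")[0])
    return int(digits) if digits else None


def days_until(date_str: str, today: date) -> int:
    """Days from `today` to a YYYY-MM-DD date; negative if already past."""
    return (date.fromisoformat(date_str) - today).days


record = {"bondAmount": "$ 15,000", "bondExpirationDate": "2026-03-31"}
print(parse_bond_amount(record["bondAmount"]))
print(days_until(record["bondExpirationDate"], date(2026, 3, 1)))
```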
Proxy Recommendations
The AZ ROC portal is hosted on Salesforce and uses Cloudflare / Akamai bot protection. To avoid detection and rate limiting:
- Residential proxies (Apify Residential) are strongly recommended for production runs.
- Datacenter proxies may trigger CAPTCHA challenges or return empty results.
- For development and testing, running without a proxy may work, but it is not reliable for large runs.
Configure via the proxyConfiguration input parameter.
Cost & Performance
| Scenario | Approx. records/hr | Notes |
|---|---|---|
| License number lookups | 300–500 | No pagination, direct detail page |
| Company name search | 200–400 | Small result sets, no pagination needed |
| City search (small city) | 150–300 | 1–3 pages typically |
| City search (Phoenix ACTIVE) | 80–150 | Many pages + detail pages |
Performance depends on proxy speed, portal response times, and maxConcurrency.
Cost tips
- Set `scrapeDetailPage: false` to collect list-only data, which is much faster and cheaper.
- Set `scrapeComplaints: false` to skip complaint parsing if not needed.
- Set `resultsPerPage: 50` (default) to minimize page loads.
- Use `maxResultsPerSearch` to cap large open-ended searches.
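The effect of `resultsPerPage` on page loads, and a rough budget for a run, can be worked out with simple arithmetic. The Python sketch below is a back-of-the-envelope estimate only: it uses the listed "from $6.00 / 1,000 results" rate as a flat price, which is an assumption, and actual billing may differ.

```python
import math


def page_loads(total_results: int, results_per_page: int = 50) -> int:
    """Number of result pages needed to list `total_results` rows."""
    if results_per_page not in (10, 25, 50):
        raise ValueError("resultsPerPage must be 10, 25, or 50")
    return math.ceil(total_results / results_per_page)


def estimated_cost_usd(total_results: int, usd_per_1000: float = 6.00) -> float:
    """Rough cost at a flat per-1,000-results rate (assumed, not guaranteed)."""
    return total_results / 1000 * usd_per_1000


print(page_loads(1000, 10))      # 100 page loads at 10 rows per page
print(page_loads(1000, 50))      # 20 page loads at 50 rows per page
print(estimated_cost_usd(2500))  # 15.0
```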
Limitations
- Complaint history: The AZ ROC portal only displays complaints from the prior two years. Older complaints are not accessible via the public portal.
- No official API: The portal does not expose a public API. All data is parsed from server-rendered HTML and may break if Salesforce updates the DOM structure.
- Pagination cap: The portal may limit total results per search to 500–1000 records depending on the query. Use narrow filters (classification, city, status) to stay within limits.
- Bot protection: Salesforce bot protection may challenge scrapers. Residential proxies and human-like delays are used to mitigate this.
Technical Architecture
```text
Actor Input
    │
    ▼
buildJobs()
    │   Creates one job per search term / license number / city / qualifying party
    ▼
PlaywrightCrawler (Crawlee)
    │
    ├─► SEARCH requests (one per job)
    │     │
    │     ├─ Navigate to SEARCH_URL
    │     ├─ Fill search input
    │     ├─ Apply Advanced Search filters (if any)
    │     ├─ Set items-per-page to 50
    │     │
    │     └─► performSearch(): PAGINATION LOOP
    │           │
    │           ├─ parseResultsPage()  ←── Page 1
    │           ├─ clickNext() + waitForPageChange()
    │           ├─ parseResultsPage()  ←── Page 2
    │           ├─ clickNext() + waitForPageChange()
    │           ├─ ...
    │           └─ Return all collected rows
    │
    └─► DETAIL requests (one per unique licenseId)
          │
          ├─ Navigate to profileUrl
          ├─ parseDetailPage()
          │     ├─ CONTRACTOR section → businessName, phone, status
          │     ├─ LICENSE section    → type, classification, dates
          │     ├─ PERSONNEL section  → qualifying party, all personnel
          │     ├─ COMPLAINT section  → case counts + complaint IDs
          │     └─ BOND section       → bond status, amount, company
          └─ Actor.pushData(record)
```
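The job-building step at the top of the diagram can be sketched as follows. This Python version illustrates the "one job per search term" idea only; it is not the actor's `buildJobs()`. The mode labels, the job dictionary shape, and the decision to attach the shared filters to every job (including license-number lookups) are assumptions.

```python
from typing import Any, Dict, List

# Input keys from the parameter table, mapped to hypothetical mode labels.
SEARCH_MODES = {
    "licenseNumbers": "LICENSE_NUMBER",
    "companyNames": "COMPANY_NAME",
    "qualifyingPartyNames": "QUALIFYING_PARTY",
    "cities": "CITY",
}
FILTER_KEYS = ("licenseType", "licenseStatus", "licenseClassification")


def build_jobs(actor_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand the input into one independent search job per term."""
    filters = {k: actor_input[k] for k in FILTER_KEYS if actor_input.get(k)}
    jobs: List[Dict[str, Any]] = []
    for key, mode in SEARCH_MODES.items():
        for term in actor_input.get(key, []):
            term = term.strip()
            if term:  # skip blank entries
                jobs.append({"mode": mode, "term": term, "filters": dict(filters)})
    return jobs


jobs = build_jobs({
    "cities": ["Scottsdale", "Mesa"],
    "licenseNumbers": ["123456"],
    "licenseStatus": "ACTIVE",
})
for job in jobs:
    print(job["mode"], job["term"], job["filters"])
```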
Key design decisions
- Staleness-based pagination: Instead of relying on unreliable network events, the scraper captures the first row's text content before clicking Next and polls until it changes. This works reliably across Salesforce's asynchronous LWC re-rendering.
- Multiple Next-button selectors: The Salesforce portal's CSS classes vary between versions. Multiple fallback selectors ensure pagination detection remains robust.
- Regex-based HTML parsing: Section content is extracted with targeted regex patterns rather than a full DOM parser, which is faster and avoids issues with Salesforce's deeply nested shadow DOM.
- Deduplication via `seenSet`: Records are deduplicated by `licenseId` or `licenseNumber` across all search jobs to prevent duplicate dataset entries when multiple search terms match the same contractor.
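The deduplication idea can be illustrated with a short Python sketch. It is a minimal model, not the actor's implementation: `make_deduper` is a hypothetical helper, and treating a record as a duplicate when either its `licenseId` or its `licenseNumber` has been seen before is one possible reading of the rule above.

```python
from typing import Any, Callable, Dict, Set, Tuple


def make_deduper() -> Callable[[Dict[str, Any]], bool]:
    """Return a function that reports whether a record is new."""
    seen: Set[Tuple[str, str]] = set()

    def is_new(record: Dict[str, Any]) -> bool:
        # Tag each key with its field name so an ID never collides with a number.
        keys = {(field, str(record[field]))
                for field in ("licenseId", "licenseNumber")
                if record.get(field)}
        if not keys:
            return True  # nothing to dedupe on; keep the record
        if keys & seen:
            return False  # matched a previously seen ID or number
        seen.update(keys)
        return True

    return is_new


is_new = make_deduper()
print(is_new({"licenseNumber": "123456"}))  # True
print(is_new({"licenseNumber": "123456"}))  # False
print(is_new({"licenseId": "a0o8y0000007D05AAE", "licenseNumber": "123456"}))  # False
```

A record would be pushed to the dataset only when `is_new(record)` returns `True`.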
Changelog
v1.1.0
- ✅ Full pagination support: `performSearch()` now loops through all result pages automatically.
- ✅ Added `resultsPerPage` input parameter (10 / 25 / 50).
- ✅ `maxResultsPerSearch` default changed to `0` (unlimited).
- ✅ Staleness-based page-change detection for reliable Salesforce LWC pagination.
- ✅ Multiple CSS selector fallbacks for Next button detection.
- ✅ Bond expiration date now extracted from detail page.
- ✅ All classification codes extracted into `classifications` array.
v1.0.0
- Initial release with single-page search, detail page scraping, and bond/complaint parsing.