Arizona ROC Contractor License Scraper
Pricing
from $6.00 / 1,000 results
Scrape Arizona Registrar of Contractors (AZ ROC) license records. Search by license number, company name, qualifying party, city, license type and classification. Returns status, bond info, complaint history and full license details.
Developer
Haketa
21 days ago
Last modified
🏗 Arizona ROC Contractor License Scraper
Scrape the Arizona Registrar of Contractors (AZ ROC) public license database with full pagination support. Search by license number, company name, qualifying party, city, license type, or classification. Returns complete license details including bond status, complaint history, and personnel information.
What's new in v1.2
- ✅ Correct pagination: Identifies Next/Prev buttons via their CSS classes (`right-btn` / `left-btn`) and SVG direction instead of brittle `title="Next Page"` attributes (which don't exist in the live DOM).
- ✅ Correct page-size options: The native page-size `<select>` offers `10 / 20 / 50` (not `25` as previously assumed). Default is `50`.
- ✅ Proper rowspan handling: A single business may hold 2 or 3 licences in adjacent rows with `rowspan=N` on the Business Name / Name & Title / Address / Phone cells. The parser now correctly associates every child licence row with its business context.
- ✅ Separator-row skipping: The grey `<td colspan="9" data-label="line">` separator rows between business blocks are detected and ignored.
- ✅ Client-side pagination confirmed: Pagination triggers no network request; clicks on Next simply re-render the DOM from an already-loaded result set. We use a staleness-based wait (first-row text changes) to detect successful page flips.
Overview
The AZ ROC public portal is built on Salesforce Experience Cloud with Lightning Web Components and presents search results as a paginated HTML table. This actor:
- Navigates to the search page using a headless Chromium browser (Playwright).
- Fills in the search term and optional Advanced Search filters.
- Sets page size to 50 using the native `<select>` element.
- Automatically paginates through all result pages by clicking Next until the button is disabled.
- For each result, optionally visits the individual contractor detail page to collect the full record.
- Pushes all records to the Apify Dataset.
Field Reference
| Field | Description |
|---|---|
| `licenseNumber` | AZ ROC six-digit license number |
| `licenseType` | RESIDENTIAL, COMMERCIAL, DUAL, or Specialty Dual |
| `licenseStatus` | ACTIVE, SUSPENDED, EXPIRED, REVOKED, or CANCELLED |
| `businessName` | Registered business / company name |
| `dbaName` | Doing Business As name (if applicable) |
| `qualifyingParty` | Individual responsible for the trade qualifications |
| `personnel` | Array of all personnel with name and role |
| `entityType` | Legal entity type (e.g. Corporation, LLC) |
| `primaryClassification` | Primary classification code (e.g. B-1, R-11, C-37) |
| `classificationDesc` | Description of the primary classification |
| `classifications` | All classification codes and descriptions |
| `city` / `state` / `zip` | Business location |
| `phone` | Business phone number |
| `issuedDate` | License originally issued date (YYYY-MM-DD) |
| `renewedThroughDate` | License valid through date (YYYY-MM-DD) |
| `bondType` | Type of surety bond |
| `bondStatus` | ACTIVE or INACTIVE |
| `bondAmount` | Surety bond dollar amount |
| `bondCompany` | Surety bond company name |
| `bondNumber` | Bond policy / certificate number |
| `bondEffectiveDate` | Bond start date (YYYY-MM-DD) |
| `bondExpirationDate` | Bond expiration date (YYYY-MM-DD) |
| `openCases` | Number of currently open complaint cases |
| `disciplinedCases` | Number of disciplined complaint cases |
| `resolvedCases` | Number of resolved / settled cases |
| `complaintCount` | Total complaints (portal shows last 2 years) |
| `complaints` | Array of individual complaint records |
| `profileUrl` | Direct URL to the contractor's ROC detail page |
| `scrapedAt` | ISO 8601 timestamp of when the record was scraped |
Search Modes
You can combine multiple search modes in a single run. Each mode enqueues an independent search job:
1. License Number Lookup (fastest)
Direct lookup by AZ ROC license number. Leading zeros are added automatically.
```json
{ "licenseNumbers": ["123456", "789012"] }
```
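The zero-padding mentioned above can be pictured with a small sketch. The actor's own normalization code is not shown in this README, so the function name `normalize_license_number` and the rejection of non-numeric input are assumptions; the six-digit width comes from the Field Reference.

```python
def normalize_license_number(raw: str, width: int = 6) -> str:
    """Left-pad a numeric license number with zeros to `width` digits.

    Hypothetical helper: the width of 6 follows the "six-digit license
    number" wording in the Field Reference.
    """
    digits = raw.strip()
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a numeric license number: {raw!r}")
    return digits.zfill(width)
```

With this sketch, `"1234"` becomes `"001234"`, while an already six-digit value such as `"333282"` is returned unchanged.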
2. Company / Business Name Search
Partial name matching is supported.
```json
{ "companyNames": ["Acme Plumbing", "Desert HVAC"] }
```
3. Qualifying Party Search
```json
{ "qualifyingPartyNames": ["John Smith", "Maria Garcia"] }
```
4. City Search
Large cities yield thousands of records across many pages — all are traversed.
```json
{
  "cities": ["Scottsdale", "Mesa"],
  "licenseStatus": "ACTIVE",
  "licenseType": "RESIDENTIAL"
}
```
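Since each search mode enqueues independent jobs, combining modes amounts to flattening the four input lists into one job list. The sketch below illustrates that idea only: the input keys are the ones documented here, but `SearchJob`, the mode labels, and this `build_jobs` function are hypothetical and are not the actor's actual `buildJobs()` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchJob:
    mode: str  # which input list the term came from (hypothetical label)
    term: str  # the search term itself


# Input keys are documented in this README; the mode labels are made up.
MODE_BY_INPUT_KEY = {
    "licenseNumbers": "license_number",
    "companyNames": "company_name",
    "qualifyingPartyNames": "qualifying_party",
    "cities": "city",
}


def build_jobs(actor_input: dict) -> list[SearchJob]:
    """Return one job per non-empty search term, across all modes."""
    jobs = []
    for key, mode in MODE_BY_INPUT_KEY.items():
        for term in actor_input.get(key) or []:
            term = term.strip()
            if term:
                jobs.append(SearchJob(mode, term))
    return jobs
```

Filters such as `licenseStatus` are not search terms, so in this sketch they produce no jobs of their own.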
Full Pagination Support
How it works (v1.2)
AZ ROC displays results in a Salesforce LWC table with 10, 20, or 50 rows per page and Next / Previous buttons. The buttons are identified by their class names and inner SVG direction — not by title or aria-label (which are absent):
```html
<!-- Next button -->
<button type="button" class="slds-button slds-button_neutral right-btn">
  <svg data-key="right">…</svg>
</button>

<!-- Prev button -->
<button type="button" class="slds-button slds-button_neutral left-btn">
  <svg data-key="left">…</svg>
</button>
```
The scraper:
- Sets `<select class="slds-select">` to `50` and dispatches a native `change` event to force Aura to rebind.
- Parses the current table's HTML into flat records (handling rowspan layouts).
- Locates the Next button via `button.slds-button.right-btn`.
- Checks `disabled` / `aria-disabled` / `disabled` class to know if we've reached the last page.
- If enabled, captures the first data row's text content as a staleness marker, clicks Next, and polls the DOM until the first row text changes.
- Repeats until the Next button is disabled or `maxResultsPerSearch` is reached.
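The last-page check in the steps above can be sketched against raw markup using only Python's standard `html.parser`. This is an illustration under assumptions, not the actor's code (which drives a live browser): `next_button_disabled` is a hypothetical name, and treating a missing Next button as "last page" mirrors the "if null, break" note in the architecture diagram.

```python
from html.parser import HTMLParser


class _NextButtonState(HTMLParser):
    """Finds the first <button> carrying the right-btn class."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.disabled = False

    def handle_starttag(self, tag, attrs):
        if tag != "button" or self.found:
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "right-btn" not in classes:
            return
        self.found = True
        self.disabled = (
            "disabled" in attr  # boolean attribute present
            or (attr.get("aria-disabled") or "").lower() == "true"
            or "disabled" in classes  # CSS class named "disabled"
        )


def next_button_disabled(html: str) -> bool:
    """True when no usable Next button exists in the given markup."""
    state = _NextButtonState()
    state.feed(html)
    return (not state.found) or state.disabled
```

Checking all three signals is deliberately redundant: whichever one the portal happens to set, the loop still stops.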
Pagination does not trigger network requests — AZ ROC pre-loads the entire result set after search and pagination is purely DOM-level. This makes the click loop fast and reliable.
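A minimal sketch of the click loop with the staleness-based wait follows. To stay runnable without a browser it is written against a hypothetical `ResultsPage` interface; `collect_all`, `wait_for_page_change`, and `FakePage` are illustrative names and not the actor's functions. In the real actor these operations would be Playwright calls.

```python
import time
from typing import Optional, Protocol


class ResultsPage(Protocol):
    """Hypothetical minimal view of the results table."""

    def rows(self) -> list[str]: ...
    def next_enabled(self) -> bool: ...
    def click_next(self) -> None: ...


def wait_for_page_change(page: ResultsPage, marker: Optional[str],
                         timeout_s: float = 10.0, poll_s: float = 0.1) -> bool:
    """Poll until the first row's text differs from the pre-click marker."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rows = page.rows()
        if rows and rows[0] != marker:
            return True
        time.sleep(poll_s)
    return False


def collect_all(page: ResultsPage, max_results: int = 0,
                timeout_s: float = 10.0, poll_s: float = 0.1) -> list[str]:
    """Walk every page; max_results=0 means no cap."""
    collected: list[str] = []
    while True:
        rows = page.rows()
        collected.extend(rows)
        if max_results > 0 and len(collected) >= max_results:
            return collected[:max_results]
        if not page.next_enabled():
            return collected  # last page reached
        marker = rows[0] if rows else None
        page.click_next()
        if not wait_for_page_change(page, marker, timeout_s, poll_s):
            return collected  # table never changed; stop instead of looping


class FakePage:
    """In-memory stand-in for trying the loop without a browser."""

    def __init__(self, pages: list[list[str]]):
        self.pages, self.index = pages, 0

    def rows(self) -> list[str]:
        return list(self.pages[self.index])

    def next_enabled(self) -> bool:
        return self.index < len(self.pages) - 1

    def click_next(self) -> None:
        self.index += 1
```

One caveat of the first-row marker: if two consecutive pages happened to start with identical row text, the wait would time out, which is why the sketch stops on timeout rather than looping.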
Handling rowspan layouts
A single business may hold multiple licences, rendered with rowspan:
```text
┌────────────────────┬─────────────────┬──────────────────┬─────────┬──────┬────────┬─────────┬──────┬─────────┐
│ Business Name      │ Name & Title    │ License No       │ Class   │ QP   │ Status │ Address │ Phone│ MoreInfo│  ← row 1
│ rowspan=2          │ rowspan=2       │ ROC 333239       │ CR-61   │ …    │ Active │ City,AZ │ …    │ More    │
├                    ┼                 ├──────────────────┼─────────┼──────┼────────┤         ┼      ├─────────┤
│ (merged)           │ (merged)        │ ROC 335547       │ CR-3    │ …    │ Active │ (merged)│ (m)  │ More    │  ← row 2
└────────────────────┴─────────────────┴──────────────────┴─────────┴──────┴────────┴─────────┴──────┴─────────┘
│ <td colspan="9" data-label="line">  ← grey separator (skipped)
```
The parser tracks the most recent business block context. Rows without a Business Name cell are treated as additional licences belonging to the most recently-seen business.
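One way to implement that context tracking is to carry rowspan cells forward column by column, as sketched below with the standard `html.parser`. This is an assumption-laden illustration rather than the actor's `parseResultsFromHtml()`: the column order is taken from the diagram above, the `COLUMNS` key names are invented, and `rowspan` values are assumed to be numeric.

```python
from html.parser import HTMLParser

# Column order follows the diagram; the key names are hypothetical.
COLUMNS = ["businessName", "nameAndTitle", "licenseNumber", "classification",
           "qualifyingParty", "licenseStatus", "address", "phone", "moreInfo"]


class _RowCollector(HTMLParser):
    """Collects every <tr> as a list of {text, rowspan, label} cells."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            a = dict(attrs)
            self._cell = {"text": [], "rowspan": int(a.get("rowspan") or 1),
                          "label": a.get("data-label")}

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse internal whitespace into single spaces.
            self._cell["text"] = " ".join("".join(self._cell["text"]).split())
            self._row.append(self._cell)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_results(table_html: str) -> list[dict]:
    """Flatten the table into one record per license row."""
    collector = _RowCollector()
    collector.feed(table_html)
    records, carried = [], {}  # carried: column index -> (text, rows left)
    for row in collector.rows:
        if not row:
            continue  # header row without <td> cells
        if any(cell["label"] == "line" for cell in row):
            carried.clear()  # separator row closes the business block
            continue
        cells, values = iter(row), []
        for col in range(len(COLUMNS)):
            if col in carried:  # cell merged from an earlier row
                text, left = carried[col]
                values.append(text)
                if left == 1:
                    del carried[col]
                else:
                    carried[col] = (text, left - 1)
                continue
            cell = next(cells, None)
            values.append(cell["text"] if cell else "")
            if cell and cell["rowspan"] > 1:
                carried[col] = (cell["text"], cell["rowspan"] - 1)
        records.append(dict(zip(COLUMNS, values)))
    return records
```

Carrying cells per column, instead of special-casing "rows without a Business Name cell", also covers the merged Address and Phone columns without extra logic.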
Stopping early
Set maxResultsPerSearch to a positive integer to cap the number of records collected per search query. Set to 0 (default) for unlimited.
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `licenseNumbers` | string[] | `[]` | AZ ROC license numbers |
| `companyNames` | string[] | `[]` | Business names |
| `qualifyingPartyNames` | string[] | `[]` | Qualifying party names |
| `cities` | string[] | `[]` | Arizona city names |
| `licenseType` | string | `ALL` | ALL / RESIDENTIAL / COMMERCIAL / DUAL |
| `licenseStatus` | string | `ALL` | ALL / ACTIVE / SUSPENDED / EXPIRED / REVOKED / CANCELLED |
| `licenseClassification` | string | `""` | Classification code filter (e.g. B-1, CR-39) |
| `maxResultsPerSearch` | integer | `0` | 0 = unlimited (paginates all pages) |
| `resultsPerPage` | integer | `50` | 10, 20, or 50 |
| `scrapeDetailPage` | boolean | `true` | Visit each detail page for full data |
| `scrapeComplaints` | boolean | `true` | Include complaint history |
| `proxyConfiguration` | object | Apify Residential | Proxy settings |
| `maxConcurrency` | integer | `3` | Parallel browser tabs (1–10) |
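For callers who assemble input programmatically, the defaults and allowed values in the table can be checked before a run is started. The sketch below is a hypothetical client-side helper (`resolve_input` is not part of the actor); it encodes only the constraints listed in the table and leaves `proxyConfiguration` untouched.

```python
import copy

# Defaults and allowed values as listed in the parameter table.
DEFAULTS = {
    "licenseNumbers": [], "companyNames": [],
    "qualifyingPartyNames": [], "cities": [],
    "licenseType": "ALL", "licenseStatus": "ALL",
    "licenseClassification": "",
    "maxResultsPerSearch": 0, "resultsPerPage": 50,
    "scrapeDetailPage": True, "scrapeComplaints": True,
    "maxConcurrency": 3,
}
ALLOWED = {
    "licenseType": {"ALL", "RESIDENTIAL", "COMMERCIAL", "DUAL"},
    "licenseStatus": {"ALL", "ACTIVE", "SUSPENDED", "EXPIRED",
                      "REVOKED", "CANCELLED"},
    "resultsPerPage": {10, 20, 50},
}


def resolve_input(raw: dict) -> dict:
    """Merge `raw` over the defaults and reject out-of-range values."""
    cfg = copy.deepcopy(DEFAULTS)
    cfg.update(raw)
    errors = [
        f"{key} must be one of {sorted(allowed, key=str)}"
        for key, allowed in ALLOWED.items()
        if cfg[key] not in allowed
    ]
    if not 1 <= cfg["maxConcurrency"] <= 10:
        errors.append("maxConcurrency must be between 1 and 10")
    if cfg["maxResultsPerSearch"] < 0:
        errors.append("maxResultsPerSearch must be 0 or a positive integer")
    if errors:
        raise ValueError("; ".join(errors))
    return cfg
```

For instance, `resolve_input({"resultsPerPage": 25})` raises `ValueError`, matching the v1.2 note that `25` is not an available page size.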
Example: Full city scrape
```json
{
  "cities": ["Phoenix"],
  "licenseType": "COMMERCIAL",
  "licenseStatus": "ACTIVE",
  "licenseClassification": "B-1",
  "maxResultsPerSearch": 0,
  "resultsPerPage": 50,
  "scrapeDetailPage": true,
  "scrapeComplaints": true,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Example: Batch license lookup
```json
{
  "licenseNumbers": ["333282", "333239", "333681"],
  "scrapeDetailPage": true,
  "scrapeComplaints": false,
  "maxConcurrency": 3
}
```
Output Schema
Records are saved to the Apify Dataset as flat JSON objects. Arrays (personnel, classifications, complaints) are stored as JSON arrays.
Sample record (list-only, scrapeDetailPage: false)
```json
{
  "licenseId": "a0o8y0000004CQAAA2",
  "licenseNumber": "333282",
  "businessName": "Grey Wolf Drywall LLC",
  "dbaName": null,
  "qualifyingParty": "Alfonso Lopez",
  "personnel": [
    { "name": "Alfonso Lopez", "role": "Member" },
    { "name": "Alfonso Lopez", "role": "Qualifying Party" }
  ],
  "primaryClassification": "CR-10",
  "classificationDesc": "Drywall",
  "licenseStatus": "Active",
  "city": "Waddell",
  "state": "AZ",
  "zip": "85355",
  "phone": "(602) 317-1895",
  "profileUrl": "https://azroc.my.site.com/AZRoc/s/contractor-search?licenseId=a0o8y0000004CQAAA2"
}
```
Sample record (with detail page, scrapeDetailPage: true)
Adds: entityType, issuedDate, renewedThroughDate, bondType, bondStatus, bondAmount, bondCompany, bondNumber, bondEffectiveDate, bondExpirationDate, openCases, disciplinedCases, resolvedCases, complaintCount, complaints, plus classifications (all licences for the business).
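As an illustration of how the detail-page fields might be consumed downstream, the sketch below flags records that deserve a closer look. It is a hypothetical post-processing helper, not part of the actor. It assumes `openCases` is numeric and dates use `YYYY-MM-DD`, as the Field Reference states, and it compares `bondStatus` case-insensitively because the list-only sample shows mixed-case status values.

```python
from datetime import date


def review_reasons(record: dict, today: date) -> list[str]:
    """Return human-readable reasons a license record may need review."""
    reasons = []
    if (record.get("openCases") or 0) > 0:
        reasons.append("open complaint cases")
    if (record.get("bondStatus") or "").upper() == "INACTIVE":
        reasons.append("bond inactive")
    renewed = record.get("renewedThroughDate")
    if renewed and date.fromisoformat(renewed) < today:
        reasons.append("license past renewal date")
    return reasons
```

List-only records (`scrapeDetailPage: false`) lack these fields, so the helper returns an empty list for them rather than failing.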
Proxy Recommendations
AZ ROC is hosted on Salesforce with standard bot protection. For production runs:
- Residential proxies (Apify Residential) are strongly recommended.
- Datacenter proxies may work for small runs but can trigger challenges.
- For development/testing, no proxy may work but is not reliable.
Cost & Performance
| Scenario | Approx records/hr | Notes |
|---|---|---|
| License number lookups | 300-500 | No pagination; direct detail page |
| Company name search | 200-400 | Small result sets |
| City search (small) | 150-300 | 1-3 pages typically |
| City search (Phoenix ACTIVE) | 80-150 | Many pages + detail pages |
Cost tips:
- `scrapeDetailPage: false` → list-only, ~3× faster.
- `scrapeComplaints: false` → skips complaint parsing.
- `resultsPerPage: 50` → fewest Next clicks.
- Use narrow filters (classification, city, status) to stay within result caps.
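The throughput table and the listed price allow a rough back-of-the-envelope estimate before a large run. The sketch below assumes that one dataset record counts as one billed result and that the "from $6.00 / 1,000 results" rate applies; since that price is a starting rate, the cost figure is a lower bound. `estimate_run` is a hypothetical helper.

```python
def estimate_run(records: int, records_per_hour: tuple[int, int],
                 usd_per_1000: float = 6.00) -> dict:
    """Estimate minimum cost and a duration range for `records` results.

    `records_per_hour` is a (low, high) pair from the throughput table.
    """
    low, high = records_per_hour
    return {
        "min_cost_usd": round(records * usd_per_1000 / 1000, 2),
        "hours_best": records / high,
        "hours_worst": records / low,
    }
```

For 1,200 records at the Phoenix ACTIVE rate of 80 to 150 records per hour, this gives at least $7.20 and roughly 8 to 15 hours.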
Limitations
- Complaint history: AZ ROC portal only displays complaints from the prior two years.
- No official API: All data parsed from rendered HTML — may break if Salesforce updates the DOM.
- Pagination caps: Very broad queries (e.g. all Phoenix contractors, no filters) may exceed 1000-1500 records; use filters to narrow.
- Bot protection: Salesforce may rate-limit; residential proxies and delays are used to mitigate.
Technical Architecture
```text
Actor Input
│
▼
buildJobs() ── one job per search term
│
▼
PlaywrightCrawler (Crawlee + fingerprint generator)
│
├─► SEARCH requests
│   │
│   ├─ goto SEARCH_URL
│   ├─ fill input + optional Advanced Search filters
│   ├─ click Search → wait for first License No row
│   ├─ setPageSize(50): dispatch native 'change' event on <select>
│   │
│   └─► PAGINATION LOOP
│       │
│       ├─ getTableHtml()
│       ├─ parseResultsFromHtml()  ← handles rowspan + separator rows
│       ├─ getNextButton()         ← via .right-btn class + disabled check
│       │   └─ if null, break (last page)
│       ├─ snapshot first row text
│       ├─ click Next, waitForPageChange()
│       └─ continue
│
└─► DETAIL requests (one per unique licenseId)
    │
    ├─ goto profileUrl
    ├─ parseDetailPage()
    │   ├─ CONTRACTOR section → businessName, phone, status
    │   ├─ LICENSE section    → type, classification, dates
    │   ├─ PERSONNEL section  → qualifying party
    │   ├─ COMPLAINT section  → case counts + complaint IDs
    │   └─ BOND section       → bond status, amount, company
    └─ Actor.pushData()
```
Changelog
v1.2.0 (current)
- ✅ Fixed Next/Prev button detection (class-based, not title-based).
- ✅ Fixed page-size options (10 / 20 / 50).
- ✅ Fixed rowspan handling — correct association of child licences with their business context.
- ✅ Separator rows (`data-label="line"`) are skipped.
- ✅ Native `change` event dispatched after page-size change.
v1.1.0
- Full pagination support with staleness-based change detection.
- `resultsPerPage` input parameter.
- Multiple selector fallbacks for Next button.
v1.0.0
- Initial release.