Home Service Business Lead Scraper

Pricing

from $0.20 / 1,000 results

Scrape publicly available home service business leads from Houzz, Yellow Pages / YP, Yellow Pages Directory, and BuildZoom.

Developer

DigitalNomadPH

Maintained by Community

Collect publicly listed contact data (phone numbers, addresses, websites, ratings, and service details) for home service contractors across the United States. The actor simultaneously scrapes Houzz, Yellow Pages / YP, Yellow Pages Directory, and BuildZoom, deduplicates records across sources, and scores each lead by data completeness. Use it to build targeted outreach lists for plumbers, electricians, HVAC companies, roofers, landscapers, and more, with no code required.

Features

  • Scrapes four validated public directories: Houzz, Yellow Pages / YP, Yellow Pages Directory, and BuildZoom
  • Supports 8 home service trade categories
  • Deduplicates records across sources by phone number, website domain, or business name + location
  • Quality scoring: each lead receives a 0–100 quality score and a high / medium / low band
  • Configurable result cap (1–1,000 records) with per-source distribution logic
  • Optional email extraction (only from explicitly labeled email fields)
  • Optional website extraction
  • Apify Proxy support for reliable access
  • Debug mode for diagnosing source-level parsing issues
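
The cross-source deduplication listed above can be pictured with a minimal sketch. It assumes a simplified `Lead` shape and a single-pass grouping strategy; the `matchKeys` and `dedupe` helpers are hypothetical illustrations, not the actor's actual implementation.

```typescript
// Hypothetical record shape: only the fields used for matching.
interface Lead {
  businessName: string;
  phone: string | null;
  website: string | null;
  city: string | null;
  region: string | null;
  source: string;
}

// Build candidate match keys: phone, website domain, then name + location.
function matchKeys(lead: Lead): string[] {
  const keys: string[] = [];
  if (lead.phone) {
    // Compare on the last 10 digits so "+1 512..." and "(512) ..." match.
    const digits = lead.phone.replace(/\D/g, "").slice(-10);
    if (digits.length === 10) keys.push(`phone:${digits}`);
  }
  if (lead.website) {
    try {
      const host = new URL(lead.website).hostname.toLowerCase().replace(/^www\./, "");
      keys.push(`domain:${host}`);
    } catch {
      // Malformed URL: skip the domain key.
    }
  }
  if (lead.city && lead.region) {
    const name = lead.businessName.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
    keys.push(`name:${name}|${lead.city.toLowerCase()}|${lead.region.toLowerCase()}`);
  }
  return keys;
}

// Single pass: a lead joins the first existing group that shares any key.
// Simplification: two groups bridged by a later lead are not merged.
function dedupe(leads: Lead[]): Lead[][] {
  const groupByKey = new Map<string, Lead[]>();
  const groups: Lead[][] = [];
  for (const lead of leads) {
    const keys = matchKeys(lead);
    const hit = keys.find((k) => groupByKey.has(k));
    let group: Lead[];
    if (hit !== undefined) {
      group = groupByKey.get(hit)!;
    } else {
      group = [];
      groups.push(group);
    }
    group.push(lead);
    for (const k of keys) {
      if (!groupByKey.has(k)) groupByKey.set(k, group);
    }
  }
  return groups;
}
```

Each returned group holds the records believed to describe one business; collecting the `source` of every member would give something like the `sourcesSeen` array in the output.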

Why use Home Service Business Lead Scraper?

This actor is useful for anyone who needs a list of local home service contractors:

  • Sales teams prospecting HVAC, roofing, or plumbing companies for B2B outreach
  • Marketing agencies building contact lists for local service verticals
  • Local SEO tools seeding contractor data for a new market
  • Aggregator platforms bootstrapping a contractor directory without manual data entry
  • Researchers studying the density and distribution of trade contractors by metro area

How much will it cost?

This actor uses Cheerio (fast HTTP scraping) for Yellow Pages Directory and BuildZoom, and Playwright (headless browser) for Houzz and Yellow Pages / YP. Playwright runs cost more than plain HTTP requests.

| Run size | Approx. compute units | Approx. cost (pay-as-you-go) |
|---|---|---|
| 30 results | 0.05–0.20 CU | ~$0.01–$0.04 |
| 100 results | 0.20–0.60 CU | ~$0.04–$0.12 |
| 500 results | 0.80–2.50 CU | ~$0.16–$0.50 |

Proxy usage (Apify Proxy residential) adds approximately $0.40/GB. Typical runs consume under 20 MB per 100 results. Costs vary based on selected sources and proxy tier.
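
As a rough worked example of the figures above, the sketch below estimates compute and proxy cost for a run. The per-CU rate of $0.20 is an assumption implied by the table (0.05 CU ≈ $0.01), 20 MB per 100 results is treated as an upper bound, and both function names are hypothetical.

```typescript
// Assumed rate implied by the cost table (0.05 CU is listed at about $0.01).
const USD_PER_CU = 0.2;
// Residential proxy price quoted in the text above.
const USD_PER_GB = 0.4;
// Upper bound on traffic quoted in the text above.
const MB_PER_100_RESULTS = 20;

// Compute cost for a given number of compute units.
function estimateComputeCostUsd(computeUnits: number): number {
  return computeUnits * USD_PER_CU;
}

// Upper-bound proxy cost for a given number of results (1 GB taken as 1000 MB).
function estimateProxyCostUsd(results: number): number {
  const gigabytes = ((results / 100) * MB_PER_100_RESULTS) / 1000;
  return gigabytes * USD_PER_GB;
}

// A 500-result run at the top of the table's range:
const total = estimateComputeCostUsd(2.5) + estimateProxyCostUsd(500);
console.log(total.toFixed(2)); // compute (about $0.50) plus proxy (about $0.04)
```

Under these assumptions proxy traffic is a small share of the total; actual costs depend on the selected sources and proxy tier.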

How to use

  1. Go to the Apify Store page for this actor and click Try for free.
  2. In the Input tab, select a Category (e.g., Plumber) and enter a Location (e.g., Chicago, IL).
  3. Adjust Maximum results and select which Sources to scrape.
  4. Click Start to run the actor.
  5. When the run completes, open the Storage tab to download your leads as JSON, CSV, or Excel.

You can also run this actor via the Apify API or schedule it to run on a recurring basis from the Schedules tab.
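
A minimal sketch of starting a run through the Apify API's run-actor endpoint is shown below. The actor ID is a placeholder (copy the real one from the actor's API tab), `buildRunRequest` is a hypothetical helper, and the token is assumed to be an Apify API token passed as a Bearer header.

```typescript
// Placeholder: replace with the actor ID shown on the actor's API tab.
const ACTOR_ID = "<username>~<actor-name>";

interface RunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request that starts an actor run with the given input.
function buildRunRequest(
  actorId: string,
  token: string,
  input: Record<string, unknown>,
): RunRequest {
  return {
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Usage (not executed here, since it needs a real token and actor ID):
//   const { url, init } = buildRunRequest(ACTOR_ID, token, {
//     category: "plumber",
//     location: "Austin, TX",
//   });
//   const response = await fetch(url, init);
```

The response describes the started run; the leads themselves are read from the run's default dataset once it finishes.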

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| category | string | Yes | plumber | Trade category to search. One of: plumber, electrician, hvac, roofer, landscaper, cleaning_service, handyman, general_contractor |
| location | string | Yes | Austin, TX | US city or metro area. Example: Denver, CO or Portland, OR |
| maxResults | integer | No | 30 | Maximum unique records to save (1–1,000). Distributed evenly across sources. |
| sources | array | No | ["houzz", "yp", "yellow_pages_directory", "build_zoom"] | Directories to scrape. Select one or more. |
| includeEmails | boolean | No | false | Extract emails from explicitly labeled email fields on directory pages. |
| includeWebsite | boolean | No | true | Extract business website URLs. |
| deduplicate | boolean | No | true | Merge duplicate businesses found across sources. |
| debugMode | boolean | No | false | Log detailed source and parsing diagnostics to the run log. |
| proxyConfiguration | object | No | { "useApifyProxy": true } | Apify Proxy settings. Recommended to keep enabled for reliable scraping. |

Example input

{
  "category": "plumber",
  "location": "Austin, TX",
  "maxResults": 30,
  "includeEmails": false,
  "includeWebsite": true,
  "deduplicate": true,
  "sources": ["houzz", "yp", "yellow_pages_directory", "build_zoom"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
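
Before submitting an input like the one above, it can help to check it on the client side. The sketch below types the input and applies the documented constraints; `ActorInput` and `validateInput` are hypothetical names, and the category check is stricter than the actor itself, which also accepts aliases.

```typescript
const CATEGORIES = [
  "plumber", "electrician", "hvac", "roofer",
  "landscaper", "cleaning_service", "handyman", "general_contractor",
] as const;

const SOURCES = ["houzz", "yp", "yellow_pages_directory", "build_zoom"] as const;

// Hypothetical type mirroring the input parameters table.
interface ActorInput {
  category: string;
  location: string;
  maxResults?: number;
  sources?: string[];
  includeEmails?: boolean;
  includeWebsite?: boolean;
  deduplicate?: boolean;
  debugMode?: boolean;
  proxyConfiguration?: { useApifyProxy: boolean; apifyProxyGroups?: string[] };
}

// Return a list of human-readable problems; an empty list means the input looks valid.
function validateInput(input: ActorInput): string[] {
  const errors: string[] = [];
  if (!(CATEGORIES as readonly string[]).includes(input.category)) {
    errors.push(`unknown category: ${input.category}`);
  }
  if (input.location.trim() === "") {
    errors.push("location must not be empty");
  }
  const maxResults = input.maxResults ?? 30; // documented default
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 1000) {
    errors.push("maxResults must be an integer between 1 and 1000");
  }
  const sources = input.sources ?? [...SOURCES]; // documented default
  if (sources.length === 0) {
    errors.push("select at least one source");
  }
  for (const source of sources) {
    if (!(SOURCES as readonly string[]).includes(source)) {
      errors.push(`unknown source: ${source}`);
    }
  }
  return errors;
}
```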

Output

Each saved dataset item follows this schema:

{
  "businessName": "ABC Plumbing LLC",
  "sector": "home_services",
  "category": "plumber",
  "trade": "plumber",
  "phone": "+15125551212",
  "email": null,
  "website": "https://example.com",
  "profileUrl": "https://www.buildzoom.com/contractor/abc-plumbing-llc",
  "address": "123 Main St",
  "city": "Austin",
  "region": "TX",
  "postalCode": "78701",
  "country": "US",
  "rating": 4.7,
  "reviewCount": 128,
  "description": "Local plumbing company serving Austin and surrounding areas.",
  "services": ["Drain cleaning", "Water heater repair"],
  "serviceArea": ["Austin", "Round Rock"],
  "licenseNumber": null,
  "yearsInBusiness": null,
  "emergencyService": true,
  "source": "build_zoom",
  "sourceName": "BuildZoom",
  "sourcesSeen": ["build_zoom", "houzz"],
  "profileUrlsSeen": [
    "https://www.buildzoom.com/contractor/abc-plumbing-llc",
    "https://www.houzz.com/professionals/abc-plumbing"
  ],
  "scrapedAt": "2026-06-15T00:00:00.000Z",
  "qualityScore": 87,
  "qualityBand": "high"
}

Field notes:

  • qualityScore: 0–100, based on data completeness (phone, address, website, email, rating, description).
  • qualityBand: high (≥80), medium (50–79), or low (<50).
  • sourcesSeen: all source IDs where this business was found (populated after deduplication).
  • profileUrlsSeen: all directory profile URLs found for this business.
  • emergencyService: true if the description mentions 24/7 or emergency service; null if unknown.
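
To make the scoring fields concrete, here is a minimal sketch of a completeness score and its band. The band thresholds come from the field notes above; the per-field weights are invented for illustration and will not reproduce the actor's exact scores.

```typescript
// The six fields the field notes list as inputs to the score.
interface ScoredFields {
  phone: string | null;
  address: string | null;
  website: string | null;
  email: string | null;
  rating: number | null;
  description: string | null;
}

// Hypothetical weights (sum to 100); the actor's real weights are not documented.
const WEIGHTS: Record<keyof ScoredFields, number> = {
  phone: 25,
  address: 20,
  website: 20,
  email: 15,
  rating: 10,
  description: 10,
};

// Add a field's weight whenever that field is present and non-empty.
function qualityScore(lead: ScoredFields): number {
  let score = 0;
  for (const field of Object.keys(WEIGHTS) as (keyof ScoredFields)[]) {
    const value = lead[field];
    if (value !== null && value !== "") score += WEIGHTS[field];
  }
  return score;
}

// Thresholds as documented: high (>= 80), medium (>= 50), low (< 50).
function qualityBand(score: number): "high" | "medium" | "low" {
  if (score >= 80) return "high";
  if (score >= 50) return "medium";
  return "low";
}
```

A lead with everything except an email would score 85 under these assumed weights and land in the high band.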

Supported Sources

| Source ID | Display name | Crawler type |
|---|---|---|
| houzz | Houzz | Playwright |
| yp | Yellow Pages / YP | Playwright |
| yellow_pages_directory | Yellow Pages Directory | Cheerio |
| build_zoom | BuildZoom | Cheerio |

Houzz and Yellow Pages / YP use a headless browser (Playwright) due to JavaScript rendering and bot protection. Yellow Pages Directory and BuildZoom are scraped via fast HTTP requests (Cheerio).

Supported Categories

| Input value | Description |
|---|---|
| plumber | Plumbers |
| electrician | Electricians |
| hvac | HVAC companies |
| roofer | Roofers |
| landscaper | Landscapers |
| cleaning_service | Cleaning services |
| handyman | Handyman services |
| general_contractor | General contractors |

Category aliases are accepted (e.g. "plumbing contractor" normalizes to "plumber").
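
Alias handling of this kind could look like the sketch below. Only the "plumbing contractor" alias is documented; the other alias entries and the `normalizeCategory` helper are hypothetical.

```typescript
const CATEGORIES = [
  "plumber", "electrician", "hvac", "roofer",
  "landscaper", "cleaning_service", "handyman", "general_contractor",
] as const;

type Category = (typeof CATEGORIES)[number];

const ALIASES: Record<string, Category> = {
  "plumbing contractor": "plumber", // documented example
  "heating and cooling": "hvac", // hypothetical alias
  "roofing contractor": "roofer", // hypothetical alias
};

// Map free-form user input to a canonical category, or null if unrecognized.
function normalizeCategory(raw: string): Category | null {
  // Lowercase and collapse spaces, underscores, and hyphens into single spaces.
  const key = raw.trim().toLowerCase().replace(/[\s_-]+/g, " ");
  const canonical = CATEGORIES.find((c) => c.replace(/_/g, " ") === key);
  return canonical ?? ALIASES[key] ?? null;
}
```

With this approach "Plumbing Contractor", "plumber", and "PLUMBER" all resolve to `plumber`, while an unknown trade returns `null` so the caller can report it.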

Tips

  • Per-source cap: When multiple sources are selected, the actor applies a per-source limit of ceil(maxResults / numberOfSources) to prevent a single fast source from filling the entire result quota before slower sources contribute.
  • Proxy: Residential proxy is recommended; select the RESIDENTIAL group in proxyConfiguration, as in the example input. Yellow Pages / YP and Houzz block datacenter IPs on the Apify platform. Yellow Pages Directory and BuildZoom work with datacenter proxy if you want to lower costs.
  • Yellow Pages / YP reliability: YP actively rotates bot defenses (403 responses, rate limits, empty pages). A 30-result run typically yields 24–28 records: when YP is blocked, the other three sources still deliver their share. This is a target-site limitation, not an actor bug.
  • Location format: Use City, ST format (e.g. "Austin, TX" or "Miami, FL"). Multi-word cities work fine: "San Antonio, TX".
  • Email extraction: Enable includeEmails only when you specifically need email addresses. Emails are only extracted from labeled email fields on profile pages — not guessed from description text.
  • Deduplication: When deduplicate is true, records found across multiple sources are merged and the sourcesSeen array reflects all sources where the business was found.
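
The per-source cap from the first tip can be worked through in a few lines. The `perSourceCap` formula is the one stated above; `maxYield` is a hypothetical helper that assumes a blocked source's share is not redistributed to the other sources.

```typescript
// Per-source limit as described in the tips: ceil(maxResults / numberOfSources).
function perSourceCap(maxResults: number, sourceCount: number): number {
  if (sourceCount < 1) throw new RangeError("at least one source is required");
  return Math.ceil(maxResults / sourceCount);
}

// Upper bound on saved records when only `working` of the `selected` sources
// return data (assumption: blocked sources' quota is not reassigned).
function maxYield(maxResults: number, selected: number, working: number): number {
  return Math.min(maxResults, perSourceCap(maxResults, selected) * working);
}

console.log(perSourceCap(30, 4)); // 8 records per source
console.log(maxYield(30, 4, 3)); // 24 when one of four sources is blocked
console.log(maxYield(30, 4, 4)); // 30 when all four sources deliver
```

This matches the lower end of the 24–28 range quoted for a 30-result run when Yellow Pages / YP is blocked.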

Limitations

  • US locations only (MVP scope).
  • Public pages only — no login support.
  • No CAPTCHA solving.
  • No deep website crawling.
  • No email verification.
  • No CRM integrations.
  • Source selectors may require maintenance when directory layouts change.

Responsible Use

This actor extracts publicly available business information from supported directory pages. Users are responsible for ensuring that their use of the scraped data complies with applicable laws, platform terms of service, privacy regulations (including CAN-SPAM, GDPR where applicable), and marketing rules. The actor must not be used to collect private, login-gated, sensitive, or restricted information.

Local Development

npm install
npx playwright install chromium # required for the Playwright sources (Houzz and Yellow Pages / YP)
npm test
npm run build
npm start

Local runs use Apify local storage under storage/. Set APIFY_HEADLESS=1 to run Playwright in headless mode locally.

Support

Found a bug or want to request a new source or category? Open an issue on the actor's GitHub repository or contact the author through Apify.