# Houzz Lead Scraper & Contact Enrichment (`nocodeninja_ng/houzz-lead-scraper`) Actor

Extract Houzz leads with this lightweight Python scraper. Get business names, websites, phone numbers, and social media. Features optional email enrichment by crawling business sites. Cost-efficient, fast, and ideal for B2B sales, architects, and contractor lead generation. Supports proxies.

- **URL**: https://apify.com/nocodeninja_ng/houzz-lead-scraper.md
- **Developed by:** [Mohammed Yusuf](https://apify.com/nocodeninja_ng) (community)
- **Categories:** Lead generation, Automation, Real estate
- **Stats:** 4 total users, 3 monthly users, 100.0% runs succeeded
- **User rating**: 5.00 out of 5 stars

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Houzz Lead Scraper & Contact Enrichment Actor

Lightweight Houzz scraper built with Python, Requests, BeautifulSoup, and the Apify SDK.

Extract Houzz business leads including company websites, social media profiles, ratings, reviews, project counts, and optional contact email enrichment from business websites.

Unlike heavy browser-based scrapers, this Actor uses a lightweight requests-based architecture designed to reduce runtime costs while maintaining high-quality lead extraction.

Ideal for:

* B2B lead generation
* Sales prospecting
* CRM enrichment
* Marketing agencies
* Architecture and remodeling outreach
* Interior design lead sourcing
* Contractor lead discovery

---

## Features

### Houzz Lead Extraction

Extract structured business lead data from Houzz professional listings:

* Company / professional name
* Business location
* Phone number
* Website URL
* Ratings
* Review counts
* Project counts
* Services provided
* Houzz profile URL
* Social media profiles

Supported social platforms:

* LinkedIn
* Instagram
* Facebook
* Twitter/X

---

### Optional Contact Enrichment

Enable Contact Enrichment mode to visit discovered business websites and extract contact emails.

The Actor checks lightweight high-value pages only:

* Homepage
* `/contact`
* `/contact-us`
* `/about`

Supported extraction methods:

* `mailto:` links
* Visible page text
* Common email obfuscations
* Cloudflare email protection decoding

False positives such as image filenames and static assets are filtered automatically.
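
The extraction methods listed above can be approximated with the Python standard library alone. The following is a minimal sketch, not the Actor's actual implementation: the regular expressions, the `ASSET_SUFFIXES` list, and all function names are assumptions chosen for illustration. The Cloudflare step relies on the widely documented `data-cfemail` format, in which the first byte is an XOR key for the remaining bytes.

```python
import re
from html import unescape

# Hypothetical patterns for illustration; the Actor's real rules are not published here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAILTO_RE = re.compile(r'href=["\']mailto:([^"\'?]+)', re.IGNORECASE)
CFEMAIL_RE = re.compile(r'data-cfemail=["\']((?:[0-9a-fA-F]{2}){2,})["\']')
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare data-cfemail hex string (first byte is the XOR key)."""
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8", errors="replace")


def deobfuscate(text: str) -> str:
    """Undo simple obfuscations such as 'name [at] site [dot] com'."""
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    return text


def extract_emails(html: str) -> list[str]:
    """Collect emails from mailto links, Cloudflare tokens, and page text."""
    candidates = MAILTO_RE.findall(html)
    candidates += [decode_cfemail(token) for token in CFEMAIL_RE.findall(html)]
    # unescape() also resolves entity-encoded addresses such as 'a&#64;b.com'.
    candidates += EMAIL_RE.findall(deobfuscate(unescape(html)))

    emails: list[str] = []
    for candidate in candidates:
        email = candidate.strip().lower()
        if not EMAIL_RE.fullmatch(email):
            continue  # decoding produced something that is not an address
        if email.endswith(ASSET_SUFFIXES):
            continue  # e.g. 'logo@2x.png' is an image filename, not an email
        if email not in emails:
            emails.append(email)
    return emails
```

Deduplication is done on the lowercased address, so the same contact found through a `mailto:` link and in the page text is reported once.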

### Pricing

This Actor is monetized on a **pay-per-result** basis.

- Each run is priced per lead extracted, with enriched contact emails considered premium.
- As a reference, standard usage (Houzz data + websites + socials) is priced at $3.99 per 1,000 leads.

**Note**: Due to Apify limits, the maximum cost per run is currently $3.00. This typically allows 500–750 results per run.

> ⚠️ Actual cost may vary depending on:
> - number of leads extracted
> - whether contact enrichment is enabled
> - optional proxy usage
> - platform fees
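
To see how the per-run cap relates to the quoted rate, the arithmetic can be sketched as follows. Only the $3.99 per 1,000 leads rate and the $3.00 cap come from the text above; the function names are hypothetical, and the rate for enriched leads is not stated here, so it is left as a parameter.

```python
import math

PRICE_PER_1000_USD = 3.99    # standard rate quoted above
MAX_COST_PER_RUN_USD = 3.00  # per-run cap quoted above


def estimated_cost(leads: int, price_per_1000: float = PRICE_PER_1000_USD) -> float:
    """Estimated charge in USD for a given number of leads."""
    return round(leads * price_per_1000 / 1000, 2)


def max_leads_for_budget(
    budget: float = MAX_COST_PER_RUN_USD,
    price_per_1000: float = PRICE_PER_1000_USD,
) -> int:
    """Largest whole number of leads that fits within the budget."""
    return math.floor(budget * 1000 / price_per_1000)


if __name__ == "__main__":
    print(max_leads_for_budget())  # leads that fit under the cap at the standard rate
```

At the standard rate the cap works out to 751 leads, which lines up with the upper end of the 500–750 range. A higher per-lead rate for enriched results would lower that number accordingly.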

---

## Why This Actor?

### Lightweight & Cost Efficient

Most Houzz scrapers rely on full browser automation, which significantly increases runtime costs.

This Actor uses:

* Requests
* BeautifulSoup
* Lightweight HTML parsing

Benefits:

* Lower runtime cost
* Faster execution
* Lower memory usage
* Better scalability for lead generation
* Configurable concurrent email enrichment for faster large-scale runs

---

## Input

| Field                | Type    | Default                | Description                                     |
| -------------------- | ------- | ---------------------- | ----------------------------------------------- |
| `startUrl`           | string  | NYC architects example | Houzz search/results URL to start scraping from |
| `maxResults`         | integer | `10`                   | Maximum profiles to push to the dataset         |
| `maxPages`           | integer | `3`                    | Maximum paginated search pages to inspect       |
| `extractEmails`      | boolean | `false`                | Enable Contact Enrichment mode                  |
| `proxyConfiguration` | object  | No proxy               | Optional Apify proxy configuration              |

---

## Example Input

```json
{
  "startUrl": "https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581",
  "maxResults": 25,
  "maxPages": 2,
  "extractEmails": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
````
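
When the input is assembled in code rather than typed into the UI, a small builder can catch obvious mistakes before a run is started. This is a sketch under assumptions: the defaults and the minimum of 1 for `maxResults` and `maxPages` come from the input schema in this document, while the `build_input` helper and its hostname check are hypothetical and not part of the Actor.

```python
from typing import Any
from urllib.parse import urlparse

DEFAULT_START_URL = (
    "https://www.houzz.com/professionals/architect/"
    "new-york-city-ny-us-probr0-bo~t_11784~r_5128581"
)


def build_input(
    start_url: str = DEFAULT_START_URL,
    max_results: int = 10,
    max_pages: int = 3,
    extract_emails: bool = False,
    use_apify_proxy: bool = False,
) -> dict[str, Any]:
    """Return an input object shaped like the example above."""
    parsed = urlparse(start_url)
    # Assumption: any Houzz hostname is acceptable; the Actor may be stricter.
    if parsed.scheme not in ("http", "https") or "houzz." not in (parsed.hostname or ""):
        raise ValueError(f"startUrl does not look like a Houzz URL: {start_url!r}")
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    if max_pages < 1:
        raise ValueError("maxPages must be at least 1")
    return {
        "startUrl": start_url,
        "maxResults": max_results,
        "maxPages": max_pages,
        "extractEmails": extract_emails,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

Calling `build_input(max_results=25, max_pages=2, extract_emails=True)` reproduces the example input shown above.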

***

## Proxy Support

Proxy usage is optional.

Small scraping jobs can run efficiently without a proxy.

For larger scraping runs, users can enable:

- Residential proxies
- Datacenter proxies
- Own proxies

through the Apify proxy configuration UI.

The Actor uses official Apify proxy integration and automatically applies configured proxies to:

- Houzz requests
- External website enrichment requests

### Proxy Usage Notes

We have tested this Actor extensively using default settings **without enabling a proxy**, and it was able to extract hundreds of leads with enrichment successfully.

⚠️ **Important:** Your experience may vary depending on:

- network location
- Houzz rate-limiting
- size of your scraping run
- whether email enrichment is enabled

For larger runs, or if you encounter blocks or slowdowns, enabling Apify Proxy (Datacenter or Residential) is recommended. The Actor fully supports optional proxy configuration via the input UI.
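
For API-driven runs, the proxy options described above map onto the `proxyConfiguration` input field. The sketch below is hedged: `useApifyProxy` appears in this README, while `apifyProxyGroups` and `proxyUrls` are field names from the standard Apify proxy input format, and whether this Actor honors each of them is an assumption. The `proxy_configuration` helper itself is hypothetical.

```python
from typing import Optional


def proxy_configuration(mode: str = "none", proxy_urls: Optional[list[str]] = None) -> dict:
    """Build a proxyConfiguration object for the Actor input.

    mode: 'none', 'datacenter', 'residential', or 'custom'.
    """
    if mode == "none":
        return {"useApifyProxy": False}
    if mode == "datacenter":
        # With no groups given, Apify Proxy selects from the available datacenter proxies.
        return {"useApifyProxy": True}
    if mode == "residential":
        return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
    if mode == "custom":
        if not proxy_urls:
            raise ValueError("custom mode needs at least one proxy URL")
        return {"useApifyProxy": False, "proxyUrls": list(proxy_urls)}
    raise ValueError(f"unknown proxy mode: {mode!r}")
```

A reasonable pattern is to start with `proxy_configuration("none")` and switch to `"datacenter"` or `"residential"` only after blocks or slowdowns appear, since proxy traffic adds to platform usage.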

***

## Output

Each dataset item contains:

```json
{
  "name": "Example Studio",
  "location": "New York, NY",
  "phone": "+1 555 123 4567",
  "website": "https://example.com",
  "rating": 5.0,
  "review_count": 12,
  "project_count": 8,
  "services": "Architecture, Interior Design",
  "emails": ["hello@example.com"],
  "profile_url": "https://www.houzz.com/professionals/...",
  "socials": {
    "linkedin": null,
    "instagram": "https://www.instagram.com/example",
    "facebook": null,
    "twitter": null
  }
}
```

Fields may be empty or `null` if data is unavailable.
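
Because any field can be missing or `null`, downstream code should not assume every key is populated. As an illustration, the sketch below flattens dataset items shaped like the example above into CSV for a CRM import. The column order and the helper names are assumptions, not something the Actor provides.

```python
import csv
import io

SCALAR_COLUMNS = [
    "name", "location", "phone", "website", "rating",
    "review_count", "project_count", "services", "profile_url",
]
SOCIAL_COLUMNS = ["linkedin", "instagram", "facebook", "twitter"]


def lead_to_row(item: dict) -> dict:
    """Flatten one dataset item, turning missing or null values into empty strings."""
    row = {col: "" if item.get(col) is None else item[col] for col in SCALAR_COLUMNS}
    row["emails"] = "; ".join(item.get("emails") or [])
    socials = item.get("socials") or {}
    for platform in SOCIAL_COLUMNS:
        row[platform] = socials.get(platform) or ""
    return row


def leads_to_csv(items: list[dict]) -> str:
    """Serialize dataset items to a CSV string with one row per lead."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=SCALAR_COLUMNS + ["emails"] + SOCIAL_COLUMNS,
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        writer.writerow(lead_to_row(item))
    return buffer.getvalue()
```

The explicit `is None` check keeps legitimate zero values, such as a `review_count` of `0`, instead of blanking them.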

***

## Contact Enrichment Mode

When `extractEmails` is enabled, the Actor performs lightweight website enrichment to discover business contact emails.

This mode increases runtime because additional external websites are visited.
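
The extra runtime comes from the additional requests made per lead: up to four pages (homepage, `/contact`, `/contact-us`, `/about`) for every discovered website. A rough sketch of that loop, with concurrency, might look like the following. It is not the Actor's code: `candidate_urls`, `enrich_many`, and the injected `fetch` and `extract` callables are hypothetical, and the worker count is an arbitrary illustrative default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional
from urllib.parse import urljoin

CANDIDATE_PATHS = ("/", "/contact", "/contact-us", "/about")


def candidate_urls(website: str) -> list[str]:
    """Return the high-value pages to check for one business website."""
    return [urljoin(website, path) for path in CANDIDATE_PATHS]


def enrich_many(
    websites: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    extract: Callable[[str], list[str]],
    max_workers: int = 8,
) -> dict[str, list[str]]:
    """Visit candidate pages for each website concurrently and collect emails.

    fetch(url) returns the page HTML or None; extract(html) returns emails.
    Both are supplied by the caller, so no HTTP library is assumed here.
    """
    sites = list(websites)

    def enrich_one(website: str) -> list[str]:
        emails: list[str] = []
        for url in candidate_urls(website):
            try:
                html = fetch(url)
            except Exception:
                continue  # one failing page should not abort the whole lead
            for email in extract(html) if html else []:
                if email not in emails:
                    emails.append(email)
        return emails

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sites.
        return dict(zip(sites, pool.map(enrich_one, sites)))
```

Threads suit this workload because the time is spent waiting on network responses rather than on computation.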

Recommended for:

- Outreach campaigns
- Lead generation
- CRM enrichment
- Sales prospecting

***

## Local Development

### Install dependencies

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

### Run locally

```powershell
python main.py
```

### Local environment variables

```powershell
$env:HOUZZ_START_URL="https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581"
$env:MAX_PAGES="3"
$env:MAX_RESULTS="10"
$env:EXTRACT_EMAILS="true"

python main.py
```
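
The commands above are PowerShell-specific. A cross-platform alternative is to set the same variables from a small Python launcher, sketched below. The variable names come from the snippet above; the `local_env` and `run_local` helpers are hypothetical, and passing `"false"` for `EXTRACT_EMAILS` is an assumption, since only `"true"` is shown in this README.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def local_env(
    start_url: str,
    max_pages: int = 3,
    max_results: int = 10,
    extract_emails: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with the scraper's variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "HOUZZ_START_URL": start_url,
            "MAX_PAGES": str(max_pages),
            "MAX_RESULTS": str(max_results),
            # Assumption: the script accepts 'false' as well as 'true'.
            "EXTRACT_EMAILS": "true" if extract_emails else "false",
        }
    )
    return env


def run_local(env: Mapping[str, str], script: str = "main.py") -> int:
    """Run the scraper script with the given environment; return its exit code."""
    return subprocess.run([sys.executable, script], env=dict(env), check=False).returncode


# Example (run from the project directory with the virtual environment active):
# run_local(local_env("https://www.houzz.com/professionals/...", extract_emails=True))
```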

Local runs export results to:

```plaintext
houzz_results.json
```

Apify platform runs push results to the default dataset.

***

## Recommended Use Cases

- Houzz lead generation
- Architect lead scraping
- Interior designer lead discovery
- Remodeling contractor prospecting
- Agency outreach
- CRM enrichment
- Market research
- Competitor research

***

## Performance Notes

- Lightweight BeautifulSoup architecture
- Lower resource usage than browser-based scrapers
- Contact Enrichment mode increases runtime
- Some external websites may block scraping at scale
- JavaScript-rendered emails may not always be accessible through requests-based extraction

***

## Scaling Recommendations

### Small Runs

- Run without a proxy
- Faster and cheaper

### Larger Runs

Enable:

- Residential proxy
- Datacenter proxy

through the Apify UI for improved reliability.

***

## Disclaimer

This Actor is intended for lawful data extraction and business research workflows. Users are responsible for complying with applicable website terms and local regulations.

# Actor input Schema

## `startUrl` (type: `string`):

Houzz search/results URL to start scraping from. The URL should have profession and city/state specified to ensure accurate results.

## `maxResults` (type: `integer`):

Maximum number of profiles to push to the dataset.

## `maxPages` (type: `integer`):

Maximum number of paginated search pages to inspect.

## `extractEmails` (type: `boolean`):

Visit company websites and attempt to extract contact emails. Increases runtime.

## `proxyConfiguration` (type: `object`):

Optional proxy configuration for large-scale scraping runs.

## Actor input object example

```json
{
  "startUrl": "https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581",
  "maxResults": 10,
  "maxPages": 3,
  "extractEmails": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `results` (type: `string`):

Dataset items produced by the Actor. Each item contains one Houzz professional profile lead.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

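// Prepare Actor input (values mirror the Actor input object example in this document)
const input = {
    startUrl: 'https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581',
    maxResults: 10,
    maxPages: 3,
    extractEmails: false,
    proxyConfiguration: { useApifyProxy: false },
};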

// Run the Actor and wait for it to finish
const run = await client.actor("nocodeninja_ng/houzz-lead-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

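# Prepare the Actor input (values mirror the Actor input object example in this document)
run_input = {
    "startUrl": "https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581",
    "maxResults": 10,
    "maxPages": 3,
    "extractEmails": False,
    "proxyConfiguration": {"useApifyProxy": False},
}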

# Run the Actor and wait for it to finish
run = client.actor("nocodeninja_ng/houzz-lead-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
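echo '{"startUrl": "https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581", "maxResults": 10, "maxPages": 3}' |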
apify call nocodeninja_ng/houzz-lead-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=nocodeninja_ng/houzz-lead-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Houzz Lead Scraper & Contact Enrichment",
        "description": "Extract Houzz leads with this lightweight Python scraper. Get business names, websites, phone numbers, and social media. Features optional email enrichment by crawling business sites. Cost-efficient, fast, and ideal for B2B sales, architects, and contractor lead generation. Supports proxies.",
        "version": "0.1",
        "x-build-id": "GM8hAzRhiXXA3M4T7"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/nocodeninja_ng~houzz-lead-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-nocodeninja_ng-houzz-lead-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/nocodeninja_ng~houzz-lead-scraper/runs": {
            "post": {
                "operationId": "runs-sync-nocodeninja_ng-houzz-lead-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/nocodeninja_ng~houzz-lead-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-nocodeninja_ng-houzz-lead-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrl"
                ],
                "properties": {
                    "startUrl": {
                        "title": "Start URL",
                        "type": "string",
                        "description": "Houzz search/results URL to start scraping from. The URL should have profession and city/state specified to ensure accurate results.",
                        "default": "https://www.houzz.com/professionals/architect/new-york-city-ny-us-probr0-bo~t_11784~r_5128581"
                    },
                    "maxResults": {
                        "title": "Maximum results",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of profiles to push to the dataset.",
                        "default": 10
                    },
                    "maxPages": {
                        "title": "Maximum search pages",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of paginated search pages to inspect.",
                        "default": 3
                    },
                    "extractEmails": {
                        "title": "Contact Enrichment",
                        "type": "boolean",
                        "description": "Visit company websites and attempt to extract contact emails. Increases runtime.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional proxy configuration for large-scale scraping runs.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
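
For projects that cannot use an official client, the `run-sync-get-dataset-items` path from the specification above can be called with nothing but the standard library. This is a minimal sketch: the server URL, path, `token` query parameter, and JSON request body come from the specification, while the helper names and the default timeout are assumptions. It also assumes the response body is a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "nocodeninja_ng~houzz-lead-scraper"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and parse the response (assumed to be a JSON array)."""
    request = build_run_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The synchronous endpoint holds the connection open until the run finishes, so long runs with Contact Enrichment enabled may be better served by the `/runs` path and polling the dataset afterwards.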
