# thebluebook scraper (`fayoussef/thebluebook-scraper`) Actor

Our thebluebook.com scraper makes it simple to collect contractor profiles at scale. It automatically gathers URLs from all search pages and extracts complete details for every profile including company info, contacts, trades, certifications, and project history.

- **URL**: https://apify.com/fayoussef/thebluebook-scraper.md
- **Developed by:** [youssef farhan](https://apify.com/fayoussef) (community)
- **Categories:** Automation, Integrations, Lead generation
- **Stats:** 3 total users, 1 monthly users, 100.0% runs succeeded, 1 bookmarks
- **User rating**: No ratings yet

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Blue Book Construction Directory Scraper — Extract Contractor Profiles at Scale

Scrape contractor and subcontractor profiles from thebluebook.com — the construction industry's largest online directory — and get structured company data including contacts, trade categories, certifications, project history, and email addresses sourced from company websites. Built for lead generation, market research, and construction intelligence workflows.

### What you get

**Company identity**
- `name` — Company name
- `company_id` — Unique Blue Book ID
- `profile_link` — Direct URL to Blue Book profile
- `city_state` — City and state from profile header
- `full_address` — Full street address (scraped from contacts sub-page)
- `website` — Company website URL
- `scrape_date` — ISO date the record was collected

**Contact data**
- `phone` — Main office phone
- `email` — Best contact email scraped from company website (domain-match prioritized over free providers)
- `contact_name` / `contact_role` / `contact_phone` — Primary key contact (promoted to top level for easy access)
- `contacts[]` — Full list of key contacts with name, role, and direct phone

**Business profile**
- `trade[]` — Trade categories (e.g. Electrical Contractors, Plumbing)
- `service_area[]` — Counties and regions serviced
- `certifications[]` — Diversity certifications: MBE, WBE, DBE, SBE, etc.
- `established` — Year founded
- `company_size` — Employee headcount range
- `annual_volume` — Annual revenue range
- `listed_since` — Year first listed on Blue Book

**Project history**
- `projects[]` — Construction projects with name, type, location, status, date, general contractor name and role

### Sample output

```json
{
  "profile_link": "https://www.thebluebook.com/iProView/1424972",
  "company_id": "1424972",
  "name": "ABC Electrical Contractors",
  "phone": "(713) 555-0100",
  "email": "info@abcelectrical.com",
  "website": "https://www.abcelectrical.com",
  "city_state": "Houston, TX",
  "full_address": "1234 Main St, Houston, TX 77002",
  "trade": ["Electrical Contractors", "Lighting Contractors"],
  "certifications": ["MBE", "DBE"],
  "service_area": ["Harris County", "Fort Bend County", "Montgomery County"],
  "established": "1998",
  "company_size": "10-24 Employees",
  "annual_volume": "$1M - $5M",
  "listed_since": "2005",
  "scrape_date": "2026-04-10",
  "contact_name": "John Smith",
  "contact_role": "Owners, Principals & Senior Executives",
  "contact_phone": "(713) 555-0101",
  "contacts": [
    { "name": "John Smith", "role": "Owners, Principals & Senior Executives", "phone": "(713) 555-0101" }
  ],
  "projects": [
    {
      "project_name": "Downtown Office Tower",
      "project_location": "Houston, TX",
      "project_type": "Commercial",
      "project_status": "Completed",
      "project_date": "Mar 2023",
      "gc_role": "General Contractor",
      "gc_name": "Turner Construction",
      "project_url": "https://www.thebluebook.com/iProView/1424972/project/..."
    }
  ]
}
````
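Records shaped like the sample above contain nested lists, which usually need flattening before they go into a spreadsheet or CRM. The sketch below shows one possible way to do that; it is not part of the Actor. The helper names `flatten_profile` and `to_csv`, the `"; "` separator, and the choice to reduce `projects` to a count are all assumptions made for illustration.

```python
import csv
import io

# Scalar fields copied as-is; names taken from the sample output above.
SCALAR_FIELDS = ["company_id", "name", "phone", "email", "website", "city_state"]
# List-of-strings fields joined into one cell (the separator is an arbitrary choice).
LIST_FIELDS = ["trade", "certifications", "service_area"]


def flatten_profile(record: dict) -> dict:
    """Turn one scraped profile into a flat dict suitable for a CSV row."""
    row = {field: record.get(field, "") for field in SCALAR_FIELDS}
    for field in LIST_FIELDS:
        row[field] = "; ".join(record.get(field) or [])
    # Nested project objects are reduced to a count in this sketch.
    row["project_count"] = len(record.get("projects") or [])
    return row


def to_csv(records: list[dict]) -> str:
    """Serialize flattened profiles to CSV text with a header row."""
    fieldnames = SCALAR_FIELDS + LIST_FIELDS + ["project_count"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow(flatten_profile(record))
    return buffer.getvalue()
```

Calling `to_csv(items)` on a list of dataset items returns CSV text that can be written to a file. Fields missing from a record become empty cells.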

### Use cases

- **Construction material suppliers** building targeted outreach lists by trade and region
- **Staffing and recruitment agencies** sourcing contractor companies with headcount and revenue data
- **Market research firms** benchmarking contractor density, certifications, and project volume by metro
- **CRM and sales teams** enriching leads with verified phone, email, address, and project history
- **Government and compliance teams** identifying MBE/WBE/DBE-certified contractors for procurement
- **Proptech and construction intelligence platforms** aggregating subcontractor data at scale

### Pricing

| Event | Price |
|-------|-------|
| Per company profile scraped | $0.001 |

**Real example:** Scraping 1,000 contractor profiles costs about **$1** in event charges.
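The arithmetic behind that example can be sketched as a small helper. It covers only the per-profile event price from the table above; any Apify platform usage described in the top-level Pricing section is not included. The function name `estimate_event_cost` is hypothetical.

```python
from decimal import Decimal

# Per-profile event price from the pricing table above.
PRICE_PER_PROFILE_USD = Decimal("0.001")


def estimate_event_cost(profile_count: int) -> Decimal:
    """Return the pay-per-event charge in USD for a number of scraped profiles."""
    if profile_count < 0:
        raise ValueError("profile_count must be non-negative")
    return PRICE_PER_PROFILE_USD * profile_count


print(estimate_event_cost(1000))  # 1.000
```

`Decimal` is used instead of `float` so that small per-event prices add up without rounding noise.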

### How it works

- **Input:** Provide one or more thebluebook.com search result URLs (e.g. electricians in Texas) or direct profile URLs
- **Pagination:** Detects total page count automatically and fetches all pages concurrently (up to 5 parallel search requests)
- **Profile extraction:** For each company, fetches the main profile + `/locations-contacts/` + `/construction-projects/` sub-pages in parallel — 3 pages per company
- **Email discovery:** Crawls the company's own website (homepage + contact/about pages) to find and rank email addresses — domain-matching emails returned first
- **Output:** Structured JSON records delivered to Apify Dataset, downloadable as JSON, CSV, or Excel — or pushed to your webhook in real time
- **Resumable:** State is saved after every page and every profile — interrupted runs pick up where they left off
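The pagination step above caps search requests at 5 in parallel. A common way to express that kind of limit is an `asyncio.Semaphore`, sketched below. This illustrates the general pattern only and is not the Actor's source code. `fetch_page` is a hypothetical stand-in that returns a placeholder string instead of making an HTTP request.

```python
import asyncio

MAX_PARALLEL_SEARCH_REQUESTS = 5  # limit stated in the list above


async def fetch_page(page_number: int) -> str:
    """Hypothetical stand-in for an HTTP request to one search results page."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"page-{page_number}"


async def fetch_all_pages(total_pages: int, fetch=fetch_page) -> list:
    """Fetch pages 1..total_pages with at most 5 requests in flight at once."""
    semaphore = asyncio.Semaphore(MAX_PARALLEL_SEARCH_REQUESTS)

    async def bounded_fetch(page_number: int):
        # Each task waits here until one of the 5 slots is free.
        async with semaphore:
            return await fetch(page_number)

    # gather() returns results in the order the coroutines were passed in.
    return await asyncio.gather(
        *(bounded_fetch(n) for n in range(1, total_pages + 1))
    )


if __name__ == "__main__":
    pages = asyncio.run(fetch_all_pages(12))
    print(len(pages))  # 12
```

All tasks are created up front, but the semaphore ensures that no more than five of them are inside `fetch` at any moment.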

### Why this scraper

- **Email included** — Most Blue Book scrapers return empty email fields. This one crawls each company's own website and returns the best email found, prioritizing domain-matched addresses over generic inboxes.
- **Three sub-pages per profile** — Contacts page (full address + key people) and projects page fetched concurrently with the main profile, not skipped.
- **Two-tier proxy strategy** — Residential proxies on search/pagination pages where bot detection is tightest; standard proxies on profile pages to minimize cost. No proxy setup needed on your end.
- **Pay-per-profile, not per run** — Idle time, retries, and failed requests don't cost you anything. You pay only for successfully scraped profiles.

### Input example

```json
{
  "startUrls": [
    { "url": "https://www.thebluebook.com/iSearch/results/tx/houston/electrical-contractors/sc/261/" },
    { "url": "https://www.thebluebook.com/iSearch/results/ca/los-angeles/general-contractors/sc/240/" }
  ],
  "maxItems": 500
}
```

To scrape a single company, pass its profile URL directly:

```json
{
  "startUrls": [
    { "url": "https://www.thebluebook.com/iProView/1424972" }
  ]
}
```

### FAQ

**Does it handle pagination automatically?**
Yes. The Actor detects the total page count from the search results page and fetches all pages without any additional input from you.

**What output formats are supported?**
JSON, CSV, Excel, XML, and JSONL — all available from the Apify Dataset UI or via API.

**How fresh is the data?**
Data is scraped live on each run. Schedule the Actor daily, weekly, or monthly via Apify's built-in scheduler to keep your dataset current.

**How does it find email addresses?**
It fetches the company's website (homepage + `/contact`, `/contact-us`, `/about` variants) and extracts emails from mailto links and page text. Domain-matched emails (e.g. `info@companysite.com`) are returned before Gmail or Yahoo addresses.
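The prioritization described in that answer can be sketched as a sort key. This is a guess at the general idea, not the Actor's actual implementation: the function names, the three-tier ordering (company domain, other domains, free providers), and every entry in `FREE_PROVIDERS` beyond Gmail and Yahoo are assumptions.

```python
from urllib.parse import urlparse

# Assumed list of free mailbox providers; extend as needed.
FREE_PROVIDERS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}


def site_domain(website: str) -> str:
    """Extract the host from a website URL, dropping a leading 'www.'."""
    host = (urlparse(website).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def rank_emails(emails: list, website: str) -> list:
    """Order emails: company-domain first, other domains next, free providers last."""
    domain = site_domain(website)

    def priority(email: str) -> int:
        email_domain = email.rsplit("@", 1)[-1].lower()
        if domain and (email_domain == domain or email_domain.endswith("." + domain)):
            return 0
        if email_domain in FREE_PROVIDERS:
            return 2
        return 1

    # sorted() is stable, so emails with equal priority keep their input order.
    return sorted(emails, key=priority)
```

With this ordering, the first element of the returned list plays the role of the single "best" address that a record's `email` field would hold.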

**Can I run it on a schedule or trigger it via webhook?**
Yes to both. Use Apify's scheduler for recurring runs, or trigger via webhook/API on any external event.

**Can I scrape a specific trade or region only?**
Yes. Build the thebluebook.com search URL for your target trade category and location, then pass it in `startUrls`.

**What if a company has no website or contacts listed?**
The `email`, `full_address`, and `contacts` fields will be empty strings or empty arrays. The core company record is always returned.
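Because those fields can come back as empty strings or empty arrays, downstream code usually needs a single check that treats both as missing. The helpers below are a minimal sketch with hypothetical names; they also treat `None` and absent keys as blank, which is an assumption beyond what the answer above states.

```python
def is_blank(value) -> bool:
    """True for None, empty or whitespace-only strings, and empty containers."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def missing_fields(record: dict, fields=("email", "full_address", "contacts")) -> list:
    """List which of the optional fields are blank or absent in a record."""
    return [field for field in fields if is_blank(record.get(field))]


def with_email(records: list) -> list:
    """Keep only records that carry a non-blank email."""
    return [record for record in records if not is_blank(record.get("email"))]
```

For example, `with_email(items)` narrows a dataset to profiles that are usable for email outreach, while `missing_fields(item)` helps report how complete each record is.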

### Use via API or MCP

Call this Actor programmatically via the [Apify API](https://docs.apify.com/api/v2) or as an **MCP server** for AI agents (Claude, ChatGPT, Cursor, and others):

```
https://mcp.apify.com/actors/fayoussef/thebluebook-scraper
```

AI agents can trigger runs, pass input, and retrieve structured output directly — no manual steps required.

### Need a custom scraper?

Need a different site, additional fields, or a managed data pipeline? Visit [automationbyexperts.com](https://automationbyexperts.com) for custom builds, retainers, and data-as-a-service.

# Actor input Schema

## `startUrls` (type: `array`):

One or more thebluebook.com URLs to scrape.

Supported formats:

- Search results page (the Actor paginates through all results): `https://www.thebluebook.com/iSearch/results/tx/houston/electrical-contractors/sc/261/`
- Legacy search URL: `https://www.thebluebook.com/search.html?region=2&class=3370&searchTerm=Plumbing+Contractors`
- Direct company profile (scrapes a single company): `https://www.thebluebook.com/iProView/1424972`

## `maxItems` (type: `integer`):

Maximum number of company profiles to scrape and save. Set to 0 or leave empty for no limit, in which case the Actor scrapes all discovered profiles.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.thebluebook.com/search.html?region=2&searchsrc=index&class=3370&searchTerm=Plumbing%20Contractors"
    }
  ],
  "maxItems": 0
}
```
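When the input is assembled in code rather than written by hand, a small builder can catch mistakes before a run is started. The sketch below follows the two fields documented above. The function name `build_input` and the host check against `ALLOWED_HOSTS` are assumptions added for illustration; the Actor may accept or reject URLs differently.

```python
import json
from urllib.parse import urlparse

# Assumed set of acceptable hosts; not taken from the Actor's own validation.
ALLOWED_HOSTS = {"thebluebook.com", "www.thebluebook.com"}


def build_input(urls: list, max_items: int = 0) -> dict:
    """Build an input object matching the schema above; 0 means no limit."""
    if not urls:
        raise ValueError("startUrls requires at least one URL")
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or parsed.hostname not in ALLOWED_HOSTS:
            raise ValueError(f"not a thebluebook.com URL: {url}")
    return {"startUrls": [{"url": url} for url in urls], "maxItems": max_items}


if __name__ == "__main__":
    example = build_input(["https://www.thebluebook.com/iProView/1424972"], 500)
    print(json.dumps(example, indent=2))
```

The returned dict can be passed as the run input to any of the clients shown in the API section.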

# Actor output Schema

## `results` (type: `string`):

thebluebook scraper

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.thebluebook.com/search.html?region=2&searchsrc=index&class=3370&searchTerm=Plumbing%20Contractors"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("fayoussef/thebluebook-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://www.thebluebook.com/search.html?region=2&searchsrc=index&class=3370&searchTerm=Plumbing%20Contractors" }] }

# Run the Actor and wait for it to finish
run = client.actor("fayoussef/thebluebook-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.thebluebook.com/search.html?region=2&searchsrc=index&class=3370&searchTerm=Plumbing%20Contractors"
    }
  ]
}' |
apify call fayoussef/thebluebook-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=fayoussef/thebluebook-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "thebluebook scraper",
        "description": "Our thebluebook.com scraper makes it simple to collect contractor profiles at scale. It automatically gathers URLs from all search pages and extracts complete details for every profile including company info, contacts, trades, certifications, and project history.",
        "version": "0.0",
        "x-build-id": "9zJbI8dVQc09PjfZA"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/fayoussef~thebluebook-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-fayoussef-thebluebook-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/fayoussef~thebluebook-scraper/runs": {
            "post": {
                "operationId": "runs-sync-fayoussef-thebluebook-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/fayoussef~thebluebook-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-fayoussef-thebluebook-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more thebluebook.com URLs to scrape.\n\nSupported formats:\n• Search results page — the actor will paginate through all results:\n  https://www.thebluebook.com/iSearch/results/tx/houston/electrical-contractors/sc/261/\n• Legacy search URL:\n  https://www.thebluebook.com/search.html?region=2&class=3370&searchTerm=Plumbing+Contractors\n• Direct company profile — scrapes a single company:\n  https://www.thebluebook.com/iProView/1424972",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of company profiles to scrape and save. Set to 0 or leave empty for no limit — the actor will scrape all discovered profiles.",
                        "default": 0
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
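For projects that cannot use a client library, the `run-sync-get-dataset-items` path in the specification above can be called with plain HTTP. The sketch below uses only the Python standard library and passes the token as the `token` query parameter, as the specification describes. The function names and the 300-second timeout are assumptions, and `run_actor` assumes the response body is JSON.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"  # from "servers" in the specification
ACTOR_PATH = "/acts/fayoussef~thebluebook-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request described by the specification."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and decode the response, assuming a JSON body."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the URL and payload easy to inspect or log (with the token redacted) before any run is started. Long scrapes may exceed a synchronous request's time limit, in which case the `/runs` path from the specification, which returns run information instead of waiting for results, is the alternative.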
