# Yellow Pages Scraper — US Business Leads | from $1.50/1K (`bovi/yellowpages-scraper`) Actor

Scrape Yellow Pages US business listings — name, phone, address, website, categories, rating. Bulk lead-gen by search term + location. Dual address parsing, organic/ad flag, clean website URLs. Each record has parse\_confidence.

- **URL**: https://apify.com/bovi/yellowpages-scraper.md
- **Developed by:** [Vitalii Bondarev](https://apify.com/bovi) (community)
- **Categories:** Lead generation, Marketing, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Yellow Pages Scraper — US Local Business Directory

**Pay-per-result: $1.50/1K businesses scraped. Phone numbers and ratings included. No proxy, no monthly fee.**

Yellow Pages still indexes 20 million+ US local businesses. This scraper gives you clean, structured leads — name, phone, address, website, categories, rating — in seconds. Pay only for results you get.

Scrape **yellowpages.com** search results into clean, lead-gen-ready JSON.
Extract business name, phone, address, website, categories, rating, review count,
and years in business — with a `parse_confidence` score on every record.

### What you get

| Field | Description |
|---|---|
| `business_name` | Full business name |
| `phone` | Primary phone number |
| `street` | Street address |
| `city` | City |
| `state` | State (2-letter code) |
| `zip` | ZIP code |
| `website` | Business website URL (external, not a YP redirect) |
| `categories` | List of YP business categories |
| `rating` | Star rating (1.0–5.0, half-star precision) |
| `review_count` | Number of reviews |
| `years_in_business` | Years in business (from YP badge) |
| `yp_url` | Canonical Yellow Pages listing URL |
| `is_ad` | Whether this is a sponsored listing |
| `parse_confidence` | Parse quality score (1.0 = perfect) |
| `warnings` | List of machine-readable quality codes |

### Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchTerms` | string | `"plumbers"` | Business type or keyword. Required. |
| `location` | string | `"Austin, TX"` | City, state, or ZIP. Required. |
| `maxResults` | integer | `30` | Max records to return (1–1000) |
| `includeAds` | boolean | `false` | Include sponsored listings |
| `startPage` | integer | `1` | Start page, 1–100 (for resuming runs) |
| `proxyConfiguration` | object | Apify Residential proxy | Proxy used to fetch Yellow Pages; billed to the run owner |

### Why this scraper is different

Most Yellow Pages scrapers on the market break within weeks because they anchor
on generated CSS class names that change with every site deploy. This scraper uses
**structural HTML anchors** (`div.v-card`, `a.business-name`, `p.adr`, and the
class words on `div.result-rating`) that are tied to semantic meaning rather than
generated names. When YP updates its CSS, this parser keeps working.

Every record ships with a `parse_confidence` score (0.0–1.0). A score below 0.7 is a machine-readable signal that the page structure has drifted, so your data pipeline can filter those records automatically.
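As a rough illustration of that filtering step, the sketch below splits records into trusted and suspect lists. The function name `split_by_confidence` and the second sample record are hypothetical; only the `parse_confidence` field and the 0.7 threshold come from this README.

```python
CONFIDENCE_THRESHOLD = 0.7  # threshold quoted in this README; tune for your pipeline


def split_by_confidence(records, threshold=CONFIDENCE_THRESHOLD):
    """Return (trusted, suspect) lists based on each record's parse_confidence."""
    trusted, suspect = [], []
    for record in records:
        # A missing score is treated as 0.0 so malformed records are never trusted.
        score = record.get("parse_confidence", 0.0)
        (trusted if score >= threshold else suspect).append(record)
    return trusted, suspect


sample = [
    {"business_name": "Austin Plumbing Co", "parse_confidence": 1.0},
    {"business_name": "Hypothetical Drain Pros", "parse_confidence": 0.5},  # made-up record
]
trusted, suspect = split_by_confidence(sample)
print(len(trusted), len(suspect))  # 1 1
```

Keeping the suspect records instead of dropping them makes it possible to review them later or to alert when their share grows.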

| | This Actor | Generic YP scrapers |
|---|---|---|
| Parse method | Structural HTML anchors (resilient) | CSS class names (breaks on redeploy) |
| `parse_confidence` | Yes, per record | No |
| Address parsing | Dual-pattern (service-area + physical) | Often breaks on service-area listings |
| Proxy required | No | Often yes |
| `is_ad` flag | Yes (filter sponsored) | Rarely |
| `years_in_business` | Yes (unique YP data point) | Rarely |
| Price | $1.50/1K | $2-5/1K |

### Use with AI agents (MCP)

This scraper is callable as a **tool by AI agents** (Claude Desktop, Cursor, VS Code, n8n, LangGraph, CrewAI, or any MCP-compatible client) via Apify's hosted Model Context Protocol server. Any AI agent can look up US business contacts mid-conversation.

Point your MCP client at this tool:

```json
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.apify.com/?tools=bovi/yellowpages-scraper",
        "--header",
        "Authorization: Bearer <YOUR_APIFY_TOKEN>"
      ]
    }
  }
}
````

### Pricing example

Pay-per-result: **$1.50 per 1,000 business records** ($0.0015/record). You only pay for actual results scraped — no monthly fee, no minimum.

| Run size | Cost |
|---|---|
| 100 businesses | $0.15 |
| 500 businesses | $0.75 |
| 1,000 businesses | $1.50 |
| 5,000 businesses | $7.50 |
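The table follows from simple arithmetic. Here is a minimal sketch, assuming the $1.50 per 1,000 records rate quoted above applies to your run; the helper name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_1K_USD = Decimal("1.50")  # rate quoted in this README


def estimate_cost_usd(record_count: int) -> Decimal:
    """Estimate the pay-per-result charge for a given number of records."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    # Decimal avoids the rounding noise binary floats add to currency math.
    return (PRICE_PER_1K_USD * record_count / 1000).quantize(Decimal("0.0001"))


for count in (100, 500, 1000, 5000):
    print(f"{count} businesses: ${estimate_cost_usd(count):.2f}")
```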

### Output sample

```json
{
  "business_name": "Austin Plumbing Co",
  "phone": "(512) 555-0147",
  "street": "1423 S Lamar Blvd",
  "city": "Austin",
  "state": "TX",
  "zip": "78704",
  "website": "https://austinplumbing.com",
  "categories": ["Plumbers", "Drain Cleaning"],
  "rating": 4.5,
  "review_count": 87,
  "years_in_business": 12,
  "yp_url": "https://www.yellowpages.com/austin-tx/mip/austin-plumbing-co-123456",
  "is_ad": false,
  "parse_confidence": 1.0,
  "warnings": []
}
```
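Downstream tools often want a single address line. The sketch below joins whichever address fields are present; it assumes that a service-area listing carries `None` or an empty string in `street`, which this README does not specify. The helper name `format_address` is hypothetical.

```python
def format_address(record):
    """Join the address fields that are present into one display line."""
    street = (record.get("street") or "").strip()
    city = (record.get("city") or "").strip()
    state = (record.get("state") or "").strip()
    zip_code = (record.get("zip") or "").strip()

    # "City, ST" first, then append the ZIP code if there is one.
    locality = ", ".join(part for part in (city, state) if part)
    if zip_code:
        locality = f"{locality} {zip_code}".strip()
    return ", ".join(part for part in (street, locality) if part)


physical = {"street": "1423 S Lamar Blvd", "city": "Austin", "state": "TX", "zip": "78704"}
service_area = {"street": None, "city": "Austin", "state": "TX", "zip": None}  # assumed shape
print(format_address(physical))      # 1423 S Lamar Blvd, Austin, TX 78704
print(format_address(service_area))  # Austin, TX
```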

### FAQ

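**Do I need a proxy or API key?** No external proxy account or Yellow Pages API key is required. Yellow Pages serves plain HTML without heavy bot protection. By default the Actor fetches pages through Apify Residential proxy to avoid transient blocks, and that usage is billed to the run owner.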

**What export formats are available?** JSON, CSV, Excel, and XML — downloadable from the dataset page or via the Apify REST API.

**Can I schedule regular runs?** Yes. Use Apify Scheduler (or n8n/Zapier) to run on a schedule and push new records to Google Sheets, a CRM, or a webhook.

**What if the Actor returns empty results?** Confirm that `searchTerms` and `location` match Yellow Pages conventions (e.g. "Austin, TX", not "Austin Texas"). YP returns an empty page when there are no results for that exact search pair.

### Use cases

- **Lead generation**: build targeted prospect lists (plumbers in Austin, dentists in Chicago)
- **Local SEO research**: audit competitors' listings, ratings, and categories
- **Market research**: map service providers in a region
- **Data enrichment**: match phone/address data to existing business lists
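For the data-enrichment case, matching usually starts with a normalized phone number. The following is a minimal sketch, not part of the Actor: the helpers `normalize_us_phone` and `match_by_phone` are hypothetical, and it assumes your existing list also stores numbers under a `phone` key.

```python
import re


def normalize_us_phone(raw):
    """Return a 10-digit string, or None if the input does not look like a US number."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


def match_by_phone(scraped, existing):
    """Pair scraped records with existing rows that share a normalized phone."""
    index = {}
    for row in existing:
        key = normalize_us_phone(row.get("phone"))
        if key:
            index[key] = row
    matches = []
    for record in scraped:
        key = normalize_us_phone(record.get("phone"))
        if key in index:
            matches.append((record, index[key]))
    return matches


print(normalize_us_phone("(512) 555-0147"))  # 5125550147
```

Records without a usable phone number are skipped, so address-based matching would need a separate pass.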

### Technical notes

- No external proxy required for most runs: YP serves plain HTML without heavy anti-bot protection
- Handles both service-area listings (no street address) and physical locations
- Filters YP redirect links to return the actual business website URL
- Pagination via `?page=N` — 30 results per page
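To gauge how many pages a run touches, the page math can be sketched as below. The helper `pages_needed` is hypothetical, and the result is a lower bound: with `includeAds` set to `false`, a page may yield fewer than 30 records after sponsored listings are filtered out.

```python
import math

RESULTS_PER_PAGE = 30  # page size stated in the notes above


def pages_needed(max_results, start_page=1):
    """Return the page numbers needed to collect max_results records."""
    if max_results < 1 or start_page < 1:
        raise ValueError("max_results and start_page must be at least 1")
    page_count = math.ceil(max_results / RESULTS_PER_PAGE)
    return list(range(start_page, start_page + page_count))


print(pages_needed(30))      # [1]
print(pages_needed(100))     # [1, 2, 3, 4]
print(pages_needed(45, 3))   # [3, 4]
```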

### Integrations

Built for lead-gen teams and local-market researchers extracting US business contacts and ratings by category and city — the JSON/dataset output drops into the tools you already run, no glue code:

- **n8n / Make / Zapier** — trigger a run or pipe every new dataset item into 500+ apps (Google Sheets, Airtable, Slack, HubSpot, your database) with no code: [n8n](https://docs.apify.com/platform/integrations/n8n), [Make](https://docs.apify.com/platform/integrations/make), [Zapier](https://docs.apify.com/platform/integrations/zapier).
- **Webhooks** — fire your own endpoint the moment a run finishes, to push results straight into your pipeline ([docs](https://docs.apify.com/platform/integrations/webhooks)).
- **MCP server** — expose this actor as a tool to Claude, Cursor, or any [MCP client](https://mcp.apify.com) so an AI agent can pull this data mid-conversation ([guide](https://blog.apify.com/how-to-use-mcp/)).
- **API & SDKs** — fetch the dataset as JSON, CSV, or Excel through the Apify REST API or the Python / JS SDKs.

See all [Apify integrations](https://apify.com/integrations).

# Actor input Schema

## `searchTerms` (type: `string`):

Business type, keyword, or category to search. Example: 'plumbers', 'dentists', 'restaurants', 'auto repair'.

## `location` (type: `string`):

City + state or ZIP code to search in. Example: 'Austin, TX', 'New York, NY', '90210'.

## `maxResults` (type: `integer`):

Maximum number of business records to return. Each returned record costs one PPE charge ($1.50/1K). Yellow Pages shows up to 30 per page; set higher to paginate. Default: 30.

## `includeAds` (type: `boolean`):

When true, sponsored (ad) listings are included alongside organic results. Default: false (organic only).

## `startPage` (type: `integer`):

Page number to start scraping from (1 = first page). Useful for resuming interrupted runs or sampling mid-list. Default: 1.

## `proxyConfiguration` (type: `object`):

Proxy used to fetch Yellow Pages. Defaults to Apify Residential proxy — helps avoid transient blocks. Billed to the run owner; no external proxy account needed.

## Actor input object example

```json
{
  "searchTerms": "plumbers",
  "location": "Austin, TX",
  "maxResults": 30,
  "includeAds": false,
  "startPage": 1,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# Actor output Schema

## `results` (type: `string`):

Dataset containing Yellow Pages Scraper records (`business_name`, `phone`, `street`, `city`, `state`, `zip`, `website`, `categories`, `rating`, `review_count`, `years_in_business`, `yp_url`, `is_ad`, `parse_confidence`, `warnings`).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchTerms": "plumbers",
    "location": "Austin, TX",
    "maxResults": 30,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("bovi/yellowpages-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchTerms": "plumbers",
    "location": "Austin, TX",
    "maxResults": 30,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("bovi/yellowpages-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchTerms": "plumbers",
  "location": "Austin, TX",
  "maxResults": 30,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call bovi/yellowpages-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=bovi/yellowpages-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Yellow Pages Scraper — US Business Leads | from $1.50/1K",
        "description": "Scrape Yellow Pages US business listings — name, phone, address, website, categories, rating. Bulk lead-gen by search term + location. Dual address parsing, organic/ad flag, clean website URLs. Each record has parse_confidence.",
        "version": "0.1",
        "x-build-id": "QPmtrmyTrAg027Ff8"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/bovi~yellowpages-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-bovi-yellowpages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/bovi~yellowpages-scraper/runs": {
            "post": {
                "operationId": "runs-sync-bovi-yellowpages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/bovi~yellowpages-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-bovi-yellowpages-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "searchTerms",
                    "location"
                ],
                "properties": {
                    "searchTerms": {
                        "title": "Search terms",
                        "type": "string",
                        "description": "Business type, keyword, or category to search. Example: 'plumbers', 'dentists', 'restaurants', 'auto repair'."
                    },
                    "location": {
                        "title": "Location",
                        "type": "string",
                        "description": "City + state or ZIP code to search in. Example: 'Austin, TX', 'New York, NY', '90210'."
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of business records to return. Each returned record costs one PPE charge ($1.50/1K). Yellow Pages shows up to 30 per page; set higher to paginate. Default: 30.",
                        "default": 30
                    },
                    "includeAds": {
                        "title": "Include sponsored/ad listings",
                        "type": "boolean",
                        "description": "When true, sponsored (ad) listings are included alongside organic results. Default: false (organic only).",
                        "default": false
                    },
                    "startPage": {
                        "title": "Start page",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Page number to start scraping from (1 = first page). Useful for resuming interrupted runs or sampling mid-list. Default: 1.",
                        "default": 1
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy used to fetch Yellow Pages. Defaults to Apify Residential proxy — helps avoid transient blocks. Billed to the run owner; no external proxy account needed.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
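If you prefer not to install a client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called with the Python standard library alone. This is a sketch under assumptions: the helper names are hypothetical, and since the specification does not describe the response body, the code assumes it is a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/bovi~yellowpages-scraper/run-sync-get-dataset-items"


def build_run_request(token, actor_input):
    """Build the POST request: token as a query parameter, input as a JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token, actor_input, timeout_secs=300):
    """Send the request and decode the response (assumed to be a JSON array)."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout_secs) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"searchTerms": "plumbers", "location": "Austin, TX", "maxResults": 30},
)
print(request.get_method(), request.full_url)
```

Call `run_and_fetch` with a real token to execute the run. The synchronous endpoint holds the connection open until the Actor finishes, so long runs may be better served by the `/runs` endpoint followed by a separate dataset fetch.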
