Chamber Directory Scraper pro

Pricing

from $0.00005 / actor start

Scrape business listings from Chamber of Commerce directory pages. Extracts name, category, address, phone, and website. Works with static HTML directories (Squarespace, WordPress, custom). ChamberMaster/GrowthZone sites are auto-detected and gracefully rejected.

Developer

Anh Nguyen

Maintained by Community

Chamber Directory Scraper

Extract business listings from chamber of commerce member directories. Get company names, contact details, addresses, websites, and social links in a clean, structured format.

Key Features

  • Automatic Pagination - Follows pagination links automatically when the directory exposes them, so page URLs do not need to be listed by hand
  • Smart Adapters - Detects and handles different directory platforms (static HTML, GrowthZone, ChamberMaster)
  • Clean Data - Removes duplicates, strips UTM parameters, keeps business websites separate from social profiles
  • Structured Output - Consistent JSON: name, phone (normalized when possible), email, address, optional city/state/zip, website, optional social URLs
  • Optional Directory Discovery - Paste a chamber homepage and let the actor pick a likely member-directory URL before crawling
  • Fast & Reliable - Built with Crawlee + Playwright
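The "Clean Data" behavior above (stripping UTM parameters and keeping business websites separate from social profiles) can be illustrated with a minimal sketch. The function names `stripUtm` and `classifyLink` and the `SOCIAL_HOSTS` table are hypothetical and chosen for illustration; the actor's real implementation may differ.

```typescript
// Hypothetical host table: maps a social domain to the output field it fills.
const SOCIAL_HOSTS: Record<string, string> = {
  "facebook.com": "facebook",
  "linkedin.com": "linkedin",
};

/** Remove utm_* query parameters; return the input unchanged if it is not a valid URL. */
function stripUtm(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return raw;
  }
  // Collect keys first so that deleting does not disturb the iteration.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));
  for (const key of keys) {
    if (key.toLowerCase().startsWith("utm_")) url.searchParams.delete(key);
  }
  return url.toString();
}

/** Decide whether a link belongs in a social field or in the "website" field. */
function classifyLink(raw: string): { field: string; url: string } {
  const url = stripUtm(raw);
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase().replace(/^www\./, "");
  } catch {
    return { field: "website", url };
  }
  for (const [domain, field] of Object.entries(SOCIAL_HOSTS)) {
    if (host === domain || host.endsWith("." + domain)) return { field, url };
  }
  return { field: "website", url };
}
```

For example, `classifyLink("https://www.facebook.com/example")` would be routed to the `facebook` field, while a company homepage would stay in `website`.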

Use Cases

  • Lead Generation - Build targeted B2B contact lists
  • Market Research - Analyze local business landscapes
  • Sales Prospecting - Find potential customers in specific regions
  • Data Enrichment - Supplement existing business databases

Input

{
  "startUrl": "https://www.carmichaelchamber.com/list",
  "maxPages": 5,
  "discoverDirectory": false
}

Parameters:

  • startUrl (required) - Directory listing URL, or chamber homepage if discoverDirectory is true
  • maxPages (optional, default: 10) - Maximum number of pages to crawl
  • discoverDirectory (optional, default: false) - When true, opens startUrl, scores same-site links that look like a member/directory page, and crawls the best match; if none qualify, falls back to startUrl
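The discovery behavior described for discoverDirectory (score same-site links, crawl the best match, fall back to startUrl) might look roughly like the sketch below. The keyword weights, the `minScore` threshold, and the names `scoreLink` and `pickDirectoryUrl` are assumptions made for illustration, and "same-site" is approximated here as "same origin"; the actor's actual scoring rules are not documented in this README.

```typescript
interface LinkCandidate {
  href: string; // raw href attribute, possibly relative
  text: string; // visible link text
}

// Hypothetical keyword weights (not taken from the actor's source).
const KEYWORD_WEIGHTS: Array<[RegExp, number]> = [
  [/director/i, 3], // matches "directory" and "directories"
  [/member/i, 2],
  [/business/i, 1],
];

/** Sum the weights of all keywords found in the link path or link text. */
function scoreLink(path: string, text: string): number {
  const haystack = `${path} ${text}`;
  let score = 0;
  for (const [pattern, weight] of KEYWORD_WEIGHTS) {
    if (pattern.test(haystack)) score += weight;
  }
  return score;
}

/** Return the best-scoring same-origin link, or startUrl if none reaches minScore. */
function pickDirectoryUrl(startUrl: string, links: LinkCandidate[], minScore = 2): string {
  const origin = new URL(startUrl).origin;
  let best = startUrl;
  let bestScore = 0;
  for (const link of links) {
    let resolved: URL;
    try {
      resolved = new URL(link.href, startUrl); // resolves relative hrefs
    } catch {
      continue; // skip malformed hrefs
    }
    if (resolved.origin !== origin) continue; // ignore off-site links
    const score = scoreLink(resolved.pathname, link.text);
    if (score >= minScore && score > bestScore) {
      best = resolved.toString();
      bestScore = score;
    }
  }
  return best;
}
```

Returning startUrl when nothing qualifies mirrors the documented fallback, so a run with discoverDirectory enabled still crawls something.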

Output

Each business listing may include:

{
  "name": "Biegler CPA Inc",
  "phone": "(916) 485-1040",
  "email": "info@bieglercpa.com",
  "address": "6608 Folsom Auburn Rd, Folsom, CA 95630",
  "city": "Folsom",
  "state": "CA",
  "zip": "95630",
  "website": "https://www.bieglercpa.com",
  "facebook": "https://www.facebook.com/example",
  "linkedin": "https://www.linkedin.com/company/example",
  "sourceUrl": "https://www.example-chamber.com/directory"
}

Notes:

  • city, state, and zip are parsed on a best-effort basis from US-style address lines that end in a two-letter state and ZIP code (for example "ST 12345" or "City, ST 12345"). If parsing is uncertain, only address is set.
  • US phone numbers with 10 national digits are formatted as (XXX) XXX-XXXX. Other valid numbers (7+ digits) are kept after sanitizing encoding artifacts (e.g. %20, non-breaking spaces).
  • Social fields appear when those links exist on the listing or profile page.
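The phone and address rules in the notes above can be sketched as two small functions. This is an illustrative approximation, not the actor's code: the names `normalizePhone` and `parseAddress`, the regular expressions, and the handling of a leading US country code "1" are all assumptions.

```typescript
/** Format 10-digit US numbers as (XXX) XXX-XXXX; keep other numbers with 7+ digits after cleanup. */
function normalizePhone(raw: string): string | undefined {
  // Turn literal "%20" into spaces, then collapse whitespace (\s also matches non-breaking spaces).
  const cleaned = raw.replace(/%20/g, " ").replace(/\s+/g, " ").trim();
  let digits = cleaned.replace(/\D/g, "");
  // Assumption: an 11-digit number starting with 1 carries the US country code.
  if (digits.length === 11 && digits.startsWith("1")) digits = digits.slice(1);
  if (digits.length === 10) {
    return `(${digits.slice(0, 3)}) ${digits.slice(3, 6)}-${digits.slice(6)}`;
  }
  return digits.length >= 7 ? cleaned : undefined;
}

interface AddressParts {
  address: string;
  city?: string;
  state?: string;
  zip?: string;
}

/** Best-effort split of a US-style address; falls back to the address alone. */
function parseAddress(address: string): AddressParts {
  const trimmed = address.trim();
  // "..., City, ST 12345" or "City, ST 12345-6789"
  const full = /(?:^|,)\s*([^,]+),\s*([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$/.exec(trimmed);
  if (full) {
    return { address: trimmed, city: full[1].trim(), state: full[2], zip: full[3] };
  }
  // Line that only ends in "ST 12345": the city is uncertain, so it is left unset.
  const tail = /\b([A-Z]{2})\s+(\d{5}(?:-\d{4})?)$/.exec(trimmed);
  if (tail) {
    return { address: trimmed, state: tail[1], zip: tail[2] };
  }
  return { address: trimmed };
}
```

Applied to the sample record, `parseAddress("6608 Folsom Auburn Rd, Folsom, CA 95630")` yields city "Folsom", state "CA", and zip "95630", while an address without a recognizable state and ZIP keeps only the `address` field.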

Supported Platforms

  • Static HTML directories - Traditional chamber websites
  • GrowthZone/ChamberMaster - Modern chamber management platforms
  • Squarespace directories - Including profile-style listings
  • Generic fallback - Works with many directory-like layouts

How It Works

  1. Optional discovery - If enabled, finds a directory URL from the homepage
  2. Detects directory type - Chooses an adapter pipeline
  3. Extracts listings - Pulls fields using platform-specific logic
  4. Profile enrichment - For same-site profile URLs, may open the detail page for richer data
  5. Cleans & normalizes - Phone, URLs, optional address parts, deduplication
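The deduplication part of step 5 could be approximated as follows. The choice of key (normalized name plus phone digits) and the names `dedupeKey` and `dedupe` are assumptions for this sketch; the README does not say which fields the actor actually compares.

```typescript
interface Listing {
  name: string;
  phone?: string;
  website?: string;
}

/** Build a comparison key from the lowercased, punctuation-free name and the phone digits. */
function dedupeKey(listing: Listing): string {
  const name = listing.name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  const phoneDigits = (listing.phone ?? "").replace(/\D/g, "");
  return `${name}|${phoneDigits}`;
}

/** Keep the first listing seen for each key, preserving input order. */
function dedupe(listings: Listing[]): Listing[] {
  const seen = new Map<string, Listing>();
  for (const listing of listings) {
    const key = dedupeKey(listing);
    if (!seen.has(key)) seen.set(key, listing);
  }
  return Array.from(seen.values());
}
```

Normalizing before comparing means "Biegler CPA Inc" with "(916) 485-1040" and "biegler cpa, inc." with "916-485-1040" collapse into one row.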

Tips for Best Results

  • Prefer the main directory listing URL for predictable results
  • For discoverDirectory, use the real chamber homepage; review the log line that prints the chosen URL
  • For large directories, raise maxPages above the default of 10
  • Some GrowthZone sites work best with URLs like /list/FindStartsWith?term=A
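To cover a GrowthZone directory alphabetically, one option is to generate one input per letter and run the actor once for each. The sketch below assumes that the `/list/FindStartsWith?term=A` pattern works the same way for the other letters, which this README shows only for `term=A`; `letterSearchUrls` is a hypothetical helper name.

```typescript
/** Build /list/FindStartsWith?term=X URLs for A through Z on the given site. */
function letterSearchUrls(baseUrl: string): string[] {
  const urls: string[] = [];
  for (let code = "A".charCodeAt(0); code <= "Z".charCodeAt(0); code++) {
    const url = new URL("/list/FindStartsWith", baseUrl);
    url.searchParams.set("term", String.fromCharCode(code));
    urls.push(url.toString());
  }
  return urls;
}

// One actor input per letter, using the documented input fields.
const letterInputs = letterSearchUrls("https://chamber.greensboro.org").map((startUrl) => ({
  startUrl,
  maxPages: 10,
  discoverDirectory: false,
}));
```

Each element of `letterInputs` has the same shape as the Input example, so it can be pasted into a run or passed to whatever tooling starts the actor.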

Example Runs

Carmichael Chamber (Squarespace-style):

  • Input: startUrl https://www.carmichaelchamber.com/list, maxPages as needed
  • Expect hundreds of listings, depending on how many pages are crawled

Greensboro Chamber (GrowthZone):

  • Input: startUrl https://chamber.greensboro.org/list/FindStartsWith?term=A
  • Large alphabetical result sets

For store listings, capture a few successful run logs and dataset previews (screenshots of the Apify run overview and a sample JSON row) so users can see what to expect.

Technical Details

  • Crawlee and Playwright (TypeScript)
  • Adapter pattern for extensibility
  • Automatic request retries for failed pages
  • Crawler uses bounded concurrency and request rate limits (see src/main.ts)
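The adapter pattern mentioned above can be illustrated with a small selection routine. Everything in this sketch is hypothetical: the `DirectoryAdapter` interface, the adapter names, and especially the HTML markers used for detection are assumptions, since the README does not describe how the actor recognizes each platform. A real adapter would also expose an extraction method, which is omitted here.

```typescript
interface DirectoryAdapter {
  name: string;
  /** Return true when this adapter recognizes the page. */
  matches(html: string, url: string): boolean;
}

// Fallback adapter that accepts any page.
const genericAdapter: DirectoryAdapter = {
  name: "generic",
  matches: () => true,
};

// Ordered from most specific to least specific; the detection markers are assumed.
const adapters: DirectoryAdapter[] = [
  { name: "growthzone", matches: (html) => /growthzone|chambermaster/i.test(html) },
  { name: "squarespace", matches: (html) => /squarespace/i.test(html) },
  genericAdapter,
];

/** Pick the first adapter whose matches() accepts the page. */
function selectAdapter(html: string, url: string): DirectoryAdapter {
  return adapters.find((adapter) => adapter.matches(html, url)) ?? genericAdapter;
}
```

Keeping detection behind a single `matches` method means a new platform can be supported by adding one entry to the list without touching the selection logic.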

Notes

  • Data quality depends on the source HTML
  • Some chamber directories require a login; runs against them may return few or no rows
  • Listings need at least one of: phone, email, physical address, or business website
  • Website live checks (HTTP ping) are intentionally not part of this actor to keep runs fast and avoid false negatives
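The rule that a listing needs at least one of phone, email, physical address, or business website can be expressed as a simple predicate. The field names follow the Output example; the function name `hasContactInfo` is a hypothetical one chosen for this sketch.

```typescript
interface ListingRecord {
  name: string;
  phone?: string;
  email?: string;
  address?: string;
  website?: string;
  facebook?: string;
  linkedin?: string;
}

/** True when at least one of phone, email, address, or website is a non-empty string. */
function hasContactInfo(listing: ListingRecord): boolean {
  return [listing.phone, listing.email, listing.address, listing.website].some(
    (value) => typeof value === "string" && value.trim().length > 0,
  );
}
```

Under this reading, a listing with only a name and a Facebook link would be dropped, because social URLs are not among the four qualifying fields.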

Troubleshooting

No results returned?

  • Confirm the URL is a public listing (not behind login)
  • Try discoverDirectory if you pasted a homepage by mistake
  • For GrowthZone, try the /list/FindStartsWith?term=A pattern

Fewer results than expected?

  • Increase maxPages
  • Check logs for pagination and adapter selection

Support

For issues or feature requests, contact the actor maintainer.


Last updated: 2026-04-11