Lawyer.com Directory Scraper

Pricing

Pay per event

Extract Lawyer.com attorney and law firm listings with profile URLs, phones, addresses, firms, practice areas, and pagination source data.

Rating

0.0

(0)

Developer

Stas Persiianenko

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

Overview

This actor extracts law firm and lawyer directory entries from Lawyer.com for a given search term and location, or from direct listing URLs. It reads public directory pages first, falls back to Lawyer.com autosuggest endpoints when no directory page matches, builds profile URLs, and enriches each profile with contact details where available.

What does it do

Lawyer.com Directory Scraper turns public Lawyer.com directory pages into structured lead and research data. It can start from a practice area and location, such as bankruptcy in New York, NY, or from one or more direct Lawyer.com listing URLs. For supported practice/location combinations, the actor follows listing pagination, extracts the visible lawyer cards, and returns clean profile links, firm names, phone numbers, addresses, practice areas, and source metadata.

When a search term cannot be mapped to a public directory page, the actor falls back to Lawyer.com autosuggest/profile discovery. This keeps common prospecting runs useful while still allowing exact listing URL runs for repeatable coverage.

Why use it

Use this actor when you need a repeatable legal-directory extraction workflow instead of manually copying Lawyer.com search results. It is built for directory-scale collection, CRM enrichment, market research, and local legal services analysis.

The actor is HTTP-first and does not open a browser, so it is fast and cost-efficient for supported public pages. It also exposes direct listing URL input, which lets analysts run large batches across multiple cities or practice categories with deterministic source pages.

Who is it for

  • Legal services agencies building local lead lists.
  • Go-to-market teams collecting attorney contact information.
  • Sales teams validating CRM contacts for legal firms.
  • Researchers tracking directories in a specific geography.

Core use cases

  • Collecting names, types, and profile links for lawyers/firms.
  • Pulling public contact details for outreach workflows.
  • Enriching prospecting pipelines by location or practice niche.
  • Auditing the presence of a firm or lawyer in specific markets.

Input

Required inputs

  • searchQuery (string, required unless listingUrls is provided)

    • Keywords for the directory search.
    • Examples: bankruptcy, family law, Johnson legal services.
  • locationQuery (string, required unless listingUrls is provided)

    • Location string that can be resolved by Lawyer.com geosuggest.
    • Examples: New York, 90210, Austin TX.
Optional controls

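  • listingUrls (array of strings)

    • Direct Lawyer.com listing URLs to scrape instead of a search.
    • Example: https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm.
  • maxResults (number)

    • Maximum number of profile entries to attempt.
    • Default: 50.
    • For cost control, start with 20 during initial tests.
  • maxPages (number)

    • Limits how many listing pages are followed during pagination, as used in the API examples (maxPages: 2).
  • maxRequestRetries (number)

    • Retry limit for failed HTTP calls.
    • Default: 3.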

Input validation behavior

  • Provide either listingUrls or both searchQuery and locationQuery.
  • Direct Lawyer.com listing URLs are scraped first and support pagination.
  • When listing URLs are omitted, the actor builds a Lawyer.com directory URL from common practice/location terms and falls back to autosuggest discovery.
  • Empty queries are rejected before web traffic begins.
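The validation rules above can be sketched as a small pre-flight check that you run before starting the actor. This is only an illustration: validate_input is a hypothetical helper, and the actor's real validation code is not shown in this README.

```python
from typing import Any


def validate_input(actor_input: dict[str, Any]) -> str:
    """Return the mode a run would use: 'listing' or 'search'.

    Hypothetical helper that mirrors the documented rules only.
    """
    listing_urls = [u for u in actor_input.get("listingUrls") or [] if str(u).strip()]
    search = str(actor_input.get("searchQuery") or "").strip()
    location = str(actor_input.get("locationQuery") or "").strip()

    if listing_urls:
        return "listing"  # direct listing URLs are scraped first
    if search and location:
        return "search"  # directory URL lookup, then autosuggest fallback
    # Empty or whitespace-only queries are rejected before any request is made.
    raise ValueError("Provide listingUrls or both searchQuery and locationQuery")
```

Running a check like this locally avoids paying the start charge for a run that would be rejected anyway.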

Output

Each dataset item has these fields:

  • name — Lawyer or firm display name.
  • firmName — Firm or office name shown on the directory card.
  • profileUrl — Public profile URL.
  • firmUrl — Public firm profile URL when shown.
  • profileType — lawyer or office (inferred from source payload).
  • sourceLocation — Source location string returned by search response.
  • city — City parsed from the listing address.
  • distanceMiles — Distance text/number from geolocation source.
  • phone — Telephone number if available.
  • address — Address fields merged from profile payload and fallback selectors.
  • description — Biography or summary text if present.
  • practiceAreas — Practice area array (normalized from structured and page sources).
  • website — External website URL if present.
  • profileImageUrl — Profile image URL when available.
  • state — State extracted from location metadata.
  • countryCode — Country code from location metadata.
  • sourceListingUrl — Directory page where the record was found.
  • pageNumber — Pagination page visited for the record.
  • positionOnPage — Position on that source page.
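For typed downstream code, the field list above can be written as a Python TypedDict. The types are assumptions inferred from the field descriptions and the example row (for instance, distanceMiles is described as text or number), and LawyerRecord and missing_keys are hypothetical names, not part of the actor.

```python
from typing import TypedDict, Union


class LawyerRecord(TypedDict):
    """One dataset item; the types are assumed, not confirmed by the actor."""

    name: str
    firmName: str
    profileUrl: str
    firmUrl: str
    profileType: str  # "lawyer" or "office"
    sourceLocation: str
    city: str
    distanceMiles: Union[float, str]
    phone: str
    address: str
    description: str
    practiceAreas: list[str]
    website: str
    profileImageUrl: str
    state: str
    countryCode: str
    sourceListingUrl: str
    pageNumber: int
    positionOnPage: int


def missing_keys(row: dict) -> list[str]:
    """List documented field names that are absent from a row."""
    return [key for key in LawyerRecord.__annotations__ if key not in row]
```

A call such as missing_keys(item) is a cheap way to notice when a page redesign stops a field from being emitted at all.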

Output dataset / examples

Example output row

{
  "name": "Harbor Family Law",
  "firmName": "Harbor Family Law PLLC",
  "profileType": "office",
  "profileUrl": "https://www.lawyer.com/firm/harbor-family-law",
  "firmUrl": "https://www.lawyer.com/firms/harbor-family-law-pllc.html",
  "sourceLocation": "New York, NY",
  "city": "New York",
  "distanceMiles": 0,
  "phone": "+1 555-123-4567",
  "address": "12 Legal Ave, New York, NY 10001",
  "description": "Family law attorneys focused on divorce and custody.",
  "practiceAreas": ["Family Law", "Divorce"],
  "website": "https://examplelaw.com",
  "profileImageUrl": "https://www.lawyer.com/img/logo.png",
  "state": "NY",
  "countryCode": "US",
  "sourceListingUrl": "https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm",
  "pageNumber": 1,
  "positionOnPage": 3
}

Notes on missing fields

Fields are best-effort extracted. If a source page does not expose a value, this actor may return an empty string or empty array. Rows without address are still useful for follow-up enrichment and can be filtered downstream.
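Downstream filtering of incomplete rows can look like the sketch below. It assumes missing values arrive as an empty string, an empty array, or an absent key, as described above; split_by_field is a hypothetical helper name.

```python
def split_by_field(rows, field):
    """Separate rows that have a non-empty value for `field` from those that do not."""
    present, missing = [], []
    for row in rows:
        value = row.get(field)
        # Treat None, "" and [] as missing; a numeric 0 still counts as present.
        if value not in (None, "", []):
            present.append(row)
        else:
            missing.append(row)
    return present, missing


rows = [
    {"name": "Harbor Family Law", "address": "12 Legal Ave, New York, NY 10001"},
    {"name": "Example Attorney", "address": ""},
]
with_address, needs_enrichment = split_by_field(rows, "address")
```

Rows in needs_enrichment can then be routed to a follow-up enrichment step instead of being discarded.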

Example input

{
  "searchQuery": "bankruptcy",
  "locationQuery": "New York",
  "maxResults": 20,
  "maxRequestRetries": 3
}

Example Run API usage

Node.js (apify-client)

import { ApifyClient } from 'apify-client';

// Read the token from the environment instead of hardcoding it.
const client = new ApifyClient({ token: process.env.APIFY_API_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/lawyer-com-directory-scraper').call({
    searchQuery: 'bankruptcy',
    locationQuery: 'New York, NY',
    listingUrls: ['https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm'],
    maxResults: 20,
    maxPages: 2,
    maxRequestRetries: 3,
});
console.log('Run ID:', run.id);

// Fetch the scraped records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log('Dataset items:', items.length);

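Python (apify-client)

import os

from apify_client import ApifyClient

# Read the token from the environment instead of hardcoding it.
client = ApifyClient(os.environ['APIFY_API_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/lawyer-com-directory-scraper').call(run_input={
    'searchQuery': 'bankruptcy',
    'locationQuery': 'New York, NY',
    'listingUrls': ['https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm'],
    'maxResults': 20,
    'maxPages': 2,
    'maxRequestRetries': 3,
})
print('Run ID:', run['id'])

# Fetch the scraped records from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print('Dataset items:', len(items))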

cURL

# The request body is the actor input itself, not wrapped in an "input" key.
curl -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $APIFY_API_TOKEN" \
  -d '{"searchQuery":"bankruptcy","locationQuery":"New York, NY","listingUrls":["https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm"],"maxResults":20,"maxPages":2,"maxRequestRetries":3}' \
  "https://api.apify.com/v2/acts/automation-lab~lawyer-com-directory-scraper/runs"

MCP integration

Use Apify's hosted MCP endpoint for this actor:

https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper

The actor maps naturally to a simple MCP tool configuration.

  • Expose one tool: run_lawyer_directory_scraper.
  • Input schema should mirror the actor input fields.
  • Return value should be the dataset item array from the run.
  • Keep maxResults small during smoke test loops for deterministic cost behavior.

Claude Desktop MCP setup

{
  "mcpServers": {
    "lawyer-scraper": {
      "url": "https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper",
      "headers": {
        "Authorization": "Bearer APIFY_API_TOKEN"
      }
    }
  }
}

Claude Code MCP setup

claude mcp add --transport http lawyer-scraper "https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper"

Cursor and VS Code MCP setup

Add a remote MCP server with this URL and an Authorization: Bearer APIFY_API_TOKEN header:

https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper

MCP usage prompts

Example prompt:

Find 20 bankruptcy lawyers in New York and return entries with phone and website fields only.

Example response-format instruction:

Provide the extracted list in CSV with columns name, profileUrl, phone, website, practiceAreas, and address.
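The same CSV shape can also be produced outside an assistant, directly from dataset items. The sketch below uses only the Python standard library; rows_to_csv is a hypothetical helper, and joining practiceAreas with "; " is an arbitrary choice for illustration.

```python
import csv
import io

CSV_COLUMNS = ["name", "profileUrl", "phone", "website", "practiceAreas", "address"]


def rows_to_csv(rows):
    """Render dataset items as CSV text with the columns listed above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = {column: row.get(column, "") for column in CSV_COLUMNS}
        # practiceAreas is an array in the dataset, so flatten it into one cell.
        flat["practiceAreas"] = "; ".join(row.get("practiceAreas") or [])
        writer.writerow(flat)
    return buffer.getvalue()
```

Fields that contain commas, such as addresses, are quoted automatically by the csv module.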

Suggested JSON tool-call contract

{
  "tool": "run_lawyer_directory_scraper",
  "input": {
    "searchQuery": "bankruptcy",
    "locationQuery": "New York",
    "maxResults": 20,
    "maxRequestRetries": 3
  }
}

Pricing

  • One-time start charge: $0.005 per actor run.
  • Item charges are metered per extracted record using tiered pay-per-event pricing.
  • Free tier example: 25 extracted items cost about $0.0338 before platform plan allowances.
  • Bronze tier example: 100 extracted items cost about $0.105 at the configured listing price.
  • Use smaller maxResults values while iterating on queries to keep costs predictable.
  • If your pipeline retries, reuse the output of earlier runs on unchanged inputs instead of paying for a new run.
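The examples above can be turned into a rough cost estimator. The per-item rates below are not quoted anywhere in this listing: they are derived from the two example totals under the assumption that each total already includes the $0.005 start charge. Check them against the current price on the actor's pricing tab before relying on them.

```python
START_CHARGE_USD = 0.005  # one-time start charge from the pricing list


def implied_item_price(total_usd: float, items: int) -> float:
    """Per-item price implied by a quoted total (assumes it includes the start charge)."""
    return (total_usd - START_CHARGE_USD) / items


def estimate_run_cost(items: int, item_price_usd: float) -> float:
    """Estimated charge for one run that extracts `items` records."""
    return START_CHARGE_USD + items * item_price_usd


free_price = implied_item_price(0.0338, 25)     # about 0.001152 USD per item
bronze_price = implied_item_price(0.105, 100)   # about 0.001 USD per item
smoke_test_cost = estimate_run_cost(20, bronze_price)  # about 0.025 USD
```

Under these assumptions, a 20-item smoke test stays in the range of a few cents even after several retries.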

Integrations and workflow patterns

  • Send extracted lawyers into a CRM as new local legal-services leads.
  • Join output with a website enrichment actor to validate external domains.
  • Use Apify webhooks to notify a downstream workflow when a city/practice scrape finishes.
  • Run multiple direct listingUrls in one actor call when you need larger practice-area coverage.
  • Trigger the actor from Apify MCP to let an AI assistant gather public lawyer directory data during research.

API behavior and reliability

Network and scraping strategy

The actor uses server-rendered directory pages first, follows listing pagination, and uses endpoint-based discovery as a fallback. It is intentionally HTTP-first and does not use browser rendering.

Rate and retry behavior

  • Request retries are controlled by maxRequestRetries.
  • HTTP failures can still happen on anti-bot-protected pages.
  • Unreachable or malformed profiles are skipped without failing the run.
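The retry-then-skip pattern described above can be sketched generically as follows. This is not the actor's implementation: fetch_with_retries, the exponential backoff, and the 0.5 second base delay are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    max_request_retries: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch`, retrying on failure; return None if every attempt fails."""
    for attempt in range(max_request_retries + 1):  # first try plus retries
        try:
            return fetch()
        except Exception:
            if attempt == max_request_retries:
                return None  # caller skips this profile and the run continues
            sleep(base_delay_s * 2 ** attempt)  # exponential backoff (assumed)
    return None
```

Returning None instead of raising lets the caller drop one bad profile while the rest of the batch proceeds.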

Why some rows have missing data

  • Not every profile exposes all fields.
  • Some entries can be placeholders with no public profile details.
  • Structured data may be inconsistent across profile types.

Limitations

  • This actor depends on public Lawyer.com pages and their current JSON/HTML structure.
  • Anti-bot changes or HTML redesigns can reduce extraction completeness.
  • If the directory API changes path/query semantics, field availability may shift.
  • No CAPTCHA solving is implemented in this actor.

Legality and permissions

The actor is intended for lawful data collection and testing. Users should:

  • Verify whether their use case complies with Lawyer.com terms.
  • Respect applicable data use and privacy rules.
  • Avoid collecting personal data for unauthorized purposes.

Troubleshooting

No results returned

  • Check if searchQuery is specific enough.
  • Verify that locationQuery resolves to a known location.
  • Try a larger city or state instead of a narrow ZIP.

Run succeeds but dataset is empty

  • The query may not match public entries.
  • Location resolution can drift for ambiguous names.
  • Profile enrichment endpoints may be temporarily unavailable.

Sudden extraction drops

  • Lawyer.com can throttle requests at the source.
  • Retry with lower maxResults and higher maxRequestRetries.
  • Compare multiple test runs and only keep stable result patterns.
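One way to compare test runs is to keep only the profiles that appear in every run. The sketch below assumes profileUrl is a stable identifier for a record, which this listing does not state explicitly; stable_profile_urls is a hypothetical helper.

```python
def stable_profile_urls(runs):
    """Return the profile URLs that appear in every run's item list."""
    url_sets = [
        {row["profileUrl"] for row in rows if row.get("profileUrl")}
        for rows in runs
    ]
    return set.intersection(*url_sets) if url_sets else set()


run_a = [{"profileUrl": "https://www.lawyer.com/firm/harbor-family-law"},
         {"profileUrl": "https://www.lawyer.com/firm/example-one"}]
run_b = [{"profileUrl": "https://www.lawyer.com/firm/harbor-family-law"}]
stable = stable_profile_urls([run_a, run_b])
```

Profiles that show up in only some runs are candidates for a retry rather than for immediate import.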

FAQ

Does this return only lawyers?

It returns both lawyers and office/firm entries. profileType helps separate them downstream.
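A minimal sketch of that downstream split, assuming profileType holds "lawyer" or "office" as documented and grouping anything else under a hypothetical "unknown" key:

```python
from collections import defaultdict


def group_by_profile_type(rows):
    """Group dataset items by their profileType value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("profileType") or "unknown"].append(row)
    return dict(groups)


groups = group_by_profile_type([
    {"name": "Harbor Family Law", "profileType": "office"},
    {"name": "Example Attorney", "profileType": "lawyer"},
    {"name": "Placeholder Entry"},
])
```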

Can I get complete pagination?

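For direct listing URLs and mapped directory pages, the actor follows listing pagination. In the autosuggest fallback, the source payload determines which suggestions are available; if upstream results are limited, follow-up runs with tuned location and query terms are recommended.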

Can I set memory options?

Yes. Default run memory is set to 256 MB for baseline runs. The platform may allow overriding run memory per invocation.

Does this work with large query volumes?

Yes, but for large sweeps consider smaller batches and strict filtering to protect cost.
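A large sweep over direct listing URLs can be split into smaller run inputs along these lines. build_batches and the batch size of 5 are illustrative assumptions, and whether maxResults applies per run or per listing URL is not stated here, so treat the cap as approximate.

```python
def build_batches(listing_urls, batch_size=5, max_results=20):
    """Yield one run input per batch of listing URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(listing_urls), batch_size):
        yield {
            "listingUrls": listing_urls[start:start + batch_size],
            "maxResults": max_results,
        }


# Placeholder URLs for illustration only; use real Lawyer.com listing URLs.
urls = [f"https://www.lawyer.com/placeholder-listing-{i}.htm" for i in range(12)]
batches = list(build_batches(urls))
```

Each yielded dictionary can be passed as the run input of a separate actor call, so one failed batch does not invalidate the rest of the sweep.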

Changelog notes

  • Initial release: directory search + profile enrichment and dataset output.
  • Added profile-type distinction (lawyer vs office).
  • Added local best-effort enrichment from JSON-LD and DOM fallbacks.
  • Added usage examples for API and automation integration.

Security notes

  • API tokens should never be hardcoded in actor source code.
  • Store credentials in secure platform secret stores.
  • Limit sharing of scraped output with unauthorized parties.

Roadmap

  • Add optional deep profile enrichment with office contact deduplication.
  • Add stricter address normalization by region and country.
  • Add optional webhook callback on run completion.

Final notes

Keep sample runs conservative during validation. When you move to production, increase coverage gradually and monitor charged-event counts, run failure rate, and output completeness.
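Output completeness can be tracked with a per-field fill rate computed over a run's dataset items. This is a sketch of one possible metric, with field_fill_rates as a hypothetical helper and empty strings, empty arrays, and absent keys all counted as missing.

```python
def field_fill_rates(rows, fields):
    """Fraction of rows that have a non-empty value for each field."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) not in (None, "", [])) / len(rows)
        for field in fields
    }


rates = field_fill_rates(
    [
        {"phone": "+1 555-123-4567", "website": "https://examplelaw.com"},
        {"phone": "", "website": "https://examplelaw.com"},
    ],
    ["phone", "website", "address"],
)
```

A sudden drop in a field's rate between runs is a useful early signal that the source page structure has changed.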