Lawyer.com Directory Scraper
Pricing
Pay per event
Extract Lawyer.com attorney and law firm listings with profile URLs, phones, addresses, firms, practice areas, and pagination source data.
Developer
Stas Persiianenko
Maintained by Community
Overview
This actor extracts law firm and lawyer directory entries from Lawyer.com for a given search term and location. It resolves nearby results through Lawyer.com autosuggest endpoints, builds profile URLs, and enriches each profile with contact details where available.
What does it do
Lawyer.com Directory Scraper turns public Lawyer.com directory pages into structured lead and research data. It can start from a practice area and location, such as bankruptcy in New York, NY, or from one or more direct Lawyer.com listing URLs. For supported practice/location combinations, the actor follows listing pagination, extracts the visible lawyer cards, and returns clean profile links, firm names, phone numbers, addresses, practice areas, and source metadata.
When a search term cannot be mapped to a public directory page, the actor falls back to Lawyer.com autosuggest/profile discovery. This keeps common prospecting runs useful while still allowing exact listing URL runs for repeatable coverage.
Why use it
Use this actor when you need a repeatable legal-directory extraction workflow instead of manually copying Lawyer.com search results. It is built for directory-scale collection, CRM enrichment, market research, and local legal services analysis.
The actor is HTTP-first and does not open a browser, so it is fast and cost-efficient for supported public pages. It also exposes direct listing URL input, which lets analysts run large batches across multiple cities or practice categories with deterministic source pages.
Who is it for
- Legal services agencies building local lead lists.
- Go-to-market teams collecting attorney contact information.
- Sales teams validating CRM contacts for legal firms.
- Researchers tracking directories in a specific geography.
Core use cases
- Collecting names, types, and profile links for lawyers/firms.
- Pulling public contact details for outreach workflows.
- Enriching prospecting pipelines by location or practice niche.
- Auditing the presence of a firm or lawyer in specific markets.
Input
Required inputs
- searchQuery (string, required unless listingUrls is provided) - Keywords for the directory search.
  - Examples: bankruptcy, family law, Johnson legal services.
- locationQuery (string, required unless listingUrls is provided) - Location string that can be resolved by Lawyer.com geosuggest.
  - Examples: New York, 90210, Austin TX.
Optional controls
- listingUrls (array of strings) - Direct Lawyer.com listing URLs to scrape instead of running a search.
- maxResults (number) - Maximum number of profile entries to attempt.
  - Default: 50.
  - Practical cap for cost control: we recommend 20 for initial tests.
- maxRequestRetries (number) - Retry limit for failed HTTP calls.
  - Default: 3.
Input validation behavior
- Provide either listingUrls or both searchQuery and locationQuery.
- Direct Lawyer.com listing URLs are scraped first and support pagination.
- When listing URLs are omitted, the actor builds a Lawyer.com directory URL from common practice/location terms and falls back to autosuggest discovery.
- Empty queries are rejected before web traffic begins.
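The rules above could be checked on the caller's side before a run is started. The following is a minimal sketch, not the actor's own validation code: `validate_input` and the mode labels "listing" and "search" are hypothetical names chosen for illustration.

```python
from typing import Any


def validate_input(run_input: dict[str, Any]) -> str:
    """Return the mode the input selects, or raise ValueError.

    Hypothetical helper: the labels "listing" and "search" are
    illustrative and are not part of the actor's API.
    """
    listing_urls = [
        url.strip()
        for url in run_input.get("listingUrls") or []
        if url and url.strip()
    ]
    search = (run_input.get("searchQuery") or "").strip()
    location = (run_input.get("locationQuery") or "").strip()

    if listing_urls:
        return "listing"  # direct listing URLs take priority
    if search and location:
        return "search"  # directory URL lookup, autosuggest as fallback
    # Empty or whitespace-only queries are rejected before any request.
    raise ValueError("Provide listingUrls, or both searchQuery and locationQuery.")
```

Running a check like this locally avoids paying the start charge for a run that would be rejected anyway.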
Output
Each dataset item has these fields:
- name: Lawyer or firm display name.
- firmName: Firm or office name shown on the directory card.
- profileUrl: Public profile URL.
- firmUrl: Public firm profile URL when shown.
- profileType: lawyer or office (inferred from source payload).
- sourceLocation: Source location string returned by the search response.
- city: City parsed from the listing address.
- distanceMiles: Distance text/number from the geolocation source.
- phone: Telephone number if available.
- address: Address fields merged from profile payload and fallback selectors.
- description: Biography or summary text if present.
- practiceAreas: Practice area array (normalized from structured and page sources).
- website: External website URL if present.
- profileImageUrl: Profile image URL when available.
- state: State extracted from location metadata.
- countryCode: Country code from location metadata.
- sourceListingUrl: Directory page where the record was found.
- pageNumber: Pagination page visited for the record.
- positionOnPage: Position on that source page.
Output dataset / examples
Minimal output row
    {
      "name": "Harbor Family Law",
      "firmName": "Harbor Family Law PLLC",
      "profileType": "office",
      "profileUrl": "https://www.lawyer.com/firm/harbor-family-law",
      "firmUrl": "https://www.lawyer.com/firms/harbor-family-law-pllc.html",
      "sourceLocation": "New York, NY",
      "city": "New York",
      "distanceMiles": 0,
      "phone": "+1 555-123-4567",
      "address": "12 Legal Ave, New York, NY 10001",
      "description": "Family law attorneys focused on divorce and custody.",
      "practiceAreas": ["Family Law", "Divorce"],
      "website": "https://examplelaw.com",
      "profileImageUrl": "https://www.lawyer.com/img/logo.png",
      "state": "NY",
      "countryCode": "US",
      "sourceListingUrl": "https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm",
      "pageNumber": 1,
      "positionOnPage": 3
    }
Notes on missing fields
Fields are best-effort extracted. If a source page does not expose a value, this actor may return an empty string or empty array. Rows without address are still useful for follow-up enrichment and can be filtered downstream.
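As an illustration of the downstream filtering mentioned above, the sketch below splits rows by whether a field carries a value. It assumes missing values arrive as an empty string, an empty array, or an absent key; `has_value` and `split_by_field` are hypothetical helper names.

```python
def has_value(item: dict, field: str) -> bool:
    """True when the field is present and not None, "" or []."""
    return item.get(field) not in (None, "", [])


def split_by_field(items: list[dict], field: str) -> tuple[list[dict], list[dict]]:
    """Split rows into (with_field, without_field) for follow-up enrichment."""
    present = [row for row in items if has_value(row, field)]
    missing = [row for row in items if not has_value(row, field)]
    return present, missing
```

A numeric zero, such as distanceMiles of 0, still counts as a value here, so only genuinely empty fields are routed to the enrichment queue.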
Example input
    {
      "searchQuery": "bankruptcy",
      "locationQuery": "New York",
      "maxResults": 20,
      "maxRequestRetries": 3
    }
Example Run API usage
Node.js (apify-client)
    import { ApifyClient } from 'apify-client';

    // Read the token from the environment instead of hardcoding it.
    const client = new ApifyClient({ token: process.env.APIFY_API_TOKEN });
    const actor = client.actor('automation-lab/lawyer-com-directory-scraper');

    // call() starts the run and waits for it to finish.
    const run = await actor.call({
        searchQuery: 'bankruptcy',
        locationQuery: 'New York, NY',
        listingUrls: ['https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm'],
        maxResults: 20,
        maxPages: 2,
        maxRequestRetries: 3,
    });

    console.log('Run ID:', run.id);
    console.log('Default dataset ID:', run.defaultDatasetId);
Python (apify-client)
    import os

    from apify_client import ApifyClient

    # Read the token from the environment instead of hardcoding it.
    client = ApifyClient(os.environ['APIFY_API_TOKEN'])
    actor = client.actor('automation-lab/lawyer-com-directory-scraper')

    # call() starts the run and waits for it to finish.
    run = actor.call(run_input={
        'searchQuery': 'bankruptcy',
        'locationQuery': 'New York, NY',
        'listingUrls': ['https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm'],
        'maxResults': 20,
        'maxPages': 2,
        'maxRequestRetries': 3,
    })

    print('Run ID:', run['id'])
    print('Dataset:', run['defaultDatasetId'])
cURL
    # The request body is the actor input itself, not wrapped in an "input" key.
    curl -X POST \
      -H 'Content-Type: application/json' \
      -d '{"searchQuery":"bankruptcy","locationQuery":"New York, NY","listingUrls":["https://www.lawyer.com/new-york-bankruptcy-debt-lawyer-ny.htm"],"maxResults":20,"maxPages":2,"maxRequestRetries":3}' \
      "https://api.apify.com/v2/acts/automation-lab~lawyer-com-directory-scraper/runs?token=$APIFY_API_TOKEN"
MCP integration
Use Apify's hosted MCP endpoint for this actor:
https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper
The actor maps naturally to a simple MCP tool configuration.
- Expose one tool: run_lawyer_directory_scraper.
- Input schema should mirror the actor input fields.
- Return value should be the dataset item array from the run.
- Keep maxResults small during smoke test loops for deterministic cost behavior.
Claude Desktop MCP setup
    {
      "mcpServers": {
        "lawyer-scraper": {
          "url": "https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper",
          "headers": {
            "Authorization": "Bearer APIFY_API_TOKEN"
          }
        }
      }
    }
Claude Code MCP setup
    $ claude mcp add --transport http lawyer-scraper "https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper"
Cursor and VS Code MCP setup
Add a remote MCP server with this URL and an Authorization: Bearer APIFY_API_TOKEN header:
https://mcp.apify.com?tools=automation-lab/lawyer-com-directory-scraper
MCP usage prompts
Example prompt:
Find 20 bankruptcy lawyers in New York and return entries with phone and website fields only.
Example response behavior:
Provide the extracted list in CSV with columns name, profileUrl, phone, website, practiceAreas, and address.
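For pipelines that build the same CSV without an assistant, the column list above can be applied to dataset items directly. This is a sketch under stated assumptions: `items_to_csv` is a hypothetical helper, and joining practiceAreas with "; " is an arbitrary choice made so the array fits in one cell.

```python
import csv
import io

COLUMNS = ["name", "profileUrl", "phone", "website", "practiceAreas", "address"]


def items_to_csv(items: list[dict]) -> str:
    """Render dataset items as CSV text using the columns listed above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        # Keep only the requested columns; missing fields become empty cells.
        row = {column: item.get(column, "") for column in COLUMNS}
        # practiceAreas is an array in the dataset, so flatten it to one cell.
        row["practiceAreas"] = "; ".join(item.get("practiceAreas") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

Addresses contain commas, so the csv module quotes those cells automatically rather than breaking the column layout.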
Suggested JSON response contract
    {
      "tool": "run_lawyer_directory_scraper",
      "input": {
        "searchQuery": "bankruptcy",
        "locationQuery": "New York",
        "maxResults": 20,
        "maxRequestRetries": 3
      }
    }
Pricing
- One-time start charge: $0.005 for each actor run.
- Item charges are metered per extracted record using tiered pay-per-event pricing.
- Free tier example: 25 extracted items cost about $0.0338 before platform plan allowances.
- Bronze tier example: 100 extracted items cost about $0.105 at the configured listing price.
- Use smaller maxResults values while iterating on queries to keep costs predictable.
- When your pipeline retries on stable inputs, reuse the results of earlier runs instead of starting new ones.
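The two worked examples above are enough for a rough budget estimate. In the sketch below, the per-item rates are not published figures: they are back-calculated from those examples (about $0.00115 per item on Free and $0.001 on Bronze) and should be treated as assumptions. `estimate_run_cost` is a hypothetical helper.

```python
START_CHARGE_USD = 0.005  # one-time start charge per run, from the list above

# Assumed per-item rates, derived from the two examples above:
# (example total - start charge) / number of items.
ASSUMED_ITEM_RATE_USD = {
    "free": (0.0338 - START_CHARGE_USD) / 25,
    "bronze": (0.105 - START_CHARGE_USD) / 100,
}


def estimate_run_cost(items: int, tier: str = "free") -> float:
    """Estimate the cost in USD of one run that extracts `items` records."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return START_CHARGE_USD + items * ASSUMED_ITEM_RATE_USD[tier]
```

Because the source figures are themselves approximate, the result is a planning number only; the charged-event counts on the run are the authoritative record.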
Integrations and workflow patterns
- Send extracted lawyers into a CRM as new local legal-services leads.
- Join output with a website enrichment actor to validate external domains.
- Use Apify webhooks to notify a downstream workflow when a city/practice scrape finishes.
- Run multiple direct listingUrls in one actor call when you need larger practice-area coverage.
- Trigger the actor from Apify MCP to let an AI assistant gather public lawyer directory data during research.
API behavior and reliability
Network and scraping strategy
The actor uses server-rendered directory pages first, follows listing pagination, and uses endpoint-based discovery as a fallback. It is intentionally HTTP-first and does not use browser rendering.
Rate and retry behavior
- Request retries are controlled by maxRequestRetries.
- HTTP failures can still happen on anti-bot-protected pages.
- Unreachable or malformed profiles are skipped while preserving the main run integrity.
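The behavior described in this list, a bounded number of retries followed by skipping the record, can be pictured with a small sketch. This is not the actor's implementation: `fetch_with_retries` is a hypothetical function, and the exponential backoff delay is an assumption, since the README does not state how retries are spaced.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    max_request_retries: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch`, retrying up to `max_request_retries` times on failure.

    Returns None when every attempt fails so the caller can skip the record.
    """
    for attempt in range(max_request_retries + 1):  # first try plus retries
        try:
            return fetch()
        except Exception:
            if attempt == max_request_retries:
                return None  # give up on this profile, keep the run alive
            sleep(base_delay_s * 2 ** attempt)  # assumed exponential backoff
    return None
```

Returning None instead of raising mirrors the "skipped while preserving the main run" behavior: one unreachable profile costs a few attempts but does not abort the batch.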
Why some rows have missing data
- Not every profile exposes all fields.
- Some entries can be placeholders with no public profile details.
- Structured data may be inconsistent across profile types.
Limitations
- This actor depends on public Lawyer.com pages and their current JSON/HTML structure.
- Anti-bot changes or HTML redesigns can reduce extraction completeness.
- If the directory API changes path/query semantics, field availability may shift.
- No CAPTCHA solving is implemented in this actor.
Legality and permissions
The actor is intended for lawful data collection and testing. Users should:
- Verify whether their use case complies with Lawyer.com terms.
- Respect applicable data use and privacy rules.
- Avoid collecting personal data for unauthorized purposes.
Troubleshooting
No results returned
- Check if searchQuery is specific enough.
- Verify that locationQuery resolves to a known location.
- Try a larger city or state instead of a narrow ZIP.
Run succeeds but dataset is empty
- The query may not match public entries.
- Location resolution can drift for ambiguous names.
- Profile enrichment endpoints may be temporarily unavailable.
Sudden extraction drops
- Lawyer.com can throttle at source.
- Retry with lower maxResults and higher maxRequestRetries.
- Compare multiple test runs and only keep stable result patterns.
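One way to compare test runs is to keep only the profiles that show up in every run. The sketch below assumes profileUrl is a usable identity key for a record; `stable_profile_urls` is a hypothetical helper.

```python
def stable_profile_urls(runs: list[list[dict]]) -> set[str]:
    """Return the profile URLs that appear in every run's dataset items."""
    url_sets = [
        {item["profileUrl"] for item in run if item.get("profileUrl")}
        for run in runs
    ]
    if not url_sets:
        return set()
    # A URL is "stable" only if no run missed it.
    return set.intersection(*url_sets)
```

A sharp difference between the stable set and any single run is a hint that the source was throttling during that run.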
FAQ
Does this return only lawyers?
It returns both lawyers and office/firm entries. The profileType field helps separate them downstream.
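A downstream split on profileType might look like the following sketch. `group_by_profile_type` is a hypothetical helper, and the "unknown" bucket for rows without a profileType is an assumption added for safety, not a value the actor is documented to emit.

```python
from collections import defaultdict


def group_by_profile_type(items: list[dict]) -> dict[str, list[dict]]:
    """Group dataset items by profileType ("lawyer" or "office")."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        # Rows with a missing or empty profileType go to an "unknown" bucket.
        groups[item.get("profileType") or "unknown"].append(item)
    return dict(groups)
```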
Can I get complete pagination?
The source payload determines available suggestions. If upstream results are limited, follow-up runs with tuned location and query terms are recommended.
Can I set memory options?
Yes. Default run memory is set to 256 MB for baseline runs. The platform may allow overriding run memory per invocation.
Does this work with large query volumes?
Yes, but for large sweeps consider smaller batches and strict filtering to protect cost.
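For a large sweep over many listing URLs, the batching advice could be applied by generating one run input per chunk. This is a sketch: `batch_run_inputs` and its default batch size are hypothetical, while listingUrls and maxResults are the input fields used elsewhere in this README.

```python
from typing import Iterator


def batch_run_inputs(
    listing_urls: list[str],
    batch_size: int = 5,
    max_results: int = 20,
) -> Iterator[dict]:
    """Yield one actor input per chunk of listing URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(listing_urls), batch_size):
        yield {
            "listingUrls": listing_urls[start:start + batch_size],
            "maxResults": max_results,  # cap per run to keep cost bounded
        }
```

Each yielded dictionary can be passed as the run input of a separate actor call, so a failure or throttling event affects one batch rather than the whole sweep.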
Changelog notes
- Initial release: directory search + profile enrichment and dataset output.
- Added profile-type distinction (lawyer vs office).
- Added local best-effort enrichment from JSON-LD and DOM fallbacks.
- Added usage examples for API and automation integration.
Security notes
- API tokens should never be hardcoded in actor source code.
- Store credentials in secure platform secret stores.
- Limit sharing of scraped output with unauthorized parties.
Roadmap
- Add optional deep profile enrichment with office contact deduplication.
- Add stricter address normalization by region and country.
- Add optional webhook callback on run completion.
Final notes
Keep sample runs conservative during validation. When you move to production, increase coverage gradually and monitor charged-event counts, run failure rate, and output completeness.