Justia Scraper | Socials & Emails | $4 / 1k
Pricing
$3.99 / 1,000 results
Scrape verified US attorney profiles from Justia with structured identity, firm, contact, social, and email enrichment data. Built for lead gen, analytics, CRM sync and repeatable legal intelligence workflows. Clean JSON output.
Developer

Fatih Tahta
Justia Scraper | Socials & Emails
Slug: fatihtahta/justia-scraper
This actor is built upon the ValidatedMails.com architecture for email enrichment workflows.
1) Overview
This actor collects public attorney profile records from Justia, including identity, profile links, contact details, social profiles, office information, practice areas, credentials, and optional email enrichment fields. It supports structured collection from state and practice-area searches as well as direct listing/profile URLs, so teams can gather consistent records across different acquisition paths.
Justia is a widely used legal directory, which makes it a practical source for attorney discovery, regional analysis, and professional data enrichment workflows. The actor is built for repeatable automation so you can run the same inputs on a schedule and maintain stable downstream datasets. This reduces manual research time and helps teams move faster with cleaner, standardized output.
2) Why Use This Actor
- Market research & analytics teams: Compare attorney coverage by state and practice area, track profile attributes over time, and build reporting-ready datasets for trend analysis.
- Product & content teams: Source attorney metadata for legal directories, landing pages, or discovery features, including bios, practice areas, and office details.
- Developers & data engineering teams: Feed normalized JSON records into ETL/ELT jobs, data warehouses, search indexes, and internal APIs with minimal transformation.
- Lead generation & enrichment teams: Build prospect lists with social links, firm websites, phone numbers, and optional email fields to support enrichment and qualification workflows.
- Monitoring & competitive tracking teams: Re-run the same inputs on a cadence to detect profile updates, regional shifts, or category-level changes.
3) Input Parameters
Provide any combination of URLs, queries, and filters to control what gets collected.
| Parameter | Type | Description | Default |
|---|---|---|---|
| location | string | State selector for query-based collection. Allowed values: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, Washington, DC, West Virginia, Wisconsin, Wyoming. | "California" |
| practiceArea | string | Practice area selector for query-based collection. Allowed values: All, Personal Injury, Medical Malpractice, Criminal Law, DUI, Family Law, Divorce, Bankruptcy, Business Law, Consumer Law, Employment Law, Estate Planning, Foreclosure Defense, Immigration Law, Intellectual Property, Nursing Home Abuse, Probate, Products Liability, Real Estate Law, Tax Law, Workers' Compensation, Agricultural Law, Animal & Dog Law, Traffic Tickets, Antitrust Law, Appeals & Appellate, Arbitration & Mediation, Asbestos & Mesothelioma, Cannabis & Marijuana Law, Civil Rights, Communications & Internet Law, Construction Law, Domestic Violence, Education Law, Elder Law, Energy, Oil & Gas Law, Entertainment & Sports Law, Collections, Environmental Law, Gov & Administrative Law, Insurance Claims, Insurance Defense, International Law, Juvenile Law, Landlord Tenant, Legal Malpractice, Health Care Law, Maritime Law, Military Law, Native American Law, Patents, Municipal Law, Securities Law, Trademarks, White Collar Crime, Social Security Disability, Stockbroker & Investment Fraud. | "All" |
| startUrls | array | One or more Justia URLs (search, category, or profile) to scrape directly. | – |
| limit | integer | Maximum listings to collect per input item. Minimum: 10. | 50000 |
| proxyConfiguration | object | Connection settings for the run. | { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } |
| getEmails | boolean | Enables attorney email enrichment in output records. | false |
| includeRiskyEmails | boolean | Includes lower-confidence emails and marks them by status for filtering downstream. | true |
4) Example Input
{
  "location": "California",
  "practiceArea": "Personal Injury",
  "startUrls": [
    "https://lawyers.justia.com/lawyers/personal-injury/california/san-francisco"
  ],
  "limit": 1500,
  "getEmails": true,
  "includeRiskyEmails": false
}
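Before starting a run, the constraints from the parameter table can be checked locally. The sketch below is illustrative only: `validate_input` is a hypothetical helper, not part of the actor, and it checks just the rules stated above (integer `limit` of at least 10, boolean flags, `startUrls` as an array). The host check assumes that valid start URLs live on justia.com or one of its subdomains.

```python
from urllib.parse import urlparse


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems found; an empty list means the input looks OK."""
    problems = []

    limit = actor_input.get("limit", 50000)  # documented default
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 10:
        problems.append("limit must be an integer >= 10")

    for flag in ("getEmails", "includeRiskyEmails"):
        if flag in actor_input and not isinstance(actor_input[flag], bool):
            problems.append(f"{flag} must be a boolean")

    start_urls = actor_input.get("startUrls", [])
    if not isinstance(start_urls, list):
        problems.append("startUrls must be an array")
    else:
        for url in start_urls:
            host = (urlparse(url).hostname or "") if isinstance(url, str) else ""
            # Assumption: every valid start URL is on justia.com or a subdomain.
            if not (host == "justia.com" or host.endswith(".justia.com")):
                problems.append(f"not a Justia URL: {url}")

    return problems
```

Calling `validate_input` on the example input above returns an empty list.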
5) Output
5.1 Output destination
The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs with little or no post-processing.
5.2 Record envelope (all items)
Every record includes these stable identifiers:
- type (string, required)
- id (number, required)
- url (string, required)
Recommended idempotency key: type + ":" + id.
Use this key for deduplication and upserts so repeated runs update existing entities instead of creating duplicates.
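A minimal sketch of the key in use, assuming records arrive as plain Python dicts shaped like the envelope above. The function names `record_key` and `dedupe` are hypothetical and chosen for illustration.

```python
def record_key(record: dict) -> str:
    """Build the recommended idempotency key: type + ":" + id."""
    return f"{record['type']}:{record['id']}"


def dedupe(records: list[dict]) -> list[dict]:
    """Keep one record per key; a later record replaces an earlier one."""
    latest: dict[str, dict] = {}
    for record in records:
        latest[record_key(record)] = record
    return list(latest.values())
```

Letting the later record win matches the upsert behavior described above: a repeated run overwrites the stored entity rather than adding a second copy.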
5.3 Examples
Example: profile (type = "profile")
{
  "type": "profile",
  "id": 1499061,
  "url": "https://lawyers.justia.com/lawyer/sam-cannon-1499061",
  "profileUrl": "https://lawyers.justia.com/lawyer/sam-cannon-1499061",
  "name": "Sam Cannon",
  "jobTitles": ["Partner", "Cannon Law can help you recover. Call us today to see how."],
  "socialProfiles": ["https://www.facebook.com/samcannonlaw", "https://www.linkedin.com/company/cannonlaw-llc"],
  "telephones": ["(970) 471-7170", "(303) 543-1000"],
  "addresses": [
    {"street": "3534 John F. Kennedy Pkwy, Unit B", "city": "Fort Collins", "region": "CO", "postalCode": "80525", "country": "US"}
  ],
  "practiceAreas": [
    {"area": "Personal Injury", "subAreas": ["Brain Injury", "Car Accidents"]}
  ],
  "websites": ["https://www.cannonlaw.com"],
  "email": "sam@cannonlaw.com",
  "emailStatus": "valid"
}
6) Field reference
profile record
- type (string, required): Record category (for this dataset: profile).
- id (number, required): Stable numeric profile identifier.
- url (string, required): Canonical profile URL.
- title (string, optional): Profile title displayed on the page.
- profileUrl (string, optional): Profile URL field as returned by source data.
- name (string, optional): Attorney full name.
- jobTitles (string[], optional): Professional roles/headlines.
- affiliations (string[], optional): Organization or membership labels.
- socialProfiles (string[], optional): Public social or profile links.
- telephones (string[], optional): Phone numbers.
- addresses (object[], optional): Address list.
- addresses.street (string, optional): Street line.
- addresses.city (string, optional): City.
- addresses.region (string, optional): State/region code.
- addresses.postalCode (string, optional): Postal/ZIP code.
- addresses.country (string, optional): Country code.
- profileImage (string, optional): Profile image URL.
- faxNumbers (string[]|null, optional): Fax numbers.
- officeHours (object[]|null, optional): Office hours by day.
- officeHours.day (string, optional): Day of week.
- officeHours.hours (string, optional): Hours text.
- biography (string|null, optional): Biography/about text.
- practiceAreas (object[]|null, optional): Primary practice areas.
- practiceAreas.area (string, optional): Area name.
- practiceAreas.subAreas (string[]|null, optional): Sub-specialties.
- additionalPracticeAreas (string|null, optional): Additional area text.
- fees (string[]|null, optional): Fees and consultation notes.
- freeConsultation (boolean|null, optional): Free consultation indicator.
- jurisdictions (object[]|null, optional): Jurisdiction memberships.
- jurisdictions.name (string, optional): Jurisdiction name.
- jurisdictions.organization (string|null, optional): Licensing organization.
- jurisdictions.idNumber (string|null, optional): Membership/ID number.
- jurisdictions.since (string|null, optional): Since/admission text.
- languages (object[]|null, optional): Languages and modalities.
- languages.language (string, optional): Language name.
- languages.modalities (string[]|null, optional): Spoken/written modalities.
- experience (object[]|null, optional): Professional experience entries.
- experience.role (string, optional): Role title.
- experience.organization (string, optional): Organization name.
- experience.startDate (string|null, optional): Start date/year text.
- experience.endDate (string|null, optional): End date/year text.
- education (object[]|null, optional): Education entries.
- education.school (string, optional): School/institution.
- education.degree (string|null, optional): Degree.
- education.field (string|null, optional): Field of study.
- education.startDate (string|null, optional): Start date/year text.
- education.endDate (string|null, optional): End date/year text.
- education.honors (string|null, optional): Honors text.
- awards (string|object[]|null, optional): Awards data.
- associations (object[]|null, optional): Association records.
- associations.organization (string, optional): Association name.
- associations.role (string|null, optional): Role/title.
- associations.startDate (string|null, optional): Start date text.
- associations.endDate (string|null, optional): End date text.
- publications (string|object[]|null, optional): Publications data.
- speakingEngagements (string|object[]|null, optional): Speaking entries.
- websites (string[]|null, optional): Website URLs.
- websiteDetails (object[]|null, optional): Website metadata.
- websiteDetails.type (string, optional): Website type (e.g., website, blog).
- websiteDetails.title (string|null, optional): Website label.
- websiteDetails.url (string, optional): Website URL.
- websiteDetails.description (string|null, optional): Website description.
- videos (string[]|null, optional): Video URLs/details.
- certifications (string|object[]|null, optional): Certification data.
- clientReviews (string|object[]|null, optional): Client review data.
- videoConferencing (string[]|null, optional): Supported conferencing platforms.
- offices (object[]|null, optional): Office records.
- offices.name (string, optional): Office name.
- offices.description (string|null, optional): Office description.
- offices.telephone (string|null, optional): Office phone.
- offices.address (object|null, optional): Office address object.
- offices.address.street (string, optional): Street line.
- offices.address.city (string, optional): City.
- offices.address.region (string, optional): State/region.
- offices.address.postalCode (string, optional): Postal/ZIP code.
- offices.address.country (string, optional): Country code.
- offices.officeHours (object[]|null, optional): Office-specific hours.
- offices.officeHours.day (string, optional): Day of week.
- offices.officeHours.hours (string, optional): Hours text.
- emailContactUrl (string|null, optional): Contact form URL.
- vcardUrl (string|null, optional): vCard URL.
- email (string|null, optional): Email when available.
- emailStatus (string|null, optional): Email quality/status label.
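Because `emailStatus` labels each address, downstream code can filter on it. The sketch below is a hypothetical helper, not part of the actor. The only status value shown in this document is "valid" (from the example record); any other labels are unknown here and would need to be added to `accepted_statuses` after inspecting real output.

```python
def with_trusted_email(records, accepted_statuses=frozenset({"valid"})):
    """Keep records that have an email whose status is in the accepted set."""
    return [
        record
        for record in records
        if record.get("email") and record.get("emailStatus") in accepted_statuses
    ]
```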
7) Data guarantees & handling
- Best-effort extraction: available fields may vary by region, session, availability, and UI experiments.
- Optional fields: any optional field may be missing or null, so null-check it in downstream code.
- Deduplication: use type + ":" + id as the recommended key.
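A short sketch of the null-handling this implies, assuming records are plain dicts. `as_list` and `primary_phone` are hypothetical helper names. `as_list` targets fields typed string|object[]|null in the field reference (for example awards or publications).

```python
def as_list(value) -> list:
    """Normalize a field typed string|object[]|null to a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]  # a bare string becomes a one-item list


def primary_phone(record: dict):
    """Return the first phone number, or None if the field is missing, null, or empty."""
    phones = record.get("telephones") or []
    return phones[0] if phones else None
```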
8) How to Run on Apify
- Open the actor in Apify Console.
- Configure your search parameters (for example, practice area and state) and/or provide direct Justia URLs.
- Set the maximum number of outputs to collect.
- Click Start and wait for the run to finish.
- Download results in JSON, CSV, Excel, or other supported formats.
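The same run can also be started programmatically. The following is a sketch using only the Python standard library. It assumes the Apify API v2 endpoint `run-sync-get-dataset-items`, with the actor addressed as the slug with "/" replaced by "~". The function names are hypothetical, and the token is whatever API token your Apify account provides. Synchronous endpoints are time-limited, so this pattern suits small runs; check the Apify API reference before relying on it for larger ones.

```python
import json
import urllib.request

# Assumption: API URLs address the actor as "username~actor-name".
ACTOR_ID = "fatihtahta~justia-scraper"


def build_run_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Prepare a POST request that runs the actor and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_input: dict, token: str) -> list:
    """Send the request and parse the JSON array of records (needs network access)."""
    with urllib.request.urlopen(build_run_request(actor_input, token)) as response:
        return json.load(response)
```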
9) Scheduling & Automation
Recurring Attorney Intelligence
This actor is designed for repeatable, structured data collection.
Common recurring workflows:
- Weekly state-level refresh: Re-run location + practiceArea combinations (e.g., California + Personal Injury) to detect new attorneys or profile updates.
- Monthly enrichment sync: Re-collect previously stored profile URLs to refresh phones, social links, websites, and optional email fields.
- Change monitoring for niche categories: Track competitive movements in specific verticals (e.g., DUI in Texas, Immigration in New York).
- Lead pipeline replenishment: Automatically append newly discovered attorneys into outbound or enrichment workflows.
To schedule:
- Open the actor in Apify Console.
- Go to Schedules.
- Create a daily, weekly, or cron-based schedule.
- Lock in your input (state, practice area, URLs, email settings).
- Route output to a stable dataset for incremental processing.
Because each record has a stable type + ":" + id key, scheduled runs can safely power idempotent upserts in your CRM, warehouse, or lead database.
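As one way to picture such an upsert, here is a sketch against SQLite from the Python standard library. The table layout (`profiles` with `key`, `name`, `raw_json`) and the function name are assumptions made for illustration; a CRM or warehouse would use its own schema and upsert syntax.

```python
import json
import sqlite3


def upsert_profiles(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Insert new profiles and overwrite existing ones, keyed by type + ":" + id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "key TEXT PRIMARY KEY, name TEXT, raw_json TEXT NOT NULL)"
    )
    rows = [
        (f"{r['type']}:{r['id']}", r.get("name"), json.dumps(r))
        for r in records
    ]
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.executemany(
        "INSERT INTO profiles (key, name, raw_json) VALUES (?, ?, ?) "
        "ON CONFLICT(key) DO UPDATE SET "
        "name = excluded.name, raw_json = excluded.raw_json",
        rows,
    )
    conn.commit()
```

Running this after every scheduled run leaves exactly one row per attorney, holding the most recent record.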
Downstream Automation Patterns
This actor is commonly used as the top-of-funnel data source in larger legal data systems.
Typical integrations:
- Webhooks: Trigger an ETL job, CRM sync, or enrichment workflow immediately after each run.
- Zapier / Make: Push newly collected attorneys into Airtable, HubSpot, Salesforce, or internal tools.
- Google Sheets: Maintain live state-by-state tracking sheets for ops or research teams.
- Slack / Discord: Send summaries such as:
  - “+142 new Personal Injury attorneys in Florida”
  - “23 profiles updated this week”
- Warehouse ingestion (Snowflake / BigQuery / Postgres): Use the dataset API to ingest JSON directly into structured tables for analytics and segmentation.
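Summaries like the Slack examples above can be derived by comparing two runs. This is a rough sketch, assuming both runs are available as lists of record dicts. `summarize_changes` is a hypothetical name, and "updated" here means any difference in the serialized record, which may be stricter than you want if volatile fields change between runs.

```python
import json


def summarize_changes(previous: list[dict], current: list[dict]) -> dict:
    """Count records that are new or changed in the current run."""

    def index(records):
        # Map each key to a canonical JSON string so records compare reliably.
        return {
            f"{r['type']}:{r['id']}": json.dumps(r, sort_keys=True)
            for r in records
        }

    old, new = index(previous), index(current)
    added = [key for key in new if key not in old]
    updated = [key for key in new if key in old and new[key] != old[key]]
    return {"new": len(added), "updated": len(updated)}
```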
Recommended Automation Architecture
For production use:
- Use a dedicated dataset per campaign or state
- Deduplicate via type + ":" + id
- Store raw JSON for traceability
- Run enrichment (getEmails) only when needed to control cost
- Separate discovery runs (broad queries) from refresh runs (profile URLs only)
This structure allows you to:
- Scale geographically
- Track attorney growth by category
- Maintain clean, normalized records across repeated runs
- Avoid duplicate outreach or redundant enrichment
10) Performance
Estimated run times:
- Small runs (< 1,000 outputs): ~2–3 minutes
- Medium runs (1,000–5,000 outputs): ~5–15 minutes
- Large runs (5,000+ outputs): ~15–30 minutes
Execution time varies based on filters, result volume, and how much information is returned per record.
11) Compliance & Ethics
Responsible Data Collection
This actor collects publicly available attorney profile and contact information from Justia for legitimate business purposes, including:
- legal services research and market analysis
- lead enrichment and qualification
- directory and content maintenance
Users are responsible for ensuring their collection and use of data complies with applicable laws, regulations, and platform terms. This section is informational and not legal advice.
Best Practices
- Use collected data in accordance with applicable laws, regulations, and the target site’s terms
- Respect individual privacy and personal information
- Use data responsibly and avoid disruptive or excessive collection
- Do not use this actor for spamming, harassment, or other harmful purposes
- Follow relevant data protection requirements where applicable (e.g., GDPR, CCPA)
12) Support
For help, use the Issues section on the actor page. Include the input you used (with sensitive values redacted), the run ID, expected vs. actual behavior, and an optional small output sample so troubleshooting is faster.