Business Registry & Ownership Intel - KYB, Officers
Pricing
from $8.00 / 1,000 company records
Unifies US Secretary-of-State business registries into one normalized entity schema: status, type, formation date, registered agent, officers, addresses. KYB value layer: cross-state entity resolution, officer linking, and an ownership graph. Logged-out public records. For KYB/AML and PE/M&A.
Developer
Seibs.co
Maintained by Community
Business Registry & Corporate Ownership Intel (KYB)
TL;DR for KYB/AML, compliance, PE/M&A, and B2B firmographics teams: Unifies the fragmented US Secretary-of-State business registries (California, New York, Florida, Texas - plus opt-in Delaware) into one normalized legal-entity schema: name, file number, status, type, formation date, registered agent, officers/directors, and addresses. On top of the raw records it adds the KYB value layer competitors charge for: cross-state entity resolution (the same company in CA and NY is one entity, not two rows), officer-to-company linking (one person's full registry footprint), and an inferred ownership/association graph (parent/subsidiary/sister entities via shared officers, agents, addresses, and name roots). The underlying data is public-by-law, but it lives behind 50 separate portals with no unified free API - the exact fragmentation OpenCorporates (GBP 2,250-12,000/yr) and D&B Direct+ ($25,000+/yr) monetize. Government public records, logged-out, PII-minimized. Free Apify plan covers exploration runs on your $5 platform credit.
Run it in 30 seconds
```python
# Via the Apify Python SDK
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")
run = client.actor("seibs.co/business-registry-intel").call(run_input={
    "mode": "entity_search",
    "companies": ["Acme Holdings"],
    "states": ["CA", "NY", "FL", "TX"],
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
Or via curl:
```bash
curl -X POST "https://api.apify.com/v2/acts/seibs.co~business-registry-intel/run-sync-get-dataset-items?token=<YOUR_APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"mode": "entity_search", "companies": ["Acme Holdings"], "states": ["CA","NY","FL","TX"]}'
```
Or click "Try for free" on this page if you prefer the no-code UI.
What you get
Each run produces:
- A clean dataset, filterable in the Apify console and downloadable as CSV or JSON
- An OUTPUT.html dashboard preview of your top records
- A sample-output preview at ./.actor/sample-output.json
- An access_notes record up top documenting each state's access method, proxy needs, and any cost/gating
What does Business Registry Intel do?
It queries each selected state's public business-entity registry and normalizes every result into one schema: entity_name, file_number, jurisdiction, entity_type (normalized to llc / corporation / lp / ...), status (normalized to active / dissolved / suspended / ...), formation_date, registered_agent, officers, principal_address, and source_url. Then it runs the value layer:
- Cross-state entity resolution - clusters records that refer to the same legal entity across jurisdictions (canonical-name + fuzzy match), so a company registered in CA and foreign-qualified in NY shows up as one entity_cluster rollup, not two disconnected rows.
- Officer-to-company linking - groups officers and registered agents by a normalized person key and flags multi-entity individuals (the AML/investigator signal).
- Ownership/association graph - infers parent/subsidiary/sister-entity edges from shared officers, shared registered agents, shared addresses, and common name roots, then reports the connected components.
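The cross-state resolution step can be pictured with a small sketch. This is not the actor's implementation: the suffix list, the 0.92 threshold, and the greedy single-pass strategy are all assumptions chosen for illustration, and only the standard-library `difflib` matcher is used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list; the actor's real canonicalization rules are not documented here.
_SUFFIXES = {"llc", "inc", "incorporated", "corp", "corporation", "co", "company", "ltd", "lp", "llp"}


def canonical_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def cluster_entities(records, threshold=0.92):
    """Greedy clustering: attach each record to the first cluster whose key is similar enough."""
    clusters = []  # each cluster: {"key": canonical name of its first member, "members": [...]}
    for rec in records:
        key = canonical_name(rec["entity_name"])
        for cluster in clusters:
            if SequenceMatcher(None, key, cluster["key"]).ratio() >= threshold:
                cluster["members"].append(rec)
                break
        else:
            clusters.append({"key": key, "members": [rec]})
    return clusters


records = [
    {"entity_name": "Acme Holdings, LLC", "jurisdiction": "CA"},
    {"entity_name": "ACME HOLDINGS LLC", "jurisdiction": "NY"},
    {"entity_name": "Zenith Freight Inc.", "jurisdiction": "FL"},
]
clusters = cluster_entities(records)
for cluster_id, cluster in enumerate(clusters):
    print(cluster_id, cluster["key"], [m["jurisdiction"] for m in cluster["members"]])
```

Here the CA and NY filings collapse into one cluster because their canonical names are identical, while the FL entity stays separate. A production resolver would likely also compare file numbers, agents, or addresses before merging.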
Modes
| Mode | What it returns |
|---|---|
| entity_search (default) | Every matching entity per state for your name queries, plus cross-state entity_cluster resolution rollups. |
| entity_profile | Deep profile (registered agent, officers/directors, addresses, filing history where exposed) for the matched entities - charges officer_director_enrichment per entity that yields officer data. |
| officer_lookup | One officer_link record per person, with their resolved company footprint and a multi_entity flag. |
| ownership_graph | The full association graph (nodes, edges, components) plus the entities and clusters it was built from. Charges ownership_graph_enrichment once. |
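To make the nodes / edges / components output of ownership_graph mode concrete, here is a minimal sketch of how such an association graph could be assembled. The record shape (officers as plain name strings, a `jurisdiction:file_number` node id) and the edge-type labels are assumptions for illustration, not the actor's actual output format.

```python
from collections import defaultdict
from itertools import combinations


def build_association_graph(entities):
    """Link entities that share an officer, a registered agent, or a principal address."""
    buckets = defaultdict(set)  # (edge_type, normalized value) -> set of node ids
    for e in entities:
        node = f'{e["jurisdiction"]}:{e["file_number"]}'
        for officer in e.get("officers", []):
            buckets[("shared_officer", officer.strip().lower())].add(node)
        if e.get("registered_agent"):
            buckets[("shared_agent", e["registered_agent"].strip().lower())].add(node)
        if e.get("principal_address"):
            buckets[("shared_address", e["principal_address"].strip().lower())].add(node)

    nodes = sorted(f'{e["jurisdiction"]}:{e["file_number"]}' for e in entities)
    edges = sorted(
        (a, b, edge_type)
        for (edge_type, _), ids in buckets.items()
        for a, b in combinations(sorted(ids), 2)
    )

    # Union-find over the edges yields the connected components.
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b, _ in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return {"nodes": nodes, "edges": edges, "components": sorted(groups.values())}


entities = [
    {"jurisdiction": "CA", "file_number": "1001", "officers": ["Jane Roe"],
     "registered_agent": "Agent Co", "principal_address": "1 Main St"},
    {"jurisdiction": "NY", "file_number": "2002", "officers": ["JANE ROE"],
     "registered_agent": "Other Agent", "principal_address": "9 Elm St"},
    {"jurisdiction": "FL", "file_number": "3003", "officers": ["John Doe"],
     "registered_agent": "Third Agent", "principal_address": "5 Oak Ave"},
]
graph = build_association_graph(entities)
print(graph["edges"])
print(graph["components"])
```

The shared officer links the CA and NY entities into one component; the FL entity forms its own. As the responsible-use section stresses, such edges are investigative leads, not verified ownership.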
National coverage: all 50 states + DC
The 50 US registries share no schema and no unified API - that fragmentation is exactly what this actor unifies. Every state + DC is registered as a connector with a documented access method, anti-bot level, and proxy tier. Coverage is honestly tiered:
Fully-parsed registries (13) - request + parser verified against the live portal, returns real normalized entity data (11 over http; PA + MI over the browser tier):
| State | Registry | Access method | Anti-bot | Proxy | Notes |
|---|---|---|---|---|---|
| PA | Pennsylvania DOS (PennFile) | browser (bizfile API behind Cloudflare) | high | RESIDENTIAL | patchright clears Cloudflare; bizfile API called from inside the page. ~3M entities. |
| MI | Michigan LARA (MiBusiness Registry) | browser (webSearch JSON API behind Cloudflare) | high | RESIDENTIAL | patchright clears Cloudflare; agent + status + type in the search row. |
| CA | California SOS (bizfileOnline) | http_json (POST API) | low | DATACENTER | Bulk data available. Officers via detail page. |
| NY | NY Dept. of State Public Inquiry | http_json (token bootstrap + POST) | moderate | DATACENTER | Officers via detail page. |
| FL | Florida Sunbiz | http_html | moderate (edge WAF) | DATACENTER -> RESIDENTIAL | Edge WAF 403s plain HTTP; cleared by the curl_cffi tier. Officers via detail page (drift-guarded). |
| TX | Texas Comptroller Taxable Entity Search | http_html | low | DATACENTER | SOSDirect officer data is paid -> out of scope. |
| CO | Colorado SOS (Business Entities open data) | http_json (Socrata) | none | DATACENTER | Full dataset downloadable. Agent + principal address in search. |
| CT | Connecticut SOS (Business Registry open data) | http_json (Socrata) | none | DATACENTER | Open data exposes name/status/date; no entity type. |
| OR | Oregon SOS (Business Registry open data) | http_json (Socrata) | none | DATACENTER | Multi-row-per-entity (grouped); agent + principal + authorized reps. Active entities only. |
| WI | Wisconsin DFI (Corporate Records) | http_html (plain GET) | none | DATACENTER | Name/type/status/formation date in results; officers not online (agent free on detail). |
| NJ | New Jersey DORES (Business Name Search) | http_html (GET token -> POST) | low | DATACENTER | Free path: name/id/city/type/formation date. Status/agent/officers are paid -> out of scope (like TX/DE). |
| ID | Idaho SOS (SOSBiz) | http_json (bizfile platform) | low | DATACENTER | Same vendor API family as CA. Status/type/agent in search. |
| ND | North Dakota SOS (FirstStop) | http_json (bizfile platform) | low | DATACENTER | Same vendor API family as CA. Status/type in search. |
Catalog-registered (38) - correct portal + access method + anti-bot + proxy tier recorded, escalation pipeline wired, per-state parser pending (these return a documented state_pending note). Grouped by why they're pending:
- browser_required (21) - the browser tier now defeats Cloudflare/Imperva (patchright headful - same approach that made PA + MI fully-parsed), so for 12 of these the only remaining work is capturing each portal's search API/selectors (a routine ~15-min task per state): AK, AR, IL, MA, MD, MN, NC, NM, NV, OH, OK, WA (Cloudflare/SPA, no CAPTCHA) - drop-in candidates for the next round. The other 9 cross the no-CAPTCHA / no-login line and stay fail-soft: AZ, GA, NE, SC, WY (per-search CAPTCHA - need the opt-in solver), VA (reCAPTCHA v3 Enterprise -> use its bulk file), DC (login), DE (CAPTCHA + opt-in fee), HI (login/migration).
- http_html ViewState-pending (14) - reachable via the curl_cffi tier, but the search is an ASP.NET WebForms/MVC form needing a POST with per-page ViewState/anti-forgery tokens + a live result-row capture: AL, IN, KS, KY, LA, ME, MO, NH, RI, SD, TN, UT, VT, WV.
- http_json pending (3) - IA (Socrata dataset 404s after a portal migration; REST API has no name-search endpoint), MS (corpreporting JSON returns the full DB - its Kendo filter does not narrow server-side), MT (same bizfile platform as CA/ID/ND, but its API 500s to logged-out queries).
Pass states: ["ALL"] (or all_states: true) to query every jurisdiction. The live access_matrix (all 51, with coverage per state) is emitted in the access_notes record on every run. Upgrading a catalog state to fully-parsed is a single per-state parser (and, for the browser_required group, the Playwright image) - the orchestrator, entity-resolution, escalation, and monitor layers are all state-agnostic.
Anti-bot escalation (residential + browser)
Many registries sit behind an edge WAF that fingerprints the TLS/JA3 of the caller and 403s a plain request even from a clean IP (FL Sunbiz does exactly this). The client escalates automatically instead of giving up:
- httpx over the DATACENTER proxy - cheapest, tried first.
- On a Cloudflare/CAPTCHA/403 challenge -> curl_cffi with real Chrome TLS impersonation over the RESIDENTIAL proxy. This defeats JA3/TLS-fingerprint WAFs (it turns FL Sunbiz's 403 into a 200 with real data) and is the portfolio's proven anti-bot tool.
- On a true JS/CAPTCHA challenge (browser_required states like DE) -> a Playwright-driven Chromium browser over the RESIDENTIAL proxy (headful by default; see the browser tier section below). This tier is optional: it runs on a Playwright-capable image and is skipped cleanly otherwise.
- Fail-soft - if every tier is blocked or unavailable, the connector emits a documented fetch_error and the run still finishes SUCCEEDED with whatever other states returned.
The proxy tier auto-selects: DATACENTER for the first pass, RESIDENTIAL for the escalation legs (provisioned up front). Set use_browser_fallback: false to use plain httpx only. Per-run escalation counts are reported in the access_notes.anti_bot_escalation block. Delaware is off by default and never spends its per-search fee silently.
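The escalation ladder described above can be sketched as a loop over tiers that stops at the first success and fails soft otherwise. Everything here is illustrative: `Tier`, `Blocked`, `fetch_with_escalation`, and the stub fetchers are hypothetical names, and the stubs stand in for real httpx / curl_cffi / browser calls so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class Blocked(Exception):
    """Raised by a tier when it hits a 403, CAPTCHA, or challenge page."""


@dataclass
class Tier:
    name: str
    proxy: str  # "DATACENTER" or "RESIDENTIAL"
    fetch: Optional[Callable[[str], str]]  # None means the tier is unavailable in this image


def fetch_with_escalation(url, tiers):
    """Try each tier in order; return the first success, or a fail-soft error record."""
    attempts = []
    for tier in tiers:
        if tier.fetch is None:
            attempts.append((tier.name, "skipped"))
            continue
        try:
            body = tier.fetch(url)
        except Blocked:
            attempts.append((tier.name, "blocked"))
            continue
        attempts.append((tier.name, "ok"))
        return {"ok": True, "tier": tier.name, "proxy": tier.proxy,
                "body": body, "attempts": attempts}
    # Fail-soft: report the error instead of raising, so other states can still finish.
    return {"ok": False, "fetch_error": "all tiers blocked or unavailable", "attempts": attempts}


def plain_http(url):
    raise Blocked("403 from edge WAF")  # stub: plain request is fingerprinted and refused


def tls_impersonation(url):
    return "<html>results</html>"  # stub: browser-like TLS gets through


tiers = [
    Tier("httpx", "DATACENTER", plain_http),
    Tier("curl_cffi", "RESIDENTIAL", tls_impersonation),
    Tier("browser", "RESIDENTIAL", None),  # no Playwright-capable image in this sketch
]
result = fetch_with_escalation("https://example.invalid/search", tiers)
print(result["tier"], result["attempts"])
```

Returning a record rather than raising on total failure mirrors the fail-soft behavior: one blocked state does not abort the whole run.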
The browser tier (how PA + MI work, and how to extend it)
The browser tier opens a stealth-patched browser (patchright, bundled) in headful mode, which defeats the Cloudflare/Imperva managed challenges that block a plain headless browser (verified live: PA + MI return real data this way). For Cloudflare-fronted JSON APIs it calls the API from inside the warmed page via fetch(), so the request carries the cf_clearance cookie + the real browser TLS (a Playwright APIRequestContext does not, and 403s).
- PA warms the page, then POSTs the bizfile API in-page (same shape as CA/ID/ND). MI GETs its webSearch API in-page. Both are coverage: full.
- Runs headful by default; set BROWSER_HEADLESS=1 to force headless (e.g. a server with no display and no Xvfb - but Cloudflare will then block). On the apify/actor-python-playwright image, headful runs under Xvfb automatically.
- Optionally point at an already-running anti-detect browser via browser_cdp_url (input) or BROWSER_CDP_URL (env) to inherit its session/IP over CDP - useful for a residential-IP browser you already trust.
Adding the remaining Cloudflare/SPA states (AK, AR, IL, MA, MD, MN, NC, NM, NV, OH, OK, WA) is now routine: patchright already passes their challenge, so each just needs its search API/selectors captured once and a ~20-line connector (a browser_recipe returning either an api call or a fill/submit/capture flow, plus a parser - copy PA/MI). They currently make a generic best-effort attempt (rows tagged parse_confidence: "generic").
CAPTCHA / login states (AZ, GA, NE, SC, WY, DC, DE, HI, VA) gate the search action behind a real CAPTCHA or a login - off by default:
- Set CAPTCHA_SOLVER_PROVIDER (2captcha|capsolver) + CAPTCHA_SOLVER_KEY as actor env secrets to enable the solver (AZ has a worked fill -> solve -> capture recipe; the others follow once their selectors are captured).
- VA's reCAPTCHA v3 Enterprise is unsolvable (use its bulk file); DC/DE need a login/fee and stay opt-in.
The access_notes.browser_tier record reports whether the CDP endpoint + solver are configured each run. Responsibility note: these are government public records, but solving a CAPTCHA or logging in circumvents an access control - the actor only does so when you explicitly enable it, and never pays per-record fees.
Miami (and any city) -> covered by the state connector
US business registration is at the state level, not the city level. A Miami-based company is registered with the Florida Division of Corporations (Sunbiz), so it resolves through the FL connector - there is no separate "Miami" registry to add. The same holds for every city: search the state (e.g. states: ["FL"] for Miami/Orlando/Tampa, states: ["NY"] for NYC, states: ["IL"] for Chicago). Confirmed in testing: a Walt-Disney query against FL returns "THE WALT DISNEY COMPANY" and other Florida-registered Disney entities via Sunbiz.
Responsible use / data scope
This actor is a public-data tool that reads government public records - business-entity registries that are public by law, with no adverse platform owner. It stays logged-out: no accounts, no cookies pasted by a user, no paywalls bypassed (Delaware's fee-gated detail report is opt-in and never auto-charged). It minimizes PII: officer and registered-agent names are themselves public record on these filings, and we keep only name + title + business address - we never enrich into private/personal contact data. The ownership graph is labeled an association graph (an investigative lead from publicly-filed linkages), not a verified beneficial-ownership determination. You are responsible for lawful use of the outputs - GDPR (EU) and CCPA (CA) apply to personal data even when it is public.
AI / RAG / Agent
A turn-key KYB feed for compliance copilots, diligence agents, and B2B-enrichment bots. Entities arrive pre-normalized with status, entity_type, entity_cluster_id, and resolved officers so an agent can answer "is this counterparty active, and what other entities does its CEO control?" without parsing 50 different portals. Compatible with LangChain, LlamaIndex, Pinecone, Weaviate, Chroma, and any MCP-aware agent runtime (see the sibling mcp-business-registry-intel actor for direct tool-call wiring with x402 / Skyfire agentic payments).
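As a rough illustration of the counterparty question above, the sketch below answers it from a handful of normalized records. The record shape is an assumption (officers as dicts with `name` and `title`, statuses as lowercase strings), and `counterparty_summary` is a hypothetical helper, not part of the actor.

```python
def counterparty_summary(records, entity_name):
    """Report whether the entity is active and which other entities list its CEO as an officer."""
    target = next(r for r in records if r["entity_name"] == entity_name)
    ceo_names = {o["name"].lower() for o in target["officers"] if o["title"].upper() == "CEO"}
    related = sorted(
        r["entity_name"]
        for r in records
        if r is not target and any(o["name"].lower() in ceo_names for o in r["officers"])
    )
    return {"is_active": target["status"] == "active", "ceo_other_entities": related}


records = [
    {"entity_name": "ACME HOLDINGS LLC", "status": "active",
     "officers": [{"name": "Jane Roe", "title": "CEO"}]},
    {"entity_name": "ACME LOGISTICS INC", "status": "active",
     "officers": [{"name": "JANE ROE", "title": "Director"}]},
    {"entity_name": "ZENITH FREIGHT INC", "status": "dissolved",
     "officers": [{"name": "John Doe", "title": "CEO"}]},
]
summary = counterparty_summary(records, "ACME HOLDINGS LLC")
print(summary)
```

Matching on a lowercased name is deliberately naive; two different people can share a name, so treat the result as a lead to verify rather than proof of control.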
Features
- National coverage - all 50 states + DC catalogued; 13 fully parsed today (CA, NY, FL, TX, CO, CT, OR, WI, NJ, ID, ND + PA, MI via the browser tier), the rest catalog-registered with correct access metadata and the escalation pipeline wired. Pass states: ["ALL"] for every jurisdiction.
- Automatic anti-bot escalation - httpx -> curl_cffi Chrome TLS impersonation (residential) -> optional Playwright browser (residential) -> fail-soft. Defeats the TLS-fingerprint WAFs registries use, with the proxy tier auto-selected per challenge.
- Normalized entity schema - one shape across every state, with entity_type and status mapped onto a canonical vocabulary.
- Cross-state entity resolution - canonical-name + fuzzy clustering (rapidfuzz when present, stdlib difflib fallback) with entity_cluster_id + confidence.
- Officer-to-company linking - per-person footprint with a multi_entity flag.
- Ownership/association graph - shared officer / agent / address / name-root edges with connected components.
- Monitor mode - run under an Apify Schedule and get only the change-delta (new filings, status changes, dissolutions) plus an optional Slack digest.
- Cost-control - pre-flight caps, per-run budget guard, and demo-mode soft-fail so runs finish SUCCEEDED.
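The change-delta idea behind monitor mode can be sketched as a diff between two snapshots keyed by jurisdiction and file number. The `change` labels and the `change_delta` helper are assumptions for illustration; the actor's real delta format is not documented in this README.

```python
def change_delta(previous, current):
    """Return only the records that are new or whose status changed since the last run."""
    def key(rec):
        return (rec["jurisdiction"], rec["file_number"])

    prev_by_key = {key(r): r for r in previous}
    delta = []
    for rec in current:
        old = prev_by_key.get(key(rec))
        if old is None:
            delta.append({"change": "new_filing", **rec})
        elif old["status"] != rec["status"]:
            change = "dissolution" if rec["status"] == "dissolved" else "status_change"
            delta.append({"change": change, "previous_status": old["status"], **rec})
    return delta


previous = [
    {"jurisdiction": "CA", "file_number": "1001", "status": "active"},
    {"jurisdiction": "NY", "file_number": "2002", "status": "active"},
]
current = [
    {"jurisdiction": "CA", "file_number": "1001", "status": "active"},     # unchanged
    {"jurisdiction": "NY", "file_number": "2002", "status": "dissolved"},  # dissolved
    {"jurisdiction": "FL", "file_number": "3003", "status": "active"},     # new
]
delta = change_delta(previous, current)
for row in delta:
    print(row["change"], row["jurisdiction"], row["file_number"])
```

Unchanged records drop out of the result, so a scheduled run only surfaces what moved between snapshots.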
Use cases
- KYB / AML onboarding - verify a counterparty's legal existence, status, and registered agent across states in one call; flag dissolved or suspended entities.
- PE / M&A entity mapping - resolve a target's entities across jurisdictions and map the subsidiary/sister-entity graph from shared officers and agents.
- B2B firmographics - attach the verified legal entity (and its officers) behind a Maps/website lead. Pairs with every vertical lead-finder in this portfolio.
- Investigations / journalism - find every company a person is an officer of, and the cluster of entities sharing their address or registered agent.
Pricing (Pay Per Event)
| Event | Price | What it is |
|---|---|---|
| company_record | $0.008 | One normalized, cross-state-resolved legal entity. |
| officer_director_enrichment | $0.012 | Officers/directors, agent, and addresses from the detail page. |
| ownership_graph_enrichment | $0.020 | The association graph (premium KYB layer), once per ownership_graph run. |
| scheduled_delta_run | $0.050 | One monitor-mode change digest. |
A run that returns nothing costs nothing. Far below the gated alternatives (OpenCorporates GBP 2,250-12,000/yr; D&B Direct+ $25,000+/yr).
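For budgeting, the event prices in the table can be turned into a quick estimate. This sketch covers only the pay-per-event charges listed above; `estimate_cost` is a hypothetical helper, and any Apify platform usage billed separately is outside its scope.

```python
from decimal import Decimal

# Event prices copied from the pricing table above (USD per event).
PRICES = {
    "company_record": Decimal("0.008"),
    "officer_director_enrichment": Decimal("0.012"),
    "ownership_graph_enrichment": Decimal("0.020"),
    "scheduled_delta_run": Decimal("0.050"),
}


def estimate_cost(event_counts):
    """Sum price * count over the events a run is expected to emit."""
    unknown = set(event_counts) - set(PRICES)
    if unknown:
        raise ValueError(f"unknown events: {sorted(unknown)}")
    return sum((PRICES[event] * count for event, count in event_counts.items()), Decimal("0"))


# 1,000 entities, 200 of them enriched with officer data, one ownership graph.
total = estimate_cost({
    "company_record": 1000,
    "officer_director_enrichment": 200,
    "ownership_graph_enrichment": 1,
})
print(f"${total}")  # $10.420
```

`Decimal` is used instead of floats so that fractional-cent prices add up exactly.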
Related actors
- sec-edgar-intel - federal SEC filings (Form D issuer <-> legal entity).
- hiring-signal-intel - hiring surges per company.
- b2b-sales-triggers - buying-signal triggers.
- mcp-business-registry-intel - the MCP twin for AI agents (x402 / Skyfire ready).