Ofsted Providers — Companies House Enrichment avatar

Ofsted Providers — Companies House Enrichment

Pricing

from $5.00 / 1,000 results

Go to Apify Store
Ofsted Providers — Companies House Enrichment

Ofsted Providers — Companies House Enrichment

Syncs UK care provider names from your Ofsted reports database, matches each one against Companies House using fuzzy search, and stores verified company profiles, registered addresses, and active directors back into your MySQL database — automatically skipping duplicates and aliases.

Pricing

from $5.00 / 1,000 results

Rating

0.0

(0)

Developer

Alkausari M

Alkausari M

Maintained by Community

Actor stats

0

Bookmarked

1

Total users

0

Monthly active users

3 days ago

Last modified

Share

Ofsted Providers — Companies House Enrichment Actor

Automatically enriches your Ofsted reports database with verified Companies House data. The actor reads provider names from your ofsted_reports table, fuzzy-matches each one against the UK Companies House API, and writes the resolved company profile, registered address, and active directors into dedicated MySQL tables — with built-in deduplication, alias tracking, and no-match memory so every run only processes what still needs doing.


How It Works — End to End

1. Read Provider Names from ofsted_reports

At startup the actor connects to your MySQL database and runs:

SELECT DISTINCT LOWER(TRIM(provider_name))
FROM ofsted_reports
WHERE provider_name IS NOT NULL
AND TRIM(provider_name) != ''

This gives a deduplicated, lowercased list of every care provider name that exists in your Ofsted data. These are the names that need to be matched to real Companies House records.


2. Filter Out Known Aliases

Before inserting anything, the actor loads all rows from ofsted_provider_aliases into memory and excludes any provider name that is already recorded there. This prevents name variants (e.g. "foo & bar ltd" vs "foo and bar ltd") from being queued for a redundant API lookup when the underlying company has already been resolved under a different spelling.

Only genuinely new, unseen names are inserted into ofsted_providers via INSERT IGNORE.


3. Determine Which Providers Still Need Processing

Depending on the process_all input flag, the actor fetches one of two sets of names from ofsted_providers:

ModeSQL conditionUse case
process_all = false (default)company_number IS NULL AND search_count < 1Normal incremental run — only unmatched, never-searched providers
process_all = trueAll rows in ofsted_providersFull re-scrape of every provider including already-matched ones

The search_count field is the key to avoiding repeated failed lookups — see section 6.


4. Search Companies House (Fuzzy Match)

For each provider name, the actor calls:

GET https://api.company-information.service.gov.uk/search/companies?q={name}

The top results are scored against the searched name using fuzzy token sort ratio (via rapidfuzz). Before scoring, both the searched name and each candidate are normalised:

  • Uppercased, non-alphanumeric characters stripped
  • Common suffixes expanded: LTD → LIMITED, PLC → PUBLIC LIMITED COMPANY, CO → COMPANY, INC → INCORPORATED

The highest-scoring candidate is selected. If its score is below 85 (out of 100), the match is rejected as too uncertain.

Examples of what the fuzzy match handles correctly:

Searched nameMatched nameScore
acorns to oaks children's care & support ltdACORNS TO OAKS CHILDREN'S CARE & SUPPORT LTD100
affinity childrens care ltdAFFINITY CHILDRENS CARE LIMITED100
1st class support and housing ltd1ST CLASS SUPPORT & HOUSING LTD94
alliance care and education ltdALLIANCE FOR CARE & EDUCATION LIMITED88

5. Fetch Company Details in Parallel

Once a company number is confirmed, three API calls are made concurrently using asyncio.gather:

GET /company/{number} → company profile
GET /company/{number}/registered-office-address → registered address
GET /company/{number}/officers → directors and officers

Only active officers are kept — anyone with a resigned_on date is discarded. If latest_director_only is enabled, only the most recently appointed active director is retained.

Officer names from Companies House arrive in the format LASTNAME, Firstname Middle and are fully parsed into separate title, first_name, middle_name, and last_name fields.

The registered address is also scanned to extract a door_number field (e.g. "92 London Road""92") which is useful for address matching workflows.


6. Handle No-Match Results — search_count

When a provider name produces no strong match (score < 85), or when the API returns no results at all, the actor calls increment_search_count() which sets search_count = 1 for that row in ofsted_providers.

On all future runs, get_missing_provider_names() filters with search_count < 1, so these exhausted names are permanently skipped without making any API calls. This is what prevents the actor from wasting time re-searching the same unmatchable names on every run.

To manually retry a previously failed provider, reset its search_count to 0 in the database:

UPDATE ofsted_providers SET search_count = 0 WHERE searched_name = 'provider name here';

7. Resolve Duplicate Company Numbers — Alias System

The same company can appear in ofsted_reports under slightly different names across different inspection records, for example:

  • "a&m care solutions ltd - t/a metropolitan care"
  • "a&m care solutions ltd - t/a metropolitan care solutions"
  • "a & m care solutions ltd - t/a metropolitan care solutions"

All three resolve to the same Companies House number. When the actor matches a company and finds that another row in ofsted_providers already holds that company number under a different searched_name, it:

  1. Saves the stale name into ofsted_provider_aliases linked to that company number
  2. Deletes the stale row from ofsted_providers
  3. Updates the canonical row (the current searched_name) with the full company data

On all future runs, sync_provider_names() sees those stale names in the aliases table and skips inserting them into ofsted_providers entirely — they never reach the API queue again.


8. Write Results to the Database

A successful match triggers an UPDATE on the existing ofsted_providers row (seeded in step 2) and an INSERT ... ON DUPLICATE KEY UPDATE for each active director in ofsted_directors.

Directors removed from the active list since the last run are automatically deleted from ofsted_directors.


Database Schema

ofsted_providers

Stores one row per unique provider name from ofsted_reports. Rows start with only searched_name populated and are filled in when a match is found.

ColumnTypeDescription
idBIGINTAuto-increment primary key
searched_nameVARCHAR(255)Provider name as it appears in ofsted_reports (lowercased)
matched_nameVARCHAR(255)Official Companies House name
company_numberVARCHAR(20)Companies House unique identifier (nullable until matched)
company_statusVARCHAR(50)e.g. active, dissolved, liquidation
company_typeVARCHAR(50)e.g. ltd, plc, llp
creation_dateDATEDate of incorporation
sic_codes_jsonJSONStandard Industrial Classification codes
address_line_1VARCHAR(255)First line of registered address
address_line_2VARCHAR(255)Second line of registered address
localityVARCHAR(100)Town or city
regionVARCHAR(100)County or region
countryVARCHAR(100)Country
postal_codeVARCHAR(20)UK postcode
search_countTINYINT0 = not yet searched, 1 = searched but no match found
created_atDATETIMERow creation timestamp
updated_atDATETIMELast update timestamp

ofsted_directors

Stores active officers for each matched company. Foreign-keyed to ofsted_providers.company_number with ON DELETE CASCADE.

ColumnTypeDescription
idBIGINTAuto-increment primary key
company_numberVARCHAR(20)FK → ofsted_providers.company_number
full_nameVARCHAR(255)Full name as returned by Companies House
titleVARCHAR(50)Salutation (Mr, Dr, etc.)
first_nameVARCHAR(100)Parsed first name
middle_nameVARCHAR(100)Parsed middle name(s)
last_nameVARCHAR(100)Parsed last name
roleVARCHAR(50)e.g. director, secretary
appointed_onDATEAppointment date
nationalityVARCHAR(100)Declared nationality
service_address_premisesVARCHAR(100)Building name/number
service_address_address_line_1VARCHAR(255)First address line
service_address_address_line_2VARCHAR(255)Second address line
service_address_localityVARCHAR(100)Town or city
service_address_regionVARCHAR(100)County or region
service_address_countryVARCHAR(100)Country
service_address_postal_codeVARCHAR(20)Postcode
created_atDATETIMERow creation timestamp
updated_atDATETIMELast update timestamp

ofsted_provider_aliases

Stores variant names that have already been resolved to a company number. Used by sync_provider_names() to skip redundant lookups.

ColumnTypeDescription
idBIGINTAuto-increment primary key
alias_nameVARCHAR(255)The variant name (unique)
company_numberVARCHAR(20)The company it resolved to
created_atDATETIMEWhen the alias was recorded

Input Reference

FieldTypeRequiredDefaultDescription
api_keyStringCompanies House API key
db_hostStringMySQL host address
db_databaseStringDatabase name
db_userStringDatabase username
db_passwordStringDatabase password
process_allBooleanfalseRe-process all providers including already-matched ones
latest_director_onlyBooleanfalseStore only the most recently appointed active director per company
request_delayInteger1Seconds to wait between companies (increase for large batches)

Example Input

{
"api_key": "your-companies-house-api-key",
"db_host": "your-db-host",
"db_database": "your-database",
"db_user": "your-db-user",
"db_password": "your-db-password",
"request_delay": 4,
"process_all": false,
"latest_director_only": false
}

Output

Results are pushed to the Apify dataset and written to the database on every successful match. Each dataset record looks like this:

{
"searched_name": "almoner care group ltd",
"matched_name": "ALMONER CARE GROUP LIMITED",
"company_number": "13745628",
"registered_address": {
"address_line_1": "Seymour Chambers",
"address_line_2": "92 London Road",
"locality": "Liverpool",
"region": "Merseyside",
"country": "United Kingdom",
"postal_code": "L3 5NW",
"door_number": "92"
},
"active_directors": [
{
"full_name": "CONNOLLY, Frederick James Alfred",
"title": "",
"first_name": "Frederick",
"middle_name": "James Alfred",
"last_name": "CONNOLLY",
"role": "director",
"appointed_on": "2023-02-13",
"nationality": "British",
"service_address": {
"premises": "Seymour Chambers",
"address_line_1": "92 London Road",
"locality": "Liverpool",
"region": "Merseyside",
"country": "England",
"postal_code": "L3 5NW"
}
}
],
"company_profile": {
"status": "active",
"type": "ltd",
"creation_date": "2021-11-16",
"sic_codes": ["87900"]
}
}

Automatic Schema Migration

The actor runs migrate_tables() on every startup using SHOW COLUMNS (not INFORMATION_SCHEMA, which can cause metadata lock hangs). It applies any missing schema changes to existing tables without dropping or recreating them:

  • Makes company_number nullable if it was previously a NOT NULL primary key
  • Adds the search_count column and index if it does not exist

This means the actor is safe to deploy against an existing database from a previous version — it upgrades the schema automatically.


Getting a Companies House API Key

  1. Register at developer.company-information.service.gov.uk
  2. Create a new application and select Live environment
  3. Copy the API key and paste it into the api_key input field

The free tier allows up to 600 requests per minute and is sufficient for all normal batch sizes.


Rate Limiting

The actor handles HTTP 429 Too Many Requests responses with automatic exponential backoff — retrying up to 3 times at 5 s, 10 s, and 20 s intervals. The request_delay input (default 1 second) adds an additional pause between companies. For batches over 500 providers, setting request_delay to 3 or 4 is recommended.