Ofsted Providers — Companies House Enrichment
Pricing
from $5.00 / 1,000 results
Ofsted Providers — Companies House Enrichment
Syncs UK care provider names from your Ofsted reports database, matches each one against Companies House using fuzzy search, and stores verified company profiles, registered addresses, and active directors back into your MySQL database — automatically skipping duplicates and aliases.
Pricing
from $5.00 / 1,000 results
Rating
0.0
(0)
Developer
Alkausari M
Actor stats
0
Bookmarked
1
Total users
0
Monthly active users
3 days ago
Last modified
Categories
Share
Ofsted Providers — Companies House Enrichment Actor
Automatically enriches your Ofsted reports database with verified Companies House data. The actor reads provider names from your ofsted_reports table, fuzzy-matches each one against the UK Companies House API, and writes the resolved company profile, registered address, and active directors into dedicated MySQL tables — with built-in deduplication, alias tracking, and no-match memory so every run only processes what still needs doing.
How It Works — End to End
1. Read Provider Names from ofsted_reports
At startup the actor connects to your MySQL database and runs:
SELECT DISTINCT LOWER(TRIM(provider_name))FROM ofsted_reportsWHERE provider_name IS NOT NULLAND TRIM(provider_name) != ''
This gives a deduplicated, lowercased list of every care provider name that exists in your Ofsted data. These are the names that need to be matched to real Companies House records.
2. Filter Out Known Aliases
Before inserting anything, the actor loads all rows from ofsted_provider_aliases into memory and excludes any provider name that is already recorded there. This prevents name variants (e.g. "foo & bar ltd" vs "foo and bar ltd") from being queued for a redundant API lookup when the underlying company has already been resolved under a different spelling.
Only genuinely new, unseen names are inserted into ofsted_providers via INSERT IGNORE.
3. Determine Which Providers Still Need Processing
Depending on the process_all input flag, the actor fetches one of two sets of names from ofsted_providers:
| Mode | SQL condition | Use case |
|---|---|---|
process_all = false (default) | company_number IS NULL AND search_count < 1 | Normal incremental run — only unmatched, never-searched providers |
process_all = true | All rows in ofsted_providers | Full re-scrape of every provider including already-matched ones |
The search_count field is the key to avoiding repeated failed lookups — see section 6.
4. Search Companies House (Fuzzy Match)
For each provider name, the actor calls:
GET https://api.company-information.service.gov.uk/search/companies?q={name}
The top results are scored against the searched name using fuzzy token sort ratio (via rapidfuzz). Before scoring, both the searched name and each candidate are normalised:
- Uppercased, non-alphanumeric characters stripped
- Common suffixes expanded:
LTD → LIMITED,PLC → PUBLIC LIMITED COMPANY,CO → COMPANY,INC → INCORPORATED
The highest-scoring candidate is selected. If its score is below 85 (out of 100), the match is rejected as too uncertain.
Examples of what the fuzzy match handles correctly:
| Searched name | Matched name | Score |
|---|---|---|
acorns to oaks children's care & support ltd | ACORNS TO OAKS CHILDREN'S CARE & SUPPORT LTD | 100 |
affinity childrens care ltd | AFFINITY CHILDRENS CARE LIMITED | 100 |
1st class support and housing ltd | 1ST CLASS SUPPORT & HOUSING LTD | 94 |
alliance care and education ltd | ALLIANCE FOR CARE & EDUCATION LIMITED | 88 |
5. Fetch Company Details in Parallel
Once a company number is confirmed, three API calls are made concurrently using asyncio.gather:
GET /company/{number} → company profileGET /company/{number}/registered-office-address → registered addressGET /company/{number}/officers → directors and officers
Only active officers are kept — anyone with a resigned_on date is discarded. If latest_director_only is enabled, only the most recently appointed active director is retained.
Officer names from Companies House arrive in the format LASTNAME, Firstname Middle and are fully parsed into separate title, first_name, middle_name, and last_name fields.
The registered address is also scanned to extract a door_number field (e.g. "92 London Road" → "92") which is useful for address matching workflows.
6. Handle No-Match Results — search_count
When a provider name produces no strong match (score < 85), or when the API returns no results at all, the actor calls increment_search_count() which sets search_count = 1 for that row in ofsted_providers.
On all future runs, get_missing_provider_names() filters with search_count < 1, so these exhausted names are permanently skipped without making any API calls. This is what prevents the actor from wasting time re-searching the same unmatchable names on every run.
To manually retry a previously failed provider, reset its search_count to 0 in the database:
UPDATE ofsted_providers SET search_count = 0 WHERE searched_name = 'provider name here';
7. Resolve Duplicate Company Numbers — Alias System
The same company can appear in ofsted_reports under slightly different names across different inspection records, for example:
"a&m care solutions ltd - t/a metropolitan care""a&m care solutions ltd - t/a metropolitan care solutions""a & m care solutions ltd - t/a metropolitan care solutions"
All three resolve to the same Companies House number. When the actor matches a company and finds that another row in ofsted_providers already holds that company number under a different searched_name, it:
- Saves the stale name into
ofsted_provider_aliaseslinked to that company number - Deletes the stale row from
ofsted_providers - Updates the canonical row (the current
searched_name) with the full company data
On all future runs, sync_provider_names() sees those stale names in the aliases table and skips inserting them into ofsted_providers entirely — they never reach the API queue again.
8. Write Results to the Database
A successful match triggers an UPDATE on the existing ofsted_providers row (seeded in step 2) and an INSERT ... ON DUPLICATE KEY UPDATE for each active director in ofsted_directors.
Directors removed from the active list since the last run are automatically deleted from ofsted_directors.
Database Schema
ofsted_providers
Stores one row per unique provider name from ofsted_reports. Rows start with only searched_name populated and are filled in when a match is found.
| Column | Type | Description |
|---|---|---|
id | BIGINT | Auto-increment primary key |
searched_name | VARCHAR(255) | Provider name as it appears in ofsted_reports (lowercased) |
matched_name | VARCHAR(255) | Official Companies House name |
company_number | VARCHAR(20) | Companies House unique identifier (nullable until matched) |
company_status | VARCHAR(50) | e.g. active, dissolved, liquidation |
company_type | VARCHAR(50) | e.g. ltd, plc, llp |
creation_date | DATE | Date of incorporation |
sic_codes_json | JSON | Standard Industrial Classification codes |
address_line_1 | VARCHAR(255) | First line of registered address |
address_line_2 | VARCHAR(255) | Second line of registered address |
locality | VARCHAR(100) | Town or city |
region | VARCHAR(100) | County or region |
country | VARCHAR(100) | Country |
postal_code | VARCHAR(20) | UK postcode |
search_count | TINYINT | 0 = not yet searched, 1 = searched but no match found |
created_at | DATETIME | Row creation timestamp |
updated_at | DATETIME | Last update timestamp |
ofsted_directors
Stores active officers for each matched company. Foreign-keyed to ofsted_providers.company_number with ON DELETE CASCADE.
| Column | Type | Description |
|---|---|---|
id | BIGINT | Auto-increment primary key |
company_number | VARCHAR(20) | FK → ofsted_providers.company_number |
full_name | VARCHAR(255) | Full name as returned by Companies House |
title | VARCHAR(50) | Salutation (Mr, Dr, etc.) |
first_name | VARCHAR(100) | Parsed first name |
middle_name | VARCHAR(100) | Parsed middle name(s) |
last_name | VARCHAR(100) | Parsed last name |
role | VARCHAR(50) | e.g. director, secretary |
appointed_on | DATE | Appointment date |
nationality | VARCHAR(100) | Declared nationality |
service_address_premises | VARCHAR(100) | Building name/number |
service_address_address_line_1 | VARCHAR(255) | First address line |
service_address_address_line_2 | VARCHAR(255) | Second address line |
service_address_locality | VARCHAR(100) | Town or city |
service_address_region | VARCHAR(100) | County or region |
service_address_country | VARCHAR(100) | Country |
service_address_postal_code | VARCHAR(20) | Postcode |
created_at | DATETIME | Row creation timestamp |
updated_at | DATETIME | Last update timestamp |
ofsted_provider_aliases
Stores variant names that have already been resolved to a company number. Used by sync_provider_names() to skip redundant lookups.
| Column | Type | Description |
|---|---|---|
id | BIGINT | Auto-increment primary key |
alias_name | VARCHAR(255) | The variant name (unique) |
company_number | VARCHAR(20) | The company it resolved to |
created_at | DATETIME | When the alias was recorded |
Input Reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | String | ✅ | — | Companies House API key |
db_host | String | ✅ | — | MySQL host address |
db_database | String | ✅ | — | Database name |
db_user | String | ✅ | — | Database username |
db_password | String | ✅ | — | Database password |
process_all | Boolean | ❌ | false | Re-process all providers including already-matched ones |
latest_director_only | Boolean | ❌ | false | Store only the most recently appointed active director per company |
request_delay | Integer | ❌ | 1 | Seconds to wait between companies (increase for large batches) |
Example Input
{"api_key": "your-companies-house-api-key","db_host": "your-db-host","db_database": "your-database","db_user": "your-db-user","db_password": "your-db-password","request_delay": 4,"process_all": false,"latest_director_only": false}
Output
Results are pushed to the Apify dataset and written to the database on every successful match. Each dataset record looks like this:
{"searched_name": "almoner care group ltd","matched_name": "ALMONER CARE GROUP LIMITED","company_number": "13745628","registered_address": {"address_line_1": "Seymour Chambers","address_line_2": "92 London Road","locality": "Liverpool","region": "Merseyside","country": "United Kingdom","postal_code": "L3 5NW","door_number": "92"},"active_directors": [{"full_name": "CONNOLLY, Frederick James Alfred","title": "","first_name": "Frederick","middle_name": "James Alfred","last_name": "CONNOLLY","role": "director","appointed_on": "2023-02-13","nationality": "British","service_address": {"premises": "Seymour Chambers","address_line_1": "92 London Road","locality": "Liverpool","region": "Merseyside","country": "England","postal_code": "L3 5NW"}}],"company_profile": {"status": "active","type": "ltd","creation_date": "2021-11-16","sic_codes": ["87900"]}}
Automatic Schema Migration
The actor runs migrate_tables() on every startup using SHOW COLUMNS (not INFORMATION_SCHEMA, which can cause metadata lock hangs). It applies any missing schema changes to existing tables without dropping or recreating them:
- Makes
company_numbernullable if it was previously aNOT NULLprimary key - Adds the
search_countcolumn and index if it does not exist
This means the actor is safe to deploy against an existing database from a previous version — it upgrades the schema automatically.
Getting a Companies House API Key
- Register at developer.company-information.service.gov.uk
- Create a new application and select Live environment
- Copy the API key and paste it into the
api_keyinput field
The free tier allows up to 600 requests per minute and is sufficient for all normal batch sizes.
Rate Limiting
The actor handles HTTP 429 Too Many Requests responses with automatic exponential backoff — retrying up to 3 times at 5 s, 10 s, and 20 s intervals. The request_delay input (default 1 second) adds an additional pause between companies. For batches over 500 providers, setting request_delay to 3 or 4 is recommended.