UK Companies House Bulk Scraper
Pricing
from $1.50 / 1,000 results
Extract UK companies from the official Companies House register. 5M+ businesses by industry (SIC code), location, age, status. Monthly bulk snapshot. No API key. B2B lead-gen, KYC, recruitment, market research. $1/1K companies.
Developer: Logiover (maintained by Community)
Extract any slice of the UK Companies House register — 5+ million UK businesses — from the official monthly bulk snapshot. Filter by SIC industry code, company name, status, category, postcode, county, country, and incorporation date. No API key required. The most complete and cost-efficient way to build UK B2B lead lists, market maps, and business intelligence datasets.
What Is This Actor?
Companies House is the UK government's official registrar of companies. Every company registered in the UK, from one-person limited companies to FTSE 100 giants, appears in the register. Companies House publishes a free monthly bulk snapshot of the entire register as a single ZIP file (~500 MB compressed, ~2 GB uncompressed CSV) containing over 5 million rows.
This actor downloads that snapshot, streams through it efficiently, applies your filters, and saves only the matching companies to your dataset — without ever storing the full 5M-row file anywhere you'd have to manage.
Built for:
- 🏢 B2B lead generation — build targeted prospect lists of UK businesses by industry, location, and size
- 📊 Market research — map the UK business landscape by sector, region, or company type
- 🆕 New business prospecting — find freshly incorporated companies before your competitors do
- 🔍 Due diligence & compliance — verify company registration details, status, and SIC codes
- 📈 Investor sourcing — monitor company formation rates and identify growing sectors
- 🗺️ Geographic analysis — build postcode or county-level business density datasets
- ⚖️ Competitive intelligence — identify all registered players in a specific SIC category
- 🤝 Partnership prospecting — find companies in complementary industries by SIC code
Why Use the Bulk Snapshot?
Companies House offers two ways to access company data:
| Method | Speed | Cost | Coverage | Filters |
|---|---|---|---|---|
| Companies House API | Per-company lookups | £0 but slow at scale | Any company | Post-fetch only |
| Bulk snapshot (this actor) | Streams 5M rows in ~10 min | Free | All 5M+ companies | Applied by the actor while streaming |
The bulk snapshot is the only practical way to extract thousands or millions of company records with complex multi-field filters. The API would require one HTTP request per company, making a full sector scan prohibitively slow and expensive.
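To put "prohibitively slow" in numbers, here is a back-of-the-envelope estimate. It assumes the Companies House REST API's published default rate limit of 600 requests per five-minute window (check the current figure in the API documentation) and one request per company for the ~5 million companies in the register:

$$
t \approx \frac{5{,}000{,}000}{600} \times 5\ \text{min} \approx 41{,}667\ \text{min} \approx 28.9\ \text{days}
$$

That is roughly a month of continuous requests, compared with the 15–25 minutes a full bulk scan typically takes.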
Features
- Full UK register — 5+ million companies including active, dissolved, liquidated, and dormant entities
- Official government source — data comes directly from download.companieshouse.gov.uk
- No API key required — the bulk snapshot is a free public download
- Auto-detect latest snapshot — automatically finds and downloads the most recent monthly file
- SIC code prefix filter — filter by industry using Standard Industrial Classification codes
- SIC keyword filter — match companies by free-text within SIC descriptions
- Company name filter — substring match on company name
- Status filter — Active, Dissolved, Liquidation, etc.
- Category filter — Private Limited Company, Public Limited Company, LLP, etc.
- Postcode prefix filter — target specific London boroughs, cities, or regions
- Post town filter — exact city/town match
- County filter — substring match on county
- Country filter — England, Scotland, Wales, Northern Ireland
- Incorporation date range — find new companies or established businesses
- Streaming processing — processes the 5M-row CSV without loading it all into memory
- Batch dataset writes — 500-record batches for fast, efficient output
- Full address data — care of, PO box, address lines, post town, county, country, postcode
- Financial data — accounts due dates, last filed dates, account category
- Mortgage/charges data — number of charges, outstanding, satisfied
- Previous names — up to 10 previous company names with change dates
- Confirmation statement dates — next due and last made-up dates
- Direct Companies House profile URL — one-click access to the official register entry
- Direct Companies House API URL — ready for further enrichment via the CH REST API
- Two dataset views — Overview and Recently Incorporated
Output Data
Each record represents one UK registered company that matched your filters.
Top-Level Fields
| Field | Type | Description |
|---|---|---|
companyName | string | Registered company name |
companyNumber | string | Unique Companies House registration number |
status | string | "Active", "Dissolved", "Liquidation", "Administration", etc. |
category | string | "Private Limited Company", "Public Limited Company", "LLP", etc. |
incorporationDate | string or null | Date of incorporation (YYYY-MM-DD) |
dissolution_date | string or null | Date of dissolution, if applicable (YYYY-MM-DD) |
country_of_origin | string or null | Country of origin (for foreign companies) |
address | object | Registered office address (see below) |
sicCodes | array | List of SIC code descriptions (up to 4) |
accounts | object | Accounts filing dates and category (see below) |
returns | object | Annual return dates (see below) |
mortgages | object | Mortgage/charge counts (see below) |
previousNames | array | Previous company name history (see below) |
confirmationStatement | object | Confirmation statement due and last made-up dates |
url | string | Direct link to the Companies House public register entry |
apiUrl | string | Direct link to the Companies House REST API for this company |
Address Object
{"careOf": null,"poBox": null,"line1": "123 Tech Street","line2": "Shoreditch","postTown": "LONDON","county": "GREATER LONDON","country": "ENGLAND","postCode": "EC2A 1AB"}
Accounts Object
{"referenceDay": "31","referenceMonth": "12","nextDueDate": "2025-09-30","lastMadeUpDate": "2024-12-31","category": "TOTAL EXEMPTION FULL"}
Mortgages Object
{"numCharges": 3,"numOutstanding": 1,"numPartSatisfied": 0,"numSatisfied": 2}
Previous Names Array
[{ "changedOn": "2021-03-15", "name": "ACME TECH LIMITED" },{ "changedOn": "2018-07-01", "name": "ACME CONSULTING LIMITED" }]
Sample Full Output Record
{"companyName": "ACME TECHNOLOGIES LIMITED","companyNumber": "12345678","status": "Active","category": "Private Limited Company","incorporationDate": "2020-06-01","dissolution_date": null,"country_of_origin": null,"address": {"careOf": null,"poBox": null,"line1": "10 Fintech Square","line2": null,"postTown": "LONDON","county": "GREATER LONDON","country": "ENGLAND","postCode": "EC2A 1AB"},"sicCodes": ["62012 - Business and domestic software development"],"accounts": {"referenceDay": "30","referenceMonth": "6","nextDueDate": "2025-03-31","lastMadeUpDate": "2024-06-30","category": "TOTAL EXEMPTION FULL"},"returns": {"nextDueDate": null,"lastMadeUpDate": null},"mortgages": {"numCharges": 0,"numOutstanding": 0,"numPartSatisfied": 0,"numSatisfied": 0},"previousNames": [],"confirmationStatement": {"nextDueDate": "2026-06-14","lastMadeUpDate": "2025-06-01"},"url": "https://find-and-update.company-information.service.gov.uk/company/12345678","apiUrl": "https://api.company-information.service.gov.uk/company/12345678"}
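The record above is the output of a normalization step. The sketch below shows how such a step could map one CSV row to part of that shape. It is a hypothetical illustration, not the actor's code: the bulk-file column names (for example "RegAddress.PostTown" and "SICCode.SicText_1") are assumptions modeled on the published file header, and only a subset of fields is covered.

```javascript
// Hypothetical normalize(): map one (header-trimmed) bulk CSV row to part of
// the output record. ASSUMPTION: column names are modeled on the bulk file
// header and may differ in practice.
const PROFILE_BASE = "https://find-and-update.company-information.service.gov.uk/company/";
const API_BASE = "https://api.company-information.service.gov.uk/company/";

// Empty or missing CSV cells become null, as in the sample record.
const orNull = (value) => {
  const v = (value ?? "").trim();
  return v === "" ? null : v;
};

// The bulk file stores dates as DD/MM/YYYY; the output uses YYYY-MM-DD.
function isoDate(ukDate) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec((ukDate ?? "").trim());
  return m ? `${m[3]}-${m[2]}-${m[1]}` : null;
}

function normalize(row) {
  const companyNumber = (row["CompanyNumber"] ?? "").trim();
  return {
    companyName: (row["CompanyName"] ?? "").trim(),
    companyNumber,
    status: orNull(row["CompanyStatus"]),
    category: orNull(row["CompanyCategory"]),
    incorporationDate: isoDate(row["IncorporationDate"]),
    address: {
      careOf: orNull(row["RegAddress.CareOf"]),
      poBox: orNull(row["RegAddress.POBox"]),
      line1: orNull(row["RegAddress.AddressLine1"]),
      line2: orNull(row["RegAddress.AddressLine2"]),
      postTown: orNull(row["RegAddress.PostTown"]),
      county: orNull(row["RegAddress.County"]),
      country: orNull(row["RegAddress.Country"]),
      postCode: orNull(row["RegAddress.PostCode"]),
    },
    // Keep only the non-empty SIC entries (up to 4 columns).
    sicCodes: [1, 2, 3, 4]
      .map((i) => orNull(row[`SICCode.SicText_${i}`]))
      .filter((s) => s !== null),
    url: PROFILE_BASE + companyNumber,
    apiUrl: API_BASE + companyNumber,
  };
}
```

Accounts, returns, mortgages, previous names, and confirmation statement fields would follow the same pattern of trimming, null-ing empty cells, and converting dates.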
SIC Codes — Industry Classification
The UK uses Standard Industrial Classification (SIC) 2007 codes to classify business activities. Each company can have up to 4 SIC codes. The full list of UK SIC codes is available at resources.companieshouse.gov.uk/sic/.
Useful SIC code prefixes for common use cases:
| Prefix | Industry |
|---|---|
62 | IT and Software Development |
63 | Information Service Activities (data, web portals) |
58 | Publishing (software publishing: 582) |
70 | Management Consulting |
73 | Advertising and Market Research |
64 | Financial Services and Banking |
66 | Insurance and Pension Funding |
71 | Architectural and Engineering |
72 | Scientific Research and Development |
47 | Retail Trade |
41 | Construction of Buildings |
86 | Human Health Activities |
85 | Education |
56 | Food and Beverage Service Activities |
55 | Accommodation |
74 | Other Professional and Scientific Activities |
77 | Rental and Leasing |
80 | Security and Investigation |
Example: sicCodePrefixes: ["62", "63"] returns all UK IT companies.
Input Configuration
snapshotMonth · string · format: YYYY-MM · default: "" (latest)
Which monthly snapshot to use. Leave blank to automatically use the most recent available snapshot. Companies House publishes a new snapshot around the 1st of each month.
Examples: "2025-05", "2024-12", "2024-01"
If the specified month is not available, the actor throws an error and lists the available snapshot dates.
sicCodePrefixes · array of strings · default: []
Filter by SIC code prefix. Each SIC entry in the bulk file is formatted as "62012 - Business and domestic software development". The prefix is matched against the numeric portion at the start.
Multiple prefixes are OR-ed — a company matches if any of its up to 4 SIC codes starts with any of the provided prefixes.
Examples:
- ["62"] — all IT and software companies (SIC 62xxx)
- ["62", "63"] — IT + information services
- ["70", "73"] — management consulting + advertising
- ["6201"] — only 6201x software development codes (more specific)
Leave empty to skip SIC code filtering.
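A minimal sketch of the prefix rule described above, assuming SIC entries arrive in the "62012 - Business and domestic software development" format. The function names are hypothetical; the actor's internal implementation may differ.

```javascript
// Extract the numeric code before the " - " separator, e.g. "62012".
function sicCode(entry) {
  return entry.split(" - ")[0].trim();
}

// A company matches if ANY of its SIC codes starts with ANY prefix (OR logic).
function matchesSicPrefixes(sicEntries, prefixes) {
  if (prefixes.length === 0) return true; // empty list = no SIC filtering
  return sicEntries
    .filter((entry) => entry && entry.trim() !== "")
    .some((entry) => {
      const code = sicCode(entry);
      return prefixes.some((prefix) => code.startsWith(prefix));
    });
}

const company = [
  "62012 - Business and domestic software development",
  "70229 - Management consultancy activities other than financial management",
];
console.log(matchesSicPrefixes(company, ["62"]));   // true
console.log(matchesSicPrefixes(company, ["6201"])); // true
console.log(matchesSicPrefixes(company, ["73"]));   // false
console.log(matchesSicPrefixes(company, []));       // true
```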
sicKeywords · array of strings · default: []
Filter by free-text keyword within the SIC description. Case-insensitive substring match. Multiple keywords are OR-ed.
Examples:
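- ["software"] — matches any SIC description containing "software"
- ["consultancy"] — matches descriptions such as "Management consultancy activities other than financial management" and "Information technology consultancy activities"
- ["financial", "credit"] — matches finance-related descriptions such as "Other credit granting n.e.c."
Keywords are matched against the fixed SIC 2007 description wording, so a keyword must appear in that wording: "consulting" does not match "consultancy", and newer terms such as "fintech" do not appear in any SIC description.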
Combine with sicCodePrefixes to narrow further (both filters are applied with AND logic).
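The sketch below illustrates how the two SIC filters combine: values are OR-ed within each filter, and the two filters are AND-ed together. Function names are hypothetical. The example also shows that a keyword must appear in the official description text, so "consulting" does not match a description that says "consultancy".

```javascript
// Case-insensitive keyword match against the concatenated SIC descriptions (OR).
function matchesSicKeywords(sicEntries, keywords) {
  if (keywords.length === 0) return true; // filter not active
  const text = sicEntries.join(" ").toLowerCase();
  return keywords.some((kw) => text.includes(kw.toLowerCase()));
}

// Prefix match on the code at the start of each entry (OR).
function matchesSicPrefixes(sicEntries, prefixes) {
  if (prefixes.length === 0) return true; // filter not active
  return sicEntries.some((entry) => prefixes.some((p) => entry.trim().startsWith(p)));
}

// Both SIC filters must pass (AND).
function matchesSic(sicEntries, { sicCodePrefixes = [], sicKeywords = [] } = {}) {
  return (
    matchesSicPrefixes(sicEntries, sicCodePrefixes) &&
    matchesSicKeywords(sicEntries, sicKeywords)
  );
}

const entries = ["62020 - Information technology consultancy activities"];
console.log(matchesSic(entries, { sicCodePrefixes: ["62"], sicKeywords: ["consultancy"] })); // true
console.log(matchesSic(entries, { sicCodePrefixes: ["62"], sicKeywords: ["consulting"] }));  // false
console.log(matchesSic(entries, { sicCodePrefixes: ["70"], sicKeywords: ["consultancy"] })); // false
```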
nameContains · string · default: ""
Substring match on company name. Case-insensitive.
Examples:
- "TECH" — matches "FINTECH LTD", "TECHWAVE LIMITED", "ULTRATECH SOLUTIONS"
- "AI" — matches any company with "AI" anywhere in the name, including inside words such as "RETAIL" or "MAINTENANCE"
- "GROUP" — matches holding groups and parent companies
Leave blank to include all company names.
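A sketch of the documented case-insensitive substring rule, plus a hypothetical whole-word variant that is not an actor option. It shows why a short value such as "AI" also matches names like "ACME RETAIL LTD"; if you need whole-word matches, you could post-filter the dataset along these lines.

```javascript
// Documented rule: case-insensitive substring match on the company name.
function nameContains(companyName, needle) {
  if (!needle) return true; // blank filter includes every name
  return companyName.toLowerCase().includes(needle.toLowerCase());
}

// Hypothetical whole-word variant (not an actor option), e.g. for post-filtering.
function nameHasWord(companyName, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(companyName);
}

console.log(nameContains("ACME RETAIL LTD", "AI")); // true: "AI" is inside "RETAIL"
console.log(nameHasWord("ACME RETAIL LTD", "AI"));  // false
console.log(nameHasWord("AI LABS LIMITED", "AI"));  // true
```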
companyStatuses · array of strings · default: ["Active"]
Filter by company status. For lead generation, "Active" is almost always the right choice.
All possible values:
| Value | Description |
|---|---|
Active | Trading and registered (most common for lead-gen) |
Dissolved | Company has been dissolved and removed from the register |
Liquidation | In the process of being wound up |
Administration | Under administrator protection |
Voluntary Arrangement | CVA in place |
Receivership | Under receivership |
Converted / Closed | Changed legal form or closed |
Open | Used for some partnership types |
Registered | Registered but not yet trading |
Dormant | Incorporated but not actively trading |
Leave empty to include all statuses.
companyCategories · array of strings · default: []
Filter by legal entity type.
Common values:
| Value | Description |
|---|---|
Private Limited Company | Ltd — the most common UK company type |
Public Limited Company | PLC — publicly traded or large private companies |
Limited Liability Partnership | LLP — common for law firms, accountants |
Private Unlimited Company | No liability limit, no public disclosure obligation |
Royal Charter Company | Incorporated by Royal Charter |
Community Interest Company | CIC — social enterprises |
European Public Limited-Liability Company (SE) | EU company type |
Leave empty to include all categories.
postCodePrefixes · array of strings · default: []
Filter by UK postcode prefix (outward code). Case-insensitive, matched from the start of the postcode field.
Examples:
| Prefix | Coverage |
|---|---|
EC | Central London (City of London) |
SW1 | Westminster, Belgravia, Pimlico |
E1 | Shoreditch, Whitechapel, Spitalfields |
WC | West Central London (Holborn, Covent Garden) |
M | All Manchester postcodes |
B | All Birmingham postcodes |
LS | All Leeds postcodes |
BS | All Bristol postcodes |
G | All Glasgow postcodes |
EH | All Edinburgh postcodes |
CF | All Cardiff postcodes |
BT | All Northern Ireland postcodes |
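Multiple prefixes are OR-ed. Because matching is a plain string prefix, a short prefix also catches other areas that begin with the same characters: "B" also matches BS (Bristol) and BT (Northern Ireland), "M" also matches ME and MK, "G" also matches GL and GU, "E1" also matches E10 to E18, and "SW1" also matches SW10 to SW19.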
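With a plain string prefix, a short prefix matches every postcode area that starts with the same letters, so "B" (Birmingham) also matches "BS" (Bristol). The sketch below reproduces the documented prefix rule and adds a hypothetical area-level comparison (not an actor option) that you could use to post-filter results.

```javascript
// Documented rule: case-insensitive string prefix from the start of the postcode.
function matchesPostcodePrefix(postCode, prefixes) {
  if (prefixes.length === 0) return true; // filter not active
  const pc = (postCode ?? "").trim().toUpperCase();
  if (pc === "") return false; // blank postcodes cannot match an active filter
  return prefixes.some((p) => pc.startsWith(p.trim().toUpperCase()));
}

// Hypothetical stricter check (not an actor option): compare the postcode
// area, i.e. the leading letters, so "B" means the Birmingham area only.
function postcodeArea(postCode) {
  const m = (postCode ?? "").trim().toUpperCase().match(/^[A-Z]+/);
  return m ? m[0] : "";
}

function matchesPostcodeArea(postCode, areas) {
  const area = postcodeArea(postCode);
  return areas.some((a) => a.trim().toUpperCase() === area);
}

console.log(matchesPostcodePrefix("BS1 4DJ", ["B"])); // true: Bristol matches "B"
console.log(matchesPostcodeArea("BS1 4DJ", ["B"]));   // false: area is "BS"
console.log(matchesPostcodeArea("B1 1AA", ["B"]));    // true
```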
postTowns · array of strings · default: []
Exact match on the registered post town. Case-insensitive.
Examples: ["LONDON"], ["MANCHESTER", "SALFORD"], ["EDINBURGH", "GLASGOW"]
counties · array of strings · default: []
Substring match on the county field. Case-insensitive.
Examples: ["GREATER LONDON"], ["WEST MIDLANDS", "WEST YORKSHIRE"], ["SURREY"]
countries · array of strings · default: []
Case-insensitive substring match on the registered country field.
Values: ["ENGLAND"], ["SCOTLAND"], ["WALES"], ["NORTHERN IRELAND"]
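A sketch of the three location rules as documented: exact match for post towns, substring match for counties and countries, all case-insensitive. The combination logic (OR within a filter, AND across filters) follows the "all active filters" rule described under How It Works; the function name and address shape are illustrative.

```javascript
// Hypothetical sketch of the documented location rules (all case-insensitive).
function matchesLocation(address, { postTowns = [], counties = [], countries = [] } = {}) {
  const up = (s) => (s ?? "").trim().toUpperCase();
  // postTowns: exact match on the post town.
  const townOk = postTowns.length === 0 || postTowns.some((t) => up(t) === up(address.postTown));
  // counties and countries: substring match.
  const countyOk = counties.length === 0 || counties.some((c) => up(address.county).includes(up(c)));
  const countryOk = countries.length === 0 || countries.some((c) => up(address.country).includes(up(c)));
  // Each filter ORs its own values; the filters are ANDed together.
  return townOk && countyOk && countryOk;
}

const office = { postTown: "LONDON", county: "GREATER LONDON", country: "ENGLAND" };
console.log(matchesLocation(office, { postTowns: ["london"] })); // true
console.log(matchesLocation(office, { counties: ["LONDON"] }));  // true (substring)
console.log(matchesLocation(office, { postTowns: ["LOND"] }));   // false (exact only)
```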
incorporatedSince · string · format: YYYY-MM-DD · default: ""
Only include companies incorporated on or after this date. Useful for reaching new businesses before competitors' outreach fills their inboxes.
Examples:
- "2025-01-01" — companies less than ~5 months old (as of May 2025)
- "2024-01-01" — companies less than ~16 months old
- "2023-01-01" — companies incorporated in the last 2+ years
incorporatedBefore · string · format: YYYY-MM-DD · default: ""
Only include companies incorporated on or before this date. Useful for finding established businesses with a track record.
Example: "2020-01-01" — companies at least 5 years old.
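The bulk file stores dates as DD/MM/YYYY, and the actor compares them after converting to ISO. Below is a sketch of that comparison, with both bounds inclusive as described above. Treating rows with a missing or unparsable date as non-matching when a range is set is an assumption of this sketch.

```javascript
// Convert the bulk file's DD/MM/YYYY to ISO YYYY-MM-DD (null if unparsable).
function ukDateToIso(ukDate) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec((ukDate ?? "").trim());
  return m ? `${m[3]}-${m[2]}-${m[1]}` : null;
}

// Both bounds are inclusive: "on or after" and "on or before".
function matchesIncorporationRange(ukDate, { incorporatedSince = "", incorporatedBefore = "" } = {}) {
  if (!incorporatedSince && !incorporatedBefore) return true; // no date filter
  const iso = ukDateToIso(ukDate);
  if (iso === null) return false; // assumption: unknown dates are excluded
  // Zero-padded ISO dates sort chronologically, so string comparison works.
  if (incorporatedSince && iso < incorporatedSince) return false;
  if (incorporatedBefore && iso > incorporatedBefore) return false;
  return true;
}

console.log(ukDateToIso("01/06/2020")); // 2020-06-01
console.log(matchesIncorporationRange("01/06/2020", { incorporatedBefore: "2020-01-01" })); // false
```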
maxResults · integer · default: 0 · range: 0–5,000,000
Hard cap on the number of records saved to the dataset. The actor always scans the full snapshot but stops writing once this limit is reached.
Set to 0 for unlimited (saves all matching companies).
Tip: Start with maxResults: 1000 to preview your filter results before running a full extraction.
logEveryNRows · integer · default: 100000 · range: 1000–1000000
How frequently (in CSV rows processed) the actor logs progress to the console. Lower values produce more verbose logs; higher values reduce log noise for large runs.
The bulk snapshot has ~5 million rows — at the default of 100000, you'll see ~50 progress lines during a full scan.
Usage Examples
Example 1 — Active IT companies in London (latest snapshot)
{"sicCodePrefixes": ["62", "63"],"companyStatuses": ["Active"],"postTowns": ["LONDON"],"maxResults": 5000}
Returns up to 5,000 active IT and software companies registered in London.
Example 2 — Freshly incorporated companies across England (last 6 months)
{"companyStatuses": ["Active"],"companyCategories": ["Private Limited Company"],"countries": ["ENGLAND"],"incorporatedSince": "2024-11-01","maxResults": 50000}
Returns newly registered English limited companies — ideal for new-business prospecting before competitors have found them.
Example 3 — Management consulting firms in Scotland
{"sicCodePrefixes": ["70"],"companyStatuses": ["Active"],"countries": ["SCOTLAND"],"maxResults": 0}
Returns all active management consulting companies registered in Scotland (unlimited).
Example 4 — Fintech companies by SIC keyword, City of London
{"sicKeywords": ["financial", "payment", "lending", "credit"],"companyStatuses": ["Active"],"postCodePrefixes": ["EC", "E1", "WC"],"maxResults": 2000}
Returns fintech-adjacent companies in the City of London, Shoreditch, and West Central London (Holborn, Covent Garden).
Example 5 — All active PLCs (public limited companies)
{"companyStatuses": ["Active"],"companyCategories": ["Public Limited Company"],"maxResults": 0}
Returns all active UK PLCs — useful for public company datasets and investor research.
Example 6 — Health sector companies in Greater Manchester
{"sicCodePrefixes": ["86", "87", "88"],"companyStatuses": ["Active"],"counties": ["GREATER MANCHESTER"],"maxResults": 3000}
Example 7 — Established tech companies (5+ years old) in Bristol
{"sicCodePrefixes": ["62", "63"],"companyStatuses": ["Active"],"postTowns": ["BRISTOL"],"incorporatedBefore": "2020-01-01","maxResults": 1000}
Example 8 — Specific snapshot month
{"snapshotMonth": "2025-03","sicCodePrefixes": ["62"],"companyStatuses": ["Active"],"maxResults": 10000}
Uses the March 2025 snapshot regardless of which month is currently latest.
How It Works
Step 1 — Snapshot Discovery
The actor fetches the Companies House download index page (download.companieshouse.gov.uk/en_output.html) and extracts all available BasicCompanyDataAsOneFile-YYYY-MM-DD.zip file names. It selects the most recent (or the one matching snapshotMonth).
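A sketch of the discovery step, operating on the index page HTML as a string (fetching the page is left out so the logic can run offline). The file name pattern comes from the description above; the regex, function names, and error wording are assumptions, and the real page markup may differ.

```javascript
// Collect unique BasicCompanyDataAsOneFile-YYYY-MM-DD.zip names, newest first.
function listSnapshots(html) {
  const re = /BasicCompanyDataAsOneFile-(\d{4}-\d{2}-\d{2})\.zip/g;
  const names = new Set();
  for (const m of html.matchAll(re)) names.add(m[0]);
  // All names share a prefix, so a string sort orders them by date.
  return [...names].sort().reverse();
}

// Pick the latest snapshot, or the one for a requested "YYYY-MM" month.
function pickSnapshot(html, snapshotMonth = "") {
  const all = listSnapshots(html);
  if (all.length === 0) throw new Error("No bulk snapshot files found on the index page");
  if (!snapshotMonth) return all[0];
  const match = all.find((name) => name.includes(`-${snapshotMonth}-`));
  if (!match) {
    // List the available dates, as the actor does when a month is missing.
    const dates = all.map((n) => n.slice("BasicCompanyDataAsOneFile-".length, -".zip".length));
    throw new Error(`Snapshot ${snapshotMonth} not available. Available: ${dates.join(", ")}`);
  }
  return match;
}

const html =
  '<a href="BasicCompanyDataAsOneFile-2025-04-01.zip">BasicCompanyDataAsOneFile-2025-04-01.zip</a>' +
  '<a href="BasicCompanyDataAsOneFile-2025-05-01.zip">latest</a>';
console.log(pickSnapshot(html));            // BasicCompanyDataAsOneFile-2025-05-01.zip
console.log(pickSnapshot(html, "2025-04")); // BasicCompanyDataAsOneFile-2025-04-01.zip
```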
Step 2 — Streaming Download
The ~500 MB ZIP file is downloaded to /tmp/ch/snapshot.zip using a streaming download with progress logging every 50 MB. This avoids loading the full file into memory.
download.companieshouse.gov.uk
   │
   ▼ (streaming ~500 MB)
/tmp/ch/snapshot.zip
Step 3 — Streaming Unzip + CSV Parse
The ZIP is opened in-place using unzipper. The inner CSV is streamed directly into csv-parser without fully decompressing to disk first — saving ~1.5 GB of disk I/O.
Step 4 — Row-by-Row Filtering
Each of the 5M+ rows is tested against all active filters. Filters are applied in this order (fastest-reject first):
- companyStatuses — string equality
- companyCategories — string equality
- sicCodePrefixes — string prefix match on up to 4 SIC fields
- sicKeywords — case-insensitive substring match on concatenated SIC text
- nameContains — case-insensitive substring match
- postCodePrefixes — string prefix match
- postTowns — case-insensitive exact match
- counties — case-insensitive substring match
- countries — case-insensitive substring match
- incorporatedSince / incorporatedBefore — date comparison (UK DD/MM/YYYY parsed to ISO)
Rows that pass all active filters are normalized and buffered.
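One way to get the fastest-reject behavior described above is to build only the active checks, cheapest first, and let Array.prototype.every stop at the first failure. This is a hypothetical sketch covering a subset of the filters (status, category, name, postcode), with a flat row shape chosen for brevity.

```javascript
// Hypothetical sketch: compile the active filters once, cheapest checks first.
function buildMatcher(input) {
  const upper = (s) => (s ?? "").trim().toUpperCase();
  const checks = [];

  if (input.companyStatuses?.length) {
    const statuses = new Set(input.companyStatuses);
    checks.push((row) => statuses.has(row.status)); // string equality
  }
  if (input.companyCategories?.length) {
    const categories = new Set(input.companyCategories);
    checks.push((row) => categories.has(row.category)); // string equality
  }
  if (input.nameContains) {
    const needle = upper(input.nameContains);
    checks.push((row) => upper(row.companyName).includes(needle));
  }
  if (input.postCodePrefixes?.length) {
    const prefixes = input.postCodePrefixes.map(upper);
    checks.push((row) => {
      const pc = upper(row.postCode);
      return pc !== "" && prefixes.some((p) => pc.startsWith(p));
    });
  }

  // every() stops at the first failing check, so a row rejected on status
  // never reaches the string scans further down the list.
  return (row) => checks.every((check) => check(row));
}

const match = buildMatcher({ companyStatuses: ["Active"], nameContains: "tech", postCodePrefixes: ["EC"] });
console.log(match({ status: "Active", companyName: "ACME TECHNOLOGIES LIMITED", postCode: "EC2A 1AB" }));    // true
console.log(match({ status: "Dissolved", companyName: "ACME TECHNOLOGIES LIMITED", postCode: "EC2A 1AB" })); // false
```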
Step 5 — Batch Dataset Write
Matching rows are collected into 500-record batches and pushed to the Apify Dataset in bulk — approximately 50–100× faster than per-record writes.
ZIP stream
   │
   ▼
csv-parser (row by row)
   │
   ▼
matchesFilters() → reject / accept
   │
   ▼
normalize() → structured record
   │
   ▼
500-record buffer → Dataset.pushData([...500])
   │
   └── Repeat until end of CSV or maxResults reached
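A sketch of the buffer stage in the diagram above: records accumulate until 500 are queued, then go out in one bulk call, and the writer tells the caller to stop once maxResults records have been accepted. The flush callback stands in for a bulk write such as Dataset.pushData; the createBatchWriter name and its options are hypothetical.

```javascript
// Hypothetical sketch: batch accepted records into groups of `batchSize` and
// report when `maxResults` (0 = unlimited) records have been accepted.
function createBatchWriter(flush, { batchSize = 500, maxResults = 0 } = {}) {
  let buffer = [];
  let accepted = 0; // records accepted so far (flushed or still buffered)

  const limitReached = () => maxResults > 0 && accepted >= maxResults;

  // Queue one record; resolves to false once the caller should stop reading.
  async function add(record) {
    if (limitReached()) return false;
    buffer.push(record);
    accepted += 1;
    if (buffer.length >= batchSize) {
      const batch = buffer;
      buffer = [];
      await flush(batch); // one bulk write per full batch
    }
    return !limitReached();
  }

  // Write whatever is left in the buffer at end of input or at the cap.
  async function close() {
    if (buffer.length > 0) {
      const batch = buffer;
      buffer = [];
      await flush(batch);
    }
  }

  return { add, close };
}
```

In a full pipeline, the CSV loop would await writer.add(normalize(row)) for each accepted row, stop reading when it resolves to false, and then await writer.close() to write the final partial batch.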
Performance
| Scenario | Rows Scanned | Matches | Est. Time |
|---|---|---|---|
| Active IT companies, London | 5M | ~15,000–30,000 | ~12–18 min |
| New companies, England, last 6mo | 5M | ~50,000–100,000 | ~12–18 min |
| All active PLCs | 5M | ~5,000–10,000 | ~12–18 min |
| All active companies (no filter) | 5M | ~3,000,000+ | ~15–20 min |
| Quick preview (maxResults: 1000) | 5M (stops early) | 1,000 | ~2–5 min |
Download time: The ~500 MB ZIP download takes 3–8 minutes depending on network speed.
Scan time: Streaming through 5M rows takes ~10–15 minutes.
Total run time: Expect 15–25 minutes for most filtered queries.
The actor is CPU- and I/O-bound, not network-bound after the initial download. No proxy is needed or used.
Dataset Views
The Apify Console provides two pre-configured dataset views:
Overview — all matching companies with: name, number, status, category, incorporation date, address, SIC codes, and Companies House profile URL.
Recently Incorporated — same fields, pre-sorted by incorporation date (newest first). Useful for new-business prospecting workflows.
Export Formats
Download your results from the Apify Dataset in:
- JSON — full nested structure including the address, accounts, mortgages, and previousNames objects
- CSV — flat table; nested objects serialized as JSON strings — ready for Excel or Google Sheets
- Excel (.xlsx) — native spreadsheet format for sharing with non-technical stakeholders
- JSONL — one record per line for streaming into CRMs, databases, or data pipelines
Tips for Best Results
Start small: Use maxResults: 1000 on your first run to verify your filters return the right companies before running a full extraction.
Combine filters for precision: SIC prefix + postcode prefix + status gives a highly targeted list. SIC alone on a broad prefix like "6" can return 500,000+ companies.
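Use SIC keywords for niche industries: If you can't find the right SIC code prefix, try a word from the official SIC description, such as sicKeywords: ["data processing"] or sicKeywords: ["consultancy"]. Keywords only match the fixed SIC 2007 wording, so modern terms like "machine learning" or "cybersecurity" will not match; try nameContains for those instead.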
New-company prospecting: Set incorporatedSince to 3–6 months ago for a continuous stream of freshly registered companies in your target sector — reach them before they're on every competitor's call list.
Postcode precision: Use longer prefixes for tighter geographic targeting. "E1" is more specific than "E". "SW1A" is more specific than "SW".
Use the API URL: Every record includes an apiUrl field pointing to the Companies House REST API for that company. Use this for further enrichment — directors, filing history, PSC (persons with significant control) data — via the free Companies House API.
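A sketch of the enrichment call. The Companies House REST API authenticates with HTTP Basic auth, using your API key as the username and an empty password. The helper name, the CH_API_KEY environment variable, and the choice of the /officers sub-resource are illustrative; the network call is shown only as a commented usage example.

```javascript
// Hypothetical helper: turn a record's apiUrl into an authenticated request.
// Companies House expects HTTP Basic auth: API key as username, empty password.
function buildApiRequest(apiUrl, apiKey, subPath = "") {
  // Drop any trailing slash before appending a sub-resource such as "/officers".
  const url = apiUrl.replace(/\/+$/, "") + subPath;
  const token = Buffer.from(`${apiKey}:`).toString("base64");
  return { url, headers: { Authorization: `Basic ${token}` } };
}

// Usage (requires network access and a registered API key):
// const { url, headers } = buildApiRequest(record.apiUrl, process.env.CH_API_KEY, "/officers");
// const response = await fetch(url, { headers });
// const officers = await response.json();
```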
Limitations
- Snapshot is monthly, not real-time. The bulk snapshot is published ~monthly. Newly incorporated companies will appear in the next snapshot, not immediately. For real-time lookups, use the Companies House API directly (one company at a time).
- No director or officer data. The bulk snapshot contains registered company data only — not director names, officer details, or PSC (beneficial ownership) information. These are available via the Companies House REST API using the apiUrl field in each record.
- No financial data beyond filing dates. Revenue, profit, employee count, and balance sheet figures are not in the bulk snapshot. Annual accounts are filed separately and require fetching individual company pages.
- Download time is fixed. The ~500 MB ZIP must be downloaded before filtering begins. There is no way to pre-filter the download — this is a limitation of the bulk file format, not the actor.
- Scan time is fixed at ~10–15 minutes regardless of how selective your filters are. Unless maxResults stops the run early, the actor reads the full 5M-row CSV to find all matches.
- Address quality varies. Some companies provide minimal or incomplete address data. The postCode, postTown, and county fields can be blank for some companies — postcode filtering will not match these rows.
- SIC codes are self-reported. Companies declare their own SIC codes on incorporation. Self-reporting means codes can be generic, inaccurate, or outdated. Always validate against the actual company description.
Frequently Asked Questions
Q: Does this require a Companies House API key?
No. The bulk snapshot download is a free public file hosted by Companies House. No registration, API key, or authentication is required.
Q: How often is the data updated?
Companies House publishes a new bulk snapshot approximately monthly, around the 1st of each month. Leave snapshotMonth blank to always use the latest.
Q: Can I get director names from this data?
No. Director and officer information is not in the bulk snapshot. Use the apiUrl field from each result to query the free Companies House REST API for officer data on individual companies.
Q: Why does the full scan take 15–25 minutes even with tight filters?
The actor must stream through all 5M+ rows to find your matches — there's no pre-indexed way to jump to specific rows in the CSV. The download (~500 MB) and scan (~10–15 min) are unavoidable. This is the tradeoff of using the free bulk file vs. the paginated API.
Q: How do I find companies in a specific city that isn't a major UK city?
Use postCodePrefixes with the relevant outward code (e.g. "OX" for Oxford, "CB" for Cambridge, "PO" for Portsmouth). Alternatively, use postTowns with the town name as it appears in the register (usually uppercase).
Q: What's the difference between postTowns and counties?
postTowns is an exact match on the post town field (e.g. "LONDON", "MANCHESTER"). counties is a substring match on the county field (e.g. "GREATER LONDON", "WEST YORKSHIRE"). For major cities, postTowns is more precise; for regional coverage, use counties or postCodePrefixes.
Q: Can I filter by number of employees or revenue?
No. The bulk snapshot does not contain financial performance data. Where employee count and revenue are published at all, they appear in a company's filed annual accounts, which this actor does not fetch; for aggregated figures you will generally need third-party commercial data sources.
Q: What does makersRedacted mean in the output?
This field does not apply to this actor (it belongs to a different actor's output schema). Disregard it.
Q: Can I use the output to further enrich with Companies House API data?
Yes — and this is the recommended workflow. Use this actor to build a filtered list, then use the apiUrl field to call the Companies House REST API for directors, filing history, charges, and PSC data on each company. The Companies House API is free with registration.
Technical Details
| Property | Value |
|---|---|
| Runtime | Node.js (ES Modules) |
| Framework | Apify SDK v3 + Crawlee Dataset |
| Download client | got-scraping (streaming) |
| Zip handling | unzipper (streaming, no full decompress) |
| CSV parser | csv-parser (streaming, header trimming) |
| Batch write size | 500 records per Dataset.pushData() call |
| Progress log interval | Every 100,000 rows (configurable) |
| Download size | ~500 MB (compressed) |
| Uncompressed CSV | ~2 GB |
| Rows in snapshot | ~5,000,000+ |
| Temp storage used | ~500 MB in /tmp/ch/ (deleted after run) |
| Proxy required | ❌ No |
| API key required | ❌ No |
Changelog
v0.1
- Initial release
- Streaming download and CSV parsing of the Companies House bulk snapshot
- Auto-detection of latest snapshot month
- Filters: SIC prefix, SIC keyword, company name, status, category, postcode prefix, post town, county, country, incorporation date range
- Full record normalization: address, SIC codes, accounts, returns, mortgages, previous names, confirmation statement
- Direct Companies House profile URL and REST API URL per record
- 500-record batch dataset writes for high-throughput output
- Progress logging every N rows (configurable)
- maxResults hard cap with early termination
- Two dataset views: Overview and Recently Incorporated
2026-05-20
- Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.
2026-06-04
- Verified live and refreshed build (reliability/maintenance pass).
Last reviewed: 2026-05-20.
Support
If you encounter snapshot discovery failures (Companies House may occasionally change their download page layout), ZIP parse errors, or filter logic issues, please open a support ticket via the Apify Console. Include your input configuration, the actor run ID, and any error messages from the run log.