Polish Premises Prospector - REGON jednostki lokalne
Pricing
from $10.00 / 1,000 premises
Build prospect lists at the physical-site level from the Polish REGON registry. One row per premise (jednostka lokalna) with industry (PKD), address, company age, and ownership, filterable by region (TERYT).
Developer
getregdata
Polish Premises Prospector - Site-Level REGON (jednostki lokalne)
Build B2B prospect lists in Poland at the individual physical-site level - one row per premise (jednostka lokalna), not just the registered company HQ. For each company you seed, this actor expands it into every physical location registered in the official GUS REGON registry, with the attributes you need to filter and prioritize:
- Industry - per-site PKD activity code + name (each premise can have its own activity)
- Address - full readable address (województwo, powiat, gmina, city, street, number, postcode)
- Region - TERYT symbols for precise filtering by voivodeship / county
- Company age - registration / start date of both the site and the parent company
- Ownership - parent legal form and ownership form (e.g. private, foreign, State Treasury)
- Headcount (optional, company-level proxy) - see "Headcount" below
Output is one row per premise plus the company HQ, ready for CSV/Excel/API.
Why site-level?
REGON registers local units (jednostki lokalne) separately from the legal entity. Each gets its own 14-digit REGON, its own address, and its own PKD activity. A bank has hundreds of branches; a manufacturer has plants and warehouses at different addresses. Filtering the legal entity alone only ever returns the HQ - this actor returns the actual workplaces.
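The link between a premise and its parent can be read from the identifier itself: a 14-digit local-unit REGON is the parent entity's 9-digit REGON followed by a 5-digit unit suffix. A minimal sketch of that split follows; the helper name `split_regon14` and the sample number are illustrative and not taken from the actor.

```python
def split_regon14(regon14: str) -> tuple[str, str]:
    """Split a 14-digit local-unit REGON into (parent REGON, unit suffix)."""
    digits = regon14.strip()
    if len(digits) != 14 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 14 digits, got {regon14!r}")
    # First 9 digits identify the legal entity, last 5 the local unit.
    return digits[:9], digits[9:]


# Made-up number for illustration only (not a real registry entry).
parent, suffix = split_regon14("12345678500017")
print(parent, suffix)
```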
How it works
seed identifiers
──▶ public GUS REGON registry (anonymous, no API key): entity report + entity PKD + local-unit list (every premise) + per-site report + PKD
──▶ one row per premise (HQ + jednostki lokalne)
──▶ filter by region (TERYT) and site type (PKD)
──▶ optional per-site headcount overlay (footprint / LinkedIn)
No API key required. This actor reads the public REGON registry anonymously (it does not use the registered BIR web-service key). You seed the companies to examine (see Discovery), and the actor expands each into its premises and filters them.
Input
| Field | Description |
|---|---|
| `seedIdentifiers` | List of NIP / REGON / KRS (auto-detected by shape). |
| `seedDatasetId` | Apify dataset ID to read identifiers from (chain in a regional search / KRS export). |
| `seedIdType` | How to read 10-digit ids (`auto` / `nip` / `regon` / `krs`). NIP and KRS are both 10 digits. |
| `regions` | Presets `warszawa`, `lodz`, `kutno`, or `{ "woj": "10", "powiat": "03" }`. Empty = all Poland. |
| `siteTypes` | `factory`, `warehouse`, `clinic`, `school`, `coliving`. Empty = all industries. |
| `pkdCodes` | Explicit PKD prefixes, e.g. `"86"`, `"8610"`, `"5210B"`. |
| `fetchSitePkd` | Fetch each premise's own PKD (one extra call/premise). Off = inherit company PKD. |
| `includeHeadcount` | Attach a per-site headcount signal (footprint for industrial, LinkedIn for white-collar; see below). |
| `headcountMethod` | `auto` (default), `footprint`, `linkedin`, or `none`. |
| `maxResults`, `minIntervalMs` | Output cap and politeness delay between requests. |
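Because NIP and KRS share the same 10-digit shape, auto-detection has to rely on something beyond length. The sketch below shows one plausible heuristic: classify by length, then use the standard NIP check digit to separate NIP from KRS. This is an assumption about how such detection could work, not the actor's confirmed logic; the function names are hypothetical.

```python
import re

# Standard NIP check-digit weights for the first nine digits.
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def nip_checksum_ok(digits: str) -> bool:
    """Return True if a 10-digit string carries a valid NIP check digit."""
    total = sum(w * int(d) for w, d in zip(NIP_WEIGHTS, digits[:9]))
    check = total % 11
    return check != 10 and check == int(digits[9])


def detect_id_type(raw: str, seed_id_type: str = "auto") -> str:
    """Guess whether an identifier is a NIP, REGON, or KRS (heuristic sketch)."""
    digits = re.sub(r"[\s-]", "", raw)  # tolerate "123-456-32-18" style input
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a numeric identifier: {raw!r}")
    if len(digits) in (9, 14):
        return "regon"
    if len(digits) == 10:
        if seed_id_type != "auto":
            return seed_id_type  # an explicit setting wins over guessing
        return "nip" if nip_checksum_ok(digits) else "krs"
    raise ValueError(f"unexpected length {len(digits)}: {raw!r}")


print(detect_id_type("123-456-32-18"))
print(detect_id_type("0000123456"))
```

A KRS number can pass the NIP checksum by coincidence, which is a good reason to set `seedIdType` explicitly when the seed list is known to hold a single identifier type.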
Discovery: how to seed a regional list
The actor works on real public registry data with no key or sandbox to worry about. Get your seed universe from one of:
- The companion `polish-regon-scraper` actor (recommended) - run a by-address / by-PKD discovery there, then pass its dataset id as `seedDatasetId`. This actor reads `nip`/`regon`/`krs` from each item. The two actors share the same anonymous REGON client.
- A KRS export or your own prospect list - drop the identifiers into `seedIdentifiers`.
- A paid GUS bulk data order - the cleanest way to get the full local-unit universe for a region (statistical confidentiality applies, so no headcount).
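As an illustration of the chained setup, the sketch below assembles a run input that reads seeds from a discovery dataset and narrows the output to one region and site type. The field names come from the Input table; the exact shape of `regions` (a list holding an object), the numeric values, and the helper name `build_run_input` are assumptions for the example.

```python
import json


def build_run_input(seed_dataset_id: str) -> dict:
    """Assemble a run input that chains from a discovery dataset (sketch)."""
    return {
        "seedDatasetId": seed_dataset_id,  # dataset produced by the discovery run
        "seedIdType": "auto",
        # Assumed shape: a list of presets or TERYT objects.
        "regions": [{"woj": "10", "powiat": "03"}],
        "siteTypes": ["warehouse"],
        "fetchSitePkd": True,       # one extra call per premise
        "includeHeadcount": False,  # skip enrichment for a first pass
        "maxResults": 500,          # hypothetical cap
        "minIntervalMs": 1000,      # hypothetical politeness delay
    }


# Placeholder ID; substitute the dataset ID of your own discovery run.
run_input = build_run_input("<discovery-dataset-id>")
print(json.dumps(run_input, indent=2))
```

The resulting JSON can be pasted into the actor's input editor or passed as the run input through the Apify API.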
Headcount (read this)
No source publishes a verified per-site headcount for Poland - REGON's employment figure
is legally confidential. So instead of pretending otherwise, every enriched row carries a
headcountBasis that tells you exactly how much of the number is sourced vs modeled:
| `headcountBasis` | What it means | Method | Best for |
|---|---|---|---|
| `reported-proxy` | A real, sourced count (a tally of actual LinkedIn profiles for the company in that city) | LinkedIn employees-by-city | Offices, clinics, schools, universities |
| `modeled-estimate` | A number we compute (building footprint ÷ employment density, ±50-100%) | OSM footprint | Factories, warehouses, DCs |
| (`none`, `unknown`) | No signal available | - | - |
headcountMethod=auto routes white-collar site types to the LinkedIn city-count (the primary,
sourced signal). For each company we resolve the canonical LinkedIn page before counting:
first by name (harvestapi/linkedin-company, validated by website-domain / name overlap), then -
if the legal name is too far from the brand - by website domain (s-r/free-linkedin-company-finder,
e.g. pkobp.pl -> /company/pko-bp). A low-confidence match is rejected (-> unknown) rather
than returned as a wrong number. We then count employees by the resolved URL, filtered to the
site's city. Caveats: it's a per-city count (not split between two sites in the same city)
and undercounts blue-collar sites. Counting uses harvestapi/linkedin-company-employees at
~$0.02/company-city (the count is read from the run log, so only 1 profile is scraped);
resolution adds ~$0.003-0.004/company. Bound LinkedIn spend with maxEnrich.
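To budget the LinkedIn enrichment before setting `maxEnrich`, the per-unit prices above can be combined into a rough range. The sketch assumes each company is resolved once and every company-city pair is counted once; the function name is hypothetical and the result is an approximation, not a quote.

```python
COUNT_USD_PER_COMPANY_CITY = 0.02          # ~$0.02 per company-city count
RESOLVE_USD_PER_COMPANY = (0.003, 0.004)   # ~$0.003-0.004 per company


def linkedin_cost_usd(companies: int, company_city_pairs: int) -> tuple[float, float]:
    """Return an approximate (low, high) USD cost for LinkedIn enrichment."""
    count_cost = company_city_pairs * COUNT_USD_PER_COMPANY_CITY
    low = count_cost + companies * RESOLVE_USD_PER_COMPANY[0]
    high = count_cost + companies * RESOLVE_USD_PER_COMPANY[1]
    return round(low, 2), round(high, 2)


# Example: 200 companies spread over 350 distinct company-city pairs.
print(linkedin_cost_usd(200, 350))  # (7.6, 7.8)
```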
For industrial site types (factory/warehouse/DC), where LinkedIn coverage is poor, auto routes to
the building-footprint method - the only realistic per-site number. It geocodes the site address
(OSM Nominatim), reads the building's footprint area from OpenStreetMap (Overpass), multiplies it
by the number of building levels, and divides the resulting floor area by an employment-density
factor (warehouse ~80, manufacturing ~40 m²/FTE; UK Employment Densities Guide). The result is
flagged modeled-estimate with a ±50-100% band (headcountLow/headcountHigh) and a caveat, because
the floor area is sourced but the people figure is computed, not reported. The method uses only
free OSM APIs (no Apify helper-actor cost); if no building can be resolved at the address, the row
is left as unknown rather than guessed. Per OSM usage policy the actor self-throttles to
≤1 request/second and sends a descriptive User-Agent (footprintUserAgent).
Data © OpenStreetMap contributors (ODbL).
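The arithmetic behind the footprint method can be sketched as follows, reading "× building levels" as scaling the footprint up to total floor area before dividing by density. The density values come from the text; the band multipliers (half to double the estimate) are only one possible reading of "±50-100%" and are an assumption, as is the function name.

```python
# m² of floor area per full-time employee, from the figures quoted above.
DENSITY_M2_PER_FTE = {"warehouse": 80.0, "factory": 40.0}


def footprint_headcount(footprint_m2: float, levels: int, site_type: str) -> dict:
    """Estimate headcount from building footprint (modeled, not reported)."""
    density = DENSITY_M2_PER_FTE[site_type]
    floor_area = footprint_m2 * max(levels, 1)  # treat missing levels as 1
    estimate = floor_area / density
    return {
        "headcountEstimate": round(estimate),
        # Assumed band: -50% / +100% around the point estimate.
        "headcountLow": round(estimate * 0.5),
        "headcountHigh": round(estimate * 2.0),
        "headcountBasis": "modeled-estimate",
    }


# A single-level 12,000 m² warehouse: 12000 / 80 = 150 FTE.
print(footprint_headcount(12_000, 1, "warehouse"))
```

The wide band is the point: the same footprint can house a highly automated distribution centre or a labour-intensive packing operation, so the number is best used for ranking sites rather than as a reported figure.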
Output (one row per premise)
regon14, parentRegon, nip, krs, isHeadquarters, name, companyName,
addressText, voivodeship, powiat, gmina, city, postalCode, street,
buildingNumber, apartmentNumber, terytWoj/terytPowiat/terytGmina,
pkdMainCode, pkdMainName, pkdAll, siteStartDate, companyRegistrationDate,
companyAgeYears, legalForm, ownershipForm, website, localUnitsCount,
headcountEstimate, headcountLow, headcountHigh, headcountBasis, headcountMethod,
headcountSource, headcountConfidence, linkedinCityCount,
headcountCaveat, scrapedAt, dataSource.
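Since the dataset mixes HQ rows and premise rows, a common first step after export is to roll the rows back up per company. The sketch below counts non-HQ premises per `parentRegon`; it assumes `isHeadquarters` is a boolean and that `parentRegon` is populated on every row, neither of which is confirmed by the field list alone. The sample rows are made up.

```python
from collections import Counter


def sites_per_company(rows: list[dict]) -> dict[str, int]:
    """Count non-HQ premises per parent company, keyed by parentRegon."""
    counts = Counter()
    for row in rows:
        if not row.get("isHeadquarters"):
            counts[row["parentRegon"]] += 1
    return dict(counts)


# Made-up rows using a subset of the output fields.
rows = [
    {"parentRegon": "123456785", "isHeadquarters": True, "city": "Warszawa"},
    {"parentRegon": "123456785", "isHeadquarters": False, "city": "Kutno"},
    {"parentRegon": "123456785", "isHeadquarters": False, "city": "Lodz"},
]
print(sites_per_company(rows))  # {'123456785': 2}
```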
Coverage notes
- Strong: factories (PKD 10-33), warehouses/distribution (52, 46), private clinics (86), private schools/universities (85). These commonly register premises as local units.
- Weak: co-living - buildings are not entities; only the operator's declared premise (PKD 55/68) may appear. Expect partial coverage.
- Only legal entities (`Typ = P`) are expanded. Sole-proprietor businesses are skipped.
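The PKD ranges in the coverage notes suggest how a `siteTypes` preset could translate into PKD prefixes. The mapping below is assembled from those notes and is an assumption about the presets, not the actor's confirmed table; the helper names are hypothetical.

```python
import re

# Prefixes taken from the coverage notes above (assumed to mirror the presets).
SITE_TYPE_PKD_PREFIXES = {
    "factory": tuple(str(n) for n in range(10, 34)),  # PKD 10-33
    "warehouse": ("52", "46"),
    "clinic": ("86",),
    "school": ("85",),
    "coliving": ("55", "68"),
}


def normalize_pkd(code: str) -> str:
    """Strip dots and spaces so '52.10.B' and '5210B' compare equal."""
    return re.sub(r"[^0-9A-Za-z]", "", code).upper()


def matches_site_type(pkd_code: str, site_type: str) -> bool:
    """Check whether a PKD code falls under a site type's prefixes."""
    prefixes = SITE_TYPE_PKD_PREFIXES[site_type]
    return normalize_pkd(pkd_code).startswith(prefixes)


print(matches_site_type("52.10.B", "warehouse"))  # True
print(matches_site_type("6201Z", "factory"))      # False
```

Explicit `pkdCodes` prefixes such as `"8610"` can be matched the same way, by normalizing both sides and testing with `startswith`.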
Permissions
This Actor requests full permissions because its optional LinkedIn headcount enrichment
orchestrates third-party LinkedIn Actors (harvestapi/linkedin-company,
harvestapi/linkedin-company-employees), and calling them requires it. The Actor only runs those
Actors and writes its own dataset - it does not read or modify any other data in your account. If
you don't use LinkedIn headcount, the core REGON extraction and the free OSM building-footprint
headcount work the same.
Source
Data: GUS REGON BIR (official Polish statistical
business registry). This actor uses the registry's JSON ajaxEndpoint interface.