Polish Premises Prospector - REGON jednostki lokalne

Pricing

from $10.00 / 1,000 premises

Build prospect lists at the physical-site level from the Polish REGON registry. One row per premise (jednostka lokalna) with industry (PKD), address, company age, and ownership, filterable by region (TERYT).

Developer

getregdata

Maintained by Community

Last modified

10 days ago

Polish Premises Prospector - Site-Level REGON (jednostki lokalne)

Build B2B prospect lists in Poland at the individual physical-site level - one row per premise (jednostka lokalna), not just the registered company HQ. For each company you seed, this actor expands it into every physical location registered in the official GUS REGON registry, with the attributes you need to filter and prioritize:

  • Industry - per-site PKD activity code + name (each premise can have its own activity)
  • Address - full readable address (województwo, powiat, gmina, city, street, number, postcode)
  • Region - TERYT symbols for precise filtering by voivodeship / county
  • Company age - registration / start date of both the site and the parent company
  • Ownership - parent legal form and ownership form (e.g. private, foreign, State Treasury)
  • Headcount (optional, company-level proxy) - see "Headcount" below

Output is one row per premise plus the company HQ, ready for CSV/Excel/API.

Why site-level?

REGON registers local units (jednostki lokalne) separately from the legal entity. Each gets its own 14-digit REGON, its own address, and its own PKD activity. A bank has hundreds of branches; a manufacturer has plants and warehouses at different addresses. Filtering the legal entity alone only ever returns the HQ - this actor returns the actual workplaces.
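To make the HQ-versus-premise distinction concrete, here is a minimal sketch that groups local units under their parent company. It relies on general REGON numbering (the first nine digits of a 14-digit local-unit REGON are the parent entity's 9-digit REGON), not on anything specific to this actor. The sample rows and their REGON values are invented placeholders; only the `regon14` and `city` field names come from the output schema.

```python
from collections import defaultdict


def parent_regon(regon14: str) -> str:
    """Return the 9-digit parent REGON embedded in a 14-digit local-unit REGON."""
    if len(regon14) != 14 or not (regon14.isascii() and regon14.isdigit()):
        raise ValueError(f"expected 14 digits, got {regon14!r}")
    return regon14[:9]


def premises_per_company(rows):
    """Count premises per parent company, keyed by 9-digit REGON."""
    counts = defaultdict(int)
    for row in rows:
        counts[parent_regon(row["regon14"])] += 1
    return dict(counts)


# Invented placeholder rows (not real registry data).
sample_rows = [
    {"regon14": "12345678500011", "city": "Warszawa"},
    {"regon14": "12345678500022", "city": "Kutno"},
    {"regon14": "98765432100018", "city": "Lodz"},
]

print(premises_per_company(sample_rows))
```

A company-level filter would see two entities here; the site-level view sees three workplaces in three cities.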

How it works

seed identifiers ──▶ public GUS REGON registry ──▶ entity report + entity PKD
                     (anonymous, no API key)       + local-unit list (every premise)
                                                   + per-site report + PKD
                 ──▶ one row per premise (HQ + jednostki lokalne)
                 ──▶ filter by region (TERYT) and site type (PKD)
                 ──▶ optional per-site headcount overlay (footprint / LinkedIn)

No API key required. This actor reads the public REGON registry anonymously (it does not use the registered BIR web-service key). You seed the companies to examine (see Discovery), and the actor expands each into its premises and filters them.

Input

| Field | Description |
| --- | --- |
| seedIdentifiers | List of NIP / REGON / KRS (auto-detected by shape). |
| seedDatasetId | Apify dataset ID to read identifiers from (chain in a regional search / KRS export). |
| seedIdType | How to read 10-digit ids (auto / nip / regon / krs). NIP and KRS are both 10 digits. |
| regions | Presets warszawa, lodz, kutno, or { "woj": "10", "powiat": "03" }. Empty = all Poland. |
| siteTypes | factory, warehouse, clinic, school, coliving. Empty = all industries. |
| pkdCodes | Explicit PKD prefixes, e.g. "86", "8610", "5210B". |
| fetchSitePkd | Fetch each premise's own PKD (one extra call/premise). Off = inherit company PKD. |
| includeHeadcount | Attach a per-site headcount signal (footprint for industrial, LinkedIn for white-collar; see below). |
| headcountMethod | auto (default), footprint, linkedin, or none. |
| maxResults, minIntervalMs | Output cap and politeness delay between requests. |
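As an illustration of how these fields might fit together, the sketch below builds one input object and prints it as JSON. The field names come from the table above; the value types (lists, booleans, integers), the mix of a preset and a TERYT object in `regions`, and the specific numbers are assumptions rather than documented schema.

```python
import json

# Hypothetical input; field names are from the Input table, values are examples.
actor_input = {
    "seedIdentifiers": ["5261040828", "000331501"],  # sample 10-digit and 9-digit identifiers
    "seedIdType": "auto",                    # how ambiguous 10-digit ids are read
    "regions": ["lodz", {"woj": "10", "powiat": "03"}],  # preset and/or TERYT object (assumed mixable)
    "siteTypes": ["factory", "warehouse"],
    "pkdCodes": ["5210B"],                   # explicit PKD prefix
    "fetchSitePkd": True,                    # one extra call per premise
    "includeHeadcount": True,
    "headcountMethod": "auto",
    "maxResults": 500,                       # assumed integer cap
    "minIntervalMs": 1000,                   # assumed delay in milliseconds
}

print(json.dumps(actor_input, indent=2))
```

Leaving `regions` or `siteTypes` empty widens the run to all of Poland or all industries respectively, so a cap such as `maxResults` is worth setting on a first run.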

Discovery: how to seed a regional list

The actor works on real public registry data with no key or sandbox to worry about. Get your seed universe from one of:

  1. The companion polish-regon-scraper actor (recommended) - run a by-address / by-PKD discovery there, then pass its dataset id as seedDatasetId. This actor reads nip/regon/krs from each item. The two actors share the same anonymous REGON client.
  2. A KRS export or your own prospect list - drop the identifiers into seedIdentifiers.
  3. A paid GUS bulk data order - the cleanest way to get the full local-unit universe for a region (statistical confidentiality applies, so no headcount).
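Before dropping your own list into seedIdentifiers, it can help to check what each identifier looks like. The sketch below is one possible shape-based guess, not the actor's documented detection logic: it treats 9 or 14 digits as REGON and breaks the 10-digit NIP/KRS tie with the standard NIP checksum (weights 6, 5, 7, 2, 3, 4, 5, 6, 7, modulo 11). The function names are hypothetical.

```python
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def nip_checksum_ok(value: str) -> bool:
    """Check the NIP control digit of a 10-digit string."""
    digits = [int(ch) for ch in value]
    check = sum(w * d for w, d in zip(NIP_WEIGHTS, digits)) % 11
    # A remainder of 10 can never equal a single digit, so such values fail.
    return check == digits[9]


def guess_id_type(raw: str) -> str:
    """Guess 'regon', 'nip', 'krs' or 'unknown' from the digits in raw."""
    value = "".join(ch for ch in raw if ch in "0123456789")
    if len(value) in (9, 14):
        return "regon"
    if len(value) == 10:
        # Heuristic: KRS numbers carry no checksum, so a failed NIP check suggests KRS.
        return "nip" if nip_checksum_ok(value) else "krs"
    return "unknown"


for candidate in ["526-104-08-28", "000331501", "0000123456"]:
    print(candidate, "->", guess_id_type(candidate))
```

A KRS number can pass the NIP checksum by coincidence, which is why an explicit seedIdType setting is the safer choice when a list is known to contain only one kind of identifier.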

Headcount (read this)

No source publishes a verified per-site headcount for Poland - REGON's employment figure is legally confidential. So instead of pretending otherwise, every enriched row carries a headcountBasis that tells you exactly how much of the number is sourced vs modeled:

| headcountBasis | What it means | Method | Best for |
| --- | --- | --- | --- |
| reported-proxy | A real, sourced count (a tally of actual LinkedIn profiles for the company in that city) | LinkedIn employees-by-city | Offices, clinics, schools, universities |
| modeled-estimate | A number we compute (building footprint ÷ employment density, ±50-100%) | OSM footprint | Factories, warehouses, DCs |
| (none, unknown) | No signal available | - | - |
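Downstream, the headcountBasis field lets you keep sourced and modeled numbers apart. The following is a minimal sketch of that split; the helper name and sample rows are hypothetical, and any row whose basis is missing or unrecognized is treated as unknown.

```python
def split_by_basis(rows):
    """Bucket rows into sourced counts, modeled estimates, and everything else."""
    buckets = {"reported-proxy": [], "modeled-estimate": [], "unknown": []}
    for row in rows:
        basis = row.get("headcountBasis")
        key = basis if basis in ("reported-proxy", "modeled-estimate") else "unknown"
        buckets[key].append(row)
    return buckets


# Invented example rows using field names from the output schema.
sample_rows = [
    {"name": "Office A", "headcountBasis": "reported-proxy", "headcountEstimate": 120},
    {"name": "Warehouse B", "headcountBasis": "modeled-estimate", "headcountEstimate": 50},
    {"name": "Site C"},  # no headcount signal at all
]

for basis, members in split_by_basis(sample_rows).items():
    print(basis, [row["name"] for row in members])
```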

headcountMethod=auto routes white-collar site types to the LinkedIn city-count (the primary, sourced signal). For each company we resolve the canonical LinkedIn page before counting: first by name (harvestapi/linkedin-company, validated by website-domain / name overlap), then - if the legal name is too far from the brand - by website domain (s-r/free-linkedin-company-finder, e.g. pkobp.pl -> /company/pko-bp). A low-confidence match is rejected (-> unknown) rather than returned as a wrong number. We then count employees by the resolved URL, filtered to the site's city. Caveats: it's a per-city count (not split between two sites in the same city) and undercounts blue-collar sites. Counting uses harvestapi/linkedin-company-employees at ~$0.02/company-city (the count is read from the run log, so only 1 profile is scraped); resolution adds ~$0.003-0.004/company. Bound LinkedIn spend with maxEnrich.
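For budgeting, the approximate rates quoted above can be turned into a rough spend range. This is back-of-the-envelope arithmetic only: the function name is hypothetical, the rates are the "~" figures from the paragraph, and how maxEnrich counts its units is not stated, so it is not modeled here.

```python
def linkedin_cost_range(companies: int, company_city_pairs: int) -> tuple[float, float]:
    """Return an approximate (low, high) USD cost for LinkedIn enrichment."""
    count_cost = 0.02 * company_city_pairs   # ~$0.02 per company-city count
    low = count_cost + 0.003 * companies     # resolution at ~$0.003 per company
    high = count_cost + 0.004 * companies    # resolution at ~$0.004 per company
    return round(low, 2), round(high, 2)


# Example: 100 companies spread over 250 company-city combinations.
low, high = linkedin_cost_range(100, 250)
print(f"estimated LinkedIn spend: ${low:.2f} to ${high:.2f}")
```

The count cost dominates, so the number of distinct company-city pairs matters far more than the number of companies.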

For industrial site types (factory/warehouse/DC), where LinkedIn is blind, auto routes to the building-footprint method - the only realistic per-site number. It geocodes the site address (OSM Nominatim), reads the building's footprint area from OpenStreetMap (Overpass), multiplies it by the number of building levels, and divides the resulting floor area by an employment-density factor (warehouse ~80, manufacturing ~40 m²/FTE; UK Employment Densities Guide). The result is flagged modeled-estimate with a ±50-100% band (headcountLow/headcountHigh) and a caveat, because the floor area is sourced but the people figure is computed, not reported. It uses only free OSM APIs (no Apify helper-actor cost); if no building can be resolved at the address, the row is left as unknown rather than guessed. Per OSM usage policy the actor self-throttles to ≤1 request/second and sends a descriptive User-Agent (footprintUserAgent). Data © OpenStreetMap contributors (ODbL).
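The footprint arithmetic can be sketched in a few lines. This is an illustration under assumptions, not the actor's code: the function name is hypothetical, "factory" is mapped to the manufacturing density, and the ±50-100% band is read here as -50% for the low bound and +100% for the high bound.

```python
# Approximate densities quoted above, in m² of floor area per full-time employee.
DENSITY_M2_PER_FTE = {"warehouse": 80.0, "factory": 40.0}


def footprint_headcount(footprint_m2: float, levels: int, site_type: str) -> dict:
    """Estimate headcount as footprint x levels / density, with an assumed band."""
    density = DENSITY_M2_PER_FTE[site_type]
    estimate = footprint_m2 * max(levels, 1) / density
    return {
        "headcountEstimate": round(estimate),
        "headcountLow": round(estimate * 0.5),    # assumed -50%
        "headcountHigh": round(estimate * 2.0),   # assumed +100%
        "headcountBasis": "modeled-estimate",
    }


# A single-level 4,000 m² warehouse and a two-level 2,000 m² factory.
print(footprint_headcount(4000, 1, "warehouse"))
print(footprint_headcount(2000, 2, "factory"))
```

The width of the band is the point: the same warehouse could plausibly hold anywhere from a couple of dozen to a hundred people, so the estimate is better suited to ranking sites than to quoting a figure.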

Output (one row per premise)

regon14, parentRegon, nip, krs, isHeadquarters, name, companyName, addressText, voivodeship, powiat, gmina, city, postalCode, street, buildingNumber, apartmentNumber, terytWoj/terytPowiat/terytGmina, pkdMainCode, pkdMainName, pkdAll, siteStartDate, companyRegistrationDate, companyAgeYears, legalForm, ownershipForm, website, localUnitsCount, headcountEstimate, headcountLow, headcountHigh, headcountBasis, headcountMethod, headcountSource, headcountConfidence, linkedinCityCount, headcountCaveat, scrapedAt, dataSource.
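Most prospecting workflows need only a handful of these columns. A minimal sketch of trimming rows to a chosen subset and serializing them as CSV follows; the column selection, helper name, and sample row are illustrative, and fields missing from a row are written as empty cells.

```python
import csv
import io

# An example subset of the output fields listed above.
COLUMNS = ["regon14", "companyName", "city", "pkdMainCode",
           "headcountEstimate", "headcountBasis"]


def rows_to_csv(rows, columns=COLUMNS) -> str:
    """Serialize rows to CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample row; extra keys such as scrapedAt are dropped.
sample_rows = [
    {"regon14": "12345678500011", "companyName": "Example Sp. z o.o.",
     "city": "Kutno", "pkdMainCode": "5210B", "scrapedAt": "2024-01-01"},
]

print(rows_to_csv(sample_rows))
```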

Coverage notes

  • Strong: factories (PKD 10-33), warehouses/distribution (52, 46), private clinics (86), private schools/universities (85). These commonly register premises as local units.
  • Weak: co-living - buildings are not entities; only the operator's declared premise (PKD 55/68) may appear. Expect partial coverage.
  • Only legal entities (Typ = P) are expanded. Sole-proprietor businesses are skipped.
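The PKD ranges in these notes suggest how a site type could map onto PKD prefixes. The mapping below is inferred from the notes above and is not the actor's documented siteTypes definition; the function names are hypothetical. Codes are normalized first so that "52.10.B" and "5210B" compare equal.

```python
# Prefixes inferred from the coverage notes (an assumption, not the actor's mapping).
SITE_TYPE_PREFIXES = {
    "factory": tuple(str(n) for n in range(10, 34)),  # PKD divisions 10-33
    "warehouse": ("52", "46"),
    "clinic": ("86",),
    "school": ("85",),
    "coliving": ("55", "68"),
}


def normalize_pkd(code: str) -> str:
    """Strip dots and spaces and uppercase, e.g. '52.10.b' -> '5210B'."""
    return "".join(ch for ch in code.upper() if ch.isalnum())


def matches_site_type(pkd_code: str, site_type: str) -> bool:
    """True if the PKD code starts with any prefix assumed for the site type."""
    return normalize_pkd(pkd_code).startswith(SITE_TYPE_PREFIXES[site_type])


for code, site_type in [("52.10.B", "warehouse"), ("8610", "clinic"), ("35.11.Z", "factory")]:
    print(code, site_type, matches_site_type(code, site_type))
```

The same prefix test works for explicit pkdCodes values, since "86", "8610", and "5210B" are all prefixes of normalized codes.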

Permissions

This Actor requests full permissions because its optional LinkedIn headcount enrichment orchestrates third-party LinkedIn Actors (harvestapi/linkedin-company, harvestapi/linkedin-company-employees), and calling them requires it. The Actor only runs those Actors and writes its own dataset - it does not read or modify any other data in your account. If you don't use LinkedIn headcount, the core REGON extraction and the free OSM building-footprint headcount work the same.

Source

Data: GUS REGON BIR (official Polish statistical business registry). This actor uses the registry's JSON ajaxEndpoint interface.