Extract comprehensive French company data from data.gouv.fr. Search companies by filters (activity, creation date, revenue, location) with automatic pagination, then enrich the results with additional information from annuaire-entreprises.data.gouv.fr, including legal details, directors, and financial data.
searchUrl: paste a URL from recherche-entreprises.data.gouv.fr or Pappers — filters are extracted automatically (ville→commune, en_activite→etat_administratif)
src/urlParser.js: parses gouv and Pappers URL params into API filters
commune: API filter added (INSEE commune codes, e.g. 74160 for Annecy)
UI: input schema restructured in English (Step 1 — What do you have? etc.)
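The searchUrl parameter mapping described above can be sketched as follows. This is a minimal illustration, not the actual contents of src/urlParser.js: the function name `parseSearchUrl`, the rename table, the example.com URL, and the use of 'A' as the value for active companies are all assumptions. Only the two renames (ville→commune, en_activite→etat_administratif) come from the notes above.

```javascript
// Hypothetical rename table: query parameter -> [API filter name, value].
// Only the two renames listed in the changelog are covered.
// ASSUMPTION: etat_administratif uses 'A' for companies that are still active.
const PARAM_MAPPERS = new Map([
  ['ville', (value) => ['commune', value]],
  ['en_activite', (value) => (value === 'true' ? ['etat_administratif', 'A'] : null)],
]);

// Turn a pasted search-page URL into a plain object of API filters.
function parseSearchUrl(searchUrl) {
  const { searchParams } = new URL(searchUrl);
  const filters = {};
  for (const [key, value] of searchParams) {
    if (value === '') continue; // ignore empty parameters
    const mapper = PARAM_MAPPERS.get(key);
    // Unknown parameters are passed through unchanged in this sketch.
    const entry = mapper ? mapper(value) : [key, value];
    if (entry) filters[entry[0]] = entry[1];
  }
  return filters;
}

// Usage with a placeholder URL (not a real search page):
const filters = parseSearchUrl(
  'https://example.com/search?ville=74160&en_activite=true&q=boulangerie',
);
console.log(filters);
```

A lookup table keeps each rename in one place, so supporting another source site would only mean adding entries rather than branching logic.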
[0.2.0] - 2026-02-25
Added
enrich mode: enriches a list of SIRENs directly, without a search step; ideal when the SIRENs are already known
search mode: paginated search by filters (NAF codes, department, region, size, status...) with full automatic pagination
src/transform.js: centralized raw API data → structured output transformation
src/search.js: paginated search engine with progress logging
src/enrich.js: concurrent enrichment by SIREN list with ETA
src/naf.js: full NAF 2008 code → INSEE official label mapping (~700 entries)
tva_intracommunautaire: computed from SIREN (official formula)
forme_juridique: label resolved from INSEE nature_juridique code
effectif_salarie: readable label from tranche_effectif_salarie code
libelle_activite_principale: NAF label resolved from APE code
Directors: full extraction of both natural persons (PP) and legal entities (PM), with flat columns dirigeant_1..5, role_1..5, dirigeant_N_date_naissance, dirigeant_N_nationalite
output.csv: progressive CSV write during run
.actor/input_schema.json: structured input schema with sections
.actor/output_schema.json: "Companies" and "Directors" views in Output tab
.actor/actor.json: full Actor configuration
.apifyignore: excludes dev/test files from Apify build
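For reference, the tva_intracommunautaire computation listed above follows the standard French rule: the two-digit key is (12 + 3 × (SIREN mod 97)) mod 97, prefixed with FR and followed by the SIREN. A minimal sketch, where the function name `computeFrenchVatNumber` and the choice to return null on invalid input are illustrative assumptions rather than the Actor's actual code:

```javascript
// Compute a French intra-community VAT number from a 9-digit SIREN.
// Hypothetical helper name; returning null for malformed input is an assumption.
function computeFrenchVatNumber(siren) {
  if (typeof siren !== 'string' || !/^\d{9}$/.test(siren)) return null;
  // Key = (12 + 3 * (SIREN mod 97)) mod 97, zero-padded to two digits.
  const key = (12 + 3 * (Number(siren) % 97)) % 97;
  return `FR${String(key).padStart(2, '0')}${siren}`;
}

console.log(computeFrenchVatNumber('404833048')); // FR83404833048
```

The SIREN is kept as a string throughout so that leading zeros survive; it is converted to a number only for the modulo step.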
Changed
Project structure reorganized to follow Apify conventions (orias-scraper / airbnb-professional-host-scraper)