Build B2B lead lists or CRM enrichment from France’s official company register. Search by pasted URL (Pappers or data.gouv) with NAF, region, and department filters—or paste SIRENs for full records. Returns identity, address, directors, headcount band, and more via recherche-entreprises.api.gouv.fr.
All notable changes to this Actor are documented in this file.
The format follows Keep a Changelog. Actor versions use MAJOR.MINOR (Apify convention), aligned with .actor/actor.json.
[2.1] - 2026-05-16
Changed
Director age filter (dirigeantAgeMin / dirigeantAgeMax): applies to the primary director only (dirigeant_1). Rows where that director's birth year is missing or masked as [NON-DIFFUSIBLE] are excluded when either bound is set, so every kept row has a verifiable age.
Natural persons only (requireDirigeantPhysique): the primary director must have a real numeric birth year in the register; [NON-DIFFUSIBLE] no longer counts as a usable birth date.
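The 2.1 rule can be illustrated with a minimal sketch. It is not the Actor's actual code: the row field name dirigeant_1_annee_naissance and both function names are assumptions made for the example; only the input names dirigeantAgeMin / dirigeantAgeMax and the [NON-DIFFUSIBLE] marker come from this changelog.

```javascript
// Hypothetical row shape: { dirigeant_1_annee_naissance: "1975" | "[NON-DIFFUSIBLE]" | null }
function primaryDirectorBirthYear(row) {
  const raw = row.dirigeant_1_annee_naissance; // assumed field name
  if (typeof raw !== 'string' && typeof raw !== 'number') return null;
  const text = String(raw).trim();
  // Accept only a plain four-digit year; "[NON-DIFFUSIBLE]" and blanks fail here.
  return /^\d{4}$/.test(text) ? Number(text) : null;
}

function passesDirectorAgeFilter(row, bounds, currentYear = new Date().getFullYear()) {
  const { dirigeantAgeMin, dirigeantAgeMax } = bounds;
  const hasMin = Number.isFinite(dirigeantAgeMin);
  const hasMax = Number.isFinite(dirigeantAgeMax);
  if (!hasMin && !hasMax) return true; // no bound set: the filter is inactive

  const birthYear = primaryDirectorBirthYear(row);
  if (birthYear === null) return false; // unverifiable age: dropped once any bound is set

  const age = currentYear - birthYear;
  if (hasMin && age < dirigeantAgeMin) return false;
  if (hasMax && age > dirigeantAgeMax) return false;
  return true;
}
```

Passing currentYear explicitly keeps the check deterministic in tests; in a run it would default to the current calendar year.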
[2.0] - 2026-05-16
Added
Parallel region sharding: when a date filter is active, a single Pappers URL is automatically split into 13 regional shards (one per French metropolitan region) that run in parallel — reducing wall-clock time from ~17 min to the time of the slowest shard.
forme_juridique mapping from Pappers URLs: forme_juridique=XXXX in a Pappers URL is now parsed and forwarded to the API as nature_juridique (e.g. 5710 for SAS).
effectifs_min / effectifs_max mapping from Pappers URLs: converted to a post-filter using the tranche_effectif_salarie INSEE tranche codes.
Director age filter (dirigeantAgeMin / dirigeantAgeMax): post-filters results using director birth years from the register. Exposed as dedicated fields in the Console input form.
Natural person filter (requireDirigeantPhysique, default true): excludes companies whose primary director has no birth date — holdings, SCI, and other legal entities acting as director.
Director presence filter (requireDirigeant, default true): excludes companies with no named director in the register.
Company age filter (maxCompanyAgeYears): keeps only companies created within the last N years, recomputed from today's date on every run — no manual date update needed.
Size category exclusion (excludeCategories, default ["GE","ETI"]): excludes large groups (GE) and mid-market companies (ETI) by default, keeping PME and unclassified structures. Prevents large chains from polluting lead lists.
nature_juridique API filter: forwarded to the recherche-entreprises API to narrow the result pool before pagination.
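The region sharding introduced in this release can be sketched roughly as follows. The function names, the runOneShard callback, and the use of a region query parameter are assumptions for illustration; the 13 codes are the INSEE codes of the metropolitan regions.

```javascript
// INSEE codes of the 13 metropolitan regions (Île-de-France = 11, Corse = 94, ...).
const METRO_REGION_CODES = ['11', '24', '27', '28', '32', '44', '52', '53', '75', '76', '84', '93', '94'];

// One shard per region: same filters as the original query plus a region code.
function buildRegionShards(baseParams) {
  return METRO_REGION_CODES.map((region) => ({ ...baseParams, region }));
}

// runOneShard is a hypothetical async function: (params) => Promise<row[]>.
async function runShardsInParallel(baseParams, runOneShard) {
  const shards = buildRegionShards(baseParams);
  const perShard = await Promise.all(shards.map((shard) => runOneShard(shard)));

  // Deduplicate by SIREN in case a company appears in more than one shard.
  const bySiren = new Map();
  for (const rows of perShard) {
    for (const row of rows) bySiren.set(row.siren, row);
  }
  return [...bySiren.values()];
}
```

Because Promise.all resolves only when every shard has finished, total wall-clock time tracks the slowest shard rather than the sum of all shards.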
Changed
Near mode: removed custom latitude / longitude inputs — pick a city from the dropdown only (API still searches around that centre + radius).
nearPoint and sirens modes now apply the same post-filters as Search URL: creation-date window, requireDirigeant, requireDirigeantPhysique, excludeCategories, and director age bounds.
SIREN enrichment: the enrichment callback can return false to drop an API-matched company after post-filters; run logs report filtered separately from found / notFound.
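A simplified, synchronous sketch of the callback contract described above. The names enrichSirens, lookup, and onCompany are hypothetical, and it assumes found counts only rows that were kept; the real enrichment runs asynchronously in batches.

```javascript
// lookup(siren) returns a company object or null; onCompany(company) may return false to drop it.
function enrichSirens(sirens, lookup, onCompany) {
  const stats = { found: 0, notFound: 0, filtered: 0 };
  const rows = [];

  for (const siren of sirens) {
    const company = lookup(siren);
    if (!company) {
      stats.notFound += 1; // the API had no match for this SIREN
      continue;
    }
    if (onCompany(company) === false) {
      stats.filtered += 1; // matched by the API but rejected by a post-filter
      continue;
    }
    stats.found += 1;
    rows.push(company);
  }
  return { rows, stats };
}
```

Only an explicit false drops a row, so a callback that returns nothing keeps the company.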
Fixed
Search URL: creation-date bounds from input / maxCompanyAgeYears and from Pappers URL (date_creation_min / max) are merged by intersection (strictest minimum and strictest maximum), instead of one source silently overriding the other.
transform.js: corrected nature_juridique code 5710 label from "SARL" to "SAS" — matches the actual SIRENE/Pappers data where 5710 = Société par actions simplifiée.
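The creation-date merge fixed in this release amounts to intersecting intervals. A minimal sketch, assuming bounds are ISO YYYY-MM-DD strings (which compare correctly as plain strings); mergeDateBounds and minCreationDateFromAge are illustrative names, not the Actor's API.

```javascript
// Turn maxCompanyAgeYears into a minimum creation date relative to "today" (UTC).
function minCreationDateFromAge(maxCompanyAgeYears, today = new Date()) {
  const d = new Date(Date.UTC(
    today.getUTCFullYear() - maxCompanyAgeYears,
    today.getUTCMonth(),
    today.getUTCDate(),
  ));
  return d.toISOString().slice(0, 10);
}

// Intersect any number of { min, max } windows: latest minimum, earliest maximum.
function mergeDateBounds(...sources) {
  let min = null;
  let max = null;
  for (const source of sources) {
    if (source.min && (min === null || source.min > min)) min = source.min;
    if (source.max && (max === null || source.max < max)) max = source.max;
  }
  // An empty window means the two sources contradict each other.
  const empty = min !== null && max !== null && min > max;
  return { min, max, empty };
}
```

For example, an input-derived minimum of 2023-05-16 combined with a URL window of 2024-01-01 to 2025-12-31 yields 2024-01-01 to 2025-12-31, because the URL minimum is the stricter of the two.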
[1.9] - 2026-04-28
Added
Near location mode (nearPoint): new input mode that calls the /near_point API endpoint. Retrieve all companies within a circular area — e.g. all active restaurants within 5 km of Lyon. Same pagination, retry logic, and deduplication as the search mode.
City dropdown (nearCity): 44 pre-loaded French cities (Paris, Marseille, Lyon, …). Coordinates are resolved internally. Optional manual latitude / longitude inputs were added alongside this release and removed in v2.0 (dropdown-only centres).
buildNearParams helper in src/lib/searchParams.js for constructing /near_point query strings.
runNearSearch function in src/search.js — same pagination and rate-limit resilience as runSearch.
Input form layout: geographic sections (Near a location — coordinates, Narrow results) moved to the bottom of the form so the default Search URL and SIREN modes are unaffected for existing users.
nearPoint placed last in the mode enum order so the Console dropdown defaults to the existing working modes.
Updated actor.json title, description, and seoTitle / seoDescription to surface the new geographic capability.
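As an illustration of the helper named above, a query-string builder for /near_point might look like the sketch below. The signature, the two-city table, and the parameter names (lat, long, radius, activite_principale, page, per_page) are assumptions to be checked against the API documentation and the actual src/lib/searchParams.js.

```javascript
// Tiny stand-in for the 44-city dropdown; coordinates are approximate city centres.
const NEAR_CITIES = {
  Paris: { lat: 48.8566, long: 2.3522 },
  Lyon: { lat: 45.764, long: 4.8357 },
};

// Hypothetical signature: returns a query string for the /near_point endpoint.
function buildNearParams({ lat, long, radiusKm = 5, nafCode, page = 1, perPage = 25 }) {
  if (!Number.isFinite(lat) || !Number.isFinite(long)) {
    throw new TypeError('lat and long must be finite numbers');
  }
  if (!(radiusKm > 0)) {
    throw new RangeError('radiusKm must be positive');
  }
  const params = new URLSearchParams();
  params.set('lat', String(lat));
  params.set('long', String(long));
  params.set('radius', String(radiusKm));
  if (nafCode) params.set('activite_principale', nafCode); // optional NAF filter
  params.set('page', String(page));
  params.set('per_page', String(perPage));
  return params.toString();
}
```

A call such as buildNearParams({ ...NEAR_CITIES.Lyon, radiusKm: 5, nafCode: '56.10A' }) would correspond to the "restaurants within 5 km of Lyon" example in the entry above.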
[1.8] - 2026-04-28
Fixed
Search mode resilience: each search URL is now wrapped in an individual try/catch. A rate-limit exhaustion or network error on one URL no longer crashes the whole run — the actor logs the error, saves all results collected so far, and continues with the next URL. Previously a single failing API call mid-run would mark the entire run as FAILED even when 90%+ of data had already been saved. This was the primary driver of the ~30% failure rate.
SIREN lookup reliability (fetchBySiren): replaced the /search?q=siren&per_page=5 text-search with a two-pass strategy (per_page=25, then per_page=50). Match comparison is now normalized (trim + 9-digit pad on both sides). Removed the previous results.length === 1 → results[0] shortcut which silently returned the wrong company when the API returned an unrelated single result.
Timeout warning for large queries: after fetching the first page, the actor estimates total runtime and logs a WARNING when it exceeds ~45 min, advising users to raise the run timeout or cap maxResults before hitting the 1-hour default wall.
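The timeout warning boils down to simple arithmetic once the first page reveals the total result count. A rough sketch, with hypothetical names and a hypothetical secondsPerPage figure (request time plus paging delay); only the ~45 min threshold comes from this changelog.

```javascript
// Estimate run time from the total reported by the first page.
function estimateRunMinutes({ totalResults, maxResults, perPage, secondsPerPage }) {
  const target = Math.min(totalResults, maxResults ?? Infinity); // maxResults caps the work
  const pages = Math.ceil(target / perPage);
  return (pages * secondsPerPage) / 60;
}

// Returns a warning string when the estimate crosses the threshold, otherwise null.
function timeoutWarning(estimateMinutes, thresholdMinutes = 45) {
  if (estimateMinutes <= thresholdMinutes) return null;
  return `WARNING: estimated runtime ~${Math.round(estimateMinutes)} min exceeds ` +
    `${thresholdMinutes} min; raise the run timeout or lower maxResults.`;
}
```

With 30,000 results at 25 per page and 3 seconds per page, the estimate is 1,200 pages × 3 s = 60 min, which triggers the warning; capping maxResults at 1,000 brings it down to 2 min.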
Added
Unit tests for fetchBySiren (5 new cases): exact match on first pass, fallback to second pass, not-found on both passes, regression guard for the old single-result false positive, and whitespace-normalization in API response SIREN.
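The matching rules these tests cover can be sketched as below. This is an illustration rather than the shipped fetchBySiren: normalizeSiren, pickExactMatch, and the search callback are hypothetical names, and the sketch assumes each API result exposes a siren field.

```javascript
// Trim and left-pad to 9 digits; anything that is not 1-9 digits is rejected.
function normalizeSiren(value) {
  const digits = String(value ?? '').trim();
  return /^\d{1,9}$/.test(digits) ? digits.padStart(9, '0') : null;
}

// Return the result whose SIREN equals the requested one, never "the only result".
function pickExactMatch(results, siren) {
  const wanted = normalizeSiren(siren);
  if (wanted === null) return null;
  return results.find((r) => normalizeSiren(r.siren) === wanted) ?? null;
}

// Two-pass lookup: search(siren, perPage) is a hypothetical async API wrapper.
async function fetchBySirenSketch(siren, search) {
  for (const perPage of [25, 50]) {
    const results = await search(siren, perPage);
    const match = pickExactMatch(results, siren);
    if (match) return match;
  }
  return null; // not found on either pass
}
```

The regression guard corresponds to pickExactMatch returning null when the API hands back a single, unrelated company.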
[1.7] - 2026-04-04
Changed
Benefit-first copy (Console, Store, README): user-facing text emphasizes outcomes (leads, enrichment, export) instead of paging, retries, or rate limits. Documented as a required rule in the perfect-apify-actor skill.
input_schema: Reworded titles and descriptions; maxResults field title is now “How many companies (search)”.
[1.6] - 2026-04-04
Changed
Input UI: Removed rate-limit fields from input_schema.json (no “Rate limiting & performance” section). Paging delay, enrich concurrency/stagger, and HTTP retries are centralized in src/lib/runtimeDefaults.js with conservative defaults.
Input mode dropdown: enum had four values (searchUrl, sirens, search, enrich) but only two enumTitles, so the Console showed raw search and enrich. Schema now exposes two modes only; main.js normalizes legacy search → searchUrl and enrich → sirens after input load.
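The legacy-mode normalization is a small lookup. A minimal sketch, assuming the input object carries a mode field; the helper name is illustrative.

```javascript
// Legacy values accepted from older integrations, mapped to the two current modes.
const LEGACY_MODES = { search: 'searchUrl', enrich: 'sirens' };

function normalizeMode(mode) {
  // hasOwnProperty guards against keys such as "constructor" hitting the prototype chain.
  return Object.prototype.hasOwnProperty.call(LEGACY_MODES, mode) ? LEGACY_MODES[mode] : mode;
}

// Applied once after input load; other fields are left untouched.
function normalizeInput(input) {
  return { ...input, mode: normalizeMode(input.mode) };
}
```

Existing API callers that still send search or enrich keep working, while the Console schema only lists the two titled modes.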
[1.4] - 2026-04-04
Changed
Title alignment: README H1, input_schema.json title, seoTitle, and output_schema.json title now match the Store positioning in actor.json. All use the same “French Companies — Search & SIREN Enrichment (Official INSEE API)” line, and the SEO fields follow the same naming, without the “Scraper” wording that had drifted from the README.
[1.3] - 2026-04-04
Changed
Input schema (default + prefill): Conservative defaults so Console, API, and platform health runs finish quickly and avoid HTTP 429: maxResults = 25, delayBetweenPages = 3, maxConcurrency = 2, delayBetweenRequests = 2. Explicit prefill on searchUrls and sirens for the UI; default is what integrations/API use when a field is omitted.
input.json: Matches schema defaults for local testing.
README: Parameter table and notes aligned with Apify prefill vs default behaviour.
[1.2] - 2026-04-04
Changed
Local input: one root input.json for all local tests; removed separate input-search*.json and input-enrich.json files. README documents switching mode for search vs enrich.
[1.1] - 2026-04-04
Added
SEO & marketplace metadata: seoTitle and seoDescription in actor.json for Apify Store search snippets.
Input schema: sectionCaption / sectionDescription for rate-limit and performance fields; clearer grouping in the Console.
Run status (cloud): Actor.setStatusMessage updates during search and enrich runs when executing on Apify.
Changed
User-facing copy (English): All run logs, warnings, errors, and default dataset error text are now English-only for Store and support consistency.
README: Restructured for the Apify marketplace (quick start, parameters table, performance, FAQ, legal, support).
actor.json: Title and description tuned for keywords (SIREN, INSEE, lead generation) while staying accurate.
Categories: Order adjusted to LEAD_GENERATION, BUSINESS, DATA_EXTRACTION.
Fixed
Rate limiting: Shared HTTP retry layer with backoff, Retry-After support, and configurable delayBetweenPages / rateLimitMaxRetries documented in README and input schema.
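The wait-time decision inside such a retry layer can be sketched as follows. The function name, base delay, and cap are assumptions; the sketch handles only the delta-seconds form of Retry-After and falls back to exponential backoff for anything else, including HTTP-date values.

```javascript
// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function retryDelayMs(attempt, retryAfterHeader, { baseMs = 1000, capMs = 60000 } = {}) {
  const hasHeader = retryAfterHeader != null && String(retryAfterHeader).trim() !== '';
  const seconds = Number(retryAfterHeader);

  // Honour a numeric Retry-After (seconds) when the server sends one.
  if (hasHeader && Number.isFinite(seconds) && seconds >= 0) {
    return Math.min(seconds * 1000, capMs);
  }
  // Otherwise back off exponentially: base, 2x base, 4x base, ... up to the cap.
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// True when the caller should retry again (rateLimitMaxRetries comes from the input).
function shouldRetry(status, attempt, rateLimitMaxRetries) {
  return (status === 429 || status === 503) && attempt < rateLimitMaxRetries;
}
```

Capping the delay keeps a single misbehaving response from stalling a run for longer than the configured ceiling.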
[1.0] - 2026-02-25
Added
Search URL mode: Paste URLs from recherche-entreprises.data.gouv.fr or Pappers; filters are parsed automatically (urlParser.js).
SIREN enrichment mode: Parallel batches with delays and ETA logging.
HTTP resilience: Retries on HTTP 429 / 503 for both search and enrich.
Transform pipeline: transform.js maps API payloads to flat dataset rows (directors, NAF labels, TVA, financials).
CSV: Optional output.csv written during the run (local).
Unit tests: tests/ for URL parsing, HTTP helpers, SIREN normalization, and mocked search pagination.
Changed
Project layout aligned with Apify conventions; .actor/ schemas and actor.json wired for Console and Store.
[0.2.1] - 2026-02-25
Added
searchUrl: URL-based input with Pappers mapping (ville → commune, en_activite → active status).
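A rough sketch of that mapping, for illustration only. The function name, the example URL, the en_activite=true value, and the output key etat_administratif with value A are assumptions; the actual parsing lives in urlParser.js and may differ.

```javascript
// Map the two Pappers query parameters named above onto API-style filters.
function mapPappersUrl(url) {
  const source = new URL(url).searchParams;
  const filters = {};

  const ville = source.get('ville');
  if (ville) filters.commune = ville; // ville → commune

  // Assumed convention: en_activite=true means "active companies only".
  if (source.get('en_activite') === 'true') filters.etat_administratif = 'A';

  return filters;
}
```

For a URL such as https://www.pappers.fr/recherche?ville=Lyon&en_activite=true, this sketch would produce a commune filter of Lyon plus the active-status filter; parameters it does not recognise are ignored.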