Build B2B lead lists or enrich CRM records from France’s official company register. Search by pasting a URL (Pappers or data.gouv) with NAF, region, and department filters, or paste SIRENs to get full records. Returns identity, address, directors, headcount band, and more via recherche-entreprises.api.gouv.fr.
All notable changes to this Actor are documented in this file.
The format follows Keep a Changelog. Actor versions use MAJOR.MINOR (Apify convention), aligned with .actor/actor.json.
[2.0] - 2026-04-29
Added
Auto-split by département when results are capped — the API hard-caps results at 10 000 (Elasticsearch limit). When a query hits this ceiling, the Actor now automatically re-runs the same query once per French département (101 sub-queries), aggregates results, and deduplicates by SIREN. Queries that were previously silently truncated now return the full dataset, even for broad filters like "all active companies with 500+ employees" (5 500 companies).
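A minimal sketch of the auto-split idea, assuming a hypothetical `searchAll(query)` that pages through one query and returns `{ total, results }`, and assuming the API's département filter is named `departement`. It runs the query once, and only when the reported total reaches the cap does it re-run per département and merge rows, keeping the first row seen for each SIREN. Sub-queries run sequentially here for clarity, and a département that itself hits the cap is not split further.

```javascript
// Hypothetical sketch: searchAll, API_RESULT_CAP and the `departement`
// filter name are assumptions, not the Actor's actual code.
const API_RESULT_CAP = 10_000; // hard cap described in this changelog

async function searchWithAutoSplit(query, departements, searchAll) {
  const first = await searchAll(query);
  // Under the cap, the single query already returned everything.
  if (first.total < API_RESULT_CAP) return first.results;

  // At the cap: one sub-query per département, deduplicated by SIREN.
  const bySiren = new Map();
  for (const dep of departements) {
    const { results } = await searchAll({ ...query, departement: dep });
    for (const row of results) {
      if (!bySiren.has(row.siren)) bySiren.set(row.siren, row);
    }
  }
  return [...bySiren.values()]; // Map keeps first-seen order
}
```

Deduplicating on SIREN guards against a company appearing in more than one sub-query result.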
Parallel pagination — pages are now fetched in concurrent batches (default: 3 pages at a time, staggered by 1.5 s) instead of sequentially. A query that previously took ~33 min now completes in ~14 min on the same dataset.
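The batched, staggered fetching described above can be sketched as follows. `fetchPage(page)` is a hypothetical stand-in for one API page request (with its own retries), and the optional `batchDelayMs` loosely mirrors the SEARCH_DELAY_BETWEEN_BATCHES_MS setting; both names are assumptions for illustration.

```javascript
// Hypothetical sketch: names and the batchDelayMs option are illustrative,
// not the Actor's actual code. Defaults mirror "3 pages, staggered by 1.5 s".
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchPagesInBatches(totalPages, fetchPage, { concurrency = 3, staggerMs = 1500, batchDelayMs = 0 } = {}) {
  const results = [];
  for (let start = 1; start <= totalPages; start += concurrency) {
    const end = Math.min(start + concurrency - 1, totalPages);
    const batch = [];
    for (let page = start; page <= end; page++) {
      // Offset each request in the batch so the API never gets a burst of
      // simultaneous calls: page `start` at 0 ms, the next at staggerMs, ...
      const offsetMs = (page - start) * staggerMs;
      batch.push(sleep(offsetMs).then(() => fetchPage(page)));
    }
    // Promise.all keeps page order and waits for the slowest page in the batch.
    results.push(...(await Promise.all(batch)));
    if (end < totalPages && batchDelayMs > 0) await sleep(batchDelayMs);
  }
  return results;
}
```

Because `Promise.all` rejects on the first failure, this design assumes `fetchPage` handles its own retries and only throws once they are exhausted.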
Live ETA + progress % — every page log now includes the completion percentage and a dynamically recalibrated ETA based on actual elapsed time (e.g. Page 120/400 — 3000/10000 companies · 30.0% · ETA 11m 54s).
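One way to recalibrate the ETA from actual elapsed time is to project the average time per company so far onto the companies still to fetch. The helper names below are hypothetical. With the example line above (3 000 of 10 000 companies), an elapsed time of 306 s gives 102 ms per company, and 7 000 × 102 ms = 714 s, i.e. "11m 54s".

```javascript
// Hypothetical helpers; names are illustrative, not the Actor's code.
function formatDuration(ms) {
  const totalSeconds = Math.max(0, Math.round(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

function progress(fetched, total, elapsedMs) {
  const percent = total > 0 ? (fetched / total) * 100 : 0;
  // Average time per company so far, projected onto what is left.
  const remaining = Math.max(0, total - fetched);
  const etaMs = fetched > 0 ? (elapsedMs / fetched) * remaining : NaN;
  return {
    percent: `${percent.toFixed(1)}%`,
    eta: Number.isFinite(etaMs) ? formatDuration(etaMs) : "unknown",
  };
}
```

Since the rate is recomputed on every page, the ETA automatically absorbs slowdowns such as 429 backoffs.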
Network-error retries — fetchWithRetry now catches thrown errors from undici/Node.js (UND_ERR_*, ETIMEDOUT, ECONNRESET, socket hang up, etc.) and applies the same exponential backoff as HTTP 429. Previously these errors bubbled up uncaught and terminated the run at page 8.
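A sketch of how thrown network errors can be classified and retried with exponential backoff. It is an assumption-based illustration, not the Actor's fetchWithRetry: it relies on Node's built-in fetch typically rejecting with `TypeError("fetch failed")` and carrying the low-level undici or socket error in `cause`, and it only covers the codes named above.

```javascript
// Hypothetical sketch; the Actor's real fetchWithRetry may differ.
// Codes listed in the changelog; extend the set as needed ("etc.").
const RETRYABLE_CODES = new Set(["ETIMEDOUT", "ECONNRESET"]);

function isRetryableNetworkError(err) {
  // Node's built-in fetch usually throws TypeError("fetch failed") and puts
  // the low-level undici or socket error in `cause`, so check both levels.
  for (const e of [err, err?.cause]) {
    if (!e) continue;
    const code = String(e.code ?? "");
    if (code.startsWith("UND_ERR_") || RETRYABLE_CODES.has(code)) return true;
    if (/socket hang up/i.test(String(e.message ?? ""))) return true;
  }
  return false;
}

// Exponential backoff with a small random jitter, capped at maxMs.
function backoffMs(attempt, baseMs = 1000, maxMs = 60_000) {
  return Math.min(maxMs, baseMs * 2 ** attempt) + Math.floor(Math.random() * baseMs);
}

async function withNetworkRetry(fn, { maxRetries = 12, baseMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-network errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryableNetworkError(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt, baseMs)));
    }
  }
}
```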
Per-request AbortSignal.timeout (default 55 s): each fetch attempt gets its own fresh signal. If the API server hangs, the request is abandoned after the timeout and retried cleanly instead of relying on undici's default timeouts, which previously caused silent run failures.
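The point of a fresh signal per attempt is that an `AbortSignal` stays aborted once it fires, so a single shared signal would cancel every retry immediately. A minimal sketch, assuming a hypothetical `fetchWithTimeout` that retries only on timeouts and accepts an injectable `fetchImpl` for testing:

```javascript
// Hypothetical sketch: one fresh AbortSignal.timeout() per attempt.
async function fetchWithTimeout(url, { timeoutMs = 55_000, maxRetries = 3, fetchImpl = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      // New signal each time, so the timer restarts for every attempt.
      return await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    } catch (err) {
      // AbortSignal.timeout() aborts with a DOMException named "TimeoutError".
      const timedOut = err?.name === "TimeoutError";
      if (!timedOut || attempt >= maxRetries) throw err;
    }
  }
}
```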
Run summary log line — on completion the Actor now logs total unique companies, skipped pages (after retries exhausted), and wall-clock time: [search] Finished — 5500 unique companies · 14m 23s.
Full URL filter coverage — urlParser.js now maps every parameter the official API supports:
annuaire-entreprises.data.gouv.fr URLs: etat=A → etat_administratif, naf → activite_principale, terme → q, sap → section_activite_principale, cp_dep (with cp_dep_type), fn/n/dmin/dmax for director search, label and type for structure/certification flags.
tranche_effectif_salarie from repeated URL params → single comma-separated API parameter (the format the API actually accepts).
16 boolean flags: est_bio, est_rge, est_qualiopi, est_siae, est_societe_mission, est_service_public, est_l100_3, est_patrimoine_vivant, est_alim_confiance, est_achats_responsables, egapro_renseignee, convention_collective_renseignee, and more.
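The core of this mapping can be sketched with the standard `URL` / `URLSearchParams` APIs. The function name and the mapping table are hypothetical and cover only four of the keys listed above. Repeated parameters are collected with `getAll()` and comma-joined, and empty values are dropped. Unmapped keys pass through unchanged here; a real parser would more likely use an allow-list.

```javascript
// Hypothetical sketch of part of the annuaire-entreprises mapping above.
const ANNUAIRE_TO_API = {
  terme: "q",
  etat: "etat_administratif",
  naf: "activite_principale",
  sap: "section_activite_principale",
};

function annuaireUrlToApiParams(pageUrl) {
  const input = new URL(pageUrl).searchParams;
  const out = new URLSearchParams();
  for (const key of new Set(input.keys())) {
    // getAll() collects repeated params (?x=1&x=2); empty values are dropped
    // so a bare "categorie_entreprise=" never becomes a blank filter.
    const values = input.getAll(key).map((v) => v.trim()).filter(Boolean);
    if (values.length === 0) continue;
    const apiKey = ANNUAIRE_TO_API[key] ?? key; // other keys pass through
    out.set(apiKey, values.join(","));
  }
  return out;
}
```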
src/lib/departments.js: static list of all 101 French département codes (01–95 except 20, plus 2A/2B, 971–974, and 976) used by the auto-split logic.
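The count of 101 follows from the code ranges: 01–95 gives 95 codes, minus 20 (Corsica has been split into 2A and 2B), plus 2A and 2B, plus the five overseas départements 971, 972, 973, 974 and 976 (975, Saint-Pierre-et-Miquelon, is not a département). A short sketch that builds an equivalent list; the shipped file is described as a static list, so this is only a cross-check.

```javascript
// Equivalent construction of the 101 département codes (illustrative only).
const DEPARTEMENTS = [
  ...Array.from({ length: 95 }, (_, i) => String(i + 1).padStart(2, "0"))
    .filter((code) => code !== "20"), // Corsica uses 2A and 2B instead
  "2A",
  "2B",
  "971", "972", "973", "974", "976", // overseas départements (no 975)
];
```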
53 unit tests (up from 5) covering: HTTP retry logic (network errors, 429, Retry-After, backoff), normalizeSiren, all URL sources (recherche-entreprises, annuaire-entreprises, Pappers), every filter in buildParams, and a full round-trip URL → API params assertion.
Fixed
categorie_entreprise= (empty string in URL) no longer gets passed to the API as a blank filter.
tranche_effectif_salarie sent as multiple &param=X&param=Y URL params (how browsers encode them) was previously ignored; it is now correctly joined to "41,42,51,52,53", the format the API expects.
etat=A from annuaire-entreprises.data.gouv.fr URLs was not mapped — queries using that site's URL as input got no active-only filter applied.
Changed
SEARCH_DELAY_BETWEEN_PAGES_MS renamed to SEARCH_DELAY_BETWEEN_BATCHES_MS; SEARCH_CONCURRENCY and SEARCH_STAGGER_MS added to runtimeDefaults.js.
HTTP_MAX_RETRIES stays at 12; HTTP_REQUEST_TIMEOUT_MS (55 000 ms) added.
[1.9] - 2026-04-28
Added
Near location mode (nearPoint): new input mode that calls the /near_point API endpoint. Retrieve all companies within a circular area — e.g. all active restaurants within 5 km of Lyon. Same pagination, retry logic, and deduplication as the search mode.
City dropdown (nearCity): 44 pre-loaded French cities (Paris, Marseille, Lyon, …). Coordinates are resolved internally, so users never need to enter GPS coordinates. Selecting "Custom coordinates" reveals latitude / longitude fields for advanced use.
buildNearParams helper in src/lib/searchParams.js for constructing /near_point query strings.
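A sketch of how city resolution and /near_point query building could fit together. Everything here is an assumption for illustration: the `lat` / `long` / `radius` parameter names reflect the public API documentation as we understand it, the three approximate city-centre coordinates stand in for the 44-city list, and `"custom"` is a hypothetical value for the "Custom coordinates" option.

```javascript
// Hypothetical sketch; parameter names and coordinates are assumptions.
const CITY_COORDS = {
  paris: { lat: 48.8566, long: 2.3522 },
  marseille: { lat: 43.2965, long: 5.3698 },
  lyon: { lat: 45.764, long: 4.8357 },
};

function buildNearParams({ nearCity, latitude, longitude, radiusKm = 5, filters = {}, page = 1, perPage = 25 }) {
  // Resolve a dropdown city to coordinates, or use the custom fields.
  const point = nearCity === "custom" ? { lat: latitude, long: longitude } : CITY_COORDS[nearCity];
  if (!point || !Number.isFinite(point.lat) || !Number.isFinite(point.long)) {
    throw new Error(`Unknown city or missing coordinates: ${nearCity}`);
  }
  const params = new URLSearchParams({
    lat: String(point.lat),
    long: String(point.long),
    radius: String(radiusKm),
    page: String(page),
    per_page: String(perPage),
  });
  // Extra filters (e.g. activite_principale) are appended as-is.
  for (const [key, value] of Object.entries(filters)) params.set(key, String(value));
  return params;
}
```

Usage: `https://recherche-entreprises.api.gouv.fr/near_point?${buildNearParams({ nearCity: "lyon", filters: { etat_administratif: "A" } })}`.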
runNearSearch function in src/search.js — same pagination and rate-limit resilience as runSearch.
Input form layout: geographic sections (Near a location — coordinates, Narrow results) moved to the bottom of the form so the default Search URL and SIREN modes are unaffected for existing users.
nearPoint placed last in the mode enum order so the Console dropdown defaults to the existing working modes.
Updated actor.json title, description, and seoTitle / seoDescription to surface the new geographic capability.
[1.8] - 2026-04-28
Fixed
Search mode resilience: each search URL is now wrapped in its own try/catch. A rate-limit exhaustion or network error on one URL no longer crashes the whole run: the Actor logs the error, keeps all results saved so far, and continues with the next URL. Previously a single failing API call mid-run marked the entire run as FAILED even when 90%+ of the data had already been saved. This was the primary driver of the ~30% run failure rate.
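The per-URL isolation can be sketched as below. `runSearch(url, pushData)` and `pushData(rows)` are hypothetical stand-ins. Pushing each page as soon as it arrives, rather than once per URL, is what lets a URL that fails halfway still keep the pages it already fetched.

```javascript
// Hypothetical sketch: one failing URL is logged and skipped instead of
// failing the whole run. runSearch/pushData are illustrative stand-ins.
async function runAllSearches(searchUrls, runSearch, pushData, log = console) {
  const failed = [];
  for (const url of searchUrls) {
    try {
      // runSearch pushes each page as it is fetched, so a mid-URL failure
      // keeps everything saved up to that point.
      await runSearch(url, pushData);
    } catch (err) {
      log.error(`Search failed for ${url}: ${err.message}. Continuing with the next URL.`);
      failed.push(url);
    }
  }
  return { failed };
}
```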
SIREN lookup reliability (fetchBySiren): replaced the /search?q=siren&per_page=5 text-search with a two-pass strategy (per_page=25, then per_page=50). Match comparison is now normalized (trim + 9-digit pad on both sides). Removed the previous results.length === 1 → results[0] shortcut which silently returned the wrong company when the API returned an unrelated single result.
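The two-pass lookup with normalized, exact-only matching might look like this. `searchPage(q, perPage)` is a hypothetical stand-in for a /search call returning `{ results }`. The key property is that an unrelated single result is never accepted just because it is the only one.

```javascript
// Hypothetical sketch of the two-pass, exact-match SIREN lookup.
function normalizeSiren(value) {
  // Trim, then left-pad to 9 digits, applied to both sides of the comparison.
  return String(value ?? "").trim().padStart(9, "0");
}

async function fetchBySiren(siren, searchPage) {
  const wanted = normalizeSiren(siren);
  for (const perPage of [25, 50]) {
    const { results = [] } = await searchPage(wanted, perPage);
    // Exact match only: no "single result => take it" shortcut.
    const hit = results.find((r) => normalizeSiren(r.siren) === wanted);
    if (hit) return hit;
  }
  return null; // not found on either pass
}
```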
Timeout warning for large queries: after fetching the first page, the Actor estimates the total runtime and logs a WARNING when it exceeds ~45 min, advising users to raise the run timeout or cap maxResults before the run hits the 1-hour default timeout.
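A worked sketch of the estimate, with hypothetical names and numbers. If the first page reports 10 000 results at 25 per page, that is 400 pages. At an assumed 8 s per page (request plus paging delay) the run needs 3 200 s, about 53 min, which is above the ~45 min threshold.

```javascript
// Hypothetical sketch; names and the per-page timing are illustrative.
const WARN_THRESHOLD_MS = 45 * 60 * 1000; // ~45 min, as described above

// Total pages needed × time taken by the first page.
function estimateRunMs({ totalResults, maxResults, perPage, firstPageMs }) {
  const target = Math.min(totalResults, maxResults ?? Infinity);
  return Math.ceil(target / perPage) * firstPageMs;
}

function timeoutWarning(estimateMs) {
  if (estimateMs <= WARN_THRESHOLD_MS) return null;
  const minutes = Math.round(estimateMs / 60_000);
  return `Estimated runtime ~${minutes} min: raise the run timeout or lower maxResults.`;
}
```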
Added
Unit tests for fetchBySiren (5 new cases): exact match on first pass, fallback to second pass, not-found on both passes, regression guard for the old single-result false positive, and whitespace-normalization in API response SIREN.
[1.7] - 2026-04-04
Changed
Benefit-first copy (Console, Store, README): user-facing text emphasizes outcomes (leads, enrichment, export) instead of paging, retries, or rate limits. Documented as a required rule in the perfect-apify-actor skill.
input_schema: Reworded titles and descriptions; maxResults field title is now “How many companies (search)”.
[1.6] - 2026-04-04
Changed
Input UI: Removed rate-limit fields from input_schema.json (no “Rate limiting & performance” section). Paging delay, enrich concurrency/stagger, and HTTP retries are centralized in src/lib/runtimeDefaults.js with conservative defaults.
Input mode dropdown: enum had four values (searchUrl, sirens, search, enrich) but only two enumTitles, so the Console showed raw search and enrich. Schema now exposes two modes only; main.js normalizes legacy search → searchUrl and enrich → sirens after input load.
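The legacy-mode normalization amounts to a small lookup applied right after input load. A sketch, assuming the input field is named `mode` (the field name is not given above):

```javascript
// Hypothetical sketch: maps legacy mode values to the two current modes.
const LEGACY_MODES = { search: "searchUrl", enrich: "sirens" };

function normalizeMode(input) {
  // Unknown or current values are left untouched.
  const mode = LEGACY_MODES[input.mode] ?? input.mode;
  return { ...input, mode };
}
```

Keeping the mapping in code rather than in the schema lets old API integrations that still send `search` or `enrich` keep working while the Console only shows the two titled modes.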
[1.4] - 2026-04-04
Changed
Title alignment: the README H1, input_schema.json title, seoTitle, and output_schema.json title now match the Store positioning in actor.json (the same “French Companies — Search & SIREN Enrichment (Official INSEE API)” line; the SEO title now uses the same naming, removing the “Scraper” wording that had drifted from the README).
[1.3] - 2026-04-04
Changed
Input schema (default + prefill): Conservative defaults (maxResults 25, delayBetweenPages 3, maxConcurrency 2, delayBetweenRequests 2) so Console, API, and platform health runs finish quickly and avoid HTTP 429. Explicit prefill on searchUrls and sirens for the UI; default is what integrations and API calls use when a field is omitted.
input.json: Matches schema defaults for local testing.
README: Parameter table and notes aligned with Apify prefill vs default behaviour.
[1.2] - 2026-04-04
Changed
Local input: one root input.json for all local tests; removed separate input-search*.json and input-enrich.json files. README documents switching mode for search vs enrich.
[1.1] - 2026-04-04
Added
SEO & marketplace metadata: seoTitle and seoDescription in actor.json for Apify Store search snippets.
Input schema: sectionCaption / sectionDescription for rate-limit and performance fields; clearer grouping in the Console.
Run status (cloud): Actor.setStatusMessage updates during search and enrich runs when executing on Apify.
Changed
User-facing copy (English): All run logs, warnings, errors, and default dataset error text are now English-only for Store and support consistency.
README: Restructured for the Apify marketplace (quick start, parameters table, performance, FAQ, legal, support).
actor.json: Title and description tuned for keywords (SIREN, INSEE, lead generation) while staying accurate.
Categories: Order adjusted to LEAD_GENERATION, BUSINESS, DATA_EXTRACTION.
Fixed
Rate limiting: Shared HTTP retry layer with backoff, Retry-After support, and configurable delayBetweenPages / rateLimitMaxRetries documented in README and input schema.
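Retry-After support has to handle both forms allowed by HTTP (RFC 9110): a number of seconds or an HTTP-date. A sketch of a hypothetical helper that honours either form and otherwise falls back to exponential backoff:

```javascript
// Hypothetical helper: Retry-After is delta-seconds ("120") or an HTTP-date.
function retryDelayMs(retryAfter, attempt, baseMs = 1000, nowMs = Date.now()) {
  if (retryAfter != null && retryAfter !== "") {
    const trimmed = String(retryAfter).trim();
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // seconds
    const dateMs = Date.parse(trimmed); // HTTP-date
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  // Missing or unparseable header: plain exponential backoff.
  return baseMs * 2 ** attempt;
}
```

Usage with fetch: `retryDelayMs(res.headers.get("retry-after"), attempt)`, since `Headers.get()` returns `null` when the header is absent.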
[1.0] - 2026-02-25
Added
Search URL mode: Paste URLs from recherche-entreprises.data.gouv.fr or Pappers; filters are parsed automatically (urlParser.js).
SIREN enrichment mode: Parallel batches with delays and ETA logging.
HTTP resilience: Retries on HTTP 429 / 503 for both search and enrich.
Transform pipeline: transform.js maps API payloads to flat dataset rows (directors, NAF labels, TVA, financials).
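The changelog does not say how transform.js obtains the TVA field. If it is derived locally, the standard French intracommunity VAT number is "FR" + a two-digit key + the SIREN, with key = (12 + 3 × (SIREN mod 97)) mod 97. A sketch with a hypothetical helper name:

```javascript
// Hypothetical helper: French VAT number from a 9-digit SIREN.
function tvaFromSiren(siren) {
  const digits = String(siren).trim();
  if (!/^\d{9}$/.test(digits)) throw new Error(`Invalid SIREN: ${siren}`);
  // 9-digit numbers are well within Number's safe integer range.
  const key = (12 + 3 * (Number(digits) % 97)) % 97;
  return `FR${String(key).padStart(2, "0")}${digits}`;
}
```

For example, `tvaFromSiren("404833048")` gives `"FR83404833048"` (404833048 mod 97 = 56, and (12 + 168) mod 97 = 83).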
CSV: Optional output.csv written during the run (local).
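Writing flat rows to CSV mostly comes down to quoting correctly, since company names and addresses often contain commas or quotes. A minimal RFC 4180-style sketch (hypothetical helpers; columns are taken from the first row):

```javascript
// Hypothetical sketch of RFC 4180-style CSV serialization for flat rows.
function csvField(value) {
  const s = value == null ? "" : String(value);
  // Quote fields containing commas, quotes or line breaks; double inner quotes.
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows) {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]); // columns from the first row
  const lines = [headers.map(csvField).join(",")];
  for (const row of rows) lines.push(headers.map((h) => csvField(row[h])).join(","));
  return lines.join("\r\n");
}
```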
Unit tests: tests/ for URL parsing, HTTP helpers, SIREN normalization, and mocked search pagination.
Changed
Project layout aligned with Apify conventions; .actor/ schemas and actor.json wired for Console and Store.
[0.2.1] - 2026-02-25
Added
searchUrl: URL-based input with Pappers mapping (ville → commune, en_activite → active status).