- Clean standby-mode shutdown: run_standby_mode() now registers its own abort handler that shuts down the standby HTTP server before calling Actor.exit(). This ensures the serve_forever() executor future exits cleanly instead of being cancelled during actor shutdown, avoiding a Python 3.13 recursion bug in Task.cancel() that could crash the actor when cancelling deeply nested task chains. A CancelledError fallback also shuts down the server if the task is cancelled externally. The shared abort handler was removed from main(); standby mode now manages HTTP server shutdown independently, while regular mode keeps the simpler exit handler.
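The shutdown order described above can be illustrated with the standard library alone. This is a minimal sketch, not the actor's code: the asyncio.Event named stop is a hypothetical stand-in for the Apify abort handler, and PingHandler is a placeholder for the real search endpoint. The point it shows is that serve_forever() is stopped with server.shutdown(), so its executor future finishes normally rather than being cancelled, with a CancelledError fallback for external cancellation.

```python
import asyncio
import http.server


class PingHandler(http.server.BaseHTTPRequestHandler):
    """Placeholder handler; the real actor serves the search API instead."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch


async def run_standby_sketch(server: http.server.HTTPServer, stop: asyncio.Event) -> None:
    """Serve until `stop` is set, then end serve_forever() via shutdown(), not cancellation."""
    loop = asyncio.get_running_loop()
    # serve_forever() blocks, so it runs in the default thread-pool executor.
    serve_future = loop.run_in_executor(None, server.serve_forever)
    try:
        # Hypothetical stand-in for the abort handler: wait for a shutdown request.
        await stop.wait()
    except asyncio.CancelledError:
        # Fallback for external cancellation: still stop the server thread.
        # shutdown() blocks briefly (up to serve_forever's poll interval).
        server.shutdown()
        server.server_close()
        raise
    # Normal path: run the blocking shutdown() off the event loop, then let the
    # executor future complete on its own instead of cancelling it.
    await loop.run_in_executor(None, server.shutdown)
    await serve_future
    server.server_close()
```

In the actor itself, setting the event would correspond to the abort handler firing, after which Actor.exit() can run without any pending executor future left to cancel.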
- OpenAPI schema for Standby HTTP API: Added .actor/web_server_schema.json (OpenAPI 3.0.0) describing the actor's standby endpoints. Covers the POST / search request with all parameters, enums, defaults, and three request examples (by name, register number, keyword+location); the full 200 response schema including all output fields; and distinct error examples for each status code (400 validation errors, 402 quota limit, 404 no results, 500 scraping failures). Referenced via webServerSchema in .actor/actor.json so Apify Console surfaces it automatically.
- Proxy rotation on retries: Each retry attempt now requests a unique Apify proxy session (session_id parameter) to guarantee a different exit IP instead of relying on pool default behaviour.
- Run timeout awareness: The retry loop reads the Apify run deadline (Actor.configuration.timeout_at) once at startup and skips any retry whose wait + request budget would exceed the remaining time, failing early with a clear log message instead of being killed silently by the platform. If the configured timeout is under the recommended 300 s, a visible INFO-level warning is emitted at startup.
- Retry count reduced from 5 to 4: Fits the retry budget more comfortably within typical timeout values.
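The deadline check described in the two entries above boils down to simple arithmetic. The sketch below assumes timeout_at is a timezone-aware datetime (or None when no timeout is set); the helper names remaining_seconds, can_afford_retry and timeout_is_short are hypothetical, not the actor's actual functions.

```python
from datetime import datetime, timezone
from typing import Optional

RECOMMENDED_TIMEOUT_S = 300  # threshold quoted in the changelog


def remaining_seconds(timeout_at: Optional[datetime], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds left before the run deadline, or None when no deadline is known."""
    if timeout_at is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return (timeout_at - now).total_seconds()


def can_afford_retry(remaining: Optional[float], wait_s: float, request_budget_s: float) -> bool:
    """True if waiting wait_s and then making one more request still fits before the deadline."""
    if remaining is None:
        return True  # no deadline: the retry is never skipped for time reasons
    return wait_s + request_budget_s <= remaining


def timeout_is_short(total_timeout_s: Optional[float]) -> bool:
    """True when a configured run timeout is below the recommended 300 s."""
    return total_timeout_s is not None and total_timeout_s < RECOMMENDED_TIMEOUT_S
```

With 60 s left, a 20 s wait plus a 30 s request budget still fits, but a 35 s wait does not, so that retry would be skipped and the run would fail early with a log message.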
- Incompatible proxy group warning: A ProxyError caused by using the GOOGLE_SERP proxy group (which only routes to Google domains) now emits an explicit warning advising the user to switch to RESIDENTIAL or DATACENTER.
- Clean actor exit on scraping failure: Unhandled network errors and scraping failures in regular mode now call Actor.fail() with a readable status message instead of crashing with a raw exception traceback and exit code 91.
- SI link detection after website change: The Handelsregister website changed the SI (Structured register content) download link from using property:'Global.Dokumentart.SI' in the onclick attribute to a PrimeFaces.monitorDownload format (same as AD/CD/HD links). The parser now identifies the SI link by its visible span text "SI" instead of the onclick pattern, with the old pattern kept as a fallback.
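The text-first, onclick-fallback lookup can be sketched with the standard library's html.parser. This is a simplified illustration that assumes each download link is an &lt;a&gt; element whose visible text comes from its inner span; the class _LinkCollector, the function find_si_link and the sample markup in the usage are hypothetical, not the actor's parser.

```python
from html.parser import HTMLParser
from typing import Optional


class _LinkCollector(HTMLParser):
    """Collects every <a> element as (attributes, visible text)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[dict, str]] = []
        self._attrs: Optional[dict] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._attrs = {key: (value or "") for key, value in attrs}
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)  # includes text of nested <span> elements

    def handle_endtag(self, tag):
        if tag == "a" and self._attrs is not None:
            self.links.append((self._attrs, "".join(self._text).strip()))
            self._attrs = None


def find_si_link(html: str) -> Optional[dict]:
    """Return the attributes of the SI download link, or None if absent."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    # Current layout: all links use PrimeFaces.monitorDownload, so match on visible text.
    for attrs, text in parser.links:
        if text == "SI":
            return attrs
    # Fallback: old layout carried the document type in the onclick handler.
    for attrs, _ in parser.links:
        if "Global.Dokumentart.SI" in attrs.get("onclick", ""):
            return attrs
    return None
```

Matching on visible text keeps working as long as the site keeps the "SI" label, while the onclick fallback still covers pages served in the old format.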
- Standby mode dataset limit check: Fixed an AttributeError in _check_dataset_limit() by using the correct Apify SDK method get_metadata() instead of the non-existent get_info(). This resolves the 500 Internal Server Error returned when checking dataset quota limits in standby mode.
- Proxy connection reset handling: Added ProxyError to the retry exception handling in the search loop. Transient proxy connection resets (595 ECONNRESET) now trigger automatic retries with exponential backoff instead of immediately crashing the actor.
- Improved retry logic for proxy-based requests: Enhanced the search retry mechanism to request a fresh proxy URL (new residential IP) on each retry attempt instead of reusing the same blocked IP. Increased retry attempts from 3 to 5 and implemented exponential backoff with jitter (4-35 seconds) to better handle server disconnections and rate limiting.
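The retry behaviour described in the proxy entries above (a fresh proxy session per attempt, retries on proxy errors, exponential backoff with jitter between 4 and 35 seconds, 4 attempts in the current version) can be sketched as follows. Everything here is an assumption for illustration: TransientProxyError stands in for whatever ProxyError class the HTTP client raises, fetch and get_proxy_url are injected callables (in the actor the latter would presumably wrap the Apify SDK's ProxyConfiguration.new_url(session_id=...)), and the exact backoff formula is not documented in the changelog.

```python
import asyncio
import random
import uuid
from typing import Awaitable, Callable, Optional

MAX_ATTEMPTS = 4    # current retry count per the changelog
MIN_DELAY_S = 4.0   # lower bound of the backoff window
MAX_DELAY_S = 35.0  # upper bound of the backoff window


class TransientProxyError(Exception):
    """Hypothetical stand-in for the HTTP client's ProxyError / disconnect errors."""


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Random delay between the 4 s floor and an exponentially growing ceiling capped at 35 s."""
    ceiling = min(MAX_DELAY_S, MIN_DELAY_S * 2 ** attempt)
    return rng.uniform(MIN_DELAY_S, ceiling)


def new_session_id(attempt: int) -> str:
    """Unique per attempt so the proxy pool assigns a different exit IP."""
    return f"hr_{attempt}_{uuid.uuid4().hex[:12]}"


async def search_with_retries(
    fetch: Callable[[str], Awaitable[str]],
    get_proxy_url: Callable[[str], Awaitable[str]],
    *,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    if rng is None:
        rng = random.Random()
    last_error: Optional[Exception] = None
    for attempt in range(MAX_ATTEMPTS):
        # Request a new proxy URL (new session, new IP) for every attempt.
        proxy_url = await get_proxy_url(new_session_id(attempt))
        try:
            return await fetch(proxy_url)
        except TransientProxyError as exc:
            last_error = exc
            if attempt + 1 < MAX_ATTEMPTS:
                await sleep(backoff_delay(attempt, rng))
    raise RuntimeError(f"search failed after {MAX_ATTEMPTS} attempts") from last_error
```

Injecting fetch, get_proxy_url and sleep keeps the retry policy testable without network access or real waiting.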
- Proxy configuration support: Added support for Apify proxy configuration to hide request origins and improve reliability for high-frequency scraping. Configurable via the proxyConfiguration input parameter, with defaults set to Residential proxies from Germany.
- Comprehensive legal form support: Enhanced parser to extract decision-makers from all major German legal forms:
  - e.K. (Einzelkaufmann): Inhaber (owners)
  - AG (Aktiengesellschaft): Vorstand (board members), Vorstandsvorsitzender (chairmen)
  - SE (Europäische Gesellschaft): Geschäftsführender Direktor (managing directors)
  - GmbH/OHG: Gesellschafter (shareholders/partners)
  - Plus existing support for GmbH Geschäftsführer and KG Komplementäre/Kommanditisten
- Unified decision-makers field: All persons with representation authority are now consolidated under vertretungsberechtigte with German role names for simplified lead generation.
- Role name lookup: Integration with "GDS.Rollenbezeichnung" codelist for accurate German role designations.
- Simplified output structure: Consolidated all decision-makers under a single vertretungsberechtigte field instead of separate role-specific fields (geschaeftsfuehrer, inhaber, vorstand, etc.).
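The consolidation into a single vertretungsberechtigte list can be sketched as a role-code lookup followed by a filter. This is a minimal sketch under stated assumptions: the role codes, the person records with name/role_code keys, and the output key rolle are hypothetical; only the German role names come from the legal-form list above, and the real lookup uses the GDS.Rollenbezeichnung codelist, whose codes are not reproduced here.

```python
from typing import Iterable

# Role designations collected under vertretungsberechtigte
# (taken from the legal forms listed in this changelog).
REPRESENTATIVE_ROLES = {
    "Geschäftsführer",
    "Inhaber",
    "Vorstand",
    "Vorstandsvorsitzender",
    "Geschäftsführender Direktor",
    "Gesellschafter",
    "Komplementär",
    "Kommanditist",
}


def consolidate_decision_makers(persons: Iterable[dict], role_names: dict[str, str]) -> dict:
    """Map each person's role code to a German role name and keep only representative roles."""
    result = []
    for person in persons:
        # Fall back to the raw code if the (hypothetical) codelist has no entry for it.
        role = role_names.get(person["role_code"], person["role_code"])
        if role in REPRESENTATIVE_ROLES:
            result.append({"name": person["name"], "rolle": role})
    return {"vertretungsberechtigte": result}
```

A single list keyed by German role names means consumers no longer need to know which role-specific field a given legal form populates.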
- Numeric register numbers: laufende_nummer now contains only the numeric portion (e.g., "8438" instead of "HRA 8438 P") for better data processing.
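Extracting the numeric portion can be done with a single regular expression. This sketch assumes the register number is the first run of digits in the raw value, so the register type prefix ("HRA") and any trailing court suffix ("P") are dropped; numeric_register_number is a hypothetical helper name, and the result stays a string as in the example above.

```python
import re
from typing import Optional

_DIGITS = re.compile(r"\d+")


def numeric_register_number(raw: str) -> Optional[str]:
    """Return the first run of digits, e.g. "HRA 8438 P" -> "8438"; None if there is none."""
    match = _DIGITS.search(raw)
    return match.group(0) if match else None
```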
- Register information extraction: Enhanced parser to check multiple XML sources (registereintragung, aktenzeichen) with intelligent fallback logic.
- e.K. company support: Fixed missing register information for sole proprietorships and other HRA entities.
- Data quality: More accurate extraction of company decision-makers across all legal forms.
- Input validation for "mindestens ein Schlagwort enthalten." mode: Added validation to prevent submission errors when using the "at least one keyword" search option without the required additional filters. The scraper now validates that when using schlagwoerter_suchoptionen = "mindestens ein Schlagwort enthalten.", at least one of registerart, registergericht, or registernummer is provided, as required by the Handelsregister website.
- Improved error messages: Users now receive a clear 400 Bad Request error explaining the missing required fields instead of encountering a cryptic German error message from the Handelsregister website.
- Updated documentation: Added warnings in both the input schema description and README to inform users about the additional filter requirement for the "mindestens ein Schlagwort enthalten." search mode.
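The validation rule above can be sketched as a pure function that returns a status code and message for the 400 response. The function name validate_keyword_mode and the message wording are hypothetical; the parameter names and the option value are the ones given in the changelog, and empty or missing filters are assumed to count as "not provided".

```python
from typing import Any, Mapping, Optional, Tuple

AT_LEAST_ONE_KEYWORD = "mindestens ein Schlagwort enthalten."
REQUIRED_FILTERS = ("registerart", "registergericht", "registernummer")


def validate_keyword_mode(params: Mapping[str, Any]) -> Optional[Tuple[int, str]]:
    """Return (400, message) if the 'at least one keyword' mode lacks a filter, else None."""
    if params.get("schlagwoerter_suchoptionen") != AT_LEAST_ONE_KEYWORD:
        return None
    # Empty strings and None count as "not provided".
    if any(params.get(name) for name in REQUIRED_FILTERS):
        return None
    return (
        400,
        f'schlagwoerter_suchoptionen = "{AT_LEAST_ONE_KEYWORD}" requires at least one of '
        + ", ".join(REQUIRED_FILTERS)
        + ".",
    )
```

Checking this before the form is submitted lets the API answer with a clear 400 instead of forwarding the Handelsregister website's German error text.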
The Handelsregister Scraper actor has been released as a stable version following a successful pre-release testing period. This actor provides reliable, real-time access to German Commercial Register (Handelsregister) data through an HTTP API running in standby mode.
- Real-time API with standby mode: Sub-second response times with always-on availability and automatic scaling.
- Flexible search capabilities: Search by company name/keywords or direct register number lookup with configurable search strategies.
- Phonetic search support: Optional fuzzy matching for similar-sounding terms (e.g., "Meyer" matches "Meier", "Mayer").
- Comprehensive data extraction: Complete company information including legal form, address, business purpose, capital structure, managing directors, authorized officers, partners, and limited partners with liability contributions.
- Raw XML access: Optional storage of original XJustiz XML files in the Apify Key-Value Store for advanced processing.
- Automatic data storage: All results automatically saved to Apify Dataset for easy retrieval and analysis.
- Intelligent error handling: Automatic court name validation with fuzzy matching and helpful suggestions.
- Professional HTTP status codes: Proper error responses (400, 401, 402, 404, 500) for integration reliability.
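The court name validation with suggestions mentioned above could look roughly like the sketch below. The matching algorithm is not documented here, so this uses the standard library's difflib.get_close_matches as an assumed stand-in; COURTS is a small illustrative subset rather than the full list of register courts, and validate_court is a hypothetical helper.

```python
import difflib
from typing import List, Optional, Sequence, Tuple

# Illustrative subset only; the real list of register courts is much longer.
COURTS = ["Berlin (Charlottenburg)", "München", "Hamburg", "Köln", "Frankfurt am Main"]


def validate_court(name: str, courts: Sequence[str] = COURTS) -> Tuple[Optional[str], List[str]]:
    """Return (canonical_name, []) on a case-insensitive match, else (None, suggestions)."""
    lookup = {court.casefold(): court for court in courts}
    key = name.strip().casefold()
    if key in lookup:
        return lookup[key], []
    # Up to three similar names, each with a similarity ratio of at least 0.6.
    close = difflib.get_close_matches(key, list(lookup), n=3, cutoff=0.6)
    return None, [lookup[match] for match in close]
```

An unknown court would then map to a 400 response whose message lists the suggestions, for example "München" for the input "Munchen".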