Changelog
[v0.3.0-alpha] - 2025-07-17
Added:
Handelsregister number and court extraction from imprint pages.
Graceful shutdown handling with signal handlers (SIGINT, SIGTERM).
Health check system for monitoring actor responsiveness.
Semaphore-based concurrency control to limit simultaneous requests.
Enhanced HTTP client timeout configuration.
Fixed:
Critical bug where the actor would hang indefinitely when the URL processing timeout was reached.
Changed:
Enhanced logging for better debugging and monitoring.
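
The concurrency limit and per-URL timeout mentioned in this release can be illustrated with a short asyncio sketch. This is not the actor's actual code: `process_url`, `process_with_limits`, `process_all`, `MAX_CONCURRENCY`, and `URL_TIMEOUT_SECONDS` are hypothetical names, the values are arbitrary, and page processing is simulated with `asyncio.sleep`.

```python
import asyncio

MAX_CONCURRENCY = 5          # hypothetical limit on simultaneous requests
URL_TIMEOUT_SECONDS = 0.05   # hypothetical per-URL processing timeout


async def process_url(url: str, delay: float) -> str:
    """Stand-in for real page processing; it only sleeps for `delay` seconds."""
    await asyncio.sleep(delay)
    return f"ok: {url}"


async def process_with_limits(url: str, delay: float, semaphore: asyncio.Semaphore) -> str:
    # The semaphore caps how many URLs are processed at the same time.
    async with semaphore:
        try:
            # wait_for cancels the inner coroutine on timeout, so a slow URL
            # is skipped instead of blocking the run.
            return await asyncio.wait_for(process_url(url, delay), URL_TIMEOUT_SECONDS)
        except asyncio.TimeoutError:
            return f"skipped (timeout): {url}"


async def process_all(jobs):
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    tasks = [process_with_limits(url, delay, semaphore) for url, delay in jobs]
    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*tasks)


def run_demo():
    jobs = [("https://example.com/fast", 0.0), ("https://example.com/slow", 1.0)]
    return asyncio.run(process_all(jobs))
```

In this sketch the timeout is handled inside the `async with` block, so the semaphore slot is released whether the URL succeeds or is skipped.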
[v0.2.3-alpha] - 2025-06-24
Added:
Timeout to automatically skip URLs that take too long to process.
URL validation to filter out malformed URLs.
Error logs for unsuccessfully processed URLs can now be included in the output.
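
As an illustration of what filtering out malformed URLs can look like, here is a minimal sketch based on the standard library's `urllib.parse`. The helper names `is_valid_url` and `filter_urls` are hypothetical, and the acceptance rule (an http or https scheme plus a host name) is an assumption, not necessarily the check the actor performs.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Return True if `url` has an http(s) scheme and a host name."""
    try:
        parsed = urlparse(url.strip())
        host = parsed.hostname
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unclosed IPv6 bracket.
        return False
    return parsed.scheme in ("http", "https") and bool(host)


def filter_urls(urls):
    """Keep only the URLs that pass the basic validity check."""
    return [u for u in urls if is_valid_url(u)]
```

For example, `filter_urls(["https://example.de/impressum", "example.de"])` keeps only the first entry, because the second has no scheme.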
[v0.2.2-alpha] - 2025-05-02
Changed:
Extracted Python directory for looking up German postal codes and cities.
Emails are now sorted by an algorithm that determines their relevance.
[v0.2.1-alpha] - 2025-05-02
Changed:
Improvements to the extraction of addresses and emails.
Fixed:
During email extraction, the script did not properly filter Unicode-encoded characters.
[v0.2.0-alpha] - 2025-04-24
Added:
Search for social media links.
Changed:
Improved performance of the decision maker extraction.
[v0.1.1-alpha] - 2025-04-17
Changed:
Default settings: Decision Makers Search is now activated (true) in the default input settings.
Removed:
Input option max_dept removed, since changes by the end user are not required for this actor's functionality.
Fixed:
Decision maker search functionality is now working.
[v0.1.0-alpha] - 2025-04-14
Added:
Initial release of the German Imprint Scraper.
Extracts Company Name, Address, Phone, and Email from imprint pages.
Optional extraction of Decision Makers.