German Imprint Scraper with Decision Makers Names Extraction

Pricing

from $2.20 / 1,000 results

An Actor that automatically locates the imprint page (Impressum) of a German website and scrapes key contact details from it. It extracts the company name, address, phone numbers, emails, and decision-makers (Entscheider, Entscheidungsträger).

Rating

4.1

(2)

Developer

Dominic M. Quaiser

Maintained by Community

Actor stats

Total users

325

Issues response

7.2 days

Last modified

9 days ago

[v0.8.0—beta] — 2025-12-24

Added

VAT ID extraction: New extract_vat_id module with ISO/IEC 7064 MOD 11,10 checksum validation. Extracts German VAT IDs (Umsatzsteuer-Identifikationsnummer) from imprint pages with context-aware scoring and support for various formatting styles (e.g., DE 123 456 789, DE-123-456-789).
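To make the checksum step concrete, here is a minimal sketch of an ISO/IEC 7064 MOD 11,10 check for a German VAT ID. The function names `normalize_vat_id` and `is_valid_german_vat_id` are hypothetical and are not taken from the Actor's extract_vat_id module; the context-aware scoring mentioned above is not covered.

```python
import re
from typing import Optional


def normalize_vat_id(raw: str) -> Optional[str]:
    """Strip spaces, hyphens, and dots; return 'DE' + 9 digits or None."""
    compact = re.sub(r"[\s\-.]", "", raw).upper()
    return compact if re.fullmatch(r"DE\d{9}", compact) else None


def is_valid_german_vat_id(raw: str) -> bool:
    """Validate the ninth digit with the ISO/IEC 7064 MOD 11,10 scheme."""
    vat_id = normalize_vat_id(raw)
    if vat_id is None:
        return False
    digits = [int(ch) for ch in vat_id[2:]]
    product = 10
    for digit in digits[:8]:
        total = (digit + product) % 10
        if total == 0:
            total = 10
        product = (2 * total) % 11
    # product is always in 1..10, so a computed check value of 10 maps to 0.
    check_digit = (11 - product) % 10
    return check_digit == digits[8]


if __name__ == "__main__":
    for candidate in ("DE 136 695 976", "DE-136-695-976", "DE 123 456 789"):
        print(candidate, is_valid_german_vat_id(candidate))
```

A checksum pass only shows that the number is well-formed. It does not show that the VAT ID is actually registered.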

[v0.7.0—beta] — 2025-12-22

Added

Fax number extraction: New extract_fax_numbers module with intelligent pattern matching, context validation, and priority scoring. Extracts up to 10 ranked fax numbers from German imprint pages with support for international formats and address proximity detection.

[v0.6.1—beta] — 2025-11-21

Added

Discount Tier Pricing System: Implemented comprehensive pricing manager with support for four discount tiers (FREE, BRONZE, SILVER, GOLD).
- Automatically detects user's discount tier from APIFY_ACTOR_PRICING_TIER environment variable.
- Dynamic pricing adjustments based on tier with detailed debug logging.
- New PricingManager class in src/utilities/pricing_manager.py.
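The tier detection described above could look roughly like the following sketch. Only the environment variable name and the four tier names come from this changelog; the `DiscountTier` enum, the `detect_tier` function, and the fallback to FREE for missing or unknown values are assumptions, not the actual PricingManager code.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class DiscountTier(Enum):
    FREE = "FREE"
    BRONZE = "BRONZE"
    SILVER = "SILVER"
    GOLD = "GOLD"


def detect_tier(env: Optional[Mapping[str, str]] = None) -> DiscountTier:
    """Read the tier from APIFY_ACTOR_PRICING_TIER.

    Assumption: a missing or unrecognized value falls back to FREE.
    """
    source = os.environ if env is None else env
    raw = source.get("APIFY_ACTOR_PRICING_TIER", "").strip().upper()
    try:
        return DiscountTier(raw)
    except ValueError:
        return DiscountTier.FREE


if __name__ == "__main__":
    print(detect_tier({"APIFY_ACTOR_PRICING_TIER": "silver"}))
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.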

Changed

Pricing Integration: Replaced charging_manager with pricing_manager throughout the codebase for discount tier support.

Removed

Automatic Billing Events: Removed manual charging for events now handled automatically by Apify:
- actor-start: Now charged automatically by Apify platform.
- successful-result: Now charged automatically by Apify platform.
Removed unused charge_event import from result_handler.py.

[v0.6.0—beta] — 2025-11-15

Changed

Headless Browser Configuration: Replaced the binary usePlaywright toggle with a three-mode headlessBrowser option offering granular control over fetching strategy:
- headlessBrowserOn: Always use browser (most reliable for JavaScript-heavy sites)
- headlessBrowserAuto: Automatic mode with HTTP first, browser fallback (default, recommended)
- headlessBrowserOff: HTTP only, no browser (fastest, but may fail on dynamic sites)
Backward Compatibility: The deprecated usePlaywright=true setting now acts as an override, forcing headlessBrowserOn mode when explicitly set.
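The interaction between the new option and the deprecated toggle can be sketched as follows. The three mode values and the usePlaywright override come from the notes above; the `resolve_mode` helper, the assumption that the input key is named `headlessBrowser`, and the fallback to automatic mode for unknown values are illustrative only.

```python
from enum import Enum
from typing import Any, Mapping


class HeadlessBrowserMode(Enum):
    ON = "headlessBrowserOn"
    AUTO = "headlessBrowserAuto"
    OFF = "headlessBrowserOff"


def resolve_mode(actor_input: Mapping[str, Any]) -> HeadlessBrowserMode:
    """Pick the fetching strategy from the Actor input."""
    # The deprecated toggle only acts as an override when explicitly true.
    if actor_input.get("usePlaywright") is True:
        return HeadlessBrowserMode.ON
    raw = actor_input.get("headlessBrowser", HeadlessBrowserMode.AUTO.value)
    try:
        return HeadlessBrowserMode(raw)
    except ValueError:
        # Assumption: unknown values fall back to the documented default.
        return HeadlessBrowserMode.AUTO


if __name__ == "__main__":
    print(resolve_mode({"headlessBrowser": "headlessBrowserOff"}))
    print(resolve_mode({"usePlaywright": True, "headlessBrowser": "headlessBrowserOff"}))
```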

Removed

Optional Error Output: Removed the optional error output, which pushed URLs that failed extraction into the dataset together with their error message.

[v0.5.5—beta] — 2025-11-08

Changed

NER API Integration Update: Migrated to new model-specific endpoint /extract-names/german for improved accuracy.
Base URL Configuration: The NER_API_URL environment variable now expects the base URL only (e.g., https://ner-api.domain.net); the endpoint path is appended automatically.
API Response Format: Updated to support new response structure with persons and raw_entities fields.

Added

Created .env.example template file for environment configuration.
Added ENV_SETUP.md with comprehensive documentation for NER API setup, URL construction, and troubleshooting.

Fixed

Improved confidence score extraction from raw_entities field with fallback to default value (0.8) when scores are missing.
Enhanced URL handling to automatically strip trailing slashes from base URLs.
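The URL construction and the confidence fallback from this release can be sketched as below. The endpoint path, the 0.8 default, and the field names persons and raw_entities come from the notes above. The shape of the response is an assumption: here persons is treated as a list of names and raw_entities as a list of objects with hypothetical `text` and `score` keys, which may differ from the real API.

```python
from typing import Any, Dict, List, Mapping, Tuple

ENDPOINT_PATH = "/extract-names/german"
DEFAULT_CONFIDENCE = 0.8


def build_endpoint_url(base_url: str) -> str:
    """Strip trailing slashes from the base URL and append the endpoint path."""
    return base_url.rstrip("/") + ENDPOINT_PATH


def persons_with_confidence(response: Mapping[str, Any]) -> List[Tuple[str, float]]:
    """Pair each person with a score, using the default when none is present."""
    scores: Dict[str, float] = {}
    for entity in response.get("raw_entities", []):
        # 'text' and 'score' are assumed key names, not confirmed by the source.
        score = entity.get("score")
        if "text" in entity and isinstance(score, (int, float)):
            scores[entity["text"]] = float(score)
    return [
        (name, scores.get(name, DEFAULT_CONFIDENCE))
        for name in response.get("persons", [])
    ]


if __name__ == "__main__":
    print(build_endpoint_url("https://ner-api.domain.net/"))
    sample = {
        "persons": ["Anna Schmidt", "Max Müller"],
        "raw_entities": [{"text": "Anna Schmidt", "score": 0.97}],
    }
    print(persons_with_confidence(sample))
```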

[v0.5.4—beta] — 2025-09-12

Added

Added cost limit checking to automatically stop processing when the user-configured 'Maximum cost per run' is reached.

[v0.5.3—beta] — 2025-09-06

Changed

Asynchronous Handling: The main extract method now creates and runs all extraction tasks concurrently using asyncio.gather.
Small improvements to company name extraction.
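As an illustration of the concurrent pattern, the sketch below runs several extraction coroutines with asyncio.gather and collects the results by field name. The extractor functions, their regular expressions, and the `extract` signature are hypothetical stand-ins, not the Actor's real extraction code.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, List

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+49[\d\s/-]{5,}\d")


async def extract_emails(text: str) -> List[str]:
    return EMAIL_RE.findall(text)


async def extract_phones(text: str) -> List[str]:
    return PHONE_RE.findall(text)


# Hypothetical registry mapping output field names to extractor coroutines.
EXTRACTORS: Dict[str, Callable[[str], Awaitable[List[str]]]] = {
    "emails": extract_emails,
    "phones": extract_phones,
}


async def extract(text: str, fields: List[str]) -> Dict[str, List[str]]:
    """Create one task per requested field and await them all together."""
    selected = [name for name in fields if name in EXTRACTORS]
    results = await asyncio.gather(*(EXTRACTORS[name](text) for name in selected))
    # gather returns results in the order the awaitables were passed in.
    return dict(zip(selected, results))


if __name__ == "__main__":
    page = "Mail: info@example.de, Tel: +49 30 123456"
    print(asyncio.run(extract(page, ["emails", "phones"])))
```

These toy extractors are CPU-bound, so running them concurrently saves nothing here. The pattern pays off when individual extractors await I/O, such as a call to an external NER service.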

[v0.5.2—beta] — 2025-09-01

Changed

The phone number and email output is now limited to 10 results.

[v0.5.1—beta] — 2025-08-30

Changed

Added additional keywords for decision maker extraction.

[v0.5.0—beta] — 2025-08-28

Added

Migration support: when Apify migrates a run to another server, the Actor now persists its state using Actor.set_value() and Actor.get_value().
The time at which scraping of a website finished (scraped_at) is now included in the metadata output.

Changed

Slightly improved the decision maker extraction for better accuracy.
Moved imprint_url output from the metaData to the standard output.

Fixed

Bug in company name extraction that occasionally returned incorrect values.

[v0.4.0—beta] — 2025-08-27

This is a major update, marking the transition from alpha to the first beta release! The actor has been completely rewritten from the ground up to be more powerful, reliable, and flexible.

Added

Dual Fetching Technology: The actor can now use a fast HTTP-based method for simple sites and automatically fall back to a powerful headless browser (Playwright) for modern, JavaScript-heavy websites. This dramatically increases the success rate of finding and scraping imprint pages.
Selective Data Extraction: You now have full control over what data you want. A new input field fieldsToExtract allows you to choose the exact information you need (e.g., only company name and email).
Enhanced Configuration: New input options like metaData and errorOutput have been added to give you more insight and control over the scraping process.
Proxy Support: A proxy server provided by Apify can now be set in the input configuration.

Changed

Reliability Overhaul: The entire codebase has been refactored. This results in better stability and significantly more accurate data extraction.
Smarter Scraping Logic: The algorithms for identifying and parsing data have been completely reworked, leading to higher quality results across a wider variety of websites.
ML-Powered Decision Maker Extraction: The logic for identifying decision-makers has been upgraded from simple keyword matching to a sophisticated NER (Named Entity Recognition) model, resulting in much higher accuracy.
Redesigned Input: The actor's input configuration has been updated to be more intuitive and powerful, replacing the previous simple toggles with more granular controls.
Improved Output Structure: The output JSON is now more cleanly structured and provides additional context, such as confidence scores for certain data points.

[v0.3.0—alpha] — 2025-07-17

Added:

Handelsregister number and court extraction from imprint pages.
Graceful shutdown handling with signal handlers (SIGINT, SIGTERM).
Health check system for monitoring actor responsiveness.
Semaphore-based concurrency control to limit simultaneous requests.
Enhanced HTTP client timeout configuration.
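Semaphore-based concurrency control can be sketched as follows: every URL gets its own task, but only a fixed number may run the worker at once. The names `process_all`, `worker`, and `max_concurrency`, as well as the default of 5, are hypothetical; the `demo` function uses a fake worker in place of real HTTP requests.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def process_all(
    urls: List[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Run worker(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> str:
        async with semaphore:  # waits here while the limit is reached
            return await worker(url)

    return await asyncio.gather(*(guarded(url) for url in urls))


async def demo() -> Tuple[List[str], int]:
    """Process ten fake URLs with a limit of three and record the peak load."""
    active = 0
    peak = 0

    async def fake_fetch(url: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for a network request
        active -= 1
        return url.upper()

    results = await process_all([f"u{i}" for i in range(10)], fake_fetch, 3)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```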

Fixed:

Critical bug where actor would hang indefinitely when URL processing timeout was reached.

Changed:

Enhanced logging for better debugging and monitoring.

[v0.2.3—alpha] — 2025-06-24

Added:

Timeout to automatically skip URLs that take too long to process.
URL validation to filter out malformed URLs.
Error logs for URLs that could not be processed can now be included in the output.
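A minimal sketch of the first two items, assuming standard-library tools only: a URL check based on urllib.parse and a per-URL timeout based on asyncio.wait_for. The function names and the exact validation rules (http or https scheme, non-empty host, no whitespace) are assumptions and may be stricter or looser than what the Actor applies.

```python
import asyncio
from typing import Awaitable, Callable, Optional
from urllib.parse import urlsplit


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs with a host and no embedded whitespace."""
    candidate = url.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 brackets
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


async def process_with_timeout(
    url: str,
    worker: Callable[[str], Awaitable[str]],
    timeout_s: float,
) -> Optional[str]:
    """Return the worker result, or None when the URL takes too long."""
    try:
        return await asyncio.wait_for(worker(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # the caller can log the URL as skipped


async def slow_worker(url: str) -> str:
    await asyncio.sleep(1.0)
    return url


async def fast_worker(url: str) -> str:
    return url


if __name__ == "__main__":
    print(is_valid_url("https://example.de/impressum"))
    print(asyncio.run(process_with_timeout("https://example.de", slow_worker, 0.01)))
```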

[v0.2.2—alpha] — 2025-05-02

Changed:

Extracted the Python dictionary used to look up German postal codes and cities.
Emails are now sorted by an algorithm that determines their relevance.

[v0.2.1—alpha] — 2025-05-02

Changed:

Improvements to the extraction of addresses and emails.

Fixed:

During email extraction, the script did not properly filter Unicode-encoded characters.

[v0.2.0—alpha] — 2025-04-24

Added:

Search for social media links.

Changed:

Improved performance of the decision maker extraction.

[v0.1.1—alpha] — 2025-04-17

Changed:

Default settings: Decision Makers Search is now enabled (true) in the default input settings.

Removed:

The max_dept input option was removed, since end users do not need to change it for the Actor to function.

Fixed:

Decision maker search functionality is now working properly.

[v0.1.0—alpha] — 2025-04-14

Added:

Initial release of the German Imprint Scraper.
Extracts company name, address, phone, and email from imprint pages.
Optional extraction of Decision Makers.
