German Imprint Contact Scraper

Pricing

Pay per usage

Developed by Dominic M. Quaiser

Maintained by Community

An Actor that automatically locates and scrapes key contact details from German website imprint pages (Impressum). It extracts information such as company name, address, phone numbers, emails, and decision-maker details.

Issues response: 8.5 hours

Last modified: 6 days ago

German Imprint Scraper

A Python-based Apify Actor designed to find and extract contact and legal information from German imprint pages ("Impressum"). Simply provide a list of website homepages, and the actor will automatically locate the imprint page and scrape key details like company name, address, phone number, email, and commercial register number.
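The actor's discovery logic is not documented here. As a rough illustration only, the sketch below scans a homepage's links for imprint-related keywords using Python's standard-library `html.parser`. The keyword list and the `find_imprint_url` helper are hypothetical, and real sites may require looking beyond the homepage.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; the actor's real matching rules are not documented.
IMPRINT_KEYWORDS = ("impressum", "imprint", "legal-notice", "legal notice")


class _LinkCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a href=...> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for anchors without href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_imprint_url(homepage_url: str, html: str) -> str | None:
    """Return the absolute URL of the first link that looks like an imprint page."""
    parser = _LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        haystack = f"{href} {text}".lower()  # match on the URL path or the link label
        if any(keyword in haystack for keyword in IMPRINT_KEYWORDS):
            return urljoin(homepage_url, href)
    return None
```

Matching on both the `href` and the visible label catches sites that link `/impressum` as well as sites that hide the imprint behind a generic path such as `/legal`.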

Beta Version Notice: This actor is currently in beta. While it's fully functional and returns results, you may encounter occasional quirks or incomplete features. I welcome your feedback! Please report any issues or suggestions you have.

💡 Features

  • Automatic Imprint Page Discovery: Intelligently crawls websites to find the correct imprint page from your starting URLs.
  • Selective Data Extraction: Choose exactly which data points you need, from basic contact info to advanced details like company decision-makers.
  • Dual Fetching Technology:
    • HTTP Mode: A fast, lightweight method for scraping simple, server-rendered websites.
    • Headless Browser Mode (Playwright): A powerful option for modern, JavaScript-heavy websites. The actor can be configured to use this mode for all sites or as an automatic fallback when the standard HTTP method fails, which improves success rates.
  • Proxy Support: Integrates seamlessly with Apify's proxy service to handle IP rotation and avoid blocking.
  • Customizable Output: Include optional metadata or error records for detailed analysis and troubleshooting.
  • Structured JSON Output: Delivers clean, well-structured data ready for use in your applications, databases, or CRM systems.
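The HTTP-first, browser-fallback behaviour can be sketched as follows. This is a hypothetical outline, not the actor's code: `http_fetch` and `browser_fetch` stand in for a plain HTTP client and a Playwright-rendered fetch, the `min_length` threshold for treating a response as a failure is an assumption, and the value `"playwright"` for `fetch_method` is assumed (the sample output only shows `"http"`).

```python
from dataclasses import dataclass
from typing import Callable

# A fetcher takes a URL and returns the page HTML, raising an exception on failure.
Fetcher = Callable[[str], str]


@dataclass
class FetchResult:
    html: str
    fetch_method: str  # mirrors metadata.fetch_method ("playwright" is an assumed value)
    fallback_attempted: bool  # mirrors metadata.fallback_attempted


def fetch_with_fallback(
    url: str,
    http_fetch: Fetcher,
    browser_fetch: Fetcher,
    use_playwright: bool = False,
    min_length: int = 500,  # hypothetical threshold for an "empty" server response
) -> FetchResult:
    """Fetch a page over plain HTTP, falling back to a headless browser if needed."""
    if use_playwright:
        # usePlaywright=true: skip HTTP and render every site in the browser.
        return FetchResult(browser_fetch(url), "playwright", False)
    try:
        html = http_fetch(url)
        if len(html) >= min_length:
            return FetchResult(html, "http", False)
        # A very short body often means the content is rendered by JavaScript.
    except Exception:
        # Network errors, blocks, timeouts: all are treated as an HTTP failure.
        pass
    return FetchResult(browser_fetch(url), "playwright", True)
```

Passing the fetchers in as functions keeps the decision logic separate from the network code, so it can be tested without a browser installed.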

📥 Input Parameters

Configure the actor's behavior using these fields in the Apify Console Input tab or via API:

| Field | Type | Description | Default | Required |
|---|---|---|---|---|
| startUrls | Array | Enter the homepage URLs of the websites to process. | [{ "url": "https://www.vita-cola.de/" }] | Yes |
| fieldsToExtract | Array | Choose the specific pieces of information you want to collect. | ["company_name", "business_address"] | No |
| usePlaywright | Boolean | Use a headless browser for all websites. Slower but more reliable for JavaScript-heavy sites. | false | No |
| metaData | Boolean | Include technical details in the output. | false | No |
| errorOutput | Boolean | Include a row in the output for each website that failed to process. | false | No |
| debugLog | Boolean | Generate a verbose log for troubleshooting. | false | No |
| proxyConfiguration | Object | Proxy settings. Apify Proxy is recommended. | { "useApifyProxy": true } | No |
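For API runs, the input is a JSON object with the fields above. The hypothetical `build_run_input` helper below assembles one and rejects unknown entries in `fieldsToExtract`. It assumes the accepted values are exactly the field names listed under "Extractable Data in Detail".

```python
# Values accepted by fieldsToExtract, assumed to match the field names in
# "Extractable Data in Detail".
ALLOWED_FIELDS = frozenset({
    "company_name", "business_address", "phone_number", "emails",
    "register_number", "social_media", "decision_makers",
})


def build_run_input(
    homepages,
    fields=("company_name", "business_address"),
    use_playwright=False,
    meta_data=False,
    error_output=False,
    debug_log=False,
    use_apify_proxy=True,
):
    """Assemble a run-input dict using the parameter names from the input table."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unknown fieldsToExtract values: {sorted(unknown)}")
    return {
        "startUrls": [{"url": url} for url in homepages],
        "fieldsToExtract": list(fields),
        "usePlaywright": use_playwright,
        "metaData": meta_data,
        "errorOutput": error_output,
        "debugLog": debug_log,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

You could then pass the dictionary to the official `apify-client` package, for example `ApifyClient(token).actor(ACTOR_ID).call(run_input=run_input)`, where `ACTOR_ID` is a placeholder for this actor's ID on the Apify Store.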

📤 Output Data Structure

The exact fields depend on your fieldsToExtract selection.

Example Output

{
  "start_url": "https://muster-firma.de/",
  "imprint_url": "https://muster-firma.de/impressum",
  "company_name": {
    "name": "Muster GmbH",
    "confidence": 1
  },
  "business_address": {
    "full_address": "Musterstraße 123, 12345 Berlin",
    "street": "Musterstraße",
    "house_number": "123",
    "postal_code": "12345",
    "city": "Berlin"
  },
  "phone_number": {
    "phone_1": "+493012345678"
  },
  "emails": {
    "email_1": "kontakt@muster-firma.de"
  },
  "register_number": {
    "number": "HRB 12345 B",
    "court": "Amtsgericht Charlottenburg"
  },
  "social_media": {
    "linkedin": "https://www.linkedin.com/company/muster-firma"
  },
  "decision_makers": [
    "Max Mustermann"
  ],
  "metadata": {
    "domain": "muster-firma.de",
    "fetch_method": "http",
    "fallback_attempted": false,
    "scraped_at": "2025-08-28T12:15:45.003780"
  }
}

Note: Numbered outputs such as emails and phone numbers are sorted by confidence, i.e. by how likely each entry is to be the company's main contact.
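Because the numbered keys carry a ranking, a consumer usually wants them as an ordered list. A small sketch, assuming every numbered key ends in `_<n>` and that `_1` is the top-ranked entry: sorting on the integer suffix keeps `email_10` after `email_9`, which plain string sorting would not. The `ordered_values` helper is hypothetical.

```python
import re

# Matches the trailing counter in keys such as "email_1" or "phone_12".
_SUFFIX = re.compile(r"_(\d+)$")


def ordered_values(numbered):
    """Return the values of numbered keys, lowest suffix (_1) first."""
    ranked = []
    for key, value in numbered.items():
        match = _SUFFIX.search(key)
        if match:  # ignore keys that do not follow the <name>_<n> pattern
            ranked.append((int(match.group(1)), value))
    ranked.sort(key=lambda pair: pair[0])  # numeric sort, so 10 comes after 9
    return [value for _, value in ranked]
```

For the example output, `ordered_values(item["phone_number"])[0]` would give the primary phone number.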

📊 Extractable Data in Detail

You can select any combination of the following fields for extraction:

| Field | Description | Data Structure |
|---|---|---|
| company_name | Extracts the official company name. The result includes a confidence score indicating the likelihood of a correct match. Lower numbers indicate higher confidence. | Object |
| business_address | Parses the full business address into structured components: full_address, street, house_number, postal_code, and city. | Object |
| phone_number | Finds and extracts one or more phone numbers from the page. Results are keyed as phone_1, phone_2, etc. | Object |
| emails | Finds and extracts one or more email addresses. The extractor prioritizes emails that match the website's domain. | Object |
| register_number | Extracts the commercial register number ("Handelsregisternummer") and the corresponding registration court ("Registergericht"). | Object |
| social_media | Scans for and extracts links to common social media platforms such as LinkedIn, Xing, Facebook, and Instagram. | Object |
| decision_makers | (Premium) Identifies and extracts the names of key decision-makers ("Entscheidungsträger"). This feature uses an external NER (Named Entity Recognition) machine learning model to improve accuracy. | Array |
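The table only says that emails on the website's own domain are prioritized. One plausible way to express that rule, purely as an illustration (the helper names and the subdomain handling are assumptions, not the actor's implementation), is a stable sort that moves own-domain addresses to the front and renumbers them in the `email_1`, `email_2` style of the output.

```python
from urllib.parse import urlparse


def site_domain(website_url):
    """Host name of the site, lowercased and without a leading 'www.'."""
    host = (urlparse(website_url).hostname or website_url).lower()
    return host[4:] if host.startswith("www.") else host


def prioritize_emails(emails, website_url):
    """Number emails so that addresses on the site's own domain come first."""
    domain = site_domain(website_url)

    def on_own_domain(email):
        email_domain = email.rsplit("@", 1)[-1].lower()
        # Subdomains (e.g. jobs@karriere.example.de) count as matching too.
        return email_domain == domain or email_domain.endswith("." + domain)

    # sorted() is stable: within each group the original order is preserved.
    ranked = sorted(emails, key=lambda email: not on_own_domain(email))
    return {f"email_{i}": email for i, email in enumerate(ranked, start=1)}
```

A freemail address such as `info@gmail.com` is kept, but only after every address on the company's own domain.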

💲 Pricing

Currently, only Apify platform usage is charged.

The actor's pay-per-event pricing model charges you based on your usage, so you only pay for what you need. The event prices are as follows:

  • Actor Start: $0.10 per run
  • Per Website:
    • Website Processed: $0.0004 for each URL from your input list
    • Successful Result: $0.0026 for each website where data is successfully extracted
    • Decision Maker Extracted: $0.0004 per website where decision-makers are found (this is in addition to the successful result charge)
    • Maximum Sum: $0.0035 per Website
  • Per 1000 Websites:
    • Website Processed: $0.40 per 1,000 URLs from your input list
    • Successful Result: $2.60 per 1,000 websites where data is successfully extracted
    • Decision Maker Extracted: $0.60 per 1,000 websites where decision-makers are found (this is in addition to the successful result charge)
    • Maximum Sum: $3.50 per 1000 Websites
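To estimate a bill under the per-event prices listed above, multiply each event count by its price. The sketch below uses `Decimal` to avoid floating-point rounding, takes the per-website figures from the list, assumes decision-maker charges apply only to successful results, and excludes Apify platform usage. `estimate_event_cost` is a hypothetical helper; check the Store page for current prices.

```python
from decimal import Decimal

# Per-event prices in USD, copied from the per-website list above.
ACTOR_START = Decimal("0.10")
WEBSITE_PROCESSED = Decimal("0.0004")
SUCCESSFUL_RESULT = Decimal("0.0026")
DECISION_MAKER = Decimal("0.0004")


def estimate_event_cost(websites, successful, with_decision_makers=0, runs=1):
    """Pay-per-event charges for one or more runs, excluding platform usage."""
    # Assumption: decision-maker charges only occur on successful results.
    if not 0 <= with_decision_makers <= successful <= websites:
        raise ValueError("expected 0 <= with_decision_makers <= successful <= websites")
    return (
        runs * ACTOR_START
        + websites * WEBSITE_PROCESSED
        + successful * SUCCESSFUL_RESULT
        + with_decision_makers * DECISION_MAKER
    )
```

For example, one run over 1,000 websites that all succeed, without decision-makers, comes to $0.10 + $0.40 + $2.60 = $3.10.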

⚙️ Usage

  1. Input URLs: Go to the Input tab and paste the homepage URLs of the websites you want to scrape.
  2. Select Data: In the fieldsToExtract dropdown, select all the data points you wish to collect.
  3. Configure Settings: Adjust settings like usePlaywright or proxyConfiguration as needed.
  4. Start the Actor: Click the Start button.
  5. Get Data: Once the run is finished, find your results in the Storage > Dataset tab.
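If you fetch results programmatically instead of from the Console, the official `apify-client` package can iterate over a run's dataset via `client.dataset(run["defaultDatasetId"]).iterate_items()`. The hypothetical helper below then flattens each item into one CSV row with the top-ranked phone number and email. The column selection is arbitrary, and missing fields (for example on error rows) become empty cells.

```python
import csv
import io

# Arbitrary column selection for a flat, CRM-friendly export.
COLUMNS = ["start_url", "imprint_url", "company_name", "full_address",
           "phone", "email", "register_number"]


def items_to_csv(items):
    """Flatten scraper output items into CSV text, one row per website."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for item in items:
        # `or {}` guards against fields that are absent or null in an item.
        writer.writerow({
            "start_url": item.get("start_url", ""),
            "imprint_url": item.get("imprint_url", ""),
            "company_name": (item.get("company_name") or {}).get("name", ""),
            "full_address": (item.get("business_address") or {}).get("full_address", ""),
            "phone": (item.get("phone_number") or {}).get("phone_1", ""),
            "email": (item.get("emails") or {}).get("email_1", ""),
            "register_number": (item.get("register_number") or {}).get("number", ""),
        })
    return buffer.getvalue()
```

The Console can also export a dataset directly in formats such as CSV; a custom flattening step like this is mainly useful when you want one column per primary contact.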

🎯 Use Cases

  • Lead Generation: Build targeted contact lists for sales and marketing.
  • Compliance & Verification: Check for legally compliant imprint information.
  • Market Research: Aggregate data on companies in a specific industry or region.
  • Data Enrichment: Enhance existing company profiles with official contact and registration details.

You are solely responsible for determining the legality of your use of this actor and the data it generates. The scraping and handling of data, particularly personal information, is subject to complex legal frameworks like the General Data Protection Regulation (GDPR/DSGVO), copyright laws, and the terms of service of the websites you scrape. It is your responsibility to ensure your use case is compliant with all applicable laws. This text does not constitute legal advice.

GDPR Notice: "Decision Makers" Feature

Please be aware that the decision_makers feature uses an external API hosted on a private server in Germany for data processing.

  • What is Processed: The text of the imprint page is sent to this API to identify personal names.
  • Why: This is necessary for the Named Entity Recognition (NER) model to accurately extract decision-makers.
  • Data Controller: You, the user, are the data controller. The actor's developer acts as the data processor for this specific task.
  • Location & Compliance: All processing for this feature occurs within the EU (Germany) and is subject to GDPR (DSGVO).
  • Data Storage: The text is processed in-memory and is not stored or logged on the external server.
  • Important: This processing is external to the Apify platform and is not covered by Apify's DPA. By using this feature, you acknowledge this separate data processing activity.

🛠️ Maintainer