
German Imprint Contact Scraper
An Actor that automatically locates and scrapes key contact details from German website imprint pages (Impressum). It extracts information such as company name, address, phone numbers, emails, and decision-maker details.
German Imprint Scraper
A Python-based Apify Actor designed to find and extract contact and legal information from German imprint pages ("Impressum"). Simply provide a list of website homepages, and the actor will automatically locate the imprint page and scrape key details like company name, address, phone number, email, and commercial register number.
Beta Version Notice: This actor is currently in beta. While it's fully functional and returns results, you may encounter occasional quirks or incomplete features. I welcome your feedback! Please report any issues or suggestions you have.
💡 Features
- Automatic Imprint Page Discovery: Intelligently crawls websites to find the correct imprint page from your starting URLs.
- Selective Data Extraction: Choose exactly which data points you need, from basic contact info to advanced details like company decision-makers.
- Dual Fetching Technology:
  - HTTP Mode: A fast, lightweight method for scraping simple, server-rendered websites.
  - Headless Browser Mode (Playwright): A heavier option for modern, JavaScript-heavy websites. The actor can use this mode for all sites or as an automatic fallback when the standard HTTP method fails, which improves the success rate on sites that plain HTTP cannot handle.
- Proxy Support: Integrates seamlessly with Apify's proxy service to handle IP rotation and avoid blocking.
- Customizable Output: Include optional metadata or error records for detailed analysis and troubleshooting.
- Structured JSON Output: Delivers clean, well-structured data ready for use in your applications, databases, or CRM systems.
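The HTTP-first, browser-as-fallback behaviour described under Dual Fetching Technology can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the actor's actual implementation: `fetch_with_fallback`, `http_fetch`, and `browser_fetch` are hypothetical names, and both fetchers are passed in as plain callables so the example runs without network access. The `fetch_method` and `fallback_attempted` keys mirror the metadata fields in the example output; the value `"playwright"` is an assumption, since the README only shows `"http"`.

```python
from typing import Callable, Dict, Tuple

# A fetcher takes a URL and returns HTML, raising an exception on failure.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str,
    http_fetch: Fetcher,
    browser_fetch: Fetcher,
    use_playwright: bool = False,
) -> Tuple[str, Dict[str, object]]:
    """Try plain HTTP first and fall back to a headless browser on failure."""
    if use_playwright:
        # Browser mode forced for all sites: skip the HTTP attempt entirely.
        html = browser_fetch(url)
        return html, {"fetch_method": "playwright", "fallback_attempted": False}

    try:
        html = http_fetch(url)
        if html.strip():
            return html, {"fetch_method": "http", "fallback_attempted": False}
    except Exception:
        # Any HTTP-level failure is treated as a reason to try the browser.
        pass

    html = browser_fetch(url)
    return html, {"fetch_method": "playwright", "fallback_attempted": True}
```

Injecting the fetchers keeps the fallback decision separate from the networking code, which makes the logic easy to test with stubs.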
📥 Input Parameters
Configure the actor's behavior using these fields in the Apify Console Input tab or via API:
Field | Type | Description | Default | Required |
---|---|---|---|---|
startUrls | Array | Enter the homepage URLs of the websites to process. | [{ "url": "https://www.vita-cola.de/" }] | Yes |
fieldsToExtract | Array | Choose the specific pieces of information you want to collect. | ["company_name", "business_address"] | No |
usePlaywright | Boolean | Use a headless browser for all websites. Slower but more reliable for JavaScript-heavy sites. | false | No |
metaData | Boolean | Include technical details in the output. | false | No |
errorOutput | Boolean | Include a row in the output for each website that failed to process. | false | No |
debugLog | Boolean | Generate a verbose log for troubleshooting. | false | No |
proxyConfiguration | Object | Proxy settings. Apify Proxy is recommended. | { "useApifyProxy": true } | No |
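When calling the actor via API, the input is a JSON object with the fields from the table above. The sketch below builds such an object in Python and rejects unknown field names before a run is started. `build_run_input` and `ALLOWED_FIELDS` are hypothetical helpers, not part of the actor; the field names are taken from the table of extractable data in this README.

```python
import json
from typing import Any, Dict, Iterable

# Field names as listed in the "Extractable Data in Detail" table.
ALLOWED_FIELDS = {
    "company_name",
    "business_address",
    "phone_number",
    "emails",
    "register_number",
    "social_media",
    "decision_makers",
}


def build_run_input(
    urls: Iterable[str],
    fields: Iterable[str] = ("company_name", "business_address"),
    use_playwright: bool = False,
) -> Dict[str, Any]:
    """Build an input object using the documented field names and defaults."""
    fields = list(fields)
    unknown = [f for f in fields if f not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"Unknown fieldsToExtract values: {unknown}")
    return {
        "startUrls": [{"url": u} for u in urls],
        "fieldsToExtract": fields,
        "usePlaywright": use_playwright,
        "metaData": False,
        "errorOutput": False,
        "debugLog": False,
        "proxyConfiguration": {"useApifyProxy": True},
    }


print(json.dumps(build_run_input(["https://www.vita-cola.de/"]), indent=2))
```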
📤 Output Data Structure
The exact fields depend on your fieldsToExtract selection.
Example Output
{
  "start_url": "https://muster-firma.de/",
  "imprint_url": "https://muster-firma.de/impressum",
  "company_name": {
    "name": "Muster GmbH",
    "confidence": 1
  },
  "business_address": {
    "full_address": "Musterstraße 123, 12345 Berlin",
    "street": "Musterstraße",
    "house_number": "123",
    "postal_code": "12345",
    "city": "Berlin"
  },
  "phone_number": {
    "phone_1": "+493012345678"
  },
  "emails": {
    "email_1": "kontakt@muster-firma.de"
  },
  "register_number": {
    "number": "HRB 12345 B",
    "court": "Amtsgericht Charlottenburg"
  },
  "social_media": {
    "linkedin": "https://www.linkedin.com/company/muster-firma"
  },
  "decision_makers": ["Max Mustermann"],
  "metadata": {
    "domain": "muster-firma.de",
    "fetch_method": "http",
    "fallback_attempted": false,
    "scraped_at": "2025-08-28T12:15:45.003780"
  }
}
Note: Numbered outputs such as emails and phone numbers are sorted by confidence, that is, by how likely each one is to be the company's main contact data.
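Because emails and phone numbers arrive as numbered keys (`email_1`, `phone_1`, ...), a consumer usually wants them as an ordered list and often only needs the first entry. The following is a small sketch of one way to do that. `ordered_values` and `primary_contact` are hypothetical helpers, and treating the `_1` entry as the most likely main contact is an assumption based on the note above.

```python
import re
from typing import Dict, List, Optional


def ordered_values(numbered: Dict[str, str]) -> List[str]:
    """Return the values of keys like email_1, email_2 in numeric order."""

    def index(key: str) -> int:
        match = re.search(r"_(\d+)$", key)
        # Keys without a numeric suffix are sorted to the end.
        return int(match.group(1)) if match else 10**9

    return [numbered[key] for key in sorted(numbered, key=index)]


def primary_contact(record: dict) -> Dict[str, Optional[str]]:
    """Pick the company name plus the first email and phone of one record."""
    emails = ordered_values(record.get("emails") or {})
    phones = ordered_values(record.get("phone_number") or {})
    company = record.get("company_name") or {}
    return {
        "company": company.get("name"),
        "email": emails[0] if emails else None,  # assumed: _1 = most likely
        "phone": phones[0] if phones else None,
    }
```

Sorting by the numeric suffix rather than by the raw key avoids `email_10` landing before `email_2`.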
📊 Extractable Data in Detail
You can select any combination of the following fields for extraction:
Field | Description | Data Structure |
---|---|---|
company_name | Extracts the official company name. The result includes a confidence score indicating the likelihood of a correct match; the higher the number, the lower the confidence. | Object |
business_address | Parses the full business address into structured components: full_address, street, house_number, postal_code, and city. | Object |
phone_number | Finds and extracts one or more phone numbers from the page. Results are keyed as phone_1, phone_2, etc. | Object |
emails | Finds and extracts one or more email addresses. The extractor prioritizes emails that match the website's domain. | Object |
register_number | Extracts the commercial register number ("Handelsregisternummer") and the corresponding registration court ("Registergericht"). | Object |
social_media | Scans for and extracts links to common social media platforms like LinkedIn, Xing, Facebook, Instagram, etc. | Object |
decision_makers | (Premium) Identifies and extracts the names of key decision-makers ("Entscheidungsträger"). This feature uses an external NER (Named Entity Recognition) machine learning model to ensure accuracy. | Array |
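Since a higher company_name confidence number means lower confidence, a downstream filter has to keep records at or below a threshold rather than above it. The sketch below illustrates this; `confident_names` is a hypothetical helper, and the default threshold of 1 is only an assumption taken from the example output, because the README does not document the scale.

```python
from typing import Iterable, List


def confident_names(records: Iterable[dict], max_score: int = 1) -> List[str]:
    """Return company names whose confidence score is at most max_score.

    A lower score means a more confident match, so the comparison is <=.
    """
    names = []
    for record in records:
        company = record.get("company_name") or {}
        score = company.get("confidence")
        if company.get("name") and score is not None and score <= max_score:
            names.append(company["name"])
    return names
```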
💲 Pricing
Currently, only platform usage is charged.
This actor uses a pay-per-event pricing model. You are charged based on your usage, ensuring you only pay for what you need. The costs are as follows:
Actor Start: $0.10 per run
Per Website:
- Website Processed: $0.0004 for each URL from your input list
- Successful Result: $0.0026 for each website where data is successfully extracted
- Decision Maker Extracted: $0.0004 for decision-makers found per website (this is in addition to the successful result charge)
- Maximum Sum: $0.0035 per website
Per 1000 Websites:
- Website Processed: $0.40 for 1000 URLs from your input list
- Successful Result: $2.60 for 1000 websites where data is successfully extracted
- Decision Maker Extracted: $0.60 for decision-makers found per 1000 websites (this is in addition to the successful result charge)
- Maximum Sum: $3.50 per 1000 websites
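To get a rough idea of what a run would cost if the per-event prices apply, the per-website rates listed above can be added up as in the sketch below. `estimate_cost` is a hypothetical helper, the rates are copied from the per-website list, and the result is only an estimate that does not include Apify platform usage.

```python
from decimal import Decimal

# Rates copied from the per-website price list in this README (USD).
ACTOR_START = Decimal("0.10")
WEBSITE_PROCESSED = Decimal("0.0004")
SUCCESSFUL_RESULT = Decimal("0.0026")
DECISION_MAKER = Decimal("0.0004")  # per-website figure as listed


def estimate_cost(websites: int, successes: int, with_decision_makers: int = 0) -> Decimal:
    """Estimate the event charges for one run.

    websites: URLs in the input list
    successes: websites where data was extracted
    with_decision_makers: successful websites where decision-makers were found
    """
    if not 0 <= with_decision_makers <= successes <= websites:
        raise ValueError("expected 0 <= with_decision_makers <= successes <= websites")
    return (
        ACTOR_START
        + websites * WEBSITE_PROCESSED
        + successes * SUCCESSFUL_RESULT
        + with_decision_makers * DECISION_MAKER
    )
```

`Decimal` is used instead of `float` so that sums of very small amounts do not pick up rounding noise.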
⚙️ Usage
- Input URLs: Go to the Input tab and paste the homepage URLs of the websites you want to scrape.
- Select Data: In the fieldsToExtract dropdown, select all the data points you wish to collect.
- Configure Settings: Adjust settings like usePlaywright or proxyConfiguration as needed.
- Start the Actor: Click the Start button.
- Get Data: Once the run is finished, find your results in the Storage → Dataset tab.
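The same steps can be scripted with the `apify-client` Python package instead of the Console. The sketch below starts a run, waits for it to finish, and reads the default dataset. `run_and_collect` is a hypothetical helper, and the client is passed in as a parameter so the function can be tried with a stub. The actor ID and API token are placeholders that you need to fill in from your own Apify account.

```python
from typing import Any, Dict, List


def run_and_collect(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start the actor, wait for the run to finish, and return all dataset items."""
    # ActorClient.call() blocks until the run finishes and returns the run object.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client`; token and actor ID are placeholders):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   items = run_and_collect(
#       client,
#       "<ACTOR_ID>",
#       {"startUrls": [{"url": "https://www.vita-cola.de/"}]},
#   )
```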
🎯 Use Cases
- Lead Generation: Build targeted contact lists for sales and marketing.
- Compliance & Verification: Check for legally compliant imprint information.
- Market Research: Aggregate data on companies in a specific industry or region.
- Data Enrichment: Enhance existing company profiles with official contact and registration details.
⚖️ Legal Disclaimer
You are solely responsible for determining the legality of your use of this actor and the data it generates. The scraping and handling of data, particularly personal information, is subject to complex legal frameworks like the General Data Protection Regulation (GDPR/DSGVO), copyright laws, and the terms of service of the websites you scrape. It is your responsibility to ensure your use case is compliant with all applicable laws. This text does not constitute legal advice.
GDPR Notice: "Decision Makers" Feature
Please be aware that the decision_makers feature uses an external API hosted on a private server in Germany for data processing.
- What is Processed: The text of the imprint page is sent to this API to identify personal names.
- Why: This is necessary for the Named Entity Recognition (NER) model to accurately extract decision-makers.
- Data Controller: You, the user, are the data controller. The actor's developer acts as the data processor for this specific task.
- Location & Compliance: All processing for this feature occurs within the EU (Germany) and is subject to GDPR (DSGVO).
- Data Storage: The text is processed in-memory and is not stored or logged on the external server.
- Important: This processing is external to the Apify platform and is not covered by Apify's DPA. By using this feature, you acknowledge this separate data processing activity.
🛠️ Maintainer
- Author: Dominic M. Quaiser
- Contact: mail@dominic-quaiser.io
- Website: dominic-quaiser.io