
Imprint Contact Scraper
An Actor that automatically locates and scrapes key contact details from German website imprint pages (Impressum). It extracts information such as company name, address, phone numbers, emails, and decision-maker details.
Notice: This Scraper is currently under development and may not be fully stable or feature-complete. Use with caution.
This Apify Actor automatically scrapes German company information from Imprint (Impressum) pages. You provide a list of website homepages, and the Actor attempts to find the imprint link, visit it, and extract key details like the company name, address, phone numbers, email addresses, and optionally, decision-makers. It is designed to work with typical German website structures where an imprint link is present on the homepage.
What can this Actor do?
- Find Imprint Pages: Automatically locates links containing "Impressum" or "Imprint" on the provided website homepages.
- HTML Cleaning: Removes common clutter like cookie banners, scripts, styles, and overly long text paragraphs before extraction to improve accuracy.
- Extract Company Details: Parses the imprint page to find:
- Company Name
- Full Address (Street, Postal Code, City)
- Phone Number(s)
- Email Address(es)
- Extract Decision Makers ("Entscheidungsträger") (Optional): If enabled, it identifies and extracts names associated with roles like "Geschäftsführer", "Vorstand", "Inhaber", etc.
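The link-finding step described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the Actor lists Beautiful Soup as a dependency, while this sketch uses only the standard library so it runs without extra installs. The names `KEYWORDS`, `ImprintLinkFinder`, and `find_imprint_links` are hypothetical, and the matching rule (keyword in link text or href) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed keyword list, taken from the feature description above.
KEYWORDS = ("impressum", "imprint")


class ImprintLinkFinder(HTMLParser):
    """Collects hrefs of <a> tags whose text or href mentions an imprint keyword."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.matches = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            haystack = (" ".join(self._text) + " " + self._href).lower()
            if any(keyword in haystack for keyword in KEYWORDS):
                # Resolve relative links against the homepage URL.
                self.matches.append(urljoin(self.base_url, self._href))
            self._href = None


def find_imprint_links(html, base_url):
    """Return absolute URLs of candidate imprint links, in document order."""
    finder = ImprintLinkFinder(base_url)
    finder.feed(html)
    return finder.matches
```

Calling `find_imprint_links(homepage_html, "https://www.example.de")` returns absolute URLs, so a relative link such as `/impressum` can be fetched directly in the next step.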
Use Cases
This Actor is useful for various tasks, including:
- Lead Generation: Quickly gather contact information for German businesses.
- Market Research: Collect data on companies within a specific sector or region.
- B2B Data Enrichment: Augment existing company records with imprint data.
- Compliance Checks: Verify if websites have accessible and complete imprint information.
Input
The Actor requires the following input:
- Start URLs (start_urls): A list of homepage URLs for the websites you want to scrape. Each entry should be an object containing a url key.
- Search Decision Makers (search_decision_makers): A checkbox (true/false). Check this box if you want the Actor to attempt extracting decision-maker names and roles. This is disabled by default (false).
Example Input JSON
{
  "start_urls": [
    { "url": "https://www.example.de" },
    { "url": "https://www.another-company.com" }
  ],
  "search_decision_makers": true
}
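If the homepages come from a spreadsheet or a plain text list, the input object can be generated rather than typed by hand. The helper below is a sketch under assumptions: `build_actor_input` is a hypothetical name, and prefixing bare domains with `https://` is a convenience choice of this example, not documented Actor behavior.

```python
import json


def build_actor_input(homepages, search_decision_makers=False):
    """Wrap plain homepage URLs in the {"url": ...} objects the input expects."""
    start_urls = []
    seen = set()
    for raw in homepages:
        url = raw.strip()
        if not url:
            continue  # skip blank entries
        if not url.startswith(("http://", "https://")):
            url = "https://" + url  # assumption: bare domains default to HTTPS
        if url in seen:
            continue  # skip duplicates after normalization
        seen.add(url)
        start_urls.append({"url": url})
    return {
        "start_urls": start_urls,
        "search_decision_makers": bool(search_decision_makers),
    }


if __name__ == "__main__":
    actor_input = build_actor_input(
        ["https://www.example.de", "www.another-company.com"],
        search_decision_makers=True,
    )
    print(json.dumps(actor_input, indent=2))
```

The printed JSON can be pasted into the Actor's JSON input editor.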
Output
The Actor outputs a dataset item for each successfully processed start URL. Each item is a JSON object containing the extracted data.
Example Output JSON
{
  "source_url": "https://www.example.de",
  "imprint_url": "https://www.example.de/impressum",
  "homepage_title": "Example Company - Homepage",
  "company_name": "Example GmbH",
  "address": "Musterstraße 1, 12345 Musterstadt",
  "phone_number_1": "+49 (0) 123 456789",
  "email_1": "info@example.de",
  "Geschäftsführer": [ // Indicator found becomes the key
    ["Max Mustermann", 0], // Name and rank (order found)
    ["Erika Beispiel", 1]
  ],
  "primary_decision_maker": "Max Mustermann" // Highest priority role found
}
- phone_number_X and email_X fields are numbered sequentially if multiple are found.
- Decision maker fields (like Geschäftsführer, Vorstand, etc.) contain a list of [name, rank] tuples, where rank indicates the order of appearance. The primary_decision_maker field contains the name of the highest-priority contact found.
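Because the number of phone_number_X and email_X fields varies per item, downstream code usually wants them gathered into lists. The following is a sketch of one way to do that on the consumer side; `collect_contacts` is a hypothetical helper, and it assumes the field names follow exactly the `phone_number_<n>` / `email_<n>` pattern shown in the example output.

```python
import re

# Matches the numbered contact fields from the example output.
_NUMBERED = re.compile(r"^(phone_number|email)_(\d+)$")


def collect_contacts(item):
    """Return {"phone_number": [...], "email": [...]} ordered by field index."""
    found = {"phone_number": [], "email": []}
    for key, value in item.items():
        match = _NUMBERED.match(key)
        if match:
            # Store the numeric index so that _10 sorts after _2.
            found[match.group(1)].append((int(match.group(2)), value))
    return {kind: [value for _, value in sorted(pairs)] for kind, pairs in found.items()}
```

Sorting on the integer index rather than the key string keeps the order correct when an item has ten or more entries of one kind.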
How to Use
- Add your desired homepage URLs to the Start URLs input field.
- Optionally, check the Search Decision Makers box.
- Click Start to run the Actor.
- Once the run finishes, preview or download the extracted data from the Dataset tab.
Limitations
- The Actor relies on finding standard "Impressum" or "Imprint" links. It may fail if the link text or URL is unconventional.
- Extraction accuracy depends on the HTML structure of the imprint page. Complex or unusual layouts might lead to incomplete or incorrect data.
- Address validation uses an internal German postal code lookup table. While extensive, it might not cover every single postal code or city variant.
- Decision maker extraction uses heuristics based on common German role titles. It may not capture all possible roles or names, especially if formatted unusually.
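Given these limitations, it can help to sanity-check extracted addresses before using them. The sketch below only tests the shape of the address string ("street, five-digit postal code, city", as in the example output); it does not reproduce the Actor's internal postal code lookup, and `split_address` is a hypothetical helper name.

```python
import re

# Street part, comma, five-digit postal code, city: the shape of the example output.
_ADDRESS = re.compile(
    r"^\s*(?P<street>.+?),\s*(?P<postal_code>\d{5})\s+(?P<city>.+?)\s*$"
)


def split_address(address):
    """Split 'Street 1, 12345 City' into parts; return None if the shape differs."""
    match = _ADDRESS.match(address or "")
    return match.groupdict() if match else None
```

Items for which `split_address` returns None are candidates for manual review; a non-None result only confirms the format, not that the postal code or city actually exists.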
Dependencies
- Apify SDK for Python (apify)
- Beautiful Soup 4 (beautifulsoup4[lxml])
- HTTPX (httpx)
Local Development
This Actor is primarily designed for the Apify platform. Running it locally requires Python 3.x and setting up the Apify SDK environment.
Clone the repository:
git clone <repository_url>
cd <repository_directory>
Set up a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install dependencies:
pip install -r requirements.txt
Running Locally: You can attempt to run the scraper using python src/main.py. However, it relies on Apify platform features (Actor.get_input(), Actor.push_data(), etc.). For local development, you might need to:
- Set the APIFY_TOKEN environment variable.
- Modify the code to read input from a local file (e.g., input.json) instead of Actor.get_input().
- Modify the code to print output instead of using Actor.push_data().
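The last two modifications could look roughly like this. It is a sketch only: `load_local_input` and `emit_local` are hypothetical stand-ins, not functions from the repository or the Apify SDK, and the validation shown is an assumption about what the scraper needs.

```python
import json
from pathlib import Path


def load_local_input(path="input.json"):
    """Read Actor-style input from a local JSON file instead of Actor.get_input()."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    data.setdefault("search_decision_makers", False)  # documented default
    if not isinstance(data.get("start_urls"), list):
        raise ValueError("start_urls must be a list of {'url': ...} objects")
    return data


def emit_local(item):
    """Stand-in for Actor.push_data(): print one result as a JSON line."""
    line = json.dumps(item, ensure_ascii=False)
    print(line)
    return line
```

Printing one JSON object per line keeps the local output easy to redirect into a file and inspect later, and `ensure_ascii=False` preserves umlauts in names and addresses.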
Important Legal Notice Regarding Scraping Personal Data
This Actor is designed to scrape contact information, which may include personal data such as names, phone numbers, and email addresses. Please be aware that the scraping and processing of personal data are subject to various laws and regulations, including but not limited to the General Data Protection Regulation (GDPR) in the European Union and other national and regional data protection laws.
It is your responsibility to research and understand the laws that apply to your specific use case and jurisdiction before using this Actor to scrape personal data. Ensure that you have a lawful basis for collecting and processing this information and that you comply with all applicable legal requirements.
Use this Actor responsibly and ethically.
License
This code is licensed under the MIT License.
Pricing
Pricing model: Pay per usage
This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.