
Imprint Contact Scraper

Notice: This Actor is currently under development and may not be fully stable or feature-complete. Use with caution.

This Apify Actor automatically scrapes German company information from Imprint (Impressum) pages. You provide a list of website homepages; the Actor attempts to find each site's imprint link, visit it, and extract key details such as the company name, address, phone numbers, email addresses, and, optionally, decision-makers. It is designed for typical German website structures where an imprint link is present on the homepage.

What can this Actor do?

  • Find Imprint Pages: Automatically locates links containing "Impressum" or "Imprint" on the provided website homepages (see the sketch after this list).
  • HTML Cleaning: Removes common clutter like cookie banners, scripts, styles, and overly long text paragraphs before extraction to improve accuracy.
  • Extract Company Details: Parses the imprint page to find:
    • Company Name
    • Full Address (Street, Postal Code, City)
    • Phone Number(s)
    • Email Address(es)
  • Extract Decision Makers ("Entscheidungsträger") (Optional): If enabled, it identifies and extracts names associated with roles like "Geschäftsführer", "Vorstand", "Inhaber", etc.
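For orientation, the link-discovery and HTML-cleaning steps could be approximated roughly as in the sketch below. This is a simplified illustration using httpx and BeautifulSoup (both listed under Dependencies further down), not the Actor's actual implementation; the function names are invented for this example.

# Simplified sketch of imprint-link discovery and HTML cleaning;
# an approximation of the idea, not the Actor's actual code.
from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup


def find_imprint_url(homepage_url):
    """Return the first link whose text or href mentions 'Impressum'/'Imprint'."""
    html = httpx.get(homepage_url, follow_redirects=True, timeout=30).text
    soup = BeautifulSoup(html, "lxml")
    for link in soup.find_all("a", href=True):
        label = f"{link.get_text()} {link['href']}".lower()
        if "impressum" in label or "imprint" in label:
            return urljoin(homepage_url, link["href"])
    return None


def clean_html(html):
    """Strip scripts, styles and similar clutter before extraction."""
    soup = BeautifulSoup(html, "lxml")
    for tag in soup(["script", "style", "noscript", "iframe"]):
        tag.decompose()
    return soup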

Use Cases

This Actor is useful for various tasks, including:

  • Lead Generation: Quickly gather contact information for German businesses.
  • Market Research: Collect data on companies within a specific sector or region.
  • B2B Data Enrichment: Augment existing company records with imprint data.
  • Compliance Checks: Verify if websites have accessible and complete imprint information.

Input

The Actor requires the following input:

  1. Start URLs (start_urls): A list of homepage URLs for the websites you want to scrape. Each entry should be an object containing a url key.
  2. Search Decision Makers (search_decision_makers): A checkbox (true/false). Check this box if you want the Actor to attempt extracting decision-maker names and roles. This is disabled by default (false).

Example Input JSON

{
  "start_urls": [
    { "url": "https://www.example.de" },
    { "url": "https://www.another-company.com" }
  ],
  "search_decision_makers": true
}

Output

The Actor outputs a dataset item for each successfully processed start URL. Each item is a JSON object containing the extracted data.

Example Output JSON

{
  "source_url": "https://www.example.de",
  "imprint_url": "https://www.example.de/impressum",
  "homepage_title": "Example Company - Homepage",
  "company_name": "Example GmbH",
  "address": "Musterstraße 1, 12345 Musterstadt",
  "phone_number_1": "+49 (0) 123 456789",
  "email_1": "info@example.de",
  "Geschäftsführer": [                          // Indicator found becomes the key
    ["Max Mustermann", 0],                      // Name and rank (order found)
    ["Erika Beispiel", 1]
  ],
  "primary_decision_maker": "Max Mustermann"    // Highest priority role found
}
  • phone_number_X and email_X fields are numbered sequentially if multiple are found.
  • Decision maker fields (like Geschäftsführer, Vorstand, etc.) contain a list of [name, rank] tuples, where rank indicates the order of appearance. The primary_decision_maker field contains the name of the highest-priority contact found (a flattening sketch follows below).
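Because the phone_number_X / email_X fields are numbered dynamically and each role indicator becomes its own key, downstream code usually needs to flatten the items. The following post-processing sketch is written against the example structure above; the ROLE_KEYS list only contains the roles mentioned in this README and is not exhaustive.

# Flatten one dataset item into a fixed structure. Field names follow the
# example output above; ROLE_KEYS is illustrative, not exhaustive.
ROLE_KEYS = ["Geschäftsführer", "Vorstand", "Inhaber"]


def flatten_item(item):
    phones = [v for k, v in sorted(item.items()) if k.startswith("phone_number_")]
    emails = [v for k, v in sorted(item.items()) if k.startswith("email_")]
    decision_makers = [
        {"role": role, "name": name, "rank": rank}
        for role in ROLE_KEYS
        for name, rank in item.get(role, [])
    ]
    return {
        "company": item.get("company_name"),
        "address": item.get("address"),
        "phones": phones,
        "emails": emails,
        "decision_makers": sorted(decision_makers, key=lambda d: d["rank"]),
        "primary_decision_maker": item.get("primary_decision_maker"),
    }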

How to Use

  1. Add your desired homepage URLs to the Start URLs input field.
  2. Optionally, check the Search Decision Makers box.
  3. Click Start to run the Actor.
  4. Once the run finishes, preview or download the extracted data from the Dataset tab.
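Runs can also be started programmatically with the official apify-client package for Python, roughly as shown below. The token and Actor ID are placeholders; use your own API token and the Actor ID shown on this Actor's page.

# Start a run via the Apify API and read the resulting dataset.
# <YOUR_APIFY_TOKEN> and <ACTOR_ID> are placeholders.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("<ACTOR_ID>").call(
    run_input={
        "start_urls": [{"url": "https://www.example.de"}],
        "search_decision_makers": True,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("company_name"), item.get("email_1"))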

Limitations

  • The Actor relies on finding standard "Impressum" or "Imprint" links. It may fail if the link text or URL is unconventional.
  • Extraction accuracy depends on the HTML structure of the imprint page. Complex or unusual layouts might lead to incomplete or incorrect data.
  • Address validation uses an internal German postal code lookup table. While extensive, it might not cover every single postal code or city variant.
  • Decision maker extraction uses heuristics based on common German role titles. It may not capture all possible roles or names, especially if formatted unusually (the sketch below illustrates the general approach).
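To give a sense of the pattern-matching approach these heuristics rely on, the sketch below shows what such rules can look like. These are illustrative examples only, not the Actor's actual rules.

# Illustrative heuristics only; the Actor's real rules may differ.
import re

# German postal code (5 digits) followed by a city name.
POSTAL_CODE_CITY = re.compile(r"\b(\d{5})\s+([A-ZÄÖÜ][\wäöüß.-]+)")

# A few common role titles; real imprints use many more variants.
ROLE_TITLES = ["Geschäftsführer", "Geschäftsführerin", "Vorstand", "Inhaber", "Inhaberin"]


def find_decision_makers(imprint_text):
    """Return (role, name) pairs for patterns like 'Geschäftsführer: Max Mustermann'."""
    results = []
    for role in ROLE_TITLES:
        pattern = rf"{role}\s*:?\s+([A-ZÄÖÜ][\w.-]+(?:\s+[A-ZÄÖÜ][\w.-]+)+)"
        for match in re.finditer(pattern, imprint_text):
            results.append((role, match.group(1)))
    return results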

Dependencies

  • Apify SDK for Python (apify)
  • Beautiful Soup 4 (beautifulsoup4[lxml])
  • HTTPX (httpx)
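A matching requirements.txt therefore contains roughly the following (left unpinned here; the repository may pin specific versions):

apify
beautifulsoup4[lxml]
httpx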

Local Development

This Actor is primarily designed for the Apify platform. Running it locally requires Python 3.x and setting up the Apify SDK environment.

Clone the repository:

git clone <repository_url>
cd <repository_directory>

Set up a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install dependencies:

pip install -r requirements.txt

Running Locally: You can attempt to run the scraper using python src/main.py. However, it relies on Apify platform features (Actor.get_input(), Actor.push_data(), etc.). For local development, you might need to:

  • Set the APIFY_TOKEN environment variable.
  • Modify the code to read input from a local file (e.g., input.json) instead of Actor.get_input().
  • Modify the code to print output instead of using Actor.push_data().
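As a rough illustration of the last two points, a local harness could look like the sketch below. The scrape_imprint import is hypothetical; adapt it to whatever entry point src/main.py actually exposes.

# local_run.py - hypothetical local harness. The imported scrape_imprint
# function is an assumption; adapt it to the real entry point in src/main.py.
import asyncio
import json

from src.main import scrape_imprint  # hypothetical entry point


async def main():
    with open("input.json", encoding="utf-8") as f:
        actor_input = json.load(f)

    for entry in actor_input.get("start_urls", []):
        result = await scrape_imprint(
            entry["url"],
            search_decision_makers=actor_input.get("search_decision_makers", False),
        )
        # Print instead of Actor.push_data() when running outside the platform.
        print(json.dumps(result, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    asyncio.run(main())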

Personal Data

This Actor is designed to scrape contact information, which may include personal data such as names, phone numbers, and email addresses. Please be aware that the scraping and processing of personal data are subject to various laws and regulations, including but not limited to the General Data Protection Regulation (GDPR) in the European Union and other national and regional data protection laws.

It is your responsibility to research and understand the laws that apply to your specific use case and jurisdiction before using this Actor to scrape personal data. Ensure that you have a lawful basis for collecting and processing this information and that you comply with all applicable legal requirements.

Use this Actor responsibly and ethically.

License

This code is licensed under the MIT License.

Developed by Azquaier and maintained by the community. The source code is available in the project's GitHub repository.

Pricing

Pricing model: pay per usage. The Actor itself is free to use; you only pay for the Apify platform usage it consumes.