# German Imprint Leads Scraper (`automation-lab/german-imprint-leads-scraper`) Actor

Extract German Impressum legal contacts, company details, VAT IDs, HRB records, emails, and decision-makers from domains.

- **URL**: https://apify.com/automation-lab/german-imprint-leads-scraper.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** Lead generation, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is billed per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases as your subscription plan level increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## German Imprint Leads Scraper

Extract structured legal and contact data from German public Impressum pages.

Use this Actor when you already have a list of German company domains and need CRM-ready enrichment: company name, legal form, registered address, phone numbers, emails, VAT ID, Handelsregister details, managing directors, responsible persons, social links, source snippets, and confidence flags.

### What does German Imprint Leads Scraper do?

It visits each submitted domain, checks common German legal-contact pages such as `/impressum`, `/service/impressum`, `/imprint`, and `/kontakt`, follows likely footer links, and saves one structured lead record per domain.

### Who is it for?

- 🧑‍💼 Sales teams enriching German B2B account lists
- 🧾 Compliance teams checking public company disclosures
- 🧲 Lead-generation agencies building Germany-specific datasets
- 🧑‍💻 Recruiters finding company decision makers
- 🧹 CRM operations teams normalizing German legal contacts

### Why use it?

German websites often place high-value company data in the Impressum instead of on a marketing contact page. This Actor targets the Impressum directly and returns structured fields instead of generic page text.

### What data can it extract?

| Field | Description |
| --- | --- |
| `inputUrl` | Submitted domain or URL |
| `imprintUrl` | Best Impressum/contact page found |
| `companyName` | Legal company name when detected |
| `legalForm` | GmbH, AG, KG, UG, e.K., and similar forms |
| `address` | Registered or legal address snippet |
| `emails` | Public email addresses |
| `phoneNumbers` | Public phone numbers |
| `vatId` | German VAT ID / USt-IdNr |
| `registrationCourt` | Amtsgericht / register court |
| `registrationNumber` | HRB/HRA registration number |
| `managingDirectors` | Geschäftsführer, Vorstand, or similar names |
| `responsiblePerson` | Responsible person when disclosed |
| `confidenceFlags` | Flags showing which important fields were found |
| `sourceSnippets` | Text snippets for verification |

### How much does it cost to extract German Impressum leads?

The Actor uses pay-per-event pricing: a small fee per run start plus a fee per extracted result. The currently configured prices per subscription plan are:

| Event | Free | Bronze | Silver | Gold | Platinum | Diamond |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Run start | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 | $0.005 |
| Result extracted | $0.0006508 | $0.00056591 | $0.00044141 | $0.00033955 | $0.00022636 | $0.00015845 |

Example estimates before Apify platform fees: 100 extracted domains cost about $0.070 on Free, $0.062 on Bronze, and $0.039 on Gold, including the start event. The default two-domain prefill costs about $0.0063 on Free, so it stays suitable for a quick first test.
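
The estimates above follow from adding the run-start fee to the per-result fee multiplied by the number of results. A minimal Python sketch of that arithmetic, with prices copied from the table; `estimate_cost` is a hypothetical helper name used only for illustration, not part of any Apify library:

```python
# Per-event prices in USD, copied from the pricing table above.
RUN_START_FEE = 0.005
RESULT_FEE = {
    "Free": 0.0006508,
    "Bronze": 0.00056591,
    "Silver": 0.00044141,
    "Gold": 0.00033955,
    "Platinum": 0.00022636,
    "Diamond": 0.00015845,
}


def estimate_cost(results: int, plan: str = "Free") -> float:
    """Estimate the event cost of one run: start fee plus per-result fees."""
    return RUN_START_FEE + results * RESULT_FEE[plan]


for plan in ("Free", "Bronze", "Gold"):
    print(f"{plan}: ${estimate_cost(100, plan):.3f} for 100 results")
```

Prices may change, so treat the constants as a snapshot of the table rather than a live source.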

### Input

Provide domains or URLs in `startUrls`.

```json
{
  "startUrls": [
    { "url": "https://www.rewe.de" },
    { "url": "https://www.dm.de" }
  ],
  "maxPagesPerDomain": 8,
  "includeSubpages": true,
  "proxyConfiguration": { "useApifyProxy": false }
}
````
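
If your source list mixes bare domains and full URLs, a small helper can turn it into the `startUrls` shape shown above. This is only a sketch: `to_start_urls` is a hypothetical name, and prefixing bare domains with `https://` is an assumption made here for consistency; the Actor is documented to accept bare domains as well.

```python
def to_start_urls(domains):
    """Convert raw domains or URLs into a de-duplicated startUrls list."""
    seen = set()
    start_urls = []
    for raw in domains:
        value = raw.strip()
        if not value:
            continue  # skip empty lines from pasted lists
        # Assumption: bare domains are served over HTTPS.
        url = value if "://" in value else f"https://{value}"
        key = url.lower().rstrip("/")
        if key not in seen:
            seen.add(key)
            start_urls.append({"url": url})
    return start_urls


actor_input = {
    "startUrls": to_start_urls(["rewe.de", " https://www.dm.de ", "rewe.de", ""]),
    "maxPagesPerDomain": 8,
    "includeSubpages": True,
}
print(actor_input["startUrls"])
```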

### Output

Each dataset item represents one submitted domain or URL.

```json
{
  "inputUrl": "https://www.rewe.de",
  "inputDomain": "rewe.de",
  "imprintUrl": "https://www.rewe.de/service/impressum/",
  "status": "found",
  "companyName": "REWE Markt GmbH",
  "legalForm": "GmbH",
  "emails": ["impressum@rewe.de"],
  "vatId": "DE812706034",
  "registrationNumber": "HRB 66773",
  "confidenceFlags": ["company_name_found", "email_found"]
}
```

### How to use it

1. Prepare a list of German domains or websites.
2. Paste them into the Start URLs field.
3. Keep `maxPagesPerDomain` low for quick enrichment.
4. Run the Actor.
5. Export the dataset as JSON, CSV, Excel, or via API.
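
If you export JSON and need a flat file for a CRM import, list-valued fields have to be joined into single cells. The sketch below uses only the Python standard library. The column selection, the `"; "` separator, and the helper name `to_csv` are illustrative choices, and it assumes that `emails`, `phoneNumbers`, `managingDirectors`, and `confidenceFlags` arrive as lists:

```python
import csv
import io

# Assumption: these fields are lists in the dataset items.
LIST_FIELDS = ("emails", "phoneNumbers", "managingDirectors", "confidenceFlags")
COLUMNS = (
    "inputUrl", "imprintUrl", "status", "companyName",
    "legalForm", "vatId", "registrationNumber",
) + LIST_FIELDS


def to_csv(items) -> str:
    """Flatten dataset items into CSV text, joining list fields with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {column: item.get(column, "") for column in COLUMNS}
        for field in LIST_FIELDS:
            value = item.get(field) or []
            if isinstance(value, str):
                value = [value]  # tolerate a single string instead of a list
            row[field] = "; ".join(value)
        writer.writerow(row)
    return buffer.getvalue()
```

Call `to_csv(items)` with the items returned by the dataset API and write the result to a file.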

### Tips for better results

- Submit homepages, not random blog posts.
- Keep `includeSubpages` enabled so footer Impressum links are followed.
- Start without a proxy; most public legal pages are accessible directly.
- Increase `maxPagesPerDomain` only for sites with unusual navigation.

### Status values

- `found` means an Impressum/contact page was located and parsed.
- `not_found` means pages were checked but no legal-contact page scored high enough.
- `error` means the domain could not be processed due to a network or parsing error.
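
A quick way to judge a run is to count records per status. The sketch below assumes `items` is the list of dataset records; `summarize_status` is a hypothetical helper name, and the `"missing"` bucket is an assumption for records without a `status` field:

```python
from collections import Counter


def summarize_status(items):
    """Count dataset records per status value."""
    return Counter(item.get("status", "missing") for item in items)


items = [
    {"inputUrl": "https://www.rewe.de", "status": "found"},
    {"inputUrl": "https://www.dm.de", "status": "found"},
    {"inputUrl": "https://example.de", "status": "not_found"},
]
print(summarize_status(items))
```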

### Confidence flags

Confidence flags help filter records:

- `company_name_found`
- `address_found`
- `email_found`
- `phone_found`
- `vat_id_found`
- `registration_found`
- `decision_maker_found`

### Integrations

Use the output with:

- HubSpot or Salesforce enrichment workflows
- Clay tables and lead-routing systems
- Google Sheets lead lists
- Compliance review queues
- Internal data-quality checks

### API usage: Node.js

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('automation-lab/german-imprint-leads-scraper').call({
  startUrls: [{ url: 'https://www.rewe.de' }],
  maxPagesPerDomain: 8,
});
console.log(run.defaultDatasetId);
```

### API usage: Python

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('automation-lab/german-imprint-leads-scraper').call(run_input={
    'startUrls': [{'url': 'https://www.rewe.de'}],
    'maxPagesPerDomain': 8,
})
print(run['defaultDatasetId'])
```

### API usage: cURL

```bash
curl -X POST 'https://api.apify.com/v2/acts/automation-lab~german-imprint-leads-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"startUrls":[{"url":"https://www.rewe.de"}],"maxPagesPerDomain":8}'
```

### MCP usage

Connect the Apify MCP server with this Actor enabled:

`https://mcp.apify.com/?tools=automation-lab/german-imprint-leads-scraper`

Claude Code setup:

```bash
claude mcp add --transport http apify-german-imprint "https://mcp.apify.com/?tools=automation-lab/german-imprint-leads-scraper"
```

Claude Desktop JSON config:

```json
{
  "mcpServers": {
    "apify-german-imprint": {
      "url": "https://mcp.apify.com/?tools=automation-lab/german-imprint-leads-scraper"
    }
  }
}
```

Example prompts:

- "Extract Impressum contacts for these 20 German domains."
- "Find VAT IDs and managing directors for this German prospect list."
- "Check which domains have no public legal contact details."

### Legality

This Actor extracts publicly available business information from websites you provide. You are responsible for using the data lawfully, respecting website terms, and complying with GDPR, ePrivacy, and other applicable rules.

### FAQ

#### Why did one domain return `not_found`?

The site may use a non-standard legal page URL, block automated HTTP clients, or render legal data only in JavaScript. Try submitting the exact Impressum URL or increasing `maxPagesPerDomain`.
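
One way to act on this advice is to collect the `not_found` records from a finished run and build a second input with a higher page limit. A sketch under stated assumptions: `build_retry_input` is a hypothetical helper, and the limit is clamped to the 1 to 25 range given in the input schema.

```python
def build_retry_input(items, max_pages=25):
    """Build a follow-up Actor input for domains that returned not_found."""
    retry_urls = [
        {"url": item["inputUrl"]}
        for item in items
        if item.get("status") == "not_found"
    ]
    return {
        "startUrls": retry_urls,
        # The input schema allows 1 to 25 pages per domain.
        "maxPagesPerDomain": max(1, min(max_pages, 25)),
        "includeSubpages": True,
    }
```

Where you already know the exact Impressum URL, replace the `url` value for that domain before starting the retry run.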

#### Does this Actor validate email deliverability?

No. It extracts public emails from pages. Use a dedicated email validation service if you need deliverability checks.

### Troubleshooting

If a site returns no data, try raising `maxPagesPerDomain` or submitting the exact Impressum URL.

If many requests fail, enable Apify Proxy or retry later. Some sites block automated traffic intermittently.

### Related scrapers

- https://apify.com/automation-lab/website-contact-finder
- https://apify.com/automation-lab/website-emails-scraper
- https://apify.com/automation-lab/gelbeseiten-scraper
- https://apify.com/automation-lab/wlw-de-supplier-directory-scraper

### Limitations

The Actor uses plain HTTP requests and Cheerio for speed and low cost. Because pages are not rendered in a browser, JavaScript-only pages may expose fewer fields than a browser-based scraper would.

### Privacy notes

The Actor does not log in, bypass paywalls, or access private systems. It only reads public pages reachable from submitted domains.

### Changelog

Initial version extracts German Impressum legal-contact fields from submitted domains and URLs.

### Support

If you need fields tuned for a specific German industry or CMS pattern, open an Apify issue with sample URLs and expected output.

### Field reference

`pagesChecked` lists every URL requested for the domain. `sourceSnippets` contains nearby text around key legal labels so users can audit extraction quality.

### Performance

HTTP-only crawling keeps runs lightweight. The default platform memory is 512 MB, and the number of pages requested per domain is capped by `maxPagesPerDomain` (default 8).

### Data quality workflow

Use `confidenceFlags` to route complete leads into your CRM and send lower-confidence rows to manual review.
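
A minimal sketch of such routing, assuming `confidenceFlags` is a list of strings as in the output example. Which flags count as "complete" is a choice for your own pipeline; the `REQUIRED_FLAGS` set and the `route_leads` name below are illustrative assumptions.

```python
# Assumption: a lead is CRM-ready when these three flags are present.
REQUIRED_FLAGS = frozenset({"company_name_found", "address_found", "email_found"})


def route_leads(items, required=REQUIRED_FLAGS):
    """Split records into CRM-ready leads and rows needing manual review."""
    crm_ready, manual_review = [], []
    for item in items:
        flags = set(item.get("confidenceFlags") or [])
        if required <= flags:  # every required flag is present
            crm_ready.append(item)
        else:
            manual_review.append(item)
    return crm_ready, manual_review
```

Keeping the required set as a parameter lets different teams apply stricter rules, for example adding `vat_id_found` for compliance checks.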

# Actor input Schema

## `startUrls` (type: `array`):

German websites to scan. You can enter bare domains (example.de) or full URLs. The actor checks common Impressum/contact paths and footer links.

## `maxPagesPerDomain` (type: `integer`):

Maximum pages checked for each domain (1 to 25, default 8), including homepage, common Impressum paths, and discovered contact/legal footer links.

## `includeSubpages` (type: `boolean`):

When enabled, the actor follows homepage links whose text or href looks like Impressum, Imprint, Kontakt, Contact, Legal, or Anbieterkennzeichnung.

## `proxyConfiguration` (type: `object`):

Optional proxy settings. Most German company sites work without a proxy; enable Apify Proxy only if your target list blocks datacenter traffic.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.rewe.de"
    },
    {
      "url": "https://www.dm.de"
    }
  ],
  "maxPagesPerDomain": 8,
  "includeSubpages": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
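
Before starting a run, it can help to check an input object locally against the constraints stated in the OpenAPI specification of this Actor: `startUrls` is required and holds objects with a `url` string, and `maxPagesPerDomain` is an integer from 1 to 25. The `validate_input` function below is a hypothetical helper sketched for illustration, not part of the Apify client.

```python
def validate_input(actor_input: dict) -> list:
    """Return a list of problems found in an Actor input (empty if none)."""
    errors = []

    start_urls = actor_input.get("startUrls")
    if not isinstance(start_urls, list):
        errors.append("startUrls is required and must be a list")
    else:
        for index, entry in enumerate(start_urls):
            if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
                errors.append(f"startUrls[{index}] must be an object with a string 'url'")

    max_pages = actor_input.get("maxPagesPerDomain", 8)  # schema default is 8
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or not 1 <= max_pages <= 25:
        errors.append("maxPagesPerDomain must be an integer between 1 and 25")

    if not isinstance(actor_input.get("includeSubpages", True), bool):
        errors.append("includeSubpages must be a boolean")

    return errors
```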

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.rewe.de"
        },
        {
            "url": "https://www.dm.de"
        }
    ],
    "maxPagesPerDomain": 8,
    "includeSubpages": true,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/german-imprint-leads-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [
        { "url": "https://www.rewe.de" },
        { "url": "https://www.dm.de" },
    ],
    "maxPagesPerDomain": 8,
    "includeSubpages": True,
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/german-imprint-leads-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.rewe.de"
    },
    {
      "url": "https://www.dm.de"
    }
  ],
  "maxPagesPerDomain": 8,
  "includeSubpages": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call automation-lab/german-imprint-leads-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/german-imprint-leads-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "German Imprint Leads Scraper",
        "description": "Extract German Impressum legal contacts, company details, VAT IDs, HRB records, emails, and decision-makers from domains.",
        "version": "0.1",
        "x-build-id": "oiDNcu0YflEJFs0eN"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~german-imprint-leads-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-german-imprint-leads-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~german-imprint-leads-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-german-imprint-leads-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~german-imprint-leads-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-german-imprint-leads-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Domains or URLs",
                        "type": "array",
                        "description": "German websites to scan. You can enter bare domains (example.de) or full URLs. The actor checks common Impressum/contact paths and footer links.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxPagesPerDomain": {
                        "title": "Max pages per domain",
                        "minimum": 1,
                        "maximum": 25,
                        "type": "integer",
                        "description": "Maximum pages checked for each domain, including homepage, common Impressum paths, and discovered contact/legal footer links.",
                        "default": 8
                    },
                    "includeSubpages": {
                        "title": "Follow discovered legal/contact links",
                        "type": "boolean",
                        "description": "When enabled, the actor follows homepage links whose text or href looks like Impressum, Imprint, Kontakt, Contact, Legal, or Anbieterkennzeichnung.",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional proxy settings. Most German company sites work without a proxy; enable Apify Proxy only if your target list blocks datacenter traffic.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
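
The `run-sync-get-dataset-items` path from the specification above can also be called without a client library. The following sketch uses only the Python standard library. It sends the token in an `Authorization: Bearer` header instead of the `token` query parameter listed in the specification; the function names and the 300-second timeout are illustrative assumptions.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "automation-lab~german-imprint-leads-scraper"


def build_sync_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Prepare a POST request that runs the Actor and returns dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_sync(actor_input: dict, token: str, timeout: float = 300):
    """Send the request and return the parsed JSON response (network call)."""
    request = build_sync_request(actor_input, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Read the token from an environment variable such as `APIFY_TOKEN` rather than hard-coding it, then call `run_sync({"startUrls": [{"url": "https://www.rewe.de"}]}, token)`.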
