Impressum Standby Scraper (Playwright Version)

Pricing

from $2.52 / 1,000 results

Scrape German imprint pages instantly, using a headless browser for dynamic, modern sites. This Apify Actor finds and extracts structured contact and legal data from any German website: company name, address, phone, fax, email, VAT ID, register number, social media links, and decision-makers.

Developer

Dominic M. Quaiser

Maintained by Community

German Imprint Scraper (Standby API)

Find and extract structured contact and legal information from German imprint pages ("Impressum") — in real time, one URL per request. Send a homepage URL to the actor's HTTP endpoint and it automatically discovers the site's imprint page and returns clean, structured data: company name, address, phone/fax, email, commercial register number, VAT ID, social media links, and decision-makers.

This actor runs in Apify Standby mode as a long-lived HTTP server. That makes it ideal for on-demand enrichment: low per-request latency, no run start-up overhead per URL, and a simple GET/POST API you can call directly from your application, a workflow tool, or another actor.

ℹ️ Which version is this?

This scraper is published in two variants, optimised for different kinds of websites:

🎭 Playwright version (this actor)

A headless-browser scraper that renders pages with a real Chromium engine. Use it for modern, JavaScript-heavy websites whose imprint links or content only appear after the page renders (e.g. Next.js / React apps). It is more robust but slower, and adds a small headless-browser charge per processed URL.

👉 Most imprint pages are plain server-rendered HTML and don't need a browser. For those, the HTTP version is faster and cheaper.

💡 Features

  • Automatic imprint-page discovery: point the actor at a homepage; it finds the correct "Impressum" page for you.
  • Selective data extraction: request only the fields you need, from basic contact info to ML-extracted decision-makers.
  • Real-time Standby API: GET or POST a single URL and get structured JSON back immediately. One request is processed at a time per container.
  • Proxy support: integrates with Apify Proxy for IP rotation and to reduce blocking.
  • Structured JSON output: clean, predictable records ready for your CRM, database, or downstream pipeline.

🔌 Standby API

In Standby mode the actor exposes an HTTP server. Apify gives every Standby actor a base URL; append the query parameters below and authenticate with your Apify API token (e.g. as a ?token= query parameter or Authorization: Bearer <token> header).

GET / — scrape one URL (query string)

| Parameter | Required | Description |
| --- | --- | --- |
| startUrl | Yes | Homepage URL to scrape. The actor discovers the imprint page automatically. https:// is prepended if the scheme is missing. |
| fieldsToExtract | No | Comma-separated list of fields to extract. Defaults to all fields. |
| metaData | No | true or false; whether to include extra technical details in the response. Default: false. |

curl 'https://dominic-quaiser--impressum-standby-scraper.apify.actor/?startUrl=https://www.renault.de/&fieldsToExtract=company_name,emails,phone_number&token=<APIFY_TOKEN>'
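When the GET URL is assembled in code rather than typed by hand, it is safer to percent-encode the query parameters, since startUrl is itself a URL. The following is a minimal sketch using only the Python standard library. The helper name build_get_url is hypothetical, and the base URL is copied from the curl example above; the parameter names come from the table.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the curl example; adjust if your Standby URL differs.
BASE_URL = "https://dominic-quaiser--impressum-standby-scraper.apify.actor/"


def build_get_url(start_url, fields=None, meta_data=False, token=None):
    """Assemble a GET URL for the Standby endpoint (hypothetical helper)."""
    params = {"startUrl": start_url}
    if fields:
        # The endpoint expects a comma-separated list in the query string.
        params["fieldsToExtract"] = ",".join(fields)
    if meta_data:
        params["metaData"] = "true"
    if token:
        # Alternatively, send the token as an Authorization: Bearer header.
        params["token"] = token
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return BASE_URL + "?" + urlencode(params)


url = build_get_url(
    "https://www.renault.de/",
    fields=["company_name", "emails", "phone_number"],
)
# Round-trip check: decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["startUrl"][0])
```

Leaving the token out of the URL and sending it as a header keeps it out of access logs and shell history, which may be preferable in production.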

POST / — scrape one URL (JSON body)

curl -X POST 'https://dominic-quaiser--impressum-standby-scraper.apify.actor/?token=<APIFY_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "startUrl": "https://www.renault.de/",
    "fieldsToExtract": ["company_name", "emails", "phone_number"],
    "metaData": false
  }'
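The same POST call can be prepared from Python. This sketch builds the request with the standard library and passes the token as an Authorization: Bearer header instead of a query parameter. The function name build_post_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

# Base URL taken from the curl example; adjust if your Standby URL differs.
BASE_URL = "https://dominic-quaiser--impressum-standby-scraper.apify.actor/"


def build_post_request(start_url, fields, token, meta_data=False):
    """Prepare a POST request with a JSON body (hypothetical helper)."""
    body = {
        "startUrl": start_url,
        # In the JSON body, fieldsToExtract is an array, not a comma-separated string.
        "fieldsToExtract": list(fields),
        "metaData": meta_data,
    }
    return Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_post_request(
    "https://www.renault.de/",
    ["company_name", "emails", "phone_number"],
    token="<APIFY_TOKEN>",
)
print(request.get_method(), request.full_url)
```

To send it, pass the object to urllib.request.urlopen(request, timeout=...) with a timeout that suits your application; browser-based scraping can take several seconds per URL.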

GET /health — health check & stats

Returns 200 with a snapshot of running counters (total requests, successful scrapes, errors, etc.). Useful for uptime checks.

curl 'https://dominic-quaiser--impressum-standby-scraper.apify.actor/health'

Responses

| Status | Meaning |
| --- | --- |
| 200 | Scrape completed. Body is { "url": ..., "result": { ... } }, or { "url": ..., "result": null, "message": "No data extracted" } when nothing could be extracted. |
| 400 | Missing or invalid startUrl, or an invalid JSON body. |
| 500 | Unhandled scraper error. |
| 504 | Processing timed out. |

Each successful result is also pushed to the actor's default dataset, so you can browse or export your scrape history from the Apify Console even when calling the API directly.
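A client usually wants to map these status codes onto its own error handling. The sketch below is one possible mapping, not part of the actor: the function name interpret_response and the choice of exception types are assumptions. It takes the HTTP status and the raw body text, so it works with any HTTP library.

```python
import json


def interpret_response(status, body_text):
    """Return the extracted result dict, or None if nothing was extracted.

    Raises an exception for the documented error statuses (hypothetical mapping).
    """
    if status == 200:
        payload = json.loads(body_text)
        # "result" is null when the scrape finished but found no data.
        return payload.get("result")
    if status == 400:
        raise ValueError("Missing or invalid startUrl, or invalid JSON body")
    if status == 504:
        # Timeouts may be worth retrying; the other errors usually are not.
        raise TimeoutError("Processing timed out")
    raise RuntimeError(f"Scraper error (HTTP {status})")


empty = interpret_response(
    200, '{"url": "https://example.de/", "result": null, "message": "No data extracted"}'
)
print(empty)
```

Treating "no data extracted" as None rather than as an error keeps batch enrichment loops simple: the caller can skip the record and continue.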

📊 Extractable data

Select any combination of the following fields via fieldsToExtract:

| Field | Description | Type |
| --- | --- | --- |
| company_name | The official company name, with a confidence score for the match. | Object |
| business_address | Full address parsed into full_address, street, house_number, postal_code, city. | Object |
| phone_number | One or more phone numbers, keyed phone_1, phone_2, … | Object |
| fax_number | One or more fax numbers, keyed fax_1, fax_2, … | Object |
| emails | One or more email addresses; emails matching the site's domain are prioritised. | Object |
| register_number | Commercial register number ("Handelsregisternummer") and the registration court ("Registergericht"). | Object |
| vat_id | German VAT ID ("Umsatzsteuer-ID") with checksum validation, e.g. DE123456788. | Object |
| social_media | Links to platforms like LinkedIn, Xing, Facebook, Instagram, etc. | Object |
| decision_makers | (Premium) Names of key decision-makers ("Entscheidungsträger") extracted via an external NER (Named Entity Recognition) model. | Array |
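The table mentions checksum validation for vat_id without describing the method. As an illustration only, the sketch below applies the ISO 7064 MOD 11,10 check that is commonly used for German VAT IDs; whether the actor uses exactly this routine is an assumption. The function name is hypothetical.

```python
import re


def is_valid_german_vat_id(vat_id):
    """Check the format DE + 9 digits and the ISO 7064 MOD 11,10 check digit."""
    cleaned = re.sub(r"\s+", "", vat_id).upper()
    if not re.fullmatch(r"DE\d{9}", cleaned):
        return False
    digits = [int(ch) for ch in cleaned[2:]]
    product = 10
    # Fold the first eight digits into a running product.
    for digit in digits[:8]:
        total = (digit + product) % 10
        if total == 0:
            total = 10
        product = (2 * total) % 11
    check = 11 - product
    if check == 10:
        check = 0
    # The ninth digit must equal the computed check digit.
    return check == digits[8]


print(is_valid_german_vat_id("DE123456788"))
```

The sample ID from the table passes this check, while changing its last digit makes it fail. A valid checksum only shows the number is well-formed; it does not confirm that the ID is registered.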

Numbered outputs (emails, phone numbers, …) are ordered by how likely each value is the company's main contact.
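Because the numbered keys encode a ranking, a consumer can recover the ordered list by sorting on the numeric suffix. The sketch below assumes keys always follow the prefix_N pattern shown in the table (phone_1, email_2, and so on); the helper names are hypothetical.

```python
def ordered_values(numbered):
    """Return values sorted by the numeric suffix of keys like 'phone_2'."""

    def suffix(key):
        # 'phone_10' -> 10; sorting numerically avoids '10' < '2' string ordering.
        return int(key.rsplit("_", 1)[1])

    return [numbered[key] for key in sorted(numbered, key=suffix)]


def primary_value(numbered):
    """Return the most likely main contact, or None if the object is empty."""
    values = ordered_values(numbered or {})
    return values[0] if values else None


phones = {"phone_2": "+493012345670", "phone_1": "+493012345678"}
print(primary_value(phones))
```

Sorting explicitly avoids relying on the key order of the JSON object, which some tools do not preserve.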

📤 Output structure

The exact fields depend on your fieldsToExtract selection.

{
  "start_url": "https://muster-firma.de/",
  "imprint_url": "https://muster-firma.de/impressum",
  "company_name": {
    "name": "Muster GmbH",
    "confidence": 1
  },
  "business_address": {
    "full_address": "Musterstraße 123, 12345 Berlin",
    "street": "Musterstraße",
    "house_number": "123",
    "postal_code": "12345",
    "city": "Berlin"
  },
  "phone_number": { "phone_1": "+493012345678" },
  "fax_number": { "fax_1": "+493012345679" },
  "emails": { "email_1": "kontakt@muster-firma.de" },
  "register_number": {
    "number": "HRB 12345 B",
    "court": "Amtsgericht Charlottenburg"
  },
  "vat_id": { "vat_id": "DE123456788" },
  "social_media": {
    "linkedin": "https://www.linkedin.com/company/muster-firma"
  },
  "decision_makers": ["Max Mustermann"],
  "metadata": {
    "domain": "muster-firma.de",
    "fetch_method": "http",
    "fallback_attempted": false,
    "scraped_at": "2026-06-22T12:04:48.003780"
  }
}

The metadata block is only included when metaData is enabled.
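Since the set of keys varies with fieldsToExtract, downstream code should not assume any particular field is present. As one possible approach for loading records into a CRM or CSV file, the sketch below flattens whatever nested objects are returned into dotted column names. The function name flatten_result and the dotted naming scheme are assumptions, not part of the actor.

```python
def flatten_result(result):
    """Flatten one level of nesting into 'parent.child' columns."""
    row = {}
    for key, value in result.items():
        if isinstance(value, dict):
            # e.g. business_address -> business_address.city, ...
            for sub_key, sub_value in value.items():
                row[f"{key}.{sub_key}"] = sub_value
        elif isinstance(value, list):
            # e.g. decision_makers -> one semicolon-separated cell
            row[key] = "; ".join(str(item) for item in value)
        else:
            row[key] = value
    return row


sample = {
    "start_url": "https://muster-firma.de/",
    "company_name": {"name": "Muster GmbH", "confidence": 1},
    "emails": {"email_1": "kontakt@muster-firma.de"},
    "decision_makers": ["Max Mustermann"],
}
row = flatten_result(sample)
print(row["company_name.name"])
```

Only keys that are actually present in the response produce columns, so the same function works for any field selection.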

You are solely responsible for determining the legality of your use of this actor and the data it generates. Scraping and handling data — particularly personal information — is subject to legal frameworks such as the GDPR (DSGVO), copyright law, and the terms of service of the sites you scrape. Ensure your use case is compliant with all applicable laws. This text is not legal advice.

GDPR notice: "Decision Makers" feature

The decision_makers feature uses an external API hosted on a private server in Europe (Germany) to process data.

  • What is processed: the text of the imprint page is sent to the API to identify personal names.
  • Why: the NER model needs the page text to accurately extract decision-makers.
  • Data controller: you, the user, are the data controller; the actor's developer acts as data processor for this task.
  • Location & compliance: all processing occurs within the EU and is subject to the GDPR (DSGVO).
  • Data storage: the text is processed in-memory and is not stored or logged on the external server.
  • Important: this processing is external to the Apify platform and not covered by Apify's DPA. By using this feature you acknowledge this separate processing activity.

🎯 Use cases

  • Lead generation — build targeted contact lists for sales and marketing.
  • Real-time enrichment — call the Standby API to enrich a record the moment a lead enters your CRM.
  • Compliance & verification — check for legally compliant imprint information.
  • Market research — aggregate company data for a specific industry or region.

🛠️ Maintainer