Willhaben.at [Only $1💰] Marktplatz-Immobilien-Auto-Job Scraper

Pricing

from $1.00 / 1,000 results

💰 $1 per 1,000 results, unlimited extraction. Extract structured data from willhaben.at across Marktplatz, Immobilien, Auto & Motor, and Jobs. Impit TLS. Fields: title, price/rent, area, rooms, address, images, contact, listing IDs; search and detail trace URLs. Jobs: role, company, location, emails.

Developer

Muhamed Didovic

Maintained by Community

Overview

Extract structured listings from willhaben.at (Austria). The site is organised into four main areas—Marktplatz, Immobilien, Auto & Motor, and Jobs—as shown in the main navigation. The actor calls Willhaben’s list APIs (search results) and public detail JSON for each listing using impit (Chrome TLS fingerprinting). You get one dataset row per listing after the detail step, with a normalised flat schema plus Willhaben attribute keys (and optional widget-derived fields) for spreadsheets.

Use it to monitor ads, export price and location data, or feed analytics; Jobs runs use a dedicated job object shape (see Output Structure).


Features

  • Four supported verticals (aligned with Willhaben’s top navigation):

    • Marktplatz — classifieds under /iad/kaufen-und-verkaufen/… (marketplace).
    • Immobilien — real estate under /iad/immobilien/….
    • Auto & Motor — vehicles under /iad/auto/… (and related iad auto paths).
    • Jobs — job search under /jobs/…, resolved internally to Willhaben’s jobs list JSON endpoint.
  • List + detail pipeline:

    • List: iad www URLs are turned into search JSON requests; Jobs www URLs into a separate jobs search JSON list.
    • Detail: each hit is followed by a public detail JSON request (listing or job id) until maxItems caps queued detail requests per startUrls entry (each www link has its own budget).
  • Pagination:

    • iad / search: page on the list request (query order preserved; slashed keys like ESTATE_SIZE/LIVING_AREA_FROM are not re-encoded).
    • Jobs: page on the jobs list request (1-based); further pages enqueued from rowsFound / rowsRequested.
  • Flattened export (flattenOutput: true):

    • One flat object per listing: mapped fields plus all listingDocument.attributesFlat keys at the top level (collisions → attr_<name>). The sample below reflects this mode.
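The collision rule for flattened rows can be pictured with a short sketch. This is an illustration only: `flattenListing` is a hypothetical helper, not the actor's code, and it assumes that a mapped field keeps its name while a clashing attribute key receives the attr_ prefix.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: merge mapped fields with raw Willhaben attributes.
function flattenListing(mapped: Row, attributesFlat: Row): Row {
  const out: Row = { ...mapped };
  for (const key of Object.keys(attributesFlat)) {
    // Assumption: the mapped field wins and the attribute is stored as attr_<name>.
    const taken = Object.prototype.hasOwnProperty.call(out, key);
    out[taken ? `attr_${key}` : key] = attributesFlat[key];
  }
  return out;
}

const flat = flattenListing(
  { title: "Erdgeschosswohnung mit Garten", rentGross: 1300 },
  { HEADING: "Erdgeschosswohnung mit Garten", PRICE: "1300", title: "raw title" },
);
```

Here `flat` keeps the mapped title, gains HEADING and PRICE at the top level, and stores the clashing attribute under attr_title.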

How to Use

  1. Set Up: Create an Apify account and open this actor (or run it locally with apify run / npm run start:dev).
  2. Provide Input: Add one or more Willhaben URLs under startUrls (https://www.willhaben.at/…).
  3. Configure: Set maxItems (cap on detail requests queued per start URL), flattenOutput, concurrency, retries, and proxy (a residential proxy with apifyProxyCountry: AT often works well).
  4. Run & Export: Download JSON / CSV from the dataset. If list or detail returns 403, refresh the request headers / signatures from a capture or use the request-signature env vars documented in the repo.

Usage Limitations

Free / non-paying Apify users may be subject to platform limits on dataset items or charges. Paid users typically get higher limits; adjust maxItems to control how many detail pages are fetched per start URL (each link in startUrls can queue up to maxItems details). Willhaben may rate-limit or block datacenter IPs—proxy is recommended (e.g. RESIDENTIAL with AT).


Input Configuration

Example input:

{
  "startUrls": [
    {
      "url": "https://www.willhaben.at/jobs/suche?employment_type=109&location=Salzburg&region=14096"
    },
    {
      "url": "https://www.willhaben.at/iad/immobilien/mietwohnungen/mietwohnung-angebote?sfId=0db5c6aa-6f06-4760-a077-5d2d88453916&rows=30&areaId=5&page=1&PRICE_FROM=150&PRICE_TO=1300"
    },
    {
      "url": "https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/kinderfeste-kinderfeiern-4282/a/zustand-neu-22?sfId=60dce79c-cd0d-421b-ab8a-d278d9dea396&rows=30&isNavigation=true&PRICE_FROM=0&PRICE_TO=1"
    }
  ],
  "maxItems": 30,
  "flattenOutput": true,
  "maxConcurrency": 50,
  "minConcurrency": 1,
  "maxRequestRetries": 100,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "AT"
  }
}

Input Fields Explanation

  • startUrls: Array of objects, each with a url pointing to a www.willhaben.at listing hub: Jobs (/jobs/…), or Immobilien, Marktplatz, or Auto under /iad/…. The actor converts each one to the appropriate list URL.
  • maxItems: Maximum number of listings for which a detail request is queued, counted separately for each object in startUrls (keyed by that entry’s url). Two start URLs with maxItems: 30 can yield up to 60 detail scrapes. Default 30 (see actor schema).
  • flattenOutput: When true, each row is a single flat object (mapped fields plus the flat attribute index at the top level for iad verticals); Jobs rows add willhabenKind: jobs and source: willhaben_jobs_detail to the job object. When false, iad rows include nested listingDocument and actorMeta; Jobs rows are the job object plus the apify_* fields only. The schema default is false; the Immobilien sample below uses the flattened shape, and data-jobs.json shows flattened Jobs.
  • maxConcurrency: Maximum number of list and detail requests in flight at once (shared queue). minConcurrency in the schema is ignored by this actor.
  • maxRequestRetries: Retries per failed list/detail impit fetch (with proxy rotation on 401/403/429).
  • proxy: Apify proxy or custom configuration; apifyProxyCountry: AT is a common choice for Willhaben.
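The per-start-URL budget described for maxItems can be sketched as a counter keyed by the start URL. `createDetailBudget` and the placeholder keys below are hypothetical names for illustration; the actor's own bookkeeping may differ.

```typescript
// Hypothetical sketch: one detail-request budget per start URL.
function createDetailBudget(maxItems: number) {
  const queued = new Map<string, number>();
  return {
    // Returns true (and counts the request) while this start URL still has budget.
    tryQueue(startUrl: string): boolean {
      const used = queued.get(startUrl) ?? 0;
      if (used >= maxItems) return false;
      queued.set(startUrl, used + 1);
      return true;
    },
  };
}

// Two start URLs with 40 list hits each and maxItems = 30.
const budget = createDetailBudget(30);
let queuedTotal = 0;
for (const startUrl of ["start-url-a", "start-url-b"]) {
  for (let hit = 0; hit < 40; hit++) {
    if (budget.tryQueue(startUrl)) queuedTotal++;
  }
}
```

With these numbers `queuedTotal` ends at 60, matching the "two start URLs, up to 60 detail scrapes" arithmetic above.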

Output Structure

The dataset contains one row per listing (or one row per job) after the detail step.

iad / Auto / Marktplatz / Immobilien (not Jobs):

  • source: willhaben_detail
  • willhabenKind: real_estate, marketplace, or car
  • With flattenOutput: true, see the first sample below (data.json).
  • With flattenOutput: false, the same logical data appears under nested listingDocument and actorMeta.

Jobs:

  • Native-style job object fields (id, title, company, employmentModes, jobLocations, …) plus apify_scrapedAt and apify_extracted_emails.
  • With flattenOutput: true, rows also include willhabenKind: jobs and source: willhaben_jobs_detail (see data-jobs.json sample below).
  • With flattenOutput: false, rows are only the job fields plus the two apify_* fields (no listingDocument).

Filter non-Jobs rows with source === 'willhaben_detail'. Filter Jobs with willhabenKind === 'jobs' (and optionally source === 'willhaben_jobs_detail' when flattened).
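As a minimal sketch of those two filters, assuming the dataset has already been downloaded and parsed into an array (the `DatasetRow` type and the two sample rows are illustrative, trimmed from the samples in this README):

```typescript
interface DatasetRow {
  source?: string;
  willhabenKind?: string;
  [key: string]: unknown;
}

// Non-Jobs rows come from the list -> detail pipeline.
function isListingRow(row: DatasetRow): boolean {
  return row.source === "willhaben_detail";
}

// Jobs rows carry willhabenKind only in flattened runs.
function isJobRow(row: DatasetRow): boolean {
  return row.willhabenKind === "jobs";
}

const rows: DatasetRow[] = [
  { source: "willhaben_detail", willhabenKind: "real_estate", title: "Erdgeschosswohnung mit Garten" },
  { source: "willhaben_jobs_detail", willhabenKind: "jobs", title: "Zimmerer Hilfskraft (m/w/d)" },
];

const listings = rows.filter(isListingRow);
const jobs = rows.filter(isJobRow);
```

Since unflattened Jobs rows carry neither label according to the description above, `!isListingRow(row)` may be the more practical Jobs test for runs with flattenOutput: false.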


Sample: willhaben_detail (first object in data.json)

The JSON below is based on the first record of a real export (willhabenKind: real_estate, flattenOutput: true). Long strings, internal URLs, and image arrays are shortened/redacted for the README; the on-disk file contains the full values. _readme_note is documentation-only and does not appear in live output.

{
  "source": "willhaben_detail",
  "willhabenKind": "real_estate",
  "listingId": null,
  "url": null,
  "platform": "willhaben",
  "scrapedAt": "2026-04-07T06:43:48.489Z",
  "publishedAt": null,
  "updatedAt": null,
  "objectType": null,
  "transactionType": "rent",
  "title": "Erdgeschosswohnung mit Garten",
  "description": null,
  "price": null,
  "priceText": null,
  "currency": "EUR",
  "pricePerM2": null,
  "rentGross": 1300,
  "rentNet": null,
  "rentPerM2": null,
  "operatingCosts": null,
  "heatingCosts": null,
  "parkingCosts": null,
  "additionalCostsTotal": null,
  "livingAreaM2": null,
  "usableAreaM2": null,
  "totalAreaM2": null,
  "plotAreaM2": null,
  "roomCount": null,
  "bedroomCount": null,
  "bathroomCount": null,
  "floor": null,
  "street": null,
  "postalCode": null,
  "city": null,
  "district": null,
  "region": null,
  "country": null,
  "latitude": null,
  "longitude": null,
  "hasSeaView": false,
  "distanceToSea": null,
  "distanceToCenter": null,
  "hasMountainView": false,
  "yearBuilt": null,
  "condition": null,
  "energyCertificateAvailable": false,
  "energyClass": null,
  "heatingType": null,
  "hasBalcony": false,
  "hasTerrace": false,
  "hasLoggia": false,
  "hasGarden": false,
  "hasYard": false,
  "hasPool": false,
  "hasGarage": false,
  "hasCarport": false,
  "hasParkingSpace": false,
  "hasStorageRoom": false,
  "hasElevator": false,
  "hasBasement": false,
  "hasAirConditioning": false,
  "hasBuiltInKitchen": false,
  "isBarrierFree": false,
  "sellerName": null,
  "sellerType": null,
  "phone": null,
  "email": null,
  "imageUrls": [],
  "videoUrls": [],
  "locationText": null,
  "rawHtmlSnippet": null,
  "apify_actor": "willhaben-cheerio",
  "apify_scrapedAt": "2026-04-07T06:43:48.489Z",
  "detailUrl": "[redacted internal detail URL]",
  "originalInputUrl": "[redacted input URL]",
  "listPage": 1,
  "listingDocumentId": "1355309481",
  "listingDocumentUuid": "95079344-3e17-4b08-8be2-b34e9d305250",
  "LOCATION": "Aigelsbrunn",
  "POSTCODE": "5204",
  "STATE": "Salzburg",
  "BODY_DYN": "Zur Vermietung gelangt eine moderne und gepflegte Erdgeschosswohnung …",
  "ORG_UUID": "0bf5c502-ee07-4290-aa9d-c82a3667e9e5",
  "ESTATE_SIZE/LIVING_AREA": "77",
  "DISTRICT": "Salzburg-Umgebung",
  "HEADING": "Erdgeschosswohnung mit Garten",
  "LOCATION_QUALITY": "1.0",
  "PUBLISHED": "1775498880000",
  "COUNTRY": "Österreich",
  "LOCATION_ID": "113901",
  "PROPERTY_TYPE": "Erdgeschoßwohnung",
  "NUMBER_OF_ROOMS": "3",
  "ADTYPE_ID": "2",
  "PROPERTY_TYPE_ID": "105",
  "ADID": "1355309481",
  "ORGID": "24712601",
  "SEO_URL": "immobilien/d/mietwohnungen/salzburg/salzburg-umgebung/erdgeschosswohnung-mit-garten-1355309481/",
  "ALL_IMAGE_URLS": "1/135/530/9481_317321052.jpg;1/135/530/9481_731380688.jpg;…",
  "PUBLISHED_String": "2026-04-06T20:08:00Z",
  "ESTATE_PREFERENCE": "15, 24, 250, 27, 28",
  "categorytreeids": "7276",
  "RENT/PER_MONTH_LETTINGS": "1300.0",
  "PRODUCT_ID": "200",
  "MMO": "1/135/530/9481_317321052.jpg",
  "ROOMS": "3X3",
  "AD_UUID": "95079344-3e17-4b08-8be2-b34e9d305250",
  "ADDRESS": "Haidach 14",
  "COORDINATES": "47.99015,13.24095",
  "PRICE": "1300",
  "PRICE_FOR_DISPLAY": "€ 1.300",
  "ESTATE_SIZE": "77",
  "ISPRIVATE": "1",
  "PROPERTY_TYPE_FLAT": "true",
  "DISPLAY/Gesamtmiete": "€ 1.300",
  "DISPLAY/Wohnfläche": "77 m²",
  "DISPLAY/Zimmer": "3",
  "WIDGET_TEXT/Objektstandort": "Haidach 14,\n5204 Aigelsbrunn, Salzburg-Umgebung, Salzburg",
  "WIDGET/Objektinformation/Objekttyp": "Erdgeschoßwohnung",
  "WIDGET/Objektinformation/Verfügbar": "nach Vereinbarung",
  "WIDGET/Objektinformation/Bautyp": "Altbau",
  "WIDGET/Objektinformation/Heizung": "Pellets",
  "WIDGET/Objektinformation/Zustand": "Renoviert",
  "DESCRIPTION_FROM_WIDGET": "Zur Vermietung gelangt eine moderne …",
  "WIDGET_TEXT/Lage": "Die Wohnung befindet sich in Haidach …",
  "WIDGET_TEXT/untitled": "Privatperson",
  "ALL_IMAGE_REFERENCE_URLS": "https://cache.willhaben.at/mmo/1/135/530/9481_317321052.jpg;…",
  "IMAGE_REFERENCE_URL": [
    "https://cache.willhaben.at/mmo/1/135/530/9481_317321052.jpg",
    "https://cache.willhaben.at/mmo/1/135/530/9481_731380688.jpg"
  ],
  "_readme_note": "Omitted: remaining IMAGE_REFERENCE_URL entries and full widget text."
}

Output fields (willhaben_detail, flattened) — field-by-field

Row identity and crawl metadata

  • source — Always willhaben_detail for rows produced from the list → detail pipeline.
  • willhabenKind — Vertical classifier: real_estate, marketplace, car, or jobs (Jobs rows use a different top-level schema; see Output Structure).
  • listingId — Listing id in the mapped schema when the mapper fills it; may be null in flattened runs if the id appears only under ADID / listingDocumentId.
  • url — Public www URL for the listing when the mapper sets it; often null when only API ids are available.
  • platform — Always willhaben.
  • scrapedAt — ISO timestamp when the row was assembled (mapper / push time).

Mapped schema — publication and transaction

  • publishedAt — Normalised “published” time string when the mapper fills it from attributes; else null (raw publish values may still appear under PUBLISHED / PUBLISHED_String).
  • updatedAt — Last-updated field when mapped; else null.
  • objectType — High-level object type when mapped (e.g. property type); else null.
  • transactionType — buy or rent inferred from the iad path (e.g. Mietwohnung → rent).

Mapped schema — title, description, money

  • title — Listing heading from the mapper (often aligns with HEADING).
  • description — Long description in the mapped schema when extracted; else null (teaser may be in BODY_DYN or DESCRIPTION_FROM_WIDGET).
  • price — Numeric price when mapped as a single number; else null.
  • priceText — Human-readable price string when mapped; else null.
  • currency — Currency code when set (e.g. EUR).
  • pricePerM2 — Price per square metre when mapped; else null.
  • rentGross — Gross rent per month (or period per mapper rules) when applicable; e.g. 1300.
  • rentNet — Net rent when mapped; else null.
  • rentPerM2 — Rent per m² when mapped; else null.
  • operatingCosts — Nebenkosten / operating costs when mapped; else null.
  • heatingCosts — Heating costs when mapped; else null.
  • parkingCosts — Parking costs when mapped; else null.
  • additionalCostsTotal — Aggregated extra costs when mapped; else null.

Mapped schema — areas and rooms

  • livingAreaM2 — Living area in m² when mapped; else null (see ESTATE_SIZE / ESTATE_SIZE/LIVING_AREA).
  • usableAreaM2 — Usable area when mapped; else null.
  • totalAreaM2 — Total area when mapped; else null.
  • plotAreaM2 — Plot / land area when mapped; else null.
  • roomCount — Room count when mapped; else null (see NUMBER_OF_ROOMS / ROOMS).
  • bedroomCount — Bedrooms when mapped; else null.
  • bathroomCount — Bathrooms when mapped; else null.
  • floor — Floor / storey when mapped; else null.

Mapped schema — address and geo

  • street — Street line when mapped; else null (see ADDRESS).
  • postalCode — PLZ when mapped; else null (see POSTCODE).
  • city — City / Ort when mapped; else null (see LOCATION).
  • district — Bezirk when mapped; else null (see DISTRICT).
  • region — Bundesland / region when mapped; else null (see STATE).
  • country — Country when mapped; else null (see COUNTRY).
  • latitude — Latitude when mapped; else null (may parse from COORDINATES).
  • longitude — Longitude when mapped; else null.

Mapped schema — features (booleans / enums)

  • hasSeaView, hasMountainView, hasBalcony, hasTerrace, hasLoggia, hasGarden, hasYard, hasPool, hasGarage, hasCarport, hasParkingSpace, hasStorageRoom, hasElevator, hasBasement, hasAirConditioning, hasBuiltInKitchen, isBarrierFree — Feature flags from the mapper; false when not set from data.
  • distanceToSea, distanceToCenter — Distance fields when mapped; else null.
  • yearBuilt, condition, energyCertificateAvailable, energyClass, heatingType — Building / energy fields when mapped; else null / false.

Mapped schema — seller and media

  • sellerName — Advertiser / org display name when mapped; else null.
  • sellerType — Seller type when mapped (e.g. private vs dealer); else null.
  • phone, email — Contact fields when mapped; else null.
  • imageUrls — Array of main image URLs in the mapped schema (may be empty if images only appear under IMAGE_REFERENCE_URL / ALL_IMAGE_REFERENCE_URLS).
  • videoUrls — Video URLs when mapped; else [].
  • locationText — Free-text location line when mapped; else null.
  • rawHtmlSnippet — Optional HTML snippet when captured; else null.

Apify and traceability

  • apify_actor — Actor name (willhaben-cheerio).
  • apify_scrapedAt — ISO timestamp when the dataset row was written.
  • originalInputUrl — The www start URL from input that led to this crawl branch.
  • listPage — 1-based list page index on which this listing appeared.
  • listingDocumentId — Numeric ad id (string) in the merged listing document.
  • listingDocumentUuid — UUID of the advert in the merged listing document.

Willhaben attributes (flattened from listingDocument.attributesFlat)

  • LOCATION — Location label from search / advert attributes (e.g. locality name).
  • POSTCODE — Postal code (PLZ).
  • STATE — Austrian Bundesland (or state label).
  • BODY_DYN — Short dynamic body / teaser text from the listing payload.
  • ORG_UUID — Organisation UUID associated with the advertiser.
  • ESTATE_SIZE/LIVING_AREA — Living area as string (m²), from slashed attribute name.
  • DISTRICT — District (Bezirk) label.
  • HEADING — Listing headline from Willhaben attributes.
  • LOCATION_QUALITY — Internal quality / scoring field from the API when present.
  • PUBLISHED — Publish time in epoch milliseconds as string.
  • COUNTRY — Country label (e.g. Österreich).
  • LOCATION_ID — Willhaben location id.
  • PROPERTY_TYPE — Property type label (e.g. Erdgeschoßwohnung).
  • NUMBER_OF_ROOMS — Room count as string.
  • ADTYPE_ID — Ad type identifier.
  • PROPERTY_TYPE_ID — Property type identifier.
  • ADID — Advert id (string); same listing as listingDocumentId when aligned.
  • ORGID — Organisation id.
  • SEO_URL — SEO path segment for the listing on www.
  • ALL_IMAGE_URLS — Semicolon-separated relative image paths under Willhaben’s image cache convention.
  • PUBLISHED_String — ISO-style published timestamp string from the API.
  • ESTATE_PREFERENCE — Comma-separated preference / filter codes when present.
  • categorytreeids — Category tree ids for navigation / classification.
  • RENT/PER_MONTH_LETTINGS — Monthly rent as string from slashed attribute name.
  • PRODUCT_ID — Product vertical id (e.g. real estate product code).
  • MMO — Primary MMO image path (multi-media object key).
  • ROOMS — Encoded rooms bucket (e.g. 3X3) from Willhaben.
  • AD_UUID — Advert UUID (matches listingDocumentUuid when aligned).
  • ADDRESS — Street address line.
  • COORDINATES — lat,lon string when present.
  • PRICE — Raw price string from attributes.
  • PRICE_FOR_DISPLAY — Localised display price (e.g. € 1.300).
  • ESTATE_SIZE — Size string (often living area in m²).
  • ISPRIVATE — "1" / "0" style flag for private vs commercial when provided.
  • PROPERTY_TYPE_FLAT — String boolean for flat / apartment classification when present.
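Because these attributes arrive as strings, and the mapped fields (latitude, livingAreaM2, publishedAt, …) can be null in flattened rows, a consumer may want to parse the raw values itself. The helpers below are a hypothetical post-processing sketch, not part of the actor.

```typescript
// "47.99015,13.24095" -> { latitude, longitude }, or null when malformed.
function parseCoordinates(raw: string | undefined): { latitude: number; longitude: number } | null {
  if (!raw) return null;
  const [a, b, ...rest] = raw.split(",");
  if (a === undefined || b === undefined || rest.length > 0) return null;
  const latitude = Number.parseFloat(a);
  const longitude = Number.parseFloat(b);
  if (!Number.isFinite(latitude) || !Number.isFinite(longitude)) return null;
  return { latitude, longitude };
}

// PUBLISHED holds epoch milliseconds as a string; convert to a UTC ISO timestamp.
function epochMsToIso(raw: string | undefined): string | null {
  if (raw === undefined || raw.trim() === "") return null;
  const ms = Number(raw);
  return Number.isFinite(ms) ? new Date(ms).toISOString() : null;
}

// Numeric strings such as "77" or "1300.0" -> number, else null.
function toNumber(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

const coords = parseCoordinates("47.99015,13.24095");
const publishedIso = epochMsToIso("1775498880000"); // "2026-04-06T18:08:00.000Z"
const rent = toNumber("1300.0"); // 1300
```

In the sample row, PUBLISHED converts to 18:08 UTC while PUBLISHED_String reads 20:08 with a Z suffix. That two-hour gap suggests PUBLISHED_String may be local Austrian time, so the epoch value is probably the safer one to rely on for time zone arithmetic.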

Display and widget-derived keys (detail layout)

  • DISPLAY/Gesamtmiete, DISPLAY/Wohnfläche, DISPLAY/Zimmer — Human-readable lines extracted from TITLE_WITH_ATTRIBUTES-style widgets (labels depend on locale and vertical).
  • WIDGET_TEXT/Objektstandort — Free text from a widget section (here: object location block).
  • WIDGET/Objektinformation/Objekttyp, Verfügbar, Bautyp, Heizung, Zustand — Key–value pairs from KEY_VALUE_PAIRS_LIST widgets under Objektinformation.
  • DESCRIPTION_FROM_WIDGET — Long description assembled from PARAGRAPHED_TEXT / Beschreibung-style widget content when present.
  • WIDGET_TEXT/Lage — Location description paragraph from widgets.
  • WIDGET_TEXT/untitled — Widget paragraph where the section had no title (e.g. Privatperson).
  • ALL_IMAGE_REFERENCE_URLS — Semicolon-separated absolute cache URLs for images.
  • IMAGE_REFERENCE_URL — Array of absolute image URLs (same images as above, as a list).
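When only the relative ALL_IMAGE_URLS value is at hand, absolute URLs can be rebuilt from it. The base URL below is inferred from the sample row (relative path 1/135/530/9481_317321052.jpg versus its absolute counterpart) and is an assumption, not a documented constant; `imageUrlsFromAttribute` is a hypothetical helper.

```typescript
// Assumption: prefix inferred from the sample's IMAGE_REFERENCE_URL values.
const IMAGE_BASE = "https://cache.willhaben.at/mmo/";

// Split the semicolon-separated relative paths and prefix each with the cache base.
function imageUrlsFromAttribute(allImageUrls: string | undefined): string[] {
  if (!allImageUrls) return [];
  return allImageUrls
    .split(";")
    .map((path) => path.trim())
    .filter((path) => path.length > 0)
    .map((path) => IMAGE_BASE + path);
}

const imageUrls = imageUrlsFromAttribute(
  "1/135/530/9481_317321052.jpg;1/135/530/9481_731380688.jpg",
);
```

For the two paths above, the result matches the two IMAGE_REFERENCE_URL entries shown in the sample.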

Sample: Jobs (first object in data-jobs.json)

The JSON below is the first record of a real Jobs export (flattenOutput: true). _readme_note is documentation-only and does not appear in live output.

{
  "willhabenKind": "jobs",
  "source": "willhaben_jobs_detail",
  "id": 13158523,
  "title": "Zimmerer Hilfskraft (m/w/d)",
  "slugTitle": "zimmerer-hilfskraft-m-w-d",
  "description": "Zimmerer Hilfskraft (m/w/d)\nBruttogehalt: € 2.720 monatlich",
  "employmentTime": "ab sofort",
  "position": "Mitarbeiter:in",
  "firstPublishDate": "2026-04-07T01:50:00",
  "lastModifiedDate": "2026-04-07T01:50:00",
  "expiryDate": null,
  "lastReorderDate": "2026-04-07T01:50:00",
  "overpay": false,
  "forceExternalApplicationForm": true,
  "salary": 2720,
  "salaryTimeFrame": "monatlich",
  "isExpired": false,
  "employmentModes": ["Teilzeit", "Vollzeit"],
  "jobLocations": [
    {
      "name": "Sankt Johann im Pongau",
      "federalState": "Sankt Johann im Pongau",
      "country": "Österreich"
    }
  ],
  "languageSkills": [],
  "company": {
    "id": 79439,
    "title": "Maschinenring Personal u Service eGen",
    "slugTitle": "maschinenring-personal-u-service-egen",
    "type": "Firma",
    "uidNumber": null,
    "url": "https://www.willhaben.at/jobs/firma/personaldienstleister/79439",
    "logoUrl": "https://www.willhaben.at/jobs/api/v1/images/public/6355402?resolution=480",
    "industry": "Personaldienstleistungen",
    "address": null,
    "foundingYear": null,
    "employeeCountFrom": null,
    "employeeCountTo": null,
    "activeAdverts": null
  },
  "contact": {},
  "applyUrl": null,
  "apify_scrapedAt": "2026-04-07T06:42:52.366Z",
  "apify_extracted_emails": []
}

Output fields (Jobs) — field-by-field

Row labels (flattened Jobs runs only)

  • willhabenKind — Always jobs when this wrapper is present (flattenOutput: true).
  • source — willhaben_jobs_detail: row produced by the Jobs detail mapper (not the iad willhaben_detail pipeline).

Job identity and copy

  • id — Numeric Willhaben job advert id (same id used for the job detail request; see detailUrl in export).
  • title — Job title / headline (from detail widgets + list row).
  • slugTitle — URL-style slug derived from title (lowercase, hyphenated, diacritics normalised).
  • description — Short teaser plus optional lines such as Bruttogehalt parsed from detail widgets. Not the full long HTML description (that content is behind authenticated / WEB_VIEW flows on Willhaben’s side).
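The slug rule for slugTitle (lowercase, hyphenated, diacritics normalised) can be approximated as follows. This `slugify` is a sketch that reproduces the two slugs in the sample; how the actor transliterates umlauts or ß is not stated, so stripping combining marks is an assumption here.

```typescript
function slugify(title: string): string {
  return title
    .normalize("NFD") // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, "") // drop the marks (assumption: ä -> a, not ae)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

const jobSlug = slugify("Zimmerer Hilfskraft (m/w/d)"); // "zimmerer-hilfskraft-m-w-d"
const companySlug = slugify("Maschinenring Personal u Service eGen");
```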

Employment and dates

  • employmentTime — Start / availability label (e.g. ab sofort) from JOB_OFFER_DETAILS widget attributes when present.
  • position — Position level label (e.g. Mitarbeiter:in, Lehre) from the same widget strip.
  • firstPublishDate — First publish timestamp string derived from the list row (PUBLISHED_String) when available.
  • lastModifiedDate — Last modified string merged from ADVERT_INFO (“Zuletzt geändert”) and publish time when available.
  • expiryDate — Advert end date when the API exposes it; often null on the public widget payload.
  • lastReorderDate — Reorder / bump date aligned with publish when available; else mirrors firstPublishDate.

Salary and application flags

  • overpay — true when the salary text indicates Überzahlung / willingness to pay above scale.
  • forceExternalApplicationForm — true when the list row indicates the application is not only internal to Willhaben (isInternalApplication === false).
  • salary — Parsed gross amount as a number (e.g. 2720), when a € amount is found in Bruttogehalt / salary widget text.
  • salaryTimeFrame — monatlich, stündlich, etc., inferred from that text when possible.
  • isExpired — false by default for scraped active rows; true only if the mapper sets it from API data.
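A rough idea of how salary, salaryTimeFrame, and overpay could be derived from the salary text is sketched below. The regular expressions and the `parseSalary` name are assumptions for illustration; the actor's mapper may use different rules and a longer list of time frames.

```typescript
interface ParsedSalary {
  salary: number | null;
  salaryTimeFrame: string | null;
  overpay: boolean;
}

function parseSalary(text: string): ParsedSalary {
  // German number format: "." groups thousands, "," marks decimals.
  const amountMatch = /€\s*(\d[\d.]*(?:,\d+)?)/.exec(text);
  const rawAmount = amountMatch?.[1];
  const amount = rawAmount ? Number(rawAmount.replace(/\./g, "").replace(",", ".")) : NaN;
  // Assumption: only the two time frames named in this README are recognised.
  const frameMatch = /(monatlich|stündlich)/i.exec(text);
  const frame = frameMatch?.[1];
  return {
    salary: Number.isFinite(amount) ? amount : null,
    salaryTimeFrame: frame ? frame.toLowerCase() : null,
    overpay: /Überzahlung/i.test(text),
  };
}

const parsed = parseSalary("Zimmerer Hilfskraft (m/w/d)\nBruttogehalt: € 2.720 monatlich");
```

For the sample description this yields salary 2720, salaryTimeFrame "monatlich", and overpay false, in line with the sample row.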

Modes, locations, languages

  • employmentModes — Array of employment types (e.g. Teilzeit, Vollzeit, Freiberuflich) from the first attribute line of JOB_OFFER_DETAILS or from the list row’s EMPLOYMENT_TYPE.
  • jobLocations — Array of place objects for the job’s advertised locations:
    • name — Place label (city, region, or free-text fragment from the API).
    • federalState — Bundesland or same as name when only a coarse label exists (heuristic).
    • country — Österreich, Deutschland, etc., inferred from known regions (e.g. Bayern → Germany).
  • languageSkills — Array of language requirements when the API provides them; often [] on the public widget response.

Company (nested object)

  • company.id — Willhaben company id (parsed from company follow / agent links in widgets when present).
  • company.title — Legal or display company name.
  • company.slugTitle — Slug for the company profile URL.
  • company.type — e.g. Firma or Personaldienstleister when set; default Firma in the mapper when unknown.
  • company.uidNumber — Austrian UID when present; else null.
  • company.url — Link to the www company profile when a companyProfile context link exists.
  • company.logoUrl — Company logo URL on Willhaben’s www jobs CDN when present (often with a fixed resolution query).
  • company.industry — Industry / sector label from COMPANY_INFORMATION when present.
  • company.address — Structured address (street, zipCode, city, country) when the API provides it; else null.
  • company.foundingYear, employeeCountFrom, employeeCountTo, activeAdverts — Firmographics when present; often null on the public widget payload.

Contact and apply

  • contact — Object for contact person fields. Often {} when widgets do not expose name/email; if an email is found in description, contact.email may be set to the first match.
  • applyUrl — External application URL when discovered in the payload; often null (many jobs only link to login-gated content).

Apify-only

  • apify_scrapedAt — ISO timestamp when this dataset row was written.
  • apify_extracted_emails — Deduplicated email addresses regex-matched from title, description, and contact fields (order preserved).
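The email extraction can be pictured as a regex scan with order-preserving deduplication. The pattern and the lowercasing step below are assumptions for illustration (the actor's own pattern is not shown in this README), and the addresses are made-up examples.

```typescript
// Simplified email pattern; real-world addresses can be more varied.
const EMAIL_RE = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi;

function extractEmails(...texts: Array<string | null | undefined>): string[] {
  const seen = new Set<string>(); // a Set keeps first-seen insertion order
  for (const text of texts) {
    if (!text) continue;
    for (const match of text.match(EMAIL_RE) ?? []) {
      seen.add(match.toLowerCase()); // assumption: compare case-insensitively
    }
  }
  return Array.from(seen);
}

const emails = extractEmails(
  "Kontakt: jobs@example.com",
  "Mail an Jobs@Example.com oder hr@example.org",
  null,
);
```

Passing title, description, and contact strings in that order keeps the first occurrence of each address at the front of the result.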

Optional fields (not in this sample)

Some jobs may include extra keys when present in the source payload or mapper, e.g. professionalExperience (boolean). Treat the schema as stable core + optional extensions.


Benefits of the Willhaben scraper

  • Four verticals in one actor: Marktplatz, Immobilien, Auto & Motor, and Jobs, matching Willhaben’s main navigation.
  • List + detail: structured fields from detail JSON, not only search snippets.
  • Flatten mode for CSV-friendly exports: mapped columns plus raw Willhaben attribute keys on one row (iad verticals).
  • Traceability: detailUrl, originalInputUrl, listPage on iad rows; Jobs rows focus on job + company fields (trace fields are not duplicated on the job object).

Why Choose This Actor?

Built for Austrian marketplace and classifieds research on willhaben.at: discovery from www URLs, list JSON then detail JSON. Outputs suit warehouses, BI, or price monitoring.

Use cases:

  • Track rentals and sales in Immobilien with filters from the URL.
  • Export Marktplatz or Auto listings with attributes and images.
  • Collect Jobs in a native job JSON shape plus extracted emails.

Technical Implementation

  1. URL routing (willhaben-url.ts, willhaben-classify.ts): Detects Jobs (/jobs/) vs iad paths (Immobilien, Marktplatz, Auto); builds the correct internal list URL per vertical; preserves raw query encoding for slashed keys.
  2. Shared list/detail logic (willhaben-list-detail-logic.ts): processWillhabenListPage (per-origin maxItems, pagination) and pushWillhabenDetailFromJson.
  3. Pipeline (willhaben-internal-run.ts + main.ts): one FIFO queue for list and detail tasks (details enqueued before list follow-ups), maxConcurrency workers, impit only; proxy rotation on 401/403/429; retries from maxRequestRetries. Optional reference router: routes.ts (legacy Cheerio shape, not used at runtime).
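The routing step can be sketched as a path-prefix check on the start URL. This is not the code from willhaben-classify.ts: `classifyStartUrl` is a hypothetical function that covers only the four path prefixes named in this README, and the "related iad auto paths" mentioned under Features are left out.

```typescript
type WillhabenKind = "jobs" | "real_estate" | "marketplace" | "car";

function classifyStartUrl(rawUrl: string): WillhabenKind | null {
  // Accept only willhaben.at URLs and capture the path without query or fragment.
  const match = /^https?:\/\/(?:www\.)?willhaben\.at(\/[^?#]*)/i.exec(rawUrl);
  const path = match?.[1];
  if (!path) return null;
  if (path.startsWith("/jobs/")) return "jobs";
  if (path.startsWith("/iad/immobilien/")) return "real_estate";
  if (path.startsWith("/iad/kaufen-und-verkaufen/")) return "marketplace";
  if (path.startsWith("/iad/auto/")) return "car";
  return null; // unknown vertical or unsupported path
}

const jobsKind = classifyStartUrl(
  "https://www.willhaben.at/jobs/suche?employment_type=109&location=Salzburg&region=14096",
);
const estateKind = classifyStartUrl(
  "https://www.willhaben.at/iad/immobilien/mietwohnungen/mietwohnung-angebote?rows=30&areaId=5&page=1",
);
```

The returned labels reuse the willhabenKind values from Output Structure, so the same string can tag both the request and the resulting row.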

Explore More Scrapers

If you found this actor useful, check out other scrapers at memo23's Apify profile.

