Willhaben.at [Only $1💰] Marktplatz-Immobilien-Auto-Job Scraper
Pricing
from $1.00 / 1,000 results
💰 $1 per 1,000 results, unlimited extraction. Extract structured data from willhaben.at across Marktplatz, Immobilien, Auto & Motor, and Jobs. Impit TLS. Fields: title, price/rent, area, rooms, address, images, contact, listing IDs; search and detail trace URLs. Jobs: role, company, location, emails.
Rating
5.0
(1)
Developer
Muhamed Didovic
Overview
Extract structured listings from willhaben.at (Austria). The site is organised into four main areas—Marktplatz, Immobilien, Auto & Motor, and Jobs—as shown in the main navigation. The actor calls Willhaben’s list APIs (search results) and public detail JSON for each listing using impit (Chrome TLS fingerprinting). You get one dataset row per listing after the detail step, with a normalised flat schema plus Willhaben attribute keys (and optional widget-derived fields) for spreadsheets.
Use it to monitor ads, export price and location data, or feed analytics; Jobs runs use a dedicated job object shape (see Output Structure).
Features
- Four supported verticals (aligned with Willhaben’s top navigation):
  - Marktplatz: classifieds under /iad/kaufen-und-verkaufen/… (marketplace).
  - Immobilien: real estate under /iad/immobilien/….
  - Auto & Motor: vehicles under /iad/auto/… (and related iad auto paths).
  - Jobs: job search under /jobs/…, resolved internally to Willhaben’s jobs list JSON endpoint.
- List + detail pipeline:
  - List: iad www URLs are turned into search JSON requests; Jobs www URLs into a separate jobs search JSON list.
  - Detail: each hit is followed by a public detail JSON request (listing or job id) until maxItems caps queued detail requests per startUrls entry (each www link has its own budget).
- Pagination:
  - iad / search: page on the list request (query order preserved; slashed keys like ESTATE_SIZE/LIVING_AREA_FROM are not re-encoded).
  - Jobs: page on the jobs list request (1-based); further pages enqueued from rowsFound / rowsRequested.
- Flattened export (flattenOutput: true):
  - One flat object per listing: mapped fields plus all listingDocument.attributesFlat keys at the top level (collisions → attr_<name>). The sample below reflects this mode.
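The Jobs pagination rule in the list above (further pages enqueued from rowsFound / rowsRequested) can be illustrated with a small calculation. This is a minimal sketch, not the actor's code: planFollowUpPages is a hypothetical helper, and it assumes rowsFound is the total number of hits and rowsRequested is the page size of the jobs list response.

```typescript
// Hypothetical helper, not the actor's code. Assumes rowsFound is the total
// number of hits and rowsRequested is the page size of the jobs list response.
function planFollowUpPages(rowsFound: number, rowsRequested: number): number[] {
  if (rowsRequested <= 0 || rowsFound <= rowsRequested) {
    return []; // everything fits on page 1
  }
  const totalPages = Math.ceil(rowsFound / rowsRequested);
  const followUps: number[] = [];
  for (let page = 2; page <= totalPages; page++) {
    followUps.push(page); // pages are 1-based; page 1 is already fetched
  }
  return followUps;
}

const followUpPages = planFollowUpPages(95, 30);
console.log(followUpPages); // pages 2, 3 and 4
```

In a real run the maxItems budget may stop the crawl before all of these pages are requested.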
How to Use
- Set Up: You need an Apify account and this actor (or run locally with apify run / npm run start:dev).
- Provide Input: Add one or more Willhaben URLs under startUrls (https://www.willhaben.at/…).
- Configure: Set maxItems (cap on detail requests queued per start URL), flattenOutput, concurrency, retries, and proxy (a residential proxy with apifyProxyCountry: AT often works well).
- Run & Export: Download JSON / CSV from the dataset. If list or detail returns 403, refresh the request headers / signatures from a capture or use the request-signature env vars documented in the repo.
Usage Limitations
Free / non-paying Apify users may be subject to platform limits on dataset items or charges. Paid users typically get higher limits; adjust maxItems to control how many detail pages are fetched per start URL (each link in startUrls can queue up to maxItems details). Willhaben may rate-limit or block datacenter IPs—proxy is recommended (e.g. RESIDENTIAL with AT).
Input Configuration
Example input:
{
  "startUrls": [
    {"url": "https://www.willhaben.at/jobs/suche?employment_type=109&location=Salzburg&region=14096"},
    {"url": "https://www.willhaben.at/iad/immobilien/mietwohnungen/mietwohnung-angebote?sfId=0db5c6aa-6f06-4760-a077-5d2d88453916&rows=30&areaId=5&page=1&PRICE_FROM=150&PRICE_TO=1300"},
    {"url": "https://www.willhaben.at/iad/kaufen-und-verkaufen/marktplatz/kinderfeste-kinderfeiern-4282/a/zustand-neu-22?sfId=60dce79c-cd0d-421b-ab8a-d278d9dea396&rows=30&isNavigation=true&PRICE_FROM=0&PRICE_TO=1"}
  ],
  "maxItems": 30,
  "flattenOutput": true,
  "maxConcurrency": 50,
  "minConcurrency": 1,
  "maxRequestRetries": 100,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "AT"
  }
}
Input Fields Explanation
- startUrls (startUrls): Array of objects with url pointing to a www.willhaben.at listing hub: Jobs (/jobs/…), Immobilien, Marktplatz, or Auto under /iad/…. The actor converts these to the appropriate list URL.
- maxItems (maxItems): Maximum number of listings for which a detail request is queued, separately for each object in startUrls (keyed by that row’s url). Two start URLs with maxItems: 30 can yield up to 60 detail scrapes. Default 30 (see actor schema).
- flattenOutput (flattenOutput): When true, each row is a single flat object (mapped fields + attribute flat index at the top level for iad verticals). Jobs: adds willhabenKind: jobs and source: willhaben_jobs_detail around the job object. When false, iad rows include nested listingDocument and actorMeta; Jobs rows are the job object + apify_* only. Default false in schema; the Immobilien sample below uses flattened shape; data-jobs.json shows flattened Jobs.
- maxConcurrency (maxConcurrency): Maximum list + detail requests in flight at once (shared queue). minConcurrency in the schema is ignored by this actor.
- maxRequestRetries (maxRequestRetries): Retries per failed list/detail impit fetch (with proxy rotation on 401/403/429).
- proxy (proxy): Apify proxy or custom configuration; apifyProxyCountry: AT is a common choice for Willhaben.
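To make the per-start-URL maxItems budget concrete, the sketch below counts queued detail requests separately for each start URL. It is an illustration under assumptions, not the actor's implementation: createDetailBudget and the loop of 100 hits per URL are hypothetical, and the two URLs are the elided placeholders used elsewhere in this README.

```typescript
// Hypothetical budget tracker: each start URL may queue at most maxItems
// detail requests, independently of the other start URLs.
function createDetailBudget(maxItems: number): (startUrl: string) => boolean {
  const queuedPerUrl = new Map<string, number>();
  return (startUrl: string): boolean => {
    const soFar = queuedPerUrl.get(startUrl) || 0;
    if (soFar >= maxItems) {
      return false; // this start URL has used up its budget
    }
    queuedPerUrl.set(startUrl, soFar + 1);
    return true;
  };
}

const tryQueueDetail = createDetailBudget(30);
const exampleStartUrls = [
  "https://www.willhaben.at/jobs/…",
  "https://www.willhaben.at/iad/immobilien/…",
];
let queuedDetails = 0;
for (const startUrl of exampleStartUrls) {
  // Pretend each list yields 100 hits; only the first 30 per URL are queued.
  for (let hit = 0; hit < 100; hit++) {
    if (tryQueueDetail(startUrl)) queuedDetails++;
  }
}
console.log(queuedDetails); // 60
```

This matches the arithmetic in the maxItems description: two start URLs with maxItems: 30 give at most 60 detail requests.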
Output Structure
The dataset contains one row per listing (or one row per job) after the detail step.
iad / Auto / Marktplatz / Immobilien (not Jobs):
- source: willhaben_detail
- willhabenKind: real_estate, marketplace, or car
- With flattenOutput: true, see the first sample below (data.json).
- With flattenOutput: false, the same logical data appears under nested listingDocument and actorMeta.
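How a flattened iad row could be assembled from mapped fields and the flat attribute index is sketched below. This is a minimal sketch, not the actor's mapper: flattenListing is a hypothetical name, the choice to rename the attribute (rather than the mapped field) on a collision is an assumption, and the lowercase price attribute is invented purely to trigger a collision.

```typescript
type FlatRow = Record<string, unknown>;

// Hypothetical merge: mapped fields keep their names; an attribute whose key
// already exists among the mapped fields is stored as attr_<name> instead.
function flattenListing(mapped: FlatRow, attributesFlat: FlatRow): FlatRow {
  const row: FlatRow = {};
  for (const key of Object.keys(mapped)) {
    row[key] = mapped[key];
  }
  for (const key of Object.keys(attributesFlat)) {
    const taken = Object.prototype.hasOwnProperty.call(row, key);
    row[taken ? "attr_" + key : key] = attributesFlat[key];
  }
  return row;
}

const flattenedExample = flattenListing(
  { title: "Erdgeschosswohnung mit Garten", price: null },
  // "price" (lowercase) is an invented attribute key used to show a collision.
  { HEADING: "Erdgeschosswohnung mit Garten", price: "1300" }
);
console.log(Object.keys(flattenedExample).join(", "));
// title, price, HEADING, attr_price
```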
Jobs:
- Native-style job object fields (id, title, company, employmentModes, jobLocations, …) plus apify_scrapedAt and apify_extracted_emails.
- With flattenOutput: true, rows also include willhabenKind: jobs and source: willhaben_jobs_detail (see data-jobs.json sample below).
- With flattenOutput: false, rows are only the job fields plus the two apify_* fields (no listingDocument).
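Filter non-Jobs rows with source === 'willhaben_detail'. In flattened runs, filter Jobs with willhabenKind === 'jobs' (and optionally source === 'willhaben_jobs_detail'); with flattenOutput: false, Jobs rows carry neither label and are simply the rows without source === 'willhaben_detail'.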
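The filters above translate directly into code when post-processing a downloaded dataset. The sketch below assumes a flattened export; DatasetRow and splitDatasetRows are hypothetical names, and the two rows are trimmed versions of the samples in this README.

```typescript
interface DatasetRow {
  source?: string;
  willhabenKind?: string;
  [key: string]: unknown;
}

// Split a flattened export into iad listings and Jobs rows.
function splitDatasetRows(rows: DatasetRow[]): { listings: DatasetRow[]; jobs: DatasetRow[] } {
  return {
    listings: rows.filter((row) => row.source === "willhaben_detail"),
    jobs: rows.filter((row) => row.willhabenKind === "jobs"),
  };
}

const exampleRows: DatasetRow[] = [
  { source: "willhaben_detail", willhabenKind: "real_estate", title: "Erdgeschosswohnung mit Garten" },
  { source: "willhaben_jobs_detail", willhabenKind: "jobs", title: "Zimmerer Hilfskraft (m/w/d)" },
];
const splitExample = splitDatasetRows(exampleRows);
console.log(splitExample.listings.length, splitExample.jobs.length); // 1 1
```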
Sample: willhaben_detail (first object in data.json)
The JSON below is based on the first record of a real export (willhabenKind: real_estate, flattenOutput: true). Long strings, internal URLs, and image arrays are shortened/redacted for the README; the on-disk file contains the full values. _readme_note is documentation-only and does not appear in live output.
{"source": "willhaben_detail","willhabenKind": "real_estate","listingId": null,"url": null,"platform": "willhaben","scrapedAt": "2026-04-07T06:43:48.489Z","publishedAt": null,"updatedAt": null,"objectType": null,"transactionType": "rent","title": "Erdgeschosswohnung mit Garten","description": null,"price": null,"priceText": null,"currency": "EUR","pricePerM2": null,"rentGross": 1300,"rentNet": null,"rentPerM2": null,"operatingCosts": null,"heatingCosts": null,"parkingCosts": null,"additionalCostsTotal": null,"livingAreaM2": null,"usableAreaM2": null,"totalAreaM2": null,"plotAreaM2": null,"roomCount": null,"bedroomCount": null,"bathroomCount": null,"floor": null,"street": null,"postalCode": null,"city": null,"district": null,"region": null,"country": null,"latitude": null,"longitude": null,"hasSeaView": false,"distanceToSea": null,"distanceToCenter": null,"hasMountainView": false,"yearBuilt": null,"condition": null,"energyCertificateAvailable": false,"energyClass": null,"heatingType": null,"hasBalcony": false,"hasTerrace": false,"hasLoggia": false,"hasGarden": false,"hasYard": false,"hasPool": false,"hasGarage": false,"hasCarport": false,"hasParkingSpace": false,"hasStorageRoom": false,"hasElevator": false,"hasBasement": false,"hasAirConditioning": false,"hasBuiltInKitchen": false,"isBarrierFree": false,"sellerName": null,"sellerType": null,"phone": null,"email": null,"imageUrls": [],"videoUrls": [],"locationText": null,"rawHtmlSnippet": null,"apify_actor": "willhaben-cheerio","apify_scrapedAt": "2026-04-07T06:43:48.489Z","detailUrl": "[redacted internal detail URL]","originalInputUrl": "[redacted input URL]","listPage": 1,"listingDocumentId": "1355309481","listingDocumentUuid": "95079344-3e17-4b08-8be2-b34e9d305250","LOCATION": "Aigelsbrunn","POSTCODE": "5204","STATE": "Salzburg","BODY_DYN": "Zur Vermietung gelangt eine moderne und gepflegte Erdgeschosswohnung …","ORG_UUID": "0bf5c502-ee07-4290-aa9d-c82a3667e9e5","ESTATE_SIZE/LIVING_AREA": "77","DISTRICT": 
"Salzburg-Umgebung","HEADING": "Erdgeschosswohnung mit Garten","LOCATION_QUALITY": "1.0","PUBLISHED": "1775498880000","COUNTRY": "Österreich","LOCATION_ID": "113901","PROPERTY_TYPE": "Erdgeschoßwohnung","NUMBER_OF_ROOMS": "3","ADTYPE_ID": "2","PROPERTY_TYPE_ID": "105","ADID": "1355309481","ORGID": "24712601","SEO_URL": "immobilien/d/mietwohnungen/salzburg/salzburg-umgebung/erdgeschosswohnung-mit-garten-1355309481/","ALL_IMAGE_URLS": "1/135/530/9481_317321052.jpg;1/135/530/9481_731380688.jpg;…","PUBLISHED_String": "2026-04-06T20:08:00Z","ESTATE_PREFERENCE": "15, 24, 250, 27, 28","categorytreeids": "7276","RENT/PER_MONTH_LETTINGS": "1300.0","PRODUCT_ID": "200","MMO": "1/135/530/9481_317321052.jpg","ROOMS": "3X3","AD_UUID": "95079344-3e17-4b08-8be2-b34e9d305250","ADDRESS": "Haidach 14","COORDINATES": "47.99015,13.24095","PRICE": "1300","PRICE_FOR_DISPLAY": "€ 1.300","ESTATE_SIZE": "77","ISPRIVATE": "1","PROPERTY_TYPE_FLAT": "true","DISPLAY/Gesamtmiete": "€ 1.300","DISPLAY/Wohnfläche": "77 m²","DISPLAY/Zimmer": "3","WIDGET_TEXT/Objektstandort": "Haidach 14,\n5204 Aigelsbrunn, Salzburg-Umgebung, Salzburg","WIDGET/Objektinformation/Objekttyp": "Erdgeschoßwohnung","WIDGET/Objektinformation/Verfügbar": "nach Vereinbarung","WIDGET/Objektinformation/Bautyp": "Altbau","WIDGET/Objektinformation/Heizung": "Pellets","WIDGET/Objektinformation/Zustand": "Renoviert","DESCRIPTION_FROM_WIDGET": "Zur Vermietung gelangt eine moderne …","WIDGET_TEXT/Lage": "Die Wohnung befindet sich in Haidach …","WIDGET_TEXT/untitled": "Privatperson","ALL_IMAGE_REFERENCE_URLS": "https://cache.willhaben.at/mmo/1/135/530/9481_317321052.jpg;…","IMAGE_REFERENCE_URL": ["https://cache.willhaben.at/mmo/1/135/530/9481_317321052.jpg","https://cache.willhaben.at/mmo/1/135/530/9481_731380688.jpg"],"_readme_note": "Omitted: remaining IMAGE_REFERENCE_URL entries and full widget text."}
Output fields (willhaben_detail, flattened) — field-by-field
Row identity and crawl metadata
- source: Always willhaben_detail for rows produced from the list → detail pipeline.
- willhabenKind: Vertical classifier, one of real_estate, marketplace, car, or jobs (Jobs rows use a different top-level schema; see Output Structure).
- listingId: Listing id in the mapped schema when the mapper fills it; may be null in flattened runs if the id appears only under ADID / listingDocumentId.
- url: Public www URL for the listing when the mapper sets it; often null when only API ids are available.
- platform: Always willhaben.
- scrapedAt: ISO timestamp when the row was assembled (mapper / push time).
Mapped schema — publication and transaction
- publishedAt: Normalised “published” time string when the mapper fills it from attributes; else null (raw publish values may still appear under PUBLISHED / PUBLISHED_String).
- updatedAt: Last-updated field when mapped; else null.
- objectType: High-level object type when mapped (e.g. property type); else null.
- transactionType: buy or rent inferred from the iad path (e.g. Mietwohnung → rent).
Mapped schema — title, description, money
- title: Listing heading from the mapper (often aligns with HEADING).
- description: Long description in the mapped schema when extracted; else null (teaser may be in BODY_DYN or DESCRIPTION_FROM_WIDGET).
- price: Numeric price when mapped as a single number; else null.
- priceText: Human-readable price string when mapped; else null.
- currency: Currency code when set (e.g. EUR).
- pricePerM2: Price per square metre when mapped; else null.
- rentGross: Gross rent per month (or period per mapper rules) when applicable; e.g. 1300.
- rentNet: Net rent when mapped; else null.
- rentPerM2: Rent per m² when mapped; else null.
- operatingCosts: Nebenkosten / operating costs when mapped; else null.
- heatingCosts: Heating costs when mapped; else null.
- parkingCosts: Parking costs when mapped; else null.
- additionalCostsTotal: Aggregated extra costs when mapped; else null.
Mapped schema — areas and rooms
- livingAreaM2: Living area in m² when mapped; else null (see ESTATE_SIZE / ESTATE_SIZE/LIVING_AREA).
- usableAreaM2: Usable area when mapped; else null.
- totalAreaM2: Total area when mapped; else null.
- plotAreaM2: Plot / land area when mapped; else null.
- roomCount: Room count when mapped; else null (see NUMBER_OF_ROOMS / ROOMS).
- bedroomCount: Bedrooms when mapped; else null.
- bathroomCount: Bathrooms when mapped; else null.
- floor: Floor / storey when mapped; else null.
Mapped schema — address and geo
- street: Street line when mapped; else null (see ADDRESS).
- postalCode: PLZ when mapped; else null (see POSTCODE).
- city: City / Ort when mapped; else null (see LOCATION).
- district: Bezirk when mapped; else null (see DISTRICT).
- region: Bundesland / region when mapped; else null (see STATE).
- country: Country when mapped; else null (see COUNTRY).
- latitude: Latitude when mapped; else null (may parse from COORDINATES).
- longitude: Longitude when mapped; else null.
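Because the mapped address fields are often null while the raw attributes are filled (as in the sample row), a consumer may want a fallback. The sketch below is one possible post-processing step on a flattened row, not part of the actor: firstPresent and resolveAddress are hypothetical helpers, and the key pairs follow the "see …" hints in the list above.

```typescript
type ListingRow = Record<string, unknown>;

// Return the first candidate key that holds a non-empty string or a number.
function firstPresent(row: ListingRow, keys: string[]): string | null {
  for (const key of keys) {
    const value = row[key];
    if (typeof value === "string" && value.trim() !== "") return value;
    if (typeof value === "number") return String(value);
  }
  return null;
}

// Prefer the mapped field, then fall back to the raw Willhaben attribute.
function resolveAddress(row: ListingRow) {
  return {
    street: firstPresent(row, ["street", "ADDRESS"]),
    postalCode: firstPresent(row, ["postalCode", "POSTCODE"]),
    city: firstPresent(row, ["city", "LOCATION"]),
    district: firstPresent(row, ["district", "DISTRICT"]),
    region: firstPresent(row, ["region", "STATE"]),
    country: firstPresent(row, ["country", "COUNTRY"]),
  };
}

// Values taken from the sample row in this README.
const sampleAddressRow: ListingRow = {
  street: null, postalCode: null, city: null, district: null, region: null, country: null,
  ADDRESS: "Haidach 14", POSTCODE: "5204", LOCATION: "Aigelsbrunn",
  DISTRICT: "Salzburg-Umgebung", STATE: "Salzburg", COUNTRY: "Österreich",
};
const resolvedAddress = resolveAddress(sampleAddressRow);
console.log(resolvedAddress.street, resolvedAddress.postalCode, resolvedAddress.city);
// Haidach 14 5204 Aigelsbrunn
```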
Mapped schema — features (booleans / enums)
- hasSeaView, hasMountainView, hasBalcony, hasTerrace, hasLoggia, hasGarden, hasYard, hasPool, hasGarage, hasCarport, hasParkingSpace, hasStorageRoom, hasElevator, hasBasement, hasAirConditioning, hasBuiltInKitchen, isBarrierFree: Feature flags from the mapper; false when not set from data.
- distanceToSea, distanceToCenter: Distance fields when mapped; else null.
- yearBuilt, condition, energyCertificateAvailable, energyClass, heatingType: Building / energy fields when mapped; else null / false.
Mapped schema — seller and media
- sellerName: Advertiser / org display name when mapped; else null.
- sellerType: Seller type when mapped (e.g. private vs dealer); else null.
- phone, email: Contact fields when mapped; else null.
- imageUrls: Array of main image URLs in the mapped schema (may be empty if images only appear under IMAGE_REFERENCE_URL / ALL_IMAGE_REFERENCE_URLS).
- videoUrls: Video URLs when mapped; else [].
- locationText: Free-text location line when mapped; else null.
- rawHtmlSnippet: Optional HTML snippet when captured; else null.
Apify and traceability
- apify_actor: Actor name (willhaben-cheerio).
- apify_scrapedAt: ISO timestamp when the dataset row was written.
- originalInputUrl: The www start URL from input that led to this crawl branch.
- listPage: 1-based list page index on which this listing appeared.
- listingDocumentId: Numeric ad id (string) in the merged listing document.
- listingDocumentUuid: UUID of the advert in the merged listing document.
Willhaben attributes (flattened from listingDocument.attributesFlat)
- LOCATION: Location label from search / advert attributes (e.g. locality name).
- POSTCODE: Postal code (PLZ).
- STATE: Austrian Bundesland (or state label).
- BODY_DYN: Short dynamic body / teaser text from the listing payload.
- ORG_UUID: Organisation UUID associated with the advertiser.
- ESTATE_SIZE/LIVING_AREA: Living area as string (m²), from slashed attribute name.
- DISTRICT: District (Bezirk) label.
- HEADING: Listing headline from Willhaben attributes.
- LOCATION_QUALITY: Internal quality / scoring field from the API when present.
- PUBLISHED: Publish time in epoch milliseconds as string.
- COUNTRY: Country label (e.g. Österreich).
- LOCATION_ID: Willhaben location id.
- PROPERTY_TYPE: Property type label (e.g. Erdgeschoßwohnung).
- NUMBER_OF_ROOMS: Room count as string.
- ADTYPE_ID: Ad type identifier.
- PROPERTY_TYPE_ID: Property type identifier.
- ADID: Advert id (string); same listing as listingDocumentId when aligned.
- ORGID: Organisation id.
- SEO_URL: SEO path segment for the listing on www.
- ALL_IMAGE_URLS: Semicolon-separated relative image paths under Willhaben’s image cache convention.
- PUBLISHED_String: ISO-style published timestamp string from the API.
- ESTATE_PREFERENCE: Comma-separated preference / filter codes when present.
- categorytreeids: Category tree ids for navigation / classification.
- RENT/PER_MONTH_LETTINGS: Monthly rent as string from slashed attribute name.
- PRODUCT_ID: Product vertical id (e.g. real estate product code).
- MMO: Primary MMO image path (multi-media object key).
- ROOMS: Encoded rooms bucket (e.g. 3X3) from Willhaben.
- AD_UUID: Advert UUID (matches listingDocumentUuid when aligned).
- ADDRESS: Street address line.
- COORDINATES: lat,lon string when present.
- PRICE: Raw price string from attributes.
- PRICE_FOR_DISPLAY: Localised display price (e.g. € 1.300).
- ESTATE_SIZE: Size string (often living area in m²).
- ISPRIVATE: "1" / "0" style flag for private vs commercial when provided.
- PROPERTY_TYPE_FLAT: String boolean for flat / apartment classification when present.
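Since these attributes arrive as strings, downstream code usually has to convert them. The sketch below shows one way to parse PUBLISHED and COORDINATES, using the formats described above; parsePublished and parseCoordinates are hypothetical helper names and not part of the actor.

```typescript
// PUBLISHED is epoch milliseconds as a string; returns an ISO timestamp (UTC).
function parsePublished(published: string): string | null {
  if (published.trim() === "") return null;
  const date = new Date(Number(published));
  return isNaN(date.getTime()) ? null : date.toISOString();
}

// COORDINATES is a "lat,lon" string.
function parseCoordinates(coordinates: string): { latitude: number; longitude: number } | null {
  const parts = coordinates.split(",");
  if (parts.length !== 2) return null;
  const latitude = Number(parts[0]);
  const longitude = Number(parts[1]);
  if (!isFinite(latitude) || !isFinite(longitude)) return null;
  return { latitude, longitude };
}

// Values taken from the sample row in this README.
const publishedIso = parsePublished("1775498880000");
const sampleCoordinates = parseCoordinates("47.99015,13.24095");
console.log(publishedIso); // 2026-04-06T18:08:00.000Z
console.log(sampleCoordinates);
```

For the sample row the converted PUBLISHED value is two hours earlier than the PUBLISHED_String shown in the sample, which may mean PUBLISHED_String carries local Austrian time; that reading is an inference from a single record.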
Display and widget-derived keys (detail layout)
- DISPLAY/Gesamtmiete, DISPLAY/Wohnfläche, DISPLAY/Zimmer: Human-readable lines extracted from TITLE_WITH_ATTRIBUTES-style widgets (labels depend on locale and vertical).
- WIDGET_TEXT/Objektstandort: Free text from a widget section (here: object location block).
- WIDGET/Objektinformation/Objekttyp, Verfügbar, Bautyp, Heizung, Zustand: Key–value pairs from KEY_VALUE_PAIRS_LIST widgets under Objektinformation.
- DESCRIPTION_FROM_WIDGET: Long description assembled from PARAGRAPHED_TEXT / Beschreibung-style widget content when present.
- WIDGET_TEXT/Lage: Location description paragraph from widgets.
- WIDGET_TEXT/untitled: Widget paragraph where the section had no title (e.g. Privatperson).
- ALL_IMAGE_REFERENCE_URLS: Semicolon-separated absolute cache URLs for images.
- IMAGE_REFERENCE_URL: Array of absolute image URLs (same images as above, as a list).
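The semicolon-separated image fields can be turned into arrays with a simple split. The sketch below also builds absolute URLs from the relative ALL_IMAGE_URLS paths; the prefix is an assumption inferred only from the sample row (where MMO and IMAGE_REFERENCE_URL line up), and splitImageList / toAbsoluteImageUrls are hypothetical helper names.

```typescript
// Assumed prefix, inferred from the sample row only; not a documented constant.
const IMAGE_CACHE_PREFIX = "https://cache.willhaben.at/mmo/";

// Split a semicolon-separated image field and drop empty entries.
function splitImageList(value: string): string[] {
  return value
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part !== "");
}

// Turn relative ALL_IMAGE_URLS paths into absolute cache URLs.
function toAbsoluteImageUrls(allImageUrls: string): string[] {
  return splitImageList(allImageUrls).map((path) => IMAGE_CACHE_PREFIX + path);
}

const absoluteImages = toAbsoluteImageUrls(
  "1/135/530/9481_317321052.jpg;1/135/530/9481_731380688.jpg"
);
console.log(absoluteImages[0]);
// https://cache.willhaben.at/mmo/1/135/530/9481_317321052.jpg
```

When IMAGE_REFERENCE_URL is present in the row, using it directly avoids relying on the assumed prefix.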
Sample: Jobs (first object in data-jobs.json)
The JSON below is the first record of a real Jobs export (flattenOutput: true). _readme_note is documentation-only and does not appear in live output.
{"willhabenKind": "jobs","source": "willhaben_jobs_detail","id": 13158523,"title": "Zimmerer Hilfskraft (m/w/d)","slugTitle": "zimmerer-hilfskraft-m-w-d","description": "Zimmerer Hilfskraft (m/w/d)\nBruttogehalt: € 2.720 monatlich","employmentTime": "ab sofort","position": "Mitarbeiter:in","firstPublishDate": "2026-04-07T01:50:00","lastModifiedDate": "2026-04-07T01:50:00","expiryDate": null,"lastReorderDate": "2026-04-07T01:50:00","overpay": false,"forceExternalApplicationForm": true,"salary": 2720,"salaryTimeFrame": "monatlich","isExpired": false,"employmentModes": ["Teilzeit", "Vollzeit"],"jobLocations": [{"name": "Sankt Johann im Pongau","federalState": "Sankt Johann im Pongau","country": "Österreich"}],"languageSkills": [],"company": {"id": 79439,"title": "Maschinenring Personal u Service eGen","slugTitle": "maschinenring-personal-u-service-egen","type": "Firma","uidNumber": null,"url": "https://www.willhaben.at/jobs/firma/personaldienstleister/79439","logoUrl": "https://www.willhaben.at/jobs/api/v1/images/public/6355402?resolution=480","industry": "Personaldienstleistungen","address": null,"foundingYear": null,"employeeCountFrom": null,"employeeCountTo": null,"activeAdverts": null},"contact": {},"applyUrl": null,"apify_scrapedAt": "2026-04-07T06:42:52.366Z","apify_extracted_emails": []}
Output fields (Jobs) — field-by-field
Row labels (flattened Jobs runs only)
- willhabenKind: Always jobs when this wrapper is present (flattenOutput: true).
- source: willhaben_jobs_detail, meaning the row was produced by the Jobs detail mapper (not the iad willhaben_detail pipeline).
Job identity and copy
- id: Numeric Willhaben job advert id (same id used for the job detail request; see detailUrl in export).
- title: Job title / headline (from detail widgets + list row).
- slugTitle: URL-style slug derived from title (lowercase, hyphenated, diacritics normalised).
- description: Short teaser plus optional lines such as Bruttogehalt parsed from detail widgets. Not the full long HTML description (that content is behind authenticated / WEB_VIEW flows on Willhaben’s side).
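The slugTitle rule (lowercase, hyphenated, diacritics normalised) can be approximated as follows. This is a sketch that reproduces the two slugs in the Jobs sample, not the mapper's actual code: toSlugTitle is a hypothetical name, and how Willhaben treats umlauts or ß is not confirmed by this README.

```typescript
// Approximate slug rule: strip diacritics, lowercase, hyphenate, trim hyphens.
function toSlugTitle(title: string): string {
  return title
    .normalize("NFD") // split letters from their diacritics
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // every other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

console.log(toSlugTitle("Zimmerer Hilfskraft (m/w/d)"));
// zimmerer-hilfskraft-m-w-d
console.log(toSlugTitle("Maschinenring Personal u Service eGen"));
// maschinenring-personal-u-service-egen
```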
Employment and dates
- employmentTime: Start / availability label (e.g. ab sofort) from JOB_OFFER_DETAILS widget attributes when present.
- position: Position level label (e.g. Mitarbeiter:in, Lehre) from the same widget strip.
- firstPublishDate: First publish timestamp string derived from the list row (PUBLISHED_String) when available.
- lastModifiedDate: Last modified string merged from ADVERT_INFO (“Zuletzt geändert”) and publish time when available.
- expiryDate: Advert end date when the API exposes it; often null on the public widget payload.
- lastReorderDate: Reorder / bump date aligned with publish when available; else mirrors firstPublishDate.
Salary and application flags
- overpay: true when the salary text indicates Überzahlung / willingness to pay above scale.
- forceExternalApplicationForm: true when the list row indicates the application is not only internal to Willhaben (isInternalApplication === false).
- salary: Parsed gross amount as a number (e.g. 2720), when a € amount is found in Bruttogehalt / salary widget text.
- salaryTimeFrame: monatlich, stündlich, etc., inferred from that text when possible.
- isExpired: false by default for scraped active rows; true only if the mapper sets it from API data.
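The salary fields above come from free text, so a parsing step along these lines is implied. The sketch below is an assumption-laden illustration, not the actor's parser: parseSalaryText is a hypothetical name, the regular expression assumes German number formatting (dot for thousands, comma for decimals), and the list of period words contains only the two mentioned in this README.

```typescript
interface ParsedSalary {
  salary: number | null;
  salaryTimeFrame: string | null;
  overpay: boolean;
}

// Illustrative list only; the actor may recognise other period words.
const SALARY_TIME_FRAMES = ["monatlich", "stündlich"];

function parseSalaryText(text: string): ParsedSalary {
  // "€ 2.720" or "€ 15,50": dots are thousands separators, comma starts decimals.
  const match = /€\s*([\d.]+)(?:,(\d{1,2}))?/.exec(text);
  let salary: number | null = null;
  if (match) {
    const whole = (match[1] || "").replace(/\./g, "");
    const decimals = match[2] || "0";
    const parsed = Number(whole + "." + decimals);
    salary = whole !== "" && isFinite(parsed) ? parsed : null;
  }
  const lower = text.toLowerCase();
  let salaryTimeFrame: string | null = null;
  for (const frame of SALARY_TIME_FRAMES) {
    if (lower.indexOf(frame) !== -1) {
      salaryTimeFrame = frame;
      break;
    }
  }
  return { salary, salaryTimeFrame, overpay: lower.indexOf("überzahlung") !== -1 };
}

// description value from the Jobs sample in this README.
const sampleSalary = parseSalaryText("Zimmerer Hilfskraft (m/w/d)\nBruttogehalt: € 2.720 monatlich");
console.log(sampleSalary.salary, sampleSalary.salaryTimeFrame, sampleSalary.overpay);
// 2720 monatlich false
```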
Modes, locations, languages
- employmentModes: Array of employment types (e.g. Teilzeit, Vollzeit, Freiberuflich) from the first attribute line of JOB_OFFER_DETAILS or from the list row’s EMPLOYMENT_TYPE.
- jobLocations: Array of place objects for the job’s advertised locations:
  - name: Place label (city, region, or free-text fragment from the API).
  - federalState: Bundesland or same as name when only a coarse label exists (heuristic).
  - country: Österreich, Deutschland, etc., inferred from known regions (e.g. Bayern → Germany).
- languageSkills: Array of language requirements when the API provides them; often [] on the public widget response.
Company (nested object)
- company.id: Willhaben company id (parsed from company follow / agent links in widgets when present).
- company.title: Legal or display company name.
- company.slugTitle: Slug for the company profile URL.
- company.type: e.g. Firma or Personaldienstleister when set; default Firma in the mapper when unknown.
- company.uidNumber: Austrian UID when present; else null.
- company.url: Link to the www company profile when a companyProfile context link exists.
- company.logoUrl: Company logo URL on Willhaben’s www jobs CDN when present (often with a fixed resolution query).
- company.industry: Industry / sector label from COMPANY_INFORMATION when present.
- company.address: Structured address (street, zipCode, city, country) when the API provides it; else null.
- company.foundingYear, employeeCountFrom, employeeCountTo, activeAdverts: Firmographics when present; often null on the public widget payload.
Contact and apply
- contact: Object for contact person fields. Often {} when widgets do not expose name/email; if an email is found in description, contact.email may be set to the first match.
- applyUrl: External application URL when discovered in the payload; often null (many jobs only link to login-gated content).
Apify-only
- apify_scrapedAt: ISO timestamp when this dataset row was written.
- apify_extracted_emails: Deduplicated email addresses regex-matched from title, description, and contact fields (order preserved).
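The behaviour described for apify_extracted_emails (regex match, deduplicate, keep first-seen order) can be sketched like this. The pattern and the case-insensitive deduplication are assumptions, extractEmails is a hypothetical name, and the example.com addresses are made up for illustration.

```typescript
// Collect email addresses from several text fields, keeping first-seen order.
function extractEmails(texts: Array<string | null | undefined>): string[] {
  // Simplified pattern; the actor's actual expression is not documented here.
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  const seen: Record<string, boolean> = {};
  const emails: string[] = [];
  for (const text of texts) {
    if (!text) continue;
    const matches = text.match(pattern) || [];
    for (const found of matches) {
      const key = found.toLowerCase(); // assumption: dedupe ignores case
      if (!seen[key]) {
        seen[key] = true;
        emails.push(found);
      }
    }
  }
  return emails;
}

const extractedEmails = extractEmails([
  "Bewerbung an jobs@example.com",
  null,
  "Kontakt: JOBS@example.com, hr@example.com.",
]);
console.log(extractedEmails); // jobs@example.com, then hr@example.com
```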
Optional fields (not in this sample)
Some jobs may include extra keys when present in the source payload or mapper, e.g. professionalExperience (boolean). Treat the schema as stable core + optional extensions.
Benefits of the Willhaben scraper
- Four verticals in one actor: Marktplatz, Immobilien, Auto & Motor, and Jobs, matching Willhaben’s main navigation.
- List + detail: structured fields from detail JSON, not only search snippets.
- Flatten mode for CSV-friendly exports: mapped columns plus raw Willhaben attribute keys on one row (iad verticals).
- Traceability: detailUrl, originalInputUrl, listPage on iad rows; Jobs rows focus on job + company fields (trace fields are not duplicated on the job object).
Why Choose This Actor?
Built for Austrian marketplace and classifieds research on willhaben.at: discovery from www URLs, list JSON then detail JSON. Outputs suit warehouses, BI, or price monitoring.
Use cases:
- Track rentals and sales in Immobilien with filters from the URL.
- Export Marktplatz or Auto listings with attributes and images.
- Collect Jobs in a native job JSON shape plus extracted emails.
Technical Implementation
- URL routing (willhaben-url.ts, willhaben-classify.ts): Detects Jobs (/jobs/) vs iad paths (Immobilien, Marktplatz, Auto); builds the correct internal list URL per vertical; preserves raw query encoding for slashed keys.
- Shared list/detail logic (willhaben-list-detail-logic.ts): processWillhabenListPage (per-origin maxItems, pagination) and pushWillhabenDetailFromJson.
- Pipeline (willhaben-internal-run.ts + main.ts): one FIFO queue for list and detail tasks (details enqueued before list follow-ups), maxConcurrency workers, impit only; proxy rotation on 401/403/429; retries from maxRequestRetries. Optional reference router: routes.ts (legacy Cheerio shape, not used at runtime).
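The URL routing step can be pictured as a path-prefix check. The sketch below is a simplified stand-in for whatever willhaben-classify.ts actually does: classifyWillhabenUrl is a hypothetical function, it only covers the four path prefixes listed under Features (not the "related iad auto paths"), and it does not build the internal list URL.

```typescript
type WillhabenKind = "jobs" | "real_estate" | "marketplace" | "car";

// Classify a www start URL by its path prefix; returns null when unsupported.
function classifyWillhabenUrl(url: string): WillhabenKind | null {
  const match = /^https?:\/\/(?:www\.)?willhaben\.at(\/[^?#]*)/.exec(url);
  if (!match) return null;
  const path = match[1] || "";
  if (path.indexOf("/jobs/") === 0) return "jobs";
  if (path.indexOf("/iad/immobilien/") === 0) return "real_estate";
  if (path.indexOf("/iad/kaufen-und-verkaufen/") === 0) return "marketplace";
  if (path.indexOf("/iad/auto/") === 0) return "car";
  return null;
}

// Start URLs from the example input in this README (query strings shortened).
console.log(classifyWillhabenUrl("https://www.willhaben.at/jobs/suche?employment_type=109"));
// jobs
console.log(classifyWillhabenUrl("https://www.willhaben.at/iad/immobilien/mietwohnungen/mietwohnung-angebote?rows=30"));
// real_estate
```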
Explore More Scrapers
If you found this actor useful, check out other scrapers at memo23's Apify profile.
Support
- For issues or feature requests, use the Issues section of this actor on Apify.
- For further assistance, contact the author:
- Author's website: https://muhamed-didovic.github.io/
- Email: muhamed.didovic@gmail.com
Additional Services
- Request customization or a full dataset: muhamed.didovic@gmail.com
- Need other platforms scraped? Contact muhamed.didovic@gmail.com
- For API services of this actor, reach out to muhamed.didovic@gmail.com
- Custom integrations and automation solutions available