Scrape willhaben.at — Austria's largest classifieds platform. Pull listings from any pasted search URL across every platform section with incremental change tracking that emits only new and updated items between runs.
jobSalaryPeriod — normalized period ("hour" / "day" / "week" / "month" / "year"), distinct from the raw jobSalaryTimeFrame Willhaben ships in German.
The pre-existing jobSalary / jobSalaryTimeFrame / jobSalaryText fields are preserved unchanged for backward compatibility.
The salary parser enforces bounds (defaults: min €100, max €10M) and a 5× ratio cap on description-parsed maxes, so phone numbers and VAT IDs in descriptions can't leak into salary values. (Same defense as willhaben-scraper v0.2.6.)
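A minimal sketch of the bounds check, assuming the parser has already extracted candidate min/max numbers. sanitizeSalary, SalaryBounds and the choice to drop only the offending max (rather than the whole range) are illustrative assumptions. The defaults mirror the ones listed above.

```typescript
// Hypothetical sketch: sanitizeSalary and its option names are illustrative, not the actor's API.
interface SalaryRange {
  min: number | null;
  max: number | null;
}

interface SalaryBounds {
  minAllowed: number; // default €100
  maxAllowed: number; // default €10M
  maxRatio: number; // default 5×, applied only to maxes parsed from the description
}

const DEFAULT_BOUNDS: SalaryBounds = { minAllowed: 100, maxAllowed: 10_000_000, maxRatio: 5 };

// Values outside [minAllowed, maxAllowed] are treated as "not a salary".
function inBounds(value: number | null, b: SalaryBounds): number | null {
  if (value === null || !Number.isFinite(value)) return null;
  return value >= b.minAllowed && value <= b.maxAllowed ? value : null;
}

function sanitizeSalary(
  raw: SalaryRange,
  maxFromDescription: boolean,
  b: SalaryBounds = DEFAULT_BOUNDS,
): SalaryRange {
  const min = inBounds(raw.min, b);
  let max = inBounds(raw.max, b);
  // A description-parsed max far above the min is more likely a phone number or VAT ID.
  if (maxFromDescription && min !== null && max !== null && max > min * b.maxRatio) {
    max = null;
  }
  return { min, max };
}
```

With these defaults, a phone number like 06641234567 picked up as a max exceeds the €10M ceiling and is discarded, while a structured (non-description) range keeps its max even above the 5× ratio.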
Fixed
contactName preserved when detail's contact is empty (parity with willhaben-scraper v0.2.5). When Willhaben ships { email: ... } without firstname/lastname, the previously captured jobContactName is no longer overwritten with null.
0.3.2 — 2026-05-03
Fixed — Critical
SERP page-1 failure now propagates in both jobs and attribute (immo / autos / marktplatz) tasks instead of silently returning an empty universe. A single 5xx on page 1 in incremental mode would have marked every prior-state listing as EXPIRED (universe corruption); runs now fail fast so state is preserved.
Retry loop no longer self-cancels. fetchWithRetry now builds a fresh AbortSignal per attempt; the prior single-signal pattern caused the first timeout to abort every subsequent retry instantly. Network-level errors (ECONNRESET, ETIMEDOUT, EAI_AGAIN, fetch wrappers) are now retryable too, not just retryable HTTP statuses.
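The per-attempt signal pattern can be sketched as follows. The attempt callback, HttpError, the option names and the status policy (429 plus any 5xx) are assumptions for illustration. Only the fresh-signal-per-attempt behavior and the retryable network codes come from the entry above.

```typescript
// Hypothetical sketch: the signature and option names are assumptions, not the actor's API.
class HttpError extends Error {
  constructor(public readonly status: number) {
    super(`HTTP ${status}`);
    this.name = "HttpError";
    // Keeps `instanceof` working when compiling to ES5.
    Object.setPrototypeOf(this, HttpError.prototype);
  }
}

const RETRYABLE_NETWORK_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

function isRetryable(err: unknown): boolean {
  // Assumed status policy: 429 plus any 5xx.
  if (err instanceof HttpError) return err.status === 429 || err.status >= 500;
  if (!(err instanceof Error)) return false;
  // A per-attempt timeout surfaces as an abort; the next attempt gets a new signal.
  if (err.name === "AbortError" || err.name === "TimeoutError") return true;
  // Node's built-in fetch wraps lower-level failures in a TypeError whose `cause` carries the code.
  const e = err as Error & { code?: string; cause?: { code?: string } };
  const code = e.code ?? e.cause?.code;
  return code !== undefined && RETRYABLE_NETWORK_CODES.has(code);
}

async function fetchWithRetry<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  { retries = 3, timeoutMs = 15_000, baseDelayMs = 500 } = {},
): Promise<T> {
  for (let i = 0; ; i++) {
    // Fresh controller per attempt: once a signal aborts it stays aborted,
    // so sharing one across attempts would cancel every retry instantly.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await attempt(controller.signal);
    } catch (err) {
      if (i >= retries || !isRetryable(err)) throw err;
    } finally {
      clearTimeout(timer);
    }
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
  }
}
```

The shared-signal version fails because signal.aborted never resets: after the first timeout, every later request handed that signal rejects immediately.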
Hostname validation on searchUrl / startUrls. Pasted non-Willhaben URLs are now rejected with a clear error (searchUrl must be a willhaben.at URL) instead of silently parsing into a request that goes nowhere meaningful.
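A minimal sketch of the host check, assuming willhaben.at and any of its subdomains (such as www.willhaben.at) are accepted. assertWillhabenUrl is a hypothetical name.

```typescript
// Hypothetical helper; the accepted host set (willhaben.at plus subdomains) is an assumption.
function assertWillhabenUrl(raw: string, field = "searchUrl"): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error(`${field} must be a willhaben.at URL`);
  }
  const host = url.hostname.toLowerCase();
  const httpLike = url.protocol === "https:" || url.protocol === "http:";
  // Suffix match on ".willhaben.at" so lookalikes such as evilwillhaben.at fail.
  const isWillhaben = host === "willhaben.at" || host.endsWith(".willhaben.at");
  if (!httpLike || !isWillhaben) throw new Error(`${field} must be a willhaben.at URL`);
  return url;
}
```

Matching on the ".willhaben.at" suffix rather than a bare endsWith("willhaben.at") is what keeps lookalike hosts out.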
Fixed — Important
EXPIRED stubs no longer charged. Billing event now sizes by live items only (NEW/UPDATED/UNCHANGED/REAPPEARED); tombstones are operational data with no business value.
Run-footer headline split. The footer now reports "X new/updated listings exported" (live items only) plus a separate "🪦 Y expired listings emitted as stubs" line, so it no longer claims "exported X new/updated" when half the dataset is tombstones.
buildExpiredStub is now typed. Adding a new OutputItem field would (correctly) cause a TS error rather than silently leak undefined into the dataset.
State-lock failures throw a plain Error instead of throw await Actor.fail(...) — the prior pattern interfered with the outer try/catch's releaseLock cleanup path.
immoAvailableDate now ISO-normalized, matching all other date fields.
validThrough and jobExpiryDate are now consistent. The hash input and the output field use the same normalized expiryDate, so presentation-only timestamp drift can't trigger a spurious UPDATED.
State-key auto-scope is section-aware. For non-jobs runs, jobs-* filters no longer enter the state partition; for non-immo runs, immoSubTypes doesn't enter; etc. Prevents accidental state-sharing across unrelated result sets.
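One way to express the section-aware scoping, assuming section-specific inputs can be recognized by a job / immo key prefix (an assumption; the real field list may differ). stateScope and ownerSection are hypothetical; the resulting object would then be hashed into the partition key.

```typescript
// Hypothetical sketch: the input shape and the prefix-to-section mapping are assumptions.
type Section = "jobs" | "immobilien" | "autos" | "marktplatz";

interface SearchInput {
  section: Section;
  [key: string]: unknown;
}

// Keys owned by one section; anything else (section, query, price filters...) is shared.
function ownerSection(key: string): Section | null {
  if (key.startsWith("job")) return "jobs";
  if (key.startsWith("immo")) return "immobilien";
  return null;
}

// Returns only the inputs that should influence the automatic state partition.
function stateScope(input: SearchInput): Record<string, unknown> {
  const scope: Record<string, unknown> = {};
  for (const key of Object.keys(input).sort()) {
    if (input[key] === undefined) continue;
    const owner = ownerSection(key);
    if (owner !== null && owner !== input.section) continue; // belongs to another section
    scope[key] = input[key];
  }
  return scope;
}
```

A stale jobRegion left in the input of an immobilien run then no longer changes its partition.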
Fixed — Nice-to-have
Default memory raised from 256 MB to 512 MB; max raised to 2048 MB. The 256 MB default OOMed on maxResults: 0 runs over large universes.
Dead applyExtraction removed.
Tests
7 new tests covering emission-policy toggles and full classification round-trip (prior-active missing → EXPIRED, firstSeenAt preservation, REAPPEARED detection). Total: 157 unit/integration tests, all green.
0.3.1 — 2026-05-03
Added — Emission policy controls (parity with willhaben-scraper)
emitUnchanged — when enabled, incremental runs also emit listings
classified as UNCHANGED (no tracked content drift since last run). Default
false — most pipelines only want NEW / UPDATED / REAPPEARED.
emitExpired — when enabled, listings tracked in the prior state but
missing from the current run are emitted as EXPIRED stubs (timestamps +
listingId only, no live data). Notifications skip these stubs (no title /
apply link to render). Default false.
Closes the schema drift between this actor and willhaben-scraper, which
has had these toggles since v0.2.0.
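The classification and the two toggles can be sketched like this. The PriorEntry shape (contentHash, firstSeenAt, an active flag) and the classify function are assumptions for illustration; the NEW / UPDATED / UNCHANGED / REAPPEARED / EXPIRED rules follow the descriptions above.

```typescript
// Hypothetical sketch: field names and the classify signature are assumptions.
type ChangeType = "NEW" | "UPDATED" | "UNCHANGED" | "REAPPEARED" | "EXPIRED";

interface PriorEntry {
  contentHash: string;
  firstSeenAt: string;
  active: boolean; // false once the listing has already been reported EXPIRED
}

interface CurrentItem {
  listingId: string;
  contentHash: string;
}

interface Emitted {
  listingId: string;
  changeType: ChangeType;
  firstSeenAt: string;
}

function classify(
  prior: Map<string, PriorEntry>,
  current: CurrentItem[],
  now: string,
  opts: { emitUnchanged: boolean; emitExpired: boolean },
): Emitted[] {
  const out: Emitted[] = [];
  const seen = new Set<string>();
  for (const item of current) {
    seen.add(item.listingId);
    const p = prior.get(item.listingId);
    let changeType: ChangeType;
    if (p === undefined) changeType = "NEW";
    else if (!p.active) changeType = "REAPPEARED";
    else if (p.contentHash !== item.contentHash) changeType = "UPDATED";
    else changeType = "UNCHANGED";
    if (changeType === "UNCHANGED" && !opts.emitUnchanged) continue;
    // firstSeenAt is carried over from prior state; only NEW listings get `now`.
    out.push({ listingId: item.listingId, changeType, firstSeenAt: p?.firstSeenAt ?? now });
  }
  if (opts.emitExpired) {
    // Active last run but missing now: emit a stub with timestamps + listingId only.
    prior.forEach((p, listingId) => {
      if (p.active && !seen.has(listingId)) {
        out.push({ listingId, changeType: "EXPIRED", firstSeenAt: p.firstSeenAt });
      }
    });
  }
  return out;
}
```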
0.3.0 — 2026-05-03
Added — Multi-URL startUrls input
New startUrls input accepts an array of willhaben.at search URLs. Each URL
becomes its own search task; results are merged round-robin across tasks
before the global maxResults cap so every URL contributes to the output.
Without this, a small maxResults would be filled entirely from URL #1's
items even when the other URLs were also fetched (the bug reported by user
emmat against the jobs scraper).
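A sketch of round-robin merging ahead of the global cap. The test suite mentions mergeTaskResults, but its signature isn't documented here, so mergeRoundRobin below is a hypothetical stand-in. Treating maxResults <= 0 as "no cap" follows the maxResults: 0 runs mentioned elsewhere in this changelog.

```typescript
// Hypothetical stand-in for the actor's merge step; interleaves one item per task per round.
function mergeRoundRobin<T>(perTask: T[][], maxResults: number): T[] {
  const merged: T[] = [];
  const longest = Math.max(0, ...perTask.map((items) => items.length));
  for (let i = 0; i < longest; i++) {
    for (const items of perTask) {
      if (i < items.length) merged.push(items[i]);
    }
  }
  // Cap only after interleaving, so every task contributes before any task gets a second slot.
  return maxResults > 0 ? merged.slice(0, maxResults) : merged;
}
```

With three URLs and maxResults: 4, the output takes one item from each URL before the first URL gets a second slot.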
All URLs in a single run must target the same section (jobs / immobilien /
autos / marktplatz). Mixed-section runs are rejected with a clear error
asking the user to split into separate runs.
Identical URLs are deduped by canonical form (sorted query params, trimmed
trailing slash) so an accidentally-pasted duplicate doesn't fire two
identical SERP pipelines.
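The canonical form described above (sorted query params, trimmed trailing slash) can be sketched with the WHATWG URL API. canonicalUrl and dedupeUrls are hypothetical names, and keeping the first-seen spelling of each URL is an assumption.

```typescript
// Hypothetical helpers; only the two normalization rules come from the changelog entry.
function canonicalUrl(raw: string): string {
  const url = new URL(raw);
  url.searchParams.sort(); // sorted query params
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.replace(/\/+$/, ""); // trimmed trailing slash
  }
  return url.toString();
}

// Keeps the first spelling of each canonically distinct URL.
function dedupeUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of urls) {
    const trimmed = raw.trim();
    const key = canonicalUrl(trimmed);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(trimmed);
    }
  }
  return out;
}
```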
The single-string searchUrl field still works (treated as a one-entry
startUrls array) for backward compatibility with existing schedules.
Fixed — Structural (parity with willhaben-scraper v0.2.4)
Jobs URL filters now extracted from URL. A pasted /jobs/suche URL with
?location=Wien&region=900&employment_type=vollzeit previously dropped
these params silently into extraParams (which the jobs API client doesn't
read). They're now mapped onto the typed jobLocation / jobRegion /
jobEmploymentMode fields so per-URL job filters actually take effect.
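A sketch of the parameter mapping, assuming the three query params from the example URL map one-to-one onto the typed fields. The mapping table and extractJobFilters are hypothetical, and the real actor may recognize more params.

```typescript
// Hypothetical mapping; the param names come from the example URL, the table is an assumption.
interface JobFilters {
  jobLocation?: string;
  jobRegion?: string;
  jobEmploymentMode?: string;
}

const JOB_PARAM_MAP: Record<string, keyof JobFilters> = {
  location: "jobLocation",
  region: "jobRegion",
  employment_type: "jobEmploymentMode",
};

function extractJobFilters(searchUrl: string): { filters: JobFilters; extraParams: Record<string, string> } {
  const url = new URL(searchUrl);
  const filters: JobFilters = {};
  const extraParams: Record<string, string> = {};
  url.searchParams.forEach((value, key) => {
    const field = JOB_PARAM_MAP[key];
    // Known job filters go to typed fields; everything else stays in extraParams.
    if (field) filters[field] = value;
    else extraParams[key] = value;
  });
  return { filters, extraParams };
}
```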
Vienna-local timestamps now carry the correct timezone offset. Naked
firstPublishDate / PUBLISHED_String / expiryDate from the API used to
pass through without normalization, misrepresenting Vienna wall-clock
values. Now appends +01:00 or +02:00 based on Europe/Vienna DST for
the date in question.
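A sketch of the offset lookup using Intl and the IANA Europe/Vienna zone, assuming naked timestamps look like 2026-05-03T14:22 or 2026-05-03T14:22:00. withViennaOffset and viennaOffsetMinutes are hypothetical names.

```typescript
// Vienna's UTC offset (in minutes) at a given UTC instant, read from the IANA zone data.
function viennaOffsetMinutes(utcMs: number): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: "Europe/Vienna",
    hour12: false,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).formatToParts(new Date(utcMs));
  const get = (type: string) => Number(parts.find((p) => p.type === type)?.value);
  // Some engines render midnight as "24" with hour12: false, hence the modulo.
  const wallAsUtc = Date.UTC(get("year"), get("month") - 1, get("day"), get("hour") % 24, get("minute"), get("second"));
  return Math.round((wallAsUtc - utcMs) / 60_000);
}

// Hypothetical helper: appends +01:00 or +02:00 to a naked Vienna wall-clock timestamp.
function withViennaOffset(naked: string): string {
  if (/(Z|[+-]\d{2}:\d{2})$/.test(naked)) return naked; // already zoned
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})(?::(\d{2}))?$/.exec(naked);
  if (!m) throw new Error(`unrecognized timestamp: ${naked}`);
  const [y, mo, d, h, mi, s] = m.slice(1).map((v) => Number(v ?? 0));
  const wallAsUtc = Date.UTC(y, mo - 1, d, h, mi, s);
  // Read the offset at the instant implied by the winter (+01:00) interpretation.
  const offset = viennaOffsetMinutes(wallAsUtc - 60 * 60_000);
  const sign = offset >= 0 ? "+" : "-";
  const hh = String(Math.floor(Math.abs(offset) / 60)).padStart(2, "0");
  const mm = String(Math.abs(offset) % 60).padStart(2, "0");
  return `${naked}${sign}${hh}:${mm}`;
}
```

Reading the offset at the winter-time instant gives the right answer on both sides of each DST switch; in the repeated October hour it resolves to the later (+01:00) occurrence.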
EXPIRED stub items no longer sent to notifications. Per-job stubs only
carry timestamps + listingId — Telegram/Discord/Slack would have rendered
them as (untitled) with no link. Still appear in the dataset; only
notification rendering is skipped.
Telegram messages over 4096 chars now split on blank-line (entry-block)
boundaries, falling back to hard-cut only when a single entry exceeds the cap.
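A sketch of the splitting rule, assuming entries are separated by blank lines and measuring length in JavaScript string units (UTF-16 code units) as an approximation of Telegram's limit. splitMessage is a hypothetical name.

```typescript
const TELEGRAM_LIMIT = 4096;

// Packs blank-line-separated entries into chunks of at most `limit` characters.
function splitMessage(text: string, limit = TELEGRAM_LIMIT): string[] {
  const blocks = text.split(/\n{2,}/);
  const chunks: string[] = [];
  let current = "";
  const flush = () => {
    if (current) chunks.push(current);
    current = "";
  };
  for (const block of blocks) {
    const candidate = current ? `${current}\n\n${block}` : block;
    if (candidate.length <= limit) {
      current = candidate;
      continue;
    }
    flush();
    if (block.length <= limit) {
      current = block;
    } else {
      // A single entry larger than the cap: fall back to hard cuts.
      for (let i = 0; i < block.length; i += limit) chunks.push(block.slice(i, i + limit));
    }
  }
  flush();
  return chunks;
}
```

The hard-cut fallback can split a surrogate pair (an emoji, for example) at a chunk boundary; it only triggers for oversized single entries.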
HTTP 429 is now retryable (was previously bubbled directly to the caller).
Retry delays now include ±20% jitter to break up thundering-herd retries
when many parallel SERP pages hit the same upstream 5xx.
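A sketch of the status check and jittered delay. The exponential schedule, the 500 ms base and the exact 5xx set are assumptions; only 429 being retryable and the ±20% jitter come from the entries above. The injectable random parameter exists so the function can be tested deterministically.

```typescript
// Assumed status set: 429 plus the common transient 5xx codes.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff scaled by a uniform factor in [0.8, 1.2), i.e. ±20% jitter.
function retryDelayMs(attempt: number, baseMs = 500, random: () => number = Math.random): number {
  const nominal = baseMs * 2 ** attempt;
  const factor = 0.8 + random() * 0.4;
  return Math.round(nominal * factor);
}
```

Because each caller draws its own factor, parallel SERP pages that failed together spread their retries across a window instead of hitting the upstream at the same instant.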
Actor.charge failures log at debug level instead of being silently
swallowed — systematic billing breakage is now observable in dev/CI logs.
Empty/whitespace stateKey is now coerced to null so the
automatic state partition fallback fires when the field is left blank in the UI.
Repost detection is now O(1) per current item via a pre-built
content-hash → expired-prior-entry index. Previously this was a linear scan
over the full prior state for every item.
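A sketch of the index, assuming a repost means a new listingId whose contentHash matches an expired prior entry. PriorEntry, buildRepostIndex and findRepost are hypothetical, and keeping the first expired entry per hash is also an assumption.

```typescript
// Hypothetical state shape for illustration.
interface PriorEntry {
  listingId: string;
  contentHash: string;
  status: "ACTIVE" | "EXPIRED";
}

// Built once per run: O(prior) to build, O(1) per lookup afterwards.
function buildRepostIndex(prior: PriorEntry[]): Map<string, PriorEntry> {
  const index = new Map<string, PriorEntry>();
  for (const entry of prior) {
    if (entry.status === "EXPIRED" && !index.has(entry.contentHash)) {
      index.set(entry.contentHash, entry);
    }
  }
  return index;
}

// Same content under a different listingId counts as a repost.
function findRepost(
  index: Map<string, PriorEntry>,
  item: { listingId: string; contentHash: string },
): PriorEntry | undefined {
  const match = index.get(item.contentHash);
  return match && match.listingId !== item.listingId ? match : undefined;
}
```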
Tests
30 new tests (mergeTaskResults, computePerTaskBudget, searchTasks).
Total: 150 tests, all green.
0.2.1 — 2026-05-01
Changed — stateKey is now optional
When incremental mode is enabled, stateKey is no longer required. If omitted, a
stable identifier is auto-generated from your search inputs (section, query,
location, sub-type, price and job filters, plus any params from a pasted searchUrl)
so different searches never share state.
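A sketch of the fallback, assuming the auto key is a SHA-256 digest over a key-sorted serialization of the search inputs. resolveStateKey, stableStringify and the auto- prefix are hypothetical; blank or whitespace-only keys are treated as omitted.

```typescript
import { createHash } from "node:crypto";

// Recursively sorts object keys so equivalent inputs serialize identically.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function resolveStateKey(explicit: string | null | undefined, searchInputs: Record<string, unknown>): string {
  // An explicit key wins; blank or whitespace-only keys count as "not set".
  const trimmed = explicit?.trim();
  if (trimmed) return trimmed;
  const digest = createHash("sha256").update(stableStringify(searchInputs)).digest("hex");
  return `auto-${digest.slice(0, 16)}`;
}
```

Sorting keys before hashing is what makes the identifier stable: the same search entered with fields in a different order lands in the same state partition.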
Migration note: existing schedules with an explicit stateKey keep their prior
state intact. Schedules that previously errored ("stateKey is required") will now
succeed and start fresh state.
0.1.x — 2026-04-14
Added: descriptionHtml, descriptionMarkdown output fields (triple-format descriptions for RAG/LLM pipelines)
Added: contentHash output field (stable hash of content-identifying fields, used for change detection)
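A sketch of a stable content hash, assuming title, price and description are the content-identifying fields and volatile metadata such as a scrape timestamp is excluded. The Listing shape and contentHash function are hypothetical; the actual field set isn't listed here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical listing shape; the real set of content-identifying fields may differ.
interface Listing {
  listingId: string;
  title: string;
  price: number | null;
  description: string;
  scrapedAt: string; // volatile metadata: deliberately excluded from the hash
}

// Hashes only content fields, in a fixed order, so re-scraping an unchanged listing yields the same hash.
function contentHash(listing: Listing): string {
  const identifying = [listing.title.trim(), listing.price, listing.description.trim()];
  return createHash("sha256").update(JSON.stringify(identifying)).digest("hex");
}
```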