Pricing

from $0.89 / 1,000 results

Willhaben Scraper 💰 $0.89/1K — Austria’s Largest Job Portal

Scrape willhaben.at - Austria's largest job portal. Structured salary fields and Austrian VAT/UID numbers for B2B outreach. Incremental mode with NEW/UPDATED/EXPIRED/REAPPEARED + repost detection.

Developer

Black Falcon Data

0.2.9 — 2026-07-05

Changed

More resilient to transient server errors. Search requests now automatically retry when the source site returns a temporary server error, so a brief upstream hiccup no longer fails the whole run.

0.2.8 — 2026-05-11

Added

Output usability fields. Added summary, contactPhone, searchQuery, and searchUrl to every live output record. summary is derived from the job description for quick scanning; contactPhone exposes the first validated phone number as a top-level field; search provenance makes multi-URL and pasted-URL runs auditable per row.
Compact output updated. Compact mode now includes summary, contactEmail, contactPhone, applyUrl, searchQuery, and searchUrl.
Local output audit tool. Added tools/local-output-audit.mjs and refreshed docs/OUTPUT_SAMPLE_AUDIT.md from a broad local Willhaben sample.

Changed

README output example refreshed. The sample now shows high-value populated fields instead of a null-heavy matrix.

0.2.7 — 2026-05-04

Added

Coverage summary footer. Every run now ends with a structured Coverage summary block that makes the full drop-chain visible: reported → fetched → unique → kept-after-filters → emitted. Multi-URL runs additionally get a per-URL table showing exactly which start URLs contributed how many jobs, and tasks whose later pages were skipped because of maxResults are flagged inline (e.g. ⚠ 237 jobs on pages 2–4 skipped (maxResults)). Motivated by emmat's reported scenario where 13 URLs at the default maxResults=25 produced output that looked like only a fraction of the source — the cap was doing exactly what it was documented to do, but the schema description never made that visible. The footer surfaces the full picture so the cause is self-diagnosable from the run log alone.
Pre-fetch warning for skewed maxResults/taskCount ratio. When more than 3 start URLs are combined with a maxResults value below taskCount × 10, the actor now emits a warning before SERP work starts so the user can cancel and re-run with a sensible cap rather than only discovering the truncation after the fact. Suggested raise value scales with task count (taskCount × 50, floor 500).
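The pre-fetch check described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the helper name checkCapRatio and its return shape are assumptions, and only the thresholds (more than 3 start URLs, maxResults below taskCount × 10, suggestion of taskCount × 50 with a floor of 500) come from the entry itself.

```typescript
// Hypothetical helper: name and return shape are assumptions for illustration.
interface CapWarning {
  suggestedMaxResults: number;
  message: string;
}

function checkCapRatio(taskCount: number, maxResults: number): CapWarning | null {
  // maxResults = 0 means unlimited, so there is nothing to warn about.
  if (maxResults === 0) return null;
  // Warn only when more than 3 start URLs share a cap below taskCount x 10.
  if (taskCount <= 3 || maxResults >= taskCount * 10) return null;
  // Suggested value scales with task count: taskCount x 50, floor 500.
  const suggestedMaxResults = Math.max(500, taskCount * 50);
  return {
    suggestedMaxResults,
    message:
      `maxResults=${maxResults} is shared by ${taskCount} start URLs; ` +
      `consider raising it to ${suggestedMaxResults} or using 0 (unlimited).`,
  };
}
```

For the 13-URL scenario with maxResults=25, this sketch would suggest 650.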

Changed

maxResults schema description rewritten to make explicit that the cap is a global total across all start URLs (not per-URL), with a worked example: 13 URLs + maxResults=25 ≈ 2 jobs per URL. Recommends 0 (unlimited) or a high explicit total for full multi-URL coverage.

Tests

37 unit tests for the coverage summary covering: emmat's 13-URL scenario, single-URL no-truncation, cap=0 unlimited with cross-URL overlap, invalid-URLs-rejected, filters dropping with reason breakdown, incremental-mode classification breakdown, empty-SERP rendering, number formatting, and the pre-fetch cap-warning helper across all threshold edge cases. Total: 312 tests, all green.

0.2.6 — 2026-05-03

Fixed — Critical

Phone-number digits no longer leak into salaryMax. Live smoke-test against Senior Java Developer:in (STRABAG) produced salaryMax: 224221491 because parseSalary scanned every digit run in the description, including the contact phone +43 1 224221491. The parser now accepts minBound / maxBound options (defaults: 100 / 10M) that filter implausible candidates before computing min/max. With the fix, the same job now correctly returns salaryMax: 55772 (the real upper bound from "50.908 bis 55.772 brutto/Jahr"). 3 regression tests added.
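A rough sketch of the bounds filter, to show why the phone number drops out. The real parseSalary is not shown in this changelog, so the function name parseSalarySketch, the digit-run regex, and the inclusive bounds are all assumptions; only the minBound / maxBound option names and their defaults (100 / 10M) come from the entry above.

```typescript
interface SalaryBounds {
  minBound?: number;
  maxBound?: number;
}

interface SalaryRange {
  salaryMin: number | null;
  salaryMax: number | null;
}

// Hypothetical stand-in for the actor's parser, for illustration only.
function parseSalarySketch(
  text: string,
  { minBound = 100, maxBound = 10_000_000 }: SalaryBounds = {},
): SalaryRange {
  // Match German-style grouped numbers ("50.908") first, then plain digit runs.
  const runs = text.match(/\d{1,3}(?:\.\d{3})+|\d+/g) ?? [];
  const candidates = runs
    .map((run) => Number(run.replace(/\./g, "")))
    // Drop implausible values (country codes, phone numbers) before min/max.
    .filter((n) => n >= minBound && n <= maxBound);
  if (candidates.length === 0) return { salaryMin: null, salaryMax: null };
  return {
    salaryMin: Math.min(...candidates),
    salaryMax: Math.max(...candidates),
  };
}

const sample = "50.908 bis 55.772 brutto/Jahr. Kontakt: +43 1 224221491";
const fixed = parseSalarySketch(sample);
// Disabling the upper bound reproduces the pre-fix behaviour.
const unbounded = parseSalarySketch(sample, { maxBound: Infinity });
```

With the default bounds, fixed.salaryMax is 55772; with the upper bound disabled, unbounded.salaryMax is the phone number 224221491.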

0.2.5 — 2026-05-03

Fixed — Critical

SERP page-1 failure now propagates instead of silently returning an empty universe. Previously a single 503 on page 1 in incremental mode would mark every prior-state listing as EXPIRED (universe corruption); the run now fails fast so state is preserved.
Retry loop no longer self-cancels. fetchWithRetry now builds a fresh AbortSignal per attempt — the prior single-signal pattern caused the first timeout to abort every subsequent retry instantly. Network-level errors (ECONNRESET, ETIMEDOUT, fetch wrappers) are now retryable, not just retryable HTTP statuses.
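The per-attempt signal pattern might look roughly like the sketch below. It is not the actor's fetchWithRetry: the FetchLike type, the list of retryable statuses, the attempt count, and the linear base delay are assumptions made to keep the example self-contained. The points taken from the changelog are the fresh abort signal per attempt, network-level errors being retryable, and (from 0.2.4) the ±20% jitter.

```typescript
// Minimal fetch-like signature so the sketch does not depend on a specific HTTP client.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ status: number }>;

// Assumed list of retryable statuses; the actor's real list is not documented here.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Spread a base delay over 80%..120% of its value (the ±20% jitter).
function jitteredDelay(baseMs: number, random: () => number = Math.random): number {
  return baseMs * (0.8 + 0.4 * random());
}

async function fetchWithRetrySketch(
  fetchFn: FetchLike,
  url: string,
  attempts = 3,
  timeoutMs = 10_000,
  baseDelayMs = 500,
): Promise<{ status: number }> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // Fresh controller per attempt: a timeout on attempt 1 cannot abort attempt 2.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchFn(url, { signal: controller.signal });
      if (!isRetryableStatus(res.status)) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      // Network-level failures (ECONNRESET, ETIMEDOUT, aborts) are treated as retryable.
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    if (attempt < attempts) {
      const waitMs = jitteredDelay(baseDelayMs * attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw lastError;
}
```

Sharing one controller across the loop is what produced the self-cancelling behaviour: once its signal has aborted, every later attempt that receives it fails immediately.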

Fixed — Important

EXPIRED stubs no longer charged. Billing event now sizes by live items only (NEW/UPDATED/UNCHANGED/REAPPEARED); tombstones are operational data with no business value.
Run-footer headline split. Footer now reports X new/updated jobs exported (live only) plus a separate 🪦 Y expired jobs emitted as stubs line — no more "exported X new/updated" when half the dataset is tombstones.
buildExpiredStub is now typed. EXPIRED stubs were previously returned as OutputItem via an as unknown as cast, which left every field except a handful undefined. They now explicitly null-init every field so dataset views render predictable cells, and adding a new OutputItem field correctly causes a TS error.
State-lock failures throw a plain Error instead of throw await Actor.fail(...) — the prior pattern interfered with the outer try/catch's releaseLock cleanup path on lock-loss.
contactName preserved when detail's contact is empty. When Willhaben occasionally ships { email: ... } with no firstname/lastname, the prior code null-blanked any SERP-side contactName; we now fall through to the existing value.
salaryMax now parsed from description even when API has no salary. The fallback used to require an API min before attempting description parsing; we now parse both min and max from description text when the API returns nothing.

Fixed — Nice-to-have

maxResults default aligned across the actor configuration (was inconsistent before).

Tests

1 smoke-test expectation updated for the default result-limit change. All 201 unit/integration tests green.

0.2.4 — 2026-05-03

Fixed — Critical

applyUrl now populated with the public Willhaben URL instead of always being null. Notifications (Telegram, Discord, Slack, WhatsApp) previously rendered without an apply link; they now include a working URL for every job. The field is held back at null inside the tracked-content hash so existing incremental state stays compatible — no spurious UPDATED flood on the first run after upgrade.
Incremental mode universe is now complete when maxResults is set. Previously only page 1 (~90 ids) per task was used to build universeIds, so jobs on page 2+ were wrongly flagged EXPIRED next run. Incremental mode now lifts the per-task page cap entirely so EXPIRED detection sees the full universe.
compact mode no longer breaks notifications. Compact stripping is now applied only at dataset-write time. Notifications always receive the full record so titles, links, and descriptions render correctly.
Vienna-local timestamps now carry the correct timezone offset. Naked creationDate/postedDate/lastModifiedDate/lastReorderedAt from the API used to get a bare Z suffix that mis-represented Vienna wall-clock as UTC (off by 1–2 hours depending on DST). The actor now appends +01:00 or +02:00 based on Europe/Vienna DST for the date in question.
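One way to choose between +01:00 and +02:00 is to apply the EU summer-time rule directly (last Sunday of March to last Sunday of October). The sketch below does that for a naked wall-clock timestamp; it is an illustration only. The function names and the assumed input shape (ISO-like, no offset, optionally a stray Z) are not taken from the actor, which may use a different mechanism such as Intl.

```typescript
// Day of month of the last Sunday in a month (month is 0-based, as in Date.UTC).
function lastSundayOfMonth(year: number, month: number): number {
  // Day 0 of the next month is the last day of this month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0));
  // getUTCDay(): 0 = Sunday, so subtracting it steps back to the last Sunday.
  return lastDay.getUTCDate() - lastDay.getUTCDay();
}

// Hypothetical helper: appends the Europe/Vienna offset to a naked timestamp.
function withViennaOffset(naive: string): string {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})/.exec(naive);
  if (!m) throw new Error(`Unrecognised timestamp: ${naive}`);
  const year = Number(m[1]);
  // Treat the wall-clock fields as if they were UTC, purely so that all
  // three values can be compared on one axis.
  const wall = Date.UTC(year, Number(m[2]) - 1, Number(m[3]), Number(m[4]), Number(m[5]));
  // Summer time starts at 02:00 local on the last Sunday of March...
  const dstStart = Date.UTC(year, 2, lastSundayOfMonth(year, 2), 2);
  // ...and ends at 03:00 local on the last Sunday of October.
  const dstEnd = Date.UTC(year, 9, lastSundayOfMonth(year, 9), 3);
  const offset = wall >= dstStart && wall < dstEnd ? "+02:00" : "+01:00";
  return naive.replace(/Z$/, "") + offset;
}
```

The repeated hour on the October switch-over day (02:00 to 03:00) is ambiguous in wall-clock form; this sketch resolves it to +02:00.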

Fixed — Important

Identical startUrls are now deduped by canonical form (sorted query params, no trailing slash). An accidentally-pasted duplicate no longer fires a redundant SERP pipeline.
employmentModes is sorted before hashing in TrackedFields, so a re-ordering of the same set no longer triggers a spurious UPDATED classification. Migration note: jobs whose multi-mode order in the API is not already alphabetical may emit one UPDATED on the first run after upgrade; single-mode jobs (the majority) are unaffected.
Per-task fetch budget now scales with task count. With multiple startUrls each task fetches min(maxResults, max(pageSize, ⌈maxResults × 2 / tasks⌉)) rows. Power-user runs with maxResults ≫ pageSize now save SERP requests by not fetching the full cap on every task.
Repost detection is now O(1) per current item via a pre-built index from content hash to expired prior entry. Previously every item triggered a linear scan of priorState.jobs.
Page data parser now matches across newlines. If Willhaben ever ships a pretty-printed payload, parsing won't silently fail.
EXPIRED stub items are now routed through filterCompact when compact: true is set, matching the schema of live items in the dataset.
Empty/whitespace stateKey now falls back to the automatic search-specific state key when the field is left blank in the UI.
EXPIRED stub items are no longer sent to notifications. When emitExpired: true, the per-job stub has only timestamps + jobId — Telegram/Discord/Slack would have rendered them as (untitled) with no link. They still appear in the dataset; only notification rendering is skipped.
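Two of the fixes above lend themselves to a short sketch: canonical-form deduplication of startUrls and the per-task fetch budget formula. The function names, signatures, and the example.com URLs are assumptions for illustration; the canonical form (sorted query params, no trailing slash) and the formula min(maxResults, max(pageSize, ⌈maxResults × 2 / tasks⌉)) are taken from the entries. The maxResults = 0 (unlimited) case is not covered here.

```typescript
// Canonical form: sorted query params, no trailing slash on the path.
function canonicalizeStartUrl(raw: string): string {
  const url = new URL(raw);
  url.searchParams.sort();
  const path = url.pathname.replace(/\/+$/, "");
  const query = url.searchParams.toString();
  return `${url.origin}${path}${query ? `?${query}` : ""}`;
}

// Keep the first occurrence of each canonical URL, preserving input order.
function dedupeStartUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const raw of urls) {
    const key = canonicalizeStartUrl(raw);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(raw);
    }
  }
  return unique;
}

// Rows each task may fetch when several start URLs share one global cap.
function computePerTaskBudgetSketch(maxResults: number, tasks: number, pageSize: number): number {
  const share = Math.ceil((maxResults * 2) / tasks);
  return Math.min(maxResults, Math.max(pageSize, share));
}

const deduped = dedupeStartUrls([
  "https://example.com/search/?region=900&keyword=java",
  "https://example.com/search?keyword=java&region=900", // same search, different spelling
  "https://example.com/search?keyword=python",
]);
```

With maxResults=1000, 13 tasks, and a page size of 90, each task fetches 154 rows instead of the full 1000; with maxResults=25 the budget stays at 25 because the cap is already below one page.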

Fixed — Nice-to-have

Retry delays now include ±20% jitter to break up thundering-herd retries when many parallel SERP pages hit the same 5xx.
HTTP 429 is now retryable (was previously bubbled directly).
Actor.charge failures log at debug level instead of being silently swallowed — systematic billing breakage is now observable.
descriptionHtml field removed — was always null since v0.2.0. Use description (raw, often HTML) or descriptionMarkdown (auto-converted).
Telegram messages over 4096 chars now split on blank-line (job-entry) boundaries, falling back to hard-cut only when a single block exceeds the cap.
parseStartUrl treats present-but-empty params (?keyword=&region=900) as undefined, matching how absent params behave.
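The Telegram splitting rule above (prefer blank-line boundaries, hard-cut only when one block alone exceeds the cap) could be implemented along these lines. The function name is hypothetical, and the sketch measures length in UTF-16 code units via string.length, which is a simplification.

```typescript
// Hypothetical helper illustrating the splitting rule; not the actor's code.
function splitTelegramMessage(text: string, limit = 4096): string[] {
  const chunks: string[] = [];
  let current = "";
  // Each blank-line-separated block is treated as one job entry.
  for (const block of text.split(/\n{2,}/)) {
    const candidate = current === "" ? block : `${current}\n\n${block}`;
    if (candidate.length <= limit) {
      current = candidate;
      continue;
    }
    // Adding this block would overflow: flush what has accumulated so far.
    if (current !== "") chunks.push(current);
    // Hard-cut only when a single block is longer than the cap on its own.
    let rest = block;
    while (rest.length > limit) {
      chunks.push(rest.slice(0, limit));
      rest = rest.slice(limit);
    }
    current = rest;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Small limit to make the behaviour visible.
const parts = splitTelegramMessage("aa\n\nbb\n\ncccc", 3);
```

With a limit of 3, the sample splits into "aa", "bb", "ccc", "c": the first two break on the blank line, and only the oversized last block is hard-cut.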

Tests

51 new tests across transform, searchTasks, notifications, mergeTaskResults, computePerTaskBudget, and a new live multi-URL regression suite that verifies emmat's reported scenario against the real Willhaben API.
Total: 281 tests, all green (incl. 80 live tests).

0.2.3 — 2026-05-03

Fixed — Multi-URL output now distributes across all `startUrls`

With multiple startUrls, the global maxResults cap was filled entirely from the first URL's items because the merge preserved task-1-first insertion order. Users with 13 startUrls and maxResults=5 saw only 5 results, all from URL 1, even though the other 12 URLs ran successfully. Reported by user emmat.
Results are now interleaved round-robin across tasks before the global cap, so every URL contributes at least floor(maxResults / tasks) items.
Per-task pagination still fetches up to maxResults rows so the merger has enough data to interleave from. Cap is applied once globally after merge + filters.
Log line for multi-task runs now shows per-task contributions: T1=2 T2=2 T3=1 ….
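A minimal sketch of round-robin interleaving before a global cap, assuming each task's results are already fetched into an array. The function name and the treatment of cap = 0 as unlimited are assumptions; deduplication and filters, which the actor applies before the cap, are left out.

```typescript
// Hypothetical helper: take one item from each task in turn until the cap is hit.
function interleaveRoundRobin<T>(tasks: T[][], cap: number): T[] {
  const out: T[] = [];
  const longest = Math.max(0, ...tasks.map((t) => t.length));
  for (let i = 0; i < longest; i++) {
    for (const task of tasks) {
      if (i < task.length) {
        out.push(task[i] as T);
        // cap = 0 is treated as "no cap" in this sketch.
        if (cap > 0 && out.length >= cap) return out;
      }
    }
  }
  return out;
}

const merged = interleaveRoundRobin(
  [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
  ],
  5,
);
```

With three tasks and a cap of 5, merged is a1, b1, c1, a2, b2, which corresponds to the T1=2 T2=2 T3=1 contribution pattern in the log line.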

Changed

maxResults prefill in the input schema raised from 5 to 25 so casual UI runs with multiple startUrls see a fair sample from every URL by default.

0.2.2 — 2026-05-01

Changed — `stateKey` is now optional

When incremental mode is enabled, stateKey is no longer required. If omitted, a stable identifier is auto-generated from your search inputs so different searches never share state — narrower runs no longer accidentally mark jobs from broader runs as EXPIRED.
Migration note: existing schedules that already pass an explicit stateKey keep their prior state intact. Schedules that previously errored ("stateKey is required") will now succeed and start fresh state.
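How the automatic key is derived is not documented here. As an illustration of the idea only, the sketch below normalises the search inputs (sorted keys, blank values dropped, arrays sorted) and hashes the result, so the same search always maps to the same key regardless of property order. The SearchInputs type, the auto- prefix, and the use of FNV-1a are all assumptions.

```typescript
type SearchInputs = Record<string, string | number | boolean | string[] | undefined>;

// 32-bit FNV-1a, used here only as a small dependency-free stand-in hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: derive a stable state key from the search inputs.
function autoStateKey(inputs: SearchInputs): string {
  const normalised = Object.keys(inputs)
    .sort() // property order must not change the key
    .filter((key) => inputs[key] !== undefined && inputs[key] !== "")
    .map((key) => {
      const value = inputs[key];
      // Sort array values so their order does not change the key either.
      return [key, Array.isArray(value) ? [...value].sort() : value];
    });
  return `auto-${fnv1a(JSON.stringify(normalised))}`;
}
```

Two runs with { keyword: "java", region: "900" } and { region: "900", keyword: "java" } get the same key, while a run with a different keyword gets its own state.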

0.2.1 — 2026-04-30

Fixed

startUrls now processes all URLs in the array. Previously only the first URL was used; subsequent URLs were silently dropped. Each URL becomes its own search task; results are merged and deduped by job ID across all tasks. Reported by user emmat.

0.2.0 — 2026-04-25

Added — Output fields

companyVatId — Austrian VAT/UID number for direct B2B outreach (~70% populated)
companyActiveAdverts — Count of active job postings per employer (hiring-volume signal, 100% populated)
salaryMin, salaryMax, salaryCurrency, salaryPeriod — Structured salary fields (parsed from API + description text)
countryCode — ISO 3166-1 (always "AT")
locations[] — Multi-location array {name, federalState, country}
isFeatured — Promoted/topJob flag
isFreshlyPosted — 24-48h freshness flag
internalApplicationOnly — Apply via Willhaben vs external
requiresExternalApplication — External application form required
requiresProfessionalExperience — Professional experience required
createdAt, lastReorderedAt — Separate from firstPublishDate/lastModifiedDate
extractedEmails[] — Regex-extracted emails from description text
extractedPhones[] — Defensive phone-number extraction (strict mode default; lenient available)

Added — Inputs

startUrls[] — Paste raw Willhaben search URLs; query params merge with explicit input
sortBy — publish_date_desc (newest) or relevance
salaryMinFilter, salaryMaxFilter — Post-fetch salary range filter (EUR)
whatAnd, whatExclude — Post-fetch keyword AND/NOT filter
emitUnchanged, emitExpired — Incremental emission policy
skipReposts — Drop reposts of previously expired jobs
telegramToken, telegramChatId — Telegram notifications
discordWebhookUrl — Discord notifications
slackWebhookUrl — Slack notifications
whatsappAccessToken, whatsappPhoneNumberId, whatsappTo — WhatsApp Cloud API (free-form, 24h service window)
notificationLimit, notifyOnlyChanges — Notification controls
phoneExtractionMode — strict (default) or lenient

Changed

Full incremental classification — changeType now correctly emits NEW, UPDATED, UNCHANGED, EXPIRED, REAPPEARED (uppercase). Previously only new/updated were generated despite README claims.
Incremental fields populated — firstSeenAt, lastSeenAt, previousSeenAt, expiredAt are now real fields on output records (not just README aspiration).
State-lock on incremental runs — Concurrent runs sharing the same stateKey now refuse with Actor.fail instead of silently corrupting state.
Default memory — 128 MB → 256 MB; max 512 MB → 1024 MB.
Salary structure: legacy salary/salaryTimeFrame retained; new salaryMin/Max/Currency/Period are the canonical fields. salaryMax parsed from description text via salaryParser.
Date timestamps now carry UTC Z suffix (was naked CET/CEST).
descriptionMarkdown actually runs htmlToMarkdown (was passthrough of plain text).
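The five changeType values can be read as a small decision table over the prior state and the current run. The sketch below is one plausible reading, not the logic in _lib/incrementalState.ts: the PriorEntry shape and the exact REAPPEARED rule (same job ID seen again after being marked expired) are assumptions.

```typescript
type ChangeType = "NEW" | "UPDATED" | "UNCHANGED" | "EXPIRED" | "REAPPEARED";

// Assumed shape of a stored entry; only the field names come from the output schema.
interface PriorEntry {
  contentHash: string;
  expiredAt: string | null;
}

// currentHash is undefined when the job is absent from the current run's universe.
function classifyJob(prior: PriorEntry | undefined, currentHash: string | undefined): ChangeType | null {
  if (currentHash === undefined) {
    // Previously live and now missing: emit a tombstone.
    if (prior !== undefined && prior.expiredAt === null) return "EXPIRED";
    // Never seen, or already expired earlier: nothing to report.
    return null;
  }
  if (prior === undefined) return "NEW";
  if (prior.expiredAt !== null) return "REAPPEARED";
  return prior.contentHash === currentHash ? "UNCHANGED" : "UPDATED";
}
```

This is also why a failed or truncated universe is dangerous in incremental mode: every live prior entry missing from the current run falls into the EXPIRED branch.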

Fixed

README claims about firstSeenAt/lastSeenAt/emitUnchanged/UNCHANGED/REAPPEARED/EXPIRED/isRepost are now true.
"Skill tags" → "language skills" in description.

Compliance

Now imports canonical _lib/incrementalState.ts + _lib/stateLock.ts + _lib/notifications.ts + _lib/phoneExtractor.ts (no more hand-rolled simpleHash state).

0.1.x — 2026-04-14

Added: descriptionHtml, descriptionMarkdown output fields (triple-format descriptions for RAG/LLM pipelines)
Added: contentHash output field (stable hash of content-identifying fields, used for change detection)

0.1.x — 2026-04-14

Added: cross-run repost detection (isRepost, repostOfId, repostDetectedAt)
Added: skipReposts input to exclude detected reposts from output

0.1.0 (2026-03-20)

Initial release
Search Austrian job listings on willhaben.at by keyword, location, and filters
Salary, company profile, and contact info extraction
Incremental mode with change detection
Compact output mode for AI-agent and MCP workflows
