Scrape prace.cz, one of the Czech Republic's largest job boards, and extract structured job listings with titles, companies, locations, salaries, employment types, and detailed descriptions. Incremental mode tracks new and changed listings.
Plain-text HTML cleanup now uses the shared workspace helper for entity decoding. This keeps description, descriptionText, and markdown conversion aligned with the same decoding rules.
[0.2.12] - 2026-05-15
Fixed
Plain-text job descriptions now decode HTML entities correctly. For example, employer text such as R&D is no longer emitted as R&amp;D in description / descriptionText; descriptionHtml remains valid HTML.
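Below is a minimal sketch of the decoding order. It assumes plain text is produced by stripping tags first and then decoding entities exactly once with Python's `html.unescape`. `html_to_plain_text` is a hypothetical name, not the actual shared workspace helper, and the regex tag stripper is a simplification of a real HTML parser.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")  # naive tag matcher; a sketch, not a full HTML parser
_WS_RE = re.compile(r"\s+")       # in Python 3, \s also matches the NBSP produced by &nbsp;

def html_to_plain_text(fragment: str) -> str:
    """Hypothetical helper: strip tags first, then decode entities exactly once."""
    # Replace tags with spaces so adjacent block elements do not glue words together.
    text = _TAG_RE.sub(" ", fragment)
    # Decoding after stripping keeps an escaped "&lt;b&gt;" as the literal text "<b>".
    text = html.unescape(text)
    return _WS_RE.sub(" ", text).strip()

print(html_to_plain_text("<p>Oddělení R&amp;D</p><p>Plat&nbsp;od 50&nbsp;000 Kč</p>"))
# → Oddělení R&D Plat od 50 000 Kč
```

descriptionHtml would keep the original escaped markup. Only the plain-text fields go through a step like this.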
[0.2.11] - 2026-05-15
Fixed — cleaner contact and social signals
extractedEmails, extractedPhones, and the contact component of
jobQualityScore now use only employer-authored contact details from the
job ad. Shared page-level contact links are ignored.
socialProfiles now also excludes prace.cz-owned social links, so the
field only surfaces employer-relevant profiles when available.
Fixed — more precise ATS detection
atsProvider / atsUrl detection is now stricter for supported ATS
platforms. Vendor websites are ignored, while employer-specific ATS links
continue to be surfaced.
Changed — clearer runtime logs
Filtering and auto-enrichment logs now use user-facing wording while
preserving structured diagnostics for operators.
[0.2.9] - 2026-05-15
Added — atsProvider + atsUrl output fields
When the employer publishes an apply link inside the job description
pointing to a known Applicant Tracking System (Workday, Greenhouse,
Lever, SmartRecruiters, Recruitee, Personio, BambooHR, Workable,
Teamio, iCIMS, Taleo, Jobvite, JazzHR, Ashby), the actor now surfaces
both the canonical provider name (atsProvider) and the direct apply
URL (atsUrl). Both fields are independent of applyUrl, which still
points to the prace.cz internal reply form. The ATS provider is a
high-signal B2B field: enterprise ATS use correlates with company size
and maturity.
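ATS detection can be expressed as a hostname table. An employer-specific board (a tenant subdomain, or a company path on a shared jobs host) counts as a match. The vendor's own marketing site does not. The host patterns below are illustrative assumptions covering only a few of the listed providers, not the actor's actual rules, and `detect_ats` is a hypothetical name.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Illustrative rules only (assumption): (provider, host pattern, needs a company path segment).
_ATS_RULES = [
    ("Greenhouse", re.compile(r"^(job-)?boards\.greenhouse\.io$"), True),
    ("Lever", re.compile(r"^jobs\.lever\.co$"), True),
    ("SmartRecruiters", re.compile(r"^jobs\.smartrecruiters\.com$"), True),
    ("Workday", re.compile(r"^[a-z0-9-]+\.wd\d+\.myworkdayjobs\.com$"), False),
    ("Recruitee", re.compile(r"^(?!www\.)[a-z0-9-]+\.recruitee\.com$"), False),
    ("Personio", re.compile(r"^[a-z0-9-]+\.jobs\.personio\.(de|com)$"), False),
]

def detect_ats(url: str) -> Optional[Tuple[str, str]]:
    """Return (atsProvider, atsUrl) for an employer-specific ATS link, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lower-cased
    has_company_path = bool(parts.path.strip("/"))
    for provider, host_re, needs_path in _ATS_RULES:
        if host_re.match(host) and (has_company_path or not needs_path):
            return provider, url
    return None  # vendor homepages such as www.greenhouse.io fall through here
```

The sketch returns the URL unchanged as atsUrl, so any job ID or query parameters the employer included are preserved.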
Changed — cleaner companyWebsite values
companyWebsite is now populated only from employer-authored links in
the job ad. This reduces false positives from shared page navigation and
keeps the field null when no external employer website is published.
socialProfiles remains available separately for social links detected
on the job page.
[0.2.6] - 2026-05-15
Fixed
techStack — bare C no longer falsely matches incidental
uppercase letters in Czech descriptions ("verze C", "úroveň C2",
"CNC operátor"). Pattern now requires either a C/C++ pairing or
an explicit programming/language/developer context.
extractedUrls / companyWebsite — namespace URIs leaked from
inline SVG xmlns="..." attributes (e.g. http://www.w3.org/2000/svg,
including the trailing-backslash variant from JSON-escaped HTML)
are now stripped at extraction. companyWebsite additionally skips
recurring CZ/EU ATS/staffing platforms (Recruitee, Greenhouse,
Lever, Workable, SmartRecruiters, Personio, BambooHR, jobspin.cz,
lmc.eu, cvonline.varbamisteenused.ee, teamio.cz) so the field
surfaces the employer's own homepage instead.
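The stricter techStack rule for bare C can be sketched as a regex. It assumes matching runs on plain-text descriptions. The context words (English plus a few Czech stems such as "jazyk/jazyce" and "vývojář") and the two-word gap are assumptions, not the actor's actual vocabulary entry. `mentions_bare_c` is a hypothetical name.

```python
import re

# Assumed context words, matched case-insensitively via a scoped (?i:...) group (Python 3.6+).
_CTX = r"(?i:programming|programov\w*|programátor\w*|language|jazy[kc]\w*|developer|vývojář\w*)"
# A standalone capital C: not C++, C#, C2, or either C in "CNC".
_C = r"(?<![\w+#])C(?![\w+#])"

_BARE_C = re.compile(
    r"\bC\s*/\s*C\+\+"                                  # "C/C++"
    r"|C\+\+\s*/\s*" + _C +                             # "C++/C"
    r"|\b" + _CTX + r"\W+(?:\w+\W+){0,2}?" + _C +        # "vývojáře v jazyce C", "programming in C"
    r"|" + _C + r"\W+(?:\w+\W+){0,2}?" + _CTX            # "Embedded C developer"
)

def mentions_bare_c(text: str) -> bool:
    """True only when C appears in a C/C++ pairing or next to a programming context word."""
    return _BARE_C.search(text) is not None
```

Incidental capitals such as "verze C", "úroveň C2" and "CNC operátor" fail every alternative. C++ and C# would be matched by their own vocabulary entries rather than by this pattern.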
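The two URL fixes can be sketched as a pair of filters. The sketch assumes candidate URLs come from a regex scan of the raw (possibly JSON-escaped) job-ad HTML, and that anything under w3.org is a namespace URI rather than a link. The skip list repeats a subset of the platforms named above and adds prace.cz itself (also an assumption). `extract_urls` and `pick_company_website` are hypothetical names.

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

_URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# Assumed: w3.org URLs in job-ad markup are XML/SVG namespace URIs, not links.
_NAMESPACE_PREFIXES = ("http://www.w3.org/", "https://www.w3.org/")
# Subset of the ATS/staffing hosts named above, plus prace.cz itself (assumption).
_SKIP_HOSTS = ("recruitee.com", "greenhouse.io", "lever.co", "jobspin.cz",
               "lmc.eu", "teamio.cz", "prace.cz")

def extract_urls(raw_html: str) -> List[str]:
    """Hypothetical extractor: unique URLs in order of appearance."""
    urls: List[str] = []
    for match in _URL_RE.finditer(raw_html):
        # JSON-escaped HTML (\"...\") leaves a trailing backslash on the match.
        url = match.group(0).rstrip("\\")
        if url.startswith(_NAMESPACE_PREFIXES):
            continue  # e.g. xmlns="http://www.w3.org/2000/svg"
        if url not in urls:
            urls.append(url)
    return urls

def _host_matches(host: str, domain: str) -> bool:
    return host == domain or host.endswith("." + domain)

def pick_company_website(urls: List[str]) -> Optional[str]:
    """First URL whose host is not an ATS/staffing platform (or prace.cz), else None."""
    for url in urls:
        host = urlsplit(url).hostname or ""
        if not any(_host_matches(host, d) for d in _SKIP_HOSTS):
            return url
    return None
```

Matching on the host suffix (`.recruitee.com`) rather than the full host lets one entry cover every employer tenant on a platform.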
Internal
Replaced literal 40 with PRACE_SERP_PAGE_SIZE in the post-filter
over-fetch cap. Added direct regression tests for the over-fetch
factor so a future refactor can't silently drop the compensation.
Reworded four duplicated input-schema sentences flagged by the
pre-push proofreading check.
[0.2.4] - 2026-05-15
Added — Ship-2a output value fields
region / district — canonical CZ kraj + city slugs derived
from location via the sitemap reverse lookup. A SERP-card-aware
fallback chain handles hyphenated patterns ("Ostrava-Poruba"),
Roman-numeral suffixes ("Klatovy IV"), and dash-separated annexes
("Praha 4 – Modřany"). 100% resolution on the SERP fixture (40/40).
companyWebsite — best-effort external employer homepage
picked from extractedUrls, filtered against an aggregator and
social-platform exclusion list.
includeKeywords / excludeKeywords — post-detail
case-insensitive substring filter over title+description+techStack.
Exclude wins over include.
includeRunMetadata — optional flag controlling whether run-level
metadata is included in notification payloads. Set it to false to omit
the metadata; the default (true) preserves existing behavior.
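The region/district fallback chain can be sketched as successive simplifications of the location string, each tried against a lookup table. The three-entry table stands in for the sitemap reverse lookup, and its kraj/city slugs are illustrative assumptions. The order of the steps is also an assumption, as is the final step that strips a trailing district number. `resolve_region` is a hypothetical name.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Stand-in for the sitemap reverse lookup (illustrative slugs): name -> (kraj, city).
LOOKUP: Dict[str, Tuple[str, str]] = {
    "praha": ("hlavni-mesto-praha", "praha"),
    "ostrava": ("moravskoslezsky-kraj", "ostrava"),
    "klatovy": ("plzensky-kraj", "klatovy"),
}

# Each step simplifies the previous candidate a little further.
_STEPS: List[Callable[[str], str]] = [
    lambda s: s,                                           # try the string as-is first
    lambda s: re.split(r"\s+[–-]\s+", s, maxsplit=1)[0],   # "Praha 4 – Modřany" -> "Praha 4"
    lambda s: re.sub(r"\s+[IVXLC]+$", "", s),              # "Klatovy IV" -> "Klatovy"
    lambda s: s.split("-", 1)[0],                          # "Ostrava-Poruba" -> "Ostrava"
    lambda s: re.sub(r"\s+\d+$", "", s),                   # "Praha 4" -> "Praha" (assumed extra step)
]

def resolve_region(location: str) -> Optional[Tuple[str, str]]:
    """Return (region, district) slugs, or None if no fallback step resolves the name."""
    candidate = location.strip()
    for step in _STEPS:
        candidate = step(candidate).strip()
        hit = LOOKUP.get(candidate.lower())
        if hit:
            return hit
    return None
```

Trying the least destructive simplification first avoids truncating a genuinely hyphenated place name before the exact form has been looked up.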
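The keyword filter can be sketched in a few lines. The changelog states case-insensitive substring matching over title, description and techStack, with exclude winning over include. The any-of (OR) semantics for multiple include keywords is an assumption, and `passes_keyword_filter` is a hypothetical name.

```python
from typing import Iterable, Sequence

def passes_keyword_filter(title: str, description: str, tech_stack: Iterable[str],
                          include: Sequence[str] = (), exclude: Sequence[str] = ()) -> bool:
    """Case-insensitive substring filter; an exclude hit always rejects the job."""
    haystack = " ".join([title, description, *tech_stack]).casefold()
    if any(k.casefold() in haystack for k in exclude):
        return False  # exclude wins over include
    if include:
        return any(k.casefold() in haystack for k in include)  # assumed OR semantics
    return True  # no include keywords: everything not excluded passes
```

Because matching is by substring, an include keyword such as "java" also matches "JavaScript".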
Fixed
Post-detail filter underfill — when any post-detail filter is
active, the actor now over-fetches raw SERP cards by a factor of 3
and caps emission at maxResults. Previously, a minSalaryCzk or
maxAgeMinutes filter could silently emit only ~30% of the requested
target.
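Here is the over-fetch arithmetic. It assumes the factor of 3 multiplies maxResults and that the result is converted into SERP pages of PRACE_SERP_PAGE_SIZE (40) cards with ceiling rounding. `POST_FILTER_OVERFETCH_FACTOR`, `serp_pages_to_fetch` and `cap_emission` are hypothetical names.

```python
import math
from typing import List, TypeVar

T = TypeVar("T")

PRACE_SERP_PAGE_SIZE = 40          # SERP cards per page, per the 0.2.6 entry
POST_FILTER_OVERFETCH_FACTOR = 3   # hypothetical constant name for the factor of 3

def serp_pages_to_fetch(max_results: int, post_filters_active: bool) -> int:
    """Pages needed so that, after filtering, roughly max_results items can remain."""
    factor = POST_FILTER_OVERFETCH_FACTOR if post_filters_active else 1
    return math.ceil(max_results * factor / PRACE_SERP_PAGE_SIZE)

def cap_emission(filtered: List[T], max_results: int) -> List[T]:
    """Over-fetching never means over-emitting: output stays capped at maxResults."""
    return filtered[:max_results]
```

For example, maxResults = 100 with a salary filter gives 300 cards, or 8 pages (7.5 rounded up). The same request without post-detail filters needs 3 pages.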
[0.2.1] - 2026-05-14
Fixed
Raised maxPages automatically when maxResults requires more pages.
[0.2.0] - 2026-05-14
Fixed — location filter now actually filters
Earlier versions sent location as ?l=<city>, which prace.cz silently
ignores. Runs that specified a location were returning unfiltered
nationwide results. The actor now resolves the location to its canonical
prace.cz path segment (e.g. "Praha" → /nabidky/hlavni-mesto-praha/praha/)
sourced from the prace.cz sitemap. Diacritics and case are normalised
automatically. Unknown names fall back to a nationwide search with a
warning — non-breaking for users passing arbitrary strings.
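Normalising the user's location before the sitemap lookup can be sketched with Unicode decomposition. `SITEMAP_PATHS` stands in for the index built from the prace.cz sitemap, and only its Praha entry comes from the text above. The function names and the log message are hypothetical.

```python
import logging
import unicodedata
from typing import Dict, Optional

log = logging.getLogger("location")

# Stand-in for the sitemap index; only the Praha path is taken from the changelog.
SITEMAP_PATHS: Dict[str, str] = {
    "praha": "/nabidky/hlavni-mesto-praha/praha/",
}

def normalise(name: str) -> str:
    """Case-fold, then drop diacritics: NFKD splits "ň" into "n" plus a combining caron."""
    decomposed = unicodedata.normalize("NFKD", name.strip().casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def resolve_location_path(location: str) -> Optional[str]:
    """Canonical path segment, or None (nationwide search) with a warning."""
    path = SITEMAP_PATHS.get(normalise(location))
    if path is None:
        log.warning("Unknown location %r; falling back to a nationwide search", location)
    return path
```

Returning None rather than raising keeps arbitrary strings non-breaking, as described above.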
Added
minSalaryCzk — post-detail filter on JSON-LD baseSalary. Auto-enables
includeDetails.
maxAgeMinutes — post-detail filter on JSON-LD datePosted. Auto-enables
includeDetails. Pair with incremental + notifications for
near-real-time alerts.
jobUrls — direct UUID-keyed detail fetch. 404s and soft-404s
("listing expired" pages) are cached for 30 days so re-runs skip dead
jobs without spending a request.
techStack: string[] on every record — heuristic match against a
versioned vocabulary of ~85 tech keywords (Python, AWS, React, Spring,
...).
isStaffingAgency: boolean on every record — exact match against a
versioned whitelist of ~45 CZ staffing agencies (Grafton, Manpower,
Hays, Adecco, Randstad, ...). Strips legal-form suffixes and diacritics
before lookup.
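The two post-detail checks can be sketched against a parsed JSON-LD JobPosting. The sketch assumes baseSalary follows the schema.org MonetaryAmount shape (value as a QuantitativeValue with minValue/maxValue, or a plain number) and that datePosted is ISO 8601. The changelog does not say which salary bound is compared or how missing and non-CZK salaries are treated. Comparing the upper bound and dropping unusable salaries are assumptions. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

def salary_upper_czk(posting: Dict[str, Any]) -> Optional[float]:
    """Best advertised CZK amount from a schema.org baseSalary, or None."""
    base = posting.get("baseSalary")
    # Assumption: a missing currency is treated as CZK; other currencies are skipped.
    if not isinstance(base, dict) or base.get("currency", "CZK") != "CZK":
        return None
    value = base.get("value")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, dict):  # QuantitativeValue
        for key in ("maxValue", "value", "minValue"):
            if isinstance(value.get(key), (int, float)):
                return float(value[key])
    return None

def passes_post_detail_filters(posting: Dict[str, Any],
                               min_salary_czk: Optional[float] = None,
                               max_age_minutes: Optional[int] = None,
                               now: Optional[datetime] = None) -> bool:
    if min_salary_czk is not None:
        upper = salary_upper_czk(posting)
        if upper is None or upper < min_salary_czk:
            return False  # assumption: listings without a usable salary are dropped
    if max_age_minutes is not None:
        raw = posting.get("datePosted")
        if not raw:
            return False
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        posted = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if posted.tzinfo is None:
            posted = posted.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        now = now or datetime.now(timezone.utc)
        if now - posted > timedelta(minutes=max_age_minutes):
            return False
    return True
```

Both checks need fields that exist only on the detail page, which is why either filter auto-enables includeDetails.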
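The isStaffingAgency lookup can be sketched as: case-fold, strip diacritics, drop a trailing Czech legal-form suffix, then test exact set membership. The three whitelist entries and the suffix pattern (s.r.o., a.s., spol. s r.o. only) are illustrative assumptions, not the actor's versioned data. The function names are hypothetical.

```python
import re
import unicodedata

# Illustrative entries only, stored already normalised.
STAFFING_AGENCIES = {"grafton recruitment", "manpowergroup", "randstad"}

# Assumed legal-form suffixes, matched after diacritics are removed.
_LEGAL_FORM = re.compile(
    r"[\s,]+(s\.?\s?r\.?\s?o\.?|a\.?\s?s\.?|spol\.?\s+s\s+r\.?\s?o\.?)$"
)

def normalise_company(name: str) -> str:
    """Case-fold, remove diacritics, then strip one trailing legal-form suffix."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return _LEGAL_FORM.sub("", plain.strip()).strip()

def is_staffing_agency(company: str) -> bool:
    """Exact match against the whitelist after normalisation."""
    return normalise_company(company) in STAFFING_AGENCIES
```

Exact matching keeps false positives low, but any variant the whitelist does not list (for example a longer subsidiary name) will not match.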
Changed — incremental state semantics
stateKey auto-derivation now factors in employmentType, jobUrls,
minSalaryCzk, and maxAgeMinutes so different filter values get
isolated state automatically. Existing
incremental runs with auto-derived stateKey will see a one-time
rotation on first execution after upgrade — yesterday's active jobs
classify as NEW once, then resume normal incremental behaviour.
Users who set stateKey explicitly are unaffected.
Post-detail filters are now applied before incremental classification.
Raising minSalaryCzk between runs now correctly classifies jobs that
no longer match as EXPIRED from the user's point of view, instead of
leaving them stuck as active in state.
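Classification after filtering can be sketched as a diff between the previous state (job id mapped to a content hash) and the jobs that survived this run's post-detail filters. The hashed fields, the UNCHANGED bucket and the state shape are assumptions. NEW and EXPIRED appear in the text above, and CHANGED follows from incremental mode tracking changed listings. The function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Tuple

def content_hash(job: dict) -> str:
    """Hash only the fields that should count as a change (assumed field set)."""
    relevant = {k: job.get(k) for k in ("title", "salary", "description")}
    payload = json.dumps(relevant, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def classify(previous: Dict[str, str],
             filtered_jobs: List[dict]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Return (buckets of job ids, new state); filtered_jobs must already be post-filtered."""
    buckets: Dict[str, List[str]] = {"NEW": [], "CHANGED": [], "UNCHANGED": [], "EXPIRED": []}
    current: Dict[str, str] = {}
    for job in filtered_jobs:
        job_id, h = job["id"], content_hash(job)
        current[job_id] = h
        if job_id not in previous:
            buckets["NEW"].append(job_id)
        elif previous[job_id] != h:
            buckets["CHANGED"].append(job_id)
        else:
            buckets["UNCHANGED"].append(job_id)
    # Active last run but filtered out (or gone) now: EXPIRED from the user's point of view.
    buckets["EXPIRED"] = [job_id for job_id in previous if job_id not in current]
    return buckets, current
```

Because filtering happens before the diff, a job dropped by a raised minSalaryCzk ends up in EXPIRED instead of lingering in state.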
Removed
includeCompanyProfile input — the feature was advertised but never
implemented; prace.cz doesn't expose company-registry data natively.
[0.1.x] - 2026-05-01
Changed — stateKey is now optional
When incremental mode is enabled and stateKey is omitted, the actor now
auto-derives a stable identifier from your search inputs (keyword, location,
startUrls, filters). Different filter combinations get isolated state
automatically — no more accidental cross-pollution between runs that
fetched different universes.
Existing runs that explicitly set stateKey are unaffected — your value
still wins.
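stateKey auto-derivation can be sketched as hashing a canonical JSON rendering of the search inputs. Any change to a filter value then yields a different key, while key order and empty optional inputs do not. The input names come from the entries in this changelog. The "auto-" prefix, the 16-character hash length and the empty-value rule are assumptions, and `derive_state_key` is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Dict, Optional

# Inputs this changelog names as feeding the auto-derived key.
KEY_INPUTS = ("keyword", "location", "startUrls", "employmentType", "jobUrls",
              "minSalaryCzk", "maxAgeMinutes")

def derive_state_key(run_input: Dict[str, Any], explicit: Optional[str] = None) -> str:
    """An explicit stateKey wins; otherwise hash the canonicalised search inputs."""
    if explicit:
        return explicit
    # Drop unset/empty inputs so omitting a field and passing [] give the same key.
    relevant = {k: run_input[k] for k in KEY_INPUTS if run_input.get(k) not in (None, "", [])}
    canonical = json.dumps(relevant, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return "auto-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Adding a field to KEY_INPUTS changes every derived key. That is why the 0.2.0 change caused the one-time rotation described there.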
[0.1.0] - 2026-04-22
Added
Initial release
SERP extraction: title, company, location, salary, employment type from prace.cz search results
Detail enrichment: full description (HTML + Markdown), postedAt, validThrough, salary (min/max/currency), education level, benefits, apply URL from JSON-LD JobPosting