Phase 3 - price history tracking (the feature that closed the
biggest single competitive gap, against martas_kristof).
- enableHistory boolean (default off, opt-in). When on, the
actor persists a per-listing state map across runs in the actor's
key-value store.
- historyStoreId string, optional. Name of an Apify Key-Value
Store; lets multiple cron tasks polling different filters share
history of the same property pool. Default uses the actor's own KVS.
New output fields (only present when enableHistory: true):
- isNew - listing wasn't in state before this run.
- priceChanged - current price differs from the last
observation. Only true when both sides are real numbers; transitions
to/from priceHidden don't trip the flag.
- previousPriceCzk - the prior observation when price changed,
null otherwise.
- daysTracked - days since first observation (>=0).
- firstSeenAt - ISO timestamp of the first observation, stable
across all subsequent runs.
- KVS state size is checked at save time; warns if it crosses 8 MB
(the 9 MB Apify per-key limit is non-negotiable). Hidden->revealed
price transitions retain the last known number rather than
overwriting with null.
- _compute_history_fields performs no I/O and never raises, but it
does mutate state in place (so it is not strictly pure); a malformed
firstSeenAt falls back to 0 days + the current timestamp.
- 7 new tests (first observation, repeat unchanged, price drop, hidden
transition, malformed firstSeenAt fallback, int->str id normalization,
defensive against non-dict state). Total: 76, all passing.
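
The field semantics described for this phase can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's code: the function name `compute_history_fields`, the per-listing state layout (`firstSeenAt`, `lastPriceCzk`) and the argument names are hypothetical.

```python
from datetime import datetime, timezone


def compute_history_fields(state, listing_id, price_czk, now=None):
    """Return history fields for one listing; update `state` in place when it is a dict.

    Hypothetical layout: state[str(id)] = {"firstSeenAt": iso_str, "lastPriceCzk": number | None}.
    """
    now = now or datetime.now(timezone.utc)
    key = str(listing_id)  # int and str ids map to the same entry
    entry = state.get(key) if isinstance(state, dict) else None
    is_new = not isinstance(entry, dict)
    if is_new:
        entry = {"firstSeenAt": now.isoformat(), "lastPriceCzk": None}

    try:
        first_seen = datetime.fromisoformat(entry["firstSeenAt"])
        if first_seen.tzinfo is None:
            first_seen = first_seen.replace(tzinfo=timezone.utc)
    except (KeyError, TypeError, ValueError):
        # Malformed timestamp: fall back to 0 days + current timestamp.
        first_seen = now
        entry["firstSeenAt"] = now.isoformat()

    last_price = entry.get("lastPriceCzk")
    both_numeric = isinstance(price_czk, (int, float)) and isinstance(last_price, (int, float))
    price_changed = both_numeric and price_czk != last_price

    if isinstance(price_czk, (int, float)):
        # A hidden price (None) leaves the last known number untouched.
        entry["lastPriceCzk"] = price_czk
    if isinstance(state, dict):
        state[key] = entry

    return {
        "isNew": is_new,
        "priceChanged": price_changed,
        "previousPriceCzk": last_price if price_changed else None,
        "daysTracked": max(0, (now - first_seen).days),
        "firstSeenAt": entry["firstSeenAt"],
    }
```

In this sketch a non-dict state is treated as empty, so the listing is reported as new and nothing is persisted for it.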
Phase 2 hardening after a Šťoura review (verdict 7/10 -> 10/10 over
two rounds). No new features; production-readiness fixes.
- Suggest-API resolver no longer fakes a match for city quarters.
_pick_locality previously preferred quarter_cz over municipality_cz
in its category priority, so _resolve_city("Praha") could land on
a Holešovice/Vinohrady quarter instead of all-of-Praha. Worse, for
inputs like Praha 6 Sreality returns quarter_cz with
district_id=0, which our code accepted - producing a filterless,
nationwide scrape instead of erroring out. The resolver now walks
Sreality's own ranking and returns the first match with a real
region_id or district_id; quarter/ward results without IDs raise an
actionable error suggesting URL-paste mode instead, with a URL
hint composed from the user's category/offerType inputs.
- Suggest API and resume state load handle errors gracefully.
The suggest call now uses the same 3-attempt retry/backoff helper
as bootstrap (no more single-flight failure on transient blips).
Resume state with malformed page/scraped values no longer crashes
the run with a TypeError - it falls back to (1, 0, 'fresh') with a
warning log.
- POI distances prefer walkDistance over straight-line distance.
Sreality computes both via mapy.cz; walkDistance reflects the
actual route a resident would walk and is consistently larger
than the straight-line value (typically 1.3-1.7x). Falls back to
distance when Sreality didn't compute the route.
- HTTP 429 (rate limit) honors the Retry-After header. Previously
it was treated as a generic HTTPError and retried with the standard
2/4/8s backoff - that's too aggressive if Sreality returns a
longer cool-down window.
- api_params log values longer than 48 chars are truncated. Defensive
against any future opaque token a user might wire into apiParams.
- _resume_from_state and _parse_retry_after extracted as pure
helpers, unit-tested in isolation. Resume status is reported with a
string tag so the main-loop logging stays grouped.
- 15 new tests (POI walkDistance preference + closest-across-items,
resume helper across all 4 status codes incl. clamping, retry-after
parsing edge cases). Total: 69, all passing.
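
A Retry-After parser along the lines of the helper mentioned above might look like this. It is a sketch only: the name `parse_retry_after`, the `max_wait` clamp of 300 seconds and the `None` return for unparseable input are assumptions, not the actor's actual behavior. The header may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None, max_wait=300.0):
    """Return the number of seconds to wait, or None if the header is absent or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds form, e.g. "Retry-After: 120"
        seconds = float(value)
    else:
        # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    # Clamp: never negative, never longer than the (assumed) upper bound.
    return min(max(seconds, 0.0), max_wait)
```

A caller would presumably fall back to the standard 2/4/8s backoff whenever this returns `None`.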
Phase 2 - data quality + UX expansion + crash recovery.
Three new fields turn Palindrom into a structured-input scraper as
well as a URL-paste one:
- city - free-text Czech city, district or street. Resolved via
Sreality's own suggest API (/api/cs/v2/suggest), so accents and
spelling variants are handled automatically. Verified on probe
queries praha, Praha, plzen, Plzeň.
- category - enum byty / domy / pozemky / komercni / ostatni.
Used together with city.
- offerType - enum prodej / pronajem / drazba. Used together
with city.
The actor now requires at least one of: searchUrl, apiParams, or
city.
- houseNumber - parsed from the street component of localityRaw,
handles trailing letters (12a) and split numbers (3/5).
- 6 POI distance fields (only present in includeDetails mode):
poiTransportDistanceM, poiSchoolDistanceM, poiGroceryDistanceM,
poiDoctorsDistanceM, poiRestaurantDistanceM,
poiLeisureDistanceM. Values are the closest match per category in
meters (straight-line). Sreality already collects these via mapy.cz
/ firmy.cz; we just surface them.
- resumeFromLastRun input (default off). When enabled, the
actor persists {signature, page, scraped} to its key-value store
after every page push. A subsequent run with the same input
signature picks up where the crashed/aborted run left off. The
signature hashes searchUrl + city + category + offerType + sort +
apiParams, so changing any filter dimension forces a fresh start.
State is cleared on successful completion to avoid resuming past
EOF.
- 16 new unit tests (suggest API priority order, signature
stability + input-change detection, POI extraction edge cases,
street/number splitting). Total: 52, all passing.
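
The input signature used by resumeFromLastRun could be computed along these lines. The changelog only names the hashed fields; the function name `input_signature`, the use of SHA-256 and the canonical-JSON encoding are assumptions made for this sketch.

```python
import hashlib
import json

# Fields named in the changelog as part of the signature.
SIGNATURE_KEYS = ("searchUrl", "city", "category", "offerType", "sort", "apiParams")


def input_signature(actor_input):
    """Hash only the filter-relevant inputs so unrelated settings don't force a restart."""
    relevant = {key: actor_input.get(key) for key in SIGNATURE_KEYS}
    # sort_keys makes the encoding independent of dict insertion order,
    # including inside a nested apiParams dict.
    canonical = json.dumps(relevant, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, changing `city` or any key inside `apiParams` yields a different signature, while changing an unrelated input such as `maxResults` does not.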
Phase 1 quick wins after competitive feature audit (15 Apify Store actors).
- Per-page error counter (was: cumulative). The previous code tracked
errors globally across all search pages, so 5 random transient
failures spread across a 100-page run would abort everything. Errors
now reset on each successful page; a page is retried 3 times before
it is skipped, and the run aborts only if 3 pages in a row fail.
- pricePerSqm - precomputed priceCzk / areaM2 (round to int),
null when price is hidden or area is unknown. Matches a feature 5
of 15 competitor scrapers expose.
- agentUrl - direct link to the agency profile on
sreality.cz/adresar/<slug>/<id>, lowercase slug verified against a
live 301 redirect.
- sort - optional override for search ordering. Enum values
verified against the Sreality JSON API:
-date / +date / -price_norm / +price_norm. Empty (default)
keeps whatever ordering the search URL specifies.
- apiParams - power-user bypass for the HTML bootstrap. When
provided, the actor calls /api/cs/v2/estates directly with these
snake_case params and skips the 725 KB search-page download.
Useful for cron-driven repeat polls of a known filter.
- Detail fetches use 3 retries with exponential backoff (was: single
attempt). A flaky proxy IP no longer drops detail data for the
affected listing.
- 6 named views in dataset_schema.json (was: 1):
overview, pricing, location, media, agent, compact. Apify Console
now lets users switch perspectives without exporting the full row.
- 8 new unit tests (pricePerSqm math, hidden-price handling,
no-area handling, _build_agent_url slug/id edge cases).
- Total: 36 tests, all passing.
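
The pricePerSqm rule from this phase can be illustrated with a small sketch. The function name `price_per_sqm` is hypothetical, and treating non-positive values as "hidden" or "unknown" is an assumption; the changelog only says the field is null when the price is hidden or the area is unknown.

```python
def price_per_sqm(price_czk, area_m2):
    """Return priceCzk / areaM2 rounded to an int, or None when either side is unusable."""
    if not isinstance(price_czk, (int, float)) or not isinstance(area_m2, (int, float)):
        return None  # hidden price or unknown area
    if price_czk <= 0 or area_m2 <= 0:
        return None  # assumed: non-positive values mean "no real value" (also avoids division by zero)
    return round(price_czk / area_m2)
```

Note that Python's `round` rounds exact halves to the nearest even integer, which is harmless at CZK-per-square-meter magnitudes.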
- Renamed actor to palindrom (slug janmatejka/palindrom).
Title becomes "Sreality Palindrom Scraper". The name is a small joke:
the word "palindrom" is itself not a palindrome.
- Removed hardcoded Decodo credentials from scripts/measure_economics.py.
The script now reads DECODO_USER / DECODO_PASS / DECODO_HOST from
the environment and skips proxy scenarios when they're not set. The
previously-hardcoded credentials must be rotated on the Decodo
dashboard - they appeared in builds 0.1 - 0.5 of the old actor.
- Search-page error handling now catches JSONDecodeError in addition to
httpx.HTTPError. If a proxy ever returns malformed HTML in place of
JSON, the run retries (up to 5 times with exponential backoff) instead
of crashing with an unhandled exception.
- Cosmetic cleanup: removed em-dashes from all repo files (CZ style guide).
- Up-version for trackability.
- Auction listings (category_type_cb=3) now produce a working URL.
Sreality uses /detail/drazby/... (plural slug) for the URL while the
offerType field stays drazba (singular Czech word). Two separate
maps handle the difference.
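
The two-map approach for auctions can be pictured as below. Only the auction entries (3 -> drazba for the field, 3 -> drazby for the URL) come from this changelog; the entries for codes 1 and 2, the map names and the helper name are assumptions for illustration.

```python
# category_type_cb -> value of the offerType output field.
# Only 3 -> "drazba" is stated in the changelog; 1 and 2 are assumed.
OFFER_TYPE_BY_CB = {1: "prodej", 2: "pronajem", 3: "drazba"}

# category_type_cb -> slug used in the public /detail/<slug>/... URL.
# Only 3 -> "drazby" is stated in the changelog; 1 and 2 are assumed.
URL_SLUG_BY_CB = {1: "prodej", 2: "pronajem", 3: "drazby"}


def offer_type_and_url_prefix(category_type_cb):
    """Return (offerType, URL prefix), or (None, None) for an unknown code."""
    offer_type = OFFER_TYPE_BY_CB.get(category_type_cb)
    slug = URL_SLUG_BY_CB.get(category_type_cb)
    if offer_type is None or slug is None:
        return None, None
    return offer_type, f"/detail/{slug}/"
```

Keeping the maps separate means the output field and the URL can diverge for one code without special-casing the other codes.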
This release fixes critical data-integrity bugs and ships substantial
performance improvements based on a thorough review.
- priceUnit field added. Rental listings carry priceUnit: "za měsíc"
while sales have priceUnit: "". Without this, rental prices look like
total prices in the dataset.
- subtype now covers all 44 Sreality subcategories, not just the 13
flat dispositions. Houses, plots, commercial properties and "other"
(garage, wine cellar, mobile home, ...) all get a meaningful subtype.
- Public listing URL works for every property type, not just flats.
Verified live against Sreality on 50+ listings spanning 5 categories
and 29 subtypes (100% HTTP 200).
- subtypeId exposed for users who prefer Sreality's numeric ID.
- Labels flattened. Was [["panel"], []], now ["panel"].
- Image gallery in high resolution. images are 1200x900 (was 400x300
thumbnails); thumbnails preserved in imageThumbnails.
- Locality parsing handles multi-comma addresses robustly.
- pageSize default 60 -> 500. Sreality accepts up to 500 listings
per request; 1000 listings now requires 2 search requests (was 17).
- Concurrent detail fetching with configurable detailConcurrency
(default 10). Detail-mode runs are roughly 5x faster.
- Inter-page pacing removed. Sreality has no detectable rate limit.
- Bootstrap retries up to 3 times with exponential backoff on
network/parse errors.
- 403 / 429 distinguished from schema-break errors. Users get
actionable error messages ("proxy IP is likely blocked" vs "Sreality
may have changed their frontend").
- Safety cap of 100 000 listings when maxResults=0 (unlimited)
prevents runaway billing on multi-million-result searches.
- 28 unit tests covering filter conversion, locality parsing, URL
building, normalization edge cases.
- scripts/discover_slugs.py re-verifies the URL-slug mapping against
live Sreality data.
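
Bounded concurrent detail fetching of the kind detailConcurrency controls is commonly built on a semaphore. The sketch below shows that pattern; whether the actor uses asyncio.Semaphore is an assumption, and `fetch_details` and `fetch_one` are hypothetical names.

```python
import asyncio


async def fetch_details(listing_ids, fetch_one, concurrency=10):
    """Fetch details for all ids with at most `concurrency` requests in flight.

    `fetch_one` is any coroutine function taking a listing id (hypothetical here).
    Results come back in the same order as `listing_ids`.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(listing_id):
        # Each task waits here until one of the `concurrency` slots frees up.
        async with semaphore:
            return await fetch_one(listing_id)

    return await asyncio.gather(*(guarded(listing_id) for listing_id in listing_ids))
```

A run would call something like `asyncio.run(fetch_details(ids, fetch_one, concurrency=10))`; raising the limit trades speed against load on the proxy and on Sreality.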
Initial pilot release. Search URL bootstrapping via __NEXT_DATA__,
basic field extraction, Decodo BYOP support, single Praha-byty smoke
test.