Extract wine ratings, prices, taste profiles, reviews, and grape varieties from Vivino. Search by wine name or URL. Fast HTTP-only approach with no browser needed. Export JSON, CSV, or Excel.
All notable changes to Vivino Wine Data Scraper are documented here.
v0.4.76 (2026-06-19)
Changed — log hygiene: the Console "errors" tab now reflects real failures
Downgraded expected multi-strategy / escalation probing noise from warning to debug
(no-match-in-explore, Strategy 2.3 shortened-query skips, speculative winery-page probe failures,
transient single-attempt retries). Kept warning for genuine terminal failures and upstream 429
rate-limiting. The api-client retry log is split by status (429 → warning, 5xx → debug) to match
html-client. No matching / billing / retry logic changed; recordRateLimitTrip() accounting intact.
This makes the Apify Console "errors" tab show the real error rows (e.g. ~23) instead of 160+
cascade-noise warnings on escalation-heavy name-search runs.
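The status split for the retry log fits in a small pure function. This is a minimal sketch, not the Actor's code. `retryLogLevel` and the `LogLevel` type are hypothetical names. Treating any other retried status as a warning is an assumption that the entry above does not state.

```typescript
// Hypothetical log-level type; the Actor's real logger is not modeled here.
type LogLevel = 'warning' | 'debug';

// Hypothetical helper: choose the log level for a retried HTTP status.
// 429 (upstream rate limiting) stays a warning so it shows in the Console errors tab;
// transient 5xx retries are expected noise and drop to debug.
function retryLogLevel(status: number): LogLevel {
  if (status === 429) return 'warning';
  if (status >= 500 && status <= 599) return 'debug';
  // Assumption: any other retried status is kept visible as a warning.
  return 'warning';
}
```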
generateWinerySlugs now applies the -fils/-et-fils suffix to the FORWARD producer-truncation
slugs (e.g. jean-claude-bachelet → jean-claude-bachelet-fils), not only to the reversed/rotated
cores. Estates indexed as "X & Fils" on Vivino but written without "& Fils" in the catalog (e.g.
Jean Claude Bachelet & Fils — Sous le Puits, Murgers des Dents de Chien) now resolve. Advanced-only;
basic benefits via the C1 escalate-on-miss path. High-priority placement within the slug set (cap
raised 14→16 to keep both the new -fils slugs and the existing domaine-/maison- variants).
Wrong-producer bills stay guarded by the existing resolveFromWinery match gates (gate A/B).
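One way to get high-priority placement for the forward `-fils` variants is to emit each suffixed slug immediately after its base slug, before the cap is applied. Lower-priority slugs then cannot crowd the variants out. This is a hedged sketch. `withFilsVariants` is a hypothetical helper, not the real `generateWinerySlugs`, and the exact ordering is an assumption.

```typescript
const SLUG_CAP = 16; // cap raised from 14 to 16 per the entry above

// Hypothetical helper: place "-fils" and "-et-fils" variants right after each
// forward producer-truncation slug, dedupe, then apply the cap.
function withFilsVariants(forwardSlugs: string[], cap: number = SLUG_CAP): string[] {
  const out: string[] = [];
  const seen = new Set<string>();
  const push = (slug: string): void => {
    if (!seen.has(slug)) {
      seen.add(slug);
      out.push(slug);
    }
  };
  for (const slug of forwardSlugs) {
    push(slug);
    // A slug already ending in "-fils" (including "-et-fils") gets no extra suffix.
    if (!slug.endsWith('-fils')) {
      push(`${slug}-fils`);
      push(`${slug}-et-fils`);
    }
  }
  return out.slice(0, cap);
}
```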
v0.4.74 (2026-06-17)
Fixed — C1: basic name-search misses escalate to an advanced retry
In matchingMode: basic, a name search that finds nothing now retries once in advanced
before emitting NAME_SEARCH_ERROR. This recovers grower/estate wines that only the advanced
cascade surfaces (winery-page wine-list parsing / Strategy 1.5) — e.g. Domaine des Tours,
Domaine Fourrier, Pierre Peters, Simon Colin, Jamet. Zero regression on already-resolving
wines (the retry only runs when basic returns null). New escalatedRecoveries count in the
run summary; recovered rows carry a +escalated diagnostic strategy tag (stripped from output).
429 guard: the escalation is skipped when a 429 was tripped during the basic search — such
a rate-limited miss is left to the second-pass cooldown retry (preserving the 2nd-retry-429
recovery path) rather than immediately re-hammering a throttled endpoint.
Cost: a genuine basic miss now costs an extra advanced cascade (only on misses). ON by default.
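The escalate-on-miss flow and its 429 guard reduce to a short decision. This is a minimal synchronous sketch. It assumes a hypothetical `search` callback that also reports how many 429 trips it saw. The real search is asynchronous, and its signature does not appear in this changelog.

```typescript
type Mode = 'basic' | 'advanced';

// Hypothetical outcome shape: a match (or null) plus 429 trips seen during this search.
interface SearchOutcome<T> {
  result: T | null;
  rateLimitTrips: number;
}
type Search<T> = (query: string, mode: Mode) => SearchOutcome<T>;

function searchWithEscalation<T>(
  query: string,
  mode: Mode,
  search: Search<T>,
): { result: T | null; escalated: boolean } {
  const first = search(query, mode);
  // Hits, and runs already in advanced mode, never escalate.
  if (first.result !== null || mode !== 'basic') return { result: first.result, escalated: false };
  // 429 guard: a rate-limited miss is left to the end-of-run cooldown retry.
  if (first.rateLimitTrips > 0) return { result: null, escalated: false };
  const retry = search(query, 'advanced');
  return { result: retry.result, escalated: retry.result !== null };
}
```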
v0.4.73 (2026-06-16)
Added — opt-in proxy for concurrency survival (throughput / 429)
New proxyConfiguration input (off by default). When several runs hit Vivino in parallel,
they previously shared the Actor's single outbound datacenter IP and hit a shared 429 wall.
Enabling the proxy rotates a fresh IP per request, so concurrent runs no longer cannibalize one
IP. When enabled without a group, the RESIDENTIAL pool is coerced (Vivino often rejects
datacenter IPs). Off by default = current behavior, no imposed bandwidth cost.
Adaptive cooldown: with the proxy active, the shared rate-limiter cooldown drops from 15s to
~1s (each rotating IP sees almost no traffic), reclaiming throughput. With the proxy off, the 15s
cooldown is unchanged.
The run summary (last-run-summary KV) now reports proxy.enabled.
Changed
HTTP transport migrated from native fetch to got-scraping (new src/http.ts single-shot
transport routed through the opt-in proxy). With the proxy off, behavior is preserved (better TLS
fingerprints, never worse against 429). The shared rate-limiter cooldown is now mutable
(RateLimiter.setCooldown in wine-core). 445 tests.
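The following is a hedged sketch of the proxy-dependent settings. `RateLimiter` is a minimal stand-in that models only the mutable cooldown. `effectiveProxyGroups` and `configureCooldown` are hypothetical helpers. The ~1 s figure comes from the entry above, but the exact constant is an assumption.

```typescript
// Minimal stand-in for the shared limiter: only the mutable cooldown is modeled.
class RateLimiter {
  private cooldownMs: number;
  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }
  setCooldown(ms: number): void {
    this.cooldownMs = ms;
  }
  getCooldown(): number {
    return this.cooldownMs;
  }
}

const DEFAULT_COOLDOWN_MS = 15_000; // proxy off: unchanged 15 s cooldown
const PROXY_COOLDOWN_MS = 1_000; // proxy on: ~1 s (assumed constant)

// Hypothetical: proxy off means no proxy at all; proxy on without a group coerces RESIDENTIAL.
function effectiveProxyGroups(enabled: boolean, groups?: string[]): string[] | null {
  if (!enabled) return null;
  return groups && groups.length > 0 ? groups : ['RESIDENTIAL'];
}

function configureCooldown(limiter: RateLimiter, proxyEnabled: boolean): void {
  limiter.setCooldown(proxyEnabled ? PROXY_COOLDOWN_MS : DEFAULT_COOLDOWN_MS);
}
```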
v0.4.72 (2026-06-14)
Added — P4: vintage fidelity on the name/explore path (test 2026-06-14)
The explore API returns a REPRESENTATIVE vintage (often ≠ the one requested) → price/rating reflected the
wrong year on name searches (measured 70–87% agreement vs 100% on the URL path). Now, when a vintage is
requested and the explore-resolved vintage differs, reconcileVintage re-fetches the wine page with
?year=requested and overlays the vintage-specific fields (vintage, price, currency, rating, image, url).
Anti-mislabel guard (fail-closed): the page's displayed vintage is extracted from the year-prefixed
JSON-LD product name ("2010 Raveneau…"), and the overlay is applied ONLY IF that equals the requested year.
If the year is absent on Vivino (page falls back to wine-level, no year prefix), the fetch fails, or the
URL carries no seo-name → the explore row is returned UNCHANGED. A year-X row is never labelled year-Y.
Bonus: placing reconcile upstream of dedup makes the wineId::vintage key more accurate — two inputs for
different vintages of the same wine that explore collapsed to one representative vintage are now billed
distinctly. Cost: one extra fetch per explore-resolved wine whose vintage differs from the request.
Note: a vintage-mismatched row may now show price:null (correct year, no offer) where it previously
showed a wrong-year price — this is intended (BI correctness); the price null-rate drift metric may tick up
legitimately. TDD: 6 cases (parser extraction + 5 reconcile branches). Adversarial review SHIP (no holes). 436 tests.
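The fail-closed overlay can be written as a guard chain in which every uncertain branch returns the explore row untouched. This is a sketch only. `VintageFields`, `displayedVintage` and `reconcileVintageSketch` are hypothetical names. The year-prefix regex is an assumption about how names like "2010 Raveneau…" are shaped.

```typescript
// Hypothetical subset of the vintage-specific fields that the overlay replaces.
interface VintageFields {
  vintage: number | null;
  price: number | null;
  currency: string | null;
  rating: number | null;
  image: string | null;
  url: string;
}

// Read the displayed vintage from a year-prefixed JSON-LD product name.
// No 4-digit prefix (wine-level fallback page) → null.
function displayedVintage(jsonLdName: string): number | null {
  const m = /^(\d{4})\s/.exec(jsonLdName.trim());
  return m ? Number(m[1]) : null;
}

// Fail-closed: overlay only when the re-fetched page provably shows the requested year.
function reconcileVintageSketch(
  exploreRow: VintageFields,
  requested: number | null,
  page: { jsonLdName: string; fields: VintageFields } | null, // null = fetch failed / no seo-name
): VintageFields {
  if (requested === null || exploreRow.vintage === requested) return exploreRow;
  if (page === null) return exploreRow;
  if (displayedVintage(page.jsonLdName) !== requested) return exploreRow; // never label year X as year Y
  return { ...exploreRow, ...page.fields, vintage: requested };
}
```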
P2 — grape as distinguisher: in varietal regions (Alsace/Germany) the grape IS the wine identity.
"René Muré Côte de Rouffach Riesling" was billed the "…Pinot Gris" of the same producer/lieu-dit. Now the
cuvée gate rejects when the grapes named in the query and in the candidate NAME are fully DISJOINT. Conservative: a shared
token ("Pinot Blanc" vs "Pinot Gris" → 'pinot') does not reject; color-ambiguous tokens (blanc/noir/gris)
are excluded, so "Bourgogne Blanc" vs "Bourgogne Chardonnay" does NOT trigger a rejection; and grape SYNONYMS
(Syrah=Shiraz, Garnacha=Grenache, Ploussard=Poulsard via GRAPE_SYNONYMS) are canonicalized so the same
wine labelled differently is not over-rejected (adversarial-review fix).
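The disjointness test has three parts: canonicalize synonyms, drop color-ambiguous tokens, and reject only when the two token sets share nothing. The tables below are illustrative subsets, not the real `GRAPE_SYNONYMS`. `grapesFullyDisjoint` is a hypothetical helper that takes already-detected grape names.

```typescript
// Illustrative subset only; the real synonym table is larger.
const GRAPE_SYNONYMS: Record<string, string> = {
  shiraz: 'syrah',
  garnacha: 'grenache',
  ploussard: 'poulsard',
};
// Color-ambiguous tokens never count as grape identity.
const COLOR_AMBIGUOUS = new Set(['blanc', 'noir', 'gris']);

function grapeTokens(grapeNames: string[]): Set<string> {
  const out = new Set<string>();
  for (const name of grapeNames) {
    for (const raw of name.toLowerCase().split(/\s+/)) {
      if (!raw || COLOR_AMBIGUOUS.has(raw)) continue;
      out.add(GRAPE_SYNONYMS[raw] ?? raw); // canonicalize synonyms
    }
  }
  return out;
}

// Reject only when both sides name grapes and share no token at all.
function grapesFullyDisjoint(queryGrapes: string[], candidateGrapes: string[]): boolean {
  const q = grapeTokens(queryGrapes);
  const c = grapeTokens(candidateGrapes);
  if (q.size === 0 || c.size === 0) return false;
  for (const t of q) if (c.has(t)) return false;
  return true;
}
```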
P3 — producer recombination via hyphenated first name: "Jean-Marie Fourrier, Bourgogne, Chardonnay"
was billed "Jean Biecher / Marie Vin d'Alsace" ('jean' matched the wrong winery, 'marie' — half of the
hyphenated first name — matched its NAME). New hyphen-aware producer span: when one half of a hyphen-group
matches the candidate winery, its partners join the producer span, so the real surname ('fourrier') becomes
the kernel and, absent from the candidate, the pre&&post invariant rejects. Monotonic (only hardens post).
Fixed — P1: bare-appellation query billed a Premier/Grand Cru of the right producer (test 2026-06-14)
A bare-village name query ("Benoit Moreau Chassagne-Montrachet") resolved to a higher-rated 1er/Grand
Cru lieu-dit of the SAME producer ("…1er Cru La Maltroie") because, at equal word-match score, the
explore ranker's rating tie-break (rating*0.01) favoured the more prestigious cru. Right producer, wrong
cuvée (and a very different price). Found via the 65k-wine ground-truth test corpus (3 cases).
Fix: cruMisalignmentPenalty — an ADDITIVE tie-break (0.1) subtracted in processExploreMatches when a
candidate carries a Premier/Grand Cru level the query did NOT ask for. 0.1 > the rating tie-break (≤0.05)
so the village wins; 0.1 < one word-match (≥0.5) so it NEVER flips a real match (a cru that matches a
REQUESTED lieu-dit stays on top). Floor 0.001 + the >0 filter preserve coverage (a lone cru still resolves).
The entries path (rankWineEntriesByQuery) already prefers the village via its matchCount/nameWords
length-normalization — penalty deliberately omitted there. TDD 7 cases, adversarial review SHIP. 421 tests.
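The penalty arithmetic works because 0.1 sits between the rating tie-break (at most 0.05) and one word match (at least 0.5). The sketch below shows this. `cruLevel` and `penalizedScore` are hypothetical helpers, and the cru-detection regexes are an assumption.

```typescript
const CRU_PENALTY = 0.1; // > rating tie-break (rating * 0.01 <= 0.05), < one word match (>= 0.5)
const CRU_SCORE_FLOOR = 0.001; // keeps a lone cru above the > 0 filter

// Hypothetical cru detection on folded text.
function cruLevel(text: string): 'premier' | 'grand' | null {
  const t = text.toLowerCase();
  if (/\bgrand cru\b/.test(t)) return 'grand';
  if (/\b(1er|premier) cru\b/.test(t)) return 'premier';
  return null;
}

// Penalize a candidate whose cru level the query did not ask for.
function penalizedScore(baseScore: number, query: string, candidateName: string): number {
  const cand = cruLevel(candidateName);
  if (cand === null || cand === cruLevel(query)) return baseScore;
  return Math.max(CRU_SCORE_FLOOR, baseScore - CRU_PENALTY);
}
```

With equal word-match scores of 2, a village rated 4.0 scores 2.04. A 1er Cru rated 4.5 drops from 2.045 to about 1.945, so the village wins.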
v0.4.69 (2026-06-13)
Fixed — R1: homonym-cuvée bills an arbitrary cuvée of the right producer (audit 2026-06-13, SEV-3)
The cuvée gate (candidateMatchesQueryCuvee) passed VACUOUSLY when the requested cuvée is
homonymous with the producer. "Clos des Grillons – Grillons": the cuvée word "grillons" collapses
into the winery (isWineryWord) → baseDistinctive empties → gate accepted ANY cuvée of the right
producer (it billed "Le Clos des Grillons / Œillet Rouge"). Right producer, wrong cuvée billed.
Guard: when baseDistinctive is empty AND a winery word repeats ≥2× in the raw query tokens
(producer span + homonym cuvée), require that repeated word in the candidate NAME or region;
else not-found. A bare-producer query (no repeat) is untouched → its top wine still resolves.
Adversarial-review hardening (Finding 1): the found-check scans NAME + region/appellation
(same haystack as B1), so an estate flagship whose repeated word is the APPELLATION
("Château Margaux" + "Margaux" column → "Pavillon Rouge") is NOT over-rejected, while a true
producer-homonym ("grillons", never an appellation) stays rejected. TDD: 6 RED→GREEN cases.
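The R1 guard triggers only when no distinctive word survives and a winery word repeats in the raw query. The sketch below is hypothetical. It assumes tokens are already folded, and it uses exact token equality where the real gate may match more loosely.

```typescript
// Hypothetical sketch of the R1 homonym guard over folded tokens.
function homonymGuardPasses(
  rawQueryTokens: string[],
  wineryWords: Set<string>,
  baseDistinctive: string[],
  candidateHaystack: string, // candidate NAME + region/appellation, folded
): boolean {
  if (baseDistinctive.length > 0) return true; // guard only applies when nothing distinctive is left
  const counts = new Map<string, number>();
  for (const tok of rawQueryTokens) {
    if (wineryWords.has(tok)) counts.set(tok, (counts.get(tok) ?? 0) + 1);
  }
  const repeated = Array.from(counts.entries())
    .filter(([, n]) => n >= 2)
    .map(([word]) => word);
  if (repeated.length === 0) return true; // bare-producer query: untouched
  const hay = new Set(candidateHaystack.split(/\s+/));
  return repeated.every((word) => hay.has(word)); // else not-found
}
```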
Companion fix (caught by the live A/B gate): rejecting the right-producer arbitrary cuvée made the
cascade fall through to the WRONG producer ("Clos des Grillons – Grillons" → "Clos des Fous / Grillos
Cantores", a Chilean Cabernet) via grillons≈grillos in the candidate NAME, with coherent=∅ so post
was vacuously true. Closed the vacuous-post hole: when the coherent set empties but distinctive words
remain (all producer-span), accept only if a DISTINCTIVE query word (excl. stop/hierarchy articles)
matches the candidate WINERY (producerConfirmed) — keeps the legitimate "cuvée producer" order
(Sassicaia Tenuta San Guido) while rejecting inter-entity recombination. Monotonic (only hardens
pre&&post; reviewer-proven NEW.post ⟹ OLD.post). 414 tests.
v0.4.68 (2026-06-13)
Hardened (adversarial review) — recovery guards
Cuvée-evidence guard (the critical one): a producer-index recovery is accepted only when a
distinctive non-appellation/non-grape query word matches the candidate NAME (hasCuveeEvidence,
no B1 fallback). Without it, a degenerate single-token producer guess ("papes") + fuzzy + the
gate's appellation fallback billed the generic wine of a HOMONYM estate ("Clos des Papes –
Châteauneuf-du-Pape" → Caves/Château des Papes CNDP) — the C1 class reopened. Verified RED→GREEN
on the colliding-estate regression test.
Single-word producer guesses that are region names are refused (Margaux class — SEV-4 R8 lesson);
Index pivot requires UNIQUENESS (≥2 matching wineries → ambiguous → no pivot) and PREFIX
coherence (Clos≠Château/Caves des Papes);
Tier-b respects the run's matchingMode (basic runs don't pay the advanced winery-page fallback).
Added — end-of-run second pass (producer-index reuse + 429 retry)
Coverage (audit 2026-06-10 residual classes 1+2): on a hard 250-wine catalog in
advanced mode, 36% of the 55 not-found rows had a producer that ANOTHER wine of the
SAME run resolved (e.g. "De Moor – Azimut" failed while other De Moor wines resolved),
and 5% failed inside a 429 window. A bounded second pass at end of run now recovers
both classes:
Name-search not-found rows are buffered instead of being pushed immediately (URL-path
errors keep the immediate push). Mid-run checkpoints count buffered rows as errors
(documented approximation; the final summary is exact).
A run-level producer index records {wineryId, winerySeoName, wineryName} for
every resolved name-search primary.
Tier (a): rows with rate_limited_during_search > 0 retry the full
searchWineByName once (the 429 window has cooled) → _strategy: '2nd-retry-429'.
Tier (b): otherwise, the producer phrase (producerHint ?? dash-left segment ??
deriveProducerGuesses) is matched against the index with strictProducerMatch
(ALL distinctive words, fuzzy ±1 — never a partial match), then resolved via the
new resolveWineFromKnownWinery (explore-by-winery + page-2 rule + winery-page
wine-list fallback) → _strategy: '2nd-producer-reuse'. Acceptance gates
(cuvée gate, ranking, non-winery coverage) are STRICTLY unchanged — recovery is
pure discovery.
Tier (c): no 429 context and no index hit → the original error row is pushed
unchanged (still unbilled), exactly once.
Recovered rows flow through the NORMAL path (enrichment, dedup vs processedKeys,
finalize, atomic push+charge — billed as confirmed successes). If the recovered wine
was already pushed under another query, the original error row is pushed instead
(traceability preserved, no duplicate billed row).
Bounds & safety: one pass, no recursion, one attempt per row, same inter-wine sleeps,
respects chargeLimitReached, per-row try/catch (a recovery failure never kills the
rest — the original error row is pushed).
The 2nd-* strategies are counted in output.strategies; the
final status message appends (+N recovered) when N > 0.
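Tier selection for a buffered error row can be sketched as a pure classification. The sketch below is hypothetical. `classifyForSecondPass` and the `lookupProducer` callback (standing in for the strict producer-index match) are illustrative. The uniqueness requirement comes from the v0.4.68 hardening notes above.

```typescript
interface PendingErrorRow {
  query: string;
  rateLimitedDuringSearch: number;
  producerPhrase: string | null; // producerHint ?? dash-left segment ?? derived guess
}
interface KnownWinery {
  wineryId: number;
  winerySeoName: string;
  wineryName: string;
}
type Tier =
  | { kind: 'a-retry-429' }
  | { kind: 'b-producer-reuse'; winery: KnownWinery }
  | { kind: 'c-push-original' };

function classifyForSecondPass(
  row: PendingErrorRow,
  lookupProducer: (phrase: string) => KnownWinery[], // hypothetical strict index lookup
): Tier {
  // Tier (a): the search ran inside a 429 window, so retry the full search once.
  if (row.rateLimitedDuringSearch > 0) return { kind: 'a-retry-429' };
  if (row.producerPhrase !== null) {
    const hits = lookupProducer(row.producerPhrase);
    const only = hits[0];
    // Tier (b): exactly one matching winery; two or more are ambiguous, so no pivot.
    if (hits.length === 1 && only !== undefined) return { kind: 'b-producer-reuse', winery: only };
  }
  // Tier (c): push the original error row unchanged, exactly once.
  return { kind: 'c-push-original' };
}
```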
Changed
run.ts: the per-row enrich/dedup/count/finalize/push+charge block is extracted into
a shared pushRow helper (used by both passes); the primary-row construction from a
SearchWineByNameResult is extracted into wineryResultToPrimary (shared with
processName). No behavior change on the main path.
v0.4.67 (2026-06-13)
Changed — README: faster-support guidance
The feedback section now asks issue reporters to include their run ID and suggests enabling
Share run data with developers (Apify Settings → Login & Privacy) so the maintainer can open
the shared run directly and diagnose matching problems in minutes. Doc-only change.
v0.4.66 (2026-06-11)
Added — audit N4 (KV cache slug→wineryId)
Audit N4 (cost/throughput): findWineryIdBySlug re-fetched the same
/wineries/<slug> page on every call — catalog reruns paid the full slug-probing
fetch volume again. The slug→wineryId mapping is now cached in the vivino-wine-cache
KV store: positive entries 14 days (CACHE_TTL_WINERY_SLUG_MS), negative entries
(404 slug) 3 days (CACHE_TTL_WINERY_SLUG_NEG_MS). Keys: winery-slug-<slug>.
Only the MAPPING is cached, never the page HTML (700-900 KB): a cache hit returns
{ wineryId, html: null }. The two call paths that need the HTML (the Advanced
wine-list fallback in resolveFromWinery, and the Strategy 2.1 discovered-slug
fallback) lazily re-fetch it via fetchWineryPageHtml — same cost as a cache miss
on that rare path, zero extra fetches in Basic mode (which never uses the HTML).
Transient failures (403/429/5xx/network) are NEVER cached; a 200 page without an
extractable winery id (possible interstitial) is not cached either.
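The caching policy has three rules: positive entries for 14 days, 404s for 3 days, and never a transient failure or a 200 without an id. The sketch below shows them. `SlugCache` is an in-memory stand-in for the KV store. It also deletes an entry on expired read, in the spirit of the v0.4.53 N11 fix. Only the mapping is stored, never HTML.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const CACHE_TTL_WINERY_SLUG_MS = 14 * DAY_MS;
const CACHE_TTL_WINERY_SLUG_NEG_MS = 3 * DAY_MS;

interface SlugEntry {
  wineryId: number | null; // null = cached 404
  expiresAt: number;
}
type FetchOutcome =
  | { kind: 'found'; wineryId: number }
  | { kind: 'not-found' } // 404: cacheable negative
  | { kind: 'transient' } // 403/429/5xx/network: never cached
  | { kind: 'no-id' }; // 200 without an extractable id: not cached

// In-memory stand-in for the KV store (hypothetical class).
class SlugCache {
  private store = new Map<string, SlugEntry>();

  get(slug: string, now: number): SlugEntry | undefined {
    const key = `winery-slug-${slug}`;
    const entry = this.store.get(key);
    if (entry && entry.expiresAt <= now) {
      this.store.delete(key); // drop the stale record on expiry
      return undefined;
    }
    return entry;
  }

  record(slug: string, outcome: FetchOutcome, now: number): void {
    const key = `winery-slug-${slug}`;
    if (outcome.kind === 'found') {
      this.store.set(key, { wineryId: outcome.wineryId, expiresAt: now + CACHE_TTL_WINERY_SLUG_MS });
    } else if (outcome.kind === 'not-found') {
      this.store.set(key, { wineryId: null, expiresAt: now + CACHE_TTL_WINERY_SLUG_NEG_MS });
    }
    // 'transient' and 'no-id' are deliberately not cached.
  }
}
```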
Audit I9 (observability): last-run-summary (KV, checkpointed every 20 wines +
final) now carries a quality block: nullRates = per-field null-rate percentages
(wine-core computeNullRates) for the key fields price, alcohol, image_url,
region, computed over the RESOLVED primary rows only (error rows are all-null by
construction and would drown the drift signal; alternatives are excluded too);
sampleSize; and errorRate = error rows / primary rows. A rising null rate between
runs is the earliest selector-rot signal.
Loud status on degradation: when errorRate > 0.5 (ERROR_RATE_ALERT_THRESHOLD),
the FINAL terminal status message is prefixed with ⚠ so a degraded run is visible
at a glance in the Console run list.
Memory-light: only the 4 tracked field values are sampled per resolved primary —
never full rows.
394 tests (3 new), typecheck clean.
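The quality block and the ⚠ prefix follow directly from the definitions above. The sketch below is hypothetical and is not the wine-core `computeNullRates` signature. Returning 0 for an empty sample is an assumption.

```typescript
const TRACKED = ['price', 'alcohol', 'image_url', 'region'] as const;
type Tracked = (typeof TRACKED)[number];
type Sample = Record<Tracked, unknown>; // only the 4 tracked values per resolved primary
const ERROR_RATE_ALERT_THRESHOLD = 0.5;

// Per-field null-rate percentage over RESOLVED primary rows only.
function nullRatesSketch(samples: Sample[]): Record<Tracked, number> {
  const out = {} as Record<Tracked, number>;
  for (const field of TRACKED) {
    const nulls = samples.filter((s) => s[field] === null || s[field] === undefined).length;
    out[field] = samples.length === 0 ? 0 : (100 * nulls) / samples.length; // assumption: empty → 0
  }
  return out;
}

// errorRate = error rows / primary rows; prefix the terminal status when degraded.
function finalStatus(message: string, errorRows: number, primaryRows: number): string {
  const errorRate = primaryRows === 0 ? 0 : errorRows / primaryRows;
  return errorRate > ERROR_RATE_ALERT_THRESHOLD ? `⚠ ${message}` : message;
}
```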
v0.4.64 (2026-06-11)
Added — audit N2 (explore-by-winery page 2)
Audit N2 (matching coverage): the explore-by-winery call returns at most
per_page=50 records while large catalogs carry more (records_matched up to >100 —
114 on the real Penfolds fixture), so a requested cuvée beyond the first 50 was
unreachable on the S1 path. Now, when page 1 yields NO gate-passing candidate AND
records_matched > 50, a single page=2 fetch is made and evaluated with the exact
same gates (relevance probe → ranking → non-winery coverage). Strict cap at page 2 —
never a page 3 (no match across 100 entries is a plausible not-found, not a fetch pit).
When page 1 satisfies, NO extra fetch (zero cost on the common path).
fetchExploreByWinery gains an optional page option and propagates
explore_vintage.records_matched; the page-1/page-2 evaluation is factored into
evaluateExplorePage inside resolveFromWinery (S1 and the 2.1b/2.4 pivots benefit).
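The page-2 rule is a single conditional fetch with a hard stop. The sketch below is synchronous and hypothetical. `fetchPage` and `evaluate` stand in for `fetchExploreByWinery` and the existing gate chain, whose real signatures are not shown here.

```typescript
const PER_PAGE = 50;

interface ExplorePage<T> {
  candidates: T[];
  recordsMatched: number;
}

// Page 2 only when page 1 yields no gate-passing candidate AND records_matched > 50.
// Never a page 3.
function resolveAcrossPages<T>(
  fetchPage: (page: 1 | 2) => ExplorePage<T>,
  evaluate: (candidates: T[]) => T | null, // relevance probe → ranking → coverage gates
): { match: T | null; fetches: number } {
  const p1 = fetchPage(1);
  const m1 = evaluate(p1.candidates);
  if (m1 !== null || p1.recordsMatched <= PER_PAGE) return { match: m1, fetches: 1 };
  const p2 = fetchPage(2);
  return { match: evaluate(p2.candidates), fetches: 2 };
}
```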
Audit Q3 (output traceability): new public column requested_vintage = the vintage
year parsed from the INPUT (free-text/catalog name, or a URL's ?year= param), echoed on
EVERY pushed row — resolved, alternative AND error rows, name and URL paths alike. Null
when the input carried no vintage. Distinct from vintage (the RESOLVED year, which can
differ when Vivino lacks the requested year) — comparing the two verifies the delivered
row matches the asked-for year. Null-coerced by finalizeRow (column always present).
dataset_schema gains the column (fields + Detailed view, next to vintage).
387 tests (5 new), typecheck clean.
v0.4.62 (2026-06-11)
Added — audit N1 (explore payload expansion)
Audit N1 (extraction coverage): 12 new public columns mapped from fields the real
explore response already carries but the mapper dropped: merchant_url (price.url, the
merchant buy link), discounted_from / discount_percent (price discount; Vivino sends
discounted_from: 0 + discount_percent: null when no discount), bottle_volume_ml
(price.bottle_type.volume_ml — 750/1500…), vfm_score / vfm_category (price-level
value-for-money), is_natural (wine.is_natural boolean), winery_id /
winery_seo_name (wine.winery — stable join keys), region_id (wine.region.id),
country_code (wine.region.country.code, origin country), label_image_url
(image.variations.label_large with label_medium/label fallback, https:-prefixed like
image_url). All columns always present on every pushed row (null-coerced by finalizeRow).
HTML (URL) path: finalizeRow now COPIES the privately-parsed _wineryId /
_winerySeoName into the public winery_id / winery_seo_name before stripping them,
so URL rows expose the producer ids too.
dataset_schema gains the 12 columns (fields + Detailed view).
382 tests (17 new fixture-driven), typecheck clean.
v0.4.61 (2026-06-11)
Added — audit C2-a (HTML-path price from the inline island)
Audit C2-a (extraction coverage): on non-vintage wine pages the JSON-LD offers is an
array of AggregateOffer with NO price key → the HTML path emitted price: null even
though the page carries the ship-to market price inline. New bounded fallback
extractInlineMarketPrice: anchored on the "prices_and_availability" block of the main
wine's <script> (same script as the island, often BEYOND the 60k island cap), cut before
"merchants_with_prices". Primary = availability.price.amount (current offer for the
ship-to market, same semantics as JSON-LD offers.price); fallback =
availability.median.amount (market median); currency from market.currency.code.
Plausibility guard 0 < price < 100 000.
JSON-LD offers.price stays primary (Raveneau vintage fixture keeps 1695 EUR). Verified on
real fixtures: Cloudy Bay 18978 → 27.95 EUR (median 28.25), H3 Wines 1136600 →
13.88 EUR. Carousel/highlight "price":{…} objects are never taken (no
prices_and_availability anchor → null).
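Once the block is located, price selection is a primary/fallback choice behind the plausibility guard. The sketch below assumes the block has already been parsed into the hypothetical shape shown. Finding it in the raw `<script>` text, and cutting it before "merchants_with_prices", is not shown. Reusing the C2-c 'EUR' fallback for a missing currency code is also an assumption.

```typescript
// Hypothetical parsed shape; only the key paths named in the entry above are modeled.
interface PricesAndAvailability {
  availability?: {
    price?: { amount?: number | null } | null;
    median?: { amount?: number | null } | null;
  } | null;
  market?: { currency?: { code?: string | null } | null } | null;
}

// Plausibility guard: 0 < price < 100 000.
function isPlausiblePrice(p: unknown): p is number {
  return typeof p === 'number' && Number.isFinite(p) && p > 0 && p < 100_000;
}

function pickInlineMarketPrice(
  block: PricesAndAvailability | null,
): { price: number; currency: string } | null {
  if (block === null) return null; // no prices_and_availability anchor → null
  // Primary: current ship-to offer; fallback: market median.
  const candidate = block.availability?.price?.amount ?? block.availability?.median?.amount;
  if (!isPlausiblePrice(candidate)) return null;
  // Assumption: a priced row without a currency code falls back to EUR (C2-c rule).
  return { price: candidate, currency: block.market?.currency?.code ?? 'EUR' };
}
```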
Audit I11 (rating semantics): new public columns wine_average_rating /
wine_ratings_count = the wine's rating across ALL vintages, distinct from
average_rating / ratings_count which KEEP their vintage-level semantics (explore:
vintage.statistics.ratings_*; HTML: JSON-LD aggregateRating). Always present on a pushed
row (coerced to null by finalizeRow when the source carries no wine-level stats).
Explore path: wine-level read from wine.statistics.wine_ratings_* (documented location)
with fallback to vintage.statistics.wine_ratings_* — where the real explore response
actually carries it (verified on the fixture).
HTML path: the inline island carries NO wine_ratings_* keys; the wine-level lives in the
island's "statistics":{"status":…,…,"vintages_count":M} object — vintages_count is the
discriminant (vintage-level stats carry reviews_count instead). Verified on real pages:
Raveneau 2010 vintage page → average_rating 4.7/41 (JSON-LD, vintage) vs
wine_average_rating 4.6/782 (island, wine).
dataset_schema.json: 2 new columns + detailed view.
358 tests (8 new).
v0.4.59 (2026-06-11)
Added — audit I10 (zero-fetch taste profile from the explore payload)
Audit I10 (extraction coverage): the explore API response already embeds the FULL taste
payload (matches[].vintage.wine.taste — structure + flavor, same shape as the tastes-API
response). exploreMatchToScrapedWine now maps it through the existing
tastesToTasteProfile mapper (no duplication) → explore-ranked name-search rows carry
taste_profile with ZERO extra fetch.
enrichWine skips the /tastes API call when the row already carries a taste profile;
HTML-path rows and explore matches without taste keep the API fallback.
The includeTasteProfile opt-out is unchanged: finalizeRow still strips the column when
the option is off — the option controls the column's PRESENCE, not its cost.
Adversarial-review hardening (Cheval Blanc guard): a color token belonging to the
PRODUCER name ("Château Cheval Blanc", "Mas Blanc") is not a variant request — without
neutralization, the type_id penalty demoted Cheval Blanc's grand vin (red) below its white
second wine. neutralizeProducerColor nulls the color when its token appears as a standalone
word of a candidate winery name or the producerHint (safe direction: disables the tie-break,
pre-I1 behavior). Also: alternativeCount no longer early-breaks (color promotion makes the
ranked list non-monotonic; full bounded scan). 3 new tests.
Audit I1 (SEV-3, measured 7% cross-mode instability): when a query carries a standalone
color token (Blanc/Rouge/Rosé — also White/Red/Pink), the rankers did not use it to
disambiguate color VARIANTS of the same lieu-dit. Real production disagreements (same query,
different runs/modes billed different wines): «Caroline Morey Beaune 1er Cru Les Grèves
Blanc» billed the ROUGE variant (w8450274) on one run and the generic «Les Grèves»
(w7461261) on another; «Domaine Dujac Morey-Saint-Denis Blanc» billed the white (w92283) or
the generic red (w92266) depending on the path. Rouge vs Blanc is a different product/price
benchmark for a wine merchant.
New helpers in scoring.ts: queryColor (standalone color-token detection — compound grape
names like «Pinot Blanc»/«Sauvignon Blanc» are NOT a color; mixed/no signal → null),
candidateColorConflict, colorScoreFactor, TYPE_ID_COLORS (Vivino type_id 1=Red,
2=White, 4=Rosé; 3=Sparkling stays neutral). Factors: COLOR_CONFLICT_PENALTY (×0.5) on a
conflicting candidate, COLOR_AGREEMENT_BOOST (×1.1) on explicit agreement.
Color is a TIE-BREAK, never a hard filter: a candidate with NO color word/type signal is
neutral (Vivino often omits the color word on the canonical variant — the generic
«Les Grèves» may BE the white). Wired into rankWineEntriesByQuery (seoName words only),
and into the explore path (processExploreMatches scoring + rankExploreCandidates) where
the wine type_id is a stronger signal than name words. A color-conflicting candidate is
also demoted AFTER vintage / exact-appellation promotion: color correctness outranks
carrying the requested year of the WRONG color variant.
343 tests (16 new: 12 helper unit tests + Caroline Morey entries case + no-color regression
witness + Dujac vintage-promotion case + type_id exact-tie break + neutral-candidate
witness + Bourgogne Blanc gate guard). The cuvée gate (candidateMatchesQueryCuvee) is
untouched — the GRAPE_WORDS fallback (plain «Bourgogne Blanc») still resolves.
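Color detection and the tie-break factor can be sketched as follows. Only the type_id path is modeled. The token and compound-grape lists are illustrative subsets. `queryColorSketch` and `colorScoreFactor` are hypothetical stand-ins for the scoring.ts helpers named above.

```typescript
type Color = 'red' | 'white' | 'rose';

const COLOR_TOKENS: Record<string, Color> = {
  rouge: 'red', red: 'red', blanc: 'white', white: 'white', 'rosé': 'rose', rose: 'rose', pink: 'rose',
};
// Compound grape names whose color word is part of the grape, not a color request.
const COMPOUND_GRAPES = ['pinot blanc', 'sauvignon blanc', 'chenin blanc'];
// Vivino type_id: 1 = Red, 2 = White, 4 = Rosé; 3 = Sparkling stays neutral.
const TYPE_ID_COLORS: Record<number, Color> = { 1: 'red', 2: 'white', 4: 'rose' };
const COLOR_CONFLICT_PENALTY = 0.5;
const COLOR_AGREEMENT_BOOST = 1.1;

function queryColorSketch(query: string): Color | null {
  let q = query.toLowerCase();
  for (const grape of COMPOUND_GRAPES) q = q.split(grape).join(' ');
  const found = new Set<Color>();
  for (const tok of q.split(/[\s,]+/)) {
    const c = COLOR_TOKENS[tok];
    if (c) found.add(c);
  }
  return found.size === 1 ? Array.from(found)[0] ?? null : null; // mixed or no signal → null
}

// Tie-break only: a candidate without a color signal stays neutral (factor 1).
function colorScoreFactor(query: string, typeId: number | null): number {
  const want = queryColorSketch(query);
  const have: Color | undefined = typeId === null ? undefined : TYPE_ID_COLORS[typeId];
  if (want === null || have === undefined) return 1;
  return want === have ? COLOR_AGREEMENT_BOOST : COLOR_CONFLICT_PENALTY;
}
```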
Audit C1 (SEV-4, measured 5% wrong-billed on hard lists): candidateMatchesQueryCuvee
validated the query's distinctive words against one folded haystack, so query tokens could
recombine ACROSS candidate entities — one word matching the winery, another the wine name,
with no single entity coherently matching. Three real wrong BILLED production matches:
«Clos des Grillons Rouge» → a Chilean Cabernet by Clos des Fous ('grillons' fuzzy-hit
'grillos' in the name, 'clos' hit the winery); «Domaine Chaume-Arnaud Pontias» → a Chablis
by Domaine Jérémy Arnaud ('chaume' in the name, 'arnaud' in the winery); «Roc des Pins
Grenache» → «Les Pins» by Closerie Saint Roc ('roc' in winery, 'pins' in name).
Fix is INSIDE the gate's acceptance logic (no producer filter, no producerHint override):
query words that STRUCTURALLY belong to the producer span are excluded from cuvée evidence
via two lifts — (a) backward adjacency: the word right before a winery-matched word is the
other half of the producer's name (Chaume-Arnaud); (b) possessive forward lift: in
a «… de/des/du [article] …» phrase, the word after the preposition (and optional article) is part of the
producer's name (Clos des Grillons, Roc des Pins). The cuvée kernel is then chosen
from the remaining coherent words; when none of them appears in the candidate's name/region
the gate rejects (honest not-found, not billed). Note: the remediation plan's literal
"kernel must match in name" rule rejected none of the three cases (the mis-kernel DID match
in the name each time) — the structural producer-span exclusion is what kills the
recombination signature.
Strictly additive toward rejection — enforced as a hard invariant (adversarial-review
fix): the gate returns pre && post (verdict on the pre-lift distinctive set AND on the
coherent set), so the producer-span lifts can only flip accept→reject, never reject→accept.
Without it, a producer-LAST query ("Meursault Charmes Roulot") had its lieu-dit swallowed
by the backward lift, the kernel fell to the appellation, and a NEIGHBOURING lieu-dit
("Les Luchets") re-passed the gate — re-opening the v0.4.34 SEV-4 class. 3 producer-last
regression tests added. All generic fallbacks kept: appellation-only fallback (Bourgogne
Aligoté), Penfolds Grange trailing-words tolerance, region-carried kernel (B1), fuzzy ±1
producer spelling (Bernaudau). fuzzyMatch's min length 4 already prevents stop-class
article bridges (des↛les) — now locked by a test.
327 tests (12 new: 3 production rejection cases + 3 producer-last invariant cases +
stop-class fuzzy lock + 5 explicit
accept witnesses). Gate signature unchanged (winery was already a separate parameter);
no call-site changes.
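The producer-span lifts and the `pre && post` invariant can be sketched as follows. Everything here is hypothetical. `verdict` stands in for the gate's existing word-set acceptance test. Tokens are assumed already folded, and the article list is illustrative rather than the real stop set.

```typescript
const PREPOSITIONS = new Set(['de', 'des', 'du']);
const ARTICLES = new Set(['la', 'le', 'les']);

// Words that structurally belong to the producer's name.
function producerSpanSketch(tokens: string[], wineryWords: Set<string>): Set<string> {
  const span = new Set<string>();
  tokens.forEach((tok, i) => {
    if (!wineryWords.has(tok)) return;
    span.add(tok);
    const prev = tokens[i - 1];
    if (prev !== undefined) span.add(prev); // (a) backward adjacency
    let j = i + 1;
    if (PREPOSITIONS.has(tokens[j] ?? '')) {
      // (b) possessive forward lift: skip the preposition and an optional article
      j += 1;
      if (ARTICLES.has(tokens[j] ?? '')) j += 1;
      const next = tokens[j];
      if (next !== undefined) span.add(next);
    }
  });
  return span;
}

// The lifts can only flip accept → reject, never reject → accept.
function gateWithProducerSpanLift(
  distinctive: string[],
  producerSpan: Set<string>,
  verdict: (words: string[]) => boolean,
): boolean {
  const pre = verdict(distinctive);
  const coherent = distinctive.filter((w) => !producerSpan.has(w));
  const post = verdict(coherent);
  return pre && post;
}
```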
v0.4.56 (2026-06-11)
Added — audit I8 (error rows distinguish rate-limited from absent)
Audit I8 (SEV-3): a name-search not-found row carried only error/errorMessage;
the 429 counter was run-level only — a 429-starved miss was indistinguishable from a
genuine Vivino absence (confirmed pain: the Dauvissat investigation). Name-search
error rows now carry a public rate_limited_during_search field: the number of 429s
encountered DURING the search for THAT wine (snapshot of the run-level trip counter
around searchWineByName). 0 = likely a real absence; >= 1 = retry in a smaller
batch before concluding. Resolved rows and URL-path error rows do NOT carry the field.
Audit I7 (SEV-3): a TIMED-OUT or OOM-killed run left NO last-run-summary and no
progress status (both were written once at run end) — diagnosing a 3h timed-out 670-wine
run required log archaeology. The summary builder is now a closure (buildSummary(partial))
over the run counters, and every 20 wines the loop checkpoints a partial: true snapshot
to the last-run-summary KV key plus a progress status message
(Processed N/total — X errors, Y 429). The final summary carries partial: false.
Checkpoint writes are best-effort (try/catch, log.debug) — they can never kill the run.
New errorCount run counter (pushed error rows) feeds the progress message.
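Both observability changes follow simple patterns. The 429 field is a per-search delta of the run-level counter, and the checkpoint is a best-effort write every 20 wines. The sketch below is synchronous and uses hypothetical names (`runCounters`, `searchWithTripSnapshot`, `maybeCheckpoint`). The real search and KV write are asynchronous.

```typescript
// Hypothetical run-level counter, incremented wherever a 429 is observed.
const runCounters = { rateLimitTrips: 0 };

// Snapshot the counter around the search; the delta is this wine's 429 count.
function searchWithTripSnapshot<T>(
  search: () => T | null,
): { result: T | null; rate_limited_during_search: number } {
  const before = runCounters.rateLimitTrips;
  const result = search();
  return { result, rate_limited_during_search: runCounters.rateLimitTrips - before };
}

const CHECKPOINT_EVERY = 20;

// Best-effort checkpoint: a failing write is logged at debug and never kills the run.
function maybeCheckpoint(processed: number, write: () => void, logDebug: (msg: string) => void): boolean {
  if (processed === 0 || processed % CHECKPOINT_EVERY !== 0) return false;
  try {
    write();
  } catch (err) {
    logDebug(`checkpoint failed: ${String(err)}`);
  }
  return true;
}
```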
Audit N12 (SEV-4): the final Actor.setStatusMessage now passes
{ isStatusMessageTerminal: true } so the Console keeps it as the run's terminal status,
and appends (stopped: charge limit) when the run stopped on eventChargeLimitReached.
v0.4.54 (2026-06-11)
Fixed — audit I12 (pin HTML extraction to the main-wine JSON island)
Audit I12 (SEV-3): parseWineHtml extracted winery/region/country/wine name/grapes/
alcohol/food (and the extractMax rating fallback) with PAGE-GLOBAL regexes taking the FIRST
occurrence anywhere in the ~1 MB page. A Vivino page embeds other wines inline (carousels,
recommendations): any foreign JSON appearing before the main wine's silently contaminated
every field. New findWineIsland(html, wineId) locates the main wine's inline JSON island
(anchor "wine":{"id":<wineId>, bounded by the enclosing </script> capped at 60 000 chars
— deepest main-wine field observed on real 1.1 MB pages sits at ~+49 100). All inline-JSON
regexes now run against the island; page-global extraction remains ONLY as a fallback when
the island is absent, surfaced by a log.warning drift canary. JSON-LD stays the primary
rating/price source (untouched). 10 new tests (synthetic contamination, fallback canary,
3 real-page fixtures × 2 — skipped gracefully when local gitignored fixtures are absent —
plus a perf sanity: 1.1 MB page parsed in ~83 ms).
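Island location can be sketched as an anchored search bounded by the enclosing `</script>` and the 60 000-char cap. `findWineIslandSketch` is hypothetical. The digit check that rejects prefix collisions (id 12 vs "id":123) is an assumption added for safety.

```typescript
const ISLAND_CAP = 60_000; // chars; deepest main-wine field observed at ~+49 100

// Return the main wine's inline JSON slice, or null so the caller can fall back
// to page-global regexes (with a drift warning).
function findWineIslandSketch(html: string, wineId: number): string | null {
  const anchor = `"wine":{"id":${wineId}`;
  let from = 0;
  for (;;) {
    const start = html.indexOf(anchor, from);
    if (start === -1) return null;
    // Skip prefix collisions such as id 12 matching "id":123.
    if (!/\d/.test(html.charAt(start + anchor.length))) {
      const scriptEnd = html.indexOf('</script>', start);
      const capEnd = start + ISLAND_CAP;
      const end = scriptEnd === -1 ? capEnd : Math.min(scriptEnd, capEnd);
      return html.slice(start, end);
    }
    from = start + anchor.length;
  }
}
```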
v0.4.53 (2026-06-11)
Fixed — audit N10 (/wines/{id}?year= URLs keep their vintage)
Audit N10 (SEV-3): parseVivinoUrl always returned vintage: null for /wines/{id} URLs,
even when a ?year= parameter was present. The /seo/w/{id}?year= branch already read
searchParams.get('year') correctly; the /wines/{id} branch now does the same.
1 new test.
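The N10 fix reads `?year=` on both URL shapes. The sketch below is hypothetical and dependency-free. It uses regexes where the entry says the real code calls `searchParams.get('year')`. It also assumes "/seo/w/{id}" above means a slug-based `/<seo-name>/w/<id>` path.

```typescript
// Hypothetical: wine id from /wines/{id} or /<seo-name>/w/{id}; ?year= read on both shapes.
function parseWineUrlSketch(raw: string): { wineId: number; vintage: number | null } | null {
  const idMatch = /\/(?:wines|w)\/(\d+)(?=[/?#]|$)/.exec(raw);
  if (!idMatch) return null;
  const yearMatch = /[?&]year=(\d{4})(?=[&#]|$)/.exec(raw);
  return {
    wineId: Number(idMatch[1]),
    vintage: yearMatch ? Number(yearMatch[1]) : null,
  };
}
```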
Audit N5 (SEV-4): fetchTasteProfile cached all-null taste profiles for 30 days. An all-null
structure + zero flavor notes has no value to cache (re-fetching may recover data later). Added
isUsefulTasteProfile gate before setCached; profiles with at least one non-null structure
field or ≥1 flavor note are cached as before. 2 new tests.
Audit N6 (SEV-4): rateLimiter.waitIfNeeded() was called once before the retry loop in both
fetchApi (api-client.ts) and fetchHtml (html-client.ts). Subsequent retry attempts skipped
the limiter entirely. Moved await rateLimiter.waitIfNeeded() inside the loop in both functions.
1 new test using spy on sharedRateLimiter.waitIfNeeded.
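The N6 fix is about placement: the limiter call sits inside the retry loop, so every attempt is throttled, retries included. The sketch below is simplified and synchronous. `Limiter` and `fetchWithRetries` are hypothetical; the real `waitIfNeeded()` is awaited.

```typescript
// Hypothetical limiter interface (the real waitIfNeeded is async).
interface Limiter {
  waitIfNeeded(): void;
}

function fetchWithRetries<T>(attempt: () => T, limiter: Limiter, maxAttempts = 3): T {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    limiter.waitIfNeeded(); // inside the loop: retries are throttled too
    try {
      return attempt();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```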
Audit N11 (SEV-4): getCached returned null for expired entries but left the stale record in
KV storage indefinitely. Added store.setValue(key, null) on expiry (Apify KV convention: null
value deletes the record). 1 new test verifying the delete call.
v0.4.51 (2026-06-11)
Fixed — audit C2-c (currency null on price-less rows) + N8 (stale field docs and policy comments)
Audit C2-c (SEV-3): currency was always 'EUR' even when price is null. Three sites
fixed: EMPTY_SCRAPED constant, exploreMatchToScrapedWine, htmlToScrapedWine. Rule: currency
is non-null only when a price is present (price?.amount != null); falls back to 'EUR' when the
price exists but currency.code is absent (keeps backward compat for priced rows). currency type
widened to string | null in ScrapedWine. One existing test updated (htmlToScrapedWine with no
price in HTML asserting currency: 'EUR' → now asserts currency: null); 5 new tests added.
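The currency rule fits in one function. `currencyFor` is a hypothetical helper, and the input shape is modeled loosely on the `price.amount` / `currency.code` paths named above.

```typescript
// Hypothetical helper applying the C2-c rule.
function currencyFor(
  price: { amount?: number | null; currency?: { code?: string | null } | null } | null | undefined,
): string | null {
  if (price?.amount == null) return null; // no price → no currency
  return price.currency?.code ?? 'EUR'; // priced row without a code keeps the EUR default
}
```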
Audit N8 (SEV-5 docs): README "What data does this scraper extract?" table: Price, Image,
Alcohol content changed from "Yes" → "When available*"; footnote added explaining availability
conditions. Stale _noCharge comment in types.ts updated to reflect the 2026-06-02 policy (all
error rows unbilled). CLAUDE.md: version, test count, memory default, billing rule updated.
Audit I2 (SEV-3): result.chargedCount > 0 was unreliable — apify SDK v3.7.2 merges the
synthetic apify-default-dataset-item event into chargedCount (typically 2 per call). The
=== 0 failure-detection branch was unreachable in practice. Fix: snapshot
Actor.getChargingManager().getChargedEventCount(PPE_EVENT_WINE) before the push and compare
after; billed = countAfter > countBefore. Deleted the dead else if (chargedCount === 0) branch.
Audit I13 (SEV-3): alternative rows (isAlternative = true) called processedKeys.add(key),
which meant a later query whose PRIMARY is the same wine would be silently skipped (no billed row,
no error row). Fix: alternatives CHECK the set (still deduped if already pushed by a primary) but
no longer RESERVE the key via add().
Audit I14 (SEV-2): a transient pushData platform error simply counted a charge failure and
lost the row. Fix: one automatic retry of pushData(scraped, PPE_EVENT_WINE); if the retry also
fails, a best-effort plain Actor.pushData(scraped) (no event) preserves the data un-billed.
Uses per-event counter (I2 fix) to avoid double-billing if the first attempt silently succeeded.
5 new tests (I2a, I2b, I13, I14a, I14b); mock harness updated: getChargingManager added to
ActorMock, per-event counters tracked, beforeEach restores pushData to canonical impl. 268
tests pass.
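A sketch of the retry-then-fallback flow under stated assumptions: pushBilled, pushPlain, and chargedCount are hypothetical injected stand-ins for pushData with the wine-result event, plain Actor.pushData, and the per-event counter from the I2 fix. The Outcome labels are illustrative.

```typescript
type Outcome = "billed" | "unbilled" | "lost";

async function pushWithRetry(
  row: object,
  eventName: string,
  pushBilled: (row: object, eventName: string) => Promise<void>,
  pushPlain: (row: object) => Promise<void>,
  chargedCount: () => number,
): Promise<Outcome> {
  const before = chargedCount();
  // First attempt plus one automatic retry.
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      await pushBilled(row, eventName);
      return chargedCount() > before ? "billed" : "unbilled";
    } catch {
      // The call failed but the charge went through: do not retry,
      // otherwise the user would be billed twice.
      if (chargedCount() > before) return "billed";
    }
  }
  // Both billed attempts failed: keep the data without an event.
  try {
    await pushPlain(row);
    return "unbilled";
  } catch {
    return "lost";
  }
}
```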
v0.4.48 split a spaced dash and used the left segment as a producerHint. An A/B re-run on the
250-wine Rhône list showed it regressed: resolved 46% → 40%, with 3 previously-correct matches
broken (Monier-Perréol Châtelet, Coulet Brise Cailloux, Croze-Granier Élise). Cause: the extracted
producer hint (sometimes carrying a parenthetical, e.g. "Domaine du Coulet (Matthieu Barret)") fed
the strict producer filter and rejected correct candidates — the same failure mode as the rejected
LWIN producer-hint experiment. The original "dash = 68% error vs 36%" signal was confounded: dashed
entries are simply harder, more-specific cuvées, and some old "resolved" rows were wrong matches.
Reverted to the v0.4.47 matching behaviour. The A/B did surface a separate, real issue worth a
future look: a few wrong billed matches on obscure Rhône producers (Roc des Pins → Closerie Saint
Roc; Chaume-Arnaud → a Chablis). No code change beyond the revert. 258 tests pass.
v0.4.47 (2026-06-09)
Changed — document Basic vs Advanced matching + the 250-wine cap
README: new "Matching modes: Basic vs Advanced" section explaining when to use each (Basic for
clean names ~6 s/wine; Advanced for last-name/first-name order, omitted "& Fils", abbreviations,
~15-25 s/wine) and a "Why a run is capped at 250 wines" sub-section (Advanced throughput vs the
3 h timeout; memory is count-independent at ~350-390 MB). Added matchingMode to the input table,
refreshed the stale memory guidance (1024 MB default, no need to raise) and the large-volume tips
(split into ≤250-wine parallel batches), and corrected an apify blog URL missing ?fpr=.
input_schema: rewrote the matchingMode description to match (250 cap, 1024 default, billing-safe).
No code change.
v0.4.46 (2026-06-09)
Changed — cap wines input at 250 per run
input_schema: wines gains maxItems: 250, so the Console and API now reject inputs over
250 wines. Rationale: a 670-wine Advanced run timed out at 3 h (461/670 processed; ~2.6-3.6
wines/min on hard micro-domaine names). 250 finishes in ~1.5 h worst-case, with comfortable
timeout margin. Memory is unaffected by count (per-wine streaming; ~350-390 MiB peak regardless),
so 1024 MB is ample. For larger lists, split into ≤250-wine batches (run them in parallel).
.actor/actor.json: defaultMemoryMbytes and defaultRunOptions.memoryMbytes 512 → 1024
(both are needed; the live default is also set via the Apify API since apify push does not
update defaultRunOptions on an existing actor). A 670-wine basic run reproducibly OOM'd
(exit 137) at 512 MB; steady working set is ~350 MiB but transient HTML-parse spikes on large
batches cross 512. 1024 gives comfortable headroom. Under pay-per-event the consumer price is
unchanged (charged per wine-result, not per compute).
Operational guidance (already in the matchingMode description): for messy/divergent lists use
Advanced matching at 1024 MB and split into ~130-wine batches — this resolves the
hard micro-domaine cases (Comtes Lafon Bouchères, Bernard Moreau Cardeuse, Bachey-Legros, etc.)
with zero regression, and keeps 429 rate-limiting low. (An LWIN-canonical normalization layer was
prototyped and rejected: it regressed matches — e.g. Château Les Carmes Haut-Brion — and added no
coverage over Advanced; kept in git history on feat/lwin-canonical-resolution for reference.)
v0.4.44 (2026-06-08)
Changed — Advanced-mode headroom: raise memory cap to 2048 MB + document batching
.actor/actor.json: maxMemoryMbytes 1024 → 2048, giving Advanced runs OOM headroom (Advanced + bundled match-table + large winery HTML pages could SIGKILL/exit-137 at 512 MB on big batches). defaultMemoryMbytes stays 512 (Basic is light).
input_schema: the matchingMode description now tells users to run large Advanced jobs at 1024–2048 MB and to split into ~130-wine batches (each finishes well under the 3 h timeout and keeps rate-limiting low).
Investigation of the residual "reachable-but-failing" Advanced producers (Dauvissat / Dancer / De Moor): the failures are a mix of 429 (Advanced still issues ~41 rate-limit trips on a 126-wine batch) and cuvée spelling variants (catalog "Les Séchets" plural vs Vivino "Séchet" singular) — not a discrete bug. Conclusion: the right lever is smaller batches, not more fallback code (which would worsen 429). No matching code changed.
No code change; config + docs only. 258 tests still pass; typecheck clean.
v0.4.43 (2026-06-08)
Changed — not-found rows use null placeholders instead of "Unknown"
A not-found / error row now has name/winery/wine_type/region/country = null (was "Unknown"), so a run with many unmatched wines no longer looks like rows of fake "Unknown" wines — the absence is read from the error column. Resolved rows are unchanged (a real wine with a genuinely missing field still falls back to "Unknown"). These fields are now typed string | null; dataset_schema updated.
TDD: 1 new test; 258 pass. Typecheck clean.
v0.4.42 (2026-06-07)
Added — LWIN output enrichment from the shared AVA match-table (safe, zero-fetch)
Each name-search row now carries lwin (Liv-ex LWIN-7) + lwin_canonical (AVA canonical name) when the queried wine is confidently in the bundled AVA match-table — letting Vivino output be joined to an LWIN/AVA catalog. "Confident" = a new resolveLwinSafe (wine-core): exact key OR an order-independent token-set match, but only when exactly one canonical shares that token set (ambiguous short forms → null). Never guesses, so a returned LWIN is trustworthy. null when unknown (column always present).
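A sketch of the "confident only" lookup, assuming a simple accent-stripping tokenizer. lookupLwinSafe and the LwinEntry shape are hypothetical, the table values in the tests are placeholders, and the real resolveLwinSafe in wine-core may normalize differently.

```typescript
interface LwinEntry {
  canonical: string;
  lwin: string;
}

const tokens = (s: string): string[] =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Word order matters for the exact key but not for the token-set key.
const exactKey = (s: string): string => tokens(s).join(" ");
const tokenSetKey = (s: string): string => Array.from(new Set(tokens(s))).sort().join(" ");

// Exact key first; otherwise an order-independent token-set match that is
// accepted only when exactly one canonical shares the query's token set.
function lookupLwinSafe(query: string, table: LwinEntry[]): LwinEntry | null {
  const q = exactKey(query);
  if (q === "") return null;
  const exact = table.find((e) => exactKey(e.canonical) === q);
  if (exact) return exact;
  const qs = tokenSetKey(query);
  const hits = table.filter((e) => tokenSetKey(e.canonical) === qs);
  return hits.length === 1 ? hits[0] : null; // ambiguous or unknown -> null
}
```

Because the ambiguous case returns null instead of picking one canonical, any LWIN the sketch does return is backed by a unique match.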
Zero extra fetches and no change to Vivino matching — purely additive metadata, attached to primary, alternative, and not-found rows alike (so a wine absent from Vivino but known in AVA still surfaces its LWIN).
The shared AVA table is bundled (2021 entries); a regeneration was verified to reproduce it exactly (parity preserved).
TDD: wine-core resolveLwinSafe tests (exact, order-independent, ambiguous→null); 257 vivino + 97 wine-core pass. Typecheck clean. dataset_schema gains the two columns.
The fallback strategies added in v0.4.34–v0.4.39 (winery-page list, search-hit pivot, shortened-query, bare-producer search, plus the -fils/domaine-/maison- slug variants) collectively drove ~40-60 HTTP fetches per failing wine, tripping Vivino rate-limiting (429) on large batches: a 378-wine run reached only ~167 wines with an inflated error rate (resolvable wines failing on 429). A new matchingMode input field lets the user pick:
Basic (default): core slugs (forward + LastName↔FirstName reversal + single-word) + Strategy 1 (explore) + Strategy 2.1 (discovered slug) + Strategy 2.2 (top entry). ~3-6 fetches/wine, predictable cost, no 429 spiral. For lists named like Vivino.
Advanced (opt-in): the full cascade (Strategies 1.5 / 2.1b / 2.3 / 2.4 + all slug variants) for messy/divergent names — much slower; the run logs the mode + a timeout warning when a large list runs advanced.
All zero-fetch quality guards (cuvée gate, exact-appellation tie-break, IDF ranking, grape/fuzzy logic) stay active in both modes, so neither mode bills a wrong match. last-run-summary records matchingMode.
Fixed (fuzzy in the wine-entry ranker — 1-letter cuvée spelling variants)
rankWineEntriesByQuery's near-match branch now also accepts a fuzzy ±1 match (in addition to prefix), so a catalog cuvée spelled one letter off from Vivino's surfaces the right entry instead of a sibling: "Maltroye" → Vivino "Maltroie", "La Reine" → "la Reina". The cuvée gate already tolerated this; the ranker didn't, so the correct entry never reached the gate. Half weight (exact match still wins). → Pierre-Yves Colin-Morey "Chassagne Maltroye" resolves.
TDD: 1 new test; 255 pass. Typecheck clean.
v0.4.39 (2026-06-06)
Added (Strategy 2.4 — bare-producer search for hard winery-slug cases)
When every other strategy fails because the catalog winery label can't be turned into a Vivino slug AND the full-query search returns a different same-name / same-lieu-dit producer, the scraper now searches the producer name alone, finds the producer's winery via a strict all-words match (deriveProducerGuesses + strictProducerMatch), and resolves the requested cuvée from its full winery page. The strict match (every producer word must fuzzy-match the winery) prevents billing a wrong same-lieu-dit producer — e.g. a "Colin Morey" query resolves to Pierre-Yves Colin-Morey and never to Joseph Colin (which shares only "colin"). Runs only on otherwise-not-found queries.
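A sketch of the all-words producer check, with the word matcher passed in so it can be exact or fuzzy. allWordsMatch is a hypothetical name, not the Actor's strictProducerMatch.

```typescript
type WordMatcher = (a: string, b: string) => boolean;

const words = (s: string): string[] =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Every producer word must match some winery word ("all", not "any"),
// so a single shared surname is not enough.
function allWordsMatch(producer: string, winery: string, match: WordMatcher): boolean {
  const pw = words(producer);
  const ww = words(winery);
  return pw.length > 0 && pw.every((p) => ww.some((w) => match(p, w)));
}
```

With an exact matcher, "Colin Morey" matches "Pierre-Yves Colin-Morey" (both words present) but not "Joseph Colin" (only "colin" is shared).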
TDD: 5 new tests (deriveProducerGuesses, strictProducerMatch); 254 pass. Typecheck clean.
v0.4.38 (2026-06-06)
Fixed (winery pivot — reach the full winery page from a search hit, run WzFuasI2nOhSGgGWf)
Search hits now pivot to the producer's FULL winery page via the winery slug read from the wine page. When the catalog winery label can't be turned into the Vivino slug (spelling variants, & Fils, domaine-des-…, locale prefixes), Strategy 1's slug guesses 404 and the search page surfaces only a subset of the producer's cuvées — so a requested cuvée absent from that subset returned not-found even though it's on Vivino (e.g. Comtes Lafon "Meursault Les Bouchères", Bernaudeau "Les Terres Blanches", Colin-Morey, Marc Colin). parseWineHtml now extracts the wine's wineryseo_name + id; new Strategy 2.1b takes a producer-matching search hit, reads its winery slug, and resolves the full winery range (explore + winery-page list) from there. The slug-loop body is refactored into a shared resolveFromWinery helper.
TDD: 1 new parser test (winery seo_name/id extraction); 249 pass. Typecheck clean. Internal _winerySeoName/_wineryId fields carry the slug through and are stripped before push.
v0.4.37 (2026-06-06)
Fixed ("& Fils" winery slug — Jean-Claude Bachelet cuvées were reachable all along)
generateWinerySlugs now emits -fils / -et-fils winery-slug variants. Vivino indexes "Jean Claude Bachelet & Fils" at slug jean-claude-bachelet-fils (winery_id 20909), whose page lists every cuvée — Murgers des Dents de Chien (w/1927348), Sous le Puits, Les Encégnières, Les Combes au Sud, Blanchot-Dessus, Bienvenues-Bâtard, … The catalog writes only "Bachelet Jean Claude", so that slug was never generated; the search page surfaced at most one Bachelet wine per query, too few for slug-discovery (needs ≥2). Earlier I wrongly classified those cuvées as Vivino absences — they were simply unreachable. The producer cores now also get -fils / -et-fils suffixes (high priority) so Strategy 1 fetches the full winery page.
Family words no longer pruned as non-winery tails. The E2 slug prune (v0.4.31) dropped any slug ending in a HIERARCHY_WORDS token — which wrongly included fils / fille / freres / pere. Those legitimately end a winery slug (…-fils, …-pere-et-fils), so they're now exempt; slug cap raised 12 → 14 to fit the variants.
TDD: 2 new tests; 248 pass. Typecheck clean. (Bachelet winery page content confirmed via Firecrawl before the fix.)
v0.4.36 (2026-06-06)
Fixed (producer spelling variant — from run SKaoR5e1NBohQrTu5)
Fuzzy (±1) producer-name matching recovers 1-letter catalog spelling variants. A producer the catalog spells one letter off from Vivino's was dropped: "Bernaudau" → Vivino "Stéphane Bernaudeau" — all the wines exist (les-coqueries, les-nourrissons, les-ongles-blanc) but processWineEntries's producer-word check (matchCount) and candidateMatchesQueryCuvee's winery-word exclusion both compared exactly, so "bernaudau" both failed the producer guard and was mistaken for a distinctive cuvée word no wine name contains. Both now use the existing fuzzyMatch (edit-distance ≤1), consistent with how cuvée tokens already tolerate singular/plural and typos. → the four Bernaudeau cuvées resolve.
TDD: 2 new tests; 246 pass. Typecheck clean.
Note: this was the only fixable item in that run's Bachelet/Bernaudau set. Jean-Claude Bachelet & Fils resolves for the cuvées Vivino actually carries (Macherelles, Les Aubues); the rest (Encégnières, Sous le Puits, Les Combes du Sud, Le Charmois, Murgers) are genuine Vivino absences for that producer → honest unbilled not-founds.
v0.4.35 (2026-06-06)
Fixed (multi-cuvée producers + run timeout — from the 378-wine run TiG3EzwWimsG98FsX analysis)
IDF-weighted wine-entry ranking (Ganevat / Labet multi-cuvée fix). For a producer with a wide range, the winery-page / search wine-entry scorer summed an equal +2 per matched query word — so the grape descriptor ("- Savagnin", "- Chardonnay") and the producer name, which recur across most of the range, outscored the actual cuvée word. The selector then checked only the top entry, so e.g. Ganevat "Les Rescapés" (the wine exists on Vivino, w/9432554) lost to "Les Résistants Savagnin" and returned not-found; Labet "La Reina Chardonnay" / "Cuvée du hasard" similarly mis-resolved. rankWineEntriesByQuery now weights each query word by inverse document frequency across the candidate entries (a word in many entries → weak signal; a rare lieu-dit → strong), so the distinctive cuvée surfaces. Only re-orders the score>0 set (same inclusion), so non-affected producers are unchanged. findWineFromWineryPage now also selects the first entry that passes the cuvée gate, not just the top — recovering the right cuvée if a sibling still edges ahead. Side benefit: fewer wasted wine-page fetches per query (throughput).
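IDF weighting can be sketched as follows. The smoothed weight log(1 + N/df) is an assumption, chosen so that every matched word stays positive (keeping the same score>0 inclusion set); the entry does not state the Actor's exact formula or tokenizer, and rankByIdf is a hypothetical name.

```typescript
const tokenize = (s: string): string[] =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Rank entries by the sum of IDF weights of the query words they contain.
// A word present in many entries (grape, producer) weighs little; a rare
// cuvée word weighs a lot. Entries matching no query word are dropped.
function rankByIdf(queryWords: string[], entries: string[]): string[] {
  const sets = entries.map((e) => new Set(tokenize(e)));
  const q = Array.from(new Set(tokenize(queryWords.join(" "))));
  const n = entries.length;
  const weight = new Map<string, number>();
  for (const w of q) {
    const df = sets.filter((s) => s.has(w)).length;
    if (df > 0) weight.set(w, Math.log(1 + n / df));
  }
  const scored = entries
    .map((entry, i) => ({
      entry,
      score: q.reduce((sum, w) => sum + (sets[i].has(w) ? (weight.get(w) ?? 0) : 0), 0),
    }))
    .filter((x) => x.score > 0);
  // Array.prototype.sort is stable (ES2019+), so ties keep input order.
  return scored.sort((a, b) => b.score - a.score).map((x) => x.entry);
}
```

For the query words les / rescapés / savagnin over a five-entry range, "Les Rescapés" outranks "Les Résistants Savagnin" because "rescapés" occurs in one entry while "savagnin" occurs in three.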
Default run timeout raised 1800 → 7200 s (.actor/actor.json: defaultRunOptions.timeoutSecs). The 30-min default silently truncated batches (the 378-wine run timed out at 1 h after processing 128 wines). 2 h is a ceiling, not a fixed duration: runs still stop when work is done, so small runs are unaffected, and it covers ~250 wines/run. Larger batches should still be split or given a higher per-run timeout (≈28 s/wine).
TDD: 2 new tests (IDF ranking); 241 pass. Typecheck clean.
Note: not every Ganevat/Labet cuvée is recoverable — some are genuinely absent from Vivino, and a few catalog spellings/abbreviations diverge from Vivino's seo (e.g. "La Reine" vs Vivino "la Reina", "chardo" vs "chardonnay"); those remain honest unbilled not-founds.
v0.4.34 (2026-06-06)
Fixed (producer resolution gaps, found in a 378-wine user test)
\uXXXX JSON escapes now decoded in regex-extracted winery/region/country (the real "& Fils" blocker). parseWineHtml's fallback regex pulled the winery name straight from the page's inline JSON without unescaping, so "Jean Claude Bachelet & Fils" arrived as Jean Claude Bachelet \u0026 Fils. The producer surname guard then tokenized the literal escape and took "u0026" as the surname; since that token never appears in any query, the guard dropped every wine of any "& Fils" / "& Fille" domaine, a whole class of Burgundy estates. Now decodeJsonUnicode unescapes the three regex-extracted JSON string fields, so the winery reads "Jean Claude Bachelet & Fils" → surname "bachelet" → the wines resolve. Also fixes escaped accents (\u00e9 → é) on that extraction path.
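A minimal sketch of the unescaping step a helper like decodeJsonUnicode performs. decodeUnicodeEscapes is a hypothetical name, and handling only \uXXXX escapes (not \" or \n) is an assumption about scope.

```typescript
// Replace each literal \uXXXX escape with its UTF-16 code unit. Surrogate
// pairs (e.g. \ud83c\udf77) decode correctly because the two units are
// concatenated in order.
function decodeUnicodeEscapes(s: string): string {
  return s.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex: string) =>
    String.fromCharCode(parseInt(hex, 16)),
  );
}
```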
generateWinerySlugs rotates a 3-word producer to "first-middle-surname". Catalogs list producers "LastName FirstName(s)" — e.g. "Bachelet Jean Claude". The producer-order reversal (v0.4.24) only swapped the first two words → jean-bachelet (a different Mâcon winery); the rotation now also moves the leading surname to the end of the first three words → jean-claude-bachelet, with domaine-/maison- variants. General improvement for 3-word producers whose Vivino winery page lives at that slug (a 2-word producer just 404s the extra probe harmlessly).
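A sketch of the reordering idea, assuming an accent-stripping tokenizer. producerOrderSlugs is a hypothetical name; the real generateWinerySlugs also emits prefixed, suffixed, and truncated variants and caps the list.

```typescript
const tokenize = (s: string): string[] =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Forward slug first (unchanged priority), then reordered variants for
// catalogs that write producers as "LastName FirstName(s)".
function producerOrderSlugs(producer: string): string[] {
  const w = tokenize(producer);
  const out = [w.join("-")];
  if (w.length >= 2) out.push([w[1], w[0]].join("-")); // 2-word swap: lachaux-charles -> charles-lachaux
  if (w.length >= 3) out.push([w[1], w[2], w[0]].join("-")); // move leading surname to the end of the first three words
  return Array.from(new Set(out)).filter((s) => s.length > 0);
}
```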
Cuvée gate now matches the cuvée KERNEL, not a bare appellation (prevents billing a neighbouring lieu-dit). Recovering the "& Fils" producers exposed a pre-existing leniency: catalog rows are "Appellation Lieu-dit" ("Saint-Aubin Murgers des Dents de Chien"), so candidateMatchesQueryCuvee's first distinctive word is the appellation ("saint-aubin"), shared by every wine of the appellation. When the exact lieu-dit was absent on Vivino, the gate billed a neighbouring one (Bachelet Murgers → billed Bachelet Derrière la Tour). The gate now skips words that belong to the candidate's own region/appellation and requires the first non-appellation distinctive word (the lieu-dit kernel "murgers") to match — falling back to the appellation word only when every distinctive word is appellation (generic-named wines, B1, unchanged). searchWineByName now also threads the candidate region into the Strategy 2.2 / 2.3 gate calls so the kernel test applies on the search-page path (where the Murgers mis-match occurred). Penfolds-Grange-style trailing-descriptor tolerance is preserved (no region passed → kernel = first distinctive).
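A sketch of the kernel rule using exact word matching (the entry's gate also tolerates fuzzy ±1 matches). passesKernelGate is a hypothetical name, distinctive words are assumed to be pre-normalized (lowercase, no accents), and passing when there are no distinctive words is an assumption.

```typescript
const tokenize = (s: string): string[] =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Kernel = first distinctive query word that is NOT part of the candidate's
// own region/appellation. If every distinctive word is appellation, fall
// back to the first one (generic-named wines). With no region, the kernel
// is simply the first distinctive word.
function passesKernelGate(
  distinctiveWords: string[],
  candidateName: string,
  candidateRegion: string | null,
): boolean {
  if (distinctiveWords.length === 0) return true;
  const region = new Set<string>(candidateRegion ? tokenize(candidateRegion) : []);
  const kernel = distinctiveWords.find((w) => !region.has(w)) ?? distinctiveWords[0];
  return tokenize(candidateName).includes(kernel) || region.has(kernel);
}
```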
TDD: 7 new tests (1 parser decode, 3 slug rotation, 3 cuvée-kernel gate); 239 pass. Typecheck clean. E2E: Bachelet Macherelles / Les Aubues resolve correctly; Murgers (absent on Vivino for this producer) is now an honest unbilled not-found instead of a billed wrong lieu-dit; the v0.4.33 appellation fixes (Rousseau→Chambertin, Girardin→Montrachet) unchanged.
Note (throughput): the same 378-wine test timed out at the Actor's 1 h default, completing 128 wines (~28 s/wine, error cases being the slowest as they exhaust every strategy). For batches this size, raise the run timeout (≈3 h) or split into ~130-wine lots. No code change — pacing/anti-bot I/O is the floor.
v0.4.33 (2026-06-06)
Fixed (82-wine catalog revalidation on build 0.4.42, run qyb4aXvP0Rx8coumc)
Re-ran the original 82-wine ITQS catalog (run N9g6V8KyM01Kpqt92) on the current build. Error rate fell from 35/82 (build 0.4.18) to 4. Of those four, the real, fixable defect was two BILLED wrong-cuvée matches (the worst class — the user pays for the wrong wine):
Exact appellation now wins over vintage promotion (billed wrong-cuvée fix). When the query asked for a bare Grand Cru ("Rousseau Chambertin", "Girardin Pierre Montrachet"), rankExploreCandidates returned — and billed — the WRONG cuvée: Charmes-Chambertin Grand Cru (2022) instead of Chambertin Grand Cru, Puligny-Montrachet Les Enseignères instead of Le Montrachet Grand Cru. All *-Chambertin / *-Montrachet candidates tie on scoreWineMatch, then processExploreMatches vintage-promotes whichever has the requested year — so the wrong wine, having a closer vintage, won the primary slot and the correct one was demoted to an (unbilled) alternative. New promoteExactAppellationHead (scoring.ts) lifts the candidate whose name head is the exact appellation ("Chambertin Grand Cru") above same-suffix neighbours ("Charmes-Chambertin") before the per-wine collapse — cuvée correctness now outranks getting the exact year of the wrong wine. Vintage promotion still decides among vintages of the correct wine. Validated end-to-end on build 0.4.43: both now resolve to the right Grand Cru as the billed primary, with the others demoted to unbilled alternatives; 5 control wines (Roumier Bonnes-Mares, Sauzet Chevalier-Montrachet, Rousseau Ruchottes-Chambertin, Mugneret-Gibourg Échezeaux, Colin Marc Montrachet) unchanged — no regression.
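A sketch of the promotion step as a stable partition on the name head. promoteExactHead, the Candidate shape, and stripping a leading article (so "Le Montrachet" heads as "montrachet") are assumptions, not the scoring.ts implementation.

```typescript
interface Candidate {
  name: string;
  year: number | null;
}

// Head = first word of the name after an optional leading article.
const head = (c: Candidate): string =>
  c.name.toLowerCase().replace(/^(?:le|la|les)\s+/, "").split(/\s+/)[0] ?? "";

// Candidates whose head is exactly the appellation move ahead of compound
// neighbours ("Charmes-Chambertin", "Puligny-Montrachet"); relative order
// (and thus any vintage ordering) is otherwise preserved.
function promoteExactHead(cands: Candidate[], appellation: string): Candidate[] {
  const key = appellation.toLowerCase();
  const exact = cands.filter((c) => head(c) === key);
  if (exact.length === 0) return cands;
  return [...exact, ...cands.filter((c) => head(c) !== key)];
}
```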
Known limitations confirmed this revalidation (not billed — left as-is)
Producer-entity name divergence (Naudin Claire → Henri Naudin-Ferrand). "Naudin Claire" (the winemaker) is indexed on Vivino under the estate entity "Henri Naudin-Ferrand"; the Échezeaux Grand Cru is found on the winery page (w/1484043) but matchesProducer drops it because the catalog's first name "Claire" is absent from Vivino's "Henri …". This cannot be relaxed by name alone without re-admitting genuinely-distinct same-surname estates (e.g. Anne-Claude Leflaive vs Olivier Leflaive). Remains an unbilled not-found (3 vintages).
Pierre Girardin Richebourg — genuine Vivino absence (the producer's catalog there has no Richebourg; girardin-pierre-richebourg 404, no search hit). Correctly unbilled.
v0.4.32 (2026-06-05)
Added (winery-page fallback on the discovered slug — Vougeraie Musigny)
Strategy 2.1 now also falls back to the winery page's wine list. When the search page discovers the real winery slug (e.g. domaine-de-la-vougeraie, which we can't reconstruct from "La Vougeraie"), but explore-by-winery has no cuvée match, we now parse that winery page's own wine list (its HTML was already fetched for the id) — exactly like Strategy 1.5 does for the slug loop. Reaches rare cuvées absent from BOTH explore-by-winery and the search page but present on the winery page: Domaine de la Vougeraie Musigny Grand Cru (w/2376338) now resolves instead of a not-found.
TDD: 1 new test; 225 pass.
v0.4.31 (2026-06-05)
Changed (E2 — fewer HTTP fetches per query, run SEgHqs89Gta5PlYKU audit)
Pruned junk winery slugs. generateWinerySlugs no longer emits slugs ending in a hierarchy/appellation/négoce word (…-igp, …-pays, …-cru, …-neg); a producer name never ends that way, so those only burned winery-page fetches (and 429 budget). Producer-prefix truncations are also capped at 3 segments (from 5), and the slug list is capped at 12. Producer-focused slugs (reversed, single-word, domaine-/maison- prefixed, négoce) are preserved. Cuts ~3-5 winery fetches per appellation-laden multi-word query.
Note: deeper perf levers (anti-bot sleep tuning, capping Strategy 2.3 shortened-query retries) were left untouched — they trade throughput for coverage/429-safety. The wall-clock cost is mostly I/O wait + pacing sleeps (compute billed is tiny, ~$0.002/12 wines).
TDD: 1 new test; 224 pass.
v0.4.30 (2026-06-05)
Fixed (A3 — abbreviated long-slug producers, run SEgHqs89Gta5PlYKU audit)
Winery-name filter no longer rejects abbreviated catalog producer labels. processWineEntries's winery guard required ≥50% of the Vivino winery's words to appear in the query, which wrongly skipped wines whose producer the catalog abbreviates: "De Vogue" is 1/4 of "Domaine Comte Georges de Vogüé", "Mugnier" 1/4 of "Domaine Jacques-Frédéric Mugnier". The HTML search page does return these wines (verified via the run log), but the filter dropped them. Now the guard only requires one producer word in common; the surname check (last significant winery word must be in the query) remains the real producer-identity gate. Fixes De Vogüé Bourgogne Blanc and Mugnier Bonnes-Mares.
Refactor: the Strategy-1 <70% non-winery-coverage check is extracted into a shared passesNonWineryCoverage helper (no behaviour change; reused for clarity).
Note: an explore text-search (/api/explore/explore?q=…) was prototyped and dropped — the live API ignores the free-text q param (returned nothing for de Vogüé while the HTML search page returned it), so it was a no-op that only added a request.
TDD: 2 new tests; T17 kept green; 223 pass.
v0.4.29 (2026-06-05)
Added (coverage — Strategy 1.5 winery-page wine-list, run SEgHqs89Gta5PlYKU audit)
New fallback for micro-producers absent from the explore API. For small domaines (e.g. Maison Glandien, Domaine Maxime Renaudin) the winery slug resolves but explore-by-winery returns no (or no relevant) matches — the wines exist only on the winery landing page. findWineryIdBySlug now returns the page HTML alongside the id; when the explore path yields nothing usable, findWineFromWineryPage parses the winery page's own wine list (every cuvée is linked as /<seo>/w/<id>) and runs it through processWineEntries + the cuvée gate. New _strategy value winery-entries (counted in the Strategy-2-fallback stat). Wine-page dedup (seenWineIds) is now shared across Strategy 1.5 / 2.2 / 2.3.
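A sketch of the wine-list extraction, assuming links appear as href="/<seo>/w/<id>" in the winery-page HTML. extractWineLinks is a hypothetical name, the regex ignores locale prefixes, and the HTML in the tests is made up.

```typescript
interface WineLink {
  seo: string;
  id: number;
}

// Collect distinct wine links of the form /<seo>/w/<id>, deduplicated by
// wine id (a page often links the same cuvée several times).
function extractWineLinks(html: string): WineLink[] {
  const re = /href="\/([a-z0-9-]+)\/w\/(\d+)/gi;
  const seen = new Set<number>();
  const out: WineLink[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const id = Number(m[2]);
    if (seen.has(id)) continue;
    seen.add(id);
    out.push({ seo: m[1], id });
  }
  return out;
}
```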
C1 (partial): added pays to HIERARCHY_WORDS so appellation-category noise ("IGP Pays d'Hérault") no longer becomes the gating distinctive token — lets generic-named Vin-de-France/IGP wines ("Rouge") match on their region.
TDD: 1 new integration test; 221 pass.
v0.4.28 (2026-06-05)
Added (coverage — A1/A2/D1, run SEgHqs89Gta5PlYKU audit)
A1 — prefix-augmented winery slugs. Vivino winery slugs often carry a domaine-/maison- prefix the catalog label omits. generateWinerySlugs now appends domaine-<core> / maison-<core> variants of the producer-core slugs (reversed, single-word, forward 2-word) as low-priority candidates — fixes e.g. "Renaudin Maxime" → domaine-maxime-renaudin, "Glandien Cruci" → maison-glandien. Cores already carrying a prefix are skipped (no domaine-domaine-).
A2 — négociant markers. New NEGOCE_TOKENS ("neg", "nego", "negoce", …). When a query carries one (e.g. "Leroy Neg …"), the producer is a maison: maison-<core> (core = words before the marker) is tried first, so it resolves to "Maison Leroy" rather than the unrelated "Domaine Leroy". The marker is also stripped from buildQueryWords so it never pollutes scoring or the cuvée gate.
D1 — appellation typo normalization. New APPELLATION_TYPOS map applied in buildQueryWords (mersault→meursault, echezaux→echezeaux, …) — fixes "Coche Dury Mersault Rougeots".
D1+ — consistent fuzzy ±1 (edit-distance) matching. fuzzyMatch now tolerates one insertion/deletion (not just one substitution) and is applied consistently across the four name checkpoints that previously used exact includes: distinctiveWordsFilter, the relevance probe, the <70% non-winery fallback, and the B1 cuvée gate (candidateMatchesQueryCuvee). Fixes singular/plural & missing-letter mismatches (cote↔cotes, village↔villages): "Leroy Neg Côtes de Nuits village" now resolves to Vivino "Côte de Nuits Villages" (winery found via A2, wine no longer rejected by the multiple exact-match gates).
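An edit-distance ≤1 check can be done in one linear pass without a full Levenshtein table. This is a sketch of that technique, not the Actor's fuzzyMatch source; withinOneEdit is a hypothetical name and applies no minimum word length.

```typescript
// True when a and b differ by at most one substitution, insertion or deletion.
function withinOneEdit(a: string, b: string): boolean {
  if (a === b) return true;
  if (Math.abs(a.length - b.length) > 1) return false;
  let i = 0;
  let j = 0;
  let edits = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
      continue;
    }
    if (++edits > 1) return false;
    if (a.length > b.length) i++; // extra char in a (deletion)
    else if (a.length < b.length) j++; // extra char in b (insertion)
    else {
      i++; // same length: substitution
      j++;
    }
  }
  // A leftover trailing character also counts as one edit.
  return edits + (a.length - i) + (b.length - j) <= 1;
}
```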
Winery-id extraction from landing pages (enables A1/A2). extractWineryIdFromText now also reads the app deep-link vivino://?winery_id=<N> present on every /wineries/<slug> page. Small producers (few wines) embed their id ONLY there; without this, the A1/A2 slugs resolved to a real page (HTTP 200) but the id could not be parsed, so they still 404'd logically. Confirmed via Firecrawl (Domaine Maxime Renaudin = 293195).
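A sketch of reading the id from the deep link. wineryIdFromDeepLink is a hypothetical name, and tolerating preceding query parameters and an HTML-escaped &amp; is an assumption about the page markup.

```typescript
// Matches vivino://?winery_id=<N>, optionally after other query params
// (e.g. vivino://?x=1&amp;winery_id=<N>).
const DEEP_LINK = /vivino:\/\/\?(?:[^"'\s]*?&(?:amp;)?)?winery_id=(\d+)/;

function wineryIdFromDeepLink(html: string): number | null {
  const m = DEEP_LINK.exec(html);
  return m ? Number(m[1]) : null;
}
```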
Deferred: A3 (extracting long prefixed winery slugs like domaine-comte-georges-de-vogue from the search page) needs winery seo_name extraction, not a prefix-cap change (which would over-extend into appellations) — to be done separately.
TDD: 10 new tests; 217 pass.
v0.4.27 (2026-06-05)
Fixed (matching — B1 cuvée gate, run SEgHqs89Gta5PlYKU audit)
No more wrong-cuvée matches billed as success. When a name query had a single distinctive cuvée/appellation word, distinctiveWordsFilter (active only at ≥2 distinctive words) let the explore path return ANY wine of the right producer — e.g. "La Vougeraie Musigny" → Bonnes-Mares Grand Cru (matchScore 0.5), "Mugneret-Gibourg Echezeaux" → NSG Chaignots (1), "Girardin Richebourg" → Vosne-Romanée (1), "Selosse Carelles" → Millésime (0.5, with the real Carelles demoted to alternative). These were returned as successful, billed rows.
Fix: rankExploreCandidates now applies a region-aware cuvée gate — the query's first distinctive token must appear in the candidate's name, region, or appellation (candidateMatchesQueryCuvee gains an extraHaystack param). Wrong cuvées are rejected; the Strategy-2 search fallback then has a chance to surface the correct wine, and otherwise the row is a (non-billed) not-found instead of confidently-wrong data. Generic-named wines ("Bourgogne Blanc", "Rouge") still pass via the region/appellation haystack.
v0.4.26 (2026-06-05)
Fixed (alternatives quality — smoke findings on build test/0.4.22)
Alternatives are now DISTINCT wines, not other vintages of the same wine. rankExploreCandidates collapses candidates by wineId (keeping the best-ranked vintage) before selecting alternatives, and pulls a wider explore pool (maxResults: 20) so enough distinct cuvées remain. Previously "Domaine Leflaive Puligny-Montrachet" returned the same wine (Les Pucelles) three times.
Each alternative must individually clear a relevance bar. alternativeCount now gates every runner-up on closeness to the top (within ALT_GAP_RATIO), or surfaces alternatives wholesale only when the top match is itself low-confidence. A near-tie no longer drags in a far, irrelevant runner-up (e.g. a score-1 Gevrey alongside a score-9 primary).
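A sketch of the collapse-then-gate selection. collapseByWineId and countAlternatives are hypothetical names; the values of ALT_TOP_SCORE_FLOOR and ALT_GAP_RATIO, and reading "within ALT_GAP_RATIO" as a relative gap (top - score) / top, are assumptions. Only ALT_MAX_ALTERNATIVES = 2 is stated in this changelog.

```typescript
interface Ranked {
  wineId: number;
  score: number;
}

const ALT_TOP_SCORE_FLOOR = 3; // assumed value
const ALT_GAP_RATIO = 0.2; // assumed value
const ALT_MAX_ALTERNATIVES = 2;

// Keep only the first (best-ranked) entry per wineId, so vintages of the
// same wine cannot fill the alternative slots.
function collapseByWineId(ranked: Ranked[]): Ranked[] {
  const seen = new Set<number>();
  const out: Ranked[] = [];
  for (const r of ranked) {
    if (seen.has(r.wineId)) continue;
    seen.add(r.wineId);
    out.push(r);
  }
  return out;
}

// Expects distinct candidates sorted by descending score.
function countAlternatives(distinct: Ranked[]): number {
  if (distinct.length < 2) return 0;
  const top = distinct[0].score;
  const runners = distinct.slice(1, 1 + ALT_MAX_ALTERNATIVES);
  if (top < ALT_TOP_SCORE_FLOOR) return runners.length; // low-confidence top: surface wholesale
  let n = 0;
  for (const r of runners) {
    if (top - r.score > top * ALT_GAP_RATIO) break; // sorted, so later runners are further away
    n++;
  }
  return n;
}
```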
Smoke also validated the v0.4.24 producer-order fix: Charles Lachaux (slug charles-lachaux, winery 273769) now resolves from the catalog label "Lachaux Charles …".
Deploy hygiene
Excluded tests/, vitest.config.ts, and dist/*.map from Apify sourceFiles[] via .gitignore (CLAUDE.md rule #3): test files carried internal commentary and the bundle source map exposed bundled wine-core source. Files remain tracked in git; only the deploy upload is trimmed.
v0.4.25 (2026-06-05)
Added (alternatives on uncertain matches)
Alternative candidates delivered when a name match is uncertain. When the best name-search match scores below the confidence floor (ALT_TOP_SCORE_FLOOR) or the runner-up is a near-tie (within ALT_GAP_RATIO of the top), the Actor now also emits up to ALT_MAX_ALTERNATIVES (2) alternative rows — other cuvées from the same producer — so the user can disambiguate.
Alternatives are fully enriched (taste profile + reviews follow the same opt-ins as the primary).
New public fields: matchScore (relevance of the match; on every name-search row) and isAlternative (true on alternative rows).
Alternatives are never billed — _noCharge / isAlternative rows are pushed via plain pushData (no wine-result event). You pay for one primary result per input.
Alternatives apply to explore-ranked name searches (Strategy 1 / 2.1) only; URL inputs and Strategy 2.2/2.3 always yield a single row. Confident matches produce no alternatives.
Implementation: searchWineByName surfaces matchScore + alternatives (DRY rankExploreCandidates / selectionFromRanked helpers shared by s1 and s2.1); alternativeCount is a pure, unit-tested decision function. processName now returns ScrapedWine[] and the runActor loop iterates rows, deduping and enriching each. README + dataset schema updated; the stale "error rows are billable" note corrected (error rows are not billed since the 2026-06-02 policy).
v0.4.24 (2026-06-05)
Fixed (matching)
Producer-name-order reversal. Catalogs often list producers as "LastName FirstName" (e.g. Lachaux Charles) while Vivino indexes them "FirstName LastName" (charles-lachaux). generateWinerySlugs now also emits the reversed 2-word producer slug (winery prefix stripped) as an additional candidate, so wineries like Charles Lachaux Côte de Nuits-Villages "Aux Montagnes" resolve instead of 404-ing on every lachaux-charles-* slug. The forward slug stays the first candidate; the reversal is additive. Uses producerHint when available, falling back to the query's winery part.
v0.4.23 (2026-06-04)
Changed (input schema)
Renamed the wines field title to "Wines or Vivino URLs or mix of both" to make explicit that an entry can be a wine name, a Vivino URL, or any mix (each line is auto-detected).
Removed the legacy wineNames and wineUrls input fields — superseded by the unified wines field. collectInputs (run.ts) no longer reads them. ⚠️ Backward-compat: integrations still sending wineNames/wineUrls will have those entries ignored — migrate to wines.
v0.4.22 (2026-06-03)
Fixed (SEV-3 matching — producer filter too strict on catalog labels)
matchesProducer required every producer-hint word to be a substring of Vivino's winery name (.every), and filterByProducer drops all results when none match. Once parseWineName started supplying a producerHint (v0.4.19) and it was threaded into Strategy 2 (processWineEntries), a catalog label carrying a non-winery token — e.g. "Leroy Neg" (négociant marker) vs Vivino's "Maison Leroy" — dropped every valid result → NAME_SEARCH_ERROR. Now only distinctive (≥4-char) producer words must all match; short tokens (négociant markers, initials) are tolerated, with a fall-back to all words when the hint has no ≥4-char word. Preserves precision (e.g. "Pierre Girardin" still does NOT match "Pierre-Yves Colin-Morey"). filterByProducer also treats a hint that normalizes to no usable words (prefix + initial) as no-hint instead of dropping all.
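A sketch of the relaxed rule. hintMatchesWinery is a hypothetical name; prefix stripping (e.g. "Domaine") is omitted, so only an empty hint is treated as no hint.

```typescript
const tokenize = (s: string): string[] =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Only distinctive (>= 4-char) hint words must all appear (as substrings)
// in the winery name; short tokens such as "neg" or initials are tolerated.
// If the hint has no >= 4-char word, all of its words are required.
function hintMatchesWinery(hint: string, wineryName: string): boolean {
  const hintWords = tokenize(hint);
  if (hintWords.length === 0) return true; // no usable words: treat as no hint
  const winery = tokenize(wineryName).join(" ");
  const distinctive = hintWords.filter((w) => w.length >= 4);
  const required = distinctive.length > 0 ? distinctive : hintWords;
  return required.every((w) => winery.includes(w));
}
```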
Changed (cleanup)
search.ts: destructure producerHint once in searchWineByName instead of repeating opts.producerHint ?? null at six call sites (behavior-preserving).
Error rows are no longer billed. Previously NAME_SEARCH_ERROR (definitive not-found after a real search) was billed as a wine-result event ("success or error item", the X-1 stance). This changed after run N9g6V8KyM01Kpqt92 and a monetization review. That catalog batch showed 49% billed error rows, a figure inflated by the now-fixed parseWineName regression. The policy is now charge on confirmed success only: every error row (NAME_SEARCH_ERROR, URL_PROCESSING_ERROR, infra/transient failures) is pushed to the dataset for visibility but never billed. Single gate in run.ts: noCharge = _noCharge || error != null. PPE doctrine: if the user did not get a wine, the event does not fire.
pay_per_event.json: the wine-result description now states success-only billing and that error rows are not charged (renders on the Pricing tab).
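The sketch below shows the single billing gate together with the `_noCharge` stripping described in the v0.4.17 entry below. `RowSketch`, `shouldCharge` and `toPublicRow` are hypothetical. Only the `noCharge = _noCharge || error != null` expression is taken from this entry:

```typescript
// Hypothetical, trimmed row shape; real dataset rows carry many more fields.
interface RowSketch {
  searchQuery: string | null;
  error: string | null;
  _noCharge?: boolean;
}

// Only confirmed successes fire the wine-result event.
function shouldCharge(row: RowSketch): boolean {
  const noCharge = row._noCharge === true || row.error != null;
  return !noCharge;
}

// Removes the internal flag so it never reaches the public dataset.
function toPublicRow(row: RowSketch): Omit<RowSketch, "_noCharge"> {
  const { _noCharge, ...publicRow } = row;
  return publicRow;
}
```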
Tests
Updated 3 existing run tests (NAME_SEARCH_ERROR / too-short / mixed batch) to assert error rows are pushed but not charged; the per-run-summary test now verifies charged (1) < pushed (2). 187 tests passing. Typecheck + build clean.
WINE_TYPES had Vivino type_id 3 and 4 swapped (3: 'Rosé', 4: 'Sparkling'), a latent bug inherited verbatim from the monolith. Every sparkling wine was labeled "Rosé" and every rosé "Sparkling". Surfaced by run N9g6V8KyM01Kpqt92: Jacques Selosse Les Carelles Blanc de Blancs (a Chardonnay sparkling Champagne) came back wine_type: "Rosé" — only possible from type_id 3, which Vivino assigns to sparkling. Corrected to 3: 'Sparkling', 4: 'Rosé' (src/constants.ts). Affects both the explore and HTML paths (wineTypeLabel).
Added (DX — shipTo fallback so prices resolve)
New exported resolveShipTo(shipTo, countryCode) in run.ts. Vivino prices come from the Explore API keyed by country_code = the delivery market (= shipTo), not countryCode (origin filter only). When shipTo is empty/invalid but countryCode is a valid 2-letter code, the actor now falls back shipTo := countryCode and logs a warning. In run N9g6V8KyM01Kpqt92 the user set countryCode: "FR" with empty shipTo and got price null on 100% of rows; this fallback would have unlocked prices. Explicit shipTo still wins.
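A minimal sketch of the fallback. It assumes a plain two-letter check and an injectable `warn` callback in place of the actor's logger. Both are assumptions: the real resolveShipTo takes only `shipTo` and `countryCode`.

```typescript
// Assumption: a "valid" code is exactly two ASCII letters; case is left as given.
const COUNTRY_CODE_RE = /^[A-Za-z]{2}$/;

// Explicit shipTo wins; otherwise fall back to a valid countryCode and warn.
function resolveShipToSketch(
  shipTo: string | null | undefined,
  countryCode: string | null | undefined,
  warn: (msg: string) => void = console.warn,
): string | null {
  const ship = (shipTo ?? "").trim();
  if (COUNTRY_CODE_RE.test(ship)) return ship;
  const origin = (countryCode ?? "").trim();
  if (COUNTRY_CODE_RE.test(origin)) {
    warn(`shipTo is empty or invalid; falling back to countryCode "${origin}" so prices can resolve.`);
    return origin;
  }
  return null; // no usable market: prices will stay null
}
```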
Fixed (SEV-3 migration regression — tab-separated catalog input not parsed)
The monolith's parseWineName (spreadsheet-column split + catalog cleanup + producer-hint derivation) was never ported during the TS migration, so tab-separated catalog rows (Producer⇥Cuvée⇥Vintage⇥Color) were sent verbatim to search — tabs and the color token included. Live run N9g6V8KyM01Kpqt92 showed the impact: vintage null on 100% of rows (the year is a middle column, and the TS parseWineQuery only matched a leading/trailing year), color/tabs polluting every query, producerHint never set, and same-wine vintages collapsing under the wineId::'' dedup key.
New src/wine-name.ts: parseWineName(name, searchMode) → { cleanedName, vintage, producerHint }, faithfully ported from the monolith: multi-column rows (vintage from any column, color/level dropped via GENERIC_SKIP, winery prefix stripped from the query but kept in producerHint), the 2-column description ⇥ vintage|NV format (cleanCatalogText + region-prefix strip), and classic free-text (inline vintage + bottle-volume strip). French abbreviations expanded (NSG → Nuits-Saint-Georges, 1er → Premier, CDP, VV, …) so the validator recognizes them in Vivino names.
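A reduced sketch of the multi-column branch only. It assumes a tiny hypothetical `GENERIC_SKIP_SKETCH` set and treats the first surviving column as the producer. It omits winery-prefix stripping, the 2-column description format, abbreviation expansion and the free-text path:

```typescript
// Hypothetical subset of column values that carry no search signal (color/level).
const GENERIC_SKIP_SKETCH = new Set(["rouge", "blanc", "rose", "rosé", "red", "white"]);
const YEAR_RE = /^(19|20)\d{2}$/;

interface ParsedNameSketch {
  cleanedName: string;
  vintage: number | null;
  producerHint: string | null;
}

function parseTabRowSketch(row: string): ParsedNameSketch {
  const columns = row.split("\t").map((c) => c.trim()).filter(Boolean);
  let vintage: number | null = null;
  const kept: string[] = [];
  for (const col of columns) {
    if (YEAR_RE.test(col)) {
      vintage = vintage ?? Number(col); // vintage may sit in any column; first year wins
      continue;
    }
    if (col.toUpperCase() === "NV") continue;
    if (GENERIC_SKIP_SKETCH.has(col.toLowerCase())) continue; // drop color / level tokens
    kept.push(col);
  }
  return {
    cleanedName: kept.join(" "), // no tabs or color tokens reach the search query
    vintage,
    producerHint: columns.length > 1 ? kept[0] ?? null : null,
  };
}
```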
run.ts now parses name inputs through parseWineName; producerHint is threaded through searchWineByName → processExploreMatches / distinctiveWordsFilter / processWineEntries (previously hard-coded null, the documented "Phase 3" TODO). This restores per-vintage dedup, feeds producer disambiguation for short négociant names, and removes color/tab noise from the search query. searchQuery output unchanged for non-tab inputs (still the cleaned query).
Unparseable input URLs (wineId could not be extracted) are no longer billed: no fetch occurred, so zero work was done — now carries the internal _noCharge flag, consistent with the HTML-fetch-failure branches. NAME_SEARCH_ERROR (not-found after a real search) stays billable. Audit X-1.
v0.4.17 (2026-06-02)
Fixed (SEV-4 billing — infrastructure failures were charged)
HTML-fetch-failure rows (URL mode, URL_PROCESSING_ERROR / "HTML fetch failed") are now pushed for visibility but NOT billed: they carry an internal _noCharge flag that suppresses the wine-result event. An infrastructure/transient failure delivers zero value (Apify PPE doctrine). NAME_SEARCH_ERROR (no match found after a real search) remains billable — a definitive not-found is a delivered result per the documented policy. The _noCharge flag is stripped before the dataset push. Audit 2026-06-01 finding X-1.
v0.4.16 (2026-06-01)
Fixed (SEV-4 schema conformance — live-confirmed 27% of rows)
image_url is now always a string (or null). When a wine page's JSON-LD ships image as an array of URLs (observed on ~27% of live rows in the 2026-06-01 audit), the HTML parser previously emitted the raw array into image_url, violating the dataset schema (["string","null"]). Now coerced to the first URL (parser.ts:237).
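A sketch of the coercion. It assumes the JSON-LD `image` value is a string, an array, or absent. Other shapes (for example an ImageObject) fall through to null here, which may differ from parser.ts:

```typescript
// Coerces a JSON-LD `image` value to string | null, taking the first non-empty string of an array.
function coerceImageUrl(image: unknown): string | null {
  if (typeof image === "string") return image.trim() || null;
  if (Array.isArray(image)) {
    const first = image.find((u): u is string => typeof u === "string" && u.trim() !== "");
    return first ?? null;
  }
  return null;
}
```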
Strategy 2.3 (shortened-query retry) now dedups wine-page fetches against Strategy 2.2 via a shared seenWineIds: Set<number> initialized at the top of the Strategy 2 block. Previously, on difficult queries (e.g. Penfolds Grange Bin 95 Shiraz where 2.2 surfaced Bin variants that failed candidateMatchesQueryCuvee), Strategy 2.3 would re-fetch the same processWineEntries candidates already retrieved by 2.2 - up to 30-40 HTTP roundtrips per query. processWineEntries now accepts an optional seenWineIds parameter, skips entries whose wineId is already in the Set, and adds successfully-fetched wineIds back to it.
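A sketch of the shared-Set pattern. `processEntriesSketch` and the injected `fetchWine` are hypothetical stand-ins for processWineEntries and its HTTP fetch:

```typescript
interface WineEntrySketch {
  wineId: number;
  name: string;
}

// Fetches each wineId at most once across calls that share the same Set.
async function processEntriesSketch(
  entries: WineEntrySketch[],
  fetchWine: (id: number) => Promise<boolean>,
  seenWineIds: Set<number> = new Set(),
): Promise<number[]> {
  const fetched: number[] = [];
  for (const entry of entries) {
    if (seenWineIds.has(entry.wineId)) continue; // already retrieved by an earlier strategy
    const ok = await fetchWine(entry.wineId);
    if (ok) {
      seenWineIds.add(entry.wineId); // only successful fetches are recorded
      fetched.push(entry.wineId);
    }
  }
  return fetched;
}
```

Create one Set at the top of the Strategy 2 block and pass it to both calls. A wineId fetched by 2.2 is then never fetched again by 2.3.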
Empty-string region / country / appellation are now normalized to 'Unknown' (region / country) or null (appellation). Vivino sometimes returns region: { name: "" } (or whitespace-only) instead of dropping the key entirely, which made the previous ?? 'Unknown' fallback emit "" verbatim into the dataset. New helper normalizeOrUnknown(s) in src/parser.ts collapses null / undefined / whitespace-only strings, applied to both exploreMatchToScrapedWine and htmlToScrapedWine paths.
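`?? 'Unknown'` was not enough because `??` only replaces null and undefined, so an empty string passes straight through. A sketch of the fix follows. `normalizeBlank` is a hypothetical helper, and `normalizeOrUnknownSketch` mirrors the described behavior of normalizeOrUnknown:

```typescript
// Collapses null, undefined and whitespace-only strings to null (used as-is for appellation).
function normalizeBlank(s: string | null | undefined): string | null {
  if (s == null) return null;
  const trimmed = s.trim();
  return trimmed === "" ? null : trimmed;
}

// region / country: blank values become 'Unknown' instead of leaking "" into the dataset.
function normalizeOrUnknownSketch(s: string | null | undefined): string {
  return normalizeBlank(s) ?? "Unknown";
}
```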
Strategy 2.3 fetch failures (!sqResp?.ok) no longer fall through silently. The 5xx case now emits log.warning and calls recordRateLimitTrip() so the run-level counter surfaces the upstream pressure mode that previously hid behind a silent continue.
Tests
+3 unit tests in tests/wine-entries.test.ts (seenWineIds dedup: pre-populated Set skips entry; cross-call persistence; combined with skipNameFilter).
+4 unit tests in tests/parser.test.ts (empty / whitespace-only region / country / appellation normalization across both extractor paths).
Total: 167 tests passing (was 161 in v0.4.14).
v0.4.14 (2026-06-01)
Fixed (SEV-4 false-positive match)
"Chateau Margaux" (and similar queries where the only non-prefix word is a region name) no longer falls onto a wrong winery. The slug-generation single-word fallback now skips well-known wine region names (Margaux, Pomerol, Beaune, Pauillac, etc. - 40+ entries). Previously, Chateau Margaux had its chateau prefix stripped, then the residual margaux was used as a fallback winery slug - which on Vivino resolves to an unrelated winery whose explore matches passed the relevance probe by sharing only chateau and margaux words with the query.
New constant REGION_NAMES_BLOCKLIST in src/constants.ts: lowercase, accent-stripped region names that must not be used as winery-slug fallbacks. Expandable on observation of similar false-positives.
Strategy 1 relevance probe now uses min(2, qw.length) as the per-match threshold instead of a hard-coded >= 2. Previously, when buildQueryWords returned a single distinctive word (e.g. "Chateau Margaux" -> qw = ['margaux'] because chateau is filtered as a WINERY_PREFIX stop word), the probe was mathematically impossible to pass, so the correct winery slug (chateau-margaux -> winery 1319) was rejected and the actor fell through to Strategy 2, which surfaced "Andrena Margaux" (winery "Château Le Coteau") as the documented false positive. The relaxed threshold allows single-word matches to land on the right producer while keeping the protective behavior for multi-word queries (e.g. the T12 "Jean Other Specific Cuvee" path still requires 2/4 matching words).
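This sketch covers both guards from this release, using hypothetical names and a four-entry subset of the blocklist. With `qw = ['margaux']` the threshold becomes `min(2, 1) = 1`, so one shared word is enough. Four query words still need two hits. The empty-`qw` guard is an assumption.

```typescript
// Hypothetical subset of REGION_NAMES_BLOCKLIST (the real constant has 40+ entries).
const REGION_BLOCKLIST_SKETCH = new Set(["margaux", "pomerol", "beaune", "pauillac"]);

// Single-word fallback slug, skipped when the residual word is a region name.
function singleWordFallbackSlug(residualWords: string[]): string | null {
  if (residualWords.length !== 1) return null;
  const word = residualWords[0];
  return REGION_BLOCKLIST_SKETCH.has(word) ? null : word;
}

// Relevance probe: a match must contain at least min(2, qw.length) query words.
function passesRelevanceProbe(queryWords: string[], matchName: string): boolean {
  if (queryWords.length === 0) return false; // assumption: nothing to compare, reject
  const haystack = matchName.toLowerCase();
  const hits = queryWords.filter((w) => haystack.includes(w)).length;
  return hits >= Math.min(2, queryWords.length);
}
```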
v0.4.12 (2026-06-01)
Fixed (SEV-5 silent data loss)
Strategy 2 paths (s2-entries, s2-shortened) no longer drop 9 dataset fields on output. The actor previously synthesized a stub VivinoExploreMatch from a fully-parsed ScrapedWine, then re-mapped it via exploreMatchToScrapedWine. That round-trip lost appellation, grape_varieties, food_pairings, wine_description, alcohol, image_url, vintage and wine_type (which fell back to 'Unknown'), and emitted vintageId=0 instead of null. The cyclic round-trip is replaced by passing the parsed ScrapedWine directly through SearchWineByNameResult.scrapedWine. All HTML-derived fields now propagate to the dataset rows.
vintageId nullable contract restored (no more 0 sentinel for unknown vintage; aligns with the [integer, null] type declared in dataset_schema.json).
Root cause: Apify SDK v3.7.2 pushDataAndCharge (apify/dist/charging.js:407-412) auto-schedules BOTH the explicit event (wine-result) AND the synthetic apify-default-dataset-item event when pushing to the default dataset. mergeChargeResults sums chargedCount across both. So every successful Actor.pushData(item, 'wine-result') returned chargedCount = 2 (1 wine-result + 1 synthetic). The pre-fix counter accumulated raw chargedCount, producing charged = pushed * 2 on every run.
Fix: src/run.ts:336 now collapses chargedCount > 0 to a +1 increment. This restores the original observability intent (track "rows actually billed" for revenue-leak detection — Sprint 5 goal) by ignoring the synthetic dataset event.
No revenue impact: the Apify Platform only bills configured-price events (charging.js:290-294 skips apify- synthetic events). Only wine-result × $0.003 was ever charged.
Verified by replaying the build 0.4.10 parity run: 10 pushes returned chargedCount = 2 each, and the summary showed charged: 20. After the fix, the same scenario will report charged: 10.
155 unit tests (+1 explicit "SDK doubling collapse" test that mocks chargedCount = 2 per call and asserts charged === pushed).
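The counter fix amounts to counting rows, not events. A sketch with a hypothetical `ChargeResultSketch` shape; only the `chargedCount` field is taken from the SDK result described above:

```typescript
interface ChargeResultSketch {
  chargedCount: number;
}

// Counts a pushed row as billed once, however many events the SDK reports for it.
function countBilledRows(results: ChargeResultSketch[]): number {
  let charged = 0;
  for (const r of results) {
    if (r.chargedCount > 0) charged += 1; // collapses wine-result + synthetic dataset-item event
  }
  return charged;
}
```

In the replay scenario, a naive sum of ten results with `chargedCount = 2` gives 20. This function returns 10.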
v0.4.9 (2026-05-28)
Sprint 6 of the post-audit roadmap: Penfolds Grange Strategy 2 deep-dive. Closes a known parity gap where the user query "Penfolds Grange Bin 95 Shiraz" produced a wrong-wine row (Bin 128 Shiraz) instead of the correct Penfolds Grange.
Root cause (confirmed empirically via Firecrawl probe of the live Vivino search page): Vivino's search ranker, when fed the long historical name, returns ~22 Penfolds Bin variants and ZERO Grange. The same search with the shorter form Penfolds Grange correctly surfaces Grange (wineId 1136930).
New: candidateMatchesQueryCuvee(name, winery, query) in scoring.ts - softer than distinctiveWordsFilter, asserts only that the user query's first distinctive word (the cuvee kernel, e.g., 'grange') appears in the candidate's name. Used to reject candidates that pass score thresholds but miss the cuvee.
New: Strategy 2.2 (processWineEntries on search-page entries) now runs the cuvee check on its top result. Candidates that fail fall through to Strategy 2.3 instead of being emitted as wrong-wine rows.
New: Strategy 2.3 - shortened-query retry. When Strategy 2.2 returns a candidate without the cuvee word, the search page is re-queried with progressively shorter forms (drop trailing word, max 3 attempts, minimum 2 words). Candidates are still scored against the ORIGINAL query so we never demote a valid full-query match.
New: _strategy value s2-shortened for rows surfaced via Strategy 2.3; counted in last-run-summary.output.strategies and in the Strategy 2 fallback aggregate.
Verified non-regressing on all 5 audit cases: Chateau Margaux, Domaine Leflaive Puligny-Montrachet, Chateau Petrus, Sassicaia Tenuta San Guido, Tignanello Antinori. The new cuvee check only fires when distinctive words exist after the winery is excluded.
Regression closed: the TS port was emitting a false-positive Penfolds row (Bin 128) where the monolith correctly returned no row. Sprint 6 restores the parity floor AND goes one row above (Grange surfaced via 2.3).
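A sketch of the two Sprint 6 pieces. It assumes a small hypothetical stop-word list and accent-stripped tokenization. `candidateMatchesQueryCuveeSketch` and `shortenedQueries` are illustrative only, not the code in scoring.ts or the search module:

```typescript
// Hypothetical stop-word list; the actor's own filtering is not reproduced here.
const STOP_WORDS_SKETCH = new Set(["de", "du", "la", "le", "les", "the", "and"]);

function words(s: string): string[] {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// Cuvee kernel = first query word that is not a winery word, a stop word, or a number.
function candidateMatchesQueryCuveeSketch(name: string, winery: string, query: string): boolean {
  const wineryWords = new Set(words(winery));
  const kernel = words(query).find(
    (w) => !wineryWords.has(w) && !STOP_WORDS_SKETCH.has(w) && !/^\d+$/.test(w),
  );
  if (kernel === undefined) return true; // no distinctive word: the check does not fire
  return words(name).includes(kernel);
}

// Strategy 2.3 ladder: drop the trailing word, at most 3 attempts, never below 2 words.
function shortenedQueries(query: string, maxAttempts = 3, minWords = 2): string[] {
  const parts = query.trim().split(/\s+/);
  const out: string[] = [];
  for (let n = parts.length - 1; n >= minWords && out.length < maxAttempts; n--) {
    out.push(parts.slice(0, n).join(" "));
  }
  return out;
}
```

For "Penfolds Grange Bin 95 Shiraz" the ladder is "Penfolds Grange Bin 95", "Penfolds Grange Bin", "Penfolds Grange". Candidates found on the shorter search pages are still scored against the original query.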
Sprint 5 of the post-audit roadmap: schema-level cleanups + observability honesty.
Internal: _strategy diagnostic field stripped from every public dataset row at push time (was previously emitted but undeclared in dataset_schema.json). The aggregate strategy distribution stays available in last-run-summary.output.strategies.
Internal: status message and last-run-summary now distinguish pushed (rows in the dataset) from charged (rows successfully billed via PPE). On runs with zero charge failures these two numbers are identical; when they diverge it is a real revenue-leak signal worth investigating.
Schema: dataset_schema.json:fields now declares nullable types (["string","null"] etc.) for every field that the runtime can actually emit as null (winery, vivino_url, appellation, average_rating, price, alcohol, image_url, wineId, vintageId, inputSource, shipTo, vintage, wine_description, taste_profile, reviews, searchQuery). Clients that validate the schema against the rows no longer get type-mismatch warnings.
Schema: key_value_store_schema.json now embeds a full jsonSchema for the last-run-summary blob so the Apify Console auto-documents the structure and any future shape drift in the code is caught.
147 unit tests (no count change; rewrote 1 test to assert _strategy absence + 1 test to assert pushed/charged split).
v0.4.7 (2026-05-28)
Sprint 4 of the post-audit roadmap: observability + documentation.
New: per-run JSON summary written to Key-Value Store under key last-run-summary at the end of every run. Captures timing, input shape, strategy distribution, cache stats, and resilience counters. Useful for offline drift detection and capacity planning.
New: key_value_store_schema.json declares the last-run-summary collection so the Apify Console surfaces it next to OUTPUT.
Docs: README now documents the error / errorMessage fields, the diagnostic _strategy field, and the Key-Value Store run summary. Includes a sample error row and a sample summary blob.
147 unit tests (+1 covering the per-run summary KV write).
Note: adoption of the speculative /api/wines/{id} JSON endpoint (originally on the Sprint 4 plan) was deferred to a dedicated research sprint - the endpoint shape needs validation against the live Vivino API with debug-tagged probes before going to production.
v0.4.5 (2026-05-28)
Sprint 3 of the post-audit roadmap.
New: atomic Actor.pushData(item, 'wine-result') shortcut (Apify v3.4+). Closes the brief push-without-charge gap that existed when push and charge were separate calls.
New: run-level counters for HTTP 429 trips and Strategy 2 fallbacks now surface in the final status message alongside cache hits/misses and charge failures.
Internal: hoisted WINERY_SUFFIXES constant out of the per-entry loop in processWineEntries.
Cleanup: deleted the previously dead exploreToScrapedWine export and its 11 tests; deleted the unused SearchPageResult interface.
Tooling: new scripts/canary-vivino.sh runs the live Actor against a 3-wine canary input (one per strategy) and asserts that core fields are still present. Designed to be wired as a daily Apify scheduled task for early detection of Vivino API/HTML drift.
146 unit tests (down from 157: the 11 tests covering the deleted dead export were removed).
v0.4.3 (2026-05-28)
Sprint 2 of the post-audit roadmap: Phase 4 hygiene + 2 quality fixes + diagnostic field.
Build: Dockerfile now sets NODE_ENV=production and uses npm ci against a committed package-lock.json (reproducible installs).
Build: actor.json metadata updated post TypeScript migration (templateId is now ts-empty, minMemoryMbytes bumped from 128 to 256 for Node 22 + cheerio reliability, defaultRunOptions now pins build: latest and memoryMbytes: 512).
UI: dataset now exposes a third view "Errors" with columns searchQuery, inputSource, error, errorMessage, scrapedAt.
Fix: winery now returns null (instead of the placeholder 'Unknown') when extraction fails on a successful row. This restores parity with the JS monolith and stops the downstream quality filter from treating 'Unknown' as a real producer name. Error rows still use 'Unknown' for display.
Fix: Strategy 2.1 slug discovery now uses the chosen wine-entries set (script entries when present, link entries otherwise) instead of always reading link entries only. Closes a recall gap on winery prefixes that only surface in React data-props.
Diag: every dataset row now carries a _strategy field (url, s1, s2-discovered-slug, s2-entries, error) so operators can trace which code path produced each result. Diagnostic-only - safe to ignore in client integrations.
New: 3-strategy name search restored end-to-end (winery slug fan-out, HTML search page, wine entries fallback)
New: discoverWinerySlugFromSearchEntries retries Strategy 1 with a slug discovered from the search page
New: error rows now produced for failed lookups instead of silent drops (every dataset row is billable, per the pay_per_event schema)
New: /wines/{id} URLs follow Vivino's redirect to recover the canonical seoName
New: shared rate-limiter across API + HTML fetches (a 429 on one channel now backs off the other)
New: 500-1500 ms jittered sleep between wines reduces 429 risk on bursts
New: duplicate inputs (same wineId + vintage) produce one row + one charge instead of N
New: cross-strategy non-winery-word coverage check falls through to the search page when Strategy 1 returns a weak match
New: charge failure count surfaced in the final status message
151 unit tests across cache, parser, api-client, html-client, scoring, search, wine-entries, run
No changes to input/output schemas: existing customer integrations continue to work unchanged
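The sketch below covers two of the items above: the `wineId::vintage` dedup key (the same shape as the `wineId::''` key mentioned in the tab-parsing fix above) and the 500-1500 ms jitter. All names are hypothetical. The random source is injectable so the bounds can be checked:

```typescript
// Dedup key: the same wineId + vintage collapses to one row (and one charge).
function dedupKey(wineId: number, vintage: number | null): string {
  return `${wineId}::${vintage ?? ""}`;
}

function dedupInputs<T extends { wineId: number; vintage: number | null }>(items: T[]): T[] {
  const seen = new Set<string>();
  return items.filter((it) => {
    const key = dedupKey(it.wineId, it.vintage);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Uniform integer delay in [minMs, maxMs]; rand() is expected in [0, 1).
function jitterMs(minMs = 500, maxMs = 1500, rand: () => number = Math.random): number {
  return minMs + Math.floor(rand() * (maxMs - minMs + 1));
}

// Awaited between wines, e.g. `await sleep(jitterMs())`.
const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));
```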
v0.2.120 (2026-05-22)
Fixed
average_rating is no longer silently 0 when Vivino has rated the wine. The HTML rating extractor was looking for a field name (average_rating) that Vivino does not use in its inline data; the actual field is ratings_average (plus wine_ratings_average for wine-level fallback). When extraction failed, the output reported average_rating: 0 together with a non-zero ratings_count, suggesting a real "zero star" rating that did not exist. The scraper now scans the correct field names (with the legacy spelling as a last-chance fallback) and prefers wine-level rating when the vintage-level rating is unavailable. Fields with no rating data return average_rating: null so consumers can distinguish "unrated" from "actually zero".
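A sketch of the lookup order. It assumes the inline data has already been parsed into a flat object. It also assumes a 0 value means "no rating"; the entry itself only says that missing data yields null.

```typescript
// Lookup order: vintage-level, wine-level fallback, then the legacy spelling as a last chance.
const RATING_FIELDS = ["ratings_average", "wine_ratings_average", "average_rating"];

function extractAverageRating(data: Record<string, unknown>): number | null {
  for (const field of RATING_FIELDS) {
    const value = data[field];
    // Assumption: 0 is treated as "unrated", not as a real score.
    if (typeof value === "number" && Number.isFinite(value) && value > 0) return value;
  }
  return null; // unrated, distinguishable from an actual rating
}
```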
v0.2.119 (2026-05-21)
Fixed
Common French wine abbreviations are now expanded before matching. Catalog entries like NSG, Gds, VV, BLC, RGE, 1er, P.C., G.C., CDP, CDR, SGN, VT, MC (monopole) are expanded to their full form (Nuits-Saint-Georges, Grandes, Vieilles Vignes, Blanc, Rouge, Premier, Premier Cru, Grand Cru, Chateauneuf-du-Pape, Cotes du Rhone, Selection de Grains Nobles, Vendanges Tardives, Monopole) so the quality filter recognizes them in Vivino result names. Recovers previously-rejected wines such as Mugneret Gerard NSG Aux Cras (now matches Nuits-Saint-Georges 1er Cru 'Aux Cras').
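A sketch of whole-token expansion using a subset of the table above. `ABBREVIATIONS_SKETCH` and `expandAbbreviations` are hypothetical. A Map is used so that tokens such as `constructor` cannot resolve to Object.prototype keys:

```typescript
// Hypothetical subset of the abbreviation table; keys are lowercase tokens.
const ABBREVIATIONS_SKETCH = new Map<string, string>([
  ["nsg", "Nuits-Saint-Georges"],
  ["gds", "Grandes"],
  ["vv", "Vieilles Vignes"],
  ["blc", "Blanc"],
  ["rge", "Rouge"],
  ["1er", "Premier"],
  ["p.c.", "Premier Cru"],
  ["g.c.", "Grand Cru"],
  ["cdp", "Chateauneuf-du-Pape"],
  ["cdr", "Cotes du Rhone"],
  ["sgn", "Selection de Grains Nobles"],
  ["vt", "Vendanges Tardives"],
]);

// Expands whole whitespace-delimited tokens only, case-insensitively.
function expandAbbreviations(query: string): string {
  return query
    .split(/\s+/)
    .filter(Boolean)
    .map((tok) => ABBREVIATIONS_SKETCH.get(tok.toLowerCase()) ?? tok)
    .join(" ");
}
```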
v0.2.118 (2026-05-21)
Fixed
Appellation markers (AOC, AOP, IGP, VDF, VDP, DOC, IGT, and others) and 4-digit years are now stripped from the search query. The year is already captured by the vintage field. Leaving these tokens in the search text just polluted Vivino's matcher and caused the quality filter to reject good results. Knock-on effect: previously-rejected wines like Clos de la Hutte (Thibaud Boudignon Savennières) and Marc Kreydenweiss Clos du Val D'Eléon should now match.
Family and legal-entity suffixes in producer names (Fils, Fille, Frères, Père) are now treated as non-distinctive when comparing the query against a match. This stops false rejections on wines like Boulard & Fille Les Murgiers and Dehours & Fils Grande Réserve.
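A sketch of both changes, using a subset of the markers. It treats only 1900-2099 as years; the entry says 4-digit years, and the narrower range is an assumption meant to avoid stripping other numbers. Tokens passed to `distinctiveWords` are assumed to be lowercased and accent-stripped. All names are hypothetical.

```typescript
// Hypothetical subset of the appellation markers.
const APPELLATION_MARKERS_SKETCH = ["AOC", "AOP", "IGP", "VDF", "VDP", "DOC", "IGT"];
const MARKER_RE = new RegExp(`\\b(?:${APPELLATION_MARKERS_SKETCH.join("|")})\\b`, "gi");
const YEAR_TOKEN_RE = /\b(?:19|20)\d{2}\b/g;

// Removes whole-word markers and years, then collapses the leftover whitespace.
function stripMarkersAndYears(query: string): string {
  return query.replace(MARKER_RE, " ").replace(YEAR_TOKEN_RE, " ").replace(/\s+/g, " ").trim();
}

// Family/legal-entity suffix words ignored when checking distinctive query words.
const NON_DISTINCTIVE_SKETCH = new Set(["fils", "fille", "freres", "pere"]);

function distinctiveWords(tokens: string[]): string[] {
  return tokens.filter((t) => !NON_DISTINCTIVE_SKETCH.has(t));
}
```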
v0.2.117 (2026-05-21)
Fixed
Catalog region prefixes are now stripped before searching. Inputs that begin with a regional or category label followed by a period — e.g. Champagne Blanc., L'Alsace., La Corse., Le Rhône Septentrional., La Côte Chalonnaise et le Mâconnais. — have that prefix removed automatically. The remaining text is what gets searched on Vivino, so the producer and wine name become the leading tokens (e.g. Franck Pascal Fluence Brut Nature instead of Champagne Blanc. S.A. Franck Pascal "Fluence" Brut Nature 50). Substantially improves match rate on wine-list-style catalog exports.
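A sketch that assumes the prefixes come from a known list. A generic "anything before the first period" rule would also remove abbreviations such as "St.". `CATALOG_PREFIXES_SKETCH` and `stripCatalogPrefix` are hypothetical:

```typescript
// Hypothetical subset of known catalog region/category labels.
const CATALOG_PREFIXES_SKETCH = [
  "Champagne Blanc",
  "L'Alsace",
  "La Corse",
  "Le Rhône Septentrional",
  "La Côte Chalonnaise et le Mâconnais",
];

// Removes a leading "<known label>." prefix; any other text is returned unchanged (trimmed).
function stripCatalogPrefix(input: string): string {
  const text = input.trim();
  const lower = text.toLowerCase();
  for (const prefix of CATALOG_PREFIXES_SKETCH) {
    if (lower.startsWith(`${prefix.toLowerCase()}.`)) {
      return text.slice(prefix.length + 1).trim();
    }
  }
  return text;
}
```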
v0.2.116 (2026-05-21)
Fixed
Catalog-style 2-column tab inputs (e.g. Champagne Blanc. S.A. Franck Pascal "Fluence" Brut Nature 50<TAB>NV) are now parsed correctly. The description column is used as the search query, and the vintage column accepts both years and NV. Stray catalog elements are stripped before searching: curly quotes (“ ” ‘ ’), legal-entity markers (S.A., SCEA, SARL, SAS, EARL, GAEC), and a trailing price.
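A sketch of the 2-column path with hypothetical names and regexes. Here the trailing-price rule removes any final bare number. That is only safe because the vintage lives in its own column:

```typescript
// Legal-entity markers as whole tokens (start of string or preceded by whitespace).
const LEGAL_ENTITY_RE = /(^|\s)(?:S\.A\.|SCEA|SARL|SAS|EARL|GAEC)(?=\s|$)/g;

// Strips curly and straight double quotes, legal-entity markers and a trailing price.
function cleanCatalogTextSketch(description: string): string {
  return description
    .replace(/[\u201C\u201D\u2018\u2019"]/g, "")
    .replace(LEGAL_ENTITY_RE, " ")
    .replace(/\s+\d+(?:[.,]\d{1,2})?\s*$/, "") // trailing price such as "50" or "49.90"
    .replace(/\s+/g, " ")
    .trim();
}

// Splits a "description<TAB>vintage|NV" row.
function parseTwoColumnRow(row: string): { query: string; vintage: number | "NV" | null } {
  const [description = "", vintageCol = ""] = row.split("\t");
  const v = vintageCol.trim();
  const vintage: number | "NV" | null = /^\d{4}$/.test(v) ? Number(v) : v.toUpperCase() === "NV" ? "NV" : null;
  return { query: cleanCatalogTextSketch(description), vintage };
}
```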
v0.2.115 (2026-05-21)
Fixed
Common first names no longer cause wrong producer matches. For queries like Jean-Claude Bachelet, Chassagne-Montrachet, La Boudriotte Rouge, the scraper could previously return a wine from Jean-Claude Ramonet because the first two words ("jean", "claude") matched. The validator now also requires the producer's family name (last significant word in the winery) to appear in the query.
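A sketch of the family-name gate, assuming a hypothetical `NON_FAMILY_WORDS` set. Tokenizing on any non-alphanumeric run also splits hyphenated first names and drops trailing commas:

```typescript
// Accent-insensitive tokenizer: "Jean-Claude" -> ["jean", "claude"], "Lafon," -> ["lafon"].
function tokenize(s: string): string[] {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// Hypothetical set of winery words that are never a family name.
const NON_FAMILY_WORDS = new Set(["domaine", "chateau", "maison", "fils", "freres", "pere", "et", "de", "des", "du"]);

// The winery's family name (last significant word) must appear in the query.
function familyNameInQuery(winery: string, query: string): boolean {
  const significant = tokenize(winery).filter((w) => !NON_FAMILY_WORDS.has(w));
  if (significant.length === 0) return true; // nothing to check
  const family = significant[significant.length - 1];
  return tokenize(query).includes(family);
}
```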
v0.2.114 (2026-05-21)
Fixed
Match accuracy on queries with punctuation: wine names containing commas or other punctuation (e.g. Domaine des Comtes Lafon, Meursault Premier Cru, Charmes) are now correctly tokenized, restoring matches that were previously rejected by the validator due to trailing characters attached to words. Expected effect: significantly higher extraction success rate on real catalog inputs.
v0.2.113 (2026-05-21)
Fixed
Mismatched wines no longer leak through with the search query as winery name. When the wine page extraction did not yield a producer, the previous code substituted the search query into the winery field, which also disabled the downstream quality filter and let unrelated wines be returned. The filter now activates correctly on these cases and rejects results whose distinctive query words do not appear in the matched wine. Side effect: fewer null prices in the output, since most of the missing prices came from these mismatched results.
v0.2.112 (2026-05-21)
Changed
Include Taste Profile is now unchecked by default. Both enrichment options (Include Taste Profile and Include Reviews) start unchecked so the cheapest, fastest configuration is the first thing users see. Enable either checkbox to fetch the corresponding data.
v0.2.111 (2026-05-20)
Changed
Output tab simplified: results remain available in the default dataset, accessible via the Dataset tab or the standard Apify dataset API.
v0.2.110 (2026-05-19)
Added
Result caching: taste profiles and user reviews are cached so repeat runs on the same wines are faster and place less load on Vivino. Cache statistics appear in the final run status message.
v0.2.108 (2026-05-19)
Changed
README now documents the ACTOR_MAX_TOTAL_CHARGE_USD setting users can configure in Run options to cap per-run spending.
v0.2.107 (2026-05-19)
Added
Output tab populated with direct access patterns for the dataset (overview, detailed, full JSON, CSV).
v0.2.106 (2026-05-19)
Changed
Store description rewritten for broader feature coverage.
README restructured and shortened for clarity.
Input form field descriptions reworded with consistent phrasing.
error and errorMessage dataset fields documented as null on success.
Fixed
Memory recommendation is now consistent across all README sections (512 MB default, 1024 MB for very large batches of 5,000+ wines).
v0.2.105 (2026-05-19)
Changed
wines field prefill in the Console form now showcases the 6 accepted input shapes pedagogically: wine name only, name with vintage, Vivino URL (canonical), Vivino URL with ?year= parameter, scheme-less Vivino URL (auto-prefixed with https://), and Vivino URL with locale path (/fr/). Helps users discover the breadth of accepted formats without reading the docs.
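A sketch of shape-based routing. It assumes an entry is a URL when it parses as one (after `https://` is added if no scheme is present) and its host is vivino.com or a subdomain of it. `routeInput` is hypothetical:

```typescript
type RoutedInput = { kind: "url"; url: string } | { kind: "name"; name: string };

// Routes one `wines` entry: Vivino URLs (with or without scheme) vs free-text wine names.
function routeInput(raw: string): RoutedInput {
  const entry = raw.trim();
  const withScheme = /^https?:\/\//i.test(entry) ? entry : `https://${entry}`;
  try {
    const url = new URL(withScheme); // throws on hosts with spaces, e.g. "Chateau Margaux 2015"
    if (/(^|\.)vivino\.com$/i.test(url.hostname)) {
      return { kind: "url", url: url.toString() }; // locale paths and ?year= are preserved
    }
  } catch {
    // Not parseable as a URL: fall through and treat it as a wine name.
  }
  return { kind: "name", name: entry };
}
```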
v0.2.104 (2026-05-19)
Changed
Input schema redesign: replaced wineUrls + wineNames with a single wines array that auto-routes URLs and names by shape. Mix freely; the actor handles routing.
Removed maxResultsPerSearch (now hardcoded to 1 -- best match only) and proxyConfig (was unused).
Flat Console layout: removed all sectionCaption/sectionDescription keys. All inputs are visible without expanding sections.
Backwards compatibility
Legacy wineUrls and wineNames keys still work silently. Existing scheduled tasks and API integrations continue running without change. No deprecation warning.
v0.2.103 (2026-05-18)
Added
## Is it legal to scrape Vivino? canonical H2 (hiQ Labs reference, GDPR caveat for EU review data, Apify blog link)
## Support & feedback closing block at end of README with Issues tab link
FAQ entry "Can I use this scraper via MCP?"
Changed
Canonical name unified to "Vivino Wine Data Scraper" across README H1, actor.json title, and seoTitle (was 3 different forms)
Wine emoji 🍷 removed from seoTitle
Pricing: Starter plan $49/month → $29/month (~9,600 wines/month)
H2 "Which Vivino actor should I use?" → "Which Vivino scraper should I use?"
Console URL console.apify.com/settings/integrations now includes ?fpr=mrbridge
Removed etc. from L49 wine-type listing and from dataset_schema.json wine_type description
Removed all em-dashes (zero em-dash policy 2026-05-18)
v0.2.102 (2026-05-13)
Documentation: corrected throughput claim (5-50 wines/min depending on enrichment, was "50-100"), increased recommended memory to 512 MB default / 1024 MB for large batches (was 256/512), bumped Actor default memory to 512 MB
v0.2.101 (2026-05-12)
Resilience: replaced Promise.all with Promise.allSettled in dual-search (with/without shipTo) and enrichment (taste+reviews) so a transient Vivino failure on one fetch no longer drops the whole wine. Charge errors now logged as warnings with a per-run counter exposed in the final status message (silent revenue loss becomes observable).
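A sketch of the dual search with Promise.allSettled. `dualSearch` and the injected `search` function are hypothetical. Listing the shipTo-scoped results first is an ordering assumption:

```typescript
interface SearchHitSketch {
  wineId: number;
  price: number | null;
}

// Runs both searches; a rejection on one side no longer discards the other side's results.
async function dualSearch(
  search: (shipTo: string | null) => Promise<SearchHitSketch[]>,
  shipTo: string | null,
): Promise<SearchHitSketch[]> {
  const settled = await Promise.allSettled([search(shipTo), search(null)]);
  const hits: SearchHitSketch[] = [];
  for (const r of settled) {
    if (r.status === "fulfilled") hits.push(...r.value); // rejected sides are skipped
  }
  return hits;
}
```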
v0.2.99 (2026-05-07)
Remove optional fields from output when not requested: reviews field omitted when includeReviews: false, taste_profile and food_pairings omitted when includeTasteProfile: false. Eliminates empty columns in exported data.
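A sketch using conditional spreads, so a disabled option produces no key at all instead of an empty column. The row and option types are hypothetical. Returning null for an enabled-but-missing value is an assumption.

```typescript
interface WineCoreSketch {
  name: string;
  average_rating: number | null;
}

interface EnrichmentSketch {
  reviews?: string[] | null;
  taste_profile?: Record<string, number> | null;
  food_pairings?: string[] | null;
}

// Optional keys exist only when the matching option is enabled.
function buildRow(
  core: WineCoreSketch,
  extra: EnrichmentSketch,
  opts: { includeReviews: boolean; includeTasteProfile: boolean },
): WineCoreSketch & EnrichmentSketch {
  return {
    ...core,
    ...(opts.includeReviews ? { reviews: extra.reviews ?? null } : {}),
    ...(opts.includeTasteProfile
      ? { taste_profile: extra.taste_profile ?? null, food_pairings: extra.food_pairings ?? null }
      : {}),
  };
}
```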
v0.2.98 (2026-05-07)
Add rate limit handling: retry+backoff on HTTP 429 for HTML pages (fetchHtml), global rate limit cooldown shared across all conc