Extract popularity, critic scores, prices (always per 75cl bottle) and winery info from Wine-Searcher.com. Input wine names, URLs or LWIN codes; get structured JSON. Success-only billing: $0.025 per wine actually extracted, errors and not-found are free.
All notable changes to Wine-Searcher Scraper from List are documented here.
Format based on Keep a Changelog. Versions follow the Apify build numbering (0.1.XX).
[0.1.136] (2026-05-21)
Changed
Rate limiter switched from a uniform 250 ms delay between launches to a gaussian-distributed delay (mean 1500 ms, stddev 500 ms, floored at 250 ms). PerimeterX/HUMAN Security's continuous-validation sensors use uniform-timing patterns as a bot signature — even with concurrency 3, three requests landing every 250 ms is robotic. With the new jitter, 10 initial requests now span ~15-20 s instead of 2.5 s, much closer to human browse cadence.
Rationale
The scraping-expert doctrine for PerimeterX (always Level 4) explicitly calls out "every action must continue to look human" — passing the first request is not enough. Combined with the 0.1.135 concurrency drop, this gives the actor two independent levers against burst-detection: lower parallelism + non-uniform per-request timing. Trade-off: throughput per minute decreases proportionally to the new mean delay (250 ms → 1500 ms ≈ 6× slower at the launch step), but successful scrapes per minute is what matters, not theoretical request rate. If 0.1.135's concurrency drop wasn't enough to keep blocks down, this should close the gap.
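A minimal sketch of such a jittered delay, assuming a Box-Muller transform as the gaussian source (the changelog gives only the mean, standard deviation and floor). The constant and function names are hypothetical, not the actor's real identifiers.

```typescript
// Parameters from the 0.1.136 entry; names are illustrative only.
const MEAN_DELAY_MS = 1500;
const STDDEV_DELAY_MS = 500;
const MIN_DELAY_MS = 250;

/** Draws one standard-normal sample with the Box-Muller transform. */
function standardNormal(rng: () => number = Math.random): number {
  // 1 - rng() maps [0, 1) to (0, 1], so Math.log never receives 0.
  const u1 = 1 - rng();
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

/** Returns a launch delay in ms: gaussian around the mean, floored at the minimum. */
function nextLaunchDelayMs(rng: () => number = Math.random): number {
  const sample = MEAN_DELAY_MS + STDDEV_DELAY_MS * standardNormal(rng);
  return Math.max(MIN_DELAY_MS, Math.round(sample));
}

/** Waits one jittered delay before the next request launch and returns it. */
async function waitBeforeLaunch(rng: () => number = Math.random): Promise<number> {
  const delay = nextLaunchDelayMs(rng);
  await new Promise<void>((resolve) => setTimeout(resolve, delay));
  return delay;
}
```

Passing the random source as a parameter keeps the delay function deterministic under test.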
[0.1.135] (2026-05-21)
Changed
DEFAULT_MAX_CONCURRENCY lowered from 10 to 3. Two back-to-back production runs (UuikZcdSG47tFf3aU, pgVV01sINcOqbIB5t) reproduced a full-cascade blocked_px failure on the first batch of 10 parallel requests: PerimeterX flagged the burst as bot traffic, all 10 wines exhausted their 3 retries with HTTP 403, run effectively delivered 0 scraped wines. The previous default of 10 was calibrated against a single isolated run; under back-to-back load or stricter PerimeterX modes it craters. Default 3 gives a launch pattern ~3× less aggressive (~9 req/3s burst vs ~30 req/3s) and brings actual scrape success back. MAX_CONCURRENCY_HARD_CAP stays at 10 so power users on cooperative sessions can opt into faster wall-clock manually.
README updated: default concurrency mentions changed from 10 to 3, expected wall-clock for 1000 wines from ~2 h to ~6 h (default) with the ~2 h figure noted as the conc=10 opt-in.
Trade-off
1000-wine batches now take ~6 h at default vs ~2 h previously. PPE revenue per wine unchanged. Compute cost rises (more wall-clock × 4096 MB) but stays under ~$0.40 per 1000 wines additional. Net: reliability gain >> compute cost increase.
[0.1.134] (2026-05-21)
Added
TSV pipeline now expands common French wine-name abbreviations (Gds → Grands, NSG → Nuits-Saint-Georges, VV → Vieilles Vignes, Vge → Village, Chass → Chassagne, Mch → Montrachet). Whole-word, case-insensitive matching; no false positives on words like VVin. Aligns POS-shorthand inputs with Wine-Searcher's canonical naming.
TSV pipeline now flips known reversed-producer surnames to canonical order ("Lachaux Charles" → "Charles Lachaux"). Conservative whitelist of one entry to start; we only add names when a customer run surfaces them as a systematic miss. Most producers (e.g. "Lamy Hubert") are left alone because Wine-Searcher's fuzzy search handles them already.
8 new unit tests for abbreviation expansion + producer flip.
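The two normalizations above could look roughly like this. The abbreviation table and the single whitelist entry come from this entry; the function names and the regex-based matching are assumptions made for illustration.

```typescript
// Abbreviations listed in the 0.1.134 entry (keys lowercased for lookup).
const ABBREVIATIONS: Record<string, string> = {
  gds: "Grands",
  nsg: "Nuits-Saint-Georges",
  vv: "Vieilles Vignes",
  vge: "Village",
  chass: "Chassagne",
  mch: "Montrachet",
};

// Whitelist of reversed producer names; one entry, as stated in the changelog.
const PRODUCER_FLIPS: Record<string, string> = {
  "lachaux charles": "Charles Lachaux",
};

// \b on both sides gives whole-word matching; "i" makes it case-insensitive.
const ABBREVIATION_PATTERN = new RegExp(
  "\\b(" + Object.keys(ABBREVIATIONS).join("|") + ")\\b",
  "gi",
);

/** Expands known abbreviations, leaving longer words such as "VVin" untouched. */
function expandAbbreviations(name: string): string {
  return name.replace(
    ABBREVIATION_PATTERN,
    (match) => ABBREVIATIONS[match.toLowerCase()] ?? match,
  );
}

/** Flips a producer name only when it is on the whitelist. */
function flipKnownProducer(producer: string): string {
  const key = producer.trim().toLowerCase().replace(/\s+/g, " ");
  return PRODUCER_FLIPS[key] ?? producer;
}
```

Names that are not whitelisted, such as "Lamy Hubert", pass through unchanged and are left to Wine-Searcher's fuzzy search.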
Context
Run FNRPfCGBMDFRUgSI4 (115 wines, build 0.1.133): 51/59 succeeded (86%). The 8 remaining failures split into 3 categories: (A) reversed producer order — 2 cases for "Lachaux Charles", (B) abbreviations in wine names — 2-3 cases like "Gds Suchots", and (C) inputs Wine-Searcher genuinely cannot resolve (generic queries like "Leflaive Chardonnay", 2023 vintages not yet indexed, or obscure producers). This release covers categories A and B. Category C is documented as a Wine-Searcher coverage limitation — no fix possible from the scraper side.
[0.1.133] (2026-05-21)
Added
TSV / Excel-paste input support: when a wine name contains tab characters (typical of POS export pasted from a cellar-management spreadsheet — format "Producer\tWine\tColor\tFormat\tVintage"), the actor now extracts producer + wine name + vintage and drops color/bottle-format metadata. Previously the entire string was URL-encoded as a single search query, which Wine-Searcher could not parse (e.g. "La Vougeraie Vougeot 1er cru Les Cras Rouge 75 cl 2011" → no match). Now the same input becomes "La Vougeraie Vougeot 1er cru Les Cras Rge 2011" and matches reliably. Implemented in parseTsvWineInput() which feeds into the existing cleanWineName() pipeline; 10 new unit tests cover the format variants (Magnum, Jéroboam, accented names, empty fields, vintage in non-final column).
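For illustration, a simplified sketch of the tab-splitting step. It assumes the column order quoted above (producer, wine, color, format, vintage) and is not the real parseTsvWineInput(), which handles more variants; the function name, return shape and sample row are hypothetical.

```typescript
interface ParsedTsvWine {
  producer: string;
  wine: string;
  vintage: string | null;
  query: string;
}

/** Returns null when the input has no tab, so the normal name pipeline applies. */
function parseTsvSketch(input: string): ParsedTsvWine | null {
  if (!input.includes("\t")) return null;
  const fields = input.split("\t").map((field) => field.trim());
  const producer = fields[0] ?? "";
  const wine = fields[1] ?? "";
  // Accept a 4-digit year in any later column, not only the last one.
  const vintage = fields.slice(2).find((f) => /^(19|20)\d{2}$/.test(f)) ?? null;
  // Color and bottle format are dropped simply by never being read.
  const query = [producer, wine, vintage]
    .filter((part): part is string => part !== null && part.length > 0)
    .join(" ");
  return { producer, wine, vintage, query };
}
```

A made-up row such as "Domaine Exemple\tMeursault Les Charmes\tBlanc\tMagnum\t2019" yields the query "Domaine Exemple Meursault Les Charmes 2019".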
[0.1.132] (2026-05-21)
Changed
Internationalization sweep: all remaining French log strings, status messages, error messages, JSDoc and inline comments across src/ and tests/ translated to English. Aligns with the project's English-only doctrine for internal docs and applicative logs. Wine names (Château, Pétrus, Rosé, Cuvée, etc.) preserved as proper nouns. 279/279 tests passing. No runtime behavior change.
[0.1.131] (2026-05-21)
Changed
Refactor: handleFatalError + FatalErrorOutcome extracted from src/main.ts into a dedicated src/error-handling.ts module. Pure module with no Apify SDK dependency, unit-testable without the 90-line vi.mock block that the previous test file required to import main.ts. No behavior change; the deployed binary is functionally identical.
[0.1.129] (2026-05-20)
Changed
MEMORY_TIERS recalibrated for the post-0.1.121 concurrency cap of 10 (was 40). New recommended-memory floors: ≤200 wines → 512 MB, ≤500 wines → 1024 MB, ≤800 wines → 2048 MB, >800 wines → 4096 MB (was 8192). Aligns with the new defaultMemoryMbytes: 4096 so default-configured 1000-wine runs no longer fire a spurious "memory tight" warning. Refuse thresholds untouched — genuinely under-provisioned configs still get a hard stop.
French strings in src/memory-guard.ts translated to English (header comment, refuse messages, warn message). Aligns with the internal-language doctrine.
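The recalibrated floors amount to a small lookup. This is a sketch under the thresholds listed above; the function names are hypothetical and the real memory guard may apply extra rules.

```typescript
/** Recommended memory floor (MB) for a batch, per the 0.1.129 tiers. */
function recommendedMemoryMbSketch(taskCount: number): number {
  if (taskCount <= 200) return 512;
  if (taskCount <= 500) return 1024;
  if (taskCount <= 800) return 2048;
  return 4096;
}

/** True when the allocation sits below the recommended floor for this batch. */
function isMemoryTight(taskCount: number, allocatedMb: number): boolean {
  return allocatedMb < recommendedMemoryMbSketch(taskCount);
}
```

With these tiers a 1000-wine run at the 4096 MB default is no longer flagged as tight.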
[0.1.128] (2026-05-20)
Changed
defaultMemoryMbytes lowered from 8192 to 4096. The 8192 default was provisioned for the old concurrency-40 mode; since 0.1.121 the concurrency hard-cap is 10, which cuts peak memory pressure roughly fourfold. Observed peak in production: 50 MB. maxMemoryMbytes stays at 8192 so users with 1000-wine batches can still scale up if the memory-guard warning fires. Halves the default compute cost ($3.20 vs ~$6.40 per 1000-wine run at 2 h wall-clock).
[0.1.126] (2026-05-20)
Fixed
Empty or malformed input no longer crashes the run with exit 1. Calls with inputType: "wineNames" (or urls, lwins) and an empty/missing corresponding array, an unknown inputType, a batch over the 1000-wine cap, or zero items surviving per-item validation, now exit cleanly with a single dataset row carrying the error reason. Prevents buggy user scripts from crashing the actor on repeated bad input.
Per-item errors (LWIN format, URL domain, empty name) keep their existing behavior of being partitioned into the invalid[] array — only top-level structural errors changed semantics.
Changed
Internal: validateInput now throws a typed InputValidationError (with code and field metadata) instead of a generic Error. Error messages migrated to English in line with the project doctrine. New handleFatalError helper in src/main.ts classifies caught errors and converts validation errors into structured dataset rows + status messages. 278 unit tests cover the new paths.
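A rough sketch of the typed-error pattern described here. Only the class name and its code and field metadata come from the changelog; the outcome shape, the classifier name and the "EMPTY_INPUT" code are assumptions.

```typescript
class InputValidationError extends Error {
  readonly code: string;
  readonly field: string;

  constructor(message: string, code: string, field: string) {
    super(message);
    this.name = "InputValidationError";
    this.code = code;
    this.field = field;
    // Keeps instanceof reliable even when compiled to older targets.
    Object.setPrototypeOf(this, InputValidationError.prototype);
  }
}

// Hypothetical outcome shape: a dataset row for bad input, a message otherwise.
type OutcomeSketch =
  | { kind: "validation"; row: { error: string; code: string; field: string } }
  | { kind: "fatal"; message: string };

/** Converts a caught error into either a structured row or a fatal outcome. */
function classifyError(err: unknown): OutcomeSketch {
  if (err instanceof InputValidationError) {
    return {
      kind: "validation",
      row: { error: err.message, code: err.code, field: err.field },
    };
  }
  return { kind: "fatal", message: err instanceof Error ? err.message : String(err) };
}
```

The caller can then push the row to the dataset and exit cleanly for the validation case, and fail the run only for the fatal one.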
[0.1.122] (2026-05-18)
Changed
Canonical name unified across 3 artefacts to "Wine-Searcher Scraper from List" (hyphenated): README H1 + H2, actor.json title, input_schema.json title (was 3 different forms)
actor.json description, seoTitle and seoDescription reordered to popularity-first ("popularity, critic scores, prices")
## Legal & Compliance H2 renamed to ## Legal & compliance (sentence case per de-AI-ification doctrine)
Cross-promo to sister wine-searcher-grape-scraper: slug fixed (was broken wine-searcher-region-scraper), wording updated from "Region Scraper" to "Grape Scraper" and from "appellation" to "grape variety"
Pricing: Starter plan tier renamed $49 → $29/month with recomputed capacity (~1,160 wines/month)
Cross-promo table cell and Option 1 link text: "Wine Searcher Scraper from List" → "Wine-Searcher Scraper from List" (hyphenated)
Features bullets reformatted: **Title Case** - AI-tell pattern replaced with **Sentence case**:
Removed all em-dashes from input_schema.json (6 → 0)
Removed all em-dashes from README and remaining .actor/*.json (zero em-dash policy 2026-05-18)
[0.1.121] (2026-05-13)
Changed
Concurrency capped at 10 (default + hard cap). Empirically: conc=30 delivered 7.9 wines/min on the 102-wine reference run vs 8.77 wines/min at conc=10 — higher concurrency triggered more PerimeterX blocks on the winery pages, dragging wall-clock with 30-60s retry backoffs that cancelled the parallelism win. Conc=10 is the measured sweet spot for reliability.
MAX_CONCURRENCY_HARD_CAP = 10 introduced in src/main.ts. The previous schema range 1-50 is now clamped to 1-10. Existing callers passing maxConcurrency: 30+ will be silently capped — no breaking change for the success path.
Adaptive concurrency caps removed (LARGE_BATCH_THRESHOLD, VERY_LARGE_BATCH_THRESHOLD, MAX_CONCURRENCY_LARGE_BATCH, MAX_CONCURRENCY_VERY_LARGE_BATCH, RECOMMENDED_MIN_CONCURRENCY) — dead code once the hard cap is 10.
Default timeoutSecs 7200 → 14400 (4 h) to accommodate 1000-wine runs at conc=10 (~2 h expected, comfortable margin for retries).
Why this reversal
The 0.1.120 default of conc=40 was based on linear extrapolation from a single 100-wine / conc=10 benchmark (8.77 wines/min → 35 wines/min at conc=40). The 102-wine / conc=30 run that actually existed in production showed worse throughput than conc=10: the bottleneck is the PerimeterX retry tail, not provider slots. Aggressive concurrency burns retries, not throughput.
Trade-off
1000-wine batches now take ~2 h instead of the previously claimed ~30 min. PPE revenue unchanged ($25 per 1000 wines). Compute cost rises slightly (more wall-clock × 8192 MB) but stays under 5 % of revenue.
[0.1.120] — 2026-05-13
Changed
Per-run capacity: 500 → 1000 wines. MAX_ITEMS_PER_RUN (input.ts) and maxItems (input_schema, 3 places) raised to 1000. DEDUP_READBACK_LIMIT (main.ts) 1000 → 2000 so that it stays strictly greater than MAX_ITEMS_PER_RUN after a container migration.
Default memory: 1024 → 8192 MB (maxMemoryMbytes aligned at 8192). Calibrated for 1000 wines at conc=40 with comfortable headroom. Compute cost ~$1.20/run vs $25 gross revenue = 4.8%, a clearly positive margin.
Default concurrency: 30 → 40. Targets 1000 wines in 30 min under the measured baseline throughput (~8.77 wines/min at conc=10, assuming linear scaling up to conc=40 on the 100 provider slots).
Adaptive concurrency caps recalibrated for the 1000-wine range:
LARGE_INPUT_WARNING_THRESHOLD 300 → 1000 (the "split into batches of 500-800 wines" warning now fires only above the new ceiling).
Memory guard
Tier 2 refusal added: <2048 MB on >700 wines → refused (before scraping starts). Avoids the likely OOM in the 1000-wine range.
Recommended-memory tiers introduced via recommendedMemoryForTasks(taskCount): ≤200 → 1024 MB, ≤500 → 2048 MB, ≤800 → 4096 MB, >800 → 8192 MB. The warning is now proportional to volume instead of using a fixed threshold.
Minimum warn threshold: no memory nudge under 100 wines (any config that passes the refusal tiers stays "ok").
Tests
270 passing tests (+11 new: 1 on the tier 2 refusal, 3 on the ok at 8192/1000, 8 parameterized on recommendedMemoryForTasks).
Expected performance (to be validated live)
1000 wines at conc=40, 8192 MB → ~30 min wall-clock (vs ~57 min at conc=20 with the old default).
Analysis of the last 61 runs (30 days): 5/5 incidents (TIMED-OUT, ABORTED, non-trivial FAILED) in May involved batches of 350-500 wines with 512-1024 MB of RAM. The 1024 MB default was undersized as soon as a batch exceeded ~300 wines.
[0.1.112] — 2026-05-10
Fixed
Detection of "Showing results for…" pages that was previously missed: the isSearchResultsPage scan was limited to the first 30000 characters of the HTML, but Wine-Searcher pages are 250+ KB and the marker sometimes appears at ~50 KB. The scan now covers the full HTML (~1 ms extra). 33 wines per batch of 500 (~7%) fell through this gap: their H1 "Showing results for 'lwin1065208'" became wineName, all other fields were null, and PPE was charged for junk data.
Downstream guard in isSuccessfulResult: if a wineName still starts with "Showing results for" despite the upstream detection, the actor now refuses to charge (defense-in-depth backstop). Estimated saving: ~$0.83 per batch of 500 wines, and popularity-on-success rises from 92.4% to ~98%.
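A minimal sketch of a billing guard with this backstop. It combines the "wineName is not null and no error" rule from 0.1.106 with the prefix check added here; the result shape and function name are assumptions, not the actor's real types.

```typescript
// Hypothetical subset of the dataset row that matters for billing.
interface WineResultSketch {
  wineName: string | null;
  error?: string | null;
}

const SEARCH_FALLBACK_PREFIX = "Showing results for";

/** True only for rows that should trigger a pay-per-event charge. */
function isBillableSketch(result: WineResultSketch): boolean {
  if (result.wineName === null) return false; // nothing extracted
  if (result.error) return false; // structured error row
  // Backstop: a search-results H1 captured as the wine name is junk data.
  if (result.wineName.startsWith(SEARCH_FALLBACK_PREFIX)) return false;
  return true;
}
```

The prefix check is redundant when upstream detection works, which is the point of a defense-in-depth guard.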
Changed
MAX_ADDITIONAL_WINERY_PAGES 5 → 10: for large producers (Méo-Camuzet, >60 wines listed over >6 pages), the specific wine sat on page 7+ and the algorithm gave up after 5 additional pages. PER_WINERY_BUDGET_MS = 600s remains the hard cap, so this extension does not cause uncontrolled fan-out.
Tests
252 → 254 (+2 tests on the "Showing results for" backstop in isSuccessfulResult).
[0.1.116] — 2026-05-10
Fixed
cheapestPriceAmount is now ALWAYS per bottle: when the offer card is a case ("Case of 12", "6x75cl", "12 bottles"), the price is divided by the number of bottles before being returned. Before this fix, some EU merchants (e.g. SELECTION SOMMELIER on Smith Haut Lafitte 2020 → €408 for 12 bottles) returned the price of the whole case, producing a massive (~12×) silent overpricing. Mainly affects Bordeaux/Burgundy wines sourced from Europe. If you have an internal baseline built on the raw price, expect prices to drop on these wines; this is the correct behavior per the cheapestPriceAmount documentation ("per 75cl bottle").
Changed
New bottlesPerUnit field in the dataset: 1 for a single bottle (default case), 2-24 for detected cases. Lets you reconstruct the case price (cheapestPriceAmount × bottlesPerUnit) or audit the normalization. Detected patterns: case of N, Nx75cl, Nx750ml, N bottles, N-pack. Sanity guard: if the per-bottle price after division falls outside [$1, $100000], the offer is skipped (probable false positive).
Bug reported by a user via the Issues tab on the Apify Store, reproduced in production on Smith Haut Lafitte 2020.
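The per-bottle normalization can be sketched as follows. The pattern list, the 2-24 range and the [1, 100000] sanity bounds come from this entry; the regular expressions themselves and the function names are assumptions and may differ from the real parser.

```typescript
// One regex per documented pattern; the capture group is the bottle count.
const CASE_PATTERNS: RegExp[] = [
  /case of (\d{1,2})\b/i,
  /\b(\d{1,2})\s*x\s*75\s*cl\b/i,
  /\b(\d{1,2})\s*x\s*750\s*ml\b/i,
  /\b(\d{1,2})\s*bottles\b/i,
  /\b(\d{1,2})-pack\b/i,
];

/** Returns the detected bottle count (2-24), or 1 for a single bottle. */
function detectBottlesPerUnit(offerText: string): number {
  for (const pattern of CASE_PATTERNS) {
    const match = pattern.exec(offerText);
    if (match) {
      const count = Number(match[1]);
      if (count >= 2 && count <= 24) return count;
    }
  }
  return 1;
}

/** Divides a case price down to one bottle; null means "skip this offer". */
function perBottlePrice(
  unitPrice: number,
  offerText: string,
): { price: number; bottlesPerUnit: number } | null {
  const bottlesPerUnit = detectBottlesPerUnit(offerText);
  const price = unitPrice / bottlesPerUnit;
  // Sanity guard applied after a division: likely a false-positive match.
  if (bottlesPerUnit > 1 && (price < 1 || price > 100000)) return null;
  return { price, bottlesPerUnit };
}
```

For the reported case, 408 for a "Case of 12" becomes 34 per bottle with bottlesPerUnit set to 12.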
[0.1.110] — 2026-05-10
Changed
Concurrency caps raised after stress tests: MAX_CONCURRENCY_LARGE_BATCH 15→25 (>200 wines) and MAX_CONCURRENCY_VERY_LARGE_BATCH 10→20 (>400 wines). Calibrated on stress tests at 350 and 500 wines (useCache:false) where memory stayed ≤320 MB peak thanks to the bounded LRU from 0.1.103. Expected speedup ~1.8×–2× on large batches with no OOM risk (pre-LRU memory = 80+ MB for 200 wineries → now capped at ~12 MB).
Fixed
Memory guard warn message corrected: it now recommends the next tier up (1024→2048 MB for batches >300 wines already at 1024 MB) instead of the contradictory phrase "1024 MB recommended" when the user is already at 1024 MB. Discovered during stress tests cPUWZ1l3K938D8J9j and Y2AANGNGo5H9srfTI.
[0.1.106] — 2026-05-10
Changed
PPE policy tightened: billing only on full success. Actor.charge('wine-extracted') is now guarded by isSuccessfulResult(data), which requires wineName !== null && no error. Consequence: notFound results, infrastructure errors (timeout, blocked, transient_api, transient_net, rate_limit), internal bugs and parsed-empty results are pushed to the dataset as structured errors but are no longer charged. Previously every push (except invalid input + auth) was charged.
Fixed
Parsed-empty detection + analytics alarm. When a scrape succeeds (HTML 200 OK) but parsing returns wineName/price/winery all null, it is now treated as a structured error (error: "Parsed empty (possible site change)") with no PPE charge and a parsedEmptyCount counter. If > 0, an explicit warning prompts a check of the Wine-Searcher DOM. Free DOM-change canary.
Isolated success counter: the final "Done: N wine(s) extracted" message now uses validCompletedCount (real successes only) instead of pushedInputValues.size, which included error rows.
Defensive try/catch around pushAndCharge in the global catch of processWine: a transient Apify storage hiccup during error handling no longer flips the run to FAILED.
maxConcurrency clamped between 1 and 50 (NaN → default). Prevents 2 h CPU spins on malformed REST API input.
Log sanitizer now also scrubs the literal value of the API key (in addition to the URL and the provider brand).
Cache getCached now returns a structuredClone to prevent callee mutations from corrupting the process-local cache.
Non-touching LRU peek() + identity guard before wineryHtmlCache.delete() to prevent a stale handler from deleting an entry freshly pushed by another consumer (rare post-eviction race).
Named constant DEDUP_READBACK_LIMIT = 1000 + startup assertion that it is > MAX_ITEMS_PER_RUN. Guarantees that any future move to >1000 wines per run will not cause silent duplicates after a container migration.
Tests
245 → 252 tests (+7). Full suite passes locally.
[0.1.103] — 2026-05-09
Fixed
Startup memory guard: a run with a RAM × volume combination that is structurally OOM-bound (e.g. 512 MB × 350 wines) is now refused immediately with a clear, actionable message, instead of running 18 minutes before a SIGKILL.
Non-fatal validation: an invalid LWIN, URL or wine name no longer kills the whole batch. Rejected entries are pushed to the dataset as structured errors (no PPE charge), letting the user see exactly what was refused. Before this fix, a single 18-digit LWIN format failed a 500-wine input.
Winery double-retry removed: the outer retry layer (MAX_WINERY_SCRAPE_RETRIES) that doubled the internal policy of scraper.ts is removed. Worst case per winery drops from 10.5 min to ~8 min on blocks, freeing the concurrency pool and avoiding run timeouts.
Bounded winery cache (LRU 30): the winery-page HTML cache is now limited to 30 entries (≈ 12-30 MB max) instead of growing linearly. Avoids OOM SIGKILLs on batches >300 wines at 512 MB memory.
Per-winery budget of 10 min: a new global cap protects against the worst case where several pagination pages of the same winery are all blocked (which could otherwise reach 6 × 8 min = 48 min per winery).
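A bounded LRU of the kind described for the winery cache can be sketched on top of Map insertion order. This is an illustration only: the class name is hypothetical and the real cache may be implemented differently.

```typescript
/** Small LRU keyed by string; a Map keeps keys in insertion order. */
class LruCacheSketch<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Capacity from the changelog: at most 30 winery pages kept in memory.
const wineryHtmlCacheSketch = new LruCacheSketch<string>(30);
```

Because the entry count is capped, memory use depends on the cap and page size rather than on batch size.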
[0.1.102] — 2026-05-08
Changed
Winery timeout increased from 90s to 120s for better reliability on slow winery pages.
Tests
Exported MAX_WINERY_SCRAPE_RETRIES constant; tests now derive all magic numbers from it. Fixed invalid HTML fixtures (<a> without <td> wrapper). Renamed shadowed WINERY_URL variable.
[0.1.100] — 2026-05-07
Fixed
Winery URL fallback for producers without carousel/profile section. When the wine page lacks the "Also from..." carousel and profile section (small producers like Edmond Vatan), the parser now searches all /merchant/{id}-{slug} links on the page and matches the slug against the extracted winery name. Fixes winePopularity: null for wines where the winery name was found but the URL wasn't.
Changed
Cache version bumped to v3 to invalidate entries cached without the winery URL fallback.
Winery popularity: multi-page search for wines not on page 1. Winery pages show 10 wines per page sorted by popularity. When a wine isn't on page 1, all Chassagne-Montrachet entries (for example) tie at the same score, and the first entry's rank is returned incorrectly. The scraper now detects ambiguous matches (multiple entries tied at the best score) and lazily scrapes additional winery pages (/11, /21…) until finding a confident match. Fixes wines like Colin-Morey Les Charmes, Baudines, La Garenne which were all returning "487th" (Corton-Charlemagne's rank from page 1).
Added
detectWineryPagination() — extracts additional page URLs from winery HTML pagination links.
Multi-page fetch loop in fetchWineryPopularity() with lazy evaluation (stops on first confident match) and cap at 5 additional pages.
Tests
9 new tests: pagination detection (4), tie detection (2), multi-page resolution (3). 203 tests total.
[0.1.98] — 2026-05-07
Fixed
Cache version bump to invalidate stale popularity data. Wines cached before 0.1.97 had the wrong shared popularity (all wines from the same winery got the first wine's rank). Added CACHE_VERSION = 2 to cache.ts — old v1 entries are now treated as cache misses, forcing a fresh scrape with correct per-wine parsing.
Changed
CacheEntry interface gains optional v field (version number).
getCached() rejects entries where v !== CACHE_VERSION.
setCached() writes v: CACHE_VERSION on every new entry.
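The version check can be sketched as below. The names CacheEntry, getCached, setCached and CACHE_VERSION appear in this entry, but the bodies and the in-memory Map used as the store are assumptions for illustration.

```typescript
const CACHE_VERSION = 2;

interface CacheEntry<T> {
  data: T;
  v?: number; // absent on entries written before versioning (v1)
}

// Stand-in store; the real actor persists entries elsewhere.
const store = new Map<string, CacheEntry<unknown>>();

/** Writes an entry stamped with the current cache version. */
function setCached<T>(key: string, data: T): void {
  store.set(key, { data, v: CACHE_VERSION });
}

/** Returns the cached value, or null when missing or written by another version. */
function getCached<T>(key: string): T | null {
  const entry = store.get(key);
  if (!entry || entry.v !== CACHE_VERSION) return null;
  return entry.data as T;
}
```

Bumping CACHE_VERSION turns every older entry into a cache miss without needing an explicit purge.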
[0.1.97] — 2026-05-07
Fixed
Winery popularity: per-wine parsing instead of shared result. Refactored the winery cache to store raw HTML instead of the parsed popularity string. Each wine now gets its own popularity ranking from the winery page — previously, all wines from the same winery shared the first wine's ranking (e.g. all 5 Colin-Morey wines returned "487th" instead of their individual ranks).
New internal fetchWineryHtml() handles scrape + cache + retries. fetchWineryPopularity() now calls it then parses individually per wine name.
Tests
New test: "concurrent wines share same scrape but get individual popularity" — 2 wines from the same winery get different rankings (487th vs 2,622nd) with a single scrape call. 194 tests total.
[0.1.96] — 2026-05-07
Fixed
Winery popularity: normalize hyphens and accents in name matching. parseWineryPopularity() now normalizes both the wine name and the winery page HTML text before comparison: hyphens are converted to spaces and diacritics are stripped (Unicode NFD). Fixes winePopularity: null for producers like Pierre-Yves Colin-Morey where the winery page uses different hyphenation/accentuation than the wine page (e.g. "Pierre Yves" vs "Pierre-Yves", "Chatenière" vs "Chateniere").
Winery timeout raised from 60s to 90s. Some winery pages (e.g. Pierre-Yves Colin-Morey) intermittently require >60s due to scraping provider retry loops. The increased timeout improves first-attempt success rate without excessive pool blocking.
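The hyphen and accent normalization can be sketched in a few lines. NFD decomposition and hyphen-to-space conversion are taken from this entry; lowercasing and whitespace collapsing are extra assumptions, and the function name is hypothetical.

```typescript
/** Normalizes a name for comparison: no diacritics, no hyphens, single spaces. */
function normalizeForMatch(text: string): string {
  return text
    .normalize("NFD") // split "è" into "e" + combining grave accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/-/g, " ") // hyphens become spaces
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}
```

Applying the same function to both the wine name and the winery page text makes "Pierre-Yves" and "Pierre Yves" compare equal.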
Tests
tests/winery.test.ts — 4 new tests for hyphen/accent normalization (hyphens in name only, hyphens in HTML only, both, neither). 194 tests total (was 190).
[0.1.92] — 2026-05-07
Fixed
Winery popularity: internal retries benefit all concurrent consumers. Moved retry logic inside the cached Promise so that when multiple wines share the same winery, a transient scrape failure triggers up to 2 internal retries (5s delay each) and ALL concurrent consumers receive the successful result. Previously, the external wineryFailCount mechanism only worked for sequential requests — under concurrency 30, all wines from the same winery joined the same failing Promise with no retry opportunity.
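A sketch of the "retry inside the shared Promise" idea, with 2 retries as described. The helper names, the Map used for in-flight deduplication and the eviction on final failure are assumptions for illustration, not the actor's real code.

```typescript
const inFlight = new Map<string, Promise<string | null>>();

/** Runs fn, retrying up to `retries` times with a fixed delay between attempts. */
async function withRetries<T>(
  fn: () => Promise<T>,
  retries: number,
  delayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

/** All callers asking for the same URL share one Promise, retries included. */
function fetchShared(
  url: string,
  scrape: (url: string) => Promise<string>,
  delayMs = 5000,
): Promise<string | null> {
  const existing = inFlight.get(url);
  if (existing) return existing;
  const promise = withRetries(() => scrape(url), 2, delayMs).catch(() => {
    inFlight.delete(url); // evict the failure so a later wine can try again
    return null;
  });
  inFlight.set(url, promise);
  return promise;
}
```

Because the retry loop lives inside the cached Promise, consumers that joined during a failing attempt still receive the result of the successful retry.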
Changed
Removed wineryFailCount Map and handleWineryFailure() — replaced by internal retry loop within fetchWineryPopularity().
clearWineryCache() simplified (no longer needs to reset fail counters).
Tests
tests/winery.test.ts — Fully rewritten: 7 tests (was 6). New test: "concurrent wines benefit from internal retry on failure" — 5 concurrent calls, first scrape fails, retry succeeds, all 5 get the result. 190 tests total.
[0.1.89] — 2026-05-07
Added
Warning on large batches (>300 wines). A log warning now recommends splitting into batches of 200-300 wines when more than 300 are submitted, to prevent performance degradation.
Success rate monitoring. At the end of each run, the analytics module calculates and logs the success rate (%). If the rate drops below 85% on batches >10 wines, a warning is emitted to flag potential recurring errors. The success rate tracks scraping infrastructure failures (timeouts, blocked requests, API errors) — not whether a wine exists on Wine-Searcher. Invalid or unknown wines are still scraped successfully and return a result row with partial data (name from Wine-Searcher's "Showing results for…" fallback, no score/price).
successRate field in RunAnalytics (persisted in KV Store).
tests/analytics.test.ts — 3 new tests (successRate calculation, null on empty, threshold constant). 20 tests total (was 17).
[0.1.88] — 2026-05-07
Changed
Default memory raised to 1024 MB (was 512 MB). Prevents OOM kills (exit 137) on large batches that previously hit the 512 MB ceiling at ~500 wines.
Adaptive concurrency cap. Batches >400 wines are automatically capped at concurrency 10; batches >200 wines at concurrency 15 — regardless of the user's maxConcurrency setting. Reduces peak memory footprint by limiting in-flight HTML pages.
Added
Winery timeout (60s). Each winery scrape is now wrapped in a Promise.race with a 60-second deadline. If a winery page is blocked and the scraping provider retries for minutes, the timeout fires, the winery is evicted from cache (allowing a future retry), and the wine is pushed with winePopularity: null instead of blocking the entire pool.
tests/winery.test.ts — 1 new test (winery timeout returns null). 6 tests total (was 5).
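The Promise.race deadline can be sketched as a small wrapper. The helper name is hypothetical, and the real code also evicts the winery from the cache when the timeout fires, which is omitted here.

```typescript
/** Resolves with the work's value, or with null once timeoutMs has elapsed. */
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    // Clear the timer so a finished scrape does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A null result maps to winePopularity: null for that wine while the rest of the pool keeps moving. Note that the losing scrape is not cancelled by Promise.race; it is only ignored.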
[0.1.86] — 2026-05-05
Fixed
winePopularity null on batch runs. When multiple wines share the same winery and the winery scrape fails (blocked, timeout), the null result was cached permanently in the Promise-based dedup cache. All subsequent wines from the same producer inherited null popularity without retrying. Now failed entries are evicted from cache with up to 2 retries per winery URL before giving up.
Added
tests/winery.test.ts — 5 tests covering retry/cache logic for fetchWineryPopularity.
[0.1.85] — 2026-05-05
Fixed
403 errors on user URLs. URLs passed in urls mode are now normalized: country/currency suffixes (/usa/usd, /fr/eur) are stripped and non-ASCII characters (e.g. Rosé) are percent-encoded. This prevents systematic 403 errors caused by malformed URLs.
Increased blocked request retries from 2 to 3 (4 total attempts, backoff 30s + 60s + 120s) to handle transient blocks more reliably.
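A rough sketch of the URL normalization, relying on the WHATWG URL parser to percent-encode non-ASCII path characters. The suffix regex, the function name and the example URL are assumptions; the exported normalizeWineSearcherUrl() may use different rules.

```typescript
/** Strips a trailing /country/currency pair and percent-encodes the path. */
function normalizeUrlSketch(rawUrl: string): string {
  // Parsing percent-encodes non-ASCII path characters such as "é".
  const url = new URL(rawUrl);
  // Assumed suffix shape: 2-3 letter country, 3 letter currency (e.g. /usa/usd).
  const path = url.pathname.replace(/\/[a-z]{2,3}\/[a-z]{3}\/?$/i, "");
  return url.origin + path + url.search;
}
```

For example, an illustrative URL ending in /find/example+rosé/usa/usd comes back ending in /find/example+ros%C3%A9.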
Added
normalizeWineSearcherUrl() exported from src/input.ts for URL sanitization.
"Blocked by Wine-Searcher (403 errors)" troubleshooting section in README.
[0.1.82] — 2026-05-05
Added
Graceful shutdown. Actor now handles aborting (user cancel), migrating (container migration), and a soft deadline (timeout − 45s) to stop cleanly. In-flight wines finish, partial results are delivered as SUCCEEDED with a clear status message (Partial: X/Y wines (stopped by {reason})). Reduces TIMED-OUT and ABORTED failure rates.
Typed shutdown reason. ShutdownReason union type ('user abort' | 'migration' | 'timeout') prevents stringly-typed bugs.
Changed
Timeout estimation formula corrected. Now accounts for concurrency: max(120, ⌈batchSize ÷ concurrency⌉ × 25) instead of batchSize × 8. Documentation updated in input schema and README.
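The corrected formula written out as code, next to the old one for comparison. Function names are hypothetical; the clamp of concurrency to at least 1 is an added safety assumption.

```typescript
/** New estimate: max(120, ceil(batchSize / concurrency) × 25) seconds. */
function estimateTimeoutSecs(batchSize: number, concurrency: number): number {
  const waves = Math.ceil(batchSize / Math.max(1, concurrency));
  return Math.max(120, waves * 25);
}

/** Previous estimate, which ignored concurrency: max(120, batchSize × 8). */
function legacyTimeoutSecs(batchSize: number): number {
  return Math.max(120, batchSize * 8);
}
```

For 500 wines at concurrency 30 this gives 17 waves × 25 s = 425 s, versus 4000 s under the old formula.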
[0.1.80] — 2026-05-05
Fixed
Memory pressure reduced — fixes OOM on large batches (500 wines). Winery cache now stores parsed popularity strings instead of raw HTML, freeing ~60 MB of permanently retained data. Wine page HTML is released immediately after parsing, saving ~9 MB at concurrency 30. Combined savings prevent OOM kills (exit code 137) that occurred at 503/512 MB.
Dead code removed — redundant if (useCache) guard inside already-guarded block.
Changed
getCached() returns T | null directly (was { data: T; cachedAt: string } | null). Legacy cachedAt field stripped internally — callers no longer need defensive destructuring.
[0.1.77] — 2026-04-25
Changed
README optimized for Apify Store. Complete restructure aligned with Vivino actor template: added "What is" intro, "Which wine scraper should I use?" cross-selling table, "Quick Start — Test in 60 seconds", "Why scrape Wine-Searcher?" use cases, data extraction table with "Always included" column, configuration table with JSON examples, "Tips for best results", "Troubleshooting" (5 scenarios), "Privacy & Security", "Resources", "License". Pricing reformatted with tiers. FAQ enriched (10 questions). Changelog limited to 10 most recent versions. "Related Wine Scrapers" moved up and expanded.
[0.1.76] — 2026-04-22
Added
POS/inventory wine name cleaning. Wine names from POS systems (e.g. Champagne, Dom Perignon Brut, 2013, Champagne, France) are now automatically cleaned before searching Wine-Searcher. The pipeline strips category prefixes (Champagne, Port, Dessert Wine, Red Blend, Sauvignon Blanc…), bottle sizes in parentheses ((375ml), (Split 187ml), (1.5L)…), and replaces commas with spaces. The original input is preserved in inputValue — only the search URL is cleaned. This dramatically improves match rates for clients sending POS/inventory-formatted wine lists.
New exported function cleanWineName() in src/input.ts with 27 unit tests.
Changed
170 total tests (was 139).
[0.1.74] — 2026-04-21
Fixed
Numeric LWIN codes no longer crash the actor. REST API and integration clients (n8n, Make, Python, etc.) sending LWIN codes as numbers ([1067130]) instead of strings (["1067130"]) caused an immediate fatal error. normalizeLwinEntry now accepts number entries and converts them to strings before validation. Affects LWIN7, LWIN11 and longer codes.
Missing inputType no longer crashes the actor. API clients omitting the inputType field (which is only auto-filled by the Apify Console UI) caused an immediate fatal error. validateInput now auto-detects the input type from whichever array field is populated (lwins, urls, or wineNames).
Changed
LwinEntry type extended to accept number in addition to string and object formats.
10 new tests covering both fixes (139 total, was 129).
[0.1.73] — 2026-04-20
Changed
Unified pipeline. Merged 2-phase architecture (Phase 1: wine pages → Phase 2: winery pages) into a single pipeline where each task chains wine scrape → parse → winery scrape → push without waiting for other tasks. Eliminates idle time between phases (~15-20% throughput improvement).
Analytics: phase1DurationMs + phase2DurationMs replaced by single scrapingDurationMs.
Scraping response metrics (recordResponseMetrics) downgraded from log.info to log.debug — reduces log noise on large batches while keeping the end-of-run summary in log.info.
Removed
WinePhaseResult intermediate interface (no longer needed with unified pipeline).
clearWineryCache() call between phases (cache is empty at startup).
[0.1.71] — 2026-04-20
Removed
Cache completely hidden from users. All user-facing cache references removed: useCache and cacheTtlDays input fields, cachedAt output field, KV Store schema in actor.json, 30-day cache key feature, cache FAQ, cache pricing row, and all cache-related log.info messages. Cache remains fully functional internally — only external visibility is removed.
Changed
Cache-related log messages downgraded from log.info to log.debug (invisible at default log level).
CLAUDE.md: added "cache is INVISIBLE to the user" as an absolute constraint.
[0.1.69] — 2026-04-19
Added
API Integration guide. README now documents synchronous (run-sync-get-dataset-items) and asynchronous (/runs) API calls with cURL, Node.js and Python examples. Dataset export formats table (JSON, CSV, Excel, XML, JSONL) with field filtering.
Workflow & database integration guide. New "Integrate into Your Workflow" section: scheduled runs (cron examples), webhooks (Flask → PostgreSQL example), full database integration examples (Node.js + PostgreSQL, Python + SQLite), no-code integrations table (Google Sheets, Airtable, Zapier, Make, n8n), and large catalog batching pattern (>500 wines).
Changed
FAQ "Can I integrate this with my existing tools?" now links to the new integration section instead of a generic answer.
[0.1.67] — 2026-04-19
Changed
Cache hits are now billed. Updated all marketing copy (README, input schema) to reflect that cached wines carry the standard $0.025 PPE (pay-per-event) charge. The code already billed cache hits; this change aligns the documentation with actual behavior. Removed "free cache" mentions from key features, pricing table, FAQ, and input field descriptions.
[0.1.63] — 2026-04-19
Added
Run analytics. Structured metrics are now persisted to the KV Store at the end of every run (key analytics-{runId}). Includes batch size, input type, cache hit/miss/partial counters, success/error/not-found counts, phase durations, and full scraping retry distribution. Enables data-driven monitoring of actor health and usage patterns.
Timeout guidance. New informational timeoutSecs field in the Apify Console input UI with recommended values. The actor now warns at startup if the allocated run timeout looks too low for the batch size (formula: max(120, batchSize × 8) seconds). README FAQ enriched with a batch-size-to-timeout recommendation table.
Review solicitation. End-of-run logs now include a visible call-to-action with the Apify Store review link (with affiliate tag). The actor status message shows the wine count on completion.
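The timeout guidance formula in this entry can be sketched as follows. Only the expression max(120, batchSize × 8) comes from the changelog; the function names are hypothetical and the actor's real startup check may differ.

```typescript
// Recommended run timeout in seconds for a batch, per the formula
// max(120, batchSize * 8) quoted in this changelog entry.
// Function names are illustrative, not the actor's actual identifiers.
function recommendedTimeoutSecs(batchSize: number): number {
  return Math.max(120, batchSize * 8);
}

// True when the allocated timeout is below the recommendation,
// i.e. the situation in which a startup warning would make sense.
function isTimeoutTooLow(batchSize: number, allocatedSecs: number): boolean {
  return allocatedSecs < recommendedTimeoutSecs(batchSize);
}
```

Under this formula a 10-wine batch gets the 120 s floor, while a 500-wine batch is recommended 4000 s (about 67 minutes).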
Fixed
Missing analytics tracking on 4 error paths, including HTML-level 404 detection, search-results redirect failures, and the Phase 2 inner catch fallback: these were not counted in notFoundCount / errorCount.
Changed
Finalization logic (logScrapingSummary, analytics persist) moved to a finally block — ensures metrics are always saved, even on fatal errors.
Extracted cloneScrapingMetrics() helper in scraper.ts to eliminate duplicated deep-copy logic between getScrapingMetrics() and getAnalyticsSnapshot().
[0.1.62] — 2026-04-19
Added
LWIN16/LWIN18 support. Longer LWIN codes (12+ digits) are now automatically truncated to the first 11 digits (LWIN11) before URL construction. Previously, these codes caused a validation error — now they work seamlessly for users whose wine management software exports extended LWIN formats.
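A minimal sketch of the truncation described above, assuming the input is already a digits-only string. The helper name is hypothetical, and the trailing digits in the example are made up for illustration.

```typescript
// Keep only the first 11 digits (LWIN11) of a longer LWIN code.
// Assumption: `code` contains digits only; validation happens elsewhere.
function truncateToLwin11(code: string): string {
  return code.length > 11 ? code.slice(0, 11) : code;
}

// An 18-digit code whose first 11 digits match the LWIN11 example
// used elsewhere in this changelog; the last 7 digits are arbitrary.
const truncated = truncateToLwin11("113164420210600750");
```

Codes of 11 digits or fewer pass through unchanged, so the helper is safe to apply to every LWIN input.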
[0.1.61] — 2026-04-18
Changed
maxConcurrency removed from the Apify Console input UI. Console runs now always use a concurrency of 30; the parameter remains functional via the REST API for power users.
README FAQ updated: concurrency no longer advertised as a tunable setting.
[0.1.60] — 2026-04-17
Fixed
Duplicate dataset entries on long runs. When Apify migrates the actor container mid-run, the script restarts from scratch on the same dataset — previously producing duplicate entries (up to 1.5× the expected item count). Now: existing dataset items are read at startup to detect restarts, already-pushed wines are filtered from the task list, and a Set-based guard in pushAndCharge prevents any double push within the same execution.
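The restart guard can be pictured roughly as below. This is a sketch under assumptions: the dedup key (`inputKey`) and the class name are hypothetical, and the real pushAndCharge also handles billing and dataset I/O, which are left out here.

```typescript
// Hypothetical shape: each dataset item carries the key of the input it came from.
interface PushedItem {
  inputKey: string;
}

class PushGuard {
  private readonly pushed = new Set<string>();

  // Seed the guard with items already in the dataset (read at startup),
  // so a restarted run knows what the previous execution delivered.
  constructor(existingItems: PushedItem[]) {
    for (const item of existingItems) this.pushed.add(item.inputKey);
  }

  // Drop tasks whose result was already pushed before the restart.
  remaining(tasks: string[]): string[] {
    return tasks.filter((key) => !this.pushed.has(key));
  }

  // Returns true exactly once per key; a second call for the same key
  // returns false, which is the signal to skip the push.
  tryClaim(key: string): boolean {
    if (this.pushed.has(key)) return false;
    this.pushed.add(key);
    return true;
  }
}
```

Claiming the key before the push, rather than after it, keeps the check and the mark in one synchronous step, so two concurrent tasks cannot both pass the guard.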
[0.1.59] — 2026-04-17
Changed
Default maxConcurrency bumped from 10 to 30 — the actor now scrapes up to 30 wines in parallel (was 10), tripling throughput for large batches. This leverages the provider's 100 concurrent slots while keeping ~70 slots in reserve.
Input schema description updated with realistic timing (500 wines ≈ 30-60 min) and consistent guidance.
README FAQ updated to reflect new defaults and validated performance data.
Removed
Provider cost/credit information no longer appears in actor logs (business confidentiality). Logs now show request count + concurrency observed only.
Provider brand references scrubbed from JSDoc comments and log lines — all naming is now generic (scraping-api).
[0.1.57] — 2026-04-16
Fixed
HTTP 403 no longer kills the entire run. Previously, a single 403 from the scraping provider was classified as permanent_auth (dead API key) and triggered Actor.exit(1) — dropping all successfully parsed wines from the dataset. Now: 401 = permanent_auth (fatal), 403 = blocked (retryable with 30s/60s backoff + jitter). A 403 isolated to one URL is retried; if all attempts fail, that wine is marked as error and the run continues.
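A simplified view of the corrected classification, assuming the category is derived from the HTTP status alone. The category names are the seven listed in 0.1.56; the function names and the default branch are assumptions, and the actor's classifyError likely inspects more than the status code.

```typescript
type FailureCategory =
  | "permanent_auth"
  | "not_found"
  | "rate_limit"
  | "transient_api"
  | "transient_net"
  | "blocked"
  | "timeout";

// Hypothetical status-only classifier reflecting the 0.1.57 rule:
// 401 is fatal, 403 is a retryable block.
function classifyStatus(httpStatus: number): FailureCategory {
  switch (httpStatus) {
    case 401:
      return "permanent_auth";
    case 403:
      return "blocked";
    case 404:
      return "not_found";
    case 429:
      return "rate_limit";
    default:
      // Assumption: any other failing status is treated as a transient API error.
      return "transient_api";
  }
}

// Only a dead API key should abort the whole run.
function isFatal(category: FailureCategory): boolean {
  return category === "permanent_auth";
}
```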
[0.1.56] — 2026-04-16
Added
Differentiated retry strategy with 7 failure categories: permanent_auth, not_found, rate_limit, transient_api, transient_net, blocked, timeout. Each category has its own retry policy (0-5 retries, custom backoff, jitter ±30%).
Retry distribution stats in end-of-run log: attempt-1=N (X%), attempt-2=M (Y%), failed=K (Z%) + failure categories breakdown.
Retry-After header respected on HTTP 429 responses.
ScrapeResult discriminated union type replaces raw string | null returns from scraper — callers get structured success/failure data.
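For readers unfamiliar with the pattern, a discriminated union for scrape results might look like the sketch below. The field names (`ok`, `html`, `category`, `attempts`) are assumptions chosen for illustration; the changelog only states that the type replaces string | null.

```typescript
// Hypothetical shape of a structured scrape result.
type ScrapeResultSketch =
  | { ok: true; html: string; attempts: number }
  | { ok: false; category: string; attempts: number };

// The `ok` field is the discriminant: inside each branch the compiler
// knows which of the two shapes it is dealing with.
function describeResult(result: ScrapeResultSketch): string {
  if (result.ok) {
    return `success after ${result.attempts} attempt(s), ${result.html.length} chars`;
  }
  return `failed (${result.category}) after ${result.attempts} attempt(s)`;
}
```

Compared with string | null, a caller can no longer confuse "not found" with "blocked", because the failure branch carries its own category.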
Changed
scrapeWithRetry refactored: classifyError pure function determines category, RETRY_POLICY table drives retry/backoff per category, exponential backoff with ±30% jitter prevents thundering herd.
not_found (HTTP 404) is no longer retried (0 retries) — saves provider budget on wines that don't exist.
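The policy-table approach with jittered exponential backoff can be sketched as follows. Names and most values are assumptions: the changelog confirms 0 retries for not_found, a 30s/60s backoff for blocked (0.1.57) and ±30% jitter, while the retry count for blocked is inferred from those two delays. The sketch handles only the seconds form of Retry-After.

```typescript
// Illustrative policy table; not the actor's actual RETRY_POLICY.
const RETRY_POLICY_SKETCH = {
  not_found: { maxRetries: 0, baseDelayMs: 0 },
  blocked: { maxRetries: 2, baseDelayMs: 30000 }, // maxRetries is an assumption
};

// Delay before retry number `attempt` (1-based).
// `rand` is injectable so the jitter can be made deterministic in tests.
function backoffDelayMs(
  baseDelayMs: number,
  attempt: number,
  rand: () => number = Math.random,
  retryAfterSecs?: number,
): number {
  // A server-provided Retry-After (in seconds) takes precedence over the computed delay.
  if (retryAfterSecs !== undefined) return retryAfterSecs * 1000;
  const exponential = baseDelayMs * Math.pow(2, attempt - 1);
  // Map rand() in [0, 1) to a factor in [0.7, 1.3), i.e. ±30% jitter.
  const jitterFactor = 1 + (rand() * 2 - 1) * 0.3;
  return Math.round(exponential * jitterFactor);
}
```

Because each task draws its own jitter, tasks that fail at the same moment retry at slightly different times instead of hitting the provider in lockstep.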
Removed
Legacy MAX_RETRIES constant and uniform retry logic.
[0.1.55] — 2026-04-16
Changed
maxConcurrency upper limit raised from 15 to 50 (leveraging provider Business 300 plan: 100 concurrent slots).
RECOMMENDED_MIN_CONCURRENCY raised from 10 to 20 — runtime warning triggers for large batches with concurrency below 20.
Added
README FAQ: "Which run timeout should I set?" (prompted by an external user's TIMED-OUT run with timeoutSecs: 15).
[0.1.54] — 2026-04-16
Added
Scraping instrumentation: per-request log with concurrency remaining/limit and request ID. End-of-run summary with total request count and minimum concurrency observed.
Hidden zenrowsAdaptiveMode flag (REST API only, not in input schema) for A/B testing the provider's mode=auto against the forced configuration.
Tested
A/B test on 20 diverse wines: mode=auto offered zero cost savings (same $/req) but doubled runtime due to 6× more HTTP 403 retries. Decision: keep forced as default.
[0.1.53] — 2026-04-16
Changed
Rate limiter reduced from 2s to 250ms between scraping requests — major throughput improvement.
Default maxConcurrency raised from 3 to 10, upper limit from 10 to 15.
Maximum wines per run capped at 500 (was 1000) with input validation — prevents runs that are mathematically impossible to complete within the timeout.
Default run timeout set explicitly to 2 hours in actor.json.
Fixed
TIMED-OUT runs reduced from ~9.3% to near zero (root cause: 2s rate limiter + concurrency 3 + no batch cap).
[0.1.38–0.1.52] — 2026-03-23 → 2026-04-16
Changed
Scraping backend migrated from Firecrawl to ZenRows (js_render + premium_proxy). Transparent to users — same input, same output.
All code references to Firecrawl renamed to generic naming (ScraperClient, SCRAPING_API_BASE_URL).
parseWinePage(html) consolidated to a single cheerio.load() call (was 3× per wine).
Various parser optimizations: regex on raw HTML instead of $('body').text(), toAbsoluteUrl() helper, isProTeaserCard() predicate.
buildWineResult() and pushAndCharge() helpers eliminate Phase 2 code duplication.
Apify quality score improvements: all input field descriptions rewritten (what + why + how + example format), minItems/maxItems added to arrays.
Missing API key no longer crashes with Failed status — now exits cleanly via Actor.exit() with a user-friendly message.
Firecrawl mention removed from README footer.
[0.1.34] — 2026-03-19
Fixed
LWIN object format support: {"lwin7": "1131644", "vintage": 2021} and {"lwin11": "11316442021"} now work correctly (was stringified as [object Object]).
Phase 2 errors no longer crash the actor — failed winery scrapes push results with winePopularity: null instead.
Added
normalizeLwinEntry() function with validation (7-digit LWIN7, 10-11 digit LWIN11).
README Field Reference table completed (14 → 18 fields).
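The LWIN object formats fixed in this release can be normalized along these lines. This is a sketch, not the actor's normalizeLwinEntry: the function names are hypothetical, and joining lwin7 with the vintage is an assumption based on the two examples above being equivalent.

```typescript
type LwinInput =
  | string
  | { lwin7: string; vintage?: number }
  | { lwin11: string };

// Turn any supported input form into a plain digit string.
function normalizeLwinSketch(entry: LwinInput): string {
  if (typeof entry === "string") return entry;
  if ("lwin11" in entry) return entry.lwin11;
  // Assumption: LWIN7 followed by the 4-digit vintage yields the LWIN11.
  return entry.vintage !== undefined ? `${entry.lwin7}${entry.vintage}` : entry.lwin7;
}

// Length check matching the rule stated in this entry:
// 7 digits for LWIN7, 10 to 11 digits for LWIN11.
function isValidLwin(code: string): boolean {
  return /^(\d{7}|\d{10,11})$/.test(code);
}
```

Validating after normalization is what catches the old failure mode: the string "[object Object]" contains no digits and is rejected.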
[0.1.32] — 2026-03-19
Added
Key-Value Store schema declared in actor.json for the 30-day wine cache.
[0.1.31] — 2026-03-19
Changed
Pricing model changed: from BYOK (user provides Firecrawl key) to Firecrawl included at $0.025/wine. Zero setup required for users.
firecrawlApiKey removed from user input — API key managed via Apify secret.
README completely rewritten for new pricing model.
[0.1.30] — 2026-03-19
Added
Output schema (.actor/output_schema.json) with full JSON Schema for all 18 dataset fields.
Fixed
Dataset push crash on null fields: winePopularity and cachedAt can be null — schema updated to ["string", "null"] types.
[0.1.29] — 2026-03-19
Added
SEO metadata and Apify Store categories in actor.json.
[0.1.27] — 2026-03-18
Fixed
Winery name fallback: bottle size suffix ( - 75cl) no longer pollutes extracted name; vintage years (19xx/20xx) are skipped as candidates.
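One way to express the two rules above is a small cleanup function. The name and the exact regular expressions are assumptions; the actor's parser may match the suffix differently.

```typescript
// Clean a candidate winery name extracted from an offer description.
// Returns null when the candidate should be skipped.
function cleanWineryCandidate(raw: string): string | null {
  // Strip a trailing bottle-size suffix such as " - 75cl".
  const withoutSize = raw.replace(/\s*-\s*\d+\s*cl\s*$/i, "").trim();
  // Skip empty strings and bare vintage years (19xx / 20xx).
  if (withoutSize === "" || /^(?:19|20)\d{2}$/.test(withoutSize)) return null;
  return withoutSize;
}
```

The suffix pattern is anchored at the end of the string, so hyphenated names such as "Saint-Julien" are left untouched.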
[0.1.23–0.1.26] — 2026-03-17
Added
Smart cache retry: cached results with missing winery data (winePopularity: null but wineryUrl present) are automatically retried in Phase 2 instead of serving stale nulls forever.
Winery name fallback via offer-card consensus: when a producer has no dedicated Wine-Searcher page (no /merchant/ link), the winery name is extracted from offer descriptions using frequency-based voting across cards.
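Frequency-based voting across offer cards can be illustrated with a single-pass counter. This is a sketch: the function name is hypothetical, the candidates are assumed to be already extracted from each card, and the tie-breaking rule (first name to reach the top count wins) is a choice made here, not something the changelog specifies.

```typescript
// Pick the most frequent candidate name across offer cards.
// Returns null when there are no usable candidates.
function voteWineryName(candidates: string[]): string | null {
  const counts = new Map<string, number>();
  let best: string | null = null;
  let bestCount = 0;
  for (const raw of candidates) {
    const key = raw.trim();
    if (key === "") continue;
    const n = (counts.get(key) ?? 0) + 1;
    counts.set(key, n);
    // Strict comparison: on a tie, the name that reached the count first is kept.
    if (n > bestCount) {
      best = key;
      bestCount = n;
    }
  }
  return best;
}
```

A production version would probably also require a minimum share of the votes before trusting the winner, to avoid picking a name from a single noisy card.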
Changed
Complete marketing rewrite: README with SEO structure (6 key features, 4 use cases, pricing table, 7 FAQ entries), enriched input schema descriptions, and an SEO description in actor.json.
Global rate limiter (2s between requests) prevents burst patterns.
Winery-specific backoff increased to 5s/15s/30s (was 2s/4s).
In-memory Promise-based winery cache — duplicate winery requests share a single scrape.
Phase 1 (wine pages) and Phase 2 (winery pages) now run sequentially instead of interleaved.
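The Promise-based winery cache listed above follows a common pattern: store the in-flight Promise rather than the resolved value, so concurrent callers share one request. A minimal sketch, with hypothetical names and the scrape function passed in as a parameter:

```typescript
// Cache of in-flight or completed winery scrapes, keyed by URL.
const wineryCacheSketch = new Map<string, Promise<string>>();

function getWineryHtml(
  url: string,
  scrape: (url: string) => Promise<string>,
): Promise<string> {
  let pending = wineryCacheSketch.get(url);
  if (pending === undefined) {
    // First caller starts the scrape; the Promise is stored immediately,
    // before it resolves, so later callers reuse it instead of starting another.
    pending = scrape(url);
    wineryCacheSketch.set(url, pending);
  }
  return pending;
}
```

Note that in this sketch a rejected Promise stays in the cache; an implementation that wants failed wineries to be retried would delete the entry on rejection.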
[0.1.16] — 2026-03-17
Changed
PPE pricing set to $0.008/wine (later raised to $0.025 in 0.1.31).
Firecrawl API key configured via Apify secret (no longer required in input).
[0.1.13] — 2026-03-16
Added
Search results page detection: when Wine-Searcher returns a search results page instead of a wine profile (common with ambiguous wine names), the actor detects it and follows the first wine link automatically.
HTML scan window extended to 100k characters (Wine-Searcher header/nav occupies 70k+ chars before content).
[0.1.7–0.1.12] — 2026-03-16
Changed
Migrated from Node.js 20 to Node.js 22.
proxyCountry parameter implemented (controls which merchant offers and prices are displayed).
Concurrency pool hardened.
[0.1.1–0.1.6] — 2026-03-07
Added
Initial release: Apify Actor extracting wine scores, prices, winery info and popularity from Wine-Searcher.com.