All notable changes to Wine Searcher Scraper — Wine Data from List are documented here.
The format is based on Keep a Changelog. Versions follow the Apify build numbering (0.1.XX).
[0.1.77] — 2026-04-25
Changed
README optimized for the Apify Store. Complete restructure aligned with the Vivino actor template: added a "What is" intro, a "Which wine scraper should I use?" cross-selling table, "Quick Start — Test in 60 seconds", "Why scrape Wine-Searcher?" use cases, a data extraction table with an "Always included" column, a configuration table with JSON examples, "Tips for best results", "Troubleshooting" (5 scenarios), "Privacy & Security", "Resources" and "License". Pricing reformatted into tiers. FAQ expanded to 10 questions. Changelog limited to the 10 most recent versions. "Related Wine Scrapers" moved up and expanded.
[0.1.76] — 2026-04-22
Added
POS/inventory wine name cleaning. Wine names exported from POS systems (e.g. "Champagne, Dom Perignon Brut, 2013, Champagne, France") are now cleaned automatically before searching Wine-Searcher. The pipeline strips category prefixes (Champagne, Port, Dessert Wine, Red Blend, Sauvignon Blanc…) and bottle sizes in parentheses ((375ml), (Split 187ml), (1.5L)…), and replaces commas with spaces. The original input is preserved in inputValue; only the search URL uses the cleaned name. This dramatically improves match rates for clients sending POS/inventory-formatted wine lists.
New exported function cleanWineName() in src/input.ts with 27 unit tests.
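A minimal sketch of the cleaning steps described above. The category list is a hypothetical subset, the bottle-size pattern is an assumption about what counts as a size, and stripping a category only when it is followed by a comma is a guess meant to protect names such as "Champagne Billecart-Salmon". The actual cleanWineName() in src/input.ts may differ.

```typescript
// Hypothetical subset of POS category prefixes; the real list is not shown in this changelog.
const CATEGORY_PREFIXES = ["Champagne", "Port", "Dessert Wine", "Red Blend", "Sauvignon Blanc"];

// Parenthesized bottle sizes such as (375ml), (Split 187ml) or (1.5L).
const BOTTLE_SIZE_RE = /\(\s*(?:[a-z]+\s+)?\d+(?:\.\d+)?\s*(?:ml|cl|l)\s*\)/gi;

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: a category is stripped only when it is the first comma-separated token.
const PREFIX_RE = new RegExp(
  `^\\s*(?:${CATEGORY_PREFIXES.map(escapeRegExp).join("|")})\\s*,\\s*`,
  "i",
);

function cleanWineName(raw: string): string {
  return raw
    .replace(PREFIX_RE, "")       // drop a leading "Champagne, ", "Port, " ...
    .replace(BOTTLE_SIZE_RE, " ") // drop "(375ml)", "(Split 187ml)", "(1.5L)"
    .replace(/,/g, " ")           // commas become spaces
    .replace(/\s+/g, " ")         // collapse repeated whitespace
    .trim();
}
```

With these rules, "Champagne, Dom Perignon Brut, 2013, Champagne, France" becomes "Dom Perignon Brut 2013 Champagne France", while the raw string would still be stored in inputValue.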
Changed
170 total tests (was 139).
[0.1.74] — 2026-04-21
Fixed
Numeric LWIN codes no longer crash the actor. Previously, REST API and integration clients (n8n, Make, Python, etc.) that sent LWIN codes as numbers ([1067130]) instead of strings (["1067130"]) triggered an immediate fatal error. normalizeLwinEntry now accepts number entries and converts them to strings before validation. This affects LWIN7, LWIN11 and longer codes.
Missing inputType no longer crashes the actor. Previously, API clients that omitted the inputType field (which only the Apify Console UI fills in automatically) triggered an immediate fatal error. validateInput now auto-detects the input type from whichever array field is populated (lwins, urls or wineNames).
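A sketch of the inputType auto-detection. The InputType values ("lwin", "url", "wineName") and the precedence order when several arrays are filled are hypothetical; the entry above only says the type is inferred from whichever array is populated.

```typescript
// Hypothetical enum values; the actor's real inputType values are not listed here.
type InputType = "lwin" | "url" | "wineName";

interface RawInput {
  inputType?: InputType;
  lwins?: unknown[];
  urls?: string[];
  wineNames?: string[];
}

// Keeps an explicit inputType; otherwise infers it from the first non-empty array
// (assumed order: lwins, urls, wineNames).
function resolveInputType(input: RawInput): InputType {
  if (input.inputType) return input.inputType;
  if (input.lwins && input.lwins.length > 0) return "lwin";
  if (input.urls && input.urls.length > 0) return "url";
  if (input.wineNames && input.wineNames.length > 0) return "wineName";
  throw new Error("Provide at least one of lwins, urls or wineNames.");
}
```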
Changed
LwinEntry type extended to accept number in addition to string and object formats.
10 new tests covering both fixes (139 total, was 129).
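A sketch of how normalizeLwinEntry might accept every format this changelog mentions: strings, numbers (0.1.74), {lwin7, vintage} and {lwin11} objects (0.1.34), and LWIN16/LWIN18 truncation (0.1.62), with the 7-digit / 10-11-digit validation from 0.1.34. The exact type shape and the digit-string return value are assumptions.

```typescript
// Shape inferred from entries 0.1.34, 0.1.62 and 0.1.74; the real LwinEntry may differ.
type LwinEntry =
  | string
  | number
  | { lwin7: string | number; vintage?: number }
  | { lwin11: string | number };

// Returns a validated LWIN7 (7 digits) or LWIN11 (10-11 digits) string.
function normalizeLwinEntry(entry: LwinEntry): string {
  let code: string;
  if (typeof entry === "number") {
    code = String(entry); // 1067130 -> "1067130"
  } else if (typeof entry === "string") {
    code = entry.trim();
  } else if ("lwin11" in entry) {
    code = String(entry.lwin11).trim();
  } else {
    // LWIN7 followed by a 4-digit vintage forms an LWIN11.
    code = String(entry.lwin7).trim() + (entry.vintage === undefined ? "" : String(entry.vintage));
  }

  if (!/^\d+$/.test(code)) throw new Error(`Invalid LWIN: "${code}"`);
  if (code.length >= 12) code = code.slice(0, 11); // LWIN16/LWIN18 -> LWIN11
  if (code.length === 7 || code.length === 10 || code.length === 11) return code;
  throw new Error(`Invalid LWIN length ${code.length}: "${code}"`);
}
```

One caveat on numeric input: 18-digit codes exceed Number.MAX_SAFE_INTEGER, so when they arrive as JSON numbers their low-order digits may already be rounded; sending extended codes as strings avoids that.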
[0.1.73] — 2026-04-20
Changed
Unified pipeline. Merged the two-phase architecture (Phase 1: wine pages → Phase 2: winery pages) into a single pipeline in which each task chains wine scrape → parse → winery scrape → push without waiting for other tasks. This eliminates idle time between phases (~15-20% throughput improvement).
Analytics: phase1DurationMs and phase2DurationMs replaced by a single scrapingDurationMs.
Scraping response metrics (recordResponseMetrics) downgraded from log.info to log.debug — reduces log noise on large batches while keeping the end-of-run summary in log.info.
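A sketch of per-task chaining with a fixed-size worker pool: each worker takes the next input and carries it through every stage before picking another, so no task waits for a phase barrier. The Stages interface is a placeholder for the actor's real scrape, parse and push functions, which are not shown here; catching errors per input mirrors the per-wine error handling described in 0.1.57.

```typescript
// Placeholder stage functions; names and signatures are hypothetical.
interface Stages<W> {
  scrapeWine(input: string): Promise<W>; // wine page scrape + parse
  scrapeWinery(wine: W): Promise<W>;     // winery page enrichment
  push(wine: W): Promise<void>;          // dataset push (and charge)
}

// Runs every input through all stages, with at most `concurrency` chains in flight.
// Returns the inputs whose chain failed.
async function runPipeline<W>(inputs: string[], stages: Stages<W>, concurrency: number): Promise<string[]> {
  const failed: string[] = [];
  let next = 0; // shared cursor; safe because it is read and incremented without an await in between

  async function worker(): Promise<void> {
    while (next < inputs.length) {
      const input = inputs[next++];
      try {
        const wine = await stages.scrapeWine(input);
        const enriched = await stages.scrapeWinery(wine);
        await stages.push(enriched);
      } catch {
        failed.push(input); // one bad wine does not stop the other chains
      }
    }
  }

  const poolSize = Math.max(1, Math.min(concurrency, inputs.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return failed;
}
```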
Removed
WinePhaseResult intermediate interface (no longer needed with unified pipeline).
clearWineryCache() call between phases (cache is empty at startup).
[0.1.71] — 2026-04-20
Removed
Cache completely hidden from users. All user-facing cache references removed: useCache and cacheTtlDays input fields, cachedAt output field, KV Store schema in actor.json, 30-day cache key feature, cache FAQ, cache pricing row, and all cache-related log.info messages. Cache remains fully functional internally — only external visibility is removed.
Changed
Cache-related log messages downgraded from log.info to log.debug (invisible at default log level).
CLAUDE.md: added "cache is INVISIBLE to the user" as an absolute constraint.
[0.1.69] — 2026-04-19
Added
API Integration guide. README now documents synchronous (run-sync-get-dataset-items) and asynchronous (/runs) API calls with cURL, Node.js and Python examples. Dataset export formats table (JSON, CSV, Excel, XML, JSONL) with field filtering.
Workflow & database integration guide. New "Integrate into Your Workflow" section: scheduled runs (cron examples), webhooks (Flask → PostgreSQL example), full database integration examples (Node.js + PostgreSQL, Python + SQLite), no-code integrations table (Google Sheets, Airtable, Zapier, Make, n8n), and large catalog batching pattern (>500 wines).
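To illustrate the batching pattern for catalogs over 500 wines, the sketch below splits a list into runs of at most 500 (the per-run cap from 0.1.53) and builds one asynchronous POST /v2/acts/{actorId}/runs request per batch. ACTOR_ID is a placeholder, and the input sets only wineNames on the assumption that inputType is auto-detected (0.1.74).

```typescript
// Placeholder: replace with the actor's ID or "username~actor-name".
const ACTOR_ID = "USERNAME~ACTOR_NAME";
const MAX_WINES_PER_RUN = 500; // per-run cap introduced in 0.1.53

interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Splits items into consecutive slices of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// One asynchronous run per batch; inputType is omitted because the actor infers it.
function buildRunRequests(wineNames: string[], token: string): RunRequest[] {
  return chunk(wineNames, MAX_WINES_PER_RUN).map(
    (batch): RunRequest => ({
      url: `https://api.apify.com/v2/acts/${ACTOR_ID}/runs`,
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
      body: JSON.stringify({ wineNames: batch }),
    }),
  );
}
```

Each request can be sent with fetch(); the run object returned by the API includes a defaultDatasetId, whose items can be read from GET /v2/datasets/{datasetId}/items once the run finishes. Asynchronous runs sidestep the synchronous endpoint's wait limit, which matters when 500 wines take 30-60 minutes.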
Changed
FAQ "Can I integrate this with my existing tools?" now links to the new integration section instead of a generic answer.
[0.1.67] — 2026-04-19
Changed
Cache hits are now documented as billed. All marketing copy (README, input schema) now states that cached wines carry the standard $0.025 PPE charge. The code already billed cache hits; this change aligns the documentation with actual behavior. Removed "free cache" mentions from key features, pricing table, FAQ and input field descriptions.
[0.1.63] — 2026-04-19
Added
Run analytics. Structured metrics are now persisted to the KV Store at the end of every run (key analytics-{runId}). Includes batch size, input type, cache hit/miss/partial counters, success/error/not-found counts, phase durations, and full scraping retry distribution. Enables data-driven monitoring of actor health and usage patterns.
Timeout guidance. New informational timeoutSecs field in the Apify Console input UI with recommended values. The actor now warns at startup if the allocated run timeout looks too low for the batch size (formula: max(120, batchSize × 8) seconds). README FAQ enriched with a batch-size-to-timeout recommendation table.
Review solicitation. End-of-run logs now include a visible call-to-action with the Apify Store review link (with affiliate tag). The actor status message shows the wine count on completion.
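The timeout formula above, written out as a small helper. How the actor reads its allocated run timeout is not described, so it is passed in as a parameter; the function names are hypothetical.

```typescript
// Recommended minimum run timeout: max(120, batchSize × 8) seconds.
function recommendedTimeoutSecs(batchSize: number): number {
  return Math.max(120, batchSize * 8);
}

// Returns a startup warning when the allocated timeout looks too low, else null.
function timeoutWarning(batchSize: number, allocatedSecs: number): string | null {
  const recommended = recommendedTimeoutSecs(batchSize);
  return allocatedSecs < recommended
    ? `Run timeout ${allocatedSecs}s may be too low for ${batchSize} wines (recommended: at least ${recommended}s).`
    : null;
}
```

For example, a 10-wine batch gets the 120 s floor, while 500 wines give max(120, 4000) = 4000 s, about 67 minutes, which fits inside the 2-hour default set in 0.1.53.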
Fixed
Analytics tracking was missing on 4 error paths: HTML-level 404 detection, search-results redirect failures and the Phase 2 inner catch fallback were not counted in notFoundCount / errorCount.
Changed
Finalization logic (logScrapingSummary, analytics persist) moved to a finally block — ensures metrics are always saved, even on fatal errors.
Extracted cloneScrapingMetrics() helper in scraper.ts to eliminate duplicated deep-copy logic between getScrapingMetrics() and getAnalyticsSnapshot().
[0.1.62] — 2026-04-19
Added
LWIN16/LWIN18 support. Longer LWIN codes (12+ digits) are now automatically truncated to the first 11 digits (LWIN11) before URL construction. Previously, these codes caused a validation error — now they work seamlessly for users whose wine management software exports extended LWIN formats.
[0.1.61] — 2026-04-18
Changed
maxConcurrency removed from the Apify Console input UI — concurrency is now fixed at 30 for all users. The parameter remains functional via the REST API for power users.
README FAQ updated: concurrency no longer advertised as a tunable setting.
[0.1.60] — 2026-04-17
Fixed
Duplicate dataset entries on long runs. When Apify migrates the actor container mid-run, the script restarts from scratch on the same dataset — previously producing duplicate entries (up to 1.5× the expected item count). Now: existing dataset items are read at startup to detect restarts, already-pushed wines are filtered from the task list, and a Set-based guard in pushAndCharge prevents any double push within the same execution.
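A minimal sketch of the two guards, assuming inputValue is the deduplication key (the entry does not say which field is used) and that the existing dataset items have already been loaded at startup. The PushGuard class is hypothetical.

```typescript
interface WineItem {
  inputValue: string;
  [field: string]: unknown;
}

class PushGuard {
  private readonly pushed: Set<string>;

  constructor(existingItems: WineItem[]) {
    // Items left in the dataset by an execution that was migrated mid-run.
    this.pushed = new Set(existingItems.map((item) => item.inputValue));
  }

  // Removes inputs that an earlier execution already pushed.
  pendingInputs(inputs: string[]): string[] {
    return inputs.filter((input) => !this.pushed.has(input));
  }

  // Called by the push helper: returns false instead of pushing the same wine twice.
  tryMarkPushed(item: WineItem): boolean {
    if (this.pushed.has(item.inputValue)) return false;
    this.pushed.add(item.inputValue);
    return true;
  }
}
```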
[0.1.59] — 2026-04-17
Changed
Default maxConcurrency raised from 10 to 30: the actor now scrapes up to 30 wines in parallel, tripling throughput for large batches. This uses 30 of the provider's 100 concurrent slots, keeping ~70 in reserve.
Input schema description updated with realistic timing (500 wines ≈ 30-60 min) and consistent guidance.
README FAQ updated to reflect new defaults and validated performance data.
Removed
Provider cost/credit information no longer appears in actor logs (business confidentiality). Logs now show request count + concurrency observed only.
Provider brand references scrubbed from JSDoc comments and log lines — all naming is now generic (scraping-api).
[0.1.57] — 2026-04-16
Fixed
HTTP 403 no longer kills the entire run. Previously, a single 403 from the scraping provider was classified as permanent_auth (dead API key) and triggered Actor.exit(1) — dropping all successfully parsed wines from the dataset. Now: 401 = permanent_auth (fatal), 403 = blocked_px (retryable with 30s/60s backoff + jitter). A 403 isolated to one URL is retried; if all attempts fail, that wine is marked as error and the run continues.
[0.1.56] — 2026-04-16
Added
Differentiated retry strategy with 7 failure categories: permanent_auth, not_found, rate_limit, transient_api, transient_net, blocked_px, timeout. Each category has its own retry policy (0-5 retries, custom backoff, jitter ±30%).
Retry distribution stats in end-of-run log: attempt-1=N (X%), attempt-2=M (Y%), failed=K (Z%) + failure categories breakdown.
Retry-After header respected on HTTP 429 responses.
ScrapeResult discriminated union type replaces raw string | null returns from scraper — callers get structured success/failure data.
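A sketch of the discriminated union and of Retry-After parsing. Apart from the seven category names, the field names are assumptions. Per RFC 9110, Retry-After carries either a number of seconds or an HTTP-date, so both forms are handled.

```typescript
type FailureCategory =
  | "permanent_auth" | "not_found" | "rate_limit" | "transient_api"
  | "transient_net" | "blocked_px" | "timeout";

// Callers must check `ok` before reading html or category.
type ScrapeResult =
  | { ok: true; html: string; attempts: number }
  | { ok: false; category: FailureCategory; status?: number; attempts: number };

function describeResult(result: ScrapeResult): string {
  return result.ok
    ? `ok after ${result.attempts} attempt(s), ${result.html.length} chars`
    : `failed (${result.category}) after ${result.attempts} attempt(s)`;
}

// Returns the requested wait in milliseconds, or null when absent or unparseable.
function parseRetryAfter(header: string | null, now: number = Date.now()): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delay-seconds form
  const date = Date.parse(trimmed);                          // HTTP-date form
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}
```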
Changed
scrapeWithRetry refactored: a pure classifyError function determines the category, the RETRY_POLICY table drives retries and backoff per category, and exponential backoff with ±30% jitter prevents a thundering herd.
not_found (HTTP 404) is no longer retried (0 retries) — saves provider budget on wines that don't exist.
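A sketch of the table-driven policy. It reflects the 403 change from 0.1.57. Only the 401/403/404 mappings, the 0-retry entries and the 30s/60s blocked_px schedule come from this changelog; mapping 429 to rate_limit is inferred from the Retry-After entry, and the remaining numbers and network-error handling are illustrative placeholders.

```typescript
type FailureCategory =
  | "permanent_auth" | "not_found" | "rate_limit" | "transient_api"
  | "transient_net" | "blocked_px" | "timeout";

interface RetryPolicy {
  maxRetries: number;  // retries after the first attempt
  baseDelayMs: number; // delay before retry 1; doubles for each further retry
}

// Placeholder values except permanent_auth/not_found (0 retries) and blocked_px (30s, 60s).
const RETRY_POLICY: Record<FailureCategory, RetryPolicy> = {
  permanent_auth: { maxRetries: 0, baseDelayMs: 0 },
  not_found: { maxRetries: 0, baseDelayMs: 0 },
  rate_limit: { maxRetries: 5, baseDelayMs: 5000 },
  transient_api: { maxRetries: 3, baseDelayMs: 2000 },
  transient_net: { maxRetries: 3, baseDelayMs: 1000 },
  blocked_px: { maxRetries: 2, baseDelayMs: 30000 },
  timeout: { maxRetries: 2, baseDelayMs: 5000 },
};

// Pure function: maps an HTTP status and/or a Node network error code to a category.
function classifyError(status?: number, errorCode?: string): FailureCategory {
  if (status === 401) return "permanent_auth"; // dead API key: fatal
  if (status === 403) return "blocked_px";     // retryable since 0.1.57
  if (status === 404) return "not_found";
  if (status === 429) return "rate_limit";
  if (status !== undefined) return "transient_api"; // 5xx and other unexpected statuses
  if (errorCode === "ETIMEDOUT") return "timeout";
  return "transient_net"; // e.g. ECONNRESET
}

// `retry` is 1-based: retry 1 is the second attempt overall.
function shouldRetry(category: FailureCategory, retry: number): boolean {
  return retry <= RETRY_POLICY[category].maxRetries;
}

// Exponential backoff scaled by a random jitter factor in [0.7, 1.3).
function backoffMs(category: FailureCategory, retry: number, random: () => number = Math.random): number {
  const jitter = 1 + (random() * 0.6 - 0.3);
  return Math.round(RETRY_POLICY[category].baseDelayMs * Math.pow(2, retry - 1) * jitter);
}
```

Injecting `random` keeps the backoff deterministic in tests; in production the default Math.random spreads retries from parallel workers apart.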
Removed
Legacy MAX_RETRIES constant and uniform retry logic.
[0.1.55] — 2026-04-16
Changed
maxConcurrency upper limit raised from 15 to 50 (leveraging the provider's Business 300 plan, which allows 100 concurrent slots).
RECOMMENDED_MIN_CONCURRENCY raised from 10 to 20 — runtime warning triggers for large batches with concurrency below 20.
Added
README FAQ: "Which run timeout should I set?" (prompted by an external user's TIMED-OUT run with timeoutSecs: 15).
[0.1.54] — 2026-04-16
Added
Scraping instrumentation: per-request log with concurrency remaining/limit and request ID. End-of-run synthesis with total request count and min concurrency observed.
Hidden zenrowsAdaptiveMode flag (REST API only, not in the input schema) for A/B testing the provider's mode=auto against the forced configuration.
Tested
A/B test on 20 diverse wines: mode=auto offered zero cost savings (same $/req) but doubled runtime due to 6× more HTTP 403 retries. Decision: keep forced as default.
[0.1.53] — 2026-04-16
Changed
Rate limiter reduced from 2s to 250ms between scraping requests — major throughput improvement.
Default maxConcurrency raised from 3 to 10, upper limit from 10 to 15.
Maximum wines per run capped at 500 (was 1000) with input validation — prevents runs that are mathematically impossible to complete within the timeout.
Default run timeout set explicitly to 2 hours in actor.json.
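A sketch of a global limiter that spaces request starts by a fixed interval across all concurrent workers. The class is hypothetical; the changelog gives only the interval. Each caller reserves its slot synchronously before awaiting, so concurrent callers never share a slot.

```typescript
class RateLimiter {
  private readonly intervalMs: number;
  private nextSlot = 0; // earliest timestamp (ms) the next request may start

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Reserves the next start slot, then waits until it arrives.
  async wait(): Promise<void> {
    const now = Date.now();
    const slot = Math.max(now, this.nextSlot);
    this.nextSlot = slot + this.intervalMs;
    const delay = slot - now;
    if (delay > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With a 250 ms interval, at most 4 requests start per second (240 per minute) regardless of concurrency; the previous 2 s spacing capped starts at 30 per minute, which is why concurrency alone could not lift throughput before this change.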
Fixed
TIMED-OUT runs reduced from ~9.3% to near zero (root cause: 2s rate limiter + concurrency 3 + no batch cap).
[0.1.38–0.1.52] — 2026-03-23 → 2026-04-16
Changed
Scraping backend migrated from Firecrawl to ZenRows (js_render + premium_proxy). Transparent to users — same input, same output.
All code references to Firecrawl renamed to generic naming (ScraperClient, SCRAPING_API_BASE_URL).
parseWinePage(html) consolidated to a single cheerio.load() call (was 3× per wine).
Various parser optimizations: regex on raw HTML instead of $('body').text(), toAbsoluteUrl() helper, isProTeaserCard() predicate.
buildWineResult() and pushAndCharge() helpers eliminate Phase 2 code duplication.
Apify quality score improvements: all input field descriptions rewritten (what + why + how + example format), minItems/maxItems added to arrays.
Missing API key no longer crashes with Failed status — now exits cleanly via Actor.exit() with a user-friendly message.
Firecrawl mention removed from README footer.
[0.1.34] — 2026-03-19
Fixed
LWIN object format support: {"lwin7": "1131644", "vintage": 2021} and {"lwin11": "11316442021"} now work correctly (was stringified as [object Object]).
Phase 2 errors no longer crash the actor — failed winery scrapes push results with winePopularity: null instead.
Added
normalizeLwinEntry() function with validation (7-digit LWIN7, 10-11 digit LWIN11).
README Field Reference table completed (14 → 18 fields).
[0.1.32] — 2026-03-19
Added
Key-Value Store schema declared in actor.json for the 30-day wine cache.
[0.1.31] — 2026-03-19
Changed
Pricing model changed: from BYOK (user provides Firecrawl key) to Firecrawl included at $0.025/wine. Zero setup required for users.
firecrawlApiKey removed from user input — API key managed via Apify secret.
README completely rewritten for new pricing model.
[0.1.30] — 2026-03-19
Added
Output schema (.actor/output_schema.json) with full JSON Schema for all 18 dataset fields.
Fixed
Dataset push crash on null fields: winePopularity and cachedAt can be null — schema updated to ["string", "null"] types.
[0.1.29] — 2026-03-19
Added
SEO metadata and Apify Store categories in actor.json.
[0.1.27] — 2026-03-18
Fixed
Winery name fallback: the bottle-size suffix (" - 75cl") no longer pollutes the extracted name; vintage years (19xx/20xx) are skipped as candidates.
[0.1.23–0.1.26] — 2026-03-17
Added
Smart cache retry: cached results with missing winery data (winePopularity: null but wineryUrl present) are automatically retried in Phase 2 instead of serving stale nulls forever.
Winery name fallback via offer-card consensus: when a producer has no dedicated Wine-Searcher page (no /merchant/ link), the winery name is extracted from offer descriptions using frequency-based voting across cards.
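A sketch of frequency voting over per-card candidates, including the two filters added later in 0.1.27 (bottle-size suffix and vintage years). How a candidate is extracted from an offer description is not documented, so the function takes the candidates as input; breaking ties in favour of the first-seen name is an assumption.

```typescript
// Normalizes one candidate; returns null when it should not vote.
function cleanCandidate(raw: string): string | null {
  const name = raw.replace(/\s+-\s+\d+(?:\.\d+)?\s*(?:cl|ml|l)\s*$/i, "").trim(); // drop " - 75cl"
  if (name === "" || /^(?:19|20)\d{2}$/.test(name)) return null;                  // skip vintages
  return name;
}

// Most frequent cleaned candidate across offer cards; ties go to the first seen.
function wineryNameByConsensus(candidates: string[]): string | null {
  const counts = new Map<string, number>();
  for (const raw of candidates) {
    const name = cleanCandidate(raw);
    if (name !== null) counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const [name, count] of counts) { // Map iterates in insertion order
    if (count > bestCount) {
      best = name;
      bestCount = count;
    }
  }
  return best;
}
```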
Changed
Complete marketing rewrite: README with SEO structure (6 key features, 4 use cases, pricing table, 7 FAQ entries), enriched input schema descriptions, and an SEO description in actor.json.
Global rate limiter (2s between requests) prevents burst patterns.
Winery-specific backoff increased to 5s/15s/30s (was 2s/4s).
In-memory Promise-based winery cache — duplicate winery requests share a single scrape.
Phase 1 (wine pages) and Phase 2 (winery pages) now run sequentially instead of interleaved.
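A sketch of a Promise-based winery cache: the first request for a winery URL stores the in-flight promise, and concurrent requests for the same URL await that promise instead of scraping again. The class and its scrape callback are hypothetical, and evicting failed entries so a later task can retry is an assumption in the spirit of the smart cache retry above.

```typescript
class WineryCache<T> {
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly scrape: (url: string) => Promise<T>;

  constructor(scrape: (url: string) => Promise<T>) {
    this.scrape = scrape;
  }

  // Concurrent calls with the same URL share one scrape.
  get(url: string): Promise<T> {
    let pending = this.inFlight.get(url);
    if (!pending) {
      pending = this.scrape(url);
      // Assumption: forget failures so a later request can try again.
      pending.catch(() => this.inFlight.delete(url));
      this.inFlight.set(url, pending);
    }
    return pending;
  }
}
```

Because the cache lives in memory, it only deduplicates within a single run, which matches the later note that the cache is empty at startup.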
[0.1.16] — 2026-03-17
Changed
PPE pricing set to $0.008/wine (later raised to $0.025 in 0.1.31).
Firecrawl API key configured via Apify secret (no longer required in input).
[0.1.13] — 2026-03-16
Added
Search results page detection: when Wine-Searcher returns a search results page instead of a wine profile (common with ambiguous wine names), the actor detects it and follows the first wine link automatically.
HTML scan window extended to 100k characters (Wine-Searcher header/nav occupies 70k+ chars before content).
[0.1.7–0.1.12] — 2026-03-16
Changed
Migrated from Node.js 20 to Node.js 22.
proxyCountry parameter implemented (controls which merchant offers and prices are displayed).
Concurrency pool hardened.
[0.1.1–0.1.6] — 2026-03-07
Added
Initial release: Apify Actor extracting wine scores, prices, winery info and popularity from Wine-Searcher.com.