Extract wine ratings, prices, taste profiles, reviews, and grape varieties from Vivino. Search by wine name or URL. Fast HTTP-only approach with no browser needed. Export JSON, CSV, or Excel.
All notable changes to Vivino Wine Data Scraper are documented here.
v0.4.13 (2026-06-01)
Fixed (SEV-4 false-positive match)
"Chateau Margaux" (and similar queries where the only non-prefix word is a region name) no longer falls onto a wrong winery. The slug-generation single-word fallback now skips well-known wine region names (Margaux, Pomerol, Beaune, Pauillac, etc. - 40+ entries). Previously, Chateau Margaux had its chateau prefix stripped, then the residual margaux was used as a fallback winery slug - which on Vivino resolves to an unrelated winery whose explore matches passed the relevance probe by sharing only chateau and margaux words with the query.
New constant REGION_NAMES_BLOCKLIST in src/constants.ts: lowercase, accent-stripped region names that must not be used as winery-slug fallbacks. Expandable on observation of similar false-positives.
Strategy 1 relevance probe now uses min(2, qw.length) as the per-match threshold instead of a hard-coded >= 2. Previously, when buildQueryWords returned a single distinctive word (e.g. "Chateau Margaux" -> qw = ['margaux'] because chateau is filtered as a WINERY_PREFIX stop word), the probe was mathematically impossible to pass, so the correct winery slug (chateau-margaux -> winery 1319) was rejected and the actor fell through to Strategy 2, which surfaced "Andrena Margaux" (winery "Château Le Coteau") as the documented false positive. The relaxed threshold allows single-word matches to land on the right producer while keeping the protective behavior for multi-word queries (e.g. the T12 "Jean Other Specific Cuvee" path still requires 2/4 matching words).
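The two changes in this release can be sketched roughly as follows. This is a minimal illustration, not the Actor's code: `relevanceThreshold`, `passesRelevanceProbe`, and `fallbackSlug` are hypothetical names, and the blocklist shown is a four-entry subset of the real `REGION_NAMES_BLOCKLIST`.

```typescript
// Illustrative subset only; the real constant in src/constants.ts has 40+ entries.
const REGION_NAMES_BLOCKLIST = new Set<string>(["margaux", "pomerol", "beaune", "pauillac"]);

// Hypothetical helper: how many query words a candidate must share.
// min(2, n) keeps the 2-word bar for long queries but lets 1-word queries pass.
function relevanceThreshold(queryWords: string[]): number {
  return Math.min(2, queryWords.length);
}

// Hypothetical probe: counts distinctive query words present in the candidate name.
function passesRelevanceProbe(queryWords: string[], candidateName: string): boolean {
  if (queryWords.length === 0) return false; // assumption: nothing to match means no match
  const candidateWords = candidateName.toLowerCase().split(/\s+/);
  let hits = 0;
  for (const word of queryWords) {
    if (candidateWords.indexOf(word) !== -1) hits++;
  }
  return hits >= relevanceThreshold(queryWords);
}

// Hypothetical fallback: a lone residual word becomes a winery slug
// only when it is not a known region name.
function fallbackSlug(residualWords: string[]): string | null {
  if (residualWords.length !== 1) return null;
  const word = residualWords[0];
  return REGION_NAMES_BLOCKLIST.has(word) ? null : word;
}
```

With a hard-coded threshold of 2, a query reduced to `['margaux']` could never collect two hits; with `min(2, qw.length)` a single hit is enough.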
v0.4.12 (2026-06-01)
Fixed (SEV-5 silent data loss)
Strategy 2 paths (s2-entries, s2-shortened) no longer drop 9 dataset fields on output. The actor previously synthesized a stub VivinoExploreMatch from a fully-parsed ScrapedWine, then re-mapped it via exploreMatchToScrapedWine. That round-trip lost appellation, grape_varieties, food_pairings, wine_description, alcohol, image_url, and vintage, reset wine_type to 'Unknown', and emitted vintageId=0 instead of null. The cyclic round-trip is replaced by passing the parsed ScrapedWine directly through SearchWineByNameResult.scrapedWine. All HTML-derived fields now propagate to the dataset rows.
vintageId nullable contract restored (no more 0 sentinel for unknown vintage; aligns with the [integer, null] type declared in dataset_schema.json).
Root cause: Apify SDK v3.7.2 pushDataAndCharge (apify/dist/charging.js:407-412) auto-schedules BOTH the explicit event (wine-result) AND the synthetic apify-default-dataset-item event when pushing to the default dataset. mergeChargeResults sums chargedCount across both. So every successful Actor.pushData(item, 'wine-result') returned chargedCount = 2 (1 wine-result + 1 synthetic). The pre-fix counter accumulated raw chargedCount, producing charged = pushed * 2 on every run.
Fix: src/run.ts:336 now collapses chargedCount > 0 to a +1 increment. This restores the original observability intent (track "rows actually billed" for revenue-leak detection — Sprint 5 goal) by ignoring the synthetic dataset event.
No revenue impact: the Apify Platform only bills configured-price events (charging.js:290-294 skips synthetic events carrying the apify- prefix). Only wine-result × $0.003 was ever charged.
Verified by replaying the build 0.4.10 parity run: 10 pushes returned chargedCount = 2 each, and the summary showed charged: 20. After the fix, the same scenario will report charged: 10.
155 unit tests (+1 explicit "SDK doubling collapse" test that mocks chargedCount = 2 per call and asserts charged === pushed).
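The counter fix described above can be sketched as follows. This is an assumption-laden illustration: `ChargeResult` models only the `chargedCount` field named in this entry, and `billedRows` / `countCharged` are hypothetical helper names, not the code at src/run.ts:336.

```typescript
// Hypothetical shape: only the chargedCount field mentioned in the changelog is modeled.
interface ChargeResult {
  chargedCount: number;
}

// Collapse any positive chargedCount to one billed row, so the synthetic
// dataset event merged in by the SDK no longer doubles the counter.
function billedRows(result: ChargeResult): number {
  return result.chargedCount > 0 ? 1 : 0;
}

// Sum billed rows across all pushes of a run.
function countCharged(results: ChargeResult[]): number {
  let charged = 0;
  for (const result of results) {
    charged += billedRows(result);
  }
  return charged;
}
```

Under this scheme, ten pushes that each return `chargedCount = 2` yield a `charged` value of 10 rather than 20, and a push that returns 0 still registers as unbilled.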
v0.4.9 (2026-05-28)
Sprint 6 of the post-audit roadmap: Penfolds Grange Strategy 2 deep-dive. Closes a known parity gap where the user query "Penfolds Grange Bin 95 Shiraz" produced a wrong-wine row (Bin 128 Shiraz) instead of the correct Penfolds Grange.
Root cause (confirmed empirically via Firecrawl probe of the live Vivino search page): Vivino's search ranker, when fed the long historical name, returns ~22 Penfolds Bin variants and ZERO Grange. The same search with the shorter form Penfolds Grange correctly surfaces Grange (wineId 1136930).
New: candidateMatchesQueryCuvee(name, winery, query) in scoring.ts - softer than distinctiveWordsFilter, asserts only that the user query's first distinctive word (the cuvee kernel, e.g., 'grange') appears in the candidate's name. Used to reject candidates that pass score thresholds but miss the cuvee.
New: Strategy 2.2 (processWineEntries on search-page entries) now runs the cuvee check on its top result. Candidates that fail fall through to Strategy 2.3 instead of being emitted as wrong-wine rows.
New: Strategy 2.3 - shortened-query retry. When Strategy 2.2 returns a candidate without the cuvee word, the search page is re-queried with progressively shorter forms (drop trailing word, max 3 attempts, minimum 2 words). Candidates are still scored against the ORIGINAL query so we never demote a valid full-query match.
New: _strategy value s2-shortened for rows surfaced via Strategy 2.3; counted in last-run-summary.output.strategies and in the Strategy 2 fallback aggregate.
Verified non-regressing on all 5 audit cases: Chateau Margaux, Domaine Leflaive Puligny-Montrachet, Chateau Petrus, Sassicaia Tenuta San Guido, Tignanello Antinori. The new cuvee check only fires when distinctive words exist after the winery is excluded.
Regression closed: the TS port was emitting a false-positive Penfolds row (Bin 128) where the monolith correctly returned no row. Sprint 6 restores the parity floor and goes one row beyond it (Grange surfaced via 2.3).
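The cuvee check and the shortened-query retry can be sketched as below. This is a simplified stand-in under stated assumptions: `toWords`, `matchesCuveeKernel`, and `shortenedQueries` are hypothetical names, and the real `candidateMatchesQueryCuvee` in scoring.ts may treat stop words and distinctive words differently.

```typescript
// Lowercase, strip accents, and split on anything that is not a letter or digit.
function toWords(text: string): string[] {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// Simplified cuvee check: the first query word that is not part of the winery
// name (the cuvee kernel) must appear in the candidate's name.
function matchesCuveeKernel(name: string, winery: string, query: string): boolean {
  const wineryWords = toWords(winery);
  const kernel = toWords(query).filter((w) => wineryWords.indexOf(w) === -1)[0];
  if (kernel === undefined) return true; // nothing distinctive left, so the check does not fire
  return toWords(name).indexOf(kernel) !== -1;
}

// Progressively shorter forms: drop the trailing word, at most maxAttempts
// times, never going below minWords words.
function shortenedQueries(query: string, maxAttempts = 3, minWords = 2): string[] {
  const words = query.trim().split(/\s+/);
  const forms: string[] = [];
  while (forms.length < maxAttempts && words.length > minWords) {
    words.pop();
    forms.push(words.join(" "));
  }
  return forms;
}
```

For "Penfolds Grange Bin 95 Shiraz" this yields three retries ending at "Penfolds Grange", while a two-word query such as "Chateau Margaux" produces no retries. Each retry's candidates would still be scored against the original query, as the entry above notes.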
Sprint 5 of the post-audit roadmap: schema-level cleanups + observability honesty.
Internal: _strategy diagnostic field stripped from every public dataset row at push time (was previously emitted but undeclared in dataset_schema.json). The aggregate strategy distribution stays available in last-run-summary.output.strategies.
Internal: status message and last-run-summary now distinguish pushed (rows in the dataset) from charged (rows successfully billed via PPE). On runs with zero charge failures these two numbers are identical; when they diverge it is a real revenue-leak signal worth investigating.
Schema: dataset_schema.json:fields now declares nullable types (["string","null"] etc.) for every field that the runtime can actually emit as null (winery, vivino_url, appellation, average_rating, price, alcohol, image_url, wineId, vintageId, inputSource, shipTo, vintage, wine_description, taste_profile, reviews, searchQuery). Clients that validate the schema against the rows no longer get type-mismatch warnings.
Schema: key_value_store_schema.json now embeds a full jsonSchema for the last-run-summary blob so the Apify Console auto-documents the structure and any future shape drift in the code is caught.
147 unit tests (no count change; rewrote 1 test to assert _strategy absence + 1 test to assert pushed/charged split).
v0.4.7 (2026-05-28)
Sprint 4 of the post-audit roadmap: observability + documentation.
New: per-run JSON summary written to Key-Value Store under key last-run-summary at the end of every run. Captures timing, input shape, strategy distribution, cache stats, and resilience counters. Useful for offline drift detection and capacity planning.
New: key_value_store_schema.json declares the last-run-summary collection so the Apify Console surfaces it next to OUTPUT.
Docs: README now documents the error / errorMessage fields, the diagnostic _strategy field, and the Key-Value Store run summary. Includes a sample error row and a sample summary blob.
147 unit tests (+1 covering the per-run summary KV write).
Note: adoption of the speculative /api/wines/{id} JSON endpoint (originally on the Sprint 4 plan) was deferred to a dedicated research sprint - the endpoint shape needs validation against the live Vivino API with debug-tagged probes before going to production.
v0.4.5 (2026-05-28)
Sprint 3 of the post-audit roadmap.
New: atomic Actor.pushData(item, 'wine-result') shortcut (Apify v3.4+). Closes the brief push-without-charge gap that existed when push and charge were separate calls.
New: run-level counters for HTTP 429 trips and Strategy 2 fallbacks now surface in the final status message alongside cache hits/misses and charge failures.
Internal: hoisted WINERY_SUFFIXES constant out of the per-entry loop in processWineEntries.
Cleanup: deleted the previously dead exploreToScrapedWine export and its 11 tests; deleted the unused SearchPageResult interface.
Tooling: new scripts/canary-vivino.sh runs the live Actor against a 3-wine canary input (one per strategy) and asserts that core fields are still present. Designed to be wired as a daily Apify scheduled task for early detection of Vivino API/HTML drift.
146 unit tests (down from 157 after removing tests covering the deleted dead export; net delta: removed 11 dead-code tests).
v0.4.3 (2026-05-28)
Sprint 2 of the post-audit roadmap: Phase 4 hygiene + 2 quality fixes + diagnostic field.
Build: Dockerfile now sets NODE_ENV=production and uses npm ci against a committed package-lock.json (reproducible installs).
Build: actor.json metadata updated post TypeScript migration (templateId is now ts-empty, minMemoryMbytes bumped from 128 to 256 for Node 22 + cheerio reliability, defaultRunOptions now pins build: latest and memoryMbytes: 512).
UI: dataset now exposes a third view "Errors" with columns searchQuery, inputSource, error, errorMessage, scrapedAt.
Fix: winery now returns null (instead of the placeholder 'Unknown') when extraction fails on a successful row. This restores parity with the JS monolith and stops the downstream quality filter from treating 'Unknown' as a real producer name. Error rows still use 'Unknown' for display.
Fix: Strategy 2.1 slug discovery now uses the chosen wine-entries set (script entries when present, link entries otherwise) instead of always reading link entries only. Closes a recall gap on winery prefixes that only surface in React data-props.
Diag: every dataset row now carries a _strategy field (url, s1, s2-discovered-slug, s2-entries, error) so operators can trace which code path produced each result. Diagnostic-only - safe to ignore in client integrations.
New: 3-strategy name search restored end-to-end (winery slug fan-out, HTML search page, wine entries fallback)
New: discoverWinerySlugFromSearchEntries retries Strategy 1 with a slug discovered from the search page
New: error rows now produced for failed lookups instead of silent drops (every dataset row is billable, per the pay_per_event schema)
New: /wines/{id} URLs follow Vivino's redirect to recover the canonical seoName
New: shared rate-limiter across API + HTML fetches (a 429 on one channel now backs off the other)
New: 500-1500 ms jittered sleep between wines reduces 429 risk on bursts
New: duplicate inputs (same wineId + vintage) produce one row + one charge instead of N
New: cross-strategy non-winery-word coverage check falls through to the search page when Strategy 1 returns a weak match
New: charge failure count surfaced in the final status message
151 unit tests across cache, parser, api-client, html-client, scoring, search, wine-entries, run
No changes to input/output schemas: existing customer integrations continue to work unchanged
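The jittered pause between wines can be pictured with a small helper. This is a hedged sketch: `jitteredDelayMs` is a hypothetical name, and the random source is injectable only so the bounds are easy to test.

```typescript
// Returns a delay in [minMs, maxMs), matching the 500-1500 ms window above.
// `random` defaults to Math.random and is injectable for deterministic tests.
function jitteredDelayMs(
  random: () => number = Math.random,
  minMs = 500,
  maxMs = 1500,
): number {
  return Math.floor(minMs + random() * (maxMs - minMs));
}
```

Randomizing the gap avoids a fixed request cadence, which is the usual reason for adding jitter to a polite scraper.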
v0.2.120 (2026-05-22)
Fixed
average_rating is no longer silently 0 when Vivino has rated the wine. The HTML rating extractor was looking for a field name (average_rating) that Vivino does not use in its inline data; the actual field is ratings_average (plus wine_ratings_average for wine-level fallback). When extraction failed, the output reported average_rating: 0 together with a non-zero ratings_count, suggesting a real "zero star" rating that did not exist. The scraper now scans the correct field names (with the legacy spelling as a last-chance fallback) and prefers the wine-level rating when the vintage-level rating is unavailable. Wines with no rating data return average_rating: null so consumers can distinguish "unrated" from "actually zero".
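The field fallback order can be sketched as follows. It assumes the inline data is already parsed into a plain object; `extractAverageRating` is a hypothetical name, and the real extractor works on HTML and may apply extra validation.

```typescript
// Field names in priority order: vintage-level, wine-level, then the legacy spelling.
const RATING_FIELDS = ["ratings_average", "wine_ratings_average", "average_rating"];

// Returns the first numeric rating found, or null when no field carries one,
// so "unrated" is never reported as 0.
function extractAverageRating(data: Record<string, unknown>): number | null {
  for (const field of RATING_FIELDS) {
    const value = data[field];
    if (typeof value === "number" && isFinite(value)) {
      return value;
    }
  }
  return null;
}
```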
v0.2.119 (2026-05-21)
Fixed
Common French wine abbreviations are now expanded before matching. Catalog entries like NSG, Gds, VV, BLC, RGE, 1er, P.C., G.C., CDP, CDR, SGN, VT, MC (monopole) are expanded to their full form (Nuits-Saint-Georges, Grandes, Vieilles Vignes, Blanc, Rouge, Premier, Premier Cru, Grand Cru, Chateauneuf-du-Pape, Cotes du Rhone, Selection de Grains Nobles, Vendanges Tardives, Monopole) so the quality filter recognizes them in Vivino result names. Recovers previously-rejected wines such as Mugneret Gerard NSG Aux Cras (now matches Nuits-Saint-Georges 1er Cru 'Aux Cras').
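A token-by-token expansion along these lines would cover the listed abbreviations. This is a sketch only: `expandAbbreviations` is a hypothetical name, and case-insensitive lookup on whitespace-separated tokens is an assumption about how the real matcher behaves.

```typescript
// Abbreviation table taken from the entry above; keys are stored uppercase.
const ABBREVIATIONS = new Map<string, string>([
  ["NSG", "Nuits-Saint-Georges"],
  ["GDS", "Grandes"],
  ["VV", "Vieilles Vignes"],
  ["BLC", "Blanc"],
  ["RGE", "Rouge"],
  ["1ER", "Premier"],
  ["P.C.", "Premier Cru"],
  ["G.C.", "Grand Cru"],
  ["CDP", "Chateauneuf-du-Pape"],
  ["CDR", "Cotes du Rhone"],
  ["SGN", "Selection de Grains Nobles"],
  ["VT", "Vendanges Tardives"],
  ["MC", "Monopole"],
]);

// Replace each whitespace-separated token that is a known abbreviation
// with its full form; all other tokens pass through unchanged.
function expandAbbreviations(query: string): string {
  return query
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const full = ABBREVIATIONS.get(token.toUpperCase());
      return full === undefined ? token : full;
    })
    .join(" ");
}
```

For the catalog entry quoted above, `expandAbbreviations("Mugneret Gerard NSG Aux Cras")` gives "Mugneret Gerard Nuits-Saint-Georges Aux Cras".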
v0.2.118 (2026-05-21)
Fixed
Appellation markers (AOC, AOP, IGP, VDF, VDP, DOC, IGT, and others) and 4-digit years are now stripped from the search query. They are derived from the vintage field already, so leaving them in the search text just polluted Vivino's matcher and confused the quality filter into rejecting good results. Knock-on effect: previously-rejected wines like Clos de la Hutte (Thibaud Boudignon Savennières) and Marc Kreydenweiss Clos du Val D'Eléon should now match.
Family-relation words in producer names (Fils, Fille, Frères, Père) are now treated as non-distinctive when comparing the query against a match. This stops false rejections on wines like Boulard & Fille Les Murgiers and Dehours & Fils Grande Réserve.
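The query clean-up in this release might look roughly like the sketch below. `cleanSearchQuery` is a hypothetical name, and the marker set holds only the seven markers named above (the entry says there are others).

```typescript
// Markers named in the changelog entry; the real list is longer.
const APPELLATION_MARKERS = new Set<string>(["aoc", "aop", "igp", "vdf", "vdp", "doc", "igt"]);

// Drop appellation markers and bare 4-digit years from a search query.
function cleanSearchQuery(query: string): string {
  return query
    .split(/\s+/)
    .filter((token) => {
      // Compare on a lowercase form without dots, commas, or semicolons.
      const bare = token.toLowerCase().replace(/[.,;]/g, "");
      if (bare.length === 0) return false;
      if (APPELLATION_MARKERS.has(bare)) return false;
      if (/^\d{4}$/.test(bare)) return false;
      return true;
    })
    .join(" ");
}
```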
v0.2.117 (2026-05-21)
Fixed
Catalog region prefixes are now stripped before searching. Inputs that begin with a regional or category label followed by a period — e.g. Champagne Blanc., L'Alsace., La Corse., Le Rhône Septentrional., La Côte Chalonnaise et le Mâconnais. — have that prefix removed automatically. The remaining text is what gets searched on Vivino, so the producer and wine name become the leading tokens (e.g. Franck Pascal Fluence Brut Nature instead of Champagne Blanc. S.A. Franck Pascal "Fluence" Brut Nature 50). Substantially improves match rate on wine-list-style catalog exports.
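One simple way to picture the prefix removal is a lookup against known labels. This is a sketch under a loud assumption: the changelog does not say whether the Actor uses a fixed list or a pattern, so `CATALOG_PREFIXES` (built from the five examples above) and `stripCatalogPrefix` are hypothetical.

```typescript
// Hypothetical list, built only from the examples quoted in the entry above.
const CATALOG_PREFIXES = [
  "Champagne Blanc.",
  "L'Alsace.",
  "La Corse.",
  "Le Rhône Septentrional.",
  "La Côte Chalonnaise et le Mâconnais.",
];

// Remove a leading catalog label (case-insensitive) and return the rest.
function stripCatalogPrefix(input: string): string {
  const trimmed = input.trim();
  const lower = trimmed.toLowerCase();
  for (const prefix of CATALOG_PREFIXES) {
    if (lower.indexOf(prefix.toLowerCase()) === 0) {
      return trimmed.slice(prefix.length).trim();
    }
  }
  return trimmed;
}
```

This step removes only the label; the legal-entity marker, quotes, and trailing price in the example are handled by the clean-up introduced in v0.2.116.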
v0.2.116 (2026-05-21)
Fixed
Catalog-style 2-column tab inputs (e.g. Champagne Blanc. S.A. Franck Pascal "Fluence" Brut Nature 50<TAB>NV) are now parsed correctly. The description column is used as the search query, the vintage column accepts both years and NV, and stray catalog elements are stripped before searching: curly quotes (“ ” ‘ ’), legal-entity markers (S.A., SCEA, SARL, SAS, EARL, GAEC), and a trailing price.
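The two-column parsing can be sketched as below. `CatalogRow` and `parseCatalogLine` are hypothetical names, and treating any trailing bare number as a price is an assumption made for the sake of a short example.

```typescript
interface CatalogRow {
  query: string;
  vintage: string | null; // a 4-digit year, "NV", or null when absent
}

// Legal-entity markers listed in the entry above.
const LEGAL_ENTITY_MARKERS = new Set<string>(["S.A.", "SCEA", "SARL", "SAS", "EARL", "GAEC"]);

function parseCatalogLine(line: string): CatalogRow {
  const columns = line.split("\t");
  const description = columns[0];
  const rawVintage = columns.length > 1 ? columns[1].trim() : "";
  const vintage = /^(\d{4}|NV)$/i.test(rawVintage) ? rawVintage.toUpperCase() : null;

  const tokens = description
    .replace(/["“”‘’]/g, "") // strip straight and curly quotes
    .split(/\s+/)
    .filter((token) => token.length > 0 && !LEGAL_ENTITY_MARKERS.has(token));

  // Assumption: a trailing bare number (e.g. "50" or "12,50") is a price.
  if (tokens.length > 1 && /^\d+([.,]\d+)?$/.test(tokens[tokens.length - 1])) {
    tokens.pop();
  }
  return { query: tokens.join(" "), vintage };
}
```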
v0.2.115 (2026-05-21)
Fixed
Common first names no longer cause wrong producer matches. For queries like Jean-Claude Bachelet, Chassagne-Montrachet, La Boudriotte Rouge, the scraper could previously return a wine from Jean-Claude Ramonet because the first two words ("jean", "claude") matched. The validator now also requires the producer's family name (last significant word in the winery) to appear in the query.
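The family-name requirement can be illustrated with a short check. This sketch assumes the last word of the winery name is the "last significant word"; `toWords` and `familyNameInQuery` are hypothetical names, and the real validator likely skips suffixes and stop words first.

```typescript
// Lowercase, strip accents, and split on anything that is not a letter or digit.
function toWords(text: string): string[] {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// True only when the winery's final word (assumed to be the family name)
// also appears in the user's query.
function familyNameInQuery(winery: string, query: string): boolean {
  const wineryWords = toWords(winery);
  if (wineryWords.length === 0) return false;
  const familyName = wineryWords[wineryWords.length - 1];
  return toWords(query).indexOf(familyName) !== -1;
}
```

With the query from the entry above, a winery named "Jean-Claude Ramonet" is rejected because "ramonet" never appears in the query, even though "jean" and "claude" do.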
v0.2.114 (2026-05-21)
Fixed
Match accuracy on queries with punctuation: wine names containing commas or other punctuation (e.g. Domaine des Comtes Lafon, Meursault Premier Cru, Charmes) are now correctly tokenized, restoring matches that were previously rejected by the validator due to trailing characters attached to words. Expected effect: significantly higher extraction success rate on real catalog inputs.
v0.2.113 (2026-05-21)
Fixed
Mismatched wines no longer leak through with the search query as winery name. When the wine page extraction did not yield a producer, the previous code substituted the search query into the winery field, which also disabled the downstream quality filter and let unrelated wines be returned. The filter now activates correctly on these cases and rejects results whose distinctive query words do not appear in the matched wine. Side effect: fewer null prices in the output, since most of the missing prices came from these mismatched results.
v0.2.112 (2026-05-21)
Changed
Include Taste Profile is now unchecked by default. Both enrichment options (Include Taste Profile and Include Reviews) start unchecked so the cheapest, fastest configuration is the first thing users see. Enable either checkbox to fetch the corresponding data.
v0.2.111 (2026-05-20)
Changed
Output tab simplified: results remain available in the default dataset, accessible via the Dataset tab or the standard Apify dataset API.
v0.2.110 (2026-05-19)
Added
Result caching: taste profiles and user reviews are cached so repeat runs on the same wines are faster and place less load on Vivino. Cache statistics appear in the final run status message.
v0.2.108 (2026-05-19)
Changed
README now documents the ACTOR_MAX_TOTAL_CHARGE_USD setting users can configure in Run options to cap per-run spending.
v0.2.107 (2026-05-19)
Added
Output tab populated with direct access patterns for the dataset (overview, detailed, full JSON, CSV).
v0.2.106 (2026-05-19)
Changed
Store description rewritten for broader feature coverage.
README restructured and shortened for clarity.
Input form field descriptions reworded with consistent phrasing.
error and errorMessage dataset fields documented as null on success.
Fixed
Memory recommendation is now consistent across all README sections (512 MB default, 1024 MB for very large batches of 5,000+ wines).
v0.2.105 (2026-05-19)
Changed
wines field prefill in the Console form now showcases the 6 accepted input shapes pedagogically: wine name only, name with vintage, Vivino URL (canonical), Vivino URL with ?year= parameter, scheme-less Vivino URL (auto-prefixed with https://), and Vivino URL with locale path (/fr/). Helps users discover the breadth of accepted formats without reading the docs.
v0.2.104 (2026-05-19)
Changed
Input schema redesign: replaced wineUrls + wineNames with a single wines array that auto-routes URLs and names by shape. Mix freely; the actor handles routing.
Removed maxResultsPerSearch (now hardcoded to 1 -- best match only) and proxyConfig (was unused).
Flat Console layout: removed all sectionCaption/sectionDescription keys. All inputs are visible without expanding sections.
Backwards compatibility
Legacy wineUrls and wineNames keys still work silently. Existing scheduled tasks and API integrations continue running without change. No deprecation warning.
v0.2.103 (2026-05-18)
Added
## Is it legal to scrape Vivino? canonical H2 (hiQ Labs reference, GDPR caveat for EU review data, Apify blog link)
## Support & feedback closing block at end of README with Issues tab link
FAQ entry "Can I use this scraper via MCP?"
Changed
Canonical name unified to "Vivino Wine Data Scraper" across README H1, actor.json title, and seoTitle (was 3 different forms)
Wine emoji 🍷 removed from seoTitle
Pricing: Starter plan $49/month → $29/month (~9,600 wines/month)
H2 "Which Vivino actor should I use?" → "Which Vivino scraper should I use?"
Console URL console.apify.com/settings/integrations now includes ?fpr=mrbridge
Removed etc. from L49 wine-type listing and from dataset_schema.json wine_type description
Removed all em-dashes (zero em-dash policy 2026-05-18)
v0.2.102 (2026-05-13)
Documentation: corrected throughput claim (5-50 wines/min depending on enrichment, was "50-100"), increased recommended memory to 512 MB default / 1024 MB for large batches (was 256/512), bumped Actor default memory to 512 MB
v0.2.101 (2026-05-12)
Resilience: replaced Promise.all with Promise.allSettled in dual-search (with/without shipTo) and enrichment (taste+reviews) so a transient Vivino failure on one fetch no longer drops the whole wine. Charge errors now logged as warnings with a per-run counter exposed in the final status message (silent revenue loss becomes observable).
v0.2.99 (2026-05-07)
Remove optional fields from output when not requested: reviews field omitted when includeReviews: false, taste_profile and food_pairings omitted when includeTasteProfile: false. Eliminates empty columns in exported data.
v0.2.98 (2026-05-07)
Add rate limit handling: retry+backoff on HTTP 429 for HTML pages (fetchHtml), global rate limit cooldown shared across all concurrent requests, break winery slug loop on persistent 429. Fixes large batch runs (34+ wines) where queries from the 26th onward failed due to Vivino rate limiting (29/34 succeeded with the fix vs 18/34 without).
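A shared cooldown with exponential backoff could be structured as in the sketch below. Everything here is hypothetical: the changelog does not state the backoff schedule, so `backoffMs`, its 1 s base and 30 s cap, and the `SharedCooldown` class are illustrative assumptions. The clock is passed in explicitly to keep the logic deterministic.

```typescript
// Exponential backoff: base * 2^attempt, capped. Values are assumptions.
function backoffMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// One instance shared by every fetcher, so a 429 seen by any request
// delays all the others until the cooldown expires.
class SharedCooldown {
  private resumeAtMs = 0;

  // Record a 429 observed at nowMs; never shortens an existing cooldown.
  trip(nowMs: number, cooldownMs: number): void {
    this.resumeAtMs = Math.max(this.resumeAtMs, nowMs + cooldownMs);
  }

  // How long a request starting at nowMs should wait before sending.
  waitMs(nowMs: number): number {
    return Math.max(0, this.resumeAtMs - nowMs);
  }
}
```

A caller would check `waitMs(Date.now())` before each request and call `trip(Date.now(), backoffMs(attempt))` on every 429 response.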
v0.2.97 (2026-05-07)
Change default maxResultsPerSearch from 5 to 1 (best match only). Users pay per result, so default should minimize cost; set higher explicitly for alternatives.
v0.2.96 (2026-05-07)
Fix duplicates & maxResults: global dedup by wineId+vintage (same wine different vintage = distinct result, same wine same vintage = deduped) + maxResults alias takes priority over schema default + slice results after double-search merge.
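The dedup rule (same wine and vintage collapses, same wine with a different vintage stays distinct) can be sketched like this. `WineRow` is a hypothetical, trimmed-down row shape and `dedupeByWineAndVintage` a hypothetical helper name.

```typescript
// Hypothetical minimal row; real dataset rows carry many more fields.
interface WineRow {
  wineId: number;
  vintage: string | null;
  name: string;
}

// Keep the first row seen for each wineId + vintage pair, preserving order.
function dedupeByWineAndVintage(rows: WineRow[]): WineRow[] {
  const seen = new Set<string>();
  const unique: WineRow[] = [];
  for (const row of rows) {
    const key = row.wineId + ":" + (row.vintage === null ? "" : row.vintage);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(row);
  }
  return unique;
}
```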
v0.2.95 (2026-05-07)
Fix missing prices: fetch vintage-specific pages (?year=XXXX) + carry explore API prices through wineEntries fallback (HTML pages often lack pricing in JSON-LD).
v0.2.94 (2026-04-27)
Reorder export fields: searchQuery/name/winery/vintage/type first, then region/country/url, metadata last.
v0.2.93 (2026-04-27)
Winery validation: require ≥50% word match instead of any single word (fixes false positives on common first names like "Jean").
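A 50% word-overlap rule could be written as in the sketch below. The entry does not say which side the ratio is measured on, so this assumes it is the share of winery words found in the query; `toWords` and `wineryMatchesQuery` are hypothetical names.

```typescript
// Lowercase, strip accents, and split on anything that is not a letter or digit.
function toWords(text: string): string[] {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// Assumption: the ratio is (winery words present in the query) / (winery words).
function wineryMatchesQuery(winery: string, query: string, minRatio = 0.5): boolean {
  const wineryWords = toWords(winery);
  if (wineryWords.length === 0) return false;
  const queryWords = toWords(query);
  let hits = 0;
  for (const word of wineryWords) {
    if (queryWords.indexOf(word) !== -1) hits++;
  }
  return hits / wineryWords.length >= minRatio;
}
```

A lone shared first name such as "Jean" no longer clears the bar for a three-word winery name, since one hit out of three is below 50%.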