Whiskyfass.de Whisky Scraper

Pricing

Pay per usage

Scrapes 40+ fields per whisky product from whiskyfass.de — Germany's leading whisky retailer. Extracts pricing, ABV, age, cask type, tasting notes, distillery, region, and availability. Fast HTTP crawler, no login required.

Developer

ScrapySpider

Maintained by Community

Whiskyfass.de Data Scraper + Shopify Integration 🥃 → 🛒

This Actor scrapes comprehensive whisky product data from whiskyfass.de, a German whisky retailer, and optionally syncs products directly to your Shopify store via the Admin API.

🆕 Shopify Integration Features

  • Automatic Product Creation - Scraped products are automatically created in Shopify
  • Smart Updates - Updates existing products by SKU (no duplicates)
  • Inventory Management - Sets inventory levels automatically
  • Product Mapping - Maps whisky data to Shopify format (prices, images, tags)
  • Error Handling - Graceful error recovery with detailed sync reports
  • Flexible Modes - Create-only, update-only, or upsert (create or update)
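
To give a rough idea of what the product mapping might involve, here is a minimal sketch that turns one scraped record into a payload shaped like Shopify's REST Admin API product resource. The function name `toShopifyProduct` and the specific mapping choices (which fields become tags, how the vendor is picked) are assumptions for illustration and not the Actor's actual code.

```javascript
// Hypothetical mapper: scraped record -> Shopify-style product payload.
// Payload keys follow Shopify's product resource; the mapping choices
// (tags, vendor fallback, compare_at_price rule) are assumptions.
function toShopifyProduct(record) {
  const tags = [record.region, record.country, record.spirit_type, record.cask_type]
    .filter(Boolean); // drop missing values

  const variant = {
    sku: record.shop_sku,
    price: record.price_current.toFixed(2),
  };
  // Only expose a "compare at" price when the old price is actually higher.
  if (typeof record.price_old === 'number' && record.price_old > record.price_current) {
    variant.compare_at_price = record.price_old.toFixed(2);
  }

  return {
    product: {
      title: record.name,
      body_html: record.description || '',
      vendor: record.brand || record.distillery || '',
      tags: tags.join(', '),
      variants: [variant],
      images: (record.images || []).map((src) => ({ src })),
    },
  };
}
```

Keeping the SKU on the variant is what would make SKU-based updates possible later, since the SKU is the stable key shared by both systems.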

Features

  • Comprehensive Product Data - Extracts 40+ fields per product including pricing, availability, images, and metadata
  • Tasting Notes Extraction - Intelligent parser for nose, palate, finish, and appearance descriptions with flavor descriptors
  • Multi-language Support - Handles both German and English content with language detection
  • Smart Data Normalization - Converts units (ml, cl, l), normalizes ABV percentages, and standardizes pricing
  • Flavor Pattern Detection - Uses regex patterns to identify 100+ flavor keywords across fruits, spices, wood, and more
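
As an illustration of the unit and ABV normalization mentioned above, the following sketch parses German-style strings such as "0,7 l" or "46,3 %". The helper names are hypothetical; the Actor's real implementation lives in src/helpers.js and may handle more formats.

```javascript
// Hypothetical helpers; the Actor's real ones live in src/helpers.js and may differ.

// Parse a localized number such as "0,7" or "46.3".
function parseLocalizedNumber(text) {
  return Number.parseFloat(String(text).replace(',', '.'));
}

// "0,7 l", "70 cl", "700 ml" -> millilitres (integer), or null if unparseable.
function parseBottleSizeMl(text) {
  const match = /(\d+(?:[.,]\d+)?)\s*(ml|cl|l)\b/i.exec(text);
  if (!match) return null;
  const factor = { ml: 1, cl: 10, l: 1000 }[match[2].toLowerCase()];
  return Math.round(parseLocalizedNumber(match[1]) * factor);
}

// "40 % vol", "46,3%" -> ABV as a number, or null if unparseable or implausible.
function parseAbvPercent(text) {
  const match = /(\d{1,2}(?:[.,]\d+)?)\s*%/.exec(text);
  if (!match) return null;
  const abv = parseLocalizedNumber(match[1]);
  return abv > 0 && abv <= 100 ? abv : null;
}
```

Returning null instead of guessing keeps bad input visible in the dataset, which makes selector or format changes on the site easier to spot.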

🚀 Quick Start with Shopify

1. Create Shopify Custom App

  1. Go to Shopify Admin → Settings → Apps and sales channels
  2. Click "Develop apps" → "Create an app"
  3. Enable these API scopes: write_products, read_products, write_inventory, read_inventory
  4. Install the app and copy your Admin API Access Token

2. Test Your Connection

Edit test-shopify.js with your credentials and run:

$ node test-shopify.js

This will verify your connection and create/delete a test product.
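
The contents of test-shopify.js are not shown here. As a hedged sketch of what a read-only connection check could look like, the function below builds a request against the Admin API's shop endpoint using the `X-Shopify-Access-Token` header. The function name and the default `apiVersion` value are assumptions; substitute an API version your store supports.

```javascript
// Build (but do not send) a request that reads the shop's metadata.
// apiVersion is an assumption; use whichever Admin API version your store supports.
function buildShopCheckRequest(shopName, accessToken, apiVersion = '2024-01') {
  if (!/^[a-z0-9][a-z0-9-]*\.myshopify\.com$/i.test(shopName)) {
    throw new Error(`Expected a *.myshopify.com domain, got "${shopName}"`);
  }
  return {
    url: `https://${shopName}/admin/api/${apiVersion}/shop.json`,
    options: {
      method: 'GET',
      headers: {
        'X-Shopify-Access-Token': accessToken,
        'Content-Type': 'application/json',
      },
    },
  };
}

// Usage (Node.js 18+ ships a global fetch):
// const { url, options } = buildShopCheckRequest('your-store.myshopify.com', 'shpat_xxxxx');
// const response = await fetch(url, options);
// console.log(response.status); // a 200 status suggests the shop name and token are valid
```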

3. Run with Shopify Enabled

apify run --input '{
"startUrls": [{"url": "https://whiskyfass.de/whisky"}],
"enableShopify": true,
"shopifyShopName": "your-store.myshopify.com",
"shopifyAccessToken": "shpat_xxxxx",
"shopifyAction": "upsert"
}'
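
The effect of `shopifyAction` can be pictured as a small decision per product, keyed on whether its SKU already exists in the store. This is a sketch under assumptions: only the value "upsert" appears in the input above, so the literals 'create' and 'update' and the function name `decideSyncAction` are hypothetical.

```javascript
// Decide what to do with one scraped product.
// 'upsert' appears in the documented input; 'create' and 'update' are assumed names
// for the create-only and update-only modes.
function decideSyncAction(mode, sku, existingSkus) {
  const exists = existingSkus.has(sku);
  switch (mode) {
    case 'create': return exists ? 'skip' : 'create';
    case 'update': return exists ? 'update' : 'skip';
    case 'upsert': return exists ? 'update' : 'create';
    default: throw new Error(`Unknown shopifyAction: ${mode}`);
  }
}

// Usage: existingSkus would be a Set of SKUs already present in the store.
// decideSyncAction('upsert', '12345', new Set(['12345'])) returns 'update'.
```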

📖 Detailed guide: See stratgy.md for complete Shopify documentation.

Technical Stack

  • BasicCrawler - HTTP-based crawler using axios + Cheerio for fast static HTML parsing
  • Cheerio - Fast, flexible HTML parsing library (jQuery-like API)
  • Router Pattern - Organized request handling with labeled routes
  • Apify SDK - Dataset storage and Actor lifecycle management
  • Input Schema - Validated Actor configuration

How it works

1. Input Configuration

The Actor accepts a startUrls array (defaults to https://whiskyfass.de/whisky) and begins crawling from the main whisky category page.

2. Three-Stage Crawling Process

Stage 1: Category Discovery (Default Handler)

  • Fetches the main whisky page
  • Extracts subcategory links from the navigation menu
  • Enqueues all subcategory URLs with label 'products'

Stage 2: Product Listing Pages (Products Handler)

  • Processes each subcategory page
  • Extracts all product links from the listing
  • Handles pagination to capture all products
  • Enqueues product detail URLs with label 'detail'

Stage 3: Product Detail Extraction (Detail Handler)

  • Fetches and parses individual product pages
  • Extracts 40+ fields including:
    • Basic Info: Name, brand, description, breadcrumbs, images
    • Pricing: Current price, old price, price per liter, currency
    • Product Details: EAN/SKU, bottle size, ABV, availability
    • Whisky Specs: Age, distillery, bottler, region, country, spirit type
    • Maturation: Cask type, cask finish, cask strength, chill filtration status

Project Structure

src/
├── main.js       # Actor entry point, crawler initialization
├── routes.js     # Router with 3 handlers (category, products, detail)
├── helpers.js    # Data extraction & normalization utilities (675 lines)
├── selectors.js  # CSS selectors for page elements
└── flavor.js     # Regex patterns for flavor keyword detection
.actor/
├── actor.json         # Actor metadata and configuration
└── input_schema.json  # Input validation schema
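
To make the three-stage flow from "How it works" more concrete, here is a dependency-free simulation of label-based routing. It is not the Crawlee API and not the code in src/routes.js: `crawl`, `fetchPage`, and the page shapes (`subcategoryLinks`, `productLinks`, `nextPage`) are hypothetical stand-ins, while the labels 'products' and 'detail' come from the description above.

```javascript
// Dependency-free simulation of label-based routing. The real Actor uses
// Crawlee's router; fetchPage and the page shapes here are hypothetical.
async function crawl(startUrl, fetchPage) {
  const queue = [{ url: startUrl, label: 'default' }];
  const seen = new Set([startUrl]);
  const products = [];

  // Add unseen URLs to the queue under the given label.
  const enqueue = (urls, label) => {
    for (const url of urls) {
      if (!seen.has(url)) {
        seen.add(url);
        queue.push({ url, label });
      }
    }
  };

  const handlers = {
    // Stage 1: category discovery
    default: (page) => enqueue(page.subcategoryLinks, 'products'),
    // Stage 2: listing pages, including pagination
    products: (page) => {
      enqueue(page.productLinks, 'detail');
      if (page.nextPage) enqueue([page.nextPage], 'products');
    },
    // Stage 3: detail extraction
    detail: (page, request) => products.push({ product_url: request.url, name: page.name }),
  };

  while (queue.length > 0) {
    const request = queue.shift();
    const page = await fetchPage(request.url);
    handlers[request.label](page, request);
  }
  return products;
}
```

The `seen` set plays the role a request queue's deduplication would play in a real crawler: a product linked from several listing pages is still fetched only once.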

Output Schema

Each product record contains these fields:

{
  product_url: "https://whiskyfass.de/...",
  canonical_url: "https://whiskyfass.de/...",
  subcategory_url: "https://whiskyfass.de/category",

  // Identifiers
  product_id: "4012345678901", // EAN or URL hash
  shop_sku: "12345",

  // Basic Info
  name: "Glenfiddich 12 Year Old",
  brand: "Glenfiddich",
  description: "...",
  breadcrumbs: ["Whisky", "Scotch", "Single Malt"],

  // Images
  main_image: "https://...",
  images: ["https://...", "https://..."],

  // Pricing & Availability
  price_current: 39.90,
  price_old: 45.00,
  price_per_liter: 57.00,
  currency: "EUR",
  availability_enum: "in_stock",

  // Product Specs
  bottle_size_ml: 700,
  abv_percent: 40.0,
  age_years: 12,
  cask_strength: false,
  non_chill_filtered: true,

  // Origin & Classification
  distillery: "Glenfiddich",
  country: "Scotland",
  region: "Speyside",
  spirit_type: "Single Malt Whisky",

  // Maturation
  cask_type: "Ex-Bourbon & Sherry Casks",

  // Tasting Notes
  nose: "Fresh pear, subtle oak...",
  palate: "Creamy with notes of...",
  finish: "Long and smooth...",
  nose_descriptors: ["pear", "oak", "vanilla"],
  palate_descriptors: ["cream", "malt", "spice"],
  tasting_notes_confidence: 1.0,

  // Metadata
  timestamp_utc: "2026-02-15T10:30:00.000Z",
  scrape_run_id: "abc123"
}

Prerequisites

  • Node.js 18+ installed
  • Apify CLI installed (npm install -g apify-cli)

Local Development

  1. Install dependencies:
$ npm install
  2. Run the Actor locally:
$ apify run
  3. Configure input in storage/key_value_stores/default/INPUT.json:
{
  "startUrls": [
    { "url": "https://whiskyfass.de/whisky" }
  ]
}
  4. View results in storage/datasets/default/
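
As a quick sanity check on scraped records, price_per_liter should agree with price_current and bottle_size_ml. The sketch below recomputes it for the sample values shown in this README (39.90 EUR for 700 ml); the helper name `pricePerLiter` is hypothetical and is not taken from the Actor's code.

```javascript
// Recompute price per liter from two other fields of a record.
function pricePerLiter(priceCurrent, bottleSizeMl) {
  if (!(bottleSizeMl > 0)) return null; // missing or zero size: nothing to compute
  const perLiter = (priceCurrent / bottleSizeMl) * 1000;
  return Math.round(perLiter * 100) / 100; // round to cents
}

const sample = { price_current: 39.90, bottle_size_ml: 700, price_per_liter: 57.00 };
const recomputed = pricePerLiter(sample.price_current, sample.bottle_size_ml);
console.log(recomputed === sample.price_per_liter); // prints true
```

A mismatch between the recomputed and scraped values is a cheap signal that either the price or the bottle size was parsed incorrectly.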

Tasting Notes Extraction

// Detects and extracts structured tasting notes
nose: "Fresh citrus, vanilla, light oak"
palate: "Honey sweetness, spice notes, dried fruits"
finish: "Long and warming with hints of smoke"
// Auto-extracts flavor keywords
nose_descriptors: ["citrus", "vanilla", "oak"]

Flavor Pattern Detection

The Actor uses regex patterns in src/flavor.js to identify 100+ flavor keywords across categories:

  • Fruits (citrus, apple, berry, etc.)
  • Spices (vanilla, cinnamon, pepper, etc.)
  • Sweet notes (honey, caramel, chocolate, etc.)
  • Wood & smoke (oak, peat, etc.)
  • Nuts & grains
  • Floral & fresh notes

Resources

If you're looking for examples or want to learn more, visit:

  • Crawlee + Apify Platform guide: https://crawlee.dev/docs/guides/apify-platform
  • BasicCrawler Documentation: https://crawlee.dev/api/core/class/BasicCrawler
  • Cheerio Documentation: https://cheerio.js.org/
  • Apify SDK for JavaScript: https://docs.apify.com/sdk/js
  • Node.js tutorials in Academy: https://docs.apify.com/academy/node-js

Getting started

To run the Actor, use the following command:

$ apify run

Deploy to Apify Platform

Option 1: Deploy from Local Machine

  1. Log in to Apify (requires API Token):
$ apify login
  2. Deploy your Actor:
$ apify push

Your Actor will be built and deployed to Actors -> My Actors.

Option 2: Connect Git Repository

  1. Go to Actor creation page
  2. Click Link Git Repository
  3. Connect your repository and configure build settings

Configuration

Input Parameters

{
"startUrls": [
{ "url": "https://whiskyfass.de/whisky" }
]
}
  • startUrls (required): Array of URLs to start crawling from. Defaults to the main whisky category page.
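
One way the default and basic validation could be handled is sketched below. This is an assumption about behavior, not the Actor's code: `resolveStartUrls` is a hypothetical helper, and on the platform the input schema already performs validation.

```javascript
const DEFAULT_START_URL = 'https://whiskyfass.de/whisky';

// Hypothetical helper: fall back to the default start URL and reject malformed entries.
function resolveStartUrls(input) {
  const entries = input && Array.isArray(input.startUrls) && input.startUrls.length > 0
    ? input.startUrls
    : [{ url: DEFAULT_START_URL }];
  return entries.map((entry) => {
    const parsed = new URL(entry.url); // throws a TypeError on invalid URLs
    return { url: parsed.href };
  });
}
```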

Crawler Settings

Configured in src/main.js:

  • maxConcurrency: 5 - Maximum parallel requests
  • Uses BasicCrawler with axios for HTTP requests (no browser needed)

Selectors

All CSS selectors are defined in src/selectors.js. Update these if the website structure changes.

Maintenance

Updating Flavor Patterns

Edit src/flavor.js to add or modify flavor detection keywords:

export const flavorPatterns = [
// Add new patterns
/\b(your|new|keywords)\b/gi,
];
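
For context, a pattern list of this shape could be applied to a tasting note roughly as follows. This is a minimal sketch: the four patterns are illustrative samples built from the keyword categories named in this README, and `extractDescriptors` is a hypothetical name, not necessarily the function used in the Actor.

```javascript
// Illustrative patterns only; the real list in src/flavor.js is larger.
const flavorPatterns = [
  /\b(citrus|apple|pear|berry)\b/gi, // fruits
  /\b(vanilla|cinnamon|pepper)\b/gi, // spices
  /\b(honey|caramel|chocolate)\b/gi, // sweet notes
  /\b(oak|peat|smoke)\b/gi,          // wood & smoke
];

// Return unique lowercase keywords in order of appearance in the text.
function extractDescriptors(text) {
  const hits = [];
  for (const pattern of flavorPatterns) {
    // matchAll requires the g flag and does not mutate the pattern's lastIndex.
    for (const match of text.matchAll(pattern)) {
      hits.push({ index: match.index, word: match[1].toLowerCase() });
    }
  }
  hits.sort((a, b) => a.index - b.index);
  return [...new Set(hits.map((hit) => hit.word))];
}

console.log(extractDescriptors('Fresh citrus, vanilla, light oak'));
// prints [ 'citrus', 'vanilla', 'oak' ]
```

The `\b` word boundaries keep a keyword such as "oak" from matching inside a longer word, and the `i` flag makes matching case-insensitive.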

Updating Selectors

If the website HTML structure changes, update src/selectors.js:

export const selectors = {
detail: {
productTitle: 'h1.product-title', // Update selectors here
price: '[class*="price h3"]',
// ...
}
};

Troubleshooting

No products found

  • Check if website structure has changed
  • Verify selectors in selectors.js
  • Check logs for HTTP errors or blocked requests

Missing tasting notes

  • Verify the product page has tasting note sections
  • Check tasting_notes_confidence field (1.0 = high confidence, 0.3 = ambiguous)
  • Some products may not have detailed tasting notes
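
When post-processing a dataset, records with ambiguous tasting notes can be set aside for review. The sketch below assumes a threshold of 0.5, chosen only because it sits between the two documented values (1.0 and 0.3); both the threshold and the function name `splitByConfidence` are hypothetical.

```javascript
// Split records by tasting-note confidence. The 0.5 threshold is an assumption
// chosen to sit between the two documented values (1.0 and 0.3).
function splitByConfidence(records, threshold = 0.5) {
  const trusted = [];
  const needsReview = [];
  for (const record of records) {
    const confidence = record.tasting_notes_confidence;
    if (typeof confidence === 'number' && confidence >= threshold) {
      trusted.push(record);
    } else {
      needsReview.push(record); // low confidence or no tasting notes at all
    }
  }
  return { trusted, needsReview };
}
```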

Incorrect data extraction

  • Review helper functions in helpers.js
  • Check regex patterns for units, prices, and ABV
  • Verify language detection is working correctly

Performance Notes

  • Speed: BasicCrawler with Cheerio is ~10x faster than browser-based crawlers for static HTML
  • Efficiency: No browser automation overhead, uses simple HTTP requests
  • Concurrency: Set to 5 parallel requests for optimal balance between speed and server load
  • Resource Usage: Minimal memory and CPU usage compared to Puppeteer/Playwright

License

ISC

Author

It's not you it's me