EAN/GTIN Image extractor - Extract multiple images from any EAN

Pricing

$10.00 / 1,000 images

Developed by

SR

Maintained by Community

This image extractor searches for products using EAN codes across multiple e-commerce platforms, downloads product images, and exports comprehensive product data with stored images.

Total users: 3
Monthly users: 3
Runs succeeded: >99%
Last modified: 2 days ago

EAN Image Scraper - Apify Actor

This Apify actor searches for products using EAN codes across multiple e-commerce platforms, downloads product images, and exports product information with stored images.

Features

  • Smart country detection: Pre-analyzes products using Google Shopping to optimize search
  • Multi-source scraping: Searches across three platforms with intelligent fallback (UPC DB → Klarna → Bigshopper)
  • Batch processing: Process multiple EAN codes in a single run
  • Image management: Downloads and stores product images with configurable limits
  • Multi-country support: Concurrent search across optimized country lists based on product availability
  • Focused output: Extracts product title and stores images with metadata
  • Retry mechanism: Built-in retry logic with exponential backoff
  • Concurrent processing: Efficient async operations for URL resolution and image downloads
  • Availability detection: Skips expensive searches for products with limited availability (<3 sources)

Output and Dataset

The actor creates one dataset record for each image found:

  • Dataset Structure: One record per extracted image
  • Example: If 10 EANs are processed and 5 products are found with 3 images each, the dataset will contain 15 records
  • Summary Statistics: View total images extracted in the final summary logs

Billing

This actor uses standard Apify platform billing based on compute units consumed during execution. The cost depends on:

  • Number of EANs processed
  • Time taken for searches
  • Resources used for image downloads

Setup and Development

Local Development

# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run locally
python -m src

Running with Apify CLI

# Install Apify CLI
npm i -g apify-cli
# Run the actor locally
apify run

Input Configuration

The actor accepts the following input parameters:

| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| eans | Array[String] | Yes | List of EAN codes to process | - |
| maxImagesPerProduct | Integer | No | Max images to download per product (0 = unlimited) | 5 |
| skipSecondarySourceIfFound | Boolean | No | Skip secondary sources if product found in primary source | true |

Example Input

{
  "eans": ["5702017814674", "5702017814681", "5702017583488"],
  "maxImagesPerProduct": 5,
  "skipSecondarySourceIfFound": true
}
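
Invalid EAN codes are logged and skipped before any search runs. The actor's own validator (src/utils/ean_validator.py) is not reproduced here; the following is a minimal sketch that assumes standard GS1 check-digit validation for GTIN-8/12/13/14. The function names and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("ean_input")  # hypothetical logger name

VALID_LENGTHS = {8, 12, 13, 14}  # GTIN-8, UPC-A, EAN-13, GTIN-14


def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(
        int(digit) * (3 if i % 2 == 0 else 1)
        for i, digit in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """True if code is ASCII digits, has a GTIN length, and a correct check digit."""
    code = code.strip()
    if not (code.isascii() and code.isdigit()) or len(code) not in VALID_LENGTHS:
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


def filter_valid_eans(eans: list[str]) -> list[str]:
    """Keep valid codes in input order; log and skip the rest."""
    valid = []
    for ean in eans:
        if is_valid_gtin(ean):
            valid.append(ean.strip())
        else:
            logger.warning("Skipping invalid EAN: %r", ean)
    return valid
```

All three codes in the example input pass this check; changing the last digit of any of them makes it fail.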

Output

The actor stores results in three ways:

Important: Only successfully extracted images are saved to the dataset (one record per image). Products that were not found, and EANs that failed with errors, are counted in the statistics but produce no dataset records.

1. Dataset

Each extracted image is stored as a separate record with the following structure:

{
  "ean": "5702017814674",
  "title": "LEGO 10348 Japanse Esdoorn Bonsaiboompje",
  "country_found": "NL",
  "image_filename": "ean_5702017814674_01.jpg",
  "image_url": "https://api.apify.com/v2/key-value-stores/.../records/ean_5702017814674_01.jpg",
  "original_url": "https://owp.klarna.com/product/3212417179",
  "content_type": "image/jpeg",
  "size_bytes": 123456,
  "width": 1000,
  "height": 1000,
  "image_index": 1,
  "scraped_at": "2024-01-15 10:30:00"
}
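
Because the dataset holds one record per image, consumers usually need to regroup records by product. A minimal sketch, assuming the records have already been loaded as a list of dictionaries with the fields shown above; the function name is hypothetical.

```python
from collections import defaultdict


def group_images_by_ean(records: list[dict]) -> dict[str, list[dict]]:
    """Group per-image dataset records by EAN, ordered by image_index."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        grouped[record["ean"]].append(record)
    for images in grouped.values():
        images.sort(key=lambda r: r["image_index"])
    return dict(grouped)
```

Each value in the returned mapping is the ordered image list for one product, so `len(grouped[ean])` gives the number of images stored for that EAN.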

2. Key-Value Store

Images are stored with filenames in the format: ean_{EAN}_{INDEX}.{EXT}

Example: ean_5702017814674_01.jpg
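
The naming scheme can be sketched as a small helper. The two-digit zero padding follows the example filename; the content-type mapping, the "jpg" fallback, and the function name are assumptions made for illustration.

```python
# Hypothetical mapping from MIME type to file extension.
EXTENSIONS = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/webp": "webp",
}


def build_image_key(ean: str, index: int, content_type: str) -> str:
    """Build a key of the form ean_{EAN}_{INDEX}.{EXT} with a zero-padded index."""
    ext = EXTENSIONS.get(content_type.lower(), "jpg")  # fallback is an assumption
    return f"ean_{ean}_{index:02d}.{ext}"
```

For instance, `build_image_key("5702017814674", 1, "image/jpeg")` yields the filename shown in the example.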

3. Actor Output (Statistics)

The actor also creates an OUTPUT record with processing statistics:

{
  "status": "completed",
  "statistics": {
    "total_eans": 10,
    "processed": 25,
    "found_primary": 5,
    "found_secondary": 2,
    "found_tertiary": 1,
    "not_found": 2,
    "errors": 0,
    "total_images": 25
  },
  "scraped_at": "2024-01-15 10:30:00"
}

Note: In this example, 10 EANs were attempted, 8 products were found, and 25 images were extracted. Each of the 25 images appears as a separate dataset record. The 2 "not_found" items are tracked in statistics only.
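
The arithmetic in the note can be checked mechanically. The sketch below assumes that every EAN ends up in exactly one of the found, not_found, or errors buckets; that rule is inferred from the example, not confirmed by the actor's code, and the function name is hypothetical.

```python
def summarize(statistics: dict) -> dict:
    """Derive the totals quoted in the note from the statistics block."""
    found = (
        statistics["found_primary"]
        + statistics["found_secondary"]
        + statistics["found_tertiary"]
    )
    accounted_for = found + statistics["not_found"] + statistics["errors"]
    return {
        "found": found,
        "accounted_for": accounted_for,
        # Assumption: each EAN lands in exactly one bucket.
        "consistent": accounted_for == statistics["total_eans"],
        "dataset_records": statistics["total_images"],
    }
```

With the example values, 5 + 2 + 1 = 8 products found, and 8 + 2 + 0 = 10 matches total_eans.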

Supported Countries

Supported Countries by Source

Primary Source (Klarna) - 10 Countries Concurrent

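| Country | Code | Currency |
|---|---|---|
| Netherlands | NL | EUR |
| Belgium | BE | EUR |
| Germany | DE | EUR |
| France | FR | EUR |
| United Kingdom | UK | GBP |
| Sweden | SE | SEK |
| Denmark | DK | DKK |
| Norway | NO | NOK |
| Italy | IT | EUR |
| Spain | ES | EUR |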

Secondary Source (Bigshopper) - 10 Countries Concurrent

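| Country | Code | Currency |
|---|---|---|
| Netherlands | NL | EUR |
| Belgium | BE | EUR |
| Germany | DE | EUR |
| France | FR | EUR |
| United Kingdom | UK | GBP |
| Italy | IT | EUR |
| Spain | ES | EUR |
| Poland | PL | PLN |
| Sweden | SE | SEK |
| Denmark | DK | DKK |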

Tertiary Source (UPC Database)

  • Global coverage (country-independent)
  • No country-specific search required

How It Works

The actor implements a three-tier search strategy with sequential fallback:

  1. First Source (UPC Database): Global product database using go-upc.com

    • Global UPC/EAN database (country-independent)
    • Provides basic product information and images
    • Fast initial search for product existence
  2. Second Source (Klarna): Multi-country e-commerce platform search

    • Searches 10 countries concurrently: NL, BE, DE, FR, UK, SE, DK, NO, IT, ES
    • Provides multiple high-quality product images
    • Includes detailed product information and specifications
    • Returns best match based on image count and data completeness
  3. Third Source (Bigshopper): Alternative platform if others fail

    • Searches 10 countries concurrently: NL, BE, DE, FR, UK, IT, ES, PL, SE, DK
    • Extensive product database with shop offers
    • Decodes proxy image URLs to return original sources
    • Only searched if skipSecondarySourceIfFound is false

Search Flow

EAN Input
↓
Google Shopping Analysis → Determine preferred country
↓
Klarna (preferred country or UK+DE for .com) → Found? → Return result
↓ Not found
UPC Database (global) → Found? → Return result
↓ Not found
Bigshopper (preferred country) → Found? → Return result
↓ Not found
No results

The actor searches sources sequentially with smart country selection, stopping on the first successful match.
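
A sequential fallback of this kind can be sketched with asyncio. The source order follows the Search Flow diagram; the three stub coroutines stand in for the real scrapers, and every name below is hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Source = Callable[[str], Awaitable[Optional[dict]]]


async def search_sequential(
    ean: str, sources: list[tuple[str, Source]]
) -> Optional[dict]:
    """Try each source in order and stop at the first one that returns a product."""
    for name, search in sources:
        try:
            result = await search(ean)
        except Exception as exc:  # a failing source should not abort the chain
            print(f"{name} failed for {ean}: {exc}")
            continue
        if result:
            return {"source": name, **result}
    return None


# Stub sources for illustration only; the real scrapers perform HTTP requests.
calls: list[str] = []


async def klarna_stub(ean: str) -> Optional[dict]:
    calls.append("klarna")
    return None  # simulate "not found"


async def upc_stub(ean: str) -> Optional[dict]:
    calls.append("upc_database")
    return {"title": "Example product", "images": ["https://example.com/1.jpg"]}


async def bigshopper_stub(ean: str) -> Optional[dict]:
    calls.append("bigshopper")
    return None


SOURCES = [
    ("klarna", klarna_stub),
    ("upc_database", upc_stub),
    ("bigshopper", bigshopper_stub),
]
```

Running `asyncio.run(search_sequential("5702017814674", SOURCES))` with these stubs returns the UPC Database result, and the Bigshopper stub is never called.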

Smart Search Optimization

Before searching, the actor performs intelligent country detection:

  1. Pre-analysis: Uses Google Shopping to detect product availability
  2. Country detection: Identifies which countries have the product based on merchant URLs
  3. Search optimization: Prioritizes countries where the product is available
  4. Skip logic: If fewer than 3 sources are found, skips Klarna/Bigshopper to save time

This optimization significantly improves search speed by:

  • Searching the most relevant countries first
  • Avoiding searches in countries where the product isn't sold
  • Skipping expensive searches for products with very limited availability
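
The pre-analysis can be illustrated as follows. This is a sketch under stated assumptions: merchant URLs are mapped to countries by their top-level domain, the threshold of 3 comes from the skip logic above, and the UK+DE fallback for generic domains mirrors the Search Flow diagram. The actual detection rules are not documented, and all names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from country-code TLD to the codes used in this README.
TLD_TO_COUNTRY = {
    "nl": "NL", "be": "BE", "de": "DE", "fr": "FR", "uk": "UK", "se": "SE",
    "dk": "DK", "no": "NO", "it": "IT", "es": "ES", "pl": "PL",
}
MIN_SOURCES = 3  # fewer merchant results than this triggers the skip logic
DEFAULT_COUNTRIES = ["UK", "DE"]  # fallback when only generic TLDs are seen


def plan_search(merchant_urls: list[str]) -> dict:
    """Decide whether to run country-specific searches and in which order."""
    if len(merchant_urls) < MIN_SOURCES:
        return {"skip_country_sources": True, "countries": []}
    counts: Counter = Counter()
    for url in merchant_urls:
        host = urlparse(url).hostname or ""
        tld = host.rsplit(".", 1)[-1].lower()
        country = TLD_TO_COUNTRY.get(tld)
        if country:
            counts[country] += 1
    # Most frequently seen countries first.
    countries = [country for country, _ in counts.most_common()]
    return {
        "skip_country_sources": False,
        "countries": countries or DEFAULT_COUNTRIES,
    }
```

A product listed by two .nl merchants and one .de merchant would be searched in NL first, then DE; a product with only one or two merchant results would skip the country-specific sources entirely.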

Architecture

The actor uses a highly optimized, fully asynchronous architecture with parallel fallback search:

Search Strategy

  1. Parallel Fallback: All sources are searched simultaneously for each EAN
  2. First-to-Finish: The first successful result is used, and the remaining searches are cancelled
  3. 10x Performance: Native async implementation eliminates subprocess overhead
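
The first-to-finish pattern can be sketched with asyncio.wait. This is an illustration, not the actor's code: the stub coroutines simulate sources with different latencies, and all names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Source = Callable[[str], Awaitable[Optional[dict]]]


async def first_successful(ean: str, searches: list[Source]) -> Optional[dict]:
    """Run all searches at once; return the first non-empty result, cancel the rest."""
    pending = {asyncio.create_task(search(ean)) for search in searches}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                # Ignore sources that raised or returned nothing.
                if task.exception() is None and task.result():
                    return task.result()
        return None
    finally:
        for task in pending:
            task.cancel()
        # Wait for cancellations to settle so no task is left dangling.
        await asyncio.gather(*pending, return_exceptions=True)


# Stub sources for illustration only.
async def fast_miss(ean: str) -> Optional[dict]:
    await asyncio.sleep(0.01)
    return None


async def medium_hit(ean: str) -> Optional[dict]:
    await asyncio.sleep(0.05)
    return {"source": "medium", "ean": ean}


async def slow_hit(ean: str) -> Optional[dict]:
    await asyncio.sleep(0.5)
    return {"source": "slow", "ean": ean}
```

Note that an empty result from the fastest source does not end the race; the loop keeps waiting until some source succeeds or all have finished.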

Technical Implementation

  • Batch Processing: EANs are processed in batches of 50 with full concurrency
  • Native Async Scrapers: Direct API calls using aiohttp, no subprocess overhead
  • Concurrent Search: 10 concurrent country searches for primary source
  • Parallel Downloads: Up to 10 concurrent image downloads
  • Connection Pooling: Single aiohttp session with connection reuse
  • Smart Country Detection: Uses Google Shopping to determine preferred search countries
  • Image Deduplication: MD5 hash-based duplicate detection
  • Error Handling: Comprehensive error handling with exponential backoff
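
Hash-based deduplication, as named in the list above, can be sketched in a few lines. The function name is hypothetical, and the sketch assumes that duplicates are byte-identical files, since a content hash cannot detect resized or re-encoded copies.

```python
import hashlib


def deduplicate_images(images: list[bytes]) -> list[bytes]:
    """Drop byte-identical images, keeping the first occurrence of each."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for data in images:
        # MD5 serves as a fast content fingerprint here, not for security.
        digest = hashlib.md5(data, usedforsecurity=False).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique
```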

Project Structure

apify-image-ean-scraper/
├── .actor/
│   ├── actor.json                        # Actor configuration
│   └── input_schema.json                 # Input schema definition
├── src/
│   ├── __main__.py                       # Entry point
│   ├── main.py                           # Optimized main actor logic
│   ├── scrapers/
│   │   ├── klarna_scraper_native.py      # Native async Klarna scraper
│   │   ├── bigshopper_scraper_async.py   # Async Bigshopper scraper
│   │   └── go_upc_scraper_async.py       # Async UPC scraper
│   └── utils/
│       ├── ean_validator.py              # EAN validation utilities
│       └── image_downloader.py           # Concurrent image download
├── Dockerfile                            # Docker configuration
├── requirements.txt                      # Python dependencies
└── README.md                             # This file

Error Handling

The actor implements comprehensive error handling:

  • Invalid EAN codes are logged and skipped
  • Network errors trigger automatic retries (3 attempts)
  • Failed image downloads don't stop the scraping process
  • Each EAN is processed independently to prevent cascading failures
  • Detailed error logging for debugging
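
A retry wrapper with exponential backoff could look like the sketch below. The 3 attempts match the list above; the base delay, the jitter, the set of retried exception types, and the function name are assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
) -> T:
    """Await operation(), retrying on network-style errors with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            # A little jitter keeps concurrent retries from lining up.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("attempts must be at least 1")
```

Wrapping each EAN's search in its own call keeps a failure local: once the attempts are used up the exception propagates for that EAN only, and the caller can record it as an error and continue.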

Performance

  • Highly concurrent processing: Process EANs in batches of 50 with full parallelism
  • Optimized batch processing flow:
    1. Analyze batch of 50 EANs concurrently
    2. Search all 50 EANs in parallel:
      • All three sources searched simultaneously per EAN
      • First successful result wins, others cancelled
      • 10 concurrent country searches for Klarna
    3. Download all images concurrently (10 parallel downloads)
    4. Move to next batch
  • 10x faster: Native async eliminates subprocess overhead
  • Memory efficient: Connection pooling and resource reuse
  • Smart country selection: Uses only the best country per source based on analysis
  • Tested with 50+ products: Optimized for large batches
  • Request timeout: 30 seconds (reduced for faster failures)
  • Image download timeout: 60 seconds
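
The batch flow above can be approximated with asyncio.gather for the per-batch fan-out and a semaphore to cap concurrent downloads. The batch size of 50 and the limit of 10 come from the list; the function names and the handle_ean and download callables are hypothetical placeholders for the real search and download steps.

```python
import asyncio
from typing import Awaitable, Callable, Iterator

BATCH_SIZE = 50      # EANs per batch, as stated above
MAX_DOWNLOADS = 10   # parallel image downloads, as stated above


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def process_in_batches(
    eans: list[str],
    handle_ean: Callable[[str], Awaitable],
    batch_size: int = BATCH_SIZE,
) -> list:
    """Process each batch concurrently, then move on to the next batch."""
    results: list = []
    for batch in chunked(eans, batch_size):
        # return_exceptions=True keeps one failing EAN from aborting the batch.
        results.extend(
            await asyncio.gather(
                *(handle_ean(ean) for ean in batch), return_exceptions=True
            )
        )
    return results


async def download_all(
    urls: list[str],
    download: Callable[[str], Awaitable],
    limit: int = MAX_DOWNLOADS,
) -> list:
    """Download all URLs with at most `limit` downloads in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str):
        async with semaphore:
            return await download(url)

    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)
```

Results come back in input order, with exceptions returned in place of failed items, so one bad EAN or image does not affect the rest.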

Performance Tips

  1. Batch Processing: Process multiple EANs in a single run for best performance
  2. Image Limits: Set reasonable maxImagesPerProduct to avoid unnecessary downloads
  3. Skip Secondary: Enable skipSecondarySourceIfFound to avoid redundant searches

Development Notes

  • The actor uses Apify SDK version 1.7.0
  • Python 3.11 runtime
  • Modular architecture for easy maintenance
  • Follows security best practices
  • No credentials or sensitive data in code