
EAN/GTIN Image extractor - Extract multiple images from any EAN
Pricing
$10.00 / 1,000 images

This image extractor searches for products using EAN codes across multiple e-commerce platforms, downloads product images, and exports comprehensive product data with stored images.
EAN Image Scraper - Apify Actor
This Apify actor searches for products using EAN codes across multiple e-commerce platforms, downloads product images, and exports product information with stored images.
Features
- Smart country detection: Pre-analyzes products using Google Shopping to optimize search
- Multi-source scraping: Searches across three platforms with intelligent fallback (UPC DB → Klarna → Bigshopper)
- Batch processing: Process multiple EAN codes in a single run
- Image management: Downloads and stores product images with configurable limits
- Multi-country support: Concurrent search across optimized country lists based on product availability
- Focused output: Extracts product title and stores images with metadata
- Retry mechanism: Built-in retry logic with exponential backoff
- Concurrent processing: Efficient async operations for URL resolution and image downloads
- Availability detection: Skips expensive searches for products with limited availability (<3 sources)
Output and Dataset
The actor creates one dataset record for each image found:
- Dataset Structure: One record per extracted image
- Example: If 10 EANs are processed and 5 products are found with 3 images each, the dataset will contain 15 records
- Summary Statistics: View total images extracted in the final summary logs
Billing
This actor uses standard Apify platform billing based on compute units consumed during execution. The cost depends on:
- Number of EANs processed
- Time taken for searches
- Resources used for image downloads
Setup and Development
Local Development
    # Create virtual environment
    python3 -m venv venv
    # Activate virtual environment
    source venv/bin/activate
    # Install dependencies
    pip install -r requirements.txt
    # Run locally
    python -m src
Running with Apify CLI
    # Install Apify CLI
    npm i -g apify-cli
    # Run the actor locally
    apify run
Input Configuration
The actor accepts the following input parameters:
Parameter | Type | Required | Description | Default |
---|---|---|---|---|
eans | Array[String] | Yes | List of EAN codes to process | - |
maxImagesPerProduct | Integer | No | Max images to download per product (0 = unlimited) | 5 |
skipSecondarySourceIfFound | Boolean | No | Skip secondary sources if product found in primary source | true |
Example Input
    {
      "eans": ["5702017814674", "5702017814681", "5702017583488"],
      "maxImagesPerProduct": 5,
      "skipSecondarySourceIfFound": true
    }
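The README does not spell out how `ean_validator.py` decides that a code is invalid. As a minimal sketch, assuming it follows the standard GS1 mod-10 check digit for GTIN-8/12/13/14, the input list could be pre-filtered like this. The function names `gtin_check_digit` and `is_valid_gtin` are hypothetical and not taken from the actor's code.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 mod-10 check digit for the digits that precede it."""
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(
        int(digit) * (3 if position % 2 == 0 else 1)
        for position, digit in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Return True for a GTIN-8/12/13/14 string with a correct check digit."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


# The three EANs from the example input, plus one with a wrong check digit.
eans = ["5702017814674", "5702017814681", "5702017583488", "1234567890123"]
valid_eans = [ean for ean in eans if is_valid_gtin(ean)]
```

Filtering before the run avoids spending searches on codes that can never match a product.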
Output
The actor stores results in three ways:
Important: Only successfully extracted images are saved to the dataset (one record per image). Products not found or errors are tracked in statistics but not included in dataset results.
1. Dataset
Each extracted image is stored as a separate record with the following structure:
    {
      "ean": "5702017814674",
      "title": "LEGO 10348 Japanse Esdoorn Bonsaiboompje",
      "country_found": "NL",
      "image_filename": "ean_5702017814674_01.jpg",
      "image_url": "https://api.apify.com/v2/key-value-stores/.../records/ean_5702017814674_01.jpg",
      "original_url": "https://owp.klarna.com/product/3212417179",
      "content_type": "image/jpeg",
      "size_bytes": 123456,
      "width": 1000,
      "height": 1000,
      "image_index": 1,
      "scraped_at": "2024-01-15 10:30:00"
    }
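Because the dataset holds one record per image, consumers usually need to regroup records by product. The sketch below assumes the records have already been downloaded as a list of dicts with the fields shown above. The helper name `group_images_by_ean` is hypothetical, and the sample URLs are placeholders.

```python
from collections import defaultdict


def group_images_by_ean(records):
    """Collect image URLs per EAN, ordered by image_index."""
    grouped = defaultdict(list)
    ordered = sorted(records, key=lambda r: (r["ean"], r["image_index"]))
    for record in ordered:
        grouped[record["ean"]].append(record["image_url"])
    return dict(grouped)


# Placeholder records using the field names from the dataset structure.
records = [
    {"ean": "5702017814674", "image_index": 2, "image_url": "https://example.invalid/a_02.jpg"},
    {"ean": "5702017814681", "image_index": 1, "image_url": "https://example.invalid/b_01.jpg"},
    {"ean": "5702017814674", "image_index": 1, "image_url": "https://example.invalid/a_01.jpg"},
]
images = group_images_by_ean(records)
```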
2. Key-Value Store
Images are stored with filenames in the format: ean_{EAN}_{INDEX}.{EXT}
Example: ean_5702017814674_01.jpg
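The key format can be built and parsed with a few lines of Python. This is a sketch only: the two-digit zero padding is inferred from the `_01` in the example, and the helper names `image_key` and `parse_image_key` are hypothetical.

```python
import re

# Pattern for keys of the form ean_{EAN}_{INDEX}.{EXT}
KEY_PATTERN = re.compile(r"ean_(\d+)_(\d+)\.(\w+)")


def image_key(ean: str, index: int, ext: str = "jpg") -> str:
    """Build a key-value store key; padding to two digits is an assumption."""
    return f"ean_{ean}_{index:02d}.{ext}"


def parse_image_key(key: str):
    """Split a key back into (ean, index, ext), or return None if it does not match."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        return None
    ean, index, ext = match.groups()
    return ean, int(index), ext


key = image_key("5702017814674", 1)
parsed = parse_image_key(key)
```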
3. Actor Output (Statistics)
The actor also creates an OUTPUT record with processing statistics:
    {
      "status": "completed",
      "statistics": {
        "total_eans": 10,
        "processed": 25,
        "found_primary": 5,
        "found_secondary": 2,
        "found_tertiary": 1,
        "not_found": 2,
        "errors": 0,
        "total_images": 25
      },
      "scraped_at": "2024-01-15 10:30:00"
    }
Note: In this example, 10 EANs were attempted, 8 products were found, and 25 images were extracted. Each of the 25 images appears as a separate dataset record. The 2 "not_found" items are tracked in statistics only.
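The arithmetic in the note can be checked directly against the OUTPUT record. The sketch below uses the example statistics as given. The `summarize` helper is hypothetical, and it leaves the `processed` field alone because this README does not define what that counter measures.

```python
# Statistics copied from the example OUTPUT record.
statistics = {
    "total_eans": 10,
    "processed": 25,
    "found_primary": 5,
    "found_secondary": 2,
    "found_tertiary": 1,
    "not_found": 2,
    "errors": 0,
    "total_images": 25,
}


def summarize(stats):
    """Derive product-level totals from the per-source counters."""
    found = stats["found_primary"] + stats["found_secondary"] + stats["found_tertiary"]
    accounted = found + stats["not_found"] + stats["errors"]
    images_per_product = stats["total_images"] / found if found else 0.0
    return {"found": found, "accounted": accounted, "images_per_product": images_per_product}


summary = summarize(statistics)
```

Here `found` is 8 and `accounted` equals `total_eans`, which matches the note: 8 products found, 2 not found, 0 errors.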
Supported Countries
Supported Countries by Source
Primary Source (Klarna) - 10 Countries Concurrent
Country | Code | Currency |
---|---|---|
Netherlands | NL | EUR |
Belgium | BE | EUR |
Germany | DE | EUR |
France | FR | EUR |
United Kingdom | UK | GBP |
Sweden | SE | SEK |
Denmark | DK | DKK |
Norway | NO | NOK |
Italy | IT | EUR |
Spain | ES | EUR |
Secondary Source (Bigshopper) - 10 Countries Concurrent
Country | Code | Currency |
---|---|---|
Netherlands | NL | EUR |
Belgium | BE | EUR |
Germany | DE | EUR |
France | FR | EUR |
United Kingdom | UK | GBP |
Italy | IT | EUR |
Spain | ES | EUR |
Poland | PL | PLN |
Sweden | SE | SEK |
Denmark | DK | DKK |
Tertiary Source (UPC Database)
- Global coverage (country-independent)
- No country-specific search required
How It Works
The actor implements a three-tier search strategy with sequential fallback:
- First Source (UPC Database): Global product database using go-upc.com
  - Global UPC/EAN database (country-independent)
  - Provides basic product information and images
  - Fast initial search for product existence
- Second Source (Klarna): Multi-country e-commerce platform search
  - Searches 10 countries concurrently: NL, BE, DE, FR, UK, SE, DK, NO, IT, ES
  - Provides multiple high-quality product images
  - Includes detailed product information and specifications
  - Returns best match based on image count and data completeness
- Third Source (Bigshopper): Alternative platform if others fail
  - Searches 10 countries concurrently: NL, BE, DE, FR, UK, IT, ES, PL, SE, DK
  - Extensive product database with shop offers
  - Decodes proxy image URLs to return original sources
  - Only searched if skipSecondarySourceIfFound is false
Search Flow
    EAN Input
      ↓
    Google Shopping Analysis → Determine preferred country
      ↓
    Klarna (preferred country or UK+DE for .com) → Found? → Return result
      ↓ Not found
    UPC Database (global) → Found? → Return result
      ↓ Not found
    Bigshopper (preferred country) → Found? → Return result
      ↓ Not found
    No results
The actor searches sources sequentially with smart country selection, stopping on the first successful match.
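The sequential flow can be sketched as a loop over async source functions that stops at the first hit. This is an illustration, not the actor's code: `search_sequential` and the three stub sources are hypothetical stand-ins for the real scrapers, and each source is assumed to return a dict on success or `None` when nothing is found.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Assumed source signature: takes an EAN, returns a product dict or None.
Source = Callable[[str], Awaitable[Optional[dict]]]


async def search_sequential(ean: str, sources: Sequence[Source]) -> Optional[dict]:
    """Try each source in order and return the first non-empty result."""
    for source in sources:
        result = await source(ean)
        if result is not None:
            return result
    return None


calls = []  # records which stub sources were actually queried


async def klarna_stub(ean: str) -> Optional[dict]:
    calls.append("klarna")
    return None  # simulate "not found"


async def upc_stub(ean: str) -> Optional[dict]:
    calls.append("upc")
    return {"ean": ean, "source": "upc", "images": []}


async def bigshopper_stub(ean: str) -> Optional[dict]:
    calls.append("bigshopper")
    return {"ean": ean, "source": "bigshopper", "images": []}


result = asyncio.run(
    search_sequential("5702017814674", [klarna_stub, upc_stub, bigshopper_stub])
)
```

In this run the Bigshopper stub is never called, because the UPC stub already returned a result.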
Smart Search Optimization
Before searching, the actor performs intelligent country detection:
- Pre-analysis: Uses Google Shopping to detect product availability
- Country detection: Identifies which countries have the product based on merchant URLs
- Search optimization: Prioritizes countries where the product is available
- Skip logic: If fewer than 3 sources are found, skips Klarna/Bigshopper to save time
This optimization significantly improves search speed by:
- Searching the most relevant countries first
- Avoiding searches in countries where the product isn't sold
- Skipping expensive searches for products with very limited availability
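A rough sketch of the country detection and skip logic follows. The README only says that countries are identified from merchant URLs, so mapping the top-level domain to a country code is an assumption, as are the names `TLD_TO_COUNTRIES` and `plan_countries`. The `.com` to UK+DE mapping is taken from the search flow above, and the threshold of 3 from the skip rule.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed TLD-to-country mapping; ".com" falls back to UK and DE.
TLD_TO_COUNTRIES = {
    "nl": ["NL"], "be": ["BE"], "de": ["DE"], "fr": ["FR"], "uk": ["UK"],
    "se": ["SE"], "dk": ["DK"], "no": ["NO"], "it": ["IT"], "es": ["ES"],
    "com": ["UK", "DE"],
}
MIN_SOURCES = 3  # below this, the country-specific searches are skipped


def plan_countries(merchant_urls):
    """Return countries to search, most frequent first, or [] to skip."""
    if len(merchant_urls) < MIN_SOURCES:
        return []
    counts = Counter()
    for url in merchant_urls:
        host = urlparse(url).hostname or ""
        tld = host.rsplit(".", 1)[-1]
        for country in TLD_TO_COUNTRIES.get(tld, []):
            counts[country] += 1
    return [country for country, _ in counts.most_common()]


urls = [
    "https://shop.example.nl/p/1",
    "https://www.example.nl/item/2",
    "https://example.de/produkt/3",
]
plan = plan_countries(urls)
skipped = plan_countries(urls[:2])
```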
Architecture
The actor uses a highly optimized, fully asynchronous architecture with parallel fallback search:
Search Strategy
- Parallel Fallback: All sources are searched simultaneously for each EAN
- First-to-Finish: The first successful result is used, other searches are cancelled
- 10x Performance: Native async implementation eliminates subprocess overhead
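The first-to-finish strategy can be sketched with `asyncio.wait` and task cancellation. This is a minimal illustration under assumptions: `first_successful` and the stub sources are hypothetical, a source signals "not found" by returning `None`, and a source that raises is treated as a miss.

```python
import asyncio


async def first_successful(ean, sources):
    """Run all sources at once; return the first non-None result, cancel the rest."""
    tasks = [asyncio.create_task(source(ean)) for source in sources]
    try:
        pending = set(tasks)
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None and task.result() is not None:
                    return task.result()
        return None
    finally:
        # Cancel whatever is still running and wait for the cancellations to land.
        for task in tasks:
            if not task.done():
                task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    cancelled = []

    async def slow_source(ean):
        try:
            await asyncio.sleep(1)
            return {"ean": ean, "source": "slow"}
        except asyncio.CancelledError:
            cancelled.append("slow")
            raise

    async def empty_source(ean):
        return None  # finishes first but finds nothing

    async def fast_source(ean):
        await asyncio.sleep(0.01)
        return {"ean": ean, "source": "fast"}

    found = await first_successful(
        "5702017814674", [slow_source, empty_source, fast_source]
    )
    return found, cancelled


result, cancelled = asyncio.run(demo())
```

Note that a source finishing first with no result does not end the race; the loop keeps waiting for the remaining sources.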
Technical Implementation
- Batch Processing: EANs are processed in batches of 50 with full concurrency
- Native Async Scrapers: Direct API calls using aiohttp, no subprocess overhead
- Concurrent Search: 10 concurrent country searches for primary source
- Parallel Downloads: Up to 10 concurrent image downloads
- Connection Pooling: Single aiohttp session with connection reuse
- Smart Country Detection: Uses Google Shopping to determine preferred search countries
- Image Deduplication: MD5 hash-based duplicate detection
- Error Handling: Comprehensive error handling with exponential backoff
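MD5-based deduplication, as listed above, comes down to hashing each image's bytes and keeping the first occurrence. The sketch below shows the idea on in-memory byte strings; the function name `dedupe_images` is hypothetical and the actor's own bookkeeping may differ.

```python
import hashlib
from typing import Iterable, List


def dedupe_images(images: Iterable[bytes]) -> List[bytes]:
    """Keep the first occurrence of each distinct image, comparing MD5 digests."""
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique


# Placeholder payloads standing in for downloaded image bytes.
downloaded = [b"image-a", b"image-b", b"image-a", b"image-c", b"image-b"]
unique_images = dedupe_images(downloaded)
```

MD5 is used here only as a fast content fingerprint, not for security.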
    apify-image-ean-scraper/
    ├── .actor/
    │   ├── actor.json                       # Actor configuration
    │   └── input_schema.json                # Input schema definition
    ├── src/
    │   ├── __main__.py                      # Entry point
    │   ├── main.py                          # Optimized main actor logic
    │   ├── scrapers/
    │   │   ├── klarna_scraper_native.py     # Native async Klarna scraper
    │   │   ├── bigshopper_scraper_async.py  # Async Bigshopper scraper
    │   │   └── go_upc_scraper_async.py      # Async UPC scraper
    │   └── utils/
    │       ├── ean_validator.py             # EAN validation utilities
    │       └── image_downloader.py          # Concurrent image download
    ├── Dockerfile                           # Docker configuration
    ├── requirements.txt                     # Python dependencies
    └── README.md                            # This file
Error Handling
The actor implements comprehensive error handling:
- Invalid EAN codes are logged and skipped
- Network errors trigger automatic retries (3 attempts)
- Failed image downloads don't stop the scraping process
- Each EAN is processed independently to prevent cascading failures
- Detailed error logging for debugging
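The retry behaviour can be sketched as a small async wrapper. The three attempts come from the list above. The base delay, the doubling schedule, and the set of exceptions treated as network errors are assumptions, and `with_retries` is a hypothetical name.

```python
import asyncio


async def with_retries(operation, attempts=3, base_delay=1.0):
    """Await operation(), retrying on network-style errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await operation()
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            await asyncio.sleep(base_delay * 2 ** attempt)


calls = []  # one entry per attempt made by the flaky operation


async def flaky_fetch():
    calls.append("attempt")
    if len(calls) < 3:
        raise ConnectionError("simulated network error")
    return "ok"


# base_delay=0 keeps the demonstration instant.
result = asyncio.run(with_retries(flaky_fetch, base_delay=0))
```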
Performance
- Highly concurrent processing: Process EANs in batches of 50 with full parallelism
- Optimized batch processing flow:
  - Analyze batch of 50 EANs concurrently
  - Search all 50 EANs in parallel:
    - All three sources searched simultaneously per EAN
    - First successful result wins, others cancelled
    - 10 concurrent country searches for Klarna
  - Download all images concurrently (10 parallel downloads)
  - Move to next batch
- 10x faster: Native async eliminates subprocess overhead
- Memory efficient: Connection pooling and resource reuse
- Smart country selection: Uses only the best country per source based on analysis
- Tested with 50+ products: Optimized for large batches
- Request timeout: 30 seconds (reduced for faster failures)
- Image download timeout: 60 seconds
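The batching and the download limit described above can be sketched with list slicing and an `asyncio.Semaphore`. The numbers 50 and 10 come from this section. The helper names `batches` and `download_all` and the `fetch` callback are hypothetical, and the fake fetch below only simulates a download.

```python
import asyncio

BATCH_SIZE = 50     # EANs per batch
MAX_DOWNLOADS = 10  # concurrent image downloads


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def download_all(urls, fetch, limit=MAX_DOWNLOADS):
    """Fetch every URL with at most `limit` downloads in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True keeps one failed download from aborting the rest.
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)


async def demo():
    active = 0
    peak = 0

    async def fake_fetch(url):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for network time
        active -= 1
        return url.encode()

    urls = [f"https://example.invalid/img_{i}.jpg" for i in range(25)]
    results = await download_all(urls, fake_fetch)
    return len(results), peak


batch_sizes = [len(batch) for batch in batches(list(range(120)))]
downloaded_count, peak_concurrency = asyncio.run(demo())
```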
Performance Tips
- Batch Processing: Process multiple EANs in a single run for best performance
- Image Limits: Set a reasonable maxImagesPerProduct to avoid unnecessary downloads
- Skip Secondary: Enable skipSecondarySourceIfFound to avoid redundant searches
Development Notes
- The actor uses apify SDK version 1.7.0
- Python 3.11 runtime
- Modular architecture for easy maintenance
- Follows security best practices
- No credentials or sensitive data in code