# E-Commerce AI Training Dataset from Product Pages

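**Use case:** Build an AI training dataset from e-commerce product pages.

Extract high-resolution images and their metadata from product pages. The run produces a dataset for AI training in which images are deduplicated by SHA-256 hash and each record states whether EXIF data was stripped.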
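
To make the SHA-256 hashing concrete, here is a minimal sketch of content-based deduplication using Python's standard `hashlib`. It illustrates the general technique only; the helper names `sha256_hex` and `dedup_by_hash` are hypothetical, and the Actor's internal implementation is not documented here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw image bytes as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def dedup_by_hash(images: list[bytes]) -> list[tuple[str, bool]]:
    """Return one (hash, is_duplicate) pair per image.

    An image is flagged as a duplicate when identical bytes appeared
    earlier in the list. This is an illustrative helper, not the Actor's code.
    """
    seen: set[str] = set()
    results: list[tuple[str, bool]] = []
    for data in images:
        digest = sha256_hex(data)
        results.append((digest, digest in seen))
        seen.add(digest)
    return results
```

Because the hash covers the exact bytes, two files with the same picture but different encoding or resolution get different hashes and are not treated as duplicates.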

## Input

```json
{
  "urls": [
    {
      "url": "https://www.rei.com/product/248622/patagonia-mens-nano-puff-jacket"
    },
    {
      "url": "https://www.patagonia.com/product/mens-nano-puff-jacket/84212.html"
    },
    {
      "url": "https://www.rei.com/product/243506/arc-teryx-beta-sl-jacket-mens"
    }
  ],
  "mode": "page",
  "includeSrcset": true,
  "includeOgTags": true,
  "minWidth": 400,
  "minHeight": 400,
  "minSizeBytes": 10000,
  "maxImagesPerUrl": 1000,
  "maxUrls": 10000,
  "dedupByHash": true,
  "stripExif": true,
  "convertFormat": "webp-to-png",
  "filenamePattern": "{source}-{idx}-{hash}.{ext}",
  "outputFormat": [
    "dataset",
    "kv-store",
    "zip"
  ],
  "s3Bucket": "",
  "webhookUrl": "",
  "maxConcurrency": 10,
  "downloadTimeoutMs": 15000,
  "imageCheckMaxRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "failFast": false,
  "debugLogging": false
}
```
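
The input above can also be submitted programmatically. The sketch below uses the official `apify-client` Python package and a reduced version of the input; it assumes the Actor falls back to defaults for the omitted fields, which is not confirmed here. `build_run_input` and `run_actor` are hypothetical helper names, and you need your own Apify API token.

```python
def build_run_input(page_urls: list[str]) -> dict:
    """Build a reduced run input using field names from the JSON above.

    Assumption: fields left out here fall back to the Actor's defaults.
    """
    return {
        "urls": [{"url": u} for u in page_urls],
        "mode": "page",
        "minWidth": 400,
        "minHeight": 400,
        "dedupByHash": True,
        "stripExif": True,
        "outputFormat": ["dataset"],
    }


def run_actor(run_input: dict, token: str) -> list[dict]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("getascraper/bulk-image-downloader").call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_actor(build_run_input(["https://www.rei.com/product/248622/patagonia-mens-nano-puff-jacket"]), token)` would return one record per image found on that page.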

## Output

Each dataset item contains the fields below, listed with their display labels and value types.

```json
{
  "filename": {
    "label": "Filename",
    "format": "string"
  },
  "source_url": {
    "label": "Source Page",
    "format": "string"
  },
  "image_url": {
    "label": "Image URL",
    "format": "string"
  },
  "content_type": {
    "label": "Content-Type",
    "format": "string"
  },
  "format": {
    "label": "Format",
    "format": "string"
  },
  "width": {
    "label": "Width",
    "format": "number"
  },
  "height": {
    "label": "Height",
    "format": "number"
  },
  "size_bytes": {
    "label": "Size (bytes)",
    "format": "number"
  },
  "is_duplicate": {
    "label": "Duplicate",
    "format": "boolean"
  },
  "exif_stripped": {
    "label": "EXIF Stripped",
    "format": "boolean"
  },
  "from_srcset": {
    "label": "From srcset",
    "format": "boolean"
  },
  "from_og_tag": {
    "label": "From og:image",
    "format": "boolean"
  },
  "kv_url": {
    "label": "Download Binary",
    "format": "string"
  },
  "error": {
    "label": "Error",
    "format": "string"
  },
  "downloaded_at": {
    "label": "Downloaded",
    "format": "string"
  }
}
```
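
Before training, you will usually want to keep only the usable records. The sketch below filters dataset items by the fields listed above; `usable_items` is a hypothetical helper, and it assumes that `error` is empty or missing on successful downloads, which the schema does not state.

```python
def usable_items(items: list[dict], min_width: int = 400, min_height: int = 400) -> list[dict]:
    """Keep items that downloaded cleanly, are not duplicates, and are large enough."""
    kept = []
    for item in items:
        # Assumption: a non-empty "error" value marks a failed download.
        if item.get("error"):
            continue
        if item.get("is_duplicate"):
            continue
        width = item.get("width") or 0
        height = item.get("height") or 0
        if width < min_width or height < min_height:
            continue
        kept.append(item)
    return kept


def total_size_bytes(items: list[dict]) -> int:
    """Sum the size_bytes field, treating missing values as zero."""
    return sum(item.get("size_bytes") or 0 for item in items)
```

The `min_width` and `min_height` defaults mirror the `minWidth` and `minHeight` values in the input, so the filter acts as a second check on the client side.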

## About this Actor

This example demonstrates how to use [Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP](https://apify.com/getascraper/bulk-image-downloader) with a specific input configuration. Visit the [Actor detail page](https://apify.com/getascraper/bulk-image-downloader) to learn more, explore other use cases, and run it yourself.