E-Commerce AI Training Dataset from Product Pages

Extract high-resolution images and metadata from product pages. Receive detailed datasets for AI training, including SHA-256 hashes and EXIF info.

Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP
getascraper/bulk-image-downloader

Input

URLs to Process (required)
url: https://www.rei.com/product/248622/patagonia-mens-nano-puff-jacket (+2 more)
URL Mode: page
Include srcset / picture: true
Include og:image / twitter:image: true
Min Width (px): 400
Min Height (px): 400
Min Size (bytes): 10000
Max Images per URL: 1000
Max URLs to Process: 10000
Deduplicate by Content Hash: true
Strip EXIF Metadata (JPEG): true
Format Conversion: webp-to-png
Filename Pattern: {source}-{idx}-{hash}.{ext}
Output Format: dataset (+2 more)
Max Concurrency: 10
Download Timeout (ms): 15000
Max Retries per Image: 3
Proxy Configuration
Fail Fast: false
Verbose Debug Logs: false
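
The size filter, content-hash deduplication, and filename pattern in the input above can be illustrated with a minimal Python sketch. It is not the Actor's implementation: the helper names, the use of the page hostname for `{source}`, and the 8-character hash prefix are assumptions made for illustration. The width and height filters are omitted because they require decoding the image.

```python
import hashlib
from urllib.parse import urlparse

MIN_SIZE_BYTES = 10_000  # mirrors "Min Size (bytes): 10000" from the input
FILENAME_PATTERN = "{source}-{idx}-{hash}.{ext}"  # pattern string from the input


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def build_filename(source_url: str, idx: int, data: bytes, ext: str,
                   hash_len: int = 8) -> str:
    """Expand the filename pattern.

    Assumption: {source} is the page hostname and {hash} is a short
    prefix of the SHA-256 digest. The Actor may expand them differently.
    """
    source = urlparse(source_url).hostname or "unknown"
    return FILENAME_PATTERN.format(
        source=source, idx=idx, hash=content_hash(data)[:hash_len], ext=ext
    )


def select_images(source_url: str, images: list[tuple[bytes, str]]) -> list[str]:
    """Drop undersized and byte-identical images, then name the rest.

    `images` is a list of (raw_bytes, extension) pairs.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for data, ext in images:
        if len(data) < MIN_SIZE_BYTES:
            continue  # below the minimum size
        digest = content_hash(data)
        if digest in seen:
            continue  # same bytes already kept: a content-hash duplicate
        seen.add(digest)
        kept.append(build_filename(source_url, len(kept) + 1, data, ext))
    return kept
```

Hashing the bytes rather than comparing URLs means the same image served from two different URLs is still recognized as a duplicate.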

Output fields

Filename
Source Page
Image URL
Content-Type
Format
Width
Height
Size (KB)
Duplicate
EXIF Stripped
From srcset
From og:image
Download Binary
Error
Downloaded
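
As a rough sketch of how these fields might be used when preparing a training set, the Python below keeps only records that downloaded without error and are not duplicates, then summarizes the run. The JSON key names (`duplicate`, `error`, `sizeKb`) are hypothetical stand-ins for the display labels above; check an actual dataset item for the real keys.

```python
from typing import Any

Record = dict[str, Any]


def usable_records(items: list[Record]) -> list[Record]:
    """Keep records with no error that are not flagged as duplicates.

    Key names are hypothetical; adjust them to the real dataset schema.
    """
    return [it for it in items if not it.get("error") and not it.get("duplicate")]


def summarize(items: list[Record]) -> dict[str, Any]:
    """Count usable, duplicate, and failed records and total usable size."""
    usable = usable_records(items)
    return {
        "total": len(items),
        "usable": len(usable),
        "duplicates": sum(1 for it in items if it.get("duplicate")),
        "failed": sum(1 for it in items if it.get("error")),
        "usable_size_kb": round(sum(it.get("sizeKb", 0) for it in usable), 1),
    }
```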

How it works

01 Sign up on Apify

Create your Apify account to access the Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP.

02 Start the run

The Actor starts running automatically with the input you provide.

03 Receive the output

Monitor progress in real time. You will be notified as soon as your dataset is complete and ready for review.

04 Integrate into your workflow

The final output is delivered in JSON, CSV, or Excel format, ready to plug into your workflow.


Integrate Actor directly into your workflow

Choose from the 100+ integration options we provide, or integrate via the API.

Webhook
n8n
Make
Zapier
Airbyte
Keboola
IFTTT
HubSpot
GDrive
Gmail
Apify MCP
GitHub
Slack
LangChain
LlamaIndex
Flowise
Pinecone
OpenAI
Mastra
Clay
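
For the API route mentioned above, a minimal sketch using the `apify-client` Python package might look like the following. The Actor ID comes from this page; the input key names (`urls`, `minWidth`, `minHeight`, `dedupeByHash`) are hypothetical and should be replaced with the keys from the Actor's real input schema. The `APIFY_TOKEN` environment variable is likewise an assumption about how the token is supplied.

```python
import os
from typing import Any

ACTOR_ID = "getascraper/bulk-image-downloader"  # Actor ID shown on this page


def build_run_input(urls: list[str], min_width: int = 400,
                    min_height: int = 400) -> dict[str, Any]:
    """Build the run input.

    HYPOTHETICAL key names: consult the Actor's input schema for the real ones.
    """
    return {
        "urls": [{"url": u} for u in urls],
        "minWidth": min_width,
        "minHeight": min_height,
        "dedupeByHash": True,
    }


def run_actor(urls: list[str]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env variable
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(urls))
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_actor(["https://www.rei.com/product/248622/patagonia-mens-nano-puff-jacket"])` would start a run and return its records once it finishes; it requires a valid API token and network access.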