Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP

Download every image from any webpage or direct image URL. Smart srcset picks the highest-resolution variant. 22 metadata fields per image: width, height, format, SHA-256, dedup flag, EXIF, provenance. ZIP and S3 outputs, webhooks, MCP-ready. $2.00 per 1k.

Pricing

from $2.00 / 1,000 images

Developer

GetAScraper

Maintained by Community

Download every image from any webpage or direct image URL in one call. Each image comes with 22 metadata fields and a SHA-256 content hash, with optional EXIF stripping and WebP-to-PNG conversion, plus ZIP and S3 outputs. The price is $2.00 per 1,000 results, about 70% cheaper than the top Store alternative, and the first 50 images of every run are free.

This Actor is a generic image downloader that works on any public URL. Pass it a list of webpages and it discovers every image via HTML <img>, <picture>, srcset, og:image, and twitter:image. Pass it a list of direct image URLs and it downloads them as-is. For every srcset it automatically picks the highest-resolution variant. It hashes each image body with SHA-256 for deduplication, and it can strip EXIF or convert WebP to PNG on demand. Results are exported as a structured dataset, a ZIP archive, or an S3 upload. By default a run processes up to 10,000 URLs (maxUrls) with up to 10 concurrent downloads (maxConcurrency).
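
To make the srcset step concrete, here is a minimal Python sketch of choosing the highest-resolution candidate from a srcset attribute. The helper name pick_largest_from_srcset and its ranking rules are assumptions for illustration, not the Actor's actual implementation.

```python
from typing import Optional
from urllib.parse import urljoin


def pick_largest_from_srcset(srcset: str, base_url: str = "") -> Optional[str]:
    """Return the URL of the largest candidate in a srcset string.

    Hypothetical helper. Width descriptors ("800w") outrank density
    descriptors ("2x"); a candidate with no descriptor counts as 1x.
    The comma split is naive: URLs that themselves contain commas
    would need a full srcset parser.
    """
    best_url = None
    best_key = (-1, -1.0)  # (width, density)
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else "1x"
        try:
            if descriptor.endswith("w"):
                key = (int(descriptor[:-1]), 0.0)
            elif descriptor.endswith("x"):
                key = (0, float(descriptor[:-1]))
            else:
                continue  # unknown descriptor type
        except ValueError:
            continue  # malformed number, skip this candidate
        if key > best_key:
            best_url, best_key = url, key
    if best_url is None:
        return None
    # Resolve relative URLs against the page URL when one is given.
    return urljoin(base_url, best_url) if base_url else best_url


print(pick_largest_from_srcset("a-400.jpg 400w, a-800.jpg 800w, a-1600.jpg 1600w"))
```

Passing the page URL as base_url resolves relative candidates, which is usually what you want before downloading.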


What can you do with it?

  • You are building an AI training dataset. Pull thousands of product photos, real estate shots, or stock images for CLIP, DINOv2, or SigLIP. Auto-hash for dedup means you never train on the same image twice.
  • You are a scraper developer. Hand the Actor a list of image URLs returned by your catalog scraper (REI, IndiaMART, eBay, Poshmark) and get back a ZIP of the binaries plus a clean metadata dataset. One Actor replaces three.
  • You are an e-commerce operator. Mirror product image catalogs. Detect when a competitor swaps an image. Track pricing-page visual changes over time.
  • You are an archivist or you build newsroom tools. Grab every image from a story page in one call. Use the per-URL ZIP mode to keep sources separated.
  • You are a research analyst. Pull the full visual corpus of any public site for content analysis, brand tracking, or visual trend reports.
  • You are a builder integrating via webhook. The Actor POSTs a JSON summary on completion. Pipe the dataset URL into your BigQuery, Sheets, or n8n pipeline.

How to use it

  1. Open the Actor in the Apify Store and click "Try for free".
  2. Paste your URLs. Mix webpages (the Actor parses the HTML) and direct image links (it downloads straight) in a single list.
  3. Pick your options. Turn on SHA-256 dedup, EXIF strip, format conversion, or ZIP output as needed.
  4. Click Start. The Actor fetches each URL, discovers or downloads the images, and pushes metadata to the dataset and binaries to the key-value store.
  5. Download your results. Pull the dataset as JSON, CSV, or Excel. Grab the image binaries from the key-value store (links in the dataset's kv_url column). Or use the single-click ZIP download.
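
The same steps can be expressed as a JSON run input. The sketch below assembles one in Python using field names from the Input table; the helper build_run_input and its cross-field checks are illustrative assumptions, and the Actor may validate input differently.

```python
import json


def build_run_input(urls, output_format=("dataset", "kv-store"), **options):
    """Assemble a run input dict using field names from the Input table.

    Hypothetical helper: the checks mirror the field descriptions
    (s3Bucket for "s3", webhookUrl for "webhook"), not the Actor's
    own validation code.
    """
    if not urls:
        raise ValueError("urls must contain at least one URL")
    run_input = {"urls": list(urls), "outputFormat": list(output_format)}
    run_input.update(options)
    if "s3" in run_input["outputFormat"] and not run_input.get("s3Bucket"):
        raise ValueError("s3Bucket is required when outputFormat includes 's3'")
    if "webhook" in run_input["outputFormat"] and not run_input.get("webhookUrl"):
        raise ValueError("webhookUrl is required when outputFormat includes 'webhook'")
    return run_input


run_input = build_run_input(
    ["https://example.com/gallery", "https://picsum.photos/800/600.jpg"],
    output_format=("dataset", "kv-store", "zip"),
    dedupByHash=True,
    stripExif=True,
    minSizeBytes=2000,
)
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the Actor's input editor or sent through the API.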

Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array | Yes | List of URLs. Each can be a webpage (HTML is parsed for images) or a direct image link. Mix freely. |
| mode | enum | No | auto (recommended, detects by extension), page (force HTML parse), or direct (force image URL). |
| includeSrcset | boolean | No | Discover images from srcset, <picture><source>, and lazy data-src. Default true. |
| includeOgTags | boolean | No | Discover Open Graph and Twitter Card images. Default true. |
| minWidth | integer | No | Skip images narrower than this. Default 0. |
| minHeight | integer | No | Skip images shorter than this. Default 0. |
| minSizeBytes | integer | No | Skip images smaller than this. Filters tracking pixels. Default 0. |
| maxImagesPerUrl | integer | No | Cap images per source URL. Default 1000. |
| maxUrls | integer | No | Cap total URLs processed. Default 10000. |
| dedupByHash | boolean | No | Compute SHA-256 of each image body and skip duplicates. Default true. |
| stripExif | boolean | No | Re-encode JPEGs without EXIF metadata. Default false. |
| convertFormat | enum | No | none, webp-to-png, or png-to-jpg. Default none. |
| filenamePattern | string | No | Templated filename using {slug}, {hash}, {ext}, {idx}, {source}. Default {slug}-{hash}.{ext}. |
| outputFormat | array | No | dataset (always), kv-store (binaries), zip (single archive), zipPerUrl (one ZIP per source), s3 (upload to bucket), webhook (POST summary on completion). |
| s3Bucket | string | No | Required when outputFormat includes s3. Uses standard AWS_* env vars for credentials. |
| webhookUrl | string | No | URL to receive a JSON run summary on completion. |
| maxConcurrency | integer | No | Max parallel image downloads. Default 10. |
| downloadTimeoutMs | integer | No | Per-image fetch timeout in milliseconds. Default 15000. |
| imageCheckMaxRetries | integer | No | Retries per failed image. Default 3. |
| proxyConfiguration | object | No | Optional proxy. Default off. Use residential if source sites are hotlink-protected. |
| failFast | boolean | No | Stop on first error. Default false. |
| debugLogging | boolean | No | Verbose per-image tracing. Default false. |
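
As a rough illustration of how a filenamePattern such as the default {slug}-{hash}.{ext} could expand, the sketch below fills the five placeholders with Python's str.format. The helper render_filename, the 16-character hash prefix, and the character clean-up rule are assumptions inferred from the sample filename in the Output section; the Actor's real rules may differ.

```python
import re


def render_filename(pattern, *, slug, sha256, ext, idx=0, source=""):
    """Fill the placeholders {slug}, {hash}, {ext}, {idx}, {source}.

    Assumption: {hash} is the first 16 hex characters of the SHA-256.
    """
    name = pattern.format(slug=slug, hash=sha256[:16], ext=ext, idx=idx, source=source)
    # Replace anything outside a conservative safe set with a hyphen.
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)


print(render_filename(
    "{slug}-{hash}.{ext}",
    slug="picsum-photos-800x600",
    sha256="a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456",
    ext="jpg",
))
# picsum-photos-800x600-a1b2c3d4e5f67890.jpg
```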

Output

The Actor pushes one row to the dataset per downloaded image. Binaries are written to the default key-value store under keys of the form IMG-{filename}. Use the dataset's kv_url column to download each binary.

{
  "filename": "picsum-photos-800x600-a1b2c3d4e5f67890.jpg",
  "source_url": "https://example.com/gallery",
  "image_url": "https://picsum.photos/800/600.jpg",
  "kv_store_key": "IMG-picsum-photos-800x600-a1b2c3d4e5f67890.jpg",
  "kv_url": "https://api.apify.com/v2/key-value-stores/abc/records/IMG-picsum-photos-800x600-a1b2c3d4e5f67890.jpg",
  "content_type": "image/jpeg",
  "size_bytes": 54321,
  "width": 800,
  "height": 600,
  "format": "jpeg",
  "sha256": "a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456",
  "is_duplicate": false,
  "exif_stripped": false,
  "from_srcset": true,
  "from_picture_source": false,
  "from_og_tag": false,
  "from_twitter_tag": false,
  "from_data_attr": false,
  "from_direct_url": false,
  "downloaded_at": "2026-06-20T12:34:56.000Z",
  "duration_ms": 423,
  "http_status": 200,
  "error": null
}

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.


Output data fields

| Field | Description |
| --- | --- |
| filename | Final filename (per filenamePattern). |
| source_url | The page URL the image was discovered on (or its direct URL). |
| image_url | Final resolved image URL (after srcset expansion, redirects). |
| kv_store_key | Key in the run's key-value store (IMG-...). |
| kv_url | Signed download URL for the binary (24-hour default). |
| content_type | MIME type (e.g. image/jpeg, image/webp). |
| size_bytes | Downloaded size in bytes. |
| width | Image width in pixels (from sharp metadata). |
| height | Image height in pixels (from sharp metadata). |
| format | Normalized format: jpeg, png, webp, gif, svg, avif, bmp, ico, other. |
| sha256 | Content hash (when dedupByHash=true). |
| is_duplicate | True if hash matched a previously-seen image in this run. |
| exif_stripped | True if JPEG was re-encoded to remove EXIF. |
| from_srcset | True if discovered via srcset / picture / data-srcset. |
| from_picture_source | True if discovered via <picture><source>. |
| from_og_tag | True if discovered via <meta og:image>. |
| from_twitter_tag | True if discovered via <meta twitter:image>. |
| from_data_attr | True if discovered via lazy data-src / data-srcset. |
| from_direct_url | True if the URL was treated as a direct image (mode=direct/auto). |
| downloaded_at | ISO timestamp of the download. |
| duration_ms | Time to fetch + process, in milliseconds. |
| http_status | HTTP response code (0 on network error). |
| error | Per-image error string (404, timeout, below-min-size-N, etc.) or null. |
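
Once the dataset is exported as JSON, the fields above are easy to post-process. The following sketch keeps only rows that downloaded cleanly and are not duplicates, and turns the from_* flags into a readable provenance list. The helper names usable_rows and provenance are hypothetical, and the three sample rows are made up for the example.

```python
PROVENANCE_FLAGS = (
    "from_srcset", "from_picture_source", "from_og_tag",
    "from_twitter_tag", "from_data_attr", "from_direct_url",
)


def usable_rows(rows):
    """Yield rows with no error that are not flagged as duplicates."""
    for row in rows:
        if row.get("error") is None and not row.get("is_duplicate", False):
            yield row


def provenance(row):
    """List the discovery channels set on one row, without the from_ prefix."""
    return [flag[len("from_"):] for flag in PROVENANCE_FLAGS if row.get(flag)]


# Made-up sample rows using the documented field names.
rows = [
    {"filename": "a.jpg", "error": None, "is_duplicate": False, "from_srcset": True},
    {"filename": "b.jpg", "error": "404", "is_duplicate": False, "from_og_tag": True},
    {"filename": "c.jpg", "error": None, "is_duplicate": True, "from_direct_url": True},
]
for row in usable_rows(rows):
    print(row["filename"], provenance(row))
```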

Pricing

$2.00 per 1,000 results. The first 50 results of every run are free. There is no monthly fee and no proxy surcharge.

| Volume | What you pay |
| --- | --- |
| 50 images (free trial) | $0.00 |
| 1,000 images | $2.00 |
| 10,000 images | $20.00 |
| 100,000 images | $200.00 |

For comparison, the next-most-popular bulk image downloader on the Store (onescales/bulk-image-downloader) charges $7.00 per 1,000 URLs and ships only the image bytes (no width, height, hash, or format). At $2.00 per 1,000 results we charge about 70% less and ship the richest schema in the field.

For scheduled or standby runs, pricing drops to $1.00 per 1,000 results (50% off). Runs of more than 50,000 images are eligible for $1.50 per 1,000; the table above shows the standard $2.00 rate.
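
The arithmetic behind these figures can be checked with a short Python sketch. The function names are hypothetical. The free_results parameter is an assumption: the table above lists the flat rate, so free_results=0 reproduces it, while free_results=50 shows the result if the free results are deducted from each run first.

```python
def estimate_cost(results, rate_per_1k=2.00, free_results=0):
    """Estimate the charge in USD for one run.

    free_results=0 reproduces the flat-rate table; free_results=50 is an
    assumed reading in which the free results are deducted first.
    """
    billable = max(results - free_results, 0)
    return round(billable * rate_per_1k / 1000, 2)


def savings_percent(our_rate, their_rate):
    """Percentage by which our_rate undercuts their_rate."""
    return round((1 - our_rate / their_rate) * 100, 1)


print(estimate_cost(10_000))                      # 20.0
print(estimate_cost(10_000, free_results=50))     # 19.9
print(estimate_cost(100_000, rate_per_1k=1.50))   # 150.0
print(savings_percent(2.00, 7.00))                # 71.4
```

The last line shows that $2.00 against $7.00 works out to a little over 71%, which the text rounds to 70%.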


Tips and advanced options

  • Set includeSrcset to false if you only want the page's primary images. This skips lazy data-src and responsive variants, which is faster on heavy pages.
  • Use minSizeBytes to filter tracking pixels. A typical tracking pixel is under 1 KB. Set minSizeBytes: 2000 to skip them.
  • Use minWidth and minHeight to focus on useful images. Set minWidth: 400 to skip thumbnails and avatars.
  • Pick the right output mode. zip for a single archive, zipPerUrl to keep source pages separated, s3 to push directly to your training bucket.
  • Pair with a catalog scraper. Run one of our catalog scrapers (REI, IndiaMART, eBay) first, then feed the image URLs to this Actor for a complete e-commerce dataset.
  • Schedule weekly runs to refresh your image corpus. Most product catalogs update slowly; daily is overkill.
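  • Use SHA-256 to dedup across runs. Hashes are stable, so the same image bytes produce the same sha256 value in every run. The is_duplicate flag is documented in the output fields as matching images seen earlier in the same run, so for cross-run dedup compare each row's sha256 against the hashes you stored from previous runs.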
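
If you want to track hashes across runs on your side, a minimal Python sketch follows. It assumes you keep a local JSON file of previously seen sha256 values; the helper filter_new and the file layout are hypothetical and not part of the Actor.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """SHA-256 of the raw image bytes, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def filter_new(rows, seen_path):
    """Return rows whose sha256 is absent from the hash file, then update it.

    Hypothetical helper: seen_path is a local JSON file holding the
    hashes collected from earlier runs.
    """
    path = Path(seen_path)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    fresh = []
    for row in rows:
        digest = row.get("sha256")
        if digest and digest not in seen:
            seen.add(digest)
            fresh.append(row)
    # Persist the merged set so the next run sees today's hashes too.
    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

sha256_hex can also be used to verify that a binary fetched from kv_url matches the sha256 recorded in its dataset row.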

FAQ

Is this Actor legal to use? The Actor downloads images that are publicly accessible. You are responsible for ensuring your use case complies with the source site's Terms of Service and applicable copyright laws. Do not use the Actor to bypass access controls, scrape private content, or violate copyright.

Why does it work on any site? The Actor is generic. It fetches the URL you give it, parses the HTML for image tags, and downloads the images it finds. There is no per-site configuration.
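
For a sense of what generic discovery looks like, here is a minimal Python sketch that collects image URLs from <img src> and from og:image / twitter:image meta tags using only the standard library. The class ImageCollector is an illustrative assumption, not the Actor's code, and it leaves out srcset, <picture>, and lazy data-src handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect image URLs from <img src> and og:image / twitter:image tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []  # (absolute_url, channel) pairs in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self._add(a["src"], "img")
        elif tag == "meta" and a.get("content"):
            name = a.get("property") or a.get("name") or ""
            if name in ("og:image", "twitter:image"):
                self._add(a["content"], name)

    def _add(self, url, channel):
        # Resolve relative URLs against the page URL.
        self.found.append((urljoin(self.base_url, url), channel))


html = """
<html><head>
<meta property="og:image" content="https://cdn.example.com/hero.jpg">
</head><body><img src="/img/a.png" alt=""></body></html>
"""
collector = ImageCollector("https://example.com/gallery")
collector.feed(html)
print(collector.found)
```

Because the parser only sees the HTML as served, images injected later by JavaScript never reach it.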

Does it execute JavaScript? No. Single-page apps that render images via React/Vue hydration will return an empty image list. If your target site is a SPA, use a Playwright-based scraper first to get the image URLs, then pass them to this Actor with mode: 'direct'.

Do I need a proxy? Usually not. Most public sites serve images to any client, so the default useApifyProxy: false is enough. If your source site is hotlink-protected, opt in to a residential proxy through the proxyConfiguration field.

What is the largest image it can handle? Sharp auto-streams, so peak memory is around 5x the size of the largest single image. A 50 MB image is fine. A 500 MB image may cause memory pressure on smaller container sizes.

Does the EXIF strip work on PNG or WebP? No, EXIF strip is JPEG-only. PNG metadata stripping is a v2 feature.

How does the free trial work? Every new Apify user gets $5 of platform credit. That is enough to run this Actor many times. The first 50 results of every run are free, so you can evaluate the data quality before spending anything.

Can I get a single ZIP of all images? Yes. Set outputFormat: ['dataset', 'kv-store', 'zip']. The ZIP is written to OUT-images.zip and is also linked in the dataset summary.

Can I push directly to S3? Yes. Set outputFormat: ['dataset', 's3'], fill in s3Bucket, and set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION as Apify Secrets. Each image uploads to s3://{bucket}/images/{filename}.

Can I get a webhook on completion? Yes. Set outputFormat: ['dataset', 'webhook'] and fill in webhookUrl. The Actor POSTs a JSON summary with run stats (counts, errors, total size) to the URL when the run finishes.
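
On the receiving side, a webhook endpoint only needs to accept a JSON POST. The sketch below uses Python's standard http.server. The exact payload schema is not documented here beyond "counts, errors, total size", so the handler simply stores whatever JSON object arrives; the names parse_summary, SummaryHandler, and make_server are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_summary(body: bytes) -> dict:
    """Decode a JSON summary body; raise ValueError if it is not a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("summary must be a JSON object")
    return payload


class SummaryHandler(BaseHTTPRequestHandler):
    """Accept a POSTed run summary and keep the latest one in memory."""

    latest = None  # last parsed payload, shared at class level

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            SummaryHandler.latest = parse_summary(self.rfile.read(length))
            status = 204  # accepted, nothing to return
        except ValueError:
            status = 400  # body was not a valid JSON object
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def make_server(port=8080):
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), SummaryHandler)
```

To try it locally, call make_server().serve_forever() and point webhookUrl at a publicly reachable address that forwards to this port.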


Disclaimers and support

  • Disclaimer: This Actor retrieves publicly accessible images. Make sure your usage complies with the source site's terms of service and applicable copyright laws. The Actor is a generic utility and does not bypass authentication, paywalls, or access controls.
  • Support: Open an issue from the Issues tab for bug reports or feature requests. Custom scrapers and integration help are available on request.