Bulk Image Downloader

Download every image from any webpage or direct image URL - at scale. Smart srcset handling picks the highest-resolution variant. Optional SHA-256 dedup, EXIF stripping for privacy, and minimum size/width filters.

Pricing: from $0.70 / 1,000 results
Rating: 0.0 (0 reviews)
Developer: Thirdwatch (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 4 days ago


Save every image from any webpage at scale — full files, dimensions, format, file size, and SHA-256 hash. Paste page URLs or direct image links.

What you get

Give it a list of URLs and get back every image plus rich metadata: width, height, format, content type, byte size, and a SHA-256 hash for dedup. The actor handles both modes — point it at a webpage and it parses the HTML for <img>, <picture>, srcset, and Open Graph tags; or pass a direct image URL and it downloads it straight. Auto-mode picks the right path per URL by inspecting Content-Type. Highest-resolution variants from srcset are picked automatically. Files are saved to the run's key-value store and each dataset row carries a ready-to-use download URL.
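The "highest-resolution variant" selection can be sketched roughly as follows. This is a simplified illustration of the idea, not the actor's actual source; the function name and the decision to treat density descriptors and bare URLs as width 0 are assumptions:

```python
import re

def pick_highest_res(srcset: str) -> str:
    """Pick the candidate with the largest width descriptor from a srcset.

    Candidates look like "img-480.jpg 480w, img-1280.jpg 1280w".
    Density descriptors (e.g. "2x") and bare URLs count as width 0,
    so an explicit width descriptor always wins.
    """
    best_url, best_width = "", -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url, width = parts[0], 0
        if len(parts) > 1:
            m = re.fullmatch(r"(\d+)w", parts[1])
            if m:
                width = int(m.group(1))
        if width > best_width:
            best_url, best_width = url, width
    return best_url

print(pick_highest_res("cat-480.jpg 480w, cat-1280.jpg 1280w, cat-800.jpg 800w"))
# cat-1280.jpg
```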

Bulk download images from any URL

A single tool for collecting images at scale across hundreds or thousands of pages. No browser, no JavaScript rendering — fast HTTP fetch with up to 10 concurrent connections per run. Supports webpages, direct image URLs, responsive srcset images, and Open Graph / Twitter card images.

Image dataset builder for ML

Built for teams assembling training corpora. Use dedupByHash to skip identical images across pages, minWidth and minSizeBytes to filter out tracking pixels and tiny thumbnails, and maxImagesPerUrl to keep first runs cheap. Every image lands in the run's key-value store with a stable kv_key and a direct download URL on each dataset row, so you can pipe results straight into a downstream training pipeline.
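Conceptually, dedupByHash and minSizeBytes act as a filter over the downloaded payloads. A minimal sketch of that behavior, assuming in-memory byte strings (the actor itself streams to the key-value store; filter_images is a hypothetical helper):

```python
import hashlib

def filter_images(blobs, min_size_bytes=0):
    """Yield payloads that pass the size filter and are not duplicates.

    Mirrors the described semantics: skip anything smaller than
    min_size_bytes (tracking pixels, tiny thumbnails), then skip any
    payload whose SHA-256 digest was already seen in this run.
    """
    seen = set()
    for blob in blobs:
        if len(blob) < min_size_bytes:
            continue
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield blob

# One exact duplicate and one 1-byte "tracking pixel" get filtered out.
images = [b"A" * 10, b"A" * 10, b"B" * 10, b"x"]
kept = list(filter_images(images, min_size_bytes=5))
print(len(kept))  # 2
```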

Output fields

Field          Description
source_url     The input URL the image was discovered on
image_url      Absolute URL of the image file
kv_store_key   Key under which the image is stored in the run's key-value store
kv_url         Fully formed Apify API URL to download the image
filename       Suggested filename (sequence number + short hash + extension)
content_type   HTTP Content-Type (e.g., image/jpeg, image/png, image/webp)
size_bytes     File size in bytes
sha256         SHA-256 hash of the image bytes (used for dedup)
width          Image width in pixels
height         Image height in pixels
format         Decoded image format (JPEG, PNG, WEBP, GIF, etc.)
from_srcset    true if extracted from a responsive srcset
downloaded_at  ISO-8601 timestamp when the image was fetched

Example output

```json
{
  "source_url": "https://en.wikipedia.org/wiki/Cat",
  "image_url": "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg",
  "kv_store_key": "000007_3a1f9b2c.jpg",
  "kv_url": "https://api.apify.com/v2/key-value-stores/abc123/records/000007_3a1f9b2c.jpg",
  "filename": "000007_3a1f9b2c.jpg",
  "content_type": "image/jpeg",
  "size_bytes": 432109,
  "sha256": "3a1f9b2c4d5e6f7081929394a5b6c7d8e9f0a1b2c3d4e5f60718293a4b5c6d7e",
  "width": 1280,
  "height": 853,
  "format": "JPEG",
  "from_srcset": true,
  "downloaded_at": "2026-05-04T10:11:12.345678+00:00"
}
```
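Because each row carries both size_bytes and sha256, you can sanity-check any file you pull down against its dataset row. A minimal sketch, assuming the row dict mirrors the example output above (verify_image is a hypothetical helper, not part of the actor):

```python
import hashlib

def verify_image(row: dict, data: bytes) -> bool:
    """Check downloaded bytes against a dataset row's recorded metadata."""
    return (
        len(data) == row["size_bytes"]
        and hashlib.sha256(data).hexdigest() == row["sha256"]
    )

# Synthetic payload standing in for bytes fetched from kv_url.
payload = b"\xff\xd8\xff example jpeg bytes"
row = {
    "size_bytes": len(payload),
    "sha256": hashlib.sha256(payload).hexdigest(),
}
print(verify_image(row, payload))  # True
```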

Input parameters

Parameter           Required  Description
urls                Yes       List of webpage URLs to scan for images, or direct image URLs to download.
mode                No        auto (default), page, or direct. Auto inspects Content-Type; page parses HTML; direct treats every URL as an image.
includeSrcset       No        Pull images from responsive srcset and <picture> elements. Default true.
minWidth            No        Skip images smaller than this width when dimensions are known. Default 0 (disabled).
minSizeBytes        No        Skip files smaller than this many bytes (filters tracking pixels). Default 0 (disabled).
maxImagesPerUrl     No        Cap on images downloaded per input URL. Default 100, max 1000.
dedupByHash         No        Skip duplicates whose SHA-256 hash matches one already downloaded. Default true.
stripExif           No        Re-encode JPEGs without EXIF metadata for privacy. Default false.
proxyConfiguration  No        Optional proxy. Most public sites do not require one.
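A typical run input combining these parameters might look like the following. The values are illustrative only; the field names are those from the table above:

```json
{
  "urls": ["https://en.wikipedia.org/wiki/Cat"],
  "mode": "auto",
  "includeSrcset": true,
  "minWidth": 200,
  "minSizeBytes": 10240,
  "maxImagesPerUrl": 20,
  "dedupByHash": true,
  "stripExif": false
}
```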

Use cases

  • ML engineers: build image datasets for training and fine-tuning by harvesting category, search, or gallery pages.
  • Site auditors: measure image bloat across a domain — count files, total bytes, and average size per page.
  • E-commerce teams: scrape product image catalogs for migration, redesign, or competitive research.
  • Archivists: snapshot every image from a portfolio, gallery, or news site for offline backup.
  • Privacy-conscious publishers: strip EXIF metadata from user-submitted images before re-publishing.

Limitations

  • stripExif only re-encodes JPEGs (the most common EXIF carrier). Re-encoding is light-touch but technically lossy.
  • Inline data: URIs (base64-embedded images) are skipped; only fetchable HTTP(S) URLs are downloaded.
  • Sites that gate images behind a login or short-lived signed URLs cannot be scraped.
  • Maximum image count per page is capped at 1,000 to protect runtime; use maxImagesPerUrl to keep first runs cheap.
  • The actor uses HTTP fetch only — pages that render images via JavaScript after load (some SPAs) may not expose all images.

Compared to alternatives

  • vs. generic Apify image downloaders: this actor returns full per-image metadata (width, height, format, hash, byte size) on the dataset row, plus a ready-to-use kv_url so you don't have to construct download URLs yourself. Built-in dedup, srcset handling, and EXIF stripping save a separate post-processing pass.
  • vs. running wget or a custom script: hosted, retried, deduplicated, and observable in the Apify Console with no infrastructure to maintain.

FAQ

Where are the images stored? Every image is saved as a record in the run's default key-value store. Each dataset row carries a kv_url you can fetch directly, or you can browse the run's "Storage" tab in the Apify Console.

Can I download images directly without scraping a webpage? Yes — set mode: "direct", or leave mode: "auto" and pass image URLs. The actor will skip HTML parsing and download each URL as an image.

Will it follow links to other pages? No. The actor only downloads images from the URLs you provide. To crawl, run a sitemap or link-extraction scraper first and feed the URLs in.

Does it handle responsive images correctly? Yes. With includeSrcset: true (default), the highest-resolution variant from each srcset or <picture> element is selected.

Does it dedupe across pages? Yes. With dedupByHash: true (default), an image already downloaded in this run is skipped on later pages.

How do I keep my first run cheap? Lower maxImagesPerUrl (e.g., 10 or 20) and run on a handful of URLs first to validate output before scaling.

Last verified: 2026-05

More scrapers at thirdwatch.dev.