Bulk Image Downloader
Pricing
from $0.70 / 1,000 results
Download every image from any webpage or direct image URL, at scale. Smart srcset handling picks the highest-resolution variant. Optional SHA-256 dedup, EXIF stripping for privacy, and minimum size/width filters.
Developer
Thirdwatch
Save every image from any webpage at scale — full files, dimensions, format, file size, and SHA-256 hash. Paste page URLs or direct image links.
What you get
Give it a list of URLs and get back every image plus rich metadata: width, height, format, content type, byte size, and a SHA-256 hash for dedup. The actor handles both modes — point it at a webpage and it parses the HTML for <img>, <picture>, srcset, and Open Graph tags; or pass a direct image URL and it downloads it straight. Auto-mode picks the right path per URL by inspecting Content-Type. Highest-resolution variants from srcset are picked automatically. Files are saved to the run's key-value store and each dataset row carries a ready-to-use download URL.
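The srcset selection described above can be sketched roughly as follows. This is an illustration of the idea, not the actor's actual source; `pick_largest_from_srcset` is a hypothetical helper name.

```python
# Sketch: reduce a srcset attribute to its highest-resolution candidate.
# Candidates without a width descriptor (e.g. "2x") are treated as width 0,
# so an explicit width always wins.

def pick_largest_from_srcset(srcset: str) -> str:
    """Return the candidate URL with the largest 'w' width descriptor."""
    best_url, best_width = "", -1
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url = parts[0]
        width = 0
        if len(parts) > 1 and parts[1].endswith("w"):
            try:
                width = int(parts[1][:-1])
            except ValueError:
                width = 0
        if width > best_width:
            best_url, best_width = url, width
    return best_url

srcset = "cat-320.jpg 320w, cat-640.jpg 640w, cat-1280.jpg 1280w"
print(pick_largest_from_srcset(srcset))  # cat-1280.jpg
```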
Bulk download images from any URL
A single tool for collecting images at scale across hundreds or thousands of pages. No browser, no JavaScript rendering — fast HTTP fetch with up to 10 concurrent connections per run. Supports webpages, direct image URLs, responsive srcset images, and Open Graph / Twitter card images.
Image dataset builder for ML
Built for teams assembling training corpora. Use dedupByHash to skip identical images across pages, minWidth and minSizeBytes to filter out tracking pixels and tiny thumbnails, and maxImagesPerUrl to keep first runs cheap. Every image lands in the run's key-value store with a stable kv_key and a direct download URL on each dataset row, so you can pipe results straight into a downstream training pipeline.
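The same dedup-and-filter logic can be reproduced client-side when post-processing dataset rows, for example to re-filter with stricter thresholds after a run. A minimal sketch, with made-up row data following the output fields listed below:

```python
# Client-side sketch of the dedupByHash / minWidth / minSizeBytes idea,
# applied to already-fetched dataset rows. Illustrative only.

def filter_rows(rows, min_width=0, min_size_bytes=0):
    """Yield rows passing the size filters, skipping repeated sha256 hashes."""
    seen = set()
    for row in rows:
        if row["width"] < min_width or row["size_bytes"] < min_size_bytes:
            continue  # tracking pixel / tiny thumbnail
        if row["sha256"] in seen:
            continue  # exact duplicate already kept
        seen.add(row["sha256"])
        yield row

rows = [
    {"image_url": "a.jpg", "width": 1280, "size_bytes": 40000, "sha256": "aa"},
    {"image_url": "pixel.gif", "width": 1, "size_bytes": 43, "sha256": "bb"},
    {"image_url": "a-copy.jpg", "width": 1280, "size_bytes": 40000, "sha256": "aa"},
]
kept = list(filter_rows(rows, min_width=100, min_size_bytes=1024))
print([r["image_url"] for r in kept])  # ['a.jpg']
```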
Output fields
| Field | Description |
|---|---|
| `source_url` | The input URL the image was discovered on |
| `image_url` | Absolute URL of the image file |
| `kv_store_key` | Key under which the image is stored in the run's key-value store |
| `kv_url` | Fully formed Apify API URL to download the image |
| `filename` | Suggested filename (sequence number + short hash + extension) |
| `content_type` | HTTP Content-Type (e.g., image/jpeg, image/png, image/webp) |
| `size_bytes` | File size in bytes |
| `sha256` | SHA-256 hash of the image bytes (used for dedup) |
| `width` | Image width in pixels |
| `height` | Image height in pixels |
| `format` | Decoded image format (JPEG, PNG, WEBP, GIF, etc.) |
| `from_srcset` | true if extracted from a responsive srcset |
| `downloaded_at` | ISO-8601 timestamp when the image was fetched |
Example output
```json
{
  "source_url": "https://en.wikipedia.org/wiki/Cat",
  "image_url": "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg",
  "kv_store_key": "000007_3a1f9b2c.jpg",
  "kv_url": "https://api.apify.com/v2/key-value-stores/abc123/records/000007_3a1f9b2c.jpg",
  "filename": "000007_3a1f9b2c.jpg",
  "content_type": "image/jpeg",
  "size_bytes": 432109,
  "sha256": "3a1f9b2c4d5e6f7081929394a5b6c7d8e9f0a1b2c3d4e5f60718293a4b5c6d7e",
  "width": 1280,
  "height": 853,
  "format": "JPEG",
  "from_srcset": true,
  "downloaded_at": "2026-05-04T10:11:12.345678+00:00"
}
```
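Because each row carries a sha256 of the image bytes, downloads can be integrity-checked after the fact. A small sketch using Python's standard library (the bytes here are synthetic, not a real image):

```python
import hashlib

def verify_image(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded image bytes against the sha256 recorded on the row."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

data = b"fake image bytes"
row_hash = hashlib.sha256(data).hexdigest()
print(verify_image(data, row_hash))          # True
print(verify_image(b"tampered", row_hash))   # False
```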
Input parameters
| Parameter | Required | Description |
|---|---|---|
| `urls` | Yes | List of webpage URLs to scan for images, or direct image URLs to download. |
| `mode` | No | `auto` (default), `page`, or `direct`. Auto inspects Content-Type; page parses HTML; direct treats every URL as an image. |
| `includeSrcset` | No | Pull images from responsive srcset and <picture> elements. Default true. |
| `minWidth` | No | Skip images smaller than this width when dimensions are known. Default 0 (disabled). |
| `minSizeBytes` | No | Skip files smaller than this many bytes (filters tracking pixels). Default 0 (disabled). |
| `maxImagesPerUrl` | No | Cap on images downloaded per input URL. Default 100, max 1000. |
| `dedupByHash` | No | Skip duplicates whose SHA-256 hash matches one already downloaded. Default true. |
| `stripExif` | No | Re-encode JPEGs without EXIF metadata for privacy. Default false. |
| `proxyConfiguration` | No | Optional proxy. Most public sites do not require one. |
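Putting the parameters together, a typical run input might look like this (values are illustrative, not recommendations):

```json
{
  "urls": [
    "https://en.wikipedia.org/wiki/Cat",
    "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"
  ],
  "mode": "auto",
  "includeSrcset": true,
  "minWidth": 200,
  "minSizeBytes": 2048,
  "maxImagesPerUrl": 50,
  "dedupByHash": true,
  "stripExif": false
}
```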
Use cases
- ML engineers: build image datasets for training and fine-tuning by harvesting category, search, or gallery pages.
- Site auditors: measure image bloat across a domain — count files, total bytes, and average size per page.
- E-commerce teams: scrape product image catalogs for migration, redesign, or competitive research.
- Archivists: snapshot every image from a portfolio, gallery, or news site for offline backup.
- Privacy-conscious publishers: strip EXIF metadata from user-submitted images before re-publishing.
Limitations
- `stripExif` only re-encodes JPEGs (the most common EXIF carrier). Re-encoding is light-touch but technically lossy.
- Inline `data:` URIs are skipped; only fetchable URLs are downloaded.
- Sites that gate images behind a login or short-lived signed URLs cannot be scraped.
- Maximum image count per page is capped at 1,000 to protect runtime; use `maxImagesPerUrl` to keep first runs cheap.
- The actor uses HTTP fetch only; pages that render images via JavaScript after load (some SPAs) may not expose all images.
Compared to alternatives
- vs. generic Apify image downloaders: this actor returns full per-image metadata (width, height, format, hash, byte size) on the dataset row, plus a ready-to-use `kv_url` so you don't have to construct download URLs yourself. Built-in dedup, srcset handling, and EXIF stripping save a separate post-processing pass.
- vs. running `wget` or a custom script: hosted, retried, deduplicated, and observable in the Apify Console with no infrastructure to maintain.
FAQ
Where are the images stored?
Every image is saved as a record in the run's default key-value store. Each dataset row carries a kv_url you can fetch directly, or you can browse the run's "Storage" tab in the Apify Console.
Can I download images directly without scraping a webpage?
Yes — set mode: "direct", or leave mode: "auto" and pass image URLs. The actor will skip HTML parsing and download each URL as an image.
Will it follow links to other pages?
No. The actor only downloads images from the URLs you provide. To crawl, run a sitemap or link-extraction scraper first and feed the URLs in.
Does it handle responsive images correctly?
Yes. With includeSrcset: true (default), the highest-resolution variant from each srcset or <picture> element is selected.
Does it dedupe across pages?
Yes. With dedupByHash: true (default), an image already downloaded in this run is skipped on later pages.
How do I keep my first run cheap?
Lower maxImagesPerUrl (e.g., 10 or 20) and run on a handful of URLs first to validate output before scaling.
Last verified: 2026-05
More scrapers at thirdwatch.dev.