# Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP (`getascraper/bulk-image-downloader`) Actor

Download every image from any webpage or direct image URL. Smart srcset picks the highest-resolution variant. 22 metadata fields per image: width, height, format, SHA-256, dedup flag, EXIF, provenance. ZIP and S3 outputs, webhooks, MCP-ready. $2.00 per 1k.

- **URL**: https://apify.com/getascraper/bulk-image-downloader.md
- **Developed by:** [GetAScraper](https://apify.com/getascraper) (community)
- **Categories:** Automation, Developer tools, Other
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 images

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP

**22 metadata fields per image, SHA-256 content hash, optional EXIF strip and WebP-to-PNG, ZIP and S3 outputs. $2.00 per 1,000 results. 70% cheaper than the top Store alternative.** Download every image from any webpage or direct image URL in one call. 50 images per run are free.

This Actor is a generic image downloader. It works on any public URL. Pass it a list of webpages and it discovers every image via HTML `<img>`, `<picture>`, `srcset`, `og:image`, and `twitter:image`. Pass it a list of direct image URLs and it downloads them straight. It picks the highest-resolution variant from any `srcset` automatically, hashes every image body with SHA-256 for dedup, and strips EXIF or converts WebP to PNG on demand. Results are exported as a structured dataset, ZIP archive, or S3 upload. By default a run processes up to 10,000 URLs (`maxUrls`) with up to 10 concurrent downloads (`maxConcurrency`).
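
To make the `srcset` behaviour concrete, here is a minimal sketch of how the highest-resolution candidate can be chosen. It is an illustration only, not the Actor's actual code: the function name `pick_largest_srcset_candidate` is hypothetical, the parser splits naively on commas (so URLs that contain commas are not handled), and it assumes a single `srcset` does not mix `w` and `x` descriptors.

```python
from typing import Optional
from urllib.parse import urljoin


def pick_largest_srcset_candidate(srcset: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the candidate with the largest descriptor."""
    best_url, best_score = None, -1.0
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url = parts[0]
        # A missing descriptor is treated as "1x", as browsers do.
        descriptor = parts[1] if len(parts) > 1 else "1x"
        try:
            # "800w" is a width in pixels, "2x" a pixel-density multiplier.
            score = float(descriptor[:-1])
        except ValueError:
            continue  # skip descriptors this sketch does not understand
        if score > best_score:
            best_url, best_score = url, score
    # Resolve relative candidates against the page URL.
    return urljoin(base_url, best_url) if best_url else None


print(pick_largest_srcset_candidate(
    "a.jpg 400w, b.jpg 800w, c.jpg 1600w", "https://example.com/gallery/"
))
```

A production parser would follow the HTML specification's `srcset` grammar rather than splitting on commas.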

---

### What can you do with it?

- **You are building an AI training dataset.** Pull thousands of product photos, real estate shots, or stock images for CLIP, DINOv2, or SigLIP. Auto-hash for dedup means you never train on the same image twice.
- **You are a scraper developer.** Hand the Actor a list of image URLs returned by your catalog scraper (REI, IndiaMART, eBay, Poshmark) and get back a ZIP of the binaries plus a clean metadata dataset. One Actor replaces three.
- **You are an e-commerce operator.** Mirror product image catalogs. Detect when a competitor swaps an image. Track pricing-page visual changes over time.
- **You are an archivist or newsroom tool.** Grab every image from a story page in one call. Use the per-URL ZIP mode to keep sources separated.
- **You are a research analyst.** Pull the full visual corpus of any public site for content analysis, brand tracking, or visual trend reports.
- **You are a builder integrating via webhook.** The Actor POSTs a JSON summary on completion. Pipe the dataset URL into your BigQuery, Sheets, or n8n pipeline.

---

### How to use it

1.  **Open the Actor** in the Apify Store and click "Try for free".
2.  **Paste your URLs.** Mix webpages (the Actor parses the HTML) and direct image links (it downloads straight) in a single list.
3.  **Pick your options.** Turn on SHA-256 dedup, EXIF strip, format conversion, or ZIP output as needed.
4.  **Click Start.** The Actor fetches each URL, discovers or downloads the images, and pushes metadata to the dataset and binaries to the key-value store.
5.  **Download your results.** Pull the dataset as JSON, CSV, or Excel. Grab the image binaries from the key-value store (links in the dataset's `kv_url` column). Or use the single-click ZIP download.

---

### Input

| Field | Type | Required | Description |
| --- | --- | :---: | --- |
| `urls` | array | **Yes** | List of URLs. Each can be a webpage (HTML is parsed for images) or a direct image link. Mix freely. |
| `mode` | enum | No | `auto` (recommended, detects by extension), `page` (force HTML parse), or `direct` (force image URL). |
| `includeSrcset` | boolean | No | Discover images from `srcset`, `<picture><source>`, and lazy `data-src`. Default `true`. |
| `includeOgTags` | boolean | No | Discover Open Graph and Twitter Card images. Default `true`. |
| `minWidth` | integer | No | Skip images narrower than this. Default 0. |
| `minHeight` | integer | No | Skip images shorter than this. Default 0. |
| `minSizeBytes` | integer | No | Skip images smaller than this. Filters tracking pixels. Default 0. |
| `maxImagesPerUrl` | integer | No | Cap images per source URL. Default 1000. |
| `maxUrls` | integer | No | Cap total URLs processed. Default 10000. |
| `dedupByHash` | boolean | No | Compute SHA-256 of each image body and skip duplicates. Default `true`. |
| `stripExif` | boolean | No | Re-encode JPEGs without EXIF metadata. Default `false`. |
| `convertFormat` | enum | No | `none`, `webp-to-png`, or `png-to-jpg`. Default `none`. |
| `filenamePattern` | string | No | Templated filename using `{slug}`, `{hash}`, `{ext}`, `{idx}`, `{source}`. Default `{slug}-{hash}.{ext}`. |
| `outputFormat` | array | No | `dataset` (always), `kv-store` (binaries), `zip` (single archive), `zipPerUrl` (one ZIP per source), `s3` (upload to bucket), `webhook` (POST summary on completion). |
| `s3Bucket` | string | No | Required when `outputFormat` includes `s3`. Uses standard `AWS_*` env vars for credentials. |
| `webhookUrl` | string | No | URL to receive a JSON run summary on completion. |
| `maxConcurrency` | integer | No | Max parallel image downloads. Default 10. |
| `downloadTimeoutMs` | integer | No | Per-image fetch timeout. Default 15000. |
| `imageCheckMaxRetries` | integer | No | Retries per failed image. Default 3. |
| `proxyConfiguration` | object | No | Optional proxy. Default off. Use residential if source sites are hotlink-protected. |
| `failFast` | boolean | No | Stop on first error. Default `false`. |
| `debugLogging` | boolean | No | Verbose per-image tracing. Default `false`. |

---

### Output

The Actor pushes one row to the dataset per downloaded image. Binaries are written to the default key-value store under `IMG-{filename}`. Use the dataset's `kv_url` column to download each binary.

```json
{
  "filename": "picsum-photos-800x600-a1b2c3d4e5f67890.jpg",
  "source_url": "https://example.com/gallery",
  "image_url": "https://picsum.photos/800/600.jpg",
  "kv_store_key": "IMG-picsum-photos-800x600-a1b2c3d4e5f67890.jpg",
  "kv_url": "https://api.apify.com/v2/key-value-stores/abc/records/IMG-picsum-photos-800x600-a1b2c3d4e5f67890.jpg",
  "content_type": "image/jpeg",
  "size_bytes": 54321,
  "width": 800,
  "height": 600,
  "format": "jpeg",
  "sha256": "a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456",
  "is_duplicate": false,
  "exif_stripped": false,
  "from_srcset": true,
  "from_picture_source": false,
  "from_og_tag": false,
  "from_twitter_tag": false,
  "from_data_attr": false,
  "from_direct_url": false,
  "downloaded_at": "2026-06-20T12:34:56.000Z",
  "duration_ms": 423,
  "http_status": 200,
  "error": null
}
````

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

***

### Output data fields

| Field | Description |
| --- | --- |
| `filename` | Final filename (per `filenamePattern`). |
| `source_url` | The page URL the image was discovered on (or its direct URL). |
| `image_url` | Final resolved image URL (after srcset expansion, redirects). |
| `kv_store_key` | Key in the run's key-value store (`IMG-...`). |
| `kv_url` | Signed download URL for the binary (24-hour default). |
| `content_type` | MIME type (e.g. `image/jpeg`, `image/webp`). |
| `size_bytes` | Downloaded size. |
| `width` | Image width in pixels (from sharp metadata). |
| `height` | Image height in pixels (from sharp metadata). |
| `format` | Normalized format: `jpeg`, `png`, `webp`, `gif`, `svg`, `avif`, `bmp`, `ico`, `other`. |
| `sha256` | Content hash (when `dedupByHash=true`). |
| `is_duplicate` | True if hash matched a previously-seen image in this run. |
| `exif_stripped` | True if JPEG was re-encoded to remove EXIF. |
| `from_srcset` | True if discovered via `srcset` / `picture` / `data-srcset`. |
| `from_picture_source` | True if discovered via `<picture><source>`. |
| `from_og_tag` | True if discovered via `<meta og:image>`. |
| `from_twitter_tag` | True if discovered via `<meta twitter:image>`. |
| `from_data_attr` | True if discovered via lazy `data-src` / `data-srcset`. |
| `from_direct_url` | True if the URL was treated as a direct image (mode=direct/auto). |
| `downloaded_at` | ISO timestamp of the download. |
| `duration_ms` | Time to fetch + process. |
| `http_status` | HTTP response code (0 on network error). |
| `error` | Per-image error string (`404`, `timeout`, `below-min-size-N`, etc.) or `null`. |
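
As a rough sketch of how these fields can be used downstream, the helper below keeps only rows that downloaded cleanly and are not duplicates, and returns the filename with its `kv_url`. The function name `downloadable_rows` and the sample rows are hypothetical. Whether duplicate or failed rows carry a `kv_url` at all is not stated above, so the sketch checks for it defensively.

```python
def downloadable_rows(items):
    """Yield (filename, kv_url) for rows that succeeded and are not duplicates."""
    for item in items:
        if item.get("error") is not None:
            continue  # per-image failure such as "404" or "timeout"
        if item.get("is_duplicate"):
            continue  # same bytes already seen earlier
        if not item.get("kv_url"):
            continue  # nothing to download for this row
        yield item["filename"], item["kv_url"]


# Hypothetical rows shaped like the dataset items described above.
sample = [
    {"filename": "a.jpg", "kv_url": "https://example.com/a", "is_duplicate": False, "error": None},
    {"filename": "b.jpg", "kv_url": "https://example.com/b", "is_duplicate": True, "error": None},
    {"filename": "c.jpg", "kv_url": None, "is_duplicate": False, "error": "404"},
]
print(list(downloadable_rows(sample)))
```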

***

### Pricing

**$2.00 per 1,000 results.** The first 50 results of every run are free. There is no monthly fee and no proxy surcharge.

| Volume | What you pay |
| :--- | :---: |
| 50 images (free trial) | $0.00 |
| 1,000 images | $2.00 |
| 10,000 images | $20.00 |
| 100,000 images | $200.00 |

For comparison, the next-most-popular bulk image downloader on the Store (`onescales/bulk-image-downloader`) charges $7.00 per 1,000 URLs and only ships image bytes (no width, no height, no hash, no format). We charge 70% less and ship the richest schema in the field.

For scheduled or standby runs, pricing drops to **$1.00 per 1,000 results** (50% off). Volume runs of more than 50,000 images are eligible for **$1.50 per 1,000**.
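
The arithmetic behind these figures can be sketched as follows. The function `estimate_cost_usd` is hypothetical. The table above quotes the list price without deducting the 50 free results, so `free_per_run` defaults to 0; how the free allowance and the discounted rates combine in actual billing is an assumption here, not something this page specifies.

```python
def estimate_cost_usd(images: int, price_per_1k: float = 2.00, free_per_run: int = 0) -> float:
    """Estimate the charge for one run, in USD, rounded to cents."""
    billable = max(images - free_per_run, 0)
    return round(billable * price_per_1k / 1000, 2)


print(estimate_cost_usd(10_000))                       # list price for 10,000 images
print(estimate_cost_usd(1_000, free_per_run=50))       # assuming the first 50 are deducted
print(estimate_cost_usd(100_000, price_per_1k=1.50))   # assuming the volume rate applies
```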

***

### Tips and advanced options

- **Set `includeSrcset` to false** if you only want the page's primary images. This skips lazy `data-src` and responsive variants, which is faster on heavy pages.
- **Use `minSizeBytes` to filter tracking pixels.** A typical tracking pixel is under 1KB. Set `minSizeBytes: 2000` to skip them.
- **Use `minWidth` and `minHeight` to focus on useful images.** Set `minWidth: 400` to skip thumbnails and avatars.
- **Pick the right output mode.** `zip` for a single archive, `zipPerUrl` to keep source pages separated, `s3` to push directly to your training bucket.
- **Pair with a catalog scraper.** Run one of our catalog scrapers (REI, IndiaMART, eBay) first, then feed the image URLs to this Actor for a complete e-commerce dataset.
- **Schedule weekly runs** to refresh your image corpus. Most product catalogs update slowly; daily is overkill.
- **Use SHA-256 dedup across runs.** Hashes are stable, so a daily run that re-discovers the same images will mark them as `is_duplicate: true` and skip the KV write.
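
The idea behind hash-based dedup can be sketched in a few lines with Python's standard `hashlib`. This is an illustration, not the Actor's implementation: `dedup_by_sha256` is a hypothetical name, and the `seen` set stands in for whatever store holds previously seen hashes (how the Actor persists them is not documented here).

```python
import hashlib


def dedup_by_sha256(bodies, seen=None):
    """Return (sha256_hex, is_duplicate) for each image body, in order."""
    seen = set() if seen is None else seen
    results = []
    for body in bodies:
        digest = hashlib.sha256(body).hexdigest()
        # Identical bytes always produce the same digest, so a repeat is a duplicate.
        results.append((digest, digest in seen))
        seen.add(digest)
    return results


rows = dedup_by_sha256([b"image-one", b"image-two", b"image-one"])
print([is_dup for _, is_dup in rows])
```

Passing the same `seen` set into later calls extends the check beyond a single batch.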

***

### FAQ

**Is this Actor legal to use?**
The Actor downloads images that are publicly accessible. You are responsible for ensuring your use case complies with the source site's Terms of Service and applicable copyright laws. Do not use the Actor to bypass access controls, scrape private content, or violate copyright.

**Why does it work on any site?**
The Actor is generic. It fetches the URL you give it, parses the HTML for image tags, and downloads the images it finds. There is no per-site configuration.

**Does it execute JavaScript?**
No. Single-page apps that render images via React/Vue hydration will return an empty image list. If your target site is a SPA, use a Playwright-based scraper first to get the image URLs, then pass them to this Actor with `mode: 'direct'`.
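
As a small sketch of that hand-off, the helper below turns a plain list of image URLs (for example, collected by a browser-based scraper) into an input object with `mode` set to `direct`. The name `build_direct_input` is hypothetical; the `{"url": ...}` item shape follows the input example in this document.

```python
def build_direct_input(image_urls):
    """Wrap plain URL strings in the {"url": ...} objects the input expects."""
    unique = list(dict.fromkeys(image_urls))  # drop repeats, keep first-seen order
    return {
        "urls": [{"url": u} for u in unique],
        "mode": "direct",
    }


print(build_direct_input([
    "https://picsum.photos/800/600.jpg",
    "https://picsum.photos/800/600.jpg",
]))
```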

**Do I need a proxy?**
No. Most public sites serve images to any client, so the default (`useApifyProxy: false` in `proxyConfiguration`) is usually enough. If your source site is hotlink-protected, opt in to a residential proxy via the `proxyConfiguration` field.

**What is the largest image it can handle?**
Sharp auto-streams, so peak memory is around 5x the size of the largest single image. A 50MB image is fine. A 500MB image may cause memory pressure on smaller container sizes.

**Does the EXIF strip work on PNG or WebP?**
No, EXIF strip is JPEG-only. PNG metadata stripping is a v2 feature.

**How does the free trial work?**
Every new Apify user gets $5 of platform credit. That is enough to run this Actor many times. The first 50 results of every run are free, so you can evaluate the data quality before spending anything.

**Can I get a single ZIP of all images?**
Yes. Set `outputFormat: ['dataset', 'kv-store', 'zip']`. The ZIP is written to `OUT-images.zip` and is also linked in the dataset summary.

**Can I push directly to S3?**
Yes. Set `outputFormat: ['dataset', 's3']`, fill in `s3Bucket`, and set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION` as Apify Secrets. Each image uploads to `s3://{bucket}/images/{filename}`.

**Can I get a webhook on completion?**
Yes. Set `outputFormat: ['dataset', 'webhook']` and fill in `webhookUrl`. The Actor POSTs a JSON summary with run stats (counts, errors, total size) to the URL when the run finishes.

***

### Disclaimers and support

- **Disclaimer**: This Actor retrieves publicly accessible images. Make sure your usage complies with the source site's terms of service and applicable copyright laws. The Actor is a generic utility and does not bypass authentication, paywalls, or access controls.
- **Support**: Open an issue from the [Issues tab](https://apify.com/getascraper/bulk-image-downloader/issues) for bug reports or feature requests. Custom scrapers and integration help are available on request.

# Actor input Schema

## `urls` (type: `array`):

List of URLs. Each can be a webpage (the Actor discovers all images on the page via HTML/srcset/og:image/twitter:image) or a direct image URL. Mixed lists allowed.

## `mode` (type: `string`):

'auto' detects by URL extension, 'page' forces HTML parsing, 'direct' treats every URL as an image.
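
A minimal sketch of extension-based detection, assuming the extension set below (derived from the formats listed under the `format` output field). The Actor's real list and rules may differ, and `classify_url` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed extension set; not taken from the Actor's source.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".svg", ".avif", ".bmp", ".ico"}


def classify_url(url: str) -> str:
    """Return "direct" if the URL path ends in an image extension, else "page"."""
    # Only the path is inspected, so query strings such as ?w=200 are ignored.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return "direct" if suffix in IMAGE_EXTENSIONS else "page"


for u in ("https://apify.com", "https://picsum.photos/800/600.jpg"):
    print(u, "->", classify_url(u))
```

Image URLs without a recognizable extension would be classified as pages by this rule, which is the case where forcing `direct` helps.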

## `includeSrcset` (type: `boolean`):

Discover images from <img srcset>, <picture><source>, and lazy-loaded data-src/data-srcset attributes. Picks the highest-resolution variant by default.

## `includeOgTags` (type: `boolean`):

Discover Open Graph and Twitter Card images from page <meta> tags. Often the best cover image.

## `minWidth` (type: `integer`):

Skip images narrower than this. 0 = no minimum. Filters tracking pixels and thumbnails.

## `minHeight` (type: `integer`):

Skip images shorter than this. 0 = no minimum.

## `minSizeBytes` (type: `integer`):

Skip images smaller than this. Filters tracking pixels and 1x1 placeholders. 0 = no minimum.
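
Taken together, the three minimum filters amount to a simple predicate. The sketch below shows one plausible reading, with a hypothetical `passes_min_filters` helper; the order in which the Actor applies the checks is an assumption.

```python
def passes_min_filters(width, height, size_bytes,
                       min_width=0, min_height=0, min_size_bytes=0):
    """Return True if an image meets every configured minimum (0 = no minimum)."""
    return (
        width >= min_width
        and height >= min_height
        and size_bytes >= min_size_bytes
    )


# A 1x1 tracking pixel of 43 bytes is dropped once a 2000-byte floor is set.
print(passes_min_filters(1, 1, 43, min_size_bytes=2000))
print(passes_min_filters(800, 600, 54321, min_width=400, min_size_bytes=2000))
```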

## `maxImagesPerUrl` (type: `integer`):

Safety cap on images discovered per source URL. Prevents runaway runs on heavy pages.

## `maxUrls` (type: `integer`):

Cap on total URLs processed.

## `dedupByHash` (type: `boolean`):

Compute SHA-256 of each image body and skip duplicates. Reports is\_duplicate=true on the dataset.

## `stripExif` (type: `boolean`):

Re-encode JPEGs without EXIF (camera model, GPS, timestamps). Privacy-friendly. Slightly lossy.

## `convertFormat` (type: `string`):

Optional format conversion. webp-to-png is the most common (broader compatibility).

## `filenamePattern` (type: `string`):

Templated filename. Tokens: {slug} (sanitized URL slug), {hash} (SHA-256 prefix), {ext} (extension), {idx} (index), {source} (source domain).
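
To illustrate how the tokens combine, here is a sketch that expands a pattern with Python's `str.format`. The helper `render_filename` is hypothetical, and the 16-character hash prefix is inferred from the sample filename in the Output section rather than documented.

```python
def render_filename(pattern, *, slug, sha256_hex, ext, idx, source, hash_len=16):
    """Expand {slug}, {hash}, {ext}, {idx}, and {source} in a filename pattern."""
    values = {
        "slug": slug,
        "hash": sha256_hex[:hash_len],  # prefix length is an assumption
        "ext": ext,
        "idx": idx,
        "source": source,
    }
    # An unknown token in the pattern raises KeyError in this sketch.
    return pattern.format(**values)


print(render_filename(
    "{slug}-{hash}.{ext}",
    slug="picsum-photos-800x600",
    sha256_hex="a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456",
    ext="jpg",
    idx=0,
    source="picsum.photos",
))
```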

## `outputFormat` (type: `array`):

Always emits the structured dataset. Optionally also: ZIP archive (single or per-URL), KV store binaries, S3 upload, or webhook on completion. Valid values: dataset, kv-store, zip, zipPerUrl, s3, webhook.

## `s3Bucket` (type: `string`):

Required when outputFormat includes 's3'. Use AWS\_ACCESS\_KEY\_ID, AWS\_SECRET\_ACCESS\_KEY, and AWS\_REGION env vars (set via Apify Secrets).

## `webhookUrl` (type: `string`):

POST a completion event with the run summary to this URL on run completion.

## `maxConcurrency` (type: `integer`):

Max parallel image downloads. Default 10.

## `downloadTimeoutMs` (type: `integer`):

Per-image fetch timeout in milliseconds.

## `imageCheckMaxRetries` (type: `integer`):

Retries for failed image downloads (5xx, timeouts, transient network errors).
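
The retry policy described here can be sketched as a loop that retries on 5xx responses and transient exceptions, and returns immediately on anything else. Everything below is illustrative: `fetch_with_retries`, the `fetch` callable returning `(status, body)`, and the exponential backoff schedule are assumptions, not the Actor's documented behaviour.

```python
import time


def fetch_with_retries(fetch, url, max_retries=3, base_delay_s=0.5, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying 5xx and transient errors."""
    attempt = 0
    while True:
        try:
            status, body = fetch(url)
        except (TimeoutError, ConnectionError) as exc:
            error = type(exc).__name__
        else:
            if status < 500:
                return status, body  # success or a non-retryable status such as 404
            error = f"http-{status}"
        if attempt >= max_retries:
            raise RuntimeError(f"{url} failed after {attempt + 1} attempts: {error}")
        sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        attempt += 1


# Usage with a stand-in fetcher that times out twice and then succeeds.
_calls = []

def _flaky(url):
    _calls.append(url)
    if len(_calls) < 3:
        raise TimeoutError("simulated timeout")
    return 200, b"image-bytes"

print(fetch_with_retries(_flaky, "https://example.com/a.jpg", sleep=lambda s: None))
```

Injecting `sleep` keeps the sketch testable without real delays.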

## `proxyConfiguration` (type: `object`):

OPTIONAL. Default is no proxy. Set residential proxy if source sites are hotlink-protected or geo-fenced.

## `failFast` (type: `boolean`):

Stop the run on the first error. Default false (continue and report per-image errors).

## `debugLogging` (type: `boolean`):

Print per-image tracing tags (FETCH, PARSE, HASH, WRITE) to run logs.

## Actor input object example

```json
{
  "urls": [
    {
      "url": "https://apify.com"
    },
    {
      "url": "https://picsum.photos/800/600.jpg"
    }
  ],
  "mode": "auto",
  "includeSrcset": true,
  "includeOgTags": true,
  "minWidth": 0,
  "minHeight": 0,
  "minSizeBytes": 0,
  "maxImagesPerUrl": 1000,
  "maxUrls": 10000,
  "dedupByHash": true,
  "stripExif": false,
  "convertFormat": "none",
  "filenamePattern": "{slug}-{hash}.{ext}",
  "outputFormat": [
    "dataset",
    "kv-store"
  ],
  "s3Bucket": "",
  "webhookUrl": "",
  "maxConcurrency": 10,
  "downloadTimeoutMs": 15000,
  "imageCheckMaxRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "failFast": false,
  "debugLogging": false
}
```

# Actor output Schema

## `results` (type: `string`):

One row per image with width, height, format, content hash, provenance (srcset/og:image/direct), and per-image error if any.

## `keyValueStore` (type: `string`):

Each image body stored under IMG-{filename}. Use kv\_url from each dataset row to download. 7-day default retention.

## `zip` (type: `string`):

All images packaged into a single ZIP. Keyed at OUT-images.zip.
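
Once the archive has been downloaded, its contents can be inspected with Python's standard `zipfile` module. The sketch below only lists entries, because the internal folder layout of the ZIP is not documented here; `list_zip_entries` is a hypothetical helper and the in-memory archive stands in for the real file.

```python
import io
import zipfile


def list_zip_entries(zip_bytes: bytes):
    """Return (name, uncompressed_size) for every file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


# Build a tiny stand-in archive in memory to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("example-a.jpg", b"\xff\xd8\xff")
    demo.writestr("example-b.png", b"\x89PNG")
print(list_zip_entries(buffer.getvalue()))
```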

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        {
            "url": "https://apify.com"
        },
        {
            "url": "https://picsum.photos/800/600.jpg"
        }
    ],
    "includeSrcset": true,
    "includeOgTags": true,
    "minWidth": 0,
    "minHeight": 0,
    "minSizeBytes": 0,
    "maxImagesPerUrl": 1000,
    "maxUrls": 10000,
    "dedupByHash": true,
    "stripExif": false,
    "filenamePattern": "{slug}-{hash}.{ext}",
    "outputFormat": [
        "dataset",
        "kv-store"
    ],
    "s3Bucket": "",
    "webhookUrl": "",
    "maxConcurrency": 10,
    "downloadTimeoutMs": 15000,
    "imageCheckMaxRetries": 3,
    "proxyConfiguration": {
        "useApifyProxy": false
    },
    "failFast": false,
    "debugLogging": false
};

// Run the Actor and wait for it to finish
const run = await client.actor("getascraper/bulk-image-downloader").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [
        { "url": "https://apify.com" },
        { "url": "https://picsum.photos/800/600.jpg" },
    ],
    "includeSrcset": True,
    "includeOgTags": True,
    "minWidth": 0,
    "minHeight": 0,
    "minSizeBytes": 0,
    "maxImagesPerUrl": 1000,
    "maxUrls": 10000,
    "dedupByHash": True,
    "stripExif": False,
    "filenamePattern": "{slug}-{hash}.{ext}",
    "outputFormat": [
        "dataset",
        "kv-store",
    ],
    "s3Bucket": "",
    "webhookUrl": "",
    "maxConcurrency": 10,
    "downloadTimeoutMs": 15000,
    "imageCheckMaxRetries": 3,
    "proxyConfiguration": { "useApifyProxy": False },
    "failFast": False,
    "debugLogging": False,
}

# Run the Actor and wait for it to finish
run = client.actor("getascraper/bulk-image-downloader").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    {
      "url": "https://apify.com"
    },
    {
      "url": "https://picsum.photos/800/600.jpg"
    }
  ],
  "includeSrcset": true,
  "includeOgTags": true,
  "minWidth": 0,
  "minHeight": 0,
  "minSizeBytes": 0,
  "maxImagesPerUrl": 1000,
  "maxUrls": 10000,
  "dedupByHash": true,
  "stripExif": false,
  "filenamePattern": "{slug}-{hash}.{ext}",
  "outputFormat": [
    "dataset",
    "kv-store"
  ],
  "s3Bucket": "",
  "webhookUrl": "",
  "maxConcurrency": 10,
  "downloadTimeoutMs": 15000,
  "imageCheckMaxRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "failFast": false,
  "debugLogging": false
}' |
apify call getascraper/bulk-image-downloader --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=getascraper/bulk-image-downloader",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Bulk Image Downloader: 22-Field Metadata, SHA-256 & ZIP",
        "description": "Download every image from any webpage or direct image URL. Smart srcset picks the highest-resolution variant. 22 metadata fields per image: width, height, format, SHA-256, dedup flag, EXIF, provenance. ZIP and S3 outputs, webhooks, MCP-ready. $2.00 per 1k.",
        "version": "0.1",
        "x-build-id": "Va0iPAKefLSr4KEdm"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/getascraper~bulk-image-downloader/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-getascraper-bulk-image-downloader",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/getascraper~bulk-image-downloader/runs": {
            "post": {
                "operationId": "runs-sync-getascraper-bulk-image-downloader",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/getascraper~bulk-image-downloader/run-sync": {
            "post": {
                "operationId": "run-sync-getascraper-bulk-image-downloader",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "URLs to Process",
                        "type": "array",
                        "description": "List of URLs. Each can be a webpage (the Actor discovers all images on the page via HTML/srcset/og:image/twitter:image) or a direct image URL. Mixed lists allowed.",
                        "default": [
                            {
                                "url": "https://apify.com"
                            },
                            {
                                "url": "https://picsum.photos/800/600.jpg"
                            }
                        ],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "mode": {
                        "title": "URL Mode",
                        "enum": [
                            "auto",
                            "page",
                            "direct"
                        ],
                        "type": "string",
                        "description": "'auto' detects by URL extension, 'page' forces HTML parsing, 'direct' treats every URL as an image.",
                        "default": "auto"
                    },
                    "includeSrcset": {
                        "title": "Include srcset / picture",
                        "type": "boolean",
                        "description": "Discover images from <img srcset>, <picture><source>, and lazy-loaded data-src/data-srcset attributes. Picks the highest-resolution variant by default.",
                        "default": true
                    },
                    "includeOgTags": {
                        "title": "Include og:image / twitter:image",
                        "type": "boolean",
                        "description": "Discover Open Graph and Twitter Card images from page <meta> tags. Often the best cover image.",
                        "default": true
                    },
                    "minWidth": {
                        "title": "Min Width (px)",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Skip images narrower than this. 0 = no minimum. Filters tracking pixels and thumbnails.",
                        "default": 0
                    },
                    "minHeight": {
                        "title": "Min Height (px)",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Skip images shorter than this. 0 = no minimum.",
                        "default": 0
                    },
                    "minSizeBytes": {
                        "title": "Min Size (bytes)",
                        "minimum": 0,
                        "maximum": 10485760,
                        "type": "integer",
                        "description": "Skip images smaller than this. Filters tracking pixels and 1x1 placeholders. 0 = no minimum.",
                        "default": 0
                    },
                    "maxImagesPerUrl": {
                        "title": "Max Images per URL",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Safety cap on images discovered per source URL. Prevents runaway runs on heavy pages.",
                        "default": 1000
                    },
                    "maxUrls": {
                        "title": "Max URLs to Process",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Cap on total URLs processed.",
                        "default": 10000
                    },
                    "dedupByHash": {
                        "title": "Deduplicate by Content Hash",
                        "type": "boolean",
                        "description": "Compute SHA-256 of each image body and skip duplicates. Reports is_duplicate=true on the dataset.",
                        "default": true
                    },
                    "stripExif": {
                        "title": "Strip EXIF Metadata (JPEG)",
                        "type": "boolean",
                        "description": "Re-encode JPEGs without EXIF (camera model, GPS, timestamps). Privacy-friendly. Slightly lossy.",
                        "default": false
                    },
                    "convertFormat": {
                        "title": "Format Conversion",
                        "enum": [
                            "none",
                            "webp-to-png",
                            "png-to-jpg"
                        ],
                        "type": "string",
                        "description": "Optional format conversion. webp-to-png is the most common (broader compatibility).",
                        "default": "none"
                    },
                    "filenamePattern": {
                        "title": "Filename Pattern",
                        "type": "string",
                        "description": "Templated filename. Tokens: {slug} (sanitized URL slug), {hash} (SHA-256 prefix), {ext} (extension), {idx} (index), {source} (source domain).",
                        "default": "{slug}-{hash}.{ext}"
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "type": "array",
                        "description": "Always emits the structured dataset. Optionally also: ZIP archive (single or per-URL), KV store binaries, S3 upload, or webhook on completion. Valid values: dataset, kv-store, zip, zipPerUrl, s3, webhook.",
                        "default": [
                            "dataset",
                            "kv-store"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "s3Bucket": {
                        "title": "S3 Bucket Name",
                        "type": "string",
                        "description": "Required when outputFormat includes 's3'. Use AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION env vars (set via Apify Secrets).",
                        "default": ""
                    },
                    "webhookUrl": {
                        "title": "Webhook URL",
                        "type": "string",
                        "description": "POST a completion event with the run summary to this URL on run completion.",
                        "default": ""
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Max parallel image downloads. Default 10.",
                        "default": 10
                    },
                    "downloadTimeoutMs": {
                        "title": "Download Timeout (ms)",
                        "minimum": 1000,
                        "maximum": 120000,
                        "type": "integer",
                        "description": "Per-image fetch timeout in milliseconds.",
                        "default": 15000
                    },
                    "imageCheckMaxRetries": {
                        "title": "Max Retries per Image",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Retries for failed image downloads (5xx, timeouts, transient network errors).",
                        "default": 3
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "OPTIONAL. Default is no proxy. Set residential proxy if source sites are hotlink-protected or geo-fenced.",
                        "default": {
                            "useApifyProxy": false
                        }
                    },
                    "failFast": {
                        "title": "Fail Fast",
                        "type": "boolean",
                        "description": "Stop the run on the first error. Default false (continue and report per-image errors).",
                        "default": false
                    },
                    "debugLogging": {
                        "title": "Verbose Debug Logs",
                        "type": "boolean",
                        "description": "Print per-image tracing tags (FETCH, PARSE, HASH, WRITE) to run logs.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
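
The input schema above declares numeric bounds, one enum, and a dependency between `outputFormat` and `s3Bucket`. A client-side pre-check can catch mistakes before a run starts. The sketch below is one possible way to do that in Python. The function name `validate_input` and the message wording are illustrative and not part of the Actor. It covers only the fields shown in this part of the schema, and it treats the "Valid values" list in the `outputFormat` description as the allowed set.

```python
from typing import Any

# Bounds copied from the input schema above: field name -> (minimum, maximum).
INTEGER_BOUNDS: dict[str, tuple[int, int]] = {
    "minWidth": (0, 10000),
    "minHeight": (0, 10000),
    "minSizeBytes": (0, 10485760),
    "maxImagesPerUrl": (1, 10000),
    "maxUrls": (1, 100000),
    "maxConcurrency": (1, 50),
    "downloadTimeoutMs": (1000, 120000),
    "imageCheckMaxRetries": (0, 10),
}

# Enum values of convertFormat, and the values listed in the outputFormat description.
CONVERT_FORMATS = {"none", "webp-to-png", "png-to-jpg"}
OUTPUT_FORMATS = {"dataset", "kv-store", "zip", "zipPerUrl", "s3", "webhook"}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []

    for name, (low, high) in INTEGER_BOUNDS.items():
        if name not in run_input:
            continue  # omitted fields fall back to the schema default
        value = run_input[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    convert = run_input.get("convertFormat", "none")
    if convert not in CONVERT_FORMATS:
        problems.append(f"convertFormat has unsupported value {convert!r}")

    formats = run_input.get("outputFormat", ["dataset", "kv-store"])
    unknown = sorted(set(formats) - OUTPUT_FORMATS)
    if unknown:
        problems.append(f"outputFormat has unsupported values {unknown}")

    # The s3Bucket description says it is required when outputFormat includes 's3'.
    if "s3" in formats and not run_input.get("s3Bucket"):
        problems.append("s3Bucket is required when outputFormat includes 's3'")

    return problems
```

Calling `validate_input({"maxConcurrency": 51})` returns a single message about the allowed range, while an input that stays within the documented bounds returns an empty list.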

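The `filenamePattern` and `dedupByHash` descriptions in the schema can be illustrated with a small local sketch. It is not the Actor's implementation: the slug sanitization rule, the 12-character hash prefix, the `bin` fallback extension, and the `sha256` row key are all assumptions made for the example. Only the token names and the `is_duplicate` flag come from the schema.

```python
import hashlib
import re
from urllib.parse import urlparse


def render_filename(pattern: str, image_url: str, source_url: str,
                    body: bytes, idx: int, hash_len: int = 12) -> str:
    """Fill the documented tokens {slug}, {hash}, {ext}, {idx}, {source}."""
    last_segment = urlparse(image_url).path.rsplit("/", 1)[-1]
    stem, _, ext = last_segment.rpartition(".")
    if not stem:  # no usable "name.ext" split in the last path segment
        stem, ext = last_segment, "bin"  # assumed fallback extension
    # Assumed sanitization: lowercase, runs of other characters become "-".
    slug = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-") or "image"
    tokens = {
        "slug": slug,
        "hash": hashlib.sha256(body).hexdigest()[:hash_len],
        "ext": ext.lower(),
        "idx": str(idx),
        "source": urlparse(source_url).netloc,
    }
    # Replace only the documented tokens; other text in the pattern is kept.
    return re.sub(r"\{(slug|hash|ext|idx|source)\}",
                  lambda m: tokens[m.group(1)], pattern)


def flag_duplicates(bodies: list[bytes]) -> list[dict[str, object]]:
    """Mark every body whose SHA-256 digest was already seen earlier in the list."""
    seen: set[str] = set()
    rows: list[dict[str, object]] = []
    for body in bodies:
        digest = hashlib.sha256(body).hexdigest()
        rows.append({"sha256": digest, "is_duplicate": digest in seen})
        seen.add(digest)
    return rows
```

With the default pattern `{slug}-{hash}.{ext}`, an image at `https://cdn.example.com/img/My Photo_01.JPG` would be named `my-photo-01-<hash prefix>.jpg` under these assumptions. Hashing the body rather than the URL means the same image served from two different URLs is still detected as a duplicate.
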