Dataset Image Downloader & Uploader

  • lukaskrivka/images-download-upload
  • Users 155
  • Runs 12.9k
  • Created by Lukáš Křivka

Download image files from the image URLs in your datasets and save them to a Zip file, a key-value store, or directly to your AWS S3 bucket.


Dataset Id

datasetId

Optional

string

ID of the dataset that contains the data. Image URLs are extracted from its items.

Items

items

Optional

array

Array of items, passed directly in the input, that contain the image URLs.

Path to image URLs

pathToImageUrls

Optional

string

Path within each item to the string or array of strings that holds the image URL(s). Write it in JavaScript property-access notation, e.g. "details[0].images".
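To illustrate how such a path could be resolved, here is a minimal TypeScript sketch. `extractImageUrls` is a hypothetical helper, not the actor's own code; it assumes the path uses only dot-separated keys and numeric `[n]` indices, and it normalizes a single URL string into a one-element list.

```typescript
// Hypothetical helper (not the actor's code): resolves a path such as
// "details[0].images" against an item and returns the URLs found there.
function extractImageUrls(item: unknown, path: string): string[] {
  // Turn "details[0].images" into ["details", "0", "images"].
  const keys = path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter((key) => key.length > 0);

  let value: unknown = item;
  for (const key of keys) {
    // Stop if the path runs into a missing or non-object value.
    if (value === null || typeof value !== "object") return [];
    value = (value as Record<string, unknown>)[key];
  }

  // The path may point at a single URL string or at an array of URLs.
  if (typeof value === "string") return [value];
  if (Array.isArray(value)) {
    return value.filter((v): v is string => typeof v === "string");
  }
  return [];
}

const item = {
  details: [{ images: ["https://example.com/a.jpg", "https://example.com/b.png"] }],
  thumbnail: "https://example.com/thumb.jpg",
};

console.log(extractImageUrls(item, "details[0].images"));
console.log(extractImageUrls(item, "thumbnail"));
```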

Filename function

fileNameFunction

Optional

string

Function that specifies how the image filename is created from its URL. If left empty, the filename is the MD5 hash of the URL.
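The sketch below contrasts the documented default (MD5 hash of the URL) with one possible custom naming rule. Both function names are hypothetical, and the argument list the actor actually passes to `fileNameFunction` is not documented here, so the plain `(url: string)` signature is an assumption; whether the actor appends a file extension afterwards is also not stated.

```typescript
import { createHash } from "crypto";

// Documented default behavior: the filename is the MD5 hash of the URL (hex).
function defaultFileName(url: string): string {
  return createHash("md5").update(url).digest("hex");
}

// Hypothetical custom rule: use the last path segment of the URL,
// dropping any query string, fragment, and extension.
function lastSegmentFileName(url: string): string {
  const path = url.split(/[?#]/)[0];
  const segment = path.substring(path.lastIndexOf("/") + 1);
  const dot = segment.lastIndexOf(".");
  return dot > 0 ? segment.substring(0, dot) : segment;
}

console.log(defaultFileName("https://example.com/img/photo.jpg"));
console.log(lastSegmentFileName("https://example.com/img/photo.jpg?w=200")); // photo
```

A hash never collides for distinct URLs in practice, while path-based names are readable but can clash when different URLs share a final segment.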

Limit

limit

Optional

integer

Maximum number of items to load from the dataset. Use with `offset` to paginate over the data, which can reduce the memory requirements of large loads.

Offset

offset

Optional

integer

Number of items to skip at the start of the dataset. Use with `limit` to paginate over the data, which can reduce the memory requirements of large loads.
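As a rough illustration of paginating with `limit` and `offset`, the hypothetical helper below splits a dataset of known size into consecutive windows. The helper and the page size are assumptions for illustration; each window's values would go into the `offset` and `limit` fields of a separate run's input.

```typescript
// One (offset, limit) window, i.e. the pagination input of a single run.
interface PageWindow {
  offset: number;
  limit: number;
}

// Hypothetical helper: splits `totalItems` into windows of at most `pageSize` items.
function planPages(totalItems: number, pageSize: number): PageWindow[] {
  if (pageSize < 1) throw new Error("pageSize must be at least 1");
  const windows: PageWindow[] = [];
  for (let offset = 0; offset < totalItems; offset += pageSize) {
    windows.push({ offset, limit: Math.min(pageSize, totalItems - offset) });
  }
  return windows;
}

// 25,000 items loaded 10,000 at a time: offsets 0, 10000, 20000; the last limit is 5000.
console.log(planPages(25000, 10000));
```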

Output to

outputTo

Optional

string

Where to save the input data, including any transformations applied during the download process.

Options:

"no-output", "key-value-store", "dataset"

Output dataset Name or ID

outputDatasetId

Optional

string

Name or ID of the dataset where the data will be saved. Only relevant if `outputTo` is set to "dataset".

Key Value store input

storeInput

Optional

string

Use this to load the input data from a key-value store record instead of a dataset. Notation: `storeId-recordKey`, e.g. `kWdGzuXuKfYkrntWw-OUTPUT`.
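A minimal sketch of how the `storeId-recordKey` notation splits into its two parts. `parseStoreInput` is a hypothetical name, and the sketch assumes the store ID itself contains no hyphen, so the string is split at the first `-` and any later hyphens stay in the record key.

```typescript
// Hypothetical parser for `storeId-recordKey`.
// Assumption: the store ID contains no "-", so we split at the first one.
function parseStoreInput(value: string): { storeId: string; recordKey: string } {
  const sep = value.indexOf("-");
  // Reject input with no separator or with an empty part on either side.
  if (sep <= 0 || sep === value.length - 1) {
    throw new Error(`Expected "storeId-recordKey", got "${value}"`);
  }
  return { storeId: value.slice(0, sep), recordKey: value.slice(sep + 1) };
}

console.log(parseStoreInput("kWdGzuXuKfYkrntWw-OUTPUT"));
// { storeId: 'kWdGzuXuKfYkrntWw', recordKey: 'OUTPUT' }
```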

Upload to

uploadTo

Optional

string

Where to upload the downloaded image files.

Options:

"zip-file", "key-value-store", "s3", "no-upload"

Key-value store name

uploadStoreName

Optional

string

Name of the key-value store where the images will be uploaded. If left empty, the images are uploaded to the default key-value store.

S3 Bucket

s3Bucket

Optional

string

Name of the S3 bucket to upload to. Only relevant if `uploadTo` is set to "s3".

S3 Access key id

s3AccessKeyId

Optional

string

AWS access key ID. Only relevant if `uploadTo` is set to "s3". You can create these credentials for an IAM user.

S3 Secret access key

s3SecretAccessKey

Optional

string

AWS secret access key. Only relevant if `uploadTo` is set to "s3". You can create these credentials for an IAM user.

Check if key is already on S3

s3CheckIfAlreadyThere

Optional

boolean

If checked, the actor first checks whether the key already exists in the bucket. This is useful if you don't want to overwrite the same image, and GET requests are also cheaper than PUT requests.

Pre-download function

preDownloadFunction

Optional

string

Function that specifies how the data are transformed before the images are downloaded. The function receives the whole data array and returns it. To skip downloading the images of an item, add a `skipItem: true` field to that item.
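A minimal sketch of a pre-download function, written in TypeScript; in the actor input the function is supplied as source text, so in plain JavaScript you would drop the type annotations. The `imageUrl` field and the plain-array signature are assumptions: how the actor passes the data array to the function is not stated here, so check the field's default value before relying on this shape.

```typescript
// Hypothetical item shape; real items carry whatever fields your dataset holds.
interface DatasetItem {
  imageUrl?: string;
  skipItem?: boolean;
  [key: string]: unknown;
}

// Receives the whole data array and returns it, flagging items to skip.
const preDownload = (data: DatasetItem[]): DatasetItem[] =>
  data.map((item) => ({
    ...item,
    // Skip items without an image URL or whose URL is not HTTPS.
    skipItem: typeof item.imageUrl !== "string" || !item.imageUrl.startsWith("https://"),
  }));

console.log(
  preDownload([
    { imageUrl: "https://example.com/a.jpg" },
    { imageUrl: "http://example.com/b.jpg" },
    { title: "no image" },
  ]),
);
```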

Post-download function

postDownloadFunction

Optional

string

Function that specifies how the data are transformed after the images are downloaded. The input and output of the function is the whole data array. By default, it adds either the file URL or an errors array, depending on whether the download was successful.
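As an illustration, a post-download function could keep only the items whose images were downloaded successfully. This is a sketch under assumptions: the `errors` field name is hypothetical and stands in for whatever errors array the actor's default post-download step adds, and the plain-array signature is assumed as for the pre-download function.

```typescript
// Hypothetical shape of an item after the download step.
interface ProcessedItem {
  imageUrl?: string;
  errors?: string[]; // assumed name for the errors array added on failure
  [key: string]: unknown;
}

// Receives the whole data array and returns only items without download errors.
const postDownload = (data: ProcessedItem[]): ProcessedItem[] =>
  data.filter((item) => !item.errors || item.errors.length === 0);

console.log(
  postDownload([
    { imageUrl: "https://example.com/a.jpg" },
    { imageUrl: "https://example.com/b.jpg", errors: ["timeout"] },
  ]),
);
```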

Max retries

imageCheckMaxRetries

Optional

integer

How many times the actor retries a download whose file fails the image check. Setting this too high can lead to unnecessary retry loops.
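The retry flow described for the image check can be sketched as a simple loop. This is a hypothetical, synchronous simplification (a real downloader would be asynchronous): `downloadWithRetries` is an illustrative name, and the assumption that only the retries, not the first attempt, go through the proxy is taken from the image check description rather than confirmed behavior.

```typescript
// Hypothetical sketch: one initial attempt plus up to `maxRetries` retries.
// Returns the first result that passes the check, or null if none does.
function downloadWithRetries<T>(
  download: (useProxy: boolean) => T,
  passesCheck: (result: T) => boolean,
  maxRetries: number,
): T | null {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // Assumption: the first attempt is direct, every retry uses the proxy.
    const result = download(attempt > 0);
    if (passesCheck(result)) return result;
  }
  return null; // never passed the check, so the image is not uploaded
}

// Example: the direct download fails the check, the proxied retry passes.
const outcome = downloadWithRetries<string>(
  (useProxy) => (useProxy ? "image" : "blocked-page"),
  (body) => body === "image",
  2,
);
console.log(outcome); // image
```

With `maxRetries` set to 2, an image that never passes costs three download attempts, which is why very high values waste time on broken URLs.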

Image check type

imageCheckType

Optional

string

Type of the image check. If an image does not pass, the download is retried through a proxy; if it still does not pass, the image is not uploaded.

Options:

"none", "content-type", "image-size"

Min size in KB

imageCheckMinSize

Optional

integer

Minimum file size of the image, in kilobytes, required to pass the image check.

Min width

imageCheckMinWidth

Optional

integer

Minimum width of the image, in pixels, required to pass the image check. Works only if the image check type is 'jimp'.

Min height

imageCheckMinHeight

Optional

integer

Minimum height of the image, in pixels, required to pass the image check. Works only if the image check type is 'jimp'.
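A minimal sketch of how the check types and thresholds above could combine into a single pass/fail decision. Several points are assumptions rather than documented behavior: that "content-type" means the response type starts with `image/`, that the size and dimension limits are enforced by the "image-size" type (the width and height descriptions mention a 'jimp' type that is not among the listed options), and that 1 KB means 1024 bytes. All names are hypothetical.

```typescript
type ImageCheckType = "none" | "content-type" | "image-size";

interface DownloadedImage {
  contentType: string;
  sizeBytes: number;
  width?: number; // known only after decoding the image
  height?: number;
}

interface CheckOptions {
  type: ImageCheckType;
  minSizeKb?: number;
  minWidth?: number;
  minHeight?: number;
}

// Returns true if the image passes the configured check.
function passesImageCheck(img: DownloadedImage, opts: CheckOptions): boolean {
  if (opts.type === "none") return true;
  // Both remaining types are assumed to require an image content type.
  if (!img.contentType.startsWith("image/")) return false;
  if (opts.type === "content-type") return true;
  // "image-size": also enforce the minimum file size and pixel dimensions.
  if (opts.minSizeKb !== undefined && img.sizeBytes < opts.minSizeKb * 1024) return false;
  if (opts.minWidth !== undefined && (img.width ?? 0) < opts.minWidth) return false;
  if (opts.minHeight !== undefined && (img.height ?? 0) < opts.minHeight) return false;
  return true;
}

const photo: DownloadedImage = { contentType: "image/jpeg", sizeBytes: 50 * 1024, width: 800, height: 600 };
console.log(passesImageCheck(photo, { type: "image-size", minSizeKb: 40, minWidth: 640, minHeight: 480 })); // true
```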

Proxy configuration

proxyConfiguration

Optional

object

Select proxies to be used.

Max concurrency

maxConcurrency

Optional

integer

Maximum number of download/upload requests running in parallel. Keep in mind that the limit exists so that the host server is not overloaded.
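To show what a concurrency cap means in practice, here is a generic worker-pool sketch in TypeScript. It is not the actor's implementation; `mapWithConcurrency` is a hypothetical helper that keeps at most `limit` requests in flight, which is the role `maxConcurrency` plays for downloads and uploads.

```typescript
// Runs `worker` over every input while keeping at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  inputs: readonly T[],
  limit: number,
  worker: (input: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results = new Array<R>(inputs.length);
  let next = 0; // index of the next input no runner has claimed yet

  // Each runner claims one index at a time until all inputs are processed.
  async function runner(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await worker(inputs[index]);
    }
  }

  const runnerCount = Math.min(limit, inputs.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

JavaScript runs all runners on a single thread, so the shared `next` counter needs no locking; results keep the input order even though requests finish out of order.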

Download timeout in ms

downloadTimeout

Optional

integer

Maximum time, in milliseconds, to wait for each image to download.

Batch Size

batchSize

Optional

integer

Number of items loaded from the dataset in one batch.

Convert webp to png

convertWebpToPng

Optional

boolean

If checked, the actor automatically converts all WebP images to the standard PNG format. This increases the file size of the images.

State fields

stateFields

Optional

array

Fields to keep in the actor's state, which makes the state more readable and reduces memory use. By default, all fields are kept.
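The effect of selecting state fields can be sketched as a simple projection. `pickFields` is a hypothetical helper, and treating an empty field list as "keep everything" is an assumption modeled on the documented default; it is not necessarily how the actor represents that default.

```typescript
// Hypothetical projection: keeps only the listed fields of an item.
// An empty list keeps every field (assumed to mirror the default).
function pickFields(
  item: Record<string, unknown>,
  fields: readonly string[],
): Record<string, unknown> {
  if (fields.length === 0) return { ...item };
  const picked: Record<string, unknown> = {};
  for (const field of fields) {
    if (field in item) picked[field] = item[field];
  }
  return picked;
}

console.log(pickFields({ url: "https://example.com/a.jpg", title: "A", html: "<div>...</div>" }, ["url", "title"]));
```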

Run without download

noDownloadRun

Optional

boolean

If checked, the actor does not download or upload any images. Useful for checking duplicates or testing transformations.