Dataset Image Downloader & Uploader

  • lukaskrivka/images-download-upload
  • Users 155
  • Runs 12.9k
  • Created by Lukáš Křivka

Download image files from image URLs in your datasets and save them to a ZIP file, a Key-Value store, or directly to your AWS S3 bucket.

To run the code example, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, see the section on running actors via the API in the Apify Docs.

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with API token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare actor input
const input = {
    "pathToImageUrls": "images",
    // Passed as a string: a real function would be dropped when the input is serialized to JSON
    "fileNameFunction": "({url, md5}) => md5(url)",
    "uploadTo": "zip-file",
    "preDownloadFunction": `/* Example: We get rid of the items with price 0
        ({ data }) => data.filter((item) => item.price > 0)
        */`,
    "postDownloadFunction": `/* Example: We remove items without any successfully uploaded images.
         We also remove any image URLs that were not uploaded
         
         ({ data, state }) => {
            return data.reduce((newData, item) => {
                const downloadedImages = item.images.filter((imageUrl) => {
                    return state[imageUrl] && state[imageUrl].imageUploaded;
                });
                
                if (downloadedImages.length === 0) {
                    return newData;
                }
                
                return newData.concat({ ...item, images: downloadedImages });
            }, []);
        }
        */`,
    "imageCheckType": "content-type",
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

(async () => {
    // Run the actor and wait for it to finish
    const run = await client.actor("lukaskrivka/images-download-upload").call(input);

    // Fetch and print actor results from the run's dataset (if any)
    console.log('Results from dataset');
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.dir(item);
    });
})();
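
The commented-out preDownloadFunction and postDownloadFunction examples in the input can be tried locally before starting a run. The following is a minimal sketch, assuming the functions receive the shapes shown in those comments. The sample items, the example.com URLs, and the contents of the `state` object are hypothetical and only illustrate the filtering logic.

```javascript
// Same logic as the commented preDownloadFunction example:
// drop items whose price is 0.
const preDownloadFunction = ({ data }) => data.filter((item) => item.price > 0);

// Same logic as the commented postDownloadFunction example:
// keep only successfully uploaded image URLs and drop items left with none.
const postDownloadFunction = ({ data, state }) => {
    return data.reduce((newData, item) => {
        const downloadedImages = item.images.filter((imageUrl) => {
            return state[imageUrl] && state[imageUrl].imageUploaded;
        });

        if (downloadedImages.length === 0) {
            return newData;
        }

        return newData.concat({ ...item, images: downloadedImages });
    }, []);
};

// Hypothetical dataset items (assumed shape: a price and an "images" array of URLs).
const data = [
    { name: 'A', price: 10, images: ['https://example.com/a1.jpg', 'https://example.com/a2.jpg'] },
    { name: 'B', price: 0, images: ['https://example.com/b1.jpg'] },
    { name: 'C', price: 5, images: ['https://example.com/c1.jpg'] },
];

// Hypothetical upload state keyed by image URL (assumed shape, based on the example comment).
const state = {
    'https://example.com/a1.jpg': { imageUploaded: true },
    'https://example.com/a2.jpg': { imageUploaded: false },
    'https://example.com/c1.jpg': { imageUploaded: false },
};

const filtered = preDownloadFunction({ data });
const result = postDownloadFunction({ data: filtered, state });

// prints [{"name":"A","price":10,"images":["https://example.com/a1.jpg"]}]
console.log(JSON.stringify(result));
```

Item B is removed before download because its price is 0, and item C is removed afterwards because none of its images were uploaded. Once the functions behave as expected on sample data, their source can be pasted as strings into the corresponding input fields.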