Icon Burglar

yuri/icon-burglar

This act takes the image URLs prepared by a crawler, downloads the images, and zips them.
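As a rough illustration of the data this act works with, the sketch below assumes the crawler's page function returns an array of objects with a `url` and a `file` property (the two fields main.js reads) and shows how each entry maps to a key in the key-value store. The helper name `toDownloadPlan` and the sample URLs are hypothetical and not part of the act.

```javascript
// Hypothetical helper (not part of the act): turns the crawler's
// pageFunctionResult into a list of { key, url } download jobs.
// Assumes each entry has a `url` and a `file` property, as main.js expects.
function toDownloadPlan(pageFunctionResult) {
    if (!Array.isArray(pageFunctionResult)) {
        throw new TypeError('pageFunctionResult must be an array');
    }
    return pageFunctionResult
        // Skip entries that lack either field instead of failing mid-run
        .filter((item) => item && typeof item.url === 'string' && typeof item.file === 'string')
        // The act stores every image under "<file>.png"
        .map((item) => ({ key: `${item.file}.png`, url: item.url }));
}

// Example with made-up data
const plan = toDownloadPlan([
    { url: 'https://example.com/icons/home.png', file: 'home' },
    { url: 'https://example.com/icons/user.png', file: 'user' },
    { file: 'broken' }, // no url, so this entry is skipped
]);
console.log(plan);
```

Filtering out incomplete entries up front is one possible design choice; the act itself passes every entry straight to the download step.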

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10

# Second, copy just package.json and package-lock.json, since they should be
# the only files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy source code to container
# Do this in the last step, to have a fast build if only the source code changed
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10",
        "underscore": "latest",
        "request-promise": "latest",
        "bluebird": "latest"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

// Get all the libraries this act uses
const Apify = require('apify');
const _ = require('underscore');
const rp = require('request-promise');
const Promise = require('bluebird');

Apify.main(async () => {
    // Get act input and validate it
    const input = await Apify.getValue('INPUT');
    console.log('Input:');
    console.dir(input);
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from crawler finish webhook?');
    }
    const executionId = input._id;

    // Print info about crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);

    // Fetch the first page of crawler results (up to `limit` items)
    console.log('Fetching results from crawler run...');
    const limit = 100;
    const offset = 0;
    const results = await Apify.client.crawlers.getExecutionResults({
        executionId,
        limit,
        offset,
    });

    // Only the first result item is used; its pageFunctionResult must be the array of images
    const firstItem = results && results.items && results.items[0];
    if (!firstItem || !Array.isArray(firstItem.pageFunctionResult)) {
        throw new Error('The crawler run returned no pageFunctionResult array to download.');
    }

    // Download each image and save it to the key-value store
    await Promise.each(firstItem.pageFunctionResult, async (value) => {
        console.log('url', value.url);

        // Download the image as a raw Buffer (encoding: null disables string decoding)
        const file = await rp({ url: value.url, encoding: null });

        // Store it under the right filename and content type
        await Apify.setValue(`${value.file}.png`, file, { contentType: 'image/png' });
    });

    // Zip the key-value store contents using a different Actor
    // (https://www.apify.com/jaroslavhejlek/zip-key-value-store)
    const run = await Apify.call('jaroslavhejlek/zip-key-value-store', {
        keyValueStoreId: process.env.APIFY_DEFAULT_KEY_VALUE_STORE_ID,
        filesPerZipFile: 2000,
    });
    console.dir(run);
});

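main.js requests a single page of results (`limit` 100, `offset` 0). If a crawler run produced more items than that, the `limit` and `offset` parameters could be used to walk through every page. The sketch below is one way to do it, not the act's own behavior: `collectAllItems` and `fetchPage` are hypothetical names, and `fetchPage` stands in for a call such as `Apify.client.crawlers.getExecutionResults` with the `executionId` already bound. A fake in-memory data source is used so the example runs without the Apify API.

```javascript
// Sketch: collect results from every page, not just the first one.
// `fetchPage` is a hypothetical function that takes { limit, offset }
// and resolves to an object of the form { items: [...] }.
async function collectAllItems(fetchPage, limit = 100) {
    const all = [];
    let offset = 0;
    while (true) {
        const page = await fetchPage({ limit, offset });
        const items = (page && page.items) || [];
        all.push(...items);
        // A short (or empty) page means there is nothing left to fetch
        if (items.length < limit) break;
        offset += limit;
    }
    return all;
}

// Stand-in for the real API: serves 250 fake items page by page
const fakeItems = Array.from({ length: 250 }, (_, i) => ({ id: i }));
const fakeFetchPage = async ({ limit, offset }) => ({
    items: fakeItems.slice(offset, offset + limit),
});

collectAllItems(fakeFetchPage).then((items) => {
    console.log(`Collected ${items.length} items`);
});
```

Passing the page fetcher in as a parameter keeps the loop independent of the API client, which makes it easy to try out locally.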
Developer
Maintained by Community

Actor Metrics

  • 1 monthly user
  • 1 star
  • Created in May 2018
  • Modified 2 years ago
