Example Process Crawl Results

apify/example-process-crawl-results
Iterates through all results from a crawler run and counts them. It needs to be called from the crawler's finish webhook: add this actor's URL to the finish webhook settings of your crawler. Use this actor as a starting point for developing custom post-processing of data from the crawler.
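The actor reads its input from the `INPUT` key of its default key-value store. As the `main.js` below shows, the only attribute it requires from the webhook payload is `_id`, the ID of the crawler execution. A minimal input therefore looks like this (the ID value is a placeholder):

```json
{
    "_id": "EXECUTION_ID"
}
```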

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10

# Copy just package.json and package-lock.json first, since they are the only
# files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code to the container.
# Do this in the last step, so that builds stay fast when only the source code changes.
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node.js inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10",
        "underscore": "latest"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');
const _ = require('underscore');

Apify.main(async () => {
    // Get the act input and validate it
    const input = await Apify.getValue('INPUT');
    console.log('Input:');
    console.dir(input);
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from a crawler finish webhook?');
    }
    const executionId = input._id;

    // Print info about the crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);

    // Iterate through all crawler results, page by page, and count them.
    // This is the place where you can add something more adventurous :)
    console.log('Counting results from crawler run...');

    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;

    do {
        results = await Apify.client.crawlers.getExecutionResults({
            executionId,
            limit,
            offset,
        });

        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);

    // Save the results
    console.log(`Found ${totalItems} records`);
    await Apify.setValue('OUTPUT', {
        crawlerRunDetails,
        totalItems,
    });
});
Maintained by Apify
Actor metrics
  • 2 monthly users
  • 0.0 days response time
  • Created in Nov 2017
  • Modified over 1 year ago