Example Process Crawl Results

Developed and maintained by Apify

Iterates through all results from a crawler run and counts them. It needs to be called from the crawler's finish webhook: add this actor's URL as the finish webhook URL of your crawler. Use this actor as a starting point for developing custom post-processing of data from the crawler.
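The webhook calls the actor with a JSON input whose "_id" attribute holds the ID of the finished crawler execution. A minimal sketch of the expected INPUT (the ID below is a placeholder; the webhook may pass additional fields, but this actor reads only "_id"):

{
    "_id": "YOUR_EXECUTION_ID"
}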

Rating: 4.5 (2)

Pricing: Pay per usage

Total users: 17

Monthly users: 1

Last modified: a year ago

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10
# Copy just package.json and package-lock.json, since they should be
# the only files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./
# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
&& npm install --only=prod --no-optional \
&& echo "Installed NPM packages:" \
&& (npm list --all || true) \
&& echo "Node.js version:" \
&& node --version \
&& echo "NPM version:" \
&& npm --version
# Copy source code to container
# Do this in the last step, to have fast build if only the source code changed
COPY . ./
# NOTE: The CMD is already defined by the base image.
# Uncomment this for local node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10",
        "underscore": "latest"
    },
    "scripts": {
        "start": "node main.js"
    }
}
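Note that the apify dependency is pinned to 0.21.10, the same version as the apify/actor-node-basic:v0.21.10 base image in the Dockerfile above. If you upgrade one, upgrade the other as well, so the SDK version matches the runtime it was built for.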

main.js

const Apify = require('apify');
const _ = require('underscore');

Apify.main(async () => {
    // Get act input and validate it
    const input = await Apify.getValue('INPUT');
    console.log('Input:');
    console.dir(input);
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from a crawler finish webhook?');
    }
    const executionId = input._id;

    // Print info about the crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);

    // Iterate through all crawler results, page by page, and count them.
    // Here is the place where you can add something more adventurous :)
    console.log('Counting results from crawler run...');

    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;

    do {
        results = await Apify.client.crawlers.getExecutionResults({
            executionId,
            limit,
            offset,
        });

        // Advance the offset by the number of items received and keep a running total
        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);

    // Save the count (and the run details) to the act's default key-value store
    console.log(`Found ${totalItems} records`);
    await Apify.setValue('OUTPUT', {
        crawlerRunDetails,
        totalItems,
    });
});
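Once the act finishes, the saved OUTPUT record can be fetched from the run's default key-value store. The snippet below is a minimal sketch, assuming the same legacy apify 0.21.x client; the store ID is a placeholder that you would take from the act run's defaultKeyValueStoreId:

const Apify = require('apify');

Apify.main(async () => {
    // Placeholder: ID of the act run's default key-value store
    const storeId = 'YOUR_STORE_ID';

    // Fetch the OUTPUT record saved by the act above
    const record = await Apify.client.keyValueStores.getRecord({
        storeId,
        key: 'OUTPUT',
    });

    // record.body holds the object passed to Apify.setValue('OUTPUT', ...)
    console.log(`Crawler run produced ${record.body.totalItems} results`);
});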