
Example Process Crawl Results


Iterates through all results from a crawler run and counts them. The actor needs to be called from the crawler's finish webhook; add its URL as the finish webhook URL of your crawler. Use this actor as a starting point for developing custom post-processing of data from the crawler.
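When the finish webhook fires, the payload it delivers becomes the actor's INPUT. As the validation in the code below shows, that input must contain an `_id` attribute holding the crawler execution ID. A minimal INPUT might look like this (the ID value is a placeholder, not a real execution ID):

```json
{
    "_id": "YOUR_EXECUTION_ID"
}
```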


Author: Apify Technologies
  • Users: 15
  • Runs: 12,544

Based on the apify/actor-node-basic:v0.21.10 Docker image (see docs).

const Apify = require('apify');

Apify.main(async () => {
    // Get actor input and validate it
    const input = await Apify.getValue('INPUT');
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from the crawler finish webhook?');
    }
    const executionId = input._id;

    // Print info about the crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);

    // Iterate through all crawler results and count them,
    // paging with offset/limit until an empty page is returned.
    // Here is the place where you can add something more adventurous :)
    console.log(`Counting results from crawler run...`);
    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;
    do {
        results = await Apify.client.crawlers.getExecutionResults({ executionId, limit, offset });
        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);

    // Save results to the default key-value store
    console.log(`Found ${totalItems} records`);
    await Apify.setValue('OUTPUT', {
        crawlerRunDetails,
        totalItems,
    });
});
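The counting loop pages through results `limit` items at a time, advancing `offset` by the number of items returned, until an empty page comes back. A minimal sketch of the same loop against an in-memory stand-in for the results endpoint (the `getExecutionResults` function here is a hypothetical mock, not the Apify client):

```javascript
// In-memory "dataset" of 250 fake records.
const allItems = Array.from({ length: 250 }, (_, i) => ({ id: i }));

// Hypothetical stand-in for Apify.client.crawlers.getExecutionResults:
// returns one page of items plus the page's item count.
async function getExecutionResults({ offset, limit }) {
    const items = allItems.slice(offset, offset + limit);
    return { items, count: items.length };
}

// Same offset/limit pagination pattern as the actor's do/while loop.
async function countAll() {
    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;
    do {
        results = await getExecutionResults({ offset, limit });
        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);
    return totalItems;
}

countAll().then((n) => console.log(n)); // prints 250
```

Advancing `offset` by `results.count` rather than by `limit` keeps the loop correct even if the API returns a short page before the data is exhausted.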