Example Process Crawl Results

apify/example-process-crawl-results
Iterates through all results from a crawler run and counts them. It needs to be called from the crawler's finish webhook: add this actor's URL to the finish webhook settings of your crawler. Use this actor as a starting point for developing custom post-processing of data from the crawler.
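The actor reads its input from the `INPUT` key of its default key-value store. As the `main.js` below shows, the only attribute it requires from the webhook payload is `_id`, the ID of the crawler execution. A minimal input therefore looks like this (the ID value is a placeholder):

```json
{
    "_id": "EXECUTION_ID"
}
```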

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10

# Copy just package.json and package-lock.json first, since they are the only
# files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code to the container.
# Do this in the last step, so that builds stay fast when only the source code changes.
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node.js inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10",
        "underscore": "latest"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');
const _ = require('underscore');

Apify.main(async () => {
    // Get the act input and validate it
    const input = await Apify.getValue('INPUT');
    console.log('Input:');
    console.dir(input);
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from a crawler finish webhook?');
    }
    const executionId = input._id;

    // Print info about the crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);

    // Iterate through all crawler results, page by page, and count them.
    // This is the place where you can add something more adventurous :)
    console.log('Counting results from crawler run...');

    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;

    do {
        results = await Apify.client.crawlers.getExecutionResults({
            executionId,
            limit,
            offset,
        });

        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);

    // Save the results
    console.log(`Found ${totalItems} records`);
    await Apify.setValue('OUTPUT', {
        crawlerRunDetails,
        totalItems,
    });
});
Maintained by Apify
Actor metrics
  • 2 monthly users
  • 0.0 days response time
  • Created in Nov 2017
  • Modified over 1 year ago