Actor picture

Crawler Results Checker

drobnikj/check-crawler-results

This actor checks crawler/scraper default dataset items and sends a notification if finds some errors. It is designed to run from webhook.

Crawler results checker for Apify

This actor checks crawler/scraper default dataset items and sends a notification if finds some errors. It is designed to run from webhook.

Usage

Actor or Task webhook

You can set up webhook for Actor or Task with URL:

https://api.apify.com/v2/acts/drobnikj~check-crawler-results/runs?token=APIFY_API_TOKEN

Then you need to set up payload template for webhook data like:

{
    "actId": {{resource.actId}},
    "runId": {{resource.id}},
    "options": {
      "notifyTo": "jakub.drobnik@apify.com",
      "minOutputtedPages": 10
    }
}

Actor

You can call it from other Actor, for example:

await Apify.call('drobnikj/check-crawler-results', {
    actId: 's7Jj8ik07gfV',
    runId: 'sd86hGfHk0Uh6gF',
    options: {
        minOutputtedPages: 1000,
    }
});

Legacy Crawler DEPRECATED

For a specific crawler set the following parameters:

Finish webhook URL (finishWebhookUrl)

https://api.apify.com/v2/acts/drobnikj~check-crawler-results/runs?token=APIFY_API_TOKEN

Finish webhook data

You can set up fields from options to finish webhook data.

Fields

actId

  • String
  • Act ID you want to check

runId

  • String
  • Run ID of actor you want to check

datasetId

  • String
  • Dataset ID

options

  • Object

options.sampleCount

  • Number
  • Number of results that actor checks
  • Default is 100000

options.minOutputtedPages

  • Number
  • Indicates minimum outputted pages of crawler to checks if an attribute is set.

options.jsonSchema

  • Object
  • If jsonSchema is set actor check all sample results against schema.

options.compareWithPreviousExecution

  • Boolean
  • If compareWithPreviousExecution is set to true actor compare results with a previous execution.
  • If tag for execution is set compare actor result from previous results with the same tag.
  • It works only for the legacy crawler.

options.notifyTo

  • String
  • Mail where actor send notification if found error

options.runActOnSuccess

  • Object
  • If actor found errors runs this actor.
  • Example:
    {
      "id": "apify/send-mail",
      "input": {
          "to": "jakub.drobnik@apify.com",
          "subject": "test on success",
          "text": "No errors in crawler Amazon"
      }
    }
    NOTE: If you didn't set input, it set from input of main actor and errors output.

options.runActOnError

  • Object
  • If didn't find any errors runs this actor.
  • Same format as runActOnSuccess

Output

All found errors will be in the errors field.

Example output

{
  "errors": [
    "Run is not in SUCCEEDED status, act status: ABORTED",
    "Crawler returns only 0 outputted pages and minumum is 100"
  ],
  "executionAttrs": []
}