Data Change Monitoring

Deprecated

Pricing: $5.00/month + usage

Developed by Juro Oravec

Maintained by Community
Monitor data changes between scraper runs or other datasets. Get a report on what fields changed. This actor takes two datasets, and verifies that a sample of entries that are common to both datasets are identical. Output is a list of discrepancies between the two datasets.
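As a sketch, a typical input for this actor might look like the following. The field names come from the input schema documented below; the actor name and the values are purely illustrative:

```javascript
// Illustrative input: run a scraper actor, then compare its output
// against a named comparison dataset, matching entries by `id` and
// ignoring a timestamp field that changes on every run.
// `username/my-scraper` is a hypothetical actor name.
const input = {
  runType: 'ACTOR',
  actorOrTaskId: 'username/my-scraper',
  comparisonDatasetIdOrName: 'my-comparison-dataset',
  comparisonDatasetPrimaryKeys: ['id'],
  comparisonFieldsIgnore: ['scrapedAt'],
};
```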


Total users: 1

Monthly users: 1

Last modified: a year ago

Run type (actor or task)

runType (enum, optional)

Whether to call an actor or a task

Value options:

"ACTOR", "TASK"

Default value of this property is "ACTOR"

Actor or Task ID

actorOrTaskId (string, optional)

Actor or task to call. Allowed formats are username/actor-name, userId/actor-name or actor ID.

Can be omitted if you already have an existing dataset and don't need to run an actor to generate it.

Either actorOrTaskId or actorOrTaskDatasetIdOrName MUST be given.

Actor or Task build

actorOrTaskBuild (string, optional)

Tag or number of the actor build to run (e.g. beta or 1.2.345).

If not provided, the run uses the build tag or number from the default actor run configuration (typically latest).

Actor or Task input

actorOrTaskInput (object, optional)

Input for the actor. An object is expected, which will be stringified to JSON and its content type set to application/json; charset=utf-8.

Actor or Task output Dataset ID

actorOrTaskDatasetIdOrName (string, optional)

ID or name of the dataset that stores entries scraped by the given actor or task.

Either actorOrTaskId or actorOrTaskDatasetIdOrName MUST be given.

Default: Run's default dataset.

NOTE: A dataset name can only contain the letters 'a' through 'z', the digits '0' through '9', and hyphens ('-'), and a hyphen may appear only in the middle of the name (e.g. 'my-value-1').
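The naming rule above can be expressed as a small validator. This is a hypothetical helper mirroring the rule as documented here, not part of the actor:

```javascript
// Validates the documented naming rule: lowercase letters, digits,
// and hyphens, with hyphens allowed only in the middle of the name.
const DATASET_NAME_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidDatasetName(name) {
  return DATASET_NAME_RE.test(name);
}
```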

Comparison Dataset ID

comparisonDatasetIdOrName (string, optional)

ID or name of the dataset that stores entries from previous runs, used for comparison.

NOTE: A dataset name can only contain the letters 'a' through 'z', the digits '0' through '9', and hyphens ('-'), and a hyphen may appear only in the middle of the name (e.g. 'my-value-1').

Comparison - Primary keys

comparisonDatasetPrimaryKeys (array, optional)

Define fields used for matching entries between scraped and comparison datasets.

NOTE: If not set, the entries are hashed based on all fields

Comparison - Replace stale entries

comparisonDatasetRemoveStaleEntries (boolean, optional)

Scraped entries naturally go stale (e.g. a job offer is closed and removed from the website). In that case, the entries in the comparison dataset can no longer be found in the scraped dataset, so they can't be used for comparison anymore.

Instead, we can replace these "stale" entries, so that the next time we run the comparison, we will again be able to find all entries.

If true, stale entries are automatically replaced if detected.

You might want to set this to false if you have a referential dataset that you want to update manually.

Default value of this property is true

Comparison - Max entries

comparisonDatasetMaxEntries (integer, optional)

How many entries should be stored in the comparison dataset.

Even with a dataset of thousands of entries, a few dozen entries are usually enough to test data integrity (assuming those entries are sufficiently diverse).

Default value of this property is 20

Comparison - Ignored fields

comparisonFieldsIgnore (array, optional)

Some fields may change with every run (e.g. an extraction timestamp). Such fields should be excluded from the data integrity check to avoid false alerts.

Comparison - Warning fields

comparisonFieldsWarn (array, optional)

Some fields are either less important, or their values change more often than others (e.g. a job ad description may be corrected a few times over its lifetime). You can mark such fields so that their discrepancies are classified as "warnings" instead of "errors".

Pick dataset fields

outputPickFields (array, optional)

Select a subset of fields of an entry that will be pushed to the dataset.

If not set, all fields on an entry will be pushed to the dataset.

This is done before outputRenameFields.

Keys can be nested, e.g. "someProp.value[0]". Nested path is resolved using Lodash.get().
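To illustrate how a nested key like "someProp.value[0]" is resolved, here is a minimal stand-in for Lodash.get() (the actor itself uses Lodash; this sketch only demonstrates the path semantics):

```javascript
// Minimal sketch of Lodash.get()-style path resolution:
// convert bracket indices to dot segments, then walk the object,
// returning undefined if any segment is missing.
function getPath(obj, path) {
  const segments = path.replace(/\[(\d+)\]/g, '.$1').split('.');
  return segments.reduce(
    (acc, key) => (acc == null ? undefined : acc[key]),
    obj,
  );
}
```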

Rename dataset fields

outputRenameFields (object, optional)

Rename fields (columns) of the output data.

If not set, all fields will have their original names.

This is done after outputPickFields.

Keys can be nested, e.g. "someProp.value[0]". Nested path is resolved using Lodash.get().

Transform entries

outputTransform (string, optional)

Freely transform the output data object using a custom function.

If not set, the data will remain as is.

This is done after outputPickFields and outputRenameFields.

The function receives Apify's Actor class, the actor's input, and a shared state in its second argument.

async (entry, { Actor, input, state, itemCacheKey }) => { ... }
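A minimal transform matching that signature might look like this. The field names (`price`, `priceUsd`) are illustrative, not part of the actor's schema:

```javascript
// Hypothetical outputTransform: derive a numeric `priceUsd` field
// from a string `price` field, falling back to null when the value
// is missing or not numeric. (Note: a literal 0 would also map to
// null in this naive sketch, since 0 is falsy.)
const outputTransform = async (entry, { Actor, input, state, itemCacheKey }) => ({
  ...entry,
  priceUsd: Number(entry.price) || null,
});
```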

Transform entries - Setup

outputTransformBefore (string, optional)

Use this if you need to run one-time initialization code before outputTransform.

The function receives Apify's Actor class, the actor's input, and a shared state in its first argument.

async ({ Actor, input, state, itemCacheKey }) => { ... }

Transform entries - Teardown

outputTransformAfter (string, optional)

Use this if you need to run one-time teardown code after outputTransform.

The function receives Apify's Actor class, the actor's input, and a shared state in its first argument.

async ({ Actor, input, state, itemCacheKey }) => { ... }

Filter entries

outputFilter (string, optional)

Decide which scraped entries should be included in the output by using a custom function.

If not set, all scraped entries will be included.

This is done after outputPickFields, outputRenameFields, and outputTransform.

The function receives Apify's Actor class, the actor's input, and a shared state in its second argument.

async (entry, { Actor, input, state, itemCacheKey }) => boolean

Filter entries - Setup

outputFilterBefore (string, optional)

Use this if you need to run one-time initialization code before outputFilter.

The function has access to Apify's Actor class, and actor's input and a shared state in the first argument.

async ({ Actor, input, state, itemCacheKey }) => { ... }

Filter entries - Teardown

outputFilterAfter (string, optional)

Use this if you need to run one-time teardown code after outputFilter.

The function receives Apify's Actor class, the actor's input, and a shared state in its first argument.

async ({ Actor, input, state, itemCacheKey }) => { ... }

Dataset ID or name

outputDatasetIdOrName (string, optional)

By default, data is written to the run's default dataset. Set this option if you want to write data to a non-default dataset.

NOTE: A dataset name can only contain the letters 'a' through 'z', the digits '0' through '9', and hyphens ('-'), and a hyphen may appear only in the middle of the name (e.g. 'my-value-1').

Cache ID or name

outputCacheStoreIdOrName (string, optional)

Set this option if you want to cache scraped entries in Apify's Key-value store.

This is useful, for example, when you want to scrape only NEW entries. In that case, you can use the outputFilter option to define a custom function that filters out entries already found in the cache.

NOTE: A cache name can only contain the letters 'a' through 'z', the digits '0' through '9', and hyphens ('-'), and a hyphen may appear only in the middle of the name (e.g. 'my-value-1').
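The "scrape only NEW entries" pattern described above could be sketched with the filter hooks like this. The `entry.id` field and the in-memory seeding are illustrative; in a real run the seen keys would come from the actor's Key-value store cache:

```javascript
// outputFilterBefore: one-time setup that loads previously-seen keys
// into the shared state (seeded in memory here for the sketch).
const outputFilterBefore = async ({ Actor, input, state, itemCacheKey }) => {
  state.seen = new Set(['known-1']);
};

// outputFilter: keep only entries whose id has not been seen before.
const outputFilter = async (entry, { state }) => !state.seen.has(entry.id);
```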

Cache primary keys

outputCachePrimaryKeys (array, optional)

Specify fields that uniquely identify entries (primary keys), so entries can be compared against the cache.

NOTE: If not set, the entries are hashed based on all fields

Cache action on result

outputCacheActionOnResult (enum, optional)

Specify whether scraped results should be added to, removed from, or overwrite the cache.

- add - Adds scraped results to the cache

- remove - Removes scraped results from the cache

- overwrite - First clears all entries from the cache, then adds scraped results to the cache

NOTE: No action happens when this field is empty.

Value options:

"add", "remove", "overwrite"

Metamorph actor ID - metamorph to another actor at the end

metamorphActorId (string, optional)

Use this option if you want to run another actor with the same dataset after this actor has finished (i.e. metamorph into another actor).

The new actor is identified by its ID, e.g. "apify/web-scraper".

Metamorph actor build

metamorphActorBuild (string, optional)

Tag or number of the target actor build to metamorph into (e.g. 'beta' or '1.2.345')

Metamorph actor input

metamorphActorInput (object, optional)

Input object passed to the follow-up (metamorph) actor.