
Data Change Monitoring
Deprecated
Pricing
$5.00/month + usage

Monitor data changes between scraper runs or other datasets. Get a report on what fields changed. This actor takes two datasets, and verifies that a sample of entries that are common to both datasets are identical. Output is a list of discrepancies between the two datasets.
Total users: 1
Monthly users: 1
Last modified: a year ago
Run type (actor or task)
runType
Enum, optional
Whether to call an actor or a task.
Value options: "ACTOR", "TASK"
Default value: "ACTOR"
Actor or Task ID
actorOrTaskId
string, optional
Actor or task to call. Allowed formats are username/actor-name, userId/actor-name, or actor ID.
Can be omitted if you already have an existing dataset and don't need to run an actor to generate it.
Either actorOrTaskId or actorOrTaskDatasetIdOrName MUST be given.
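The "either/or" rule above can be checked before a run is started. The following is a minimal sketch only: hasDatasetSource is a hypothetical helper, not part of the actor, and the example values are placeholders taken from the formats mentioned in this section.

```javascript
// Hypothetical helper (not part of the actor): returns true when at least one
// of the two inputs that can provide a scraped dataset is a non-empty string.
function hasDatasetSource(input) {
  const isFilled = (value) => typeof value === 'string' && value.length > 0;
  return isFilled(input.actorOrTaskId) || isFilled(input.actorOrTaskDatasetIdOrName);
}

// Run an actor to produce the dataset (placeholder ID format from the text).
const runAnActor = { runType: 'ACTOR', actorOrTaskId: 'username/actor-name' };

// Reuse an existing dataset instead of running anything (placeholder name).
const reuseDataset = { actorOrTaskDatasetIdOrName: 'my-value-1' };
```

Both example inputs pass the check, while an input with neither field does not.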
Actor or Task build
actorOrTaskBuild
string, optional
Tag or number of the actor build to run (e.g. beta or 1.2.345).
If not provided, the run uses the build tag or number from the default actor run configuration (typically latest).
Actor or Task input
actorOrTaskInput
object, optional
Input for the actor. An object is expected, which will be stringified to JSON and its content type set to application/json; charset=utf-8.
Actor or Task output Dataset ID
actorOrTaskDatasetIdOrName
string, optional
ID or name of the dataset that stores entries scraped by the given actor or task.
Either actorOrTaskId or actorOrTaskDatasetIdOrName MUST be given.
Default: Run's default dataset.
NOTE: A dataset name can only contain the letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-'), which may appear only in the middle of the string (e.g. 'my-value-1').
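The naming rule in the note can be expressed as a regular expression. This is a sketch under one assumption: "letters 'a' through 'z'" is read literally as lowercase ASCII. isValidDatasetName is a hypothetical helper, not an Apify API.

```javascript
// Assumption: only lowercase a-z, digits 0-9, and hyphens are allowed, and a
// hyphen may not be the first or last character.
const DATASET_NAME_PATTERN = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;

// Hypothetical helper: true when the name satisfies the rule quoted above.
function isValidDatasetName(name) {
  return typeof name === 'string' && DATASET_NAME_PATTERN.test(name);
}
```

With this pattern, 'my-value-1' is accepted, while '-my-value' and 'my-value-' are rejected.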
Comparison Dataset ID
comparisonDatasetIdOrName
string, optional
ID or name of the dataset that stores entries from previous runs used for comparison.
NOTE: A dataset name can only contain the letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-'), which may appear only in the middle of the string (e.g. 'my-value-1').
Comparison - Primary keys
comparisonDatasetPrimaryKeys
array, optional
Define the fields used for matching entries between the scraped and comparison datasets.
NOTE: If not set, entries are hashed based on all fields.
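To illustrate why primary keys matter, here is a rough sketch of matching entries between two datasets. The actor's real hashing scheme is not documented here, so entryKey simply builds a string from the selected fields (or from all fields when no primary keys are given). Both function names are hypothetical, and only flat fields are handled.

```javascript
// Hypothetical key builder: uses the primary keys when given, otherwise all
// fields of the entry (sorted so that key order does not matter).
function entryKey(entry, primaryKeys) {
  const keys = primaryKeys && primaryKeys.length > 0
    ? primaryKeys
    : Object.keys(entry).sort();
  return JSON.stringify(keys.map((key) => [key, entry[key]]));
}

// Pairs each comparison (reference) entry with the scraped entry that has the
// same key. Reference entries without a match are skipped.
function matchEntries(scraped, comparison, primaryKeys) {
  const scrapedByKey = new Map(scraped.map((e) => [entryKey(e, primaryKeys), e]));
  const pairs = [];
  for (const reference of comparison) {
    const current = scrapedByKey.get(entryKey(reference, primaryKeys));
    if (current) pairs.push({ reference, current });
  }
  return pairs;
}
```

In this sketch, an entry whose title changed between runs is still matched when the primary key is ['id'], but it is not matched when all fields are used to build the key, because the key itself has changed.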
Comparison - Replace stale entries
comparisonDatasetRemoveStaleEntries
boolean, optional
Scraped entries naturally go stale (e.g. a job offer is closed and removed from the website). In that case, the entries in the comparison dataset can no longer be found in the scraped dataset, so they can't be used for comparison anymore.
Instead, these "stale" entries can be replaced, so that the next comparison run can again find all entries.
If true, stale entries are automatically replaced when detected.
You might want to set this to false if you have a reference dataset that you want to update manually.
Default value: true
Comparison - Max entries
comparisonDatasetMaxEntries
integer, optional
How many entries should be stored in the comparison dataset.
Even with a dataset of thousands of entries, you should need only a few tens of entries to test data integrity (assuming these entries are sufficiently diverse).
Default value: 20
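Stale-entry replacement and the entry cap can be pictured together. The sketch below is an assumption about the general idea, not the actor's actual algorithm: reference entries that no longer appear in the scraped data are dropped, and the set is topped up from the scraped entries until it reaches the maximum. keyOf and replaceStaleEntries are hypothetical names.

```javascript
// Hypothetical key builder based on primary key fields only.
function keyOf(entry, primaryKeys) {
  return JSON.stringify(primaryKeys.map((k) => entry[k]));
}

// Returns a refreshed comparison set with at most maxEntries entries.
function replaceStaleEntries(comparison, scraped, primaryKeys, maxEntries) {
  const scrapedKeys = new Set(scraped.map((e) => keyOf(e, primaryKeys)));
  // Keep only reference entries that still exist in the scraped dataset.
  const fresh = comparison.filter((e) => scrapedKeys.has(keyOf(e, primaryKeys)));
  const keptKeys = new Set(fresh.map((e) => keyOf(e, primaryKeys)));
  // Top up with scraped entries that are not yet in the comparison set.
  for (const entry of scraped) {
    if (fresh.length >= maxEntries) break;
    const key = keyOf(entry, primaryKeys);
    if (!keptKeys.has(key)) {
      fresh.push(entry);
      keptKeys.add(key);
    }
  }
  return fresh;
}
```

For example, if the comparison set holds entries 1 and 2 and the scraped data holds entries 2, 3, and 4, a cap of 2 would keep entry 2 and add entry 3 in place of the stale entry 1.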
Comparison - Ignored fields
comparisonFieldsIgnore
array, optional
Some fields may change with every run (e.g. extraction timestamp). Such fields should be excluded from the data integrity check to avoid false alerts.
Comparison - Warning fields
comparisonFieldsWarn
array, optional
Some fields are either less important, or their value may change more often than other fields (e.g. a job ad description may be corrected a few times over its lifetime). You can mark such fields to be classified as "warnings" instead of "errors".
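The ignored and warning fields described above can be read as a simple classification step. The sketch below assumes a report with errors and warnings lists; the report shape, the function name classifyDifferences, and the sample field names are all hypothetical.

```javascript
// Hypothetical classifier: compares two versions of the same entry field by
// field and sorts the changed fields into errors and warnings.
function classifyDifferences(reference, current, { ignore = [], warn = [] } = {}) {
  const fields = new Set([...Object.keys(reference), ...Object.keys(current)]);
  const report = { errors: [], warnings: [] };
  for (const field of fields) {
    if (ignore.includes(field)) continue; // e.g. extraction timestamps
    const same = JSON.stringify(reference[field]) === JSON.stringify(current[field]);
    if (same) continue;
    (warn.includes(field) ? report.warnings : report.errors).push(field);
  }
  return report;
}
```

With ignore set to ['scrapedAt'] and warn set to ['description'], a changed timestamp is skipped, a changed description is reported as a warning, and any other changed field is reported as an error.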
Pick dataset fields
outputPickFields
array, optional
Select a subset of an entry's fields to push to the dataset.
If not set, all fields of an entry are pushed to the dataset.
This is done before outputRenameFields.
Keys can be nested, e.g. "someProp.value[0]".
Nested paths are resolved using Lodash.get().
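To show what a nested key such as "someProp.value[0]" refers to, here is a simplified stand-in for Lodash's get. It is not Lodash itself and covers only dot segments and numeric bracket indices; resolvePath is a hypothetical name.

```javascript
// Simplified stand-in for Lodash's get (dot segments and [index] only).
function resolvePath(obj, path) {
  // Turn "someProp.value[0]" into ["someProp", "value", "0"].
  const segments = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter(Boolean);
  // Walk the object one segment at a time, stopping at null or undefined.
  return segments.reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

const sampleEntry = { someProp: { value: ['first', 'second'] } };
const picked = resolvePath(sampleEntry, 'someProp.value[0]');
```

Here picked holds the first element of the nested array, and a path that does not exist resolves to undefined instead of throwing.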
Rename dataset fields
outputRenameFields
object, optional
Rename fields (columns) of the output data.
If not set, all fields keep their original names.
This is done after outputPickFields.
Keys can be nested, e.g. "someProp.value[0]".
Nested paths are resolved using Lodash.get().
Transform entries
outputTransform
string, optional
Freely transform the output data object using a custom function.
If not set, the data remains as is.
This is done after outputPickFields and outputRenameFields.
The function has access to Apify's Actor class, the actor's input, and a shared state in the second argument.
async (entry, { Actor, input, state, itemCacheKey }) => { ... }
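A transform function following the signature above might look like the sketch below. It assumes that the returned object becomes the new entry, which this section does not state explicitly. The field names title, price, and priceNumber are made up for illustration.

```javascript
// Hypothetical transform: trims the title and adds a numeric price field.
// Assumption: the returned object replaces the original entry.
const exampleTransform = async (entry, { Actor, input, state, itemCacheKey }) => {
  return {
    ...entry,
    title: typeof entry.title === 'string' ? entry.title.trim() : entry.title,
    priceNumber: Number.parseFloat(entry.price),
  };
};
```

Because the option is typed as a string, a function like this would presumably be supplied as source text in the actor input.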
Transform entries - Setup
outputTransformBefore
string, optional
Use this if you need to run one-time initialization code before outputTransform.
The function has access to Apify's Actor class, the actor's input, and a shared state in the first argument.
async ({ Actor, input, state, itemCacheKey }) => { ... }
Transform entries - Teardown
outputTransformAfter
string, optional
Use this if you need to run one-time teardown code after outputTransform.
The function has access to Apify's Actor class, the actor's input, and a shared state in the first argument.
async ({ Actor, input, state, itemCacheKey }) => { ... }
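The setup, transform, and teardown hooks can be pictured as one lifecycle around a shared state object. The sketch below is a local simulation only: runTransformPipeline is a hypothetical harness, and it assumes state is a plain mutable object passed to every hook.

```javascript
// Setup: runs once before any entry is transformed.
const setupStep = async ({ state }) => {
  state.count = 0;
};

// Transform: runs once per entry and can read and update the shared state.
const countingTransform = async (entry, { state }) => {
  state.count += 1;
  return { ...entry, index: state.count };
};

// Teardown: runs once after all entries have been transformed.
const teardownStep = async ({ state }) => {
  state.summary = `transformed ${state.count} entries`;
};

// Hypothetical harness that mimics the order in which the hooks would run.
async function runTransformPipeline(entries) {
  const context = { state: {} };
  await setupStep(context);
  const output = [];
  for (const entry of entries) {
    output.push(await countingTransform(entry, context));
  }
  await teardownStep(context);
  return { output, state: context.state };
}
```

Keeping the counter in state rather than in a closure variable matters because each hook is supplied as a separate string and cannot share local variables.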
Filter entries
outputFilter
string, optional
Decide which scraped entries should be included in the output by using a custom function.
If not set, all scraped entries will be included.
This is done after outputPickFields, outputRenameFields, and outputTransform.
The function has access to Apify's Actor class, the actor's input, and a shared state in the second argument.
async (entry, { Actor, input, state, itemCacheKey }) => boolean
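The processing order stated above (pick, then rename, then transform, then filter) can be summarized in a short sketch. processEntry and the steps object are hypothetical and only mirror the documented order. Returning null for a filtered-out entry is a choice made for this example.

```javascript
// Applies the four optional steps in the documented order.
async function processEntry(entry, steps, context) {
  let current = entry;
  if (steps.pick) current = steps.pick(current);
  if (steps.rename) current = steps.rename(current);
  if (steps.transform) current = await steps.transform(current, context);
  // The filter sees the entry after all previous steps have been applied.
  if (steps.filter && !(await steps.filter(current, context))) return null;
  return current;
}

// Illustrative steps with made-up field names.
const exampleSteps = {
  pick: (e) => ({ title: e.title, price: e.price }),
  rename: (e) => ({ name: e.title, price: e.price }),
  transform: async (e) => ({ ...e, price: Number(e.price) }),
  filter: async (e) => e.price > 0,
};
```

One consequence of this order is that a filter function must refer to the renamed and transformed fields (name and a numeric price in this example), not to the original ones.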
Filter entries - Setup
outputFilterBefore
string, optional
Use this if you need to run one-time initialization code before outputFilter.
The function has access to Apify's Actor class, the actor's input, and a shared state in the first argument.
async ({ Actor, input, state, itemCacheKey }) => boolean
Filter entries - Teardown
outputFilterAfter
string, optional
Use this if you need to run one-time teardown code after outputFilter.
The function has access to Apify's Actor class, the actor's input, and a shared state in the first argument.
async ({ Actor, input, state, itemCacheKey }) => boolean
Dataset ID or name
outputDatasetIdOrName
string, optional
By default, data is written to the default dataset.
Set this option if you want to write data to a non-default dataset.
NOTE: A dataset name can only contain the letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-'), which may appear only in the middle of the string (e.g. 'my-value-1').
Cache ID or name
outputCacheStoreIdOrName
string, optional
Set this option if you want to cache scraped entries in Apify's Key-value store.
This is useful, for example, when you want to scrape only NEW entries. In that case, you can use the outputFilter option to define a custom function that filters out entries already found in the cache.
NOTE: A cache name can only contain the letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-'), which may appear only in the middle of the string (e.g. 'my-value-1').
Cache primary keys
outputCachePrimaryKeys
array, optional
Specify fields that uniquely identify entries (primary keys), so entries can be compared against the cache.
NOTE: If not set, entries are hashed based on all fields.
Cache action on result
outputCacheActionOnResult
Enum, optional
Specify whether scraped results should be added to, removed from, or overwrite the cache.
- add - Adds scraped results to the cache
- remove - Removes scraped results from the cache
- overwrite - First clears all entries from the cache, then adds scraped results to the cache
NOTE: No action happens when this field is empty.
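The cache actions can be sketched against an in-memory Map, used here as a stand-in for the Key-value store. The sketch uses the value names add, remove, and overwrite. applyCacheAction and keyFor are hypothetical names, and the real actor's storage format is not shown in this section.

```javascript
// Applies one cache action to an in-memory Map (stand-in for the real cache).
// keyFor is a caller-supplied function that derives the cache key of an entry.
function applyCacheAction(cache, results, action, keyFor) {
  if (!action) return cache; // empty field: nothing happens
  if (!['add', 'remove', 'overwrite'].includes(action)) {
    throw new Error(`Unknown cache action: ${action}`);
  }
  // "overwrite" clears the cache first, then behaves like "add".
  if (action === 'overwrite') cache.clear();
  for (const entry of results) {
    const key = keyFor(entry);
    if (action === 'remove') cache.delete(key);
    else cache.set(key, entry);
  }
  return cache;
}
```

A typical call would pass a key function built from the cache primary keys, for example (entry) => entry.id for a single hypothetical id field.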
Value options: "add", "remove", "overwrite"
Metamorph actor ID - metamorph to another actor at the end
metamorphActorId
string, optional
Use this option if you want to run another actor with the same dataset after this actor has finished (i.e. metamorph into another actor).
The new actor is identified by its ID, e.g. "apify/web-scraper".
Metamorph actor build
metamorphActorBuild
string, optional
Tag or number of the target actor build to metamorph into (e.g. 'beta' or '1.2.345').
Metamorph actor input
metamorphActorInput
object, optional
Input object passed to the follow-up (metamorph) actor.