
Data Change Monitoring
Deprecated
Pricing
$5.00/month + usage

Monitor data changes between scraper runs or other datasets. Get a report on what fields changed. This actor takes two datasets and verifies that a sample of entries common to both datasets is identical. The output is a list of discrepancies between the two datasets.
What is Data Change Monitoring and how does it work?
Do you use or manage a scraper that runs regularly (daily, weekly, monthly...)?
How do you ensure that the scraper is still collecting the data correctly? How do you monitor changes?
Usually the strategies go like this:
- No monitoring
- Check for errors
- Validate results for correct shape and type.
However, this still doesn't save you from unexpected changes. Subtler changes, especially in textual data, are harder to track. Examples:
- Maybe a website starts to prefix post titles with tags - from 'Some title' to '[meta] Some title'.
- Maybe a change in the website HTML means that instead of a product description, you will start receiving the string '-'.
Anticipating these changes in advance is almost impossible, and writing elaborate regexes to catch them can get prohibitively complex.
This is where Data Change Monitoring comes to help.
How it works
The Data Change Monitoring actor can be used in two ways:
- Static testing - You define a static dataset (updated manually) of entries you expect. After scraping a website, you can compare the scraped data against the static dataset to detect changes.
- Change monitoring - Instead of a static dataset, you compare the scraper entries against entries from a previous scraper run.
  - Example: Each day, Data Change Monitoring is run against a scraper that produces a dataset. If the entries sampled from today's dataset differ from yesterday's run, this is picked up and reported. (And after the actor is done, a sample of today's entries is saved to be used for tomorrow's comparison.)
  - When an entry stops being available in the scraped dataset, it is considered stale and replaced with another entry. This way, data change monitoring can be run regularly over long periods of time without any manual intervention.
This actor takes two datasets and verifies that a sample of entries common to both datasets is indeed identical. The output is a list of discrepancies found between the two datasets.
The two datasets are:
- Reference dataset - The source of truth for the entries we expect.
- Tested dataset - Incoming (unknown) dataset that we want to check against the reference dataset.
The entries common to both datasets are identified by primary keys - a combination of fields that together uniquely identify the entries.
- For example, the combination of keys 'firstName' and 'lastName' uniquely identifies the entries in the following dataset:

```js
[{
  firstName: 'John',
  lastName: 'Doe',
  hobbies: ['skiing', 'hiking', 'travel'],
}, {
  firstName: 'Ann',
  lastName: 'Doe',
  hobbies: ['crossfit', 'running', 'gardening'],
}]
```
The Tested dataset can be specified either as an Apify Dataset or as an Apify Actor (or Task) run. In other words, another actor can be triggered to generate the dataset to be tested. This way, you can run another actor that obtains scraped data in real time, and once it finishes, the Data Change Monitoring actor will check whether the scraped data matches the expected dataset.
See the Outputs section for a detailed description.
The data can be downloaded in JSON, JSONL, XML, CSV, Excel, or HTML formats.
Features
This actor is a robust production-grade solution suitable for businesses and those that need reliability.
Hence, besides its primary function, the Data Change Monitoring actor comes packed with the following:
- Integrated data filtering and transformation
  - Filter and modify the output dataset entries out of the box from within the Apify UI (via custom JavaScript functions), without needing other tools. See the sketch below this list.
- Integrated cache
  - You can use the cache together with custom filtering to e.g. save only NEW entries to the output dataset. Save time and reduce cost.
  - The cache automatically stores which entries were already scraped and can persist between different scraper runs.
- Tested daily for high reliability
  - The actor is regularly tested end-to-end to minimize the risk of a broken integration.
- Metamorphing - Pass the result dataset to other actors
  - Automatically trigger another actor when this one is done to process the resulting dataset.
  - Metamorphing means that the dataset and key-value store are passed to another actor.
  - Actor metamorph can be configured via the actor input. No need to define custom actors just for that.
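For illustration, a custom filtering/transformation function could look like the minimal sketch below. The function signature and the way it is plugged into the actor input are assumptions here, so check the Input tab for the actual contract.

```js
// Hypothetical filtering/transformation function, as it might be pasted into
// the actor's input in the Apify UI. The (entry) => ... signature is an
// assumption - consult the Input tab for the real schema.
(entry) => {
  // Keep only discrepancies reported as errors, dropping warnings.
  if (entry.severity !== 'ERROR') return null;
  // Trim the entry down to the fields we care about before it is saved.
  return {
    fieldName: entry.fieldName,
    itemKeys: entry.itemKeys,
    fieldValueReference: entry.fieldValueReference,
    fieldValueTested: entry.fieldValueTested,
  };
};
```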
How to use Data Change Monitoring actor
- Create a free Apify account using your email
- Open Data Change Monitoring actor
- In Input, select the datasets to compare.
- Click "Start" and wait for the report to be generated.
- Download your data in JSON, JSONL, XML, CSV, Excel, or HTML format.
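If you prefer to export the report programmatically rather than through the Storage tab, a minimal sketch with the apify-client NPM package could look like this (the dataset ID is a placeholder):

```js
import { promises as fs } from 'node:fs';
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

// 'MY_DATASET_ID' is a placeholder for the dataset produced by the actor run.
const csv = await client.dataset('MY_DATASET_ID').downloadItems('csv');
await fs.writeFile('data-change-report.csv', csv);
```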
Input options
For details and examples for all input fields, please visit the Input tab.
Input examples
Example 1
- Compare two datasets.
- Entries are identified by first and last name.
- Discrepancies in timestamps are ignored.
- Comparison (reference) dataset is static (needs to be updated manually).
{"actorOrTaskDatasetIdOrName": "njkdaldawhd","comparisonDatasetIdOrName": "u8p93qf3w8","comparisonDatasetPrimaryKeys": ["firstName", "lastName"],"comparisonDatasetRemoveStaleEntries": false,"comparisonFieldsIgnore": ["timestamp"],"comparisonFieldsWarn": ["someLessImportantField"],}
Example 2
- Compare a dataset with results from an actor run with a given input.
- Entries are identified by "id" field.
- The comparison (reference) dataset should include up to 50 entries, and stale entries are replaced.
{"runType": "ACTOR","actorOrTaskId": "jurooravec/actor-name","actorOrTaskBuild": "1.2.3","actorOrTaskInput": {"inputField1": "a","inputField2": [],},"comparisonDatasetIdOrName": "u8p93qf3w8","comparisonDatasetPrimaryKeys": ["id"],"comparisonDatasetRemoveStaleEntries": true,"comparisonDatasetMaxEntries": 50,}
Outputs
Once the actor is done, you can see the overview of results in the Output tab.
To export the data, head over to the Storage tab.
Sample output from Data Change Monitoring
```jsonc
{
  // Field where the change occurred
  "fieldName": "hobbies",
  // Whether the change in this field is considered an error or a warning
  "severity": "ERROR",
  // Whether there was a type mismatch on the field level
  "fieldTypeMismatch": false,
  // Value of the field on the reference (comparison) entry
  "fieldValueReference": ["skiing", "hiking", "travel"],
  // Value of the field on the tested entry
  "fieldValueTested": ["skiing (sport)", "hiking (sport)", "travel (lifestyle)"],
  // Whether there was a type mismatch on the item level
  "itemTypeMismatch": false,
  // Fields and their values that were used as primary keys
  "itemKeys": {
    "firstName": "John",
    "lastName": "Doe"
  },
  // JSON of the reference (comparison) entry
  "itemValueReference": {
    "firstName": "John",
    "lastName": "Doe",
    "hobbies": ["skiing", "hiking", "travel"]
  },
  // JSON of the tested entry
  "itemValueTested": {
    "firstName": "John",
    "lastName": "Doe",
    "hobbies": ["skiing (sport)", "hiking (sport)", "travel (lifestyle)"]
  }
}
```
How to integrate Data Change Monitoring with other services, APIs or Actors
You can connect the actor with many of the integrations on the Apify platform. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Or you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Instagram API Scraper successfully finishes a run.
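For instance, a webhook that notifies your own endpoint whenever a run of this actor succeeds could be set up with the apify-client NPM package roughly as sketched below (the actor ID and request URL are placeholders):

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

// Call an external endpoint whenever a run of the monitoring actor succeeds.
await client.webhooks().create({
  eventTypes: ['ACTOR.RUN.SUCCEEDED'],
  condition: { actorId: 'YOUR_ACTOR_ID' },
  requestUrl: 'https://example.com/my-webhook-endpoint',
});
```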
Use Data Change Monitoring with Apify API
The Apify API gives you programmatic access to the Apify platform. The API is organized around RESTful HTTP endpoints that enable you to manage, schedule and run Apify actors. The API also lets you access any datasets, monitor actor performance, fetch results, create and update versions, and more.
To access the API using Node.js, use the apify-client NPM package. To access the API using Python, use the apify-client PyPI package.
Check out the Apify API reference docs for full details or click on the API tab for code examples.
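As a minimal sketch, starting a run of this actor and fetching the resulting report with the apify-client NPM package could look like this. The actor ID is a placeholder and the input fields are taken from the examples above, so verify both against the API tab:

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

// Start the actor and wait for it to finish. 'YOUR_ACTOR_ID' is a placeholder.
const run = await client.actor('YOUR_ACTOR_ID').call({
  comparisonDatasetIdOrName: 'u8p93qf3w8',
  comparisonDatasetPrimaryKeys: ['id'],
});

// The report (the list of discrepancies) is stored in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```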
Who can I contact for issues with Data Change Monitoring actor?
To report issues and find help, head over to the Discord community, or email me at juraj[dot]oravec[dot]josefson[at]gmail[dot]com