Website Changes Detector

Pricing

Pay per usage

Developed by

Tri⟁angle

Maintained by Apify

This actor uses Apify’s Website Content Crawler to track website changes by comparing new and previous crawl results, highlighting only relevant updates to save time and resources.

Total users: 3
Monthly users: 3
Runs succeeded: 83%
Last modified: 3 days ago

Website Change Monitoring Orchestrator

Efficiently detect content changes (NEW, UPDATED, REMOVED, SAME) on websites.

This actor runs Apify's Website Content Crawler (WCC) and compares the latest crawl results against those of the previous run. It labels each page as new, updated, removed, or unchanged, so that downstream processing (LLM analysis, notifications, and so on) can be limited to the pages that actually changed, which saves time and cost.

Features

  • Orchestrates WCC: Automatically triggers runs of apify/website-content-crawler with your specified configuration.
  • Efficient Change Detection: Compares the latest crawl results against the previous successful run that used the same dataset name prefix (websiteContentDatasetNamePrefix).
  • Identifies Change Types: Classifies each page found as NEW, UPDATED (content changed), REMOVED (present in previous, absent in latest), or SAME (content identical).
  • Detailed Output: Provides both the current page data (from the latest crawl) and the previous page data (from the prior crawl) in the output records for context. Includes a text diff for UPDATED items.
  • Keyword Filtering: Optionally filter the output dataset to include only changes where the page text contains specified keywords.
  • Skip Crawl Option: Allows comparing the two most recent existing WCC datasets without performing a new crawl, useful for re-analysis or debugging.
  • Dataset Management: Automatically names and manages historical WCC datasets based on a user-defined prefix and timestamp, with configurable retention limits.
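The classification described above can be illustrated with a minimal sketch. It is not the actor's implementation: it assumes pages are matched by URL, that "content changed" means the extracted text differs exactly, and that the text diff is a unified diff. The function name `classify_pages` and the sample data are hypothetical.

```python
import difflib


def classify_pages(previous, current):
    """Label each page as NEW, UPDATED, REMOVED, or SAME.

    `previous` and `current` map URL -> page text (an assumed shape).
    Returns URL -> (kind, text_diff); text_diff is set only for UPDATED.
    """
    changes = {}
    for url, text in current.items():
        if url not in previous:
            changes[url] = ("NEW", None)
        elif previous[url] == text:
            changes[url] = ("SAME", None)
        else:
            # Assumption: a unified diff stands in for the actor's text diff.
            diff = "\n".join(difflib.unified_diff(
                previous[url].splitlines(),
                text.splitlines(),
                fromfile="previous",
                tofile="current",
                lineterm="",
            ))
            changes[url] = ("UPDATED", diff)
    # Pages seen in the previous crawl but missing from the latest one.
    for url in previous:
        if url not in current:
            changes[url] = ("REMOVED", None)
    return changes


previous = {
    "https://www.example.com/": "Welcome",
    "https://www.example.com/about": "About us\nFounded 2020",
    "https://www.example.com/old": "Old page",
}
current = {
    "https://www.example.com/": "Welcome",
    "https://www.example.com/about": "About us\nFounded 2021",
    "https://www.example.com/news": "News",
}
changes = classify_pages(previous, current)
for url, (kind, _) in sorted(changes.items()):
    print(kind, url)
```

Keying on the URL keeps the comparison linear in the number of pages. A real implementation might compare content hashes instead of full text to avoid holding both crawls in memory.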

Input

The actor requires the following input configuration:

| Field | Type | Description | Default | Required |
| --- | --- | --- | --- | --- |
| wccInput | Object | The full JSON input configuration for the Website Content Crawler actor. Paste your WCC settings here. | (Prefilled WCC default input) | Yes |
| websiteContentDatasetNamePrefix | String | A prefix used for naming the datasets created by WCC runs (e.g., myproject-prod). Format: <prefix>-WCC-<timestamp>. If empty, only WCC-<timestamp> is used. | wcc-output | No |
| returnChangeTypes | Array | Select which types of changes (NEW, UPDATED, REMOVED, SAME) should be included in the output dataset. | ["NEW", "UPDATED", "REMOVED", "SAME"] | Yes |
| filterKeywords | Array | Optional array of keywords. If provided, only output items whose text content contains at least one keyword (case-insensitive) will be included. Leave empty to disable. | [] | No |
| skipCrawl | Boolean | If true, skip running a new WCC crawl and compare the two most recent existing datasets with the matching prefix. Requires at least two previous datasets. | false | No |
| websiteContentDatasetMaxCount | Integer | Maximum number of historical WCC datasets (with the matching prefix) to keep. Older datasets will be deleted. Minimum value is 2. | 5 | Yes |
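The naming and retention rules for websiteContentDatasetNamePrefix and websiteContentDatasetMaxCount can be sketched as follows. This is an illustration under assumptions, not the actor's code: the exact timestamp format is not documented, so a sortable `YYYY-MM-DDTHH-MM-SS` form is assumed, and the helper names `dataset_name` and `datasets_to_delete` are hypothetical.

```python
from datetime import datetime, timezone


def dataset_name(prefix, when):
    """Build <prefix>-WCC-<timestamp>, or WCC-<timestamp> if prefix is empty."""
    # Assumption: the real timestamp format is unknown; this one sorts by time.
    stamp = when.strftime("%Y-%m-%dT%H-%M-%S")
    return f"{prefix}-WCC-{stamp}" if prefix else f"WCC-{stamp}"


def datasets_to_delete(names, prefix, max_count):
    """Return the oldest matching dataset names beyond the retention limit."""
    if max_count < 2:
        # Two datasets are the minimum needed to compare latest vs. previous.
        raise ValueError("max_count must be at least 2")
    marker = f"{prefix}-WCC-" if prefix else "WCC-"
    matching = sorted(name for name in names if name.startswith(marker))
    # Keep the newest max_count names; everything before them is stale.
    return matching[:-max_count]


existing = [
    dataset_name("example-monitor", datetime(2025, 4, day, 18, 12, 4, tzinfo=timezone.utc))
    for day in (5, 6, 7)
]
existing.append("other-project-WCC-2025-04-01T00-00-00")
stale = datasets_to_delete(existing, "example-monitor", max_count=2)
print(stale)
```

Only names with the matching prefix are considered, so datasets belonging to other monitoring configurations are left alone.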

Example Input:

{
  "wccInput": {
    "startUrls": [{ "url": "https://www.example.com/" }],
    "maxPages": 500,
    "crawlerType": "cheerio"
    // ... other WCC settings ...
  },
  "websiteContentDatasetNamePrefix": "example-monitor",
  "returnChangeTypes": ["NEW", "UPDATED", "REMOVED"],
  "filterKeywords": ["high frequency", "spectrum auction"],
  "skipCrawl": false,
  "websiteContentDatasetMaxCount": 10
}

Example Output:

{
  "change": {
    "kind": "SAME",
    "matchedKeywords": [
      "scraping"
    ],
    "createdAt": "2025-04-07T18:12:04.021Z",
    "textDiff": null
  },
  "currentPage": {
    // WCC output record, or null object if REMOVED
  },
  "previousPage": {
    // WCC output record, or null object if NEW
  }
}
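
As a rough illustration of how a record of this shape could be assembled together with the case-insensitive keyword filter, here is a minimal sketch. It assumes the page text lives in a `text` field of the WCC record and that REMOVED pages are matched against their previous text; neither detail is confirmed by this page. The helpers `matched_keywords` and `build_record` are hypothetical.

```python
from datetime import datetime, timezone


def matched_keywords(text, keywords):
    """Return the keywords found in `text`, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def build_record(kind, current_page, previous_page, keywords, text_diff=None):
    """Build one output record, or return None if the keyword filter rejects it."""
    # Assumption: fall back to the previous page when the current one is missing.
    page = current_page or previous_page or {}
    matches = matched_keywords(page.get("text", ""), keywords)
    if keywords and not matches:
        return None  # Filter is active and no keyword matched.
    created_at = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "change": {
            "kind": kind,
            "matchedKeywords": matches,
            "createdAt": created_at.replace("+00:00", "Z"),
            "textDiff": text_diff,
        },
        "currentPage": current_page,
        "previousPage": previous_page,
    }


keywords = ["high frequency", "spectrum auction"]
page = {"url": "https://www.example.com/news", "text": "The Spectrum Auction opens in May."}
record = build_record("NEW", page, None, keywords)
skipped = build_record("NEW", {"url": "https://www.example.com/", "text": "Welcome"}, None, keywords)
print(record["change"]["matchedKeywords"], skipped)
```

An empty keyword list disables the filter, so every record is kept and matchedKeywords is simply empty.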