Website Changes Detector
Pricing
Pay per usage
This actor uses Apify’s Website Content Crawler to track website changes by comparing new and previous crawl results, highlighting only relevant updates to save time and resources.
Website Change Monitoring Orchestrator
Efficiently detect content changes (NEW, UPDATED, REMOVED, SAME) on websites.
This actor orchestrates Apify's Website Content Crawler (WCC): it runs a crawl, then compares the latest results against those of the previous run. Each page is classified as new, updated, removed, or unchanged, so downstream processing (LLM analysis, notifications, and so on) can run only on the changes that matter, which saves time and cost.
Features
- Orchestrates WCC: Automatically triggers runs of `apify/website-content-crawler` with your specified configuration.
- Efficient Change Detection: Compares the latest crawl results against the previous successful run for the same configuration prefix.
- Identifies Change Types: Classifies each page found as `NEW`, `UPDATED` (content changed), `REMOVED` (present in previous, absent in latest), or `SAME` (content identical).
- Detailed Output: Provides both the current page data (from the latest crawl) and the previous page data (from the prior crawl) in the output records for context. Includes a text diff for `UPDATED` items.
- Keyword Filtering: Optionally filter the output dataset to include only changes where the page text contains specified keywords.
- Skip Crawl Option: Allows comparing the two most recent existing WCC datasets without performing a new crawl, useful for re-analysis or debugging.
- Dataset Management: Automatically names and manages historical WCC datasets based on a user-defined prefix and timestamp, with configurable retention limits.
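The classification described in the list above can be sketched in a few lines of Python. This is an illustration only, not the actor's actual implementation: it assumes each crawl has been reduced to a mapping from URL to page text, and the function name `classify_pages` and the use of `difflib` for the text diff are choices made for this example.

```python
import difflib
from typing import Optional


def classify_pages(previous: dict[str, str], current: dict[str, str]) -> list[dict]:
    """Label every URL seen in either crawl as NEW, UPDATED, REMOVED, or SAME.

    Both arguments are hypothetical {url: text} mappings built from two crawls.
    """
    results = []
    for url in sorted(set(previous) | set(current)):
        old: Optional[str] = previous.get(url)
        new: Optional[str] = current.get(url)
        text_diff = None
        if old is None:
            kind = "NEW"        # only in the latest crawl
        elif new is None:
            kind = "REMOVED"    # only in the previous crawl
        elif old == new:
            kind = "SAME"       # identical text in both crawls
        else:
            kind = "UPDATED"    # present in both, text differs
            text_diff = "\n".join(
                difflib.unified_diff(
                    old.splitlines(),
                    new.splitlines(),
                    fromfile="previous",
                    tofile="current",
                    lineterm="",
                )
            )
        results.append({"url": url, "kind": kind, "textDiff": text_diff})
    return results
```

Keying pages by URL is the simplest choice for a sketch like this; a real comparison might normalize URLs or compare hashes of the text instead of the full strings.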
Input
The actor requires the following input configuration:
Field | Type | Description | Default | Required |
---|---|---|---|---|
`wccInput` | Object | The full JSON input configuration for the Website Content Crawler actor. Paste your WCC settings here. | (Prefilled WCC default input) | Yes |
`websiteContentDatasetNamePrefix` | String | A prefix used for naming the datasets created by WCC runs (e.g., `myproject-prod`). Format: `<prefix>-WCC-<timestamp>`. If empty, only `WCC-<timestamp>` is used. | `wcc-output` | No |
`returnChangeTypes` | Array | Select which types of changes (`NEW`, `UPDATED`, `REMOVED`, `SAME`) should be included in the output dataset. | `["NEW", "UPDATED", "REMOVED", "SAME"]` | Yes |
`filterKeywords` | Array | Optional array of keywords. If provided, only output items whose text content contains at least one keyword (case-insensitive) will be included. Leave empty to disable. | `[]` | No |
`skipCrawl` | Boolean | If `true`, skip running a new WCC crawl and compare the two most recent existing datasets with the matching prefix. Requires at least two previous datasets. | `false` | No |
`websiteContentDatasetMaxCount` | Integer | Maximum number of historical WCC datasets (with the matching prefix) to keep. Older datasets will be deleted. Minimum value is `2`. | `5` | Yes |
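To make the naming and retention rules in the table concrete, here is a minimal sketch. The `<prefix>-WCC-<timestamp>` pattern and the minimum of 2 come from the table; the exact timestamp layout (`%Y%m%d-%H%M%S`) and both function names are assumptions made for this example, since the actual format used by the actor is not documented here.

```python
from datetime import datetime, timezone


def dataset_name(prefix: str, when: datetime) -> str:
    """Build a dataset name following <prefix>-WCC-<timestamp>."""
    # Assumed timestamp layout: fixed width, so names sort chronologically.
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return f"{prefix}-WCC-{stamp}" if prefix else f"WCC-{stamp}"


def datasets_to_delete(names: list[str], max_count: int) -> list[str]:
    """Return the names that exceed the retention limit, newest kept first."""
    if max_count < 2:
        # Two datasets are the minimum needed for a previous/latest comparison.
        raise ValueError("max_count must be at least 2")
    newest_first = sorted(names, reverse=True)
    return newest_first[max_count:]
```

The sketch assumes all names passed to `datasets_to_delete` share the same prefix, so plain string sorting orders them by time.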
Example Input:
    {
      "wccInput": {
        "startUrls": [{ "url": "https://www.example.com/" }],
        "maxPages": 500,
        "crawlerType": "cheerio"
        // ... other WCC settings ...
      },
      "websiteContentDatasetNamePrefix": "example-monitor",
      "returnChangeTypes": ["NEW", "UPDATED", "REMOVED"],
      "filterKeywords": ["high frequency", "spectrum auction"],
      "skipCrawl": false,
      "websiteContentDatasetMaxCount": 10
    }
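How `returnChangeTypes` and `filterKeywords` could combine to decide whether a record is emitted is sketched below. It follows the rules stated in the input table (case-insensitive match, at least one keyword, empty list disables filtering); the function names and the idea of matching against a single text string are assumptions for illustration, and the actor's real logic may differ.

```python
def matched_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that occur in the text, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def keep_record(
    kind: str,
    text: str,
    return_change_types: list[str],
    filter_keywords: list[str],
) -> bool:
    """Decide whether a change record belongs in the output dataset."""
    if kind not in return_change_types:
        return False   # change type was not requested
    if not filter_keywords:
        return True    # an empty keyword list disables keyword filtering
    return bool(matched_keywords(text, filter_keywords))
```

With the example input's settings, an `UPDATED` page mentioning "Spectrum Auction" would be kept, while a `SAME` page would be dropped regardless of its text.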
Example Output:
    {
      "change": {
        "kind": "SAME",
        "matchedKeywords": ["scraping"],
        "createdAt": "2025-04-07T18:12:04.021Z",
        "textDiff": null
      },
      "currentPage": {
        // WCC output record, or null object if REMOVED
      },
      "previousPage": {
        // WCC output record, or null object if NEW
      }
    }
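Downstream code usually wants to route records by change type. The sketch below groups output records by `change.kind`, using the record shape from the example output. It assumes each page record carries a `url` field and that a "null object" may arrive as either `null` or an empty object; `urls_by_kind` is a hypothetical helper, not part of the actor.

```python
from collections import defaultdict


def urls_by_kind(records: list[dict]) -> dict[str, list]:
    """Group page URLs by change kind (NEW, UPDATED, REMOVED, SAME)."""
    grouped = defaultdict(list)
    for record in records:
        # REMOVED records have no current page, so fall back to the previous one.
        page = record.get("currentPage") or record.get("previousPage") or {}
        grouped[record["change"]["kind"]].append(page.get("url"))
    return dict(grouped)
```

A caller could then, for instance, send only the `UPDATED` URLs and their `textDiff` values to an LLM and ignore everything under `SAME`.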