Website Change Detector

Pay $1.60 for 1,000 results

This Actor is under maintenance.

eloquent_mountain/website-change-detector

Monitors websites for changes. Detects modifications to HTML structure and visual differences via screenshots. Provides detailed change reports, including an HTML diff. Tracks multiple URLs. Use tasks for recurring runs, or integrate it through the API.

Website Change Detector

This Website Change Detector is a tool for monitoring changes in web pages over time. It detects modifications to both the HTML structure and visual differences via screenshots, and generates detailed reports including an HTML diff. This Actor is ideal for tracking changes on competitor websites, important pages, or any web content that needs constant monitoring.

What Does This Actor Do?

This Actor automates web page monitoring. It detects modifications in web content through the following features:

  • HTML Structure Change Detection: It identifies changes to a webpage's underlying HTML structure by calculating a signature of the page and comparing it to the previous version.
  • Visual Change Detection: It captures screenshots of the web page and identifies visual changes to the page by comparing the current screenshots with the previous ones.
  • Detailed Change Reports: Provides a granular report of HTML changes, using an HTML diffing library. Every run is stored.
  • Multi-URL Support: Can monitor changes in multiple URLs of different websites.
  • Avoids Bot Detection: Takes several measures to avoid bot detection (using realistic user-agents and headless browser settings).

How to Use the Website Change Detector

Using this actor is straightforward:

  1. Create an Apify Account: Start with a free Apify account using your email.
  2. Open the Website Change Detector: Go to the actor page.
  3. Provide URLs: Input a list of URLs you want to monitor.
  4. Run the Actor: Click the "Start" button and wait for the checks to complete.
  5. Download Your Data: Retrieve the change reports in JSON format, or access all historical runs in Storage > Key-value stores > webpage-snapshots > history.
  6. OPTIONAL: Schedule Tasks: Create a scheduled task so that this Actor runs automatically at a given interval (for example, defined by a cron expression). Navigate to "Schedules" in the Apify UI.

Input

To start monitoring websites, the actor accepts the following input parameters:

  • urls: An array of URLs of the web pages you want to monitor.
  • save_screenshot: A boolean flag that determines whether to take a screenshot of each web page (default: true).

Here’s an example of an input configuration in JSON format:

{
    "urls": [
        "https://apify.com",
        "https://www.example.com",
        "https://www.ikea.com/nl/nl/p/onsevig-vloerkleed-laagpolig-veelkleurig-60497078/"
    ],
    "save_screenshot": true
}

Output

The output from this Actor is stored in a dataset. You can view this data in the Apify UI or download it in JSON, CSV, or other formats.

The output is a JSON object with the following structure for each URL:

{
  "url": "https://apify.com",
  "hasChanged": true,
  "screenshotChange": false,
  "date": 1704000000.000,
  "htmlDiff": "<div class=\"diff\">...</div>",
  "previous_screenshot": "base64_encoded_previous_screenshot",
  "current_screenshot": "base64_encoded_current_screenshot",
  "signature": "sha256_hash_of_the_html",
  "html": "<html>...</html>"
}

  • url: The URL of the web page that was monitored.
  • hasChanged: A boolean that indicates whether the HTML structure has changed since the last run.
  • screenshotChange: A boolean that indicates whether the screenshot has changed since the last run.
  • date: The timestamp of when the check was made.
  • htmlDiff: A string that contains the differences in the HTML if hasChanged is true; null otherwise.
  • previous_screenshot: Base64-encoded string of the previous screenshot if available; null otherwise.
  • current_screenshot: Base64-encoded string of the current screenshot if available; null otherwise.
  • signature: SHA256 hash of the page's HTML (used for change detection).
  • html: The current HTML of the page.
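
As a rough illustration of how these fields might be consumed, the sketch below converts one output item into Python values. It assumes that date is a Unix timestamp in seconds (the example value suggests this, but it is not stated) and that the screenshot fields hold standard Base64. The function name describe_item and its returned keys are hypothetical.

```python
import base64
from datetime import datetime, timezone


def describe_item(item: dict) -> dict:
    """Convert one raw output item into friendlier Python values."""
    # Assumption: `date` is a Unix timestamp in seconds.
    checked_at = datetime.fromtimestamp(item["date"], tz=timezone.utc)

    # Screenshot fields may be null (None) when no screenshot is available.
    encoded = item.get("current_screenshot")
    screenshot_bytes = base64.b64decode(encoded) if encoded else None

    return {
        "url": item["url"],
        "checked_at": checked_at,
        "html_changed": bool(item["hasChanged"]),
        "visual_changed": bool(item["screenshotChange"]),
        "screenshot_bytes": screenshot_bytes,
    }
```

The decoded bytes can then be written to a file or passed to an image library; the image format is not specified in the description above.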

How Can I Use the Data from the Website Change Detector?

  • Competitor Monitoring: Track changes on competitor websites, to identify changes in layout, pricing, or marketing.
  • Website Monitoring: Monitor your own websites for unwanted changes or defacement.
  • Content Tracking: Track changes on news websites, research pages, or any other information source.
  • Alerting: Use the API and webhooks to send alerts when changes are detected.
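
For the alerting use case, one possible approach is to turn the output items into short messages and forward them to a channel of your choice. The sketch below only builds the messages; build_alerts is a hypothetical helper, and how the messages are delivered (webhook, email, chat) is left open.

```python
def build_alerts(items: list) -> list:
    """Return one human-readable message per URL that changed."""
    alerts = []
    for item in items:
        reasons = []
        if item.get("hasChanged"):
            reasons.append("HTML")
        if item.get("screenshotChange"):
            reasons.append("screenshot")
        # Only URLs with at least one detected change produce an alert.
        if reasons:
            alerts.append(f"{item['url']}: {' and '.join(reasons)} changed")
    return alerts
```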

How Does the Website Change Detector Work?

This actor fetches, analyzes, and compares web page data using a series of steps:

  1. Input Configuration:
    • The actor takes a list of URLs and parameters such as check_interval and save_screenshot.
  2. HTML Content Fetching:
    • The actor uses httpx to fetch the HTML content of the specified URLs.
  3. HTML Signature Generation:
    • The HTML content is processed and a hash of its content (signature) is generated after removing dynamic attributes, comments, and normalizing whitespaces.
    • The signature is used to detect changes in the HTML structure.
  4. Screenshot Capturing (optional):
    • If save_screenshot is true, it will create a full-page screenshot of the web page using Selenium.
  5. Change Detection:
    • The current HTML signature is compared with the previous one. If they differ, hasChanged is set to true.
    • The current screenshot is compared with the previous one (if available). If they differ, screenshotChange is set to true.
    • If changes are detected, an HTML diff of the previous and current versions of the HTML is generated.
  6. Data Storage:
    • All results, including the URL, hasChanged, screenshotChange, timestamps, HTML diff, screenshots, and the current HTML, are stored in Apify's key-value store.
    • The current HTML signature and screenshots are also saved in the key-value store for future comparisons.
    • You can find your data in Apify: Storage > Key-value stores > webpage-snapshots. Data is stored indefinitely unless manually removed.
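
Steps 3 and 5 above can be sketched roughly as follows. This is not the Actor's actual code: the regex-based normalization, the DYNAMIC_ATTRS list, and the function names are assumptions made for illustration, and a unified text diff from Python's difflib stands in for the HTML diffing library the Actor uses.

```python
import difflib
import hashlib
import re

# Hypothetical list of "dynamic" attributes; the Actor's real list is not documented.
DYNAMIC_ATTRS = ("nonce", "data-token")


def normalize_html(html: str) -> str:
    """Strip comments and dynamic attributes, then collapse whitespace."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    for attr in DYNAMIC_ATTRS:
        html = re.sub(rf'\s{re.escape(attr)}="[^"]*"', "", html)
    return re.sub(r"\s+", " ", html).strip()


def html_signature(html: str) -> str:
    """SHA-256 hex digest of the normalized HTML."""
    return hashlib.sha256(normalize_html(html).encode("utf-8")).hexdigest()


def check(previous_html, current_html: str) -> dict:
    """Compare signatures and build a diff only when they differ."""
    signature = html_signature(current_html)
    has_changed = (
        previous_html is not None
        and html_signature(previous_html) != signature
    )
    html_diff = None
    if has_changed:
        # Stand-in for the Actor's HTML diff: a plain unified text diff.
        html_diff = "\n".join(
            difflib.unified_diff(
                previous_html.splitlines(),
                current_html.splitlines(),
                fromfile="previous",
                tofile="current",
                lineterm="",
            )
        )
    return {"hasChanged": has_changed, "htmlDiff": html_diff, "signature": signature}
```

Hashing a normalized copy rather than the raw HTML means that differences in comments, whitespace, or per-request attribute values do not register as structural changes. In this sketch, a first run with no previous snapshot reports no change.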

Integrations

This Actor integrates with other Apify platform components and other external services:

  • Webhooks: Automatically notify you when the scraping is complete or send the data to another application.
  • API: Control the Actor programmatically using the Apify API.
  • Cloud Services: Use Apify integrations to automatically store the data in services like Google Sheets, Google Drive, Slack, and others.

Track Changes on Any Webpage with This Dynamic Monitor

The Website Change Detector lets you track the websites you care about by monitoring both their HTML structure and their screenshots for changes.

Not What You Need? Build Your Own!

If this actor doesn't exactly meet your needs, you can use one of the scraper templates available in Python, JavaScript, and TypeScript to get started or check out our open-source library Crawlee.

You can also request a custom scraping solution from us.

Your Feedback

Your feedback is valuable to us. If you have any suggestions or find a bug, please create an issue on the Actor's Issues tab in the Apify Console.

FAQ

How much does Website Change Detector cost?

This actor uses Apify's Pay-per-result pricing model. Apify also provides you with free monthly usage credits.

How can I use Website Change Detector with the Apify API?

You can access the Apify API programmatically via RESTful HTTP endpoints or SDKs (apify-client NPM package for JavaScript, apify-client PyPI package for Python) to run, manage, and get the data out of any actor.
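
As a minimal sketch of the REST route, the example below uses only Python's standard library to build a request for the Apify API endpoint that runs an Actor synchronously and returns its dataset items. The helper names build_run_request and run_and_fetch are hypothetical, and the token is a placeholder you would replace with your own. Synchronous runs are subject to a time limit on the Apify side, so long runs may need the asynchronous run endpoint or an SDK instead.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# In API paths, the "/" in the Actor name is written as "~".
ACTOR_ID = "eloquent_mountain~website-change-detector"


def build_run_request(token: str, urls: list, save_screenshot: bool = True):
    """Build (but do not send) a POST request that starts a synchronous run."""
    body = json.dumps({"urls": urls, "save_screenshot": save_screenshot})
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_and_fetch(token: str, urls: list) -> list:
    """Send the request and return the dataset items as parsed JSON."""
    with urllib.request.urlopen(build_run_request(token, urls)) as response:
        return json.loads(response.read())
```

Calling run_and_fetch("YOUR_APIFY_TOKEN", ["https://www.example.com"]) would perform a real, billable run, so it is left out of the sketch.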

This Actor only extracts data that is publicly available. Please make sure that you comply with the terms and conditions of the websites you scrape. You are also responsible for complying with data privacy regulations such as GDPR.

Developer
Maintained by Community

Actor Metrics

  • 1 monthly user
  • 1 star
  • 50% runs succeeded
  • Created in Dec 2024
  • Modified 2 days ago