Pricing

from $0.05 / 1,000 results

Schema Drift Detector

Detect website structure changes before your scrapers break. Monitors pages with a headless browser, compares DOM fingerprints or watched selectors across runs, and alerts on drift via dataset or webhook. Built for reliable scraping pipelines.

Developer

Hayder Al-Khalissi

Last modified

4 days ago

What does Schema Drift Detector do?

Schema Drift Detector is an Apify Actor that monitors any website for structural changes (schema drift) that would break your scrapers. It takes a list of Start URLs, loads each page with a headless browser, builds a DOM fingerprint—and optionally tracks specific selectors you care about—then compares the result to the previous run. You get a drift report in the Dataset and optional webhook alerts when something changes. Input is configured in the Actor’s Input tab: Start URLs, mode (dom-fingerprint or selector-watch), optional watch selectors, fingerprint and diff options, proxy, and alerts.

Schema Drift Detector does not scrape product data or extract content for you; it detects when a page’s structure has changed so you can fix your scrapers or pipelines before they fail.

Why use Schema Drift Detector?

Scraping reliability – Websites change: selectors get renamed, elements move or disappear. Catch layout and selector changes before they break your main scraping Actor.
Pipeline safety – Run Schema Drift Detector first (e.g. on a schedule); only run your scraper when no drift is detected, or get notified so you can update selectors.
Less downtime – Fix selectors or logic as soon as drift is reported instead of after user complaints or empty results.

You can read about scraping reliability and monitoring on the Apify blog, and use this Actor as part of a monitoring and alerting workflow for any site you scrape.

What can Schema Drift Detector do?

Detect DOM structure changes – Builds a stable fingerprint of tag sequence, nesting, and chosen attributes (e.g. id, class, data-*), and compares it across runs.
Watch critical selectors – In selector-watch mode, track presence, count, and content hash for specific elements (e.g. price, title, “Add to cart”).
Alert on drift – Triggers when the drift score (1 − similarity) exceeds your threshold, or when a watched selector goes missing, changes count by more than 30%, or changes content.
Persist state – Stores the latest fingerprint per URL in the Key-Value Store so every run is compared to the previous one.
Webhook notifications – Optional POST to your URL when drift is detected (with retries and optional secret header).
Apify Proxy support – Use residential or other proxy groups and country code for consistent, geo-specific checks.
Human-readable diff – Output includes a top changes summary (e.g. tag count deltas) to understand what changed.
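To make the fingerprint idea concrete, here is a minimal sketch using only the Python standard library. It is not the Actor's actual algorithm: the token format, the `FingerprintParser` class, and the use of `difflib.SequenceMatcher` for the similarity score are all assumptions chosen for illustration. It records one token per element (depth, tag, selected attributes), ignores text, and compares two token sequences.

```python
import hashlib
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not increase depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FingerprintParser(HTMLParser):
    """Hypothetical helper: one token per element, text content ignored."""

    def __init__(self, include_attributes=("id", "class")):
        super().__init__()
        self.include_attributes = include_attributes
        self.depth = 0
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Keep only the chosen attributes plus any data-* attribute.
        kept = sorted(
            f"{name}={value}" for name, value in attrs
            if name in self.include_attributes or name.startswith("data-")
        )
        self.tokens.append(f"{self.depth}:{tag}[{','.join(kept)}]")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def fingerprint(html, include_attributes=("id", "class")):
    """Return the ordered list of structural tokens for an HTML string."""
    parser = FingerprintParser(include_attributes)
    parser.feed(html)
    parser.close()
    return parser.tokens


def fingerprint_hash(tokens):
    """Stable hash of a fingerprint, usable for quick equality checks."""
    return hashlib.sha256("\n".join(tokens).encode("utf-8")).hexdigest()


def similarity(prev_tokens, tokens):
    """Similarity in the range 0-1 between two token sequences."""
    return SequenceMatcher(None, prev_tokens, tokens, autojunk=False).ratio()


old = fingerprint('<main><h1 class="title">A</h1><span class="price">1</span></main>')
new = fingerprint('<main><h1 class="title">A</h1><div><span class="amount">1</span></div></main>')
drift_score = 1 - similarity(old, new)
```

Because text nodes are skipped, a price changing from 1 to 2 leaves the fingerprint untouched, while wrapping the price in a new `div` or renaming its class lowers the similarity.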

Schema Drift Detector + the Apify platform

Schema Drift Detector runs on Apify, so you get scheduling (e.g. daily drift checks), API access to results and run triggers, integrations (webhooks, “Run another Actor”), proxy rotation (Apify Proxy), and monitoring of runs and storage. You can chain it with your scraping Actor and automate your whole pipeline.

What data does Schema Drift Detector produce?

The Actor writes one dataset item per monitored URL. The main data points are:

Data point	Description
url	The page that was checked.
driftDetected	Whether structural or selector drift was detected.
similarity	Similarity score (0–1) between current and previous DOM fingerprint.
driftScore	1 − similarity; higher means more change.
watch	Per-selector status (ok, changed, missing) and counts in selector-watch mode.
topChanges	Human-readable summary of main changes (e.g. tag count deltas).
fingerprintHash	Hash of the current DOM fingerprint.
prevFingerprintHash	Hash of the previous run’s fingerprint (null on first run).
timestamp	When the check was performed.
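How these fields relate to each other can be sketched as follows. This is an illustration under assumptions, not the Actor's source: the function names are hypothetical, the 30% count rule and the 0.15 default come from this description, and the strict "greater than" comparisons are a guess.

```python
COUNT_CHANGE_LIMIT = 0.30  # "changes count by more than 30%"


def watch_status(prev, curr):
    """Classify one watched selector.

    prev/curr are dicts like {"count": 1, "hash": "..."}; prev is None on the first run.
    """
    if curr["count"] == 0:
        return "missing"
    if prev is None:
        return "ok"  # nothing to compare against yet
    if prev["count"] == 0:
        return "changed"  # assumption: a selector that reappears counts as a change
    count_change = abs(curr["count"] - prev["count"]) / prev["count"]
    if count_change > COUNT_CHANGE_LIMIT or curr["hash"] != prev["hash"]:
        return "changed"
    return "ok"


def evaluate(similarity, threshold=0.15, watch_statuses=()):
    """Combine the structural score and the selector statuses into one verdict."""
    drift_score = round(1.0 - similarity, 6)
    drift_detected = drift_score > threshold or any(s != "ok" for s in watch_statuses)
    return {
        "similarity": similarity,
        "driftScore": drift_score,
        "driftDetected": drift_detected,
    }
```

Under these assumptions, a page with similarity 0.92 stays below the default threshold on structure alone, but is still flagged if any watched selector is "changed" or "missing".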

The Key-Value Store holds the latest fingerprint (and optional watch results) per URL for the next run.

How to use Schema Drift Detector to monitor website structure

1. Open Schema Drift Detector on Apify Store (or in your Console) and click Try for free or Start.
2. Go to the Input tab and add your Start URLs: the same URLs you scrape or plan to scrape.
3. Choose Mode: dom-fingerprint (structure only) or selector-watch (structure + specific selectors).
4. If you use selector-watch, add Watch selectors (e.g. { "name": "price", "selector": ".price", "attribute": "text" }).
5. Optionally set Proxy (e.g. Apify Proxy, group RESIDENTIAL, country code), Diff threshold, and Alerts (webhook URL and secret).
6. Click Start. Results appear in the Dataset; the latest fingerprint per URL is saved in the Key-Value Store for the next run.

You can schedule the Actor (e.g. daily), use Run another Actor to run your scraper only after a successful drift check, or use webhooks to notify your system when drift is detected.

How much does it cost to run Schema Drift Detector?

Schema Drift Detector runs on Apify’s consumption-based pricing: you pay for Compute Units (CUs) and, if you enable Apify Proxy, for proxy usage. Cost depends on how many URLs you check, how often you run, and your proxy settings. A single run over a few URLs typically uses a small amount of CUs. You can try the Actor on the free plan; for ongoing monitoring, use the Apify pricing page to estimate cost from your run frequency and URL count. The Store listing shows a price from $0.05 / 1,000 results, so check the Actor’s pricing details before assuming that only compute and proxy are billed.

Input

Schema Drift Detector has the following input options. Click on the Input tab on the Actor detail page for the full schema and tooltips.

Input	Description
Start URLs	List of URLs to monitor (same format as other Apify crawlers).
Mode	`dom-fingerprint` (structure only) or `selector-watch` (structure + selectors).
Watch selectors	In selector-watch mode: array of `{ "name", "selector", "attribute" }` to track.
Fingerprint	Options: `includeText`, `includeAttributes`, `maxDepth`, `maxNodes`.
Diff	`threshold` (0–1, default 0.15), `reportTopChanges` (number of top changes in output).
Alerts	`webhookUrl`, `webhookSecret`, `sendOnlyOnDrift`.
Proxy	`useApifyProxy`, `groups`, `countryCode`.
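If you generate the input programmatically, a small builder can catch mistakes before a run starts. The sketch below uses the field names that appear in this page's JSON examples (`startUrls`, `mode`, `watchSelectors`, `diff`); the `build_input` helper and its validation rules are hypothetical, and the Actor's real schema in the Input tab remains the authority.

```python
VALID_MODES = {"dom-fingerprint", "selector-watch"}
REQUIRED_SELECTOR_KEYS = {"name", "selector", "attribute"}


def build_input(urls, mode="dom-fingerprint", watch_selectors=None, threshold=0.15):
    """Build an input dict shaped like the JSON examples on this page."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: selector-watch is only useful with at least one selector.
    if mode == "selector-watch" and not watch_selectors:
        raise ValueError("selector-watch mode needs at least one watch selector")

    actor_input = {
        "startUrls": [{"url": url} for url in urls],
        "mode": mode,
        "diff": {"threshold": threshold},
    }
    if watch_selectors:
        for entry in watch_selectors:
            missing = REQUIRED_SELECTOR_KEYS - entry.keys()
            if missing:
                raise ValueError(f"watch selector is missing {sorted(missing)}")
        actor_input["watchSelectors"] = list(watch_selectors)
    return actor_input
```

The returned dict can be serialized with `json.dumps` and pasted into the Input tab, or passed as the run input when starting the Actor through the API.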

Example: DOM fingerprint (one URL, with proxy)

{
  "startUrls": [{ "url": "https://example.com/product/123" }],
  "mode": "dom-fingerprint",
  "fingerprint": { "maxDepth": 20, "maxNodes": 5000 },
  "diff": { "threshold": 0.15, "reportTopChanges": 50 },
  "proxy": { "useApifyProxy": true, "groups": ["RESIDENTIAL"], "countryCode": "DE" },
  "maxRequestsPerCrawl": 200
}

Example: Selector watch (multiple URLs, webhook on drift)

{
  "startUrls": [
    { "url": "https://shop.example.com/p/1" },
    { "url": "https://shop.example.com/p/2" }
  ],
  "mode": "selector-watch",
  "watchSelectors": [
    { "name": "price", "selector": ".price", "attribute": "text" },
    { "name": "title", "selector": "h1", "attribute": "text" }
  ],
  "diff": { "threshold": 0.15, "reportTopChanges": 50 },
  "alerts": {
    "webhookUrl": "https://your-server.com/webhook/drift",
    "webhookSecret": "your-secret",
    "sendOnlyOnDrift": true
  },
  "proxy": { "useApifyProxy": true, "groups": ["RESIDENTIAL"], "countryCode": "US" }
}

Output

You can download the dataset produced by Schema Drift Detector in various formats such as JSON, CSV, or Excel from the run’s Storage tab (Dataset). The Key-Value Store contains the latest fingerprints for the next run.

Output example (one item per URL)

{
  "url": "https://example.com/product/123",
  "timestamp": "2025-02-10T12:00:00.000Z",
  "mode": "selector-watch",
  "driftDetected": true,
  "similarity": 0.92,
  "driftScore": 0.08,
  "watch": [
    { "name": "price", "status": "changed", "prevCount": 1, "count": 1 },
    { "name": "title", "status": "ok", "prevCount": 1, "count": 1 }
  ],
  "topChanges": [
    { "type": "tag_count", "description": "+5 <div>", "detail": { "tag": "div", "prevCount": 10, "count": 15, "delta": 5 } }
  ],
  "fingerprintHash": "a1b2c3...",
  "prevFingerprintHash": "d4e5f6..."
}
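A dataset item like the one above can be condensed into a single log or chat line. The following is a small sketch that assumes only the fields shown in the output example; the `summarize` helper and its line format are made up for illustration.

```python
def summarize(item, max_changes=3):
    """Turn one dataset item into a single human-readable line."""
    status = "DRIFT" if item.get("driftDetected") else "ok"
    similarity = item.get("similarity")
    # Similarity may be absent when there is no previous fingerprint (assumption).
    sim_text = f"{similarity:.2f}" if similarity is not None else "n/a"
    parts = [f"[{status}] {item['url']} similarity={sim_text}"]

    flagged = [
        f"{w['name']}:{w['status']}"
        for w in item.get("watch", [])
        if w.get("status") != "ok"
    ]
    if flagged:
        parts.append("selectors: " + ", ".join(flagged))

    changes = [c["description"] for c in item.get("topChanges", [])[:max_changes]]
    if changes:
        parts.append("changes: " + "; ".join(changes))
    return " | ".join(parts)
```

Applied to the example item, this yields a line naming the URL, the similarity, the `price` selector flagged as changed, and the `+5 <div>` change.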

Integrate with other Actors and Apify API

Scheduler – Run Schema Drift Detector on a schedule (e.g. daily) on the same URLs you scrape. If driftDetected is false for all, run your scraping Actor.
Run another Actor – In a pipeline, run Schema Drift Detector first, then trigger your scraper only when the run succeeds (and optionally when no drift, using dataset filters or custom logic).
Webhook – Set alerts.webhookUrl and sendOnlyOnDrift: true. Your server receives a POST only when drift is detected; you can pause scrapers, notify the team, or run tests.
API – Use the Apify API to fetch the dataset and key-value store, and to trigger runs programmatically. The API tab on the Actor page shows how to call the Actor and access results.
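On the receiving side, a webhook endpoint could look roughly like the sketch below, written with Python's standard library only. Two things are assumptions, because this page does not specify them: the name of the secret header (`X-Webhook-Secret` is a placeholder) and the payload shape (assumed to be a JSON object with a `url` field, like a dataset item). Check a real delivery before relying on either.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET_HEADER = "X-Webhook-Secret"  # hypothetical header name
EXPECTED_SECRET = "your-secret"     # same value as alerts.webhookSecret


def is_authorized(headers, expected_secret=EXPECTED_SECRET):
    """Constant-time comparison of the received secret with the expected one."""
    received = headers.get(SECRET_HEADER) or ""
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))


class DriftWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authorized(self.headers):
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers invalid JSON and invalid UTF-8
            self.send_response(400)
            self.end_headers()
            return
        url = payload.get("url") if isinstance(payload, dict) else None
        # Replace this with your own reaction: pause scrapers, notify the team, run tests.
        print("Drift reported for", url)
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), DriftWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. In production you would typically put it behind HTTPS and respond quickly, deferring slow work to a queue so that retries are not triggered by timeouts.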

Tips and advanced options

Reduce false positives – Keep includeText: false unless you need to detect copy changes. Use stable attributes (id, class, data-*) in includeAttributes.
Tune sensitivity – Increase diff.threshold (e.g. 0.2) to be more tolerant; decrease (e.g. 0.08) to be more sensitive. Start around 0.15.
Selector-watch – Use a few critical selectors (price, title, CTA) to reduce noise from unrelated layout changes.
Stable selectors – Prefer selectors like [data-product-price], h1, main .title rather than fragile class names that change often.
Consistent proxy – Use the same proxy group and country across runs when possible; different regions or IPs can serve different HTML and look like drift.
Debug – Set debug: true to see fingerprint stats and top changes in the log and tune thresholds or selectors.

FAQ and support

How often should I run Schema Drift Detector?

It depends on how often the site changes and how critical your scraper is. Running it daily, or before each scraping run, is common.

Does it work with Apify Proxy?

Yes. Set Proxy → useApifyProxy: true and choose groups (e.g. RESIDENTIAL) and countryCode for consistent checks.

Where is the previous fingerprint stored?

Fingerprints are saved in a Key-Value Store. Each URL’s fingerprint is stored under a key derived from the URL. For reliable comparison across runs, set Fingerprint store ID in the Input: in Apify Console go to Storage → Key-Value Stores, create a store (or pick an existing one), copy its ID, and paste it into the Actor input. All runs that use the same store ID will share fingerprints. If you leave it empty, the Actor uses a default store name (behavior may depend on the platform).
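One plausible way to derive a storage key from a URL is to normalize the URL and hash it, since raw URLs contain characters such as `/` and `:` that are awkward in storage keys. The scheme below is an assumption for illustration only; the Actor's actual key format is not documented here, and the `fp-` prefix and 32-character digest are arbitrary choices.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment, and default the path to '/'."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def fingerprint_key(url, prefix="fp-"):
    """Deterministic key made of letters, digits, and '-' only."""
    digest = hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()
    return f"{prefix}{digest[:32]}"
```

With a scheme like this, `https://Example.com/product/123#reviews` and `https://example.com/product/123` map to the same key, so cosmetic URL differences do not reset the comparison history.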

I get too many drift alerts. What can I do?

Increase diff.threshold, keep includeText: false, and use selector-watch only for a few critical selectors. Use the same proxy and country across runs.

How do I run my scraper only when there’s no drift?

Run Schema Drift Detector first, either on a schedule or as the first step of a Run another Actor chain. Then start your scraper only if no dataset item has driftDetected: true.
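If you orchestrate this from your own code, the gate can be sketched as below. The `client` argument is assumed to be an `ApifyClient` from the `apify-client` Python package (using its `actor(...).call(run_input=...)` and `dataset(...).list_items().items` calls); the function names and the Actor IDs you pass in are placeholders.

```python
def urls_with_drift(items):
    """Return the URLs of dataset items that report drift."""
    return [item["url"] for item in items if item.get("driftDetected")]


def check_then_scrape(client, detector_id, scraper_id, detector_input, scraper_input=None):
    """Run the drift check, then start the scraper only if every URL is clean."""
    run = client.actor(detector_id).call(run_input=detector_input)
    if not run or run.get("status") != "SUCCEEDED":
        return {"scraped": False, "drifted": [], "reason": "detector run did not succeed"}

    items = client.dataset(run["defaultDatasetId"]).list_items().items
    drifted = urls_with_drift(items)
    # An empty dataset is treated as "unknown", so the scraper is not started.
    if not items or drifted:
        return {"scraped": False, "drifted": drifted, "reason": "drift detected or no results"}

    client.actor(scraper_id).call(run_input=scraper_input)
    return {"scraped": True, "drifted": [], "reason": "no drift"}
```

Treating a failed detector run or an empty dataset as a reason not to scrape is a conservative design choice; relax it if you would rather scrape whenever the check is inconclusive.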

Is it legal to use Schema Drift Detector?

Schema Drift Detector only inspects page structure (DOM and optional selectors) to detect changes; it does not extract personal data, private user data, or content for storage. You are responsible for complying with the target website’s terms of service and applicable law (e.g. GDPR) when you choose which URLs to monitor and how you use the results. If you are unsure, consult your legal advisor. You can also read Apify’s blog post on the legality of web scraping.

Feedback and issues

If you run into issues or have feature ideas, use the Issues tab on the Actor’s page—we’re open to feedback and use it to improve the Actor. For integrations and programmatic access, see the API tab.
