Schema Drift Detector

Pricing

from $0.05 / 1,000 results

Detect website structure changes before your scrapers break. Monitors pages with a headless browser, compares DOM fingerprints or watched selectors across runs, and alerts on drift via dataset or webhook. Built for reliable scraping pipelines.

Rating

0.0 (0)

Developer

Hayder Al-Khalissi

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 4 days ago

What does Schema Drift Detector do?

Schema Drift Detector is an Apify Actor that monitors any website for structural changes (schema drift) that would break your scrapers. It takes a list of Start URLs, loads each page with a headless browser, builds a DOM fingerprint—and optionally tracks specific selectors you care about—then compares the result to the previous run. You get a drift report in the Dataset and optional webhook alerts when something changes. Input is configured in the Actor’s Input tab: Start URLs, mode (dom-fingerprint or selector-watch), optional watch selectors, fingerprint and diff options, proxy, and alerts.

Schema Drift Detector does not scrape product data or extract content for you; it detects when a page’s structure has changed so you can fix your scrapers or pipelines before they fail.


Why use Schema Drift Detector?

  • Scraping reliability – Websites change: selectors get renamed, elements move or disappear. Catch layout and selector changes before they break your main scraping Actor.
  • Pipeline safety – Run Schema Drift Detector first (e.g. on a schedule); only run your scraper when no drift is detected, or get notified so you can update selectors.
  • Less downtime – Fix selectors or logic as soon as drift is reported instead of after user complaints or empty results.

You can read about scraping reliability and monitoring on the Apify blog, and use this Actor as part of a monitoring and alerting workflow for any site you scrape.


What can Schema Drift Detector do?

  • Detect DOM structure changes – Builds a stable fingerprint of tag sequence, nesting, and chosen attributes (e.g. id, class, data-*), and compares it across runs.
  • Watch critical selectors – In selector-watch mode, track presence, count, and content hash for specific elements (e.g. price, title, “Add to cart”).
  • Alert on drift – Triggers when the drift score (1 − similarity) exceeds your threshold, or when a watched selector goes missing, changes count by more than 30%, or changes content.
  • Persist state – Stores the latest fingerprint per URL in the Key-Value Store so every run is compared to the previous one.
  • Webhook notifications – Optional POST to your URL when drift is detected (with retries and optional secret header).
  • Apify Proxy support – Use residential or other proxy groups and country code for consistent, geo-specific checks.
  • Human-readable diff – Output includes a top changes summary (e.g. tag count deltas) to understand what changed.
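
The fingerprint-and-compare idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Actor's actual algorithm: the token format, the use of `difflib.SequenceMatcher` as the similarity measure, and the names `FingerprintParser`, `fingerprint`, `fingerprint_hash`, and `similarity` are all hypothetical. The sketch parses static HTML with the standard library rather than a headless browser.

```python
import hashlib
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FingerprintParser(HTMLParser):
    """Collects one token per element: depth, tag name, chosen attributes."""

    def __init__(self, include_attributes=("id", "class"), max_depth=20, max_nodes=5000):
        super().__init__()
        self.include_attributes = include_attributes
        self.max_depth = max_depth
        self.max_nodes = max_nodes
        self.depth = 0
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if self.depth < self.max_depth and len(self.tokens) < self.max_nodes:
            kept = sorted(f"{k}={v}" for k, v in attrs if k in self.include_attributes)
            self.tokens.append(f"{self.depth}:{tag}[{','.join(kept)}]")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def fingerprint(html):
    """Return the token sequence for an HTML string (text content is ignored)."""
    parser = FingerprintParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


def fingerprint_hash(tokens):
    """Hash the token sequence so two fingerprints can be compared cheaply."""
    return hashlib.sha256("\n".join(tokens).encode("utf-8")).hexdigest()


def similarity(prev_tokens, tokens):
    """Similarity in the range 0..1; the drift score is 1 - similarity."""
    return SequenceMatcher(None, prev_tokens, tokens, autojunk=False).ratio()


before = '<main><h1 class="title">Shoes</h1><span class="price">10</span></main>'
after = '<main><h1 class="title">Shoes</h1><div class="cost">10</div></main>'
prev, curr = fingerprint(before), fingerprint(after)
score = similarity(prev, curr)
print(round(score, 2), round(1 - score, 2))  # prints 0.67 0.33
```

Because text nodes are not part of the tokens, changing "Shoes" to "Boots" leaves the fingerprint untouched, while renaming the price element changes one of three tokens.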

Schema Drift Detector + the Apify platform

Schema Drift Detector runs on Apify, so you get scheduling (e.g. daily drift checks), API access to results and run triggers, integrations (webhooks, “Run another Actor”), proxy rotation (Apify Proxy), and monitoring of runs and storage. You can chain it with your scraping Actor and automate your whole pipeline.


What data does Schema Drift Detector produce?

The Actor writes one dataset item per monitored URL. The main data points are:

Data point | Description
url | The page that was checked.
driftDetected | Whether structural or selector drift was detected.
similarity | Similarity score (0–1) between current and previous DOM fingerprint.
driftScore | 1 − similarity; higher means more change.
watch | Per-selector status (ok, changed, missing) and counts in selector-watch mode.
topChanges | Human-readable summary of main changes (e.g. tag count deltas).
fingerprintHash | Hash of the current DOM fingerprint.
prevFingerprintHash | Hash of the previous run’s fingerprint (null on first run).
timestamp | When the check was performed.

The Key-Value Store holds the latest fingerprint (and optional watch results) per URL for the next run.
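
The per-selector statuses in the watch field (ok, changed, missing) can be pictured with a small classifier. This is a sketch under assumptions: the order of the checks, the first-run behavior, and the names `content_hash` and `watch_status` are hypothetical. Only the three statuses and the 30% count rule come from the description above.

```python
import hashlib


def content_hash(values):
    """Hash the extracted values of all elements matched by one selector."""
    joined = "\x1f".join(values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def watch_status(prev, curr, count_tolerance=0.30):
    """Classify one watched selector.

    prev and curr are dicts with 'count' and 'hash'; prev is None on the first run.
    """
    if curr["count"] == 0:
        return "missing"
    if prev is None:
        return "ok"  # nothing to compare against yet (assumed behavior)
    if prev["count"] > 0:
        relative_change = abs(curr["count"] - prev["count"]) / prev["count"]
        if relative_change > count_tolerance:
            return "changed"
    if curr["hash"] != prev["hash"]:
        return "changed"
    return "ok"


previous = {"count": 1, "hash": content_hash(["19.99"])}
current = {"count": 1, "hash": content_hash(["24.99"])}
print(watch_status(previous, current))  # prints changed
```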


How to use Schema Drift Detector to monitor website structure

  1. Open Schema Drift Detector on Apify Store (or in your Console) and click Try for free or Start.
  2. Go to the Input tab and add your Start URLs—the same URLs you scrape or plan to scrape.
  3. Choose Mode: dom-fingerprint (structure only) or selector-watch (structure + specific selectors).
  4. If you use selector-watch, add Watch selectors (e.g. { "name": "price", "selector": ".price", "attribute": "text" }).
  5. Optionally set Proxy (e.g. Apify Proxy, group RESIDENTIAL, country code), Diff threshold, and Alerts (webhook URL and secret).
  6. Click Start. Results appear in the Dataset; the latest fingerprint per URL is saved in the Key-Value Store for the next run.

You can schedule the Actor (e.g. daily), use Run another Actor to run your scraper only after a successful drift check, or use webhooks to notify your system when drift is detected.


How much does it cost to run Schema Drift Detector?

Schema Drift Detector runs on Apify’s consumption-based pricing: you pay for Compute Units (CUs) and proxy usage (if you enable Apify Proxy). Cost depends on how many URLs you check, how often you run, and proxy settings. A single run over a few URLs typically uses a small amount of CUs. You can run the Actor on the free plan to try it; for ongoing monitoring, use the Apify pricing page to estimate cost based on your run frequency and URL count. No pay-per-result fee—you only pay for the compute and proxy you use.


Input

Schema Drift Detector has the following input options. Click on the Input tab on the Actor detail page for the full schema and tooltips.

Input | Description
Start URLs | List of URLs to monitor (same format as other Apify crawlers).
Mode | dom-fingerprint (structure only) or selector-watch (structure + selectors).
Watch selectors | In selector-watch mode: array of { "name", "selector", "attribute" } to track.
Fingerprint | Options: includeText, includeAttributes, maxDepth, maxNodes.
Diff | threshold (0–1, default 0.15), reportTopChanges (number of top changes in output).
Alerts | webhookUrl, webhookSecret, sendOnlyOnDrift.
Proxy | useApifyProxy, groups, countryCode.

Example: DOM fingerprint (one URL, with proxy)

{
  "startUrls": [{ "url": "https://example.com/product/123" }],
  "mode": "dom-fingerprint",
  "fingerprint": { "maxDepth": 20, "maxNodes": 5000 },
  "diff": { "threshold": 0.15, "reportTopChanges": 50 },
  "proxy": { "useApifyProxy": true, "groups": ["RESIDENTIAL"], "countryCode": "DE" },
  "maxRequestsPerCrawl": 200
}

Example: Selector watch (multiple URLs, webhook on drift)

{
  "startUrls": [
    { "url": "https://shop.example.com/p/1" },
    { "url": "https://shop.example.com/p/2" }
  ],
  "mode": "selector-watch",
  "watchSelectors": [
    { "name": "price", "selector": ".price", "attribute": "text" },
    { "name": "title", "selector": "h1", "attribute": "text" }
  ],
  "diff": { "threshold": 0.15, "reportTopChanges": 50 },
  "alerts": {
    "webhookUrl": "https://your-server.com/webhook/drift",
    "webhookSecret": "your-secret",
    "sendOnlyOnDrift": true
  },
  "proxy": { "useApifyProxy": true, "groups": ["RESIDENTIAL"], "countryCode": "US" }
}
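
If you build the input programmatically, a quick client-side check can catch obvious mistakes before a run starts. The sketch below is an assumption-laden helper, not part of the Actor: `validate_input` is a hypothetical name, and the rules it enforces (mode must be set, selector-watch needs at least one selector, each selector needs a name and a selector) are inferred from the input table above rather than taken from the Actor's schema.

```python
VALID_MODES = ("dom-fingerprint", "selector-watch")


def validate_input(actor_input):
    """Return a list of problems found in an input dict (empty list means OK)."""
    problems = []
    if not actor_input.get("startUrls"):
        problems.append("startUrls must list at least one URL")
    mode = actor_input.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {VALID_MODES}, got {mode!r}")
    selectors = actor_input.get("watchSelectors", [])
    if mode == "selector-watch" and not selectors:
        problems.append("selector-watch mode needs at least one watch selector")
    for index, entry in enumerate(selectors):
        for field in ("name", "selector"):
            if not entry.get(field):
                problems.append(f"watchSelectors[{index}] is missing {field!r}")
    # 0.15 is the documented default for diff.threshold.
    threshold = actor_input.get("diff", {}).get("threshold", 0.15)
    if not 0 <= threshold <= 1:
        problems.append(f"diff.threshold must be between 0 and 1, got {threshold}")
    return problems


example = {
    "startUrls": [{"url": "https://shop.example.com/p/1"}],
    "mode": "selector-watch",
    "watchSelectors": [{"name": "price", "selector": ".price", "attribute": "text"}],
    "diff": {"threshold": 0.15},
}
print(validate_input(example))  # prints []
```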

Output

You can download the dataset produced by Schema Drift Detector in various formats such as JSON, CSV, or Excel from the run’s Storage tab (Dataset). The Key-Value Store contains the latest fingerprints for the next run.

Output example (one item per URL)

{
  "url": "https://example.com/product/123",
  "timestamp": "2025-02-10T12:00:00.000Z",
  "mode": "selector-watch",
  "driftDetected": true,
  "similarity": 0.92,
  "driftScore": 0.08,
  "watch": [
    { "name": "price", "status": "changed", "prevCount": 1, "count": 1 },
    { "name": "title", "status": "ok", "prevCount": 1, "count": 1 }
  ],
  "topChanges": [
    { "type": "tag_count", "description": "+5 <div>", "detail": { "tag": "div", "prevCount": 10, "count": 15, "delta": 5 } }
  ],
  "fingerprintHash": "a1b2c3...",
  "prevFingerprintHash": "d4e5f6..."
}
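
A dataset item like the one above can be condensed into a one-line summary for logs or chat notifications. The following is a minimal sketch that relies only on the fields shown in the example; the function name `summarize` and the output wording are hypothetical.

```python
def summarize(item):
    """Build a short, human-readable line from one dataset item."""
    flagged = [w["name"] for w in item.get("watch", []) if w["status"] != "ok"]
    changes = [c["description"] for c in item.get("topChanges", [])]
    state = "DRIFT" if item["driftDetected"] else "ok"
    return (f"{state} {item['url']} similarity={item['similarity']:.2f} "
            f"selectors={flagged} changes={changes}")


item = {
    "url": "https://example.com/product/123",
    "driftDetected": True,
    "similarity": 0.92,
    "watch": [
        {"name": "price", "status": "changed", "prevCount": 1, "count": 1},
        {"name": "title", "status": "ok", "prevCount": 1, "count": 1},
    ],
    "topChanges": [{"type": "tag_count", "description": "+5 <div>"}],
}
print(summarize(item))
# prints DRIFT https://example.com/product/123 similarity=0.92 selectors=['price'] changes=['+5 <div>']
```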

Integrate with other Actors and Apify API

  • Scheduler – Run Schema Drift Detector on a schedule (e.g. daily) on the same URLs you scrape. If driftDetected is false for every URL, run your scraping Actor.
  • Run another Actor – In a pipeline, run Schema Drift Detector first, then trigger your scraper only when the run succeeds (and, optionally, only when no drift was detected, using dataset filters or custom logic).
  • Webhook – Set alerts.webhookUrl and sendOnlyOnDrift: true. Your server receives a POST only when drift is detected; you can pause scrapers, notify the team, or run tests.
  • API – Use the Apify API to fetch the dataset and key-value store, and to trigger runs programmatically. The API tab on the Actor page shows how to call the Actor and access results.
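
On the receiving side of the webhook, the secret should be checked before the payload is trusted. The sketch below is framework-independent and rests on assumptions: the header name `x-webhook-secret` and the `url` payload field are hypothetical placeholders, since the exact header and payload shape are not specified here. Check what the Actor actually sends before relying on either.

```python
import hmac
import json

# Hypothetical header name; verify which header the Actor really uses.
SECRET_HEADER = "x-webhook-secret"


def is_authorized(headers, expected_secret):
    """Compare the received secret with the expected one in constant time."""
    lowered = {name.lower(): value for name, value in headers.items()}
    received = lowered.get(SECRET_HEADER, "")
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))


def handle_drift_webhook(headers, body, expected_secret):
    """Return an (HTTP status, message) pair for one incoming POST."""
    if not is_authorized(headers, expected_secret):
        return 401, "invalid secret"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "expected a JSON object"
    # Assumed payload field; adjust to the real payload.
    url = payload.get("url", "unknown URL")
    return 200, f"drift acknowledged for {url}"


print(handle_drift_webhook({"X-Webhook-Secret": "your-secret"},
                           '{"url": "https://shop.example.com/p/1"}',
                           "your-secret"))
```

The function can be called from any web framework's request handler; pausing scrapers or notifying the team would happen where the 200 response is built.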

Tips and advanced options

  • Reduce false positives – Keep includeText: false unless you need to detect copy changes. Use stable attributes (id, class, data-*) in includeAttributes.
  • Tune sensitivity – Increase diff.threshold (e.g. 0.2) to be more tolerant; decrease (e.g. 0.08) to be more sensitive. Start around 0.15.
  • Selector-watch – Use a few critical selectors (price, title, CTA) to reduce noise from unrelated layout changes.
  • Stable selectors – Prefer selectors like [data-product-price], h1, main .title rather than fragile class names that change often.
  • Consistent proxy – Use the same proxy group and country across runs when possible; different regions or IPs can serve different HTML and look like drift.
  • Debug – Set debug: true to see fingerprint stats and top changes in the log and tune thresholds or selectors.
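
To see how the threshold tip plays out, the sketch below applies three thresholds to the same similarity score. It assumes drift is flagged when the drift score (1 − similarity) is strictly greater than diff.threshold; whether the Actor uses a strict or inclusive comparison is not stated, and `is_drift` is a hypothetical name.

```python
def is_drift(similarity, threshold=0.15):
    """Flag drift when the drift score (1 - similarity) exceeds the threshold."""
    drift_score = 1 - similarity
    return drift_score > threshold


# The same page (similarity 0.9) under a sensitive, default, and tolerant threshold.
for threshold in (0.08, 0.15, 0.2):
    print(threshold, is_drift(0.9, threshold))
# prints:
# 0.08 True
# 0.15 False
# 0.2 False
```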

FAQ and support

How often should I run Schema Drift Detector?

It depends on how often the site changes and how critical your scraper is. Running it daily, or before each scraping run, is common.

Does it work with Apify Proxy?

Yes. Set proxy.useApifyProxy: true and choose groups (e.g. RESIDENTIAL) and countryCode for consistent checks.

Where is the previous fingerprint stored?

Fingerprints are saved in a Key-Value Store. Each URL’s fingerprint is stored under a key derived from the URL. For reliable comparison across runs, set Fingerprint store ID in the Input: in Apify Console go to Storage → Key-Value Stores, create a store (or pick an existing one), copy its ID, and paste it into the Actor input. All runs that use the same store ID will share fingerprints. If you leave it empty, the Actor uses a default store name (behavior may depend on the platform).
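
One way to picture "a key derived from the URL" is a short hash of the URL. This is a hypothetical scheme for illustration only: the Actor's real key format is not documented here, and the `fp-` prefix and the name `fingerprint_key` are made up. Hashing is a common choice because raw URLs contain characters such as / and : that store keys may not accept.

```python
import hashlib


def fingerprint_key(url):
    """Derive a stable, storage-safe key from a URL (hypothetical scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return f"fp-{digest[:32]}"


key = fingerprint_key("https://example.com/product/123")
print(key)  # the same URL always maps to the same key
```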

I get too many drift alerts. What can I do?

Increase diff.threshold, keep includeText: false, and use selector-watch only for a few critical selectors. Use the same proxy and country across runs.

How do I run my scraper only when there’s no drift?

Use Scheduler or Run another Actor: run Schema Drift Detector first, then start your scraper only if no dataset item from that run has driftDetected: true.
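
The gating logic itself is small. The sketch below assumes you have already fetched the run's dataset items (for example through the Apify API) as a list of dicts; `should_run_scraper` is a hypothetical helper, and treating an empty dataset as "do not run" is a cautious assumption rather than documented behavior.

```python
def should_run_scraper(items):
    """Run the scraper only if the check produced items and none report drift."""
    if not items:
        return False  # no results: treat the check as inconclusive
    return not any(item.get("driftDetected") for item in items)


clean_run = [
    {"url": "https://shop.example.com/p/1", "driftDetected": False},
    {"url": "https://shop.example.com/p/2", "driftDetected": False},
]
drifted_run = clean_run + [{"url": "https://shop.example.com/p/3", "driftDetected": True}]

print(should_run_scraper(clean_run))    # prints True
print(should_run_scraper(drifted_run))  # prints False
```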

Schema Drift Detector only inspects page structure (DOM and optional selectors) to detect changes; it does not extract personal data, private user data, or content for storage. You are responsible for complying with the target website’s terms of service and applicable law (e.g. GDPR) when you choose which URLs to monitor and how you use the results. If you are unsure, consult your legal advisor. You can also read Apify’s blog post on the legality of web scraping.


Feedback and issues

If you run into issues or have feature ideas, use the Issues tab on the Actor’s page—we’re open to feedback and use it to improve the Actor. For integrations and programmatic access, see the API tab.