Drift To Patch Auto Fixer
Pricing
from $0.05 / 1,000 results
Automatically detects broken CSS selectors caused by website changes and generates validated replacement selectors. The Actor scans the DOM, ranks candidates, validates them across reloads, and outputs ready-to-use selector patches with confidence scores for resilient scraping pipelines.
Developer

Hayder Al-Khalissi
Drift-to-Patch Auto-Fixer
Detect broken CSS selectors on any website and automatically get validated replacement selectors.
When a site changes its HTML, your scrapers break. This Actor finds new selectors for you, validates them across multiple page loads, and outputs ready-to-use patches plus confidence scores. One run tells you what drifted and how to fix it.
Why use this Actor?
- Catch selector drift before it breaks production – Run on key URLs (product pages, listings) to see when baseline selectors stop working.
- Get replacement selectors, not just errors – The Actor scans the DOM, ranks candidates by stability and relevance, and validates the best one across several reloads.
- Use patches in your pipeline – Output includes old → new selector per field, confidence scores, and extracted values. Working selectors are also saved to the Key-value store for use by other actors.
- Deterministic and transparent – Same URL and input produce the same candidate order. Dataset includes the full candidate list and per-field results for auditing.
What it does
- Loads each start URL with Playwright and runs your baseline selectors for every configured field.
- If a selector fails (drift): scans the DOM for candidate selectors (IDs, classes, tags, data-* attributes), ranks them (text similarity, attribute stability, element frequency), and validates the top candidates over multiple page reloads.
- Outputs one dataset item per URL with drift status, chosen patch(es), confidence, and extracted values. When patches are accepted, writes working selectors to the default Key-value store.
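The ranking step can be pictured with a minimal sketch. The weights, the helper names, and the candidate shape (`selector`, `text`, `matchCount`, `stableAttribute`) are assumptions made for illustration; the Actor's actual scoring formula is not documented here. Ties are broken by selector string so the order stays deterministic for the same input.

```javascript
// Hypothetical weights; the Actor's real scoring formula is not documented.
const WEIGHTS = { text: 0.5, stability: 0.3, uniqueness: 0.2 };

// Token-overlap (Jaccard) similarity between two strings, in [0, 1].
function textSimilarity(a, b) {
  const tokens = (s) => new Set(String(s).toLowerCase().split(/\s+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.size === 0 && tb.size === 0) return 1;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared += 1;
  return shared / (ta.size + tb.size - shared);
}

// Combine text similarity, attribute stability, and element frequency.
function scoreCandidate(candidate, expectedText) {
  const text = textSimilarity(candidate.text, expectedText);
  const stability = candidate.stableAttribute ? 1 : 0; // e.g. id or data-* attribute
  const uniqueness = 1 / Math.max(1, candidate.matchCount); // fewer matches is better
  return WEIGHTS.text * text + WEIGHTS.stability * stability + WEIGHTS.uniqueness * uniqueness;
}

// Sort by score (descending), then by selector string for a deterministic order.
function rankCandidates(candidates, expectedText) {
  return candidates
    .map((c) => ({ ...c, score: scoreCandidate(c, expectedText) }))
    .sort((x, y) => y.score - x.score || (x.selector < y.selector ? -1 : x.selector > y.selector ? 1 : 0));
}

// Illustrative candidates; the values are made up.
const ranked = rankCandidates(
  [
    { selector: 'span.css-1x2y3z', text: '€ 19.99', matchCount: 1, stableAttribute: false },
    { selector: '[data-testid="price"]', text: '€ 19.99', matchCount: 1, stableAttribute: true },
    { selector: 'div', text: 'Learn more', matchCount: 40, stableAttribute: false },
  ],
  '€ 19.99',
);
console.log(ranked.map((c) => c.selector));
```

In this sketch the `data-testid` selector ranks first because it matches the expected text, uses a stable attribute, and matches a single element, while the bare `div` ranks last.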
Input
Configure in the Apify console or via API. Full schema is in the Actor’s Input tab.
| Input | Type | Description |
|---|---|---|
| Start URLs | array | URLs to check (e.g. product or listing pages). Required. |
| Fields | array | Fields to extract. Each object: name (e.g. price, title) and baselineSelector (e.g. .price, h1). Required. |
| Candidate search depth | integer | Max depth when scanning DOM for candidates. Default: 10. |
| Validation runs | integer | Number of page reloads to validate each candidate. Default: 3. |
| Confidence threshold | number | Min confidence (0–1) to accept a patch. Default: 0.75. |
| Debug | boolean | Set to true for verbose logs. Default: false. |
| Selector blocklist | array | CSS selector patterns to exclude from candidates (e.g. ["a"] to never use bare links). |
| Selector allowlist | array | If set, only these selectors are allowed as candidates. |
| Validation URLs | array | Optional URLs to validate the chosen patch on before accepting (max 5). If the selector returns empty on any, the patch is rejected. |
| Save evidence on drift | boolean | When true, save a screenshot and HTML snippet to the Key-value store when drift is detected (keys: evidence_<url>_screenshot.png, evidence_<url>_snippet.html). Default: false. |
| Mode | string | patch (default): full run, write patches and evidence to KV. detect-only: no KV writes; report drift, candidates, and decisions only (safe for CI/monitoring). |
Per field (in Extraction Fields), you can optionally set Field type (price, title, link, image) and Value pattern (regex string). When set, they act as semantic guardrails: a chosen patch is rejected (or marked needs-review) if the extracted value does not match, even if the selector is stable across runs. For example, a "price" field must look like a price, not a CTA like "Learn more".
Example input
    {
      "startUrls": [{ "url": "https://example.com/product/1" }],
      "fields": [
        { "name": "price", "baselineSelector": ".price", "type": "price", "valuePattern": "(€|\\$|£)\\s?\\d+" },
        { "name": "title", "baselineSelector": "h1", "type": "title" }
      ],
      "validationRuns": 3,
      "confidenceThreshold": 0.75,
      "selectorBlocklist": ["a", "div", "span"]
    }
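To preview how a Value pattern would behave before a run, a check along these lines may help. `passesValuePattern` is a hypothetical helper, and the sketch assumes the pattern is applied as an unanchored regular expression to the extracted text; the Actor's exact matching rules may differ.

```javascript
// Hypothetical helper: returns true when an extracted value satisfies the
// field's optional valuePattern. Empty values fail; fields without a
// pattern pass for any non-empty value.
function passesValuePattern(field, value) {
  if (typeof value !== 'string' || value.trim() === '') return false;
  if (!field.valuePattern) return true;
  return new RegExp(field.valuePattern).test(value);
}

// Same field definition as in the example input above.
const priceField = {
  name: 'price',
  baselineSelector: '.price',
  type: 'price',
  valuePattern: '(€|\\$|£)\\s?\\d+',
};

console.log(passesValuePattern(priceField, '€ 19.99')); // true
console.log(passesValuePattern(priceField, 'Learn more')); // false
```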
Output
Dataset (default)
One item per crawled URL.
| Field | Description |
|---|---|
| url | Page that was checked. |
| driftDetected | Whether any field’s baseline selector failed. |
| chosenPatch | Accepted or reviewed patches per field: field, oldSelector, newSelector, confidence, stableValue, runValues, patchDecision (auto-apply / needs-review / rejected), decisionReason. |
| patchDecision | Overall decision for the URL: auto-apply (high confidence + semantic pass), needs-review, or rejected. Each per-field patch also carries its own decision. |
| rejectionReason | If rejected, reason (e.g. semanticMismatch, lowConfidence, emptyOnValidationUrl). |
| candidateList | Candidate selectors (selector, sample value, score, selectorSpecificityScore, uniquenessHint). |
| confidenceScore | Overall confidence for this URL (0–1). |
| extracted | Final extracted value per field. |
| perFieldResults | Per-field drift, patch, and confidence. |
| timestamp | When the item was produced (ISO 8601). |
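A downstream script might pick out only the patches that are safe to apply without review. The sketch below assumes `chosenPatch` is an array of per-field objects with the properties listed in the table; `autoApplyPatches` and the sample item are illustrative, not part of the Actor.

```javascript
// Collect patches marked auto-apply from one dataset item.
// Assumption: chosenPatch is an array of per-field patch objects.
function autoApplyPatches(item) {
  const patches = Array.isArray(item.chosenPatch) ? item.chosenPatch : [];
  return patches
    .filter((p) => p.patchDecision === 'auto-apply')
    .map(({ field, oldSelector, newSelector, confidence }) => ({
      field,
      oldSelector,
      newSelector,
      confidence,
    }));
}

// Illustrative item; values are made up, property names follow the table above.
const sampleItem = {
  url: 'https://example.com/product/1',
  driftDetected: true,
  chosenPatch: [
    { field: 'price', oldSelector: '.price', newSelector: '.product-price', confidence: 0.9, patchDecision: 'auto-apply' },
    { field: 'title', oldSelector: 'h1', newSelector: 'h1.title', confidence: 0.7, patchDecision: 'needs-review' },
  ],
};

for (const p of autoApplyPatches(sampleItem)) {
  console.log(`${p.field}: ${p.oldSelector} -> ${p.newSelector} (${p.confidence})`);
}
```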
Key-value store
When at least one patch is accepted, the Actor writes to the default Key-value store. Keys are URL-based (e.g. selectors_example.com__). Value shape:
    {
      "selectors": {
        "price": { "selector": ".product-price", "confidence": 0.9 },
        "title": { "selector": "h1.title", "confidence": 0.85 }
      }
    }
Use these entries in other actors to switch to the latest working selectors.
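As a rough sketch of that hand-off, a consuming scraper could merge a stored record with its own baseline selectors. `resolveSelectors` and the `minConfidence` parameter are hypothetical names; the record shape follows the value shape shown above, and fetching the record from the Key-value store is left to whichever client you use.

```javascript
// Pick the selector to use per field: the stored patch when it exists and
// meets the caller's minimum confidence, otherwise the baseline selector.
function resolveSelectors(fields, record, minConfidence = 0.75) {
  const patched = (record && record.selectors) || {};
  const resolved = {};
  for (const { name, baselineSelector } of fields) {
    const patch = patched[name];
    const usable =
      Boolean(patch) && typeof patch.selector === 'string' && patch.confidence >= minConfidence;
    resolved[name] = usable ? patch.selector : baselineSelector;
  }
  return resolved;
}

// Record in the shape shown above.
const storedRecord = {
  selectors: {
    price: { selector: '.product-price', confidence: 0.9 },
    title: { selector: 'h1.title', confidence: 0.85 },
  },
};

// Hypothetical baseline fields of the consuming scraper.
const baselineFields = [
  { name: 'price', baselineSelector: '.price' },
  { name: 'title', baselineSelector: 'h1' },
  { name: 'sku', baselineSelector: '.sku' },
];

console.log(resolveSelectors(baselineFields, storedRecord, 0.88));
```

With a minimum confidence of 0.88, only the price patch is used; the title falls back to its baseline because 0.85 is below the bar, and `sku` has no stored patch.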
When Save evidence on drift is enabled, the Key-value store also gets evidence_<urlSafeKey>_screenshot.png and evidence_<urlSafeKey>_snippet.html for each URL where drift was detected (for debugging).
Use cases
- Monitor production scrapers – Schedule runs on critical URLs to detect selector breakage early. Use detect-only mode for CI or scheduled checks (no KV writes).
- Recover from redesigns – After a site change, get candidate replacements and confidence scores in one run.
- Resilient pipelines – Feed Key-value store output into your main scraper so it can use patched selectors when baseline fails.
- Extraction audits – Review which fields drifted, which candidates were tried, and why a patch was chosen.
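For the monitoring use case, a CI job could reduce the dataset items of a detect-only run to a pass/fail signal. This is a minimal sketch: `summarizeDrift` is a hypothetical helper, the sample items are made up, and only the `url` and `driftDetected` properties from the output table are assumed.

```javascript
// Summarize dataset items into a CI-friendly result:
// exit code 1 when any URL reports drift, otherwise 0.
function summarizeDrift(items) {
  const drifted = items.filter((item) => item.driftDetected === true);
  return {
    checked: items.length,
    drifted: drifted.map((item) => item.url),
    exitCode: drifted.length > 0 ? 1 : 0,
  };
}

// Illustrative items; in practice these come from the run's default dataset.
const sampleItems = [
  { url: 'https://example.com/product/1', driftDetected: false },
  { url: 'https://example.com/product/2', driftDetected: true },
];

const summary = summarizeDrift(sampleItems);
console.log(`${summary.drifted.length} of ${summary.checked} URLs drifted`, summary.drifted);
```

In a real job, assigning `summary.exitCode` to `process.exitCode` would make the pipeline step fail when drift is detected.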
Tips and limitations
- Text extraction only – The Actor takes the text of the first element matching each selector. It does not handle lists or complex nested structures.
- Tune confidence – Raise `confidenceThreshold` if patches are noisy; lower it if good candidates are rejected.
- Validation runs – More runs (e.g. 3–5) improve stability; fewer (2) speed up runs.
- KV store keys – Keys use only safe characters (e.g. `selectors_example.com__`). List keys in the run’s Key-value store in the Apify console or via API.
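When deciding whether to change the number of validation runs, it can help to check how often the runs agreed on a value. The sketch below is an auditing aid only: `runAgreement` is a hypothetical helper, it assumes `runValues` holds one extracted string per reload, and it is not the Actor's confidence formula.

```javascript
// Share of validation runs that returned the most common non-empty value.
function runAgreement(runValues) {
  const counts = new Map();
  for (const v of runValues) {
    const key = typeof v === 'string' ? v.trim() : '';
    if (key !== '') counts.set(key, (counts.get(key) || 0) + 1);
  }
  let best = { value: null, count: 0 };
  for (const [value, count] of counts) {
    if (count > best.count) best = { value, count };
  }
  return {
    value: best.value,
    agreement: runValues.length > 0 ? best.count / runValues.length : 0,
  };
}

// Illustrative run values; an empty string stands for a reload with no match.
console.log(runAgreement(['€ 19.99', '€ 19.99', '€ 19.99']));
console.log(runAgreement(['€ 19.99', '', '€ 19.99', '€ 24.99']));
```

A low agreement ratio suggests the selector is unstable across reloads, in which case more validation runs or a higher confidence threshold may be worth trying.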
Run locally
- Put your input in `storage/key_value_stores/default/INPUT.json`.
- Set the storage directory and start the Actor:
  - Windows (PowerShell): `$env:CRAWLEE_STORAGE_DIR = (Resolve-Path ".\storage").Path; npm start`
  - macOS/Linux: `CRAWLEE_STORAGE_DIR=./storage npm start`
- Results: `storage/datasets/default/` (dataset), `storage/key_value_stores/default/` (persisted selectors).
Support
For bugs or feature requests, open an issue in the Actor’s source repository or contact the maintainer through the Apify platform.