Pricing

from $0.05 / 1,000 results

Drift To Patch Auto Fixer

Automatically detects broken CSS selectors caused by website changes and generates validated replacement selectors. The Actor scans the DOM, ranks candidates, validates them across reloads, and outputs ready-to-use selector patches with confidence scores for resilient scraping pipelines.

Developer

Hayder Al-Khalissi

Last modified: 2 days ago

Drift-to-Patch Auto-Fixer

Detect broken CSS selectors on any website and automatically get validated replacement selectors.

When a site changes its HTML, your scrapers break. This Actor finds new selectors for you, validates them across multiple page loads, and outputs ready-to-use patches plus confidence scores. One run tells you what drifted and how to fix it.

Why use this Actor?

- Catch selector drift before it breaks production – Run on key URLs (product pages, listings) to see when baseline selectors stop working.
- Get replacement selectors, not just errors – The Actor scans the DOM, ranks candidates by stability and relevance, and validates the best one across several reloads.
- Use patches in your pipeline – Output includes old → new selector per field, confidence scores, and extracted values. Working selectors are also saved to the Key-value store for use by other actors.
- Deterministic and transparent – Same URL and input produce the same candidate order. Dataset includes the full candidate list and per-field results for auditing.

What it does

- Loads each start URL with Playwright and runs your baseline selectors for every configured field.
- If a selector fails (drift), scans the DOM for candidate selectors (IDs, classes, tags, data-* attributes), ranks them (text similarity, attribute stability, element frequency), and validates the top candidates over multiple page reloads.
- Outputs one dataset item per URL with drift status, chosen patch(es), confidence, and extracted values. When patches are accepted, writes working selectors to the default Key-value store.
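
The ranking step can be pictured with a small sketch. This is not the Actor's actual scoring code: the `Candidate` type, the `attribute_stability` heuristic, the weights (0.5 / 0.3 / 0.2), and the use of a last known good value as `expected_text` are all assumptions made for illustration. It only shows how the three signals named above could be combined into one deterministic ordering.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Candidate:
    selector: str      # e.g. "#price" or "span.amount"
    sample_text: str   # text of the first element the selector matches
    match_count: int   # how many elements the selector matches on the page


def attribute_stability(selector: str) -> float:
    """Assumed heuristic: IDs and data-* attributes outlive classes and bare tags."""
    if selector.startswith("#") or "[data-" in selector:
        return 1.0
    if "." in selector:
        return 0.6
    return 0.3


def score(candidate: Candidate, expected_text: str) -> float:
    """Combine text similarity, attribute stability, and element frequency (assumed weights)."""
    similarity = SequenceMatcher(None, candidate.sample_text, expected_text).ratio()
    uniqueness = 1.0 / max(candidate.match_count, 1)  # fewer matches means more specific
    return 0.5 * similarity + 0.3 * attribute_stability(candidate.selector) + 0.2 * uniqueness


def rank(candidates: list, expected_text: str) -> list:
    # Sort by score (descending), then by selector text, so ties always resolve the same way.
    return sorted(candidates, key=lambda c: (-score(c, expected_text), c.selector))


candidates = [
    Candidate("span", "Learn more", 40),
    Candidate("[data-testid='price']", "$19.99", 1),
    Candidate(".amount", "$19.99", 3),
]
ranked = rank(candidates, expected_text="$19.99")
print([c.selector for c in ranked])
```

The secondary sort key is what makes the order reproducible: two candidates with equal scores are always listed in the same order for the same input.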

Input

Configure in the Apify console or via API. Full schema is in the Actor’s Input tab.

| Input | Type | Description |
| --- | --- | --- |
| Start URLs | array | URLs to check (e.g. product or listing pages). Required. |
| Fields | array | Fields to extract. Each object: `name` (e.g. `price`, `title`) and `baselineSelector` (e.g. `.price`, `h1`). Required. |
| Candidate search depth | integer | Max depth when scanning the DOM for candidates. Default: `10`. |
| Validation runs | integer | Number of page reloads used to validate each candidate. Default: `3`. |
| Confidence threshold | number | Min confidence (0–1) to accept a patch. Default: `0.75`. |
| Debug | boolean | Set to `true` for verbose logs. Default: `false`. |
| Selector blocklist | array | CSS selector patterns to exclude from candidates (e.g. `["a"]` to never use bare links). |
| Selector allowlist | array | If set, only these selectors are allowed as candidates. |
| Validation URLs | array | Optional extra URLs (max 5) on which the chosen patch is checked before it is accepted. If the selector returns an empty value on any of them, the patch is rejected. |
| Save evidence on drift | boolean | When `true`, saves a screenshot and HTML snippet to the Key-value store when drift is detected (keys: `evidence_<urlSafeKey>_screenshot.png`, `evidence_<urlSafeKey>_snippet.html`). Default: `false`. |
| Mode | string | `patch` (default): full run; writes patches and evidence to the Key-value store. `detect-only`: no Key-value store writes; reports drift, candidates, and decisions only (safe for CI/monitoring). |

Per field (in Extraction Fields), you can optionally set a field type (`type`: price, title, link, image) and a value pattern (`valuePattern`, a regex string). When set, they act as semantic guardrails: if the extracted value does not match, the chosen patch is rejected or marked needs-review, even if the selector is stable across runs. For example, a "price" field must look like a price, not a CTA like "Learn more".
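
As a rough illustration of such a guardrail, the sketch below checks an extracted value against an explicit pattern or, failing that, a per-type default. The function name `passes_guardrail` and the `DEFAULT_PATTERNS` table are hypothetical; the Actor's built-in rules per field type are not documented here. The price regex is the one used in the example input.

```python
import re
from typing import Optional

# Assumed fallback patterns per field type (illustrative only).
DEFAULT_PATTERNS = {
    "price": r"(€|\$|£)\s?\d+",
}


def passes_guardrail(value: str,
                     field_type: Optional[str] = None,
                     value_pattern: Optional[str] = None) -> bool:
    """Return True when the value is non-empty and matches the explicit or type-based pattern."""
    if not value.strip():
        return False
    pattern = value_pattern or DEFAULT_PATTERNS.get(field_type or "")
    if pattern is None:
        return True  # no pattern configured, so there is nothing to check against
    return re.search(pattern, value) is not None


print(passes_guardrail("$ 19.99", field_type="price"))     # True
print(passes_guardrail("Learn more", field_type="price"))  # False
```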

Example input

{
  "startUrls": [{ "url": "https://example.com/product/1" }],
  "fields": [
    { "name": "price", "baselineSelector": ".price", "type": "price", "valuePattern": "(€|\\$|£)\\s?\\d+" },
    { "name": "title", "baselineSelector": "h1", "type": "title" }
  ],
  "validationRuns": 3,
  "confidenceThreshold": 0.75,
  "selectorBlocklist": ["a", "div", "span"]
}
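
Before starting a run, it can help to sanity-check the input object against the table above. The following is a minimal sketch, not the Actor's own schema validation (the full schema is in the Input tab); `check_input` is a hypothetical helper that only covers the required keys and the confidence range described in this README.

```python
def check_input(actor_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means the input looks usable."""
    problems = []
    if not actor_input.get("startUrls"):
        problems.append("startUrls is required")
    fields = actor_input.get("fields") or []
    if not fields:
        problems.append("fields is required")
    for i, field in enumerate(fields):
        for key in ("name", "baselineSelector"):
            if not field.get(key):
                problems.append(f"fields[{i}].{key} is required")
    threshold = actor_input.get("confidenceThreshold", 0.75)  # documented default
    if not 0 <= threshold <= 1:
        problems.append("confidenceThreshold must be between 0 and 1")
    return problems


example = {
    "startUrls": [{"url": "https://example.com/product/1"}],
    "fields": [{"name": "price", "baselineSelector": ".price"}],
    "confidenceThreshold": 0.75,
}
print(check_input(example))  # []
```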

Output

Dataset (default)

One item per crawled URL.

| Field | Description |
| --- | --- |
| `url` | Page that was checked. |
| `driftDetected` | Whether any field’s baseline selector failed. |
| `chosenPatch` | Accepted or reviewed patches per field. Each entry has: field, oldSelector, newSelector, confidence, stableValue, runValues, patchDecision (auto-apply / needs-review / rejected), decisionReason. |
| `patchDecision` | Overall decision for the URL: auto-apply (high confidence and semantic checks pass), needs-review, or rejected. Each per-field patch also carries its own decision. |
| `rejectionReason` | If rejected, the reason (e.g. semanticMismatch, lowConfidence, emptyOnValidationUrl). |
| `candidateList` | Candidate selectors (selector, sample value, score, selectorSpecificityScore, uniquenessHint). |
| `confidenceScore` | Overall confidence for this URL (0–1). |
| `extracted` | Final extracted value per field. |
| `perFieldResults` | Per-field drift, patch, and confidence. |
| `timestamp` | When the item was produced (ISO 8601). |
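
To show how a dataset item might be consumed, here is a small sketch that keeps only the patches marked auto-apply. It assumes `chosenPatch` is a list of objects with the keys listed above; the exact nesting is not spelled out in this README, so treat the sample item and the `patches_to_apply` helper as illustrative.

```python
def patches_to_apply(item: dict) -> dict:
    """Map field name -> new selector for patches whose decision is auto-apply."""
    return {
        patch["field"]: patch["newSelector"]
        for patch in item.get("chosenPatch") or []
        if patch.get("patchDecision") == "auto-apply"
    }


# Hand-written sample item (assumed shape, not real Actor output).
item = {
    "url": "https://example.com/product/1",
    "driftDetected": True,
    "chosenPatch": [
        {"field": "price", "oldSelector": ".price", "newSelector": ".product-price",
         "confidence": 0.9, "patchDecision": "auto-apply"},
        {"field": "title", "oldSelector": "h1", "newSelector": "h1.title",
         "confidence": 0.6, "patchDecision": "needs-review"},
    ],
}
print(patches_to_apply(item))  # {'price': '.product-price'}
```

Patches marked needs-review are deliberately left out so that a human can inspect them before they reach a production scraper.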

Key-value store

When at least one patch is accepted, the Actor writes to the default Key-value store. Keys are URL-based (e.g. selectors_example.com__). Value shape:

{
  "selectors": {
    "price": { "selector": ".product-price", "confidence": 0.9 },
    "title": { "selector": "h1.title", "confidence": 0.85 }
  }
}

Use these entries in other actors to switch to the latest working selectors.
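
One way a downstream scraper could use such a record is to overlay it on its own baseline selectors. The sketch below assumes the record has already been fetched from the Key-value store (fetching is left out) and has the value shape shown above; `resolve_selectors` and its `min_confidence` parameter are hypothetical names.

```python
from typing import Optional


def resolve_selectors(baseline: dict, record: Optional[dict], min_confidence: float = 0.75) -> dict:
    """Start from the baseline selectors and override fields that have a confident patch."""
    resolved = dict(baseline)
    for field, entry in ((record or {}).get("selectors") or {}).items():
        if entry.get("selector") and entry.get("confidence", 0) >= min_confidence:
            resolved[field] = entry["selector"]
    return resolved


baseline = {"price": ".price", "title": "h1"}
record = {
    "selectors": {
        "price": {"selector": ".product-price", "confidence": 0.9},
        "title": {"selector": "h1.title", "confidence": 0.85},
    }
}
print(resolve_selectors(baseline, record, min_confidence=0.88))
# {'price': '.product-price', 'title': 'h1'}
```

Falling back to the baseline when no record exists keeps the scraper working on the first run, before any patch has been written.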

When Save evidence on drift is enabled, the Key-value store also gets evidence_<urlSafeKey>_screenshot.png and evidence_<urlSafeKey>_snippet.html for each URL where drift was detected (for debugging).

Use cases

- Monitor production scrapers – Schedule runs on critical URLs to detect selector breakage early. Use detect-only mode for CI or scheduled checks (no Key-value store writes).
- Recover from redesigns – After a site change, get candidate replacements and confidence scores in one run.
- Resilient pipelines – Feed Key-value store output into your main scraper so it can use patched selectors when baseline fails.
- Extraction audits – Review which fields drifted, which candidates were tried, and why a patch was chosen.
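
For the CI and monitoring use case, a check can be as simple as failing the job when any item reports drift. The sketch below works on dataset items that have already been downloaded; how they are fetched is left out, and `drifted_urls` and `exit_code` are hypothetical helper names.

```python
def drifted_urls(items: list) -> list:
    """Return the URLs whose dataset item reports drift."""
    return [item["url"] for item in items if item.get("driftDetected")]


def exit_code(items: list) -> int:
    """0 when no drift was detected, 1 otherwise (suitable for a CI step)."""
    return 1 if drifted_urls(items) else 0


items = [
    {"url": "https://example.com/product/1", "driftDetected": False},
    {"url": "https://example.com/product/2", "driftDetected": True},
]
print(drifted_urls(items))  # ['https://example.com/product/2']
print(exit_code(items))     # 1
```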

Tips and limitations

- Text extraction only – The Actor takes the text of the first element matching each selector. It does not handle lists or complex nested structures.
- Tune confidence – Raise confidenceThreshold if patches are noisy; lower it if good candidates are rejected.
- Validation runs – More runs (e.g. 3–5) give stronger evidence that a patch is stable; fewer (e.g. 2) shorten run time.
- KV store keys – Keys use only safe characters (e.g. selectors_example.com__). List keys in the run’s Key-value store in the Apify console or via API.
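
To see why the number of validation runs interacts with the confidence threshold, consider a simplified model in which confidence is the share of reloads that returned the same non-empty value. This is an assumption for illustration only; the Actor's actual confidence formula is not described in this README, and `stable_value` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional, Tuple


def stable_value(run_values: list) -> Tuple[Optional[str], float]:
    """Return the most common non-empty value and the share of runs that produced it."""
    non_empty = [v.strip() for v in run_values if v and v.strip()]
    if not non_empty:
        return None, 0.0
    value, count = Counter(non_empty).most_common(1)[0]
    return value, count / len(run_values)


value, agreement = stable_value(["$19.99", "$19.99", ""])
print(value, round(agreement, 2))  # $19.99 0.67
```

Under this model, one empty result out of three runs already drops agreement below the default threshold of 0.75, whereas one empty result out of five (0.8) would not.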

Run locally

Put your input in storage/key_value_stores/default/INPUT.json.
Set the storage directory and start the Actor:
- Windows (PowerShell): $env:CRAWLEE_STORAGE_DIR = (Resolve-Path ".\storage").Path; npm start
- macOS/Linux: CRAWLEE_STORAGE_DIR=./storage npm start
Results: storage/datasets/default/ (dataset), storage/key_value_stores/default/ (persisted selectors).
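
After a local run, the dataset items can be inspected with a few lines of Python. This sketch assumes the local storage layout writes one JSON file per dataset item into storage/datasets/default/ and skips any file whose name starts with a double underscore (treated here as metadata); `load_local_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_local_dataset(storage_dir: str = "./storage") -> list:
    """Load every dataset item from <storage_dir>/datasets/default, in file-name order."""
    dataset_dir = Path(storage_dir) / "datasets" / "default"
    items = []
    for path in sorted(dataset_dir.glob("*.json")):
        if path.name.startswith("__"):
            continue  # assumed metadata file, not a dataset item
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items
```

For example, `[item["url"] for item in load_local_dataset() if item.get("driftDetected")]` lists the URLs where drift was found.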

Support

For bugs or feature requests, open an issue in the Actor’s source repository or contact the maintainer through the Apify platform.
