Scraper Regression Watchdog

Pricing: Pay per event
Developer: Stas Persiianenko (Maintained by Community)
Last modified: 2 days ago

What does Scraper Regression Watchdog do?

Scraper Regression Watchdog runs any Apify actor against your test inputs and validates output quality — checking result counts, required fields, type consistency, empty-field rates, and schema drift against a stored baseline. It tells you immediately when a scraper breaks so you can fix it before it causes data loss.

Why use Scraper Regression Watchdog?

  • Automated QA — run your scraper on a schedule and get alerts when output quality degrades
  • Schema drift detection — compares current output against a stored baseline to catch disappeared fields and type changes
  • Required-field validation — ensures critical fields (URL, price, title, etc.) are present and non-empty in every result
  • Result count bounds — flags both zero-result failures and suspicious spikes (duplicates, pagination bugs)
  • Type consistency checks — catches fields that flip between types (string vs number) across results
  • Webhook alerts — POST alert payloads to Slack, Discord, or custom endpoints on failure
  • Works with any actor — just provide an actor ID and test input
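To show what consuming a webhook alert could look like, here is a minimal sketch that turns an alert payload into a chat-ready message. The payload fields follow the output example later on this page (verdict, actorId, checks); the exact shape of the webhook body is an assumption.

```python
# Sketch: format a watchdog alert payload for a chat notification.
# Assumes the alert body mirrors the dataset output (actorId, verdict, checks);
# the real webhook payload shape may differ.

def format_alert(payload: dict) -> str:
    lines = [f"Watchdog verdict for {payload['actorId']}: {payload['verdict'].upper()}"]
    # Only surface checks that did not pass.
    for check in payload.get("checks", []):
        if check["status"] != "passed":
            lines.append(f"  [{check['status']}] {check['name']}: {check['message']}")
    return "\n".join(lines)

alert = {
    "actorId": "apify/web-scraper",
    "verdict": "broken",
    "checks": [
        {"name": "min-results", "status": "failed", "message": "Got 0 results (minimum: 1)."},
        {"name": "run-completed", "status": "passed", "message": "Actor run completed successfully."},
    ],
}
print(format_alert(alert))
```

A receiver like this could forward the formatted string to a Slack or Discord incoming webhook.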

What data can you extract?

Field           Example
actorId         apify/web-scraper
buildTag        latest
verdict         healthy / degraded / broken / error
totalResults    25
runStatus       SUCCEEDED
checks          Array of pass/warn/fail checks with messages
baselineDrift   New fields, missing fields, type changes
sampleResults   First 3 results from the test run
checkedAt       2026-02-28T12:00:00.000Z

How much does it cost to monitor scraper quality?

Scraper Regression Watchdog uses pay-per-event pricing.

Event           What triggers it                    Price
start           Each watchdog run                   $0.025
actor-checked   Each actor tested and validated     $0.005

Real-world cost examples

Scenario                            Cost
Test 1 actor once                   $0.03
Daily check of 5 actors             $0.05/day = $1.50/month
Hourly check of 1 critical actor    $0.03/hour = $0.72/day

Platform compute costs are billed separately. The test actor run uses its own memory allocation (default: 4 GB).
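The per-event pricing above can be sketched as a quick cost estimate (PPE charges only; platform compute is excluded):

```python
# Sketch: estimate watchdog PPE cost from the event prices listed above.
START_PRICE = 0.025          # "start" event, charged per watchdog run
ACTOR_CHECKED_PRICE = 0.005  # "actor-checked" event, charged per actor tested

def daily_cost(actors: int, runs_per_day: int = 1) -> float:
    """PPE cost per day for checking `actors` actors `runs_per_day` times."""
    return runs_per_day * (START_PRICE + actors * ACTOR_CHECKED_PRICE)

print(daily_cost(5))        # daily check of 5 actors
print(daily_cost(5) * 30)   # roughly per month
print(daily_cost(1, 24))    # hourly check of 1 actor, per day
```

These figures reproduce the table above: $0.05/day for five actors daily, $1.50/month, and $0.72/day for an hourly check of one actor.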

How to monitor scraper quality with Scraper Regression Watchdog

  1. Set the actor ID — the Apify actor you want to test.
  2. Provide test input — a small, deterministic input (e.g., 1–2 URLs, low limits) for fast, reproducible runs.
  3. Define required fields — fields that must be present in every result (e.g., url, title, price).
  4. Set result bounds — minimum and maximum expected results to catch zero-output failures and duplicates.
  5. Run and review — the watchdog runs your actor, validates output, and reports a verdict.

Example input

{
  "actorId": "apify/web-scraper",
  "buildTag": "latest",
  "actorInput": {
    "startUrls": [{ "url": "https://example.com" }],
    "maxRequestsPerCrawl": 5
  },
  "requiredFields": ["url", "title"],
  "minResults": 1,
  "maxResults": 10,
  "actorMemoryMbytes": 2048,
  "actorTimeoutSecs": 120
}

Input parameters

Parameter           Type       Default     Description
actorId             string     required    Apify actor ID or full name to test.
buildTag            string     latest      Build tag to test against.
actorInput          object     {}          Input JSON for the test actor run.
actorMemoryMbytes   integer    4096        Memory for the test run (MB).
actorTimeoutSecs    integer    300         Timeout for the test run (seconds).
requiredFields      string[]   []          Fields that must be non-empty in every result. Supports dot notation.
minResults          integer    1           Minimum expected results.
maxResults          integer    10000       Maximum expected results.
typeCheckFields     string[]   []          Fields to check for type consistency.
webhookUrl          string     (none)      URL to POST alerts on non-healthy verdicts.
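Since requiredFields supports dot notation, a check along these lines is one plausible reading of how it works. This is an illustrative sketch, not the actor's actual code; get_path and the definition of "empty" here are assumptions.

```python
# Sketch: validate required fields (with dot notation) against one result.
# get_path and the notion of "empty" are illustrative assumptions.

def get_path(obj, path: str):
    """Resolve a dotted path like 'price.amount'; return None if any step is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

def missing_required(result: dict, required_fields: list[str]) -> list[str]:
    """Return the required fields that are absent or empty in this result."""
    missing = []
    for field in required_fields:
        value = get_path(result, field)
        if value is None or value == "" or value == [] or value == {}:
            missing.append(field)
    return missing

result = {"url": "https://example.com", "title": "", "price": {"amount": 9.99}}
print(missing_required(result, ["url", "title", "price.amount"]))
```

Here the empty title string fails the check while the nested price.amount passes, which is the behavior the required-fields check describes.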

Output example

{
  "actorId": "apify/web-scraper",
  "buildTag": "latest",
  "verdict": "healthy",
  "totalResults": 5,
  "runStatus": "SUCCEEDED",
  "runUrl": "https://console.apify.com/actors/.../runs/...",
  "checks": [
    { "name": "run-completed", "status": "passed", "message": "Actor run completed successfully." },
    { "name": "min-results", "status": "passed", "message": "Got 5 results (minimum: 1)." },
    { "name": "required-fields", "status": "passed", "message": "All 2 required fields present in all 5 results." },
    { "name": "empty-rate", "status": "passed", "message": "Empty field rate: 4.2% (healthy)." },
    { "name": "baseline-drift", "status": "passed", "message": "No schema drift detected from baseline." }
  ],
  "sampleResults": [],
  "baselineDrift": null,
  "checkedAt": "2026-02-28T12:00:00.000Z"
}

Verdicts explained

Verdict     Meaning
healthy     All checks passed. Scraper is working as expected.
degraded    Some warnings but no failures. Data quality may be reduced.
broken      One or more checks failed. Scraper needs immediate attention.
error       The test run itself failed to start or crashed.
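One plausible reading of how a verdict could be derived from the per-check statuses is the precedence below. This is an illustrative sketch, not the actor's actual logic; "passed" matches the output example, while "warned" and "failed" status names are assumptions.

```python
# Sketch: derive an overall verdict from per-check statuses.
# "passed" follows the output example; "warned"/"failed" names are assumptions.

def derive_verdict(checks: list[dict]) -> str:
    statuses = {c["status"] for c in checks}
    if "failed" in statuses:
        return "broken"      # any failure -> broken
    if "warned" in statuses:
        return "degraded"    # warnings only -> degraded
    return "healthy"         # everything passed

print(derive_verdict([{"status": "passed"}, {"status": "warned"}]))
```

The error verdict would sit outside this function, since it describes a test run that never produced checks at all.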

Tips for best results

  • Use small, deterministic test inputs — 1–2 URLs with low limits. This keeps test runs fast and costs low.
  • Schedule daily or weekly runs — catch regressions before they cause data loss.
  • Set required fields for business-critical data — URL, price, title, email, etc.
  • Use type-check fields for numeric data — catches when a price field flips from number to string.
  • Set tight min/max bounds — if you expect exactly 5 results, set minResults: 5 and maxResults: 10.
  • Add a webhook for real-time alerts — connect to Slack or Discord to get notified immediately.
  • Test specific build tags — test beta builds before promoting to latest.
  • The baseline updates automatically — each successful run becomes the new baseline for drift detection.
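The type-consistency tip above can be sketched as follows: collect the set of value types seen for each watched field and flag any field that shows more than one. Illustrative only, not the actor's implementation.

```python
# Sketch: flag fields whose value type varies across results,
# e.g. a price that is sometimes a number and sometimes a string.

def inconsistent_types(results: list[dict], fields: list[str]) -> dict:
    flagged = {}
    for field in fields:
        types = {type(r[field]).__name__ for r in results if field in r}
        if len(types) > 1:
            flagged[field] = sorted(types)
    return flagged

results = [{"price": 9.99}, {"price": "12.50"}, {"price": 3.0}]
print(inconsistent_types(results, ["price"]))
```

Here the price field is flagged because it flips between float and str across results, which is exactly the regression the typeCheckFields parameter is meant to catch.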

Integrations

Connect Scraper Regression Watchdog with Apify integrations to build automated monitoring. Schedule the watchdog to run daily on your critical actors and alert via Slack or email when quality degrades.

Using the Apify API

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/scraper-regression-watchdog').call({
    actorId: 'apify/web-scraper',
    actorInput: { startUrls: [{ url: 'https://example.com' }], maxRequestsPerCrawl: 5 },
    requiredFields: ['url', 'title'],
    minResults: 1,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Verdict: ${items[0].verdict}`);
for (const check of items[0].checks) {
    console.log(`  [${check.status}] ${check.name}: ${check.message}`);
}

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/scraper-regression-watchdog').call(run_input={
    'actorId': 'apify/web-scraper',
    'actorInput': {'startUrls': [{'url': 'https://example.com'}], 'maxRequestsPerCrawl': 5},
    'requiredFields': ['url', 'title'],
    'minResults': 1,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(f"Verdict: {items[0]['verdict']}")
for check in items[0]['checks']:
    print(f"  [{check['status']}] {check['name']}: {check['message']}")

cURL

curl "https://api.apify.com/v2/acts/automation-lab~scraper-regression-watchdog/runs" \
-X POST \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"actorId": "apify/web-scraper", "actorInput": {"startUrls": [{"url": "https://example.com"}], "maxRequestsPerCrawl": 5}, "requiredFields": ["url", "title"], "minResults": 1}'

Use with AI agents via MCP

Scraper Regression Watchdog is available as a tool for AI assistants via the Model Context Protocol (MCP).

Setup for Claude Code

claude mcp add --transport http apify \
"https://mcp.apify.com?tools=automation-lab/scraper-regression-watchdog"

Setup for Claude Desktop, Cursor, or VS Code

{
  "mcpServers": {
    "apify-scraper-regression-watchdog": {
      "url": "https://mcp.apify.com?tools=automation-lab/scraper-regression-watchdog"
    }
  }
}

Example prompts

  • "Check if our scrapers are still working correctly"
  • "Monitor these actors for output quality regressions"

Learn more in the Apify MCP documentation.

FAQ

What actors can I test? Any Apify actor you have permission to run. Just provide its actor ID or full name.

Does it modify the actor under test? No. The watchdog only runs the actor and reads its output. It never modifies the actor's code, settings, or data.

What is the baseline? The baseline is a snapshot of the output schema (field names and types) from the last successful run. It's stored in a named key-value store and persists across watchdog runs.

How does drift detection work? The watchdog compares current output fields and types against the stored baseline. It reports new fields (minor), missing fields (breaking), and type changes (breaking).
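A minimal sketch of that comparison, treating the baseline as a field-to-type map (the actual stored baseline format is not documented here, so this shape is an assumption):

```python
# Sketch: compare a current field->type snapshot against a stored baseline,
# classifying drift the way the FAQ describes (new = minor, missing/type = breaking).

def schema_drift(baseline: dict, current: dict) -> dict:
    return {
        "newFields": sorted(set(current) - set(baseline)),       # minor
        "missingFields": sorted(set(baseline) - set(current)),   # breaking
        "typeChanges": sorted(
            f for f in set(baseline) & set(current) if baseline[f] != current[f]
        ),                                                       # breaking
    }

baseline = {"url": "str", "title": "str", "price": "float"}
current = {"url": "str", "price": "str", "rating": "float"}
print(schema_drift(baseline, current))
```

Against this baseline, the current run gains rating (minor), loses title (breaking), and changes the type of price (breaking).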

Can I test multiple actors in one run? Not directly — the watchdog tests one actor per run. Schedule multiple watchdog runs (one per actor) for multi-actor monitoring.

What happens if the test run fails? The watchdog catches the error and reports a verdict of error with the failure message. No baseline update occurs.

How much does a test run cost? The watchdog PPE charge is $0.03 per test. You also pay platform compute costs for the test actor run (CU + proxy), which depends on the actor and input.

The watchdog reports "error" but my actor works fine manually — what's wrong? Check the actorMemoryMbytes and actorTimeoutSecs settings. The watchdog runs your actor with these limits, which may be lower than what you use manually. Also verify the actorInput JSON matches what your actor expects.

My baseline keeps drifting — how do I reset it? The baseline updates automatically after each successful (healthy) run. If you changed your actor's output schema intentionally, run the watchdog once and accept the new baseline. The next run will compare against the updated schema.
