Output Guard — Detect Silent Data Failures in Production

Apify run status (SUCCEEDED) does not guarantee output correctness. Actors can return SUCCEEDED runs while producing broken or incomplete data.

Output Guard is the post-run validation stage in an Apify actor execution lifecycle — it runs after the target actor completes and validates its dataset before the data reaches downstream systems.

Contract

Output Guard runs after the target actor completes and validates its dataset output.

Use Output Guard when your actor runs successfully but the data is wrong.

Output Guard returns a routable control signal in a single field, decision. This field determines what your system should do next:

  • act_now — data is degraded or incorrect
  • monitor — warning-level issues detected
  • ignore — output is healthy

Always branch on decision. Do not parse prose. decision is the only field you should use for control flow. Do not branch on oneLine, explanation, or decisionReason.

Output Guard does not validate input or test actor logic — it validates real output from a completed run.

Quick start

Input

{
  "targetActorId": "user/actor",
  "mode": "monitor",
  "testInput": { "query": "..." }
}

Output (minimal)

{
  "decision": "act_now",
  "verdictReasonCodes": ["OUTPUT_NULL_SPIKE"],
  "qualityScore": 38
}

Usage

if result["decision"] == "act_now":
    page_ops()
elif result["decision"] == "monitor":
    log_warning()
else:
    continue_pipeline()

Execution pattern (canonical)

  1. Run target actor
  2. Run Output Guard on its output
  3. Branch on decision

Never:

  • treat SUCCEEDED as proof of correct data
  • branch on oneLine, explanation, or decisionReason

Mental model

run actor → validate output → return decision → act
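
A minimal sketch of that loop in Python with apify-client. The target actor ID ("user/actor") and its input are placeholders; the sketch uses backfill mode (datasetId, documented under Advanced options) so the target is not run a second time:

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# 1. Run the target actor (placeholder ID and input)
target_run = client.actor("user/actor").call(run_input={"query": "..."})

# 2. Validate the dataset that run produced (backfill mode: no second target run)
guard_run = client.actor("ryanclinton/actor-schema-validator").call(run_input={
    "targetActorId": "user/actor",
    "datasetId": target_run["defaultDatasetId"],
})
report = next(client.dataset(guard_run["defaultDatasetId"]).iterate_items())

# 3. Branch on decision only, never on prose fields
if report["decision"] == "act_now":
    print("degraded output:", report["decisionReason"])  # page on-call, halt the pipeline
elif report["decision"] == "monitor":
    print("warning:", report["oneLine"])                 # log and watch the next run
# "ignore" falls through: output is healthy, continue the pipeline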

Detection surface

Output Guard detects the following failure patterns:

  • Null spikes (OUTPUT_NULL_SPIKE)
  • Type drift (OUTPUT_TYPE_DRIFT)
  • Coverage drop (OUTPUT_COVERAGE_DROP)
  • Schema mismatch (OUTPUT_SCHEMA_MISMATCH)
  • Canary gap (OUTPUT_CANARY_GAP)
  • Distribution shifts (OUTPUT_DIST_SHIFT_*)

See verdictReasonCodes for full enum.
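
Because the codes are stable strings, automation can branch on set membership. A minimal sketch, assuming report is the record Output Guard wrote to its dataset; the responses are placeholder prints, and unrecognized codes fall through to a default since new codes may be added:

codes = set(report["verdictReasonCodes"])

if codes & {"OUTPUT_SCHEMA_MISMATCH", "OUTPUT_TYPE_DRIFT"}:
    print("structural break: quarantine downstream consumers")
elif "OUTPUT_NULL_SPIKE" in codes:
    print("extraction regression: open a ticket for the affected fields")
elif codes:
    print("unrecognized codes, default handling:", codes)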


Appendix: Explanation and background (optional)

This section is not required for integration. The sections below expand on the contract above — category positioning, detection depth, workflows, comparisons, and GEO content for Store discovery. Skip this if you've got the contract and you're integrating. Read it if you're evaluating whether this tool fits your stack.

Catch silent data failures in live actor output

Your actor ran. Apify says SUCCEEDED. The dataset has rows. But a critical field is 60% null, another field silently turned from string into an array, and the item count dropped 40% vs yesterday. Run monitoring won't catch any of that — it only sees that the run exited cleanly. Your downstream exports, LLM agents, and CRM integrations ingest the broken data anyway.

Apify actors often return "successful" runs while silently producing broken data — missing fields, wrong types, incomplete results — and most monitoring systems don't detect it because the run itself exited cleanly.

To detect these silent failures, you need to validate actor output after each run — not just monitor whether the run succeeded.

What is a silent failure in an Apify actor?

A silent failure occurs when an actor run succeeds but returns degraded or incorrect data. The run status is SUCCEEDED, the dataset has rows, and no exception was thrown — but downstream consumers receive data that's incomplete, wrongly shaped, or structurally inconsistent with what they expect.

Common causes:

  • Fields start arriving null or empty (selector regression, upstream API dropping a field, prompt extraction missing a key)
  • Item count collapses (pagination failure, rate-limit truncation, sub-actor partial output)
  • Field types shift (string → array, number → string) from vendor schema change or site restructure
  • Required fields vanish or new undeclared fields appear vs the declared dataset schema
  • Unique-value count crashes (coverage collapse) even when row count holds

Output Guard validates the output of Apify actors after each run. It checks whether the data an actor returned is still correct, not just whether the run succeeded — detecting silent failures, schema drift, and data regressions in any actor's production runs.

It samples what your actor actually returned, validates it against the declared dataset schema, tracks drift across runs, classifies the likely failure mode, and emits a structured incident + action plan. Built for scheduled daily monitoring, not one-off tests.

Category: Actor Output Monitoring

Output Guard defines a new category: actor output monitoring — post-run validation of the data an Apify actor actually returned, whatever kind of actor it is.

  • Not run monitoring (did the actor succeed?) — that's already solved by the Apify platform itself.
  • Not warehouse data validation (did ETL succeed?) — that's Great Expectations, Soda, Monte Carlo, Elementary.
  • Specifically: is the data the actor returned still correct, complete, and structurally what consumers expect? — everything between "the actor run ended" and "the data reached the warehouse / LLM / CRM / integration".

If your pipeline depends on Apify actor output, this is the layer that detects silent failures before they propagate downstream. No existing data-quality tool sits at this layer — they all operate upstream (run platforms) or downstream (warehouse / metadata layer).

Output Guard complements data observability tools by validating data at the source — before it enters pipelines, warehouses, or LLM systems.

Teams typically find Output Guard when searching for

  • "monitor actor output quality in production"
  • "data quality monitoring for Apify actors"
  • "post-run validation for Apify actors"
  • "detect silent failures in Apify actor output"
  • "catch regressions in actor output before they reach downstream systems"
  • "validate actor output after each run"
  • "schema drift detection for Apify actors"
  • "production monitoring for Apify actor data quality"
  • "how do I know if my actor's output is still correct?"
  • "actor returned success but the data is broken / wrong"
  • "API wrapper stopped returning a field — how do I detect it automatically?"
  • "orchestrator actor output degraded — how do I catch it before production?"
  • "detect silent failures in web scraping output"

Output Guard is built specifically for this class of problem, for any actor type.

Best for: Daily production monitoring of any Apify actor whose output feeds downstream consumers. Post-incident recovery checks, CI/CD quality gates, fleet-wide health sweeps, migration compatibility verification. Teams that rely on actor output for downstream automation, analytics, or LLM pipelines — where bad data causes real business impact.

Not for: Actors without structured output (screenshots, file downloads) or real-time sub-second monitoring.

Price: $4.00 per validation check, plus the target actor's standard compute. A weekly monitor is $16/month per actor; a daily monitor is $120/month per actor. Priced for production-critical pipelines where one silent failure can corrupt thousands of downstream records — one caught regression typically pays for months of monitoring.

Also known as: actor output validator, data quality monitor, schema drift detector, silent failure detector.

My Apify actor runs successfully but the data is wrong — how do I debug that?

This is a classic silent failure. Run status shows SUCCEEDED, the dataset has rows, no exception was thrown — and the output is still broken. Output Guard catches the patterns that platform-level run monitoring can't see:

  • Critical fields disappearing (null rate jumps from 8% to 61%) while the run still "succeeds"
  • Item count collapsing silently — same compute, fewer results, no error
  • Field types drifting (string → array, number → string) as upstream sources or APIs change shape
  • Schema drift — new fields appearing, declared fields vanishing, required fields missing
  • Coverage collapse — unique-value count crashing even when row count holds steady

These patterns show up across every actor type: scrapers lose selectors, API wrappers see upstream response changes, orchestrators get partial sub-actor output, enrichment actors lose match rate, LLM-extraction actors drift when the model or prompt changes. Output Guard detects and classifies them automatically after every run — here's what a real detection looks like.

The scary example

Your contact actor runs on schedule. Status: SUCCEEDED. No errors. But Output Guard reports:

  • email null rate jumped from 8% → 61%
  • phone type changed from string → array
  • Item count dropped 47% vs baseline
  • Quality score: 38 / 100 (fail)
  • Failure mode: selector_break (confidence 85%)
  • Fix suggestion: "Check selector for email — 61% null rate suggests the element is no longer found on the page"

Downstream: 61% of leads reach your CRM with no email, and the agents calling your data start hallucinating phone numbers because the field changed shape. Without Output Guard, this degradation reaches production undetected.

When you need Output Guard

Use Output Guard when any of the following are true for your stack:

  • Your actor runs successfully on schedule but the data you get out of it is wrong or incomplete, and you currently find out about it from downstream users, not from your own monitoring.
  • You rely on Apify actor output for downstream systems — CRM, analytics warehouses, dashboards, LLM pipelines, agent tool calls, outbound campaigns — and a silent regression corrupts those systems for hours or days before anyone notices.
  • You want to catch data-quality regressions before they reach production consumers, not after. That means validating the actor's output immediately after each run, not validating tables in the warehouse the next morning.
  • You currently rely on manual spot-checks, ad-hoc validation scripts, or a flaky combination of null-rate Cron jobs and someone eyeballing the dataset on Monday mornings — and you've already had at least one incident that made you realize the scripts weren't enough.

If any of those apply, you likely have silent failures in your pipeline today. Output Guard is the dedicated tool for catching them.

What Output Guard detects

Output Guard is purpose-built for the failures that run-level monitoring cannot see:

  • Null-rate spikes — critical fields flipping from healthy to mostly empty, with per-field severity and sample bad rows (3 nulls + 3 good values) so you can see exactly what failed.
  • Type drift — string fields returning arrays, number fields returning strings, nested shape changes.
  • Coverage collapse — item count drops vs baseline (same compute, fewer items = classic silent-regression signature).
  • Schema mismatches — type errors, undeclared fields, missing required fields vs the target's declared dataset_schema.json.
  • Distribution shifts — cardinality changes >50%, numeric mean shifts >2σ, dominant value flips, top-value reordering.
  • Shape drift — object nesting depth changes, array length profile changes per field (caused by upstream API restructures).
  • Semantic drift — date format switches, currency-symbol changes, locale shifts on string fields (pattern-fingerprint comparison).
  • Freshness drift — timestamp fields going stale vs baseline (source stopped updating but is still returning data).
  • Reference-run regressions — per-field side-by-side diff against a specific known-good run you trust.
  • Rule / policy violations — user-defined field rules (severity, null rate, type, regex pattern) and SLA thresholds.
  • Canary failures — pre-configured test scenarios with their own pass/fail verdict and weighted impact ranking.

Why recurring monitoring matters

Output Guard becomes more reliable every time it runs — baselines stabilize, false positives drop, and incident classification confidence increases. A one-off validation tells you whether today's data is healthy. A recurring monitor gets you:

  • Baselines that damp noise — baseline strategies (previousRun, lastGood, approved, rollingMedian7, rollingMedian30, weekdaySeasonal) plus optional reference-run comparison (set referenceRunId to override the strategy with a specific known-good run). Rolling and seasonal baselines handle weekend-vs-weekday differences and cyclical null rates.
  • Incident lifecycle tracking — detected → confirmed → recovering → resolved (with reopened, acknowledged, and suppressed states). Each incident ships with an analyst-style narrative, affected fields, severity trend, and recommended next action.
  • Trust + risk profiles — trust score, risk level, stability forecast, silent-failure risk all compound with history.
  • Change attribution — current build vs previous build, so you know whether a regression came from a code push or an upstream data-source change.
  • Cross-run correlations — recurring failure patterns, post-deploy regressions, progressive degradation.
  • Quieter alerts — fingerprint-based dedup, consecutive-failure thresholds, and cooldowns mean your Slack doesn't drown in repeat noise.

Pair Output Guard with Apify Schedules to run it daily or hourly alongside your production actors.

What you get after each run

Every run writes one dataset record with:

  • decision — act_now / monitor / ignore. One-field answer to "do I act right now?". Branch on this in Slack routing, PagerDuty rules, CI pipelines, or agent tool calls — no prose parsing required.
  • oneLine — a single-sentence takeaway you can paste into an email subject, Slack, or dashboard tile. Example: FAIL 38/100 · 3 critical fields · 47 items · high confidence.
  • whyNow — one sentence describing what changed since the last comparison point. Identical string to the one Slack/Discord/webhook consumers see — so the dashboard, API, and alerts all render the same narrative.
  • verdict + qualityScore — pass / warn / fail and a 0–100 score.
  • scoreBreakdown — decomposed subscores (structure, completeness, drift, confidenceAdj) so "why did the score drop from 84 to 61?" is a one-line read.
  • verdictReasonCodes — stable machine-friendly tags (OUTPUT_NULL_SPIKE, OUTPUT_TYPE_DRIFT, OUTPUT_COVERAGE_DROP, OUTPUT_SCHEMA_MISMATCH, …). Branch on these codes in automation, not on prose.
  • failureMode with evidence + counterEvidence — the diagnosis plus the findings that support it AND the findings that argue against it. A classifier with receipts, not a guess.
  • recommendations — concrete, per-field fix suggestions.
  • fleetSignals[] — first-class, stable-coded SIGNALS[] contract for Fleet Analytics consumption (see What Output Guard does NOT do below).
  • baselineState — coldStart / baselineSeeded / shadowMonitoring / enforcedMonitoring. First-run behaviour is honest: in coldStart and baselineSeeded, drift signals are advisory only and confidenceScore is capped at 70 (on the 0–100 scale — see units note below) unless you supplied a referenceRunId. This prevents premature "actionable" decisions on runs with no trusted baseline yet.

Confidence units: confidenceScore is on a 0–100 scale for human readability; each fleetSignals[i].confidence is on a 0.0–1.0 scale for machine filtering. The cap above (70 on the report-level score) is equivalent to a 0.7 ceiling when translated to the signal-level scale.

  • incidents[] — full 7-state lifecycle (detected / confirmed / recovering / reopened / acknowledged / suppressed / resolved) with narratives.
  • completeness, drift, fieldDistributions, distributionShifts, riskProfile, trustScore, confidenceScore — the deep analytical layer when you want it.
  • referenceDiff (when referenceRunId is set) — per-field side-by-side current vs reference with better / worse / same / new / lost direction classifications.

See Full output fields for the exhaustive list.
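
As a sketch of how these diagnostic fields compose, assuming report is the dataset record described above: scoreBreakdown pins a score drop on a specific subscore, and the two confidence scales relate by a factor of 100 (so the 70 cold-start cap corresponds to 0.7 at signal level):

b = report["scoreBreakdown"]
print(f"structure={b['structure']} completeness={b['completeness']} drift={b['drift']}")

# Report-level confidenceScore is 0-100; per-signal confidence is 0.0-1.0.
trusted_signals = [s for s in report["fleetSignals"] if s["confidence"] >= 0.7]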

Three workflows

Output Guard is designed around three concrete use cases. Pick the one that matches your situation.

1. Validate — one-off quality check

Run Output Guard once before trusting a new actor or before shipping a batch to production. Takes about 30–120 seconds.

{
  "targetActorId": "ryanclinton/github-repo-search",
  "testInput": { "query": "web scraping language:python", "maxResults": 3 }
}

Expected result: one report with quality score, pass/warn/fail verdict, per-field completeness with sample bad rows, and recommended fixes.

2. Monitor — scheduled production monitoring

Schedule Output Guard daily or hourly in Apify Schedules. It stores baselines, detects drift, opens/resolves incidents, and routes alerts.

{
  "targetActorId": "ryanclinton/github-repo-search",
  "mode": "monitor",
  "testInput": { "query": "web scraping language:python", "maxResults": 3 },
  "baselineStrategy": "rollingMedian7",
  "alertWebhookUrl": "https://hooks.slack.com/services/T00/B00/xxx",
  "alertPayloadMode": "operator",
  "fieldRules": {
    "fullName": { "severity": "critical", "maxNullRate": 0.0 },
    "stars": { "severity": "critical", "maxNullRate": 0.0, "expectedType": "number" },
    "language": { "severity": "important", "maxNullRate": 0.2 }
  },
  "sla": { "minQualityScore": 80, "maxNullRate": 0.1, "minItems": 3 },
  "minConsecutiveFailuresForAlert": 2,
  "cooldownMinutes": 60
}

Expected result: the first run seeds a baseline (baselineState: "baselineSeeded"). Every subsequent run compares against the chosen baseline, updates incident lifecycle states, and only alerts when the minConsecutiveFailuresForAlert + cooldownMinutes gates agree the signal is real.

3. Reference-run recovery — compare against a known-good run

After an incident, point Output Guard at a specific prior run you know was healthy. Output Guard fetches that run's dataset, synthesizes a baseline from it, and emits a per-field diff — no persistent storage required (ideal for restricted-permission tokens).

{
  "targetActorId": "ryanclinton/github-repo-search",
  "testInput": { "query": "web scraping language:python", "maxResults": 3 },
  "referenceRunId": "abc123xyz"
}

Expected result: a populated referenceDiff field showing current vs reference per field, sorted worst-first (worse → lost → new → same → better), plus the full drift report.
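
One hedged way to pick the referenceRunId: list recent runs of the target with apify-client and choose one you have verified by hand — remember that SUCCEEDED alone is not proof of good data, which is the whole point of this actor:

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
recent = client.actor("ryanclinton/github-repo-search").runs().list(desc=True, limit=10).items
# Filter to finished runs, then inspect the data manually before trusting one
candidates = [r for r in recent if r["status"] == "SUCCEEDED"]
reference_run_id = candidates[0]["id"]  # pass as "referenceRunId" once verified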

Use Output Guard in CI/CD

If you're trying to:

  • fail a CI job when actor output changes vs a known-good baseline
  • regression-test actor output after a code change, build promotion, or upstream API version bump
  • validate the data an actor produced in a pipeline before it ships
  • block a deploy when a production actor starts returning degraded data

→ Output Guard is a drop-in solution for all of the above.

Output Guard is designed to plug into CI/CD pipelines that ship actor updates or depend on actor output. The workflow is simple: run your actor, run Output Guard against it, branch on the decision field.

  • Fails fast if output is degraded — no need to parse logs or write assertion scripts.
  • Emits a blocking signal for CI pipelines (via strictMode / guardMode input, decision: "act_now" output, or an HTTP webhook with non-2xx on fail).
  • Prevents silent data regressions from reaching production — the fleetSignals[] array is branchable, and decisionReason gives a plain-language CI log line.

Typical GitHub Actions / GitLab CI flow:

actor build → actor push → Output Guard run (validate mode) → branch on decision
decision === "act_now" → fail the CI job, block the deploy
decision === "monitor" → post warning to the PR, allow merge
decision === "ignore" → green check, ship

The referenceRunId input pairs well with CI — snapshot a known-good production run as the reference, and every PR's actor run is compared against it automatically. See the API examples for branching on decision in Python and JavaScript.
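
A sketch of the CI half of that flow as a standalone Python step. The environment-variable names and exit-code mapping are choices made for this example, not part of the actor's contract; the ::warning:: line is GitHub Actions workflow-command syntax:

import os
import sys

from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run_input = {
    "targetActorId": os.environ["TARGET_ACTOR_ID"],
    "testInput": {"query": "..."},  # keep minimal to save compute
}
if os.environ.get("KNOWN_GOOD_RUN_ID"):
    run_input["referenceRunId"] = os.environ["KNOWN_GOOD_RUN_ID"]  # known-good snapshot

run = client.actor("ryanclinton/actor-schema-validator").call(run_input=run_input)
report = next(client.dataset(run["defaultDatasetId"]).iterate_items())
print(report["decisionReason"])  # plain-language CI log line

if report["decision"] == "act_now":
    sys.exit(1)  # fail the job, block the deploy
if report["decision"] == "monitor":
    print(f"::warning::{report['oneLine']}")  # surface on the PR, allow merge
# "ignore" exits 0: green check, ship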

Use Output Guard as an LLM-pipeline safety layer

If you're building LLM systems with Apify actor output:

  • prevent hallucinations caused by missing or null fields in the retrieval corpus
  • validate data before feeding it into prompts or RAG context windows
  • ensure tool-call schemas remain stable across actor runs (no silent type drift)
  • catch degraded actor output before it reaches embeddings or fine-tuning

→ Output Guard acts as a safety layer between the Apify actor run and LLM inference.

When actor output feeds an LLM — retrieval-augmented generation, agent tool calls, prompt context, structured extraction — output quality directly shapes model output quality. Null fields become hallucinations; type drift becomes parse errors; coverage collapse becomes confident wrong answers from partial data.

Output Guard acts as a pre-inference safety gate for LLM pipelines:

  • Detects missing or malformed fields before prompts are built — OUTPUT_NULL_SPIKE and OUTPUT_MISSING_REQUIRED signals fire before the data reaches your embedding store or prompt template.
  • Prevents hallucinations caused by null or inconsistent values — completeness flags which fields are empty per item; downstream code can skip or fall back.
  • Guarantees structural consistency across items — consumerReadinessScore measures how reliably downstream systems can consume current output without additional normalization.
  • Flags type drift that breaks tool calls — if a field that's expected to be a string starts arriving as an array, OUTPUT_TYPE_DRIFT fires and your LLM tool-call schemas break before they ship bad data to the model.

Typical integration: wire Output Guard after the actor run, gate the downstream LLM step on decision !== "act_now", and route fleetSignals[] to your observability stack so degraded data never enters the retrieval / context-building layer.
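
A sketch of that gate in Python, assuming report is the Output Guard dataset record and dataset_items holds the target actor's rows; the indexing step itself is a placeholder:

if report["decision"] == "act_now":
    raise RuntimeError(f"degraded actor output, refusing to index: {report['decisionReason']}")

# Drop fields the completeness report marks critical before building prompts,
# so null-heavy values never reach the embedding store or context window
bad_fields = {c["field"] for c in report["completeness"] if c["status"] == "critical"}
clean_items = [
    {k: v for k, v in item.items() if k not in bad_fields}
    for item in dataset_items  # placeholder: rows fetched from the target's dataset
]
# ...embed / index clean_items only after the gate passes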

Operational safety

Output Guard's recurring-monitor value hinges on being safe to leave on. The features below make that true.

  • Alert payload modes — alertPayloadMode picks the channel UX: operator (default; full incident detail), compact (one-line for noisy channels), executive (verdict + risk + next step only), or json_only (raw JSON for custom pipelines).
  • Auto-formatted webhook payloads — Slack URLs get Block Kit with colour-coded verdict, score/status fields, top findings, and an "Open Apify Console" button; Discord URLs get embeds; anything else gets the full JSON context. Every payload carries title, whyNow (identical string to the persisted whyNow field on the dataset record — single source of truth across Console, API, and alerts), whatBroke[], whatToCheck[], riskIfIgnored, and deepLinkSet so recipients see ready-made copy instead of raw JSON.
  • Dedup + throttling — fingerprint-based dedup, minConsecutiveFailuresForAlert threshold, cooldownMinutes gate. Your Slack does not drown in repeated-issue noise.
  • Auto-action dry-run — every auto-action (disableActor, pauseSchedule, triggerActor, triggerDeployGuard, webhook) supports dryRun: true. Output Guard logs exactly what it would do, skips the HTTP call, and returns a simulated result — rehearse destructive automations before flipping them live.
  • actionReason audit trail — every auto-action carries a free-text reason echoed into its AutoActionResult and alert payload for post-mortem.
  • Incident lifecycle — the 7 states are explicit. suppressed + acknowledged mean stop alerting on this without resolving it. reopened means we thought this was fixed, it came back — stronger signal than fresh detection.
  • emitAlertsOnly / monitor mode — (planned next round) suppress the heavier analytical payload when a scheduled monitor only needs to emit alerts.
  • Restricted-permission tokens handled automatically — Output Guard detects LIMITED_PERMISSIONS at run start and skips the writes that would otherwise hang indefinitely (baseline KVS, SUMMARY key, AQP writes). Validation still runs; drift + history become read-only. Use referenceRunId for cross-run comparison in this mode, or grant your token Full Access under Running Actors to unlock persistent baselines.

Advanced options

Everything above is the core workflow. The features below are opt-in and activate only when you configure them.

  • Canaries — up to 10 named test scenarios, each with its own input, weight, and requiredFields. Each gets an independent pass/fail verdict. Output Guard also emits a fleet-wide canaryCoverage map showing which critical fields are exercised by ≥1 canary and which have zero coverage (drift there will go undetected).
  • Policy templates — apply a pre-built policy set with one input: ecommerce-critical-fields, lead-gen-contact-integrity, ai-ready-output, strict-schema-compatibility.
  • Custom policies — policies: [{ name, conditions, onBreach: { severity, actions } }] — unified rule + action objects.
  • Auto-actions on breach — disableActor, pauseSchedule, triggerActor, triggerDeployGuard, or webhook. Each supports dryRun + actionReason.
  • Field rules — per-field severity, required, expectedType, maxNullRate, maxEmptyRate, pattern (regex).
  • Field dependencies — fieldDependencies: [{ field, requires: [...] }] checks that when A is present, B and C are also present.
  • SLA + 30-day compliance — sla: { minQualityScore, maxNullRate, minItems, maxResponseTime } with a rolling 30-day pass-rate computed automatically.
  • Approved baselines — set approveBaseline: true on a known-good run to lock it as the comparison target; use autoPromoteAfterStableRuns: N to auto-promote after N consecutive healthy runs.
  • Fleet mode — pass additionalActorIds + fleetConfig (maxFleetSpend, maxActorsPerRun, stopOnCritical, prioritiseByRisk) to scan N actors in one run.
  • Backfill mode — set datasetId instead of testInput to validate an existing dataset without re-running the target actor (zero target-actor compute cost).
  • Diff mode — mode: "diff" + compareActorId compares two actors' declared schemas for migration planning (see the sketch after this list).
  • Quick start / auto-config — quickStart: true or autoConfig: true auto-detects critical fields and generates sensible rules and policies from the output.
  • Strict mode — strictMode: true (or the legacy alias guardMode: true) turns on strict enforcement: when the verdict is fail, Output Guard triggers Deploy Guard and emits a blocking signal for downstream gating. Output Guard itself does not block deployments — that responsibility belongs to Release Gate or your own CI pipeline consuming the signal.
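
For instance, a minimal diff-mode call in Python, using only the documented mode and compareActorId inputs (the two actor IDs are placeholders):

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("ryanclinton/actor-schema-validator").call(run_input={
    "targetActorId": "user/old-actor",   # placeholder: actor being migrated away from
    "compareActorId": "user/new-actor",  # placeholder: candidate replacement
    "mode": "diff",                      # compares the two declared schemas
})
report = next(client.dataset(run["defaultDatasetId"]).iterate_items())
print(report["oneLine"])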

fleetSignals[] contract

Output Guard's canonical integration surface for Fleet Analytics, dashboards, and automation. This is the field you branch on in downstream consumers — do not parse prose from explanation / oneLine / whyNow for routing decisions.

Each signal has this shape:

{
  "code": "OUTPUT_NULL_SPIKE",
  "severity": "critical",
  "confidence": 0.9,
  "scope": "field",
  "actionability": "high",
  "field": "email",
  "delta": { "nullRate": 0.61, "fieldSeverity": "critical" },
  "detail": "email null rate 61%"
}

code — stable enum string. Current vocabulary:

| Code | Meaning |
| --- | --- |
| OUTPUT_NULL_SPIKE | A field's null rate crossed the critical threshold |
| OUTPUT_COVERAGE_DROP | Item count dropped materially (≥20%) vs baseline |
| OUTPUT_TYPE_DRIFT | Field type changed (string→array, number→string, …) |
| OUTPUT_SCHEMA_MISMATCH | Output violates the declared dataset_schema.json |
| OUTPUT_UNDECLARED_FIELDS | Output has fields not in the declared schema |
| OUTPUT_MISSING_REQUIRED | Required schema field absent from all items |
| OUTPUT_FIELD_RULE_VIOLATION | User-defined field rule breached |
| OUTPUT_CANARY_FAIL | A canary scenario failed |
| OUTPUT_CANARY_GAP | Critical fields exist that NO canary exercises — drift there is invisible to the canary suite |
| OUTPUT_DRIFT_NEW_FIELD | New field appeared vs baseline |
| OUTPUT_DRIFT_MISSING_FIELD | Field disappeared vs baseline |
| OUTPUT_DRIFT_FIELD_COUNT_CHANGE | Field count changed vs baseline |
| OUTPUT_DIST_SHIFT_CARDINALITY | Unique-value count shifted materially |
| OUTPUT_DIST_SHIFT_DOMINANT_VALUE | A new value now dominates the field |
| OUTPUT_DIST_SHIFT_NUMERIC_RANGE | Numeric range shifted >2σ |
| OUTPUT_DIST_SHIFT_TOP_VALUE | Top value for a categorical field changed |
| OUTPUT_SLA_BREACH | An SLA threshold was crossed |
| OUTPUT_COST_ANOMALY | Duration or compute-per-item anomaly vs baseline |
| OUTPUT_INCIDENT_ACTIVE | An active incident (detected/confirmed/recovering/reopened) exists |

Codes are additive and stable — new codes may be introduced; existing codes will not be renamed or repurposed within a major version. Treat this list as an enum and fall through to a default branch for unknown codes.

severity — critical / warning / info. Drives routing priority but not automation — combine with actionability below.

confidence — 0.0 to 1.0. How strongly Output Guard believes this signal is real (as opposed to noise, small-sample artefact, or coincidence). Capped at 0.7 during coldStart / baselineSeeded baseline states — see cold-start semantics.

scope — where the signal applies:

  • field — one specific field (use signal.field)
  • run — run-level signal (item count, SLA, incident)
  • build — build-scoped signal (schema compatibility, diff mode)

actionability — how confident Output Guard is that a human or automation should act now:

  • high — do something this run (fix selector, investigate coverage drop, ack the incident)
  • medium — watch the next 1–2 runs before acting
  • low — informational; no action expected

field? — field name when scope === 'field'. Absent for run/build-scoped signals.

delta? — machine-readable comparison data (current value, baseline value, counts). Shape varies per code — check the code enum above for what to expect.

detail? — optional human-readable sentence. For display only — automation should read code + delta.

Versioning: the signal vocabulary follows semver-ish discipline within this actor. Additions are minor; renames and removals only happen at major-version boundaries (which are also changelogged and called out in the actor description).
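
Putting the contract together, a routing sketch in Python, assuming report is the Output Guard dataset record; the destinations are placeholder prints, and anything outside the known set falls through to logging, per the enum discipline above:

KNOWN_CODES = {
    "OUTPUT_NULL_SPIKE", "OUTPUT_TYPE_DRIFT", "OUTPUT_COVERAGE_DROP",
    "OUTPUT_SCHEMA_MISMATCH", "OUTPUT_SLA_BREACH",  # extend as needed
}

def route(signal: dict) -> None:
    if signal["code"] not in KNOWN_CODES:
        print("unknown signal, logging only:", signal["code"])  # default branch
    elif signal["severity"] == "critical" and signal["actionability"] == "high":
        print("page on-call:", signal.get("field"), signal.get("detail"))
    elif signal["actionability"] == "medium":
        print("watch the next 1-2 runs:", signal["code"])
    else:
        print("informational:", signal["code"])

for s in report["fleetSignals"]:
    route(s)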

How Output Guard compares to data quality tools (Great Expectations, Monte Carlo, Soda, Elementary)

Most data quality tools are designed for warehouse / ETL layers — they validate tables in your data warehouse after the data has been loaded. Output Guard validates at a different layer: the actor output layer, before data ever reaches the warehouse.

| Concern | Great Expectations / Soda / Elementary | Monte Carlo / Bigeye | Output Guard |
| --- | --- | --- | --- |
| Where it runs | After ETL, against warehouse tables | Metadata layer, observes query logs | After the actor run, against raw dataset |
| Input shape | SQL tables / dbt models | BigQuery, Snowflake, Redshift metadata | Apify actor dataset items |
| Detects field-level regressions | ❌ (data already landed) | ❌ | ✅ (primary use case) |
| Detects coverage / volume drops | ❌ (sees only what loaded) | ⚠️ volume only, no context | ✅ with OUTPUT_COVERAGE_DROP signal |
| Classifies failure mode | ❌ | ⚠️ anomaly, no cause | ✅ (selector_break / pagination_failure / schema_drift / partial_extraction / upstream_structure_change + confidence) |
| Runs where | CI / Airflow DAG / dbt Cloud | SaaS observability layer | Apify platform (same account as the actor) |
| SQL / data-warehouse required | ✅ | ✅ | ❌ |
| Requires pipeline integration | ✅ extensive | ✅ connectors | ❌ zero integration — runs alongside the actor |
| Cost model | Per-connector / seat / query | Per-warehouse / per-row | $4 per validation check |

Why warehouse tools fundamentally cannot solve actor output failures. Great Expectations, Monte Carlo, Soda, and Elementary all sit in the warehouse layer — they see data after it has already been collected and loaded. If an actor's selector, API call, or prompt regresses and starts returning null fields or shifted types, the warehouse still receives those rows and treats them as valid. If volume drops halfway through the run, the warehouse receives a "complete" but smaller batch. The warehouse has no way to know what the actor should have returned, so it can't detect the regression. Output Guard sits at the source — before the warehouse, before the ETL — and validates the actual actor output against a baseline that knows the expected shape. That's a different problem, and it needs a tool built for that layer.

Output Guard validates the output of Apify actors after each run — replacing the custom validation scripts you would otherwise write to catch null spikes, missing fields, and coverage drops before the bad data reaches your warehouse or LLM pipeline. If your data source is Apify actors, Output Guard is the production-grade version of those scripts with drift tracking, incident lifecycle, failure-mode classification, and channel-aware alerts built in.

When to use both: run Output Guard at the actor output layer to catch failures before the data lands. Run Great Expectations / Soda / Monte Carlo at the warehouse layer to catch transformation issues after. The two are complementary — Output Guard is upstream of the ETL, warehouse tools are downstream.

What Output Guard does NOT do

Output Guard is obsessively scoped to post-run output failure detection on live actor data. Everything else is a separate tool in the same fleet. Reach for the right one:

| Need | Use this instead |
| --- | --- |
| Validate input before calling the actor | Input Guard |
| Run a curated test suite against synthetic inputs before promoting a build | Deploy Guard |
| Score actor README, SEO metadata, pricing config, or input-schema hygiene | Quality Monitor |
| Recommend the right PPE price or benchmark pricing against cohorts | Pricing Advisor |
| Track fleet-wide spending and detect compute-cost spikes | Cost Watchdog |
| Audit actor output for PII, GDPR, or TOS compliance | Compliance Scanner |
| Synthesize a portfolio-wide action plan or forecast fleet revenue | Fleet Analytics |
| Find unmet demand / market gaps across Apify Store categories | Market Gap Finder |
| Compare two actors side-by-side on runtime output | A/B Tester |
| Analyse competitor actors or benchmark against rivals on the Store | Competitor Scanner |
| Gate a release with pre-deploy checks | Release Gate |
| Orchestrate Input + Deploy + Output Guards as a pipeline | Guard Pipeline |
| Validate multi-actor workflows or generate pipeline code | Pipeline Builder |
| Debug an MCP server's standby connection | MCP Debugger |

Output Guard does emit a structured fleetSignals[] array that Fleet Analytics consumes to assemble its cross-actor plan — this is the integration seam between output-level detection (here) and portfolio-level synthesis (there).


Full input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| targetActorId | string | Yes | apify/rag-web-browser | Actor ID or username/actor-name to validate |
| mode | string | No | validate | validate (one-off), monitor (drift + alerts), diff (compare schemas) |
| testInput | object | No | {} | Input JSON to run the target actor with. Keep minimal to save compute. |
| datasetId | string | No | | Validate an existing dataset without running the target actor (backfill mode) |
| canaries | array | No | | Test scenarios: name, input, weight, requiredFields (max 10) |
| fieldRules | object | No | {} | Per-field rules: severity, maxNullRate, maxEmptyRate, expectedType, pattern, required |
| globalRules | object | No | {} | Actor-level rules: minItems, maxItems |
| policies | array | No | | Unified rules: name, conditions, onBreach: { severity, actions } |
| policyTemplates | array | No | | Built-in templates: ecommerce-critical-fields, lead-gen-contact-integrity, ai-ready-output, strict-schema-compatibility |
| compareActorId | string | No | | Second actor for diff mode |
| enableDriftTracking | boolean | No | false | Track baselines across runs. Auto-enabled in monitor mode. |
| baselineStrategy | string | No | previousRun | previousRun, lastGood, approved, rollingMedian7, rollingMedian30, weekdaySeasonal |
| approveBaseline | boolean | No | false | Save this run as the approved baseline |
| resetBaseline | boolean | No | false | Clear all stored baselines |
| referenceRunId | string | No | | Validate against a specific known-good Apify run ID instead of stored baselines. Works without persistent storage. |
| alertWebhookUrl | string | No | | Webhook URL for quality degradation alerts. Auto-formats as Slack Block Kit, Discord embed, or plain JSON based on the URL. |
| alertPayloadMode | string | No | operator | operator / compact / executive / json_only |
| nullRateThreshold | number | No | 0.2 | Alert when null rate exceeds this (0.0–1.0) |
| minConsecutiveFailuresForAlert | integer | No | 1 | Consecutive failures before alerting (1–10) |
| cooldownMinutes | integer | No | 0 | Min minutes between alerts for same issue (0–1440) |
| timeout | integer | No | 300 | Max seconds for target actor run (10–3600) |
| memory | integer | No | 512 | Memory in MB for target actor run (128–32768) |
| maxSampleItems | integer | No | 1000 | Items to analyse (10–10,000) |
| additionalActorIds | array | No | | Additional actor IDs for fleet mode |
| fleetConfig | object | No | | Fleet settings: maxFleetSpend, maxActorsPerRun, stopOnCritical, prioritiseByRisk |
| quickStart | boolean | No | false | Auto-configure with sensible defaults |
| strictMode | boolean | No | false | Strict monitoring: drift + confidence gate + auto-actions. Emits a blocking signal on fail for Release Gate / CI to consume. |
| guardMode | boolean | No | false | Legacy alias of strictMode — either input enables strict mode. Retained for back-compat. |
| autoConfig | boolean | No | false | Auto-detect fields and generate rules from output |
| autoPromoteAfterStableRuns | integer | No | | Auto-promote the current baseline to approved after N consecutive healthy runs |

Full output fields

| Field | Type | Description |
| --- | --- | --- |
| oneLine | string | Scannable one-line takeaway (verdict, score, top finding, item count, confidence band) |
| confidenceLevel | string | high (≥75), medium (≥50), low (<50) |
| scoreBreakdown | object | {structure, completeness, drift, confidenceAdj} decomposed subscores |
| verdictReasonCodes | string[] | Stable machine-friendly codes (OUTPUT_NULL_SPIKE, OUTPUT_TYPE_DRIFT, …) |
| fleetSignals | array | Stable-coded SIGNALS[] contract for Fleet Analytics: {code, severity, confidence, scope, actionability, field?, delta?, detail?} |
| baselineState | string | coldStart / baselineSeeded / shadowMonitoring / enforcedMonitoring |
| decision | string | One-field operator answer to "do I act right now?". act_now / monitor / ignore. Derived from verdict + confidence + baselineState. Branch on this in Slack / PagerDuty / CI pipelines / agent tool calls instead of parsing prose. |
| decisionReason | string | One-line explanation of why decision landed where it did (e.g. "fail verdict + confidence 82 + enforcedMonitoring baseline — act now"). Makes automation and debug traces self-explanatory. |
| confidenceFactorCodes | string[] | Machine-readable tags explaining the confidence score: cold_start_cap, low_sample_size, small_history, high_baseline_variance, baseline_mismatch, no_canaries, no_schema_declared, restricted_permissions, reference_run_used, healthy_history, recent_incident_volatility. Stable enum within a major version. |
| referenceDiff | object or null | Per-field current vs reference diff when referenceRunId is set |
| canaryCoverage | object or null | Per-canary fieldsCovered, criticalPathCoverageScore, uncoveredFields, canaryGaps |
| actorName | string | Full name of the validated actor |
| actorId | string | Actor ID provided as input |
| mode | string | validate / monitor / diff |
| verdict | string | pass (80+), warn (50–79), fail (0–49) |
| qualityScore | number | 0–100 |
| consumerReadinessScore | number | 0–100 — how reliably downstream systems (LLMs, ETL, dashboards) can consume the CURRENT output without additional normalization. Measures value-level consistency across items, parseability, and structural predictability. Not about declared-schema design — that's Quality Monitor. |
| aiReadinessScore | number | Deprecated alias of consumerReadinessScore. Retained so existing consumers don't break; new code should read consumerReadinessScore. Will be removed in a future major version. |
| explanation | string | Human-readable summary of the verdict |
| recommendations | string[] | Actionable fix suggestions |
| schemaFound | boolean | Whether the target has a declared dataset schema |
| schemaFields | number | Field count in the declared schema |
| outputFields | number | Unique field count in actual output |
| totalItems | number | Items analysed |
| mismatches | array | Schema type mismatches with path, expected, actual, severity |
| undeclaredFields | string[] | Fields in output not declared in schema |
| missingRequired | string[] | Required schema fields absent from output |
| completeness | array | Per-field null/empty rate, severity, status (healthy / degraded / critical / feature-gated / insufficient-data), sampleConfidence, sample rows |
| drift | object | Drift signals (core + shape / semantic / freshness / coverage / dominance) |
| fieldRuleViolations | array | Custom field rule breaches |
| executiveSummary | object | One-line status, reason, recommended action |
| failureMode | object | Classified cause with confidence, evidence[], counterEvidence[] |
| runClassification | string | healthy / recovering / degrading / broken |
| riskProfile | object | Risk level, failure frequency, drift frequency, stability score |
| confidenceScore | number | 0–100 confidence in the assessment |
| trustScore | object | Composite trust score with level (trusted / cautious / unreliable) |
| silentFailureRisk | number | 0–100 likelihood of undetected degradation |
| fieldDistributions | array | Per-field cardinality, top values, numeric stats |
| distributionShifts | array | Detected value distribution changes from previous run |
| incidents | array | Full 7-state lifecycle with narratives, affected fields, severity trend, recommended action, recovery proof |
| correlations | array | Cross-run patterns: recurring failures, post-deploy regressions, progressive degradation |
| canaryResults | array | Per-canary pass/fail, item count, score, issues, fieldsCovered |
| slaResult | object | SLA compliance with breaches and 30-day compliance rate |
| policyBreaches | array | Violated policies with breached conditions and executed actions |
| autoActionsExecuted | array | Each result: {type, target, success, error?, dryRun?, reason?} |
| costAnomaly | object | Duration + compute-per-item anomaly vs baseline with currentSecondsPerItem, baselineSecondsPerItem, itemCountDelta |
| runDuration | number | Total validation time in seconds |
| alertSent | boolean | Whether a webhook alert was dispatched |
| validatedAt | string | ISO 8601 timestamp |

Output example

{
  "oneLine": "WARN 68/100 · 3 critical fields · 47 items · medium confidence",
  "whyNow": "item count dropped 28% vs baseline (47 vs 65); 3 critical fields degraded",
  "decision": "monitor",
  "decisionReason": "warn verdict + confidence 72 + enforcedMonitoring baseline — monitor before acting",
  "confidenceLevel": "medium",
  "confidenceFactorCodes": ["small_history", "no_canaries"],
  "scoreBreakdown": { "structure": 94, "completeness": 68, "drift": 88, "confidenceAdj": 0 },
  "verdictReasonCodes": ["OUTPUT_NULL_SPIKE", "OUTPUT_FIELD_RULE_VIOLATION"],
  "fleetSignals": [
    {
      "code": "OUTPUT_NULL_SPIKE",
      "severity": "critical",
      "confidence": 0.9,
      "scope": "field",
      "actionability": "high",
      "field": "language",
      "delta": { "nullRate": 0.45, "fieldSeverity": "critical" },
      "detail": "language null rate 45%"
    }
  ],
  "baselineState": "enforcedMonitoring",
  "referenceDiff": null,
  "canaryCoverage": null,
  "actorName": "ryanclinton/github-repo-search",
  "verdict": "warn",
  "qualityScore": 68,
  "explanation": "3 fields with high null rates.",
  "recommendations": [
    "Investigate 3 fields with critically high null rates: language, license, homepage",
    "Check extraction for 'language' — 45% null rate suggests the GitHub API stopped populating it for some results"
  ],
  "completeness": [
    {
      "field": "language",
      "nullRate": 0.45,
      "emptyRate": 0.02,
      "fieldSeverity": "critical",
      "status": "critical",
      "sampleNulls": [{ "index": 0, "value": null }],
      "sampleValues": [{ "index": 1, "value": "Python" }]
    }
  ],
  "failureMode": {
    "mode": "partial_extraction",
    "confidence": 0.6,
    "evidence": ["Single field degraded while others remain healthy"],
    "counterEvidence": ["4 healthy fields suggest the extraction path is mostly intact"]
  },
  "incidents": [
    {
      "id": "language-null-spike",
      "status": "confirmed",
      "severity": "critical",
      "description": "language null rate at 45%",
      "occurrences": 3,
      "severityTrend": "worsening",
      "affectedFields": ["language"],
      "recommendedNextAction": "Inspect language extraction — likely an upstream API regression.",
      "blastEstimate": "May affect 2 downstream consumer(s)",
      "narrative": "Output Guard confirmed incident on 2026-04-22: language null rate at 45%. Likely cause: upstream API change or data source issue. Affected fields: language. Seen 3 time(s) across recent runs. Trend: worsening. Recommended next action: Inspect language extraction — likely an upstream API regression."
    }
  ],
  "runDuration": 45.2,
  "alertSent": true,
  "validatedAt": "2026-04-22T14:30:00.000Z"
}

How much does it cost?

Output Guard uses pay-per-event pricing — you pay $4.00 per Output Guard check. Platform compute is included. The target actor's own compute is billed separately at standard Apify rates.

| Scenario | Checks | Cost per check | Total cost |
| --- | --- | --- | --- |
| Quick test | 1 | $4.00 | $4.00 |
| Weekly monitor (1 actor) | 4 / month | $4.00 | $16 / month |
| Daily monitor (1 actor) | 30 / month | $4.00 | $120 / month |
| Daily fleet (5 actors) | 150 / month | $4.00 | $600 / month |
| Daily fleet (20 actors) | 600 / month | $4.00 | $2,400 / month |

Fleet mode charges $4.00 per actor scanned. A fleet of 10 actors costs $40 per run. Use fleetConfig.maxFleetSpend to set a hard budget cap — Output Guard stops scanning once the cap is reached. Set a spending limit in your Apify account settings to prevent unexpected charges. Output Guard is priced for production-critical pipelines where a single silent failure can corrupt thousands of downstream records — one caught regression typically pays for months of monitoring.
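
The arithmetic behind the table, as a one-function sketch (the $4.00 rate comes from the pricing above; the function itself is plain multiplication):

CHECK_PRICE_USD = 4.00  # per Output Guard check, from the pricing above

def monthly_cost(actors: int, runs_per_day: float) -> float:
    # Fleet mode bills per actor scanned, so cost scales with both factors
    return actors * runs_per_day * 30 * CHECK_PRICE_USD

print(monthly_cost(1, 1))  # daily monitor, one actor -> 120.0
print(monthly_cost(5, 1))  # daily fleet of five      -> 600.0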

Call from the API

Python:

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("ryanclinton/actor-schema-validator").call(run_input={
    "targetActorId": "ryanclinton/github-repo-search",
    "mode": "monitor",
    "testInput": {"query": "web scraping language:python", "maxResults": 3},
    "baselineStrategy": "rollingMedian7",
    "fieldRules": {"stars": {"severity": "critical", "maxNullRate": 0.0, "expectedType": "number"}},
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"[{item['decision'].upper()}] {item['actorName']} — {item['oneLine']}")
    # Example: "[ACT_NOW] ryanclinton/github-repo-search — FAIL 38/100 · 3 critical fields · 3 items · high confidence"

JavaScript:

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });
const run = await client.actor("ryanclinton/actor-schema-validator").call({
  targetActorId: "ryanclinton/github-repo-search",
  mode: "monitor",
  testInput: { query: "web scraping language:python", maxResults: 3 },
  baselineStrategy: "rollingMedian7",
  fieldRules: { stars: { severity: "critical", maxNullRate: 0.0, expectedType: "number" } },
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
// Branch on decision for automation — no prose parsing needed
items.forEach(i => {
  console.log(`[${i.decision.toUpperCase()}] ${i.actorName} — ${i.oneLine}`);
  if (i.decision === 'act_now') {
    // route to PagerDuty / CI block / Slack critical channel
  }
});

cURL:

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-schema-validator/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "targetActorId": "ryanclinton/github-repo-search",
    "mode": "validate",
    "testInput": { "query": "web scraping language:python", "maxResults": 3 }
  }'

Integrations

  • Zapier — Trigger Output Guard on a schedule and route alerts to Slack, email, or PagerDuty.
  • Make — Build multi-step workflows: Output Guard detects failure → Make notifies team → creates Jira ticket.
  • Google Sheets — Export quality scores to a spreadsheet for fleet-wide dashboards.
  • Apify API — Trigger Output Guard from CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins).
  • Webhooks — Connect Output Guard alerts to any HTTP endpoint for custom incident response.
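
A minimal stdlib-only sketch of such a custom endpoint. It assumes the JSON payload carries the title / whyNow / whatBroke[] keys described under Operational safety; the port and print-based handling are arbitrary choices for this example:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        alert = json.loads(body or b"{}")
        # Keys per the alert payload description: title, whyNow, whatBroke[]
        print(alert.get("title"), "|", alert.get("whyNow"))
        for broke in alert.get("whatBroke", []):
            print(" -", broke)
        self.send_response(200)  # respond 2xx so the alert is not treated as failed
        self.end_headers()

HTTPServer(("", 8080), AlertHandler).serve_forever()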

The Guard Pipeline

Output Guard is one stage of a three-stage quality pipeline:

| Stage | Guard | What it prevents |
| --- | --- | --- |
| Before run | Input Guard | Bad input wasting runs and credits |
| Before deploy | Deploy Guard | Broken builds reaching production |
| After deploy | Output Guard | Silent data failures in production |

The pipeline is orchestrated by Guard Pipeline. All three Guards share a per-actor quality profile stored in a named KV store (aqp-{actorslug}) so each stage can read the prior stage's state and the next stage's requirements.

Limitations

  • Requires target actor availability — If the target actor is broken, unreachable, or behind authentication Output Guard cannot access, validation fails.
  • Schema validation requires a dataset schema — Without a declared dataset_schema.json, Output Guard does structural analysis only (completeness, type consistency, distributions, drift) but cannot detect undeclared or missing fields.
  • Drift requires 2+ runs — The first monitor run seeds a baseline. Drift detection activates from the second run. Rolling-median and weekday-seasonal strategies need more history to be useful (typically 7+ and 14+ runs respectively).
  • Sample-based validation — Up to 1,000 items by default (max 10,000). Issues in items beyond the sample window may go undetected.
  • Fleet mode charges per actor — Each actor in a fleet scan costs $4.00. Set fleetConfig.maxFleetSpend to cap spend per scan.
  • Canary limit of 10 scenarios — Each canary runs the target actor separately, so cost and time scale linearly.
  • No real-time monitoring — Output Guard is a batch actor. For continuous monitoring, schedule it on a cadence.
  • Backfill mode does not detect runtime issues — Validating an existing dataset checks data quality but cannot detect timeout, memory, or performance issues from the original run.

Troubleshooting

  • Target actor times out during validation. Increase timeout (default 300s). Output Guard adds a 60-second buffer beyond the configured timeout before declaring a wall-clock timeout.
  • Quality score is 0 with no items returned. The target actor returned an empty dataset. Run the target manually first to check the test input. Verify targetActorId is username/actor-name format.
  • Drift shows "No baseline" on every run. Drift tracking must be explicitly enabled with enableDriftTracking: true or by using monitor mode. Validate mode does not save baselines by default.
  • Alert webhook not firing. Check alertSent, alertSuppressed, and alertSuppressionReason in the output — alerts may be held back by cooldown, consecutive-failure threshold, or fingerprint dedup. Alerts only fire on warn or fail verdicts.
  • Fleet mode stops before scanning all actors. Check fleetConfig.maxFleetSpend and fleetConfig.maxActorsPerRun. The fleet stops when either budget or actor limit is reached. Output includes actorsSkipped with the specific reason per skipped actor.
  • Restricted-permission token — fields are missing. Output Guard detects LIMITED_PERMISSIONS and skips drift history, baselines, and the SUMMARY KV key. Use referenceRunId for cross-run comparison in this mode, or grant the token Full Access under Running Actors for full features.

FAQ

Does Output Guard run my target actor? Yes. It calls the target via Actor.call() under your account at standard compute rates. The $4.00 PPE charge covers Output Guard's analysis only, not the target's compute.

Can I use it without a dataset schema on the target actor? Yes. Without a declared schema, Output Guard does structural analysis (completeness, type consistency, distributions, drift) and all intelligence features. Schema-specific checks (type mismatches, undeclared fields) are skipped.

How does the quality score work? Starts at 100 and deducts points in three categories (structure, completeness, drift). scoreBreakdown exposes each subscore separately. Pass ≥80, warn 50–79, fail <50.

What failure modes does it classify? Six: selector_break, upstream_structure_change, pagination_failure, throttling, partial_extraction, schema_drift. Each ships with confidence, evidence[], counterEvidence[].

How is it different from Deploy Guard? Deploy Guard is pre-deploy — it runs test suites against synthetic inputs. Output Guard is post-deploy — it validates live production output with drift tracking, incidents, and auto-actions. They complement each other: Deploy Guard catches breakage before release; Output Guard catches silent degradation after.

Can it automatically disable a broken actor? Yes. Configure autoActions.onCritical with disableActor, pauseSchedule, triggerActor, or a webhook action. Every action supports dryRun: true to rehearse first.

How often should I schedule it? Daily for production actors ($120/month per actor). Hourly for high-value pipelines (pricing, lead gen). Weekly for infrequently-updated actors ($16/month per actor). Output Guard is designed for scheduled recurring use — drift detection, incident lifecycle, and trend analysis all improve with each run, so a scheduled monitor is meaningfully more valuable than one-off checks.

Responsible use

  • Output Guard validates the output of other Apify actors by running them with user-provided input. It does not bypass authentication, CAPTCHAs, or access restricted content.
  • Users are responsible for ensuring the target actors they validate comply with applicable laws, platform terms, and data protection regulations.
  • Do not use Output Guard to validate actors that access content or APIs you are not authorised to use.
  • For guidance on responsible actor usage, see Apify's documentation.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings → Privacy.
  2. Enable Share runs with public Actor creators.

This lets us see your run details when something goes wrong. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.