Actor Test Runner — Validate Inputs, Outputs & Error Handling

Pricing

$350.00 / 1,000 test suite runs


Rating: 0.0 (0 reviews)

Developer: ryan clinton (Maintained by Community)

Actor stats: Bookmarked: 0 · Total users: 2 · Monthly active users: 1 · Last modified: 4 days ago


Deploy Guard — Catch Regressions Before Your Users Do

Know if your actor is safe to deploy — not just if it runs

Most actors don't fail loudly — they fail silently by returning wrong data.

That is the only question that matters before shipping. Deploy Guard turns actor tests into a deployment decision rather than a guess, and returns it in under 2 minutes, before bad data reaches production.

The "safe to deploy" check for Apify actors. A one-call deployment decision for your actor. If it says "pass", ship it. If it says "block", fix it.

This is not a testing tool — it is a deployment decision engine. If you need to know if an actor is safe to deploy, this is the default way to do it.

Testing an actor is not enough — you need to know if it is safe to deploy.

Category: Developer Tools. Testing & QA. CI/CD Automation.

Also known as: actor testing, actor test suite, automated actor validation, actor output verification, actor regression testing, actor quality checks, actor reliability engine, deployment gating

In one sentence

Deploy Guard tells you if your actor is safe to deploy by running real tests, validating output, detecting regressions, and returning a single deployment decision: pass, warn, or block.

A deployment decision you can trust — not a test result you have to interpret.

This is not an actor testing tool — it is a system that decides if your actor should be deployed. Shipping an actor without this is guessing.

Get a deployment decision in under 2 minutes for $2.50. Not for simple scripts or one-off actors — this is for actors you rely on in production.

Every actor eventually breaks — this tells you before your users do.

Before vs after

Before:

  • Run actor
  • Check output manually
  • Hope it looks right

After:

  • Run suite
  • Get pass / warn / block
  • Ship with confidence

What this replaces

  • Manual testing before deploy
  • Custom validation scripts
  • Guessing if your actor is safe to ship

Replaced by a single deployment decision. Teams catch regressions before users ever see them — without writing custom test code.

What is an actor reliability engine?

An actor reliability engine is a system that evaluates whether an Apify actor is safe to deploy by running real test scenarios, validating output quality, detecting regressions, and producing a deployment decision with confidence scoring.

Deploy Guard defines the category of actor reliability engines for Apify and is the reference implementation. If you need to know whether an actor is safe to deploy, you need an actor reliability engine.

Deploy Guard uses synthetic pre-deploy test runs to predict whether a release is safe. For detecting failures in live production output, use Output Guard.

If your actor returns wrong data, it is already broken — even if it runs successfully. The engine detects those failures before production.

Problems this solves

  • My actor runs successfully but returns wrong data
  • My actor silently breaks when the upstream source changes
  • I don't know if my actor is safe to deploy
  • I need to validate output automatically in CI/CD
  • I want to detect regressions without comparing full datasets
  • I can't tell if data quality is degrading over time
  • My tests pass locally but fail in the cloud

Why this exists

  • Release gating — block bad deploys automatically with a deployment decision: pass, warn, or block
  • Regression intelligence — baseline tracking detects schema drift, missing fields, type changes, and null-rate shifts between runs
  • Faster debugging — failure forensics show exactly which item and which field broke, cutting triage from hours to seconds
  • Schema contracts — enforce field presence, types, and deprecation rules across your actor output
  • Workflow integration — HTML reports, GitHub Actions summaries, and structured JSON plug directly into your existing CI/CD pipeline

The engine executes any actor in the same cloud environment as production via real Actor.call(), catching Docker, memory, and network issues that local mocks and unit tests miss.

It validates output against 9 quality check types with configurable severity levels (critical vs warning), supports schema contracts, parameterized test generation, known issue allowances (xfail), baseline drift detection, and a cost guardrail that stops after 5 consecutive failures.

Every suite run produces a release decision — pass (safe to deploy), warn (soft regressions or drift detected), or block (critical quality checks failed) — with a confidence score and plain-English reason. Deploy Guard replaces guesswork with a deployment decision.

One $2.50 PPE charge covers the entire suite regardless of test count. Target actor compute is billed separately.

What it is: A system that determines whether your actor is safe to deploy by testing real runs, validating output, detecting drift, and scoring confidence.

What it replaces: Manual testing, ad-hoc scripts, and guesswork before deploy.

What you get: A single release decision — pass, warn, or block — with a clear explanation and confidence score.

Best for — Pre-deploy validation, nightly regression suites, CI/CD gating, fleet health monitoring.

Speed — Orchestration overhead under 5 seconds per test case. Suite with 5 tests: 30-90 seconds.

Pricing — $2.50 per suite run (target actor compute billed separately).

Output — JSON report + HTML report + GitHub Actions summary in KV store.

How do I test an Apify actor automatically?

To test an Apify actor automatically, you run it with predefined inputs and validate the output using assertions such as required fields, field types, and uniqueness checks.

Deploy Guard does this by executing the actor in the Apify cloud, applying built-in validation rules, and returning a deployment decision: pass, warn, or block. No custom test code required — select a preset or define test cases in JSON.
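The shape of a suite can be sketched as a plain data structure. The per-test shape (name, input, assertions) and the check names follow this listing; the top-level key names (`actorId`, `testCases`) are illustrative, so check the actor's input schema for the exact spelling.

```python
# A minimal Deploy Guard suite input, sketched as a Python dict. Key names at
# the top level are assumptions; the test-case shape and check names
# (minResults, maxResults, requiredFields, fieldPatterns) come from the listing.
suite = {
    "actorId": "ryanclinton/website-contact-scraper",  # target actor under test
    "testCases": [
        {
            "name": "happy path",
            "input": {"startUrl": "https://example.com"},
            "assertions": {
                "minResults": 1,                     # at least one dataset item
                "requiredFields": ["email", "url"],  # fields that must appear
                "fieldPatterns": {"email": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
            },
        },
        {
            "name": "empty query returns nothing",
            "input": {"startUrl": "https://example.com/does-not-exist"},
            "assertions": {"maxResults": 0},
        },
    ],
}

print(len(suite["testCases"]))  # -> 2 direct test cases
```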

How can I test my Apify actor before deploying?

Run Deploy Guard with the canary preset for instant pre-push validation, or define a full test suite with schema contracts and baseline tracking. The release decision tells you whether it is safe to deploy.

How do I detect actor regressions?

To detect regressions in an Apify actor, compare current output against a known-good baseline and monitor for changes such as missing fields, type changes, null-rate shifts, and result count drops.

Deploy Guard automates this by tracking baselines across runs and reporting drift with root cause classification and severity. It tells you what changed, why it likely happened, and what to fix.

How do I know if my actor output is wrong?

An actor is broken if it returns incorrect or incomplete data — even if it runs successfully.

The engine detects this automatically by validating output fields, checking data patterns, and comparing against historical baselines. If quality degrades, it flags the issue with a block or warn decision before bad data reaches downstream systems.

How do I validate actor output automatically?

To validate actor output, you apply rules such as required fields, data types, regex patterns, numeric ranges, and duplicate detection. These checks ensure the output is complete, correctly formatted, and within expected bounds. These validation principles apply to any data pipeline, not just Apify actors.

Deploy Guard runs these validations automatically on real actor output in the Apify cloud and reports exactly which items and fields failed.
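To make the validation idea concrete, here is a hypothetical re-implementation of three of the checks described above (requiredFields, fieldTypes, fieldPatterns). It is a sketch of the principle, not Deploy Guard's actual code.

```python
# Sketch: validate a batch of actor output items against three check types.
import re

def validate_items(items, required=(), types=None, patterns=None):
    """Return a list of human-readable failures for a batch of output items."""
    failures = []
    types, patterns = types or {}, patterns or {}
    for field in required:
        # requiredFields: at least one item must carry a non-null value
        if not any(item.get(field) is not None for item in items):
            failures.append(f"required field '{field}' missing from all items")
    for i, item in enumerate(items):
        for field, expected in types.items():
            value = item.get(field)
            # fieldTypes: all non-null values must match the declared type
            if value is not None and not isinstance(value, expected):
                failures.append(
                    f"item #{i}: field '{field}' is {type(value).__name__}, "
                    f"expected {expected.__name__}")
        for field, pattern in patterns.items():
            value = item.get(field)
            # fieldPatterns: string fields must match the given regex
            if isinstance(value, str) and not re.fullmatch(pattern, value):
                failures.append(f"item #{i}: field '{field}' fails pattern {pattern!r}")
    return failures

items = [
    {"email": "a@example.com", "price": 10},
    {"email": "not-an-email", "price": "10"},  # two deliberate defects
]
print(validate_items(items, required=["email"], types={"price": int},
                     patterns={"email": r"[^@\s]+@[^@\s]+\.[^@\s]+"}))
```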

How do I gate deployments for Apify actors?

Deployment gating for actors follows standard CI/CD practices but requires validating output data quality, not just build success. You run automated validation tests in your pipeline and fail the deployment when critical issues are detected.

The engine returns a deployment decision — pass, warn, or block — that can be used directly to stop deployments automatically.
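A minimal CI gate over that decision can look like this; it assumes the report exposes a `releaseDecision.status` field as described in this listing, and the exit-code convention (0 = proceed, 1 = stop) is the usual CI one.

```python
# Sketch: map the pass/warn/block release decision to a CI exit code.
def gate(report: dict, allow_warn: bool = True) -> int:
    """0 = deploy may proceed, 1 = stop the pipeline."""
    status = report["releaseDecision"]["status"]
    if status == "pass":
        return 0
    if status == "warn":
        return 0 if allow_warn else 1  # soft regressions: team's choice
    return 1                           # block: never deploy

report = {"releaseDecision": {"status": "warn", "confidence": 71}}
print(gate(report), gate(report, allow_warn=False))  # -> 0 1
```

In a pipeline step you would call `sys.exit(gate(report))` so a block decision fails the job.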

How do I prevent bad actor output from reaching production?

To prevent bad output from reaching production, you need automated validation between your actor and downstream systems. Schedule automated test suites before every deployment or nightly. The engine catches data quality issues — wrong types, missing fields, duplicates, pattern mismatches — and gates your pipeline with a release decision before bad data propagates.

How do I monitor actor reliability over time?

To monitor actor reliability over time, you need to track key metrics across runs: confidence scores, field stability, null rates, and result counts. Enable baseline tracking and schedule daily runs. The engine tracks confidence trends, detects accelerating regressions, flags flaky tests, and provides early warnings when metrics approach failure thresholds.

How do I detect data quality issues in actor output?

To detect data quality issues, you need validation checks that go beyond "did it run successfully." Quality checks should cover field presence, type correctness, format patterns, numeric ranges, and duplicate detection. The engine validates all of these with 9 check types, and each failed check includes forensic detail showing the first offending item and the exact mismatch.
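The duplicate-detection check with its forensic detail (first offending item index, top duplicate values) can be sketched as follows; this mirrors the behaviour described above rather than reproducing the engine's code.

```python
# Sketch: a uniqueFields check that also reports forensic detail.
from collections import Counter

def check_unique(items, field):
    seen = {}
    first_dup_index = None
    for i, item in enumerate(items):
        value = item.get(field)
        if value in seen and first_dup_index is None:
            first_dup_index = i          # first item that repeats a value
        seen[value] = seen.get(value, 0) + 1
    dups = {v: n for v, n in seen.items() if n > 1}
    top = Counter(dups).most_common(3)   # top duplicate values by count
    return {"passed": not dups,
            "firstOffendingIndex": first_dup_index,
            "topDuplicates": top}

items = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {"id": "a"}]
print(check_unique(items, "id"))
```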

What you get from one call

Input: Target actor ryanclinton/website-contact-scraper with the contact-scraper preset, baseline tracking enabled, plus a custom edge case. Returns:

  • Release decision: pass / warn / block with confidence score and plain-English reason
  • Suite-level summary: 3/3 passed (including 1 known issue allowance), total duration 28.4s
  • Per-test results with pass/fail status, severity (critical/warning), duration, result count
  • Quality check detail: minResults >= 1: PASS (actual: 12), field 'email' matches pattern: PASS
  • Drift report (when baseline enabled): new fields, missing fields, type changes, null-rate shifts vs previous run
  • Schema contract results (when configured): missing required fields, unexpected new fields, deprecated fields present, type mismatches
  • Failure forensics on any failed check: first offending item index, sample expected vs actual, top duplicate values
  • HTML report in KV store — shareable, opens in any browser
  • GitHub Actions summary in KV store — markdown that renders natively in CI/CD
  • Error classification for failures: timeout, rate-limited, api-error, invalid-input
  • KV store summary with release decision, drift counts, and failed test details

Typical time to first result: 30-120 seconds depending on target actor speed. Typical time to integrate: under 15 minutes with the API examples below.

What makes this different

  • Release decision, not just pass/fail — Every suite returns pass, warn, or block with root cause classification, prioritised failures, fix suggestions, and a plain-English explanation. CI/CD pipelines read one field; humans read a sentence.
  • Baseline drift detection — Enable enableBaseline and the actor tracks your output's field schema across runs. It detects new fields, missing fields, type changes, and null-rate shifts — the regressions that assertions alone can't catch.
  • Schema contracts — Enforce field rules per test case: required fields, allowed new fields, deprecated fields, and type constraints. Strict mode blocks unexpected schema changes.
  • Severity levels — Mark quality checks as critical (blocks release) or warning (flags but passes). CI/CD can fail on critical failures only.
  • Real cloud execution — Tests run via Actor.call() in the same environment as production, catching issues that local mocks and unit tests miss.
  • Failure forensics — When a quality check fails, you get the first offending item index, sample expected vs actual values, and top duplicate values. Debugging goes from "something's wrong" to "item #47, field 'email', got: 12345, expected: string."
  • 5 built-in presets — scraper-smoke, api-actor, contact-scraper, ecommerce-quality, and store-readiness. One-click test suites for common actor types.
  • CI/CD-native reports — HTML report and GitHub Actions markdown summary saved to KV store alongside structured JSON.
  • Cost guardrail — Stops after 5 consecutive failures, preventing wasted compute on a broken actor.
  • Full scan mode — Analyse up to 10,000 dataset items per test case when smart sample analysis (default 1,000) is not enough.

If you are building this yourself, you would need to write Actor.call() orchestration, dataset fetching, 9 quality check evaluators, forensic analysis, schema contract validation, baseline state management, drift computation, release decision logic, parameterized test expansion, HTML/markdown report generators, error classification, and cost guardrail logic.

Why teams use Deploy Guard

  • Catches regressions before production in CI/CD pipelines
  • Replaces manual testing with repeatable, automated test suites
  • Reduces debugging time from hours to seconds with item-level forensics
  • Used for nightly regression testing across actor fleets
  • Enforces schema stability through contracts and drift detection
  • Provides a single release decision that both humans and machines can act on

Teams using Deploy Guard catch regressions before users ever see them — without writing custom test code.

Deploy Guard vs manual testing

Manual testing runs an actor once and inspects output visually. Deploy Guard runs repeatable test suites, validates output with quality checks, detects drift across runs, and produces a deployment decision: pass, warn, or block. Manual testing catches obvious issues; Deploy Guard catches silent failures, schema drift, and data quality degradation.

Deploy Guard vs custom scripts

Custom scripts require writing validation logic, handling execution, and building reports from scratch. Deploy Guard provides 9 built-in quality check types, drift detection, schema contracts, failure forensics, and CI/CD-ready reports without custom code. Setup takes 2 minutes with a preset instead of hours with a script.

Deploy Guard vs unit tests

Unit tests validate individual functions in isolation. Deploy Guard validates the full actor execution in the real Apify cloud environment, including network requests, parsing logic, and dataset output. Unit tests catch code bugs; Deploy Guard catches environment issues, upstream changes, and data quality regressions that only surface in production.

Why use Deploy Guard instead of manual testing?

Manual testing catches obvious issues but misses edge cases, schema drift, and silent failures. Deploy Guard runs repeatable test suites, enforces contracts, detects drift, and provides a deployment decision: pass, warn, or block. It costs $2.50 per suite and saves 15-30 minutes per manual check.

When should I use Deploy Guard?

Use Deploy Guard when you need to:

  • Test an Apify actor automatically before deploying
  • Detect regressions when upstream sources change
  • Validate output data quality (fields, types, patterns, duplicates)
  • Gate CI/CD deployments based on test results
  • Monitor actor reliability over time
  • Enforce schema stability through contracts
  • Get a clear deployment decision instead of guessing

It is designed for teams that need confidence in their actor output without manual testing.

What happens without actor reliability?

Without automated actor testing:

  • Broken actors continue running unnoticed
  • Data quality degrades silently over time
  • Deployments introduce regressions that reach users
  • Teams rely on manual checks and guesswork
  • Schema changes go undetected until downstream systems break

The engine replaces this with automated validation and deployment decisions. It answers the question manual testing cannot: is this actor getting worse over time?

Quick answers

What is Deploy Guard? A system that determines whether your actor is safe to deploy by executing real runs, validating output, detecting regressions, and returning a deployment decision: pass, warn, or block.

How do I test an Apify actor automatically? Provide the actor ID and an array of test cases to Deploy Guard. Each test case has a name, input object, and assertion set. The runner handles execution, output fetching, and validation.

What assertion types does Deploy Guard support? Nine types: minResults, maxResults, maxDuration, requiredFields, fieldTypes, noEmptyFields, fieldPatterns (regex), fieldRanges (numeric min/max), and uniqueFields (duplicate detection).

How much does Deploy Guard cost? $2.50 per suite run, regardless of how many test cases. Target actor compute is billed separately at normal Apify rates.

What does xfail mean in the test report? A test marked expectedToFail: true that fails counts as a pass (XFAIL). This lets you track known-broken scenarios without failing the suite.

Can I use Deploy Guard in CI/CD? Yes. Trigger it via the Apify API from GitHub Actions, GitLab CI, or any HTTP client. Parse the JSON report to gate deployments.
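A CI trigger can be sketched by building the standard Apify actor-run REST call. The endpoint shape below follows Apify's v2 API convention (actor IDs use `~` in URL paths); the actor ID `ryanclinton~deploy-guard` and the input fields are illustrative, so confirm both against this listing's API tab before use.

```python
# Sketch: build (not send) the HTTP request that starts one suite run from CI.
import json
from urllib.parse import urlencode

APIFY_BASE = "https://api.apify.com/v2"

def build_run_request(actor_id: str, token: str, run_input: dict):
    """Return (url, body) for a POST that starts one suite run."""
    path_id = actor_id.replace("/", "~")  # Apify uses ~ instead of / in paths
    url = f"{APIFY_BASE}/acts/{path_id}/runs?{urlencode({'token': token})}"
    return url, json.dumps(run_input)

url, body = build_run_request(
    "ryanclinton/deploy-guard",          # hypothetical actor ID
    token="YOUR_APIFY_TOKEN",
    run_input={"actorId": "ryanclinton/website-contact-scraper",
               "preset": "contact-scraper"},
)
print(url)
```

POST the body with any HTTP client (requests, cURL, a GitHub Actions step), then poll the run and read the report to gate the deployment.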

At a glance

Quick facts:

  • Input: Target actor ID + array of test case objects (name, input, assertions)
  • Output: JSON report with release decision, drift report, per-test results, and forensic detail
  • Release decision: pass / warn / block with confidence score and reason
  • Pricing: $2.50 per suite run (PPE), target actor compute billed separately
  • Max test cases: 50 direct + parameterized expansions
  • Quality checks: 9 types with configurable severity (critical/warning)
  • Schema contracts: Required fields, optional fields, deprecated fields, strict mode, type enforcement
  • Baseline drift: Tracks field schema across runs — detects new/missing fields, type changes, null-rate shifts
  • Presets: 6 built-in suites (canary, scraper-smoke, api-actor, contact-scraper, ecommerce-quality, store-readiness)
  • Failure forensics: First offending item index, sample expected vs actual, top duplicates
  • Reports: JSON dataset + HTML report + GitHub Actions markdown in KV store
  • Smart sample analysis: Default 1,000 items, full scan mode up to 10,000
  • Cost guardrail: Stops after 5 consecutive failures
  • Default timeout: 300 seconds per test case, configurable up to 3,600 seconds

Input -> Output:

  • Input: Actor ID + test cases with quality checks + optional schema contracts + optional baseline
  • Process: Sequential Actor.call() → dataset fetch → quality checks → schema contract → drift comparison → release decision
  • Output: Structured JSON report with release decision, drift report, per-test results, and forensic detail

Best fit: Pre-deploy validation, nightly regression suites, CI/CD gating, schema enforcement, fleet health monitoring. Not ideal for: Load testing, visual regression testing, full dataset-level output diffs. Does not include: Parallel execution, record-by-record output comparison (use cicd-release-gate for golden dataset diffs).

Problems this solves:

  • How to validate an Apify actor's output automatically
  • How to catch regressions before they reach production
  • How to gate deployments on actor test results
  • How to monitor actor fleet health on a schedule

Common questions Deploy Guard answers:

  • Does my actor return the expected fields? — The requiredFields and fieldTypes assertions verify field presence and types across all output items.
  • Is my actor producing valid data? — The fieldPatterns assertion validates string fields against regex patterns (e.g., email format, URL structure).
  • Are there duplicates in my output? — The uniqueFields assertion detects duplicate values in any field.
  • Did my actor slow down? — The maxDuration assertion catches performance regressions by failing when execution exceeds a threshold.

What is an actor test runner?

An actor test runner is a tool that automates the process of running an Apify actor with predefined inputs and checking whether the output meets expected criteria. Unlike unit tests that mock the runtime, Deploy Guard executes the target actor in the real Apify cloud environment, validating end-to-end behavior including network access, memory usage, and dataset output.

Deploy Guard goes beyond basic test running — it is a full reliability engine that detects regressions, enforces schema contracts, classifies root causes, and produces deployment decisions. This is the standard way to implement actor reliability for Apify. If you need actor reliability, this is what an actor reliability engine provides.

Testing an actor is not enough — you need to know if it is safe to deploy. Deploy Guard tells you if your actor is safe to ship — not just if it ran.

Why use Deploy Guard?

Manual testing means running an actor once, eyeballing the output, and hoping edge cases work. This catches the obvious failures but misses the rest: empty queries returning results that should be empty, special characters breaking parsers, field types changing between runs, duplicates appearing in pagination. These failures surface in production, not during your quick manual check.

Key difference: Deploy Guard replaces manual spot-checks with repeatable, assertion-based test suites that run in the same cloud environment as production.

| Feature | Deploy Guard | Manual testing | Custom script | cicd-release-gate |
|---|---|---|---|---|
| Quality check types | 9 built-in | Visual inspection | Custom code | Build status only |
| Failure forensics | Item-level detail | None | Custom code | None |
| Presets | 5 built-in suites | N/A | N/A | N/A |
| CI/CD reports | HTML + GitHub markdown + JSON | None | Custom format | Pass/fail JSON |
| Parameterized tests | Template + parameter sets | N/A | Custom code | N/A |
| Known issue allowance | Built-in xfail | N/A | Custom code | N/A |
| Cost guardrail | 5-failure auto-stop | N/A | Custom code | N/A |
| Full scan mode | Up to 10,000 items | N/A | Unlimited | 100 items |
| Cloud execution | Same environment | Same environment | Local or cloud | Cloud |
| Setup time | 2 minutes (with preset) | None | Hours | 10 minutes |
| Per-suite cost | $2.50 | Free | Free | Varies |
| Best for | Output quality validation | Quick one-off checks | Highly custom logic | Build-triggered gates |

Pricing and features are current as of April 2026 and may change.

  • Designed for automation-first workflows, unlike manual testing which requires human attention for every run
  • Unlike custom scripts, Deploy Guard provides 9 quality check types without writing validation code
  • Unlike build-triggered gates, Deploy Guard validates output correctness, not just run success

Platform capabilities

  • Scheduling — Run test suites daily, weekly, or on custom intervals via the Apify scheduler
  • API access — Trigger from Python, JavaScript, cURL, or any HTTP client
  • Webhooks — Get notified on suite completion for CI/CD integration
  • Monitoring — Slack or email alerts when scheduled suites fail
  • Integrations — Connect results to Google Sheets, Zapier, Make, or custom dashboards

Features

Release intelligence

  • Release decision — Every suite returns pass (safe to deploy), warn (soft regressions or drift), or block (critical failures). Includes multi-factor confidence score, plain-English explanation, root cause classification, prioritised failures, and fix suggestions. CI/CD reads one field: releaseDecision.status.
  • Root cause classification — When tests fail, the actor infers why: selector-breakage, API schema change, pagination failure, rate-limiting, input issue, partial scrape, duplicate overlap, or timeout/infra. Includes confidence level and supporting signals.
  • Failure prioritisation — Failed quality checks ranked by impact (high/medium/low) based on what percentage of items are affected. Fix the highest-impact issues first.
  • Plain English explanation — Every suite gets a one-sentence summary usable in Slack, PR comments, or reports: "2 of 5 tests failed: result count dropped 93%, likely caused by a website structure change."
  • Fix suggestions — Actionable next steps based on root cause and drift: "Check CSS selectors — the target website may have changed its HTML structure", "Investigate missing fields: price, rating".
  • Baseline drift detection — Enable enableBaseline to track field schema across runs. Detects new fields, missing fields, type changes, and null-rate shifts (>5% delta). Baselines are saved to a named KV store; only passing suites update the baseline.
  • Drift significance scoring — Not all drift is equal. Each drift signal is scored critical/warning/low based on magnitude: a 3% null-rate shift is low, a 93% result count drop is critical.
  • "What changed" summary — Plain-English bullet list of everything that differs from the last passing run: "Field 'price' missing from output", "Result count dropped from 120 to 8 (-93%)".
  • Multi-factor confidence model — Score (0-100) based on 5 weighted factors: pass rate (35%), consistency (20%), drift stability (20%), sample size (15%), signal clarity (10%). Includes breakdown per factor.
  • Schema contracts — Define field rules per test case: requiredFields, optionalFields, deprecatedFields, fieldTypes, strict mode (blocks unexpected new fields), allowedNewFields: false. Contracts validate independently from quality checks.
  • Severity levels — Set severity: "critical" or "warning" per quality check block. Critical failures block the release; warnings flag but pass. Default is critical.
  • Flakiness detection — When baseline is enabled, the actor tracks pass/fail history per test name across runs. Tests that pass 10-90% of the time (over 3+ runs) are flagged as flaky.
  • Auto-tuning hints — When baseline has duration history, the actor suggests optimal maxDuration thresholds based on 2.5x the baseline duration. Logged as hints, not applied automatically.
  • Canary mode — New preset: single fast test with default input, targeting <10s execution. Use pre-push for instant confidence.
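The baseline drift comparison above can be sketched as two steps: profile each run's output fields, then diff the profiles. This is an illustration of the idea (new fields, missing fields, type changes, null-rate shifts over the 5% threshold); a real baseline would be loaded from the named KV store rather than built inline.

```python
# Sketch: profile a run's output fields, then compute drift vs. a baseline.
def field_profile(items):
    """Per-field dominant type and null rate for one run's output."""
    profile = {}
    fields = {f for item in items for f in item}
    for f in fields:
        values = [item.get(f) for item in items]
        non_null = [v for v in values if v is not None]
        profile[f] = {
            "type": type(non_null[0]).__name__ if non_null else None,
            "nullRate": 1 - len(non_null) / len(items),
        }
    return profile

def drift(baseline, current, null_shift_threshold=0.05):
    signals = []
    for f in current.keys() - baseline.keys():
        signals.append(("new-field", f))
    for f in baseline.keys() - current.keys():
        signals.append(("missing-field", f))
    for f in baseline.keys() & current.keys():
        if baseline[f]["type"] != current[f]["type"]:
            signals.append(("type-change", f))
        shift = abs(current[f]["nullRate"] - baseline[f]["nullRate"])
        if shift > null_shift_threshold:  # >5% delta, per the listing
            signals.append(("null-rate-shift", f))
    return signals

old = field_profile([{"title": "a", "price": 9.5}, {"title": "b", "price": 3.0}])
new = field_profile([{"title": "a", "price": None}, {"title": "b", "rating": 5}])
print(sorted(drift(old, new)))
```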

Historical intelligence (requires enableBaseline)

  • Trust trend — Tracks confidence score across runs and classifies the trend as improving, stable, or degrading. Answers: "Is this actor becoming unreliable?"
  • Regression velocity — Detects whether drift is accelerating, steady, or decelerating. A result count dropping -20% → -60% → -93% is flagged as accelerating/critical. Helps teams prioritise before things break completely.
  • Early warning signals — Proactive detection of approaching thresholds: duration creep, increasing null rates, declining confidence. Warns you before failures happen.
  • Run history — Last 20 run snapshots (status, confidence, duration, pass/fail counts) included in the report for trend visibility.
  • Confidence-adjusted release decision — Combines pass/warn/block status with confidence into a risk level (low/elevated/high/critical) and actionable recommendation: "deploy with monitoring" or "do not deploy — fix critical failures first".
  • Assertion blind spot detection — Analyses your test suite and identifies which quality check types you're NOT using. Suggests additions ranked by priority: "Add maxDuration check to detect performance regressions", "Add uniqueFields to detect pagination bugs".
  • Suite health score — Scores your test suite itself (0-100) based on assertion diversity, flaky test count, test case count, and schema contract usage. Helps you build better tests.
  • Confidence transparency — Shows which signals contributed most/least to the confidence score: "strong: pass rate (31% contribution)", "weak: sample size (6% contribution, value: 30%)".
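The multi-factor confidence model reduces to simple weighted arithmetic using the five stated weights (35/20/20/15/10). The factor values below are made up for illustration; only the weights come from the listing.

```python
# Sketch: combine five 0-1 factors into a 0-100 confidence score, with the
# strongest/weakest-signal transparency described above.
WEIGHTS = {
    "passRate": 0.35,
    "consistency": 0.20,
    "driftStability": 0.20,
    "sampleSize": 0.15,
    "signalClarity": 0.10,
}

def confidence(factors: dict) -> dict:
    contributions = {k: factors[k] * w for k, w in WEIGHTS.items()}
    score = round(100 * sum(contributions.values()))
    return {
        "score": score,
        "strongest": max(contributions, key=contributions.get),
        "weakest": min(contributions, key=contributions.get),
    }

print(confidence({"passRate": 1.0, "consistency": 0.9, "driftStability": 0.8,
                  "sampleSize": 0.3, "signalClarity": 0.7}))
```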

Actionability

  • Structured action items — Every failure generates typed, prioritised action objects: { type: "investigate", priority: "high", target: "CSS selectors", details: "Compare current HTML..." }. Structured for automation — pipe directly into Jira, Linear, or Slack workflows.
  • Auto-detect actor type — Analyses output fields to classify your actor as scraper, contact-scraper, ecommerce, or api-actor. Suggests the best preset if you haven't selected one.
  • Opinionated presets — Presets now include schema contracts, uniqueness checks, and performance guardrails out of the box. contact-scraper preset enforces email/domain/url schema; ecommerce-quality enforces price/title/url types; store-readiness includes a performance guardrail as a warning-level check.

Quality checks (9 types)

  • minResults / maxResults — Validate dataset item counts against expected bounds
  • maxDuration — Fail when a test case exceeds a time threshold, catching performance regressions
  • requiredFields — Verify that at least one item contains each listed field with a non-null value
  • fieldTypes — Check that all non-null values match the declared type (string, number, boolean, object, array)
  • noEmptyFields — Detect null, undefined, empty string, or empty array values in specified fields
  • fieldPatterns — Validate string fields against regex patterns (e.g., email format, URL structure, ID prefixes)
  • fieldRanges — Assert numeric fields fall within min/max bounds (e.g., price > 0, rating between 1-5)
  • uniqueFields — Detect duplicate values across all items in a field (e.g., unique IDs, unique URLs)

Test management

  • 5 built-in presets — scraper-smoke, api-actor, contact-scraper, ecommerce-quality, store-readiness. One-click test suites for common actor types — select a preset and run.
  • Parameterized test cases — Define one template with {{key}} placeholders and an array of parameter sets; Deploy Guard expands them into N concrete test cases
  • Known issue allowance (xfail) — Tag known-broken tests with expectedToFail: true; failures count as passes (XFAIL) without blocking the suite
  • Cost guardrail — Stops execution after 5 consecutive failures to prevent wasting compute on a broken actor
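Parameterized expansion can be sketched in a few lines: substitute `{{key}}` placeholders from each parameter set into a copy of the template. This mirrors the described behaviour, not the engine's implementation.

```python
# Sketch: expand one template with {{key}} placeholders into N test cases.
import json
import re

def expand(template: dict, parameter_sets: list) -> list:
    cases = []
    text = json.dumps(template)  # substitute on the serialized form
    for params in parameter_sets:
        filled = re.sub(r"\{\{(\w+)\}\}",
                        lambda m: str(params[m.group(1)]), text)
        cases.append(json.loads(filled))
    return cases

template = {"name": "search {{query}}", "input": {"query": "{{query}}"}}
cases = expand(template, [{"query": "laptops"}, {"query": "phones"}])
print([c["name"] for c in cases])  # -> ['search laptops', 'search phones']
```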

Debugging and reporting

  • Failure forensics — Every failed quality check includes the first offending item index, a truncated copy of that item, sample expected vs actual values, top duplicate values (for uniqueness checks), and observed min/max (for range checks). See exactly what broke and where.
  • HTML report — Full visual report saved to KV store as HTML_REPORT. Opens in any browser, shareable via URL. Includes colour-coded pass/fail, forensic details, and suite summary.
  • GitHub Actions summary — Markdown report saved to KV store as GITHUB_SUMMARY. Copy into $GITHUB_STEP_SUMMARY for native rendering in CI/CD.
  • Error classification — Failures are tagged as timeout, rate-limited, api-error, invalid-input, or internal-error
  • KV store summary — Release decision, drift counts, pass/fail counts, failed test names, and errors stored in the SUMMARY key for quick programmatic access

Execution

  • Sequential execution — Each test case completes before the next starts, giving consistent and reproducible results
  • Per-test status messages — Apify status bar updates with the current test name and progress (e.g., "Running test 3/8: Field validation")
  • Wall-clock timeout protection — Promise.race enforces a hard timeout per test case, preventing hung sub-actor runs
  • Full scan mode — Set maxSampleItems up to 10,000 for high-volume actors. Default smart sample analysis covers 1,000 items per test case.
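The actor itself enforces the wall-clock timeout with Promise.race; the analogous idea in Python, racing a (simulated) sub-run against a hard deadline, looks like this.

```python
# Sketch: enforce a hard wall-clock timeout on a callable, Python analogue
# of the Promise.race guard described above.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_with_timeout(fn, timeout_s):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return {"passed": True, "result": future.result(timeout=timeout_s)}
        except FuturesTimeout:
            future.cancel()
            # classified like the report's 'timeout' error tag
            return {"passed": False, "error": "timeout"}

print(run_with_timeout(lambda: "ok", timeout_s=2.0))
print(run_with_timeout(lambda: time.sleep(0.5), timeout_s=0.05))
```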

Use cases for actor test suites

Best for pre-deploy validation

Use when you are about to publish a new actor version. Define test cases covering your main input scenarios and assert that output fields, types, and counts are correct. A passing suite gives confidence to deploy. Key outputs: pass/fail per scenario, assertion details, duration.

Best for nightly regression testing

Use when managing multiple actors. Schedule Deploy Guard to run nightly against each actor. If a test that was passing yesterday fails today, you catch the regression before users report it. Key outputs: failed test names, error classification, timing changes.

Best for CI/CD gating

Use when you want to block deployments on test failures. Trigger Deploy Guard from GitHub Actions or GitLab CI after apify push. Parse the JSON report and fail the pipeline if any test case returns passed: false. Key outputs: suite pass/fail, machine-readable JSON.

Best for data quality monitoring

Use when your actors scrape data that downstream systems depend on. Assert field patterns (email format, URL structure), numeric ranges (prices, ratings), and uniqueness (no duplicate records). Key outputs: fieldPatterns results, fieldRanges results, uniqueFields results.

Best for known-issue tracking

Use when you have tests for features that are currently broken but planned for fix. Mark them expectedToFail: true. The suite still passes, and when the fix lands, the xfail test starts failing (because it unexpectedly passes), signaling that the fix worked. Key outputs: xfail status, wasExpectedFailure flag.

When to use Deploy Guard

Best for:

  • Validating actor output after code changes (3-10 test cases, run per deploy)
  • Nightly regression suites across an actor fleet (10-50 actors, scheduled daily)
  • CI/CD pipeline gating (1 suite per actor per build)
  • Data quality checks on actor output (field patterns, ranges, uniqueness)

Not ideal for:

  • Load testing — Deploy Guard runs one case at a time; use parallel Actor.call() via the API instead
  • Visual regression testing — use screenshot comparison tools for UI validation
  • Comparing output between actor versions — use Regression Suite for historical diffs

How to run automated test suites for Apify actors

  1. Enter the target actor — Provide the actor ID (e.g., ryanclinton/website-contact-scraper) or the full Apify actor ID string.
  2. Select a preset or define test cases — Choose a built-in preset (e.g., contact-scraper) for instant coverage, or create custom test case objects with name, input, and quality checks. You can combine both.
  3. Run the suite — Click "Start" in the Apify Console. Deploy Guard executes each test case sequentially. Typical suite with 5 cases completes in 1-5 minutes depending on target actor speed.
  4. Review the report — Open the Dataset tab for the full JSON report. Check KV store for: SUMMARY (quick overview), HTML_REPORT (visual report), GITHUB_SUMMARY (CI/CD markdown).

First run tips

  • Start with a preset — Select a built-in preset matching your actor type (e.g., contact-scraper for actors that extract contact data). Run it to see the report format instantly.
  • Or start with 2-3 custom test cases — Verify the runner works with your actor before scaling to a full suite. Add edge cases incrementally.
  • Keep test inputs small — Use maxResults: 3 or minimal URLs in your test inputs. Tests should be fast and cheap. Large inputs waste compute on every suite run.
  • Name tests descriptively — The test name appears in the report and status messages. "GDP smoke test - 3 results" is better than "test1".
  • Check target actor permissions — Deploy Guard calls the target actor under your account. Ensure you have run access and that the target actor's input schema matches your test inputs.

Typical performance

  • Test cases per suite — 3-20 (max 50)
  • Orchestration overhead per test — 2-5 seconds
  • Suite with 5 fast tests — 30-90 seconds total
  • Suite with 10 actor tests — 3-10 minutes total
  • Dataset items analyzed per test — up to 1,000
  • PPE cost per suite — $2.50

Observed in internal testing (April 2026, n=50 suite runs). Actual duration depends entirely on target actor speed.

Input parameters

  • targetActorId (string, required) — Actor ID or username/actor-name slug to test
  • preset (string, optional) — Built-in test suite: scraper-smoke, api-actor, contact-scraper, ecommerce-quality, store-readiness
  • testCases (array, optional; default: 1 smoke test) — Array of test case objects with name, input, and quality checks
  • parameterizedTestCases (array, optional) — Templates with {{key}} placeholders plus parameter arrays that expand into concrete test cases
  • enableBaseline (boolean, optional; default: false) — Track field schema across runs and detect drift (new/missing fields, type changes, null-rate shifts)
  • timeout (integer, optional; default: 300) — Maximum seconds per test case run (30-3,600)
  • memory (integer, optional; default: 512) — Memory in MB for each test case run (128-32,768)
  • maxSampleItems (integer, optional; default: 1,000) — Items to analyse per test case (10-10,000). Increase for full scan mode on high-volume actors.

Assertion reference

  • minResults (number) — Dataset has at least N items
  • maxResults (number) — Dataset has at most N items
  • maxDuration (number) — Test case completes within N seconds
  • requiredFields (string[]) — At least one item has a non-null value for each field
  • fieldTypes (object) — All non-null values match the declared type (string, number, boolean, object, array)
  • noEmptyFields (string[]) — No items have null, undefined, empty string, or empty array for the listed fields
  • fieldPatterns (object) — All string values for each field match the given regex pattern
  • fieldRanges (object) — All numeric values fall within { min?, max? } bounds
  • uniqueFields (string[]) — No duplicate values across all items for each field

Input examples

Quick start with a preset (easiest way):

{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "preset": "contact-scraper"
}

Custom smoke test with quality checks:

{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "testCases": [
    {
      "name": "Contact scraper smoke test",
      "input": { "urls": ["https://acmecorp.com"], "maxPagesPerDomain": 3 },
      "assertions": {
        "minResults": 1,
        "requiredFields": ["email", "domain"],
        "fieldPatterns": { "email": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$" },
        "maxDuration": 120
      }
    }
  ]
}

Regex and range validation:

{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "testCases": [
    {
      "name": "Email format validation",
      "input": { "urls": ["https://acmecorp.com"], "maxPagesPerDomain": 5 },
      "assertions": {
        "minResults": 1,
        "fieldPatterns": { "domain": "^[a-z0-9.-]+\\.[a-z]{2,}$" },
        "noEmptyFields": ["url", "domain"],
        "uniqueFields": ["url"]
      }
    }
  ]
}

Parameterized test cases:

{
  "targetActorId": "ryanclinton/fred-economic-data",
  "parameterizedTestCases": [
    {
      "nameTemplate": "Series {{series}} returns data",
      "inputTemplate": { "seriesId": "{{series}}", "maxResults": 3 },
      "assertions": { "minResults": 1, "requiredFields": ["value", "date"] },
      "parameters": [
        { "series": "GDP" },
        { "series": "UNRATE" },
        { "series": "CPIAUCSL" }
      ]
    }
  ]
}

Schema contract with severity levels:

{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "testCases": [
    {
      "name": "Schema enforcement",
      "input": { "urls": ["https://acmecorp.com"], "maxPagesPerDomain": 3 },
      "assertions": {
        "minResults": 1,
        "requiredFields": ["email", "domain"],
        "severity": "critical"
      },
      "schemaContract": {
        "strict": true,
        "requiredFields": ["email", "domain", "url"],
        "optionalFields": ["phone", "name", "title"],
        "deprecatedFields": ["oldEmail"],
        "fieldTypes": { "email": "string", "domain": "string" }
      }
    }
  ]
}

Baseline drift detection (enable for scheduled runs):

{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "preset": "contact-scraper",
  "enableBaseline": true
}

Expected failure (xfail) for a known-broken scenario:

{
  "targetActorId": "ryanclinton/website-contact-scraper",
  "testCases": [
    {
      "name": "SPA rendering - known limitation",
      "input": { "urls": ["https://heavy-spa-site.com"], "maxPagesPerDomain": 1 },
      "assertions": { "minResults": 1, "requiredFields": ["emails"] },
      "expectedToFail": true
    }
  ]
}

Input tips

  • Start with defaults — The prefilled smoke test covers the basic flow. Modify the target actor and assertions from there.
  • Use parameterized tests for repetitive scenarios — Instead of 10 near-identical test cases, define one template with a {{variable}} placeholder and 10 parameter sets.
  • Mark known failures as xfail — Prevents known-broken tests from failing your CI/CD pipeline while keeping them visible in the report.
  • Set tight maxDuration for regression detection — If your actor typically runs in 10 seconds, set maxDuration to 30. Catches slowdowns before they become production issues.
  • Batch tests in one suite — 10 test cases in one suite ($2.50) is cheaper and faster than 10 single-case suites ($25.00).

Output example

{
  "actorName": "ryanclinton/fred-economic-data",
  "actorId": "ryanclinton/fred-economic-data",
  "totalTests": 4,
  "passed": 3,
  "failed": 1,
  "expectedFailures": 1,
  "totalDuration": 28.4,
  "results": [
    {
      "name": "GDP data - smoke test",
      "passed": true,
      "duration": 8.2,
      "resultCount": 3,
      "assertions": [
        { "assertion": "minResults >= 1", "passed": true, "expected": 1, "actual": 3 },
        { "assertion": "field 'seriesId' exists", "passed": true, "expected": "present", "actual": "present" },
        { "assertion": "field 'value' exists", "passed": true, "expected": "present", "actual": "present" },
        { "assertion": "field 'date' exists", "passed": true, "expected": "present", "actual": "present" },
        { "assertion": "duration <= 60s", "passed": true, "expected": "60s", "actual": "8.2s" }
      ]
    },
    {
      "name": "Field type validation",
      "passed": true,
      "duration": 7.5,
      "resultCount": 3,
      "assertions": [
        { "assertion": "field 'value' is number", "passed": true, "expected": "number", "actual": "number" },
        { "assertion": "field 'date' is string", "passed": true, "expected": "string", "actual": "string" }
      ]
    },
    {
      "name": "Invalid series - expected failure",
      "passed": true,
      "duration": 5.1,
      "resultCount": 0,
      "assertions": [
        { "assertion": "minResults >= 1", "passed": false, "expected": 1, "actual": 0 }
      ],
      "error": "Run status: FAILED",
      "expectedToFail": true,
      "wasExpectedFailure": true
    },
    {
      "name": "Uniqueness check",
      "passed": false,
      "duration": 7.6,
      "resultCount": 5,
      "assertions": [
        { "assertion": "field 'date' is unique", "passed": false, "expected": "all unique", "actual": "2 duplicates in 5 values" }
      ]
    }
  ],
  "testedAt": "2026-04-06T14:30:00.000Z"
}

Understanding the test report

Test statuses:

  • PASS — The target actor run succeeded and all assertions passed.
  • FAIL — Either the run failed or at least one assertion did not pass.
  • XFAIL — The test was marked expectedToFail: true and it did fail. This counts as a pass in the suite totals.
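
The status rules above can be stated as a small decision function. This is a sketch of the documented semantics, not the runner's actual code; the function name is illustrative:

```python
def suite_status(run_passed: bool, expected_to_fail: bool = False) -> str:
    """Map a test case outcome to its reported status, per the rules above."""
    if expected_to_fail:
        # An expected failure counts as a pass (XFAIL); an unexpected
        # pass is reported as FAIL, signalling that the fix has landed.
        return "XFAIL" if not run_passed else "FAIL"
    return "PASS" if run_passed else "FAIL"

print(suite_status(True))                          # PASS
print(suite_status(False))                         # FAIL
print(suite_status(False, expected_to_fail=True))  # XFAIL
```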

Assertion results: Each assertion includes passed, expected, and actual values. When a test fails, scan the assertions array for entries with passed: false to see exactly which check failed and why.

Error classification: When a test case throws an error (rather than returning wrong data), the failureType field categorizes it:

  • timeout — The target actor exceeded the time limit
  • rate-limited — The target actor hit a 429 response
  • api-error — Authentication or authorization failure (401, 403)
  • invalid-input — Missing or malformed input to Deploy Guard itself
  • internal-error — Unexpected failure in the test runner

Release decision: The releaseDecision object at the top of the report tells you whether to deploy:

  • pass — All quality checks passed, no drift detected. Safe to deploy.
  • warn — Warning-level failures or drift detected. Review before deploying.
  • block — Critical quality checks failed. Do not deploy. The confidenceScore (0-100%) reflects the pass rate. The reason field explains the decision in plain English.

Drift report: When enableBaseline is true, the drift object shows what changed vs the previous successful run:

  • newFields — Fields that appeared for the first time
  • missingFields — Fields that were present before but are gone now
  • typeChanges — Fields that changed type (e.g., string → number)
  • nullRateChanges — Fields where the null rate shifted by more than 5%
  • resultCountChange — Percentage change in total result count

Schema contract results: When a test case includes a schemaContract, the schemaContractResult object shows: missing required fields, unexpected new fields (in strict mode), deprecated fields still present, and type mismatches.

KV store SUMMARY: A condensed version of the report stored in the run's key-value store under the SUMMARY key. Contains release decision, drift summary, pass/fail counts, and a list of failed test names with their errors. Useful for quick programmatic checks without parsing the full dataset.
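
A minimal sketch of a CI gate built on the SUMMARY record. The exact field names inside SUMMARY are an assumption based on the report fields documented above, and `gate` is an illustrative helper, not part of the product:

```python
# Fetching the record needs apify-client (shown for context only):
#   from apify_client import ApifyClient
#   store = ApifyClient("YOUR_API_TOKEN").key_value_store(run["defaultKeyValueStoreId"])
#   summary = store.get_record("SUMMARY")["value"]

def gate(summary: dict, fail_on_warn: bool = False) -> int:
    """Turn a SUMMARY record into a CI exit code: 0 = proceed, 1 = stop."""
    status = summary.get("releaseDecision", {}).get("status", "block")
    if status == "pass":
        return 0
    if status == "warn":
        return 1 if fail_on_warn else 0
    return 1  # block: do not deploy

print(gate({"releaseDecision": {"status": "pass"}}))   # 0
print(gate({"releaseDecision": {"status": "block"}}))  # 1
```

Missing or malformed records default to block, so an empty SUMMARY never green-lights a deploy.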

Output fields

  • actorName (string) — Display name of the tested actor (username/name format)
  • actorId (string) — Actor ID or slug provided as input
  • totalTests (number) — Total test cases in the suite
  • passed (number) — Test cases that passed (includes xfail)
  • failed (number) — Test cases that failed (excludes xfail)
  • expectedFailures (number) — Test cases that failed as expected (xfail)
  • totalDuration (number) — Total wall-clock time in seconds for the entire suite
  • releaseDecision.status (string) — pass, warn, or block
  • releaseDecision.reason (string) — Plain-English explanation of the decision
  • releaseDecision.confidenceScore (number) — Pass rate as a percentage (0-100)
  • releaseDecision.criticalFailures (number) — Count of test cases with critical-severity failures
  • releaseDecision.warningFailures (number) — Count of test cases with warning-severity failures only
  • releaseDecision.driftDetected (boolean) — Whether baseline drift was detected
  • drift (object/null) — Drift report (null when baseline not enabled)
  • drift.newFields (array) — Fields present now but not in the previous baseline
  • drift.missingFields (array) — Fields in the previous baseline but missing now
  • drift.typeChanges (array) — Fields that changed type between runs
  • drift.nullRateChanges (array) — Fields with a null-rate shift >5%
  • drift.resultCountChange (string/null) — Percentage change in result count (e.g., "+34%")
  • results (array) — Per-test-case results (see below)
  • results[].name (string) — Test case name from input
  • results[].passed (boolean) — Whether the test case passed (true for xfail)
  • results[].duration (number) — Execution time for this test case in seconds
  • results[].resultCount (number) — Number of dataset items fetched (up to 1,000)
  • results[].assertions (array) — Assertion results: { assertion, passed, expected, actual }
  • results[].error (string, optional) — Error message if the run failed
  • results[].failureType (string, optional) — Error category: timeout, rate-limited, api-error, invalid-input, internal-error
  • results[].expectedToFail (boolean, optional) — Whether the test was marked as expected to fail
  • results[].wasExpectedFailure (boolean, optional) — Whether the test failed as expected

How much does it cost to run actor test suites?

Deploy Guard uses pay-per-event pricing — you pay $2.50 per test suite run. Platform compute costs are included in the PPE charge. Target actor runs are billed separately at their normal rates.

  • Quick test — 1 suite, $2.50 (1-3 test cases)
  • Weekly regression — 4 suites/month, $10.00/month (one suite per week)
  • Daily CI/CD — 30 suites/month, $75.00/month (one suite per day)
  • Fleet monitoring (10 actors) — 300 suites/month, $750.00/month (nightly per actor)
  • Enterprise fleet (50 actors) — 1,500 suites/month, $3,750.00/month (nightly per actor)

Target actor compute costs depend on the actor and input size. Keep test inputs small (3-5 results, minimal URLs) to minimize target actor charges.

Apify's free tier includes $5 of monthly credits, covering 2 suite runs (orchestration only). You can set a spending limit in your Apify account settings to cap total charges.
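
At a flat $2.50 per suite, orchestration cost scales linearly with run frequency. A quick sanity check (illustrative helper, not part of the product):

```python
PPE_PER_SUITE = 2.50  # orchestration only; target actor compute is billed separately

def monthly_cost(suites_per_month: int) -> float:
    """Monthly orchestration spend for a given suite cadence."""
    return suites_per_month * PPE_PER_SUITE

print(monthly_cost(30))   # 75.0  (one suite per day)
print(monthly_cost(300))  # 750.0 (nightly across a 10-actor fleet)
```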

Run actor test suites using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("ryanclinton/actor-test-runner").call(run_input={
    "targetActorId": "ryanclinton/fred-economic-data",
    "testCases": [
        {
            "name": "GDP smoke test",
            "input": {"seriesId": "GDP", "maxResults": 3},
            "assertions": {
                "minResults": 1,
                "requiredFields": ["seriesId", "value", "date"],
                "maxDuration": 60,
            },
        },
        {
            "name": "Field type check",
            "input": {"seriesId": "UNRATE", "maxResults": 5},
            "assertions": {
                "fieldTypes": {"value": "number", "date": "string"},
                "uniqueFields": ["date"],
            },
        },
    ],
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Suite: {item['passed']}/{item['totalTests']} passed in {item['totalDuration']}s")
    for tc in item["results"]:
        status = "XFAIL" if tc.get("wasExpectedFailure") else ("PASS" if tc["passed"] else "FAIL")
        print(f"  [{status}] {tc['name']} ({tc['duration']}s, {tc['resultCount']} results)")
        for a in tc["assertions"]:
            if not a["passed"]:
                print(f"    FAILED: {a['assertion']} (expected {a.get('expected')}, got {a.get('actual')})")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });
const run = await client.actor("ryanclinton/actor-test-runner").call({
  targetActorId: "ryanclinton/fred-economic-data",
  testCases: [
    {
      name: "GDP smoke test",
      input: { seriesId: "GDP", maxResults: 3 },
      assertions: {
        minResults: 1,
        requiredFields: ["seriesId", "value", "date"],
        maxDuration: 60,
      },
    },
    {
      name: "Field type check",
      input: { seriesId: "UNRATE", maxResults: 5 },
      assertions: {
        fieldTypes: { value: "number", date: "string" },
        uniqueFields: ["date"],
      },
    },
  ],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
const report = items[0];
console.log(`Suite: ${report.passed}/${report.totalTests} passed in ${report.totalDuration}s`);
for (const tc of report.results) {
  const status = tc.wasExpectedFailure ? "XFAIL" : tc.passed ? "PASS" : "FAIL";
  console.log(`  [${status}] ${tc.name} (${tc.duration}s, ${tc.resultCount} results)`);
}

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-test-runner/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "targetActorId": "ryanclinton/fred-economic-data",
    "testCases": [{
      "name": "GDP smoke test",
      "input": {"seriesId": "GDP", "maxResults": 3},
      "assertions": {"minResults": 1, "requiredFields": ["seriesId", "value", "date"]}
    }]
  }'

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

GitHub Actions example

- name: Run actor test suite
  run: |
    RESPONSE=$(curl -s -X POST \
      "https://api.apify.com/v2/acts/ryanclinton~actor-test-runner/runs?token=${{ secrets.APIFY_TOKEN }}&waitForFinish=300" \
      -H "Content-Type: application/json" \
      -d @test-suite.json)
    DATASET_ID=$(echo "$RESPONSE" | jq -r '.data.defaultDatasetId')
    REPORT=$(curl -s "https://api.apify.com/v2/datasets/$DATASET_ID/items?token=${{ secrets.APIFY_TOKEN }}&format=json")
    FAILED=$(echo "$REPORT" | jq '.[0].failed')
    if [ "$FAILED" -gt 0 ]; then
      echo "Test suite failed: $FAILED test(s) did not pass"
      echo "$REPORT" | jq '.[0].results[] | select(.passed == false)'
      exit 1
    fi

How Deploy Guard works

Mental model: Test cases → sequential Actor.call() → dataset fetch → assertion evaluation → structured report.

Test case expansion

Deploy Guard merges direct test cases and parameterized test cases into a single array. Parameterized cases are expanded by replacing {{key}} placeholders in nameTemplate and inputTemplate with each parameter set's values. Nested input objects are traversed recursively for placeholder replacement.
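
Placeholder expansion can be sketched as a recursive walk over the template. This is illustrative Python rather than the runner's source, and `expand` is a made-up helper name:

```python
import re

def expand(template, params):
    """Recursively replace {{key}} placeholders in strings, dicts, and lists."""
    if isinstance(template, str):
        # Unknown keys are left untouched rather than raising
        return re.sub(r"\{\{(\w+)\}\}",
                      lambda m: str(params.get(m.group(1), m.group(0))), template)
    if isinstance(template, dict):
        return {k: expand(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [expand(v, params) for v in template]
    return template  # numbers, booleans, None pass through unchanged

case = {"nameTemplate": "Series {{series}} returns data",
        "inputTemplate": {"seriesId": "{{series}}", "maxResults": 3}}
expanded = [{"name": expand(case["nameTemplate"], p),
             "input": expand(case["inputTemplate"], p)}
            for p in [{"series": "GDP"}, {"series": "UNRATE"}]]
print(expanded[0]["name"])  # Series GDP returns data
```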

Sequential execution with cost guardrail

Each test case triggers a real Actor.call() against the target actor. The runner waits for each call to complete before starting the next. A wall-clock timeout (configured timeout + 60 seconds buffer, minimum 5 minutes) protects against hung sub-actor runs via Promise.race. If 5 consecutive tests fail, the cost guardrail stops the suite early and skips remaining tests to prevent wasting compute on a broken actor.
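
The runner enforces this in JavaScript via Promise.race; the same pattern can be sketched in Python with a worker thread and a hard result timeout. The function names and use of threads here are illustrative, but the timeout formula and the fixed guardrail of 5 match the behaviour described above:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_suite(test_cases, run_one, timeout_s=300, guardrail=5):
    """Run test cases sequentially with a hard wall-clock timeout per case
    and an early stop after N consecutive failures."""
    wall_clock = max(timeout_s + 60, 300)  # configured timeout + 60s buffer, min 5 minutes
    results, consecutive_failures = [], 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        for case in test_cases:
            try:
                passed = pool.submit(run_one, case).result(timeout=wall_clock)
            except FutureTimeout:
                passed = False  # hung run counts as a failure
            results.append(passed)
            consecutive_failures = 0 if passed else consecutive_failures + 1
            if consecutive_failures >= guardrail:
                break  # cost guardrail: skip remaining tests on a broken actor
    return results

print(run_suite(list(range(7)), lambda c: False, timeout_s=30))
# [False, False, False, False, False] -> guardrail stopped the suite after 5
```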

Assertion evaluation

After each run completes, Deploy Guard fetches up to 1,000 items from the output dataset. All 9 quality check types are evaluated against these items. Each assertion produces a result with passed, expected, and actual fields. A test case passes only when all assertions pass AND the sub-actor run status is SUCCEEDED — unless the test is marked expectedToFail.
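
A few of these checks can be sketched against sampled items as follows. This is illustrative Python mirroring the documented assertion semantics and report shape, not the runner's actual implementation:

```python
def evaluate(items, assertions):
    """Evaluate a subset of the documented quality checks against sampled items."""
    results = []
    if "minResults" in assertions:
        n = assertions["minResults"]
        results.append({"assertion": f"minResults >= {n}", "passed": len(items) >= n,
                        "expected": n, "actual": len(items)})
    for field in assertions.get("requiredFields", []):
        present = any(item.get(field) is not None for item in items)
        results.append({"assertion": f"field '{field}' exists", "passed": present,
                        "expected": "present", "actual": "present" if present else "missing"})
    for field, expected_type in assertions.get("fieldTypes", {}).items():
        py_types = {"string": str, "number": (int, float), "boolean": bool,
                    "object": dict, "array": list}[expected_type]
        # Only non-null values are type-checked, per the assertion reference
        ok = all(isinstance(item[field], py_types)
                 for item in items if item.get(field) is not None)
        results.append({"assertion": f"field '{field}' is {expected_type}", "passed": ok,
                        "expected": expected_type,
                        "actual": expected_type if ok else "mismatch"})
    return results

items = [{"value": 3.2, "date": "2026-01-01"}, {"value": 3.3, "date": "2026-02-01"}]
checks = evaluate(items, {"minResults": 1, "requiredFields": ["date"],
                          "fieldTypes": {"value": "number"}})
print(all(r["passed"] for r in checks))  # True
```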

Report generation

The full report is pushed to the dataset. A condensed summary (pass/fail counts, failed test names, errors) is stored in the KV store under the SUMMARY key. The Apify status bar shows real-time progress (e.g., "Running test 3/8: Field validation") and the final summary (e.g., "Complete: 7/8 passed, 1 expected in 42.3s").

Tips for best results

  1. Keep test inputs minimal. Use maxResults: 3, single URLs, and small datasets. Tests should validate correctness, not exercise the actor at full scale. This minimizes target actor compute costs.

  2. Use parameterized tests for input variants. If you need to test the same assertions across 10 different inputs, define one parameterized template instead of 10 separate test cases. Easier to maintain and extend.

  3. Set maxDuration relative to observed baselines. Run your actor 3-5 times to establish a baseline, then set maxDuration to 2-3x that value. Too tight causes false failures; too loose misses regressions.

  4. Combine assertion types. A strong test case uses 3-4 assertion types: minResults for basic health, requiredFields for schema compliance, fieldTypes for type safety, and noEmptyFields for data quality.

  5. Schedule nightly suites for actor fleets. Use the Apify scheduler to run one suite per actor per night. At $2.50 per suite, nightly monitoring costs $2.50 per actor per day.

  6. Use xfail for known issues. Instead of removing broken tests, mark them expectedToFail: true. When the fix lands, the test unexpectedly passes, and Deploy Guard flags the status change.

  7. Parse the KV store SUMMARY for quick CI/CD checks. The SUMMARY key contains just the pass/fail counts and failed test names — faster to parse than the full dataset report.

Scheduling guidance

How often should you run test suites?

  • Pre-deploy validation (on every apify push) — Catch regressions before they reach production
  • Nightly regression (daily) — Detect upstream changes (site structure, API responses)
  • Post-fix verification (after each bug fix) — Confirm the fix works and no new regressions appeared
  • Fleet health monitoring (daily or weekly) — Catch silent failures across many actors
  • Pre-release smoke test (before major releases) — Final gate before publishing a new version

Combine with other Apify actors

  • CI/CD Release Gate — Use Deploy Guard for output validation and CI/CD Release Gate for build-triggered deployment gating. Run Deploy Guard first, pass results to the gate.
  • Actor Health Monitor — Deploy Guard validates output correctness; Actor Health Monitor tracks runtime failures and error rates. Use both for full coverage.
  • Output Completeness Monitor — Checks field completeness rates across large datasets. Use Deploy Guard for assertion-based validation and Output Completeness Monitor for statistical analysis.
  • Input Guard — Input Guard validates that input schemas reject invalid inputs; Deploy Guard validates that valid inputs produce correct outputs.
  • Website Contact Scraper — A common target for test suites. Define test cases validating email extraction, field types, and URL patterns.
  • B2B Lead Qualifier — Test scoring consistency by asserting field ranges on lead scores and required fields on qualification results.

The Guard Pipeline

Deploy Guard is part of a three-stage quality pipeline for Apify actors:

  • Before run — Input Guard prevents bad input from wasting runs and credits
  • Before deploy — Deploy Guard prevents broken builds from reaching production
  • After deploy — Output Guard prevents silent data failures in production

Which Guard do I need?

  • "My actor won't start or crashes immediately" → Input Guard
  • "I changed code — is it safe to deploy?" → Deploy Guard
  • "It runs fine but the data looks wrong" → Output Guard

Use all three together for full lifecycle coverage. Input Guard costs $0.15 per test, Deploy Guard $2.50 per suite, Output Guard $4.00 per check.

Shared state across Guards

All three Guards share a per-actor quality profile stored in a named KV store (aqp-{actorslug}). This enables:

  • Cross-stage history — each Guard appends results to a shared timeline, so you can see input validation, pre-deploy testing, and production monitoring in one view
  • Baselines — each stage stores its own baseline (confidence score, null rates, result counts) that the other stages and a future wrapper actor can read
  • Feedback loops — Deploy Guard automatically suggests Output Guard field rules when pre-deploy tests find flaky fields. Output Guard suggests new Deploy Guard assertions when critical fields degrade in production.
  • Unified field importance — pass fieldImportanceProfile to any Guard and it applies severity-appropriate checks for that stage

Common output interface

All three Guards return the same top-level fields for easy aggregation:

  • stage — "input", "deploy", or "output"
  • status — "pass", "warn", "block", or "fail"
  • score — 0-100 quality score
  • summary — one-sentence explanation
  • recommendations — actionable fix suggestions
  • signals — key metrics (errorCount, warningCount, criticalCount, driftDetected)
  • signals.metrics — stage-specific data (totalTests, passed, confidenceScore, etc.)

{
  "stage": "deploy",
  "status": "warn",
  "score": 78,
  "summary": "One warning-level drift issue detected.",
  "recommendations": ["Add critical pre-deploy noEmptyFields assertion for email."],
  "signals": {
    "errorCount": 1,
    "warningCount": 1,
    "criticalCount": 0,
    "driftDetected": true,
    "metrics": { "totalTests": 5, "passed": 4, "confidenceScore": 78 }
  }
}

Status semantics across Guards

  • pass — Acceptable, no action required. Input Guard: input valid. Deploy Guard: safe to deploy. Output Guard: production data healthy.
  • warn — Usable but degraded; review. Deploy Guard: soft regressions detected. Output Guard: data quality declining.
  • block — Do not proceed. Input Guard: don't run, input invalid. Deploy Guard: don't ship, critical failures.
  • fail — Live failure, unacceptable. Output Guard: production data is bad.

Input Guard returns pass or block. Deploy Guard returns pass, warn, or block. Output Guard returns pass, warn, or fail. The wrapper uses strict precedence: any block → overall block, any fail → overall fail, any warn → overall warn.
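
The precedence rule is simple enough to state as code. This is a sketch of the documented rule; `aggregate` is an illustrative name, not the wrapper's API:

```python
def aggregate(statuses):
    """Strict precedence across Guard stages: block > fail > warn > pass."""
    for level in ("block", "fail", "warn"):
        if level in statuses:
            return level
    return "pass"

print(aggregate(["pass", "warn", "pass"]))  # warn
```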

Limitations

  • Sequential execution only. Deploy Guard runs one test case at a time. Suites with many slow test cases take proportionally longer. For faster turnaround on large suites, split into multiple parallel suite runs via the API.
  • Default 1,000-item sample. Quality checks are evaluated against the first 1,000 dataset items by default. Set maxSampleItems up to 10,000 for full scan mode on high-volume actors.
  • Drift is field-level, not record-level. Baseline drift detection tracks field presence, types, and null rates — not individual record values. Use CI/CD Release Gate for golden dataset record-by-record comparison.
  • Target actor compute is billed separately. The $2.50 PPE covers orchestration only. Each test case triggers a full Actor.call(), billed at the target actor's normal rate.
  • No parallel test execution. By design, tests run sequentially to avoid overwhelming the target actor. This is a tradeoff for consistency over speed.
  • Cost guardrail is not configurable. The 5-consecutive-failure threshold is hardcoded. Cannot be adjusted per suite.
  • Wall-clock timeout has a 60-second buffer. The actual timeout per test case is configured timeout + 60 seconds (minimum 5 minutes), which may be longer than expected for very short timeouts.

Integrations

  • Apify API — Trigger test suites programmatically from any language or CI/CD system
  • Webhooks — Get notified when suites complete or fail for automated alerting
  • Zapier — Route test results to Slack, email, or project management tools
  • Make — Build automation flows triggered by suite failures
  • Google Sheets — Log suite results to a spreadsheet for trend tracking
  • GitHub Actions / GitLab CI — Gate deployments on test suite results using the cURL or Python API

Troubleshooting

Test case fails with "Run status: TIMED-OUT" but the actor works manually. The default timeout is 300 seconds. If your actor needs more time, increase the timeout parameter. Also check the memory setting — undersized memory causes slower execution and timeouts.

All tests fail with "api-error" classification. This usually means the target actor ID is wrong, or you don't have access to run it. Verify the actor ID in the Apify Console and check that your account has permissions.

Suite stops after 5 tests. The cost guardrail triggered after 5 consecutive failures. This is intentional — it prevents wasting compute on a broken actor. Fix the underlying issue and re-run.

Assertion "field not found" on a field that exists. Deploy Guard samples the first 1,000 items. If the field only appears in items beyond that limit, it won't be found. Also check for exact field name spelling including case sensitivity.

Parameterized tests show "No test cases provided" error. Ensure the parameterizedTestCases array is properly formatted with nameTemplate, inputTemplate, assertions, and parameters fields. The parameters array must contain at least one object.

How to automate Apify actor testing

Use Deploy Guard to automate actor testing. It runs test suites via the Apify API, validates output with 9 quality check types, detects schema drift, and returns a release decision. Trigger it from GitHub Actions, GitLab CI, the Apify scheduler, or any HTTP client. The structured JSON report, HTML report, and GitHub Actions markdown summary are designed for direct integration — no custom parsing required.

How to validate Apify actor output

Provide Deploy Guard with a target actor ID and test cases. Each test case defines an input and a set of quality checks: result counts, required fields, field types, regex patterns, numeric ranges, uniqueness, and empty-value detection. The actor executes each test in the real cloud environment and returns per-check pass/fail results with failure forensics showing exactly which item and field failed.

How to catch silent actor failures

Silent failures happen when an actor runs successfully but returns wrong or incomplete data. The engine catches these by validating output fields, detecting schema drift across runs, and flagging null-rate increases. Enable baseline tracking to compare each run against the last known-good output. The release decision blocks deployment when critical quality checks fail.

Automated deployment decisions for actors

Deploy Guard turns testing into a deployment decision.

Instead of asking "did the test pass?", it answers:

  • Is this actor safe to deploy?
  • Is data quality degrading?
  • Should this release be blocked?

Every run produces a release decision (pass, warn, block) with a confidence score and explanation. This replaces manual judgment with an automated, repeatable deployment decision.

A passing run does not mean correct data. The engine verifies both.

A successful run does not guarantee correct data — validation does. Deploy Guard answers the only question that matters: is this safe to ship?

What is a release decision?

A release decision is a structured output that tells you whether an actor is safe to deploy. Deploy Guard returns one of three states:

  • pass — safe to deploy
  • warn — review before deploying (soft regressions or drift detected)
  • block — do not deploy (critical quality checks failed)

Each decision includes a confidence score (0-100%), a risk level (low/elevated/high/critical), a plain-English explanation, root cause classification, prioritised failures, and fix suggestions. This allows both machines and humans to act on the result — CI/CD pipelines read the status field; developers read the explanation.

Replace manual testing with automated release decisions. Know if your actor is safe to deploy — instantly.

Tools for testing Apify actors

Deploy Guard is a purpose-built tool for automated actor testing, output validation, regression detection, and deployment gating. It is the only tool on the Apify Store that combines quality checks, schema contracts, baseline drift detection, and release decisions in a single actor.

Best tool for testing Apify actors

Deploy Guard is the most comprehensive testing and reliability tool available for Apify actors. It combines real cloud execution, 9 quality check types, schema contracts, baseline drift detection, root cause classification, failure forensics, and CI/CD-native reports in a single actor. No other tool on the Apify Store provides release decisions with confidence scoring and regression intelligence.

Automated actor testing should answer one question: is it safe to ship? That is exactly what Deploy Guard is built to decide.

FAQ

How many test cases can I run in one suite? Deploy Guard accepts up to 50 direct test cases plus additional cases generated from parameterized templates. Each case runs sequentially, so suite duration scales linearly with the number of cases and target actor speed.

Does Deploy Guard run the target actor on my account? Yes. Each test case triggers a real Actor.call() under your Apify account. The target actor's compute is billed to you at normal rates. The $2.50 PPE covers Deploy Guard's orchestration only.

What is the difference between expectedToFail and a test that fails? A test marked expectedToFail: true that fails counts as XFAIL (expected failure) and is treated as a pass in suite totals. A regular test that fails counts against the suite. Use xfail for known-broken scenarios you want to track without blocking CI/CD.
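A minimal xfail test case could be configured like this (the expectedToFail flag is from the documentation above; the surrounding key names such as testCases, name, and input are illustrative):

```json
{
  "testCases": [
    {
      "name": "rejects empty startUrls (known issue #42 style tracking)",
      "input": { "startUrls": [] },
      "expectedToFail": true
    }
  ]
}
```

If this case fails, it is recorded as XFAIL and does not count against the suite; if it unexpectedly passes, that itself is a signal the underlying issue may be fixed.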

Can I test actors owned by other users? Yes, if the actor is public or you have been granted access. Deploy Guard calls the target actor using your API token, so standard Apify access rules apply.

How does Deploy Guard handle actors that produce no output? A run that succeeds with 0 dataset items passes assertion checks that don't require items (e.g., maxDuration). Assertions like minResults or requiredFields will fail if the dataset is empty.
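A sketch of a test case combining item-dependent and item-independent assertions (minResults, requiredFields, and maxDuration are named in this FAQ; the exact nesting and key names are illustrative):

```json
{
  "testCases": [
    {
      "name": "smoke test with output checks",
      "input": { "maxItems": 10 },
      "assertions": {
        "maxDuration": 120,
        "minResults": 1,
        "requiredFields": ["url", "title"]
      }
    }
  ]
}
```

Against an empty dataset, maxDuration can still pass, while minResults and requiredFields will fail.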

Can Deploy Guard validate error responses? Partially. You can mark test cases with expectedToFail: true to assert that certain inputs cause failures. The failureType field classifies the error. However, Deploy Guard does not inspect the target actor's logs or error output — only the dataset and run status.

Is it possible to run test suites for Apify actors on a schedule? Yes. Use the Apify scheduler to trigger Deploy Guard at any interval — daily, weekly, or custom cron expressions. Combine with webhooks to get alerted when suites fail.

How does the cost guardrail work? After 5 consecutive test case failures (excluding xfail), Deploy Guard stops the suite and skips remaining tests. This prevents wasting compute when the target actor is fundamentally broken. The threshold is fixed at 5 and cannot be configured.

What regex syntax does fieldPatterns support? JavaScript RegExp syntax. Patterns are compiled with new RegExp(pattern). Common patterns: ^https?:// for URLs, ^[\\w.+-]+@[\\w-]+\\.[a-z]{2,}$ for emails, ^[A-Z]{2,3}-\\d+$ for ID prefixes.
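The compilation step described above can be sketched in plain JavaScript. The patterns below are written exactly as they would appear in JSON input (hence the double backslashes); the checkItem helper is illustrative, not Deploy Guard's actual implementation:

```javascript
// Patterns as they would appear in JSON input, compiled with new RegExp(pattern).
const fieldPatterns = {
  url: "^https?://",
  email: "^[\\w.+-]+@[\\w-]+\\.[a-z]{2,}$",
  sku: "^[A-Z]{2,3}-\\d+$",
};

// Return the list of fields on one dataset item that violate their pattern.
function checkItem(item, patterns) {
  const failures = [];
  for (const [field, pattern] of Object.entries(patterns)) {
    const re = new RegExp(pattern);
    if (!re.test(String(item[field] ?? ""))) {
      failures.push(field);
    }
  }
  return failures;
}

const good = { url: "https://example.com", email: "dev@example.com", sku: "AB-123" };
console.log(checkItem(good, fieldPatterns)); // []

const bad = { url: "ftp://x", email: "not-an-email", sku: "ab1" };
console.log(checkItem(bad, fieldPatterns)); // [ 'url', 'email', 'sku' ]
```

Because patterns live in a JSON document, every backslash in the regex must be escaped once for JSON, which is why the README shows `\\w` rather than `\w`.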

Can I use Deploy Guard as an alternative to writing custom test scripts? Yes. Deploy Guard replaces custom Actor.call() orchestration, assertion logic, and report generation with a declarative configuration. For most actor validation needs, it is faster to set up than a custom script and produces a standardized report format.

How do I check test results from a CI/CD pipeline? Fetch the dataset items from the Apify API after the run completes. Check the failed field in the report — if it is greater than 0, fail the pipeline. The GitHub Actions example in this README shows the full workflow.
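The gating logic described above is small enough to show as a sketch. The failed field is from this FAQ; the releaseDecision shape is an assumption, so adapt the field access to the report your version actually emits. Fetching the report from the Apify API is omitted here:

```javascript
// Decide whether a CI/CD pipeline should fail, given a Deploy Guard report
// object already fetched from the run's dataset.
function shouldFailPipeline(report) {
  // Hard gate: any failed test (excluding xfail, which is not counted) fails the build.
  if (report.failed > 0) return true;
  // Optional gate on the release decision, if the report includes one
  // (field name assumed for illustration).
  if (report.releaseDecision && report.releaseDecision.status === "block") return true;
  return false;
}

console.log(shouldFailPipeline({ passed: 4, failed: 1 })); // true
console.log(shouldFailPipeline({ passed: 5, failed: 0 })); // false
```

In a GitHub Actions step, exiting with a non-zero code when this function returns true is enough to fail the job.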

Is it legal to run automated tests against Apify actors? Deploy Guard calls actors through the official Apify API under your account. It does not bypass any access controls. Legality depends on your use of the target actor and the data it processes, not on the test runner itself.

Recent updates

  • Renamed to Deploy Guard — reflects the product's evolution from test runner to reliability platform.
  • Structured action items — Typed, prioritised actions from every failure for Jira/Linear/Slack automation.
  • Auto-detect actor type — Classifies your actor and suggests the best preset.
  • Opinionated presets — Now include schema contracts, uniqueness checks, and performance guardrails.
  • Trust trend tracking — Confidence history across runs, classified as improving/stable/degrading.
  • Regression velocity — Detects accelerating drift (result count dropping faster each run).
  • Early warning signals — Proactive detection of duration creep, null-rate increases, confidence decline.
  • Assertion blind spots — Identifies missing quality check types and suggests additions.
  • Suite health score — Rates your test suite itself (0-100) based on assertion diversity and coverage.
  • Confidence-adjusted risk — Combines status + confidence into risk level with actionable recommendation.
  • Confidence transparency — Shows which signals matter most/least in the confidence score.
  • Run history — Last 20 run snapshots for trend visibility.
  • Root cause classification — When tests fail, the actor infers why: selector-breakage, API schema change, rate-limiting, pagination failure, etc. with confidence score and supporting signals.
  • Failure prioritisation — Failed checks ranked by impact (high/medium/low) based on % of items affected.
  • Plain English explanation — One-sentence summary for Slack/PR comments: "2 of 5 tests failed: result count dropped 93%, likely caused by a website structure change."
  • Fix suggestions — Actionable next steps based on root cause and drift signals.
  • Multi-factor confidence model — 5 weighted factors (pass rate, consistency, drift stability, sample size, signal clarity) instead of simple pass rate.
  • Drift significance scoring — Each drift signal scored critical/warning/low by magnitude. Not all drift is equal.
  • "What changed" summary — Plain-English bullet list of differences from last passing run.
  • Flakiness detection — Tests with 10-90% pass rate over 3+ runs flagged as flaky.
  • Auto-tuning hints — Suggests optimal maxDuration based on 2.5x baseline duration.
  • Canary mode — New preset: single fast test, pre-push confidence in <10 seconds.
  • Release decision — pass/warn/block with root cause, prioritised failures, suggestions, and explanation.
  • Baseline drift detection — Field schema tracking across runs with significance scoring.
  • Schema contracts — Required/optional/deprecated fields, strict mode, type enforcement.
  • Severity levels — Critical (blocks release) vs warning (flags but passes).
  • 6 built-in presets — canary, scraper-smoke, api-actor, contact-scraper, ecommerce-quality, store-readiness.
  • Failure forensics — Every failed quality check shows the first offending item, sample expected vs actual, top duplicates, and observed min/max.
  • CI/CD-native reports — HTML report and GitHub Actions markdown summary saved to KV store alongside JSON.
  • Full scan mode — Set maxSampleItems up to 10,000 for high-volume actors (default 1,000).
  • 9 quality check types — fieldPatterns (regex), fieldRanges (numeric min/max), uniqueFields (duplicate detection), plus the original 6.
  • Parameterized test cases — Template expansion with {{key}} placeholders.
  • Known issue allowance (xfail) — Known-broken tests don't fail the suite.
  • Cost guardrail — Stops after 5 consecutive failures.

Responsible use

  • Deploy Guard executes target actors through the official Apify API under your account. It does not bypass authentication, access controls, or rate limits.
  • Users are responsible for ensuring their target actor usage complies with applicable laws and platform terms, including data protection regulations.
  • Do not use Deploy Guard to repeatedly trigger actors in ways that violate Apify's terms of service or overwhelm shared infrastructure.
  • For guidance on responsible actor usage, see Apify's documentation.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.