Actor Health Monitor — Failures, Trends & Revenue

Pricing

$50.00 / 1,000 health checks

Actor Health Monitor. Available on the Apify Store with pay-per-event pricing.

Developer

ryan clinton

Maintained by Community

10 days ago

Last modified

Actor Health Monitor

Monitor your entire Apify actor fleet from a single run — get failure diagnosis, trend tracking, revenue impact estimates, and actionable fix recommendations for every actor that has issues. Instead of manually clicking through run logs in the Console, this actor checks all your actors at once, reads failed run logs, categorizes the root cause, and tells you exactly what to fix.

Why use this over checking the Apify Console manually? The Console shows you that a run failed. This actor tells you why it failed, whether it's getting worse, how much money you're losing, and what to do about it. One run covers your entire fleet — 10 actors or 500.

Features

  • Automatic failure diagnosis — Reads the last 500 characters of each failed run's log and categorizes the root cause: timeout, schema mismatch, credential error, memory exceeded, input validation, API down, or unknown
  • Trend tracking — Compares current failure rate against the previous period of equal length to detect whether issues are improving, worsening, stable, or new
  • Revenue impact estimation — Calculates estimated revenue loss using your PPE pricing and average results per successful run
  • Actionable recommendations — Generates specific fix suggestions per actor based on failure categories and severity
  • Webhook alerts — Optionally POST a JSON alert summary to Slack, Discord, Teams, Zapier, or any webhook endpoint when actors exceed your failure threshold
  • Fleet-wide summary — Single summary record with total actors, failure rate, top issue, and total revenue impact
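
The trend comparison described above can be sketched in a few lines. This is only an illustration, not the actor's actual code: the function names and the `tolerance` dead band are hypothetical, since the source does not say how close two rates must be to count as stable.

```python
def failure_rate(failed_runs: int, total_runs: int) -> float:
    """Return failed/total, or 0.0 when the period has no runs."""
    return failed_runs / total_runs if total_runs else 0.0


def classify_trend(current_rate: float, previous_rate: float,
                   tolerance: float = 0.01) -> str:
    """Compare two equal-length periods and label the change.

    `tolerance` is an assumed dead band for "stable"; the real
    threshold used by the actor is not documented.
    """
    if previous_rate == 0 and current_rate > 0:
        return "new_issue"   # failures appeared where there were none
    if abs(current_rate - previous_rate) <= tolerance:
        return "stable"
    return "improving" if current_rate < previous_rate else "worsening"


# 3 of 50 runs failed now, 6 of 50 failed in the previous period
trend = classify_trend(failure_rate(3, 50), failure_rate(6, 50))
print(trend)
```

With these inputs the rates are 6% and 12%, which matches the "6% now vs 12% previous period" wording used in the output example further down.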

Use cases

Daily fleet monitoring

Schedule this actor to run every morning. Get a health report of your entire fleet delivered to Slack. Catch problems before your users report them.

Post-deployment validation

Run after pushing updates to verify nothing broke. Compare failure rates before and after deployment by adjusting the hoursBack window.

Revenue protection

If you run PPE-priced actors, failures mean lost revenue. This actor estimates how much money failed runs cost you, so you can prioritize fixes by business impact.
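
As a rough illustration, the estimate uses the formula given under "How it works" (failedRuns * avgResultsPerRun * pricePerResult). The sketch below applies it with made-up numbers: 30 results per run and $0.005 per result are assumptions chosen for the example, not values taken from the actor.

```python
def estimate_revenue_loss(failed_runs: int,
                          avg_results_per_run: float,
                          price_per_result: float) -> float:
    """failedRuns * avgResultsPerRun * pricePerResult, rounded to cents."""
    return round(failed_runs * avg_results_per_run * price_per_result, 2)


# Hypothetical actor: 3 failed runs, ~30 results per successful run,
# charged at $0.005 per result (all three values are illustrative)
loss = estimate_revenue_loss(failed_runs=3,
                             avg_results_per_run=30,
                             price_per_result=0.005)
print(f"Estimated loss: ${loss:.2f}")
```

Because the average is taken over successful runs, the figure is an approximation of what the failed runs would have earned.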

SLA monitoring

Set a low failure threshold (e.g., 5%) and connect a webhook to your alerting system. Each scheduled run then notifies you when any actor's failure rate exceeds that threshold.

Multi-actor debugging

When you have hundreds of actors, finding the one that's failing is needle-in-a-haystack. This actor surfaces all problems in one dataset, sorted by severity.

Input

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| apiToken | String (secret) | Your Apify API token (required) | |
| hoursBack | Integer (1-720) | How far back to check run history | 24 |
| webhookUrl | String | URL to POST alert summary to | |
| failureThreshold | Number (0-1) | Alert if failure rate exceeds this (0.1 = 10%) | 0.1 |

Example input

{
    "apiToken": "apify_api_YOUR_TOKEN_HERE",
    "hoursBack": 24,
    "webhookUrl": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
    "failureThreshold": 0.1
}

Output

Each actor with failures produces one record in the dataset. The final record is always a fleet summary.

Actor report example

{
    "actorName": "website-contact-scraper",
    "actorId": "BCq991ez5HObhS5n0",
    "status": "WARNING",
    "failedRuns": 3,
    "totalRuns": 50,
    "failureRate": 0.06,
    "trend": "improving",
    "trendDetail": "6% now vs 12% previous period",
    "failureCategories": {
        "timeout": 2,
        "schema_mismatch": 1
    },
    "estimatedRevenueLoss": 0.45,
    "latestFailureLog": "Error: pushData validation failed — field 'emails' expected array but got string...",
    "recommendations": [
        "2 runs timed out — consider increasing timeoutSecs in run configuration or reducing workload per run",
        "1 schema mismatch — check that pushData output matches your dataset schema, or validate with apifyforge-validate before pushing"
    ],
    "checkedAt": "2026-03-18T14:30:00.000Z"
}

Fleet summary example

{
    "type": "summary",
    "totalActors": 294,
    "actorsWithIssues": 5,
    "overallFailureRate": 0.03,
    "estimatedTotalRevenueLoss": 2.30,
    "topIssue": "timeout (affects 3 actors)",
    "alertSent": true
}

Output fields — Actor report

| Field | Type | Description |
| --- | --- | --- |
| actorName | String | Actor name |
| actorId | String | Apify actor ID |
| status | String | HEALTHY, WARNING, or CRITICAL |
| failedRuns | Integer | Number of failed runs in the period |
| totalRuns | Integer | Total runs in the period |
| failureRate | Number | Failed/total ratio (0.06 = 6%) |
| trend | String | improving, worsening, stable, or new_issue |
| trendDetail | String | Human-readable trend comparison |
| failureCategories | Object | Count of failures by diagnosed category |
| estimatedRevenueLoss | Number | Estimated USD lost from failed runs |
| latestFailureLog | String | Last 500 characters of most recent failed run's log |
| recommendations | Array | Actionable fix suggestions |
| checkedAt | String | ISO timestamp |

Output fields — Fleet summary

| Field | Type | Description |
| --- | --- | --- |
| type | String | Always "summary" |
| totalActors | Integer | Total actors in your account |
| actorsWithIssues | Integer | Actors with at least one failure |
| overallFailureRate | Number | Fleet-wide failure ratio |
| estimatedTotalRevenueLoss | Number | Total estimated USD lost |
| topIssue | String | Most common failure category |
| alertSent | Boolean | Whether webhook alert was sent |

How to use the API

Python

from apify_client import ApifyClient

client = ApifyClient(token="YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/actor-health-monitor").call(
    run_input={
        "apiToken": "YOUR_API_TOKEN",
        "hoursBack": 24,
        "failureThreshold": 0.1,
    }
)

# The dataset holds one record per actor with failures, plus a fleet summary
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "summary":
        print(f"Fleet: {item['totalActors']} actors, {item['actorsWithIssues']} with issues")
        print(f"Revenue impact: ${item['estimatedTotalRevenueLoss']:.2f}")
    else:
        print(f"[{item['status']}] {item['actorName']}: {item['failureRate']*100:.0f}% failures")
        for rec in item.get("recommendations", []):
            print(f" → {rec}")

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('ryanclinton/actor-health-monitor').call({
    apiToken: 'YOUR_API_TOKEN',
    hoursBack: 24,
    failureThreshold: 0.1,
});

// The dataset holds one record per actor with failures, plus a fleet summary
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const summary = items.find(i => i.type === 'summary');
console.log(`Fleet: ${summary.totalActors} actors, ${summary.actorsWithIssues} with issues`);

const actorReports = items.filter(i => i.type !== 'summary');
actorReports.forEach(report => {
    console.log(`[${report.status}] ${report.actorName}: ${(report.failureRate * 100).toFixed(0)}% failure rate`);
    report.recommendations.forEach(rec => console.log(` → ${rec}`));
});

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-health-monitor/runs?token=YOUR_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "apiToken": "YOUR_API_TOKEN",
        "hoursBack": 24,
        "failureThreshold": 0.1
    }'

How it works

  1. Fetches all your actors via the Apify API using your token
  2. Pulls recent run history for each actor (up to 100 most recent runs)
  3. Filters to your time window — only considers runs started within hoursBack
  4. For actors with failures, reads the last 500 characters of each failed run's log (up to 10 per actor to stay within rate limits)
  5. Diagnoses each failure by pattern-matching the log text against known error signatures:
    • pushData, schema, validation → schema_mismatch
    • timeout, TIMED-OUT → timeout
    • 401, 403, credential, API key → credential_error
    • ENOMEM, heap out of memory → memory_exceeded
    • required, missing, invalid input → input_validation
    • ECONNREFUSED, 503, socket hang up → api_down
    • Everything else → unknown
  6. Computes trends by comparing current period failure rate to the previous period of equal length
  7. Estimates revenue impact using your PPE pricing: failedRuns * avgResultsPerRun * pricePerResult
  8. Generates recommendations based on failure categories and severity
  9. Sends webhook alert if configured and any actor exceeds the failure threshold
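
Step 5 can be sketched as a small lookup over the signatures listed above. This is an illustration under stated assumptions rather than the actor's source: the case-insensitive substring match and the first-match-wins order (same order as the list) are guesses, and `diagnose` and `SIGNATURES` are hypothetical names.

```python
from collections import Counter

# Signatures copied from the list above. Checking them in this order and
# matching case-insensitively are assumptions of this sketch.
SIGNATURES = [
    ("schema_mismatch", ("pushdata", "schema", "validation")),
    ("timeout", ("timeout", "timed-out")),
    ("credential_error", ("401", "403", "credential", "api key")),
    ("memory_exceeded", ("enomem", "heap out of memory")),
    ("input_validation", ("required", "missing", "invalid input")),
    ("api_down", ("econnrefused", "503", "socket hang up")),
]


def diagnose(log_text: str, tail_chars: int = 500) -> str:
    """Categorize a failed run from the last `tail_chars` of its log."""
    tail = log_text[-tail_chars:].lower()
    for category, needles in SIGNATURES:
        if any(needle in tail for needle in needles):
            return category
    return "unknown"


# Illustrative log tails (not real run logs)
logs = [
    "Error: pushData validation failed",
    "Actor run TIMED-OUT",
    "Request timeout after 30000 ms",
]
failure_categories = Counter(diagnose(log) for log in logs)
print(dict(failure_categories))
```

Because only the log tail is inspected, an error that appears earlier than the last 500 characters falls through to "unknown", which is the heuristic limitation noted under Limitations.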

Failure categories explained

| Category | What it means | Common causes |
| --- | --- | --- |
| timeout | Run exceeded time limit | Too much data, slow API, infinite loop |
| schema_mismatch | Output data doesn't match schema | Code change broke pushData format |
| credential_error | Authentication failed | Expired API key, wrong token |
| memory_exceeded | Ran out of RAM | Large datasets in memory, memory leak |
| input_validation | Missing or invalid input fields | Caller didn't provide required fields |
| api_down | External service unreachable | Third-party API outage, rate limiting |
| unknown | Couldn't diagnose from log | Check full logs in Apify Console |

Integrations

Slack alerts

Use a Slack incoming webhook URL as the webhookUrl parameter. You'll get a JSON payload posted to your channel whenever actors exceed the failure threshold.

Scheduled monitoring

Use Apify Schedules to run this actor every hour, every morning, or on any cron schedule. Combined with webhook alerts, this gives you continuous fleet monitoring.

Zapier / Make / n8n

Connect the webhook to any automation platform to trigger workflows when actors fail — create Jira tickets, send emails, page on-call engineers.

Limitations

  • Log analysis is heuristic — Failure diagnosis is based on pattern matching against the last 500 characters of run logs. Some failures may be categorized as "unknown" if the error message doesn't match known patterns.
  • Revenue estimates are approximate — Based on PPE pricing and average results per run. Actual revenue impact depends on which specific runs failed and their expected output volume.
  • Rate limits — Fetches logs for up to 10 failed runs per actor to stay within Apify API rate limits. Actors with more than 10 failures in the period will have some failures categorized as "unknown".
  • 100 run limit per actor — Only checks the 100 most recent runs per actor. If an actor runs more than 100 times in your hoursBack window, older runs won't be included.
  • No historical storage — Each run produces a fresh report. For historical tracking, export results to a database or use Apify datasets with named storage.
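
One way to keep history, assuming you already have the dataset items in memory, is to append each run's actor reports to a small SQLite table. The table name, column choice, and `save_reports` helper below are hypothetical, and the sample record reuses the values from the actor report example.

```python
import sqlite3


def save_reports(conn: sqlite3.Connection, items: list) -> int:
    """Store actor reports (not the fleet summary); return rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actor_health (
               checked_at TEXT, actor_id TEXT, actor_name TEXT,
               failed_runs INTEGER, total_runs INTEGER, failure_rate REAL,
               PRIMARY KEY (checked_at, actor_id))"""
    )
    rows = [
        (i["checkedAt"], i["actorId"], i["actorName"],
         i["failedRuns"], i["totalRuns"], i["failureRate"])
        for i in items
        if i.get("type") != "summary"
    ]
    # Re-importing the same report overwrites the row instead of duplicating it
    conn.executemany(
        "INSERT OR REPLACE INTO actor_health VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# In-memory database for the example; use a file path to persist history
conn = sqlite3.connect(":memory:")
items = [
    {"actorName": "website-contact-scraper", "actorId": "BCq991ez5HObhS5n0",
     "failedRuns": 3, "totalRuns": 50, "failureRate": 0.06,
     "checkedAt": "2026-03-18T14:30:00.000Z"},
    {"type": "summary", "totalActors": 294},
]
saved = save_reports(conn, items)
print(f"Saved {saved} report(s)")
```

Querying the table by actor_id and ordering by checked_at then gives a longer failure-rate history than the single previous-period comparison in each report.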

FAQ

How is this different from the Apify Console monitoring?

The Console shows individual run statuses. This actor aggregates across your entire fleet, diagnoses why runs failed by reading logs, tracks whether issues are improving or worsening, estimates revenue impact, and generates actionable recommendations. It's the difference between a dashboard and a health report.

Will this work with actors I don't own?

No. It uses your API token to fetch actors and run logs, so it only sees actors in your account. It cannot access other users' actors or shared organization actors unless your token has access.

How much does it cost to run?

$0.05 per health check. One run covers your entire fleet regardless of size. A daily check costs about $1.50/month.

Can I monitor specific actors only?

Currently, it monitors all actors in your account. You can filter the output dataset to focus on specific actors. A future version may add actor name/ID filters.
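
Until such a filter exists, narrowing the output on the client side is straightforward. A minimal sketch, assuming the items have already been fetched from the dataset; `select_actors` is a hypothetical helper and "price-tracker" is a made-up actor name.

```python
def select_actors(items: list, actor_names: list) -> list:
    """Keep only reports whose actorName is in `actor_names`.

    The fleet summary has no actorName, so it is dropped as well.
    """
    wanted = set(actor_names)
    return [item for item in items if item.get("actorName") in wanted]


# Illustrative dataset contents
items = [
    {"actorName": "website-contact-scraper", "failureRate": 0.06},
    {"actorName": "price-tracker", "failureRate": 0.25},
    {"type": "summary", "totalActors": 294},
]
selected = select_actors(items, ["website-contact-scraper"])
print([item["actorName"] for item in selected])
```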

What if an actor has zero runs in the period?

It's silently skipped. The fleet summary counts it in totalActors but it won't appear in the individual reports.

Does the webhook send on every run?

Only when at least one actor exceeds your failureThreshold. If everything is healthy, no webhook is sent.
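
The alert condition can be expressed as a simple check over the actor reports. This is a sketch of the rule as described, not the actor's code: the helper names are hypothetical, and the strict "greater than" comparison is inferred from the word "exceeds".

```python
def actors_over_threshold(items: list, failure_threshold: float = 0.1) -> list:
    """Return actor reports whose failureRate exceeds the threshold."""
    return [
        item for item in items
        if item.get("type") != "summary"
        and item.get("failureRate", 0) > failure_threshold
    ]


def should_alert(items: list, failure_threshold: float = 0.1) -> bool:
    """True when at least one actor is over the threshold."""
    return bool(actors_over_threshold(items, failure_threshold))


# An actor at 6% failures: quiet at the default 10%, alerts at 5%
items = [{"actorName": "website-contact-scraper", "failureRate": 0.06}]
print(should_alert(items, 0.1), should_alert(items, 0.05))
```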

Pricing

  • $0.05 per health check — one check covers your entire fleet
  • Runs in under 2 minutes for fleets of 100+ actors

| Frequency | Monthly cost |
| --- | --- |
| Daily | ~$1.50 |
| Twice daily | ~$3.00 |
| Hourly | ~$36.00 |
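
The monthly figures follow from simple arithmetic. The sketch below reproduces them, assuming a 30-day month and one billed health check per run; both are assumptions made for this example.

```python
PRICE_PER_CHECK_USD = 0.05   # $50.00 per 1,000 health checks
DAYS_PER_MONTH = 30          # assumption used for the estimates


def monthly_cost(runs_per_day: int) -> float:
    """Estimated monthly cost in USD, assuming one health check per run."""
    return round(runs_per_day * DAYS_PER_MONTH * PRICE_PER_CHECK_USD, 2)


for label, runs_per_day in [("Daily", 1), ("Twice daily", 2), ("Hourly", 24)]:
    print(f"{label}: ${monthly_cost(runs_per_day):.2f}")
```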

Changelog

v1.0.0 (2026-03-18)

  • Complete rebuild from scratch (v0 was too thin)
  • Failure diagnosis via log pattern matching
  • 7 failure categories with specific recommendations
  • Trend tracking (current vs previous period)
  • Revenue impact estimation from PPE pricing
  • Webhook alerting with JSON payload
  • Fleet-wide summary with top issue identification