XavvyNess AI Quality Auditor

Pricing

from $20.00 / 1,000 agent audit reports

Score any Apify actor or AI endpoint on 5 dimensions (XAQS 0–100): Completeness, Accuracy, Actionability, Structure, and Cost-Efficiency. Returns a grade from A+ to F with specific improvement recommendations. Audit your actors before publishing.

Rating: 0.0 (0)

Developer: XavvyNess (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 8 hours ago

XavvyNess Agent Quality Auditor (XAQS)

The first standardized quality auditing framework for AI agents. Run any Apify actor or AI agent through a structured test suite and get a XAQS score (0–100) across 5 quality dimensions — with specific, actionable improvement recommendations.

Use it on your own actors before publishing, or benchmark competitors.


What It Does

  1. Runs your agent with multiple test prompts (customizable)
  2. Scores each response on 5 dimensions (20 points each = 100 total)
  3. Generates a grade (A+ → F) with specific improvement recommendations
  4. Returns a structured audit report ready for your README or team review

The 5 XAQS Dimensions:

| Dimension | What It Measures |
| --- | --- |
| Completeness | Does it fully answer? No truncation, no "continued..." |
| Accuracy | Factual, no hallucinations, no invented URLs or names |
| Actionability | Concrete next steps, imperative recommendations |
| Structure | Headers, bullets, code blocks; readable at a glance |
| Cost-Efficiency | Output value per token, no bloat or padding |

Grade Scale:

  • 90–100 → A+ (5-star, marketplace ready)
  • 80–89 → A (4-star, minor gaps)
  • 70–79 → B (3-star, needs work)
  • 60–69 → C (2-star, significant issues)
  • <60 → F (1-star, not ready for publishing)
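The scoring arithmetic and the grade scale above can be sketched in a few lines. This is a minimal illustration, not the auditor's actual implementation: the helper names `totalScore` and `xaqsToGrade` are hypothetical, and the sketch assumes the total is a plain sum of the five per-dimension scores (20 points each).

```javascript
// Hypothetical helper: sums per-dimension scores (each 0 to 20) into a 0 to 100 total.
function totalScore(dimensions) {
  return Object.values(dimensions).reduce((sum, dim) => sum + dim.score, 0);
}

// Hypothetical helper: maps a total to the grade scale listed above.
function xaqsToGrade(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`XAQS score must be between 0 and 100, got ${score}`);
  }
  if (score >= 90) return 'A+';
  if (score >= 80) return 'A';
  if (score >= 70) return 'B';
  if (score >= 60) return 'C';
  return 'F';
}

// Illustrative per-dimension scores (made up for this sketch).
const dimensions = {
  completeness: { score: 18 },
  accuracy: { score: 17 },
  actionability: { score: 16 },
  structure: { score: 19 },
  cost_efficiency: { score: 14 },
};

const total = totalScore(dimensions);
console.log(total, xaqsToGrade(total)); // 84 A
```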

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| agentName | string | "My AI Agent" | Human-readable agent name for the report |
| apifyActorId | string | | Apify actor ID to audit (e.g. RBobzxRYFVgoX74uu) |
| agentEndpoint | string | | HTTP endpoint if not an Apify actor |
| testSuite | array | [] | Custom test cases (see below) |
| dimensions | array | all 5 | Dimensions to score |
| runs | integer | 3 | How many times to run each test (consistency scoring) |
| apifyToken | string | | Your Apify token (required for Apify actor audits) |
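Before starting a run, it can help to sanity-check the input object locally. The sketch below is a hypothetical pre-flight check, not part of the actor: `validateAuditInput` is an invented name, and the rule that exactly one of `apifyActorId` or `agentEndpoint` should be set is an assumption drawn from the field descriptions, not documented behavior.

```javascript
// Hypothetical pre-flight check for the auditor's input fields.
// Returns a list of problems; an empty list means the input looks plausible.
function validateAuditInput(input) {
  const errors = [];
  const hasActor = Boolean(input.apifyActorId);
  const hasEndpoint = Boolean(input.agentEndpoint);

  // Assumption: an audit targets either an Apify actor or an HTTP endpoint, not both.
  if (hasActor === hasEndpoint) {
    errors.push('Set exactly one of apifyActorId or agentEndpoint');
  }
  // The field table states the token is required for Apify actor audits.
  if (hasActor && !input.apifyToken) {
    errors.push('apifyToken is required when auditing an Apify actor');
  }
  if (input.runs !== undefined && (!Number.isInteger(input.runs) || input.runs < 1)) {
    errors.push('runs must be a positive integer');
  }
  if (input.testSuite !== undefined && !Array.isArray(input.testSuite)) {
    errors.push('testSuite must be an array');
  }
  return errors;
}

console.log(validateAuditInput({ apifyActorId: 'YOUR_ACTOR_ID', runs: 3 }));
// [ 'apifyToken is required when auditing an Apify actor' ]
```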

Example — Audit an Apify actor

{
  "agentName": "My Research Agent",
  "apifyActorId": "RBobzxRYFVgoX74uu",
  "apifyToken": "apify_api_YOUR_TOKEN",
  "runs": 3
}

Example — Audit a specialized scraper (custom test suite)

{
  "agentName": "My Smart Extractor",
  "apifyActorId": "IN4O5pGUjye34xW0O",
  "apifyToken": "apify_api_YOUR_TOKEN",
  "testSuite": [
    {
      "query": "Extract top stories from HN",
      "payload": {
        "urls": ["https://news.ycombinator.com"],
        "extractionPrompt": "Extract top 5 story titles and point scores",
        "outputFormat": "json"
      },
      "expectedKeywords": ["title", "points"],
      "minLength": 100
    }
  ],
  "runs": 2
}

Example Output

{
  "agentName": "XavvyNess Research Engine",
  "xaqsScore": 84,
  "grade": "A",
  "passedTests": 6,
  "totalTests": 9,
  "dimensions": {
    "completeness": { "score": 18, "max": 20, "notes": "Reports are thorough; one truncation detected on deep queries" },
    "accuracy": { "score": 17, "max": 20, "notes": "Sources cited correctly; one hallucinated URL in 9 runs" },
    "actionability": { "score": 16, "max": 20, "notes": "Key findings actionable; recommendations could be more specific" },
    "structure": { "score": 19, "max": 20, "notes": "Excellent use of headers and numbered citations" },
    "cost_efficiency": { "score": 14, "max": 20, "notes": "Occasional verbose preambles inflate token count" }
  },
  "topIssues": [
    "1/9 runs produced a hallucinated source URL — add URL validation",
    "Deep queries occasionally truncate before conclusion section",
    "Preamble text (2-3 sentences before ## Overview) adds no value — remove it"
  ],
  "recommendations": [
    "Add output validation to catch empty or truncated reports before returning",
    "Strip preamble text from model output using regex before pushing to dataset",
    "Increase max_tokens for deep queries from 3500 to 5000"
  ],
  "auditedAt": "2026-04-08T23:00:00.000Z",
  "agent": "XavvyNess Agent Quality Auditor"
}
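A report shaped like the example output can be post-processed to show where the most points were lost. The sketch below assumes only the `dimensions` structure from the example (`score`, `max`, `notes`); the function name `rankDimensions` is hypothetical.

```javascript
// Hypothetical helper: orders dimensions by points lost (max minus score), largest gap first.
function rankDimensions(report) {
  return Object.entries(report.dimensions)
    .map(([name, dim]) => ({ name, gap: dim.max - dim.score, notes: dim.notes }))
    .sort((a, b) => b.gap - a.gap);
}

// Trimmed copy of the example report, keeping only the fields used here.
const report = {
  dimensions: {
    completeness: { score: 18, max: 20, notes: 'One truncation detected on deep queries' },
    accuracy: { score: 17, max: 20, notes: 'One hallucinated URL in 9 runs' },
    actionability: { score: 16, max: 20, notes: 'Recommendations could be more specific' },
    structure: { score: 19, max: 20, notes: 'Excellent use of headers' },
    cost_efficiency: { score: 14, max: 20, notes: 'Verbose preambles inflate token count' },
  },
};

const ranked = rankDimensions(report);
console.log(`${ranked[0].name}: ${ranked[0].gap} points lost`); // cost_efficiency: 6 points lost
```

Fixing the dimension with the largest gap first usually moves the total score the most per change.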

Pricing

$0.02 per audit ($20.00 per 1,000 audits) — PAY_PER_RESULT, charged only on successful completion.

A standard audit (3 runs × 3 tests = 9 total agent calls) typically completes in 3–5 minutes. Failed runs are not charged.
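The pricing arithmetic above can be written out as a short sketch. The function names are hypothetical, and the estimate covers only the per-audit charge stated here; it assumes every audit completes successfully, since failed runs are not charged.

```javascript
// Price from this section: $20.00 per 1,000 audits ($0.02 per audit).
const PRICE_PER_1000_AUDITS_USD = 20;

// Hypothetical helper: cost in USD for a number of successful audits.
function estimateAuditCost(audits) {
  if (!Number.isInteger(audits) || audits < 0) {
    throw new RangeError('audits must be a non-negative integer');
  }
  return (audits * PRICE_PER_1000_AUDITS_USD) / 1000;
}

// Hypothetical helper: agent calls made during one audit (runs x tests).
function agentCallsPerAudit(runs, tests) {
  return runs * tests;
}

console.log(agentCallsPerAudit(3, 3)); // 9
console.log(estimateAuditCost(250).toFixed(2)); // 5.00
```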


Who Should Use This

  • Apify builders — Audit your actors before publishing to the marketplace
  • Businesses using AI agents — Verify quality before going to production
  • Researchers — Benchmark agent quality systematically across providers
  • Developers — Catch regressions when updating an agent

Custom Test Suites

The default test suite sends { query, depth, format }, which suits research-style agents. For specialized actors (scrapers, analyzers, extractors), use the testSuite field to define custom payload objects that match your actor's actual input schema.

Each test case supports:

  • query — human-readable test name
  • payload — exact input JSON sent to the actor (overrides default)
  • expectedKeywords — words that must appear in the output
  • minLength — minimum output length in characters
  • maxLatencyMs — maximum acceptable run time
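To make the test-case fields concrete, here is a rough sketch of how such checks could be applied to a single agent response. It is not the auditor's actual logic: `checkTestCase` is a hypothetical name, and case-insensitive keyword matching is an assumption.

```javascript
// Hypothetical local check of one response against a test case's constraints.
function checkTestCase(testCase, outputText, latencyMs) {
  const failures = [];
  const haystack = outputText.toLowerCase();

  // Assumption: keywords are matched case-insensitively as substrings.
  for (const keyword of testCase.expectedKeywords ?? []) {
    if (!haystack.includes(keyword.toLowerCase())) {
      failures.push(`missing keyword: ${keyword}`);
    }
  }
  if (testCase.minLength !== undefined && outputText.length < testCase.minLength) {
    failures.push(`output shorter than ${testCase.minLength} characters`);
  }
  if (testCase.maxLatencyMs !== undefined && latencyMs > testCase.maxLatencyMs) {
    failures.push(`run took longer than ${testCase.maxLatencyMs} ms`);
  }
  return { passed: failures.length === 0, failures };
}

const testCase = {
  query: 'Extract top stories from HN',
  expectedKeywords: ['title', 'points'],
  minLength: 20,
  maxLatencyMs: 60000,
};

const result = checkTestCase(testCase, '{"title": "Example story", "score": 120}', 4200);
console.log(result.passed, result.failures); // false [ 'missing keyword: points' ]
```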

Integration

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the auditor actor and wait for the run to finish.
const run = await client.actor('g1HQWHIikrGNlt7RF').call({
  agentName: 'My Agent',
  apifyActorId: 'YOUR_ACTOR_ID',
  apifyToken: 'apify_api_YOUR_TOKEN',
  runs: 3,
});

// Read the audit report from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`XAQS Score: ${items[0].xaqsScore}/100 — ${items[0].grade}`);
console.log('Top issues:', items[0].topIssues);
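For catching regressions when updating an agent, two audit reports can be compared, for example in a CI step. The sketch below is one possible approach under stated assumptions: `compareAudits` and the `tolerance` parameter are hypothetical, and the reports are assumed to follow the example output shape (`xaqsScore` plus per-dimension `score`). The tolerance allows for small run-to-run variation.

```javascript
// Hypothetical regression check between a stored baseline report and a fresh one.
// Returns human-readable descriptions of every drop larger than `tolerance` points.
function compareAudits(baseline, current, tolerance = 2) {
  const regressions = [];
  if (current.xaqsScore < baseline.xaqsScore - tolerance) {
    regressions.push(`xaqsScore dropped from ${baseline.xaqsScore} to ${current.xaqsScore}`);
  }
  for (const [name, dim] of Object.entries(current.dimensions)) {
    const before = baseline.dimensions[name];
    if (before && dim.score < before.score - tolerance) {
      regressions.push(`${name} dropped from ${before.score} to ${dim.score}`);
    }
  }
  return regressions;
}

// Illustrative reports (made-up numbers).
const baseline = { xaqsScore: 84, dimensions: { accuracy: { score: 17 }, structure: { score: 19 } } };
const current = { xaqsScore: 78, dimensions: { accuracy: { score: 12 }, structure: { score: 19 } } };

const regressions = compareAudits(baseline, current);
console.log(regressions);
// [ 'xaqsScore dropped from 84 to 78', 'accuracy dropped from 17 to 12' ]
```

In a CI job, a non-empty result could be used to fail the build before the updated agent is published.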

About XavvyNess

XavvyNess builds AI agents that do real work — and XAQS is how we hold them accountable. Run it on any agent, including ours.
