XavvyNess AI Quality Auditor

Deprecated

Pricing

from $20.00 / 1,000 agent audit reports

Score any Apify actor or AI endpoint on 5 dimensions (XAQS 0–100): Completeness, Accuracy, Actionability, Structure, Cost-Efficiency. Returns a letter grade from A+ to F with specific improvement recommendations. Audit your actors before publishing.

Rating: 0.0 (0 ratings)
Developer: XavvyNess (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 11 days ago

XavvyNess Agent Quality Auditor (XAQS)

The first standardized quality auditing framework for AI agents. Run any Apify actor or AI agent through a structured test suite and get a XAQS score (0–100) across 5 quality dimensions — with specific, actionable improvement recommendations.

Use it on your own actors before publishing, or benchmark competitors.

Demo

🎬 Video demo coming soon.


What It Does

  1. Runs your agent with multiple test prompts (customizable)
  2. Scores each response on 5 dimensions (20 points each = 100 total)
  3. Generates a grade (A+ → F) with specific improvement recommendations
  4. Returns a structured audit report ready for your README or team review

The 5 XAQS Dimensions:

| Dimension | What It Measures |
|---|---|
| Completeness | Does it fully answer? No truncation, no "continued..." |
| Accuracy | Factual, no hallucinations, no invented URLs or names |
| Actionability | Concrete next steps, imperative recommendations |
| Structure | Headers, bullets, code blocks; readable at a glance |
| Cost-Efficiency | Output value per token, no bloat or padding |
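The README states that each dimension is worth 20 points, for a 100-point total. The example output below fits a plain sum (18 + 17 + 16 + 19 + 14 = 84). The sketch below assumes that additive model and the snake_case keys from the example report. `computeXaqs` is a hypothetical helper, not part of the actor's interface.

```javascript
// The five XAQS dimensions, keyed as in the example audit report (assumed keys).
const XAQS_DIMENSIONS = ['completeness', 'accuracy', 'actionability', 'structure', 'cost_efficiency'];
const MAX_POINTS_PER_DIMENSION = 20;

// Hypothetical helper: sum five per-dimension scores (0-20 each) into a 0-100 XAQS score.
function computeXaqs(dimensions) {
  let total = 0;
  for (const name of XAQS_DIMENSIONS) {
    const entry = dimensions[name];
    if (!entry || typeof entry.score !== 'number') {
      throw new TypeError(`Missing numeric score for dimension "${name}"`);
    }
    if (entry.score < 0 || entry.score > MAX_POINTS_PER_DIMENSION) {
      throw new RangeError(`Score for "${name}" must be between 0 and ${MAX_POINTS_PER_DIMENSION}`);
    }
    total += entry.score;
  }
  return total;
}

// Dimension scores taken from the example audit report.
const exampleDimensions = {
  completeness: { score: 18 },
  accuracy: { score: 17 },
  actionability: { score: 16 },
  structure: { score: 19 },
  cost_efficiency: { score: 14 },
};
console.log(computeXaqs(exampleDimensions)); // 84
```

The README does not say how the score is rescaled when the `dimensions` input selects fewer than five dimensions, so this sketch simply requires all five.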

Grade Scale:

  • 90–100 → A+ (5-star, marketplace ready)
  • 80–89 → A (4-star, minor gaps)
  • 70–79 → B (3-star, needs work)
  • 60–69 → C (2-star, significant issues)
  • <60 → F (1-star, not ready for publishing)
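Below is a sketch of the grade lookup. It assumes the bands above apply to the overall XAQS score and that each lower bound is inclusive. `gradeFor` and `GRADE_BANDS` are hypothetical names. The scale has no D band, so any score below 60 maps straight to F. The README does not say how fractional scores such as 89.5 are treated; this sketch places them in the lower band.

```javascript
// Grade bands from the XAQS grade scale, highest first; `min` is an inclusive lower bound.
const GRADE_BANDS = [
  { min: 90, grade: 'A+', stars: 5 },
  { min: 80, grade: 'A', stars: 4 },
  { min: 70, grade: 'B', stars: 3 },
  { min: 60, grade: 'C', stars: 2 },
  { min: 0, grade: 'F', stars: 1 },
];

// Hypothetical helper: map a 0-100 XAQS score to its letter grade and star rating.
function gradeFor(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError('XAQS score must be a number from 0 to 100');
  }
  // Bands are ordered from highest to lowest, so the first match is the right one.
  return GRADE_BANDS.find((band) => score >= band.min);
}

console.log(gradeFor(84).grade); // A
console.log(gradeFor(59).grade); // F
```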

Input

| Field | Type | Default | Description |
|---|---|---|---|
| agentName | string | "My AI Agent" | Human-readable agent name for the report |
| apifyActorId | string | | Apify actor ID to audit (e.g. RBobzxRYFVgoX74uu) |
| agentEndpoint | string | | HTTP endpoint to audit if the agent is not an Apify actor |
| testSuite | array | [] | Custom test cases (see below) |
| dimensions | array | all 5 | Dimensions to score |
| runs | integer | 3 | How many times to run each test (used for consistency scoring) |
| apifyToken | string | | Your Apify token (required for Apify actor audits) |
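The fields above can be combined into an auditor input with a small helper. This is a sketch under stated assumptions. `buildAuditInput` is hypothetical and applies the documented defaults. It assumes that exactly one of `apifyActorId` or `agentEndpoint` should be set, which the table implies but does not state outright, and that `apifyToken` is required whenever `apifyActorId` is used.

```javascript
// Hypothetical helper: build an auditor input object from the documented fields.
function buildAuditInput({
  agentName = 'My AI Agent',
  apifyActorId,
  agentEndpoint,
  apifyToken,
  testSuite = [],
  dimensions,
  runs = 3,
} = {}) {
  // Assumption: audit exactly one target, either an Apify actor or an HTTP endpoint.
  if (Boolean(apifyActorId) === Boolean(agentEndpoint)) {
    throw new Error('Set exactly one of apifyActorId or agentEndpoint');
  }
  if (apifyActorId && !apifyToken) {
    throw new Error('apifyToken is required when auditing an Apify actor');
  }
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError('runs must be a positive integer');
  }
  const input = { agentName, testSuite, runs };
  if (apifyActorId) Object.assign(input, { apifyActorId, apifyToken });
  else input.agentEndpoint = agentEndpoint;
  // Only send dimensions when the caller narrows the default (all 5).
  if (dimensions) input.dimensions = dimensions;
  return input;
}

const input = buildAuditInput({
  agentName: 'My Research Agent',
  apifyActorId: 'RBobzxRYFVgoX74uu',
  apifyToken: 'apify_api_YOUR_TOKEN',
});
console.log(JSON.stringify(input, null, 2));
```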

Example: Audit an Apify actor

```json
{
  "agentName": "My Research Agent",
  "apifyActorId": "RBobzxRYFVgoX74uu",
  "apifyToken": "apify_api_YOUR_TOKEN",
  "runs": 3
}
```

Example: Audit a specialized scraper (custom test suite)

```json
{
  "agentName": "My Smart Extractor",
  "apifyActorId": "IN4O5pGUjye34xW0O",
  "apifyToken": "apify_api_YOUR_TOKEN",
  "testSuite": [
    {
      "query": "Extract top stories from HN",
      "payload": {
        "urls": ["https://news.ycombinator.com"],
        "extractionPrompt": "Extract top 5 story titles and point scores",
        "outputFormat": "json"
      },
      "expectedKeywords": ["title", "points"],
      "minLength": 100
    }
  ],
  "runs": 2
}
```

Example Output

```json
{
  "agentName": "XavvyNess Research Engine",
  "xaqsScore": 84,
  "grade": "A",
  "passedTests": 6,
  "totalTests": 9,
  "dimensions": {
    "completeness": { "score": 18, "max": 20, "notes": "Reports are thorough; one truncation detected on deep queries" },
    "accuracy": { "score": 17, "max": 20, "notes": "Sources cited correctly; one hallucinated URL in 9 runs" },
    "actionability": { "score": 16, "max": 20, "notes": "Key findings actionable; recommendations could be more specific" },
    "structure": { "score": 19, "max": 20, "notes": "Excellent use of headers and numbered citations" },
    "cost_efficiency": { "score": 14, "max": 20, "notes": "Occasional verbose preambles inflate token count" }
  },
  "topIssues": [
    "1/9 runs produced a hallucinated source URL — add URL validation",
    "Deep queries occasionally truncate before conclusion section",
    "Preamble text (2-3 sentences before ## Overview) adds no value — remove it"
  ],
  "recommendations": [
    "Add output validation to catch empty or truncated reports before returning",
    "Strip preamble text from model output using regex before pushing to dataset",
    "Increase max_tokens for deep queries from 3500 to 5000"
  ],
  "auditedAt": "2026-04-08T23:00:00.000Z",
  "agent": "XavvyNess Agent Quality Auditor"
}
```
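One way to use a report like the one above is to rank its dimensions and find the weakest one, then compute the test pass rate. `summarizeReport` is a hypothetical helper that reads only fields shown in the example output.

```javascript
// Hypothetical helper: summarize an audit report shaped like the example output.
function summarizeReport(report) {
  // Rank dimensions from weakest to strongest by their share of the maximum score.
  const ranked = Object.entries(report.dimensions)
    .map(([name, d]) => ({ name, score: d.score, max: d.max, ratio: d.score / d.max }))
    .sort((a, b) => a.ratio - b.ratio);
  return {
    score: report.xaqsScore,
    grade: report.grade,
    passRate: report.totalTests > 0 ? report.passedTests / report.totalTests : 0,
    weakest: ranked[0],
  };
}

// Fields copied from the example audit report (notes omitted).
const exampleReport = {
  xaqsScore: 84,
  grade: 'A',
  passedTests: 6,
  totalTests: 9,
  dimensions: {
    completeness: { score: 18, max: 20 },
    accuracy: { score: 17, max: 20 },
    actionability: { score: 16, max: 20 },
    structure: { score: 19, max: 20 },
    cost_efficiency: { score: 14, max: 20 },
  },
};

const summary = summarizeReport(exampleReport);
console.log(`${summary.grade} (${summary.score}/100), ${(summary.passRate * 100).toFixed(1)}% of tests passed`);
console.log(`Weakest dimension: ${summary.weakest.name} (${summary.weakest.score}/${summary.weakest.max})`);
```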

Pricing

$0.02 per audit ($20.00 per 1,000 audits), billed pay-per-result (PAY_PER_RESULT): you are charged only when an audit completes successfully, so failed runs cost nothing.

A standard audit (3 runs × 3 tests = 9 agent calls) typically completes in 3–5 minutes.
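The pricing arithmetic can be sketched as follows. The sketch assumes that agent calls per audit equal tests × runs, as in the 3 × 3 = 9 example, and that every audit succeeds. It covers only the auditor's own per-result charge, not whatever the audited actor's runs cost. `estimateAudit` is a hypothetical helper. Prices are kept in whole cents to avoid floating-point rounding.

```javascript
// Auditor price: $20.00 per 1,000 audits, i.e. 2 cents per successful audit.
const PRICE_PER_AUDIT_CENTS = 2;

// Hypothetical helper: agent calls and auditor cost, assuming every audit succeeds.
function estimateAudit({ tests, runs, audits = 1 }) {
  if (![tests, runs, audits].every((n) => Number.isInteger(n) && n >= 0)) {
    throw new RangeError('tests, runs and audits must be non-negative integers');
  }
  return {
    agentCallsPerAudit: tests * runs, // each test case is run `runs` times
    totalAgentCalls: tests * runs * audits,
    auditorCostUsd: (audits * PRICE_PER_AUDIT_CENTS) / 100, // failed audits are not charged
  };
}

console.log(estimateAudit({ tests: 3, runs: 3 }));
console.log(estimateAudit({ tests: 1, runs: 2, audits: 1000 }));
```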


Who Should Use This

  • Apify builders — Audit your actors before publishing to the marketplace
  • Businesses using AI agents — Verify quality before going to production
  • Researchers — Benchmark agent quality systematically across providers
  • Developers — Catch regressions when updating an agent
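To catch regressions, a developer might compare the report for a new agent version against a saved baseline. Any dimension whose score dropped would then be flagged. This is a sketch, not a feature of the actor. `findRegressions` and its `tolerance` parameter (points of allowed run-to-run noise) are hypothetical. The two reports in the usage lines are made-up inputs.

```javascript
// Hypothetical helper: compare two audit reports and flag dimensions whose score dropped.
// `tolerance` is an assumed allowance, in points, for run-to-run noise.
function findRegressions(baseline, candidate, tolerance = 1) {
  const regressions = [];
  for (const [name, base] of Object.entries(baseline.dimensions)) {
    const next = candidate.dimensions[name];
    if (!next) continue; // dimension not scored in the new audit
    const drop = base.score - next.score;
    if (drop > tolerance) regressions.push({ name, from: base.score, to: next.score });
  }
  const overallDrop = baseline.xaqsScore - candidate.xaqsScore;
  return { overallDrop, regressions, passed: regressions.length === 0 && overallDrop <= tolerance };
}

// Made-up reports for illustration only.
const before = { xaqsScore: 84, dimensions: { accuracy: { score: 17 }, structure: { score: 19 } } };
const after = { xaqsScore: 80, dimensions: { accuracy: { score: 13 }, structure: { score: 19 } } };
const result = findRegressions(before, after);
console.log(result.passed ? 'No regressions' : result.regressions);
```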

Custom Test Suites

The default test suite sends an input of the form { query, depth, format }, which suits research-style agents. For specialized actors (scrapers, analyzers, extractors), use the testSuite field to define custom payload objects that match your actor's actual input schema.

Each test case supports:

  • query: human-readable test name
  • payload: exact input JSON sent to the actor (overrides the default)
  • expectedKeywords: words that must appear in the output
  • minLength: minimum output length, in characters
  • maxLatencyMs: maximum acceptable run time, in milliseconds
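A test case's criteria might be checked against a single agent output roughly as shown below. The README does not specify the matching rules, so the following are assumptions. Keywords are matched as case-insensitive substrings. Non-string output is measured after `JSON.stringify`. Latency is compared in milliseconds. `checkTestCase` is a hypothetical helper, and the `maxLatencyMs` value in the usage lines is illustrative.

```javascript
// Hypothetical helper: check one agent output against a test case's criteria.
// Assumption: keywords match as case-insensitive substrings of the serialized output.
function checkTestCase(testCase, output, latencyMs) {
  const failures = [];
  const text = typeof output === 'string' ? output : JSON.stringify(output);
  const lower = text.toLowerCase();
  for (const keyword of testCase.expectedKeywords ?? []) {
    if (!lower.includes(keyword.toLowerCase())) failures.push(`missing keyword "${keyword}"`);
  }
  if (testCase.minLength !== undefined && text.length < testCase.minLength) {
    failures.push(`output is ${text.length} chars, expected at least ${testCase.minLength}`);
  }
  if (testCase.maxLatencyMs !== undefined && latencyMs > testCase.maxLatencyMs) {
    failures.push(`run took ${latencyMs} ms, limit is ${testCase.maxLatencyMs} ms`);
  }
  return { passed: failures.length === 0, failures };
}

// Criteria from the custom test suite example, plus an illustrative latency limit.
const hnTest = { query: 'Extract top stories from HN', expectedKeywords: ['title', 'points'], minLength: 100, maxLatencyMs: 60000 };
const output = [{ title: 'Example story', points: 120 }];
console.log(checkTestCase(hnTest, output, 42000)); // fails minLength: the serialized output is short
```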

Integration

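```javascript
// Run as an ES module (.mjs or "type": "module") so top-level await works.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the auditor actor and wait for the audit run to finish.
const run = await client.actor('g1HQWHIikrGNlt7RF').call({
    agentName: 'My Agent',
    apifyActorId: 'YOUR_ACTOR_ID',
    apifyToken: process.env.APIFY_TOKEN, // lets the auditor run the actor under test
    runs: 3,
});

// The audit report is the first item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
const [report] = items;
if (!report) throw new Error(`Audit run ${run.id} returned no report (status: ${run.status})`);
console.log(`XAQS Score: ${report.xaqsScore}/100 (${report.grade})`);
console.log('Top issues:', report.topIssues);
```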

About XavvyNess

XavvyNess builds AI agents that do real work — and XAQS is how we hold them accountable. Run it on any agent, including ours.
