XavvyNess AI Quality Auditor
Pricing
from $20.00 / 1,000 agent audit reports
Score any Apify actor or AI endpoint on 5 dimensions (XAQS 0–100): Completeness, Accuracy, Actionability, Structure, Cost-Efficiency. Returns grade A+ to F with specific improvement recommendations. Audit your actors before publishing.
Developer
XavvyNess
XavvyNess Agent Quality Auditor (XAQS)
The first standardized quality auditing framework for AI agents. Run any Apify actor or AI agent through a structured test suite and get a XAQS score (0–100) across 5 quality dimensions — with specific, actionable improvement recommendations.
Use it on your own actors before publishing, or benchmark competitors.
Demo
🎬 Video demo coming soon. Upload `audit-agent.mp4` to YouTube, then run `python3 scripts/actor-video-gen.py --embed-readmes` to embed it here automatically.
What It Does
- Runs your agent with multiple test prompts (customizable)
- Scores each response on 5 dimensions (20 points each = 100 total)
- Generates a grade (A+ → F) with specific improvement recommendations
- Returns a structured audit report ready for your README or team review
The 5 XAQS Dimensions:
| Dimension | What It Measures |
|---|---|
| Completeness | Does it fully answer? No truncation, no "continued..." |
| Accuracy | Factual, no hallucinations, no invented URLs or names |
| Actionability | Concrete next steps, imperative recommendations |
| Structure | Headers, bullets, code blocks — readable at a glance |
| Cost-Efficiency | Output value per token, no bloat or padding |
Grade Scale:
- 90–100 → A+ (5-star, marketplace ready)
- 80–89 → A (4-star, minor gaps)
- 70–79 → B (3-star, needs work)
- 60–69 → C (2-star, significant issues)
- <60 → F (1-star, not ready for publishing)
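The grade bands above can be expressed as a small lookup. This is a minimal sketch, not the auditor's own code: `gradeFor` is a hypothetical helper, and treating each band's lower bound as inclusive (so a fractional score such as 89.5 maps to A) is an assumption.

```javascript
// Hypothetical helper: maps a XAQS score (0-100) to the grade bands listed above.
function gradeFor(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`XAQS score must be between 0 and 100, got ${score}`);
  }
  if (score >= 90) return 'A+';
  if (score >= 80) return 'A';
  if (score >= 70) return 'B';
  if (score >= 60) return 'C';
  return 'F'; // The scale has no D band: anything under 60 is an F.
}

console.log(gradeFor(84)); // A
```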
Input
| Field | Type | Default | Description |
|---|---|---|---|
| agentName | string | "My AI Agent" | Human-readable agent name for the report |
| apifyActorId | string | — | Apify actor ID to audit (e.g. RBobzxRYFVgoX74uu) |
| agentEndpoint | string | — | HTTP endpoint if not an Apify actor |
| testSuite | array | [] | Custom test cases (see below) |
| dimensions | array | all 5 | Dimensions to score |
| runs | integer | 3 | How many times to run each test (consistency scoring) |
| apifyToken | string | — | Your Apify token (required for Apify actor audits) |
Example — Audit an Apify actor
{
  "agentName": "My Research Agent",
  "apifyActorId": "RBobzxRYFVgoX74uu",
  "apifyToken": "apify_api_YOUR_TOKEN",
  "runs": 3
}
Example — Audit a specialized scraper (custom test suite)
{
  "agentName": "My Smart Extractor",
  "apifyActorId": "IN4O5pGUjye34xW0O",
  "apifyToken": "apify_api_YOUR_TOKEN",
  "testSuite": [
    {
      "query": "Extract top stories from HN",
      "payload": {
        "urls": ["https://news.ycombinator.com"],
        "extractionPrompt": "Extract top 5 story titles and point scores",
        "outputFormat": "json"
      },
      "expectedKeywords": ["title", "points"],
      "minLength": 100
    }
  ],
  "runs": 2
}
Example Output
{
  "agentName": "XavvyNess Research Engine",
  "xaqsScore": 84,
  "grade": "A",
  "passedTests": 6,
  "totalTests": 9,
  "dimensions": {
    "completeness": { "score": 18, "max": 20, "notes": "Reports are thorough; one truncation detected on deep queries" },
    "accuracy": { "score": 17, "max": 20, "notes": "Sources cited correctly; one hallucinated URL in 9 runs" },
    "actionability": { "score": 16, "max": 20, "notes": "Key findings actionable; recommendations could be more specific" },
    "structure": { "score": 19, "max": 20, "notes": "Excellent use of headers and numbered citations" },
    "cost_efficiency": { "score": 14, "max": 20, "notes": "Occasional verbose preambles inflate token count" }
  },
  "topIssues": [
    "1/9 runs produced a hallucinated source URL — add URL validation",
    "Deep queries occasionally truncate before conclusion section",
    "Preamble text (2-3 sentences before ## Overview) adds no value — remove it"
  ],
  "recommendations": [
    "Add output validation to catch empty or truncated reports before returning",
    "Strip preamble text from model output using regex before pushing to dataset",
    "Increase max_tokens for deep queries from 3500 to 5000"
  ],
  "auditedAt": "2026-04-08T23:00:00.000Z",
  "agent": "XavvyNess Agent Quality Auditor"
}
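In the example report, the five dimension scores add up to the overall `xaqsScore` (18 + 17 + 16 + 19 + 14 = 84), which matches the "20 points each = 100 total" scheme. The sketch below checks that arithmetic. `totalScore` is a hypothetical helper, and the plain-sum relationship is inferred from this one example; how the score is computed when only a subset of `dimensions` is selected is not documented here.

```javascript
// Dimension scores copied from the example report (notes omitted).
const dimensions = {
  completeness: { score: 18, max: 20 },
  accuracy: { score: 17, max: 20 },
  actionability: { score: 16, max: 20 },
  structure: { score: 19, max: 20 },
  cost_efficiency: { score: 14, max: 20 },
};

// Hypothetical helper: adds up per-dimension scores and maxima.
function totalScore(dims) {
  return Object.values(dims).reduce(
    (acc, d) => ({ score: acc.score + d.score, max: acc.max + d.max }),
    { score: 0, max: 0 },
  );
}

const total = totalScore(dimensions);
console.log(`${total.score}/${total.max}`); // 84/100
```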
Pricing
$0.02 per audit ($20.00 per 1,000 audits) — PAY_PER_RESULT, charged only on successful completion.
A standard audit (3 runs × 3 tests = 9 total agent calls) typically completes in 3–5 minutes. Failed runs are not charged.
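To budget a batch of audits, the pricing arithmetic can be sketched as follows. `estimateBatch` and its parameter names are hypothetical. The sketch covers only the auditor's own per-audit price stated above and counts only audits that complete successfully, since failed runs are not charged.

```javascript
const PRICE_CENTS_PER_AUDIT = 2; // $0.02 per audit, kept in cents to avoid float rounding

// Hypothetical helper: estimates agent calls and auditor cost for a batch of audits.
function estimateBatch({ successfulAudits, testsPerAudit, runs }) {
  const agentCallsPerAudit = testsPerAudit * runs; // e.g. 3 tests x 3 runs = 9 calls
  const costCents = successfulAudits * PRICE_CENTS_PER_AUDIT;
  return {
    agentCallsPerAudit,
    totalAgentCalls: successfulAudits * agentCallsPerAudit,
    costUsd: (costCents / 100).toFixed(2),
  };
}

const estimate = estimateBatch({ successfulAudits: 1000, testsPerAudit: 3, runs: 3 });
console.log(estimate.totalAgentCalls, estimate.costUsd); // 9000 20.00
```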
Who Should Use This
- Apify builders — Audit your actors before publishing to the marketplace
- Businesses using AI agents — Verify quality before going to production
- Researchers — Benchmark agent quality systematically across providers
- Developers — Catch regressions when updating an agent
Custom Test Suites
The default test suite sends { query, depth, format }, which suits research-style agents. For specialized actors (scrapers, analyzers, extractors), use the testSuite field to define custom payload objects that match your actor's actual input schema.
Each test case supports:
- query: human-readable test name
- payload: exact input JSON sent to the actor (overrides default)
- expectedKeywords: words that must appear in the output
- minLength: minimum output length in characters
- maxLatencyMs: maximum acceptable run time
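To make the three pass/fail criteria concrete, here is a minimal sketch of how a single output could be checked against a test case. It is an illustration only: `checkTestCase` is a hypothetical function, the auditor's real evaluation logic is not documented here, and case-insensitive keyword matching is an assumption.

```javascript
// Hypothetical checker: illustrates expectedKeywords, minLength and maxLatencyMs.
function checkTestCase(testCase, output, latencyMs) {
  const failures = [];
  const text = output.toLowerCase();

  // Every expected keyword must appear somewhere in the output (assumed case-insensitive).
  for (const keyword of testCase.expectedKeywords ?? []) {
    if (!text.includes(keyword.toLowerCase())) {
      failures.push(`missing keyword: ${keyword}`);
    }
  }
  if (testCase.minLength !== undefined && output.length < testCase.minLength) {
    failures.push(`output is ${output.length} chars, expected at least ${testCase.minLength}`);
  }
  if (testCase.maxLatencyMs !== undefined && latencyMs > testCase.maxLatencyMs) {
    failures.push(`run took ${latencyMs} ms, limit is ${testCase.maxLatencyMs} ms`);
  }
  return { passed: failures.length === 0, failures };
}

const testCase = {
  query: 'Extract top stories from HN',
  expectedKeywords: ['title', 'points'],
  minLength: 20,
  maxLatencyMs: 60000,
};
const result = checkTestCase(testCase, '[{"title": "Example story", "points": 120}]', 4200);
console.log(result.passed); // true
```

Returning the list of failures rather than a bare boolean makes it easier to see which criterion a test case tripped.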
Integration
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('g1HQWHIikrGNlt7RF').call({
  agentName: 'My Agent',
  apifyActorId: 'YOUR_ACTOR_ID',
  apifyToken: 'apify_api_YOUR_TOKEN',
  runs: 3,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`XAQS Score: ${items[0].xaqsScore}/100 — ${items[0].grade}`);
console.log('Top issues:', items[0].topIssues);
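For regression checks, the report returned by the integration code can be gated on a minimum score. This is a sketch under stated assumptions: `assertMinimumScore` is a hypothetical helper, the threshold of 80 is arbitrary, and the report is assumed to carry the `xaqsScore` and `topIssues` fields shown in the example output.

```javascript
// Hypothetical CI gate: throws when an audit report falls below a chosen score.
function assertMinimumScore(report, minScore) {
  if (typeof report?.xaqsScore !== 'number') {
    throw new TypeError('Report has no numeric xaqsScore field');
  }
  if (report.xaqsScore < minScore) {
    const issues = (report.topIssues ?? []).join('; ');
    throw new Error(
      `XAQS ${report.xaqsScore} is below the required ${minScore}. Top issues: ${issues}`,
    );
  }
  return report.xaqsScore;
}

// Usage with a report shaped like the example output (in practice, pass items[0]).
const report = { xaqsScore: 84, grade: 'A', topIssues: ['one hallucinated source URL'] };
console.log(assertMinimumScore(report, 80)); // 84
```

Throwing on a low score gives a CI job a non-zero exit when the error is left uncaught, so a drop in quality fails the build.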
About XavvyNess
XavvyNess builds AI agents that do real work — and XAQS is how we hold them accountable. Run it on any agent, including ours.