XavvyNess AI Quality Auditor
Deprecated
Pricing
from $20.00 / 1,000 agent audit reports
Score any Apify actor or AI endpoint on 5 dimensions (XAQS 0–100): Completeness, Accuracy, Actionability, Structure, Cost-Efficiency. Returns grade A+ to F with specific improvement recommendations. Audit your actors before publishing.
Developer
XavvyNess
Maintained by Community
Actor stats
- Bookmarked: 0
- Total users: 2
- Monthly active users: 1
- Last modified: 11 days ago
XavvyNess Agent Quality Auditor (XAQS)
The first standardized quality auditing framework for AI agents. Run any Apify actor or AI agent through a structured test suite and get a XAQS score (0–100) across 5 quality dimensions — with specific, actionable improvement recommendations.
Use it on your own actors before publishing, or benchmark competitors.
Demo
🎬 Video demo coming soon. Upload `audit-agent.mp4` to YouTube, then run `python3 scripts/actor-video-gen.py --embed-readmes` to embed it here automatically.
What It Does
- Runs your agent with multiple test prompts (customizable)
- Scores each response on 5 dimensions (20 points each = 100 total)
- Generates a grade (A+ → F) with specific improvement recommendations
- Returns a structured audit report ready for your README or team review
The 5 XAQS Dimensions:
| Dimension | What It Measures |
|---|---|
| Completeness | Does it fully answer? No truncation, no "continued..." |
| Accuracy | Factual, no hallucinations, no invented URLs or names |
| Actionability | Concrete next steps, imperative recommendations |
| Structure | Headers, bullets, code blocks — readable at a glance |
| Cost-Efficiency | Output value per token, no bloat or padding |
Grade Scale:
- 90–100 → A+ (5-star, marketplace ready)
- 80–89 → A (4-star, minor gaps)
- 70–79 → B (3-star, needs work)
- 60–69 → C (2-star, significant issues)
- <60 → F (1-star, not ready for publishing)
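The grade bands above can be written as a small lookup. The sketch below is illustrative only: `totalScore` and `gradeFor` are hypothetical helper names, not part of the actor, and it assumes the overall score is the plain sum of the five 20-point dimension scores. The sample scores are the ones shown in the Example Output section.

```javascript
// Hypothetical helpers for illustration; not part of the actor's API.
// Bands are copied from the Grade Scale list, highest first.
const GRADE_BANDS = [
  { min: 90, grade: 'A+' },
  { min: 80, grade: 'A' },
  { min: 70, grade: 'B' },
  { min: 60, grade: 'C' },
];

// Assumption: the XAQS total is the sum of the per-dimension scores (0-20 each).
function totalScore(dimensions) {
  return Object.values(dimensions).reduce((sum, d) => sum + d.score, 0);
}

// Map a 0-100 score to its letter grade; anything below 60 is an F.
function gradeFor(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  const band = GRADE_BANDS.find((b) => score >= b.min);
  return band ? band.grade : 'F';
}

const exampleDimensions = {
  completeness: { score: 18 },
  accuracy: { score: 17 },
  actionability: { score: 16 },
  structure: { score: 19 },
  cost_efficiency: { score: 14 },
};

const score = totalScore(exampleDimensions);
console.log(score, gradeFor(score)); // prints: 84 A
```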
Input
| Field | Type | Default | Description |
|---|---|---|---|
| agentName | string | "My AI Agent" | Human-readable agent name for the report |
| apifyActorId | string | — | Apify actor ID to audit (e.g. RBobzxRYFVgoX74uu) |
| agentEndpoint | string | — | HTTP endpoint if not an Apify actor |
| testSuite | array | [] | Custom test cases (see below) |
| dimensions | array | all 5 | Dimensions to score |
| runs | integer | 3 | How many times to run each test (consistency scoring) |
| apifyToken | string | — | Your Apify token (required for Apify actor audits) |
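Checking an input object locally before starting a run can save a failed attempt. The following is a minimal sketch based on the rules the table implies: a target is given through `apifyActorId` or `agentEndpoint`, `apifyToken` accompanies an actor ID, and `runs` is a positive integer. `validateAuditInput` is a hypothetical helper, and the actor's own validation may differ.

```javascript
// Hypothetical pre-flight check; the actor's real validation rules are not documented here.
function validateAuditInput(input) {
  const errors = [];
  const { apifyActorId, agentEndpoint, apifyToken, runs = 3, testSuite = [] } = input;

  // A target is needed: either an Apify actor or a plain HTTP endpoint.
  if (!apifyActorId && !agentEndpoint) {
    errors.push('Provide apifyActorId or agentEndpoint.');
  }
  // The table marks apifyToken as required for Apify actor audits.
  if (apifyActorId && !apifyToken) {
    errors.push('apifyToken is required when auditing an Apify actor.');
  }
  if (!Number.isInteger(runs) || runs < 1) {
    errors.push('runs must be a positive integer.');
  }
  if (!Array.isArray(testSuite)) {
    errors.push('testSuite must be an array.');
  }
  return errors;
}

console.log(validateAuditInput({ agentName: 'My Agent', apifyActorId: 'YOUR_ACTOR_ID' }));
```

An empty array means the input passed these checks; the call above reports the missing token.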
Example — Audit an Apify actor
    {
      "agentName": "My Research Agent",
      "apifyActorId": "RBobzxRYFVgoX74uu",
      "apifyToken": "apify_api_YOUR_TOKEN",
      "runs": 3
    }
Example — Audit a specialized scraper (custom test suite)
    {
      "agentName": "My Smart Extractor",
      "apifyActorId": "IN4O5pGUjye34xW0O",
      "apifyToken": "apify_api_YOUR_TOKEN",
      "testSuite": [
        {
          "query": "Extract top stories from HN",
          "payload": {
            "urls": ["https://news.ycombinator.com"],
            "extractionPrompt": "Extract top 5 story titles and point scores",
            "outputFormat": "json"
          },
          "expectedKeywords": ["title", "points"],
          "minLength": 100
        }
      ],
      "runs": 2
    }
Example Output
    {
      "agentName": "XavvyNess Research Engine",
      "xaqsScore": 84,
      "grade": "A",
      "passedTests": 6,
      "totalTests": 9,
      "dimensions": {
        "completeness": { "score": 18, "max": 20, "notes": "Reports are thorough; one truncation detected on deep queries" },
        "accuracy": { "score": 17, "max": 20, "notes": "Sources cited correctly; one hallucinated URL in 9 runs" },
        "actionability": { "score": 16, "max": 20, "notes": "Key findings actionable; recommendations could be more specific" },
        "structure": { "score": 19, "max": 20, "notes": "Excellent use of headers and numbered citations" },
        "cost_efficiency": { "score": 14, "max": 20, "notes": "Occasional verbose preambles inflate token count" }
      },
      "topIssues": [
        "1/9 runs produced a hallucinated source URL — add URL validation",
        "Deep queries occasionally truncate before conclusion section",
        "Preamble text (2-3 sentences before ## Overview) adds no value — remove it"
      ],
      "recommendations": [
        "Add output validation to catch empty or truncated reports before returning",
        "Strip preamble text from model output using regex before pushing to dataset",
        "Increase max_tokens for deep queries from 3500 to 5000"
      ],
      "auditedAt": "2026-04-08T23:00:00.000Z",
      "agent": "XavvyNess Agent Quality Auditor"
    }
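One way to act on a report like this is to rank the dimensions by how many points they lost. The sketch below relies only on the `dimensions` shape shown in the example output; `rankByPointsLost` is a hypothetical helper, not something the actor provides.

```javascript
// Hypothetical helper: order dimensions from most to fewest points lost.
function rankByPointsLost(report) {
  return Object.entries(report.dimensions)
    .map(([name, d]) => ({ name, lost: d.max - d.score, notes: d.notes }))
    .sort((a, b) => b.lost - a.lost);
}

// Scores and notes copied from the example output above.
const report = {
  dimensions: {
    completeness: { score: 18, max: 20, notes: 'Reports are thorough; one truncation detected on deep queries' },
    accuracy: { score: 17, max: 20, notes: 'Sources cited correctly; one hallucinated URL in 9 runs' },
    actionability: { score: 16, max: 20, notes: 'Key findings actionable; recommendations could be more specific' },
    structure: { score: 19, max: 20, notes: 'Excellent use of headers and numbered citations' },
    cost_efficiency: { score: 14, max: 20, notes: 'Occasional verbose preambles inflate token count' },
  },
};

const ranked = rankByPointsLost(report);
console.log(`${ranked[0].name}: -${ranked[0].lost}`); // prints: cost_efficiency: -6
```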
Pricing
$0.02 per audit ($20.00 per 1,000 audits) — PAY_PER_RESULT, charged only on successful completion.
A standard audit (3 runs × 3 tests = 9 total agent calls) typically completes in 3–5 minutes. Failed runs are not charged.
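The same arithmetic extends to other configurations. This is a sketch, assuming the $0.02 fee is charged once per completed audit regardless of how many agent calls it makes, and that calls scale as runs × tests. `estimateAudit` is a hypothetical helper, and it does not account for any usage the audited agent may bill separately.

```javascript
// Price taken from the Pricing section: $0.02 per successfully completed audit.
const PRICE_PER_AUDIT_USD = 0.02;

// Hypothetical estimator; defaults mirror the "standard audit" (3 runs x 3 tests).
function estimateAudit({ runs = 3, tests = 3, audits = 1 } = {}) {
  const agentCallsPerAudit = runs * tests;
  return {
    agentCallsPerAudit,
    totalAgentCalls: agentCallsPerAudit * audits,
    // Round to cents to avoid floating-point noise.
    auditFeeUsd: Number((audits * PRICE_PER_AUDIT_USD).toFixed(2)),
  };
}

console.log(estimateAudit());                 // one standard audit
console.log(estimateAudit({ audits: 1000 })); // a batch of 1,000 audits
```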
Who Should Use This
- Apify builders — Audit your actors before publishing to the marketplace
- Businesses using AI agents — Verify quality before going to production
- Researchers — Benchmark agent quality systematically across providers
- Developers — Catch regressions when updating an agent
Custom Test Suites
The default test suite sends { query, depth, format }, which suits research-style agents. For specialized actors (scrapers, analyzers, extractors), use the testSuite field to define custom payload objects that match your actor's actual input schema.
Each test case supports:
- query: human-readable test name
- payload: exact input JSON sent to the actor (overrides default)
- expectedKeywords: words that must appear in the output
- minLength: minimum output length in characters
- maxLatencyMs: maximum acceptable run time
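To make the field semantics concrete, here is a rough sketch of how a single response could be checked against one test case. It is an illustration, not the auditor's actual logic: `checkTestCase` is a hypothetical function, and case-insensitive keyword matching is an assumption.

```javascript
// Hypothetical check of one agent response against one test case.
// Assumption: keywords are matched case-insensitively as substrings.
function checkTestCase(testCase, output, latencyMs) {
  const failures = [];
  const text = String(output);
  const haystack = text.toLowerCase();

  for (const keyword of testCase.expectedKeywords ?? []) {
    if (!haystack.includes(keyword.toLowerCase())) {
      failures.push(`missing keyword "${keyword}"`);
    }
  }
  if (testCase.minLength != null && text.length < testCase.minLength) {
    failures.push(`output has ${text.length} characters, expected at least ${testCase.minLength}`);
  }
  if (testCase.maxLatencyMs != null && latencyMs > testCase.maxLatencyMs) {
    failures.push(`run took ${latencyMs} ms, limit is ${testCase.maxLatencyMs} ms`);
  }
  return { query: testCase.query, passed: failures.length === 0, failures };
}

const testCase = {
  query: 'Extract top stories from HN',
  expectedKeywords: ['title', 'points'],
  minLength: 20,
  maxLatencyMs: 60000,
};

console.log(checkTestCase(testCase, '[{"title": "Example story", "points": 120}]', 4200));
```

Returning the list of failures, rather than a bare boolean, keeps the reason for each failed test available for the report.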
Integration
    import { ApifyClient } from 'apify-client';

    // Client authenticated with your Apify token
    const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

    // Start the auditor and wait for the run to finish
    const run = await client.actor('g1HQWHIikrGNlt7RF').call({
      agentName: 'My Agent',
      apifyActorId: 'YOUR_ACTOR_ID',
      apifyToken: 'apify_api_YOUR_TOKEN',
      runs: 3,
    });

    // Read the audit report from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`XAQS Score: ${items[0].xaqsScore}/100 — ${items[0].grade}`);
    console.log('Top issues:', items[0].topIssues);
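For regression checks, a report fetched this way can be compared against a stored baseline. The sketch below assumes both reports carry the `xaqsScore` and `dimensions` fields shown in Example Output; `findRegressions` and its `tolerance` parameter are hypothetical, not features of the actor.

```javascript
// Hypothetical comparison of two audit reports.
// Flags the overall score and any dimension that dropped by more than `tolerance` points.
function findRegressions(baseline, current, tolerance = 0) {
  const regressions = [];
  if (current.xaqsScore < baseline.xaqsScore - tolerance) {
    regressions.push({ name: 'xaqsScore', from: baseline.xaqsScore, to: current.xaqsScore });
  }
  for (const [name, base] of Object.entries(baseline.dimensions)) {
    const now = current.dimensions[name];
    if (now && now.score < base.score - tolerance) {
      regressions.push({ name, from: base.score, to: now.score });
    }
  }
  return regressions;
}

// Made-up reports for illustration only.
const baseline = { xaqsScore: 84, dimensions: { accuracy: { score: 17 }, structure: { score: 19 } } };
const current = { xaqsScore: 80, dimensions: { accuracy: { score: 13 }, structure: { score: 19 } } };

for (const r of findRegressions(baseline, current)) {
  console.log(`${r.name} dropped from ${r.from} to ${r.to}`);
}
```

In a CI job, a non-empty result could be used to fail the build before an updated agent is published.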
About XavvyNess
XavvyNess builds AI agents that do real work — and XAQS is how we hold them accountable. Run it on any agent, including ours.