Pricing

Pay per usage

AI Agent Interaction Analyzer

Evaluate AI agent conversations for quality and bias to find where they can be improved. Uses DeepEval metrics for rigorous LLM-powered analysis, or free heuristic scoring.

Developer

Rams

Last modified: 3 days ago

What It Does

Feed in AI conversations (prompts + responses) and get back detailed evaluation scores across multiple dimensions. Ideal for AI developers, researchers, and teams building LLM-powered products who need to monitor and improve their AI outputs.

Use Cases

Quality monitoring — Score AI responses for relevance, coherence, helpfulness, and completeness
Bias detection — Identify confirmation bias, gender bias, racial bias, and other fairness issues
Hallucination checking — Detect when AI fabricates facts not grounded in provided context
Toxicity screening — Flag harmful or inappropriate language in AI outputs
Model comparison — Compare response quality across different models or prompt versions
Regression testing — Track quality over time as you update prompts or switch models
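For model comparison and regression testing, one simple approach is to compare mean quality scores between two runs, for example the same conversations answered by two prompt versions. This sketch assumes heuristic-mode results shaped like the sample output in the Output section (a `quality.overall` field per conversation); the helper name `compare_runs` and the 0.02 tolerance are hypothetical choices, not part of the Actor.

```python
from statistics import mean


def compare_runs(baseline, candidate, tolerance=0.02):
    """Compare mean overall quality between two evaluation runs (hypothetical helper).

    Each run is a list of heuristic-mode result dicts containing
    result["quality"]["overall"]. Returns (delta, regressed), where regressed
    is True if the candidate's mean is lower than the baseline's by more
    than `tolerance`.
    """
    base = mean(r["quality"]["overall"] for r in baseline)
    cand = mean(r["quality"]["overall"] for r in candidate)
    delta = cand - base
    return delta, delta < -tolerance


# Illustrative scores for the same two conversations under two prompt versions
baseline = [
    {"conversation_id": "conv_001", "quality": {"overall": 0.80}},
    {"conversation_id": "conv_002", "quality": {"overall": 0.70}},
]
candidate = [
    {"conversation_id": "conv_001", "quality": {"overall": 0.72}},
    {"conversation_id": "conv_002", "quality": {"overall": 0.68}},
]
delta, regressed = compare_runs(baseline, candidate)
```

Comparing per-conversation scores as well as the mean would also catch regressions that cancel out in aggregate.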

Evaluation Modes

Mode	Cost	What You Get
heuristic	Free (no API key needed)	Fast scoring using text analysis — relevance, coherence, helpfulness, completeness, keyword-based bias detection
deepeval	Uses your OpenAI API key	Rigorous LLM-as-judge metrics — answer relevancy, faithfulness, coherence, helpfulness, hallucination, bias, toxicity
full	Uses your OpenAI API key	Both heuristic and DeepEval results combined for a complete picture

Input

Provide input as a JSON object whose conversations field is an array. Each conversation needs an id and a messages array; the context field is optional:

{
  "conversations": [
    {
      "id": "conv_001",
      "messages": [
        {"role": "user", "content": "How do I implement caching in Redis?"},
        {"role": "assistant", "content": "Here's how to implement caching with Redis..."}
      ],
      "context": "Optional: ground truth or source documents for faithfulness/hallucination checks"
    }
  ],
  "mode": "heuristic",
  "openaiApiKey": "sk-... (required for deepeval/full mode only)",
  "modelName": "gpt-4o"
}
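If your conversations live in code rather than a file, a small helper can assemble this input. A minimal sketch, assuming single-turn (prompt, response) pairs with an optional third element for context; `build_input` and the `conv_001`-style ID scheme are illustrative, not required by the Actor.

```python
import json


def build_input(items, mode="heuristic", api_key=None, model_name="gpt-4o"):
    """Build an input object in the format shown above (hypothetical helper).

    Each item is (prompt, response) or (prompt, response, context).
    """
    conversations = []
    for i, item in enumerate(items, start=1):
        prompt, response, *rest = item
        conversation = {
            "id": f"conv_{i:03d}",  # illustrative ID scheme
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": response},
            ],
        }
        if rest and rest[0]:
            conversation["context"] = rest[0]  # enables faithfulness/hallucination checks
        conversations.append(conversation)

    payload = {"conversations": conversations, "mode": mode, "modelName": model_name}
    if api_key:
        payload["openaiApiKey"] = api_key  # only needed for deepeval/full mode
    return payload


payload = build_input([
    ("How do I implement caching in Redis?", "Here's how to implement caching with Redis..."),
])
print(json.dumps(payload, indent=2))
```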

Input Fields

Field	Required	Description
`conversations`	Yes, unless `conversationUrl` is set	Array of conversation objects to evaluate
`conversationUrl`	Alternative to `conversations`	URL to fetch conversation JSON from
`mode`	No (default: heuristic)	Evaluation mode: `heuristic`, `deepeval`, or `full`
`openaiApiKey`	For deepeval/full	Your OpenAI API key
`modelName`	No (default: gpt-4o)	Which OpenAI model to use for evaluation
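The table implies a few constraints worth checking before starting a run: either `conversations` or `conversationUrl`, a known `mode`, and an API key for `deepeval` or `full`. The sketch below encodes them as read from the table; `validate_input` is hypothetical, and the Actor's own validation may be stricter or looser.

```python
VALID_MODES = {"heuristic", "deepeval", "full"}


def validate_input(payload):
    """Return a list of problems found in an input object (empty if none)."""
    errors = []
    conversations = payload.get("conversations") or []
    if not conversations and not payload.get("conversationUrl"):
        errors.append("provide either conversations or conversationUrl")

    mode = payload.get("mode", "heuristic")  # heuristic is the documented default
    if mode not in VALID_MODES:
        errors.append(f"unknown mode: {mode!r}")
    elif mode != "heuristic" and not payload.get("openaiApiKey"):
        errors.append(f"openaiApiKey is required for mode {mode!r}")

    # Each conversation needs an id and a messages array
    for index, conv in enumerate(conversations):
        if "id" not in conv or not isinstance(conv.get("messages"), list):
            errors.append(f"conversation {index} needs an id and a messages array")
    return errors
```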

Output

Each conversation produces one structured evaluation result, pushed to the dataset. Example results for the heuristic and deepeval modes:

Heuristic Mode Output

{
  "conversation_id": "conv_001",
  "quality": {
    "overall": 0.812,
    "relevance": 1.0,
    "coherence": 0.85,
    "helpfulness": 0.5,
    "completeness": 0.9
  },
  "bias": {
    "toxicity": 0.0,
    "bias_detected": false,
    "categories": []
  }
}
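In this sample, `overall` (0.812) matches the unweighted mean of the four quality scores, (1.0 + 0.85 + 0.5 + 0.9) / 4 = 0.8125, shown to three decimals. The sketch below assumes that is how it is computed; the README does not state the formula, so treat it as an inference from the example.

```python
QUALITY_KEYS = ("relevance", "coherence", "helpfulness", "completeness")


def heuristic_overall(quality):
    """Unweighted mean of the four heuristic quality scores (inferred, not documented)."""
    return sum(quality[key] for key in QUALITY_KEYS) / len(QUALITY_KEYS)


sample = {"relevance": 1.0, "coherence": 0.85, "helpfulness": 0.5, "completeness": 0.9}
overall = heuristic_overall(sample)  # approximately 0.8125
```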

DeepEval Mode Output

{
  "conversation_id": "conv_001",
  "relevancy": {"score": 1.0, "reason": "...", "passed": true},
  "faithfulness": {"score": 0.8, "reason": "...", "passed": true},
  "coherence": {"score": 0.9, "reason": "...", "passed": true},
  "helpfulness": {"score": 0.85, "reason": "...", "passed": true},
  "hallucination": {"score": 0.0, "reason": "...", "passed": true},
  "bias": {"score": 0.0, "reason": "...", "passed": true},
  "toxicity": {"score": 0.0, "reason": "...", "passed": true},
  "overall": 0.636
}
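Hallucination, bias, and toxicity follow DeepEval's lower-is-better convention, so their 0.0 scores above are passing results. Because the direction differs by metric, filtering on each metric's `passed` flag is simpler than comparing raw scores. A sketch, assuming results shaped like the sample above; `failing_metrics` and the second sample result are hypothetical.

```python
DEEPEVAL_METRICS = ("relevancy", "faithfulness", "coherence", "helpfulness",
                    "hallucination", "bias", "toxicity")


def failing_metrics(result):
    """Names of metrics whose `passed` flag is False in one DeepEval-mode result."""
    return [
        name for name in DEEPEVAL_METRICS
        if name in result and not result[name].get("passed", False)
    ]


# Hypothetical result in which the toxicity check failed
result = {
    "conversation_id": "conv_002",
    "relevancy": {"score": 0.9, "reason": "...", "passed": True},
    "toxicity": {"score": 0.6, "reason": "...", "passed": False},
    "overall": 0.5,
}
```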

Metrics Explained

Heuristic Metrics (Free)

Relevance — Does the response use terms from the user's question?
Coherence — Is the response well-structured with clear formatting?
Helpfulness — Does it contain actionable content (examples, code, steps)?
Completeness — Is the response proportionally thorough relative to the question?
Bias categories — Detects confirmation, gender, racial, and age bias patterns
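To illustrate what a term-overlap relevance check can look like, here is a toy version: the fraction of the question's content words that also appear in the response. This is a hypothetical sketch, not the Actor's implementation; the stopword list and tokenization are arbitrary choices.

```python
import re

# Arbitrary, deliberately tiny stopword list for illustration
STOPWORDS = {"a", "an", "the", "how", "do", "i", "in", "to", "of", "is", "what", "and", "for"}


def term_overlap_relevance(question, response):
    """Share of the question's non-stopword terms that also occur in the response."""
    def terms(text):
        return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

    question_terms = terms(question)
    if not question_terms:
        return 0.0
    return len(question_terms & terms(response)) / len(question_terms)


score = term_overlap_relevance(
    "How do I implement caching in Redis?",
    "Here's how to implement caching with Redis...",
)
```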

DeepEval Metrics (LLM-Powered)

Answer Relevancy — Does the response actually answer what was asked?
Faithfulness — Is the response grounded in the provided context? (requires context field)
Coherence — Is it logically structured and easy to follow?
Helpfulness — Does it provide actionable, useful information?
Hallucination — Does it fabricate facts not in the context? (requires context field)
Bias — Does it contain biased opinions or unfair statements?
Toxicity — Does it contain toxic or harmful language?
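DeepEval scores an `LLMTestCase`, where the faithfulness metric reads `retrieval_context` and the hallucination metric reads `context`, both lists of strings. The sketch below shows one plausible mapping from a conversation in this Actor's input format to those fields, assuming only the last user/assistant turn is judged; the Actor may handle multi-turn conversations differently, and `to_test_case_fields` is hypothetical.

```python
def to_test_case_fields(conversation):
    """Map one input conversation to keyword arguments for deepeval's LLMTestCase.

    Assumes the last user message is the input and the last assistant message
    is the output being judged. With deepeval installed, the result could be
    passed as LLMTestCase(**fields) (from deepeval.test_case import LLMTestCase).
    """
    messages = conversation["messages"]
    last_user = next((m["content"] for m in reversed(messages) if m["role"] == "user"), None)
    last_reply = next((m["content"] for m in reversed(messages) if m["role"] == "assistant"), None)
    if last_user is None or last_reply is None:
        raise ValueError(f"{conversation.get('id')}: need a user and an assistant message")

    fields = {"input": last_user, "actual_output": last_reply}
    context = conversation.get("context")
    if context:
        fields["context"] = [context]            # read by HallucinationMetric
        fields["retrieval_context"] = [context]  # read by FaithfulnessMetric
    return fields
```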

Tips

Start with heuristic mode to quickly screen large batches at zero cost
Use deepeval mode for detailed analysis of important conversations
Add a context field to your conversations to enable faithfulness and hallucination checks
Set modelName to gpt-4o-mini for cheaper deepeval runs, at the cost of somewhat lower accuracy
Export results as CSV from the Dataset tab for spreadsheet analysis
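The first two tips combine into a two-pass workflow: screen everything in heuristic mode, then re-submit only weak or flagged conversations in deepeval mode. A sketch, assuming heuristic results shaped like the sample output; the 0.7 threshold and the helper name are hypothetical.

```python
def select_for_deepeval(heuristic_results, conversations, threshold=0.7):
    """Return the conversations worth a second, LLM-judged pass.

    A conversation is selected if its heuristic overall score is below
    `threshold` or the heuristic bias check flagged it.
    """
    flagged = {
        r["conversation_id"]
        for r in heuristic_results
        if r["quality"]["overall"] < threshold
        or r.get("bias", {}).get("bias_detected", False)
    }
    return [c for c in conversations if c["id"] in flagged]
```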

Pricing

Heuristic mode: Apify platform compute costs only (minimal)
DeepEval/Full mode: Apify compute plus your OpenAI API usage (roughly $0.01 to $0.10 per conversation, depending on the model)
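Using the per-conversation range above, the OpenAI portion of a deepeval run scales linearly with the number of conversations evaluated. A back-of-the-envelope sketch; the 10,000-conversation batch and 5% escalation rate are hypothetical numbers, and Apify compute is not included.

```python
def openai_cost_range(n_conversations, low=0.01, high=0.10):
    """Rough (min, max) OpenAI cost in USD, using the $0.01 to $0.10 per-conversation range."""
    return n_conversations * low, n_conversations * high


# Screen 10,000 conversations in heuristic mode (no OpenAI cost),
# then escalate the 5% that score poorly to deepeval mode
escalated = round(10_000 * 0.05)           # 500 conversations
low, high = openai_cost_range(escalated)   # about $5 to $50
```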

AI Training Data Quality MCP Server

ryanclinton/ai-training-data-quality-mcp

AI training data quality assessment, bias detection, and governance scoring for AI agents via the Model Context Protocol.

Ryan Clinton

AI Services 1

powerful_platypus/AI-services-1

AI agent

GOUNTANTE yendoukoa

Company Analysis AI Agent

harvestlabs/company-analysis-ai-agent

harvest-org

Llm Response Evaluator

fiery_dream/llm-response-evaluator

Evaluate LLM outputs with comprehensive quality metrics and A/B testing capabilities. Free alternative to Confident AI ($99/mo).

Cody Churchwell

AI Services

powerful_platypus/AI-services

AI agent for multiple roles

GOUNTANTE yendoukoa

AI Company Researcher Agent

louisdeconinck/ai-company-researcher-agent

AI-powered agent that performs comprehensive company research and generates detailed business reports.

Louis Deconinck

LinkedIn Agent

apexronin/linkedin-agent

A LinkedIn agent

Jensin

Genai Agents Scraper Api

fresh_cliff/genai-agents-scraper-api

Extract AI agent interactions, prompts, responses, decision flows. Track AI conversations, analyze performance metrics. Real-time AI data API with mirror fallbacks.

Brennan Crawford

Zillow Agent Data Scraper (Agent Listings, Reviews & Details)

coder_zoro/zillow-agent-data-scraper-agent-listings-reviews-details

Scrape complete Zillow agent data effortlessly. Get agent details, active/rental/sold listings, reviews, and search results with one API call. Ideal for real estate analytics, lead generation, and agent performance tracking.

Zoro

AliExpress Scraper + AI Analysis

buseta/aliexpress-advanced-scraper

AI-powered AliExpress scraper with fake review detection, sentiment analysis, and quality scoring. Scrape products, reviews, seller data, shipping info, and variants. 4 modes: Search, Category, Product URLs, Store. AI analysis powered by Claude — no API key needed.