AI Live Benchmark

Pricing

from $0.01 / 1,000 results


Actor that aggregates model benchmark data from multiple sources (including Artificial Analysis) and exposes LLM and media model scores (LLM indices, MMLU‑Pro, GPQA, HLE, LiveCodeBench, SciCode, Math‑500, AIME) plus ELO ratings for text‑to‑image, image‑editing, text‑to‑speech, and video models.


Developer

AIRabbit

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 2 days ago


AI Live Benchmark MCP Server

An MCP (Model Context Protocol) server that wraps the AI Live Benchmark API, providing access to AI model benchmarks, evaluations, and performance metrics.

Features

  • LLM Models: Get benchmark scores, pricing, and speed metrics for language models
  • Text-to-Image Models: ELO ratings for text-to-image generation models
  • Image Editing Models: ELO ratings for image editing models
  • Text-to-Speech Models: ELO ratings for text-to-speech models
  • Text-to-Video Models: ELO ratings for text-to-video generation models
  • Image-to-Video Models: ELO ratings for image-to-video models
  • CritPt Evaluation: Evaluate code generation submissions against the CritPt benchmark
  • JSONPath Filtering: Use JSONPath expressions to filter large datasets efficiently

Usage

Claude Desktop (via mcp-remote)

No AI Live Benchmark API key is required, but you do need an Apify API token. Add this to your Claude Desktop config:

{
  "mcpServers": {
    "ai-live-benchmark": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://flamboyant-leaf--ai-live-benchmark.apify.actor/mcp",
        "--header",
        "Authorization: Bearer <APIFY_API_TOKEN>"
      ]
    }
  }
}

Optional: Run with mcp-remote

mcp-remote \
  https://flamboyant-leaf--ai-live-benchmark.apify.actor/mcp \
  --header "Authorization: Bearer <APIFY_API_TOKEN>"

Available Tools

get_llm_models

Get LLM models with benchmark scores, pricing, and speed metrics.

Output Format:

{
  "status": 200,
  "data": [
    {
      "id": "string",
      "name": "string",
      "slug": "string",
      "model_creator": {
        "id": "string",
        "name": "string",
        "slug": "string"
      },
      "evaluations": {
        "artificial_analysis_intelligence_index": 62.9,
        "artificial_analysis_coding_index": 55.8,
        "artificial_analysis_math_index": 87.2,
        "mmlu_pro": 0.791,
        "gpqa": 0.748,
        "hle": 0.087,
        "livecodebench": 0.717,
        "scicode": 0.399,
        "math_500": 0.973,
        "aime": 0.77
      },
      "pricing": {
        "price_1m_blended_3_to_1": 1.925,
        "price_1m_input_tokens": 1.1,
        "price_1m_output_tokens": 4.4
      },
      "median_output_tokens_per_second": 153.831,
      "median_time_to_first_token_seconds": 14.939
    }
  ]
}
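Judging by its name, price_1m_blended_3_to_1 appears to be a 3:1 input:output weighted average of the two per-token prices; a quick sanity check against the sample numbers above (an illustrative calculation, not live API data):

```python
# Sanity-check the blended price in the sample payload above:
# a 3:1 input:output weighting reproduces price_1m_blended_3_to_1.
input_price = 1.1    # price_1m_input_tokens (USD per 1M tokens)
output_price = 4.4   # price_1m_output_tokens

blended = round((3 * input_price + 1 * output_price) / 4, 3)
print(blended)  # 1.925, matching price_1m_blended_3_to_1
```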

Parameters:

  • jsonPath (string, optional): JSONPath expression to filter results. Default: $.data[*]

JSONPath Examples:

  • $.data[?(@.model_creator.slug=="openai")] - Filter by OpenAI models
  • $.data[?(@.rank <= 10)] - Top 10 models by rank
  • $.data[?(@.evaluations.artificial_analysis_intelligence_index > 80)] - Models with intelligence index > 80
  • $.data[?(@.pricing.price_1m_input_tokens < 1)] - Models with input price < $1/M tokens
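If your client ever needs the same selection locally, the filters above map directly onto plain-Python comprehensions over the tool's response. A sketch with made-up sample data shaped like the output format above:

```python
# Hypothetical response shaped like the get_llm_models output above
# (values are placeholders, not real benchmark data).
response = {
    "status": 200,
    "data": [
        {"name": "model-a", "model_creator": {"slug": "openai"},
         "pricing": {"price_1m_input_tokens": 0.5},
         "evaluations": {"artificial_analysis_intelligence_index": 85.0}},
        {"name": "model-b", "model_creator": {"slug": "other"},
         "pricing": {"price_1m_input_tokens": 2.0},
         "evaluations": {"artificial_analysis_intelligence_index": 60.0}},
    ],
}

# $.data[?(@.model_creator.slug=="openai")]
openai_models = [m for m in response["data"]
                 if m["model_creator"]["slug"] == "openai"]

# $.data[?(@.pricing.price_1m_input_tokens < 1)]
cheap_models = [m for m in response["data"]
                if m["pricing"]["price_1m_input_tokens"] < 1]

print([m["name"] for m in openai_models])  # ['model-a']
```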

get_text_to_image_models

Get text-to-image models with ELO ratings.

Output Format:

{
  "status": 200,
  "data": [
    {
      "id": "string",
      "name": "string",
      "slug": "string",
      "model_creator": {
        "id": "string",
        "name": "string"
      },
      "elo": 1250,
      "rank": 1,
      "ci95": "-5/+5",
      "appearances": 5432,
      "release_date": "2025-04",
      "categories": [
        {
          "style_category": "General & Photorealistic",
          "subject_matter_category": "People: Portraits",
          "elo": 1280,
          "ci95": "-5/+5",
          "appearances": 1234
        }
      ]
    }
  ]
}
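The ci95 field is a string like "-5/+5". A small helper (an illustrative sketch, not part of the API) can turn it into absolute bounds around the ELO:

```python
def elo_bounds(elo: int, ci95: str) -> tuple[float, float]:
    """Turn an ELO and a ci95 string like '-5/+5' into (low, high) bounds."""
    low_str, high_str = ci95.split("/")
    return elo + float(low_str), elo + float(high_str)

print(elo_bounds(1250, "-5/+5"))  # (1245.0, 1255.0)
```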

Parameters:

  • jsonPath (string, optional): JSONPath expression to filter results. Default: $.data[*]
  • includeCategories (boolean, optional): Include category breakdowns

JSONPath Examples:

  • $.data[?(@.model_creator.name=="OpenAI")] - Filter by creator
  • $.data[?(@.rank <= 5)] - Top 5 models
  • $.data[?(@.elo >= 1200)] - Models with ELO >= 1200

get_image_editing_models

Get image editing models with ELO ratings. Same output format as text-to-image (without categories).

Parameters:

  • jsonPath (string, optional): JSONPath expression to filter results

get_text_to_speech_models

Get text-to-speech models with ELO ratings. Same output format as text-to-image (without categories).

Parameters:

  • jsonPath (string, optional): JSONPath expression to filter results

get_text_to_video_models

Get text-to-video models with ELO ratings. Same output format as text-to-image.

Parameters:

  • jsonPath (string, optional): JSONPath expression to filter results
  • includeCategories (boolean, optional): Include category breakdowns

get_image_to_video_models

Get image-to-video models with ELO ratings. Same output format as text-to-image.

Parameters:

  • jsonPath (string, optional): JSONPath expression to filter results
  • includeCategories (boolean, optional): Include category breakdowns

evaluate_critpt

Evaluate code generation submissions against the CritPt benchmark.

Parameters:

  • submissions (array, required): Array of submission objects
    • problem_id (string): CritPt problem identifier
    • generated_code (string): Generated code
    • model (string): Model name/identifier
    • generation_config (object): Generation configuration
    • messages (array, optional): Message objects
  • batchMetadata (object, optional): Batch metadata

Note: Must include submissions for all problems in the public set.
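Based on the parameter list above, a submissions payload might be assembled like this. All field values here are illustrative placeholders (not a real problem ID, model, or complete public set):

```python
# Illustrative evaluate_critpt input built from the parameters listed above.
payload = {
    "submissions": [
        {
            "problem_id": "example-problem-1",   # CritPt problem identifier (placeholder)
            "generated_code": "def solve():\n    return 42\n",
            "model": "my-model-v1",              # model name/identifier (placeholder)
            "generation_config": {"temperature": 0.0},
            "messages": [],                      # optional
        }
    ],
    "batchMetadata": {"run_id": "batch-001"},    # optional (placeholder keys)
}
```

Remember that a valid batch must cover every problem in the public set, so a real payload would contain one entry per problem.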

JSONPath Filtering

All model endpoints support JSONPath expressions to filter results. This allows you to efficiently query large datasets without fetching everything.

JSONPath Examples

Filter by creator:

{
  "jsonPath": "$.data[?(@.model_creator.slug=='openai')]"
}

Filter by rank:

{
  "jsonPath": "$.data[?(@.rank <= 10)]"
}

Filter by score/ELO:

{
  "jsonPath": "$.data[?(@.elo >= 1200)]"
}

Filter by evaluation metric (LLMs):

{
  "jsonPath": "$.data[?(@.evaluations.artificial_analysis_intelligence_index > 80)]"
}

Combine filters:

{
  "jsonPath": "$.data[?(@.model_creator.slug=='openai' && @.rank <= 5)]"
}

Get specific fields:

{
  "jsonPath": "$.data[*].name"
}

For more JSONPath syntax, see: https://jsonpath.com/

API Rate Limits

The AI Live Benchmark API is rate-limited to:

  • Data API: 1,000 requests per day
  • CritPt Evaluation API: 10 requests per 24-hour window (custom limits available)

Attribution

When using this MCP server or the AI Live Benchmark API, follow your provider's attribution requirements.

License

MIT

API Documentation

For full API documentation, see your provider's documentation.