AI Live Benchmark
Pricing
from $0.01 / 1,000 results
Actor that aggregates model benchmark data from multiple sources (including Artificial Analysis) and exposes LLM and media model scores (LLM indices, MMLU‑Pro, GPQA, HLE, LiveCodeBench, SciCode, Math‑500, AIME) plus ELO ratings for text‑to‑image, image‑editing, text‑to‑speech, and video models.
Developer: AIRabbit
AI Live Benchmark MCP Server
An MCP (Model Context Protocol) server that wraps the AI Live Benchmark API, providing access to AI model benchmarks, evaluations, and performance metrics.
Features
- LLM Models: Get benchmark scores, pricing, and speed metrics for language models
- Text-to-Image Models: ELO ratings for text-to-image generation models
- Image Editing Models: ELO ratings for image editing models
- Text-to-Speech Models: ELO ratings for text-to-speech models
- Text-to-Video Models: ELO ratings for text-to-video generation models
- Image-to-Video Models: ELO ratings for image-to-video models
- CritPt Evaluation: Evaluate code generation submissions against the CritPt benchmark
- JSONPath Filtering: Use JSONPath expressions to filter large datasets efficiently
Usage
Claude Desktop (via mcp-remote)
Users do not need an AI Live Benchmark API key, but an Apify API token is required. Add the following to your Claude Desktop config, replacing <APIFY_API_TOKEN> with your token:
{
  "mcpServers": {
    "ai-live-benchmark": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://flamboyant-leaf--ai-live-benchmark.apify.actor/mcp",
        "--header",
        "Authorization: Bearer <APIFY_API_TOKEN>"
      ]
    }
  }
}
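If you would rather generate this entry than edit it by hand, a minimal Python sketch might look like the following. The helper name `build_claude_config` is hypothetical; the server name, URL, and header format are copied from the config above. Where the Claude Desktop config file lives is not covered here.

```python
import json

# URL copied from the config shown above.
MCP_URL = "https://flamboyant-leaf--ai-live-benchmark.apify.actor/mcp"


def build_claude_config(apify_token: str) -> dict:
    """Return the mcpServers entry from the README with the token filled in.

    `build_claude_config` is a hypothetical helper, not part of the server.
    """
    if not apify_token:
        raise ValueError("An Apify API token is required")
    return {
        "mcpServers": {
            "ai-live-benchmark": {
                "command": "npx",
                "args": [
                    "mcp-remote",
                    MCP_URL,
                    "--header",
                    f"Authorization: Bearer {apify_token}",
                ],
            }
        }
    }


if __name__ == "__main__":
    # Prints the JSON to paste into the Claude Desktop config file.
    print(json.dumps(build_claude_config("<APIFY_API_TOKEN>"), indent=2))
```

Building the structure as a dict and serializing it with `json.dumps` avoids quoting mistakes in the `Authorization` header.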
Optional: Run with mcp-remote
mcp-remote \
  https://flamboyant-leaf--ai-live-benchmark.apify.actor/mcp \
  --header "Authorization: Bearer <APIFY_API_TOKEN>"
Available Tools
get_llm_models
Get LLM models with benchmark scores, pricing, and speed metrics.
Output Format:
{
  "status": 200,
  "data": [
    {
      "id": "string",
      "name": "string",
      "slug": "string",
      "model_creator": {
        "id": "string",
        "name": "string",
        "slug": "string"
      },
      "evaluations": {
        "artificial_analysis_intelligence_index": 62.9,
        "artificial_analysis_coding_index": 55.8,
        "artificial_analysis_math_index": 87.2,
        "mmlu_pro": 0.791,
        "gpqa": 0.748,
        "hle": 0.087,
        "livecodebench": 0.717,
        "scicode": 0.399,
        "math_500": 0.973,
        "aime": 0.77
      },
      "pricing": {
        "price_1m_blended_3_to_1": 1.925,
        "price_1m_input_tokens": 1.1,
        "price_1m_output_tokens": 4.4
      },
      "median_output_tokens_per_second": 153.831,
      "median_time_to_first_token_seconds": 14.939
    }
  ]
}
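The pricing fields in the sample are consistent with `price_1m_blended_3_to_1` being a 3:1 weighted average of the input and output prices. This is an inference from the field name and the sample values, not something the API documentation here states. The sketch below checks the arithmetic; the function name is hypothetical.

```python
def blended_price_3_to_1(input_price: float, output_price: float) -> float:
    """Weighted average assuming 3 input tokens for every 1 output token.

    Assumption: this is what "blended_3_to_1" means; the README does not say.
    """
    return (3 * input_price + output_price) / 4


# Pricing values copied from the sample output above.
sample_pricing = {
    "price_1m_blended_3_to_1": 1.925,
    "price_1m_input_tokens": 1.1,
    "price_1m_output_tokens": 4.4,
}

computed = blended_price_3_to_1(
    sample_pricing["price_1m_input_tokens"],
    sample_pricing["price_1m_output_tokens"],
)

# (3 * 1.1 + 4.4) / 4 = 7.7 / 4 = 1.925, matching the sample up to float rounding.
assert abs(computed - sample_pricing["price_1m_blended_3_to_1"]) < 1e-9
```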
Parameters:
- jsonPath (string, optional): JSONPath expression to filter results. Default: $.data[*]
JSONPath Examples:
- $.data[?(@.model_creator.slug=="openai")]: Models created by OpenAI
- $.data[?(@.rank <= 10)]: Top 10 models by rank (requires a rank field, which the sample LLM output above does not show)
- $.data[?(@.evaluations.artificial_analysis_intelligence_index > 80)]: Models with an intelligence index above 80
- $.data[?(@.pricing.price_1m_input_tokens < 1)]: Models with an input price below $1 per million tokens
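To make clear what these filter expressions select, here is a rough client-side equivalent in plain Python, applied to an already-fetched `data` array. It is a sketch, not how the server evaluates JSONPath. The function name and the three sample records are made up for illustration.

```python
from typing import Any, Optional

Model = dict[str, Any]


def filter_llm_models(
    models: list[Model],
    creator_slug: Optional[str] = None,
    max_input_price: Optional[float] = None,
) -> list[Model]:
    """Keep models matching every filter that is set.

    Mirrors $.data[?(@.model_creator.slug=="..." && @.pricing.price_1m_input_tokens < N)].
    Records that lack a field never match a filter on that field.
    """
    selected = []
    for model in models:
        creator = (model.get("model_creator") or {}).get("slug")
        price = (model.get("pricing") or {}).get("price_1m_input_tokens")
        if creator_slug is not None and creator != creator_slug:
            continue
        if max_input_price is not None and (price is None or price >= max_input_price):
            continue
        selected.append(model)
    return selected


# Made-up records for illustration only; not real benchmark data.
sample = [
    {"name": "model-a", "model_creator": {"slug": "openai"},
     "pricing": {"price_1m_input_tokens": 0.5}},
    {"name": "model-b", "model_creator": {"slug": "openai"},
     "pricing": {"price_1m_input_tokens": 2.0}},
    {"name": "model-c", "model_creator": {"slug": "example-lab"},
     "pricing": {"price_1m_input_tokens": 0.2}},
]

cheap_openai = filter_llm_models(sample, creator_slug="openai", max_input_price=1)
print([m["name"] for m in cheap_openai])  # prints ['model-a']
```

Filtering on the server with `jsonPath` is preferable for large result sets, since it keeps the unfiltered data out of the response.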
get_text_to_image_models
Get text-to-image models with ELO ratings.
Output Format:
{
  "status": 200,
  "data": [
    {
      "id": "string",
      "name": "string",
      "slug": "string",
      "model_creator": {
        "id": "string",
        "name": "string"
      },
      "elo": 1250,
      "rank": 1,
      "ci95": "-5/+5",
      "appearances": 5432,
      "release_date": "2025-04",
      "categories": [
        {
          "style_category": "General & Photorealistic",
          "subject_matter_category": "People: Portraits",
          "elo": 1280,
          "ci95": "-5/+5",
          "appearances": 1234
        }
      ]
    }
  ]
}
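The `ci95` field arrives as a string such as "-5/+5". The sketch below turns it into numeric bounds, assuming the two parts are lower and upper offsets from `elo` for a 95% confidence interval. That reading is inferred from the field name and sample value and is not confirmed by the documentation here. Both function names are hypothetical.

```python
def elo_interval(elo: float, ci95: str) -> tuple[float, float]:
    """Convert an offset string like "-5/+5" into (lower, upper) ELO bounds.

    Assumption: the offsets are relative to `elo`.
    """
    lower_text, upper_text = ci95.split("/")
    lower = elo + float(lower_text)  # "-5" parses to -5.0
    upper = elo + float(upper_text)  # "+5" parses to 5.0
    return lower, upper


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values from the sample output above: the model-level entry and its category entry.
overall = elo_interval(1250, "-5/+5")
category = elo_interval(1280, "-5/+5")
print(overall)                               # prints (1245.0, 1255.0)
print(intervals_overlap(overall, category))  # prints False
```

When two models' intervals overlap, the difference in their ELO ratings may not be meaningful, so a check like this can be useful before reporting one model as ranked above another.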
Parameters:
- jsonPath (string, optional): JSONPath expression to filter results. Default: $.data[*]
- includeCategories (boolean, optional): Include category breakdowns
JSONPath Examples:
- $.data[?(@.model_creator.name=="OpenAI")]: Filter by creator
- $.data[?(@.rank <= 5)]: Top 5 models
- $.data[?(@.elo >= 1200)]: Models with ELO >= 1200
get_image_editing_models
Get image editing models with ELO ratings. Same output format as text-to-image (without categories).
Parameters:
- jsonPath (string, optional): JSONPath expression to filter results
get_text_to_speech_models
Get text-to-speech models with ELO ratings. Same output format as text-to-image (without categories).
Parameters:
- jsonPath (string, optional): JSONPath expression to filter results
get_text_to_video_models
Get text-to-video models with ELO ratings. Same output format as text-to-image.
Parameters:
- jsonPath (string, optional): JSONPath expression to filter results
- includeCategories (boolean, optional): Include category breakdowns
get_image_to_video_models
Get image-to-video models with ELO ratings. Same output format as text-to-image.
Parameters:
- jsonPath (string, optional): JSONPath expression to filter results
- includeCategories (boolean, optional): Include category breakdowns
evaluate_critpt
Evaluate code generation submissions against the CritPt benchmark.
Parameters:
- submissions (array, required): Array of submission objects, each with:
  - problem_id (string): CritPt problem identifier
  - generated_code (string): Generated code
  - model (string): Model name/identifier
  - generation_config (object): Generation configuration
  - messages (array, optional): Message objects
- batchMetadata (object, optional): Batch metadata
Note: Must include submissions for all problems in the public set.
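Because a batch must cover every problem in the public set, it can help to check coverage before calling the tool. The sketch below builds submission objects with the fields listed above and reports any missing problem IDs. The helper names, the problem IDs, the model name, and the contents of `generation_config` are all placeholders; the real values come from the CritPt public set and your own setup.

```python
from typing import Any, Optional


def build_submission(
    problem_id: str,
    generated_code: str,
    model: str,
    generation_config: dict[str, Any],
    messages: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble one submission object using the documented field names."""
    submission: dict[str, Any] = {
        "problem_id": problem_id,
        "generated_code": generated_code,
        "model": model,
        "generation_config": generation_config,
    }
    if messages is not None:  # messages is optional, so omit it when unused
        submission["messages"] = messages
    return submission


def missing_problem_ids(
    submissions: list[dict[str, Any]], public_problem_ids: list[str]
) -> list[str]:
    """Return public-set problem IDs that have no submission, in input order."""
    submitted = {s["problem_id"] for s in submissions}
    return [pid for pid in public_problem_ids if pid not in submitted]


# Placeholder IDs and config; not real CritPt identifiers or settings.
public_ids = ["problem-1", "problem-2", "problem-3"]
config = {"temperature": 0.0}
batch = [
    build_submission("problem-1", "print(1)", "example-model", config),
    build_submission("problem-3", "print(3)", "example-model", config),
]

print(missing_problem_ids(batch, public_ids))  # prints ['problem-2']
```

Given the low CritPt rate limit, rejecting an incomplete batch locally avoids spending a request on a submission that cannot be accepted.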
JSONPath Filtering
All model endpoints support JSONPath expressions to filter results. This allows you to efficiently query large datasets without fetching everything.
JSONPath Examples
Filter by creator:
{"jsonPath": "$.data[?(@.model_creator.slug=='openai')]"}
Filter by rank:
{"jsonPath": "$.data[?(@.rank <= 10)]"}
Filter by score/ELO:
{"jsonPath": "$.data[?(@.elo >= 1200)]"}
Filter by evaluation metric (LLMs):
{"jsonPath": "$.data[?(@.evaluations.artificial_analysis_intelligence_index > 80)]"}
Combine filters:
{"jsonPath": "$.data[?(@.model_creator.slug=='openai' && @.rank <= 5)]"}
Get specific fields:
{"jsonPath": "$.data[*].name"}
For more JSONPath syntax, see: https://jsonpath.com/
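For clients that talk to the server directly rather than through Claude Desktop, the filter is passed as a tool argument. The sketch below builds the combined filter from the examples above and wraps it in a JSON-RPC 2.0 `tools/call` message, the request shape MCP uses for tool invocation. It only constructs the payload; transport, the MCP initialization handshake, and authentication are omitted, and both helper names are hypothetical.

```python
import json
from typing import Any


def creator_rank_filter(creator_slug: str, max_rank: int) -> str:
    """Build the combined creator-and-rank filter shown in the examples above."""
    if "'" in creator_slug:
        # A quote in the slug would break the quoted string inside the expression.
        raise ValueError("creator_slug must not contain a single quote")
    return (
        f"$.data[?(@.model_creator.slug=='{creator_slug}' "
        f"&& @.rank <= {int(max_rank)})]"
    )


def tools_call_request(
    request_id: int, tool_name: str, arguments: dict[str, Any]
) -> dict[str, Any]:
    """Return a JSON-RPC 2.0 message invoking an MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(
    1,
    "get_text_to_image_models",
    {"jsonPath": creator_rank_filter("openai", 5)},
)
print(json.dumps(request, indent=2))
```

Serializing with `json.dumps` handles escaping of the expression, which is easy to get wrong when writing the JSON string by hand.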
API Rate Limits
The AI Live Benchmark API is rate-limited to:
- Data API: 1,000 requests per day
- CritPt Evaluation API: 10 requests per 24-hour window (custom limits available)
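A client can track its own usage to avoid hitting these limits. The sketch below is a simple rolling-window counter. It assumes both limits behave as rolling 24-hour windows, which the text states only for the CritPt limit; the Data API's "per day" could instead reset at a fixed time. The class name `RequestBudget` is hypothetical, and the server's own accounting remains authoritative.

```python
import time
from collections import deque
from typing import Callable


class RequestBudget:
    """Client-side counter allowing at most `max_requests` per rolling window."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._timestamps: deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if the budget is spent."""
        now = self._clock()
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self._window:
            self._timestamps.popleft()
        if len(self._timestamps) < self._max:
            self._timestamps.append(now)
            return True
        return False


# Limits taken from the list above.
data_api_budget = RequestBudget(max_requests=1000)
critpt_budget = RequestBudget(max_requests=10)

if critpt_budget.try_acquire():
    pass  # safe to send one CritPt evaluation request here
```

The injectable `clock` parameter exists so the window logic can be tested without waiting 24 hours. Note that the counter lives in memory, so it resets when the process restarts.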
Attribution
When using this MCP server or the AI Live Benchmark API, follow your provider's attribution requirements.
License
MIT
API Documentation
For full API documentation, see your provider's documentation.