Llm Benchmarks

Pricing: from $1.00 / 1,000 results


Unified LLM data — pricing, benchmarks, specs, and local deployment info for 300+ models. Compare cost, Open LLM Leaderboard scores, Arena ratings, context lengths, GGUF availability, and VRAM estimates in one dataset.


Developer: David Flagg (Maintained by Community)

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 0
Last modified: 4 days ago

LLM Benchmark Aggregator

The LLM Field Guide. Pricing, benchmarks, specs, and local deployment data for 300+ language models in one dataset.

What you get

Every model includes up to 30 fields across four categories:

Cost

  • Input/output pricing per million tokens (from OpenRouter)
  • Compare across 300+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more
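Per-request cost follows directly from these per-million-token prices. A minimal sketch; the token counts are illustrative, and the prices are taken from the example output below:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_per_mtok: float, out_per_mtok: float) -> float:
    """Cost of a single request given per-million-token prices."""
    return input_tokens / 1e6 * in_per_mtok + output_tokens / 1e6 * out_per_mtok

# Llama 3.1 70B at $0.52 input / $0.75 output per MTok:
print(round(request_cost_usd(2_000, 500, 0.52, 0.75), 6))
```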

Capability

  • Open LLM Leaderboard scores: Average, IFEval, BBH, MATH Lvl 5, GPQA, MUSR, MMLU-PRO
  • Chatbot Arena: MT-bench (multi-turn conversation quality), MMLU
  • Model type, architecture, parameter count

Specs

  • Context window length and max completion tokens
  • Input/output modalities (text, image, audio)
  • Tokenizer type, instruction format
  • Content moderation status

Local deployment

  • GGUF availability on HuggingFace (can you run it locally?)
  • Estimated VRAM at Q4_K_M quantization
  • HuggingFace model ID for direct download
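The VRAM figure can be approximated from parameter count alone. A minimal sketch; the ~4.97 effective bits per weight is an assumption back-derived from the example output below (70.55B → 43.8 GB), not a documented formula, and real usage needs extra headroom for KV cache and activations:

```python
def estimate_vram_q4_gb(parameter_count_b: float,
                        effective_bits_per_weight: float = 4.97) -> float:
    """Rough VRAM (GB) to load a model's weights at Q4_K_M quantization.

    effective_bits_per_weight is an assumed constant folding in Q4_K_M's
    mixed 4/6-bit blocks plus scale metadata; it is inferred from the
    dataset's example record, not taken from the actor's source.
    """
    return round(parameter_count_b * effective_bits_per_weight / 8, 1)

print(estimate_vram_q4_gb(70.55))  # Llama 3.1 70B
```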

No API keys required. All data comes from public APIs.

Data sources

Source                 | Data                                    | Method
OpenRouter             | Pricing, context, modality, 300+ models | REST API
Open LLM Leaderboard   | 6 benchmark scores, 4,500+ models       | HuggingFace Datasets API
Chatbot Arena (LMSYS)  | MT-bench, MMLU, 300+ models             | HuggingFace Space CSV
HuggingFace            | GGUF model availability                 | Models API

Models from OpenRouter are automatically enriched with benchmark scores and GGUF availability where matches exist.
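The enrichment step amounts to a dictionary join keyed on the HuggingFace model id. A hypothetical sketch; the actor's actual matching logic is not documented, and case-insensitive id matching is an assumption:

```python
def enrich(openrouter_models: list[dict], benchmark_scores: dict[str, dict]) -> list[dict]:
    """Merge benchmark-score dicts onto OpenRouter model records,
    keyed by lowercased HuggingFace id. Records with no match pass
    through unchanged."""
    by_id = {hf_id.lower(): scores for hf_id, scores in benchmark_scores.items()}
    for model in openrouter_models:
        hf_id = (model.get("huggingface_id") or "").lower()
        model.update(by_id.get(hf_id, {}))
    return openrouter_models

models = [{"model_id": "meta-llama/llama-3.1-70b-instruct",
           "huggingface_id": "meta-llama/Llama-3.1-70B-Instruct"}]
scores = {"meta-llama/Llama-3.1-70B-Instruct": {"open_llm_average": 42.8}}
print(enrich(models, scores)[0]["open_llm_average"])
```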

Example output

{
  "model_id": "meta-llama/llama-3.1-70b-instruct",
  "model_name": "Meta: Llama 3.1 70B Instruct",
  "huggingface_id": "meta-llama/Llama-3.1-70B-Instruct",
  "pricing_input_per_mtok": 0.52,
  "pricing_output_per_mtok": 0.75,
  "context_length": 131072,
  "open_llm_average": 42.8,
  "open_llm_ifeval": 83.6,
  "open_llm_bbh": 55.3,
  "open_llm_math": 26.4,
  "open_llm_gpqa": 18.7,
  "open_llm_mmlu_pro": 50.8,
  "arena_mt_bench": 8.42,
  "parameter_count_b": 70.55,
  "gguf_available": true,
  "estimated_vram_q4_gb": 43.8,
  "is_moderated": false,
  "license": "llama3.1",
  "sources": ["openrouter", "open_llm_leaderboard", "chatbot_arena"]
}

Filtering

  • Model name — Search by name or ID ('llama', 'claude', 'qwen')
  • Max price — Only models under a price threshold ($/MTok)
  • Min benchmark score — Only models above a quality threshold
  • Model type — Pretrained, chat, fine-tuned, merged, MoE
  • Sort by any field — price, benchmark, context length, parameter count
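The same filters are easy to reproduce client-side once you have the dataset items. A minimal sketch; field names come from the example output above, the sample records are illustrative, and how records with missing fields are treated is our assumption, not documented actor behavior:

```python
def filter_models(records, name=None, max_price=None, min_score=None,
                  sort_key="pricing_input_per_mtok"):
    """Client-side version of the actor's filters over dataset records.

    Records missing a filtered field are dropped by that filter
    (an assumption); results are sorted ascending by sort_key.
    """
    out = []
    for r in records:
        haystack = (r.get("model_id", "") + " " + r.get("model_name", "")).lower()
        if name and name.lower() not in haystack:
            continue
        price = r.get("pricing_input_per_mtok")
        if max_price is not None and (price is None or price > max_price):
            continue
        score = r.get("open_llm_average")
        if min_score is not None and (score is None or score < min_score):
            continue
        out.append(r)
    return sorted(out, key=lambda r: r.get(sort_key) or 0)

models = [
    {"model_id": "meta-llama/llama-3.1-70b-instruct",
     "model_name": "Meta: Llama 3.1 70B Instruct",
     "pricing_input_per_mtok": 0.52, "open_llm_average": 42.8},
    {"model_id": "openai/gpt-4o", "model_name": "OpenAI: GPT-4o",
     "pricing_input_per_mtok": 2.50, "open_llm_average": None},
]
print([m["model_id"] for m in filter_models(models, name="llama", min_score=40)])
```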

Use cases

  • Model selection — Find the best model for your budget and use case
  • Cost optimization — Compare pricing across providers for the same capability level
  • Local deployment planning — Which models can you run on your hardware?
  • API integration — Feed structured model data into your own tools and dashboards
  • Market intelligence — Track the LLM landscape as models and prices change daily