LLM Benchmarks
Pricing
from $1.00 / 1,000 results
Unified LLM data — pricing, benchmarks, specs, and local deployment info for 300+ models. Compare cost, Open LLM Leaderboard scores, Arena ratings, context lengths, GGUF availability, and VRAM estimates in one dataset.
Developer: David Flagg
Last modified: 4 days ago
LLM Benchmark Aggregator
The LLM Field Guide. Pricing, benchmarks, specs, and local deployment data for 300+ language models in one dataset.
What you get
Every model includes up to 30 fields across four categories:
Cost
- Input/output pricing per million tokens (from OpenRouter)
- Compare across 300+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more
Capability
- Open LLM Leaderboard scores: Average, IFEval, BBH, MATH Lvl 5, GPQA, MUSR, MMLU-PRO
- Chatbot Arena: MT-bench (multi-turn conversation quality), MMLU
- Model type, architecture, parameter count
Specs
- Context window length and max completion tokens
- Input/output modalities (text, image, audio)
- Tokenizer type, instruction format
- Content moderation status
Local deployment
- GGUF availability on HuggingFace (whether you can run it locally)
- Estimated VRAM at Q4_K_M quantization
- HuggingFace model ID for direct download
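To illustrate how a Q4_K_M VRAM figure like this can be derived (this is a sketch of the general method, not necessarily the exact formula this dataset uses), note that Q4_K_M quantization averages roughly 4.85 bits per weight, plus some runtime overhead:

```python
def estimate_vram_q4_gb(parameter_count_b: float,
                        bits_per_weight: float = 4.85,
                        overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for running a Q4_K_M GGUF model.

    parameter_count_b: parameter count in billions.
    bits_per_weight: Q4_K_M averages ~4.85 bits/weight (assumption).
    overhead_gb: allowance for KV cache and runtime buffers (assumption).
    """
    weights_gb = parameter_count_b * bits_per_weight / 8  # GB for weights alone
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_q4_gb(70.55))  # 43.8 — matches the 70B example record below
```

With these assumed constants, a 70.55B-parameter model comes out at about 43.8 GB, in line with the `estimated_vram_q4_gb` field in the example output.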
No API keys required. All data comes from public APIs.
Data sources
| Source | Data | Method |
|---|---|---|
| OpenRouter | Pricing, context, modality, 300+ models | REST API |
| Open LLM Leaderboard | 6 benchmark scores, 4,500+ models | HuggingFace Datasets API |
| Chatbot Arena (LMSYS) | MT-bench, MMLU, 300+ models | HuggingFace Space CSV |
| HuggingFace | GGUF model availability | Models API |
Models from OpenRouter are automatically enriched with benchmark scores and GGUF availability where matches exist.
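A minimal sketch of what that enrichment step looks like, assuming records are matched on their HuggingFace ID (field names follow the example output; the matching key and helper are illustrative, not the actor's actual code):

```python
def enrich(openrouter_models, leaderboard_scores, gguf_ids):
    """Join OpenRouter records with benchmark scores and GGUF availability.

    openrouter_models: list of dicts, each with a 'huggingface_id' key.
    leaderboard_scores: dict mapping huggingface_id -> open_llm_* fields.
    gguf_ids: set of huggingface_ids that have GGUF quantizations.
    """
    enriched = []
    for model in openrouter_models:
        record = dict(model)  # don't mutate the source record
        hf_id = record.get("huggingface_id")
        scores = leaderboard_scores.get(hf_id)
        if scores:
            record.update(scores)  # add open_llm_* fields where a match exists
        record["gguf_available"] = hf_id in gguf_ids
        enriched.append(record)
    return enriched

# Illustrative data, not real benchmark numbers:
models = [{"model_id": "m/llama", "huggingface_id": "m/Llama-70B"}]
scores = {"m/Llama-70B": {"open_llm_average": 42.8}}
result = enrich(models, scores, {"m/Llama-70B"})
```

Models with no leaderboard match simply keep their OpenRouter fields, so partial records are expected.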
Example output
```json
{
  "model_id": "meta-llama/llama-3.1-70b-instruct",
  "model_name": "Meta: Llama 3.1 70B Instruct",
  "huggingface_id": "meta-llama/Llama-3.1-70B-Instruct",
  "pricing_input_per_mtok": 0.52,
  "pricing_output_per_mtok": 0.75,
  "context_length": 131072,
  "open_llm_average": 42.8,
  "open_llm_ifeval": 83.6,
  "open_llm_bbh": 55.3,
  "open_llm_math": 26.4,
  "open_llm_gpqa": 18.7,
  "open_llm_mmlu_pro": 50.8,
  "arena_mt_bench": 8.42,
  "parameter_count_b": 70.55,
  "gguf_available": true,
  "estimated_vram_q4_gb": 43.8,
  "is_moderated": false,
  "license": "llama3.1",
  "sources": ["openrouter", "open_llm_leaderboard", "chatbot_arena"]
}
```
Filtering
- Model name — Search by name or ID ('llama', 'claude', 'qwen')
- Max price — Only models under a price threshold ($/MTok)
- Min benchmark score — Only models above a quality threshold
- Model type — Pretrained, chat, fine-tuned, merged, MoE
- Sort by any field — price, benchmark, context length, parameter count
Use cases
- Model selection — Find the best model for your budget and use case
- Cost optimization — Compare pricing across providers for the same capability level
- Local deployment planning — Which models can you run on your hardware?
- API integration — Feed structured model data into your own tools and dashboards
- Market intelligence — Track the LLM landscape as models and prices change daily
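For the cost-optimization case, per-request cost follows directly from the per-million-token prices in each record, for example:

```python
def request_cost_usd(record, input_tokens, output_tokens):
    """Dollar cost of one request, given a model record from the dataset."""
    return (input_tokens * record["pricing_input_per_mtok"]
            + output_tokens * record["pricing_output_per_mtok"]) / 1_000_000

# Prices taken from the example record above
llama = {"pricing_input_per_mtok": 0.52, "pricing_output_per_mtok": 0.75}
print(round(request_cost_usd(llama, 10_000, 2_000), 4))  # 0.0067
```

Dividing a model's benchmark score by this figure gives a simple quality-per-dollar metric for ranking candidates.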