LLM Benchmarks
Pricing
from $1.00 / 1,000 results
Unified LLM data — pricing, benchmarks, specs, and local deployment info for 300+ models. Compare cost, Open LLM Leaderboard scores, Arena ratings, context lengths, GGUF availability, and VRAM estimates in one dataset.
Developer: David Flagg
Last modified: 4 days ago
LLM Benchmark Aggregator
The LLM Field Guide. Pricing, benchmarks, specs, and local deployment data for 300+ language models in one dataset.
What you get
Every model includes up to 30 fields across four categories:
Cost
- Input/output pricing per million tokens (from OpenRouter)
- Compare across 300+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more
Capability
- Open LLM Leaderboard scores: Average, IFEval, BBH, MATH Lvl 5, GPQA, MUSR, MMLU-PRO
- Chatbot Arena: MT-bench (multi-turn conversation quality), MMLU
- Model type, architecture, parameter count
Specs
- Context window length and max completion tokens
- Input/output modalities (text, image, audio)
- Tokenizer type, instruction format
- Content moderation status
Local deployment
- GGUF availability on HuggingFace (whether you can run it locally)
- Estimated VRAM at Q4_K_M quantization
- HuggingFace model ID for direct download
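To illustrate how a Q4_K_M VRAM figure like this can be derived (this is a sketch of the general method, not necessarily the exact formula this dataset uses), note that Q4_K_M quantization averages roughly 4.85 bits per weight, plus some runtime overhead:

```python
def estimate_vram_q4_gb(parameter_count_b: float,
                        bits_per_weight: float = 4.85,
                        overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for running a Q4_K_M GGUF model.

    parameter_count_b: parameter count in billions.
    bits_per_weight: Q4_K_M averages ~4.85 bits/weight (assumption).
    overhead_gb: allowance for KV cache and runtime buffers (assumption).
    """
    weights_gb = parameter_count_b * bits_per_weight / 8  # GB for weights alone
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_q4_gb(70.55))  # 43.8 — matches the 70B example record below
```

With these assumed constants, a 70.55B-parameter model comes out at about 43.8 GB, in line with the `estimated_vram_q4_gb` field in the example output.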
No API keys required. All data comes from public APIs.
Data sources
| Source | Data | Method |
|---|---|---|
| OpenRouter | Pricing, context, modality, 300+ models | REST API |
| Open LLM Leaderboard | 6 benchmark scores, 4,500+ models | HuggingFace Datasets API |
| Chatbot Arena (LMSYS) | MT-bench, MMLU, 300+ models | HuggingFace Space CSV |
| HuggingFace | GGUF model availability | Models API |
Models from OpenRouter are automatically enriched with benchmark scores and GGUF availability where matches exist.
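A minimal sketch of what that enrichment step looks like, assuming records are matched on their HuggingFace ID (field names follow the example output; the matching key and helper are illustrative, not the actor's actual code):

```python
def enrich(openrouter_models, leaderboard_scores, gguf_ids):
    """Join OpenRouter records with benchmark scores and GGUF availability.

    openrouter_models: list of dicts, each with a 'huggingface_id' key.
    leaderboard_scores: dict mapping huggingface_id -> open_llm_* fields.
    gguf_ids: set of huggingface_ids that have GGUF quantizations.
    """
    enriched = []
    for model in openrouter_models:
        record = dict(model)  # don't mutate the source record
        hf_id = record.get("huggingface_id")
        scores = leaderboard_scores.get(hf_id)
        if scores:
            record.update(scores)  # add open_llm_* fields where a match exists
        record["gguf_available"] = hf_id in gguf_ids
        enriched.append(record)
    return enriched

# Illustrative data, not real benchmark numbers:
models = [{"model_id": "m/llama", "huggingface_id": "m/Llama-70B"}]
scores = {"m/Llama-70B": {"open_llm_average": 42.8}}
result = enrich(models, scores, {"m/Llama-70B"})
```

Models with no leaderboard match simply keep their OpenRouter fields, so partial records are expected.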
Example output
```json
{
  "model_id": "meta-llama/llama-3.1-70b-instruct",
  "model_name": "Meta: Llama 3.1 70B Instruct",
  "huggingface_id": "meta-llama/Llama-3.1-70B-Instruct",
  "pricing_input_per_mtok": 0.52,
  "pricing_output_per_mtok": 0.75,
  "context_length": 131072,
  "open_llm_average": 42.8,
  "open_llm_ifeval": 83.6,
  "open_llm_bbh": 55.3,
  "open_llm_math": 26.4,
  "open_llm_gpqa": 18.7,
  "open_llm_mmlu_pro": 50.8,
  "arena_mt_bench": 8.42,
  "parameter_count_b": 70.55,
  "gguf_available": true,
  "estimated_vram_q4_gb": 43.8,
  "is_moderated": false,
  "license": "llama3.1",
  "sources": ["openrouter", "open_llm_leaderboard", "chatbot_arena"]
}
```
Filtering
- Model name — Search by name or ID ('llama', 'claude', 'qwen')
- Max price — Only models under a price threshold ($/MTok)
- Min benchmark score — Only models above a quality threshold
- Model type — Pretrained, chat, fine-tuned, merged, MoE
- Sort by any field — price, benchmark, context length, parameter count
Use cases
- Model selection — Find the best model for your budget and use case
- Cost optimization — Compare pricing across providers for the same capability level
- Local deployment planning — Which models can you run on your hardware?
- API integration — Feed structured model data into your own tools and dashboards
- Market intelligence — Track the LLM landscape as models and prices change daily
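For the cost-optimization case, per-request cost follows directly from the per-million-token prices in each record, for example:

```python
def request_cost_usd(record, input_tokens, output_tokens):
    """Dollar cost of one request, given a model record from the dataset."""
    return (input_tokens * record["pricing_input_per_mtok"]
            + output_tokens * record["pricing_output_per_mtok"]) / 1_000_000

# Prices taken from the example record above
llama = {"pricing_input_per_mtok": 0.52, "pricing_output_per_mtok": 0.75}
print(round(request_cost_usd(llama, 10_000, 2_000), 4))  # 0.0067
```

Dividing a model's benchmark score by this figure gives a simple quality-per-dollar metric for ranking candidates.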