Ai Model Benchmark Scraper

Pricing

Pay per usage

Rating

0.0

(0)

Developer

Donny Nguyen

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Scrape AI benchmark leaderboards to extract model performance scores and rankings. Supports multiple benchmark sources including Chatbot Arena (LMSYS), MMLU, HumanEval, and MT-Bench leaderboards.

Features

  • Multi-benchmark support covering the most popular LLM evaluation frameworks
  • Chatbot Arena scraping with Elo ratings from the LMSYS leaderboard
  • Model metadata extraction including provider, parameter count, and release date
  • Score normalization for cross-benchmark comparison when possible
  • Puppeteer-based rendering to handle JavaScript-heavy leaderboard pages
  • Configurable benchmark selection to target specific evaluation metrics
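The page does not say how scores are normalized. As one way to make scores from different benchmarks comparable, the sketch below applies per-benchmark min-max scaling to items shaped like the dataset output. This is an assumption about a reasonable approach, not the actor's documented method. The `normalizedScore` field, the `other-benchmark` identifier, and all sample values are hypothetical.

```python
from collections import defaultdict


def normalize_scores(items):
    """Return copies of items with a 'normalizedScore' in [0, 1], scaled per benchmark."""
    # Collect the raw scores of each benchmark so every benchmark gets its own range.
    by_benchmark = defaultdict(list)
    for item in items:
        by_benchmark[item["benchmark"]].append(item["score"])

    normalized = []
    for item in items:
        scores = by_benchmark[item["benchmark"]]
        low, high = min(scores), max(scores)
        # If all scores are equal the range is zero; use 0.0 instead of dividing by zero.
        value = 0.0 if high == low else (item["score"] - low) / (high - low)
        normalized.append({**item, "normalizedScore": value})
    return normalized


# Hypothetical sample data, for illustration only.
sample = [
    {"benchmark": "chatbot-arena", "modelName": "model-a", "score": 1200},
    {"benchmark": "chatbot-arena", "modelName": "model-b", "score": 1100},
    {"benchmark": "chatbot-arena", "modelName": "model-c", "score": 1000},
    {"benchmark": "other-benchmark", "modelName": "model-a", "score": 80.0},
    {"benchmark": "other-benchmark", "modelName": "model-b", "score": 60.0},
]
result = normalize_scores(sample)
```

Min-max scaling keeps the ordering within each benchmark but is sensitive to outliers, so a rank-based comparison may be more robust when a leaderboard has a few extreme scores.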

Use Cases

  • Compare LLM performance across multiple benchmarks before selecting a model
  • Track model performance improvements over time with scheduled runs
  • Build automated reports on the AI model competitive landscape
  • Feed benchmark data into model selection pipelines and evaluation frameworks
  • Monitor when new models appear on leaderboards for competitive intelligence
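For the monitoring use case, one simple approach is to compare the dataset items of two scheduled runs and report entries that were not present before. The sketch below assumes that a model is identified by its `benchmark` and `modelName` pair; the helper names and sample items are hypothetical.

```python
def entry_key(item):
    """Identify a leaderboard entry by benchmark and model name (an assumption)."""
    return (item["benchmark"], item["modelName"])


def new_entries(previous_items, current_items):
    """Return items from the current run whose key was absent from the previous run."""
    seen = {entry_key(item) for item in previous_items}
    return [item for item in current_items if entry_key(item) not in seen]


# Hypothetical items from two consecutive runs.
previous_run = [
    {"benchmark": "chatbot-arena", "modelName": "model-a", "rank": 1},
    {"benchmark": "chatbot-arena", "modelName": "model-b", "rank": 2},
]
current_run = [
    {"benchmark": "chatbot-arena", "modelName": "model-c", "rank": 1},
    {"benchmark": "chatbot-arena", "modelName": "model-a", "rank": 2},
    {"benchmark": "chatbot-arena", "modelName": "model-b", "rank": 3},
]
added = new_entries(previous_run, current_run)
```

If a leaderboard renames a model between runs, this comparison reports it as new, so the result may need a manual check.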

Input Configuration

Parameter     Type     Default              Description
benchmarks    array    ["chatbot-arena"]    Benchmarks to scrape
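As an illustration of the input, the sketch below builds the run input with the documented default and shows how a run could be started with the `apify-client` Python package. The helper names are hypothetical, the actor ID must be copied from the Apify Store page because it is not given here, and rejecting an empty list is a choice made by this sketch rather than documented actor behavior.

```python
def build_run_input(benchmarks=None):
    """Build the actor input; falls back to the documented default ["chatbot-arena"]."""
    if isinstance(benchmarks, str):
        raise ValueError("benchmarks must be a list of strings, not a single string")
    selected = list(benchmarks) if benchmarks is not None else ["chatbot-arena"]
    if not selected or not all(isinstance(name, str) and name for name in selected):
        raise ValueError("benchmarks must be a non-empty list of non-empty strings")
    return {"benchmarks": selected}


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items. Requires `pip install apify-client`."""
    # Imported lazily so build_run_input works even when the package is not installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


default_input = build_run_input()
```

A call would then look like `run_actor(token, actor_id, default_input)`, with `token` and `actor_id` supplied by the caller.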

Output Format

Each model entry produces a dataset item with:

  • benchmark - Name of the benchmark source
  • modelName - Full model name or identifier
  • score - Benchmark score or Elo rating
  • rank - Position on the leaderboard
  • provider - Organization or company behind the model
  • parameters - Parameter count when available
  • scrapedAt - ISO timestamp of extraction
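To make the fields above concrete, here is a minimal sketch that parses one dataset item into a typed record. The field types are assumptions, since the page lists only names and descriptions: `parameters` is kept as optional text, and `scrapedAt` is assumed to be an ISO 8601 string that may end in `Z`. The class name, function name, and sample item are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BenchmarkEntry:
    benchmark: str
    model_name: str
    score: float
    rank: int
    provider: Optional[str]
    parameters: Optional[str]  # type is not documented; kept as text here
    scraped_at: datetime


def parse_entry(item):
    """Convert one raw dataset item (a dict) into a BenchmarkEntry."""
    scraped = item["scrapedAt"]
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so rewrite it.
    if scraped.endswith("Z"):
        scraped = scraped[:-1] + "+00:00"
    params = item.get("parameters")
    return BenchmarkEntry(
        benchmark=item["benchmark"],
        model_name=item["modelName"],
        score=float(item["score"]),
        rank=int(item["rank"]),
        provider=item.get("provider"),
        parameters=None if params is None else str(params),
        scraped_at=datetime.fromisoformat(scraped),
    )


# Hypothetical item, for illustration only.
entry = parse_entry({
    "benchmark": "chatbot-arena",
    "modelName": "example-model",
    "score": 1234,
    "rank": 1,
    "provider": "Example Org",
    "scrapedAt": "2024-01-01T12:00:00.000Z",
})
```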

Supported Benchmarks

This actor scrapes the LMSYS Chatbot Arena leaderboard, the Hugging Face Open LLM Leaderboard, and various benchmark result pages. Additional benchmark sources can be requested.

Limitations

  • Some leaderboards use complex React/Gradio rendering, so a run may need to be retried before the page loads completely
  • Benchmark scores and rankings change frequently; schedule regular runs for latest data
  • Parameter counts and release dates may not be available for all models
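Because some leaderboards may need more than one attempt, a caller can wrap the run in a retry loop. The sketch below is a generic retry helper with exponential backoff, not part of the actor. The function name, attempt count, and delays are assumptions, and `operation` stands for whatever callable starts a run and returns its items.

```python
import time


def retry(operation, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            # Re-raise the last error once no attempts remain.
            if attempt == attempts:
                raise
            # Wait 2 s, 4 s, 8 s, ... (with the default base_delay) before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so that tests can replace the real delay with a stub.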