Artificial Analysis AI Model Benchmark Scraper

Pricing: Pay per event

Scrapes LLM benchmark scores, pricing, and performance data from Artificial Analysis, an independent evaluator of AI models.

Rating: 0.0 (0 ratings)

Developer: BowTiedRaccoon (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

What this actor does

Extracts structured data for ~370 AI language models from Artificial Analysis, including:

  • Benchmark scores: Quality index, MMLU-Pro, GPQA Diamond, HumanEval, LiveCodeBench, MATH-500, MMMU-Pro, and more
  • Pricing: Input, output, and blended cost per million tokens
  • Performance: Median throughput (tokens/sec) and time-to-first-token latency
  • Provider info: All hosting providers, cheapest provider by blended price
  • Model metadata: Creator/lab, release date, parameter count, context window, license, open-weight status

All data comes from a single request to the /models page, which embeds the full model dataset inline as a React Server Component (RSC) payload, so no per-model crawling is needed.
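How the actor parses that payload is not documented here. As a rough sketch, and assuming the page follows the Next.js App Router convention of streaming RSC data through inline `<script>self.__next_f.push([...])</script>` tags, the raw payload text could be reassembled like this. `extract_rsc_payload` is a hypothetical helper, and turning the resulting text into model records is a separate step not shown.

```python
import json
import re

# Assumption: Next.js App Router pages stream the RSC payload through inline
# <script>self.__next_f.push([...])</script> tags whose argument is a JSON array.
_PUSH_RE = re.compile(r"self\.__next_f\.push\((\[.*?\])\)</script>", re.DOTALL)


def extract_rsc_payload(html: str) -> str:
    """Concatenate the text chunks pushed into self.__next_f, in page order."""
    chunks = []
    for match in _PUSH_RE.finditer(html):
        entry = json.loads(match.group(1))
        # Entries shaped [1, "<text>"] carry payload text; other shapes are skipped.
        if len(entry) >= 2 and entry[0] == 1 and isinstance(entry[1], str):
            chunks.append(entry[1])
    return "".join(chunks)
```

Because each chunk is a JSON-encoded string, `json.loads` takes care of unescaping quotes and newlines before the chunks are joined.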

Use cases

  • Model selection: Compare cost-vs-quality trade-offs across providers
  • Price monitoring: Track pricing changes across OpenAI, Anthropic, Google, Meta, and 40+ hosting providers
  • Research and benchmarking: Import baseline scores into your own evaluation pipeline
  • Cost optimization: Find the cheapest or fastest provider for a given quality target
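For the cost-optimization case, a minimal sketch over dataset items exported from a run might pick the cheapest model that clears a quality bar. It assumes records shaped like the example under Output. `cheapest_meeting_quality` is a hypothetical helper, not part of the actor.

```python
from typing import Optional


def cheapest_meeting_quality(records: list[dict], min_quality: float) -> Optional[dict]:
    """Return the record with the lowest blended price whose aa_quality_index >= min_quality."""
    candidates = [
        r for r in records
        # Skip models with no quality score or no published price.
        if r.get("aa_quality_index") is not None
        and r.get("price_blended_usd_per_million") is not None
        and r["aa_quality_index"] >= min_quality
    ]
    return min(candidates, key=lambda r: r["price_blended_usd_per_million"], default=None)
```

The returned record's `cheapest_provider` field then names the host with the lowest blended price for that model.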

Input

Field     Type     Required  Default  Description
maxItems  integer  Yes       10       Maximum number of model records to return. Set to a large number (e.g. 500) to retrieve all models.
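A minimal sketch of running the actor from Python, assuming the `apify-client` package (`ApifyClient`, `ActorClient.call`, `DatasetClient.iterate_items`; check your installed version). The actor's Store ID is not given on this page, so `actor_id` is a placeholder you must supply, and both helpers are hypothetical.

```python
def build_run_input(max_items: int = 10) -> dict:
    """Build the actor input; maxItems is the only documented field."""
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items}


def fetch_all_models(token: str, actor_id: str) -> list[dict]:
    """Run the actor and return every dataset item (requires `pip install apify-client`)."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # 500 exceeds the ~370 models listed, so the run should return the full dataset.
    run = client.actor(actor_id).call(run_input=build_run_input(500))
    if run is None:
        raise RuntimeError("actor run did not return run information")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```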

Output

Each dataset item represents one AI model. Example record:

{
  "model_slug": "claude-4-opus",
  "model_name": "Claude 4 Opus",
  "provider": "Anthropic",
  "release_date": "2025-05-22",
  "parameter_count": null,
  "context_window_tokens": 200000,
  "aa_quality_index": 57.4,
  "mmlu_pro_score": 0.812,
  "gpqa_diamond_score": 0.738,
  "humaneval_score": 0.921,
  "math_score": 84.1,
  "chatbot_arena_elo": null,
  "aider_polyglot_score": null,
  "livecodebench_score": 0.703,
  "mmmu_score": null,
  "benchmark_breakdown": "{\"agentic_index\":45.2,\"coding_index\":68.1,...}",
  "price_input_usd_per_million": 15,
  "price_output_usd_per_million": 75,
  "price_blended_usd_per_million": 30,
  "throughput_tokens_per_second": 58.3,
  "latency_first_token_ms": 1204,
  "hosting_providers": "[\"Anthropic\",\"Amazon Bedrock\",\"Google Vertex AI\"]",
  "cheapest_provider": "Amazon Bedrock",
  "fastest_provider": null,
  "license": "proprietary",
  "is_open_weight": false,
  "profile_url": "https://artificialanalysis.ai/models/claude-4-opus",
  "scraped_at": "2026-05-31T08:00:00.000Z"
}
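In the example record, the blended price (30) equals a 3:1 weighted average of the input (15) and output (75) prices: (3 × 15 + 75) / 4 = 30. This appears to match the 3:1 input-to-output ratio Artificial Analysis uses for its blended price. The sketch below reproduces that arithmetic; `blended_price` and its weight parameters are illustrative, not fields of the dataset.

```python
def blended_price(input_usd: float, output_usd: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average of per-million-token input and output prices (3:1 by default)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_usd + output_weight * output_usd) / total_weight
```

Changing the weights lets you re-rank models for workloads that are more output-heavy than the 3:1 assumption.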

Notes on specific fields:

  • chatbot_arena_elo and aider_polyglot_score are always null. Artificial Analysis does not track these metrics; filling them would require separate scrapers for Chatbot Arena and Aider.chat.
  • benchmark_breakdown is a JSON string containing additional sub-benchmarks (agentic_index, coding_index, math_index, HLE, AIME-2025, IFBench, SciCode, LCR, Omniscience).
  • hosting_providers is a JSON string array of all providers offering this model.
  • fastest_provider is always null because per-provider throughput is not available on the listing page.
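Because benchmark_breakdown and hosting_providers arrive as JSON strings, consumers typically decode them before use. A minimal sketch with Python's standard `json` module follows; `decode_record` is a hypothetical helper.

```python
import json


def decode_record(record: dict) -> dict:
    """Return a copy of a dataset item with its JSON-string fields decoded."""
    decoded = dict(record)  # shallow copy so the original item is left unchanged
    for field in ("benchmark_breakdown", "hosting_providers"):
        value = decoded.get(field)
        if isinstance(value, str):
            decoded[field] = json.loads(value)
    return decoded
```

After decoding, `benchmark_breakdown` is a dict of sub-benchmark scores and `hosting_providers` is a list of provider names.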

Notes

  • The actor makes a single HTTP request to https://artificialanalysis.ai/models; no proxy is required.
  • The full dataset (~370 models) is available in one request. Use maxItems: 500 to get everything.
  • Prices and benchmarks on Artificial Analysis update frequently, so run the actor on a schedule to keep the data current.
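For price monitoring across scheduled runs, one approach is to diff two snapshots keyed by model_slug. This is a minimal sketch assuming both runs produced records with the price fields shown under Output; `price_changes` is a hypothetical helper.

```python
PRICE_FIELDS = (
    "price_input_usd_per_million",
    "price_output_usd_per_million",
    "price_blended_usd_per_million",
)


def price_changes(previous: list[dict], current: list[dict]) -> list[tuple]:
    """List (model_slug, field, old_value, new_value) for each price that changed."""
    old_by_slug = {r["model_slug"]: r for r in previous}
    changes = []
    for record in current:
        old = old_by_slug.get(record["model_slug"])
        if old is None:
            continue  # model is new in this run, so there is no earlier price to compare
        for field in PRICE_FIELDS:
            if old.get(field) != record.get(field):
                changes.append((record["model_slug"], field, old.get(field), record.get(field)))
    return changes
```

Models that disappear between runs are ignored here. Compare the slug sets separately if you also want to track additions and removals.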