Artificial Analysis AI Model Benchmark Scraper
Pricing
Pay per event
Scrapes LLM benchmark scores, pricing, and performance data from Artificial Analysis — the leading independent evaluator of AI models.
Developer
BowTiedRaccoon
Maintained by Community
What this actor does
Extracts structured data for ~370 AI language models from Artificial Analysis, including:
- Benchmark scores: Quality index, MMLU-Pro, GPQA Diamond, HumanEval, LiveCodeBench, MATH-500, MMMU-Pro, and more
- Pricing: Input, output, and blended cost per million tokens
- Performance: Median throughput (tokens/sec) and time-to-first-token latency
- Provider info: All hosting providers, cheapest provider by blended price
- Model metadata: Creator/lab, release date, parameter count, context window, license, open-weight status
All data is extracted in a single request to the /models page, which serves the full model dataset inline as a React Server Component payload. No per-model crawling needed.
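The blended price in the pricing bullet is a single weighted figure combining input and output prices. A minimal sketch of that arithmetic, assuming a 3:1 input-to-output token weighting. The weighting is an assumption, not something this page states, though the example record in the Output section (15 input, 75 output, 30 blended) is consistent with it. `blended_price` is a hypothetical helper, not part of the actor.

```python
def blended_price(input_usd: float, output_usd: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices per million tokens.

    The 3:1 default weighting is an assumption; adjust it to match your
    own input/output token mix.
    """
    total = input_weight + output_weight
    return (input_weight * input_usd + output_weight * output_usd) / total


# Prices taken from the example record in the Output section.
print(blended_price(15, 75))  # 30.0
```

Recomputing the blend with your own weights can be more informative than the published figure if your workload is output-heavy.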
Use cases
- Model selection: Compare cost-vs-quality trade-offs across providers
- Price monitoring: Track pricing changes across OpenAI, Anthropic, Google, Meta, and 40+ hosting providers
- Research and benchmarking: Import baseline scores into your own evaluation pipeline
- Cost optimization: Find the cheapest or fastest provider for a given quality target
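As an illustration of the cost optimization use case, the sketch below picks the cheapest model that meets a quality target. It assumes the dataset items have already been loaded as a list of dictionaries with the field names from the Output section; `cheapest_at_quality` is a hypothetical helper and the sample records are made up.

```python
from typing import Optional


def cheapest_at_quality(records: list[dict], min_quality: float) -> Optional[dict]:
    """Return the lowest-priced record whose quality index meets the target.

    Records with a missing quality index or blended price are skipped.
    """
    candidates = [
        r for r in records
        if r.get("aa_quality_index") is not None
        and r.get("price_blended_usd_per_million") is not None
        and r["aa_quality_index"] >= min_quality
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["price_blended_usd_per_million"])


# Illustrative records only; slugs and numbers are made up, not real scores.
sample = [
    {"model_slug": "model-a", "aa_quality_index": 57.4, "price_blended_usd_per_million": 30},
    {"model_slug": "model-b", "aa_quality_index": 52.0, "price_blended_usd_per_million": 4.5},
    {"model_slug": "model-c", "aa_quality_index": 40.1, "price_blended_usd_per_million": 0.6},
    {"model_slug": "model-d", "aa_quality_index": None, "price_blended_usd_per_million": 0.1},
]

best = cheapest_at_quality(sample, min_quality=50)
print(best["model_slug"])  # model-b
```

Swapping the `min` key for `throughput_tokens_per_second` (with `max`) would give the fastest model instead of the cheapest.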
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `maxItems` | integer | Yes | 10 | Maximum number of model records to return. Set to a large number (e.g. 500) to retrieve all models. |
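A rough sketch of starting a run from Python with the `apify-client` package. The actor ID is a placeholder because this page does not give it, `APIFY_TOKEN` is an assumed environment variable name, and `build_run_input` and `run_actor` are hypothetical helpers.

```python
import os


def build_run_input(max_items: int = 10) -> dict:
    """Build the actor input; maxItems is the only field in the input table."""
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items}


def run_actor(actor_id: str, max_items: int = 500) -> list[dict]:
    """Run the actor and return its dataset items (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    run = client.actor(actor_id).call(run_input=build_run_input(max_items))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_run_input(500))  # {'maxItems': 500}

# Placeholder ID; substitute the real one from the actor's page:
# items = run_actor("<username>/<actor-name>")
```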
Output
Each dataset item represents one AI model. Example record:
    {
      "model_slug": "claude-4-opus",
      "model_name": "Claude 4 Opus",
      "provider": "Anthropic",
      "release_date": "2025-05-22",
      "parameter_count": null,
      "context_window_tokens": 200000,
      "aa_quality_index": 57.4,
      "mmlu_pro_score": 0.812,
      "gpqa_diamond_score": 0.738,
      "humaneval_score": 0.921,
      "math_score": 84.1,
      "chatbot_arena_elo": null,
      "aider_polyglot_score": null,
      "livecodebench_score": 0.703,
      "mmmu_score": null,
      "benchmark_breakdown": "{\"agentic_index\":45.2,\"coding_index\":68.1,...}",
      "price_input_usd_per_million": 15,
      "price_output_usd_per_million": 75,
      "price_blended_usd_per_million": 30,
      "throughput_tokens_per_second": 58.3,
      "latency_first_token_ms": 1204,
      "hosting_providers": "[\"Anthropic\",\"Amazon Bedrock\",\"Google Vertex AI\"]",
      "cheapest_provider": "Amazon Bedrock",
      "fastest_provider": null,
      "license": "proprietary",
      "is_open_weight": false,
      "profile_url": "https://artificialanalysis.ai/models/claude-4-opus",
      "scraped_at": "2026-05-31T08:00:00.000Z"
    }
Notes on specific fields:
- `chatbot_arena_elo` and `aider_polyglot_score` are always `null`: these metrics are not tracked by Artificial Analysis and would require separate scrapers for Chatbot Arena and Aider.chat.
- `benchmark_breakdown` is a JSON string containing additional sub-benchmarks (agentic_index, coding_index, math_index, HLE, AIME-2025, IFBench, SciCode, LCR, Omniscience).
- `hosting_providers` is a JSON string array of all providers offering this model.
- `fastest_provider` is always `null`: per-provider throughput breakdown is not available on the listing page.
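Because `benchmark_breakdown` and `hosting_providers` arrive as JSON strings rather than nested objects, they need one extra decoding step before use. A minimal sketch, where `decode_record` is a hypothetical helper and the sample record is a trimmed version of the example above with the elided breakdown keys left out:

```python
import json


def decode_record(record: dict) -> dict:
    """Return a copy with the JSON-string fields parsed into Python objects."""
    decoded = dict(record)
    for field in ("benchmark_breakdown", "hosting_providers"):
        raw = decoded.get(field)
        if isinstance(raw, str):
            decoded[field] = json.loads(raw)
    return decoded


# Trimmed illustrative record; only two breakdown keys are included.
record = {
    "model_slug": "claude-4-opus",
    "benchmark_breakdown": '{"agentic_index":45.2,"coding_index":68.1}',
    "hosting_providers": '["Anthropic","Amazon Bedrock","Google Vertex AI"]',
}

decoded = decode_record(record)
print(decoded["benchmark_breakdown"]["coding_index"])  # 68.1
print(decoded["hosting_providers"][1])                 # Amazon Bedrock
```

The helper copies the record rather than mutating it, so the raw strings remain available if you need to store the item unchanged.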
Notes
- The actor makes a single HTTP request to `https://artificialanalysis.ai/models`. No proxy required.
- The full dataset (~370 models) is available in one request. Use `maxItems: 500` to get everything.
- Prices and benchmarks on Artificial Analysis update frequently, so run the actor periodically for up-to-date data.
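When the actor runs periodically, two snapshots can be compared to spot price changes. A minimal sketch, assuming each snapshot is a list of dataset items keyed by `model_slug`; `price_changes` is a hypothetical helper and the snapshots are made up.

```python
def price_changes(old: list[dict], new: list[dict],
                  field: str = "price_blended_usd_per_million") -> list[tuple]:
    """List (model_slug, old_price, new_price) for models whose price differs."""
    old_by_slug = {r["model_slug"]: r.get(field) for r in old}
    changes = []
    for r in new:
        slug = r["model_slug"]
        # Models that appear only in the new snapshot are not reported as changes.
        if slug in old_by_slug and old_by_slug[slug] != r.get(field):
            changes.append((slug, old_by_slug[slug], r.get(field)))
    return changes


# Made-up snapshots for illustration.
yesterday = [
    {"model_slug": "model-a", "price_blended_usd_per_million": 30},
    {"model_slug": "model-b", "price_blended_usd_per_million": 4.5},
]
today = [
    {"model_slug": "model-a", "price_blended_usd_per_million": 30},
    {"model_slug": "model-b", "price_blended_usd_per_million": 3.0},
    {"model_slug": "model-c", "price_blended_usd_per_million": 0.6},
]

print(price_changes(yesterday, today))  # [('model-b', 4.5, 3.0)]
```

Passing `field="price_input_usd_per_million"` or `field="price_output_usd_per_million"` tracks the individual prices instead of the blend.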