ARC Prize Leaderboard Scraper
Pricing
Pay per event
Scrapes ARC Prize leaderboard data (ARC-AGI-1/2/3 benchmarks) for all AI models including scores, costs, providers, and rankings
Developer: Stas Persiianenko (maintained by Community)
Last modified: 12 days ago
Extract AI model rankings, scores, costs, and metadata from the ARC Prize leaderboard, covering all ARC-AGI-1, ARC-AGI-2, and ARC-AGI-3 benchmark datasets.
What does it do?
The ARC Prize Leaderboard Scraper extracts structured benchmark data from arcprize.org/leaderboard, the official leaderboard for the ARC Prize competition, which is built around the ARC-AGI benchmark for measuring progress toward artificial general intelligence (AGI).
Give it a list of benchmark versions (v1, v2, v3) and it returns every model's performance data including scores, costs per task, provider information, model types, and release dates.
What you get for each leaderboard entry:
- Model name and display label
- Provider/organization name
- ARC-AGI benchmark version and dataset
- Score (accuracy from 0 to 1)
- Cost per task (v1/v2) or total evaluation cost (v3)
- Model type (Base LLM, CoT, Custom, etc.)
- Model group/family
- Release date
Who is it for?
AI researchers and academics
Tracking the frontier of AI capabilities? Use this actor to collect time-series data on how different model families progress on ARC-AGI benchmarks without manually scraping the leaderboard.
Data scientists and analysts
Building dashboards comparing LLM capabilities, cost-efficiency frontiers, or vendor performance? Get structured, queryable JSON output for all models across all benchmark versions.
AI product teams and investors
Monitor competitor model performance on the hardest reasoning benchmarks, track cost-efficiency trends, or build automated capability-tracking pipelines.
AI journalists and content creators
Writing about AGI progress? Pull fresh leaderboard data programmatically to power articles, newsletters, or automated reports.
Educators and course creators
Teaching AI capabilities and limitations? Use live leaderboard data in lectures, assignments, and demos.
Why use it?
- Direct JSON endpoint: ARC Prize exposes clean public JSON endpoints, so no HTML parsing or browser automation is needed
- All benchmark versions: covers ARC-AGI-1, ARC-AGI-2, and ARC-AGI-3 in a single run
- Structured output: fully typed fields, consistent schema across versions
- Always fresh: fetches live data from arcprize.org on every run
- Low cost: pure HTTP requests, runs in under 30 seconds with minimal compute
Data fields extracted
| Field | Type | Description |
|---|---|---|
| version | string | Benchmark version: v1, v2, or v3 |
| datasetId | string | Internal dataset identifier (e.g., v1_Semi_Private) |
| datasetDisplayName | string | Human-readable dataset name (ARC-AGI-1, ARC-AGI-2, ARC-AGI-3) |
| modelId | string | Unique model identifier |
| modelDisplayName | string | Human-readable model name |
| modelType | string or null | Model type: Base LLM, CoT, Custom, CoT + Synthesis, etc. |
| modelGroup | string or null | Model family/group name |
| providerId | string | Provider identifier (e.g., Anthropic, OpenAI) |
| providerDisplayName | string | Human-readable provider name |
| score | number | Accuracy score from 0 to 1 (1.0 = 100%) |
| costPerTask | number or null | Cost in USD per task (v1/v2); null for v3 |
| totalCost | number or null | Total evaluation cost in USD (v3); null for v1/v2 |
| modelReleaseDate | string or null | Model release date (ISO 8601) |
| display | boolean | Whether this entry is shown on the public leaderboard |
| resultsUrl | string | URL to detailed results (empty string if not available) |
| leaderboardUrl | string | URL to the ARC Prize leaderboard |
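The field table can be mirrored as a typed record when consuming the dataset from Python. The following is a minimal sketch, not part of the actor: the class name `LeaderboardEntry` and the helper `parse_entry` are hypothetical, while the field names and nullability are taken from the table. The sample item is the one shown under Example output.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class LeaderboardEntry:
    """One dataset item; field names follow the documented schema."""
    version: str
    datasetId: str
    datasetDisplayName: str
    modelId: str
    modelDisplayName: str
    modelType: Optional[str]
    modelGroup: Optional[str]
    providerId: str
    providerDisplayName: str
    score: float
    costPerTask: Optional[float]
    totalCost: Optional[float]
    modelReleaseDate: Optional[str]
    display: bool
    resultsUrl: str
    leaderboardUrl: str


def parse_entry(item: dict[str, Any]) -> LeaderboardEntry:
    """Build an entry from a raw item, ignoring keys not in the schema.

    Raises KeyError if a documented field is missing.
    """
    names = [f.name for f in fields(LeaderboardEntry)]
    return LeaderboardEntry(**{name: item[name] for name in names})


# Sample item copied from the README's example output.
sample = {
    "version": "v1",
    "datasetId": "v1_Semi_Private",
    "datasetDisplayName": "ARC-AGI-1",
    "modelId": "Claude 3.7",
    "modelDisplayName": "Claude 3.7",
    "modelType": "Base LLM",
    "modelGroup": None,
    "providerId": "Anthropic",
    "providerDisplayName": "Anthropic",
    "score": 0.136,
    "costPerTask": 0.058,
    "totalCost": None,
    "modelReleaseDate": "2025-02-24T00:00:00.000Z",
    "display": True,
    "resultsUrl": "",
    "leaderboardUrl": "https://arcprize.org/leaderboard",
}

entry = parse_entry(sample)
print(entry.modelDisplayName, f"{entry.score:.1%}")  # Claude 3.7 13.6%
```

Failing fast on a missing field is a design choice for this sketch; a more lenient consumer could fall back to `item.get(name)` instead.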
How much does it cost?
This scraper uses Pay-Per-Event (PPE) pricing: you pay a small fee when a run starts, plus a fee for each entry actually extracted.
| What you pay for | Cost |
|---|---|
| Run started (one-time) | $0.005 |
| Per leaderboard entry extracted | $0.0005 |
Example costs:
- All 3 benchmark versions (~300 entries total): ~$0.155
- One benchmark version (~100 to 150 entries): ~$0.055 to $0.080
- Monthly monitoring (weekly runs, all versions): ~$0.62/month (4 runs at ~$0.155 each)
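The example costs follow directly from the two prices in the table. As a rough sketch (the function name `estimate_run_cost` is hypothetical, and the entry counts are the approximate figures quoted above, not guaranteed values), the arithmetic can be reproduced like this:

```python
RUN_START_FEE_USD = 0.005   # "Run started" price from the pricing table
PER_ENTRY_FEE_USD = 0.0005  # "Per leaderboard entry extracted" price


def estimate_run_cost(entries: int, runs: int = 1) -> float:
    """Estimate the cost in USD of `runs` runs that each extract `entries` entries."""
    if entries < 0 or runs < 0:
        raise ValueError("entries and runs must be non-negative")
    return runs * (RUN_START_FEE_USD + entries * PER_ENTRY_FEE_USD)


print(f"{estimate_run_cost(300):.3f}")          # 0.155 (all three versions)
print(f"{estimate_run_cost(300, runs=4):.2f}")  # 0.62  (four weekly runs)
```

Actual charges depend on how many entries the leaderboard contains at run time and on whether hidden entries are included.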
Input configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| datasets | array | ["v1","v2","v3"] | Which benchmark versions to scrape |
| includeHidden | boolean | false | Include entries not shown on the public leaderboard |
| maxRequestRetries | integer | 3 | Retry attempts for failed HTTP requests |
Example input
Scrape all three ARC-AGI benchmark leaderboards:

    {"datasets": ["v1", "v2", "v3"], "includeHidden": false}

Scrape only ARC-AGI-2, including hidden entries:

    {"datasets": ["v2"], "includeHidden": true}
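When the input is generated by a script rather than typed into the Input tab, it can help to validate it first. This is a small sketch under stated assumptions: the helper `build_input` is hypothetical, the parameter names and defaults come from the input table, and the README does not say how the actor itself reacts to unknown versions, so the checks here are purely client-side.

```python
import json

ALLOWED_VERSIONS = ("v1", "v2", "v3")  # versions listed in the input table


def build_input(datasets=ALLOWED_VERSIONS, include_hidden=False, max_request_retries=3):
    """Return an actor input dict using the documented parameter names."""
    versions = list(dict.fromkeys(datasets))  # drop duplicates, keep order
    unknown = [v for v in versions if v not in ALLOWED_VERSIONS]
    if not versions or unknown:
        raise ValueError(
            f"datasets must be a non-empty subset of {ALLOWED_VERSIONS}, got {datasets!r}"
        )
    if max_request_retries < 0:
        raise ValueError("max_request_retries must be non-negative")
    return {
        "datasets": versions,
        "includeHidden": bool(include_hidden),
        "maxRequestRetries": int(max_request_retries),
    }


print(json.dumps(build_input(["v2"], include_hidden=True)))
```

The resulting dict can be passed as the run input in the API usage examples.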
Example output

    {
      "version": "v1",
      "datasetId": "v1_Semi_Private",
      "datasetDisplayName": "ARC-AGI-1",
      "modelId": "Claude 3.7",
      "modelDisplayName": "Claude 3.7",
      "modelType": "Base LLM",
      "modelGroup": null,
      "providerId": "Anthropic",
      "providerDisplayName": "Anthropic",
      "score": 0.136,
      "costPerTask": 0.058,
      "totalCost": null,
      "modelReleaseDate": "2025-02-24T00:00:00.000Z",
      "display": true,
      "resultsUrl": "",
      "leaderboardUrl": "https://arcprize.org/leaderboard"
    }
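Because every item shares this shape, ranking models is a matter of filtering and sorting. The sketch below is illustrative only: `top_models` is a hypothetical helper, and the rows are made-up placeholders rather than real leaderboard results.

```python
def top_models(items, version, n=10, include_hidden=False):
    """Return the n highest-scoring entries for one benchmark version."""
    rows = [
        it for it in items
        if it["version"] == version and (include_hidden or it["display"])
    ]
    return sorted(rows, key=lambda it: it["score"], reverse=True)[:n]


# Made-up rows for illustration only; not real leaderboard data.
items = [
    {"version": "v2", "modelDisplayName": "Model A", "score": 0.12, "display": True},
    {"version": "v2", "modelDisplayName": "Model B", "score": 0.31, "display": True},
    {"version": "v2", "modelDisplayName": "Model C", "score": 0.40, "display": False},
    {"version": "v1", "modelDisplayName": "Model D", "score": 0.75, "display": True},
]

for row in top_models(items, "v2", n=2):
    print(f'{row["modelDisplayName"]}: {row["score"]:.1%}')
```

Hidden entries are skipped by default so the ranking matches what the public leaderboard shows.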
How to use
Follow these steps to get leaderboard data from the Apify Store:
- Open the actor: go to ARC Prize Leaderboard Scraper on the Apify Store and click Try for free.
- Configure input: in the Input tab, choose which benchmark versions (v1, v2, v3) to scrape and whether to include hidden entries.
- Run: click Start and wait for the actor to finish (typically under 30 seconds).
- Download results: go to the Dataset tab to view extracted entries. Export as JSON, CSV, or JSONL using the Export button, or fetch via the Apify API.
- Schedule recurring runs (optional): click Schedule to run automatically (e.g., weekly) and always have fresh leaderboard data.
- Connect to downstream tools: use the Apify integrations to send data to Google Sheets, Slack, Webhooks, or any HTTP endpoint after each run.
Integrations
Connect ARC Prize Leaderboard Scraper to your existing tools and workflows:
Google Sheets
Use the Apify Google Sheets integration to automatically append fresh leaderboard data to a spreadsheet after each run. Ideal for building live-updating dashboards or sharing data with your team.
Slack notifications
Trigger a Slack message whenever the leaderboard updates (e.g., a new model breaks into the top 10). Wire up the Apify Slack integration in the Integrations tab.
Webhooks
After each run completes, fire a webhook to any HTTP endpoint: your own backend, a Zapier/Make workflow, or an n8n automation. Configure it in the actor's Integrations tab, under Webhook.
Apify API + Python / Node.js
Embed leaderboard scraping in your own data pipeline using the Apify Python client or Node.js client. See the API usage examples section below.
Make (Integromat) / Zapier
Use Apify's native Make and Zapier connectors to route leaderboard data into spreadsheets, databases, or notification services without writing code.
AI agents via MCP
Expose live benchmark data to Claude, Cursor, or VS Code AI features; see the MCP integration section below.
Technical details
- Architecture: Pure HTTP, no browser needed
- Source: arcprize.org/media/data/leaderboard/{v1,v2,v3}.json
- Typical runtime: < 30 seconds
- Memory: 256 MB
- Rate limits: Public JSON endpoints, no rate limiting observed
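To make the "pure HTTP" design concrete, here is a sketch of how the source pattern expands into one URL per version and how a retrying fetch might look. It is not the actor's implementation: the `https://` scheme, the helper names `source_url` and `fetch_json`, the timeout, and the exponential backoff are all assumptions, and the structure of the returned JSON is not documented here.

```python
import json
import time
import urllib.request
from urllib.error import URLError

# Path taken from the "Source" bullet; the https scheme is an assumption.
BASE_URL = "https://arcprize.org/media/data/leaderboard"


def source_url(version: str) -> str:
    """Expand the {v1,v2,v3} pattern for a single version."""
    return f"{BASE_URL}/{version}.json"


def fetch_json(url, max_retries=3, opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a URL and decode its JSON body, retrying on network errors.

    `opener` and `sleep` are injectable so the function can be tested offline.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url, timeout=30) as response:
                return json.load(response)
        except URLError:
            if attempt == max_retries:
                raise
            sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...


for version in ("v1", "v2", "v3"):
    print(source_url(version))

# Example call (performs a real network request):
# data = fetch_json(source_url("v2"))
```

Running the actor instead of calling the endpoints directly adds the normalized schema, dataset storage, and scheduling described elsewhere in this README.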
MCP integration (Claude, Cursor, VS Code)
Use this actor as a live data source inside AI coding assistants via the Apify MCP server.
Claude Code (terminal)
Install the Apify MCP server into Claude Code with one command:

    claude mcp add apify -- npx -y @apify/mcp-server@latest

Then set your API token:

    export APIFY_API_KEY=your_apify_api_token
In any Claude conversation you can then ask:
- "Run the arcprize-leaderboard-scraper actor and show me the top 10 models by score on ARC-AGI-2."
- "Show me all models from Anthropic on ARC-AGI-1 with their scores, sorted by score descending."
- "Which models have scores above 50% on ARC-AGI-2? Run the scraper and filter the results."
Claude Desktop
Add the Apify MCP server to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

    {
      "mcpServers": {
        "apify": {
          "command": "npx",
          "args": ["-y", "@apify/mcp-server@latest"],
          "env": {
            "APIFY_API_KEY": "your_apify_api_token"
          }
        }
      }
    }
Cursor / VS Code
Add to your MCP settings (.cursor/mcp.json or VS Code MCP config):

    {
      "mcpServers": {
        "apify": {
          "command": "npx",
          "args": ["-y", "@apify/mcp-server@latest"],
          "env": {
            "APIFY_API_KEY": "your_apify_api_token"
          }
        }
      }
    }
Once configured, your AI assistant can call run_actor with actor ID automation-lab/arcprize-leaderboard-scraper and input like {"datasets": ["v1","v2","v3"]} to fetch live leaderboard data mid-conversation.
FAQ: Frequently asked questions
What is ARC-AGI?
The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark created by François Chollet to measure general reasoning ability: its tasks require pattern recognition and abstract reasoning rather than knowledge retrieval.
What's the difference between ARC-AGI-1, -2, and -3?
- ARC-AGI-1: Original benchmark, introduced in 2019; many top models now exceed 85% accuracy
- ARC-AGI-2: Harder 2025 version; current best models score under 30%
- ARC-AGI-3: Hardest 2025/2026 version; frontier models score under 1%
Why are some entries hidden?
Hidden entries (display: false) include superseded models, internal test runs, or entries the ARC Prize team chose not to highlight. Enable includeHidden: true to see them.
How often is the leaderboard updated?
arcprize.org updates its JSON files when new evaluation results are submitted. Run this actor regularly (e.g., weekly via Apify schedules) to track changes over time.
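Tracking changes between scheduled runs amounts to comparing two snapshots of the dataset. The following is a sketch under one explicit assumption: that the pair (datasetId, modelId) identifies an entry across runs, which this README does not guarantee. The helper `diff_snapshots` is hypothetical and the snapshots are made-up placeholders.

```python
def diff_snapshots(previous, current):
    """Compare two runs' items, keyed by (datasetId, modelId).

    Returns (added, changed): `added` lists keys that are new in `current`,
    and `changed` lists (key, old_score, new_score) for moved scores.
    """
    def index(items):
        return {(it["datasetId"], it["modelId"]): it for it in items}

    old, new = index(previous), index(current)
    added = sorted(key for key in new if key not in old)
    changed = sorted(
        (key, old[key]["score"], new[key]["score"])
        for key in new
        if key in old and old[key]["score"] != new[key]["score"]
    )
    return added, changed


# Made-up snapshots for illustration only; not real leaderboard data.
last_week = [{"datasetId": "v1_Semi_Private", "modelId": "model-a", "score": 0.10}]
this_week = [
    {"datasetId": "v1_Semi_Private", "modelId": "model-a", "score": 0.12},
    {"datasetId": "v1_Semi_Private", "modelId": "model-b", "score": 0.05},
]

added, changed = diff_snapshots(last_week, this_week)
print(added)    # [('v1_Semi_Private', 'model-b')]
print(changed)  # [(('v1_Semi_Private', 'model-a'), 0.1, 0.12)]
```

A non-empty result is a natural trigger for the Slack or webhook notifications described under Integrations.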
Why does v3 use totalCost instead of costPerTask?
ARC-AGI-3 reports total evaluation cost rather than per-task cost, reflecting the full infrastructure cost of a complete evaluation run.
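Code that handles all three versions therefore has to read whichever cost field is populated. A minimal sketch, with a hypothetical helper name `evaluation_cost` and a made-up v3 value:

```python
def evaluation_cost(item):
    """Return (amount_usd, basis) from whichever cost field is populated."""
    if item.get("costPerTask") is not None:
        return item["costPerTask"], "per task"
    if item.get("totalCost") is not None:
        return item["totalCost"], "total"
    return None, "unknown"


v1_item = {"version": "v1", "costPerTask": 0.058, "totalCost": None}  # from the example output
v3_item = {"version": "v3", "costPerTask": None, "totalCost": 12.5}   # made-up value

print(evaluation_cost(v1_item))  # (0.058, 'per task')
print(evaluation_cost(v3_item))  # (12.5, 'total')
```

Returning the basis alongside the amount keeps per-task and total costs from being compared or charted on the same axis by accident.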
API usage examples
You can trigger this actor programmatically via the Apify API or SDKs.
Node.js (ApifyClient):

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
    const run = await client.actor('automation-lab/arcprize-leaderboard-scraper').call({
        datasets: ['v1', 'v2', 'v3'],
        includeHidden: false,
    });
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);

Python (ApifyClient):

    from apify_client import ApifyClient

    client = ApifyClient('YOUR_API_TOKEN')
    run = client.actor('automation-lab/arcprize-leaderboard-scraper').call(run_input={
        'datasets': ['v1', 'v2', 'v3'],
        'includeHidden': False,
    })
    items = client.dataset(run['defaultDatasetId']).list_items().items
    print(items)

cURL:

    curl -X POST \
      "https://api.apify.com/v2/acts/automation-lab~arcprize-leaderboard-scraper/runs?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"datasets":["v1","v2","v3"],"includeHidden":false}'
Related actors
- EvalPlus Leaderboard Scraper: Scrapes the EvalPlus code generation benchmark leaderboard
- AlpacaEval Leaderboard Scraper: Scrapes the AlpacaEval instruction-following leaderboard
- LiveBench Scraper: Scrapes the LiveBench LLM benchmark leaderboard
- EQ-Bench Scraper: Scrapes the EQ-Bench emotional intelligence benchmark leaderboard
Legality and terms of use
This actor accesses publicly available JSON endpoints on arcprize.org, the same data that powers the public leaderboard website. No authentication is required, and the data is intentionally made public for research and benchmarking transparency.
- No login, credentials, or bypassing of access controls is involved
- The data is publicly published by the ARC Prize organization
- Usage should comply with arcprize.org's terms of service
- Do not use for commercial redistribution of the data without permission from the ARC Prize Foundation