Chatbot Arena Scraper
Pricing: Pay per event
Scrapes the Chatbot Arena (arena.ai) leaderboard to extract LLM model rankings, Elo scores, confidence intervals, vote counts, and category-specific ratings from human preference battles.
Developer: Stas Persiianenko (maintained by Community)
Scrapes the Chatbot Arena leaderboard to extract LLM model rankings, Elo scores, confidence intervals, vote counts, pricing, and context window sizes from human preference battles.
What does Chatbot Arena Scraper do?
Chatbot Arena (arena.ai) is the most widely cited independent benchmark for comparing LLMs, based on millions of anonymous human votes in side-by-side model battles. This actor extracts the full leaderboard data across all 10 categories:
- Text — general text generation
- Code — code generation
- Vision — multimodal/image understanding
- Document — document processing
- Text-to-Image — image generation
- Image Edit — image editing
- Search — web search integration
- Text-to-Video — video generation
- Image-to-Video — video from images
- Video Edit — video editing
Who is it for?
AI engineers and ML teams selecting models for production systems. Compare hundreds of LLMs by Elo score, pricing, and context window to find the best fit for your cost, context-length, and quality requirements.
Product managers and CTOs making build-vs-buy decisions around AI capabilities. Track how providers like OpenAI, Anthropic, Google, and Meta rank against each other across text, code, vision, and image generation.
AI researchers and academics who need standardized, machine-readable benchmark data for papers, reports, and meta-analyses. Export clean JSON or CSV directly from the dataset — no manual data entry from screenshots.
Data engineers and automation builders feeding model performance data into dashboards, Slack bots, or automated model-selection pipelines. Schedule regular runs to keep your internal data warehouse current.
Investors and analysts tracking the competitive landscape of the AI industry. Monitor which providers are gaining or losing ground in human preference rankings across different modalities.
Why scrape Chatbot Arena?
Chatbot Arena is one of the most widely used references for LLM evaluation. Unlike static benchmarks, it relies on real human preferences from blind side-by-side comparisons, and its Elo-style scores are updated continuously as new votes arrive, so the ranking reflects how models perform for actual users.
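To put score gaps in perspective, the sketch below applies the classic Elo expected-score formula, which turns a rating difference into an expected head-to-head preference rate. It is an illustration under an assumption: Chatbot Arena has described fitting a Bradley-Terry model and reporting results on an Elo-like scale, so its actual methodology may differ in detail and these probabilities are approximate. The function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B.

    Uses the classic Elo logistic curve with a 400-point scale. This is an
    assumption for illustration; arena.ai's exact scoring method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Equal ratings give a coin flip; a 100-point lead gives roughly 64%.
print(elo_expected_score(1500, 1500))            # 0.5
print(round(elo_expected_score(1503, 1403), 2))  # 0.64
```

Because the curve is symmetric, `elo_expected_score(a, b) + elo_expected_score(b, a)` is always 1, which is why small gaps between neighboring models translate into near-even preference rates.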
Key advantages of using this actor:
- All 10 categories — text, code, vision, document, image generation, video, search, and editing
- Structured data — clean JSON with provider, license, Elo score, confidence interval, votes, pricing, and context window
- Fast and lightweight — HTTP + Cheerio parsing, no browser needed. Runs in under 5 seconds at 256MB memory
- API-first — integrate directly into CI/CD pipelines, dashboards, or automation workflows
- Batch processing — scrape all categories in a single run, get 1,000+ model rankings at once
- Scheduled runs — track ranking trends over time with daily or weekly schedules
Output data
Each result includes:
| Field | Type | Description |
|---|---|---|
| rank | integer | Position on the leaderboard |
| model | string | Model name (e.g. claude-opus-4-7-thinking) |
| provider | string | Company or provider (e.g. Anthropic, Google, OpenAI) |
| license | string | License type (Proprietary, Open, MIT, Apache 2.0, etc.) |
| score | string | Elo score with confidence interval (e.g. 1503 ±8) |
| eloRating | number | Numeric Elo rating |
| confidenceInterval | string | CI range (e.g. ±8, +19/-19) |
| votes | string | Total number of human votes, formatted with thousands separators (e.g. 4,924) |
| pricePer1MTokens | string or null | Price per 1M tokens, input/output (e.g. $5 / $25); null for image and video categories |
| contextLength | string or null | Context window size (e.g. 1M, 200K); null for image and video categories |
| category | string | Leaderboard category |
| url | string | Source leaderboard URL |
| scrapedAt | string | ISO 8601 timestamp of the scrape |
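Several fields are returned as display strings rather than numbers. The sketch below converts `votes` and `confidenceInterval` into numeric values, assuming the two interval formats shown in the table (`±8` and `+19/-19`) are the only ones that occur. The helper names are hypothetical, and any other format raises `ValueError` so a change on arena.ai does not pass silently.

```python
import re
from typing import Tuple

_NUMBER = r"(\d+(?:\.\d+)?)"


def parse_votes(votes: str) -> int:
    """Convert a formatted vote count such as '4,924' to an integer."""
    return int(votes.replace(",", ""))


def parse_confidence_interval(ci: str) -> Tuple[float, float]:
    """Return (plus, minus) margins from '±8' or '+19/-19' style strings."""
    ci = ci.strip()
    symmetric = re.fullmatch("±" + _NUMBER, ci)
    if symmetric:
        margin = float(symmetric.group(1))
        return margin, margin
    asymmetric = re.fullmatch(r"\+" + _NUMBER + r"\s*/\s*-" + _NUMBER, ci)
    if asymmetric:
        return float(asymmetric.group(1)), float(asymmetric.group(2))
    raise ValueError(f"Unrecognized confidence interval: {ci!r}")


# Values taken from the output example further down this page.
record = {"eloRating": 1503, "confidenceInterval": "±8", "votes": "4,924"}
plus, minus = parse_confidence_interval(record["confidenceInterval"])
print(parse_votes(record["votes"]), record["eloRating"] - minus, record["eloRating"] + plus)
# 4924 1495.0 1511.0
```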
How much does it cost to scrape Chatbot Arena leaderboard data?
Pricing uses Pay Per Event (PPE) — you only pay for what you actually extract, with no minimum charge beyond the start fee.
| Event | Cost |
|---|---|
| Actor start | $0.005 (once per run) |
| Per model result | $0.0005 |
Example costs:
- Scrape top 20 text models → ~$0.015
- Scrape all text models (~340) → ~$0.175
- Scrape all 10 categories (~700 to 1,000 models) → ~$0.36 to ~$0.51
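Free plan estimate: Apify's free plan includes $5/month in platform credits. Daily runs on the top 20 text models cost about $0.45/month (30 × $0.015), well within that. Daily runs on the full text category cost about $5.25/month (30 × $0.175), slightly above it, so weekly runs (about $0.70/month for four runs) are the comfortable choice for the full category.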
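The example costs above follow from a simple formula: the start fee plus the per-result fee times the number of results. A minimal sketch, assuming only the two PPE events in the pricing table are charged; it uses `Decimal` so fractions of a cent are not distorted by floating-point rounding. The function names are hypothetical.

```python
from decimal import Decimal

START_FEE = Decimal("0.005")    # charged once per run
PER_RESULT = Decimal("0.0005")  # charged per model result


def estimate_run_cost(results: int) -> Decimal:
    """Estimated PPE charge for one run that returns `results` models."""
    return START_FEE + PER_RESULT * results


def estimate_monthly_cost(results_per_run: int, runs_per_month: int) -> Decimal:
    """Estimated PPE charge for a schedule of identical runs."""
    return estimate_run_cost(results_per_run) * runs_per_month


print(estimate_run_cost(20))           # 0.0150
print(estimate_run_cost(340))          # 0.1750
print(estimate_monthly_cost(340, 30))  # 5.2500
```

For example, `estimate_monthly_cost(20, 30)` gives the cost of a month of daily top-20 runs ($0.45).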
How to use
Using the Apify Console (no code)
- Go to Chatbot Arena Scraper on the Apify Store
- Click Try for free
- Select the categories you want to scrape (default: `text`)
- Set Max results per category (default: all models)
- Click Start and wait for the run to finish (typically under 10 seconds)
- Click Export to download results as JSON, CSV, or Excel
Typical workflow
- Choose your categories of interest (e.g., `text` and `code` for LLM selection)
- Run the actor to get current rankings
- Export to Google Sheets or your database
- Schedule weekly runs to track trends over time
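For the scheduled-runs step, one lightweight way to keep history is to append each run's dataset items to a local SQLite table. This is a sketch under assumptions: the table name `rankings`, its columns, and the helper functions are hypothetical, and the sample item uses a placeholder model name rather than real leaderboard data.

```python
import sqlite3
from typing import Dict, List, Tuple


def save_snapshot(conn: sqlite3.Connection, items: List[Dict]) -> None:
    """Append one run's dataset items to a history table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rankings (
               scraped_at TEXT NOT NULL,
               category TEXT NOT NULL,
               model TEXT NOT NULL,
               provider TEXT,
               leaderboard_rank INTEGER,
               elo REAL,
               PRIMARY KEY (scraped_at, category, model)
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO rankings VALUES (?, ?, ?, ?, ?, ?)",
        [
            (i["scrapedAt"], i["category"], i["model"], i["provider"], i["rank"], i["eloRating"])
            for i in items
        ],
    )
    conn.commit()


def rank_history(conn: sqlite3.Connection, category: str, model: str) -> List[Tuple[str, int]]:
    """(scraped_at, rank) pairs for one model, oldest first."""
    return conn.execute(
        "SELECT scraped_at, leaderboard_rank FROM rankings"
        " WHERE category = ? AND model = ? ORDER BY scraped_at",
        (category, model),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Two placeholder snapshots one week apart.
for scraped_at, rank in [("2026-04-16T00:00:00Z", 2), ("2026-04-23T00:00:00Z", 1)]:
    save_snapshot(conn, [{
        "scrapedAt": scraped_at, "category": "text", "model": "model-a",
        "provider": "ProviderX", "rank": rank, "eloRating": 1490.0,
    }])
print(rank_history(conn, "text", "model-a"))
```

ISO 8601 timestamps in the same format sort correctly as text, which is why `ORDER BY scraped_at` is enough here.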
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| categories | string[] | No | ["text"] | Which leaderboard categories to scrape. Options: text, code, vision, document, text-to-image, image-edit, search, text-to-video, image-to-video, video-edit. |
| maxResults | integer | No | 0 (all) | Maximum number of models to extract per category. Set to 0 for all models. |
| proxyConfiguration | object | No | Apify Proxy | Proxy settings. Uses Apify Proxy by default. |
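If you build the input programmatically, a small validation step catches typos in category names before a run starts. A sketch, assuming the category list and defaults in the table are complete; `build_run_input` is a hypothetical helper, not part of the actor or the Apify client.

```python
from typing import Dict, Iterable, Optional

# Category names as listed in the input table.
ALLOWED_CATEGORIES = {
    "text", "code", "vision", "document", "text-to-image",
    "image-edit", "search", "text-to-video", "image-to-video", "video-edit",
}


def build_run_input(categories: Optional[Iterable[str]] = None, max_results: int = 0) -> Dict:
    """Validate parameters and return an input dict for the actor (hypothetical helper).

    Mirrors the documented defaults: ["text"] and 0 (all models). Leaving out
    proxyConfiguration keeps the default Apify Proxy settings.
    """
    selected = ["text"] if categories is None else list(categories)
    unknown = sorted(set(selected) - ALLOWED_CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {unknown}")
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 0:
        raise ValueError("maxResults must be a non-negative integer (0 means all models)")
    return {"categories": selected, "maxResults": max_results}


print(build_run_input(["text", "code"], 20))
# {'categories': ['text', 'code'], 'maxResults': 20}
```

The returned dict can be passed as `run_input` to `client.actor("automation-lab/chatbot-arena-scraper").call(...)` in the Python API example.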
Supported categories
| Category | URL Path | Typical model count | Description |
|---|---|---|---|
| text | /leaderboard/text | ~340 | General text generation and conversation |
| code | /leaderboard/code | ~65 | Code generation and programming tasks |
| vision | /leaderboard/vision | ~90 | Multimodal image understanding |
| document | /leaderboard/document | ~40 | Document processing and analysis |
| text-to-image | /leaderboard/text-to-image | ~55 | Image generation from text prompts |
| image-edit | /leaderboard/image-edit | ~25 | Image editing and manipulation |
| search | /leaderboard/search | ~30 | Web search integration |
| text-to-video | /leaderboard/text-to-video | ~30 | Video generation from text |
| image-to-video | /leaderboard/image-to-video | ~20 | Video generation from images |
| video-edit | /leaderboard/video-edit | ~10 | Video editing |
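The URL paths follow a single pattern, so leaderboard URLs and a rough result count can be derived from the category name. A sketch, assuming the base URL `https://arena.ai` seen in the output examples and the approximate counts in this table (which drift as models are added); the names are hypothetical.

```python
from typing import Iterable

BASE_URL = "https://arena.ai"

# Approximate model counts from the supported-categories table.
TYPICAL_MODEL_COUNTS = {
    "text": 340, "code": 65, "vision": 90, "document": 40,
    "text-to-image": 55, "image-edit": 25, "search": 30,
    "text-to-video": 30, "image-to-video": 20, "video-edit": 10,
}


def leaderboard_url(category: str) -> str:
    """Source URL for one category, following the /leaderboard/<category> pattern."""
    if category not in TYPICAL_MODEL_COUNTS:
        raise ValueError(f"Unknown category: {category!r}")
    return f"{BASE_URL}/leaderboard/{category}"


def expected_results(categories: Iterable[str]) -> int:
    """Rough number of results a run over `categories` returns (maxResults = 0)."""
    return sum(TYPICAL_MODEL_COUNTS[c] for c in categories)


print(leaderboard_url("code"))                 # https://arena.ai/leaderboard/code
print(expected_results(TYPICAL_MODEL_COUNTS))  # 705
```

With these approximate counts, a run over all 10 categories returns about 705 results, toward the lower end of the 700 to 1,000+ range mentioned in the FAQ.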
Output example
```json
{
  "rank": 1,
  "model": "claude-opus-4-7-thinking",
  "provider": "Anthropic",
  "license": "Proprietary",
  "score": "1503 ±8",
  "eloRating": 1503,
  "confidenceInterval": "±8",
  "votes": "4,924",
  "pricePer1MTokens": "$5 / $25",
  "contextLength": "1M",
  "category": "text",
  "url": "https://arena.ai/leaderboard/text",
  "scrapedAt": "2026-04-23T19:01:06.184Z"
}
```
Image generation category output (pricing and context fields are `null`):
```json
{
  "rank": 1,
  "model": "gpt-image-2 (medium)",
  "provider": "OpenAI",
  "license": "Proprietary",
  "score": "1507 ±9",
  "eloRating": 1507,
  "confidenceInterval": "±9",
  "votes": "15,391",
  "pricePer1MTokens": null,
  "contextLength": null,
  "category": "text-to-image",
  "url": "https://arena.ai/leaderboard/text-to-image",
  "scrapedAt": "2026-04-23T19:01:06.184Z"
}
```
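To sort or filter by cost and context size, the `pricePer1MTokens` and `contextLength` strings need converting to numbers, with `null` handled for image and video categories. A sketch, assuming prices always look like `$5 / $25` and context sizes use `K`/`M` suffixes with decimal multipliers (some providers quote e.g. 128K for 131,072 tokens, so treat results as approximate). The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

_SUFFIX_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_price(price: Optional[str]) -> Optional[Tuple[float, float]]:
    """'$5 / $25' -> (5.0, 25.0) as (input, output) USD per 1M tokens; None stays None."""
    if price is None:
        return None
    match = re.fullmatch(r"\$(\d+(?:\.\d+)?)\s*/\s*\$(\d+(?:\.\d+)?)", price.strip())
    if match is None:
        raise ValueError(f"Unrecognized price: {price!r}")
    return float(match.group(1)), float(match.group(2))


def parse_context_length(value: Optional[str]) -> Optional[int]:
    """'200K' -> 200000, '1M' -> 1000000 (decimal multipliers assumed); None stays None."""
    if value is None:
        return None
    value = value.strip().upper()
    if value and value[-1] in _SUFFIX_MULTIPLIERS:
        return int(round(float(value[:-1]) * _SUFFIX_MULTIPLIERS[value[-1]]))
    return int(value.replace(",", ""))


print(parse_price("$5 / $25"), parse_context_length("1M"))  # (5.0, 25.0) 1000000
print(parse_price(None), parse_context_length(None))        # None None
```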
Tips and tricks
- Scraping all 10 categories in a single run gives you a comprehensive cross-domain view of model capabilities.
- Schedule weekly runs to build a historical dataset of how model rankings evolve over time.
- Use the `provider` field to filter results for a specific company (e.g., only Anthropic or OpenAI models).
- The `votes` field indicates how statistically reliable a ranking is: models with more votes have more stable Elo scores.
- Compare `eloRating` and `pricePer1MTokens` to identify the best price-performance models.
- Image and video categories (`text-to-image`, `image-edit`, `text-to-video`, `image-to-video`, `video-edit`) do not include `pricePer1MTokens` or `contextLength`; these fields will be `null`.
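One way to act on the price-performance tip is to keep only models that no cheaper model matches or beats on Elo (a Pareto frontier), optionally filtered by provider. A sketch under assumptions: prices have already been parsed from `pricePer1MTokens` into numbers, the blended price weights input and output 75/25 (adjust `output_share` to your workload), and the class, function names, and sample models are hypothetical placeholders rather than real leaderboard entries.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    model: str
    provider: str
    elo: float
    price_in: Optional[float]   # USD per 1M input tokens, None if not listed
    price_out: Optional[float]  # USD per 1M output tokens, None if not listed


def blended_price(c: Candidate, output_share: float = 0.25) -> float:
    """Weighted price per 1M tokens; assumes 25% of tokens are output by default."""
    return (1 - output_share) * c.price_in + output_share * c.price_out


def price_performance_frontier(candidates: List[Candidate], provider: Optional[str] = None) -> List[Candidate]:
    """Keep models that no cheaper (or equally priced) model matches or beats on Elo."""
    pool = [
        c for c in candidates
        if c.price_in is not None and c.price_out is not None
        and (provider is None or c.provider == provider)
    ]
    # Cheapest first; at equal price, the higher Elo comes first.
    pool.sort(key=lambda c: (blended_price(c), -c.elo))
    frontier: List[Candidate] = []
    best_elo = float("-inf")
    for c in pool:
        if c.elo > best_elo:  # beats every cheaper model seen so far
            frontier.append(c)
            best_elo = c.elo
    return frontier


# Placeholder models (not real leaderboard data).
models = [
    Candidate("model-a", "ProviderX", 1500, 5.0, 25.0),
    Candidate("model-b", "ProviderY", 1450, 1.0, 4.0),
    Candidate("model-c", "ProviderX", 1440, 2.0, 8.0),
    Candidate("model-d", "ProviderY", 1300, None, None),
]
print([c.model for c in price_performance_frontier(models)])  # ['model-b', 'model-a']
```

Sorting by price and tracking the best Elo seen so far finds the frontier in a single pass after the sort, so it stays cheap even across every category.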
Integrations
Google Sheets — live model comparison dashboard
Connect the actor's dataset to Google Sheets via the Apify Google Sheets integration. Schedule daily runs to keep a live spreadsheet of model rankings that your team can filter by provider, category, or score range.
Slack — ranking change alerts
Use the Apify Slack integration to post a summary when rankings change. For example, notify your #ai-models channel when a new model enters the top 10 in any category.
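The "new model enters the top 10" check boils down to a set difference between two runs. A sketch, assuming both runs are lists of dataset items with `category`, `model`, and `rank`; the function names and sample data are hypothetical, and delivering the message to Slack is left to whichever integration you use.

```python
from typing import Dict, List, Optional, Set


def top_models(items: List[Dict], category: str, n: int = 10) -> Set[str]:
    """Model names ranked within the top `n` of one category."""
    return {i["model"] for i in items if i["category"] == category and i["rank"] <= n}


def new_top_entries(previous: List[Dict], current: List[Dict], category: str, n: int = 10) -> List[str]:
    """Models in the current top `n` that were outside it in the previous run."""
    return sorted(top_models(current, category, n) - top_models(previous, category, n))


def format_alert(category: str, models: List[str], n: int = 10) -> Optional[str]:
    if not models:
        return None  # nothing to announce
    return f"New in the {category} top {n}: {', '.join(models)}"


# Placeholder data in the actor's output shape (model names are hypothetical).
last_week = [
    {"category": "text", "model": "model-a", "rank": 1},
    {"category": "text", "model": "model-b", "rank": 10},
    {"category": "text", "model": "model-c", "rank": 11},
]
this_week = [
    {"category": "text", "model": "model-a", "rank": 1},
    {"category": "text", "model": "model-c", "rank": 9},
    {"category": "text", "model": "model-b", "rank": 12},
]
print(format_alert("text", new_top_entries(last_week, this_week, "text")))
# New in the text top 10: model-c
```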
Webhooks — feed your internal API
Configure an Apify webhook to POST results to your internal model registry or dashboard API after each run. Use the structured JSON output to update your model selection logic automatically.
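On the receiving side, the main job is to pick the dataset ID out of the webhook request. A sketch using only the Python standard library, assuming Apify's default payload template (with `eventType` and a `resource` field holding the run object, including `defaultDatasetId`); if you customize the payload template, adjust the lookups. The function and class names are hypothetical, and request authentication is omitted for brevity.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Optional


def dataset_id_from_webhook(body: bytes) -> Optional[str]:
    """Return the dataset ID of a successful run, or None for other events.

    Assumes Apify's default webhook payload template, in which `resource`
    holds the run object (including `defaultDatasetId`).
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return payload.get("resource", {}).get("defaultDatasetId")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        dataset_id = dataset_id_from_webhook(self.rfile.read(length))
        if dataset_id:
            # Hypothetical next step: fetch the dataset items and update your registry.
            print(f"New leaderboard dataset: {dataset_id}")
        self.send_response(204)
        self.end_headers()
```

To try it locally, start it with `HTTPServer(("", 8000), WebhookHandler).serve_forever()` (importing `HTTPServer` from `http.server`) and point the webhook URL at that address.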
n8n / Make / Zapier — automated workflows
Connect this actor to 5,000+ apps via automation platforms. Example: run the scraper weekly, compare results with the previous week's data, and email a summary of ranking changes to your AI team lead.
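The week-over-week comparison step could look like the sketch below, which matches models across two runs and produces a plain-text summary suitable for an email body. Assumptions: both runs are lists of dataset items with `category`, `model`, and `rank`; models that appear in only one run are ignored; the function names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (category, model)


def rank_changes(previous: List[Dict], current: List[Dict]) -> Dict[Key, Tuple[int, int]]:
    """Map (category, model) -> (old_rank, new_rank) for models whose rank moved."""
    old = {(i["category"], i["model"]): i["rank"] for i in previous}
    changes = {}
    for i in current:
        key = (i["category"], i["model"])
        if key in old and old[key] != i["rank"]:
            changes[key] = (old[key], i["rank"])
    return changes


def summarize(changes: Dict[Key, Tuple[int, int]]) -> str:
    """One line per moved model, e.g. '[code] model-x: #3 -> #1 (up 2)'."""
    lines = []
    for (category, model), (old, new) in sorted(changes.items()):
        direction = "up" if new < old else "down"
        lines.append(f"[{category}] {model}: #{old} -> #{new} ({direction} {abs(old - new)})")
    return "\n".join(lines)


# Placeholder runs (model names are hypothetical).
previous = [
    {"category": "code", "model": "model-x", "rank": 3},
    {"category": "code", "model": "model-y", "rank": 1},
    {"category": "code", "model": "model-z", "rank": 2},
]
current = [
    {"category": "code", "model": "model-x", "rank": 1},
    {"category": "code", "model": "model-y", "rank": 2},
    {"category": "code", "model": "model-z", "rank": 3},
]
print(summarize(rank_changes(previous, current)))
```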
Airtable — model comparison database
Export results to Airtable to build a searchable model comparison database with filtering by provider, license type, price range, and Elo score.
API usage
Node.js
```javascript
// Save as an ES module (e.g. index.mjs) so top-level await works
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_API_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('automation-lab/chatbot-arena-scraper').call({
    categories: ['text', 'code'],
    maxResults: 20,
});

// Read the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`#${item.rank} ${item.model} (${item.provider}) | Elo: ${item.eloRating}, Votes: ${item.votes}`);
}
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient(token="YOUR_APIFY_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("automation-lab/chatbot-arena-scraper").call(
    run_input={
        "categories": ["text", "code"],
        "maxResults": 20,
    }
)

# Iterate over the results in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"#{item['rank']} {item['model']} ({item['provider']}) | Elo: {item['eloRating']}, Votes: {item['votes']}")
```
cURL
```bash
# Start the actor run (the response JSON contains data.id and data.defaultDatasetId)
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~chatbot-arena-scraper/runs?token=YOUR_APIFY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"categories": ["text"], "maxResults": 20}'

# Once the run has finished, get the results
# (replace DATASET_ID with data.defaultDatasetId from the response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_APIFY_API_TOKEN"
```
Use with Claude AI (MCP)
This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
Setup for Claude Code
```bash
claude mcp add --transport http apify "https://mcp.apify.com"
```
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```
Example prompts
- "Get the top 20 LLMs from Chatbot Arena and compare their Elo scores and pricing."
- "Which open-source models rank highest on the Chatbot Arena code leaderboard?"
- "Scrape the Chatbot Arena vision and text-to-image leaderboards and show me which providers dominate each category."
Learn more in the Apify MCP documentation.
Legality
This actor scrapes publicly available leaderboard data from arena.ai. The data is publicly accessible without authentication, and the actor makes standard HTTP requests at a reasonable rate (one request per category).
The scraped data consists of aggregated benchmark results (model rankings, scores, vote counts) that are freely shared by the Chatbot Arena project for public use. No personal data or copyrighted content is collected.
For more information about web scraping legality, see the Apify web scraping legality guide.
FAQ
Q: How often is the Chatbot Arena leaderboard updated?
A: The leaderboard updates continuously as users submit new votes through arena.ai battles. Rankings can shift daily, especially for newer models with fewer votes. Schedule regular runs to track changes.
Q: Why are pricePer1MTokens and contextLength null for some categories?
A: Image and video categories (text-to-image, image-edit, text-to-video, image-to-video, video-edit) do not include pricing or context window columns on arena.ai. These fields are only available for text-based categories (text, code, vision, document, search).
Q: The actor returned 0 results for a category. What happened?
A: This can happen if arena.ai temporarily changes their HTML structure, if the category page is down, or if your proxy was blocked. Check the run log for error messages. Try running again: transient network issues are usually resolved on retry.
Q: Can I scrape all categories at once?
A: Yes. Set categories to an empty array or include all 10 categories in the list. A full scrape across all categories typically extracts 700-1,000+ models and completes in under 30 seconds.
Q: What does the confidence interval (e.g., ±8) mean?
A: The confidence interval reflects the statistical uncertainty in the Elo rating. A smaller interval (e.g., ±3) means the ranking is more stable and based on many votes. A larger interval (e.g., ±20) means the model has fewer votes and its ranking may shift significantly with new data.
Related actors
- YouTube Transcript Scraper — Extract transcripts and captions from YouTube videos in bulk
- Google Ads Scraper — Scrape Google Ads results for competitive intelligence
- Trustpilot Reviews Scraper — Extract reviews and ratings from Trustpilot business pages
- Domain Availability Checker — Check domain name availability in bulk
- IBAN Validator — Validate and parse IBAN numbers in batch