Pricing

Pay per event

Chatbot Arena Scraper

Scrapes the Chatbot Arena (arena.ai) leaderboard to extract LLM model rankings, Elo scores, confidence intervals, vote counts, and category-specific ratings from human preference battles.

Developer

Stas Persiianenko

Last modified: 25 days ago

What does Chatbot Arena Scraper do?

Chatbot Arena (arena.ai) is the most widely cited independent benchmark for comparing LLMs, based on millions of anonymous human votes in side-by-side model battles. This actor extracts the full leaderboard data across all 10 categories:

Text — general text generation
Code — code generation
Vision — multimodal/image understanding
Document — document processing
Text-to-Image — image generation
Image Edit — image editing
Search — web search integration
Text-to-Video — video generation
Image-to-Video — video from images
Video Edit — video editing

Who is it for?

AI engineers and ML teams selecting models for production systems. Compare hundreds of LLMs by Elo score, pricing, and context window to shortlist models that fit your cost and quality requirements.

Product managers and CTOs making build-vs-buy decisions around AI capabilities. Track how providers like OpenAI, Anthropic, Google, and Meta rank against each other across text, code, vision, and image generation.

AI researchers and academics who need standardized, machine-readable benchmark data for papers, reports, and meta-analyses. Export clean JSON or CSV directly from the dataset — no manual data entry from screenshots.

Data engineers and automation builders feeding model performance data into dashboards, Slack bots, or automated model-selection pipelines. Schedule regular runs to keep your internal data warehouse current.

Investors and analysts tracking the competitive landscape of the AI industry. Monitor which providers are gaining or losing ground in human preference rankings across different modalities.

Why scrape Chatbot Arena?

Chatbot Arena is the gold standard for LLM evaluation. Unlike static benchmarks, it uses real human preferences from blind side-by-side comparisons. The Elo scoring system provides a reliable, continuously updated ranking that reflects actual user experience.

Key advantages of using this actor:

All 10 categories — text, code, vision, document, image generation, video, search, and editing
Structured data — clean JSON with provider, license, Elo score, confidence interval, votes, pricing, and context window
Fast and lightweight — HTTP + Cheerio parsing, no browser needed. Runs in under 5 seconds at 256MB memory
API-first — integrate directly into CI/CD pipelines, dashboards, or automation workflows
Batch processing — scrape all categories in a single run, get 1,000+ model rankings at once
Scheduled runs — track ranking trends over time with daily or weekly schedules

Output data

Each result includes:

Field	Type	Description
`rank`	integer	Position on the leaderboard
`model`	string	Model name (e.g. `claude-opus-4-7-thinking`)
`provider`	string	Company or provider (e.g. `Anthropic`, `Google`, `OpenAI`)
`license`	string	License type (`Proprietary`, `Open`, `MIT`, `Apache 2.0`, etc.)
`score`	string	Elo score with confidence interval (e.g. `1503 ±8`)
`eloRating`	number	Numeric Elo rating
`confidenceInterval`	string	CI range (e.g. `±8`, `+19/-19`)
`votes`	string	Total number of human votes, as a formatted string (e.g. `4,924`)
`pricePer1MTokens`	string	Price per 1M tokens, input/output (e.g. `$5 / $25`)
`contextLength`	string	Context window size (e.g. `1M`, `200K`)
`category`	string	Leaderboard category
`url`	string	Source leaderboard URL
`scrapedAt`	string	ISO 8601 timestamp
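
Several fields arrive as display strings rather than numbers. The following is a minimal sketch of normalizing them, assuming the formats shown in this README (`4,924` for votes, `1M` or `200K` for context length). The helper names are hypothetical and not part of the actor.

```python
from typing import Optional

# Suffix multipliers are an assumption based on the '1M' / '200K' examples.
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_votes(votes: Optional[str]) -> Optional[int]:
    """Convert a formatted vote count such as '4,924' to an int."""
    if not votes:
        return None
    return int(votes.replace(",", ""))


def parse_context_length(value: Optional[str]) -> Optional[int]:
    """Convert a context window such as '200K' or '1M' to a token count."""
    if not value:
        return None
    text = value.strip().upper()
    if not text:
        return None
    if text[-1] in _SUFFIXES:
        return int(float(text[:-1]) * _SUFFIXES[text[-1]])
    return int(text.replace(",", ""))
```

Both helpers return `None` for missing values, which matches the null fields in the image and video categories.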

How much does it cost to scrape Chatbot Arena leaderboard data?

Pricing uses Pay Per Event (PPE) — you only pay for what you actually extract, with no minimum charge beyond the start fee.

Event	Cost
Actor start	$0.005 (once per run)
Per model result	$0.0005

Example costs:

Scrape top 20 text models → ~$0.015
Scrape all text models (~340) → ~$0.175
Scrape all 10 categories (~700 to 1,000+ models) → ~$0.36 to ~$0.51

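Free plan estimate: Apify's free plan includes $5/month in platform credits. At ~$0.175 per full text-category run, that covers about 28 runs per month; capping maxResults (for example, the top 100 models at ~$0.055 per run) leaves room for daily runs.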
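
The arithmetic behind these estimates can be sketched as follows. The two fees come from the pricing table above; the function names are hypothetical, and amounts are kept in integer micro-dollars to avoid floating-point drift.

```python
START_FEE_MICROS = 5_000    # $0.005, charged once per run
PER_RESULT_MICROS = 500     # $0.0005, charged per model result


def estimate_run_cost(num_results: int) -> float:
    """Estimated cost in USD for one run returning num_results rows."""
    return (START_FEE_MICROS + PER_RESULT_MICROS * num_results) / 1_000_000


def runs_within_budget(budget_usd: float, num_results: int) -> int:
    """How many runs of the given size fit into a budget."""
    cost_micros = START_FEE_MICROS + PER_RESULT_MICROS * num_results
    return int(round(budget_usd * 1_000_000)) // cost_micros
```

For example, `estimate_run_cost(340)` gives 0.175, and `runs_within_budget(5, 340)` gives 28.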

How to use

Using the Apify Console (no code)

Go to Chatbot Arena Scraper on the Apify Store
Click Try for free
Select the categories you want to scrape (default: text)
Set Max results per category (default: all models)
Click Start and wait for the run to finish (typically under 10 seconds)
Click Export to download results as JSON, CSV, or Excel

Typical workflow

Choose your categories of interest (e.g., text and code for LLM selection)
Run the actor to get current rankings
Export to Google Sheets or your database
Schedule weekly runs to track trends over time

Input parameters

Parameter	Type	Required	Default	Description
`categories`	string[]	No	`["text"]`	Which leaderboard categories to scrape. Options: `text`, `code`, `vision`, `document`, `text-to-image`, `image-edit`, `search`, `text-to-video`, `image-to-video`, `video-edit`.
`maxResults`	integer	No	0 (all)	Maximum number of models to extract per category. Set to `0` for all models.
`proxyConfiguration`	object	No	Apify Proxy	Proxy settings. Uses Apify Proxy by default.
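
If you build the input programmatically, it can help to validate it before starting a run. This is a small sketch, assuming the parameter names and category values from the table above; `build_run_input` is a hypothetical helper, not something the actor ships.

```python
from typing import Any, Dict, List, Optional

ALLOWED_CATEGORIES = (
    "text", "code", "vision", "document", "text-to-image",
    "image-edit", "search", "text-to-video", "image-to-video", "video-edit",
)


def build_run_input(
    categories: Optional[List[str]] = None, max_results: int = 0
) -> Dict[str, Any]:
    """Return an input object, rejecting unknown categories early."""
    # Mirror the documented default of ["text"] when nothing is passed.
    chosen = list(categories) if categories is not None else ["text"]
    unknown = [c for c in chosen if c not in ALLOWED_CATEGORIES]
    if unknown:
        raise ValueError(f"Unknown categories: {unknown}")
    if max_results < 0:
        raise ValueError("maxResults must be 0 (all) or a positive integer")
    return {"categories": chosen, "maxResults": max_results}
```

The returned dictionary can be passed as `run_input` to the Python client call shown in the API usage section.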

Supported categories

Category	URL Path	Typical models	Description
`text`	`/leaderboard/text`	~340	General text generation and conversation
`code`	`/leaderboard/code`	~65	Code generation and programming tasks
`vision`	`/leaderboard/vision`	~90	Multimodal image understanding
`document`	`/leaderboard/document`	~40	Document processing and analysis
`text-to-image`	`/leaderboard/text-to-image`	~55	Image generation from text prompts
`image-edit`	`/leaderboard/image-edit`	~25	Image editing and manipulation
`search`	`/leaderboard/search`	~30	Web search integration
`text-to-video`	`/leaderboard/text-to-video`	~30	Video generation from text
`image-to-video`	`/leaderboard/image-to-video`	~20	Video generation from images
`video-edit`	`/leaderboard/video-edit`	~10	Video editing
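
To plan a run, the table above can be turned into a lookup. This is a sketch only: the counts are the approximate figures from the table and will drift as models are added, and the host is taken from the `url` field in the output examples.

```python
from typing import Iterable, Optional

BASE_URL = "https://arena.ai"  # host as it appears in the `url` output field

# Approximate model counts copied from the table above.
TYPICAL_MODELS = {
    "text": 340, "code": 65, "vision": 90, "document": 40,
    "text-to-image": 55, "image-edit": 25, "search": 30,
    "text-to-video": 30, "image-to-video": 20, "video-edit": 10,
}


def leaderboard_url(category: str) -> str:
    """Build the source URL for a supported category."""
    if category not in TYPICAL_MODELS:
        raise KeyError(f"Unsupported category: {category}")
    return f"{BASE_URL}/leaderboard/{category}"


def typical_total(categories: Optional[Iterable[str]] = None) -> int:
    """Sum the approximate row counts for the chosen categories (all if None)."""
    chosen = TYPICAL_MODELS.keys() if categories is None else categories
    return sum(TYPICAL_MODELS[c] for c in chosen)
```

With these figures, `typical_total()` comes to 705 rows for all 10 categories and `typical_total(["text", "code"])` to 405.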

Output example

{
    "rank": 1,
    "model": "claude-opus-4-7-thinking",
    "provider": "Anthropic",
    "license": "Proprietary",
    "score": "1503 ±8",
    "eloRating": 1503,
    "confidenceInterval": "±8",
    "votes": "4,924",
    "pricePer1MTokens": "$5 / $25",
    "contextLength": "1M",
    "category": "text",
    "url": "https://arena.ai/leaderboard/text",
    "scrapedAt": "2026-04-23T19:01:06.184Z"
}

Image generation category output (pricing and context fields are null):

{
    "rank": 1,
    "model": "gpt-image-2 (medium)",
    "provider": "OpenAI",
    "license": "Proprietary",
    "score": "1507 ±9",
    "eloRating": 1507,
    "confidenceInterval": "±9",
    "votes": "15,391",
    "pricePer1MTokens": null,
    "contextLength": null,
    "category": "text-to-image",
    "url": "https://arena.ai/leaderboard/text-to-image",
    "scrapedAt": "2026-04-23T19:01:06.184Z"
}

Tips and tricks

Scraping all 10 categories in a single run gives you a comprehensive cross-domain view of model capabilities.
Schedule weekly runs to build a historical dataset of how model rankings evolve over time.
Use the provider field to filter results for a specific company (e.g., only Anthropic or OpenAI models).
The votes field indicates how statistically reliable a ranking is — models with more votes have more stable Elo scores.
Compare eloRating and pricePer1MTokens to identify the best price-performance models.
Image and video categories (text-to-image, image-edit, text-to-video, image-to-video, video-edit) do not include pricePer1MTokens or contextLength — these fields will be null.
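
The provider filter and price-performance comparison from the tips above can be sketched like this. It assumes prices follow the `$5 / $25` format (input / output per 1M tokens) and uses Elo divided by output price as a deliberately crude heuristic; both function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional, Tuple


def parse_price(price: Optional[str]) -> Optional[Tuple[float, float]]:
    """Parse '$5 / $25' into (input_usd, output_usd) per 1M tokens."""
    if not price:
        return None
    numbers = re.findall(r"\d+(?:\.\d+)?", price.replace(",", ""))
    if len(numbers) != 2:
        return None
    return float(numbers[0]), float(numbers[1])


def elo_per_output_dollar(
    items: List[Dict[str, Any]], provider: Optional[str] = None
) -> List[Tuple[str, float]]:
    """Rank models by eloRating / output price, optionally for one provider."""
    ranked = []
    for item in items:
        if provider and item.get("provider") != provider:
            continue
        price = parse_price(item.get("pricePer1MTokens"))
        if price is None or price[1] <= 0:
            # Null pricing (image/video categories) or a zero price: skip.
            continue
        ranked.append((item["model"], item["eloRating"] / price[1]))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Rows with null pricing are skipped rather than treated as free, so image and video results never appear in this ranking.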

Integrations

Google Sheets — live model comparison dashboard

Connect the actor's dataset to Google Sheets via the Apify Google Sheets integration. Schedule daily runs to keep a live spreadsheet of model rankings that your team can filter by provider, category, or score range.

Slack — ranking change alerts

Use the Apify Slack integration to post a summary when rankings change. For example, notify your #ai-models channel when a new model enters the top 10 in any category.

Webhooks — feed your internal API

Configure an Apify webhook to POST results to your internal model registry or dashboard API after each run. Use the structured JSON output to update your model selection logic automatically.
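
On the receiving side, your endpoint needs to find the dataset that belongs to the finished run. A minimal sketch, assuming Apify's default webhook payload template, in which `eventType` names the event and `resource` carries the run object; if you customize the payload template, adjust the keys. The function name is hypothetical.

```python
from typing import Any, Dict, Optional


def dataset_id_from_webhook(payload: Dict[str, Any]) -> Optional[str]:
    """Return the default dataset ID for a succeeded run, else None."""
    # Ignore failed, aborted, or timed-out runs.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

Your handler can then fetch the items for that dataset ID and update the model registry.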

n8n / Make / Zapier — automated workflows

Connect this actor to 5,000+ apps via automation platforms. Example: run the scraper weekly, compare results with the previous week's data, and email a summary of ranking changes to your AI team lead.
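
The week-over-week comparison in that example could look roughly like the sketch below. It assumes two lists of result items in the output format described in this README (one per run) and matches models by their `model` name; the function names are hypothetical.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def new_top_n_entrants(
    previous: List[Item], current: List[Item], top_n: int = 10, category: str = "text"
) -> List[str]:
    """Models inside the top N now that were not there in the previous run."""
    def top(items: List[Item]) -> set:
        return {
            i["model"] for i in items
            if i["category"] == category and i["rank"] <= top_n
        }
    return sorted(top(current) - top(previous))


def rank_changes(
    previous: List[Item], current: List[Item], category: str = "text"
) -> Dict[str, int]:
    """Map model -> places moved (old rank minus new rank; positive = moved up)."""
    old = {i["model"]: i["rank"] for i in previous if i["category"] == category}
    return {
        i["model"]: old[i["model"]] - i["rank"]
        for i in current
        if i["category"] == category
        and i["model"] in old
        and old[i["model"]] != i["rank"]
    }
```

Models that appear for the first time show up in `new_top_n_entrants` (if ranked high enough) but not in `rank_changes`, since they have no previous rank.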

Airtable — model comparison database

Export results to Airtable to build a searchable model comparison database with filtering by provider, license type, price range, and Elo score.

API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_API_TOKEN' });

const run = await client.actor('automation-lab/chatbot-arena-scraper').call({
    categories: ['text', 'code'],
    maxResults: 20,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`#${item.rank} ${item.model} (${item.provider}) — Elo: ${item.eloRating}, Votes: ${item.votes}`);
}

Python

from apify_client import ApifyClient

client = ApifyClient(token="YOUR_APIFY_API_TOKEN")

run = client.actor("automation-lab/chatbot-arena-scraper").call(run_input={
    "categories": ["text", "code"],
    "maxResults": 20,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"#{item['rank']} {item['model']} ({item['provider']}) — Elo: {item['eloRating']}, Votes: {item['votes']}")

cURL

# Start the actor run
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~chatbot-arena-scraper/runs?token=YOUR_APIFY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "categories": ["text"],
    "maxResults": 20
  }'

# Get results once the run has finished (replace DATASET_ID with data.defaultDatasetId from the response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_APIFY_API_TOKEN"
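
The first cURL call only starts the run, so the dataset may still be empty if you fetch it immediately. Below is a standard-library sketch of the full start, poll, and fetch sequence, assuming the Apify API v2 endpoints for runs and datasets and Bearer-token authentication. The function names are hypothetical, and the `request` parameter exists only so the HTTP layer can be swapped out in tests.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List, Optional

API = "https://api.apify.com/v2"
ACTOR = "automation-lab~chatbot-arena-scraper"
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def http_json(url: str, token: str, payload: Optional[dict] = None) -> Any:
    """POST payload as JSON if given, otherwise GET; return the parsed body."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method="POST" if data else "GET")
    req.add_header("Authorization", f"Bearer {token}")
    if data:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def run_and_fetch(
    token: str,
    run_input: Dict[str, Any],
    request: Callable[..., Any] = http_json,
    poll_seconds: float = 2.0,
    max_wait: float = 300.0,
) -> List[Dict[str, Any]]:
    """Start a run, wait for a terminal status, then return dataset items."""
    run = request(f"{API}/acts/{ACTOR}/runs", token, run_input)["data"]
    deadline = time.monotonic() + max_wait
    while run["status"] not in TERMINAL:
        if time.monotonic() > deadline:
            raise TimeoutError(f"Run {run['id']} did not finish in {max_wait}s")
        time.sleep(poll_seconds)
        run = request(f"{API}/actor-runs/{run['id']}", token)["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Run {run['id']} ended with status {run['status']}")
    return request(f"{API}/datasets/{run['defaultDatasetId']}/items", token)
```

A call such as `run_and_fetch(token, {"categories": ["text"], "maxResults": 20})` would return the same items as the second cURL command, but only after the run has finished.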

Use with Claude AI (MCP)

This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "url": "https://mcp.apify.com"
        }
    }
}

Example prompts

"Get the top 20 LLMs from Chatbot Arena and compare their Elo scores and pricing."
"Which open-source models rank highest on the Chatbot Arena code leaderboard?"
"Scrape the Chatbot Arena vision and text-to-image leaderboards and show me which providers dominate each category."

Learn more in the Apify MCP documentation.

Legality

This actor scrapes publicly available leaderboard data from arena.ai. The data is publicly accessible without authentication, and the actor makes standard HTTP requests at a reasonable rate (one request per category).

The scraped data consists of aggregated benchmark results (model rankings, scores, vote counts) that are freely shared by the Chatbot Arena project for public use. No personal data or copyrighted content is collected.

For more information about web scraping legality, see the Apify web scraping legality guide.

FAQ

Q: How often is the Chatbot Arena leaderboard updated?
A: The leaderboard updates continuously as users submit new votes through arena.ai battles. Rankings can shift daily, especially for newer models with fewer votes. Schedule regular runs to track changes.

Q: Why are pricePer1MTokens and contextLength null for some categories?
A: Image and video categories (text-to-image, image-edit, text-to-video, image-to-video, video-edit) do not include pricing or context window columns on arena.ai. These fields are only available for text-based categories (text, code, vision, document, search).

Q: The actor returned 0 results for a category. What happened?
A: This can happen if arena.ai changes its HTML structure, if the category page is down, or if your proxy was blocked. Check the run log for error messages, then try running again; transient network issues usually resolve on retry.

Q: Can I scrape all categories at once?
A: Yes. Set categories to an empty array or include all 10 categories in the list. A full scrape across all categories typically extracts 700 to 1,000+ models and completes in under 30 seconds.

Q: What does the confidence interval (e.g., ±8) mean?
A: The confidence interval reflects the statistical uncertainty in the Elo rating. A smaller interval (e.g., ±3) means the ranking is more stable and based on many votes. A larger interval (e.g., ±20) means the model has fewer votes and its ranking may shift significantly with new data.
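
One practical use of the interval is a rough check of whether two models are distinguishable at all. The sketch below assumes the two `confidenceInterval` formats shown in the output table (`±8` and `+19/-19`) and treats overlapping intervals as "too close to call"; this is an informal heuristic rather than a formal significance test, and the function names are hypothetical.

```python
import re
from typing import Any, Dict, Tuple


def parse_ci(ci: str) -> Tuple[float, float]:
    """Return (plus, minus) margins from '±8' or '+19/-19'."""
    text = ci.strip()
    if text.startswith("±"):
        margin = float(text[1:])
        return margin, margin
    # Accept both an ASCII hyphen and a Unicode minus sign (an assumption).
    match = re.fullmatch(r"\+(\d+(?:\.\d+)?)\s*/\s*[-−](\d+(?:\.\d+)?)", text)
    if not match:
        raise ValueError(f"Unrecognized confidence interval: {ci!r}")
    return float(match.group(1)), float(match.group(2))


def intervals_overlap(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """True if the Elo intervals of two result items overlap."""
    def bounds(item: Dict[str, Any]) -> Tuple[float, float]:
        plus, minus = parse_ci(item["confidenceInterval"])
        return item["eloRating"] - minus, item["eloRating"] + plus

    a_low, a_high = bounds(a)
    b_low, b_high = bounds(b)
    return a_low <= b_high and b_low <= a_high
```

Only compare items from the same category, since each leaderboard has its own rating scale.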

YouTube Transcript Scraper — Extract transcripts and captions from YouTube videos in bulk
Google Ads Scraper — Scrape Google Ads results for competitive intelligence
Trustpilot Reviews Scraper — Extract reviews and ratings from Trustpilot business pages
Domain Availability Checker — Check domain name availability in bulk
IBAN Validator — Validate and parse IBAN numbers in batch
