Pricing

from $1.00 / 1,000 usage units

Multi-Model LLM Compare - GPT, Claude, Gemini & Llama

Run one prompt across many language models at once and get every response side by side, with token counts, cost, and latency per model in a single dataset.

Developer

Andrew

Last modified

12 days ago

Multi-Model LLM Compare

Send one prompt to many language models at once and get every answer side by side in a single table. Compare GPT, Claude, Gemini, Llama, DeepSeek, Kimi, Qwen, GLM, and 300+ more models on quality, speed, and cost, without juggling separate accounts or API keys.

What you get

One row per model, with:

The full text response from each model
Prompt tokens, completion tokens, and total tokens
Latency in milliseconds and the provider that served it
Finish reason, status, and any error message

Use cases

Pick the best and cheapest model for a task before you commit to it in production
Benchmark answer quality, speed, and price across providers on your own prompts
A/B test a system prompt across models
Build evaluation datasets for prompt engineering

How to use

Enter your Prompt
List the Models to compare, for example openai/gpt-4o-mini and anthropic/claude-haiku-4.5. Leave it empty to use a default cross-provider set
Optionally add a System prompt, a Temperature, and a Max output tokens cap
Run the actor. Each model becomes one row in the Dataset tab, ready to sort by latency or token count, the two fields that indicate speed and drive cost
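
The steps above can be sketched as a small helper that assembles the input object. This is only an illustration: `build_input` is a hypothetical name, and the `systemPrompt` and `temperature` keys are assumptions, since the input example confirms only `prompt`, `models`, and `maxTokens`.

```python
from typing import Any, Optional


def build_input(
    prompt: str,
    models: Optional[list[str]] = None,
    system_prompt: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble the actor input, omitting any optional field left unset."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    run_input: dict[str, Any] = {"prompt": prompt}
    if models:  # an empty list means "use the default cross-provider set"
        run_input["models"] = list(models)
    # Assumed key names: only "maxTokens" appears in the input example.
    if system_prompt is not None:
        run_input["systemPrompt"] = system_prompt
    if temperature is not None:
        run_input["temperature"] = temperature
    if max_tokens is not None:
        run_input["maxTokens"] = max_tokens
    return run_input


run_input = build_input(
    "Explain quantum entanglement to a 10 year old in 3 sentences.",
    models=["openai/gpt-4o-mini", "anthropic/claude-haiku-4.5"],
    max_tokens=300,
)
```

The resulting dictionary can be pasted into the actor's JSON input editor or passed as the run input through whichever Apify client you use.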

Input example

{
  "prompt": "Explain quantum entanglement to a 10 year old in 3 sentences.",
  "models": [
    "openai/gpt-4o-mini",
    "anthropic/claude-haiku-4.5",
    "google/gemini-2.5-flash",
    "meta-llama/llama-3.3-70b-instruct"
  ],
  "maxTokens": 300
}

Output example

{
  "prompt": "Explain quantum entanglement to a 10 year old in 3 sentences.",
  "model": "google/gemini-2.5-flash",
  "provider": "Google",
  "response": "Imagine you have two special coins that always land on the same side...",
  "promptTokens": 15,
  "completionTokens": 65,
  "totalTokens": 80,
  "latencyMs": 1001,
  "finishReason": "stop",
  "status": "success",
  "error": null
}
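
Once the rows are exported, ranking them takes a few lines. The sketch below assumes the dataset has been loaded as a list of dictionaries shaped like the output example; `rank_successes` is a hypothetical helper, and every sample row except the Gemini one uses made-up values.

```python
from typing import Any


def rank_successes(rows: list[dict[str, Any]], key: str = "latencyMs") -> list[dict[str, Any]]:
    """Return only the successful rows, sorted ascending by a numeric field."""
    ok = [r for r in rows if r.get("status") == "success"]
    return sorted(ok, key=lambda r: r[key])


rows = [
    {"model": "google/gemini-2.5-flash", "latencyMs": 1001, "totalTokens": 80, "status": "success"},
    # The two rows below are invented for illustration only.
    {"model": "openai/gpt-4o-mini", "latencyMs": 850, "totalTokens": 95, "status": "success"},
    {"model": "meta-llama/llama-3.3-70b-instruct", "latencyMs": None, "totalTokens": None, "status": "error"},
]

fastest = rank_successes(rows)[0]["model"]
leanest = rank_successes(rows, key="totalTokens")[0]["model"]
```

Filtering on `status` first keeps error rows, whose numeric fields may be empty, out of the sort.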

Pricing

This actor is usage-based - you pay in proportion to how much work each model call actually does, not a flat fee per response.

Short answers from small models cost very little; long answers from premium models cost more. In a quick comparison across a few small models, each response typically costs a fraction of a cent.
You are only charged for models that successfully respond. A model that errors out costs you nothing.
The token counts in each row (prompt, completion, total) show how much each model read and produced, so you can see what drives your spend and pick the most efficient model for your task.
Tip: set Max output tokens to cap how much each model can generate and keep your spend predictable - billing scales with the amount each model produces.

What it costs in practice

Price depends on which models you pick and how long their answers are. As a rough guide, here's the approximate cost per model response:

Answer length	Small & fast models (Gemini Flash, GPT-4o mini, Llama, DeepSeek)	Mid models (Claude Haiku, Kimi, GLM)	Premium models (Claude Sonnet/Opus, GPT-5, Gemini Pro)
Short (~150 words)	under $0.005	~$0.005-0.01	~$0.02-0.05
Long (~600 words)	~$0.01-0.02	~$0.02-0.04	~$0.06-0.15

A few example runs:

Quick check - the 4 default models (Gemini Flash, GPT-4o mini, Claude Haiku, Llama) on a short answer → about $0.01-0.02 for the whole run
One premium model, detailed answer - Claude Opus on a ~600-word response → about $0.12-0.15
Broad comparison - 6 mixed models on medium answers → roughly $0.10-0.20 total

These are approximate - your actual cost tracks the token counts shown in each row. Lower Max output tokens to bring costs down, and remember you're only billed for models that actually respond.
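
To get a rough range for a planned run, you can add up the per-response ranges from the table. The sketch below is an estimate only: the tier names and `estimate_run` are hypothetical, the figures are copied from the table, and the lower bound of 0.0 for "under $0.005" is an assumption because the table gives none.

```python
# (low, high) USD per model response, copied from the table above.
# "under $0.005" has no stated lower bound, so 0.0 is an assumption.
COST_RANGES = {
    ("small", "short"): (0.0, 0.005),
    ("mid", "short"): (0.005, 0.01),
    ("premium", "short"): (0.02, 0.05),
    ("small", "long"): (0.01, 0.02),
    ("mid", "long"): (0.02, 0.04),
    ("premium", "long"): (0.06, 0.15),
}


def estimate_run(tiers: list[str], length: str) -> tuple[float, float]:
    """Sum the low and high bounds for one response per listed tier."""
    low = sum(COST_RANGES[(tier, length)][0] for tier in tiers)
    high = sum(COST_RANGES[(tier, length)][1] for tier in tiers)
    return round(low, 4), round(high, 4)


# The 4 default models: three small-tier models plus Claude Haiku (mid tier).
quick_check = estimate_run(["small", "small", "mid", "small"], "short")
one_premium = estimate_run(["premium"], "long")
```

Here `quick_check` evaluates to `(0.005, 0.025)`, a wider bracket that contains the "about $0.01-0.02" quoted for the quick check, and `one_premium` evaluates to `(0.06, 0.15)`.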

Free tier

You can try this actor for free, with limits:

Up to 3 models per run
Up to 512 output tokens per model
3 runs per day, with a 30-minute cooldown between runs

Upgrade to a paid plan to compare up to 12 models at once, set output length up to 8192 tokens, and run without daily limits or cooldowns.
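
Before starting a run, you can check an input against these limits locally. This is a minimal sketch: `PlanLimits` and `check_limits` are hypothetical names, the numbers come from the limits above, and the daily run count and cooldown are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimits:
    max_models: int
    max_output_tokens: int


FREE = PlanLimits(max_models=3, max_output_tokens=512)
PAID = PlanLimits(max_models=12, max_output_tokens=8192)


def check_limits(run_input: dict, limits: PlanLimits) -> list[str]:
    """Return a list of problems; an empty list means the input fits the plan."""
    problems = []
    models = run_input.get("models") or []
    if len(models) > limits.max_models:
        problems.append(f"{len(models)} models requested, limit is {limits.max_models}")
    max_tokens = run_input.get("maxTokens")
    if max_tokens is not None and max_tokens > limits.max_output_tokens:
        problems.append(f"maxTokens {max_tokens} exceeds {limits.max_output_tokens}")
    return problems


# Same shape as the input example: four models, 300 output tokens.
example = {
    "prompt": "Explain quantum entanglement to a 10 year old in 3 sentences.",
    "models": [
        "openai/gpt-4o-mini",
        "anthropic/claude-haiku-4.5",
        "google/gemini-2.5-flash",
        "meta-llama/llama-3.3-70b-instruct",
    ],
    "maxTokens": 300,
}

free_problems = check_limits(example, FREE)
paid_problems = check_limits(example, PAID)
```

With four models, the input example exceeds the free tier's model limit but fits the paid limits.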

Notes

Each model runs independently, so one failing model never blocks the others. Failed models return a row with status set to error and a short message.
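
The isolation described here can be illustrated with a per-model try/except around concurrent calls. This sketches the general pattern and is not the actor's actual implementation; `compare` and `fake_call` are hypothetical, and `fake_call` stands in for a real provider request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def compare(prompt: str, models: list[str], call_model: Callable[[str, str], str]) -> list[dict]:
    """Call every model independently and return one row per model."""

    def run_one(model: str) -> dict:
        try:
            return {"model": model, "response": call_model(model, prompt),
                    "status": "success", "error": None}
        except Exception as exc:  # keep the failure inside this model's row
            return {"model": model, "response": None,
                    "status": "error", "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return list(pool.map(run_one, models))  # map preserves input order


def fake_call(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real provider call."""
    if model == "broken/model":
        raise RuntimeError("provider unavailable")
    return f"{model} says hi"


rows = compare("Hello", ["openai/gpt-4o-mini", "broken/model"], fake_call)
```

Because each exception is caught inside `run_one`, the failing model yields an error row while the other model's row is unaffected.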
