Cohere Models Scraper

Pricing

Pay per event

Extract the full Cohere model catalog — Command, Embed, Rerank, Aya, Transcription — with context windows, API IDs, and cloud platform availability (AWS Bedrock, Azure, Oracle OCI).

Developer: Stas Persiianenko (maintained by Community)

Last modified: 3 days ago

Scrape the complete Cohere AI model catalog from docs.cohere.com/docs/models — model names, families, context windows, API IDs, capabilities, modalities, and platform availability (Amazon Bedrock, Azure AI Foundry, Oracle OCI).

What does it do?

This actor fetches the official Cohere documentation page and parses all 8 model tables into structured dataset records. Every Cohere model across all families is extracted in a single run:

  • Command — Cohere's flagship chat/instruction models (command-a, command-r, command-r-plus, command-r7b, translate, reasoning, vision)
  • Embed — Text and multimodal embedding models (embed-v4, embed-english/multilingual v3)
  • Rerank — Document re-ranking models (rerank-v4 pro/fast, rerank-v3)
  • Aya — Open-weight multilingual models (tiny-aya series, c4ai-aya-expanse)
  • Transcription — Speech-to-text models (cohere-transcribe)

Each record includes the model's API identifier, context window, maximum output tokens, supported endpoints, embedding dimensions (for embed models), and availability on major cloud platforms.

Who is it for?

  • 🤖 AI engineers evaluating Cohere models for RAG pipelines, agent frameworks, or production deployments — get a quick structured overview without reading through docs
  • 📊 Enterprise AI teams comparing Cohere's catalog across cloud providers (Bedrock, Azure, OCI) for compliance or procurement decisions
  • 🔬 ML researchers tracking model releases and context length improvements across Cohere's families
  • 🏗️ Platform builders maintaining model catalogs in their own tools, databases, or LLMOps dashboards
  • 📈 Competitive analysts monitoring the LLM landscape alongside other providers (Groq, Mistral, Together AI, OpenRouter, Fireworks, DeepInfra)

Why use this scraper?

  • Complete catalog — all 5 model families (Command, Embed, Rerank, Aya, Transcription) in one run
  • Structured data — flat output ready for spreadsheets, databases, and APIs
  • Platform availability — Amazon Bedrock model IDs, Azure AI Foundry names, Oracle OCI identifiers
  • No API key needed — data comes from public documentation, no Cohere account required
  • Always fresh — run on-demand or on a schedule to track model updates
  • Fast — completes in under 30 seconds with 256 MB memory

What data does it extract?

| Field | Description | Example |
| --- | --- | --- |
| modelId | Cohere API model identifier | command-a-03-2025 |
| family | Model family | Command |
| status | Availability status | Live, Deprecated Sept 15, 2025 |
| description | Official model description | Command A is our most performant... |
| modality | Supported input modalities | Text, Images |
| contextLength | Context window size | 256k |
| maxOutputTokens | Maximum generation length | 8k |
| endpoints | Supported API endpoints | Chat |
| dimensions | Embedding vector size (embed models) | 1024 |
| similarityMetric | Distance metric (embed models) | Cosine Similarity |
| maxFileSize | Input file limit (transcription) | 25MB |
| amazonBedrockModelId | Model ID on Amazon Bedrock | cohere.command-r-plus-v1:0 |
| amazonSageMaker | SageMaker availability | Unique per deployment |
| azureAIFoundry | Model name on Azure AI | Unique per deployment |
| oracleOCI | Oracle OCI model identifier | cohere.command-a-03-2025 |
| sourceUrl | Documentation page URL | https://docs.cohere.com/docs/models |

How much does it cost to scrape Cohere models?

This actor uses Pay-Per-Event (PPE) pricing. You are charged a small start fee plus a per-model fee for each result.

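| Plan | Start fee | Per model | 32 models total |
| --- | --- | --- | --- |
| FREE | $0.005 | $0.00115 | ~$0.04 |
| BRONZE | $0.005 | $0.001 | ~$0.04 |
| SILVER | $0.005 | $0.00078 | ~$0.03 |
| GOLD | $0.005 | $0.0006 | ~$0.02 |
| PLATINUM | $0.005 | $0.0004 | ~$0.02 |
| DIAMOND | $0.005 | $0.00028 | ~$0.01 |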

The complete Cohere model catalog (32 models) costs less than $0.05 per run on every plan. On the PLATINUM and DIAMOND plans, it is under $0.02.

The Apify Free plan includes $5 of monthly usage — enough for 100+ complete catalog runs at no charge.
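The totals in the pricing table can be reproduced with a few lines of arithmetic. The sketch below assumes the cost of a run is simply the start fee plus the per-model fee times the number of models returned, and that the fees stay as listed above; the function names are illustrative and not part of the actor.

```python
from decimal import Decimal

START_FEE = Decimal("0.005")

# Per-model fees copied from the pricing table above
PER_MODEL_FEE = {
    "FREE": Decimal("0.00115"),
    "BRONZE": Decimal("0.001"),
    "SILVER": Decimal("0.00078"),
    "GOLD": Decimal("0.0006"),
    "PLATINUM": Decimal("0.0004"),
    "DIAMOND": Decimal("0.00028"),
}


def run_cost(plan: str, model_count: int = 32) -> Decimal:
    """Estimated cost of one run: start fee plus per-model fee times models returned."""
    return START_FEE + PER_MODEL_FEE[plan] * model_count


def runs_within_budget(plan: str, budget: str = "5", model_count: int = 32) -> int:
    """Number of complete runs that fit into a budget given in USD."""
    return int(Decimal(budget) // run_cost(plan, model_count))


if __name__ == "__main__":
    for plan in PER_MODEL_FEE:
        print(f"{plan:<9} ${run_cost(plan):.4f} per run")
    print("FREE-plan runs within $5:", runs_within_budget("FREE"))
```

With 32 models, the FREE plan works out to $0.0418 per run, so $5 of usage covers 119 complete runs, consistent with the "100+" figure above.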

How to use it

  1. Go to the actor page on Apify Store
  2. Click Try for free
  3. Click Start — no input configuration needed (the actor has no required fields)
  4. Wait ~15–30 seconds for the run to complete
  5. Download results from the Dataset tab as JSON, CSV, or Excel

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxRequestRetries | Integer | 3 | Number of retry attempts for failed HTTP requests |

The actor requires no configuration for a default run. Adjust maxRequestRetries only if you experience network issues.

Output format

Each dataset item represents one Cohere model. Results are flat JSON objects — no nested structures — ready for direct use in spreadsheets, APIs, and databases.

Example output item:

{
  "modelId": "command-a-03-2025",
  "family": "Command",
  "status": "Live",
  "description": "Command A is our most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases.",
  "modality": "Text",
  "contextLength": "256k",
  "maxOutputTokens": "8k",
  "endpoints": "Chat",
  "dimensions": null,
  "similarityMetric": null,
  "maxFileSize": null,
  "amazonBedrockModelId": "(Coming Soon)",
  "amazonSageMaker": "Unique per deployment",
  "azureAIFoundry": "Unique per deployment",
  "oracleOCI": "cohere.command-a-03-2025",
  "sourceUrl": "https://docs.cohere.com/docs/models"
}
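Because contextLength and maxOutputTokens arrive as strings such as "256k" and "8k", numeric comparisons need a conversion step. The helper below is a minimal sketch, not part of the actor. It assumes that "k" means 1,000 and "m" means 1,000,000, and that any value in another shape should be treated as unknown.

```python
import re

# Assumption: values look like "256k", "8k", or a plain integer such as "4096"
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKmM]?)\s*$")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_token_count(value):
    """Convert a size string to an int; return None if it is missing or unparseable."""
    if value is None:
        return None
    match = _SIZE_RE.match(str(value))
    if match is None:
        return None
    number, suffix = match.groups()
    return int(round(float(number) * _MULTIPLIER[suffix.lower()]))


if __name__ == "__main__":
    item = {"modelId": "command-a-03-2025", "contextLength": "256k", "maxOutputTokens": "8k"}
    print(parse_token_count(item["contextLength"]))    # 256000
    print(parse_token_count(item["maxOutputTokens"]))  # 8000
```

Returning None for unexpected strings keeps a changed docs format from silently turning into a wrong number.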

Tips and best practices

  • 💡 Schedule weekly runs to automatically track Cohere model releases and deprecations
  • 💡 Filter by family in post-processing to build separate lists (e.g., only embed models for a RAG tool selector)
  • 💡 Join with pricing data from the Cohere pricing page to build a complete cost comparison table
  • 💡 Combine with other LLM scrapers — use our Groq Models Scraper, Mistral Models Scraper, and others to build a unified multi-provider catalog
  • 💡 Export to Google Sheets using the Apify Google Sheets integration for a live-updating model comparison table
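The family filter from the tips above can be sketched in a few lines. This example assumes the dataset items are already loaded as a list of dictionaries; the sample records are trimmed to two fields for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_family(items):
    """Map each family name to the list of modelIds in that family."""
    groups = defaultdict(list)
    for item in items:
        # Records without a family are grouped under "Unknown" rather than dropped
        groups[item.get("family") or "Unknown"].append(item["modelId"])
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"modelId": "command-a-03-2025", "family": "Command"},
        {"modelId": "embed-v4.0", "family": "Embed"},
        {"modelId": "embed-english-v3.0", "family": "Embed"},
    ]
    embed_models = group_by_family(sample).get("Embed", [])
    print(embed_models)  # ['embed-v4.0', 'embed-english-v3.0']
```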

Integrations

📅 Scheduled model catalog refresh

Set up a weekly schedule in Apify Console to keep your model database fresh:

  1. Open the actor → Schedules → Add Schedule
  2. Set cron: 0 9 * * 1 (every Monday at 9am)
  3. Connect the dataset output to your pipeline via webhook or Apify API

🔗 Google Sheets live catalog

Use the Apify Google Sheets integration to push model data into a spreadsheet:

  1. Run this actor
  2. Add a Google Sheets integration in the actor's Integrations tab
  3. Map fields to columns — get a live-updating Cohere model catalog in your team's shared sheet

🗃️ Database sync

Combine with Apify's REST API to sync the model catalog into your database on every run:

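// Fetch the dataset from the latest run of a saved task via the Apify API
const TASK_ID = 'YOUR_TASK_ID'; // placeholder: the ID of your Apify task
const APIFY_TOKEN = process.env.APIFY_TOKEN; // read the token from the environment
const response = await fetch(
  `https://api.apify.com/v2/actor-tasks/${TASK_ID}/runs/last/dataset/items`,
  { headers: { Authorization: `Bearer ${APIFY_TOKEN}` } }
);
if (!response.ok) throw new Error(`Apify API request failed: ${response.status}`);
const models = await response.json();
// Insert/upsert into your DB (modelId works as a natural key)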

🤖 LLM selection pipeline

Use this actor to power a dynamic LLM selector in your application — always show the latest available Cohere models without hardcoding:

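import os
import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]  # read the token from the environment

def get_latest_cohere_models():
    """Return the Live models from the actor's last run, keyed by modelId."""
    url = "https://api.apify.com/v2/acts/automation-lab~cohere-models-scraper/runs/last/dataset/items"
    headers = {"Authorization": f"Bearer {APIFY_TOKEN}"}
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()  # fail loudly on HTTP errors
    models = resp.json()
    return {m["modelId"]: m for m in models if m["status"] == "Live"}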

Data freshness

The Cohere model catalog changes periodically. Cohere typically:

  • Adds new models every 1–3 months (Command refreshes, new Embed/Rerank versions)
  • Deprecates older models with a notice period (e.g., "Deprecated Sept 15, 2025")
  • Updates platform availability (Bedrock, Azure, OCI) separately from core model releases

Recommended refresh schedules:

| Use case | Recommended schedule |
| --- | --- |
| Model selector in production app | Weekly |
| Research / analysis | Monthly |
| One-off audit | On-demand |

The status field captures the current lifecycle state. Models with "Live" status are currently available via the Cohere API. Models with a deprecation date in status will stop working after that date.
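The lifecycle rules above can be turned into a small check. The sketch below assumes deprecation statuses always follow the "Deprecated Sept 15, 2025" shape shown in this document, and it treats a null status as operational, as the FAQ states for rerank models. The function names are illustrative, and other status formats would need extra handling.

```python
import re
from datetime import date, datetime

_DEPRECATED_RE = re.compile(r"Deprecated\s+([A-Za-z]+)\.?\s+(\d{1,2}),\s*(\d{4})")


def parse_deprecation_date(status):
    """Return the date in a status like "Deprecated Sept 15, 2025", or None."""
    if not status:
        return None
    match = _DEPRECATED_RE.search(status)
    if match is None:
        return None
    month, day, year = match.groups()
    try:
        # %b expects three-letter English month abbreviations, so "Sept" is cut to "Sep"
        # (assumes an English/C locale)
        return datetime.strptime(f"{month[:3]} {day} {year}", "%b %d %Y").date()
    except ValueError:
        return None


def is_usable(item, today=None):
    """True for Live models, null-status models, and deprecated models up to their cutoff."""
    today = today or date.today()
    status = item.get("status")
    if status is None or status == "Live":
        return True
    cutoff = parse_deprecation_date(status)
    return cutoff is not None and today <= cutoff
```

Passing `today` explicitly makes the check easy to test and lets a scheduled job ask "will this model still work next month?" by supplying a future date.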

Sample use cases

Here are concrete workflows developers build with this data:

🔍 Model selection assistant

Build a chatbot that recommends the right Cohere model based on requirements (context length, modality, platform): fetch the catalog, filter by requirements, return top matches.
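The "fetch, filter, return top matches" step could look roughly like the sketch below. It assumes the items are already fetched, that context lengths use the "256k" style shown in the example output, and that a platform counts as available only when its field holds something other than null or "(Coming Soon)". The function and parameter names are hypothetical.

```python
def parse_k(value):
    """Turn "256k" into 256000; anything that is not a plain integer or "<n>k" becomes 0."""
    text = str(value or "").strip().lower()
    if text.endswith("k") and text[:-1].isdigit():
        return int(text[:-1]) * 1_000
    return int(text) if text.isdigit() else 0


def select_models(items, min_context=0, modality=None, platform_field=None):
    """Return non-deprecated models meeting the requirements, longest context first."""
    matches = []
    for item in items:
        if str(item.get("status") or "").startswith("Deprecated"):
            continue
        if parse_k(item.get("contextLength")) < min_context:
            continue
        if modality and modality.lower() not in (item.get("modality") or "").lower():
            continue
        if platform_field:
            value = item.get(platform_field)
            if not value or value == "(Coming Soon)":
                continue
        matches.append(item)
    return sorted(matches, key=lambda m: parse_k(m.get("contextLength")), reverse=True)
```

For example, `select_models(items, min_context=128_000, modality="Text", platform_field="oracleOCI")` would keep only text-capable models with at least a 128k context window that list an Oracle OCI identifier.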

📋 Internal model registry

Keep an internal database of approved LLM models. Run this scraper weekly, compare with last week's snapshot, and alert your team when new models are added or deprecated.
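The week-over-week comparison can be sketched as a plain dictionary diff keyed by modelId. This assumes both snapshots are lists of dataset items saved from two runs; how they are stored and how the alert is sent are left out, and the function name is illustrative.

```python
def diff_snapshots(previous, current):
    """Compare two lists of dataset items and report added, removed, and status-changed models."""
    old = {m["modelId"]: m for m in previous}
    new = {m["modelId"]: m for m in current}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    status_changed = sorted(
        model_id
        for model_id in new.keys() & old.keys()
        if new[model_id].get("status") != old[model_id].get("status")
    )
    return {"added": added, "removed": removed, "status_changed": status_changed}


if __name__ == "__main__":
    last_week = [{"modelId": "model-a", "status": "Live"}]
    this_week = [
        {"modelId": "model-a", "status": "Deprecated Sept 15, 2025"},
        {"modelId": "model-b", "status": "Live"},
    ]
    changes = diff_snapshots(last_week, this_week)
    if any(changes.values()):
        print("Catalog changed:", changes)
```

A model that moves from "Live" to a deprecation status shows up under status_changed, which is usually the alert that matters most for production systems.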

💰 Cost calculator

Combine context length and endpoint data with Cohere's pricing page to build a cost estimator: "If I process 1M tokens with embed-v4.0 vs embed-english-v3.0, what's the price difference?"

📊 Competitive intelligence dashboard

Schedule multiple LLM provider scrapers (Cohere, Groq, Mistral, Together AI) to run daily and feed results into a dashboard that tracks who has the longest context windows, newest models, and broadest cloud availability.

API usage

Node.js

import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });
const run = await client.actor('automation-lab/cohere-models-scraper').call({});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Fetched ${items.length} Cohere models`);

Python

from apify_client import ApifyClient
client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("automation-lab/cohere-models-scraper").call(run_input={})
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Fetched {len(items)} Cohere models")

cURL

# Start a run
curl -X POST "https://api.apify.com/v2/acts/automation-lab~cohere-models-scraper/runs" \
-H "Authorization: Bearer YOUR_APIFY_TOKEN" \
-H "Content-Type: application/json" \
-d '{}'
# Fetch results (replace RUN_ID)
curl "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items" \
-H "Authorization: Bearer YOUR_APIFY_TOKEN"

Use with MCP (Claude, Cursor, VS Code)

You can use this actor directly in Claude Code, Claude Desktop, Cursor, or VS Code via the Apify MCP server.

Claude Code (CLI)

$ claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/cohere-models-scraper"

Claude Desktop / Cursor / VS Code (JSON config)

{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/cohere-models-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Example prompts for Claude

Once connected:

  • "Run the Cohere Models Scraper and show me all live Command models with their context lengths"
  • "Which Cohere models are available on Amazon Bedrock? Run the scraper and filter the results"
  • "Compare the embed models — show modelId, dimensions, and context length in a table"
  • "Run the scraper and list all deprecated models with their deprecation dates"

Legality

This actor scrapes only docs.cohere.com/docs/models, which is publicly accessible documentation that Cohere provides to help developers use its platform.

The actor:

  • Makes only a single HTTP GET request to a public documentation page
  • Does not bypass any authentication, paywalls, or rate limits
  • Does not collect user data or personal information
  • Does not interfere with Cohere's systems

Always review Cohere's Terms of Service and their documentation's terms before using this data commercially.

FAQ

Q: How often does the Cohere model catalog change? A: Cohere typically releases or deprecates models every 1–3 months. We recommend running the scraper weekly if you need to stay current. You can set up an automatic schedule in Apify Console.

Q: Why do some amazonBedrockModelId values show "(Coming Soon)" instead of null? A: The value "(Coming Soon)" is preserved as-is from the Cohere docs because it is meaningful: it tells you the model will be available on Bedrock soon. Only "N/A" values are returned as null.

Q: What does "Unique per deployment" mean for SageMaker/Azure fields? A: It means the model is available on that platform, but the model ID is assigned uniquely when you deploy it (not a static string). Check the platform documentation for the deployment process.
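The special values described in the two answers above can be normalized before they reach a database or dashboard. The sketch below is one possible mapping, and the category labels are made up for illustration. It assumes that any other non-null string is a static model ID.

```python
PLATFORM_FIELDS = ("amazonBedrockModelId", "amazonSageMaker", "azureAIFoundry", "oracleOCI")


def classify_platform_value(value):
    """Classify one platform field value from a dataset item."""
    if value is None:
        return "not_listed"            # the docs showed "N/A"
    if value == "(Coming Soon)":
        return "coming_soon"           # announced but not yet available
    if value == "Unique per deployment":
        return "available_dynamic_id"  # the ID is assigned when you deploy
    return "available"                 # assumed to be a static model ID


def platform_summary(item):
    """Return the classification of every platform field for one model."""
    return {field: classify_platform_value(item.get(field)) for field in PLATFORM_FIELDS}


if __name__ == "__main__":
    item = {
        "modelId": "command-a-03-2025",
        "amazonBedrockModelId": "(Coming Soon)",
        "amazonSageMaker": "Unique per deployment",
        "azureAIFoundry": "Unique per deployment",
        "oracleOCI": "cohere.command-a-03-2025",
    }
    print(platform_summary(item))
```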

Q: The run failed — what should I do? A: Check the run log for HTTP errors. If the Cohere docs page is temporarily unavailable, retry in a few minutes. If the page structure has changed (rare), please report the issue by contacting us.

Q: Can I filter for only live/active models? A: Yes. Filter by status === 'Live' in your downstream processing. Models with a null status (such as rerank models, which don't track status separately) are also operational, so include them if you want every usable model.

Combine with other LLM provider scrapers from Automation Lab to build a complete AI model comparison tool: