Deprecated

Pricing

Pay per usage

Cohere Models Scraper

Extract the full Cohere model catalog — Command, Embed, Rerank, Aya, Transcription — with context windows, API IDs, and cloud platform availability (AWS Bedrock, Azure, Oracle OCI).

Rating: 0.0 (0 ratings)

Developer: Stas Persiianenko

Last modified: 11 days ago

What does it do?

This actor fetches the official Cohere documentation page and parses all 8 model tables into structured dataset records. Every Cohere model across all families is extracted in a single run:

Command — Cohere's flagship chat/instruction models (command-a, command-r, command-r-plus, command-r7b, translate, reasoning, vision)
Embed — Text and multimodal embedding models (embed-v4, embed-english/multilingual v3)
Rerank — Document re-ranking models (rerank-v4 pro/fast, rerank-v3)
Aya — Open-weight multilingual models (tiny-aya series, c4ai-aya-expanse)
Transcription — Speech-to-text models (cohere-transcribe)

Each record includes the model's API identifier, context window, maximum output tokens, supported endpoints, embedding dimensions (for embed models), and availability on major cloud platforms.

Who is it for?

🤖 AI engineers evaluating Cohere models for RAG pipelines, agent frameworks, or production deployments — get a quick structured overview without reading through docs
📊 Enterprise AI teams comparing Cohere's catalog across cloud providers (Bedrock, Azure, OCI) for compliance or procurement decisions
🔬 ML researchers tracking model releases and context length improvements across Cohere's families
🏗️ Platform builders maintaining model catalogs in their own tools, databases, or LLMOps dashboards
📈 Competitive analysts monitoring the LLM landscape alongside other providers (Groq, Mistral, Together AI, OpenRouter, Fireworks AI, DeepInfra)

Why use this scraper?

✅ Complete catalog — all 5 model families (Command, Embed, Rerank, Aya, Transcription) in one run
✅ Structured data — flat output ready for spreadsheets, databases, and APIs
✅ Platform availability — Amazon Bedrock model IDs, Azure AI Foundry names, Oracle OCI identifiers
✅ No API key needed — data comes from public documentation, no Cohere account required
✅ Always fresh — run on-demand or on a schedule to track model updates
✅ Fast — completes in under 30 seconds with 256 MB memory

What data does it extract?

Field	Description	Example
`modelId`	Cohere API model identifier	`command-a-03-2025`
`family`	Model family	`Command`
`status`	Availability status	`Live`, `Deprecated Sept 15, 2025`
`description`	Official model description	`Command A is our most performant...`
`modality`	Supported input modalities	`Text, Images`
`contextLength`	Context window size	`256k`
`maxOutputTokens`	Maximum generation length	`8k`
`endpoints`	Supported API endpoints	`Chat`
`dimensions`	Embedding vector size (embed models)	`1024`
`similarityMetric`	Distance metric (embed models)	`Cosine Similarity`
`maxFileSize`	Input file limit (transcription)	`25MB`
`amazonBedrockModelId`	Model ID on Amazon Bedrock	`cohere.command-r-plus-v1:0`
`amazonSageMaker`	SageMaker availability	`Unique per deployment`
`azureAIFoundry`	Model name on Azure AI	`Unique per deployment`
`oracleOCI`	Oracle OCI model identifier	`cohere.command-a-03-2025`
`sourceUrl`	Documentation page URL	`https://docs.cohere.com/docs/models`

How much does it cost to scrape Cohere models?

This actor uses Pay-Per-Event (PPE) pricing. You are charged a small start fee plus a per-model fee for each result.

Plan	Start fee	Per model	32 models total
FREE	$0.005	$0.00115	~$0.04
BRONZE	$0.005	$0.001	~$0.04
SILVER	$0.005	$0.00078	~$0.03
GOLD	$0.005	$0.0006	~$0.02
PLATINUM	$0.005	$0.0004	~$0.02
DIAMOND	$0.005	$0.00028	~$0.01

The complete Cohere model catalog (32 models) costs less than $0.05 per run. On the Platinum and Diamond plans, it's under $0.02.

The Apify Free plan includes $5 of monthly usage — enough for 100+ complete catalog runs at no charge.
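Each run costs the start fee plus the per-model fee multiplied by the number of models returned. A minimal Python sketch of that arithmetic, using the fees from the table and assuming a 32-model catalog (the real count changes as Cohere adds or retires models); the function names are illustrative:

```python
# Pay-Per-Event fees from the pricing table (USD).
START_FEE = 0.005
PER_MODEL_FEE = {
    "FREE": 0.00115,
    "BRONZE": 0.001,
    "SILVER": 0.00078,
    "GOLD": 0.0006,
    "PLATINUM": 0.0004,
    "DIAMOND": 0.00028,
}


def run_cost(plan: str, model_count: int = 32) -> float:
    """Estimated cost of one run: start fee + per-model fee x number of models."""
    return START_FEE + PER_MODEL_FEE[plan] * model_count


def runs_per_budget(plan: str, budget: float = 5.0, model_count: int = 32) -> int:
    """How many full catalog runs fit into a budget, e.g. the $5 monthly free credit."""
    return int(budget // run_cost(plan, model_count))


for plan in PER_MODEL_FEE:
    print(f"{plan:<9} ${run_cost(plan):.4f} per run, {runs_per_budget(plan)} runs per $5")
```

The same arithmetic shows why Gold (about $0.024 per run) sits just above the $0.02 mark while Platinum and Diamond stay below it.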

How to use it

Go to the actor page on Apify Store
Click Try for free
Click Start — no input configuration needed (the actor has no required fields)
Wait ~15–30 seconds for the run to complete
Download results from the Dataset tab as JSON, CSV, or Excel

Input parameters

Parameter	Type	Default	Description
`maxRequestRetries`	Integer	3	Number of retry attempts for failed HTTP requests

The actor requires no configuration for a default run. Adjust maxRequestRetries only if you experience network issues.
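If you start the actor through the API instead of the Console, maxRequestRetries goes into the run input. A hedged sketch using the official apify-client Python package; the helper names, the APIFY_TOKEN environment variable, and the non-negative check are illustrative assumptions rather than documented behavior of the actor:

```python
import os


def build_run_input(max_request_retries: int = 3) -> dict:
    """Build the actor input; maxRequestRetries is the only documented parameter."""
    # Assumption: a negative or non-integer retry count makes no sense, so reject it early.
    if isinstance(max_request_retries, bool) or not isinstance(max_request_retries, int) \
            or max_request_retries < 0:
        raise ValueError("maxRequestRetries must be a non-negative integer")
    return {"maxRequestRetries": max_request_retries}


def run_scraper(max_request_retries: int = 3) -> list:
    """Start a run and return its dataset items (requires `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so build_run_input works without it

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("automation-lab/cohere-models-scraper").call(
        run_input=build_run_input(max_request_retries)
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items
```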

Output format

Each dataset item represents one Cohere model. Results are flat JSON objects — no nested structures — ready for direct use in spreadsheets, APIs, and databases.

Example output item:

{
  "modelId": "command-a-03-2025",
  "family": "Command",
  "status": "Live",
  "description": "Command A is our most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases.",
  "modality": "Text",
  "contextLength": "256k",
  "maxOutputTokens": "8k",
  "endpoints": "Chat",
  "dimensions": null,
  "similarityMetric": null,
  "maxFileSize": null,
  "amazonBedrockModelId": "(Coming Soon)",
  "amazonSageMaker": "Unique per deployment",
  "azureAIFoundry": "Unique per deployment",
  "oracleOCI": "cohere.command-a-03-2025",
  "sourceUrl": "https://docs.cohere.com/docs/models"
}
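Fields such as contextLength and maxOutputTokens are strings like "256k" and "8k", which sort alphabetically rather than numerically. A small Python conversion helps when you need numbers for sorting or filtering. This sketch assumes "k" means thousands (so "256k" becomes 256000) and returns None for null or unrecognized values; the function name is hypothetical:

```python
import re
from typing import Optional

# A number with an optional k/M suffix, e.g. "256k", "8k", "1.5M".
_TOKEN_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKmM]?)\s*$")


def parse_token_count(value: Optional[str]) -> Optional[int]:
    """Convert strings such as "256k" to integers; None for null or unparseable values."""
    if value is None:
        return None
    match = _TOKEN_RE.match(value)
    if not match:
        return None
    number, suffix = float(match.group(1)), match.group(2).lower()
    multiplier = {"": 1, "k": 1_000, "m": 1_000_000}[suffix]  # assumption: decimal k/M
    return int(number * multiplier)


record = {"modelId": "command-a-03-2025", "contextLength": "256k", "maxOutputTokens": "8k"}
print(parse_token_count(record["contextLength"]), parse_token_count(record["maxOutputTokens"]))
```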

Tips and best practices

💡 Schedule weekly runs to automatically track Cohere model releases and deprecations
💡 Filter by family in post-processing to build separate lists (e.g., only embed models for a RAG tool selector)
💡 Join with pricing data from the Cohere pricing page to build a complete cost comparison table
💡 Combine with other LLM scrapers — use our Groq Models Scraper, Mistral Models Scraper, and others to build a unified multi-provider catalog
💡 Export to Google Sheets using the Apify Google Sheets integration for a live-updating model comparison table
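Filtering by family needs nothing more than the family field. A small Python sketch that groups dataset items by family; the sample items are trimmed to two fields for brevity, and the function name is hypothetical:

```python
from collections import defaultdict


def group_by_family(items: list[dict]) -> dict[str, list[str]]:
    """Map each family (Command, Embed, Rerank, Aya, Transcription) to its model IDs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for item in items:
        groups[item.get("family") or "Unknown"].append(item["modelId"])
    return dict(groups)


items = [
    {"modelId": "command-a-03-2025", "family": "Command"},
    {"modelId": "embed-v4.0", "family": "Embed"},
    {"modelId": "embed-english-v3.0", "family": "Embed"},
]
embed_models = group_by_family(items).get("Embed", [])  # e.g. for a RAG tool selector
print(embed_models)
```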

Integrations

📅 Scheduled model catalog refresh

Set up a weekly schedule in Apify Console to keep your model database fresh:

Open the actor → Schedules → Add Schedule
Set cron: 0 9 * * 1 (every Monday at 9:00 in the schedule's time zone)
Connect the dataset output to your pipeline via webhook or Apify API

🔗 Google Sheets live catalog

Use the Apify Google Sheets integration to push model data into a spreadsheet:

Run this actor
Add a Google Sheets integration in the actor's Integrations tab
Map fields to columns — get a live-updating Cohere model catalog in your team's shared sheet

🗃️ Database sync

Combine with Apify's REST API to sync the model catalog into your database on every run:

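// Fetch the latest run's dataset via the Apify API (Node.js 18+, ES module for top-level await)
const APIFY_TOKEN = process.env.APIFY_TOKEN;
const TASK_ID = 'YOUR_TASK_ID'; // ID of a saved Apify task for this actor
const response = await fetch(
  `https://api.apify.com/v2/actor-tasks/${TASK_ID}/runs/last/dataset/items`,
  { headers: { Authorization: `Bearer ${APIFY_TOKEN}` } }
);
if (!response.ok) throw new Error(`Apify API error: ${response.status}`);
const models = await response.json();
// Insert/upsert into your DB, keyed on modelId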
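How you insert or upsert depends on your database. As one example, here is a Python sketch using the standard-library sqlite3 module that keys rows on modelId, so repeated runs update existing rows instead of duplicating them. The table name and the subset of columns are assumptions, ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer, and example-model is a hypothetical record:

```python
import sqlite3

# Dataset fields stored in this sketch; add more columns as needed.
COLUMNS = ("modelId", "family", "status", "contextLength")


def upsert_models(conn: sqlite3.Connection, models: list[dict]) -> None:
    """Insert new models and update existing ones, using modelId as the primary key."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cohere_models (
               model_id TEXT PRIMARY KEY,
               family TEXT,
               status TEXT,
               context_length TEXT
           )"""
    )
    # Keep only the stored fields; missing fields become NULL.
    rows = [{key: m.get(key) for key in COLUMNS} for m in models]
    conn.executemany(
        """INSERT INTO cohere_models (model_id, family, status, context_length)
           VALUES (:modelId, :family, :status, :contextLength)
           ON CONFLICT(model_id) DO UPDATE SET
               family = excluded.family,
               status = excluded.status,
               context_length = excluded.context_length""",
        rows,
    )
    conn.commit()


# Usage: an in-memory database and a hypothetical model whose status changes between runs.
conn = sqlite3.connect(":memory:")
upsert_models(conn, [{"modelId": "example-model", "family": "Command", "status": "Live", "contextLength": "128k"}])
upsert_models(conn, [{"modelId": "example-model", "family": "Command", "status": "Deprecated", "contextLength": "128k"}])
rows = conn.execute("SELECT model_id, status FROM cohere_models").fetchall()
```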

🤖 LLM selection pipeline

Use this actor to power a dynamic LLM selector in your application — always show the latest available Cohere models without hardcoding:

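import os
import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]

def get_latest_cohere_models():
    url = "https://api.apify.com/v2/acts/automation-lab~cohere-models-scraper/runs/last/dataset/items"
    headers = {"Authorization": f"Bearer {APIFY_TOKEN}"}
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    models = resp.json()
    # Keep "Live" models plus those with null status (e.g. rerank models), which are also operational
    return {m["modelId"]: m for m in models if m.get("status") in ("Live", None)}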

Data freshness

The Cohere model catalog changes periodically. Cohere typically:

Adds new models every 1–3 months (Command refreshes, new Embed/Rerank versions)
Deprecates older models with a notice period (e.g., "Deprecated Sept 15, 2025")
Updates platform availability (Bedrock, Azure, OCI) separately from core model releases

Recommended refresh schedules:

Use case	Recommended schedule
Model selector in production app	Weekly
Research / analysis	Monthly
One-off audit	On-demand

The status field captures the current lifecycle state. Models with "Live" status are currently available via the Cohere API. Models with a deprecation date in status will stop working after that date.
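Because deprecation dates live inside the status string, they have to be parsed before you can compare them with today's date. A Python sketch, assuming statuses follow the pattern shown above ("Deprecated Sept 15, 2025"); the helper names are hypothetical, and the month parsing relies on English month abbreviations:

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches strings such as "Deprecated Sept 15, 2025".
_DEPRECATED_RE = re.compile(r"Deprecated\s+([A-Za-z]+)\.?\s+(\d{1,2}),\s*(\d{4})")


def deprecation_date(status: Optional[str]) -> Optional[date]:
    """Return the date embedded in a deprecation status, or None if there is none."""
    if not status:
        return None
    match = _DEPRECATED_RE.search(status)
    if not match:
        return None
    month, day, year = match.groups()
    try:
        # %b expects a three-letter English abbreviation, so "Sept" or "September" is cut to "Sep".
        return datetime.strptime(f"{month[:3]} {day} {year}", "%b %d %Y").date()
    except ValueError:
        return None  # unrecognized month name


def is_past_deprecation(status: Optional[str], today: Optional[date] = None) -> bool:
    """True when the status carries a deprecation date that has already passed."""
    cutoff = deprecation_date(status)
    return cutoff is not None and cutoff < (today or date.today())
```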

Sample use cases

Here are concrete workflows developers build with this data:

🔍 Model selection assistant

Build a chatbot that recommends the right Cohere model based on requirements (context length, modality, platform): fetch the catalog, filter by requirements, return top matches.
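The filtering step of such an assistant can be a plain function over the dataset items. A Python sketch with hypothetical helper names; it assumes "k" in contextLength means 1,000 tokens, treats a null status as operational (as described in the FAQ), and counts a platform as available unless its field is null or "(Coming Soon)":

```python
from typing import Optional


def context_tokens(value: Optional[str]) -> int:
    """Turn "256k" into 256000 (assumes k = 1,000); null or unrecognized values count as 0."""
    if not value or not value.strip().lower().endswith("k"):
        return 0
    try:
        return int(float(value.strip()[:-1]) * 1000)
    except ValueError:
        return 0


def recommend(items: list[dict], min_context: int = 0, modality: Optional[str] = None,
              platform_field: Optional[str] = None, limit: int = 3) -> list[str]:
    """Model IDs meeting the requirements, longest context window first."""
    matches = []
    for m in items:
        if (m.get("status") or "Live") != "Live":
            continue  # skip deprecated models; null status counts as operational
        if context_tokens(m.get("contextLength")) < min_context:
            continue
        if modality and modality.lower() not in (m.get("modality") or "").lower():
            continue
        if platform_field and m.get(platform_field) in (None, "", "(Coming Soon)"):
            continue
        matches.append(m)
    matches.sort(key=lambda m: context_tokens(m.get("contextLength")), reverse=True)
    return [m["modelId"] for m in matches[:limit]]
```

For example, recommend(items, min_context=200_000, platform_field="oracleOCI") keeps Live models with at least a 200k context window that have an Oracle OCI identifier.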

📋 Internal model registry

Keep an internal database of approved LLM models. Run this scraper weekly, compare with last week's snapshot, and alert your team when new models are added or deprecated.
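The weekly comparison reduces to set operations on modelId. A Python sketch, assuming you keep the previous run's items (for example as a JSON file) and load both snapshots as lists of dicts; the function name is hypothetical:

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Compare two runs by modelId and report added, removed and status-changed models."""
    prev = {m["modelId"]: m.get("status") for m in previous}
    curr = {m["modelId"]: m.get("status") for m in current}
    return {
        "added": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        "status_changed": sorted(k for k in curr.keys() & prev.keys() if curr[k] != prev[k]),
    }
```

Any non-empty list in the result is a reason to notify the team.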

💰 Cost calculator

Combine context length and endpoint data with Cohere's pricing page to build a cost estimator: "If I process 1M tokens with embed-v4.0 vs embed-english-v3.0, what's the price difference?"
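The scraper does not return prices, so those must come from Cohere's pricing page; the comparison itself is per-token arithmetic. A Python sketch assuming prices are quoted per 1M tokens. The prices below are placeholders, not Cohere's actual rates:

```python
def token_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of processing `tokens` at a price quoted per 1M tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


# PLACEHOLDER prices per 1M tokens, NOT real Cohere prices; fill in from the pricing page.
PRICES = {"embed-v4.0": 1.0, "embed-english-v3.0": 0.5}

tokens = 1_000_000
diff = token_cost(tokens, PRICES["embed-v4.0"]) - token_cost(tokens, PRICES["embed-english-v3.0"])
print(f"Difference for {tokens:,} tokens: ${diff:.4f}")
```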

📊 Competitive intelligence dashboard

Schedule multiple LLM provider scrapers (Cohere, Groq, Mistral, Together AI) to run daily and feed results into a dashboard that tracks who has the longest context windows, newest models, and broadest cloud availability.
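For the Cohere side of such a dashboard, "broadest cloud availability" can be derived from the four platform fields. A Python sketch with hypothetical helper names; it counts a platform when the field is non-null and not "(Coming Soon)", and treats "Unique per deployment" as available, in line with the FAQ:

```python
PLATFORM_FIELDS = ("amazonBedrockModelId", "amazonSageMaker", "azureAIFoundry", "oracleOCI")


def platform_count(record: dict) -> int:
    """Number of cloud platforms listing the model (not null, not "(Coming Soon)")."""
    return sum(
        1 for field in PLATFORM_FIELDS
        if record.get(field) not in (None, "", "(Coming Soon)")
    )


def broadest_availability(records: list[dict], top_n: int = 5) -> list[tuple[str, int]]:
    """Model IDs ranked by how many cloud platforms carry them."""
    ranked = sorted(records, key=platform_count, reverse=True)
    return [(r["modelId"], platform_count(r)) for r in ranked[:top_n]]
```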

API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/cohere-models-scraper').call({});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Fetched ${items.length} Cohere models`);

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("automation-lab/cohere-models-scraper").call(run_input={})
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Fetched {len(items)} Cohere models")

cURL

# Start a run
curl -X POST "https://api.apify.com/v2/acts/automation-lab~cohere-models-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'

# Fetch results (replace RUN_ID)
curl "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN"

Use with MCP (Claude, Cursor, VS Code)

You can use this actor directly in Claude Code, Claude Desktop, Cursor, or VS Code via the Apify MCP server.

Claude Code (CLI)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/cohere-models-scraper"

Claude Desktop / Cursor / VS Code (JSON config)

{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/cohere-models-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Example prompts for Claude

Once connected:

"Run the Cohere Models Scraper and show me all live Command models with their context lengths"
"Which Cohere models are available on Amazon Bedrock? Run the scraper and filter the results"
"Compare the embed models — show modelId, dimensions, and context length in a table"
"Run the scraper and list all deprecated models with their deprecation dates"

Legality

This actor scrapes docs.cohere.com/docs/models, publicly accessible documentation that Cohere provides to help developers use its platform.

The actor:

Makes a single HTTP GET request to a public documentation page (repeated only on failure, up to maxRequestRetries times)
Does not bypass any authentication, paywalls, or rate limits
Does not collect user data or personal information
Does not interfere with Cohere's systems

Always review Cohere's Terms of Service and their documentation's terms before using this data commercially.

FAQ

Q: How often does the Cohere model catalog change? A: Cohere typically releases or deprecates models every 1–3 months. We recommend running the scraper weekly if you need to stay current. You can set up an automatic schedule in Apify Console.

Q: Why do some amazonBedrockModelId values show "(Coming Soon)" instead of null? A: The value "(Coming Soon)" is preserved as-is from the Cohere docs because it is meaningful: it tells you the model will be available on Bedrock soon. Only "N/A" values are returned as null.

Q: What does "Unique per deployment" mean for SageMaker/Azure fields? A: It means the model is available on that platform, but the model ID is assigned uniquely when you deploy it (not a static string). Check the platform documentation for the deployment process.

Q: The run failed — what should I do? A: Check the run log for HTTP errors. If the Cohere docs page is temporarily unavailable, retry in a few minutes. If the page structure has changed (rare), please report the issue by contacting us.

Q: Can I filter for only live/active models? A: Yes. In your downstream processing, keep items where status === 'Live' and also items where status is null: models such as rerank models don't track status separately and are still operational.

Related scrapers

Combine with other LLM provider scrapers from Automation Lab to build a complete AI model comparison tool:

Groq Models Scraper — Groq's ultra-fast inference models
Mistral Models Scraper — Mistral AI models and pricing
Together AI Models Scraper — Together AI's open model catalog
OpenRouter Models Scraper — 200+ models from OpenRouter
Fireworks AI Models Scraper — Fireworks AI model catalog
DeepInfra Models Scraper — DeepInfra's serverless model list
Cloudflare Workers AI Scraper — Workers AI model catalog

You might also like

OpenRouter AI Model Pricing Scraper

parseforge/openrouter-models-pricing-scraper

Scrape AI model catalog and pricing from OpenRouter public API. Get prompt/completion price per token, context length, modality, top providers, and supported features for 300+ AI models. No API key required.

ParseForge

OpenRouter Model Scraper

datapilot/openrouter-model-scraper

OpenRouter Models Scraper extracts AI model metadata from OpenRouter API, including pricing, context length, providers, modalities, token limits, vision/tool support, JSON support, and model architecture. Supports keyword filtering, proxy rotation, and structured dataset

Data Pilot

Pinecone Integration

apify/pinecone-integration

This integration transfers data from Apify Actors to Pinecone and is a good starting point for a question-answering, search, or RAG use case.

Apify

Qdrant Integration

apify/qdrant-integration

Transfer data from Apify Actors to a Qdrant vector database.

Apify

Weaviate Integration

apify/weaviate-integration

This integration transfers data from Apify Actors to Weaviate and is a good starting point for a question-answering, search, or RAG use case.

Apify

Rag Embedding Generator

labrat011/rag-embedding-generator

Generate vector embeddings from text or chunked datasets using OpenAI or Cohere. Chains with RAG Content Chunker for end-to-end RAG pipelines. Outputs raw vectors ready for any vector database.

mick_

RAG Pipeline

labrat011/rag-pipeline

One-click RAG pipeline: chunks text, generates embeddings, and stores vectors in Pinecone or Qdrant. Provide your content and API keys -- the orchestrator handles the rest.

mick_

Chroma Integration

apify/chroma-integration

This integration transfers data from Apify Actors to Chroma and is a good starting point for a question-answering, search, or RAG use case.

Apify

bioRxiv + medRxiv Scraper for RAG

devanshlive/biorxiv-medrxiv-rag-extractor

Scrape bioRxiv and medRxiv preprints by server, category, and date range. Returns RAG-ready JSON with JATS full-text chunks (cl100k_base, 512/50) when available and abstract fallback otherwise. Drop-in for LangChain, LlamaIndex, Qdrant, Pinecone, Weaviate, pgvector. $0.02 per preprint.

Devansh Tiwari

PubMed Scraper for RAG: Papers as Chunked JSON

devanshlive/pubmed-rag-extractor

Scrape PubMed citations by search term, MeSH, and article type. Returns RAG-ready JSON with full-text chunks from PMC Open Access (cl100k_base, 512/50) and abstract fallback. Drop-in for LangChain, LlamaIndex, Qdrant, Pinecone, Weaviate, pgvector. Skip GROBID / Pubmed Parser. $0.02 per paper.

Devansh Tiwari

arXiv Scraper for RAG: Papers as Chunked JSON

devanshlive/arxiv-rag-extractor

Scrape arXiv papers by date and category. Strips LaTeX and returns RAG-ready JSON with tokenizer-aware chunks (cl100k_base, 512/50). Drop-in for LangChain, LlamaIndex, Qdrant, Pinecone, Weaviate, pgvector, Chroma. Skip GROBID / Nougat / pandoc. $0.015 per paper.

Devansh Tiwari
