Unified.to

How to Get a Cohere API Key — and Connect It to Your Product


February 20, 2026

Updated May 2026

Cohere provides enterprise-grade language models and a set of dedicated AI endpoints that go beyond standard text generation. Unlike other providers in this series, Cohere treats embeddings, reranking, and classification as first-class products with their own dedicated endpoints and models — not just side effects of a general-purpose chat API.

If you're building an AI-native SaaS product, retrieval system, or internal assistant, you'll need a Cohere API key.

This guide covers:

  1. Creating your Cohere account
  2. Generating and securing your API key
  3. Setting up billing
  4. Testing your first API call
  5. Using Cohere through Unified's Generative AI API
  6. Connecting Cohere to SaaS platforms via [Unified MCP](/mcp)

Cohere's Product Surface

Before getting your key, it's worth understanding what Cohere actually offers — because it's structurally different from other LLM providers:

Command models — text generation and conversation via the Chat API. The current flagship is Command A.

Embed models — dedicated embedding endpoint for semantic search, clustering, and RAG pipelines. Cohere's Embed is a native embedding product, not a generic LLM approximating embeddings.

Rerank — a dedicated endpoint for reordering search results by semantic relevance. Designed specifically for search pipelines.

Classify — supervised text classification endpoint.

One API key covers all of these. The endpoint you call determines which product you're using.

Step 1: Create a Cohere Account

Go to https://dashboard.cohere.com and sign up with email, Google, or GitHub.

After verifying your email you'll be redirected to your dashboard. Cohere automatically generates a trial API key when you create an account — you don't need to create one manually for your first key.

Step 2: Locate Your API Key

In the dashboard sidebar, click API Keys.

Your trial key will be visible there. Copy it and store it securely.

Key names cannot contain spaces — use hyphens or underscores when naming keys (e.g., prod-backend, staging_worker).

Key format: Cohere API keys are opaque tokens with no standardized prefix — they don't start with sk- or any identifiable pattern. Store them in a password manager or secrets vault.

Step 3: Create Additional Keys (Optional)

Click Create API Key to generate additional keys for different environments. Name each clearly:

  • development
  • staging
  • production

You can also create keys via the Cohere CLI:

co key create my-prod-key

You must be signed in to the CLI for this command. The same no-spaces rule applies.

To verify any key is valid and active:

curl https://api.cohere.com/v1/check-api-key \
  -X POST \
  -H "Authorization: Bearer $CO_API_KEY"
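The same check can be run from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names build_check_key_request and key_is_valid are illustrative, not part of Cohere's SDK, and it assumes that any 2xx status means the key is valid.

```python
import urllib.error
import urllib.request

CHECK_URL = "https://api.cohere.com/v1/check-api-key"


def build_check_key_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        CHECK_URL,
        method="POST",
        # strip() guards against a trailing newline copied along with the key
        headers={"Authorization": f"Bearer {api_key.strip()}"},
    )


def key_is_valid(api_key: str, timeout: float = 10.0) -> bool:
    """Return True on a 2xx response, False on an HTTP error such as 401."""
    request = build_check_key_request(api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return False
```

Calling key_is_valid(os.environ["CO_API_KEY"]) at application startup is one way to fail fast on a revoked or mistyped key.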

Step 4: Understand Trial vs Production Keys

| | Trial key | Production key |
|---|---|---|
| Cost | Free | Pay-per-token |
| Monthly cap | 1,000 API calls/month | No cap |
| Chat RPM | 20 req/min per model | 500 req/min (Command A/R family); contact sales for Command A+/A Reasoning |
| Embed | 2,000 inputs/min | 2,000 inputs/min |
| Rerank | 10 req/min | 1,000 req/min |
| Tokenize | 100 req/min | 2,000 req/min |
| Billing required | No | Yes |

Trial keys are suitable for evaluation and development. For production workloads, upgrade to a production key by adding a payment method in Billing → Add payment method.
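The per-minute limits above translate directly into a minimum spacing between requests: 20 req/min means one call every 3 seconds. A minimal sketch, assuming you want to pace calls client-side rather than rely on 429 responses. The Pacer class and its injectable clock and sleep are illustrative names, not part of any SDK.

```python
import time


def min_interval_seconds(requests_per_minute: int) -> float:
    """Smallest gap between calls that stays within a per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


class Pacer:
    """Sleeps just long enough to keep calls under a requests-per-minute limit."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(requests_per_minute)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
        # The next slot opens one interval after this call actually starts.
        self._next_allowed = now + delay + self.interval
        return delay
```

Create one Pacer per endpoint and key type (for example Pacer(20) for trial-key chat calls) and call wait() before each request.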

Step 5: Store Your Key Securely

Cohere's Python SDK reads CO_API_KEY by default — use that as your canonical environment variable name:

# macOS / Linux
export CO_API_KEY="your-api-key"

# Windows PowerShell (setx persists the value; open a new terminal to pick it up)
setx CO_API_KEY "your-api-key"

Note on environment variable names: CO_API_KEY is the canonical name that Cohere's SDK auto-detects. COHERE_API_KEY is a common alternative, but the SDK does not pick it up on its own; you would need to read it yourself and pass it to the client explicitly. For maximum compatibility with Cohere's own tooling, standardize on CO_API_KEY.

For production: use a secrets manager — AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault. Never commit keys to Git or embed them in frontend code.
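However the key reaches the process (shell export, container environment, or a secrets manager injecting it at startup), it helps to load it in one place and fail loudly when it is missing. A minimal sketch; load_cohere_key is an illustrative helper, and the fallback to COHERE_API_KEY is a convenience assumed here, not SDK behavior.

```python
import os


def load_cohere_key(env=None) -> str:
    """Return the API key from CO_API_KEY, falling back to COHERE_API_KEY."""
    env = os.environ if env is None else env
    for name in ("CO_API_KEY", "COHERE_API_KEY"):
        # Strip stray whitespace or newlines picked up when the key was pasted.
        value = (env.get(name) or "").strip()
        if value:
            return value
    raise RuntimeError("Set CO_API_KEY before starting the application")
```

The result can then be passed explicitly, for example cohere.ClientV2(api_key=load_cohere_key()), so the client does not depend on which variable name was set.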

Step 6: Test Your API Key

Cohere's current Python client is ClientV2, and text generation goes through co.chat(). The older co.generate() pattern was deprecated on August 26, 2025, so don't use it in new code.

Python (ClientV2):

import cohere

co = cohere.ClientV2()  # reads CO_API_KEY automatically

response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": "Explain what an API key is in two sentences."}
    ]
)
print(response.message.content[0].text)

curl:

curl https://api.cohere.com/v2/chat \
  -H "Authorization: Bearer $CO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command-a-03-2025",
    "messages": [
      {"role": "user", "content": "Explain what an API key is in two sentences."}
    ]
  }'

A successful response returns the model output in JSON.
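If you call the REST endpoint directly, you will need to pull the text out of that JSON yourself. The sketch below assumes the response body mirrors the SDK attribute path used above (message, then content, then text). The sample payload is hand-written and trimmed for illustration, not captured API output, and extract_text is an illustrative helper name.

```python
import json


def extract_text(payload: dict) -> str:
    """Join the text of every text block in a chat response body."""
    blocks = payload.get("message", {}).get("content", [])
    return "".join(b.get("text", "") for b in blocks if b.get("type") == "text")


# Hand-written sample in the assumed shape (not real API output).
sample = json.loads(
    '{"message": {"role": "assistant",'
    ' "content": [{"type": "text", "text": "An API key identifies the caller."}]}}'
)
print(extract_text(sample))
```

Using .get() with defaults means an error body or an unexpected shape yields an empty string instead of raising a KeyError.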

Which Cohere Model Should You Use?

Command models (text generation):

| Model | API string | Input / MTok | Output / MTok | Best for |
|---|---|---|---|---|
| Command A | command-a-03-2025 | Contact sales | Contact sales | Recommended default. General-purpose, agentic AI, multilingual. |
| Command R+ 08-2024 | command-r-plus-08-2024 | $2.50 | $10.00 | Complex RAG, multi-step tool use, long-context tasks. |
| Command R 08-2024 | command-r-08-2024 | $0.15 | $0.60 | Cost-sensitive workloads. RAG and tool use at lower cost. |
| Command R7B 12-2024 | command-r7b-12-2024 | lower | lower | Lightweight, high-volume workloads. |

Embedding models:

| Model | API string | Price | Best for |
|---|---|---|---|
| Embed v3 (English) | embed-english-v3.0 | $0.10/MTok | Semantic search, clustering, RAG retrieval |
| Embed v3 (Multilingual) | embed-multilingual-v3.0 | $0.10/MTok | Same as above, cross-language |

Rerank:

  • $2.00 per 1,000 search queries. Use it for reordering search results by semantic relevance.

Prices per million tokens as of May 2026. Verify at cohere.com/pricing before production budget planning. Command A+ and some specialist variants are contact-sales only.
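To turn per-token prices into a budget figure, multiply token counts by the listed rates. A minimal sketch covering only the two Command models with published prices above; estimate_cost_usd is an illustrative helper, and the rates are copied from this guide, so re-check them against cohere.com/pricing before relying on the numbers.

```python
# USD per million tokens as (input, output), copied from the pricing table above.
PRICES_PER_MTOK = {
    "command-r-plus-08-2024": (2.50, 10.00),
    "command-r-08-2024": (0.15, 0.60),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload from its input and output token counts."""
    try:
        price_in, price_out = PRICES_PER_MTOK[model]
    except KeyError:
        raise ValueError(f"no list price recorded for {model!r}") from None
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For example, 2M input tokens and 500K output tokens on command-r-plus-08-2024 comes to 2 × $2.50 + 0.5 × $10.00 = $10.00.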

Important: always use explicit, versioned model IDs — command-a-03-2025 not command-a. Unversioned aliases are deprecated. Verify current IDs via GET https://api.cohere.com/v1/models.
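A small guard can catch unversioned Command aliases before a request goes out. This is a heuristic sketch: it assumes Command IDs end in a -MM-YYYY release suffix, as the ones listed in this guide do, and is_versioned_command_id is an illustrative name. The authoritative list is still whatever GET /v1/models returns.

```python
import re

# Heuristic: Command IDs in this guide end in a -MM-YYYY release suffix.
_VERSION_SUFFIX = re.compile(r"-(0[1-9]|1[0-2])-20\d{2}$")


def is_versioned_command_id(model_id: str) -> bool:
    """True for IDs like command-a-03-2025, False for aliases like command-a."""
    return model_id.startswith("command-") and bool(_VERSION_SUFFIX.search(model_id))
```

A check like this fits well in configuration validation or CI, where a bare alias can be rejected before it reaches production.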

Migrating from co.generate(): if your code still uses co.generate(), migrate to co.chat() with a Command model. Cohere's Generate API was deprecated on August 26, 2025. See Cohere's "Migrating from Generate to Chat" docs.

Is the Cohere API Free?

Yes — trial keys are free with no credit card required. Limits:

  • 1,000 API calls per month across all endpoints
  • 20 RPM for Command models
  • 10 RPM for Rerank
  • 2,000 inputs/min for Embed

Trial keys are suitable for evaluation and development. For production, upgrade to a production key by adding billing.

Security Best Practices

  • Use CO_API_KEY as the canonical env var — it's what Cohere's SDK reads by default
  • Use separate keys per environment — dev, staging, and prod should never share a key
  • Never expose keys in frontend code — all Cohere API calls must go through a trusted backend
  • Rotate keys quarterly — and immediately if you suspect compromise
  • Monitor usage — watch for unexpected spikes in the dashboard
  • Revoke unused keys — unused keys are an unnecessary attack surface
  • Use a secrets manager in production — AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault

Rate Limits

Cohere enforces rate limits by key type, endpoint, and model. The dimensions are:

  • Requests per second / minute — per endpoint and model
  • Monthly call cap — 1,000 calls/month for trial keys; no cap for production

When you exceed limits you receive a 429 Too Many Requests response. Implement exponential backoff.
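Exponential backoff means doubling the wait after each rate-limited attempt, up to a cap, and usually adding random jitter so that parallel workers do not retry in lockstep. A minimal sketch: RateLimited is a hypothetical exception that your own request function would raise on HTTP 429, and the base, cap, and retry count are arbitrary starting values, not Cohere recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Raise this from your request function when the API returns HTTP 429."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(fn, max_retries: int = 5, sleep=time.sleep, jitter=random.random):
    """Call fn(); on RateLimited, wait and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Full jitter: sleep a random fraction of the capped delay.
            sleep(backoff_delay(attempt) * jitter())
```

Note that backoff only helps with per-minute limits. Once a trial key has used its 1,000 calls for the month, retrying will not succeed until the cap resets or the key is upgraded.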

Check Cohere's "Different Types of API Keys and Rate Limits" docs for current per-model limits — specific numbers can change as Cohere adjusts capacity.

How Cohere API Keys Compare to Other LLM Providers

| Provider | Auth header | Endpoint | Env var | Key format | OpenAI SDK compatible? |
|---|---|---|---|---|---|
| OpenAI | Authorization: Bearer | api.openai.com/v1 | OPENAI_API_KEY | sk- / sk-proj- | ✓ native |
| Anthropic Claude | x-api-key | api.anthropic.com/v1 | ANTHROPIC_API_KEY | sk-ant- | |
| Google Gemini | x-goog-api-key | generativelanguage.googleapis.com/v1beta | GEMINI_API_KEY | AIza... | |
| X.ai Grok | Authorization: Bearer | api.x.ai/v1 | XAI_API_KEY | xai-... | |
| Groq | Authorization: Bearer | api.groq.com/openai/v1 | GROQ_API_KEY | gsk_... | |
| Mistral AI | Authorization: Bearer | api.mistral.ai/v1 | MISTRAL_API_KEY | opaque token | |
| Cohere | Authorization: Bearer | api.cohere.com/v1 | CO_API_KEY | opaque token | ✗ (own SDK) |
| DeepSeek | Authorization: Bearer | api.deepseek.com/v1 | DEEPSEEK_API_KEY | sk-... | |
| Hugging Face | Authorization: Bearer | api-inference.huggingface.co | HF_TOKEN | hf_... | partial (chat only) |
| Azure OpenAI | api-key | {resource}.openai.azure.com | AZURE_OPENAI_API_KEY | not publicly documented | ✓ (via Azure SDK) |
| AnyScale¹ | Authorization: Bearer | api.endpoints.anyscale.com/v1 | ANYSCALE_API_KEY | esecret_... | |

¹ Effective August 1, 2024, Anyscale Endpoints is available exclusively via the Hosted Anyscale Platform; multi-tenant LLM access was removed.

Cohere is the only provider in this series that offers all three of a dedicated embedding endpoint, a dedicated reranking endpoint, and a dedicated classification endpoint as first-class products. If embeddings or search ranking are core to your product, Cohere's Embed and Rerank are native solutions, not a generic model approximating these capabilities.

Using Cohere in a Multi-Model Architecture

Most AI-native SaaS teams don't rely on a single provider long term. Common reasons: cost optimization, fallback when one provider degrades, model specialization, enterprise customer preference.

Instead of building separate integrations for Cohere, OpenAI, Anthropic, Gemini, and Mistral, you can integrate once against Unified's Generative AI API.

Build once across LLM providers

Unified's Generative AI API standardizes three core objects across providers — including Cohere:

Model — id, max_tokens, temperature support

Prompt — model_id, messages, temperature, max_tokens, responses, tokens_used

Embedding — model_id, content, dimension, embeddings, tokens_used

The Embedding object maps directly to Cohere's native Embed endpoint — when you call Unified for embeddings with a Cohere connection, you're calling Cohere's dedicated embedding service, not a generic model approximating embeddings.

This allows you to:

  • Switch between Cohere and other providers without rewriting integration code
  • Run the same prompt across models and compare outputs
  • Route requests dynamically based on cost or availability
  • Generate embeddings consistently across providers using Cohere's native Embed product
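To make the provider-switching idea concrete, two of the standardized objects listed above could be modeled like this. The field names come from this guide; the Python types, the defaults, and the retarget helper are assumptions made for illustration and are not Unified's SDK. The model ID other-provider-model is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Prompt:
    model_id: str
    messages: list  # assumed shape: [{"role": "user", "content": "..."}]
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    responses: list = field(default_factory=list)  # assumed to be filled by the provider
    tokens_used: Optional[int] = None


@dataclass
class Embedding:
    model_id: str
    content: list  # assumed: list of input strings
    dimension: Optional[int] = None
    embeddings: list = field(default_factory=list)
    tokens_used: Optional[int] = None


def retarget(prompt: Prompt, model_id: str) -> Prompt:
    """Copy a prompt onto another model, dropping provider-produced outputs."""
    return Prompt(model_id, prompt.messages, prompt.temperature, prompt.max_tokens)


original = Prompt("command-a-03-2025", [{"role": "user", "content": "Hi"}], temperature=0.2)
alternate = retarget(original, "other-provider-model")  # placeholder model ID
```

Because only model_id changes between the two objects, the same request can be sent to several providers and the outputs compared side by side.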

Let Cohere take action via Unified MCP

Text generation is one layer. Production AI features require structured, authorized reads and writes against customer SaaS platforms — listing CRM deals, retrieving candidates, fetching files, updating records, writing notes.

Unified's MCP server connects Cohere to customer integrations through tool-calling.

High-level flow:

  1. Fetch tools formatted for Cohere: GET /tools?type=cohere
  2. Include tools in your co.chat() request
  3. When Cohere returns a tool call, execute it: POST /tools/{id}/call
  4. Return the tool result back to Cohere
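Steps 2 to 4 form a loop that repeats until the model stops asking for tools. The sketch below shows only that control flow. The chat and call_tool callables are hypothetical stand-ins for your own wrappers around co.chat() and POST /tools/{id}/call, and the dictionary shapes are simplified assumptions rather than the exact Cohere or Unified payloads. It also assumes the tool call carries the ID that /tools/{id}/call expects.

```python
import json


def run_tool_loop(chat, call_tool, tools, messages, max_rounds=5):
    """Chat, execute any requested tools, feed results back, repeat.

    chat(messages, tools) -> dict with "text" or "tool_calls" (assumed shape)
    call_tool(tool_id, arguments) -> JSON-serializable result
    """
    for _ in range(max_rounds):
        reply = chat(messages, tools)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("text", "")  # the model produced a final answer
        # Record the model's request, then one result message per tool call.
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for call in tool_calls:
            result = call_tool(call["name"], call["arguments"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("tool loop did not finish within max_rounds")
```

The max_rounds guard is a design choice: it stops a model that keeps requesting tools from looping indefinitely and spending rate-limit budget.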

MCP URLs (regional):

  • https://mcp-api.unified.to/mcp
  • https://mcp-api-eu.unified.to/mcp

MCP safety controls:

  • hide_sensitive=true — removes PII fields from tool results
  • permissions=... — restricts what the connection can do
  • tools=... — limits the tool set to reduce model overload
  • defer_tools — lowers tool token usage

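LLMs can only handle a limited number of tool definitions per request, and every definition consumes context tokens. Scoping the tool set is therefore required for stable deployments, not optional.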

Troubleshooting Common Errors

401 Unauthorized: The key is invalid, inactive, or revoked. Check the Authorization: Bearer format and look for trailing whitespace. Regenerate at dashboard.cohere.com under API Keys if needed.

400 Bad Request: The request payload is malformed. The most common causes are using the generate() endpoint pattern instead of chat(), or wrong role fields in the messages array (e.g., using "system" where "user" is required).

404 Not Found: The model string is wrong or the endpoint URL is incorrect. Avoid unversioned aliases and use explicit IDs like command-a-03-2025. Verify current IDs via GET /v1/models.

429 Too Many Requests: Rate limit exceeded. On trial keys, the 1,000 calls/month cap applies across all endpoints. Implement exponential backoff. Upgrade to a production key for higher limits.

co.generate() DeprecationWarning: You're using the old v1 SDK pattern. Migrate to ClientV2 + co.chat(). The Generate API was deprecated on August 26, 2025.
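In application code, the practical distinction is which of these statuses are worth retrying. A minimal sketch that maps the HTTP statuses above to a hint and a retry decision; triage and the hint wording are illustrative, and treating only 429 as retryable is an assumption based on the descriptions in this section.

```python
HINTS = {
    400: "Malformed payload: check for generate()-style fields or wrong message roles.",
    401: "Key invalid, inactive, or revoked: check the Bearer header and whitespace.",
    404: "Unknown model or URL: use an explicit ID such as command-a-03-2025.",
    429: "Rate limit exceeded: back off and retry, or upgrade the key.",
}

# Only rate limiting can clear on its own; the others need a code or config fix.
RETRYABLE = {429}


def triage(status: int):
    """Return (hint, should_retry) for an HTTP status from the Cohere API."""
    hint = HINTS.get(status, "Unexpected status; inspect the response body.")
    return hint, status in RETRYABLE
```

Logging the hint alongside the raw status makes it easier to tell a misconfigured key from a transient rate limit.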

FAQ

Is the Cohere API free? Yes — trial keys are free with 1,000 API calls/month and no credit card required. For production workloads, add billing to upgrade to a production key.

What does a Cohere API key look like? An opaque token with no standardized prefix — it won't start with sk- or any identifiable pattern.

Which environment variable should I use, CO_API_KEY or COHERE_API_KEY? Use CO_API_KEY: it's the canonical name that Cohere's Python SDK reads automatically. COHERE_API_KEY works as a fallback if you pass it explicitly, but isn't the default convention.

Why shouldn't I use co.generate()? The Generate API was deprecated on August 26, 2025. Use ClientV2 + co.chat() with a versioned Command model ID instead. See Cohere's "Migrating from Generate to Chat" docs.

What's the difference between Cohere and other LLM providers? Cohere is the only provider in this series with dedicated, first-class endpoints for embeddings (Embed), search ranking (Rerank), and classification (Classify). Other providers offer embeddings as a side capability; Cohere built native products for each use case.

Can I use the OpenAI SDK with Cohere? Not directly. Cohere has its own SDK and API shape. Use cohere.ClientV2() or the raw REST API.

What's the trial key monthly limit? 1,000 API calls per month across all endpoints — Chat, Embed, Rerank, Classify combined. For higher limits, upgrade to a production key.

How do I verify my key is working? Call POST https://api.cohere.com/v1/check-api-key with your key in the Authorization: Bearer header. A 2xx response confirms the key is valid and active.

How do I set CO_API_KEY in Python? Export it in your shell first:

export CO_API_KEY="your-api-key"

Then in Python:

import cohere

co = cohere.ClientV2()  # reads CO_API_KEY automatically

response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Hello, Cohere!"}]
)
print(response.message.content[0].text)

Key takeaway

Calling Cohere directly is straightforward. Building multi-model routing, native embedding pipelines, search-ranking workflows, and enterprise-grade SaaS integrations requires infrastructure.

Unified was built for AI-native SaaS teams that need:

  • Real-time data access
  • Unified pass-through architecture
  • Zero storage of customer data
  • Usage-based pricing aligned with API volume
  • MCP-compatible integration across 450+ integrations

Cohere generates intelligence — and with its native Embed and Rerank endpoints, powers the retrieval layer too. Unified connects both to structured SaaS data and authorized actions.

→ Start your 30-day free trial

→ Book a demo
