How to Get a Cohere API Key — and Connect It to Your Product
February 20, 2026
Updated May 2026
Cohere provides enterprise-grade language models and a set of dedicated AI endpoints that go beyond standard text generation. Unlike other providers in this series, Cohere treats embeddings, reranking, and classification as first-class products with their own dedicated endpoints and models — not just side effects of a general-purpose chat API.
If you're building an AI-native SaaS product, retrieval system, or internal assistant, you'll need a Cohere API key.
This guide covers:
- Creating your Cohere account
- Generating and securing your API key
- Setting up billing
- Testing your first API call
- Using Cohere through Unified's Generative AI API
- Connecting Cohere to SaaS platforms via [Unified MCP](/mcp)
Cohere's Product Surface
Before getting your key, it's worth understanding what Cohere actually offers — because it's structurally different from other LLM providers:
Command models — text generation and conversation via the Chat API. The current flagship is Command A.
Embed models — dedicated embedding endpoint for semantic search, clustering, and RAG pipelines. Cohere's Embed is a native embedding product, not a generic LLM approximating embeddings.
Rerank — a dedicated endpoint for reordering search results by semantic relevance. Designed specifically for search pipelines.
Classify — supervised text classification endpoint.
One API key covers all of these. The endpoint you call determines which product you're using.
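To make that concrete, here is a minimal sketch that builds (but does not send) requests for different products with the same key. The endpoint paths in `ENDPOINTS` are assumptions based on Cohere's public API layout and are not taken from this guide, and `build_request` is a hypothetical helper name; confirm the paths against Cohere's API reference before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://api.cohere.com"

# Assumed endpoint paths; confirm against Cohere's API reference.
ENDPOINTS = {
    "chat": "/v2/chat",
    "embed": "/v2/embed",
    "rerank": "/v2/rerank",
    "classify": "/v1/classify",
}


def build_request(product: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (without sending) a POST request for one Cohere product."""
    if product not in ENDPOINTS:
        raise ValueError(f"unknown product: {product!r}")
    return urllib.request.Request(
        BASE_URL + ENDPOINTS[product],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The same key is used regardless of which product is called.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Only the path changes between products; the `Authorization` header stays the same.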
Step 1: Create a Cohere Account
Go to https://dashboard.cohere.com and sign up with email, Google, or GitHub.
After verifying your email you'll be redirected to your dashboard. Cohere automatically generates a trial API key when you create an account — you don't need to create one manually for your first key.
Step 2: Locate Your API Key
In the dashboard sidebar, click API Keys.
Your trial key will be visible there. Copy it and store it securely.
Key names cannot contain spaces — use hyphens or underscores when naming keys (e.g., prod-backend, staging_worker).
Key format: Cohere API keys are opaque tokens with no standardized prefix — they don't start with sk- or any identifiable pattern. Store them in a password manager or secrets vault.
Step 3: Create Additional Keys (Optional)
Click Create API Key to generate additional keys for different environments. Name each clearly:
- development
- staging
- production
You can also create keys via the Cohere CLI:
co key create my-prod-key
You must be signed in to the CLI for this command. The same no-spaces rule applies.
To verify any key is valid and active:
curl https://api.cohere.com/v1/check-api-key \
-X POST \
-H "Authorization: Bearer $CO_API_KEY"
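If you would rather run the same check from Python, the sketch below mirrors the curl call using only the standard library. The function name `key_is_valid` and the injectable `opener` parameter are illustrative choices, and treating any HTTP error status as "not valid" is a simplifying assumption.

```python
import urllib.error
import urllib.request

CHECK_URL = "https://api.cohere.com/v1/check-api-key"


def key_is_valid(api_key: str, opener=urllib.request.urlopen) -> bool:
    """Return True if the check endpoint answers 2xx, False on an HTTP error status."""
    request = urllib.request.Request(
        CHECK_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    try:
        with opener(request) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # urllib raises HTTPError for 401 and other non-2xx statuses.
        return False
```

A typical call would be `key_is_valid(os.environ["CO_API_KEY"])`. Network failures such as DNS errors or timeouts still raise, so they are not mistaken for an invalid key.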
Step 4: Understand Trial vs Production Keys
| | Trial key | Production key |
|---|---|---|
| Cost | Free | Pay-per-token |
| Monthly cap | 1,000 API calls/month | No cap |
| Chat RPM | 20 req/min per model | 500 req/min (Command A/R family); contact sales for Command A+/A Reasoning |
| Embed | 2,000 inputs/min | 2,000 inputs/min |
| Rerank | 10 req/min | 1,000 req/min |
| Tokenize | 100 req/min | 2,000 req/min |
| Billing required | No | Yes |
Trial keys are suitable for evaluation and development. For production workloads, upgrade to a production key by adding a payment method in Billing → Add payment method.
Step 5: Store Your Key Securely
Cohere's Python SDK reads CO_API_KEY by default — use that as your canonical environment variable name:
# macOS / Linux
export CO_API_KEY="your-api-key"
# Windows PowerShell
setx CO_API_KEY "your-api-key"
Note on environment variable names: CO_API_KEY is the canonical name that Cohere's SDK auto-detects. COHERE_API_KEY is not picked up automatically: if you use that name, you have to read it yourself and pass the value to the client explicitly. For maximum compatibility with Cohere's own tooling, standardize on CO_API_KEY.
For production: use a secrets manager — AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault. Never commit keys to Git or embed them in frontend code.
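One way to enforce this in application code is to read the key in a single place and fail fast when it is missing. The following is a minimal sketch; `load_cohere_key` is a hypothetical helper, and the whitespace stripping is a defensive assumption rather than anything Cohere requires.

```python
import os


def load_cohere_key(env=os.environ) -> str:
    """Return the API key from CO_API_KEY, failing fast if it is missing."""
    # Strip stray whitespace or newlines that often sneak in via copy/paste.
    key = env.get("CO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CO_API_KEY is not set; export it or inject it from your secrets manager"
        )
    return key
```

You could then construct the client with `cohere.ClientV2(api_key=load_cohere_key())`, which keeps the lookup out of the rest of your code.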
Step 6: Test Your API Key
Cohere's current Python SDK client is ClientV2, and text generation goes through co.chat(). The older co.generate() pattern has been deprecated since August 26, 2025, so don't use it in new code.
Python (ClientV2):
import cohere
co = cohere.ClientV2() # reads CO_API_KEY automatically
response = co.chat(
model="command-a-03-2025",
messages=[
{"role": "user", "content": "Explain what an API key is in two sentences."}
]
)
print(response.message.content[0].text)
curl:
curl https://api.cohere.com/v2/chat \
-H "Authorization: Bearer $CO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "command-a-03-2025",
"messages": [
{"role": "user", "content": "Explain what an API key is in two sentences."}
]
}'
A successful response returns the model output in JSON.
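If you call the REST endpoint directly, you need to pull the text out of that JSON yourself. The sketch below assumes the body mirrors the SDK access path shown earlier (`response.message.content[0].text`); the sample body is illustrative and trimmed, not real API output, and `extract_text` is a hypothetical helper.

```python
import json


def extract_text(raw_body: str) -> str:
    """Join the text parts of a chat response body."""
    body = json.loads(raw_body)
    parts = body["message"]["content"]
    # Keep only text parts; other content types are ignored in this sketch.
    return "".join(part["text"] for part in parts if part.get("type") == "text")


# Illustrative body only, trimmed to the fields used here; not real API output.
sample = (
    '{"message": {"role": "assistant", "content": '
    '[{"type": "text", "text": "An API key identifies the caller."}]}}'
)
print(extract_text(sample))
```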
Which Cohere Model Should You Use?
Command models (text generation):
| Model | API string | Input / MTok | Output / MTok | Best for |
|---|---|---|---|---|
| Command A | command-a-03-2025 | Contact sales | Contact sales | Recommended default. General-purpose, agentic AI, multilingual. |
| Command R+ 08-2024 | command-r-plus-08-2024 | $2.50 | $10.00 | Complex RAG, multi-step tool use, long-context tasks. |
| Command R 08-2024 | command-r-08-2024 | $0.15 | $0.60 | Cost-sensitive workloads. RAG and tool use at lower cost. |
| Command R7B 12-2024 | command-r7b-12-2024 | lower | lower | Lightweight, high-volume workloads. |
Embedding models:
| Model | API string | Price | Best for |
|---|---|---|---|
| Embed v3 (English) | embed-english-v3.0 | $0.10/MTok | Semantic search, clustering, RAG retrieval |
| Embed v3 (Multilingual) | embed-multilingual-v3.0 | $0.10/MTok | Same as above, cross-language |
Rerank:
$2.00 per 1,000 search queries. Use it to reorder search results by semantic relevance.
Prices per million tokens as of May 2026. Verify at cohere.com/pricing before production budget planning. Command A+ and some specialist variants are contact-sales only.
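For rough budget planning, the listed prices translate into a simple calculation. The sketch below copies the per-million-token and per-1,000-search figures from the tables above; the function names and the example volumes are made up for illustration, and the numbers should be re-checked against Cohere's pricing page.

```python
# USD per million tokens (input, output), copied from the tables above.
PRICES_PER_MTOK = {
    "command-r-plus-08-2024": (2.50, 10.00),
    "command-r-08-2024": (0.15, 0.60),
}
RERANK_USD_PER_1K_SEARCHES = 2.00


def chat_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate chat cost from token counts and the listed per-MTok prices."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rerank_cost_usd(searches: int) -> float:
    """Estimate Rerank cost from the listed per-1,000-search price."""
    return searches / 1_000 * RERANK_USD_PER_1K_SEARCHES


# Hypothetical monthly volume: 10M input tokens, 2M output tokens, 50k searches.
print(f"{chat_cost_usd('command-r-08-2024', 10_000_000, 2_000_000):.2f}")       # 2.70
print(f"{chat_cost_usd('command-r-plus-08-2024', 10_000_000, 2_000_000):.2f}")  # 45.00
print(f"{rerank_cost_usd(50_000):.2f}")                                         # 100.00
```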
Important: always use explicit, versioned model IDs — command-a-03-2025 not command-a. Unversioned aliases are deprecated. Verify current IDs via GET https://api.cohere.com/v1/models.
Migrating from co.generate(): if your code still uses co.generate(), migrate to co.chat() with a Command model. Cohere's Generate API was deprecated August 26, 2025. See Cohere's "Migrating from Generate to Chat" docs.
Is the Cohere API Free?
Yes — trial keys are free with no credit card required. Limits:
- 1,000 API calls per month across all endpoints
- 20 RPM for Command models
- 10 RPM for Rerank
- 2,000 inputs/min for Embed
Trial keys are suitable for evaluation and development. For production, upgrade to a production key by adding billing.
Security Best Practices
- Use CO_API_KEY as the canonical env var: it's what Cohere's SDK reads by default
- Use separate keys per environment: dev, staging, and prod should never share a key
- Never expose keys in frontend code — all Cohere API calls must go through a trusted backend
- Rotate keys quarterly — and immediately if you suspect compromise
- Monitor usage — watch for unexpected spikes in the dashboard
- Revoke unused keys — unused keys are an unnecessary attack surface
- Use a secrets manager in production — AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault
Rate Limits
Cohere enforces rate limits by key type, endpoint, and model. The dimensions are:
- Requests per second / minute — per endpoint and model
- Monthly call cap — 1,000 calls/month for trial keys; no cap for production
When you exceed limits you receive a 429 Too Many Requests response. Implement exponential backoff.
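A minimal sketch of exponential backoff with jitter is shown below. `RateLimited` is a hypothetical stand-in for whatever exception your HTTP client or SDK raises on a 429, and the delay values are arbitrary defaults rather than Cohere recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error to the caller
            # The ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

Backoff helps with per-minute limits only. It will not help once a trial key has hit its monthly cap, so treat repeated 429s on a trial key as a signal to upgrade rather than to retry longer.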
Check Cohere's "Different Types of API Keys and Rate Limits" docs for current per-model limits — specific numbers can change as Cohere adjusts capacity.
How Cohere API Keys Compare to Other LLM Providers
| Provider | Auth header | Endpoint | Env var | Key format | OpenAI SDK compatible? |
|---|---|---|---|---|---|
| OpenAI | Authorization: Bearer | api.openai.com/v1 | OPENAI_API_KEY | sk- / sk-proj- | ✓ native |
| Anthropic Claude | x-api-key | api.anthropic.com/v1 | ANTHROPIC_API_KEY | sk-ant- | ✗ |
| Google Gemini | x-goog-api-key | generativelanguage.googleapis.com/v1beta | GEMINI_API_KEY | AIza... | ✗ |
| X.ai Grok | Authorization: Bearer | api.x.ai/v1 | XAI_API_KEY | xai-... | ✓ |
| Groq | Authorization: Bearer | api.groq.com/openai/v1 | GROQ_API_KEY | gsk_... | ✓ |
| Mistral AI | Authorization: Bearer | api.mistral.ai/v1 | MISTRAL_API_KEY | opaque token | ✓ |
| Cohere | Authorization: Bearer | api.cohere.com/v1 | CO_API_KEY | opaque token | ✗ (own SDK) |
| DeepSeek | Authorization: Bearer | api.deepseek.com/v1 | DEEPSEEK_API_KEY | sk-... | ✓ |
| Hugging Face | Authorization: Bearer | api-inference.huggingface.co | HF_TOKEN | hf_... | partial (chat only) |
| Azure OpenAI | api-key | {resource}.openai.azure.com | AZURE_OPENAI_API_KEY | not publicly documented | ✓ (via Azure SDK) |
| AnyScale¹ | Authorization: Bearer | api.endpoints.anyscale.com/v1 | ANYSCALE_API_KEY | esecret_... | ✓ |
¹ Effective August 1, 2024, Anyscale Endpoints is available exclusively via the Hosted Anyscale Platform; multi-tenant LLM access was removed.
Cohere is the only provider in this series with a dedicated embedding endpoint, a dedicated reranking endpoint, and a dedicated classification endpoint as first-class products. If embeddings or search ranking are core to your product, Cohere's Embed and Rerank are native solutions — not a generic model approximating these capabilities.
Using Cohere in a Multi-Model Architecture
Most AI-native SaaS teams don't rely on a single provider long term. Common reasons: cost optimization, fallback when one provider degrades, model specialization, enterprise customer preference.
Instead of building separate integrations for Cohere, OpenAI, Anthropic, Gemini, and Mistral, you can integrate once against Unified's Generative AI API.
Build once across LLM providers
Unified's Generative AI API standardizes three core objects across providers — including Cohere:
Model — id, max_tokens, temperature support
Prompt — model_id, messages, temperature, max_tokens, responses, tokens_used
Embedding — model_id, content, dimension, embeddings, tokens_used
The Embedding object maps directly to Cohere's native Embed endpoint — when you call Unified for embeddings with a Cohere connection, you're calling Cohere's dedicated embedding service, not a generic model approximating embeddings.
This allows you to:
- Switch between Cohere and other providers without rewriting integration code
- Run the same prompt across models and compare outputs
- Route requests dynamically based on cost or availability
- Generate embeddings consistently across providers using Cohere's native Embed product
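The routing idea can be sketched independently of any particular SDK. In the example below each provider is just a named callable that takes a prompt and returns text; the names and the `complete_with_fallback` helper are hypothetical and do not represent Unified's or Cohere's API.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, function taking a prompt)


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, output) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Ordering the `providers` list by cost or by observed availability gives you a simple routing policy without changing the calling code.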
Let Cohere take action via Unified MCP
Text generation is one layer. Production AI features require structured, authorized reads and writes against customer SaaS platforms — listing CRM deals, retrieving candidates, fetching files, updating records, writing notes.
Unified's MCP server connects Cohere to customer integrations through tool-calling.
High-level flow:
- Fetch tools formatted for Cohere: GET /tools?type=cohere
- Include tools in your co.chat() request
- When Cohere returns a tool call, execute it: POST /tools/{id}/call
- Return the tool result back to Cohere
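The flow above can be sketched as a small loop. Everything here is schematic: `fetch_tools`, `chat`, and `call_tool` are hypothetical adapters you would write around the HTTP calls and the Cohere client, and the message and tool-call shapes are simplified assumptions rather than the exact Cohere or Unified formats.

```python
import json


def run_tool_loop(messages, fetch_tools, chat, call_tool, max_rounds=5):
    """Drive the fetch -> chat -> call -> return cycle until no more tools are requested.

    All three callables are hypothetical adapters:
      fetch_tools()            -> list of tool definitions (GET /tools?type=cohere)
      chat(messages, tools)    -> {"text": str, "tool_calls": [{"id", "tool_id", "arguments"}]}
      call_tool(tool_id, args) -> JSON-serialisable result (POST /tools/{id}/call)
    """
    tools = fetch_tools()
    for _ in range(max_rounds):
        reply = chat(messages, tools)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["text"]  # the model answered without needing a tool
        # Record the model's request, then append one result per tool call.
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for tool_call in tool_calls:
            result = call_tool(tool_call["tool_id"], tool_call["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not finish within max_rounds")
```

The `max_rounds` guard is a design choice to keep a misbehaving model from looping indefinitely on tool calls.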
MCP URLs (regional):
- https://mcp-api.unified.to/mcp
- https://mcp-api-eu.unified.to/mcp
MCP safety controls:
- hide_sensitive=true: removes PII fields from tool results
- permissions=...: restricts what the connection can do
- tools=...: limits the tool set to reduce model overload
- defer_tools: lowers tool token usage
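As a sketch of how these controls might be combined, the helper below assembles a scoped MCP URL. It assumes the controls are passed as query parameters, that list values are comma-separated, and that `defer_tools` takes the value `true`; none of this is confirmed by this guide, so check Unified's MCP documentation. Authentication parameters are omitted, and the tool and region names are hypothetical.

```python
from urllib.parse import urlencode

MCP_URLS = {
    "default": "https://mcp-api.unified.to/mcp",
    "eu": "https://mcp-api-eu.unified.to/mcp",
}


def scoped_mcp_url(region="default", tools=(), permissions=(),
                   hide_sensitive=True, defer_tools=False) -> str:
    """Build an MCP URL with safety controls as query parameters (assumed format)."""
    params = {}
    if hide_sensitive:
        params["hide_sensitive"] = "true"
    if permissions:
        params["permissions"] = ",".join(permissions)  # assumed comma-separated
    if tools:
        params["tools"] = ",".join(tools)  # assumed comma-separated
    if defer_tools:
        params["defer_tools"] = "true"  # assumed value
    query = urlencode(params)
    return MCP_URLS[region] + ("?" + query if query else "")
```

Defaulting `hide_sensitive` to on means a caller has to opt out of PII filtering deliberately rather than forget to opt in.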
LLMs have tool limits. Scoping is not optional — it's required for stable deployments.
Troubleshooting Common Errors
401 Unauthorized
Key is invalid, inactive, or revoked. Check the Authorization: Bearer format and look for trailing whitespace. Regenerate at dashboard.cohere.com under API Keys if needed.
400 Bad Request
Request payload is malformed. Most common cause: using the generate() endpoint pattern instead of chat(), or wrong role fields in the messages array (e.g., using "system" where "user" is required).
404 Not Found
Model string is wrong or the endpoint URL is incorrect. Avoid unversioned aliases — use explicit IDs like command-a-03-2025. Verify current IDs via GET /v1/models.
429 Too Many Requests
Rate limit exceeded. On trial keys, the 1,000 calls/month cap applies globally. Implement exponential backoff. Upgrade to a production key for higher limits.
co.generate() DeprecationWarning
You're using the old v1 SDK pattern. Migrate to ClientV2 + co.chat(). The Generate API was deprecated August 26, 2025.
FAQ
Is the Cohere API free?
Yes. Trial keys are free, with 1,000 API calls/month and no credit card required. For production workloads, add billing to upgrade to a production key.
What does a Cohere API key look like?
An opaque token with no standardized prefix — it won't start with sk- or any identifiable pattern.
Which environment variable should I use: CO_API_KEY or COHERE_API_KEY?
Use CO_API_KEY — it's the canonical name that Cohere's Python SDK reads automatically. COHERE_API_KEY works as a fallback if you pass it explicitly, but isn't the default convention.
Why shouldn't I use co.generate()?
The Generate API was deprecated on August 26, 2025. Use ClientV2 + co.chat() with a versioned Command model ID instead. See Cohere's "Migrating from Generate to Chat" docs.
What's the difference between Cohere and other LLM providers?
Cohere is the only provider in this series with dedicated, first-class endpoints for embeddings (Embed), search ranking (Rerank), and classification (Classify). Other providers offer embeddings as a side capability; Cohere built native products for each use case.
Can I use the OpenAI SDK with Cohere?
Not directly. Cohere has its own SDK and API shape. Use cohere.ClientV2() or the raw REST API.
What's the trial key monthly limit?
1,000 API calls per month across all endpoints (Chat, Embed, Rerank, and Classify combined). For higher limits, upgrade to a production key.
How do I verify my key is working?
Call POST https://api.cohere.com/v1/check-api-key with your key in the Authorization: Bearer header. A 2xx response confirms the key is valid and active.
How do I set CO_API_KEY in Python?
export CO_API_KEY="your-api-key"
Then in Python:
import cohere
co = cohere.ClientV2() # reads CO_API_KEY automatically
response = co.chat(
model="command-a-03-2025",
messages=[{"role": "user", "content": "Hello, Cohere!"}]
)
print(response.message.content[0].text)
Key takeaway
Calling Cohere directly is straightforward. Building multi-model routing, native embedding pipelines, search-ranking workflows, and enterprise-grade SaaS integrations requires infrastructure.
Unified was built for AI-native SaaS teams that need:
- Real-time data access
- Unified pass-through architecture
- Zero storage of customer data
- Usage-based pricing aligned with API volume
- MCP-compatible integration across 450+ integrations
Cohere generates intelligence — and with its native Embed and Rerank endpoints, powers the retrieval layer too. Unified connects both to structured SaaS data and authorized actions.