# Model configuration

LLM tiers, embedding models, and OpenAI-compatible providers when self-hosting ctx|.
When self-hosting ctx|, you configure LLM and embedding models via environment variables. The service supports any OpenAI-compatible provider: OpenRouter, OpenAI, Vertex AI, Bedrock, Ollama, and others.
## Quick Start (OpenRouter)
The default setup uses OpenRouter. Set your API key and start the stack:
```bash
MODEL_PROVIDER_API_KEY=sk-or-v1-... docker compose --profile deploy up -d
```

LLM and embeddings both use OpenRouter by default. No extra configuration needed.
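Not part of the setup itself, but if the stack comes up with auth errors, a quick sanity check is to send one request to the provider with the same key. This sketch assumes the standard OpenAI-compatible `/chat/completions` route and uses the default fast-tier model from the table below:

```bash
# Sanity-check the key against the provider directly.
# A 401 response means the key is bad; a JSON completion means it works.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_PROVIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "xiaomi/mimo-v2-flash", "messages": [{"role": "user", "content": "ping"}]}'
```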
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `MODEL_PROVIDER_API_KEY` | Yes (LLM) | — | API key for the LLM provider. Also used for embeddings unless overridden. |
| `MODEL_PROVIDER_URL` | No | `https://openrouter.ai/api/v1` | Base URL for the LLM provider. |
| `MODEL_FAST_NAME` | No | `xiaomi/mimo-v2-flash` | LLM model for the fast tier (cheap, quick). |
| `MODEL_MEDIUM_NAME` | No | `google/gemini-3-flash-preview` | LLM model for the medium tier (balanced). |
| `MODEL_HIGH_NAME` | No | `z-ai/glm-5` | LLM model for the high tier (best quality). |
| `MODEL_EMBEDDING_PROVIDER_URL` | No | `{MODEL_PROVIDER_URL}/embeddings` | Embedding endpoint. Override for e.g. Ollama at `http://host:11434/v1/embeddings`. |
| `MODEL_EMBEDDING_PROVIDER_API_KEY` | No | `MODEL_PROVIDER_API_KEY` | Embedding API key. Use a separate key if embeddings use a different provider. |
| `MODEL_EMBEDDING_NAME` | No | `openai/text-embedding-3-large` | Embedding model ID. |
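Taken together, a minimal `.env` for the compose stack might look like the following (docker compose reads a `.env` file from the project directory by default; the key is a placeholder and the model IDs are the defaults from the table above):

```bash
# .env — picked up automatically by docker compose
MODEL_PROVIDER_API_KEY=sk-or-v1-your-key-here

# Optional: override any tier (defaults shown)
MODEL_FAST_NAME=xiaomi/mimo-v2-flash
MODEL_MEDIUM_NAME=google/gemini-3-flash-preview
MODEL_HIGH_NAME=z-ai/glm-5

# Optional: point embeddings at a different provider
# MODEL_EMBEDDING_PROVIDER_URL=http://host.docker.internal:11434/v1/embeddings
# MODEL_EMBEDDING_NAME=qwen3-embedding
```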
## How to Pick Models
### LLM tiers
The service uses three tiers for different workloads:
| Tier | Use case | Default model |
|---|---|---|
| fast | Quick tasks, naming, planning | MiMo V2 Flash |
| medium | Main agent, balanced cost/quality | Gemini 3 Flash |
| high | Complex reasoning, best quality | GLM-5 |
Override any tier with `MODEL_FAST_NAME`, `MODEL_MEDIUM_NAME`, or `MODEL_HIGH_NAME`. Use model IDs from your provider (e.g. OpenRouter format: `org/model-name`).
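As a sketch, upgrading just the medium tier while leaving the others at their defaults (the model ID below is illustrative; pick any chat model your provider lists):

```bash
# Only the medium tier changes; fast and high keep their defaults.
MODEL_MEDIUM_NAME=anthropic/claude-sonnet-4.5 \
MODEL_PROVIDER_API_KEY=sk-or-v1-... \
docker compose --profile deploy up -d
```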
### Embedding model
**Important:** The embedding model must support 2000 dimensions. The schema stores vectors as `vector(2000)`; incompatible models will fail.
- **OpenAI-compatible providers** (OpenRouter, OpenAI, Vertex, Bedrock): Use models that accept a `dimensions` parameter, e.g. `openai/text-embedding-3-large` (3072 dims, truncates to 2000).
- **Ollama**: Use models with native 2000-dim support, e.g. `qwen3-embedding`. Set `MODEL_EMBEDDING_PROVIDER_URL=http://host:11434/v1/embeddings` and `MODEL_EMBEDDING_NAME=qwen3-embedding`.
All providers use the same OpenAI-compatible embeddings API; Ollama exposes it at `/v1/embeddings`.
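Before committing to a model, it's worth confirming it really returns 2000-dimensional vectors. A direct call to the embeddings endpoint shows the effective dimensionality (the endpoint and model here are the OpenRouter defaults; requires `jq`):

```bash
# Request one embedding and count the returned dimensions.
# Expect 2000 if the model honors the dimensions parameter.
curl -s https://openrouter.ai/api/v1/embeddings \
  -H "Authorization: Bearer $MODEL_PROVIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/text-embedding-3-large", "input": "dimension check", "dimensions": 2000}' \
  | jq '.data[0].embedding | length'
```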
## Provider Examples
### OpenRouter (default)
```bash
MODEL_PROVIDER_API_KEY=sk-or-v1-...
MODEL_PROVIDER_URL=https://openrouter.ai/api/v1
```

### Ollama (local embeddings)
```bash
MODEL_EMBEDDING_PROVIDER_URL=http://localhost:11434/v1/embeddings
MODEL_EMBEDDING_NAME=qwen3-embedding
# For Docker: use http://host.docker.internal:11434/v1/embeddings
```
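For the Ollama route, the model must be pulled locally before the service can call it, and the endpoint can be probed the same way as above (assuming `qwen3-embedding` is the model's tag in the Ollama library; requires `jq`):

```bash
# One-time: fetch the embedding model.
ollama pull qwen3-embedding

# Probe Ollama's OpenAI-compatible endpoint and count dimensions.
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-embedding", "input": "dimension check"}' \
  | jq '.data[0].embedding | length'
```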
### Separate embedding provider

Use OpenRouter for the LLM and a different provider for embeddings:
```bash
MODEL_PROVIDER_API_KEY=sk-or-v1-...
MODEL_EMBEDDING_PROVIDER_URL=https://api.openai.com/v1/embeddings
MODEL_EMBEDDING_PROVIDER_API_KEY=sk-...
MODEL_EMBEDDING_NAME=text-embedding-3-large
```