Model configuration

LLM tiers, embedding models, and OpenAI-compatible providers when self-hosting ctx|.

When self-hosting ctx|, you configure LLM and embedding models via environment variables. The service supports any OpenAI-compatible provider: OpenRouter, OpenAI, Vertex AI, Bedrock, Ollama, and others.

Quick Start (OpenRouter)

The default setup uses OpenRouter. Set your API key and start the stack:

MODEL_PROVIDER_API_KEY=sk-or-v1-... docker compose --profile deploy up -d

LLM and embeddings both use OpenRouter by default. No extra configuration needed.
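
Equivalently, the key can live in a .env file next to the compose file; docker compose reads that file automatically for variable substitution, so the inline assignment above becomes unnecessary:

# .env (same directory as the compose file)
MODEL_PROVIDER_API_KEY=sk-or-v1-...

With that in place, docker compose --profile deploy up -d alone is enough.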

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| MODEL_PROVIDER_API_KEY | Yes (LLM) | (none) | API key for the LLM provider. Also used for embeddings unless overridden. |
| MODEL_PROVIDER_URL | No | https://openrouter.ai/api/v1 | Base URL for the LLM provider. |
| MODEL_FAST_NAME | No | xiaomi/mimo-v2-flash | LLM model for the fast tier (cheap, quick). |
| MODEL_MEDIUM_NAME | No | google/gemini-3-flash-preview | LLM model for the medium tier (balanced). |
| MODEL_HIGH_NAME | No | z-ai/glm-5 | LLM model for the high tier (best quality). |
| MODEL_EMBEDDING_PROVIDER_URL | No | {MODEL_PROVIDER_URL}/embeddings | Embedding endpoint. Override for providers like Ollama, e.g. http://host:11434/v1/embeddings. |
| MODEL_EMBEDDING_PROVIDER_API_KEY | No | MODEL_PROVIDER_API_KEY | Embedding API key. Use a separate key if embeddings use a different provider. |
| MODEL_EMBEDDING_NAME | No | openai/text-embedding-3-large | Embedding model ID. |
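
For reference, here is the full set spelled out, with every optional variable restating its default from the table above (only the API key is a real input):

MODEL_PROVIDER_API_KEY=sk-or-v1-...
MODEL_PROVIDER_URL=https://openrouter.ai/api/v1
MODEL_FAST_NAME=xiaomi/mimo-v2-flash
MODEL_MEDIUM_NAME=google/gemini-3-flash-preview
MODEL_HIGH_NAME=z-ai/glm-5
MODEL_EMBEDDING_PROVIDER_URL=https://openrouter.ai/api/v1/embeddings
MODEL_EMBEDDING_PROVIDER_API_KEY=sk-or-v1-...
MODEL_EMBEDDING_NAME=openai/text-embedding-3-large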

How to Pick Models

LLM tiers

The service uses three tiers for different workloads:

| Tier | Use case | Default model |
| --- | --- | --- |
| fast | Quick tasks, naming, planning | MiMo V2 Flash |
| medium | Main agent, balanced cost/quality | Gemini 3 Flash |
| high | Complex reasoning, best quality | GLM-5 |

Override any tier with MODEL_FAST_NAME, MODEL_MEDIUM_NAME, or MODEL_HIGH_NAME. Use model IDs from your provider (e.g. OpenRouter format: org/model-name).
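
For example, to swap out individual tiers on OpenRouter while leaving the rest at their defaults (the model IDs below are illustrative, not recommendations; any IDs your provider lists will work):

# override only the tiers you care about; unset tiers keep their defaults
MODEL_FAST_NAME=openai/gpt-4o-mini
MODEL_HIGH_NAME=anthropic/claude-sonnet-4.5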

Embedding model

Important: The embedding model must support 2000 dimensions. The schema stores vectors in vector(2000); incompatible models will fail.

  • OpenAI-compatible providers (OpenRouter, OpenAI, Vertex, Bedrock): Use models that accept a dimensions parameter, e.g. openai/text-embedding-3-large (3072 native dimensions, truncated to 2000 via dimensions).
  • Ollama: Use models with native 2000-dim support, e.g. qwen3-embedding. Set MODEL_EMBEDDING_PROVIDER_URL=http://host:11434/v1/embeddings and MODEL_EMBEDDING_NAME=qwen3-embedding.

All providers use the same OpenAI-compatible embeddings API; Ollama exposes it at /v1/embeddings.
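
To sanity-check an endpoint before wiring it in, you can count the dimensions of a test embedding directly. The sketch below assumes your provider passes through the OpenAI dimensions parameter (the text-embedding-3 family supports it) and uses jq only to count the vector length:

curl -s https://openrouter.ai/api/v1/embeddings \
  -H "Authorization: Bearer $MODEL_PROVIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/text-embedding-3-large", "input": "ping", "dimensions": 2000}' \
  | jq '.data[0].embedding | length'

If this prints anything other than 2000, the model or endpoint is not compatible with the schema.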

Provider Examples

OpenRouter (default)

MODEL_PROVIDER_API_KEY=sk-or-v1-...
MODEL_PROVIDER_URL=https://openrouter.ai/api/v1
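
Pointing at OpenAI directly works the same way. Note that the default tier IDs use OpenRouter's org/model-name format, so with another provider you must also override the model names (the IDs below are illustrative):

MODEL_PROVIDER_API_KEY=sk-...
MODEL_PROVIDER_URL=https://api.openai.com/v1
MODEL_FAST_NAME=gpt-4o-mini
MODEL_MEDIUM_NAME=gpt-4o
MODEL_HIGH_NAME=o3
MODEL_EMBEDDING_NAME=text-embedding-3-large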

Ollama (local embeddings)

MODEL_EMBEDDING_PROVIDER_URL=http://localhost:11434/v1/embeddings
MODEL_EMBEDDING_NAME=qwen3-embedding
# For Docker: use http://host.docker.internal:11434/v1/embeddings
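
Before switching over, it can be worth confirming the model is pulled and that Ollama's OpenAI-compatible endpoint returns 2000-dim vectors (same jq trick as above):

ollama pull qwen3-embedding
curl -s http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-embedding", "input": "ping"}' \
  | jq '.data[0].embedding | length'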

Separate embedding provider

Use OpenRouter for the LLM and a different provider for embeddings:

MODEL_PROVIDER_API_KEY=sk-or-v1-...
MODEL_EMBEDDING_PROVIDER_URL=https://api.openai.com/v1/embeddings
MODEL_EMBEDDING_PROVIDER_API_KEY=sk-...
MODEL_EMBEDDING_NAME=text-embedding-3-large

Resources