Model configuration

LLM tiers, embedding models, and OpenAI-compatible providers when self-hosting ctx|.

When self-hosting ctx|, you configure LLM and embedding models via environment variables. The service supports any OpenAI-compatible HTTP API. Use MODEL_PROVIDER to select OpenRouter-specific features, Azure (api-key auth), or Bedrock (Bearer API key or IAM SigV4).

MODEL_PROVIDER values

| Value | Auth (chat + embeddings) | MODEL_PROVIDER_URL default | Notes |
| --- | --- | --- | --- |
| openai-like | Authorization: Bearer | https://openrouter.ai/api/v1 | Generic compatible API; no OpenRouter-only extras. |
| openrouter | Authorization: Bearer | same | Enables plugins, prompt cache, reasoning flags, medium-tier model fallbacks. |
| azure | api-key header only | required | Set your Azure OpenAI OpenAI-compatible base URL; override model name env vars for deployment IDs. |
| bedrock | Bearer or IAM SigV4 | required | Use Bedrock OpenAI-compatible base (e.g. Mantle https://bedrock-mantle.<region>.api.aws/v1). With MODEL_PROVIDER_API_KEY, sends Bearer; without key, uses AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY and SigV4. |

If MODEL_PROVIDER is unset, it defaults to openai-like. To keep previous behavior (OpenRouter URL heuristics that added extras whenever the host was OpenRouter), set MODEL_PROVIDER=openrouter.

For Vertex and other gateways that expect a Bearer token plus a base URL, use openai-like (or openrouter if you route through OpenRouter).
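The auth column of the table above can be read as a small dispatch on MODEL_PROVIDER. The following is a minimal sketch of that reading, not the service's actual code: the function name `auth_headers` is hypothetical, and returning an empty dict for Bedrock without a key is just this sketch's way of signalling that the request would need SigV4 signing instead (signing itself is not shown).

```python
from typing import Optional


def auth_headers(provider: str, api_key: Optional[str]) -> dict:
    """Return the static auth header for a provider (hypothetical helper).

    An empty dict for bedrock means "no static header": the request would
    have to be SigV4-signed with AWS credentials instead (not shown here).
    """
    if provider in ("openai-like", "openrouter"):
        if not api_key:
            raise ValueError(f"{provider} requires MODEL_PROVIDER_API_KEY")
        return {"Authorization": f"Bearer {api_key}"}
    if provider == "azure":
        if not api_key:
            raise ValueError("azure requires MODEL_PROVIDER_API_KEY")
        # Azure uses the api-key header only, never Authorization: Bearer.
        return {"api-key": api_key}
    if provider == "bedrock":
        # With a key: Bearer. Without a key: fall back to IAM SigV4.
        return {"Authorization": f"Bearer {api_key}"} if api_key else {}
    raise ValueError(f"unknown MODEL_PROVIDER: {provider!r}")
```

For example, `auth_headers("azure", "abc")` yields an `api-key` header, while `auth_headers("bedrock", None)` yields no header at all.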

Quick Start (OpenRouter)

The default setup uses OpenRouter. Set your API key and start the stack:

MODEL_PROVIDER_API_KEY=sk-or-v1-... docker compose --profile deploy up -d

LLM and embeddings both use OpenRouter by default. No extra configuration needed.

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| MODEL_PROVIDER | No | openai-like | One of openai-like, openrouter, azure, bedrock - see table above. |
| MODEL_PROVIDER_API_KEY | Usually* | - | API key or Bedrock long-lived token (Bearer). For Bedrock IAM, omit and set AWS env keys. |
| MODEL_PROVIDER_URL | No** | OpenRouter base when not azure/bedrock | Base URL for the LLM provider. Required for azure and bedrock. |
| MODEL_BEDROCK_AWS_REGION | No | - | Override AWS region for Bedrock SigV4; else derived from URL or AWS_REGION. |
| MODEL_FAST_NAME | No | google/gemini-3-flash-preview | LLM model for the fast tier (cheap, quick). |
| MODEL_MEDIUM_NAME | No | google/gemini-3-flash-preview | LLM model for the medium tier (balanced). |
| MODEL_HIGH_NAME | No | moonshotai/kimi-k2.6 | LLM model for the high tier (best quality). |
| MODEL_EMBEDDING_PROVIDER_URL | No | {MODEL_PROVIDER_URL}/embeddings | Embedding endpoint. Override for e.g. Ollama at http://host:11434/v1/embeddings. |
| MODEL_EMBEDDING_PROVIDER_API_KEY | No | MODEL_PROVIDER_API_KEY | Embedding API key. Use a separate key if embeddings use a different provider. |
| MODEL_EMBEDDING_NAME | No | openai/text-embedding-3-large | Embedding model ID. |

*Required for LLM calls unless you use Bedrock with IAM credentials only; the embedding settings are validated the same way for bedrock.

**Default MODEL_PROVIDER_URL is https://openrouter.ai/api/v1 for openai-like and openrouter so bundled default model IDs stay valid.
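To make the defaults and the two footnotes concrete, here is a minimal sketch of how the variables could be resolved from an environment mapping. It only mirrors the rules documented in the table; the function name `resolve_model_config` and the shape of the returned dict are assumptions for illustration, not the service's real loader.

```python
OPENROUTER_BASE = "https://openrouter.ai/api/v1"
PROVIDERS = ("openai-like", "openrouter", "azure", "bedrock")


def resolve_model_config(env: dict) -> dict:
    """Apply the documented defaults to a dict of env vars (hypothetical)."""
    provider = env.get("MODEL_PROVIDER") or "openai-like"
    if provider not in PROVIDERS:
        raise ValueError(f"unknown MODEL_PROVIDER: {provider!r}")

    url = env.get("MODEL_PROVIDER_URL")
    if provider in ("azure", "bedrock"):
        if not url:
            raise ValueError(f"MODEL_PROVIDER_URL is required for {provider}")
    else:
        # openai-like and openrouter default to the OpenRouter base URL.
        url = url or OPENROUTER_BASE

    api_key = env.get("MODEL_PROVIDER_API_KEY")
    if not api_key and provider != "bedrock":
        # Only Bedrock may omit the key (IAM credentials are used instead).
        raise ValueError("MODEL_PROVIDER_API_KEY is required")

    return {
        "provider": provider,
        "url": url,
        "api_key": api_key,
        "fast": env.get("MODEL_FAST_NAME") or "google/gemini-3-flash-preview",
        "medium": env.get("MODEL_MEDIUM_NAME") or "google/gemini-3-flash-preview",
        "high": env.get("MODEL_HIGH_NAME") or "moonshotai/kimi-k2.6",
        # Embedding settings fall back to the LLM provider's URL and key.
        "embedding_url": env.get("MODEL_EMBEDDING_PROVIDER_URL") or f"{url}/embeddings",
        "embedding_api_key": env.get("MODEL_EMBEDDING_PROVIDER_API_KEY") or api_key,
        "embedding_model": env.get("MODEL_EMBEDDING_NAME") or "openai/text-embedding-3-large",
    }
```

Passing only `MODEL_PROVIDER_API_KEY` reproduces the Quick Start setup: provider openai-like, OpenRouter as base URL, and embeddings sent to the same host with the same key.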

How to Pick Models

LLM tiers

The service uses three tiers for different workloads:

| Tier | Use case | Default model |
| --- | --- | --- |
| fast | Quick tasks, naming, planning | Gemini 3 Flash |
| medium | Main agent, balanced cost/quality | Gemini 3 Flash |
| high | Complex reasoning, best quality | Kimi K2.6 |

Override any tier with MODEL_FAST_NAME, MODEL_MEDIUM_NAME, or MODEL_HIGH_NAME. Use model IDs from your provider (e.g. OpenRouter format: org/model-name).

When MODEL_PROVIDER=openrouter (including fully managed deployments that set it), the medium tier sends OpenRouter’s models array. If the primary model errors (rate limits, downtime, moderation), routing tries the next distinct model IDs in order: MODEL_FAST_NAME, then MODEL_HIGH_NAME. Duplicate IDs in that chain are removed automatically; because the defaults use the same model for fast and medium, the fallback chain is typically Gemini 3 Flash → Kimi K2.6. This does not apply to openai-like.
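The ordering and de-duplication described here can be sketched in a few lines. This is an illustrative reconstruction from the description, not the service's code: `medium_tier_models` is a hypothetical name, and how the resulting list is placed into the request body is left out.

```python
def medium_tier_models(medium: str, fast: str, high: str) -> list:
    """Build the medium-tier fallback chain: primary first, then the
    fast and high tier IDs, keeping only the first occurrence of each."""
    chain = []
    for model_id in (medium, fast, high):
        if model_id not in chain:
            chain.append(model_id)
    return chain


# With the documented defaults, fast and medium share one model ID,
# so the chain collapses to two entries.
default_chain = medium_tier_models(
    "google/gemini-3-flash-preview",
    "google/gemini-3-flash-preview",
    "moonshotai/kimi-k2.6",
)
```

If all three tiers are overridden with distinct IDs, the chain has three entries in the order medium, fast, high.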

Embedding model

Important: The embedding model must support 2000 dimensions. The schema stores vectors in vector(2000); incompatible models will fail.

  • OpenAI-compatible providers (OpenRouter, OpenAI, Vertex, Bedrock): Use models that accept a dimensions parameter, e.g. openai/text-embedding-3-large (3072 dims natively, shortened to 2000 via dimensions).
  • Ollama: Use models with native 2000-dim support, e.g. qwen3-embedding. Set MODEL_EMBEDDING_PROVIDER_URL=http://host:11434/v1/embeddings and MODEL_EMBEDDING_NAME=qwen3-embedding.

All providers use the same OpenAI-compatible embeddings API; Ollama exposes it at /v1/embeddings.
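As an illustration of the 2000-dimension requirement, the sketch below builds an OpenAI-compatible embeddings request body with the `dimensions` field set and checks the length of a returned vector before it would be stored. The helper names are hypothetical, and whether a given Ollama model honours `dimensions` is not confirmed here, which is why the length check is kept separate.

```python
import json

EMBEDDING_DIMS = 2000  # must match the vector(2000) column in the schema


def embedding_request_body(model: str, texts: list) -> str:
    """JSON body for POST {MODEL_EMBEDDING_PROVIDER_URL} (hypothetical helper)."""
    return json.dumps(
        {"model": model, "input": texts, "dimensions": EMBEDDING_DIMS}
    )


def check_embedding(vector: list) -> list:
    """Reject vectors that would not fit into vector(2000)."""
    if len(vector) != EMBEDDING_DIMS:
        raise ValueError(
            f"expected {EMBEDDING_DIMS} dimensions, got {len(vector)}"
        )
    return vector
```

A model that ignores `dimensions` and returns its native size (for example 3072 values) would be caught by `check_embedding` rather than failing later at insert time.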

Provider Examples

OpenRouter (default host)

MODEL_PROVIDER_API_KEY=sk-or-v1-...
# Optional explicit mode (enables OpenRouter extras; default URL is OpenRouter either way):
# MODEL_PROVIDER=openrouter
MODEL_PROVIDER_URL=https://openrouter.ai/api/v1

Azure OpenAI

MODEL_PROVIDER=azure
MODEL_PROVIDER_API_KEY=<your-azure-api-key>
MODEL_PROVIDER_URL=https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT
# Set model env vars to your deployment names as needed.

Amazon Bedrock (OpenAI-compatible)

MODEL_PROVIDER=bedrock
MODEL_PROVIDER_URL=https://bedrock-mantle.us-east-1.api.aws/v1
# API key path (e.g. Bedrock long-term key):
MODEL_PROVIDER_API_KEY=...
# Or IAM: omit MODEL_PROVIDER_API_KEY; set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optional MODEL_BEDROCK_AWS_REGION
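
For the IAM path, the SigV4 region comes from MODEL_BEDROCK_AWS_REGION if set, otherwise from the URL or AWS_REGION. A rough sketch of that lookup follows; the function name, the regular expression, and the choice to try the URL before AWS_REGION are assumptions made for illustration, since the exact precedence between the last two is not spelled out.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches a region-like host label such as ".us-east-1." (assumed pattern).
_REGION_IN_HOST = re.compile(r"\.([a-z]{2}(?:-[a-z]+)+-\d+)\.")


def bedrock_region(env: dict) -> Optional[str]:
    """Pick the AWS region for SigV4 signing (hypothetical helper)."""
    explicit = env.get("MODEL_BEDROCK_AWS_REGION")
    if explicit:
        return explicit  # explicit override always wins
    host = urlparse(env.get("MODEL_PROVIDER_URL", "")).hostname or ""
    match = _REGION_IN_HOST.search(host)
    if match:
        return match.group(1)  # region embedded in the Bedrock host name
    return env.get("AWS_REGION")  # assumed last resort; may be None
```

With the URL from the example above and no override, this returns `us-east-1`.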

Ollama (local embeddings)

MODEL_EMBEDDING_PROVIDER_URL=http://localhost:11434/v1/embeddings
MODEL_EMBEDDING_NAME=qwen3-embedding
# For Docker: use http://host.docker.internal:11434/v1/embeddings

Separate embedding provider

Use OpenRouter for LLM and a different provider for embeddings:

MODEL_PROVIDER_API_KEY=sk-or-v1-...
MODEL_EMBEDDING_PROVIDER_URL=https://api.openai.com/v1/embeddings
MODEL_EMBEDDING_PROVIDER_API_KEY=sk-...
MODEL_EMBEDDING_NAME=text-embedding-3-large
