Production-grade Kubernetes operator for LiteLLM — declarative AI gateway deployments, bidirectional config sync, and first-class OpenShift support.
The community LiteLLM Helm chart deploys the proxy, but leaves you with a hard trade-off: manage models and keys through the Admin UI (convenient, but not GitOps-friendly) or through proxy_server_config.yaml (reproducible, but no UI). Pick one and you lose the other.
This operator dissolves that trade-off. Every resource — instances, organizations, models, teams, users, keys, credentials, guardrails — is a first-class Kubernetes CRD, reconciled continuously against the LiteLLM REST API. Git is the source of truth; Admin-UI drift is detected on every sync interval and resolved per your policy (crd-wins, api-wins, or manual). You get GitOps and the Admin UI, backed by the same state.
It also handles the parts a Helm chart can't: finalizer-based cleanup that deletes upstream API objects, generated virtual keys stored as garbage-collected Kubernetes Secrets, enterprise license activation, rollback-on-failure, OpenShift-native routing, six-backend response caching, and external secret-manager integration so provider API keys never touch etcd.
```
                         kubectl apply
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                       Kubernetes API                        │
│                                                             │
│  LiteLLMInstance    LiteLLMOrganization   LiteLLMModel      │
│  LiteLLMTeam        LiteLLMUser           LiteLLMCustomer   │
│  LiteLLMVirtualKey  LiteLLMCredential     LiteLLMGuardrail  │
└──────────────────────────────┬──────────────────────────────┘
                               │  watches / reconciles
                               ▼
                    ┌────────────────────┐
                    │  LiteLLM Operator  │
                    └─────────┬──────────┘
          ┌────────────────────┴────────────────────┐
          ▼                                         ▼
┌───────────────────┐               ┌───────────────────────┐
│ Deployment        │               │ LiteLLM REST API      │
│ ConfigMap         │               │ (bidirectional sync)  │
│ Secrets / HPA     │               │ crd-wins · api-wins   │
│ Ingress / Route / │               │ preserve · prune      │
│ HTTPRoute         │               │ adopt                 │
│ ServiceMonitor    │               │                       │
│ PrometheusRule    │               │                       │
│ Grafana dashboard │               │                       │
└─────────┬─────────┘               └───────────┬───────────┘
          ▼                                     ▼
        ┌──────────────────────────────────────────┐
        │     LiteLLM Proxy + Postgres + Redis     │
        └──────────────────────────────────────────┘
```
| Area | What you get |
|---|---|
| Infrastructure | Declarative Deployment, ConfigMap, Service, Secrets, HPA v2, PDB, NetworkPolicy; migration Jobs per image tag; auto-rollback on ProgressDeadlineExceeded; topology spread constraints; runAsNonRoot mode using the official non-root image |
| Networking | Kubernetes Ingress, OpenShift Route, Gateway API HTTPRoute — pick one declaratively per instance |
| Multi-tenancy | Full Organization → Team → User → Key hierarchy with budgets, member management (crd / sso / mixed modes), and org-scoped model access |
| API-managed CRDs | LiteLLMOrganization, LiteLLMModel, LiteLLMTeam, LiteLLMUser, LiteLLMCustomer, LiteLLMVirtualKey — reconciled via the LiteLLM REST API with finalizer-based cleanup and spec-hash change detection |
| Config-managed CRDs | LiteLLMCredential (reusable provider API keys via credentialRef) and LiteLLMGuardrail (Aporia, Lakera, Presidio, Bedrock, LLM Guard, Guardrails AI, Azure, Google Text Moderation, custom) — materialized into proxy_server_config.yaml, keys injected via secretKeyRef (never read into operator memory) |
| Bidirectional sync | Periodic drift detection with crd-wins / api-wins / manual resolution and preserve / prune / adopt policies for unmanaged resources |
| VirtualKey lifecycle | Generated API keys stored in owner-referenced Kubernetes Secrets; rotation and revocation follow CRD deletion |
| Authentication | SSO for Azure Entra, Okta, Google, generic OIDC; SCIM v2 provisioning; JWT and OAuth2 auth for M2M flows; custom SSO handlers via ConfigMap or image |
| Security | IP allowlisting with X-Forwarded-For support, RBAC via spec.rbac, external secret managers (AWS Secrets Manager / KMS, Azure Key Vault, Google Secret Manager / KMS, HashiCorp Vault) with IRSA and workload-identity support |
| Reliability | 6-backend response caching (Redis / S3 / GCS / Qdrant / Redis-semantic / local), fallback chains (default, per-model, content-policy, context-window), per-error-type retry policies, tag-based routing, per-provider budget caps |
| Observability | ServiceMonitor + PrometheusRule with 6 built-in alerts and runbook annotations; auto-provisioned Grafana dashboard ConfigMap |
| Data | Optional CloudNativePG integration with ScheduledBackup (snapshot or barmanObjectStore) |
| Admin UI | Disable, admin-only mode, DB-backed model management, personal-key gating, custom docs URL, logo, email branding, color themes via ConfigMap |
| Distribution | OLM bundle for OperatorHub / OpenShift Catalog and a Helm chart for clusters without OLM |
| Enterprise | Convention-based license Secret detection ({instance}-license or litellm-license) with EnterpriseLicenseRequired status conditions when unlicensed enterprise features are requested |
Full documentation: see the `docs/` folder — guides for SSO, config sync, caching, RBAC, observability, secret managers, and per-CRD reference pages under `docs/reference/`.
| CRD | Short Name | Description |
|---|---|---|
| LiteLLMInstance | li | Deploys a LiteLLM proxy with database, Redis, networking, and SSO |
| LiteLLMOrganization | lo | Creates an organization for multi-tenant isolation with budget and model access |
| LiteLLMModel | lm | Registers a model (e.g., openai/gpt-4o) with the proxy |
| LiteLLMTeam | lt | Creates a team with budget limits and member management |
| LiteLLMUser | lu | Creates a user (service accounts, bot users, non-SSO environments) |
| LiteLLMCustomer | lcust | Manages an external end-user (SaaS customer) with budgets and rate limits |
| LiteLLMCredential | lc | Defines a reusable provider credential (API key + optional base URL) shared across models |
| LiteLLMGuardrail | lg | Defines a content moderation / safety integration (Aporia, Lakera, Presidio, Bedrock, etc.) |
| LiteLLMVirtualKey | lk | Generates an API key scoped to a team/user with budget and rate limits |
All secondary resources reference a LiteLLMInstance via spec.instanceRef. Teams can optionally reference a LiteLLMOrganization via spec.organizationRef.
- Go 1.22+
- Docker 17.03+
- kubectl v1.28+
- Access to a Kubernetes v1.28+ cluster
- A PostgreSQL database for LiteLLM state storage
```sh
make install
make deploy IMG=ghcr.io/palenaai/litellm-operator:latest
```

```sh
kubectl create secret generic litellm-db-credentials \
  --from-literal=DATABASE_URL='postgresql://user:pass@host:5432/litellm'
```

```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  replicas: 2
  masterKey:
    autoGenerate: true
  database:
    external:
      connectionSecretRef:
        name: litellm-db-credentials
        key: DATABASE_URL
  service:
    type: ClusterIP
    port: 4000
```

```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMModel
metadata:
  name: gpt4o
spec:
  instanceRef:
    name: my-gateway
  modelName: gpt-4o
  litellmParams:
    model: openai/gpt-4o
    apiKeySecretRef:
      name: openai-credentials
      key: OPENAI_API_KEY
```

When many models share the same provider API key (e.g., several OpenAI deployments), define the credential once and reference it from each LiteLLMModel via `credentialRef`:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-credentials
type: Opaque
stringData:
  api-key: sk-...
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMCredential
metadata:
  name: openai-prod
spec:
  instanceRef:
    name: my-gateway
  # The name used under `credential_list` in the generated proxy config.
  # Models reference this via `litellm_params.litellm_credential_name`.
  credentialName: openai-prod
  apiKeySecretRef:
    name: openai-credentials
    key: api-key
  # Optional extras merged into credential_info (api_base / api_version /
  # free-form params are supported — params cannot override reserved keys).
  apiBase: https://api.openai.com/v1
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMModel
metadata:
  name: gpt4o
spec:
  instanceRef:
    name: my-gateway
  modelName: gpt-4o
  litellmParams:
    model: openai/gpt-4o
  credentialRef:
    name: openai-prod  # takes precedence over inline apiKeySecretRef/apiBase
```

The operator injects the API key into the proxy pod via a secretKeyRef-backed env var (`CREDENTIAL_OPENAI_PROD_API_KEY`) and writes a matching `os.environ/…` reference to the generated proxy_server_config.yaml — the key value itself is never read into the operator's memory. Rotating the key is a Secret update: the operator rolls out a new Deployment pod to pick up the new value.
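For reference, the fragment this produces in the generated proxy_server_config.yaml looks roughly like the following. This is a sketch assembled from the field names mentioned above (`credential_list`, `litellm_params.litellm_credential_name`, the `os.environ/…` reference); the exact generated layout may differ:

```yaml
credential_list:
  - credential_name: openai-prod
    credential_values:
      api_key: os.environ/CREDENTIAL_OPENAI_PROD_API_KEY
    credential_info:
      api_base: https://api.openai.com/v1

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      litellm_credential_name: openai-prod
```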
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMTeam
metadata:
  name: engineering
spec:
  instanceRef:
    name: my-gateway
  teamAlias: engineering
  models: [gpt-4o]
  maxBudgetMonthly: 1000
  budgetDuration: "30d"
  members:
    - email: [email protected]
      role: user
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMVirtualKey
metadata:
  name: eng-ci-key
spec:
  instanceRef:
    name: my-gateway
  keyAlias: eng-ci-key
  teamRef:
    name: engineering
  models: [gpt-4o]
  maxBudget: "100"
```

Create an organization to group teams under a shared budget and model access policy:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMOrganization
metadata:
  name: acme-corp
spec:
  instanceRef:
    name: my-gateway
  organizationAlias: acme-corp
  models: [gpt-4o, claude-4-sonnet]
  maxBudget: 5000
  budgetDuration: "30d"
  members:
    - email: [email protected]
      role: org_admin
    - email: [email protected]
      role: internal_user
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMTeam
metadata:
  name: acme-engineering
spec:
  instanceRef:
    name: my-gateway
  organizationRef:
    name: acme-corp
  teamAlias: acme-engineering
  models: [gpt-4o]
```

For OpenShift or clusters enforcing Pod Security Standards, enable non-root mode:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  security:
    runAsNonRoot: true
  # ... rest of spec
```

This automatically switches to the official litellm-non_root image (runs as nobody, UID 65534) and applies a restricted pod security context compatible with OpenShift's restricted SCC.
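The applied pod security context is roughly the following. The field values here are inferred from the restricted SCC requirements and the UID noted above, not copied from the operator's literal output:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 65534            # nobody
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```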
Restrict API access to specific IP addresses or CIDR ranges:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  security:
    ipAllowlist:
      enabled: true
      allowedIPs:
        - "10.0.0.0/8"
        - "192.168.1.0/24"
        - "203.0.113.50"
      useXForwardedFor: true  # required behind load balancers
  # ... rest of spec
```

Enforce route restrictions and key generation controls:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  rbac:
    enabled: true
    adminOnlyRoutes:
      - /model/new
      - /model/delete
    allowedRoutes:
      - /chat/completions
      - /embeddings
      - /key/info
    defaultTeamDisabled: true  # force team-based keys
    keyGeneration:             # enterprise
      teamKeyGeneration:
        allowedTeamMemberRoles: ["admin"]
    rolePermissions:           # enterprise
      internal_user:
        routes: ["/key/generate", "/key/info"]
        models: ["gpt-4", "claude-3-haiku"]
  # ... rest of spec
```

For OpenShift clusters, create a Route instead of an Ingress:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  route:
    enabled: true
    host: litellm.apps.example.com
    tlsTermination: edge  # edge | passthrough | reencrypt
  # ... rest of spec
```

For clusters using the Gateway API (Istio, Envoy Gateway, Cilium, etc.):
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  gatewayHTTPRoute:
    enabled: true
    host: litellm.example.com
    parentRefs:
      - name: my-gateway          # Name of the Gateway resource
        namespace: istio-system   # Optional: namespace of the Gateway
        sectionName: https        # Optional: specific listener on the Gateway
  # ... rest of spec
```

Enable ServiceMonitor, alerting rules, and a Grafana dashboard:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  observability:
    serviceMonitor:
      enabled: true
      interval: "30s"
    prometheusRule:
      enabled: true
      # disabledAlerts: ["LiteLLMHighCPUUsage"]  # optionally disable specific alerts
    grafanaDashboard:
      enabled: true
      folder: "LiteLLM"
  # ... rest of spec
```

Built-in alerts: LiteLLMInstanceDown (critical), LiteLLMInstanceDegraded, LiteLLMPodRestarting, LiteLLMPodNotReady, LiteLLMHighMemoryUsage, LiteLLMHighCPUUsage. Each alert includes a runbook annotation with troubleshooting commands.
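As an illustration, a generated rule might take roughly this shape. The expression, threshold, and annotation text below are assumptions for illustration, not the operator's exact output:

```yaml
- alert: LiteLLMInstanceDown
  expr: kube_deployment_status_replicas_available{deployment="my-gateway"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "LiteLLM instance my-gateway has no available replicas"
    runbook: |
      kubectl describe deployment my-gateway
      kubectl logs deployment/my-gateway --tail=100
```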
When using CloudNativePG for the database, enable scheduled backups:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  database:
    cloudnativepg:
      clusterName: litellm-db
      backup:
        enabled: true
        schedule: "0 2 * * *"  # daily at 2am
        retention: 7
        method: snapshot       # snapshot or barmanObjectStore
  # ... rest of spec
```

Route requests to different model deployments based on tags. Useful for free/paid tiers or team-specific model access:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  routerSettings:
    enableTagFiltering: true
  # ... rest of spec
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMModel
metadata:
  name: gpt4-paid
spec:
  instanceRef:
    name: my-gateway
  modelName: gpt-4
  litellmParams:
    model: openai/gpt-4
    apiKeySecretRef:
      name: openai-credentials
      key: OPENAI_API_KEY
  tags: ["paid"]
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMTeam
metadata:
  name: paid-tier
spec:
  instanceRef:
    name: my-gateway
  teamAlias: paid-tier
  tags: ["paid"]
```

Requests from the paid-tier team are routed to model deployments tagged paid. Use `tagFilteringMatchAny: true` in routerSettings to match requests having ANY of the specified tags (default is ALL must match).
Configure model fallback routing so requests automatically try alternative models on failure:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  fallbacks:
    # Global fallbacks applied on any error
    defaultFallbacks: ["gpt-4-mini", "claude-3-haiku"]
    # Per-model fallbacks for general errors
    modelFallbacks:
      - model: gpt-4
        fallbacks: ["gpt-4-mini", "claude-3-haiku"]
    # Fallbacks for content policy violations
    contentPolicyFallbacks:
      - model: gpt-4
        fallbacks: ["claude-3-sonnet"]
    # Fallbacks for context window exceeded
    contextWindowFallbacks:
      - model: gpt-4
        fallbacks: ["gpt-4-32k", "claude-3-sonnet"]
    maxFallbacks: 3
  routerSettings:
    # Retry policy by error type (retries on same model before fallback)
    retryPolicy:
      TimeoutError: 2
      RateLimitError: 3
      ContentPolicyViolationError: 0
    # Per-model-group retry overrides
    modelGroupRetryPolicy:
      gpt-4:
        TimeoutError: 1
        RateLimitError: 0
  # ... rest of spec
```

Configure response caching to reduce latency and costs:
```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  caching:
    enabled: true
    type: redis          # redis, redis-semantic, s3, gcs, qdrant, local
    ttl: 600             # cache TTL in seconds
    namespace: "my-app"  # key isolation namespace
    mode: default_on     # default_on or default_off
    supportedCallTypes:  # restrict to specific call types
      - acompletion
      - aembedding
    redis:
      host: redis.example.com
      port: 6379
      passwordSecretRef:
        name: redis-secret
        key: password
      ssl: true
  # ... rest of spec
```

When `type: redis` and no caching.redis block is provided, the operator reuses the instance's existing spec.redis connection — no need to duplicate Redis details.
Other backends: s3 (with bucket, region, AWS credentials), gcs (with bucket, GCS service account), qdrant (semantic caching with embeddings), local (in-memory, no external dependencies).
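For example, an S3-backed cache might be declared like this. This is a sketch: the exact field names inside the `s3` block are assumptions based on the backend list above, not a documented schema:

```yaml
spec:
  caching:
    enabled: true
    type: s3
    ttl: 3600
    s3:
      bucket: my-litellm-cache
      region: us-east-1
      # AWS credentials supplied via IRSA or a credentials Secret
```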
Automatically roll back failed deployments:

```yaml
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  upgrade:
    strategy: rolling
    autoRollback: true
    healthCheckTimeout: "300s"
  # ... rest of spec
```

When enabled, the operator tracks the last successful deployment revision. If a new deployment exceeds the progress deadline, the operator triggers a rollback and sets a status condition.
To activate LiteLLM Enterprise features, create a Secret with your license key. The operator detects it automatically and injects the LITELLM_LICENSE environment variable into the proxy Deployment.
Per-instance license (takes precedence):
```sh
kubectl create secret generic my-gateway-license \
  --from-literal=license-key='your-litellm-enterprise-license-key'
```

Namespace-wide license (fallback for all instances in the namespace):
```sh
kubectl create secret generic litellm-license \
  --from-literal=license-key='your-litellm-enterprise-license-key'
```

The operator checks for {instance-name}-license first, then falls back to litellm-license. License status is reported in .status.license:
```sh
kubectl get litellminstance my-gateway -o jsonpath='{.status.license}'
# {"active":true,"secretName":"my-gateway-license"}
```

If a downstream resource (Model, Team, User, VirtualKey) requires an enterprise feature and no license is present, the operator sets Reason: EnterpriseLicenseRequired on the resource's status condition without retrying.
By default, the operator watches all namespaces. To restrict it to specific namespaces:
Helm:
```sh
helm install litellm-operator deploy/charts/litellm-operator/ \
  --set watchNamespaces="team-a,team-b"
```

Flag:

```sh
/manager --watch-namespaces=team-a,team-b
```

Environment variable (set automatically by OLM for OwnNamespace/SingleNamespace install modes):

```sh
WATCH_NAMESPACE=team-a,team-b
```

The generated API key is stored in a Secret (default name: {name}-key):
```sh
kubectl get secret eng-ci-key-key -o jsonpath='{.data.api_key}' | base64 -d
```

```sh
make install    # Install CRDs
make deploy     # Deploy operator
```

```sh
operator-sdk run bundle ghcr.io/palenaai/litellm-operator-bundle:v0.11.1
```

```sh
helm install litellm-operator deploy/charts/litellm-operator/
```

```sh
make build                 # Build operator binary
make docker-build IMG=...  # Build container image
```

```sh
make test      # Unit + integration tests (envtest)
make test-e2e  # End-to-end tests (requires cluster)
```

```sh
make generate   # DeepCopy functions
make manifests  # CRD YAMLs, RBAC, webhooks
```

```sh
make install  # Install CRDs first
make run      # Run operator outside the cluster
```

Key design points:
- LiteLLMInstance controller manages Deployment, ConfigMap, Service, Secrets, Ingress, HPA, PDB, NetworkPolicy, migration Jobs, ServiceMonitor, PrometheusRule, Grafana dashboard ConfigMaps, and CNPG ScheduledBackups
- Secondary controllers (Organization, Model, Team, User, VirtualKey) resolve their `instanceRef` to discover the LiteLLM API endpoint and master key, then sync state via the REST API
- Finalizers ensure cleanup: deleting a CRD calls the corresponding LiteLLM API delete endpoint before removing the Kubernetes resource
- Spec hash annotations (`litellm.palena.ai/sync-hash`) enable change detection to avoid unnecessary API calls
```
api/v1alpha1/          CRD type definitions
internal/controller/   Reconciliation controllers
internal/litellm/      LiteLLM REST API client
internal/resources/    Kubernetes resource generators
config/crd/bases/      Generated CRD manifests
config/samples/        Example custom resources
bundle/                OLM bundle manifests
deploy/charts/         Helm chart
```
Copyright 2026. Licensed under the Apache License, Version 2.0.
