PalenaAI/litellm-operator

LiteLLM Operator

Production-grade Kubernetes operator for LiteLLM — declarative AI gateway deployments, bidirectional config sync, and first-class OpenShift support.

Why this operator?

The community LiteLLM Helm chart deploys the proxy, but leaves you with a hard trade-off: manage models and keys through the Admin UI (convenient, but not GitOps-friendly) or through proxy_server_config.yaml (reproducible, but no UI). Pick one and you lose the other.

This operator dissolves that trade-off. Every resource — instances, organizations, models, teams, users, keys, credentials, guardrails — is a first-class Kubernetes CRD, reconciled continuously against the LiteLLM REST API. Git is the source of truth; Admin-UI drift is detected on every sync interval and resolved per your policy (crd-wins, api-wins, or manual). You get GitOps and the Admin UI, backed by the same state.

It also handles the parts a Helm chart can't: finalizer-based cleanup that deletes upstream API objects, generated virtual keys stored as garbage-collected Kubernetes Secrets, enterprise license activation, rollback-on-failure, OpenShift-native routing, six-backend response caching, and external secret-manager integration so provider API keys never touch etcd.

Architecture at a glance

                        kubectl apply
                             │
                             ▼
 ┌─────────────────────────────────────────────────────────────┐
 │                        Kubernetes API                       │
 │                                                             │
 │   LiteLLMInstance     LiteLLMOrganization   LiteLLMModel    │
 │   LiteLLMTeam         LiteLLMUser           LiteLLMCustomer │
 │   LiteLLMVirtualKey   LiteLLMCredential     LiteLLMGuardrail│
 └──────────────────────────────┬──────────────────────────────┘
                                │   watches / reconciles
                                ▼
                     ┌────────────────────┐
                     │  LiteLLM Operator  │
                     └─────────┬──────────┘
         ┌───────────────────┬─┴─┬───────────────────────┐
         │ Deployment        │   │  LiteLLM REST API     │
         │ ConfigMap         │   │  (bidirectional sync) │
         │ Secrets / HPA     │   │  crd-wins · api-wins  │
         │ Ingress / Route / │   │  preserve · prune     │
         │ HTTPRoute         │   │  adopt                │
         │ ServiceMonitor    │   │                       │
         │ PrometheusRule    │   │                       │
         │ Grafana dashboard │   │                       │
         └─────────┬─────────┘   └───────────┬───────────┘
                   ▼                         ▼
         ┌──────────────────────────────────────────┐
         │  LiteLLM Proxy  +  Postgres  +  Redis    │
         └──────────────────────────────────────────┘

Features

Area What you get
Infrastructure Declarative Deployment, ConfigMap, Service, Secrets, HPA v2, PDB, NetworkPolicy; migration Jobs per image tag; auto-rollback on ProgressDeadlineExceeded; topology spread constraints; runAsNonRoot mode using the official non-root image
Networking Kubernetes Ingress, OpenShift Route, Gateway API HTTPRoute — pick one declaratively per instance
Multi-tenancy Full Organization → Team → User → Key hierarchy with budgets, member management (crd / sso / mixed modes), and org-scoped model access
API-managed CRDs LiteLLMOrganization, LiteLLMModel, LiteLLMTeam, LiteLLMUser, LiteLLMCustomer, LiteLLMVirtualKey — reconciled via the LiteLLM REST API with finalizer-based cleanup and spec-hash change detection
Config-managed CRDs LiteLLMCredential (reusable provider API keys via credentialRef) and LiteLLMGuardrail (Aporia, Lakera, Presidio, Bedrock, LLM Guard, Guardrails AI, Azure, Google Text Moderation, custom) — materialized into proxy_server_config.yaml, keys injected via secretKeyRef (never read into operator memory)
Bidirectional sync Periodic drift detection with crd-wins / api-wins / manual resolution and preserve / prune / adopt policies for unmanaged resources
VirtualKey lifecycle Generated API keys stored in owner-referenced Kubernetes Secrets; rotation and revocation follow CRD deletion
Authentication SSO for Azure Entra, Okta, Google, generic OIDC; SCIM v2 provisioning; JWT and OAuth2 auth for M2M flows; custom SSO handlers via ConfigMap or image
Security IP allowlisting with X-Forwarded-For support, RBAC via spec.rbac, external secret managers (AWS Secrets Manager / KMS, Azure Key Vault, Google Secret Manager / KMS, HashiCorp Vault) with IRSA and workload-identity support
Reliability 6-backend response caching (Redis / S3 / GCS / Qdrant / Redis-semantic / local), fallback chains (default, per-model, content-policy, context-window), per-error-type retry policies, tag-based routing, per-provider budget caps
Observability ServiceMonitor + PrometheusRule with 6 built-in alerts and runbook annotations; auto-provisioned Grafana dashboard ConfigMap
Data Optional CloudNativePG integration with ScheduledBackup (snapshot or barmanObjectStore)
Admin UI Disable, admin-only mode, DB-backed model management, personal-key gating, custom docs URL, logo, email branding, color themes via ConfigMap
Distribution OLM bundle for OperatorHub / OpenShift Catalog and a Helm chart for clusters without OLM
Enterprise Convention-based license Secret detection ({instance}-license or litellm-license) with EnterpriseLicenseRequired status conditions when unlicensed enterprise features are requested

Full documentation: see the docs/ folder — guides for SSO, config sync, caching, RBAC, observability, secret managers, and per-CRD reference pages under docs/reference/.

Custom Resource Definitions

CRD Short Name Description
LiteLLMInstance li Deploys a LiteLLM proxy with database, Redis, networking, and SSO
LiteLLMOrganization lo Creates an organization for multi-tenant isolation with budget and model access
LiteLLMModel lm Registers a model (e.g., openai/gpt-4o) with the proxy
LiteLLMTeam lt Creates a team with budget limits and member management
LiteLLMUser lu Creates a user (service accounts, bot users, non-SSO environments)
LiteLLMCustomer lcust Manages an external end-user (SaaS customer) with budgets and rate limits
LiteLLMCredential lc Defines a reusable provider credential (API key + optional base URL) shared across models
LiteLLMGuardrail lg Defines a content moderation / safety integration (Aporia, Lakera, Presidio, Bedrock, etc.)
LiteLLMVirtualKey lk Generates an API key scoped to a team/user with budget and rate limits

All secondary resources reference a LiteLLMInstance via spec.instanceRef. Teams can optionally reference a LiteLLMOrganization via spec.organizationRef.

Prerequisites

  • Go 1.22+
  • Docker 17.03+
  • kubectl v1.28+
  • Access to a Kubernetes v1.28+ cluster
  • A PostgreSQL database for LiteLLM state storage

Quick Start

1. Install CRDs

make install

2. Deploy the operator

make deploy IMG=ghcr.io/palenaai/litellm-operator:latest

3. Create a database secret

kubectl create secret generic litellm-db-credentials \
  --from-literal=DATABASE_URL='postgresql://user:pass@host:5432/litellm'

4. Deploy a LiteLLM instance

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  replicas: 2
  masterKey:
    autoGenerate: true
  database:
    external:
      connectionSecretRef:
        name: litellm-db-credentials
        key: DATABASE_URL
  service:
    type: ClusterIP
    port: 4000

5. Register a model

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMModel
metadata:
  name: gpt4o
spec:
  instanceRef:
    name: my-gateway
  modelName: gpt-4o
  litellmParams:
    model: openai/gpt-4o
    apiKeySecretRef:
      name: openai-credentials
      key: OPENAI_API_KEY

Reusable Credentials

When many models share the same provider API key (e.g., several OpenAI deployments), define the credential once and reference it from each LiteLLMModel via credentialRef:

apiVersion: v1
kind: Secret
metadata:
  name: openai-credentials
type: Opaque
stringData:
  api-key: sk-...
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMCredential
metadata:
  name: openai-prod
spec:
  instanceRef:
    name: my-gateway
  # The name used under `credential_list` in the generated proxy config.
  # Models reference this via `litellm_params.litellm_credential_name`.
  credentialName: openai-prod
  apiKeySecretRef:
    name: openai-credentials
    key: api-key
  # Optional extras merged into credential_info (api_base / api_version /
  # free-form params are supported — params cannot override reserved keys).
  apiBase: https://api.openai.com/v1
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMModel
metadata:
  name: gpt4o
spec:
  instanceRef:
    name: my-gateway
  modelName: gpt-4o
  litellmParams:
    model: openai/gpt-4o
    credentialRef:
      name: openai-prod   # takes precedence over inline apiKeySecretRef/apiBase

The operator injects the API key into the proxy pod via a secretKeyRef-backed env var (CREDENTIAL_OPENAI_PROD_API_KEY) and writes a matching os.environ/… reference into the generated proxy_server_config.yaml — the key value itself is never read into the operator's memory. Rotating the key is just a Secret update: the operator rolls out new Deployment pods to pick up the new value.
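Putting the pieces together, the generated proxy_server_config.yaml fragment might look like the sketch below (the exact layout can vary between LiteLLM versions; only credential_list, credential_info, litellm_credential_name, and the os.environ/ reference are stated above — the rest is illustrative):

```yaml
# Illustrative sketch of the generated proxy config fragment.
credential_list:
  - credential_name: openai-prod
    credential_values:
      api_key: os.environ/CREDENTIAL_OPENAI_PROD_API_KEY  # resolved in the pod, never by the operator
    credential_info:
      api_base: https://api.openai.com/v1
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      litellm_credential_name: openai-prod
```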

6. Create a team and API key

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMTeam
metadata:
  name: engineering
spec:
  instanceRef:
    name: my-gateway
  teamAlias: engineering
  models: [gpt-4o]
  maxBudgetMonthly: 1000
  budgetDuration: "30d"
  members:
    - email: [email protected]
      role: user
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMVirtualKey
metadata:
  name: eng-ci-key
spec:
  instanceRef:
    name: my-gateway
  keyAlias: eng-ci-key
  teamRef:
    name: engineering
  models: [gpt-4o]
  maxBudget: "100"
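Once reconciled, the generated key lands in a Secret named {name}-key (here, eng-ci-key-key) under an api_key field, owner-referenced to the LiteLLMVirtualKey so it is garbage-collected on deletion. An illustrative shape (field layout is a sketch, not the exact operator output):

```yaml
# Illustrative shape of the operator-generated key Secret.
apiVersion: v1
kind: Secret
metadata:
  name: eng-ci-key-key
  ownerReferences:
    - apiVersion: litellm.palena.ai/v1alpha1
      kind: LiteLLMVirtualKey
      name: eng-ci-key
      uid: <set by the operator>
type: Opaque
data:
  api_key: <base64-encoded generated key>
```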

Multi-Tenant Organizations

Create an organization to group teams under a shared budget and model access policy:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMOrganization
metadata:
  name: acme-corp
spec:
  instanceRef:
    name: my-gateway
  organizationAlias: acme-corp
  models: [gpt-4o, claude-4-sonnet]
  maxBudget: 5000
  budgetDuration: "30d"
  members:
    - email: [email protected]
      role: org_admin
    - email: [email protected]
      role: internal_user
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMTeam
metadata:
  name: acme-engineering
spec:
  instanceRef:
    name: my-gateway
  organizationRef:
    name: acme-corp
  teamAlias: acme-engineering
  models: [gpt-4o]

OpenShift / Non-Root Environments

For OpenShift or clusters enforcing Pod Security Standards, enable non-root mode:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  security:
    runAsNonRoot: true
  # ... rest of spec

This automatically switches to the official litellm-non_root image (runs as nobody, UID 65534) and applies a restricted pod security context compatible with OpenShift's restricted SCC.
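The applied pod security context is roughly equivalent to the sketch below (illustrative; the operator sets the actual fields, and on OpenShift the restricted SCC may assign the UID from the namespace range):

```yaml
# Rough sketch of the security context non-root mode implies.
securityContext:
  runAsNonRoot: true
  runAsUser: 65534              # "nobody", matching the litellm-non_root image
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```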

IP Allowlisting (Enterprise)

Restrict API access to specific IP addresses or CIDR ranges:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  security:
    ipAllowlist:
      enabled: true
      allowedIPs:
        - "10.0.0.0/8"
        - "192.168.1.0/24"
        - "203.0.113.50"
      useXForwardedFor: true  # required behind load balancers
  # ... rest of spec

RBAC (Role-Based Access Control)

Enforce route restrictions and key generation controls:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  rbac:
    enabled: true
    adminOnlyRoutes:
      - /model/new
      - /model/delete
    allowedRoutes:
      - /chat/completions
      - /embeddings
      - /key/info
    defaultTeamDisabled: true   # force team-based keys
    keyGeneration:              # enterprise
      teamKeyGeneration:
        allowedTeamMemberRoles: ["admin"]
    rolePermissions:            # enterprise
      internal_user:
        routes: ["/key/generate", "/key/info"]
        models: ["gpt-4", "claude-3-haiku"]
  # ... rest of spec

OpenShift Route

For OpenShift clusters, create a Route instead of an Ingress:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  route:
    enabled: true
    host: litellm.apps.example.com
    tlsTermination: edge   # edge | passthrough | reencrypt
  # ... rest of spec

Gateway API HTTPRoute

For clusters using the Gateway API (Istio, Envoy Gateway, Cilium, etc.):

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  gatewayHTTPRoute:
    enabled: true
    host: litellm.example.com
    parentRefs:
      - name: my-gateway       # Name of the Gateway resource
        namespace: istio-system # Optional: namespace of the Gateway
        sectionName: https     # Optional: specific listener on the Gateway
  # ... rest of spec

Observability (Prometheus + Grafana)

Enable ServiceMonitor, alerting rules, and a Grafana dashboard:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  observability:
    serviceMonitor:
      enabled: true
      interval: "30s"
    prometheusRule:
      enabled: true
      # disabledAlerts: ["LiteLLMHighCPUUsage"]  # optionally disable specific alerts
    grafanaDashboard:
      enabled: true
      folder: "LiteLLM"
  # ... rest of spec

Built-in alerts: LiteLLMInstanceDown (critical), LiteLLMInstanceDegraded, LiteLLMPodRestarting, LiteLLMPodNotReady, LiteLLMHighMemoryUsage, LiteLLMHighCPUUsage. Each alert includes a runbook annotation with troubleshooting commands.

CloudNativePG Backups

When using CloudNativePG for the database, enable scheduled backups:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  database:
    cloudnativepg:
      clusterName: litellm-db
      backup:
        enabled: true
        schedule: "0 2 * * *"   # daily at 2am
        retention: 7
        method: snapshot        # snapshot or barmanObjectStore
  # ... rest of spec

Tag-Based Routing

Route requests to different model deployments based on tags. Useful for free/paid tiers or team-specific model access:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  routerSettings:
    enableTagFiltering: true
  # ... rest of spec
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMModel
metadata:
  name: gpt4-paid
spec:
  instanceRef:
    name: my-gateway
  modelName: gpt-4
  litellmParams:
    model: openai/gpt-4
    apiKeySecretRef:
      name: openai-credentials
      key: OPENAI_API_KEY
  tags: ["paid"]
---
apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMTeam
metadata:
  name: paid-tier
spec:
  instanceRef:
    name: my-gateway
  teamAlias: paid-tier
  tags: ["paid"]

Requests from the paid-tier team are routed to model deployments tagged paid. Set tagFilteringMatchAny: true in routerSettings to match deployments that share ANY of the request's tags (by default, ALL tags must match).
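For example, to route on any overlapping tag rather than requiring every tag to match:

```yaml
spec:
  routerSettings:
    enableTagFiltering: true
    tagFilteringMatchAny: true   # match deployments sharing ANY request tag
```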

Fallback Chains

Configure model fallback routing so requests automatically try alternative models on failure:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  fallbacks:
    # Global fallbacks applied on any error
    defaultFallbacks: ["gpt-4-mini", "claude-3-haiku"]

    # Per-model fallbacks for general errors
    modelFallbacks:
      - model: gpt-4
        fallbacks: ["gpt-4-mini", "claude-3-haiku"]

    # Fallbacks for content policy violations
    contentPolicyFallbacks:
      - model: gpt-4
        fallbacks: ["claude-3-sonnet"]

    # Fallbacks for context window exceeded
    contextWindowFallbacks:
      - model: gpt-4
        fallbacks: ["gpt-4-32k", "claude-3-sonnet"]

    maxFallbacks: 3

  routerSettings:
    # Retry policy by error type (retries on same model before fallback)
    retryPolicy:
      TimeoutError: 2
      RateLimitError: 3
      ContentPolicyViolationError: 0
    # Per-model-group retry overrides
    modelGroupRetryPolicy:
      gpt-4:
        TimeoutError: 1
        RateLimitError: 0
  # ... rest of spec

Response Caching

Configure response caching to reduce latency and costs:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  caching:
    enabled: true
    type: redis             # redis, redis-semantic, s3, gcs, qdrant, local
    ttl: 600                # cache TTL in seconds
    namespace: "my-app"     # key isolation namespace
    mode: default_on        # default_on or default_off
    supportedCallTypes:     # restrict to specific call types
      - acompletion
      - aembedding
    redis:
      host: redis.example.com
      port: 6379
      passwordSecretRef:
        name: redis-secret
        key: password
      ssl: true
  # ... rest of spec

When type: redis and no caching.redis block is provided, the operator reuses the instance's existing spec.redis connection — no need to duplicate Redis details.

Other backends: s3 (with bucket, region, AWS credentials), gcs (with bucket, GCS service account), qdrant (semantic caching with embeddings), local (in-memory, no external dependencies).
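An S3-backed configuration might look like the following sketch — the nested field names (bucketName, region) are assumptions for illustration; consult the caching guide under docs/ for the exact schema:

```yaml
# Illustrative S3 caching sketch; field names under `s3` are assumptions.
spec:
  caching:
    enabled: true
    type: s3
    ttl: 600
    s3:
      bucketName: my-litellm-cache
      region: us-east-1
```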

Auto-Rollback

Automatically roll back failed deployments:

apiVersion: litellm.palena.ai/v1alpha1
kind: LiteLLMInstance
metadata:
  name: my-gateway
spec:
  upgrade:
    strategy: rolling
    autoRollback: true
    healthCheckTimeout: "300s"
  # ... rest of spec

When enabled, the operator tracks the last successful deployment revision. If a new deployment exceeds the progress deadline, the operator triggers a rollback and sets a status condition.

Enterprise License

To activate LiteLLM Enterprise features, create a Secret with your license key. The operator detects it automatically and injects the LITELLM_LICENSE environment variable into the proxy Deployment.

Per-instance license (takes precedence):

kubectl create secret generic my-gateway-license \
  --from-literal=license-key='your-litellm-enterprise-license-key'

Namespace-wide license (fallback for all instances in the namespace):

kubectl create secret generic litellm-license \
  --from-literal=license-key='your-litellm-enterprise-license-key'

The operator checks for {instance-name}-license first, then falls back to litellm-license. License status is reported in .status.license:

kubectl get litellminstance my-gateway -o jsonpath='{.status.license}'
# {"active":true,"secretName":"my-gateway-license"}

If a downstream resource (Model, Team, User, VirtualKey) requires an enterprise feature and no license is present, the operator sets Reason: EnterpriseLicenseRequired on the resource's status condition without retrying.
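Such a condition might look like the following (illustrative — the condition type and message wording are assumptions; only the EnterpriseLicenseRequired reason is documented above):

```yaml
# Illustrative status condition on an unlicensed enterprise resource.
status:
  conditions:
    - type: Ready
      status: "False"
      reason: EnterpriseLicenseRequired
      message: This feature requires a LiteLLM Enterprise license
```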

Namespace-Scoped Watching

By default, the operator watches all namespaces. To restrict it to specific namespaces:

Helm:

helm install litellm-operator deploy/charts/litellm-operator/ \
  --set watchNamespaces="team-a,team-b"

Flag:

/manager --watch-namespaces=team-a,team-b

Environment variable (set automatically by OLM for OwnNamespace/SingleNamespace install modes):

WATCH_NAMESPACE=team-a,team-b

Retrieving a Generated API Key

The generated API key is stored in a Secret (default name: {name}-key):

kubectl get secret eng-ci-key-key -o jsonpath='{.data.api_key}' | base64 -d

Installation Methods

Direct (Makefile)

make install       # Install CRDs
make deploy        # Deploy operator

OLM (OpenShift / clusters with OLM)

operator-sdk run bundle ghcr.io/palenaai/litellm-operator-bundle:v0.11.1

Helm

helm install litellm-operator deploy/charts/litellm-operator/

Development

Build

make build                    # Build operator binary
make docker-build IMG=...     # Build container image

Test

make test          # Unit + integration tests (envtest)
make test-e2e      # End-to-end tests (requires cluster)

Generate

make generate      # DeepCopy functions
make manifests     # CRD YAMLs, RBAC, webhooks

Run locally (against current kubeconfig cluster)

make install       # Install CRDs first
make run           # Run operator outside the cluster

Architecture

Key design points:

  • LiteLLMInstance controller manages Deployment, ConfigMap, Service, Secrets, Ingress, HPA, PDB, NetworkPolicy, migration Jobs, ServiceMonitor, PrometheusRule, Grafana dashboard ConfigMaps, and CNPG ScheduledBackups
  • Secondary controllers (Organization, Model, Team, User, VirtualKey) resolve their instanceRef to discover the LiteLLM API endpoint and master key, then sync state via the REST API
  • Finalizers ensure cleanup: deleting a CRD calls the corresponding LiteLLM API delete endpoint before removing the Kubernetes resource
  • Spec hash annotations (litellm.palena.ai/sync-hash) enable change detection to avoid unnecessary API calls
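The sync-hash annotation is visible on reconciled objects; for example (hash value illustrative):

```yaml
metadata:
  annotations:
    litellm.palena.ai/sync-hash: "9f2c4e…"   # illustrative value; recomputed on spec change
```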

Project Structure

api/v1alpha1/          CRD type definitions
internal/controller/   Reconciliation controllers
internal/litellm/      LiteLLM REST API client
internal/resources/    Kubernetes resource generators
config/crd/bases/      Generated CRD manifests
config/samples/        Example custom resources
bundle/                OLM bundle manifests
deploy/charts/         Helm chart

License

Copyright 2026. Licensed under the Apache License, Version 2.0.

About

Kubernetes operator for deploying and managing LiteLLM AI Gateway. Declarative CRDs for models, teams, users, and virtual keys with bidirectional config sync, SSO/SCIM user provisioning, and OLM support. Built with Operator SDK.
