feat: Add default system users for operator and exporter authentication #110

@jdheyburn

Summary

Create two default system users (_operator and _exporter) that are automatically injected into every ValkeyCluster's ACL configuration. This ensures the operator and metrics exporter can authenticate to Valkey independently of any user-defined ACLs, so that changes to spec.users[] cannot accidentally break operator or exporter connectivity.

Motivation

Currently the operator and exporter sidecar connect to Valkey unauthenticated. If a cluster admin defines users and disables the default user, the operator loses the ability to manage the cluster. System users with auto-generated credentials solve this by giving the operator and exporter their own stable identities.

Design decisions

  • default user is not touched. Cluster admins manage it themselves via spec.users[].
  • System users are always injected, regardless of whether spec.users[] is empty or populated.
  • Passwords are auto-generated (32 random bytes, hex-encoded) on first reconcile and stored in an operator-managed Secret. No admin setup required. Passwords are stored as plaintext hex strings in the Secret. At ACL generation time, they are SHA256-hashed and prefixed with # in the ACL line, consistent with existing user password handling in buildUserAcl.
  • Credentials are unique per ValkeyCluster to limit blast radius if a Secret is compromised.
  • _operator has broad permissions (+@all) for now since the operator's command surface is still evolving. Tightening permissions is a future issue.
  • _exporter has locked-down permissions matching the redis_exporter recommended ACL.
  • Names prefixed with _ are reserved for system users. spec.users[] entries starting with _ are rejected at admission time via a CEL validation rule on the CRD.
  • _exporter is only created when exporter.enabled == true; _operator is always injected.
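
The password handling described above (32 random bytes, hex-encoded, then SHA256-hashed with a # prefix for the ACL line) can be sketched as follows. The function names generateSystemPassword and hashForACL are hypothetical and are not taken from the operator codebase; this is a minimal illustration using only the Go standard library.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// generateSystemPassword (hypothetical name) returns 32 random bytes
// from the OS CSPRNG, hex-encoded into a 64-character string.
func generateSystemPassword() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generating system password: %w", err)
	}
	return hex.EncodeToString(buf), nil
}

// hashForACL (hypothetical name) returns the "#<sha256-hex>" token
// that stands in for the plaintext password in an ACL line.
func hashForACL(password string) string {
	sum := sha256.Sum256([]byte(password))
	return "#" + hex.EncodeToString(sum[:])
}

func main() {
	pw, err := generateSystemPassword()
	if err != nil {
		panic(err)
	}
	fmt.Println("password length:", len(pw))
	fmt.Println("ACL token:", hashForACL(pw))
}
```

The plaintext hex string is what would be stored in the Secret; only the hashed token ends up in users.acl.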

System users

_operator

user _operator on #<sha256-of-password> ~* &* +@all

Used by the operator's Valkey client to issue cluster management commands (CLUSTER MEET, CLUSTER ADDSLOTSRANGE, CLUSTER REPLICATE, CLUSTER NODES, INFO, etc.).

_exporter

user _exporter on #<sha256-of-password> ~* &* +@connection +memory -readonly +strlen +config|get +xinfo +pfcount +zcard +type +xlen +scard +llen +hlen +get +eval +slowlog +cluster|info +cluster|slots +cluster|nodes +info +latency +scan +client

Used by the redis_exporter sidecar for metrics scraping. Permission set matches the exporter project's documentation.

System password Secret

A new Secret is created per cluster to hold the auto-generated passwords. Its name is derived by a new helper function, getSystemPasswordSecretName(clusterName), following the existing getInternalSecretName pattern:

apiVersion: v1
kind: Secret
metadata:
  name: internal-<cluster>-system-passwords
  namespace: <namespace>
  ownerReferences: [<ValkeyCluster>]
  labels: { <standard operator labels> }
data:
  # Values are shown decoded; Kubernetes stores entries under data base64-encoded.
  _operator: <random-64-char-hex>
  _exporter: <random-64-char-hex>

  • Created on first reconcile if it doesn't exist.
  • Read on subsequent reconciles — passwords are never regenerated unless the Secret is deleted.
  • Garbage collected with the ValkeyCluster via ownerReferences.
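
The create-once, read-afterwards lifecycle in the bullets above could look roughly like the sketch below. To stay self-contained it uses a hypothetical secretStore interface and an in-memory memStore instead of the real Kubernetes client; ensureSystemPasswords and errNotFound are likewise assumed names. Only getSystemPasswordSecretName and the Secret naming scheme come from this issue.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound error (assumption).
var errNotFound = errors.New("secret not found")

// secretStore is a hypothetical abstraction over Secret reads and creates.
type secretStore interface {
	Get(name string) (map[string]string, error)
	Create(name string, data map[string]string) error
}

// memStore is an in-memory secretStore used only for illustration.
type memStore map[string]map[string]string

func (m memStore) Get(name string) (map[string]string, error) {
	data, ok := m[name]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

func (m memStore) Create(name string, data map[string]string) error {
	m[name] = data
	return nil
}

// getSystemPasswordSecretName follows the naming shown in the manifest above.
func getSystemPasswordSecretName(clusterName string) string {
	return "internal-" + clusterName + "-system-passwords"
}

// ensureSystemPasswords returns the existing passwords, or generates and
// stores new ones only when the Secret does not exist yet.
func ensureSystemPasswords(store secretStore, clusterName string) (map[string]string, error) {
	name := getSystemPasswordSecretName(clusterName)
	data, err := store.Get(name)
	if err == nil {
		return data, nil // existing passwords are never regenerated
	}
	if !errors.Is(err, errNotFound) {
		return nil, err // surface unexpected errors instead of overwriting
	}
	data = make(map[string]string, 2)
	for _, user := range []string{"_operator", "_exporter"} {
		buf := make([]byte, 32)
		if _, err := rand.Read(buf); err != nil {
			return nil, fmt.Errorf("generating password for %s: %w", user, err)
		}
		data[user] = hex.EncodeToString(buf)
	}
	if err := store.Create(name, data); err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	store := memStore{}
	first, _ := ensureSystemPasswords(store, "demo")
	second, _ := ensureSystemPasswords(store, "demo")
	fmt.Println("stable across reconciles:", first["_operator"] == second["_operator"])
}
```

Treating only the not-found case as a trigger for generation means a transient API error cannot cause working credentials to be replaced.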

Reconciliation flow

Update reconcileUsersAcl to:

  1. Validate spec.users[] — reject any name starting with _ (see Validation mechanism below).
  2. Ensure internal-<cluster>-system-passwords Secret exists.
    • If missing: generate random passwords, create the Secret.
    • If exists: read existing passwords.
  3. Build system user ACL lines (_operator, _exporter) using SHA256-hashed passwords from the Secret.
  4. Build user-defined ACL lines from spec.users[] (existing logic, unchanged).
  5. Concatenate system ACLs + user ACLs into a single users.acl.
  6. Hash and update internal-<cluster>-acl Secret (existing logic, unchanged).

Validation mechanism

Reject spec.users[] entries with names starting with _ using a CEL validation rule on the UserAclSpec.Name field:

// +kubebuilder:validation:XValidation:rule="!self.startsWith('_')",message="usernames starting with '_' are reserved for system users"
Name string `json:"name"`

This rejects at admission time, giving the user immediate feedback. Requires a change to api/v1alpha1/valkeyacls_types.go.
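
Because step 1 of the reconciliation flow also validates spec.users[], the same rule could be mirrored in plain Go as a second line of defense, for example for objects stored before the CEL rule existed. The sketch below is an assumption about how that might look; validateUserName is a hypothetical name, and the error text reuses the message from the marker above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateUserName (hypothetical) mirrors the CEL rule
// !self.startsWith('_') for use inside the reconciler.
func validateUserName(name string) error {
	if strings.HasPrefix(name, "_") {
		return errors.New("usernames starting with '_' are reserved for system users")
	}
	return nil
}

func main() {
	for _, name := range []string{"app", "_operator", "my_user"} {
		fmt.Printf("%-10s -> %v\n", name, validateUserName(name))
	}
}
```

Only a leading underscore is rejected; names such as my_user remain valid.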

Operator authentication

Update the Valkey client connection in internal/valkey/clusterstate.go:

  • The GetClusterState function signature must change to accept optional credentials (username + password). Its only caller is in internal/controller/valkeycluster_controller.go (line ~469), which must be updated to read the _operator password from the system password Secret and pass it through.
  • getNodeState (called by GetClusterState) passes Username and Password in ClientOption when creating the rueidis client.

Auth fallback

On a fresh cluster where pods start before the ACL file is loaded, the operator must handle authentication gracefully:

  • Try connecting with _operator credentials first.
  • If the connection returns a NOAUTH or WRONGPASS error specifically, fall back to unauthenticated connection.
  • Do not fall back on other errors (network timeouts, connection refused, etc.) — those indicate real problems that should surface.

Secret deletion and self-healing

If the system password Secret is deleted:

  1. The next reconcile detects the missing Secret and generates new passwords.
  2. The new passwords are written to a new Secret and the users.acl is regenerated.
  3. However, Valkey does not hot-reload the ACL file — pods must restart to pick up the new users.acl. Live ACL reload (ACL LOAD) is out of scope for this issue.
  4. Until pods restart, the operator falls back to unauthenticated if the default user is still enabled, or connections will fail if the default user is disabled (this is expected — deleting operator-managed Secrets is a destructive action).

Exporter authentication

Update the exporter container definition in internal/controller/metrics_exporter.go:

  • Add environment variables sourced from the system password Secret:

    env:
      - name: REDIS_USER
        value: "_exporter"
      - name: REDIS_PASSWORD
        valueFrom:
          secretKeyRef:
            name: internal-<cluster>-system-passwords
            key: _exporter
  • The redis_exporter natively reads REDIS_USER and REDIS_PASSWORD.

Watch trigger for system password Secret

The existing findReferencedClusters watch only matches secrets referenced in spec.users[].passwordSecret.name. The new system password Secret is not referenced from spec.users[], so changes to it (including deletion) would not trigger a reconcile through that path.

Since the system password Secret already has ownerReferences pointing to the ValkeyCluster, the controller should additionally call Owns(&corev1.Secret{}) to get owner-based reconciliation for all owned Secrets. This ensures deletion of the system password Secret triggers a reconcile and self-healing.

Files to change

| File | Change |
| --- | --- |
| api/v1alpha1/valkeyacls_types.go | Add CEL validation rule on UserAclSpec.Name to reject _ prefix |
| internal/controller/users.go | Add getSystemPasswordSecretName helper, system user generation, password Secret create/read, prepend system ACLs in reconcileUsersAcl |
| internal/valkey/clusterstate.go | Update GetClusterState and getNodeState signatures to accept optional credentials; add Username/Password to ClientOption; add auth fallback on NOAUTH/WRONGPASS |
| internal/controller/valkeycluster_controller.go | Update GetClusterState call site (~line 469) to read and pass _operator credentials; add Owns(&corev1.Secret{}) to the controller watch setup |
| internal/controller/metrics_exporter.go | Add REDIS_USER/REDIS_PASSWORD env vars from system password Secret |
| internal/controller/users_test.go | Tests for system ACL generation and password Secret lifecycle |

Out of scope

  • Live ACL reload (ACL LOAD on each node after users.acl changes) — separate issue
  • Password rotation — the UserAclSpec already supports multiple passwords per user, which is the foundation for this
  • Tightening _operator permissions — once the operator's command surface stabilizes
  • Admin override of system user passwords — allowing admins to provide their own Secret reference

Acceptance criteria

  • _operator and _exporter ACL lines appear in users.acl for every ValkeyCluster
  • internal-<cluster>-system-passwords Secret is created with random passwords on first reconcile
  • Passwords are stable across reconciles (not regenerated)
  • Operator authenticates as _operator when connecting to Valkey nodes
  • Auth fallback to unauthenticated triggers only on NOAUTH/WRONGPASS, not on other errors
  • Exporter sidecar receives REDIS_USER/REDIS_PASSWORD env vars
  • spec.users[] entries with names starting with _ are rejected at admission via CEL validation
  • Deleting the system password Secret triggers self-healing (new passwords generated, pods need restart to take effect)
  • Controller watches owned Secrets so system password Secret deletion triggers reconcile
  • Unit tests cover system ACL generation, password Secret lifecycle, and auth fallback logic


Labels: enhancement, help wanted
