feat: Add default system users for operator and exporter authentication #110
Summary
Create two default system users (_operator and _exporter) that are automatically injected into every ValkeyCluster's ACL configuration. This ensures the operator and metrics exporter can authenticate to Valkey independently of any user-defined ACLs, so that changes to spec.users[] cannot accidentally break operator or exporter connectivity.
Motivation
Currently the operator and exporter sidecar connect to Valkey unauthenticated. If a cluster admin defines users and disables the default user, the operator loses the ability to manage the cluster. System users with auto-generated credentials solve this by giving the operator and exporter their own stable identities.
Design decisions
- The default user is not touched. Cluster admins manage it themselves via spec.users[].
- System users are always injected, regardless of whether spec.users[] is empty or populated (subject to the exporter condition in the last point).
- Passwords are auto-generated (32 random bytes, hex-encoded) on first reconcile and stored in an operator-managed Secret, so no admin setup is required. Passwords are stored as plaintext hex strings in the Secret. At ACL generation time, they are SHA256-hashed and prefixed with # in the ACL line, consistent with the existing user password handling in buildUserAcl.
- Credentials are unique per ValkeyCluster to limit the blast radius if a Secret is compromised.
- _operator has broad permissions (+@all) for now, since the operator's command surface is still evolving. Tightening these permissions is a future issue.
- _exporter has locked-down permissions matching the redis_exporter recommended ACL.
- Names prefixed with _ are reserved for system users. spec.users[] entries starting with _ are rejected at admission time via a CEL validation rule on the CRD.
- The _exporter user is only created when exporter.enabled == true.
System users
_operator
```
user _operator on #<sha256-of-password> ~* &* +@all
```
Used by the operator's Valkey client to issue cluster management commands (CLUSTER MEET, CLUSTER ADDSLOTSRANGE, CLUSTER REPLICATE, CLUSTER NODES, INFO, etc.).
_exporter
```
user _exporter on #<sha256-of-password> ~* &* +@connection +memory -readonly +strlen +config|get +xinfo +pfcount +zcard +type +xlen +scard +llen +hlen +get +eval +slowlog +cluster|info +cluster|slots +cluster|nodes +info +latency +scan +client
```
Used by the redis_exporter sidecar for metrics scraping. Permission set matches the exporter project's documentation.
System password Secret
A new Secret is created per cluster to hold auto-generated passwords. Use a helper function getSystemPasswordSecretName(clusterName) (matching the existing getInternalSecretName pattern):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: internal-<cluster>-system-passwords
  namespace: <namespace>
  ownerReferences: [<ValkeyCluster>]
  labels: { <standard operator labels> }
data:
  _operator: <base64 of random 64-char hex>
  _exporter: <base64 of random 64-char hex>
```

- Created on first reconcile if it doesn't exist.
- Read on subsequent reconciles; passwords are never regenerated unless the Secret is deleted.
- Garbage-collected with the ValkeyCluster via ownerReferences.
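The create-or-read lifecycle above can be sketched against a small hypothetical secretStore interface, so the logic runs without a cluster. In real controller code, Get/Create would go through the controller-runtime client and errNotFound would correspond to an apierrors.IsNotFound check; those mappings, and the choice to fail on a Secret that is missing a key, are assumptions of this sketch.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound API error.
var errNotFound = errors.New("secret not found")

// secretStore is a hypothetical, minimal view of the Kubernetes client.
type secretStore interface {
	Get(name string) (map[string][]byte, error)
	Create(name string, data map[string][]byte) error
}

// systemUserNames lists the keys stored in the system password Secret.
var systemUserNames = []string{"_operator", "_exporter"}

func getSystemPasswordSecretName(clusterName string) string {
	return "internal-" + clusterName + "-system-passwords"
}

func generatePassword() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// ensureSystemPasswords reads the cluster's system passwords, creating the
// Secret with fresh random passwords only if it does not exist.
func ensureSystemPasswords(store secretStore, clusterName string) (map[string]string, error) {
	name := getSystemPasswordSecretName(clusterName)
	data, err := store.Get(name)
	if err != nil && !errors.Is(err, errNotFound) {
		return nil, err // other API errors surface; never regenerate on them
	}

	passwords := make(map[string]string, len(systemUserNames))
	if err == nil {
		// Secret exists: reuse its passwords so they stay stable across reconciles.
		for _, user := range systemUserNames {
			pw := data[user]
			if len(pw) == 0 {
				return nil, fmt.Errorf("secret %s has no key %q", name, user)
			}
			passwords[user] = string(pw)
		}
		return passwords, nil
	}

	// Secret missing (first reconcile, or deleted): generate and create it.
	data = make(map[string][]byte, len(systemUserNames))
	for _, user := range systemUserNames {
		pw, genErr := generatePassword()
		if genErr != nil {
			return nil, genErr
		}
		passwords[user] = pw
		data[user] = []byte(pw)
	}
	if err := store.Create(name, data); err != nil {
		return nil, err
	}
	return passwords, nil
}

// memStore is an in-memory secretStore used only for demonstration.
type memStore map[string]map[string][]byte

func (m memStore) Get(name string) (map[string][]byte, error) {
	data, ok := m[name]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

func (m memStore) Create(name string, data map[string][]byte) error {
	if _, ok := m[name]; ok {
		return errors.New("secret already exists")
	}
	m[name] = data
	return nil
}

func main() {
	store := memStore{}
	first, _ := ensureSystemPasswords(store, "demo")
	second, _ := ensureSystemPasswords(store, "demo")
	fmt.Println("stable across reconciles:", first["_operator"] == second["_operator"])
}
```

If two reconciles race, the losing Create fails with an "already exists" error; returning it and letting the next reconcile read the winner's Secret keeps the passwords consistent.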
Reconciliation flow
Update reconcileUsersAcl to:
- Validate spec.users[]: reject any name starting with _ (see Validation mechanism below).
- Ensure the internal-<cluster>-system-passwords Secret exists.
  - If missing: generate random passwords and create the Secret.
  - If it exists: read the existing passwords.
- Build the system user ACL lines (_operator, _exporter) using SHA256-hashed passwords from the Secret.
- Build the user-defined ACL lines from spec.users[] (existing logic, unchanged).
- Concatenate system ACLs + user ACLs into a single users.acl.
- Hash and update the internal-<cluster>-acl Secret (existing logic, unchanged).
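A sketch of the validation and concatenation steps in this flow. validateUserNames and assembleUsersAcl are hypothetical names; the reconcile-time check is assumed here as defense in depth on top of the admission-time CEL rule, and the newline-terminated, system-first layout is one reasonable choice rather than a requirement stated in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// validateUserNames rejects names reserved for system users. The CEL rule
// already blocks these at admission; this re-checks during reconcile.
func validateUserNames(names []string) error {
	for _, name := range names {
		if strings.HasPrefix(name, "_") {
			return fmt.Errorf("user %q: names starting with '_' are reserved for system users", name)
		}
	}
	return nil
}

// assembleUsersAcl writes the system ACL lines first, followed by the
// user-defined lines, one rule per line, each newline-terminated.
func assembleUsersAcl(systemLines, userLines []string) string {
	var b strings.Builder
	for _, line := range systemLines {
		b.WriteString(line)
		b.WriteByte('\n')
	}
	for _, line := range userLines {
		b.WriteString(line)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	if err := validateUserNames([]string{"alice", "_operator"}); err != nil {
		fmt.Println("rejected:", err)
	}
	fmt.Print(assembleUsersAcl(
		[]string{"user _operator on #<hash> ~* &* +@all"},
		[]string{"user alice on #<hash> ~app:* +@read"},
	))
}
```

Because _-prefixed names can never come from spec.users[], the system lines and user lines cannot define the same user twice.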
Validation mechanism
Reject spec.users[] entries with names starting with _ using a CEL validation rule on the UserAclSpec.Name field:
```go
// +kubebuilder:validation:XValidation:rule="!self.startsWith('_')",message="usernames starting with '_' are reserved for system users"
Name string `json:"name"`
```

This rejects invalid names at admission time, giving the user immediate feedback. It requires a change to api/v1alpha1/valkeyacls_types.go.
Operator authentication
Update the Valkey client connection in internal/valkey/clusterstate.go:
- The GetClusterState function signature must change to accept optional credentials (username + password). Its only caller is in internal/controller/valkeycluster_controller.go (line ~469), which must be updated to read the _operator password from the system password Secret and pass it through.
- getNodeState (called by GetClusterState) sets Username and Password in ClientOption when creating the rueidis client.
Auth fallback
On a fresh cluster where pods start before the ACL file is loaded, the operator must handle authentication gracefully:
- Try connecting with the _operator credentials first.
- If the connection returns a NOAUTH or WRONGPASS error specifically, fall back to an unauthenticated connection.
- Do not fall back on other errors (network timeouts, connection refused, etc.); those indicate real problems that should surface.
Secret deletion and self-healing
If the system password Secret is deleted:
- The next reconcile detects the missing Secret and generates new passwords.
- The new passwords are written to a new Secret and users.acl is regenerated.
- However, Valkey does not hot-reload the ACL file, so pods must restart to pick up the new users.acl. Live ACL reload (ACL LOAD) is out of scope for this issue.
- Until pods restart, the operator falls back to an unauthenticated connection if the default user is still enabled; if the default user is disabled, connections will fail. This is expected: deleting operator-managed Secrets is a destructive action.
Exporter authentication
Update the exporter container definition in internal/controller/metrics_exporter.go:
- Add environment variables sourced from the system password Secret:

  ```yaml
  env:
    - name: REDIS_USER
      value: "_exporter"
    - name: REDIS_PASSWORD
      valueFrom:
        secretKeyRef:
          name: internal-<cluster>-system-passwords
          key: _exporter
  ```

- The redis_exporter natively reads REDIS_USER and REDIS_PASSWORD.
Watch trigger for system password Secret
The existing findReferencedClusters watch only matches secrets referenced in spec.users[].passwordSecret.name. The new system password Secret is not referenced from spec.users[], so changes to it (including deletion) would not trigger a reconcile through that path.
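Since the system password Secret already has ownerReferences pointing to the ValkeyCluster, the controller should additionally call Owns(&corev1.Secret{}) to get owner-based reconciliation for all owned Secrets. By default, Owns only enqueues the owner for an owner reference marked controller: true, so the Secret's owner reference should be set as a controller reference (for example with controllerutil.SetControllerReference). This ensures that deleting the system password Secret triggers a reconcile and self-healing.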
Files to change
| File | Change |
|---|---|
| api/v1alpha1/valkeyacls_types.go | Add CEL validation rule on UserAclSpec.Name to reject the _ prefix |
| internal/controller/users.go | Add getSystemPasswordSecretName helper, system user generation, and password Secret create/read; prepend system ACLs in reconcileUsersAcl |
| internal/valkey/clusterstate.go | Update GetClusterState and getNodeState signatures to accept optional credentials; add Username/Password to ClientOption; add auth fallback on NOAUTH/WRONGPASS |
| internal/controller/valkeycluster_controller.go | Update the GetClusterState call site (~line 469) to read and pass _operator credentials; add Owns(&corev1.Secret{}) to the controller watch setup |
| internal/controller/metrics_exporter.go | Add REDIS_USER/REDIS_PASSWORD env vars from the system password Secret |
| internal/controller/users_test.go | Tests for system ACL generation and password Secret lifecycle |
Out of scope
- Live ACL reload (ACL LOAD on each node after users.acl changes): separate issue.
- Password rotation: the UserAclSpec already supports multiple passwords per user, which is the foundation for this.
- Tightening _operator permissions: once the operator's command surface stabilizes.
- Admin override of system user passwords: allowing admins to provide their own Secret reference.
Acceptance criteria
- _operator and _exporter ACL lines appear in users.acl for every ValkeyCluster (_exporter only when exporter.enabled is true)
- The internal-<cluster>-system-passwords Secret is created with random passwords on first reconcile
- Passwords are stable across reconciles (not regenerated)
- The operator authenticates as _operator when connecting to Valkey nodes
- Auth fallback to unauthenticated triggers only on NOAUTH/WRONGPASS, not on other errors
- The exporter sidecar receives REDIS_USER/REDIS_PASSWORD env vars
- spec.users[] entries with names starting with _ are rejected at admission via CEL validation
- Deleting the system password Secret triggers self-healing (new passwords are generated; pods need a restart for them to take effect)
- The controller watches owned Secrets, so deleting the system password Secret triggers a reconcile
- Unit tests cover system ACL generation, password Secret lifecycle, and auth fallback logic