
feat: Add persistent volume support for ValkeyCluster#85

Open
umar-riswan-ap wants to merge 3 commits into valkey-io:main from umar-riswan-ap:feature/persistent-volume-support


@umar-riswan-ap

  • Add StorageSpec to ValkeyCluster API with enabled, size, storageClassName, and accessModes fields
  • Add VolumePermissions field to enable init container for volume ownership
  • Implement StatefulSet support for persistent storage with PVC templates
  • Keep Deployment support for ephemeral storage (emptyDir)
  • Update valkey.conf with dir /data and AOF persistence settings
  • Add security contexts (non-root user 1001, drop ALL capabilities)
  • Add init container to chown /data directory when volumePermissions=true
  • Update RBAC with StatefulSet and PVC permissions
  • Add two sample configurations: persistent and ephemeral
  • Update README with storage features and examples
  • Tested on EKS with gp3 storage class

Signed-off-by: Umar Riswan A P <umarriswanap@gmail.com>
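A minimal sketch of the storage-driven branching described above: persistent storage selects a StatefulSet with a PVC template, ephemeral storage stays on a Deployment with an emptyDir, and `volumePermissions` adds a chown init container. The Go field names, the `WorkloadPlan` type, the `planWorkload` function, the `ReadWriteOnce` default and the exact init-container command (including group 1001) are assumptions for illustration, not the PR's code; the sketch uses plain strings instead of the Kubernetes API types so it runs with the standard library alone.

```go
package main

import "fmt"

// StorageSpec mirrors the fields listed in this PR (enabled, size,
// storageClassName, accessModes). Go names and types are assumptions.
type StorageSpec struct {
	Enabled          bool
	Size             string   // e.g. "2Gi"
	StorageClassName string   // e.g. "gp3"; empty means the cluster default
	AccessModes      []string // e.g. []string{"ReadWriteOnce"}
}

// WorkloadPlan is a hypothetical summary of what a controller would render.
type WorkloadPlan struct {
	Kind             string   // "StatefulSet" or "Deployment"
	DataVolume       string   // "volumeClaimTemplate" or "emptyDir"
	Request          string   // PVC size request, persistent storage only
	StorageClassName string   // persistent storage only
	AccessModes      []string // persistent storage only
	InitCommand      []string // volume-permissions init container, if any
}

// planWorkload chooses the workload kind from the storage settings.
func planWorkload(storage StorageSpec, volumePermissions bool) WorkloadPlan {
	if !storage.Enabled {
		// Ephemeral: data lives in an emptyDir and is lost with the pod.
		return WorkloadPlan{Kind: "Deployment", DataVolume: "emptyDir"}
	}
	modes := storage.AccessModes
	if len(modes) == 0 {
		modes = []string{"ReadWriteOnce"} // assumed default
	}
	plan := WorkloadPlan{
		Kind:             "StatefulSet",
		DataVolume:       "volumeClaimTemplate",
		Request:          storage.Size,
		StorageClassName: storage.StorageClassName,
		AccessModes:      modes,
	}
	if volumePermissions {
		// Assumption: only persistent volumes get the chown step here, handing
		// /data to the non-root UID 1001 used by the server container.
		plan.InitCommand = []string{"sh", "-c", "chown -R 1001:1001 /data"}
	}
	return plan
}

func main() {
	p := planWorkload(StorageSpec{Enabled: true, Size: "2Gi", StorageClassName: "gp3"}, true)
	fmt.Println(p.Kind, p.DataVolume, p.Request, p.StorageClassName, p.AccessModes, p.InitCommand)
	e := planWorkload(StorageSpec{Enabled: false}, true)
	fmt.Println(e.Kind, e.DataVolume)
}
```

The split keeps the ephemeral path simple: a Deployment needs no stable identity, while per-pod PVCs need the StatefulSet's stable ordinal names to reattach the same volume after a restart.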
umar-riswan-ap force-pushed the feature/persistent-volume-support branch from e96f66c to c549bf3 on February 16, 2026 18:37
Signed-off-by: UMAR RISWAN A P <umarriswanap@gmail.com>
@jdheyburn
Collaborator

Hi @umar-riswan-ap , thanks for the contribution!

Persistence support is definitely something we want in the operator. We're currently fleshing out the design for this. One key constraint is that StatefulSet VolumeClaimTemplates have immutable size fields, so we're exploring alternatives like managing PVCs through a dedicated ValkeyNode controller. You can follow that discussion in this RFC: #83

We're going to keep this PR open for now, until we've voted on whether the ValkeyNode CRD will go ahead.

If you're interested in contributing further, the best way to get involved would be to join the RFC discussion above or open an issue/discussion so we can align on design before implementation. Thanks again!

Implements automatic PVC expansion when storage size is increased in ValkeyCluster spec.

Features:
- Detects storage size changes and patches PVCs automatically
- Validates StorageClass supports allowVolumeExpansion
- Monitors expansion progress with status conditions
- Zero downtime - cluster remains operational during expansion
- Emits events for expansion stages (VolumeExpanding/VolumeExpanded)
- Requeues every 10s during expansion to monitor progress

Changes:
- Added internal/controller/volume_expansion.go with expansion logic
- Updated valkeycluster_controller.go to integrate expansion checks
- Added ReasonVolumeExpanding and ReasonVolumeExpanded status reasons
- Added StorageClass RBAC permissions (storage.k8s.io)
- Created comprehensive documentation and example manifests

Tested:
- Successfully expanded 12 PVCs from 2Gi to 5Gi on EKS cluster
- AWS EBS gp3 volumes with allowVolumeExpansion enabled
- Verified zero downtime and data integrity

Closes: Volume expansion feature request
Signed-off-by: Umar Riswan A P <umarriswanap@gmail.com>
umar-riswan-ap force-pushed the feature/persistent-volume-support branch from 496d3d2 to 75b6d18 on February 17, 2026 15:41
@umar-riswan-ap
Author

@jdheyburn Thanks for the update! I completely understand the complexity around StatefulSet VolumeClaimTemplates being immutable. I've actually been working on an enhancement that handles online PV scaling, so storage can be expanded gracefully without disruption, and I've committed it to this PR as well. I'd be happy to share my approach in the RFC discussion or open a separate issue to discuss how it might complement the ValkeyNode controller design.
Would you prefer I add my thoughts to #83 directly, or should I open a new discussion/issue first? Happy to align with whatever direction the team decides to take!

@jdheyburn
Collaborator

@umar-riswan-ap Would you mind starting up a new discussion on how volumes would be managed, and how we'd get round volume expansion too? Thanks!

@deepakpunjabi
Contributor

I depend on this PR because it already implements SecurityContext, which I need in order to move valkey-operator towards production. Let me know if I can help.

@jdheyburn
Collaborator

Since this PR was raised, we've added ValkeyNode, which manages StatefulSets and Deployments. I have a draft spec on what adding Volumes would look like. I can share that out when I get a chance, hopefully tomorrow.

@deepakpunjabi what is your requirement for SecurityContext?

@deepakpunjabi
Contributor

@jdheyburn We can't run the image as root.

  Warning  Failed     8s (x20 over 3m32s)  kubelet            Error: container has runAsNonRoot and image will run as root (pod: "valkey-minimal-0-0-0_valkey-operator-system(54036920-f76d-4717-a17b-29124ae8278c)", container: server)
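The kubelet refuses to start a container when `runAsNonRoot` is true but it cannot confirm a non-zero UID. With no `runAsUser` in the security context it falls back to the image's `USER`, and an image that declares no `USER` is treated as UID 0, which produces the "image will run as root" error above. Below is a simplified Go model of that decision, written for illustration rather than copied from the kubelet; the function name and signature are hypothetical. It shows why an explicit numeric `runAsUser`, such as the UID 1001 this PR uses, satisfies the policy.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// checkRunAsNonRoot is a simplified, hypothetical model of the kubelet's
// runAsNonRoot verification. runAsUser is the effective securityContext
// value (nil if unset); imageUser is the image's USER ("" if unset).
func checkRunAsNonRoot(runAsNonRoot bool, runAsUser *int64, imageUser string) error {
	if !runAsNonRoot {
		return nil // policy not requested
	}
	if runAsUser != nil {
		// An explicit UID in the securityContext takes precedence over the image.
		if *runAsUser == 0 {
			return errors.New("container's runAsUser breaks non-root policy")
		}
		return nil
	}
	user := strings.SplitN(imageUser, ":", 2)[0] // ignore an optional ":group"
	if user == "" {
		// No USER in the image: treated as UID 0.
		return errors.New("container has runAsNonRoot and image will run as root")
	}
	uid, err := strconv.ParseInt(user, 10, 64)
	if err != nil {
		return fmt.Errorf("container has runAsNonRoot and image has non-numeric user (%s), cannot verify user is non-root", user)
	}
	if uid == 0 {
		return errors.New("container has runAsNonRoot and image will run as root")
	}
	return nil
}

func main() {
	nonRoot := int64(1001)
	fmt.Println(checkRunAsNonRoot(true, nil, ""))      // no USER in image: rejected, as in the log above
	fmt.Println(checkRunAsNonRoot(true, &nonRoot, "")) // runAsUser 1001: accepted, prints <nil>
}
```

Note that a named user such as `valkey` is also rejected, because the kubelet cannot verify a non-numeric user; a numeric UID, set either in the image or in the security context, avoids both failure modes.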

To fix this, I have already implemented the following fields in my local fork; if needed, I can push them in a separate PR.

// SecurityContext defines pod-level security attributes applied to all containers.
// See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
// +optional
SecurityContext *corev1.PodSecurityContext `json:"securityContext,omitempty"`

// ImagePullSecrets references secrets in the same namespace for pulling container images.
// +optional
ImagePullSecrets []corev1.LocalObjectReference `json:"imagePullSecrets,omitempty"`

// ServiceAccountName is the name of the ServiceAccount to use for the pods.
// +optional
ServiceAccountName string `json:"serviceAccountName,omitempty"`

// TerminationGracePeriodSeconds is the duration in seconds the pod needs to terminate gracefully.
// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`

// TopologySpreadConstraints describe how pods should be spread across topology domains.
// +optional
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`

> I have a draft spec on what adding Volumes would look like. I can share that out when I get a chance, hopefully tomorrow.

Looking forward to it!

@jdheyburn
Collaborator

@deepakpunjabi I've raised the design conversation here

Can you raise a separate issue for those fields you mentioned please?


Labels

None yet

Projects

None yet


3 participants