Skip to content

Conversation

@akankshapanse
Copy link
Contributor

@akankshapanse akankshapanse commented Dec 2, 2025

What this PR does / why we need it:
This PR adds changes to use Patch() method to update status of CnsRegisterVolume CR, instead of Update(), since Update() fails sometimes due to error as below, that causes further issue such as double quota accounting if update to mark success fails.

2025-11-13T10:25:57.869570556Z stderr F {"level":"error","time":"2025-11-13T10:25:57.86932921Z","caller":"kubernetes/kubernetes.go:787","msg":"Failed to update status. err: Operation cannot be fulfilled on cnsregistervolumes.cns.vmware.com \"cnsregvol-8cxcl\": the object has been modified; please apply your changes to the latest version and try again","kind":"CnsRegisterVolume","name":"e2e-test-namespace-274/cnsregvol-8cxcl","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/kubernetes.UpdateStatus\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/kubernetes/kubernetes.go:787\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/cnsregistervolume.setInstanceSuccess\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/cnsoperator/controller/cnsregistervolume/cnsregistervolume_controller.go:1071\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/cnsregistervolume.(*ReconcileCnsRegisterVolume).Reconcile\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/cnsoperator/controller/cnsregistervolume/cnsregistervolume_controller.go:802\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/[[email protected]](mailto:[email protected])/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/[[email protected]](mailto:[email protected])/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/[[email protected]](mailto:[email protected])/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/build/mts/release/sb-91958690/cayman_vsphere_csi_driver/vsphere_csi_driver/go/pkg/mod/sigs.k8s.io/[[email protected]](mailto:[email protected])/pkg/internal/controller/controller.go:224"}

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
WCP precheckin pipeline : https://jenkins-vcf-csifvt.devops.broadcom.net/job/wcp-instapp-e2e-pre-checkin/724/ - passed
VKS precheckin pipeline : https://jenkins-vcf-csifvt.devops.broadcom.net/view/instapp/job/vks-instapp-e2e-pre-checkin/668/ -passed
Manual testing:

Error case: CnsRegisterVolume for non-existing volume ID

root@42026f8c18cfe986a6078f15efe59749 [ ~ ]# k get pods -n vmware-system-csi -o yaml | grep image: | grep syncer
      image: cns-docker-dev-local.packages.vcfd.broadcom.net/pansea/syncer:cnsregistervolume_4dec-3
     
root@42026f8c18cfe986a6078f15efe59749 [ ~ ]# cat cnsregistervolume.yaml
apiVersion: cns.vmware.com/v1alpha1
kind: CnsRegisterVolume
metadata:
  name: example-register-volume-cr
  namespace: e2e-test-namespace-747
spec:
  volumeID: "5e27e93e-eda6-489c-8bdd-95ee5804b8be" # The internal vSphere CNS Volume ID (UUID)
  pvcName: "example-pvc-name"                 # Desired name for the resulting Kubernetes PVC
  volumeMode: "Block"
  accessMode: "ReadWriteOnce"
root@42026f8c18cfe986a6078f15efe59749 [ ~ ]#

root@42026f8c18cfe986a6078f15efe59749 [ ~ ]# k create -f cnsregistervolume.yaml
cnsregistervolume.cns.vmware.com/example-register-volume-cr created
root@42026f8c18cfe986a6078f15efe59749 [ ~ ]#

root@42026f8c18cfe986a6078f15efe59749 [ ~ ]# k get cnsregistervolume -A -oyaml
...
- apiVersion: cns.vmware.com/v1alpha1
  kind: CnsRegisterVolume
  metadata:
    creationTimestamp: "2025-12-04T08:47:54Z"
    generation: 1
    name: example-register-volume-cr
    namespace: e2e-test-namespace-747
    resourceVersion: "9369943"
    uid: a550946f-738c-4c98-b223-8a9c6d432329
  spec:
    accessMode: ReadWriteOnce
    pvcName: example-pvc-name
    volumeID: 5e27e93e-eda6-489c-8bdd-95ee5804b8be
    volumeMode: Block
  status:
    error: 'failed to create CNS volume. Error: ServerFaultCode: The object or item      <<< status updated with patch operation successful
      referred to could not be found.'
    registered: false

success case:

root@42026f8c18cfe986a6078f15efe59749 [ ~ ]# k create -f cnsregistervolume.yaml
cnsregistervolume.cns.vmware.com/example-register-volume-cr created
root@42026f8c18cfe986a6078f15efe59749 [ ~ ]#

root@42026f8c18cfe986a6078f15efe59749 [ ~ ]# k get cnsregistervolume -A -oyaml
...
- apiVersion: cns.vmware.com/v1alpha1
  kind: CnsRegisterVolume
  metadata:
    creationTimestamp: "2025-12-04T09:17:33Z"
    generation: 1
    name: example-register-volume-cr
    namespace: e2e-test-namespace-747
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: PersistentVolumeClaim
      name: example-pvc-name
      uid: 2499d87a-6ea8-4873-ac23-1202e78de190
    resourceVersion: "9397808"
    uid: 7f0a8e71-77d9-4dca-afdc-7e85023db211
  spec:
    accessMode: ReadWriteOnce
    pvcName: example-pvc-name
    volumeID: 1226ff2e-12b6-4f95-bc13-08bc7ab775ed
    volumeMode: Block
  status:
    registered: true
      

Special notes for your reviewer:

Release note:

Use Patch() to update status of CnsRegisterVolume CR instead of Update()

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 2, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akankshapanse

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 2, 2025
@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #683

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #689

@akankshapanse akankshapanse force-pushed the topic/ap024703/fix_update_with_patch_cnsregistervol branch from 8cc0ac8 to 6668356 Compare December 4, 2025 08:50
@akankshapanse akankshapanse changed the title [WIP]Use Patch() to update status of CnsRegisterVolume CR instead of Update() Use Patch() to update status of CnsRegisterVolume CR instead of Update() Dec 4, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 4, 2025
@akankshapanse akankshapanse force-pushed the topic/ap024703/fix_update_with_patch_cnsregistervol branch 2 times, most recently from 38c8863 to ebc8fb5 Compare December 4, 2025 10:39
@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #704

@akankshapanse akankshapanse force-pushed the topic/ap024703/fix_update_with_patch_cnsregistervol branch from ebc8fb5 to ebdf019 Compare December 5, 2025 07:13
@akankshapanse
Copy link
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Dec 5, 2025
@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #707

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #707

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #652

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #713

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #656

@deepakkinni
Copy link
Collaborator

FAILED --- Jenkins Build #713

@kolluria
Copy link
Contributor

kolluria commented Dec 8, 2025

@akankshapanse why don't we use the UpdateStatus function that we're using for other CRs?

Ref-

func UpdateStatus(ctx context.Context, c client.Client, obj client.Object) error {

Just need to make sure that we add the status sub-resource in the CRD spec.

@deepakkinni
Copy link
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #659

@deepakkinni
Copy link
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #660

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #661

@deepakkinni
Copy link
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #724

@deepakkinni
Copy link
Collaborator

SUCCESS --- Jenkins Build #724

@akankshapanse
Copy link
Contributor Author

akankshapanse commented Dec 9, 2025

SUCCESS --- Jenkins Build #724

WCP precheckin passed


PR 3776

Ran 14 of 1847 Specs
SUCCESS! -- 14 Passed | 0 Failed | 0 Pending | 1833 Skipped | 0 Flaked

@akankshapanse
Copy link
Contributor Author

VKS precheckin passed:


PR 3776

Ran 6 of 1847 Specs
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 1841 Skipped | 0 Flaked


rawPatch := client.RawPatch(apitypes.MergePatchType, patchBytes)

// Try to patch CnsRegisterVolume CR for 3 times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if it fails after 3 retries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Patch fails under setInstanceError(), the reconcile request gets requeued most of the times and the operation gets retried. if Patch fails under setInstanceSuccess(), the reconcile request gets requeued and next try would re-account quota for same object, which would be eventually corrected by full sync

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants