fix(logserver): Prevent race condition during reconciliation

calancha · calancha · commit 55e14df89656 · 2025-10-03T08:10:54.000+02:00
Resolves an issue where the logserver StatefulSet would hang for over
10 minutes during a rollout, a problem unique to this component. Pod
events showed repeated "FailedMount" warnings, eventually timing out
with the error: "Unable to attach or mount volumes: timed out waiting
for the condition".

The root cause was a race condition between the operator's reconciliation
loop and the Kubernetes controller managing the StatefulSet update.
Immediately after triggering a rollout, the operator would proceed to
reconcile the associated PVC. This interfered with the kubelet's process
of detaching the volume from the old pod and attaching it to the new one,
causing the prolonged timeout.

This commit fixes the race condition by making the DeployLogserver
function return false as soon as ensureStatefulset reports that a
StatefulSet update has been triggered. PVC reconciliation is thereby
deferred to a later reconciliation pass, which gives Kubernetes
uninterrupted time to hand the volume over from the old pod to the new one.
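
The early-return pattern can be sketched in isolation as follows. This is
a minimal illustration, not the operator's code: fakeController,
stepResult, and deploy are hypothetical names, ensureStatefulset is
reduced to a single boolean result, and reading the boolean returned by
the deploy function as "ready" is an assumption.

```go
package main

// stepResult records what one hypothetical reconcile pass did.
type stepResult struct {
	ready      bool // assumed meaning of the deploy function's boolean result
	pvcTouched bool // whether this pass went on to reconcile the PVC
}

// fakeController is a hypothetical stand-in for the operator's controller.
type fakeController struct {
	stsNeedsUpdate bool // true while the StatefulSet spec differs from the desired one
}

// ensureStatefulset is a simplified stand-in: it reports whether this
// pass triggered a StatefulSet update.
func (c *fakeController) ensureStatefulset() bool {
	if c.stsNeedsUpdate {
		c.stsNeedsUpdate = false // update applied; the next pass sees no difference
		return true
	}
	return false
}

// deploy returns early when a rollout was just triggered, so the PVC is
// left alone until a later pass.
func (c *fakeController) deploy() stepResult {
	if c.ensureStatefulset() {
		return stepResult{ready: false, pvcTouched: false}
	}
	// Only reached when no StatefulSet update happened in this pass.
	return stepResult{ready: true, pvcTouched: true}
}
```

In this sketch, the first call to deploy on a controller with
stsNeedsUpdate set reports not ready and leaves the PVC untouched; the
next call proceeds to the PVC step.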

This change aligns the logserver controller's behavior with all other
StatefulSet controllers in the operator, which already followed this
pattern, correcting a historical inconsistency that only affected logserver.

Change-Id: I164ef03e0e4ef8557a1ec5effb0415b77a7c053f
diff --git a/controllers/logserver.go b/controllers/logserver.go
@@ -303,6 +303,9 @@ func (r *SFController) DeployLogserver() bool {
 	sts.Spec.Template.Spec.HostAliases = base.CreateHostAliases(r.cr.Spec.HostAliases)
 
 	current, stsUpdated := r.ensureStatefulset(storage.StorageClassName, sts)
+	if stsUpdated {
+		return false
+	}
 
 	pvcReadiness := r.reconcileExpandPVC(logserverIdent+"-"+logserverIdent+"-0", r.cr.Spec.Logserver.Storage)
 
diff --git a/doc/reference/CHANGELOG.md b/doc/reference/CHANGELOG.md
@@ -19,6 +19,7 @@ All notable changes to this project will be documented in this file.
 - Fix a few issues with the nodeAffinity setting logic, where the node affinity of a statefulset
   would only be updated when its annotations are changed; and where it would hang if the statefulset's
   replicas are set to 0.
+- Fix a race condition in the logserver controller that could cause PVCs to get stuck during operator upgrades or resource updates.
 
 ## [v0.0.58] - 2025-09-05