-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix StatefulSet#deploy_succeeded? for OnDelete #950
Conversation
@pods.each do |pod| | ||
success &= pod.definition["metadata"]["labels"]["controller-revision-hash"] == status_data['updateRevision'] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the check I've removed because it's not needed as far as I can tell. This previously passed in some of our deploys due to kubernetes/kubernetes#106055 not reliably setting .status.currentRevision
properly.
I don't believe it should be necessary to look up the pods. The StatefulSet's status should be enough.
@@ -80,7 +81,7 @@ def build_ss_template(status: {}, updateStrategy: "RollingUpdate", rollout: "ful | |||
ss_fixture.dup.deep_merge( | |||
"status" => status, | |||
"spec" => {"updateStrategy" => {"type" => updateStrategy}}, | |||
"metadata" => {"annotations" => {"krane.shopify.io/rollout" => rollout}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This annotation wasn't set correctly.
e323e0d
to
0af220c
Compare
success &= desired_replicas == status_data['currentReplicas'].to_i | ||
|
||
elsif update_strategy == ONDELETE | ||
if update_strategy == ONDELETE && required_rollout != "full" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there known cases of update_strategy == DELETE
AND required_rollout == "full"
? If those conditions hold, krane
will time out unless something external deletes the pods, right? I suppose that's not a problem if that's what's been encoded in the manifest since that would signal that as the desired behaviour 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do know of one such case in kafka-infrastructure
, where this script in zonal-deploy-controller
is used to delete the pods. I don't know that this change would break anything there, but tagging @Shopify/streaming-foundations for visibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If those conditions hold, krane will time out unless something external deletes the pods, right?
Yes, that's the desired behaviour as I understand it. You opt in with the annotation set to full
to wait until something else rolls the pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tophatted (unsure if it's useful to at this point -- if a tophat is necessary, maybe it'd just involve manually deploying a test runtime and deleting/recreating an existing StatefulSet to confirm that changes introduced here aren't breaking changes?), but the code & logic look sound - thanks for fixing this, James!
What are you trying to accomplish with this PR?
In #926, a check was added to
StatefulSet#deploy_succeeded?
that requires all pods belonging to a StatefulSet to have thecontroller-revision-hash
label set to the StatefulSet'supdateRevision
status. It ensures that all of the StatefulSet's pods have been updated to the StatefulSet's latest revision. This is fine for StatefulSets withupdateStrategy: RollingUpdate
, but doesn't work when it'sOnDelete
and we don't want to wait for some other mechanism to roll the pods.We're wrong to run this check for all StatefulSets.
How is this accomplished?
I've removed the check entirely since the StatefulSet's status should be sufficient to monitor rollout status.
What could go wrong?
...