Upgrade masters last when upgrading ES clusters #8871
Signed-off-by: Michael Montgomery <[email protected]>
buildkite test this -f p=kind,t=TestNonMasterFirstUpgradeComplexTopology -m s=9.1.2
…/cloud-on-k8s into fix-sts-upgrade-issue-recreation
buildkite test this -f p=kind,t=TestHandleUpscaleAndSpecChanges_VersionUpgradeDataFirstFlow -m s=9.1.2
Pull Request Overview
This PR implements a non-master-first upgrade strategy for Elasticsearch clusters. The key change ensures that during version upgrades, non-master nodes (data, ingest, coordinating nodes) are upgraded before master nodes, which helps maintain cluster stability during upgrades.
- Adds logic to separate master and non-master StatefulSets during version upgrades
- Implements upgrade order validation to ensure non-master nodes complete their upgrades first
- Adds comprehensive unit and e2e tests to verify the upgrade flow
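The ordering described in the bullets above can be illustrated with a minimal sketch. It is not the code from this PR: the `statefulSet` type, its fields, and the `splitByRole` and `upgradeCandidates` helpers are hypothetical simplifications, assuming each StatefulSet can be classified as master or non-master and that its rolled-out version is known.

```go
package main

import "fmt"

// statefulSet is a hypothetical, simplified stand-in for a Kubernetes
// StatefulSet together with the role and version information the operator needs.
type statefulSet struct {
	Name     string
	IsMaster bool   // true if the node set carries the master role
	Version  string // version currently rolled out to all of its pods
}

// splitByRole separates master StatefulSets from non-master ones.
func splitByRole(all []statefulSet) (masters, nonMasters []statefulSet) {
	for _, s := range all {
		if s.IsMaster {
			masters = append(masters, s)
		} else {
			nonMasters = append(nonMasters, s)
		}
	}
	return masters, nonMasters
}

// upgradeCandidates returns the StatefulSets that may be upgraded now.
// Non-master StatefulSets not yet on the target version go first; master
// StatefulSets are returned only once no non-master is still pending.
func upgradeCandidates(all []statefulSet, target string) []statefulSet {
	masters, nonMasters := splitByRole(all)

	var pending []statefulSet
	for _, s := range nonMasters {
		if s.Version != target {
			pending = append(pending, s)
		}
	}
	if len(pending) > 0 {
		return pending
	}

	for _, s := range masters {
		if s.Version != target {
			pending = append(pending, s)
		}
	}
	return pending
}

func main() {
	sets := []statefulSet{
		{Name: "es-master", IsMaster: true, Version: "8.18.0"},
		{Name: "es-data", IsMaster: false, Version: "8.18.0"},
	}
	for _, s := range upgradeCandidates(sets, "9.1.2") {
		fmt.Println("upgrade now:", s.Name)
	}
}
```

In this sketch a StatefulSet that mixes master and data roles is treated as a master, which is one possible design choice; the PR itself may classify such node sets differently.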
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/controller/elasticsearch/driver/upgrade.go | Adds check to identify new clusters vs upgrades by checking if status version is empty |
| pkg/controller/elasticsearch/driver/upscale.go | Implements non-master-first upgrade logic with resource separation and upgrade status checking |
| pkg/controller/elasticsearch/driver/upscale_test.go | Adds comprehensive unit test for version upgrade flow and minor formatting fixes |
| test/e2e/es/non_master_first_upgrade_test.go | Adds e2e test that validates non-master-first upgrade behavior with a watcher |
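The check described for `upgrade.go` (treating an empty status version as a new cluster rather than an upgrade) could look roughly like the sketch below. The `clusterStatus` type and the `isVersionUpgrade` function are hypothetical names introduced for illustration, not identifiers from the PR.

```go
package main

import "fmt"

// clusterStatus is a hypothetical, trimmed-down view of the status reported
// on the Elasticsearch resource.
type clusterStatus struct {
	Version string // empty until the cluster has reported a running version
}

// isVersionUpgrade reports whether the reconciliation changes the version of
// an existing cluster. A cluster with no reported version is considered new,
// so the non-master-first ordering does not apply to it.
// Assumption: plain string comparison is enough for the illustration; a real
// implementation would parse and compare versions.
func isVersionUpgrade(status clusterStatus, specVersion string) bool {
	if status.Version == "" {
		return false
	}
	return status.Version != specVersion
}

func main() {
	fmt.Println(isVersionUpgrade(clusterStatus{}, "9.1.2"))
	fmt.Println(isVersionUpgrade(clusterStatus{Version: "8.18.0"}, "9.1.2"))
}
```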
Dismiss my review to avoid blocking the merge of this PR. I'll try to run some new tests later.
I'll take another look today, sorry for the lag.
        pendingNonMasterSTS = append(pendingNonMasterSTS, actualStatefulSet)
        continue
    }
We should not rely on the status until the sts controller has observed the new generation.
    if actualStatefulSet.Status.ObservedGeneration < actualStatefulSet.Generation {
        // The StatefulSet controller has not yet observed the latest generation.
        pendingNonMasterSTS = append(pendingNonMasterSTS, actualStatefulSet)
        continue
    }
or:
- use actualStatefulSets.PendingReconciliation() before that loop
- create a common function to reuse the logic in cloud-on-k8s/pkg/controller/elasticsearch/driver/expectations.go, lines 23 to 46 in cb0d001:

    func (d *defaultDriver) expectationsSatisfied(ctx context.Context) (bool, string, error) {
        log := ulog.FromContext(ctx)
        // make sure the cache is up-to-date
        expectationsOK, reason, err := d.Expectations.Satisfied()
        if err != nil {
            return false, "", err
        }
        if !expectationsOK {
            log.V(1).Info("Cache expectations are not satisfied yet, re-queueing", "namespace", d.ES.Namespace, "es_name", d.ES.Name, "reason", reason)
            return false, reason, nil
        }
        actualStatefulSets, err := sset.RetrieveActualStatefulSets(d.Client, k8s.ExtractNamespacedName(&d.ES))
        if err != nil {
            return false, "", err
        }
        // make sure StatefulSet statuses have been reconciled by the StatefulSet controller
        pendingStatefulSetReconciliation := actualStatefulSets.PendingReconciliation()
        if len(pendingStatefulSetReconciliation) > 0 {
            log.V(1).Info("StatefulSets observedGeneration is not reconciled yet, re-queueing", "namespace", d.ES.Namespace, "es_name", d.ES.Name)
            return false, fmt.Sprintf("observedGeneration is not reconciled yet for StatefulSets %s", strings.Join(pendingStatefulSetReconciliation.Names().AsSlice(), ",")), nil
        }
        // make sure pods have been reconciled by the StatefulSet controller
        return actualStatefulSets.PodReconciliationDone(ctx, d.Client)
    }
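The point of both suggestions is that a StatefulSet's status should only be trusted once `status.observedGeneration` has caught up with `metadata.generation`. A self-contained sketch of such a filter follows; the `statefulSet` struct and the `pendingReconciliation` function are simplified, hypothetical stand-ins and not the operator's actual `PendingReconciliation()` implementation.

```go
package main

import "fmt"

// statefulSet is a hypothetical, minimal mirror of the two Kubernetes fields
// involved: metadata.generation and status.observedGeneration.
type statefulSet struct {
	Name               string
	Generation         int64 // incremented by the API server on spec changes
	ObservedGeneration int64 // last generation processed by the StatefulSet controller
}

// pendingReconciliation returns the StatefulSets whose status does not yet
// reflect the latest spec. Their status fields (replica counts, revisions)
// may be stale and should not drive upgrade decisions.
func pendingReconciliation(all []statefulSet) []statefulSet {
	var pending []statefulSet
	for _, s := range all {
		if s.ObservedGeneration < s.Generation {
			pending = append(pending, s)
		}
	}
	return pending
}

func main() {
	sets := []statefulSet{
		{Name: "es-data", Generation: 3, ObservedGeneration: 2},
		{Name: "es-master", Generation: 1, ObservedGeneration: 1},
	}
	for _, s := range pendingReconciliation(sets) {
		fmt.Println("status not yet reconciled:", s.Name)
	}
}
```

Running the filter once before the upgrade loop, rather than inside it, keeps the per-StatefulSet logic free of staleness checks, which is the trade-off the second suggestion points at.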
Fixes #8429
What is changing?
This ensures that the master StatefulSets are always upgraded last during an Elasticsearch version upgrade.
Validation