
[RayService] Add a safeguard to prevent overriding the pending cluster during an upgrade #2887

Merged (1 commit) on Feb 5, 2025


@rueian (Contributor) commented on Feb 4, 2025

Why are these changes needed?

Fixes #2877.

Currently, a pending cluster is overridden only when it is unhealthy. That is safe, but the safety relies implicitly on details of the current implementation. This PR adds an explicit check that prevents overriding a pending cluster that is actively serving (that is, a pending cluster the services already point to).
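A minimal sketch of the safeguard idea, in Go. The `clusterInfo` type, its fields, and this signature of `shouldPrepareNewCluster` are hypothetical simplifications, not KubeRay's actual code, and the real reconciler's conditions for replacing a cluster are reduced here to a spec-hash comparison. The only point illustrated is that a pending cluster that is already serving is never replaced, even when the goal spec changes again.

```go
package main

import "fmt"

// clusterInfo is a hypothetical, simplified stand-in for a RayCluster:
// only its name and a hash of its spec matter for this sketch.
type clusterInfo struct {
	Name     string
	SpecHash string
}

// shouldPrepareNewCluster reports whether a new pending cluster should be
// created for goalSpecHash. Hypothetical signature, for illustration only.
func shouldPrepareNewCluster(goalSpecHash string, active, pending *clusterInfo, isPendingClusterServing bool) bool {
	if pending != nil {
		// Explicit safeguard: never override a pending cluster that the
		// services already point to, whatever the goal spec now says.
		if isPendingClusterServing {
			return false
		}
		// Otherwise, replace the pending cluster only if it is outdated.
		return pending.SpecHash != goalSpecHash
	}
	if active == nil {
		// No cluster exists yet, so one must be prepared.
		return true
	}
	// An upgrade is needed when the active cluster no longer matches the goal.
	return active.SpecHash != goalSpecHash
}

func main() {
	active := &clusterInfo{Name: "active", SpecHash: "v1"}
	pending := &clusterInfo{Name: "pending", SpecHash: "v2"}

	// The goal spec moves to v3 while the v2 pending cluster is still idle.
	fmt.Println(shouldPrepareNewCluster("v3", active, pending, false)) // true
	// Same change, but the services already route to the pending cluster.
	fmt.Println(shouldPrepareNewCluster("v3", active, pending, true)) // false
}
```

Passing the serving state in as an argument keeps the predicate pure, so each branch can be covered by a unit test without a running cluster.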

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@rueian force-pushed the rayservice-upgrade-safeguard branch from 3fcc4a7 to 9ca4b4a on February 4, 2025, 02:49
@rueian marked this pull request as ready for review on February 4, 2025, 03:33
@kevin85421 self-assigned this on Feb 4, 2025
```
@@ -243,6 +243,7 @@ func (r *RayServiceReconciler) calculateStatus(ctx context.Context, rayServiceIn
rayServiceInstance.Status.ActiveServiceStatus.Applications = activeClusterServeApplications
rayServiceInstance.Status.PendingServiceStatus.Applications = pendingClusterServeApplications

var servingClusterName string
```
Member:

Maybe use a boolean to indicate whether the services point to the pending cluster instead?

Contributor Author:

Done, although using a boolean feels like moving the actual check outside of the shouldXXX function.
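The two designs discussed in this thread can be contrasted in a short sketch. All helper names below are hypothetical, not KubeRay's code. Variant A follows the reviewer's suggestion: the caller derives `isPendingClusterServing` (explicitly initialized to `false`, as requested later in the review) and the predicate only consumes the flag. Variant B reflects the author's concern: the predicate receives the cluster names and performs the comparison itself. Both return the same answers; they differ only in where the "is the pending cluster serving?" check lives.

```go
package main

import "fmt"

// Variant A: the caller computes a boolean; the predicate just consumes it.
// Hypothetical helper, for illustration only.
func pendingClusterIsServing(servingClusterName, pendingClusterName string) bool {
	isPendingClusterServing := false // explicit initialization
	if pendingClusterName != "" && servingClusterName == pendingClusterName {
		isPendingClusterServing = true
	}
	return isPendingClusterServing
}

// canReplacePendingA trusts whatever flag the caller computed.
func canReplacePendingA(isPendingClusterServing bool) bool {
	return !isPendingClusterServing
}

// Variant B: the predicate receives the names and performs the check itself,
// so a caller cannot bypass the safeguard by computing the flag incorrectly.
func canReplacePendingB(servingClusterName, pendingClusterName string) bool {
	return pendingClusterName == "" || servingClusterName != pendingClusterName
}

func main() {
	// Hypothetical cluster names: the services already point at the pending cluster.
	serving, pending := "rayservice-b", "rayservice-b"
	fmt.Println(canReplacePendingA(pendingClusterIsServing(serving, pending))) // false
	fmt.Println(canReplacePendingB(serving, pending))                         // false
}
```

Variant A keeps the predicate's inputs small and easy to table-test; variant B keeps the whole safety rule in one place, which is the trade-off the author points out.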

@rueian force-pushed the rayservice-upgrade-safeguard branch 3 times, most recently from 8e02563 to 2311404, on February 5, 2025, 15:55
```go
	assert.True(t, shouldPrepareNewCluster)
}

func TestShouldPrepareNewCluster_RecreatePendingCluster(t *testing.T) {
```
Member:

These two tests look very similar; how about merging them into one test with t.Run(...)?

Contributor Author:

Done.
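A sketch of the table-driven shape the reviewer is suggesting. In a `_test.go` file, each table entry would run as a named subtest via `t.Run(tc.name, func(t *testing.T) { ... })`; because `t.Run` only works under `go test`, this self-contained version walks the same kind of table with a plain loop. The predicate `shouldRecreatePending`, the case names, and the fields are hypothetical stand-ins, not the PR's actual test cases.

```go
package main

import "fmt"

// shouldRecreatePending is a hypothetical one-line stand-in for the real
// predicate: recreate only when the spec changed and the pending cluster is idle.
func shouldRecreatePending(specChanged, isPendingClusterServing bool) bool {
	return specChanged && !isPendingClusterServing
}

// testCase mirrors one entry of a table-driven test.
type testCase struct {
	name                    string
	specChanged             bool
	isPendingClusterServing bool
	want                    bool
}

var cases = []testCase{
	{name: "spec changed, pending cluster idle", specChanged: true, isPendingClusterServing: false, want: true},
	{name: "spec changed, pending cluster serving", specChanged: true, isPendingClusterServing: true, want: false},
	{name: "spec unchanged", specChanged: false, isPendingClusterServing: false, want: false},
}

// runCases returns the names of failing cases. In a _test.go file this loop
// body would sit inside t.Run(tc.name, ...) and report with t.Errorf instead.
func runCases(cs []testCase) []string {
	var failed []string
	for _, tc := range cs {
		if got := shouldRecreatePending(tc.specChanged, tc.isPendingClusterServing); got != tc.want {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", runCases(cases))
}
```

With subtests, each table entry gets its own name in `go test -v` output and can be selected with `-run`, which is the main practical benefit over two near-identical test functions.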

```
@@ -243,6 +243,7 @@ func (r *RayServiceReconciler) calculateStatus(ctx context.Context, rayServiceIn
rayServiceInstance.Status.ActiveServiceStatus.Applications = activeClusterServeApplications
rayServiceInstance.Status.PendingServiceStatus.Applications = pendingClusterServeApplications

var isPendingClusterServing bool
```
Member:

Consider explicitly initializing it to false.

Contributor Author:

Done.

@rueian force-pushed the rayservice-upgrade-safeguard branch 2 times, most recently from 5e23364 to 5b226cb, on February 5, 2025, 21:10
@rueian force-pushed the rayservice-upgrade-safeguard branch from 5b226cb to 1981f08 on February 5, 2025, 21:45
@rueian requested a review from @kevin85421 on February 5, 2025, 22:32
@kevin85421 merged commit b753f1a into ray-project:master on Feb 5, 2025
20 checks passed
2 participants