-
Notifications
You must be signed in to change notification settings - Fork 660
Add RayService incremental upgrade sample for guide #4164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Ryan O'Leary <[email protected]>
rueian
approved these changes
Nov 7, 2025
andrewsykim
added a commit
that referenced
this pull request
Nov 21, 2025
* [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted (#4141) * [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted Signed-off-by: 400Ping <[email protected]> * [Fix] Fix e2e error Signed-off-by: 400Ping <[email protected]> * [Fix] fix according to rueian's comment Signed-off-by: 400Ping <[email protected]> * [Chore] fix ci error Signed-off-by: 400Ping <[email protected]> * Update ray-operator/controllers/ray/raycluster_controller.go Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]> Signed-off-by: Ping <[email protected]> * Update ray-operator/controllers/ray/rayjob_controller.go Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]> Signed-off-by: Ping <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * Trigger CI Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: 400Ping <[email protected]> Signed-off-by: Ping <[email protected]> Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]> * fix: dashboard build for kuberay 1.5.0 (#4161) Signed-off-by: Future-Outlier <[email protected]> * [Feature Enhancement] Set ordered replica index label to support multi-slice (#4163) * [Feature Enhancement] Set ordered replica index label to support multi-slice Signed-off-by: Ryan O'Leary <[email protected]> * rename replica-id -> replica-name Signed-off-by: Ryan O'Leary <[email protected]> * Separate replica index feature gate logic Signed-off-by: Ryan O'Leary <[email protected]> * remove index arg in createWorkerPod Signed-off-by: Ryan O'Leary <[email protected]> --------- Signed-off-by: Ryan O'Leary <[email protected]> * update stale feature gate comments (#4174) Signed-off-by: Andrew Sy Kim <[email protected]> * [RayCluster] Add more context why we don't recreate head Pod for RayJob (#4175) Signed-off-by: Kai-Hsun Chen <[email protected]> * feature: Remove empty resource list initialization. (#4168) Fixes #4142. * [Dockerfile] [KubeRay Dashboard]: Fix Dockerfile warnings (ENV format, CMD JSON args) (#4167) * [#4166] improvement: Fix Dockerfile warnings (ENV format, CMD JSON args) * extract the hostname from CMD Signed-off-by: Neo Chien <[email protected]> --------- Signed-off-by: Neo Chien <[email protected]> Co-authored-by: cchung100m <[email protected]> * [Fix] Resolve int32 overflow by having the calculation in int64 and c… (#4158) * [Fix] Resolve int32 overflow by having the calculation in int64 and cap it if the count is over math.MaxInt32 Signed-off-by: justinyeh1995 <[email protected]> * [Test] Add unit tests for CalculateReadyReplicas Signed-off-by: justinyeh1995 <[email protected]> * [Fix] Add a nosec comment to pass the Lint (pre-commit) test Signed-off-by: justinyeh1995 <[email protected]> * [Refactor] Add CapInt64ToInt32 to replace #nosec directives Signed-off-by: justinyeh1995 <[email protected]> * [Refactor] Rename function to SafeInt64ToInt32 and add a underflowing prevention (it also help pass the lint test) Signed-off-by: justinyeh1995 <[email protected]> * [Refactor] Remove the early return as SafeInt64ToInt32 handles the int32 overflow and underflow checking. Signed-off-by: justinyeh1995 <[email protected]> --------- Signed-off-by: justinyeh1995 <[email protected]> * Add RayService incremental upgrade sample for guide (#4164) Signed-off-by: Ryan O'Leary <[email protected]> * Edit RayCluster example config for label selectors (#4151) Signed-off-by: Ryan O'Leary <[email protected]> * [RayJob] update light weight submitter image from quay.io (#4181) Signed-off-by: Future-Outlier <[email protected]> * [flaky] RayJob fails when head Pod is deleted when job is running (#4182) Signed-off-by: Future-Outlier <[email protected]> * [CI] Pin Docker api version to avoid API version mismatch (#4188) Signed-off-by: win5923 <[email protected]> * Make replicas configurable for kuberay-operator #4180 (#4195) * Make replicas configurable for kuberay-operator #4180 * Make replicas configurable for kuberay-operator #4180 * [Fix] rayjob update raycluster status (#4192) * feat: check if raycluster status update in rayjob * test: e2e test to check the rayjob raycluster status update * fix: dashboard http client tests discovered and passing (#4173) Signed-off-by: alimaazamat <[email protected]> * [RayJob] Lift cluster status while initializing (#4191) Signed-off-by: Spencer Peterson <[email protected]> * [RayJob] Remove updateJobStatus call (#4198) Fast follow to #4191 Signed-off-by: Spencer Peterson <[email protected]> * Add support for Ray token auth (#4179) * Add support for Ray token auth Signed-off-by: Andrew Sy Kim <[email protected]> * add e2e test for Ray cluster auth Signed-off-by: Andrew Sy Kim <[email protected]> * address nits from Ruiean Signed-off-by: Andrew Sy Kim <[email protected]> * update RAY_auth_mode -> RAY_AUTH_MODE Signed-off-by: Andrew Sy Kim <[email protected]> * configure auth for Ray autoscaler Signed-off-by: Andrew Sy Kim <[email protected]> --------- Signed-off-by: Andrew Sy Kim <[email protected]> * Bump js-yaml from 4.1.0 to 4.1.1 in /dashboard (#4194) Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1. - [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md) - [Commits](nodeca/js-yaml@4.1.0...4.1.1) --- updated-dependencies: - dependency-name: js-yaml dependency-version: 4.1.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * update minimum Ray version required for token authentication to 2.52.0 (#4201) * update minimum Ray version required for token authentication to 2.52.0 Signed-off-by: Andrew Sy Kim <[email protected]> * update RayCluster auth e2e test to use Ray v2.52 Signed-off-by: Andrew Sy Kim <[email protected]> --------- Signed-off-by: Andrew Sy Kim <[email protected]> * add samples for RayCluster token auth (#4200) Signed-off-by: Andrew Sy Kim <[email protected]> * update (#4208) Signed-off-by: Future-Outlier <[email protected]> * [RayJob] Add token authentication support for All mode (#4210) * dashboard client authentication support Signed-off-by: Future-Outlier <[email protected]> * support rayjob Signed-off-by: Future-Outlier <[email protected]> * update to fix api serverr err Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * updarte Signed-off-by: Future-Outlier <[email protected]> * Rayjob sidecar mode auth token mode support Signed-off-by: Future-Outlier <[email protected]> * RayJob support k8s job mode Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * Address Andrew's advice Signed-off-by: Future-Outlier <[email protected]> * add todo x-ray-authorization comments Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> * [RayCluster] Enable Secret informer watch/list and remove unused RBAC verbs (#4202) * Add authentication secret reconciliation support Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * update Signed-off-by: Future-Outlier <[email protected]> * fix flaky test Signed-off-by: Future-Outlier <[email protected]> * remove test fix Signed-off-by: Rueian <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Signed-off-by: Rueian <[email protected]> Co-authored-by: Rueian <[email protected]> * [APIServer][Docs] Add user guide for retry behavior & configuration (#4144) * [Docs] Add the draft description about feature intro, configurations, and usecases Signed-off-by: justinyeh1995 <[email protected]> * [Fix] Update the retry walk-through Signed-off-by: justinyeh1995 <[email protected]> * [Doc] rewrite the first 2 sections Signed-off-by: justinyeh1995 <[email protected]> * [Doc] Revise documentation wording and add Observing Retry Behavior section Signed-off-by: justinyeh1995 <[email protected]> * [Fix] fix linting issue by running pre-commit run berfore commiting Signed-off-by: justinyeh1995 <[email protected]> * [Fix] fix linting errors in the Markdown linting Signed-off-by: justinyeh1995 <[email protected]> * [Fix] Clean up the math equation Signed-off-by: justinyeh1995 <[email protected]> * Update the math formula of Backoff calculation. Co-authored-by: Nary Yeh <[email protected]> Signed-off-by: JustinYeh <[email protected]> * [Fix] Explicitly mentioned exponential backoff and removed the customization parts Signed-off-by: justinyeh1995 <[email protected]> * [Docs] Clarify naming by replacing “APIServer” with “KubeRay APIServer” Co-authored-by: Cheng-Yeh Chung <[email protected]> Signed-off-by: JustinYeh <[email protected]> * [Docs] Rename retry-configuration.md to retry-behavior.md for accuracy Signed-off-by: justinyeh1995 <[email protected]> * Update Title to KubeRay APIServer Retry Behavior Co-authored-by: Cheng-Yeh Chung <[email protected]> Signed-off-by: JustinYeh <[email protected]> * [Docs] Add a note about the limitation of retry configuration Signed-off-by: justinyeh1995 <[email protected]> --------- Signed-off-by: justinyeh1995 <[email protected]> Signed-off-by: JustinYeh <[email protected]> Co-authored-by: Nary Yeh <[email protected]> Co-authored-by: Cheng-Yeh Chung <[email protected]> * Support X-Ray-Authorization fallback header for accepting auth token via proxy (#4213) * Support X-Ray-Authorization fallback header for accepting auth token in dashboard Signed-off-by: Future-Outlier <[email protected]> * remove todo comment Signed-off-by: Future-Outlier <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> * [RayCluster] make auth token secret name consistency (#4216) Signed-off-by: fscnick <[email protected]> * [RayCluster] Status includes head containter status message (#4196) * [RayCluster] Status includes head containter status message Signed-off-by: Spencer Peterson <[email protected]> * lint Signed-off-by: Spencer Peterson <[email protected]> * [RayCluster] Containers not ready status reflects structured reason Signed-off-by: Spencer Peterson <[email protected]> * nit Signed-off-by: Spencer Peterson <[email protected]> --------- Signed-off-by: Spencer Peterson <[email protected]> * Remove erroneous call in applyServeTargetCapacity (#4212) Signed-off-by: Ryan O'Leary <[email protected]> * [RayJob] Add token authentication support for light weight job submitter (#4215) * [RayJob] light weight job submitter auth token support Signed-off-by: Future-Outlier <[email protected]> * X-Ray-Authorization Signed-off-by: Rueian <[email protected]> --------- Signed-off-by: Future-Outlier <[email protected]> Signed-off-by: Rueian <[email protected]> Co-authored-by: Rueian <[email protected]> * feat: kubectl ray get token command (#4218) * feat: kubectl ray get token command Signed-off-by: Rueian <[email protected]> * Update kubectl-plugin/pkg/cmd/get/get_token_test.go Co-authored-by: Copilot <[email protected]> Signed-off-by: Rueian <[email protected]> * Update kubectl-plugin/pkg/cmd/get/get_token.go Co-authored-by: Copilot <[email protected]> Signed-off-by: Rueian <[email protected]> * make sure the raycluster exists before getting the secret Signed-off-by: Rueian <[email protected]> * better ux Signed-off-by: Rueian <[email protected]> * Update kubectl-plugin/pkg/cmd/get/get_token.go Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]> Signed-off-by: Rueian <[email protected]> --------- Signed-off-by: Rueian <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]> --------- Signed-off-by: 400Ping <[email protected]> Signed-off-by: Ping <[email protected]> Signed-off-by: Future-Outlier <[email protected]> Signed-off-by: Ryan O'Leary <[email protected]> Signed-off-by: Andrew Sy Kim <[email protected]> Signed-off-by: Kai-Hsun Chen <[email protected]> Signed-off-by: Neo Chien <[email protected]> Signed-off-by: justinyeh1995 <[email protected]> Signed-off-by: win5923 <[email protected]> Signed-off-by: alimaazamat <[email protected]> Signed-off-by: Spencer Peterson <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Rueian <[email protected]> Signed-off-by: JustinYeh <[email protected]> Signed-off-by: fscnick <[email protected]> Co-authored-by: Ping <[email protected]> Co-authored-by: Han-Ju Chen (Future-Outlier) <[email protected]> Co-authored-by: Ryan O'Leary <[email protected]> Co-authored-by: Kai-Hsun Chen <[email protected]> Co-authored-by: Kavish <[email protected]> Co-authored-by: Neo Chien <[email protected]> Co-authored-by: cchung100m <[email protected]> Co-authored-by: JustinYeh <[email protected]> Co-authored-by: Jun-Hao Wan <[email protected]> Co-authored-by: Divyam Raj <[email protected]> Co-authored-by: Nary Yeh <[email protected]> Co-authored-by: Alima Azamat <[email protected]> Co-authored-by: Spencer Peterson <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Rueian <[email protected]> Co-authored-by: Cheng-Yeh Chung <[email protected]> Co-authored-by: fscnick <[email protected]> Co-authored-by: Copilot <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
This PR adds a simple, sample RayService with
NewClusterWithIncrementalUpgradeenabled for a guide in the Ray docs. Related guide: ray-project/ray#58293.Related issue number
#3209
Checks