@SheldonTsen
Contributor

SheldonTsen commented Oct 27, 2025

I was investigating odd behaviour where requesting an exact number of workers via the Python SDK did not behave as expected. At first I thought it was related to #3794. However, even after that fix, I did not observe any change in behaviour.

I then tried having ArgoCD ignore the replicas field, and everything started working as expected.

I thought it best to convey this in an example, since I could not find any documentation on how to deploy using ArgoCD (which also requires a couple of lines that one needs to be aware of). IIRC I pieced it together from some GitHub issues and debugging.

The important point is that when managing Ray via ArgoCD with the Autoscaler enabled, ignoreDifferences must be configured properly to get the expected Autoscaler behaviour.
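To make the point concrete, here is a rough sketch of what such an ignoreDifferences entry could look like, built as a Python dict and printed as JSON (which is also valid YAML). The fields `group`, `kind`, and `jsonPointers` are standard ArgoCD Application fields; the helper name `ignore_replicas_rule` and the choice of pointers (one per worker group's `replicas`) are my assumptions for illustration, not necessarily what the example in this PR uses.

```python
import json


def ignore_replicas_rule(num_worker_groups=1):
    """Build one ignoreDifferences entry (hypothetical helper).

    Assumes the fields to ignore are the replicas of each worker group
    in the RayCluster spec, addressed by list index.
    """
    pointers = [
        f"/spec/workerGroupSpecs/{i}/replicas" for i in range(num_worker_groups)
    ]
    return {"group": "ray.io", "kind": "RayCluster", "jsonPointers": pointers}


# Fragment intended to sit under the Application's `spec`.
fragment = {"ignoreDifferences": [ignore_replicas_rule(num_worker_groups=2)]}
print(json.dumps(fragment, indent=2))
```

With a rule like this, ArgoCD should stop treating Autoscaler-driven changes to `replicas` as drift from the desired state.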

I would have attached screenshots, but from a PR review perspective they would not prove anything. Essentially, what I did was:

  • Introduced the ignoreDifferences section, requested X workers via ray.autoscaler.sdk.request_resources, and kept changing X. Increasing X worked as expected and quite quickly. After reducing X, it took ~10 mins (based on my setting) before workers started spinning down. Eventually I requested 1 worker.
  • Removed the ignoreDifferences section and requested X workers. Requesting more than X then did nothing, and requesting X=1 did nothing either. After deleting the RayCluster and starting back at the original state, requesting Y workers sometimes yielded Y workers and sometimes did not.
  • Repeated this back and forth in my environment.
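For anyone wanting to reproduce the steps above, the requests can be sketched roughly as follows. `ray.autoscaler.sdk.request_resources` and its `bundles` argument are the real SDK call; the helper names, the one-bundle-per-worker mapping, and the CPU count per worker are assumptions for illustration and depend on how the worker group is sized.

```python
def worker_bundles(num_workers, cpus_per_worker=1):
    """One resource bundle per desired worker (hypothetical helper).

    Assumes each worker pod offers exactly `cpus_per_worker` CPUs, so that
    one bundle corresponds to one worker.
    """
    return [{"CPU": cpus_per_worker} for _ in range(num_workers)]


def request_workers(num_workers, cpus_per_worker=1):
    """Ask the Autoscaler to scale to fit the given number of worker bundles.

    Must run in a process already connected to the cluster (after ray.init()).
    """
    # Imported lazily so worker_bundles can be used without Ray installed.
    from ray.autoscaler.sdk import request_resources

    request_resources(bundles=worker_bundles(num_workers, cpus_per_worker))


# Example, run from a driver connected to the cluster:
#   request_workers(5)   # scale up
#   request_workers(1)   # scale down; takes effect after the idle timeout
```

Each call replaces the previous request, so lowering the number is what allows idle workers to be removed later.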

@win5923
Collaborator

win5923 commented Oct 29, 2025

Hi @SheldonTsen, thanks for your contribution! However, all user-facing KubeRay documentation is now hosted on the Ray documentation site. I think you can open a PR under doc/source/cluster/kubernetes/user-guides.

@SheldonTsen
Contributor Author

Thanks @win5923 - created the PR here: ray-project/ray#58340

Will close this one.
