@X1aoZEOuO
Contributor

@X1aoZEOuO X1aoZEOuO commented Sep 28, 2025

What this PR does / why we need it

This PR adds support for serverless operation through model activation. It introduces new constants for the model activation annotation and for cached activator information.

A POD_IP environment variable is added so that networking can be configured dynamically at deploy time. The main function gains flags for enabling the serverless feature and for setting the pod IP, so the controllers can handle these operations.

RBAC rules are expanded to cover the additional resource operations, including patching and updating endpoints. A new controller, ActivatorReconciler, manages model activation, service reconciliation, and traffic forwarding, all of which serverless activation depends on.

Finally, the service creation logic now adds model annotations so that services are configured correctly for activation. Together, these changes let the system manage dynamic, scalable deployments.

Which issue(s) this PR fixes

Fixes #362

Special notes for your reviewer

Does this PR introduce a user-facing change?


cc @pacoxu @kerthcet

@InftyAI-Agent added the needs-triage, needs-priority, and do-not-merge/needs-kind labels Sep 28, 2025
@X1aoZEOuO force-pushed the feat/0-1-activator branch 2 times, most recently from 5424dec to a7ae00f, September 28, 2025 12:19
@X1aoZEOuO
Contributor Author

/kind feature

@InftyAI-Agent added the feature label and removed the do-not-merge/needs-kind label Sep 28, 2025
@X1aoZEOuO force-pushed the feat/0-1-activator branch 2 times, most recently from 5d23e51 to 0a1d0fe, September 28, 2025 15:40
@pacoxu
Contributor

pacoxu commented Oct 10, 2025

/cc @kerthcet
/assign

cmd/main.go Outdated
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager")
Contributor

@pacoxu Oct 28, 2025

Is this something like --bind-address? Generally, we use 0.0.0.0 by default and check the exact pod IP at runtime.

Contributor

If podIP is only used for serverless mode, we should add a comment.

Member

I don't think using the pod IP is a solid solution. But a Service is not workable here because the controller runs in HA mode. Do we have a better solution?

Contributor Author

@pacoxu @kerthcet This is different from a bind-address. The pod-ip is specifically used by the activator controller to register itself as an endpoint for traffic forwarding when services are scaled to zero. I've added a comment to clarify this: "Only used when service activator is enabled."
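To illustrate the difference from a bind address, here is a minimal Go sketch of how an advertised pod IP could be resolved and validated. It is not code from this PR: `resolvePodIP` is a hypothetical helper, and the precedence (flag first, then a `POD_IP`-style environment value) is an assumption. The point is that a wildcard such as 0.0.0.0 is fine for binding but cannot be registered as an endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// resolvePodIP (hypothetical helper) picks the address the activator would
// advertise. The flag value wins; otherwise the environment value is used.
// Both inputs are passed in explicitly so the function is easy to test.
func resolvePodIP(flagValue, envValue string) (string, error) {
	candidate := flagValue
	if candidate == "" {
		candidate = envValue
	}
	if candidate == "" {
		return "", errors.New("pod IP is required when the service activator is enabled")
	}
	ip := net.ParseIP(candidate)
	if ip == nil {
		return "", fmt.Errorf("invalid pod IP %q", candidate)
	}
	// A wildcard address is valid for listening but not for an endpoint entry.
	if ip.IsUnspecified() {
		return "", fmt.Errorf("pod IP %q is a wildcard address and cannot be used as an endpoint", candidate)
	}
	return ip.String(), nil
}

func main() {
	// The address below is an example value, not one taken from the PR.
	ip, err := resolvePodIP("", "10.244.1.17")
	fmt.Println(ip, err)
}
```

In a Deployment, the environment value would typically come from the downward API field `status.podIP`.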

Contributor Author

The pod IP is necessary here because the activator needs to know its own IP to create endpoint configurations that route traffic through itself before waking up the scaled-down services.
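The routing described above follows a common "hold, wake, then forward" pattern. The Go sketch below shows that pattern using only the standard library. It is an assumption-laden illustration, not the ActivatorReconciler from this PR: `holdAndForward`, `runDemo`, the `wake` hook, and the readiness channel are all hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

// holdAndForward triggers wake, blocks the request until ready is closed
// (or the timeout expires), and then proxies the request to backend.
func holdAndForward(backend *url.URL, ready <-chan struct{}, wake func(), timeout time.Duration) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(backend)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		wake() // hypothetical hook, e.g. scale the workload from 0 to 1
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		select {
		case <-ready:
			proxy.ServeHTTP(w, r)
		case <-ctx.Done():
			http.Error(w, "backend not ready", http.StatusServiceUnavailable)
		}
	})
}

// runDemo simulates a backend that becomes ready readyAfterMs milliseconds
// after the first request, and returns the status code and body the client sees.
func runDemo(readyAfterMs, timeoutMs int) (int, string) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "hello from model")
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return 0, err.Error()
	}
	ready := make(chan struct{})
	var once sync.Once
	wake := func() {
		// Only the first request starts the (simulated) scale-up.
		once.Do(func() {
			time.AfterFunc(time.Duration(readyAfterMs)*time.Millisecond, func() { close(ready) })
		})
	}

	h := holdAndForward(target, ready, wake, time.Duration(timeoutMs)*time.Millisecond)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/v1/models", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(runDemo(20, 2000)) // backend wakes up within the timeout
	fmt.Println(runDemo(2000, 20)) // backend is too slow, request is rejected
}
```

A real activator would also need to remove itself from the endpoints once the backend is ready, which this sketch leaves out.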

Contributor

The current change seems like a temporary solution to me (which we could mark as alpha).

Member

Considering the timeline, I'm happy to merge this, but we need a solid solution here. The pod IP changes dynamically in the cluster, so I don't think this approach is workable in the long run.

Member

Created an issue for this: #504

// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
ModelActivatorAnnoKey = "activator.llmaz.io/playground"
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"
Contributor

The annotation naming style does not follow that of the annotations and labels above.

Member

activator.llmaz.io/svc-selector

Contributor Author

Fixed! I've updated the annotation naming to follow the project conventions:

  • ModelActivatorAnnoKey: changed from "activator.llmaz.io/playground" to "activator.llmaz.io/model-name"
  • CachedModelActivatorAnnoKey: changed from "cached.activator.llmaz.io" to "activator.llmaz.io/cached-state"

Now both annotations use the consistent activator.llmaz.io/* prefix pattern like other annotations in the codebase.
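As a rough illustration of how the renamed keys might be used, the Go sketch below defines the two constants with the values quoted in this thread and adds two small helpers. The helpers `withActivatorAnnotation` and `activatedModel` are hypothetical, and storing the model name as the annotation value is an assumption based on the key name, not something the thread confirms.

```go
package main

import "fmt"

const (
	// ModelActivatorAnnoKey is the annotation key quoted in the review thread.
	ModelActivatorAnnoKey = "activator.llmaz.io/model-name"
	// CachedModelActivatorAnnoKey is used to cache the activator state of the model.
	CachedModelActivatorAnnoKey = "activator.llmaz.io/cached-state"
)

// withActivatorAnnotation (hypothetical) returns a copy of annotations with
// the model name set, leaving the caller's map untouched.
func withActivatorAnnotation(annotations map[string]string, modelName string) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[ModelActivatorAnnoKey] = modelName
	return out
}

// activatedModel (hypothetical) reports the model name if the annotation is
// present and non-empty. Reading from a nil map is safe in Go.
func activatedModel(annotations map[string]string) (string, bool) {
	name, ok := annotations[ModelActivatorAnnoKey]
	return name, ok && name != ""
}

func main() {
	// "demo-model" is a placeholder name for the example.
	svcAnnotations := withActivatorAnnotation(map[string]string{"app": "demo"}, "demo-model")
	name, ok := activatedModel(svcAnnotations)
	fmt.Println(name, ok)
}
```

Keeping both keys under one `activator.llmaz.io/` prefix also makes it easy to list or strip all activator annotations with a single prefix check.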


// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
ModelActivatorAnnoKey = "activator.llmaz.io/playground"
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
Contributor

The comment says CachedModelActivatorAnnotationKey, but the constant below is CachedModelActivatorAnnoKey.

Member

activator.llmaz.io/model-name

Contributor Author

Thanks, I have fixed the comment to match the variable name exactly. Changed from:

// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.

to:

// CachedModelActivatorAnnoKey is used to cache the activator state of the model.

// Once either of them qualified, we'll expose this as a field in Model.
ModelPreheatAnnoKey = "llmaz.io/model-preheat"

// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
Contributor

The name in the comment does not match ModelActivatorAnnoKey.

Contributor Author

Thanks! Now the comment name matches the variable name exactly.

@@ -0,0 +1,564 @@
/*
Copyright 2024 The InftyAI Team.
Contributor

nit: 2025

Contributor Author

OK!

Member

@kerthcet left a comment

Thanks @X1aoZEOuO, this is impressive as an exploration. I left some comments.

cmd/main.go Outdated
flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
Member

Rename this to enableServiceActivator. Serverless has many different modes, and the activator is just one of them. Also state in the description that this is an experimental feature.

Contributor Author

Great suggestion! I've made the following changes:

  1. Renamed enableServerless to enableServiceActivator throughout the codebase
  2. Updated the flag description to: "Enable the service activator feature. This is an experimental feature."
  3. Updated all references including function signatures and the Helm chart values.yaml
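For reference, the renamed flag could be wired up roughly as in the Go sketch below. The variable name `enableServiceActivator` and both description strings come from this thread, but the flag name `enable-service-activator`, the `options` struct, and `parseOptions` are assumptions made for the example. The merged code may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options groups the two settings discussed in this review (hypothetical type).
type options struct {
	enableServiceActivator bool
	podIP                  string
}

// parseOptions parses args with a private FlagSet so it can be called
// repeatedly, for example from tests, without touching global flag state.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	// The flag name is assumed; only the variable name appears in the thread.
	fs.BoolVar(&o.enableServiceActivator, "enable-service-activator", false,
		"Enable the service activator feature. This is an experimental feature.")
	fs.StringVar(&o.podIP, "pod-ip", "",
		"The pod IP of the llmaz controller manager. Only used when service activator is enabled.")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Defaulting the flag to false keeps the experimental feature opt-in.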


Contributor

@pacoxu left a comment

/lgtm

@kerthcet
Member

/retest

@kerthcet
Member

I'm merging this manually; I think the test error is now happening on every PR. I will take a deeper look at it. If @X1aoZEOuO wants to take a look as well, that is more than welcome.

Anyway, thanks @X1aoZEOuO for this experimental feature, although we still have some follow-up work to do.

@kerthcet kerthcet enabled auto-merge October 30, 2025 23:36
@kerthcet kerthcet disabled auto-merge October 30, 2025 23:36
@kerthcet kerthcet enabled auto-merge (squash) October 30, 2025 23:36
@kerthcet kerthcet disabled auto-merge October 30, 2025 23:41
@kerthcet kerthcet merged commit 3c13ff3 into InftyAI:main Oct 30, 2025
21 of 27 checks passed
@X1aoZEOuO
Contributor Author

> I'm merging this manually; I think the test error is now happening on every PR. I will take a deeper look at it. If @X1aoZEOuO wants to take a look as well, that is more than welcome.
>
> Anyway, thanks @X1aoZEOuO for this experimental feature, although we still have some follow-up work to do.

@kerthcet I'm more than happy to look into this issue. It seems the CI problem has only emerged in the past few weeks, which might be due to changes in the GitHub environment.

Labels

feature, needs-priority, needs-triage

Development

Successfully merging this pull request may close these issues.

[OSPP] KEDA-based Serverless Elastic Scaling for llmaz

4 participants