-
-
Notifications
You must be signed in to change notification settings - Fork 43
feat(controller): support serverless serving with 0-1 activator. #498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5424dec to
a7ae00f
Compare
|
/kind feature |
5d23e51 to
0a1d0fe
Compare
|
/cc @kerthcet |
cmd/main.go
Outdated
| flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") | ||
| flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy") | ||
| flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature") | ||
| flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something like --bind-address ? Generally, we use 0.0.0.0 by default and check the exact pod IP at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If podIP is only used for serverless mode, we should add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use PodIP is not a solid solution I think. But service is not workable here because controller is HAed .. Do we have better solutions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod IP is necessary here because the activator needs to know its own IP to create endpoint configurations that route traffic through itself before waking up the scaled-down services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current change here seems to be an temp solution to me.(which we may mark as alpha. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the timeline, I'm happy to merge this, but we need to have a solid solution for this, the ip will change dynamically in the cluster, which is not workable I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created issue here: #504
api/core/v1alpha1/model_types.go
Outdated
| // ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator. | ||
| ModelActivatorAnnoKey = "activator.llmaz.io/playground" | ||
| // CachedModelActivatorAnnotationKey is used to cache the activator info of the model. | ||
| CachedModelActivatorAnnoKey = "cached.activator.llmaz.io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the annotation naming style is not following the annotation and labels above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activator.llmaz.io/svc-selector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! I've updated the annotation naming to follow the project conventions:
ModelActivatorAnnoKey: changed from"activator.llmaz.io/playground"to"activator.llmaz.io/model-name"CachedModelActivatorAnnoKey: changed from"cached.activator.llmaz.io"to"activator.llmaz.io/cached-state"
Now both annotations use the consistent activator.llmaz.io/* prefix pattern like other annotations in the codebase.
api/core/v1alpha1/model_types.go
Outdated
|
|
||
| // ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator. | ||
| ModelActivatorAnnoKey = "activator.llmaz.io/playground" | ||
| // CachedModelActivatorAnnotationKey is used to cache the activator info of the model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is CachedModelActivatorAnnotationKey, below is CachedModelActivatorAnnoKey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activator.llmaz.io/model-name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have fixed the comment to match the variable name exactly. Changed from:
// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.to:
// CachedModelActivatorAnnoKey is used to cache the activator state of the model.
api/core/v1alpha1/model_types.go
Outdated
| // Once either of them qualified, we'll expose this as a field in Model. | ||
| ModelPreheatAnnoKey = "llmaz.io/model-preheat" | ||
|
|
||
| // ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment name is not same as ModelActivatorAnnoKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Now the comment name matches the variable name exactly.
| @@ -0,0 +1,564 @@ | |||
| /* | |||
| Copyright 2024 The InftyAI Team. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @X1aoZEOuO this is impressive as exploration. Left some comments.
cmd/main.go
Outdated
| flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") | ||
| flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy") | ||
| flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature") | ||
| flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use PodIP is not a solid solution I think. But service is not workable here because controller is HAed .. Do we have better solutions here?
cmd/main.go
Outdated
| flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.") | ||
| flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.") | ||
| flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy") | ||
| flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enableServiceActivator, serverless has a lot of different modes, activator is just one kind of them. Also explain that: This is an experimental feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion! I've made the following changes:
- Renamed
enableServerlesstoenableServiceActivatorthroughout the codebase - Updated the flag description to: "Enable the service activator feature. This is an experimental feature."
- Updated all references including function signatures and the Helm chart values.yaml
api/core/v1alpha1/model_types.go
Outdated
|
|
||
| // ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator. | ||
| ModelActivatorAnnoKey = "activator.llmaz.io/playground" | ||
| // CachedModelActivatorAnnotationKey is used to cache the activator info of the model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activator.llmaz.io/model-name
api/core/v1alpha1/model_types.go
Outdated
| // ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator. | ||
| ModelActivatorAnnoKey = "activator.llmaz.io/playground" | ||
| // CachedModelActivatorAnnotationKey is used to cache the activator info of the model. | ||
| CachedModelActivatorAnnoKey = "cached.activator.llmaz.io" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
activator.llmaz.io/svc-selector
Signed-off-by: X1aoZEOuO <[email protected]>
Signed-off-by: X1aoZEOuO <[email protected]>
Signed-off-by: X1aoZEOuO <[email protected]>
Signed-off-by: X1aoZEOuO <[email protected]>
Signed-off-by: X1aoZEOuO <[email protected]>
Signed-off-by: X1aoZEOuO <[email protected]>
Signed-off-by: X1aoZEOuO <[email protected]>
be0f504 to
43f79d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
/retest |
|
Merge this manually, I think the test error is happening now for every PR. I will take a deep look of this. If @X1aoZEOuO want to take a look as well, more than welcomed. Anyway, thanks @X1aoZEOuO for this experimental feature, although we still have some work to follow. |
@kerthcet I'm more than happy to look into this issue. It seems the CI problem has only emerged in the past few weeks, which might be due to changes in the GitHub environment. |
What this PR does / why we need it
In this update, several key improvements were made to support serverless operations and model activation. New constants were introduced to manage model activation states and cache information effectively.
Environment variables like
POD_IPwere added to dynamically configure networking settings, enhancing deployment flexibility. The main function was updated to include flags for enabling serverless features and configuring pod IPs, ensuring controllers can handle these operations smoothly.RBAC rules were expanded to allow more comprehensive resource management, including patching and updating endpoints. A new controller,
ActivatorReconciler, was implemented to manage model activation, service reconciliation, and traffic forwarding, crucial for serverless activations.Lastly, service creation logic was updated to include model annotations, ensuring services are correctly configured for activation purposes. These changes collectively improve the system's ability to manage dynamic and scalable deployments.
Which issue(s) this PR fixes
Fixes #362
Special notes for your reviewer
Does this PR introduce a user-facing change?
cc @pacoxu @kerthcet