feat(controller): support serverless serving with 0-1 activator. #498

X1aoZEOuO · 2025-09-28T11:55:48Z

What this PR does / why we need it

In this update, several key improvements were made to support serverless operations and model activation. New constants were introduced to manage model activation states and cache information effectively.

Environment variables like POD_IP were added to dynamically configure networking settings, enhancing deployment flexibility. The main function was updated to include flags for enabling serverless features and configuring pod IPs, ensuring controllers can handle these operations smoothly.

RBAC rules were expanded to allow more comprehensive resource management, including patching and updating endpoints. A new controller, ActivatorReconciler, was implemented to manage model activation, service reconciliation, and traffic forwarding, crucial for serverless activations.

Lastly, service creation logic was updated to include model annotations, ensuring services are correctly configured for activation purposes. These changes collectively improve the system's ability to manage dynamic and scalable deployments.

Which issue(s) this PR fixes

Fixes #362

Special notes for your reviewer

Does this PR introduce a user-facing change?

cc @pacoxu @kerthcet

X1aoZEOuO · 2025-09-28T12:20:17Z

/kind feature

pacoxu · 2025-10-10T06:33:03Z

/cc @kerthcet
/assign

pacoxu · 2025-10-28T07:29:17Z

cmd/main.go

 	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
 	flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
+	flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
+	flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager")


Is this something like --bind-address? Generally, we use 0.0.0.0 by default and check the exact pod IP at runtime.

If podIP is only used for serverless mode, we should add a comment.

Using the Pod IP is not a solid solution, I think. But a Service is not workable here because the controller runs in HA mode. Do we have a better solution?

@pacoxu @kerthcet This is different from a bind-address. The pod-ip is specifically used by the activator controller to register itself as an endpoint for traffic forwarding when services are scaled to zero. I've added a comment to clarify this: "Only used when service activator is enabled."

The pod IP is necessary here because the activator needs to know its own IP to create endpoint configurations that route traffic through itself before waking up the scaled-down services.

The current change seems to be a temporary solution to me (which we may mark as alpha).

Considering the timeline, I'm happy to merge this, but we need a solid solution for it: the IP will change dynamically in the cluster, which I don't think is workable.

Created an issue here: #504
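For readers following the thread, here is a minimal sketch of the "route traffic through the activator, wake the backend, then forward" idea described above. It is not the PR's ActivatorReconciler: the `activator` type, the `scaleUp` hook, and the `demo` helper are hypothetical names, and the backend is a local test server standing in for a scaled-to-zero model service.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
)

// activator is a hypothetical stand-in for a 0-1 activator: it holds
// incoming requests until the backend has been woken up, then proxies them.
type activator struct {
	once    sync.Once
	scaleUp func() (*url.URL, error) // assumed hook: scales 0->1 and returns the ready backend URL
	proxy   *httputil.ReverseProxy
	err     error
}

func (a *activator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// sync.Once blocks concurrent callers until the first activation finishes,
	// so early requests wait instead of failing. Simplification: a failed
	// activation is never retried in this sketch.
	a.once.Do(func() {
		target, err := a.scaleUp()
		if err != nil {
			a.err = err
			return
		}
		a.proxy = httputil.NewSingleHostReverseProxy(target)
	})
	if a.err != nil {
		http.Error(w, "backend activation failed", http.StatusServiceUnavailable)
		return
	}
	a.proxy.ServeHTTP(w, r)
}

// demo sends two requests through the activator and reports the last
// response body and how many times the scale-up hook ran.
func demo() (string, int, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "hello from model")
	}))
	defer backend.Close()

	scaleUps := 0
	a := &activator{scaleUp: func() (*url.URL, error) {
		scaleUps++
		return url.Parse(backend.URL)
	}}
	front := httptest.NewServer(a)
	defer front.Close()

	body := ""
	for i := 0; i < 2; i++ {
		resp, err := http.Get(front.URL)
		if err != nil {
			return "", scaleUps, err
		}
		b, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return "", scaleUps, err
		}
		body = string(b)
	}
	return body, scaleUps, nil
}

func main() {
	body, scaleUps, err := demo()
	fmt.Println(body, scaleUps, err)
}
```

In a cluster, the front address is what the endpoint would point at while the service is scaled to zero, which is why the activator needs a reachable address of its own and why a changing pod IP is a concern (see #504).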

pacoxu · 2025-10-28T07:32:09Z

api/core/v1alpha1/model_types.go

+	// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
+	ModelActivatorAnnoKey = "activator.llmaz.io/playground"
+	// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
+	CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"


The annotation naming style does not follow the annotations and labels above.

activator.llmaz.io/svc-selector

Fixed! I've updated the annotation naming to follow the project conventions:

ModelActivatorAnnoKey: changed from "activator.llmaz.io/playground" to "activator.llmaz.io/model-name"

CachedModelActivatorAnnoKey: changed from "cached.activator.llmaz.io" to "activator.llmaz.io/cached-state"

Now both annotations use the consistent activator.llmaz.io/* prefix pattern like other annotations in the codebase.
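A small sketch of what the renamed keys look like in use. The two constant values are the ones reported in the reply above; the `activatorAnnoPrefix` constant and the `activatorAnnotations` helper are hypothetical and only illustrate why a shared prefix is convenient.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Values as stated in this thread after the rename.
	ModelActivatorAnnoKey       = "activator.llmaz.io/model-name"
	CachedModelActivatorAnnoKey = "activator.llmaz.io/cached-state"

	// activatorAnnoPrefix is an assumption made for this example only.
	activatorAnnoPrefix = "activator.llmaz.io/"
)

// activatorAnnotations returns only the annotations under the activator
// prefix. Hypothetical helper, not part of the PR.
func activatorAnnotations(annos map[string]string) map[string]string {
	out := make(map[string]string)
	for k, v := range annos {
		if strings.HasPrefix(k, activatorAnnoPrefix) {
			out[k] = v
		}
	}
	return out
}

func main() {
	annos := map[string]string{
		ModelActivatorAnnoKey:    "my-model",
		"llmaz.io/model-preheat": "true",
	}
	fmt.Println(activatorAnnotations(annos))
}
```

With the old value "cached.activator.llmaz.io" there was no prefix/name split, so a prefix filter like this would have missed it.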

pacoxu · 2025-10-28T07:34:45Z

api/core/v1alpha1/model_types.go


+	// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
+	ModelActivatorAnnoKey = "activator.llmaz.io/playground"
+	// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.


The comment here says CachedModelActivatorAnnotationKey, but the constant below is CachedModelActivatorAnnoKey.

activator.llmaz.io/model-name

Thanks, I have fixed the comment to match the variable name exactly. Changed from:

// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.

to:

// CachedModelActivatorAnnoKey is used to cache the activator state of the model.

pacoxu · 2025-10-28T07:35:07Z

api/core/v1alpha1/model_types.go

 	// Once either of them qualified, we'll expose this as a field in Model.
 	ModelPreheatAnnoKey = "llmaz.io/model-preheat"

+	// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.


The name in the comment is not the same as ModelActivatorAnnoKey.

Thanks! Now the comment name matches the variable name exactly.

pacoxu · 2025-10-28T07:39:31Z

pkg/controller/inference/activator_controller.go

@@ -0,0 +1,564 @@
+/*
+Copyright 2024 The InftyAI Team.


kerthcet

Thanks @X1aoZEOuO, this is impressive as an exploration. I left some comments.

kerthcet · 2025-10-28T22:20:26Z

cmd/main.go

 	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
 	flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
+	flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")
+	flag.StringVar(&podIP, "pod-ip", "", "The pod IP of the llmaz controller manager")


Using the Pod IP is not a solid solution, I think. But a Service is not workable here because the controller runs in HA mode. Do we have a better solution?

kerthcet · 2025-10-28T22:49:22Z

cmd/main.go

 	flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
 	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
 	flag.StringVar(&namespace, "namespace", "llmaz-system", "The namespace of the llmaz to deploy")
+	flag.BoolVar(&enableServerless, "enable-serverless", false, "Enable the serverless feature")


Rename this to enableServiceActivator: serverless has a lot of different modes, and the activator is just one of them. Also explain that this is an experimental feature.

Great suggestion! I've made the following changes:

Renamed enableServerless to enableServiceActivator throughout the codebase

Updated the flag description to: "Enable the service activator feature. This is an experimental feature."

Updated all references including function signatures and the Helm chart values.yaml
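A hedged sketch of how the renamed option and the pod IP could be wired together. The variable name and both description strings come from this thread; the flag name `enable-service-activator`, the `POD_IP` fallback, the validation rule, and the `options`/`parseOptions` names are assumptions for illustration, not the merged code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"net"
	"os"
)

// options is a hypothetical container for the two settings discussed here.
type options struct {
	enableServiceActivator bool
	podIP                  string
}

// parseOptions parses the flags and, as an assumption of this sketch, falls
// back to the POD_IP environment variable when --pod-ip is not set.
func parseOptions(args []string, getenv func(string) string) (options, error) {
	var o options
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.BoolVar(&o.enableServiceActivator, "enable-service-activator", false,
		"Enable the service activator feature. This is an experimental feature.")
	fs.StringVar(&o.podIP, "pod-ip", "",
		"The pod IP of the llmaz controller manager. Only used when service activator is enabled.")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.podIP == "" {
		o.podIP = getenv("POD_IP")
	}
	// The pod IP only matters when the activator is on, so validate it only then.
	if o.enableServiceActivator && net.ParseIP(o.podIP) == nil {
		return o, errors.New("a valid pod IP is required when the service activator is enabled")
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", o)
}
```

Failing fast on a missing or malformed IP keeps the activator from registering an unusable endpoint; it does not address the IP changing after startup, which is the concern tracked in #504.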

kerthcet · 2025-10-28T22:55:59Z

api/core/v1alpha1/model_types.go


+	// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
+	ModelActivatorAnnoKey = "activator.llmaz.io/playground"
+	// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.


activator.llmaz.io/model-name

kerthcet · 2025-10-28T22:56:43Z

api/core/v1alpha1/model_types.go

+	// ModelActivatorAnnotationKey is used to indicate whether the model is activated by the activator.
+	ModelActivatorAnnoKey = "activator.llmaz.io/playground"
+	// CachedModelActivatorAnnotationKey is used to cache the activator info of the model.
+	CachedModelActivatorAnnoKey = "cached.activator.llmaz.io"


activator.llmaz.io/svc-selector

Signed-off-by: X1aoZEOuO <[email protected]>

pacoxu

/lgtm

kerthcet · 2025-10-30T17:49:46Z

/retest

kerthcet · 2025-10-30T23:36:14Z

Merging this manually; I think the test error is now happening for every PR. I will take a deep look at this. If @X1aoZEOuO wants to take a look as well, that's more than welcome.

Anyway, thanks @X1aoZEOuO for this experimental feature, although we still have some follow-up work to do.

X1aoZEOuO · 2025-10-31T01:58:16Z

@kerthcet I'm more than happy to look into this issue. It seems the CI problem has only emerged in the past few weeks, which might be due to changes in the GitHub environment.

InftyAI-Agent added the needs-triage, needs-priority, and do-not-merge/needs-kind labels Sep 28, 2025

InftyAI-Agent requested review from carlory and googs1025 September 28, 2025 11:56

X1aoZEOuO force-pushed the feat/0-1-activator branch 2 times, most recently from 5424dec to a7ae00f Compare September 28, 2025 12:19

InftyAI-Agent added the feature label and removed the do-not-merge/needs-kind label Sep 28, 2025

X1aoZEOuO force-pushed the feat/0-1-activator branch 2 times, most recently from 5d23e51 to 0a1d0fe Compare September 28, 2025 15:40

X1aoZEOuO mentioned this pull request Oct 9, 2025

feat(controller): support serverless serving with keda support by k8s scale subresource. #500

Open

InftyAI-Agent assigned pacoxu Oct 10, 2025

InftyAI-Agent requested a review from kerthcet October 10, 2025 06:33

pacoxu reviewed Oct 28, 2025

kerthcet reviewed Oct 28, 2025

X1aoZEOuO added 7 commits October 30, 2025 01:49

feat: add service anno for activator.

efa938b

Signed-off-by: X1aoZEOuO <[email protected]>

feat: add port manager.

e79d0dc

Signed-off-by: X1aoZEOuO <[email protected]>

feat: add activator controller manager for activator.

632cb3e

Signed-off-by: X1aoZEOuO <[email protected]>

feat: add entrypoint for activator.

19c57b7

Signed-off-by: X1aoZEOuO <[email protected]>

feat: add chart for activator.

ef39b50

Signed-off-by: X1aoZEOuO <[email protected]>

fix: fix param in create service.

67d5b73

Signed-off-by: X1aoZEOuO <[email protected]>

fix: update activator label and var name.

43f79d3

Signed-off-by: X1aoZEOuO <[email protected]>

X1aoZEOuO force-pushed the feat/0-1-activator branch from be0f504 to 43f79d3 Compare October 29, 2025 17:49

pacoxu approved these changes Oct 30, 2025

kerthcet mentioned this pull request Oct 30, 2025

Replace the activator access address from Pod IP to a more solid solution #504

Open

3 tasks

kerthcet enabled auto-merge October 30, 2025 23:36

kerthcet disabled auto-merge October 30, 2025 23:36

kerthcet enabled auto-merge (squash) October 30, 2025 23:36

kerthcet disabled auto-merge October 30, 2025 23:41

kerthcet merged commit 3c13ff3 into InftyAI:main Oct 30, 2025
21 of 27 checks passed

This was referenced Oct 31, 2025

doc: add serverless doc with keda and activator. #499

Open

Github e2e ci always failed in recent weeks. #505

Open
