
Conversation

@X1aoZEOuO
Contributor

What this PR does / why we need it

This PR introduces a comprehensive guide for configuring serverless environments on Kubernetes, integrating Prometheus for monitoring and KEDA for autoscaling. The guide includes YAML configurations, step-by-step installation instructions, and performance benchmarks, and aims to optimize resource efficiency through event-driven scaling while maintaining observability and resilience for AI/ML workloads and other latency-sensitive applications.

Which issue(s) this PR fixes

Fixes #362

Special notes for your reviewer

Does this PR introduce a user-facing change?


cc @pacoxu @kerthcet

@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Sep 28, 2025
@X1aoZEOuO
Contributor Author

/kind feature

@InftyAI-Agent InftyAI-Agent added feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Sep 28, 2025
@X1aoZEOuO
Contributor Author

/kind documentation


```bash
helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10
make install-keda
```
Contributor


This doc should be merged last: #500 adds the Makefile task.

Contributor


/hold
until #500 is merged

Contributor Author


Thanks!

Member

@kerthcet kerthcet left a comment


Please explain the relationship between activator and keda at the very beginning. Thanks!

```yaml
any: true
selector:
  matchLabels:
    llmaz.io/model-name: qwen2-0--5b
```
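For context, a selector fragment like this would typically sit inside a Prometheus Operator ServiceMonitor spec along the following lines; the metadata name and endpoint port here are illustrative assumptions, not necessarily what the guide uses:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qwen2-0--5b-monitor        # assumed name
  namespace: llmaz-system
spec:
  namespaceSelector:
    any: true                      # watch matching Services in all namespaces
  selector:
    matchLabels:
      llmaz.io/model-name: qwen2-0--5b
  endpoints:
    - port: http                   # assumed metrics port name
      path: /metrics
```

With `namespaceSelector.any: true`, Prometheus scrapes every Service carrying the `llmaz.io/model-name` label regardless of namespace, which is why the label selector alone determines which workloads are captured.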
Member


I think we should capture more than just this one service, right? Let's add some explanation here.

Contributor Author


Thanks! I've enhanced both the Prometheus and KEDA configuration sections with detailed explanations.

@X1aoZEOuO
Contributor Author

Please explain the relationship between activator and keda at the very beginning. Thanks!

@kerthcet Thank you for the feedback! I've added a new section "Relationship Between Activator and KEDA". This section now clearly explains:

  1. How KEDA handles dynamic scaling based on metrics (monitoring and adjusting replicas)
  2. How the Activator serves as a request interceptor for scale-from-zero scenarios
  3. How these two components work together to enable true serverless behavior
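As a rough sketch of how the KEDA side of this division of labor might look, a ScaledObject with a Prometheus trigger can scale the inference Deployment between zero and a ceiling based on request rate. The resource names, query, and threshold below are illustrative assumptions rather than the guide's actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler            # assumed name
  namespace: llmaz-system
spec:
  scaleTargetRef:
    name: qwen2-0--5b                 # assumed Deployment name
  minReplicaCount: 0                  # scale to zero when idle
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090  # assumed
        query: sum(rate(http_requests_total{service="qwen2-0--5b"}[1m]))  # illustrative metric
        threshold: "10"
```

Once replicas reach zero, incoming requests have nothing to land on, which is exactly the gap the Activator fills: it intercepts and buffers those requests until KEDA has scaled the Deployment back up.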

@X1aoZEOuO X1aoZEOuO force-pushed the doc/ospp-serverless-deployment branch from 354e4ee to 56fb1ac on October 29, 2025 17:48
@X1aoZEOuO
Contributor Author

X1aoZEOuO commented Oct 29, 2025

@pacoxu @kerthcet The Helm CI seems to have failed because of a network error; can we disable or ignore it for now?

https://github.com/InftyAI/llmaz/actions/runs/18917273800/job/54003859685?pr=499

@kerthcet
Member

/retest

@X1aoZEOuO
Contributor Author

X1aoZEOuO commented Oct 30, 2025

/retest

@kerthcet Hello, it seems there was no space left on the device during the e2e test. https://github.com/InftyAI/llmaz/actions/runs/18917278030/job/54003874165?pr=500

```
  [FAILED] in [It] - /home/runner/work/llmaz/llmaz/test/util/validation/validate_playground.go:219 @ 10/29/25 18:02:24.453
  [FAILED] in [AfterEach] - /home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:50 @ 10/29/25 18:02:24.453
• [FAILED] [335.923 seconds]
playground e2e tests [It] SpeculativeDecoding with llama.cpp
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:145

  [FAILED] Timed out after 335.612s.
  Expected success, but got an error:
      <*url.Error | 0xc000712900>:
Error: No space left on device : '/home/runner/actions-runner/cached/_diag/pages/7ae5050e-5137-471d-b700-9b1bd0d8553b_338ff102-8e76-46a1-a5ae-f669195390f6_1.log'
```

@X1aoZEOuO
Contributor Author

/retest

@kerthcet And the Helm install step is also failing. https://github.com/InftyAI/llmaz/actions/runs/18917273800/job/54003859685?pr=499

```
Installing v3.17.3
  Downloading 'v3.17.3' from 'https://get.helm.sh/'
  Request timeout: /helm-v3.17.3-linux-amd64.tar.gz
  Waiting 20 seconds before trying again
  Request timeout: /helm-v3.17.3-linux-amd64.tar.gz
  Waiting 14 seconds before trying again
  Error: Error: Failed to download Helm from location https://get.helm.sh/helm-v3.17.3-linux-amd64.tar.gz
```

Member

@kerthcet kerthcet left a comment


/approve
/lgtm

@kerthcet
Member

/retest

@kerthcet
Member

/retest

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@X1aoZEOuO
Contributor Author

/retest

@pacoxu The CI issue appears to have popped up over the last couple of weeks, likely resulting from recent changes to the GitHub environment. I've noted the issue and will take a look at it in a follow-up PR. #498 (comment)

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@pacoxu The CI issue appears to have popped up over the last couple of weeks, likely resulting from recent changes to the GitHub environment. I've noted the issue and will take a look at it in a follow-up PR. #498 (comment)

#508: see https://github.com/InftyAI/llmaz/actions/runs/18964011867/job/54157010420
Adding a Free Disk Space step (https://github.com/InftyAI/llmaz/pull/508/files#diff-2976cc01ce3201f4a03e4021cdefdc1cd95b67efed6798930c222c59d1771f9aR73-R88) can fix the CI failure.
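For readers unfamiliar with this fix, a disk-cleanup step of this kind typically looks like the following in a GitHub Actions workflow. The action and options shown are a common community choice, assumed here for illustration rather than taken from #508:

```yaml
# Illustrative sketch; #508 may use a different action or options.
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - name: Free Disk Space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: true    # remove preinstalled tool caches
          android: true       # drop unused Android SDKs
          dotnet: true
          haskell: true
      - uses: actions/checkout@v4
      # ... build images and run the e2e suite ...
```

Running the cleanup before the test steps reclaims tens of gigabytes on the hosted runner, which is usually enough to avoid the "No space left on device" failures seen above.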

/retest

I opened kerthcet/github-workflow-as-kube#15 to fix the CI. (After that, llmaz should bump the workflow version.)


Labels

documentation Categorizes issue or PR as related to documentation.
feature Categorizes issue or PR as related to a new feature.
needs-priority Indicates a PR lacks a label and requires one.
needs-triage Indicates an issue or PR lacks a label and requires one.


Development

Successfully merging this pull request may close these issues.

[OSPP] KEDA-based Serverless Elastic Scaling for llmaz

4 participants