
Conversation

@X1aoZEOuO
Contributor

What this PR does / why we need it

This PR introduces a comprehensive guide for configuring serverless environments on Kubernetes, integrating Prometheus for monitoring and KEDA for autoscaling. The guide includes YAML configurations, step-by-step installation instructions, and performance benchmarks, and aims to optimize resource efficiency through event-driven scaling while maintaining observability and resilience for AI/ML workloads and other latency-sensitive applications.

Which issue(s) this PR fixes

Fixes #362

Special notes for your reviewer

Does this PR introduce a user-facing change?


cc @pacoxu @kerthcet

@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Sep 28, 2025
@X1aoZEOuO
Contributor Author

/kind feature

@InftyAI-Agent InftyAI-Agent added feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Sep 28, 2025
@X1aoZEOuO
Contributor Author

/kind documentation


```bash
helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10
make install-keda
```
Contributor


This doc should be merged last: #500 adds the Makefile task.

Contributor


/hold
until #500 is merged

Contributor Author


Thanks!

Member

@kerthcet kerthcet left a comment


Please explain the relationship between activator and keda at the very beginning. Thanks!

```yaml
any: true
selector:
  matchLabels:
    llmaz.io/model-name: qwen2-0--5b
```
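For context, a selector fragment like this would typically sit inside a Prometheus Operator ServiceMonitor spec along the following lines; the metadata name and endpoint port here are illustrative assumptions, not necessarily what the guide uses:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qwen2-0--5b-monitor        # assumed name
  namespace: llmaz-system
spec:
  namespaceSelector:
    any: true                      # watch matching Services in all namespaces
  selector:
    matchLabels:
      llmaz.io/model-name: qwen2-0--5b
  endpoints:
    - port: http                   # assumed metrics port name
      path: /metrics
```

With `namespaceSelector.any: true`, Prometheus scrapes every Service carrying the `llmaz.io/model-name` label regardless of namespace, which is why the label selector alone determines which workloads are captured.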
Member


I think we should capture more than just this one service, right? Let's add some explanation here.

Contributor Author


Thanks! I've enhanced both the Prometheus and KEDA configuration sections with detailed explanations.

@X1aoZEOuO
Contributor Author

Please explain the relationship between activator and keda at the very beginning. Thanks!

@kerthcet Thank you for the feedback! I've added a new section "Relationship Between Activator and KEDA". This section now clearly explains:

  1. How KEDA handles dynamic scaling based on metrics (monitoring and adjusting replicas)
  2. How the Activator serves as a request interceptor for scale-from-zero scenarios
  3. How these two components work together to enable true serverless behavior
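As a rough sketch of how the KEDA side of this division of labor might look, a ScaledObject with a Prometheus trigger can scale the inference Deployment between zero and a ceiling based on request rate. The resource names, query, and threshold below are illustrative assumptions rather than the guide's actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler            # assumed name
  namespace: llmaz-system
spec:
  scaleTargetRef:
    name: qwen2-0--5b                 # assumed Deployment name
  minReplicaCount: 0                  # scale to zero when idle
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090  # assumed
        query: sum(rate(http_requests_total{service="qwen2-0--5b"}[1m]))  # illustrative metric
        threshold: "10"
```

Once replicas reach zero, incoming requests have nothing to land on, which is exactly the gap the Activator fills: it intercepts and buffers those requests until KEDA has scaled the Deployment back up.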

@X1aoZEOuO X1aoZEOuO force-pushed the doc/ospp-serverless-deployment branch from 354e4ee to 56fb1ac on October 29, 2025 17:48
@X1aoZEOuO
Contributor Author

X1aoZEOuO commented Oct 29, 2025

@pacoxu @kerthcet The Helm CI seems to have failed because of a network error; can we disable or ignore it for now?

https://github.com/InftyAI/llmaz/actions/runs/18917273800/job/54003859685?pr=499

@kerthcet
Member

/retest

@X1aoZEOuO
Contributor Author

X1aoZEOuO commented Oct 30, 2025

/retest

@kerthcet Hello, it seems there was no space left on the device during the e2e test. https://github.com/InftyAI/llmaz/actions/runs/18917278030/job/54003874165?pr=500

```
  [FAILED] in [It] - /home/runner/work/llmaz/llmaz/test/util/validation/validate_playground.go:219 @ 10/29/25 18:02:24.453
  [FAILED] in [AfterEach] - /home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:50 @ 10/29/25 18:02:24.453
• [FAILED] [335.923 seconds]
playground e2e tests [It] SpeculativeDecoding with llama.cpp
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:145

  [FAILED] Timed out after 335.612s.
  Expected success, but got an error:
      <*url.Error | 0xc000712900>:
Error: No space left on device : '/home/runner/actions-runner/cached/_diag/pages/7ae5050e-5137-471d-b700-9b1bd0d8553b_338ff102-8e76-46a1-a5ae-f669195390f6_1.log'
```

@X1aoZEOuO
Contributor Author

/retest

@kerthcet And the Helm install step is also failing. https://github.com/InftyAI/llmaz/actions/runs/18917273800/job/54003859685?pr=499

```
Installing v3.17.3
  Downloading 'v3.17.3' from 'https://get.helm.sh/'
  Request timeout: /helm-v3.17.3-linux-amd64.tar.gz
  Waiting 20 seconds before trying again
  Request timeout: /helm-v3.17.3-linux-amd64.tar.gz
  Waiting 14 seconds before trying again
  Error: Error: Failed to download Helm from location https://get.helm.sh/helm-v3.17.3-linux-amd64.tar.gz
```

Member

@kerthcet kerthcet left a comment


/approve
/lgtm

@kerthcet
Member

/retest

@kerthcet
Member

/retest

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@X1aoZEOuO
Contributor Author

/retest

@pacoxu The CI issue appears to have popped up over the last couple of weeks, likely resulting from recent changes to the GitHub environment. I've noted the issue and will take a look at it in a follow-up PR. #498 (comment)

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@pacoxu
Contributor

pacoxu commented Oct 31, 2025

/retest

@pacoxu The CI issue appears to have popped up over the last couple of weeks, likely resulting from recent changes to the GitHub environment. I've noted the issue and will take a look at it in a follow-up PR. #498 (comment)

#508: see https://github.com/InftyAI/llmaz/actions/runs/18964011867/job/54157010420
Adding a Free Disk Space step (https://github.com/InftyAI/llmaz/pull/508/files#diff-2976cc01ce3201f4a03e4021cdefdc1cd95b67efed6798930c222c59d1771f9aR73-R88) can fix the CI failure.
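For readers unfamiliar with this fix, a disk-cleanup step of this kind typically looks like the following in a GitHub Actions workflow. The action and options shown are a common community choice, assumed here for illustration rather than taken from #508:

```yaml
# Illustrative sketch; #508 may use a different action or options.
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - name: Free Disk Space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: true    # remove preinstalled tool caches
          android: true       # drop unused Android SDKs
          dotnet: true
          haskell: true
      - uses: actions/checkout@v4
      # ... build images and run the e2e suite ...
```

Running the cleanup before the test steps reclaims tens of gigabytes on the hosted runner, which is usually enough to avoid the "No space left on device" failures seen above.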

/retest

I opened kerthcet/github-workflow-as-kube#15 to fix the CI. (After that, llmaz should bump the workflow version.)


Labels

documentation Categorizes issue or PR as related to documentation.
feature Categorizes issue or PR as related to a new feature.
needs-priority Indicates a PR lacks a label and requires one.
needs-triage Indicates an issue or PR lacks a label and requires one.


Development

Successfully merging this pull request may close these issues.

[OSPP] KEDA-based Serverless Elastic Scaling for llmaz

4 participants