Commit 4890d93
doc: add serverless doc with keda and activator.
Signed-off-by: X1aoZEOuO <[email protected]>
1 parent 20f2990 commit 4890d93

---
title: Serverless
weight: 4
---

## Overview

This guide describes configuration patterns for serverless workloads on Kubernetes, combining Prometheus monitoring with KEDA autoscaling. Event-driven scaling improves resource efficiency while preserving observability and resilience for AI/ML workloads and other latency-sensitive applications.

## Concepts

### Prometheus Configuration

Prometheus provides monitoring and alerting. To enable cross-namespace ServiceMonitor discovery, set the `namespaceSelector` in the ServiceMonitor spec, and configure the Prometheus resource's `serviceMonitorSelector` so that it selects the ServiceMonitors it should scrape.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qwen2-0--5b-lb-monitor
  namespace: llmaz-system
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: servicemonitor
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      llmaz.io/model-name: qwen2-0--5b
  endpoints:
    - port: http
      path: /metrics
      scheme: http
```

- Ensure the `namespaceSelector` is configured to allow cross-namespace monitoring.
- Label your services so that Prometheus can discover them.

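As a sketch, a Prometheus resource whose `serviceMonitorSelector` picks up the ServiceMonitor above might look like the following; the resource name and `matchLabels` value are illustrative and must match the labels on your ServiceMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: llmaz-system
spec:
  # Select ServiceMonitors by label; must match the ServiceMonitor's metadata.labels.
  serviceMonitorSelector:
    matchLabels:
      control-plane: controller-manager
  # An empty selector allows ServiceMonitors from all namespaces.
  serviceMonitorNamespaceSelector: {}
```
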
### KEDA Configuration

KEDA (Kubernetes Event-driven Autoscaling) scales applications based on custom metrics. Integrated with Prometheus, a PromQL query drives its scaling decisions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: inference.llmaz.io/v1alpha1
    kind: Playground
    name: qwen2-0--5b
  pollingInterval: 30
  cooldownPeriod: 50
  minReplicaCount: 0
  maxReplicaCount: 3
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090
        metricName: llamacpp:requests_processing
        query: sum(llamacpp:requests_processing)
        threshold: "0.2"
```

- Ensure the `serverAddress` points to the Prometheus service.
- Tune `pollingInterval` and `cooldownPeriod` to smooth scaling behavior and avoid conflicts with other scaling mechanisms.

### Integration with Activator

For scale-from-zero scenarios, integrate the serverless configuration with an activator. The activator can be implemented using a controller pattern or as a standalone goroutine.

Key architecture components:

- Request interception: capture incoming requests to scaled-to-zero services.
- Pre-scale trigger: initiate scale-up before forwarding requests.
- Request buffering: queue requests during the cold-start period.
- Event-driven scaling: integrate with KEDA, for example via CloudEvents.

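The components above can be sketched as a minimal, dependency-free Go program. The `activator` type, `scaleUp` hook, and `markReady` signal are illustrative stand-ins: a real activator would call the Kubernetes API (or emit a CloudEvent for KEDA) and watch endpoints for readiness instead of sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// activator buffers requests for a scaled-to-zero backend, triggers a
// scale-up on the first request, and releases buffered requests once
// the backend reports ready.
type activator struct {
	mu      sync.Mutex
	scaling bool
	ready   chan struct{} // closed when the backend has a ready replica
	scaleUp func()        // hypothetical hook: patch replicas or emit a CloudEvent
}

func newActivator(scaleUp func()) *activator {
	return &activator{ready: make(chan struct{}), scaleUp: scaleUp}
}

// handle intercepts a request: the first caller triggers a scale-up,
// then every caller blocks (buffers) until the backend is ready.
func (a *activator) handle(req string) string {
	a.mu.Lock()
	if !a.scaling {
		a.scaling = true
		go a.scaleUp() // pre-scale trigger before forwarding
	}
	a.mu.Unlock()
	<-a.ready // request buffering during the cold start
	return "forwarded: " + req
}

// markReady is called once the backend has at least one ready replica.
func (a *activator) markReady() { close(a.ready) }

// run simulates two concurrent requests arriving while the backend is
// scaled to zero, then the backend becoming ready.
func run() []string {
	a := newActivator(func() {
		time.Sleep(10 * time.Millisecond) // stand-in for the real scale-up call
	})
	out := make([]string, 2)
	var wg sync.WaitGroup
	for i := range out {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = a.handle(fmt.Sprintf("req-%d", i))
		}(i)
	}
	time.Sleep(20 * time.Millisecond) // let the requests buffer
	a.markReady()                     // cold start finished
	wg.Wait()
	return out
}

func main() { fmt.Println(run()) }
```

The mutex-guarded `scaling` flag ensures the scale-up fires once no matter how many requests pile up, while the closed channel releases all buffered requests at the same time.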
### Controller Runtime Framework

The controller-runtime framework simplifies the development of Kubernetes controllers by providing abstractions for managing resources and handling events.

#### Key Components

1. **Controller**: monitors resource state and triggers actions to align actual and desired states.
2. **Reconcile Function**: contains the core logic for moving a resource toward its desired state.
3. **Manager**: manages the lifecycle of controllers and shared dependencies.
4. **Client**: interface for interacting with the Kubernetes API.
5. **Scheme**: registry mapping Go types to Kubernetes resource kinds.
6. **Event Source and Handler**: define where events come from and how they are enqueued.

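The reconcile pattern at the heart of these components can be illustrated with a dependency-free Go sketch. The `workload` type is a stand-in for a custom resource such as a Playground; a real controller would implement the `Reconcile` method from `sigs.k8s.io/controller-runtime` and read and write objects through the Client.

```go
package main

import "fmt"

// workload is an illustrative stand-in for a custom resource; real
// controllers read and write such objects through the controller-runtime Client.
type workload struct {
	desiredReplicas int
	actualReplicas  int
}

// reconcile compares desired and actual state, takes one corrective
// step, and reports whether another pass is needed (a requeue). This
// mirrors the shape of a controller-runtime Reconcile function.
func reconcile(w *workload) (requeue bool) {
	switch {
	case w.actualReplicas < w.desiredReplicas:
		w.actualReplicas++ // e.g. create a Pod via the Client
	case w.actualReplicas > w.desiredReplicas:
		w.actualReplicas-- // e.g. delete a Pod via the Client
	}
	return w.actualReplicas != w.desiredReplicas
}

func main() {
	w := &workload{desiredReplicas: 3}
	for reconcile(w) {
		// In a real controller the Manager's workqueue drives requeues.
	}
	fmt.Println(w.actualReplicas) // converged to the desired state
}
```

Each pass makes only one small correction and requests a requeue; the loop converges regardless of the starting state, which is what makes reconciliation robust to missed events.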
## Quick Start Guide

1. Install llmaz with Helm, then install KEDA and Prometheus, following the official [Install Guide](https://llmaz.inftyai.com/docs/getting-started/installation/).

```bash
helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10
make install-keda
make install-prometheus
```

2. Create a ServiceMonitor so that Prometheus can discover your services.

```bash
kubectl apply -f service-monitor.yaml
```

3. Create a ScaledObject for KEDA to manage scaling.

```bash
kubectl apply -f scaled-object.yaml
```

4. Test a cold start by sending a request to the scaled-to-zero service.

```bash
kubectl exec -it -n kube-system deploy/activator -- wget -O- qwen2-0--5b-lb.default.svc:8080
```

5. Use the Prometheus and KEDA dashboards to monitor metrics and scaling activity.

```bash
kubectl port-forward services/prometheus-operated 9090:9090 --address 0.0.0.0 -n llmaz-system
```

## Benchmark

Cold start latency is a critical metric for user experience in llmaz serverless environments. To assess performance stability and efficiency, we measured latency under different instance scaling scenarios. The results are summarized below:

| Scaling Pattern | Avg. Latency (s) | P90 Latency (s) | Resource Initialization | Optimization Potential |
|-----------------|------------------|-----------------|-------------------------|------------------------|
| **0 -> 1** | 29 | 31 | Full pod creation<br>Image pull<br>Engine initialization | Pre-fetching<br>Snapshot restore |
| **1 -> 2** | 15 | 16 | Partial image reuse<br>Network reuse<br>Pod creation | Warm pool<br>Priority scheduling |
| **2 -> 3** | 11 | 12 | Cached dependencies<br>Parallel scheduling<br>Shared resources | Predictive scaling<br>Node affinity |

## Conclusion

This guide offers a detailed approach to setting up a serverless environment with Kubernetes, Prometheus, and KEDA. By following these guidelines, you can ensure efficient scaling and monitoring of your applications.
