
Commit 6dede47

uniemimu authored and togashidm committed
GAS documentation update
Documentation for the new features. This also removes the configure-scheduler script duplication and just suggests using the instructions from TAS instead. The script there has been updated for k8s 1.22+.

Signed-off-by: Ukri Niemimuukko <[email protected]>
1 parent a10f38e commit 6dede47

5 files changed: +102, -96 lines

gpu-aware-scheduling/README.md (+3, -55)
@@ -32,61 +32,9 @@ A worked example for GAS is available [here](docs/usage.md)
The deploy folder has all of the yaml files necessary to get GPU Aware Scheduling running in a Kubernetes cluster. Some additional steps are required to configure the generic scheduler.

#### Extender configuration

Removed:

Note: a shell script that shows these steps can be found [here](deploy/extender-configuration). This script should be seen as a guide only, and will not work on most Kubernetes installations.

The extender configuration files can be found under deploy/extender-configuration.
GAS Scheduler Extender needs to be registered with the Kubernetes Scheduler. To do this, a configmap should be created like the one below:
```
apiVersion: v1alpha1
kind: ConfigMap
metadata:
  name: scheduler-extender-policy
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [
        {
          "urlPrefix": "https://gpu-service.default.svc.cluster.local:9001",
          "apiVersion": "v1",
          "filterVerb": "scheduler/filter",
          "bindVerb": "scheduler/bind",
          "weight": 1,
          "enableHttps": true,
          "managedResources": [
            {
              "name": "gpu.intel.com/i915",
              "ignoredByScheduler": false
            }
          ],
          "ignorable": true,
          "nodeCacheCapable": true,
          "tlsConfig": {
            "insecure": false,
            "certFile": "/host/certs/client.crt",
            "keyFile": "/host/certs/client.key"
          }
        }
      ]
    }
```

A similar file can be found [in the deploy folder](./deploy/extender-configuration/scheduler-extender-configmap.yaml). This configmap can be created with ``kubectl apply -f ./deploy/scheduler-extender-configmap.yaml``.
The scheduler requires flags passed to it in order to know the location of this config map. The flags are:
```
- --policy-configmap=scheduler-extender-policy
- --policy-configmap-namespace=kube-system
```

If the scheduler is running as a service, these can be added as flags to the binary. If the scheduler is running as a container (as in kubeadm), these args can be passed in the deployment file.
Note: for kubeadm setups some additional steps may be needed:
1) Add the ability to get configmaps to the kubeadm scheduler config map. (A cluster role binding for this is at deploy/extender-configuration/configmap-getter.yaml.)
2) Add ``dnsPolicy: ClusterFirstWithHostNet`` in order to access the scheduler extender by service name.

After these steps the scheduler extender should be registered with the Kubernetes Scheduler.
Replaced with:

You should follow the extender configuration instructions from [Telemetry Aware Scheduling](../telemetry-aware-scheduling/README.md#Extender-configuration) and adapt them to use the GPU Aware Scheduling configurations, which can be found in the [deploy/extender-configuration](deploy/extender-configuration) folder.
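For orientation, a minimal sketch of what the adapted steps might look like on a kubeadm control-plane node. The target path, the `scheduler-config.yaml` file name (standing in for whichever configuration file from deploy/extender-configuration you pick), and the `v1beta2` API version used to replace the `XVERSIONX` placeholder are assumptions, not part of this commit:
```
# Assumed file names, paths and API version; adapt to your cluster.
# XVERSIONX in the provided files stands for the KubeSchedulerConfiguration
# API version, e.g. v1beta2 on Kubernetes 1.22+.
sed 's/XVERSIONX/v1beta2/' scheduler-config.yaml > /etc/kubernetes/scheduler-config.yaml

# kube-scheduler is then pointed at the configuration, e.g. by adding
# this flag to its static pod manifest:
#   --config=/etc/kubernetes/scheduler-config.yaml
```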

#### Deploy GAS

GPU Aware Scheduling uses go modules. It requires Go 1.13+ with modules enabled in order to build. GAS has been tested with Kubernetes 1.15+.

gpu-aware-scheduling/deploy/extender-configuration/configure-scheduler.sh (-41)

This file was deleted.
New file (+33 lines): KubeSchedulerConfiguration registering both the TAS and GAS scheduler extenders

@@ -0,0 +1,33 @@
```
apiVersion: kubescheduler.config.k8s.io/XVERSIONX
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  - urlPrefix: "https://tas-service.default.svc.cluster.local:9001"
    prioritizeVerb: "scheduler/prioritize"
    filterVerb: "scheduler/filter"
    weight: 1
    enableHTTPS: true
    managedResources:
      - name: "telemetry/scheduling"
        ignoredByScheduler: true
    ignorable: true
    tlsConfig:
      insecure: false
      certFile: "/host/certs/client.crt"
      keyFile: "/host/certs/client.key"
  - urlPrefix: "https://gas-service.default.svc.cluster.local:9001"
    filterVerb: "scheduler/filter"
    bindVerb: "scheduler/bind"
    weight: 1
    enableHTTPS: true
    managedResources:
      - name: "gpu.intel.com/i915"
        ignoredByScheduler: false
    ignorable: true
    tlsConfig:
      insecure: false
      certFile: "/host/certs/client.crt"
      keyFile: "/host/certs/client.key"
    nodeCacheCapable: true
```
New file (+20 lines): KubeSchedulerConfiguration registering the GAS scheduler extender only

@@ -0,0 +1,20 @@
```
apiVersion: kubescheduler.config.k8s.io/XVERSIONX
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  - urlPrefix: "https://gas-service.default.svc.cluster.local:9001"
    filterVerb: "scheduler/filter"
    bindVerb: "scheduler/bind"
    weight: 1
    enableHTTPS: true
    managedResources:
      - name: "gpu.intel.com/i915"
        ignoredByScheduler: false
    ignorable: true
    tlsConfig:
      insecure: false
      certFile: "/host/certs/client.crt"
      keyFile: "/host/certs/client.key"
    nodeCacheCapable: true
```

gpu-aware-scheduling/docs/usage.md (+46)

@@ -65,6 +65,52 @@ Your PODs then, needs to ask for some GPU-resources. Like this:

A complete example pod yaml is located in [docs/example](./example)
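For orientation, a minimal sketch of such a pod spec; the pod name, container name, image and command are placeholders, and `gpu.intel.com/i915` is the resource name managed by GAS elsewhere in this commit:
```
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example            # placeholder name
spec:
  containers:
    - name: workload           # placeholder container
      image: busybox           # placeholder image
      command: ["sleep", "3600"]
      resources:
        limits:
          gpu.intel.com/i915: 1
```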

Added:

## Node Label support

GAS supports certain node labels as a means to allow telemetry-based GPU selection decisions and descheduling of PODs using a certain GPU. You can create node labels with the [Telemetry Aware Scheduling](../../telemetry-aware-scheduling/README.md) labeling strategy, which puts them in its own namespace. In practice the supported labels need to be in the `telemetry.aware.scheduling.POLICYNAME/` namespace, where POLICYNAME may be anything.

The node label `gas-deschedule-pods-GPUNAME`, where GPUNAME can be e.g. card0, card1, card2... corresponding to the GPU names under /dev/dri, will result in GAS labeling the PODs which use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. You may then use a Kubernetes descheduler to pick those pods for descheduling. So TAS labels the node, GAS finds the node label and labels the PODs, and the descheduler can be configured to deschedule the pods based on pod labels.

The node label `gas-disable-GPUNAME`, where GPUNAME can be e.g. card0, card1, card2... corresponding to the GPU names under /dev/dri, will result in GAS stopping the use of the named GPU for new allocations.

The node label `gas-prefer-gpu=GPUNAME`, where GPUNAME can be e.g. card0, card1, card2... corresponding to the GPU names under /dev/dri, will result in GAS trying to use the named GPU for new allocations before other GPUs of the same node.

Note that the value of the labels starting with gas-deschedule-pods-GPUNAME and gas-disable-GPUNAME doesn't matter. You may use e.g. "true" as the value, as shown in the sketch below. The only exception to the rule is `PCI_GROUP`, which has a special meaning, explained separately. Example: `gas-disable-card0=PCI_GROUP`.
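As a hedged illustration of these labels, assuming a hypothetical node `node1` and TAS policy name `gaspolicy` (in practice TAS itself applies such labels according to its labeling strategy rules):
```
kubectl label node node1 telemetry.aware.scheduling.gaspolicy/gas-disable-card0=true
kubectl label node node1 telemetry.aware.scheduling.gaspolicy/gas-prefer-gpu=card1
kubectl label node node1 telemetry.aware.scheduling.gaspolicy/gas-deschedule-pods-card2=true
```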
### PCI Groups

If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP`, where GPUNAME can be e.g. card0, card1, card2... corresponding to the GPU names under /dev/dri, the disabling will impact a group of GPUs which is defined in the node label `gpu.intel.com/pci-groups`. The syntax of the PCI group node label is easiest to explain with an example: `gpu.intel.com/pci-groups=0.1_2.3.4` indicates there are two PCI groups in the node, separated by an underscore, in which card0 and card1 form the first group, and card2, card3 and card4 form the second group. If GAS found the node label `gas-disable-card3=PCI_GROUP` on a node with the previous example PCI group label, GAS would stop using card2, card3 and card4 for new allocations, as card3 belongs to that group.

`gas-deschedule-pods-GPUNAME` supports the PCI_GROUP value similarly: the whole group to which the named GPU belongs will end up descheduled.

The PCI group feature allows, for example, a telemetry action to operate on all GPUs which share the same physical card.
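To make this concrete with the same hypothetical node and policy names, the group label from the example above combined with a disable label would take card2, card3 and card4 out of use for new allocations:
```
kubectl label node node1 gpu.intel.com/pci-groups=0.1_2.3.4
kubectl label node node1 telemetry.aware.scheduling.gaspolicy/gas-disable-card3=PCI_GROUP
```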
## Allowlist and Denylist

You can use POD annotations in your POD templates to list the GPU names which you allow or deny for your deployment. The values of the annotations are comma-separated value lists of the form "card0,card1,card2", and the names of the annotations are:
