This repository was archived by the owner on Jan 29, 2025. It is now read-only.
Documentation for the new features.
This also removes the configure-scheduler script duplication and simply suggests using the instructions from TAS instead. The script there has been updated for k8s 1.22+.
Signed-off-by: Ukri Niemimuukko <[email protected]>
gpu-aware-scheduling/README.md (+3 −55)
@@ -32,61 +32,9 @@ A worked example for GAS is available [here](docs/usage.md)
The deploy folder has all of the yaml files necessary to get GPU Aware Scheduling running in a Kubernetes cluster. Some additional steps are required to configure the generic scheduler.
#### Extender configuration
-Note: a shell script that shows these steps can be found [here](deploy/extender-configuration). This script should be seen as a guide only, and will not work on most Kubernetes installations.
-
-The extender configuration files can be found under deploy/extender-configuration.
-
-GAS Scheduler Extender needs to be registered with the Kubernetes Scheduler. In order to do this a configmap should be created like the below:
-A similar file can be found [in the deploy folder](./deploy/extender-configuration/scheduler-extender-configmap.yaml). This configmap can be created with ``kubectl apply -f ./deploy/scheduler-extender-configmap.yaml``
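For orientation, the configmap referenced above follows the legacy kube-scheduler Policy format. The sketch below is illustrative only; the service URL, port and managed resource name are assumptions, and the authoritative content is the file in deploy/extender-configuration/scheduler-extender-configmap.yaml.

```bash
# Sketch of the legacy policy configmap (the service URL, port and resource
# name are assumptions). Prefer the file shipped in deploy/extender-configuration.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-extender-policy
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [
        {
          "urlPrefix": "https://gas-service.default.svc.cluster.local:9001",
          "filterVerb": "scheduler/filter",
          "prioritizeVerb": "scheduler/prioritize",
          "weight": 1,
          "enableHttps": true,
          "managedResources": [
            { "name": "gpu.intel.com/i915", "ignoredByScheduler": false }
          ],
          "ignorable": true
        }
      ]
    }
EOF
```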
-
-The scheduler requires flags passed to it in order to know the location of this config map. The flags are:
-
-```
-- --policy-configmap=scheduler-extender-policy
-- --policy-configmap-namespace=kube-system
-```
-
-If the scheduler is running as a service these can be added as flags to the binary. If the scheduler is running as a container - as in kubeadm - these args can be passed in the deployment file.
-
-Note: For kubeadm setups some additional steps may be needed.
-
-1) Add the ability to get configmaps to the kubeadm scheduler config map. (A cluster role binding for this is at deploy/extender-configuration/configmap-getter.yaml)
-2) Add the ``dnsPolicy: ClusterFirstWithHostNet`` in order to access the scheduler extender by service name.
-
-After these steps the scheduler extender should be registered with the Kubernetes Scheduler.
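The kubeadm case above amounts to editing the scheduler's static pod manifest. A rough sketch, assuming the usual kubeadm manifest path:

```bash
# Sketch only: the manifest path and surrounding fields are assumptions.
# Add the policy-configmap flags and the dnsPolicy change to the scheduler pod:
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
#   spec:
#     dnsPolicy: ClusterFirstWithHostNet
#     containers:
#     - command:
#       - kube-scheduler
#       - --policy-configmap=scheduler-extender-policy
#       - --policy-configmap-namespace=kube-system
# Allow the scheduler to read configmaps (cluster role binding from this repo):
kubectl apply -f deploy/extender-configuration/configmap-getter.yaml
```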
+You should follow the extender configuration instructions from [Telemetry Aware Scheduling](../telemetry-aware-scheduling/README.md#Extender-configuration) and adapt them to use the GPU Aware Scheduling configurations, which can be found in the [deploy/extender-configuration](deploy/extender-configuration) folder.
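On Kubernetes 1.22 and newer, the TAS instructions register the extender through a KubeSchedulerConfiguration file rather than a policy configmap. The following is a minimal sketch under that assumption; the file path, service URL and resource name are placeholders, and the TAS README plus the files under deploy/extender-configuration are authoritative.

```bash
# Sketch only: write a KubeSchedulerConfiguration that declares the GAS extender
# (path, URL and resource name are illustrative assumptions), then start
# kube-scheduler with --config pointing at it.
cat <<'EOF' | sudo tee /etc/kubernetes/scheduler-config.yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  - urlPrefix: "https://gas-service.default.svc.cluster.local:9001"
    filterVerb: "scheduler/filter"
    prioritizeVerb: "scheduler/prioritize"
    weight: 1
    enableHTTPS: true
    managedResources:
      - name: "gpu.intel.com/i915"
        ignoredByScheduler: false
    ignorable: true
EOF
# kube-scheduler is then started with --config=/etc/kubernetes/scheduler-config.yaml
```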
#### Deploy GAS
GPU Aware Scheduling uses go modules. It requires Go 1.13+ with modules enabled in order to build. GAS has been tested with Kubernetes 1.15+.
+which puts them in its own namespace. In practice the supported labels need to be in the `telemetry.aware.scheduling.POLICYNAME/` namespace, where the POLICYNAME may be anything.
+
+The node label `gas-deschedule-pods-GPUNAME`, where the GPUNAME can be e.g. card0, card1, card2... and corresponds to the gpu names under /dev/dri, will result in GAS labeling the PODs which use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. You may then use a Kubernetes descheduler to pick those pods for descheduling. So TAS labels the node, and based on the node label GAS finds and labels the PODs. The descheduler can then be configured to deschedule the pods based on pod labels.
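A minimal sketch of that flow (the node name is hypothetical):

```bash
# Request descheduling of pods that use card0 on node "worker-1"
# (in practice TAS sets this label based on a telemetry policy).
kubectl label node worker-1 gas-deschedule-pods-card0=true
# GAS then labels the affected pods; a descheduler can select them by this label.
kubectl get pods --all-namespaces -l gpu.aware.scheduling/deschedule-pod=gpu
```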
+
+The node label `gas-disable-GPUNAME`, where the GPUNAME can be e.g. card0, card1, card2... and corresponds to the gpu names under /dev/dri, will result in GAS stopping the use of the named GPU for new allocations.
+
+The node label `gas-prefer-gpu=GPUNAME`, where the GPUNAME can be e.g. card0, card1, card2... and corresponds to the gpu names under /dev/dri, will result in GAS trying to use the named GPU for new allocations before other GPUs of the same node.
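For example (node and card names are hypothetical; the gas-disable value is arbitrary):

```bash
# Stop using card1 of node "worker-1" for new allocations.
kubectl label node worker-1 gas-disable-card1=true
# Prefer card2 of the same node for new allocations.
kubectl label node worker-1 gas-prefer-gpu=card2
```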
+
+Note that the value of the labels starting with gas-deschedule-pods-GPUNAME and gas-disable-GPUNAME doesn't matter. You may use e.g. "true" as the value. The only exception to the rule is `PCI_GROUP`, which has a special meaning, explained separately. Example: `gas-disable-card0=PCI_GROUP`.
+
+### PCI Groups
+
+If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP`, where the GPUNAME can be e.g. card0, card1, card2... and corresponds to the gpu names under /dev/dri, the disabling will impact a group of GPUs which is defined in the node label `gpu.intel.com/pci-groups`. The syntax of the pci group node label is easiest to explain with an example: `gpu.intel.com/pci-groups=0.1_2.3.4` would indicate there are two pci-groups in the node, separated with an underscore, in which card0 and card1 form the first group, and card2, card3 and card4 form the second group. If GAS found the node label `gas-disable-card3=PCI_GROUP` on a node with the previous example PCI-group label, it would stop using card2, card3 and card4 for new allocations, as card3 belongs to that group.
+
+`gas-deschedule-pods-GPUNAME` supports the PCI_GROUP value similarly: the whole group in which the named gpu belongs will end up descheduled.
+
+The PCI group feature allows for e.g. having a telemetry action to operate on all GPUs which share the same physical card.
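A sketch of the example above, with a hypothetical node name:

```bash
# The node advertises two PCI groups: {card0, card1} and {card2, card3, card4}.
kubectl label node worker-1 gpu.intel.com/pci-groups=0.1_2.3.4
# Disabling card3 with the PCI_GROUP value then disables the whole second group
# (card2, card3 and card4) for new allocations.
kubectl label node worker-1 gas-disable-card3=PCI_GROUP
```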
## Allowlist and Denylist
You can use POD-annotations in your POD-templates to list the GPU names which you allow or deny for your deployment. The values for the annotations are comma-separated value lists of the form "card0,card1,card2", and the names of the annotations are: