@@ -71,32 +71,30 @@ GAS supports certain node labels as a means to allow telemetry based GPU selecti
descheduling of PODs using a certain GPU. You can create node labels with the
[Telemetry Aware Scheduling](../../telemetry-aware-scheduling/README.md) labeling strategy,
which puts them in its own namespace. In practice the supported labels need to be in the
- `telemetry.aware.scheduling.POLICYNAME/` namespace, where the POLICYNAME may be anything.
+ `telemetry.aware.scheduling.POLICYNAME/`[^1] namespace.

- The node label `gas-deschedule-pods-GPUNAME` where the GPUNAME can be e.g. card0, card1, card2...
- which corresponds to the gpu names under /dev/dri, will result in GAS labeling the PODs which
- use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. You may then
- use with a kubernetes descheduler to pick the pods for descheduling. So TAS labels the node, and
- based on the node label GAS finds and labels the PODs. Descheduler can be configured to
- deschedule the pods based on pod labels.
+ The node label `gas-deschedule-pods-GPUNAME`[^2] will result in GAS labeling the PODs which
+ use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. So TAS labels the node,
+ and based on the node label GAS finds and labels the PODs. You may then use a kubernetes descheduler
+ to pick the pods for descheduling via their labels.

- The node label `gas-disable-GPUNAME` where the GPUNAME can be e.g. card0, card1, card2... which
- corresponds to the gpu names under /dev/dri, will result in GAS stopping the use of the named
- GPU for new allocations.
+ The node label `gas-disable-GPUNAME`[^2] will result in GAS stopping the use of the named GPU for new
+ allocations.

- The node label `gas-prefer-gpu=GPUNAME` where the GPUNAME can be e.g. card0, card1, card2...
- which corresponds to the gpu names under /dev/dri, will result in GAS trying to use the named
+ The node label `gas-prefer-gpu=GPUNAME`[^2] will result in GAS trying to use the named
GPU for new allocations before other GPUs of the same node.

- Note that the value of the labels starting with gas-deschedule-pods-GPUNAME and
- gas-disable-GPUNAME doesn't matter. You may use e.g. "true" as the value. The only exception to
+ Note that the value of the labels starting with `gas-deschedule-pods-GPUNAME`[^2] and
+ `gas-disable-GPUNAME`[^2] doesn't matter. You may use e.g. "true" as the value. The only exception to
the rule is `PCI_GROUP` which has a special meaning, explained separately. Example:
`gas-disable-card0=PCI_GROUP`.

+ [^1]: POLICYNAME is defined by the name of the TASPolicy. It can vary.
+ [^2]: GPUNAME can be e.g. card0, card1, card2… which corresponds to the gpu names under `/dev/dri`.
+
### PCI Groups

- If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP` where the GPUNAME can be e.g. card0,
- card1, card2... which corresponds to the gpu names under /dev/dri, the disabling will impact a
+ If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP`[^2] the disabling will impact a
group of GPUs which is defined in the node label `gpu.intel.com/pci-groups`. The syntax of the
pci group node label is easiest to explain with an example: `gpu.intel.com/pci-groups=0.1_2.3.4`
would indicate there are two pci-groups in the node separated with an underscore, in which card0
@@ -105,7 +103,7 @@ find the node label `gas-disable-card3=PCI_GROUP` in a node with the previous ex
label, GAS would stop using card2, card3 and card4 for new allocations, as card3 belongs in that
group.

- `gas-deschedule-pods-GPUNAME` supports the PCI-GROUP value similarly, the whole group in which
+ `gas-deschedule-pods-GPUNAME`[^2] supports the PCI_GROUP value similarly, the whole group in which
the named gpu belongs, will end up descheduled.

The PCI group feature allows for e.g. having a telemetry action to operate on all GPUs which
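The label semantics described in this diff can be illustrated with a small Python sketch. This is not GAS code (GAS itself is not written in Python); it only models how the documented labels might be read. The function names, the policy name `mypolicy`, and the fallback used when a `PCI_GROUP` label names a GPU outside every group are all assumptions made for illustration.

```python
import re

# Labels are expected in the telemetry.aware.scheduling.POLICYNAME/ namespace.
TAS_PREFIX = re.compile(r"^telemetry\.aware\.scheduling\.[^/]+/(.+)$")
PCI_GROUPS_LABEL = "gpu.intel.com/pci-groups"


def parse_pci_groups(value):
    """Turn '0.1_2.3.4' into [{'card0', 'card1'}, {'card2', 'card3', 'card4'}]."""
    return [
        {f"card{n}" for n in group.split(".") if n}
        for group in value.split("_")
        if group
    ]


def expand(gpu, value, groups):
    """Return the GPUs a label affects.

    With the special value PCI_GROUP the whole group containing the GPU is
    affected. Falling back to the single GPU when no group matches is an
    assumption of this sketch, not documented behavior.
    """
    if value == "PCI_GROUP":
        for group in groups:
            if gpu in group:
                return set(group)
    return {gpu}


def interpret_labels(labels):
    """Collect disabled, to-be-descheduled and preferred GPUs from node labels."""
    groups = parse_pci_groups(labels.get(PCI_GROUPS_LABEL, ""))
    result = {"disabled": set(), "deschedule": set(), "preferred": None}
    for key, value in labels.items():
        match = TAS_PREFIX.match(key)
        if not match:
            continue  # not a TAS-namespaced label
        name = match.group(1)
        if name.startswith("gas-disable-"):
            gpu = name[len("gas-disable-"):]
            result["disabled"] |= expand(gpu, value, groups)
        elif name.startswith("gas-deschedule-pods-"):
            gpu = name[len("gas-deschedule-pods-"):]
            result["deschedule"] |= expand(gpu, value, groups)
        elif name == "gas-prefer-gpu":
            result["preferred"] = value  # here the value carries the GPU name
    return result


# Hypothetical node labels; "mypolicy" stands in for a TASPolicy name.
labels = {
    "gpu.intel.com/pci-groups": "0.1_2.3.4",
    "telemetry.aware.scheduling.mypolicy/gas-disable-card3": "PCI_GROUP",
    "telemetry.aware.scheduling.mypolicy/gas-deschedule-pods-card0": "true",
    "telemetry.aware.scheduling.mypolicy/gas-prefer-gpu": "card1",
}
state = interpret_labels(labels)
```

With these example labels, `state["disabled"]` holds card2, card3 and card4, matching the `gas-disable-card3=PCI_GROUP` case in the text, while card0 alone is marked for descheduling because its label value is not `PCI_GROUP`.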