
Commit 8529a6b

uniemimu authored and togashidm committed

Review fixes

Signed-off-by: Ukri Niemimuukko <[email protected]>

1 parent 6dede47 commit 8529a6b

3 files changed: +17 -19 lines changed

gpu-aware-scheduling/README.md (+1 -1)

@@ -37,7 +37,7 @@ You should follow extender configuration instructions from the
 use GPU Aware Scheduling configurations, which can be found in the [deploy/extender-configuration](deploy/extender-configuration) folder.
 
 #### Deploy GAS
-GPU Aware Scheduling uses go modules. It requires Go 1.13+ with modules enabled in order to build. GAS has been tested with Kubernetes 1.15+.
+GPU Aware Scheduling uses go modules. It requires Go 1.16 with modules enabled in order to build. GAS has been tested with Kubernetes 1.22.
 A yaml file for GAS is contained in the deploy folder along with its service and RBAC roles and permissions.
 
 **Note:** If run without the unsafe flag a secret called extender-secret will need to be created with the cert and key for the TLS endpoint.
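The deploy step the updated paragraph refers to might look like the sketch below. The secret name `extender-secret` comes from the note in the diff; the namespace, certificate paths, and build command are illustrative assumptions, not verbatim instructions from the repository.

```sh
# Build GAS with Go 1.16, modules enabled (command is illustrative).
cd gpu-aware-scheduling
go build ./...

# Create the TLS secret mentioned in the note above
# (namespace and cert/key paths are placeholders).
kubectl create secret tls extender-secret \
  --cert=/path/to/tls.crt --key=/path/to/tls.key \
  -n kube-system

# Apply the GAS deployment, service and RBAC manifests from the deploy folder.
kubectl apply -f deploy/
```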

gpu-aware-scheduling/docs/usage.md (+15 -17)

@@ -71,32 +71,30 @@ GAS supports certain node labels as a means to allow telemetry based GPU selecti
 descheduling of PODs using a certain GPU. You can create node labels with the
 [Telemetry Aware Scheduling](../../telemetry-aware-scheduling/README.md) labeling strategy,
 which puts them in its own namespace. In practice the supported labels need to be in the
-`telemetry.aware.scheduling.POLICYNAME/` namespace, where the POLICYNAME may be anything.
+`telemetry.aware.scheduling.POLICYNAME/`[^1] namespace.
 
-The node label `gas-deschedule-pods-GPUNAME` where the GPUNAME can be e.g. card0, card1, card2...
-which corresponds to the gpu names under /dev/dri, will result in GAS labeling the PODs which
-use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. You may then
-use with a kubernetes descheduler to pick the pods for descheduling. So TAS labels the node, and
-based on the node label GAS finds and labels the PODs. Descheduler can be configured to
-deschedule the pods based on pod labels.
+The node label `gas-deschedule-pods-GPUNAME`[^2] will result in GAS labeling the PODs which
+use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. So TAS labels the node,
+and based on the node label GAS finds and labels the PODs. You may then use a kubernetes descheduler
+to pick the pods for descheduling via their labels.
 
-The node label `gas-disable-GPUNAME` where the GPUNAME can be e.g. card0, card1, card2... which
-corresponds to the gpu names under /dev/dri, will result in GAS stopping the use of the named
-GPU for new allocations.
+The node label `gas-disable-GPUNAME`[^2] will result in GAS stopping the use of the named GPU for new
+allocations.
 
-The node label `gas-prefer-gpu=GPUNAME` where the GPUNAME can be e.g. card0, card1, card2...
-which corresponds to the gpu names under /dev/dri, will result in GAS trying to use the named
+The node label `gas-prefer-gpu=GPUNAME`[^2] will result in GAS trying to use the named
 GPU for new allocations before other GPUs of the same node.
 
-Note that the value of the labels starting with gas-deschedule-pods-GPUNAME and
-gas-disable-GPUNAME doesn't matter. You may use e.g. "true" as the value. The only exception to
+Note that the value of the labels starting with `gas-deschedule-pods-GPUNAME`[^2] and
+`gas-disable-GPUNAME`[^2] doesn't matter. You may use e.g. "true" as the value. The only exception to
 the rule is `PCI_GROUP` which has a special meaning, explained separately. Example:
 `gas-disable-card0=PCI_GROUP`.
 
+[^1]: POLICYNAME is defined by the name of the TASPolicy. It can vary.
+[^2]: GPUNAME can be e.g. card0, card1, card2… which corresponds to the gpu names under `/dev/dri`.
+
 ### PCI Groups
 
-If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP` where the GPUNAME can be e.g. card0,
-card1, card2... which corresponds to the gpu names under /dev/dri, the disabling will impact a
+If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP`[^2] the disabling will impact a
 group of GPUs which is defined in the node label `gpu.intel.com/pci-groups`. The syntax of the
 pci group node label is easiest to explain with an example: `gpu.intel.com/pci-groups=0.1_2.3.4`
 would indicate there are two pci-groups in the node separated with an underscore, in which card0
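To make the relabeled documentation above concrete, a hypothetical labeling session could look like this. The node name `node1` and policy name `example-policy` are invented; in normal operation TAS sets these labels from a TASPolicy labeling strategy rather than an operator typing them.

```sh
# Hypothetical names; TAS would normally manage these labels.
# Have GAS mark pods using card0 for descheduling (the label value is arbitrary):
kubectl label node node1 telemetry.aware.scheduling.example-policy/gas-deschedule-pods-card0=true

# Stop using card1 for new allocations:
kubectl label node node1 telemetry.aware.scheduling.example-policy/gas-disable-card1=true

# Prefer card2 for new allocations on this node:
kubectl label node node1 telemetry.aware.scheduling.example-policy/gas-prefer-gpu=card2
```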
@@ -105,7 +103,7 @@ find the node label `gas-disable-card3=PCI_GROUP` in a node with the previous ex
 label, GAS would stop using card2, card3 and card4 for new allocations, as card3 belongs in that
 group.
 
-`gas-deschedule-pods-GPUNAME` supports the PCI-GROUP value similarly, the whole group in which
+`gas-deschedule-pods-GPUNAME`[^2] supports the PCI_GROUP value similarly, the whole group in which
 the named gpu belongs, will end up descheduled.
 
 The PCI group feature allows for e.g. having a telemetry action to operate on all GPUs which
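Continuing the documented example (`gpu.intel.com/pci-groups=0.1_2.3.4`, with the same hypothetical node and policy names as before): card3 sits in the group `2.3.4`, so disabling it with the `PCI_GROUP` value takes card2, card3 and card4 out of use.

```sh
# Two PCI groups on the node: {card0, card1} and {card2, card3, card4}.
# The pci-groups label is normally set when the node's GPUs are grouped;
# it is applied manually here only for illustration.
kubectl label node node1 gpu.intel.com/pci-groups=0.1_2.3.4

# card3 is in the group 2.3.4, so GAS stops new allocations on card2, card3 and card4:
kubectl label node node1 telemetry.aware.scheduling.example-policy/gas-disable-card3=PCI_GROUP
```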

gpu-aware-scheduling/pkg/gpuscheduler/node_resource_cache.go (+1 -1)

@@ -308,7 +308,7 @@ func (c *Cache) checkPodResourceAdjustment(containerRequests []resourceMap,
 }
 
 // This must be called with rwmutex locked
-// set add=true to add, false to remove resources.
+// set adj=true to add, false to remove resources.
 func (c *Cache) adjustPodResources(pod *v1.Pod, adj bool, annotation, nodeName string) error {
 	// get slice of resource maps, one map per container
 	containerRequests := containerRequests(pod)
