
Commit 8bfa57c

tkatila and eero-t committed
operator: add gpu plugin's by-path option
Signed-off-by: Tuomas Katila <[email protected]>
Co-authored-by: Eero Tamminen <[email protected]>
1 parent 2ffbe4b commit 8bfa57c

5 files changed: +23 -3 lines changed

cmd/gpu_plugin/README.md

Lines changed: 2 additions & 2 deletions
@@ -275,8 +275,8 @@ To support possible all use cases, GPU plugin allows changing the by-path mounti
 * `single` - Symlinks are individually mounted per device. Default.
   * Mostly Works, but is known to have issues with some pytorch workloads. See [issue](https://github.com/intel/intel-device-plugins-for-kubernetes/issues/2158).
 * `none` - No symlinks are mounted.
-  * Aligned with docker use where devices are included with privileged mode.
-* `all` - All symlinks are mounted even if only one is allocated by the container.
+  * Aligned with Docker `privileged` mode devices usage.
+* `all` - Mounts whole DRM `by-path` directory. Pro: symlink file types are preserved. Con: symlinks are present for all devices.
   * Optimal for scale-up workloads where all the GPUs are used by the workload.

 ### Issues with media workloads on multi-GPU setups

cmd/gpu_plugin/gpu_plugin.go

Lines changed: 1 addition & 1 deletion
@@ -811,7 +811,7 @@ func main() {
 	flag.StringVar(&prefix, "prefix", "", "Prefix for devfs & sysfs paths")
 	flag.BoolVar(&opts.enableMonitoring, "enable-monitoring", false, "whether to enable '*_monitoring' (= all GPUs) resource")
 	flag.BoolVar(&opts.healthManagement, "health-management", false, "enable GPU health management")
-	flag.StringVar(&opts.bypathMount, "bypath", bypathOptionSingle, "bypath mounting options: single, none, all. Default: single")
+	flag.StringVar(&opts.bypathMount, "bypath", bypathOptionSingle, "DRI device 'by-path/' directory mounting options: single, none, all. Default: single")
 	flag.BoolVar(&opts.wslScan, "wsl", false, "scan for / use WSL devices")
 	flag.IntVar(&opts.sharedDevNum, "shared-dev-num", 1, "number of containers sharing the same GPU device")
 	flag.IntVar(&opts.globalTempLimit, "temp-limit", 100, "Global temperature limit at which device is marked unhealthy")
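For readers unfamiliar with the flag, the following is a minimal sketch of how a `-bypath` option with a `single` default could be parsed and checked against the three documented modes. The function `parseBypath`, the constants, and the validation step are assumptions made for illustration; the diff only shows the flag registration, not how the plugin validates the value.

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical constants; the diff above only shows bypathOptionSingle.
const (
	bypathSingle = "single"
	bypathNone   = "none"
	bypathAll    = "all"
)

// parseBypath registers a "bypath" flag on a private FlagSet, parses argv,
// and rejects any value outside the three documented modes.
func parseBypath(argv []string) (string, error) {
	fs := flag.NewFlagSet("gpu_plugin", flag.ContinueOnError)
	mode := fs.String("bypath", bypathSingle,
		"DRI device 'by-path/' directory mounting options: single, none, all. Default: single")

	if err := fs.Parse(argv); err != nil {
		return "", err
	}

	switch *mode {
	case bypathSingle, bypathNone, bypathAll:
		return *mode, nil
	default:
		return "", fmt.Errorf("invalid -bypath value %q", *mode)
	}
}

func main() {
	mode, err := parseBypath([]string{"-bypath", "all"})
	fmt.Println(mode, err)
}
```

Using a separate `flag.FlagSet` keeps the sketch testable without touching the process-wide command line.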

deployments/operator/crd/bases/deviceplugin.intel.com_gpudeviceplugins.yaml

Lines changed: 10 additions & 0 deletions
@@ -62,6 +62,16 @@ spec:
     The list can contain IDs in the form of '0x1234,0x49a4,0x50b4'.
     Cannot be used together with DenyIDs.
   type: string
+bypathMode:
+  description: |-
+    ByPathMode changes how plugin handles the DRM by-path/ directory mounting for GPU devices.
+    See GPU plugin documentation for detailed description of the modes.
+    If left empty, it defaults to 'single'.
+  enum:
+  - none
+  - single
+  - all
+  type: string
 denyIDs:
   description: |-
     DenyIDs is a comma-separated list of PCI IDs of GPU devices that should only be denied by the plugin.

pkg/apis/deviceplugin/v1/gpudeviceplugin_types.go

Lines changed: 6 additions & 0 deletions
@@ -51,6 +51,12 @@ type GpuDevicePluginSpec struct {
 	// +kubebuilder:validation:Enum=balanced;packed;none
 	PreferredAllocationPolicy string `json:"preferredAllocationPolicy,omitempty"`
 
+	// ByPathMode changes how plugin handles the DRM by-path/-dir mounting for GPU devices.
+	// See GPU plugin documentation for detailed description of the modes.
+	// If left empty, it defaults to 'single'.
+	// +kubebuilder:validation:Enum=none;single;all
+	ByPathMode string `json:"bypathMode,omitempty"`
+
 	// Specialized nodes (e.g., with accelerators) can be Tainted to make sure unwanted pods are not scheduled on them. Tolerations can be set for the plugin pod to neutralize the Taint.
 	Tolerations []v1.Toleration `json:"tolerations,omitempty"`
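The `omitempty` tag on the new field means an unset `ByPathMode` is left out of the serialized spec entirely, which is what lets the "defaults to 'single'" behaviour apply downstream. A minimal sketch of that serialization follows; the `spec` type and `encode` helper are hypothetical stand-ins that copy only the JSON tag from the hunk above, not the real `GpuDevicePluginSpec`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spec is a hypothetical one-field stand-in for GpuDevicePluginSpec.
type spec struct {
	ByPathMode string `json:"bypathMode,omitempty"`
}

// encode marshals the spec to JSON and returns "" if marshalling fails.
func encode(s spec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(spec{}))                   // {}
	fmt.Println(encode(spec{ByPathMode: "none"})) // {"bypathMode":"none"}
}
```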

pkg/controllers/gpu/controller.go

Lines changed: 4 additions & 0 deletions
@@ -285,5 +285,9 @@ func getPodArgs(gdp *devicepluginv1.GpuDevicePlugin) []string {
 		args = append(args, "-deny-ids", gdp.Spec.DenyIDs)
 	}
 
+	if gdp.Spec.ByPathMode != "" {
+		args = append(args, "-bypath", gdp.Spec.ByPathMode)
+	}
+
 	return args
 }
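The controller change follows the same pattern as the neighbouring options: the argument is appended only when the spec field is non-empty, so an unset field leaves the plugin on its own default. A minimal sketch of that behaviour is below; `gpuSpec` and `podArgs` are hypothetical, trimmed-down stand-ins for the real `GpuDevicePlugin` type and `getPodArgs` function, which handle more fields than shown here.

```go
package main

import "fmt"

// gpuSpec is a hypothetical, trimmed-down stand-in for GpuDevicePluginSpec.
type gpuSpec struct {
	DenyIDs    string
	ByPathMode string
}

// podArgs mirrors the pattern in the hunk above: an option is passed to the
// plugin only when the corresponding spec field is non-empty.
func podArgs(s gpuSpec) []string {
	args := []string{}

	if s.DenyIDs != "" {
		args = append(args, "-deny-ids", s.DenyIDs)
	}

	if s.ByPathMode != "" {
		args = append(args, "-bypath", s.ByPathMode)
	}

	return args
}

func main() {
	fmt.Println(podArgs(gpuSpec{}))                  // []
	fmt.Println(podArgs(gpuSpec{ByPathMode: "all"})) // [-bypath all]
}
```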
