DRA API: bump maximum size of ReservedFor to 256
The original limit of 32 seemed sufficient for a single GPU on a node. But for
shared non-local resources it is too low. For example, a ResourceClaim might be
used to allocate an interconnect channel that connects all pods of a workload
running on several different nodes, in which case the number of pods can be
considerably larger.

256 is high enough for currently planned systems. If we need something even
higher in the future, an alternative approach might be needed to avoid
scalability problems.

Normally, increasing such a limit would have to be done incrementally over two
releases. In this case we decided on
Slack (https://kubernetes.slack.com/archives/CJUQN3E4T/p1734593174791519) to
make an exception and apply this change to current master for 1.33 and backport
it to the next 1.32.x patch release for production usage.

This breaks downgrades to a 1.32 release without this change if any
ResourceClaim has more than 32 consumers in ReservedFor. In practice, this
breakage is very unlikely: no workloads need that many consumers yet, and
downgrades to a previous patch release are also unlikely. Downgrades to 1.31
were already unsupported when using DRA v1beta1.
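
The downgrade concern above can be checked ahead of time. The following is a minimal sketch, not code from this commit: `claim` is a hypothetical, simplified stand-in for a ResourceClaim that records only a name and the number of entries in status.reservedFor, and `overLegacyLimit` is an illustrative helper name. It lists the claims that a 1.32 release without this change would reject.

```go
package main

import "fmt"

// legacyReservedForMaxSize is the limit that applied before this change.
const legacyReservedForMaxSize = 32

// claim is a hypothetical, simplified stand-in for a ResourceClaim.
// It is not the real API type.
type claim struct {
	Name        string
	ReservedFor int // number of entries in status.reservedFor
}

// overLegacyLimit returns the names of all claims whose ReservedFor count
// exceeds the old limit of 32. A claim with exactly 32 entries is still fine.
func overLegacyLimit(claims []claim) []string {
	var names []string
	for _, c := range claims {
		if c.ReservedFor > legacyReservedForMaxSize {
			names = append(names, c.Name)
		}
	}
	return names
}

func main() {
	claims := []claim{
		{Name: "gpu-claim", ReservedFor: 1},
		{Name: "interconnect-claim", ReservedFor: 100},
	}
	// Only claims above the old limit block a downgrade.
	fmt.Println(overLegacyLimit(claims))
}
```

An empty result means no existing claim depends on the higher limit, so the downgrade scenario described above would not be affected.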
pohly committed Jan 9, 2025
1 parent 117a48f commit a5de754
Showing 11 changed files with 19 additions and 19 deletions.
4 changes: 2 additions & 2 deletions api/openapi-spec/swagger.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions pkg/apis/resource/types.go
@@ -689,7 +689,7 @@ type ResourceClaimStatus struct {
 // which issued it knows that it must put the pod back into the queue,
 // waiting for the ResourceClaim to become usable again.
 //
-// There can be at most 32 such reservations. This may get increased in
+// There can be at most 256 such reservations. This may get increased in
 // the future, but not reduced.
 //
 // +optional
@@ -717,9 +717,9 @@ type ResourceClaimStatus struct {
 Devices []AllocatedDeviceStatus
 }

-// ReservedForMaxSize is the maximum number of entries in
+// ResourceClaimReservedForMaxSize is the maximum number of entries in
 // claim.status.reservedFor.
-const ResourceClaimReservedForMaxSize = 32
+const ResourceClaimReservedForMaxSize = 256

 // ResourceClaimConsumerReference contains enough information to let you
 // locate the consumer of a ResourceClaim. The user must be a resource in the same
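
To illustrate what the constant controls, here is a minimal sketch of a size check against `ResourceClaimReservedForMaxSize`. It is an assumption about the general shape of such a check, not the validation code Kubernetes actually uses (that code is not part of this commit). `consumerReference` is a simplified, hypothetical stand-in for `ResourceClaimConsumerReference`, and `validateReservedFor` is an illustrative name.

```go
package main

import "fmt"

// ResourceClaimReservedForMaxSize mirrors the new value from this commit.
const ResourceClaimReservedForMaxSize = 256

// consumerReference is a simplified, hypothetical stand-in for
// ResourceClaimConsumerReference. It is not the real API type.
type consumerReference struct {
	Resource string
	Name     string
	UID      string
}

// validateReservedFor rejects lists that exceed the maximum size.
// Lists with exactly ResourceClaimReservedForMaxSize entries are accepted.
func validateReservedFor(reservedFor []consumerReference) error {
	if len(reservedFor) > ResourceClaimReservedForMaxSize {
		return fmt.Errorf("status.reservedFor: too many entries: have %d, at most %d allowed",
			len(reservedFor), ResourceClaimReservedForMaxSize)
	}
	return nil
}

func main() {
	// 100 consumers would have failed under the old limit of 32.
	consumers := make([]consumerReference, 100)
	fmt.Println(validateReservedFor(consumers) == nil)

	// One more than the new limit is still rejected.
	tooMany := make([]consumerReference, ResourceClaimReservedForMaxSize+1)
	fmt.Println(validateReservedFor(tooMany))
}
```

Because the limit is a single constant, raising it only relaxes the check. That is why the comment in the diff promises the value may be increased but not reduced.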
4 changes: 2 additions & 2 deletions pkg/generated/openapi/zz_generated.openapi.go

2 changes: 1 addition & 1 deletion staging/src/k8s.io/api/resource/v1alpha3/generated.proto

6 changes: 3 additions & 3 deletions staging/src/k8s.io/api/resource/v1alpha3/types.go
@@ -687,7 +687,7 @@ type ResourceClaimStatus struct {
 // which issued it knows that it must put the pod back into the queue,
 // waiting for the ResourceClaim to become usable again.
 //
-// There can be at most 32 such reservations. This may get increased in
+// There can be at most 256 such reservations. This may get increased in
 // the future, but not reduced.
 //
 // +optional
@@ -715,9 +715,9 @@ type ResourceClaimStatus struct {
 Devices []AllocatedDeviceStatus `json:"devices,omitempty" protobuf:"bytes,4,opt,name=devices"`
 }

-// ReservedForMaxSize is the maximum number of entries in
+// ResourceClaimReservedForMaxSize is the maximum number of entries in
 // claim.status.reservedFor.
-const ResourceClaimReservedForMaxSize = 32
+const ResourceClaimReservedForMaxSize = 256

 // ResourceClaimConsumerReference contains enough information to let you
 // locate the consumer of a ResourceClaim. The user must be a resource in the same

2 changes: 1 addition & 1 deletion staging/src/k8s.io/api/resource/v1beta1/generated.proto

6 changes: 3 additions & 3 deletions staging/src/k8s.io/api/resource/v1beta1/types.go
@@ -695,7 +695,7 @@ type ResourceClaimStatus struct {
 // which issued it knows that it must put the pod back into the queue,
 // waiting for the ResourceClaim to become usable again.
 //
-// There can be at most 32 such reservations. This may get increased in
+// There can be at most 256 such reservations. This may get increased in
 // the future, but not reduced.
 //
 // +optional
@@ -723,9 +723,9 @@ type ResourceClaimStatus struct {
 Devices []AllocatedDeviceStatus `json:"devices,omitempty" protobuf:"bytes,4,opt,name=devices"`
 }

-// ReservedForMaxSize is the maximum number of entries in
+// ResourceClaimReservedForMaxSize is the maximum number of entries in
 // claim.status.reservedFor.
-const ResourceClaimReservedForMaxSize = 32
+const ResourceClaimReservedForMaxSize = 256

 // ResourceClaimConsumerReference contains enough information to let you
 // locate the consumer of a ResourceClaim. The user must be a resource in the same