Describe the bug
Attaching NVLink interfaces to a Carbide instance, and detaching them from it, takes too long, and some attached interfaces never reach the "Ready" status.
I detached one partition from the instance at :22:
{
"id": "2c149132-24cc-47df-82cc-78bd2d214d4b",
"instanceId": "<id>",
"nvLinklogicalPartitionId": "a24a6d68-ae9b-4cfc-83af-69739e28288f",
"nvLinkDomainId": "<id>",
"deviceInstance": 3,
"gpuGuid": "<guid>",
"status": "Deleting",
"created": "2026-02-25T13:14:40.621245Z",
"updated": "2026-02-25T13:22:10.044117Z"
}
It was deleted about 4.5 minutes later (and, in my case, replaced by an attachment to a new partition):
{
"id": "536302b7-a600-4efc-bd21-395c75c04c7e",
"instanceId": "<id>",
"nvLinklogicalPartitionId": "41aa70d0-faf5-4166-81be-b929142eeb78",
"nvLinkDomainId": null,
"deviceInstance": 3,
"gpuGuid": null,
"status": "Pending",
"created": "2026-02-25T13:26:42.843422Z",
"updated": "2026-02-25T13:26:42.843422Z"
}
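The "about 4.5 minutes" figure can be checked from the timestamps in the two JSON objects above. This is a minimal sketch, assuming that the "updated" timestamp of the Deleting interface marks the detach request and that the "created" timestamp of the replacement interface is a reasonable proxy for when the old one was gone. The helper names `parse_ts` and `elapsed_seconds` are illustrative only and are not part of any Carbide tooling.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def elapsed_seconds(start: str, end: str) -> float:
    # Difference between two ISO 8601 timestamps, in seconds.
    return (parse_ts(end) - parse_ts(start)).total_seconds()


# Assumption: "updated" of the interface in "Deleting" marks the detach request.
detach_requested = "2026-02-25T13:22:10.044117Z"
# Assumption: "created" of the replacement interface approximates the deletion time.
replacement_created = "2026-02-25T13:26:42.843422Z"

gap = elapsed_seconds(detach_requested, replacement_created)
print(f"{gap:.1f} s (~{gap / 60:.1f} min)")
```

The same helper can be applied to the "created" and "updated" fields of any interface to see how long it has spent in its current state.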
The same applies to attaching an NVLink interface. The new partition attachment was created at :26. It is now :37 (11 minutes after creation) and the interface is still Pending:
{
"id": "536302b7-a600-4efc-bd21-395c75c04c7e",
"instanceId": "<id>",
"nvLinklogicalPartitionId": "41aa70d0-faf5-4166-81be-b929142eeb78",
"nvLinkDomainId": "<id>",
"deviceInstance": 3,
"gpuGuid": "<guid>",
"status": "Pending",
"created": "2026-02-25T13:26:42.843422Z",
"updated": "2026-02-25T13:32:13.550321Z"
}
Pending interfaces initially have a null gpuGuid. After 4-5 minutes they are updated with the correct GPU GUIDs, but they remain stuck in Pending.
The partition with that ID is itself Ready:
{
"id": "41aa70d0-faf5-4166-81be-b929142eeb78",
"name": "kubevirt-node-02-pt-0-1-2-3",
"description": "",
...
...
"nvLinkLogicalPartitionStats": null,
"status": "Ready",
"statusHistory": [
{
"status": "Ready",
"message": "NVLink Logical Partition is ready for use",
"created": "2026-02-25T13:22:09.137991Z",
"updated": "2026-02-25T13:22:09.137991Z"
},
{
"status": "Pending",
"message": "received NVLink Logical Partition creation request, pending",
"created": "2026-02-25T13:21:23.205157Z",
"updated": "2026-02-25T13:21:23.205157Z"
}
],
"created": "2026-02-25T13:21:23.191649Z",
"updated": "2026-02-25T13:22:09.132546Z"
}
]
Some NVLink partitions attach to instances after 4+ minutes without problems, while others get stuck in Pending. A simple example: I have this partition
"id": "a24a6d68-ae9b-4cfc-83af-69739e28288f",
"name": "kubevirt-node-02-default",
which was created 2 hours ago and attaches to and detaches from all of the instance's GPUs with no issue.
I then created another partition
"id": "41aa70d0-faf5-4166-81be-b929142eeb78",
"name": "kubevirt-node-02-pt-0-1-2-3",
which I attach to all GPUs of the same instance (after detaching the kubevirt-node-02-default partition from them), and the attachments are stuck in "Pending".
30 minutes after the interface attachment was created, it is still Pending:
{
"id": "536302b7-a600-4efc-bd21-395c75c04c7e",
"instanceId": "<id>",
"nvLinklogicalPartitionId": "41aa70d0-faf5-4166-81be-b929142eeb78",
"nvLinkDomainId": "<id>",
"deviceInstance": 3,
"gpuGuid": "<guid>",
"status": "Pending",
"created": "2026-02-25T13:26:42.843422Z",
"updated": "2026-02-25T13:32:13.550321Z"
}
]
$ date
Wed Feb 25 06:06:07 AM PST 2026 (i.e. 14:06:07 UTC)
Interestingly, kubevirt-node-02-default attachments become Ready after 5 minutes, but kubevirt-node-02-pt-0-1-2-3 attachments for the same GPUs stay stuck in Pending.
Steps/Code to reproduce bug
I repeated the following scenario 5-6 times with the same result each time:
- Before the workload, all GPUs (deviceInstance 0-3) of the instance are attached to the partition "kubevirt-node-02-default", that is, I attach all deviceInstances to this partition. All attachments (NVLink interfaces) become Ready 5-6 minutes after the PATCH API call.
- I create a new partition and always name it "kubevirt-node-02-pt-0-1-2-3". It becomes Ready 4-5 minutes after the POST API call.
- I delete all attachments of the "kubevirt-node-02-default" partition from the instance. The attachments are deleted and the instance's NVLink interface list is empty 5-6 minutes after the PATCH API call.
- I attach all GPUs (deviceInstance 0-3) of the instance to the new partition "kubevirt-node-02-pt-0-1-2-3". All attachments stay stuck in the Pending state indefinitely (the longest I waited was ~40 minutes).
- I delete all attachments of "kubevirt-node-02-pt-0-1-2-3" from the instance. After 5-6 minutes, all attachments are deleted and the instance's NVLink interface list is empty.
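The timings in the steps above could be measured with a small polling helper. This is only a sketch under stated assumptions: `fetch_interfaces` is a hypothetical callable that the reader supplies to return the instance's current NVLink interface objects (the JSON shown earlier); no real Carbide client or endpoint is assumed. The timeout and poll interval are arbitrary defaults.

```python
import time
from typing import Callable, Iterable, Mapping, Optional


def wait_until_ready(
    fetch_interfaces: Callable[[], Iterable[Mapping[str, object]]],
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the seconds elapsed until every interface reports
    status "Ready", or None if the timeout is reached first."""
    start = clock()
    while True:
        # fetch_interfaces is hypothetical: plug in the actual API call here.
        interfaces = list(fetch_interfaces())
        # An empty list is not treated as ready: nothing is attached yet.
        if interfaces and all(i.get("status") == "Ready" for i in interfaces):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

Calling it right after the PATCH that attaches the GPUs, for each of the two partitions, would give comparable numbers: a float for the attachments that become Ready and None for the ones that stay Pending past the timeout.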
Expected behavior
Both partitions can be attached and detached, and their interfaces become "Ready". Moving to "Ready" or to the deleted state takes less than 4-5 minutes.