bug: NICo does not restart DPU reprovisioning if a machine is stuck in the Failed/DpfProvisioning state #2834

Description

@abvarshney-nv

Version

0.2

Describe the bug.

During DPF testing, a host failed DPU reprovisioning. When we tried to restart the reprovisioning, nothing happened.
The cause appears to be https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/crates/machine-controller/src/handler.rs#L1883-1900, which allows a reprovisioning restart only from the failed states listed there.

Failed/DpfProvisioning should be added to that list.
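
A minimal sketch of the kind of change being requested, not the actual code in handler.rs. Every type, variant, and function name here (`MachineState`, `FailureCause`, `AlreadyAllowed`, `Other`, `can_restart_reprovisioning`) is hypothetical and used only for illustration. The real handler may be structured quite differently. The point is only that the check gating a restart would also need to accept the Failed/DpfProvisioning state.

```rust
// Hypothetical types for illustration only; the real definitions in
// crates/machine-controller/src/handler.rs may look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureCause {
    // Stand-in for the failure states the handler already accepts.
    AlreadyAllowed,
    // The state this issue asks to add.
    DpfProvisioning,
    // Stand-in for any failure state that should remain blocked.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MachineState {
    Ready,
    Failed(FailureCause),
}

/// Returns true when a reprovisioning restart request should be honored.
/// Assumption: restart is gated on an allow-list of failed states.
fn can_restart_reprovisioning(state: MachineState) -> bool {
    matches!(
        state,
        MachineState::Failed(FailureCause::AlreadyAllowed | FailureCause::DpfProvisioning)
    )
}
```

With a check shaped like this, a machine in `Failed(DpfProvisioning)` would be eligible for a restart, while states outside the allow-list would still be ignored.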

Minimum reproducible example

Relevant log output

Other/Misc.

No response

Metadata

Labels

bug: A defect in existing software (deprecated - use issue type, but it's needed for reporting now)

Projects

Status
Verify
