
bug: Nico error handling when a key is missing in Redfish #2886

Description

@desrod-nvidia

Version

v0.10.3-0-g4d11815e6

Describe the bug.

Multiple Dell GPU nodes in the HostPlatformConfiguration/PollingBiosSetup state got stuck and did not progress automatically. Manually restarting the BMC caused the missing keys to be populated, after which each node progressed to the next state.

If Nico could recognize missing Redfish keys and initiate a BMC restart, it would remove the need for manual intervention.
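One way to picture the proposed behavior is a small escalation policy: keep waiting while keys are missing, but after a number of consecutive failed polls, request a single BMC restart instead of waiting indefinitely. This is a minimal sketch, not Nico's implementation. `MissingKeyTracker`, `Action`, `reset_after_polls`, and `max_resets` are hypothetical names, and the thresholds are assumptions rather than existing Nico settings.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"      # all required keys present; move to the next state
    WAIT = "wait"            # keys missing; retry on the next poll
    RESET_BMC = "reset_bmc"  # keys missing for too long; restart the BMC


@dataclass
class MissingKeyTracker:
    """Hypothetical per-node state for escalating missing-key failures."""
    reset_after_polls: int = 10  # assumed threshold, not a real Nico setting
    max_resets: int = 1          # cap resets to avoid a BMC reboot loop
    consecutive_misses: int = 0
    resets_issued: int = 0

    def observe(self, missing_keys: list[str]) -> Action:
        if not missing_keys:
            self.consecutive_misses = 0
            return Action.PROCEED
        self.consecutive_misses += 1
        if (self.consecutive_misses >= self.reset_after_polls
                and self.resets_issued < self.max_resets):
            self.resets_issued += 1
            self.consecutive_misses = 0  # give the restarted BMC a fresh window
            return Action.RESET_BMC
        return Action.WAIT


tracker = MissingKeyTracker(reset_after_polls=3)
print([tracker.observe(["TpmSecurity"]).value for _ in range(3)])
# → ['wait', 'wait', 'reset_bmc']
```

Capping the number of resets keeps a BMC that never populates the keys from being restarted forever; in that case the node stays in `WAIT` and the existing SLA message remains the signal for manual follow-up.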

Nico showed the following state messages:

`Message: The object is in the state for longer than defined by the SLA. Handler outcome: Wait("Failed to check BIOS setup status: Missing key SerialComm in JSON at bios. Will retry.")`

`Message: The object is in the state for longer than defined by the SLA. Handler outcome: Wait("Failed to check BIOS setup status: Missing key TpmSecurity in JSON at bios. Will retry.")`

`Message: The object is in the state for longer than defined by the SLA. Handler outcome: Wait("Failed to check BIOS setup status: Missing key ConTermType in JSON at bios. Will retry.")`
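Nico's actual BIOS check is not shown in this report. The messages suggest it looks up specific attribute names in the BIOS JSON returned by the BMC. A minimal sketch of such a check, assuming the standard Redfish layout where BIOS settings sit under the `Attributes` object of `/redfish/v1/Systems/{id}/Bios`, might collect every missing key at once rather than failing on the first one. `missing_bios_keys` is a hypothetical helper, and the payload values are illustrative.

```python
from typing import Any, Iterable


def missing_bios_keys(bios: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required attribute names absent from a Redfish Bios resource.

    Assumes attributes live under the "Attributes" object of the Bios resource.
    """
    attributes = bios.get("Attributes") or {}
    return [key for key in required if key not in attributes]


# Hypothetical payload: the BMC has returned only a partial attribute set.
bios_response = {
    "Id": "Bios",
    "Attributes": {"SerialComm": "Off", "BootMode": "Uefi"},
}
print(missing_bios_keys(bios_response, ["SerialComm", "TpmSecurity", "ConTermType"]))
# → ['TpmSecurity', 'ConTermType']
```

Reporting all missing keys in one pass would make it clearer from a single state message that the BMC returned an incomplete attribute set, rather than one key being absent.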

The Nico logs showed events such as the following, but the node did not progress because a BMC restart was required to populate the Redfish keys:

`TimeInStateAboveSla { handler_outcome: \"Wait(\\\"Failed to check BIOS setup status: Missing key SerialComm in JSON at bios. Will retry.`
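For reference, the DMTF Redfish Manager schema defines a `Manager.Reset` action that can restart a BMC without physical access. A sketch of the request an automated remediation might send is shown below. `bmc_reset_request` is a hypothetical helper. `iDRAC.Embedded.1` is assumed as the Dell iDRAC manager Id and `GracefulRestart` as the reset type; the real Id and the supported values should be read from `/redfish/v1/Managers` and the action's allowable values on the target BMC.

```python
import json


def bmc_reset_request(manager_id: str = "iDRAC.Embedded.1",
                      reset_type: str = "GracefulRestart") -> tuple[str, str]:
    """Build the path and JSON body for a Redfish Manager.Reset action.

    manager_id defaults to an assumed Dell iDRAC Id; confirm it on the BMC.
    """
    path = f"/redfish/v1/Managers/{manager_id}/Actions/Manager.Reset"
    body = json.dumps({"ResetType": reset_type})
    return path, body


path, body = bmc_reset_request()
print("POST", path)
print(body)
# POST /redfish/v1/Managers/iDRAC.Embedded.1/Actions/Manager.Reset
# {"ResetType": "GracefulRestart"}
```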

Minimum reproducible example

Relevant log output

Other/Misc.

No response

Code of Conduct

  • I agree to follow NVIDIA Infra Controller's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report

Metadata


Assignees

No one assigned

    Labels

    bug, interest/dsx


    Projects

    Status
    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
