Version
v0.10.3-0-g4d11815e6
Describe the bug.
Multiple Dell GPU nodes in the HostPlatformConfiguration/PollingBiosSetup state got stuck and did not progress automatically. Manually restarting the BMC populated the missing Redfish keys, after which each node progressed to the next state.
If Nico could recognise missing Redfish keys and initiate a BMC restart itself, it would reduce the need for manual intervention.
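A rough sketch of the requested behaviour, not Nico's actual code: it matches the "Missing key ... in JSON at bios" text from the state messages in this report and signals a BMC restart once the same kind of failure has repeated a few times in a row. The names `missing_bios_key` and `BmcRestartPolicy` and the `threshold` value are hypothetical, and Python is used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern copied from the handler outcome messages quoted in this issue.
MISSING_KEY_RE = re.compile(r"Missing key (\w+) in JSON at bios")


def missing_bios_key(handler_outcome: str) -> Optional[str]:
    """Return the name of the missing BIOS key, or None if the message has none."""
    match = MISSING_KEY_RE.search(handler_outcome)
    return match.group(1) if match else None


@dataclass
class BmcRestartPolicy:
    """Hypothetical policy: ask for one BMC restart after repeated missing-key polls."""

    threshold: int = 3            # assumed value, not taken from Nico
    consecutive_misses: int = 0
    restart_requested: bool = False

    def observe(self, handler_outcome: str) -> bool:
        """Record one poll result; return True when a BMC restart should be issued."""
        if missing_bios_key(handler_outcome) is None:
            # Any outcome without a missing key clears the streak.
            self.consecutive_misses = 0
            self.restart_requested = False
            return False
        self.consecutive_misses += 1
        if self.consecutive_misses >= self.threshold and not self.restart_requested:
            # Request the restart only once per streak to avoid restart loops.
            self.restart_requested = True
            return True
        return False
```

The caller decides what a restart means in practice. Redfish defines a `Manager.Reset` action for resetting a BMC, but whether Nico would use it is not stated in this report.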
Nico showed the following state messages:
`Message: The object is in the state for longer than defined by the SLA. Handler outcome: Wait("Failed to check BIOS setup status: Missing key SerialComm in JSON at bios. Will retry.")
Message: The object is in the state for longer than defined by the SLA. Handler outcome: Wait("Failed to check BIOS setup status: Missing key TpmSecurity in JSON at bios. Will retry.")
Message: The object is in the state for longer than defined by the SLA. Handler outcome: Wait("Failed to check BIOS setup status: Missing key ConTermType in JSON at bios. Will retry.")`
The Nico logs showed events such as the one below, but the node did not progress because a BMC restart was required to populate the Redfish keys:
TimeInStateAboveSla { handler_outcome: \"Wait(\\\"Failed to check BIOS setup status: Missing key SerialComm in JSON at bios. Will retry.
Minimum reproducible example
Relevant log output
Other/Misc.
No response
Code of Conduct