Issues
is:issue state:open
Search results
[Feature]: Add post-remediation validation gate before clearing node/GPU fault state
Labels: enhancement. Status: Open. #1427 in NVIDIA/NVSentinel

[Feature]: Do not uncordon nodes cordoned independently of NVSentinel
Labels: enhancement, priority/P1 (max fix SLA: 183 days). Status: Open. #1424 in NVIDIA/NVSentinel

[Feature]: Prevent DCGM connectivity errors on node bootstrapping
Labels: enhancement, priority/P1 (max fix SLA: 183 days). Status: Open. #1423 in NVIDIA/NVSentinel

[Feature]: Allow preflight tests to have namespace scoped configuration
Labels: priority/P1 (max fix SLA: 183 days). Status: Open. #1420 in NVIDIA/NVSentinel

[Bug]: syslog-health-monitor can miss XIDs when journald rotates before scan
Labels: bug, priority/P1 (max fix SLA: 183 days). Status: Open.

[Bug]: remediation-failed node label can remain after the unsupported failing check has recovered
Labels: bug, priority/P1 (max fix SLA: 183 days). Status: Open.

[Bug]: NVSentinel doesn't work in IPv6-only clusters
Labels: bug, priority/P1 (max fix SLA: 183 days). Status: Open. #1407 in NVIDIA/NVSentinel

[Bug]: NIC monitor emits non-fatal events for expected-down ports after reboot
Labels: bug, priority/P1 (max fix SLA: 183 days). Status: Open. #1379 in NVIDIA/NVSentinel

[Bug]: NIC monitor emits false FATAL for unprovisioned Ethernet/RoCE Aux ports on cloud shapes
Labels: bug, priority/P1 (max fix SLA: 183 days). Status: Open. #1361 in NVIDIA/NVSentinel

[Bug]: Platform connector OR-based entity matching silently clears unrelated NIC failures
Labels: bug, priority/P1 (max fix SLA: 183 days). Status: Open. #1360 in NVIDIA/NVSentinel

[Feature]: Parallelism-aware preflight checks
Labels: enhancement, priority/P2 (no SLA breach). Status: Open. #1354 in NVIDIA/NVSentinel

[Bug]: ND cold-start replays stale quarantine events from ended sessions, causing orphaned remediation-failed labels
Labels: bug, priority/P0 (max fix SLA: 30 days). Status: Open. #1347 in NVIDIA/NVSentinel