Node registration/connection issue on NUC: Apiserver uses wrong node IP from ETCD #391

@akshaylg0314

Description

When setting up the system on a NUC, I encountered a node registration and connection issue between the NodeAgent and the Apiserver. The flow and problem are as follows:

Flow:

1. Register the node from the NUC by running NodeAgent.
2. On RHIVOS, the Apiserver receives the node info and saves it to ETCD.
3. Apply a YAML from the NUC by running ./demo.sh (with the RHIVOS IP in the curl command).
4. The Apiserver reports a nodeagent connection error.

ETCD node info (example):

`cluster/nodes/HPC
{"node_id":"HPC-0.0.0.0","hostname":"HPC","ip_address":"0.0.0.0", ...}

cluster/nodes/localhost.localdomain
{"node_id":"localhost.localdomain","hostname":"localhost.localdomain","ip_address":"192.168.10.100", ...}

nodes/0.0.0.0
HPC
nodes/192.168.10.100
localhost.localdomain
nodes/HPC
0.0.0.0
nodes/localhost.localdomain
192.168.10.100`

Root Cause (code):
The Apiserver fetches the node IP by simply taking the first key with the prefix nodes/:

// Find a node by IP address from simplified node keys
pub async fn find_node_by_simple_key() -> Option<String> {
    ...
    if let Some(kv) = kvs.first() {
        let ip_address = kv.key.trim_start_matches("nodes/");
        return Some(ip_address.to_string());
    }
    ...
}

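This means the result depends only on which nodes/ key comes first, not on which node is the target. In the example above, the first nodes/ key listed is nodes/0.0.0.0, so the function would return 0.0.0.0 instead of the real node IP, causing connection errors. Because the nodes/ prefix also holds hostname keys (e.g., nodes/HPC), the returned value is not even guaranteed to be an IP address.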

Expected:

The Apiserver should select the correct node IP (matching the actual node or the one specified in the YAML/scenario), not just the first one in ETCD.
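
One possible shape for the expected behavior is sketched below. It is not the project's code: `KeyValue` is a hypothetical stand-in for whatever key/value type the ETCD client returns, and `find_node_ip_by_hostname` is an illustrative name. It assumes the target hostname is known (for example, from the YAML/scenario) and that the `nodes/<hostname>` keys shown above map a hostname to its IP.

```rust
use std::net::IpAddr;

/// Hypothetical stand-in for a key/value pair read from ETCD.
#[derive(Debug, Clone)]
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

/// Look up the IP stored under `nodes/<hostname>` instead of taking the
/// first key with the `nodes/` prefix. Values that do not parse as an IP
/// address, or that are the wildcard address (0.0.0.0 or ::), are rejected.
pub fn find_node_ip_by_hostname(kvs: &[KeyValue], hostname: &str) -> Option<String> {
    let wanted = format!("nodes/{hostname}");
    kvs.iter()
        .find(|kv| kv.key == wanted)
        .and_then(|kv| kv.value.trim().parse::<IpAddr>().ok())
        .filter(|ip| !ip.is_unspecified())
        .map(|ip| ip.to_string())
}
```

With the example data from this issue, a lookup for `localhost.localdomain` would yield `192.168.10.100`, while a lookup for `HPC` would yield `None` rather than `0.0.0.0`, so the caller can report a clear "no usable IP for node" error.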
Actual:

The Apiserver may use 0.0.0.0 or another incorrect IP, leading to nodeagent connection errors.
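
To make this failure easier to spot, the Apiserver could check an address before trying to connect to the NodeAgent. The following is a minimal sketch under that assumption; `validate_node_ip` is a hypothetical helper, not an existing function in the project, and it uses only `std::net::IpAddr` from the standard library.

```rust
use std::net::IpAddr;

/// Parse `ip` and reject values that cannot identify a remote NodeAgent:
/// strings that are not IP addresses, and the wildcard address
/// (0.0.0.0 or ::), which is a bind address rather than a reachable host.
pub fn validate_node_ip(ip: &str) -> Result<IpAddr, String> {
    let parsed: IpAddr = ip
        .trim()
        .parse()
        .map_err(|e| format!("invalid node IP {ip:?}: {e}"))?;
    if parsed.is_unspecified() {
        return Err(format!(
            "node IP {ip} is a wildcard address, not a reachable host"
        ));
    }
    Ok(parsed)
}
```

A check like this would turn the generic nodeagent connection error into a message that names the bad address, which points directly at the HPC entry registered with 0.0.0.0.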
