Description
When setting up the system on a NUC, I encountered a node registration and connection issue between the NodeAgent and the Apiserver. The flow and problem are as follows:
Flow:
1. Register the node from the NUC by running NodeAgent.
2. On RHIVOS, the Apiserver receives the node info and saves it to ETCD.
3. Apply a YAML from the NUC by running `./demo.sh` (with the RHIVOS IP in the curl command).
4. The Apiserver reports a nodeagent connection error.
ETCD node info (example):
```
cluster/nodes/HPC
{"node_id":"HPC-0.0.0.0","hostname":"HPC","ip_address":"0.0.0.0", ...}
cluster/nodes/localhost.localdomain
{"node_id":"localhost.localdomain","hostname":"localhost.localdomain","ip_address":"192.168.10.100", ...}
nodes/0.0.0.0
HPC
nodes/192.168.10.100
localhost.localdomain
nodes/HPC
0.0.0.0
nodes/localhost.localdomain
192.168.10.100
```
Root Cause (code):
The Apiserver fetches the node IP by simply taking the first key with the prefix nodes/:
```rust
// Find a node by IP address from simplified node keys
pub async fn find_node_by_simple_key() -> Option<String> {
    ...
    if let Some(kv) = kvs.first() {
        let ip_address = kv.key.trim_start_matches("nodes/");
        return Some(ip_address.to_string());
    }
    ...
}
```
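The prefix `nodes/` matches both the IP-keyed and the hostname-keyed entries, and the function returns whichever key comes first without checking which node it belongs to. In the ETCD data above, the first listed key under `nodes/` is `nodes/0.0.0.0`, so the Apiserver can end up using 0.0.0.0 instead of the real node IP (192.168.10.100), causing connection errors.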
Expected:
The Apiserver should select the correct node IP (matching the actual node or the one specified in the YAML/scenario), not just the first one in ETCD.
Actual:
The Apiserver may use 0.0.0.0 or another incorrect IP, leading to nodeagent connection errors.
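
One possible direction for the expected behavior, as a minimal sketch rather than the project's actual fix: keep only keys whose suffix parses as an IP address, skip the unspecified address (0.0.0.0), and optionally match the hostname stored as the value. The helper name `select_node_ip` and the plain `(key, value)` string pairs are assumptions for illustration; the real code would read these from the ETCD `kvs` result.

```rust
use std::net::IpAddr;

/// Hypothetical helper (not part of the existing code base).
/// `kvs` holds `nodes/<ip-or-hostname>` keys with their stored values.
/// If `wanted_host` is given, only an entry whose value equals it is accepted.
fn select_node_ip(kvs: &[(String, String)], wanted_host: Option<&str>) -> Option<String> {
    kvs.iter()
        .filter_map(|(key, value)| {
            // Keys look like "nodes/<ip>" or "nodes/<hostname>"; keep only IP keys.
            let suffix = key.strip_prefix("nodes/")?;
            let ip: IpAddr = suffix.parse().ok()?;
            Some((ip, value.as_str()))
        })
        // 0.0.0.0 (or ::) is not a connectable node address, so skip it.
        .filter(|(ip, _)| !ip.is_unspecified())
        // With no hostname requested, the first remaining entry is used.
        .find(|(_, host)| wanted_host.map_or(true, |w| w == *host))
        .map(|(ip, _)| ip.to_string())
}
```

With the ETCD data shown in this issue, calling the helper with no hostname, or with `Some("localhost.localdomain")`, would yield `192.168.10.100`, while asking for `HPC` would yield `None` because its only registered address is 0.0.0.0. That last case suggests the NodeAgent on HPC may also need to register a real address instead of 0.0.0.0.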