Commit bac1bb9

MGMT-21485: Enable dpu-host mode that matches DPF requirements
This commit introduces OVN_NODE_MODE environment variable to enable per-node feature enforcement, particularly for DPU host mode where certain features must be disabled regardless of cluster-wide configuration.

- Move feature toggles from ConfigMap (004-config.yaml) to startup script (008-script-lib.yaml)
- ConfigMap values cannot be reliably overridden per-node, but script logic can be conditional
- Implement OVN_NODE_MODE-based conditional feature enablement in startup script
- Add 'dpu-host' mode that automatically disables incompatible features:
  - Egress IP and related features (egress firewall, egress QoS, egress service)
  - Multicast support
  - Multi-external gateway support
  - Multi-network policies and admin network policies
  - Network segmentation features
- Set gateway_interface='derive-from-mgmt-port' for DPU host nodes
- Add ovnkube_node_mode='--ovnkube-node-mode dpu-host' flag

From bindata/network/ovn-kubernetes/*/004-config.yaml:

- enable-egress-ip=true
- enable-egress-firewall=true
- enable-egress-qos=true
- enable-egress-service=true
- enable-multicast=true
- enable-multi-external-gateway=true
- enable-multi-network=true (conditionally)
- enable-admin-network-policy=true (conditionally)
- enable-network-segmentation=true (conditionally)

Note: HyperShift hosted cluster ConfigMap (managed/004-config.yaml) retains egress feature flags as DPU host mode is not supported in hosted cluster configurations.

- Add conditional blocks in 008-script-lib.yaml based on OVN_NODE_MODE
- Full mode (default): All features enabled as configured
- DPU host mode: Incompatible features force-disabled
- Maintain backward compatibility for existing deployments
- Rename egress_ip_enable_flag to egress_features_enable_flag for clarity

- Add comprehensive TestOVNKubernetesScriptLibCombined test covering:
  - DPU host mode feature gating and disabling
  - Full mode with multi-network features enabled/disabled
  - Non-mode-gated features (route advertisements, DNS resolver, etc.)
  - Gateway interface variable usage validation
  - Multi-external gateway and egress features flag behavior across modes
- Remove redundant individual test functions after consolidation
- Update existing config rendering tests for new ConfigMap content
- Update test assertions to use correct flag names (egress_features_enable_flag)

- Create docs/ovn_node_mode.md with detailed technical explanation
- Update docs/operands.md with OVN-Kubernetes node modes section
- Update docs/architecture.md with per-node configuration explanation
- Update README.md with DPU host mode support information
- Add implementation details, feature mapping tables, and migration notes
- Document multi-external gateway as disabled feature in DPU host mode
- Update all references to use correct flag names

ConfigMap-based feature control cannot be overridden per-node, making it impossible to disable features on specific node types (like DPU hosts) while keeping them enabled cluster-wide. Moving the logic to startup scripts allows the same cluster configuration to work across heterogeneous node types.

This change ensures that DPU host nodes automatically have incompatible features disabled, preventing runtime failures and enabling mixed-mode cluster deployments.

- Existing clusters continue to work without changes
- Default behavior (full mode) remains unchanged
- Migration is automatic during upgrade process
- No manual intervention required
- HyperShift hosted clusters unaffected (DPU host mode not supported)
1 parent 31ac8da commit bac1bb9

File tree

10 files changed

+319
-207
lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -157,6 +157,18 @@ OVNKubernetes supports the following configuration options, all of which are opt
 * `egressIPConfig`: holds the configuration for EgressIP options.
 * `reachabilityTotalTimeoutSeconds`: Set EgressIP node reachability total timeout in seconds, 0 means disable reachability check and the default is 1 second.

+#### DPU Host Mode Support
+
+OVN-Kubernetes supports specialized hardware deployments such as DPU (Data Processing Unit) hosts through the `OVN_NODE_MODE` environment variable. In `dpu-host` mode, certain features are automatically disabled on those nodes regardless of cluster-wide configuration:
+
+- Egress IP and related features (egress firewall, egress QoS, egress service)
+- Multicast support
+- Multi-external gateway support
+- Multi-network policies and admin network policies
+- Network segmentation features
+
+This per-node feature enforcement is implemented through conditional logic in the startup scripts, allowing the same cluster configuration to work across heterogeneous node types. For detailed information about node modes and the technical implementation, see `docs/ovn_node_mode.md`.
+
 These configuration flags are only in the Operator configuration object.

 Example from the `manifests/cluster-network-03-config.yml` file:
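
As a reading aid for the DPU Host Mode Support text added to the README in this hunk, the sketch below shows how the disabled-feature list could map to command-line flags under a default-then-override pattern. `node_feature_flags` is a hypothetical helper introduced only for illustration; the flag strings and the `dpu-host` comparison are taken from the diff, everything else is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: node_feature_flags is a hypothetical helper, not part of the commit.
# It prints the egress, multicast and multi-external-gateway flags for a node mode.
node_feature_flags() {
  local mode="${1:-full}"
  # Defaults, as in full mode.
  local egress="--enable-egress-ip=true --enable-egress-firewall=true --enable-egress-qos=true --enable-egress-service=true"
  local multicast="--enable-multicast"
  local meg="--enable-multi-external-gateway=true"
  # Override: dpu-host mode clears every flag in this group.
  if [ "${mode}" == "dpu-host" ]; then
    egress=""
    multicast=""
    meg=""
  fi
  # Unquoted expansion on purpose: empty values vanish, multi-word values split.
  local -a flags=(${egress} ${multicast} ${meg})
  printf '%s\n' "${flags[*]}"
}

node_feature_flags full      # prints all six flags on one line
node_feature_flags dpu-host  # prints an empty line
```

Calling the helper with no argument behaves like `full`, which matches the default the commit describes.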

bindata/network/ovn-kubernetes/common/008-script-lib.yaml

Lines changed: 38 additions & 16 deletions
@@ -512,13 +512,35 @@ data:

 echo "I$(date "+%m%d %H:%M:%S.%N") - starting ovnkube-node"

+# enable egress ip, egress firewall, egress qos, egress service
+egress_features_enable_flag="--enable-egress-ip=true --enable-egress-firewall=true --enable-egress-qos=true --enable-egress-service=true"
+init_ovnkube_controller="--init-ovnkube-controller ${K8S_NODE}"
+multi_external_gateway_enable_flag="--enable-multi-external-gateway=true"
+gateway_interface=br-ex

-if [ "{{.OVN_NODE_MODE}}" == "dpu-host" ]; then
-  // this is required for the dpu-host mode to configure right gateway interface
-  // https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5327/files
-  gateway_interface=derive-from-mgmt-port
-else
-  gateway_interface=br-ex
+# enable multicast
+enable_multicast_flag="--enable-multicast"
+
+# Use OVN_NODE_MODE environment variable, default to "full" if not set
+OVN_NODE_MODE=${OVN_NODE_MODE:-full}
+# We check only dpu-host mode and not smart-nic mode here as currently we do not support it yet
+# Once we support it, we will need to check for it here and add relevant code.
+if [ "${OVN_NODE_MODE}" == "dpu-host" ]; then
+  # this is required for the dpu-host mode to configure right gateway interface
+  # https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5327/files
+  gateway_interface="derive-from-mgmt-port"
+  ovnkube_node_mode="--ovnkube-node-mode dpu-host"
+  # disable egress ip for dpu-host mode as it is not supported
+  egress_features_enable_flag=""
+
+  # disable multicast for dpu-host mode as it is not supported
+  enable_multicast_flag=""
+
+  # disable init-ovnkube-controller for dpu-host mode as it is not supported
+  init_ovnkube_controller=""
+
+  # disable multi-external-gateway for dpu-host mode as it is not supported
+  multi_external_gateway_enable_flag=""
 fi

 if [ "{{.OVN_GATEWAY_MODE}}" == "shared" ]; then
@@ -564,12 +586,12 @@ data:
 fi

 multi_network_enabled_flag=
-if [[ "{{.OVN_MULTI_NETWORK_ENABLE}}" == "true" ]]; then
+if [[ "{{.OVN_MULTI_NETWORK_ENABLE}}" == "true" && "${OVN_NODE_MODE}" != "dpu-host" ]]; then
   multi_network_enabled_flag="--enable-multi-network"
 fi

 network_segmentation_enabled_flag=
-if [[ "{{.OVN_NETWORK_SEGMENTATION_ENABLE}}" == "true" ]]; then
+if [[ "{{.OVN_NETWORK_SEGMENTATION_ENABLE}}" == "true" && "${OVN_NODE_MODE}" != "dpu-host" ]]; then
   multi_network_enabled_flag="--enable-multi-network"
   network_segmentation_enabled_flag="--enable-network-segmentation"
 fi
@@ -590,12 +612,12 @@ data:
 fi

 multi_network_policy_enabled_flag=
-if [[ "{{.OVN_MULTI_NETWORK_POLICY_ENABLE}}" == "true" ]]; then
+if [[ "{{.OVN_MULTI_NETWORK_POLICY_ENABLE}}" == "true" && "${OVN_NODE_MODE}" != "dpu-host" ]]; then
   multi_network_policy_enabled_flag="--enable-multi-networkpolicy"
 fi

 admin_network_policy_enabled_flag=
-if [[ "{{.OVN_ADMIN_NETWORK_POLICY_ENABLE}}" == "true" ]]; then
+if [[ "{{.OVN_ADMIN_NETWORK_POLICY_ENABLE}}" == "true" && "${OVN_NODE_MODE}" != "dpu-host" ]]; then
   admin_network_policy_enabled_flag="--enable-admin-network-policy"
 fi

@@ -656,17 +678,15 @@ data:
 fi

 exec /usr/bin/ovnkube \
-  --init-ovnkube-controller "${K8S_NODE}" \
+  ${init_ovnkube_controller} \
   --init-node "${K8S_NODE}" \
   --config-file=/run/ovnkube-config/ovnkube.conf \
   --ovn-empty-lb-events \
   --loglevel "${log_level}" \
   --inactivity-probe="${OVN_CONTROLLER_INACTIVITY_PROBE}" \
   ${gateway_mode_flags} \
   ${node_mgmt_port_netdev_flags} \
-{{- if eq .OVN_NODE_MODE "dpu-host" }}
-  --ovnkube-node-mode dpu-host \
-{{- end }}
+  ${ovnkube_node_mode} \
   --metrics-bind-address "127.0.0.1:${metrics_port}" \
   --ovn-metrics-bind-address "127.0.0.1:${ovn_metrics_port}" \
   --metrics-enable-pprof \
@@ -682,7 +702,7 @@ data:
   ${admin_network_policy_enabled_flag} \
   ${dns_name_resolver_enabled_flag} \
   ${network_observability_enabled_flag} \
-  --enable-multicast \
+  ${enable_multicast_flag} \
   --zone ${K8S_NODE} \
   --enable-interconnect \
   --acl-logging-rate-limit "{{.OVNPolicyAuditRateLimit}}" \
@@ -694,5 +714,7 @@ data:
   ${ovn_v4_masquerade_subnet_opt} \
   ${ovn_v6_masquerade_subnet_opt} \
   ${ovn_v4_transit_switch_subnet_opt} \
-  ${ovn_v6_transit_switch_subnet_opt}
+  ${ovn_v6_transit_switch_subnet_opt} \
+  ${egress_features_enable_flag} \
+  ${multi_external_gateway_enable_flag}
 }
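
The `exec` line in this hunk passes variables such as `${init_ovnkube_controller}` and `${enable_multicast_flag}` unquoted, so a variable set to the empty string contributes no argument at all, while a multi-word value splits into separate arguments. A minimal sketch of that shell behavior, assuming the default `IFS`; `count_args` is a hypothetical helper and `node-a` a placeholder node name:

```bash
#!/usr/bin/env bash
# Sketch only: count_args is a hypothetical helper that reports how many
# arguments it received, standing in for /usr/bin/ovnkube.
count_args() { echo "$#"; }

# Value assigned in dpu-host mode.
init_ovnkube_controller=""
count_args ${init_ovnkube_controller} --init-node node-a     # prints 2: empty expansion is dropped
count_args "${init_ovnkube_controller}" --init-node node-a   # prints 3: quoting keeps an empty argument

# Value assigned in full mode (node name is a placeholder).
init_ovnkube_controller="--init-ovnkube-controller node-a"
count_args ${init_ovnkube_controller} --init-node node-a     # prints 4: value splits into two words
```

Quoting these expansions would therefore change behavior: the binary would receive an empty argument in `dpu-host` mode and a single fused flag-plus-value argument in `full` mode.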

bindata/network/ovn-kubernetes/managed/004-config.yaml

Lines changed: 0 additions & 21 deletions
@@ -33,16 +33,9 @@ data:
 dns-service-name="dns-default"

 [ovnkubernetesfeature]
-enable-egress-ip=true
-enable-egress-firewall=true
-enable-egress-qos=true
-enable-egress-service=true
 {{- if .ReachabilityNodePort }}
 egressip-node-healthcheck-port={{.ReachabilityNodePort}}
 {{- end }}
-{{- if .OVN_MULTI_NETWORK_ENABLE }}
-enable-multi-network=true
-{{- end }}
 {{- if .OVN_NETWORK_SEGMENTATION_ENABLE }}
 {{- if not .OVN_MULTI_NETWORK_ENABLE }}
 enable-multi-network=true
@@ -52,13 +45,6 @@ data:
 {{- if .OVN_PRE_CONF_UDN_ADDR_ENABLE }}
 enable-preconfigured-udn-addresses=true
 {{- end }}
-{{- if .OVN_MULTI_NETWORK_POLICY_ENABLE }}
-enable-multi-networkpolicy=true
-{{- end }}
-{{- if .OVN_ADMIN_NETWORK_POLICY_ENABLE }}
-enable-admin-network-policy=true
-{{- end }}
-enable-multi-external-gateway=true
 {{- if .DNS_NAME_RESOLVER_ENABLE }}
 enable-dns-name-resolver=true
 {{- end }}
@@ -147,13 +133,6 @@ data:
 enable-preconfigured-udn-addresses=true
 {{- end }}
 {{- end }}
-{{- if .OVN_MULTI_NETWORK_POLICY_ENABLE }}
-enable-multi-networkpolicy=true
-{{- end }}
-{{- if .OVN_ADMIN_NETWORK_POLICY_ENABLE }}
-enable-admin-network-policy=true
-{{- end }}
-enable-multi-external-gateway=true
 {{- if .DNS_NAME_RESOLVER_ENABLE }}
 enable-dns-name-resolver=true
 {{- end }}

bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -394,6 +394,8 @@ spec:
394394
value: "{{.OVN_CONTROLLER_INACTIVITY_PROBE}}"
395395
- name: OVN_KUBE_LOG_LEVEL
396396
value: "4"
397+
- name: OVN_NODE_MODE
398+
value: "{{.OVN_NODE_MODE}}"
397399
{{ if .NetFlowCollectors }}
398400
- name: NETFLOW_COLLECTORS
399401
value: "{{.NetFlowCollectors}}"

bindata/network/ovn-kubernetes/self-hosted/004-config.yaml

Lines changed: 1 addition & 11 deletions
@@ -36,19 +36,13 @@ data:
 dns-service-name="dns-default"

 [ovnkubernetesfeature]
-enable-egress-ip=true
-enable-egress-firewall=true
-enable-egress-qos=true
-enable-egress-service=true
+
 {{- if .ReachabilityTotalTimeoutSeconds }}
 egressip-reachability-total-timeout={{.ReachabilityTotalTimeoutSeconds}}
 {{- end }}
 {{- if .ReachabilityNodePort }}
 egressip-node-healthcheck-port={{.ReachabilityNodePort}}
 {{- end }}
-{{- if .OVN_MULTI_NETWORK_ENABLE }}
-enable-multi-network=true
-{{- end }}
 {{- if .OVN_NETWORK_SEGMENTATION_ENABLE }}
 {{- if not .OVN_MULTI_NETWORK_ENABLE }}
 enable-multi-network=true
@@ -61,10 +55,6 @@ data:
 {{- if .OVN_MULTI_NETWORK_POLICY_ENABLE }}
 enable-multi-networkpolicy=true
 {{- end }}
-{{- if .OVN_ADMIN_NETWORK_POLICY_ENABLE }}
-enable-admin-network-policy=true
-{{- end }}
-enable-multi-external-gateway=true
 {{- if .DNS_NAME_RESOLVER_ENABLE }}
 enable-dns-name-resolver=true
 {{- end }}

bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml

Lines changed: 2 additions & 0 deletions
@@ -538,6 +538,8 @@ spec:
   value: "{{.OVN_CONTROLLER_INACTIVITY_PROBE}}"
 - name: OVN_KUBE_LOG_LEVEL
   value: "4"
+- name: OVN_NODE_MODE
+  value: "{{.OVN_NODE_MODE}}"
 {{ if .NetFlowCollectors }}
 - name: NETFLOW_COLLECTORS
   value: "{{.NetFlowCollectors}}"
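
Both `ovnkube-node.yaml` variants now render `OVN_NODE_MODE` from the template value, and the startup script applies `${OVN_NODE_MODE:-full}`. The sketch below illustrates why that expansion also covers the case where the variable is present but empty. It assumes the template can render to an empty string when no mode is configured, which this diff does not show; `effective_node_mode` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: effective_node_mode is a hypothetical helper. Its argument stands
# in for the value the container would see in the OVN_NODE_MODE env var.
effective_node_mode() {
  local OVN_NODE_MODE="$1"
  # Same expansion as the startup script: ":-" falls back to "full"
  # when the variable is unset OR set to the empty string.
  OVN_NODE_MODE=${OVN_NODE_MODE:-full}
  echo "${OVN_NODE_MODE}"
}

effective_node_mode ""          # prints: full
effective_node_mode "dpu-host"  # prints: dpu-host
```

Had the script used `${OVN_NODE_MODE-full}` (without the colon), an empty rendered value would stay empty instead of becoming `full`.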

docs/architecture.md

Lines changed: 6 additions & 0 deletions
@@ -141,6 +141,12 @@ The Network operator needs to make sure that the input configuration doesn't cha

 The persisted configuration must **make all defaults explicit**. This protects against inadvertent code changes that could destabilize an existing cluster.

+### Per-Node Configuration
+
+For certain specialized deployments (e.g., DPU host nodes), some features need to be disabled on a per-node basis even when enabled cluster-wide. Since ConfigMap values cannot be reliably overridden per-node, the CNO implements per-node feature enforcement through conditional logic in the startup scripts.
+
+The `OVN_NODE_MODE` environment variable is injected into `ovnkube-node` pods and consumed by the startup script (`008-script-lib.yaml`) to conditionally enable or disable features based on the node's operational mode. This ensures that unsupported features are deterministically disabled on specialized hardware regardless of cluster-wide configuration.
+
 ## Egress Router

 **Input:** `EgressRouter.network.operator.openshift.io`
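
The Per-Node Configuration text added in this hunk describes a two-input decision: a feature is on only when the cluster-wide setting enables it and the node is not in `dpu-host` mode. A minimal sketch of that gate, modeled on the `[[ ... && ... ]]` conditions in `008-script-lib.yaml`; `feature_enabled` is a hypothetical helper and its first argument stands in for a rendered template value such as `{{.OVN_MULTI_NETWORK_ENABLE}}`.

```bash
#!/usr/bin/env bash
# Sketch only: feature_enabled is a hypothetical helper, not part of the commit.
# $1: cluster-wide setting as rendered into the script ("true" or anything else)
# $2: node mode, defaulting to "full"
feature_enabled() {
  local cluster_enable="$1" mode="${2:-full}"
  if [[ "${cluster_enable}" == "true" && "${mode}" != "dpu-host" ]]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

feature_enabled true full       # prints: enabled
feature_enabled true dpu-host   # prints: disabled
feature_enabled false full      # prints: disabled
```

The node mode can only narrow the cluster-wide choice; it never turns on a feature that the cluster configuration left off.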

docs/operands.md

Lines changed: 20 additions & 0 deletions
@@ -93,6 +93,26 @@ configuration object (which in turn is copied there from the
 configuration) is "`OVNKubernetes`". If the specified network type is
 not "`OVNKubernetes`", the CNO will not render any network plugin.

+### OVN-Kubernetes Node Modes
+
+OVN-Kubernetes supports different node operational modes through the `OVN_NODE_MODE`
+environment variable. This allows per-node feature enforcement, particularly for
+specialized hardware like DPU (Data Processing Unit) hosts where certain features
+must be disabled.
+
+The startup script (`008-script-lib.yaml`) contains conditional logic that adjusts
+feature enablement based on the node mode:
+
+- **`full` mode (default)**: All features enabled as configured
+- **`dpu-host` mode**: Certain features like egress IP, multicast, multi-network
+  policies, and admin network policies are automatically disabled regardless of
+  cluster-wide configuration
+
+This approach was necessary because ConfigMap values (`004-config.yaml`) cannot be
+reliably overridden on a per-node basis, but startup script logic can be conditional.
+
+For detailed information, see `docs/ovn_node_mode.md`.
+
 ## Multus

 Multus is deployed as long as `.spec.disableMultiNetwork` is not set.
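
The node-modes section added in this hunk lists only `full` and `dpu-host`, and the script in this commit compares against `dpu-host` alone. One consequence, sketched below, is that any other value (for example `smart-nic`, which the script comments describe as not yet supported) takes the same path as `full`. `gateway_interface_for_mode` is a hypothetical helper; the two interface values come from the diff.

```bash
#!/usr/bin/env bash
# Sketch only: gateway_interface_for_mode is a hypothetical helper that mirrors
# the default-then-override assignment of gateway_interface in the script.
gateway_interface_for_mode() {
  local mode="${1:-full}"
  local gateway_interface=br-ex
  if [ "${mode}" == "dpu-host" ]; then
    gateway_interface="derive-from-mgmt-port"
  fi
  echo "${gateway_interface}"
}

gateway_interface_for_mode full       # prints: br-ex
gateway_interface_for_mode dpu-host   # prints: derive-from-mgmt-port
gateway_interface_for_mode smart-nic  # prints: br-ex (not special-cased)
```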

docs/ovn_node_mode.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+## OVN node modes and per-node feature enforcement
+
+This change introduces `OVN_NODE_MODE` as an environment variable injected into the `ovnkube-node` Pod. The value is consumed by the startup script rendered from `bindata/network/ovn-kubernetes/common/008-script-lib.yaml` to tailor behavior per node mode at runtime.
+
+### Why move flags from the config map into the script?
+
+- The INI-based config (`004-config.yaml`) is rendered cluster-wide. Those values are not reliably overridable on a per-node or per-mode basis.
+- In DPU host mode, some features are not supported and must be deterministically disabled on those nodes even if the cluster-wide config enables them.
+- Moving the enablement logic to the entrypoint script allows per-node enforcement using `OVN_NODE_MODE`, preventing unsupported features from being turned on by cluster defaults.
+
+### Behavior by mode
+
+- `full` (default):
+  - `gateway_interface=br-ex`
+  - `init_ovnkube_controller="--init-ovnkube-controller ${K8S_NODE}"`
+  - `enable_multicast_flag="--enable-multicast"`
+  - `egress_features_enable_flag="--enable-egress-ip=true --enable-egress-firewall=true --enable-egress-qos=true --enable-egress-service=true"`
+  - `multi_external_gateway_enable_flag="--enable-multi-external-gateway=true"`
+
+- `dpu-host`:
+  - `gateway_interface="derive-from-mgmt-port"`
+  - `ovnkube_node_mode="--ovnkube-node-mode dpu-host"`
+  - `init_ovnkube_controller=""` (disabled)
+  - `enable_multicast_flag=""` (disabled)
+  - `egress_features_enable_flag=""` (egress IP and related features disabled)
+  - `multi_external_gateway_enable_flag=""` (multi-external gateway disabled)
+  - Multi-network, network segmentation, and multi-network policy/admin network policy are gated and not enabled in this mode.
+
+### Manifests
+
+- `ovnkube-node.yaml` (managed and self-hosted) now injects `OVN_NODE_MODE` into the Pod env so the script can apply mode-aware logic.
+- `004-config.yaml` drops hard-coded feature enables that conflict with per-node enforcement.
+
+### Implementation Details
+
+#### Environment Variable Injection
+
+The `OVN_NODE_MODE` environment variable is injected into `ovnkube-node` pods through the DaemonSet specification in both managed and self-hosted variants:
+
+- `bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml`
+- `bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml`
+
+The value is typically derived from node labels or annotations that identify the node's hardware type.
+
+#### Script Logic Flow
+
+The startup script (`008-script-lib.yaml`) implements the following conditional logic:
+
+```bash
+if [[ "${OVN_NODE_MODE}" != "dpu-host" ]]; then
+  # Enable features for full mode
+  egress_features_enable_flag="--enable-egress-ip=true --enable-egress-firewall=true --enable-egress-qos=true --enable-egress-service=true"
+  enable_multicast_flag="--enable-multicast"
+  # ... other feature flags
+else
+  # DPU host mode - disable features
+  egress_features_enable_flag=""
+  enable_multicast_flag=""
+  gateway_interface="derive-from-mgmt-port"
+  ovnkube_node_mode="--ovnkube-node-mode dpu-host"
+fi
+```
+
+#### Feature Flag Mapping
+
+The following table shows how cluster-wide configuration translates to per-node enforcement:
+
+| Feature | ConfigMap (004-config.yaml) | Script Variable | DPU Host Behavior |
+|---------|----------------------------|-----------------|-------------------|
+| Egress IP | `enable-egress-ip=true` | `egress_features_enable_flag` | Force disabled |
+| Multicast | `enable-multicast=true` | `enable_multicast_flag` | Force disabled |
+| Multi External Gateway | `enable-multi-external-gateway=true` | `multi_external_gateway_enable_flag` | Force disabled |
+| Multi-network | `enable-multi-network=true` | `multi_network_enabled_flag` | Conditionally disabled |
+| Admin Network Policy | `enable-admin-network-policy=true` | `admin_network_policy_enabled_flag` | Conditionally disabled |
+| Network Segmentation | `enable-network-segmentation=true` | `network_segmentation_enabled_flag` | Conditionally disabled |
+
+### Testing
+
+- Unit tests assert that the rendered script contains the correct assignments for `gateway_interface`, `init_ovnkube_controller`, `enable_multicast_flag`, `egress_features_enable_flag`, and `ovnkube_node_mode` across modes.
+- The comprehensive test `TestOVNKubernetesScriptLibCombined` validates all conditional logic paths and feature flag assignments.
+- Tests verify both positive cases (features enabled in full mode) and negative cases (features disabled in DPU host mode).
+
+### Migration Notes
+
+When upgrading clusters that previously relied on ConfigMap-based feature control:
+
+1. Existing ConfigMap values in `004-config.yaml` have been removed for features that require per-node control
+2. The startup script now contains the authoritative feature enablement logic
+3. DPU host nodes will automatically have incompatible features disabled regardless of previous ConfigMap settings
+4. No manual intervention is required - the migration is handled automatically during the upgrade process
+
+
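
The Testing section in `docs/ovn_node_mode.md` says the unit tests assert that the rendered script contains specific assignments. The test bodies are not part of this diff, so the following is only a shell analogue of that kind of check, not the commit's test code. `script_has_line` and the `rendered_dpu_host` sample text are hypothetical; the two assignment strings are copied from the script diff.

```bash
#!/usr/bin/env bash
# Sketch only: script_has_line is a hypothetical helper that succeeds when the
# rendered script text contains the expected line exactly.
script_has_line() {
  local rendered="$1" expected="$2"
  # -F: fixed string, -x: whole-line match, -q: quiet.
  # "--" guards against expected values that start with "-".
  grep -qxF -- "${expected}" <<<"${rendered}"
}

# Hypothetical excerpt of a rendered script (unindented for the -x match).
rendered_dpu_host='gateway_interface="derive-from-mgmt-port"
egress_features_enable_flag=""'

if script_has_line "${rendered_dpu_host}" 'egress_features_enable_flag=""'; then
  echo "ok: egress features assignment present"
fi
```

A real check would first strip or account for the indentation of the script inside the YAML block scalar, which this sketch sidesteps by using unindented sample text.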
