Commit 2d609a4

make host networking optional (#270)
* make host networking optional (attribution: Leo Palmer Sunmo @leosunmo)
* update helm readme and add hostnetworking=false test
* generate queue-processor assets
* updated test output
1 parent a20febc commit 2d609a4

File tree

7 files changed: +224 -27 lines changed


config/helm/aws-node-termination-handler/README.md

Lines changed: 35 additions & 21 deletions

@@ -47,7 +47,9 @@ The command removes all the Kubernetes components associated with the chart and
 
 The following tables lists the configurable parameters of the chart and their default values.
 
-### AWS Node Termination Handler Configuration
+### AWS Node Termination Handler Common Configuration
+
+The configuration in this table applies to both queue-processor mode and IMDS mode.
 
 Parameter | Description | Default
 --- | --- | ---
@@ -64,25 +66,33 @@ Parameter | Description | Default
 `webhookTemplate` | Replaces the default webhook message template. | `{"text":"[NTH][Instance Interruption] EventID: {{ .EventID }} - Kind: {{ .Kind }} - Instance: {{ .InstanceID }} - Description: {{ .Description }} - Start Time: {{ .StartTime }}"}`
 `webhookTemplateConfigMapName` | Pass Webhook template file as configmap | None
 `webhookTemplateConfigMapKey` | Name of the template file stored in the configmap | None
-`enableScheduledEventDraining` | [EXPERIMENTAL] If true, drain nodes before the maintenance window starts for an EC2 instance scheduled event | `false`
-`enableSpotInterruptionDraining` | If true, drain nodes when the spot interruption termination notice is received | `true`
-`enableSqsTerminationDraining` | If true, drain nodes when an SQS termination event is received | `false`
-`queueURL` | Listens for messages on the specified SQS queue URL | None
-`awsRegion` | If specified, use the AWS region for AWS API calls, else NTH will try to find the region through AWS_REGION env var, IMDS, or the specified queue URL | ``
 `metadataTries` | The number of times to try requesting metadata. If you would like 2 retries, set metadata-tries to 3. | `3`
 `cordonOnly` | If true, nodes will be cordoned but not drained when an interruption event occurs. | `false`
 `taintNode` | If true, nodes will be tainted when an interruption event occurs. Currently used taint keys are `aws-node-termination-handler/scheduled-maintenance`, `aws-node-termination-handler/spot-itn`, and `aws-node-termination-handler/asg-lifecycle-termination` | `false`
 `jsonLogging` | If true, use JSON-formatted logs instead of human readable logs. | `false`
+`enablePrometheusServer` | If true, start an http server exposing `/metrics` endpoint for prometheus. | `false`
+`prometheusServerPort` | Replaces the default HTTP port for exposing prometheus metrics. | `9092`
+`podMonitor.create` | if `true`, create a PodMonitor | `false`
+`podMonitor.interval` | Prometheus scrape interval | `30s`
+`podMonitor.sampleLimit` | Number of scraped samples accepted | `5000`
+`podMonitor.labels` | Additional PodMonitor metadata labels | `{}`
 
-### Testing Configuration (NOT RECOMMENDED FOR PROD DEPLOYMENTS)
+
+### AWS Node Termination Handler - Queue-Processor Mode Configuration
 
 Parameter | Description | Default
 --- | --- | ---
-`procUptimeFile` | (Used for Testing) Specify the uptime file | `/proc/uptime`
-`awsEndpoint` | (Used for testing) If specified, use the AWS endpoint to make API calls | None
-`awsSecretAccessKey` | (Used for testing) Pass-thru env var | None
-`awsAccessKeyID` | (Used for testing) Pass-thru env var | None
-`dryRun` | If true, only log if a node would be drained | `false`
+`enableSqsTerminationDraining` | If true, this turns on queue-processor mode which drains nodes when an SQS termination event is received | `false`
+`queueURL` | Listens for messages on the specified SQS queue URL | None
+`awsRegion` | If specified, use the AWS region for AWS API calls, else NTH will try to find the region through AWS_REGION env var, IMDS, or the specified queue URL | ``
+
+### AWS Node Termination Handler - IMDS Mode Configuration
+
+Parameter | Description | Default
+--- | --- | ---
+`enableScheduledEventDraining` | [EXPERIMENTAL] If true, drain nodes before the maintenance window starts for an EC2 instance scheduled event | `false`
+`enableSpotInterruptionDraining` | If true, drain nodes when the spot interruption termination notice is received | `true`
+`useHostNetwork` | If `true`, enables `hostNetwork` for the Linux DaemonSet. NOTE: setting this to `false` may cause issues accessing IMDSv2 if your account is not configured with an IP hop count of 2 | `true`
 
 ### Kubernetes Configuration
 
@@ -118,17 +128,21 @@ Parameter | Description | Default
 `nodeSelectorTermsOs` | Operating System Node Selector Key | >=1.14: `kubernetes.io/os`, <1.14: `beta.kubernetes.io/os`
 `nodeSelectorTermsArch` | CPU Architecture Node Selector Key | >=1.14: `kubernetes.io/arch`, <1.14: `beta.kubernetes.io/arch`
 `targetNodeOs` | Space separated list of node OS's to target, e.g. "linux", "windows", "linux windows". Note: Windows support is experimental. | `"linux"`
-`enablePrometheusServer` | If true, start an http server exposing `/metrics` endpoint for prometheus. | `false`
-`prometheusServerPort` | Replaces the default HTTP port for exposing prometheus metrics. | `9092`
-`podMonitor.create` | if `true`, create a PodMonitor | `false`
-`podMonitor.interval` | Prometheus scrape interval | `30s`
-`podMonitor.sampleLimit` | Number of scraped samples accepted | `5000`
-`podMonitor.labels` | Additional PodMonitor metadata labels | `{}`
 `updateStrategy` | Update strategy for the all DaemonSets (Linux and Windows) | `type=RollingUpdate,rollingUpdate.maxUnavailable=1`
 `linuxUpdateStrategy` | Update strategy for the Linux DaemonSet | `type=RollingUpdate,rollingUpdate.maxUnavailable=1`
 `windowsUpdateStrategy` | Update strategy for the Windows DaemonSet | `type=RollingUpdate,rollingUpdate.maxUnavailable=1`
 
+### Testing Configuration (NOT RECOMMENDED FOR PROD DEPLOYMENTS)
+
+Parameter | Description | Default
+--- | --- | ---
+`procUptimeFile` | (Used for Testing) Specify the uptime file | `/proc/uptime`
+`awsEndpoint` | (Used for testing) If specified, use the AWS endpoint to make API calls | None
+`awsSecretAccessKey` | (Used for testing) Pass-thru env var | None
+`awsAccessKeyID` | (Used for testing) Pass-thru env var | None
+`dryRun` | If true, only log if a node would be drained | `false`
+
 ## Metrics endpoint consideration
-If prometheus server is enabled and since NTH is a daemonset with `host_networking=true`, nothing else will be able to bind to `:9092` (or the port configured) in the root network namespace
-since it's listening on all interfaces.
-Therefore, it will need to have a firewall/security group configured on the nodes to block access to the `/metrics` endpoint.
+NTH in IMDS mode runs as a DaemonSet w/ `host_networking=true` by default. If the prometheus server is enabled, nothing else will be able to bind to the configured port (by default `:9092`) in the root network namespace. Therefore, it will need to have a firewall/security group configured on the nodes to block access to the `/metrics` endpoint.
+
+You can switch NTH in IMDS mode to run w/ `host_networking=false`, but you will need to make sure that IMDSv1 is enabled or IMDSv2 IP hop count will need to be incremented to 2. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
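The new `useHostNetwork` value described above can be set at install time like any other chart value. A minimal sketch (the release name, chart path, and the other value overrides are illustrative, not taken from this commit):

```sh
# Install NTH in IMDS mode with host networking disabled.
# Requires IMDSv1 enabled, or an IMDSv2 hop limit of 2 on the nodes.
helm upgrade --install aws-node-termination-handler \
  ./config/helm/aws-node-termination-handler/ \
  --namespace kube-system \
  --set enableSpotInterruptionDraining="true" \
  --set useHostNetwork="false"
```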

config/helm/aws-node-termination-handler/templates/daemonset.linux.yaml

Lines changed: 1 addition & 1 deletion

@@ -74,7 +74,7 @@ spec:
         {{- toYaml . | nindent 8 }}
       {{- end }}
       serviceAccountName: {{ template "aws-node-termination-handler.serviceAccountName" . }}
-      hostNetwork: true
+      hostNetwork: {{ .Values.useHostNetwork }}
      dnsPolicy: {{ .Values.dnsPolicy | default "ClusterFirstWithHostNet" | quote }}
      containers:
      - name: {{ include "aws-node-termination-handler.name" . }}
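To spot-check this template change without a cluster, the chart can be rendered locally; a hedged sketch (the release name `nth` is a placeholder):

```sh
# Render the chart with the override and inspect the emitted field;
# the Linux DaemonSet should carry hostNetwork: false.
helm template nth ./config/helm/aws-node-termination-handler/ \
  --set useHostNetwork=false | grep 'hostNetwork:'
```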

config/helm/aws-node-termination-handler/templates/psp.yaml

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ metadata:
 spec:
   privileged: false
   hostIPC: false
-  hostNetwork: true
+  hostNetwork: {{ .Values.useHostNetwork }}
   hostPID: false
   readOnlyRootFilesystem: false
   allowPrivilegeEscalation: false

config/helm/aws-node-termination-handler/values.yaml

Lines changed: 6 additions & 1 deletion

@@ -33,7 +33,7 @@ resources:
     memory: "128Mi"
     cpu: "100m"
 
-# enableSqsTerminationDraining If true, drain nodes when an SQS termination event is received
+# enableSqsTerminationDraining If true, this turns on queue-processor mode which drains nodes when an SQS termination event is received
 enableSqsTerminationDraining: false
 
 # queueURL Listens for messages on the specified SQS queue URL
@@ -174,3 +174,8 @@ updateStrategy:
     maxUnavailable: 1
 linuxUpdateStrategy: ""
 windowsUpdateStrategy: ""
+
+# Determines if NTH uses host networking for Linux when running the DaemonSet (only IMDS mode; queue-processor never runs with host networking)
+# If you have disabled IMDSv1 and are relying on IMDSv2, you'll need to increase the IP hop count to 2 before switching this to false
+# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
+useHostNetwork: true
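The hop-count adjustment mentioned in the comment above can be made per instance with the AWS CLI; a sketch (the instance ID is a placeholder):

```sh
# Raise the IMDSv2 PUT response hop limit to 2 so pods on a non-host
# network can still reach the instance metadata service.
aws ec2 modify-instance-metadata-options \
    --instance-id i-0123456789abcdef0 \
    --http-put-response-hop-limit 2 \
    --http-endpoint enabled
```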

scripts/generate-k8s-yaml

Lines changed: 36 additions & 2 deletions

@@ -10,11 +10,17 @@ NAMESPACE="kube-system"
 MAKEFILEPATH=$SCRIPTPATH/../Makefile
 VERSION=$(make -s -f $MAKEFILEPATH version)
 BUILD_DIR=$SCRIPTPATH/../build/k8s-resources/$VERSION
+
 INDV_RESOURCES_DIR=$BUILD_DIR/individual-resources
 TAR_RESOURCES_FILE=$BUILD_DIR/individual-resources.tar
 AGG_RESOURCES_YAML=$BUILD_DIR/all-resources.yaml
 mkdir -p $INDV_RESOURCES_DIR
 
+QP_INDV_RESOURCES_DIR=$BUILD_DIR/individual-resources-queue-processor
+QP_TAR_RESOURCES_FILE=$BUILD_DIR/individual-resources-queue-processor.tar
+QP_AGG_RESOURCES_YAML=$BUILD_DIR/all-resources-queue-processor.yaml
+mkdir -p $QP_INDV_RESOURCES_DIR
+
 USAGE=$(cat << 'EOM'
 Usage: generate-k8s-yaml [-n <K8s_NAMESPACE>]
 Generates the kubernetes yaml resource files from the helm chart
@@ -46,30 +52,58 @@ mv $BUILD_DIR/$PLATFORM-amd64/helm $BUILD_DIR/.
 rm -rf $BUILD_DIR/$PLATFORM-amd64
 chmod +x $BUILD_DIR/helm
 
+## IMDS Mode
 $BUILD_DIR/helm template aws-node-termination-handler \
   --namespace $NAMESPACE \
  --set targetNodeOs="linux windows" \
  $SCRIPTPATH/../config/helm/aws-node-termination-handler/ > $AGG_RESOURCES_YAML
 
-# remove helm annotations from template
+## Queue Processor Mode
+$BUILD_DIR/helm template aws-node-termination-handler \
+  --namespace $NAMESPACE \
+  --set enableSqsTerminationDraining="true" \
+  $SCRIPTPATH/../config/helm/aws-node-termination-handler/ > $QP_AGG_RESOURCES_YAML
+
+# IMDS mode - remove helm annotations from template
 cat $AGG_RESOURCES_YAML | grep -v 'helm.sh\|app.kubernetes.io/managed-by: Helm' > $BUILD_DIR/helm_annotations_removed.yaml
 mv $BUILD_DIR/helm_annotations_removed.yaml $AGG_RESOURCES_YAML
 
+# Queue Processor Mode - remove helm annotations from template
+cat $QP_AGG_RESOURCES_YAML | grep -v 'helm.sh\|app.kubernetes.io/managed-by: Helm' > $BUILD_DIR/helm_annotations_removed.yaml
+mv $BUILD_DIR/helm_annotations_removed.yaml $QP_AGG_RESOURCES_YAML
+
+# IMDS Mode
 $BUILD_DIR/helm template aws-node-termination-handler \
   --namespace $NAMESPACE \
   --set targetNodeOs="linux windows" \
   --output-dir $INDV_RESOURCES_DIR/ \
   $SCRIPTPATH/../config/helm/aws-node-termination-handler/
 
-# remove helm annotations from template
+# Queue Processor Mode
+$BUILD_DIR/helm template aws-node-termination-handler \
+  --namespace $NAMESPACE \
+  --set enableSqsTerminationDraining="true" \
+  --output-dir $QP_INDV_RESOURCES_DIR/ \
+  $SCRIPTPATH/../config/helm/aws-node-termination-handler/
+
+# IMDS Mode - remove helm annotations from template
 for i in $INDV_RESOURCES_DIR/aws-node-termination-handler/templates/*; do
   cat $i | grep -v 'helm.sh\|app.kubernetes.io/managed-by: Helm' > $BUILD_DIR/helm_annotations_removed.yaml
   mv $BUILD_DIR/helm_annotations_removed.yaml $i
 done
 
+# Queue Processor Mode - remove helm annotations from template
+for i in $QP_INDV_RESOURCES_DIR/aws-node-termination-handler/templates/*; do
+  cat $i | grep -v 'helm.sh\|app.kubernetes.io/managed-by: Helm' > $BUILD_DIR/helm_annotations_removed.yaml
+  mv $BUILD_DIR/helm_annotations_removed.yaml $i
+done
+
 cd $INDV_RESOURCES_DIR/aws-node-termination-handler/ && tar cvf $TAR_RESOURCES_FILE templates/*
+cd $QP_INDV_RESOURCES_DIR/aws-node-termination-handler/ && tar cvf $QP_TAR_RESOURCES_FILE templates/*
 cd $SCRIPTPATH
 
 echo "Generated aws-node-termination-handler kubernetes yaml resources files in:"
 echo " - $AGG_RESOURCES_YAML"
 echo " - $TAR_RESOURCES_FILE"
+echo " - $QP_AGG_RESOURCES_YAML"
+echo " - $QP_TAR_RESOURCES_FILE"
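The annotation-stripping steps above pipe each rendered manifest through `grep -v`. A self-contained sketch of that filter, run on an inline sample manifest instead of real chart output (the sample content is illustrative):

```shell
#!/bin/bash
# Sample rendered manifest containing the two kinds of Helm-injected
# lines the script filters out.
sample_manifest='apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-node-termination-handler
  annotations:
    helm.sh/hook: pre-install
  labels:
    app.kubernetes.io/managed-by: Helm'

# Same filter as the script: drop any line mentioning helm.sh or the
# managed-by: Helm label, keeping everything else intact.
stripped=$(printf '%s\n' "$sample_manifest" | grep -v 'helm.sh\|app.kubernetes.io/managed-by: Helm')
printf '%s\n' "$stripped"
```

In the script this runs over a temp file and moves the result back over the original, since `grep` cannot rewrite its input in place.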

scripts/upload-resources-to-github

Lines changed: 3 additions & 1 deletion

@@ -10,6 +10,8 @@ BUILD_DIR=$SCRIPTPATH/../build/k8s-resources/$VERSION
 BINARY_DIR=$SCRIPTPATH/../build/bin
 INDV_K8S_RESOURCES=$BUILD_DIR/individual-resources.tar
 AGG_RESOURCES_YAML=$BUILD_DIR/all-resources.yaml
+QP_INDV_K8S_RESOURCES=$BUILD_DIR/individual-resources-queue-processor.tar
+QP_AGG_RESOURCES_YAML=$BUILD_DIR/all-resources-queue-processor.yaml
 BINARIES_ONLY="false"
 
 USAGE=$(cat << 'EOM'
@@ -66,7 +68,7 @@ gather_assets_to_upload() {
     resources+=("$binary")
   done
   if [ $BINARIES_ONLY != "true" ]; then
-    resources+=("$INDV_K8S_RESOURCES" "$AGG_RESOURCES_YAML")
+    resources+=("$INDV_K8S_RESOURCES" "$AGG_RESOURCES_YAML" "$QP_INDV_K8S_RESOURCES" "$QP_AGG_RESOURCES_YAML")
   fi
   echo "${resources[@]}"
 }
Lines changed: 142 additions & 0 deletions (new file)

#!/bin/bash
set -euo pipefail

# Available env vars:
#   $TMP_DIR
#   $CLUSTER_NAME
#   $KUBECONFIG
#   $NODE_TERMINATION_HANDLER_DOCKER_REPO
#   $NODE_TERMINATION_HANDLER_DOCKER_TAG
#   $WEBHOOK_DOCKER_REPO
#   $WEBHOOK_DOCKER_TAG
#   $AEMM_URL
#   $AEMM_VERSION

function fail_and_exit {
    echo "❌ Spot Interruption w/o Host Networking test failed $CLUSTER_NAME"
    exit ${1:-1}
}

echo "Starting Spot Interruption w/o Host Networking Test for Node Termination Handler"

SCRIPTPATH="$( cd "$(dirname "$0")" ; pwd -P )"

common_helm_args=()
[[ "${TEST_WINDOWS-}" == "true" ]] && common_helm_args+=(--set targetNodeOs="windows")
[[ -n "${NTH_WORKER_LABEL-}" ]] && common_helm_args+=(--set nodeSelector."$NTH_WORKER_LABEL")

anth_helm_args=(
  upgrade
  --install
  "$CLUSTER_NAME-anth"
  "$SCRIPTPATH/../../config/helm/aws-node-termination-handler/"
  --wait
  --force
  --namespace kube-system
  --set instanceMetadataURL="${INSTANCE_METADATA_URL:-"http://$AEMM_URL:$IMDS_PORT"}"
  --set image.repository="$NODE_TERMINATION_HANDLER_DOCKER_REPO"
  --set image.tag="$NODE_TERMINATION_HANDLER_DOCKER_TAG"
  --set enableScheduledEventDraining="false"
  --set enableSpotInterruptionDraining="true"
  --set taintNode="true"
  --set useHostNetwork="false"
  --set tolerations=""
)
[[ -n "${NODE_TERMINATION_HANDLER_DOCKER_PULL_POLICY-}" ]] &&
    anth_helm_args+=(--set image.pullPolicy="$NODE_TERMINATION_HANDLER_DOCKER_PULL_POLICY")
[[ ${#common_helm_args[@]} -gt 0 ]] &&
    anth_helm_args+=("${common_helm_args[@]}")

set -x
helm "${anth_helm_args[@]}"
set +x

emtp_helm_args=(
  upgrade
  --install
  "$CLUSTER_NAME-emtp"
  "$SCRIPTPATH/../../config/helm/webhook-test-proxy/"
  --wait
  --force
  --namespace default
  --set webhookTestProxy.image.repository="$WEBHOOK_DOCKER_REPO"
  --set webhookTestProxy.image.tag="$WEBHOOK_DOCKER_TAG"
)
[[ -n "${WEBHOOK_DOCKER_PULL_POLICY-}" ]] &&
    emtp_helm_args+=(--set webhookTestProxy.image.pullPolicy="$WEBHOOK_DOCKER_PULL_POLICY")
[[ ${#common_helm_args[@]} -gt 0 ]] &&
    emtp_helm_args+=("${common_helm_args[@]}")

set -x
helm "${emtp_helm_args[@]}"
set +x

aemm_helm_args=(
  upgrade
  --install
  "$CLUSTER_NAME-aemm"
  "$AEMM_DL_URL"
  --wait
  --namespace default
  --set servicePort="$IMDS_PORT"
  --set 'tolerations[0].effect=NoSchedule'
  --set 'tolerations[0].operator=Exists'
  --set arguments='{spot}'
)
[[ ${#common_helm_args[@]} -gt 0 ]] &&
    aemm_helm_args+=("${common_helm_args[@]}")

set -x
retry 5 helm "${aemm_helm_args[@]}"
set +x

TAINT_CHECK_CYCLES=15
TAINT_CHECK_SLEEP=15

deployed=0
for i in `seq 1 $TAINT_CHECK_CYCLES`; do
    if [[ $(kubectl get deployments regular-pod-test -o jsonpath='{.status.unavailableReplicas}') -eq 0 ]]; then
        echo "✅ Verified regular-pod-test pod was scheduled and started!"
        deployed=1
        break
    fi
    echo "Setup Loop $i/$TAINT_CHECK_CYCLES, sleeping for $TAINT_CHECK_SLEEP seconds"
    sleep $TAINT_CHECK_SLEEP
done

if [[ $deployed -eq 0 ]]; then
    echo "❌ regular-pod-test pod deployment failed"
    fail_and_exit 2
fi

cordoned=0
tainted=0
test_node=${TEST_NODE:-$CLUSTER_NAME-worker}
for i in `seq 1 $TAINT_CHECK_CYCLES`; do
    if [[ $cordoned -eq 0 ]] && kubectl get nodes $test_node | grep SchedulingDisabled >/dev/null; then
        echo "✅ Verified the worker node was cordoned!"
        cordoned=1
    fi

    if [[ $cordoned -eq 1 && $tainted -eq 0 ]] && kubectl get nodes $test_node -o json | grep -q "aws-node-termination-handler/spot-itn"; then
        echo "✅ Verified the worker node was tainted!"
        tainted=1
    fi

    if [[ $tainted -eq 1 && $(kubectl get deployments regular-pod-test -o=jsonpath='{.status.unavailableReplicas}') -eq 1 ]]; then
        echo "✅ Verified the regular-pod-test pod was evicted!"
        echo "✅ Spot Interruption w/o Host Networking Test Passed $CLUSTER_NAME! ✅"
        exit 0
    fi
    echo "Assertion Loop $i/$TAINT_CHECK_CYCLES, sleeping for $TAINT_CHECK_SLEEP seconds"
    sleep $TAINT_CHECK_SLEEP
done

if [[ $cordoned -eq 0 ]]; then
    echo "❌ Worker node was not cordoned"
elif [[ $tainted -eq 0 ]]; then
    echo "❌ Worker node was not tainted"
else
    echo "❌ regular-pod-test pod was not evicted"
fi
fail_and_exit 1
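The setup and assertion phases of the test above both use the same bounded poll-until-true loop. A condensed, cluster-free sketch of that pattern, where the `condition` stub stands in for the `kubectl` checks:

```shell
#!/bin/bash
CHECK_CYCLES=5
CHECK_SLEEP=0   # the real test sleeps 15 seconds between cycles

attempts=0
succeeded=0

# Stub condition: becomes true on the third poll, standing in for e.g.
# `kubectl get nodes $test_node | grep SchedulingDisabled`.
condition() { [[ $attempts -ge 3 ]]; }

for i in $(seq 1 $CHECK_CYCLES); do
    attempts=$((attempts + 1))
    if condition; then
        succeeded=1
        break
    fi
    echo "Loop $i/$CHECK_CYCLES, sleeping for $CHECK_SLEEP seconds"
    sleep $CHECK_SLEEP
done

echo "succeeded=$succeeded attempts=$attempts"
```

Bounding the loop with a cycle count keeps a broken cluster from hanging the CI job; the flag left behind (`succeeded`, or `cordoned`/`tainted` in the real test) lets the script report which assertion failed.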
