@@ -180,7 +180,7 @@ communication to, from and within a Kubernetes cluster.
180180 documented appropriately so that the aforementioned tools may choose to
181181 enable it for use.
182182- Enable Kubernetes api-server dual-stack addresses listening and binding. Additionally
183- enable dualstack for Kubernetes default service.
183+ enable dual-stack for Kubernetes default service.
184184
185185## Proposal
186186
@@ -754,10 +754,10 @@ spec:
754754
755755##### Creating a New Dual-Stack Service
756756
757- Users can create-dual stack services according to the following methods (in
757+ Users can create dual-stack services according to the following methods (in
758758increasing specificity):
759- - If the user *prefers* dual stack (if available, service creation will not fail if
760- the cluster is not configured for dual stack) then they can do one of the
759+ - If the user *prefers* dual-stack (if available, service creation will not fail if
760+ the cluster is not configured for dual-stack) then they can do one of the
761761 following:
762762 1. Set `spec.ipFamilyPolicy` to `PreferDualStack` and do not set `spec.ipFamilies` or
763763 `spec.clusterIPs`. The apiserver will set `spec.ipFamilies` according to
@@ -1023,7 +1023,7 @@ dual-stack:
10231023- Headless Kubernetes services: CoreDNS will resolve these services to either
10241024 an IPv4 entry (A record), an IPv6 entry (AAAA record), or both, depending on
10251025 the service's `ipFamily`.
1026- - Once Kubernetes service (pointing to Cluster DNS) is converted to dualstack pods
1026+ - Once Kubernetes service (pointing to Cluster DNS) is converted to dual-stack pods
10271027 will automatically get two DNS servers (one for each IP family) in their resolv.conf.
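
As a rough way to check the resolv.conf behavior described above from inside a pod, the following sketch groups `nameserver` entries by IP family. The helper name `nameservers_by_family` and the sample addresses are hypothetical and only for illustration; a real pod's resolv.conf will contain whatever cluster DNS addresses the cluster assigned.

```python
import ipaddress


def nameservers_by_family(resolv_conf_text):
    """Group the `nameserver` entries of a resolv.conf by IP family (4 or 6)."""
    families = {4: [], 6: []}
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # resolv.conf comment lines start with '#' or ';'.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            # Drop an optional IPv6 zone ID such as "%eth0" before parsing.
            address = ipaddress.ip_address(fields[1].split("%", 1)[0])
            families[address.version].append(str(address))
    return families


# Hypothetical resolv.conf content for a pod in a dual-stack cluster.
sample = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
nameserver fd00:10:96::a
options ndots:5
"""

result = nameservers_by_family(sample)
print(result)
```

With a dual-stack cluster DNS service, one would expect one entry under each family; a single-stack service would leave one of the lists empty.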
10281028
10291029### Ingress Controller Operation
@@ -1377,7 +1377,7 @@ This capability will move to stable when the following criteria have been met.
13771377* Support of at least one CNI plugin to provide multi-IP
13781378* e2e test successfully running on two platforms
13791379* testing ingress controller infrastructure with updated dual-stack services
1380- * dualstack tests run as pre-submit blocking for PRs
1380+ * dual-stack tests run as pre-submit blocking for PRs
13811381
13821382
13831383## Production Readiness Review Questionnaire
@@ -1389,12 +1389,6 @@ This capability will move to stable when the following criteria have been met.
13891389 - [X] Feature gate (also fill in values in `kep.yaml`)
13901390 - Feature gate name: IPv6DualStack
13911391 - Components depending on the feature gate: kube-apiserver, kube-controller-manager, kube-proxy, and kubelet
1392- - [ ] Other
1393- - Describe the mechanism:
1394- - Will enabling / disabling the feature require downtime of the control
1395- plane?
1396- - Will enabling / disabling the feature require downtime or reprovisioning
1397- of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
13981392
13991393* **Does enabling the feature change any default behavior?**
14001394 No. Pods and Services will remain single stack until cli flags have been modified
@@ -1404,25 +1398,28 @@ This capability will move to stable when the following criteria have been met.
14041398
14051399* **Can the feature be disabled once it has been enabled (i.e. can we roll back
14061400 the enablement)?**
1407- Yes.
1401+ Yes. If we disable the feature gate, we must remove the CLI parameters. An
1402+ older client won't see or be able to use the new fields.
14081403
14091404* **What happens if we reenable the feature if it was previously rolled back?**
1410- Similar to enable it the first time on a cluster.
1405+ Similar to enabling it the first time on a cluster. We don't load balance
1406+ across IP families, and with no selectors we don't get endpoints. If you use
1407+ the feature flag to turn off dual-stack, we do not edit your service. If you
1408+ disable dual-stack from the controller manager, the service will be given
1409+ single-stack endpoints.
14111410
14121411* **Are there any tests for feature enablement/disablement?**
14131412 The feature is being tested using integration tests with gate on/off. The
14141413 tests can be found here: https://github.com/kubernetes/kubernetes/tree/master/test/integration/dualstack
14151414
1416- The feature is being tested on -some of - the cloud providers and kind.
1417- 1. https://testgrid.k8s.io/sig-network-dualstack-azure-e2e. This has all dualstack tests on azure.
1415+ The feature is being tested on a cloud provider and kind.
1416+ 1. azure dual-stack e2e: https://testgrid.k8s.io/sig-network-dualstack-azure-e2e
14181417 2. kind dual-stack iptables: https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20dual,%20master
14191418 3. kind dual-stack ipvs: https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20ipvs,%20master
14201419
14211420### Rollout, Upgrade and Rollback Planning
14221421
14231422* **How can a rollout fail? Can it impact already running workloads?**
1424- Try to be as paranoid as possible - e.g., what if some components will restart
1425- mid-rollout?
14261423 Users **must** avoid changing existing CIDRs, for both pods and services. Users
14271424 can only add an alternative IP family to existing CIDRs. Changing existing CIDRs
14281425 will result in nondeterministic failures depending on how the cluster networking
@@ -1436,60 +1433,57 @@ This capability will move to stable when the following criteria have been met.
14361433 N/A
14371434
14381435* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
1439- Describe manual testing that was done and the outcomes.
1440- Longer term, we may want to require automated upgrade/rollback tests, but we
1441- are missing a bunch of machinery and tooling and can't do that now .
1436+ We manually tested a cluster, turning the feature off and on, to explore
1437+ disabled-with-data behavior. Testing details can be seen in [Dual-stack
1438+ testing](https://github.com/kubernetes/kubernetes/blob/master/test/integration/dualstack/dualstack_test.go).
14421439
14431440* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
14441441fields of API types, flags, etc.?**
1445- Even if applying deprecation policies, they may still surprise some users.
1442+ Enabling this without configuring the CLI options will not change any default
1443+ behavior.
14461444
14471445### Monitoring Requirements
14481446
14491447* **How can an operator determine if the feature is in use by workloads?**
1450- Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
1451- checking if there are objects with field X set) may be a last resort. Avoid
1452- logs or events for this purpose.
14531448
14541449 Operators can determine if the feature is in use by listing services that
1455- employ dual-stack. This can be done via
1450+ employ dual-stack. This can be done via:
14561451
14571452 ```
1458- kubectl get services --all-namespaces spec.ipFamilyPolicy!= SingleStack
1453+ kubectl get services --all-namespaces -ogo-template='{{range .items}}{{.spec.ipFamilyPolicy}}{{"\n"}}{{end}}' | grep -v SingleStack
14591454 ```
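
The same check can be sketched over JSON output, which also keeps the namespace and name of each matching service. This is a minimal sketch, assuming the input is the list object produced by `kubectl get services --all-namespaces -o json`; the function name `dual_stack_services` and the sample services are hypothetical. It selects services whose `spec.ipFamilyPolicy` is `PreferDualStack` or `RequireDualStack`, which is slightly stricter than filtering out `SingleStack`.

```python
import json


def dual_stack_services(service_list):
    """Return (namespace, name, policy) for services requesting dual-stack."""
    found = []
    for item in service_list.get("items", []):
        policy = item.get("spec", {}).get("ipFamilyPolicy")
        if policy in ("PreferDualStack", "RequireDualStack"):
            meta = item.get("metadata", {})
            found.append((meta.get("namespace"), meta.get("name"), policy))
    return found


# Hypothetical sample mimicking the shape of a Service list object.
sample_json = """
{
  "items": [
    {"metadata": {"namespace": "default", "name": "kubernetes"},
     "spec": {"ipFamilyPolicy": "SingleStack"}},
    {"metadata": {"namespace": "web", "name": "frontend"},
     "spec": {"ipFamilyPolicy": "PreferDualStack"}},
    {"metadata": {"namespace": "web", "name": "backend"},
     "spec": {"ipFamilyPolicy": "RequireDualStack"}}
  ]
}
"""

matches = dual_stack_services(json.loads(sample_json))
for namespace, name, policy in matches:
    print(f"{namespace}/{name}: {policy}")
```

Note that `PreferDualStack` only expresses a preference, so a service reported here may still hold a single cluster IP if the cluster itself is not configured for dual-stack.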
14601455
14611456* **What are the SLIs (Service Level Indicators) an operator can use to determine
14621457the health of the service?**
1463- - [ ] Metrics
1464- - Metric name:
1465- - [Optional] Aggregation method:
1466- - Components exposing the metric:
1467- - [ ] Other (treat as last resort)
1468- - Details:
1458+ Dual-stack networking is a functional addition, not a service with SLIs.
14691459
14701460* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
1471- At a high level, this usually will be in the form of "high percentile of SLI
1472- per day <= X". It's impossible to provide comprehensive guidance, but at the very
1473- high level (needs more precise definitions) those may be things like:
1474- - per-day percentage of API calls finishing with 5XX errors <= 1%
1475- - 99% percentile over day of absolute value from (job creation time minus expected
1476- job creation time) for cron job <= 10%
1477- - 99,9% of /health requests per day finish with 200 code
1461+ N/A
14781462
14791463* **Are there any missing metrics that would be useful to have to improve observability
14801464of this feature?**
1481- Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
1482- implementation difficulties, etc.).
1465+
1466+ 1. For services:
1467+
1468+ Services are not in the path of pod creation. Thus, any malfunction or
1469+ bugs in services will not affect pods.
1470+
1471+ Services/Endpoint selection is not in the path of pod creation. It runs in
1472+ kube-controller-manager, thus this is N/A.
1473+
1474+ 2. For pods:
1475+
1476+ Dual-stack components are not in the path of pod creation. They are in the
1477+ path of reporting pod IPs. So pod creation will not be affected; if it is
1478+ affected, then it is a CNI issue.
1479+
1480+ Dual-stack components are in the path of PodIPs reporting which affects
1481+ kubelet. If there is a problem (or if there are persistent problems)
1482+ then disabling the feature gate on api-server will mitigate it.
14831483
14841484### Dependencies
14851485
14861486* **Does this feature depend on any specific services running in the cluster?**
1487- Think about both cluster-level services (e.g. metrics-server) as well
1488- as node-level agents (e.g. specific version of CRI). Focus on external or
1489- optional services that are needed. For example, if this feature depends on
1490- a cloud provider API, or upon an external software-defined storage or network
1491- control plane.
1492-
14931487 This feature does not have dependency beyond kube-apiserver and standard controllers
14941488 shipped with Kubernetes releases.
14951489
@@ -1503,7 +1497,8 @@ of this feature?**
15031497
15041498* **Will enabling / using this feature result in any new calls to the cloud
15051499provider?**
1506- No
1500+ No. Errors are surfaced when the user makes a call. We don't focus on
1501+ metrics-server at this point.
15071502
15081503* **Will enabling / using this feature result in increasing size or count of
15091504the existing API objects?**
@@ -1521,16 +1516,10 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
15211516
15221517### Troubleshooting
15231518
1524- The Troubleshooting section currently serves the `Playbook` role. We may consider
1525- splitting it into a dedicated `Playbook` document (potentially with some monitoring
1526- details). For now, we leave it here.
1527-
1528-
15291519* **How does this feature react if the API server and/or etcd is unavailable?**
15301520 This feature will not be operable if either kube-apiserver or etcd is unavailable.
15311521
15321522* **What are other known failure modes?**
1533- For each of them, fill in the following information by copying the below template:
15341523
15351524 * Failure to create dual-stack services. Operator must perform the following steps:
15361525 1. Ensure that the cluster has `IPv6DualStack` feature enabled.