 | 1 | +# NPEP-133: FQDN Selector for Egress Traffic  | 
 | 2 | + | 
 | 3 | +* Issue:  | 
 | 4 | +  [#133](https://github.com/kubernetes-sigs/network-policy-api/issues/133)  | 
 | 5 | +* Status: Provisional  | 
 | 6 | + | 
 | 7 | +## TLDR  | 
 | 8 | + | 
 | 9 | +This enhancement proposes adding a new optional selector to specify egress peers  | 
 | 10 | +using [Fully Qualified Domain  | 
 | 11 | +Names](https://www.wikipedia.org/wiki/Fully_qualified_domain_name) (FQDNs).  | 
 | 12 | + | 
 | 13 | +## Goals  | 
 | 14 | + | 
 | 15 | +* Provide a selector to specify egress peers using a Fully Qualified Domain Name  | 
 | 16 | +  (for example `kubernetes.io`).  | 
 | 17 | +* Support basic wildcard matching capabilities when specifying FQDNs (for  | 
 | 18 | +  example `*.cloud-provider.io`).  | 
 | 19 | +* Currently only `ALLOW` type rules are proposed.  | 
 | 20 | +  * Safely enforcing `DENY` rules based on FQDN selectors is difficult as there  | 
 | 21 | +    is no guarantee a Network Policy plugin is aware of all IPs backing a FQDN  | 
 | 22 | +    policy. If a Network Policy plugin has incomplete information, it may  | 
 | 23 | +    accidentally allow traffic to an IP belonging to a denied domain. This would  | 
 | 24 | +    constitute a security breach.  | 
 | 25 | +      | 
 | 26 | +    By contrast, `ALLOW` rules, which may also have an incomplete list of IPs,  | 
 | 27 | +    would not create a security breach. In case of incomplete information, valid  | 
 | 28 | +    traffic would be dropped as the plugin believes the destination IP does not  | 
 | 29 | +    belong to the domain. While this is definitely undesirable, it is at least  | 
 | 30 | +    not an unsafe failure.  | 
 | 31 | + | 
 | 32 | +* Currently only AdminNetworkPolicy is the intended scope for this proposal.  | 
 | 33 | +  * Since Kubernetes NetworkPolicy does not have a FQDN selector, adding this  | 
 | 34 | +    capability to BaselineAdminNetworkPolicy could result in writing baseline  | 
 | 35 | +    rules that can't be replicated by an overriding NetworkPolicy. For example,  | 
 | 36 | +    if BANP allows traffic to `example.io`, but the namespace admin installs a  | 
 | 37 | +    Kubernetes Network Policy, the namespace admin has no way to replicate the  | 
 | 38 | +    `example.io` selector using just Kubernetes Network Policies.  | 
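
The fail-safe argument for proposing only `ALLOW` rules can be illustrated with a minimal Python sketch. It is not part of the proposed API: the function names, the example IPs (taken from the 192.0.2.0/24 documentation range), and the assumption that traffic not matched by an `ALLOW` rule is dropped are all hypothetical, chosen only to show what happens when a plugin knows a subset of the IPs backing a domain.

```python
# Hypothetical illustration of the ALLOW vs. DENY failure modes.
# Nothing here is part of the proposed API.

# All IPs that actually back the domain at this moment.
ACTUAL_IPS = {"192.0.2.10", "192.0.2.11"}
# The subset the plugin has learned so far (incomplete information).
KNOWN_IPS = {"192.0.2.10"}


def allow_rule_permits(dst_ip: str, known_ips: set[str]) -> bool:
    """ALLOW semantics: traffic passes only if the IP is known to back the domain.

    Assumes anything not matched by the rule is dropped.
    """
    return dst_ip in known_ips


def deny_rule_permits(dst_ip: str, known_ips: set[str]) -> bool:
    """DENY semantics: traffic passes unless the IP is known to back the domain."""
    return dst_ip not in known_ips


# An IP that really belongs to the domain but is unknown to the plugin.
unknown_ip = next(iter(ACTUAL_IPS - KNOWN_IPS))

# ALLOW rule: valid traffic is dropped (fails closed; undesirable but safe).
allow_outcome = allow_rule_permits(unknown_ip, KNOWN_IPS)
# DENY rule: traffic to the denied domain gets through (fails open; unsafe).
deny_outcome = deny_rule_permits(unknown_ip, KNOWN_IPS)
```

With incomplete information, the `ALLOW` rule errs toward dropping legitimate traffic, while the `DENY` rule errs toward letting denied traffic through.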
 | 39 | + | 
 | 40 | +## Non-Goals  | 
 | 41 | + | 
 | 42 | +* This enhancement does not include a FQDN selector for allowing ingress  | 
 | 43 | +  traffic.  | 
 | 44 | +* This enhancement only describes enhancements to the existing L4 filtering as  | 
 | 45 | +  provided by AdminNetworkPolicy. It does not propose any new L7 matching or  | 
 | 46 | +  filtering capabilities, like matching HTTP traffic or URL paths.  | 
 | 47 | +  * This selector should not control what DNS records are resolvable from a  | 
 | 48 | +    particular workload.  | 
 | 49 | +* This enhancement does not provide a mechanism for selecting in-cluster  | 
 | 50 | +  endpoints using FQDNs. To select Pods, Nodes, or the API Server,  | 
 | 51 | +  AdminNetworkPolicy has other more specific selectors.  | 
 | 52 | +  * Using the FQDN selector to refer to other Kubernetes endpoints, while not  | 
 | 53 | +    explicitly disallowed, is not defined by this spec and left up to individual  | 
 | 54 | +    providers. Trying to allow traffic to the following domains is NOT  | 
 | 55 | +    guaranteed to work:  | 
 | 56 | +    * `my-svc.my-namespace.svc.cluster.local` (the generated DNS record for a  | 
 | 57 | +      Service as defined  | 
 | 58 | +      [here](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services))  | 
 | 59 | +    * `pod-ip-address.my-namespace.pod.cluster.local` (the generated DNS record  | 
 | 60 | +      for a Pod as defined  | 
 | 61 | +      [here](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods))  | 
 | 62 | +* This enhancement does not add any new mechanisms for specifying how traffic is  | 
 | 63 | +  routed to a destination (egress gateways, alternative SNAT IPs, etc). It just  | 
 | 64 | +  adds a new way of specifying packets to be allowed or dropped on the normal  | 
 | 65 | +  egress data path.  | 
 | 66 | +* This enhancement does not require any mechanism for securing DNS resolution  | 
 | 67 | +  (e.g. DNSSEC or DNS-over-TLS). Unsecured DNS requests are expected to be  | 
 | 68 | +  sufficient for looking up FQDNs.  | 
 | 69 | + | 
 | 70 | +## Introduction  | 
 | 71 | + | 
 | 72 | +FQDN-based egress controls are a common enterprise security practice.  | 
 | 73 | +Administrators often prefer to write security policies using DNS names such as  | 
 | 74 | +`www.kubernetes.io` instead of capturing all the IP addresses the DNS name might  | 
 | 75 | +resolve to. Keeping up with changing IP addresses is a maintenance burden, and  | 
 | 76 | +hampers the readability of the network policies.  | 
 | 77 | + | 
 | 78 | +## User Stories  | 
 | 79 | + | 
 | 80 | +* As a cluster admin, I want to allow all Pods in the cluster to send traffic to  | 
 | 81 | +  an external service specified by a well-known domain name. For example, all  | 
 | 82 | +  Pods must be able to talk to `my-service.com`.  | 
 | 83 | + | 
 | 84 | +* As a cluster admin, I want to allow Pods in the "monitoring" namespace to be  | 
 | 85 | +  able to send traffic to a logs-sink, hosted at `logs-storage.com`.  | 
 | 86 | + | 
 | 87 | +* As a cluster admin, I want to allow all Pods in the cluster to send traffic to  | 
 | 88 | +  any of the managed services provided by my Cloud Provider. Since the cloud  | 
 | 89 | +  provider has a well-known parent domain, I want to allow Pods to send traffic  | 
 | 90 | +  to all subdomains using a wildcard selector: `*.my-cloud-provider.com`.  | 
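
To make the wildcard user story concrete, here is a minimal Python sketch of one possible matching rule. The semantics are an assumption, since this proposal does not yet define them: a leading `*.` is taken to match one or more whole DNS labels, so the parent domain itself does not match, and matching is case-insensitive. The function name `matches_fqdn` is hypothetical.

```python
def matches_fqdn(pattern: str, name: str) -> bool:
    """Return True if `name` matches `pattern` under the assumed semantics.

    Assumption (not defined by the proposal): a leading "*." matches one or
    more whole labels; any other pattern must match the name exactly.
    """
    # DNS names are case-insensitive; ignore an optional trailing root dot.
    pattern = pattern.lower().rstrip(".")
    name = name.lower().rstrip(".")

    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".my-cloud-provider.com"
        # The name must end with the suffix and have something before it.
        return name.endswith(suffix) and len(name) > len(suffix)

    return name == pattern


# Usage, based on the domains from the user stories above.
exact = matches_fqdn("my-service.com", "my-service.com")
sub = matches_fqdn("*.my-cloud-provider.com", "storage.my-cloud-provider.com")
parent = matches_fqdn("*.my-cloud-provider.com", "my-cloud-provider.com")
lookalike = matches_fqdn("*.my-cloud-provider.com", "evil-my-cloud-provider.com")
```

Anchoring the comparison on the leading dot of the suffix keeps look-alike names such as `evil-my-cloud-provider.com` from matching. Whether the parent domain or multi-label subdomains should match is a design choice left to the API definition.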
 | 91 | + | 
 | 92 | +### Future User Stories  | 
 | 93 | + | 
 | 94 | +These are some user stories we want to keep in mind, but due to limitations of  | 
 | 95 | +the existing Network Policy API, cannot be implemented currently. The design  | 
 | 96 | +goal in this case is to ensure we do not make these unimplementable down the  | 
 | 97 | +line.  | 
 | 98 | + | 
 | 99 | +* As a cluster admin, I want to block all cluster egress traffic by default, and  | 
 | 100 | +  require namespace admins to create NetworkPolicies explicitly allowing egress  | 
 | 101 | +  to the domains they need to talk to.  | 
 | 102 | + | 
 | 103 | +  The Cluster admin would use a `BaselineAdminNetworkPolicy` object to switch  | 
 | 104 | +  the default disposition of the cluster. Namespace admins would then use a FQDN  | 
 | 105 | +  selector in the Kubernetes `NetworkPolicy` objects to allow `my-service.com`.  | 
 | 106 | +    | 
 | 107 | +## API  | 
 | 108 | + | 
 | 109 | +TODO: https://github.com/kubernetes-sigs/network-policy-api/issues/133  | 
 | 110 | + | 
 | 111 | +## Alternatives  | 
 | 112 | + | 
 | 113 | +### IP Block Selector  | 
 | 114 | + | 
 | 115 | +IP blocks are an important tool for specifying Network Policies. However, they  | 
 | 116 | +do not address all user needs and have a few shortcomings when compared to FQDN  | 
 | 117 | +selectors:  | 
 | 118 | + | 
 | 119 | +* IP-based selectors can become verbose if a single logical service has numerous  | 
 | 120 | +  IPs backing it.  | 
 | 121 | +* IP-based selectors pose an ongoing maintenance burden for administrators, who  | 
 | 122 | +  need to be aware of changing IPs.  | 
 | 123 | +* IP-based selectors can result in policies that are difficult to read and  | 
 | 124 | +  audit.  | 
 | 125 | + | 
 | 126 | +### L4 Proxy  | 
 | 127 | + | 
 | 128 | +Users can also configure a L4 Proxy (e.g. using SOCKS) to inspect their traffic  | 
 | 129 | +and implement egress firewalls. Such proxies present a few trade-offs when compared to a  | 
 | 130 | +FQDN selector:  | 
 | 131 | + | 
 | 132 | +* The proxy application itself adds a configuration and maintenance  | 
 | 133 | +  burden.  | 
 | 134 | +* New routes must be configured to direct traffic leaving the application to  | 
 | 135 | +  the L4 proxy.  | 
 | 136 | + | 
 | 137 | +### L7 Policy  | 
 | 138 | + | 
 | 139 | +Another alternative is to provide a L7 selector, similar to the policies  | 
 | 140 | +provided by Service Mesh providers. While L7 selectors can offer more  | 
 | 141 | +expressivity, they often come with trade-offs that are not suitable for all users:  | 
 | 142 | + | 
 | 143 | +* L7 selectors necessarily support a select set of protocols. Users may be  | 
 | 144 | +  using a custom protocol for application-level communication, but still want  | 
 | 145 | +  the ability to specify endpoints using DNS.  | 
 | 146 | +* L7 selectors often require proxies to perform deep packet inspection and  | 
 | 147 | +  enforce the policies. These proxies can introduce undesirable latency in  | 
 | 148 | +  the datapath of applications.  | 
 | 149 | + | 
 | 150 | +## References  | 
 | 151 | + | 
 | 152 | +* [NPEP #126](https://github.com/kubernetes-sigs/network-policy-api/issues/126):  | 
 | 153 | +  Egress Control in ANP  | 
 | 154 | + | 
 | 155 | +### Implementations  | 
 | 156 | + | 
 | 157 | +* [Antrea](https://antrea.io/docs/main/docs/antrea-network-policy/#fqdn-based-filtering)  | 
 | 158 | +* [Calico](https://docs.tigera.io/calico-enterprise/latest/network-policy/domain-based-policy)  | 
 | 159 | +* [Cilium](https://docs.cilium.io/en/latest/security/policy/language/#dns-based)  | 
 | 160 | +* [OpenShift](https://docs.openshift.com/container-platform/latest/networking/openshift_sdn/configuring-egress-firewall.html)  | 
 | 161 | + | 
 | 162 | +The following is a best-effort breakdown of capabilities of different  | 
 | 163 | +NetworkPolicy providers, as of 2023-09-25. This information may be out of date  | 
 | 164 | +or inaccurate.  | 
 | 165 | + | 
 | 166 | +|                | Antrea                         | Calico       | Cilium       | OpenShift <br/> (current) | OpenShift <br/> (future) |  | 
 | 167 | +| -------------- | ------------------------------ | ------------ | ------------ | ------------------------- | ------------------------ |  | 
 | 168 | +| Implementation | DNS Snooping <br/> + Async DNS | DNS Snooping | DNS Snooping | Async DNS                 | DNS Snooping             |  | 
 | 169 | +| Wildcards      | ✔                              | ✔            | ✔            | ❌                         | ✔                        |  | 
 | 170 | +| Egress Rules   | ✔                              | ✔            | ✔            | ✔                         | ✔                        |  | 
 | 171 | +| Ingress Rules  | ❌                              | ❌            | ❌            | ❌                         | ❌                        |  | 
 | 172 | +| Allow Rules    | ✔                              | ✔            | ✔            | ✔                         | ✔                        |  | 
 | 173 | +| Deny Rules     | ✔                              | ❌(?)         | ❌            | ✔                         | ❌(?)                     |  | 