Support Upstream Failover in k8s Gateway API #10651

Open
pszeto opened this issue Feb 21, 2025 · 2 comments

pszeto commented Feb 21, 2025

Gloo Edge Product

Open Source

Gloo Edge Version

1.18.x

Is your feature request related to a problem? Please describe.

Upstream failover does not appear to be supported when routing through the Kubernetes Gateway API. It works when it is configured through the Gloo Edge API.

Using the Edge API on 1.18.4:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.solo.io/v1","kind":"VirtualService","metadata":{"annotations":{},"name":"vs-httpbin","namespace":"gloo-system"},"spec":{"virtualHost":{"domains":["*"],"routes":[{"matchers":[{"prefix":"/httpbin"}],"options":{"prefixRewrite":"/"},"routeAction":{"single":{"upstream":{"name":"httpbin-static","namespace":"gloo-system"}}}},{"directResponseAction":{"status":200},"matchers":[{"prefix":"/"}],"options":{"prefixRewrite":"/"}}]}}}
  creationTimestamp: "2025-02-21T18:37:53Z"
  generation: 14
  name: vs-httpbin
  namespace: gloo-system
  resourceVersion: "37210"
  uid: 7a147e92-3a5c-413e-a605-603ffe3c1dd1
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      options:
        headerManipulation:
          requestHeadersToAdd:
          - header:
              key: source
              value: gloo-gateway-1-18-edge-api
      routeAction:
        single:
          upstream:
            name: httpbin-priority-endpoint
            namespace: httpbin
status:
  statuses:
    gloo-system:
      reportedBy: gloo
      state: Accepted
      subresourceStatuses:
        '*v1.Proxy.gateway-proxy_gloo-system':
          reportedBy: gloo
          state: Accepted
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gloo.solo.io/v1","kind":"Upstream","metadata":{"annotations":{},"name":"httpbin-priority-endpoint","namespace":"httpbin"},"spec":{"failover":{"policy":{"overprovisioningFactor":600},"prioritizedLocalities":[{"localityEndpoints":[{"lbEndpoints":[{"address":"failover-gateway.duckdns.org","port":80}],"locality":{"region":"west","zone":"alpha"}}]}]},"healthChecks":[{"healthyThreshold":2,"httpHealthCheck":{"path":"/status/200"},"interval":"10s","timeout":"1s","unhealthyThreshold":3}],"ignoreHealthOnHostRemoval":true,"loadBalancerConfig":{"healthyPanicThreshold":10},"outlierDetection":{"baseEjectionTime":"30s","consecutive5xx":3,"interval":"10s","maxEjectionPercent":100},"static":{"hosts":[{"addr":"primary-gateway.duckdns.org","port":80}]}}}
  creationTimestamp: "2025-02-21T18:38:07Z"
  generation: 2
  name: httpbin-priority-endpoint
  namespace: httpbin
  resourceVersion: "4771"
  uid: dee21e16-5d57-4590-8a97-6ffccd68d335
spec:
  failover:
    policy:
      overprovisioningFactor: 600
    prioritizedLocalities:
    - localityEndpoints:
      - lbEndpoints:
        - address: failover-gateway.duckdns.org
          port: 80
        locality:
          region: west
          zone: alpha
  healthChecks:
  - healthyThreshold: 2
    httpHealthCheck:
      path: /status/200
    interval: 10s
    timeout: 1s
    unhealthyThreshold: 3
  ignoreHealthOnHostRemoval: true
  loadBalancerConfig:
    healthyPanicThreshold: 10
  outlierDetection:
    baseEjectionTime: 30s
    consecutive5xx: 3
    interval: 10s
    maxEjectionPercent: 100
  static:
    hosts:
    - addr: primary-gateway.duckdns.org
      port: 80
status:
  statuses:
    gloo-system:
      reportedBy: gloo
      state: Accepted

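For context on what the `failover` policy above is expected to do once it reaches Envoy, here is a minimal sketch of the priority load formula described in Envoy's priority-level load balancing documentation. It is a floating-point approximation for illustration, not Envoy's actual implementation; the function name `priority_loads` is hypothetical, and panic-mode handling is not modeled.

```python
def priority_loads(healthy, total, overprovisioning_factor=140):
    """Return the percentage of traffic each priority level would receive.

    healthy[i] and total[i] are host counts for priority i. The factor is a
    percentage (140 means 1.4x), matching the load_assignment policy field.
    """
    factor = overprovisioning_factor / 100.0
    # Health of each priority level, capped at 100.
    health = [
        min(100.0, factor * 100.0 * h / t) if t else 0.0
        for h, t in zip(healthy, total)
    ]
    normalized_total = min(100.0, sum(health))
    if normalized_total == 0:
        # No healthy hosts at any priority; panic behavior is not modeled here.
        return [0.0] * len(health)
    loads, remaining = [], 100.0
    for h in health:
        # Each level takes its share of whatever earlier levels left over.
        load = min(remaining, h * 100.0 / normalized_total)
        loads.append(load)
        remaining -= load
    return loads


# One primary host (priority 0) and one failover host (priority 1),
# with overprovisioningFactor 600 as in the Upstream above.
print(priority_loads([1, 1], [1, 1], 600))  # primary healthy
print(priority_loads([0, 1], [1, 1], 600))  # primary unhealthy
```

Under these assumptions, all traffic stays on the primary while it is healthy and moves to the priority 1 endpoint once the primary is marked unhealthy, which is the behavior this issue expects from both APIs.
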
The following cluster is generated:

{
     "version_info": "11970201800754775813",
     "cluster": {
      "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
      "name": "httpbin-priority-endpoint_httpbin",
      "type": "STRICT_DNS",
      "connect_timeout": "5s",
      "health_checks": [
       {
        "timeout": "1s",
        "interval": "10s",
        "unhealthy_threshold": 3,
        "healthy_threshold": 2,
        "http_health_check": {
         "path": "/status/200"
        }
       }
      ],
      "dns_lookup_family": "V4_ONLY",
      "outlier_detection": {
       "consecutive_5xx": 3,
       "interval": "10s",
       "base_ejection_time": "30s",
       "max_ejection_percent": 100
      },
      "metadata": {},
      "common_lb_config": {
       "healthy_panic_threshold": {
        "value": 10
       }
      },
      "ignore_health_on_host_removal": true,
      "load_assignment": {
       "cluster_name": "httpbin-priority-endpoint_httpbin",
       "endpoints": [
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "primary-gateway.duckdns.org",
              "port_value": 80
             }
            },
            "health_check_config": {
             "hostname": "primary-gateway.duckdns.org"
            },
            "hostname": "primary-gateway.duckdns.org"
           },
           "metadata": {
            "filter_metadata": {
             "envoy.transport_socket_match": {
              "primary-gateway.duckdns.org;primary-gateway.duckdns.org:80": true
             }
            }
           }
          }
         ]
        },
        {
         "locality": {
          "region": "west",
          "zone": "alpha"
         },
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "failover-gateway.duckdns.org",
              "port_value": 80
             }
            },
            "health_check_config": {
             "hostname": "failover-gateway.duckdns.org"
            },
            "hostname": "failover-gateway.duckdns.org"
           }
          }
         ],
         "priority": 1
        }
       ],
       "policy": {
        "overprovisioning_factor": 600
       }
      }
     },
     "last_updated": "2025-02-21T18:38:09.699Z"
    }

The generated cluster includes failover-gateway.duckdns.org as an lb_endpoint with priority 1. However, when I keep the same Upstream and point an HTTPRoute, attached to a Kubernetes Gateway API Gateway, at it:

apiVersion: gateway.solo.io/v1
kind: VirtualHostOption
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.solo.io/v1","kind":"VirtualHostOption","metadata":{"annotations":{},"name":"header-manipulation","namespace":"gloo-system"},"spec":{"options":{"headerManipulation":{"requestHeadersToAdd":[{"header":{"key":"source","value":"gloo-gateway-1-18-k8-api"}}]}},"targetRefs":[{"group":"gateway.networking.k8s.io","kind":"Gateway","name":"http-gateway","namespace":"gloo-system"}]}}
  creationTimestamp: "2025-02-21T19:17:37Z"
  generation: 2
  name: header-manipulation
  namespace: gloo-system
  resourceVersion: "1076279"
  uid: 56aabd0e-91b3-45c5-a6f1-01129a8513d7
spec:
  options:
    headerManipulation:
      requestHeadersToAdd:
      - header:
          key: source
          value: gloo-gateway-1-18-k8-api
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http-gateway
    namespace: gloo-system
status:
  statuses:
    gloo-system:
      reportedBy: gloo-kube-gateway
      state: Accepted
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    gateway.gloo.solo.io/gateway-parameters-name: custom-gw-params
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.networking.k8s.io/v1","kind":"Gateway","metadata":{"annotations":{"gateway.gloo.solo.io/gateway-parameters-name":"custom-gw-params"},"name":"http-gateway","namespace":"gloo-system"},"spec":{"gatewayClassName":"gloo-gateway","listeners":[{"allowedRoutes":{"namespaces":{"from":"All"}},"name":"http","port":80,"protocol":"HTTP"}]}}
  creationTimestamp: "2025-02-20T21:05:39Z"
  generation: 1
  name: http-gateway
  namespace: gloo-system
  resourceVersion: "932915"
  uid: f33b0e48-8176-4e64-8246-876660311669
spec:
  gatewayClassName: gloo-gateway
  listeners:
  - allowedRoutes:
      namespaces:
        from: All
    name: http
    port: 80
    protocol: HTTP
status:
  addresses:
  - type: IPAddress
    value: 104.196.213.15
  conditions:
  - lastTransitionTime: "2025-02-20T21:05:39Z"
    message: ""
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-02-20T21:05:39Z"
    message: ""
    observedGeneration: 1
    reason: Programmed
    status: "True"
    type: Programmed
  listeners:
  - attachedRoutes: 1
    conditions:
    - lastTransitionTime: "2025-02-20T21:05:39Z"
      message: ""
      observedGeneration: 1
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2025-02-20T21:05:39Z"
      message: ""
      observedGeneration: 1
      reason: NoConflicts
      status: "False"
      type: Conflicted
    - lastTransitionTime: "2025-02-20T21:05:39Z"
      message: ""
      observedGeneration: 1
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    - lastTransitionTime: "2025-02-20T21:05:39Z"
      message: ""
      observedGeneration: 1
      reason: Programmed
      status: "True"
      type: Programmed
    name: http
    supportedKinds:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gloo.solo.io/v1","kind":"Upstream","metadata":{"annotations":{},"name":"httpbin-priority-endpoint","namespace":"httpbin"},"spec":{"failover":{"policy":{"overprovisioningFactor":600},"prioritizedLocalities":[{"localityEndpoints":[{"lbEndpoints":[{"address":"failover-gateway.duckdns.org","port":80}],"locality":{"region":"west","zone":"alpha"}}]}]},"healthChecks":[{"healthyThreshold":2,"httpHealthCheck":{"path":"/status/200"},"interval":"10s","timeout":"1s","unhealthyThreshold":3}],"ignoreHealthOnHostRemoval":true,"loadBalancerConfig":{"healthyPanicThreshold":10},"outlierDetection":{"baseEjectionTime":"30s","consecutive5xx":3,"interval":"10s","maxEjectionPercent":100},"static":{"hosts":[{"addr":"primary-gateway.duckdns.org","port":80}]}}}
  creationTimestamp: "2025-02-21T16:32:02Z"
  generation: 12
  name: httpbin-priority-endpoint
  namespace: httpbin
  resourceVersion: "988349"
  uid: 444dcec4-cb8f-4863-9e58-276c52f80ba8
spec:
  failover:
    policy:
      overprovisioningFactor: 600
    prioritizedLocalities:
    - localityEndpoints:
      - lbEndpoints:
        - address: failover-gateway.duckdns.org
          port: 80
        locality:
          region: west
          zone: alpha
  healthChecks:
  - healthyThreshold: 2
    httpHealthCheck:
      path: /status/200
    interval: 10s
    timeout: 1s
    unhealthyThreshold: 3
  ignoreHealthOnHostRemoval: true
  loadBalancerConfig:
    healthyPanicThreshold: 10
  outlierDetection:
    baseEjectionTime: 30s
    consecutive5xx: 3
    interval: 10s
    maxEjectionPercent: 100
  static:
    hosts:
    - addr: primary-gateway.duckdns.org
      port: 80
status:
  statuses:
    gloo-system:
      reportedBy: gloo
      state: Accepted
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.networking.k8s.io/v1","kind":"HTTPRoute","metadata":{"annotations":{},"name":"httpbin-static-httproute","namespace":"httpbin"},"spec":{"parentRefs":[{"group":"gateway.networking.k8s.io","kind":"Gateway","name":"http-gateway","namespace":"gloo-system"}],"rules":[{"backendRefs":[{"group":"gloo.solo.io","kind":"Upstream","name":"httpbin-priority-endpoint","weight":1}],"matches":[{"path":{"type":"PathPrefix","value":"/"}}]}]}}
  creationTimestamp: "2025-02-20T21:05:42Z"
  generation: 10
  name: httpbin-static-httproute
  namespace: httpbin
  resourceVersion: "982285"
  uid: 51ad6bac-0bb6-4643-8756-4450a3794058
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: http-gateway
    namespace: gloo-system
  rules:
  - backendRefs:
    - group: gloo.solo.io
      kind: Upstream
      name: httpbin-priority-endpoint
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
status:
  parents:
  - conditions:
    - lastTransitionTime: "2025-02-20T21:05:43Z"
      message: ""
      observedGeneration: 10
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2025-02-20T21:11:38Z"
      message: ""
      observedGeneration: 10
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: solo.io/gloo-gateway
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: http-gateway
      namespace: gloo-system

The generated Envoy cluster does not include the failover address in its lb_endpoints:

 {
     "version_info": "1345311607506656158",
     "cluster": {
      "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
      "name": "httpbin-priority-endpoint_httpbin",
      "type": "STRICT_DNS",
      "connect_timeout": "5s",
      "health_checks": [
       {
        "timeout": "1s",
        "interval": "10s",
        "unhealthy_threshold": 3,
        "healthy_threshold": 2,
        "http_health_check": {
         "path": "/status/200"
        }
       }
      ],
      "dns_lookup_family": "V4_ONLY",
      "outlier_detection": {
       "consecutive_5xx": 3,
       "interval": "10s",
       "base_ejection_time": "30s",
       "max_ejection_percent": 100
      },
      "metadata": {},
      "common_lb_config": {
       "healthy_panic_threshold": {
        "value": 10
       }
      },
      "alt_stat_name": "httpbin-priority-endpoint_httpbin",
      "ignore_health_on_host_removal": true,
      "load_assignment": {
       "cluster_name": "httpbin-priority-endpoint_httpbin",
       "endpoints": [
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "primary-gateway.duckdns.org",
              "port_value": 80
             }
            },
            "health_check_config": {
             "hostname": "primary-gateway.duckdns.org"
            },
            "hostname": "primary-gateway.duckdns.org"
           },
           "metadata": {
            "filter_metadata": {
             "envoy.transport_socket_match": {
              "primary-gateway.duckdns.org;primary-gateway.duckdns.org:80": true
             }
            }
           }
          }
         ]
        }
       ]
      }
     },
     "last_updated": "2025-02-21T17:47:24.286Z"
    }

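The difference between the two cluster dumps above can also be checked programmatically. The sketch below assumes each entry has the shape shown in this issue (as it would appear in the Envoy admin `/config_dump` output); the helper names `endpoints_by_priority` and `has_failover` are hypothetical, and the sample data is a trimmed copy of the two dumps.

```python
import json


def endpoints_by_priority(cluster_entry):
    """Map priority -> list of "host:port" strings for one cluster dump entry."""
    result = {}
    assignment = cluster_entry["cluster"].get("load_assignment", {})
    for locality in assignment.get("endpoints", []):
        # "priority" is omitted from the JSON when it is 0.
        priority = locality.get("priority", 0)
        for lb in locality.get("lb_endpoints", []):
            sock = lb["endpoint"]["address"]["socket_address"]
            result.setdefault(priority, []).append(
                f'{sock["address"]}:{sock["port_value"]}'
            )
    return result


def has_failover(cluster_entry):
    """True if any endpoint is assigned a priority greater than 0."""
    return any(p > 0 for p in endpoints_by_priority(cluster_entry))


# Trimmed copies of the two dumps from this issue.
edge_api = json.loads("""
{"cluster": {"load_assignment": {"endpoints": [
  {"lb_endpoints": [{"endpoint": {"address": {"socket_address":
    {"address": "primary-gateway.duckdns.org", "port_value": 80}}}}]},
  {"priority": 1, "lb_endpoints": [{"endpoint": {"address": {"socket_address":
    {"address": "failover-gateway.duckdns.org", "port_value": 80}}}}]}
]}}}
""")
k8s_api = json.loads("""
{"cluster": {"load_assignment": {"endpoints": [
  {"lb_endpoints": [{"endpoint": {"address": {"socket_address":
    {"address": "primary-gateway.duckdns.org", "port_value": 80}}}}]}
]}}}
""")

print("edge api:", endpoints_by_priority(edge_api), has_failover(edge_api))
print("k8s api: ", endpoints_by_priority(k8s_api), has_failover(k8s_api))
```

A check like this could serve as a quick regression test once failover is translated for the Kubernetes Gateway API path.
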
So when you send a request to the gateway after the upstream host primary-gateway.duckdns.org:80 goes down, traffic never switches to the failover endpoint:

curl http://104.196.213.15/get -kv
*   Trying 104.196.213.15:80...
* Connected to 104.196.213.15 (104.196.213.15) port 80
> GET /get HTTP/1.1
> Host: 104.196.213.15
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 503 Service Unavailable
< content-length: 91
< content-type: text/plain
< date: Fri, 21 Feb 2025 21:09:34 GMT
< server: envoy
<
* Connection #0 to host 104.196.213.15 left intact
upstream connect error or disconnect/reset before headers. reset reason: connection timeout

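To observe the behavior over several requests rather than a single curl, a small probe along these lines may help. It is a sketch using only the Python standard library; `fetch_status` and `probe` are hypothetical helper names, and the gateway address is the one from the Gateway status in this issue.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the code is what we want to count.
        return err.code


def probe(url, attempts=10, fetch=fetch_status):
    """Send `attempts` requests and count the status codes seen."""
    return Counter(fetch(url) for _ in range(attempts))


# Example (performs real network requests):
# print(probe("http://104.196.213.15/get"))
```

With working failover, one would expect the 503 responses to give way to 200 once the health checks mark the primary unhealthy; with the configuration reported here, the 503 responses persist. Connection-level failures (`urllib.error.URLError`) are deliberately not caught in this sketch.
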
Describe the solution you'd like

Feature parity with the Edge API.

Describe alternatives you've considered

No response

Additional Context

No response

@soloio-bot

Zendesk ticket #5430 has been linked to this issue.

erie149 commented Feb 21, 2025

I would also note that it does not work for "kube" Upstream types either. In fact, no lb_endpoints are ever created in Envoy.
