[bitnami/etcd] retriable preupgrade.sh #82322

### Name and Version

bitnami/etcd:3.5.21-debian-12-r5

### What is the problem this feature will solve?

The `preupgrade.sh` script, which runs as a Helm hook during `helm upgrade etcd ...`, starts too quickly and occasionally initiates a network connection before the pod's networking is ready. When `etcdctl` opens a socket to contact the etcd cluster and retrieve the member list, Kubernetes networking may not be fully initialized yet, so `etcdctl` times out after 5s. Increasing the timeout does not help, because the socket is already open and the packets have already been sent.

In our case, we use Calico networking, and our logs show that the eth0 endpoint is brought up about 300 ms after the preupgrade hook runs `etcdctl member list`. The preupgrade hook fails even though the etcd cluster is fully operational:

```
2025-06-13T07:17:22.755728886Z etcd 07:17:22.75 INFO  ==> Welcome to the Bitnami etcd container
2025-06-13T07:17:22.756982359Z etcd 07:17:22.75 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
2025-06-13T07:17:22.758264511Z etcd 07:17:22.75 INFO  ==> Did you know there are enterprise versions of the Bitnami catalog? For enhanced secure software supply chain features, unlimited pulls from Docker, LTS support, or application customization, see Bitnami Premium or Tanzu Application Catalog. See https://www.arrow.com/globalecs/na/vendors/bitnami/ for more information.
2025-06-13T07:17:22.759453290Z etcd 07:17:22.75 INFO  ==> 
2025-06-13T07:17:22.851452032Z 
2025-06-13T07:17:28.149734556Z {"level":"warn","ts":"2025-06-13T07:17:28.149528Z","logger":"etcd-client","caller":"v3@v3.5.21/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0x4000438000/etcd-0.etcd-headless.dev-latest.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
2025-06-13T07:17:28.149770159Z Error: context deadline exceeded
2025-06-13T07:17:28.152789715Z etcd 07:17:28.15 ERROR ==> Unable to list members, are all members healthy?
```

I know it sounds weird to complain that "app is too fast" 😕, but...

### What is the feature you are proposing to solve the problem?

Allow more control over when (or how many times) the initial `etcdctl member list` is run:
* An optional, configurable `sleep $DELAY` before running `etcdctl`: easy and good enough.
* Configurable retries when the `etcdctl` command times out. This is a more complex solution with various edge cases, because it can't distinguish **why** `etcdctl` failed (disconnected network or unresponsive etcd?).
* A completely different, Kubernetes-only approach, e.g. modifying the bitnami/etcd chart to add an initContainer that checks network availability first (how?).
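
To make the first two options concrete, here is a minimal sketch of what an initial delay plus retries could look like. It is not taken from the current `preupgrade.sh`: the helper name `retry_cmd` and the variables `ETCD_PREUPGRADE_START_DELAY` and `ETCD_PREUPGRADE_RETRIES` are hypothetical, and the real script passes additional arguments to `etcdctl` (endpoints, auth) that are omitted here.

```shell
# Hypothetical helper, not part of the current preupgrade.sh.
# Usage: retry_cmd <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached.
retry_cmd() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        # Run the command in the current shell; stop on first success.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Command failed after ${attempt} attempt(s): $*" >&2
            return 1
        fi
        echo "Attempt ${attempt}/${max_attempts} failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Possible use inside the hook (variable names are assumptions):
#   sleep "${ETCD_PREUPGRADE_START_DELAY:-0}"
#   retry_cmd "${ETCD_PREUPGRADE_RETRIES:-3}" 2 etcdctl member list
```

With both values left at their defaults, the behavior would stay close to today's: no initial delay and a small number of attempts. As noted above, the retry loop still cannot tell a not-yet-ready network from an unresponsive etcd.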

Any other comments are warmly welcome.


### What alternatives have you considered?

For now, I managed to decrease the failure probability from 100% to 60% by lowering the container's CPU limit to 100m. Since the preupgrade job is retried six times in case of failure, the helm upgrade usually succeeds. Decreasing the CPU limit for that job further would probably improve the chances of a successful run, but I don't much like this approach.

