three_data_hall inconsistent connection string problem #2171

Open
@gm42

What happened?

I found an issue when using three_data_hall: the status of the Kubernetes FoundationDB cluster resource contains an outdated connection string that does not match the one the cluster is currently using:

A:  connectionString:     mydb:[email protected]:4501,10.20.22.84:4501,10.20.60.215:4501,10.20.67.98:4501,10.20.81.100:4501,10.10.265.72:4501,10.20.217.63:4501,10.20.240.122:4501,10.20.243.153:4501
A:  seedConnectionString: <none>

B:  seedConnectionString: mydb:[email protected]:4501,10.20.22.192:4501,10.20.67.101:4501,10.10.210.33:4501,10.10.241.115:4501,10.10.268.53:4501,10.10.277.35:4501,10.10.285.107:4501,10.20.220.3:4501
B:  connectionString: mydb:[email protected]:4501,10.20.60.215:4501,10.20.67.98:4501,10.20.81.100:4501,10.10.220.14:4501,10.10.265.72:4501,10.20.200.1:4501,10.20.217.63:4501,10.20.243.153:4501

C:  seedConnectionString: mydb:[email protected]:4501,10.20.22.192:4501,10.20.67.101:4501,10.10.210.33:4501,10.10.241.115:4501,10.10.268.53:4501,10.10.277.35:4501,10.10.285.107:4501,10.20.220.3:4501
C:  connectionString: mydb:[email protected]:4501,10.20.22.84:4501,10.20.60.215:4501,10.20.67.98:4501,10.20.81.100:4501,10.10.265.72:4501,10.20.217.63:4501,10.20.240.122:4501,10.20.243.153:4501

When checking directly via get \xff\xff/connection_string, the correct connection string for the cluster appears to be the one with generation ID 9pION4FdMvW53gB4ikdjo61cs7HQKtK3.

What did you expect to happen?

The operator-maintained connection string field should always match get \xff\xff/connection_string.
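
A minimal sketch of the comparison this expectation implies, assuming the usual `description:id@address,address,...` layout of a connection string and assuming the order of the coordinator addresses is not significant. The names `parse_connection_string` and `same_cluster_file` are hypothetical helpers, not part of the operator or of FoundationDB, and the values in the usage note are made up:

```python
from typing import FrozenSet, NamedTuple


class ClusterFile(NamedTuple):
    description: str
    generation_id: str
    coordinators: FrozenSet[str]


def parse_connection_string(raw: str) -> ClusterFile:
    """Split 'description:id@addr,addr,...' into its three parts."""
    prefix, sep, addresses = raw.strip().partition("@")
    if not sep:
        raise ValueError(f"missing '@' in connection string: {raw!r}")
    description, sep, generation_id = prefix.partition(":")
    if not sep or not description or not generation_id:
        raise ValueError(f"missing 'description:id' prefix: {raw!r}")
    # Assumption: coordinator order carries no meaning, so use a set.
    coordinators = frozenset(a.strip() for a in addresses.split(",") if a.strip())
    if not coordinators:
        raise ValueError(f"no coordinators in connection string: {raw!r}")
    return ClusterFile(description, generation_id, coordinators)


def same_cluster_file(a: str, b: str) -> bool:
    """True if both strings have the same description, generation ID and coordinators."""
    return parse_connection_string(a) == parse_connection_string(b)
```

For example, `same_cluster_file("mydb:abc123@10.0.0.1:4501,10.0.0.2:4501", "mydb:abc123@10.0.0.2:4501,10.0.0.1:4501")` is true, while changing the generation ID in either string makes it false. One argument would be the value from the cluster status and the other the value returned by get \xff\xff/connection_string.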

How can we reproduce it (as minimally and precisely as possible)?

  1. Trigger a rotation of some pods in data hall A, for example the coordinators, by changing something in their podTemplate.
  2. Repeat for halls B and C until the issue manifests.

Anything else we need to know?

Related: #1958

By my analysis, the client issues are a symptom, not the cause: as long as the operator-provided ConfigMap does not contain the same connection string for all halls, some client will always have an incorrect one.
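
The consistency condition described above can be sketched as a small check over the connection strings collected from each hall. This is only an illustration: `group_by_connection_string` and `is_consistent` are hypothetical names, the strings are compared verbatim after trimming whitespace, and how the per-hall values are collected (for example from each hall's ConfigMap) is left out:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_connection_string(reported: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct connection string to the halls that report it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for hall, connection_string in reported.items():
        groups[connection_string.strip()].append(hall)
    return {cs: sorted(halls) for cs, halls in groups.items()}


def is_consistent(reported: Dict[str, str]) -> bool:
    """True if every hall reports the same connection string."""
    return len(group_by_connection_string(reported)) <= 1


if __name__ == "__main__":
    # Made-up values for illustration only.
    reported = {
        "A": "mydb:gen2@10.0.0.1:4501,10.0.0.2:4501",
        "B": "mydb:gen1@10.0.0.1:4501,10.0.0.3:4501",
        "C": "mydb:gen2@10.0.0.1:4501,10.0.0.2:4501",
    }
    for cs, halls in group_by_connection_string(reported).items():
        print(f"{', '.join(halls)}: {cs}")
    print("consistent:", is_consistent(reported))
```

With the made-up values in the `__main__` block, halls A and C fall into one group and hall B into another, so the check reports an inconsistency.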

FDB Kubernetes operator

v1.47.0

Kubernetes version

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4-eks-a737599

Cloud provider
