fix(Helm)!: Refresh helm #880

Merged
merged 23 commits into from
Jul 1, 2025

Conversation

DiamondJoseph
Contributor

@DiamondJoseph DiamondJoseph commented Mar 28, 2025

Re-applies blueapi-specific configuration over the result of a fresh `helm create blueapi`, recovering configuration that had previously been removed because it was thought unnecessary.

Breaking changes to the helm chart:

  • Ingress form changes:
    • `create` is now `enabled`
    • `host` is now:

      ```yaml
      hosts:
        - host: foo.diamond.ac.uk
          paths:
            - path: /
              pathType: Prefix
      ```

Changes worth considering:

  • Can define arbitrary `volumes` and `volumeMounts`
  • Liveness/Readiness probes use the new `/healthz` endpoint
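As a sketch of the first point, arbitrary volumes might be wired through values.yaml like this (the volume name and mount path below are illustrative examples, not taken from the chart):

```yaml
# values.yaml (sketch): keys follow standard `helm create` conventions;
# the volume name and mountPath are hypothetical examples.
volumes:
  - name: scratch
    emptyDir: {}

volumeMounts:
  - name: scratch
    mountPath: /scratch
```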


codecov bot commented Mar 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.44%. Comparing base (3f6fd8c) to head (a275362).
Report is 2 commits behind head on main.

Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #880   +/-   ##
=======================================
  Coverage   94.44%   94.44%
=======================================
  Files          41       41
  Lines        2537     2537
=======================================
  Hits         2396     2396
  Misses        141      141
```


@callumforrester
Contributor

Feels like this should really go in behind #871

@DiamondJoseph DiamondJoseph force-pushed the refresh-helm branch 6 times, most recently from 114042e to 6e4908b Compare April 1, 2025 12:23
```yaml
OTLP_EXPORT_ENABLED: "true"
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: {{ .Values.tracing.otlp.protocol | default "http/protobuf" }}
OTEL_EXPORTER_OTLP_ENDPOINT: {{ required "OTLP export enabled but server address not set" .Values.tracing.otlp.server.host }}:{{ .Values.tracing.otlp.server.port | default 4318 }}
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: {{ default "http/protobuf" .Values.tracing.otlp.protocol }}
```
Contributor Author

`.Values.tracing.otlp` must exist for us to have reached here, but `protocol` could be None, so use the default.

@DiamondJoseph DiamondJoseph mentioned this pull request Apr 2, 2025
Contributor

@callumforrester callumforrester left a comment

I'll be honest, I'm not very happy about merging this. The majority of helm changes we have merged have included bugs/regressions; we should have done something like #871 a long time ago. Most of the time we have made small, incremental changes to the helm chart so we could feel more confident, but even those broke. This, meanwhile, is a big change to a lot of high-risk areas. #871 is a good start but is not a comprehensive test suite. I think to merge this I will need:

  • More tests
  • An analysis of breaking changes
  • A migration guide to go into the changelog
  • Any areas of uncertainty we can explore on the test rigs

Contributor

Is there any point in adding HPA to an inherently stateful service?

Contributor Author

Not much, but I could see a use case if we suddenly decide to start running a bunch of simulated device beamlines, scaling out as they become busy (if our readiness probe started reflecting whether the pod had an active task, perhaps). At the end of the day, it comes for free with `helm create`, and if we remove it, it's one more thing to remove every time we refresh from `helm create` again. I don't think it harms anyone to have it available but turned off.

```
@@ -16,44 +12,6 @@ jobs:
with:
# Need this to get version number from last tag
fetch-depth: 0
- name: Validate SemVer2 version compliance
```
Contributor

I think we may have already discussed the pros and cons of getting rid of this, but I don't remember; I think it should stay.

Contributor Author

It is no longer required, as the blueapi release tag is no longer the version of the Helm chart, and therefore does not need to be a semantic version. If you want to enforce that the blueapi container release is a semver it can be restored, but this was added for issues publishing the Helm chart.

Comment on lines 77 to 130

```yaml
livenessProbe:
  {{- toYaml . | nindent 12 }}
{{- end }}
{{- with .Values.readinessProbe }}
readinessProbe:
  {{- toYaml . | nindent 12 }}
{{- end }}
{{- with .Values.resources }}
```
Contributor

Should: I think these were removed because they caused problems; are you trying to fix that with #884?

Contributor Author

Yes, that's the purpose of 884.

Contributor Author

#884 has now gone in, so the probes now hit `/healthz`, which returns a 200 status code.
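For illustration, the probe wiring against the new endpoint might look like this in values.yaml (a sketch; the port name `http` and the absence of tuning values are assumptions, not the chart's actual defaults):

```yaml
# Sketch only: probes hit the new /healthz endpoint on the named port.
livenessProbe:
  httpGet:
    path: /healthz
    port: http
readinessProbe:
  httpGet:
    path: /healthz
    port: http
```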

@DiamondJoseph DiamondJoseph mentioned this pull request Apr 2, 2025
@DiamondJoseph
Contributor Author

Differences by document with ingress/tracing/initContainer enabled:

  • main config: unchanged
  • otel-config: unchanged
  • init container config (screenshot):
    • No longer mounts `enabled` incorrectly in the initContainer applicationConfig. See also potential fix in comments in #887
    • ConfigMap and yaml file names adjusted
  • release-name-blueapi-service (screenshot):
    • Gains labels
    • Mapping of service to container port is now done via the name of the port, meaning the service definition in values.yaml refers to the external port and not the container port, as is the default behaviour.
  • release-name-blueapi-ingress (screenshot):
    • Gains labels
    • `secretName` is explicitly null rather than implicitly, allowing it to be overridden when e.g. not using the Diamond ingress controller
  • test container (screenshot):
    • Names adjusted
    • Uses the version from `appVersion` rather than the 0.1.0 container (in case of adding additional tests for cli functions that do not exist in 0.1.0)
  • statefulset (screenshot)
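The service port change described above can be sketched as follows (a hypothetical fragment; the port name `http` and the values key are assumptions, not the chart's actual contents):

```yaml
# values.yaml (sketch): only the external port is configured here.
service:
  port: 80

# templates/service.yaml (sketch): targetPort refers to the *name* of
# the container port, so the container port can change independently
# of the externally exposed Service port.
ports:
  - port: {{ .Values.service.port }}
    targetPort: http
    protocol: TCP
    name: http
```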

```python
    """)
)
group[name] = manifest
if manifest is not None:
```
Contributor Author

New handling of the init config means it may now be `None` instead of an empty ConfigMap

@DiamondJoseph
Contributor Author

Added Keith and Zoheb for extra points of view, but I won't merge until Callum is happy.

@DiamondJoseph DiamondJoseph changed the title Refresh helm fix (Helm): Refresh helm Apr 23, 2025
@DiamondJoseph DiamondJoseph changed the title fix (Helm): Refresh helm fix(Helm): Refresh helm Apr 23, 2025
Contributor

@ZohebShaikh ZohebShaikh left a comment

The PR looks good overall! One suggestion that I believe would be helpful for developers using this Helm chart is to include a table in the documentation listing all the parameters, along with their names, descriptions, and default values. Something similar to what's done here

@DiamondJoseph DiamondJoseph changed the title fix(Helm): Refresh helm fix(Helm)!: Refresh helm Jun 26, 2025
```markdown
| worker.stomp | object | `{"auth":{"password":"guest","username":"guest"},"enabled":false,"url":"http://rabbitmq:61613/"}` | Message bus configuration for returning status to GDA/forwarding documents downstream |

----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.14.2](https://github.com/norwoodj/helm-docs/releases/v1.14.2)
```
Contributor

Is there a way to not have generated code in the repo, or a way to ensure it stays up to date without someone having to remember to update it?

Contributor Author

The linting will fail if the output should have changed, as it's run as a pre-commit hook
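For reference, a helm-docs pre-commit hook is typically wired up like this (a sketch assuming the upstream norwoodj/helm-docs hook; the revision and chart path shown are illustrative, not this repo's actual config):

```yaml
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/norwoodj/helm-docs
    rev: v1.14.2
    hooks:
      - id: helm-docs
        args:
          # Hypothetical path: point helm-docs at the chart directory
          - --chart-search-root=helm
```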

@DiamondJoseph DiamondJoseph dismissed callumforrester’s stale review June 27, 2025 12:00

He's too busy with the vertical slice sprints to be interested in the likes of us

@ZohebShaikh
Contributor

ZohebShaikh commented Jun 30, 2025

There were some things about the readiness and liveness probes that you are using:

As the pod will take time to start up due to scratch, I was just going through the docs to see whether the probes start only after the initContainer is done booting up.

The other thing was that we could have a cookie created on the fly and passed to the healthz probe as a header, so that on the fastapi side we only accept requests that carry that header, for the probable case of healthz getting DDoS'ed and making blueapi go into a boot-loop.

I also didn't see anything about the pod being restarted if the liveness probe fails.

@DiamondJoseph
Contributor Author

As the pod will take time to start up due to scratch, I was just going through the docs to see whether the probes start only after the initContainer is done booting up.

The probes are defined at the container level, so they do not attempt to run until the initContainer has run to completion. We could have a startupProbe that is slightly more permissive in case we are slow to start up, but with the default resources and nothing in scratch, startup takes ~4s. The probes run every 10s and need to fail 3 times in a row for the pod to die, so our startup is fast enough not to fail too many; whether they're permissive enough not to die on a filesystem blip I'm not sure. We do probably want a startupProbe for the potential 10s to connect to devices, but I don't know if we start serving the API prior to the subprocess being ready?

The other thing was that we could have a cookie created on the fly and passed to the healthz probe as a header, so that on the fastapi side we only accept requests that carry that header, for the probable case of healthz getting DDoS'ed and making blueapi go into a boot-loop.

It's a possibility; I think if we're going to add that it can come after 1.0.0, since it's non-breaking. If the REST API is being hammered enough to freeze and kill the subprocess running a scan, someone is doing something wrong or malicious. After the document store we can consider re-architecting to extract the subprocess; then if the API goes down the scan is unaffected.

I also didn't see anything about the pod being restarted if the liveness probe fails.

If the liveness probe fails `failureThreshold` times (default 3), the pod is killed and subject to its `restartPolicy`.

@ZohebShaikh
Contributor

For the hammering I had a malicious user in mind. If an issue is created for the cookie we can work on it at a later stage...
But I think having a startupProbe with reasonable values would be good, to let the devices start up.
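A startupProbe along those lines might look like the following sketch (all thresholds are illustrative, not agreed values):

```yaml
# Sketch only: gives the worker breathing room to connect to devices.
startupProbe:
  httpGet:
    path: /healthz
    port: http
  # Allow up to 12 x 5s = 60s before the liveness/readiness
  # probes take over.
  periodSeconds: 5
  failureThreshold: 12
```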

Contributor

@ZohebShaikh ZohebShaikh left a comment

It looks good overall; a few minor change requests.

As this is all helm changes, can we see it working with all the configuration before merge? Just for more confidence in the helm changes.

@ZohebShaikh ZohebShaikh self-requested a review July 1, 2025 09:03
@DiamondJoseph DiamondJoseph merged commit 67a645a into main Jul 1, 2025
18 checks passed
@DiamondJoseph DiamondJoseph deleted the refresh-helm branch July 1, 2025 15:35
4 participants