
Helm Chart versions, values and upgrading #1884

Open · jefflill opened this issue Feb 18, 2024 · 2 comments

@jefflill (Collaborator)

We haven't really tried keeping our Helm chart versions in sync with the container images actually being deployed. Our thinking has been that chart versions don't matter because we're not publishing these to a chart repository (e.g. ChartMuseum).

This has worked fine for cluster setup, but it may become a problem when the time comes to support upgrading NEONKUBE clusters. We should be able to upgrade Kubernetes itself, but we'll need to at least bump our chart versions so Helm realizes that a chart has changed and an upgrade needs to be performed, right?
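
As a minimal sketch (the chart name and version numbers here are illustrative, not actual NEONKUBE charts), the `version` field in `Chart.yaml` is what we'd bump:

```yaml
# Chart.yaml (illustrative chart; not an actual NEONKUBE chart)
apiVersion: v2
name: neon-example
description: Example NEONKUBE system chart
version: 0.2.0        # bumped from 0.1.0 so Helm sees the chart as changed
appVersion: "1.2.3"   # should track the container image actually deployed
```

Keeping `version` and `appVersion` in sync with what we actually deploy would at least make it obvious, both to `helm upgrade` and to us, what changed between releases.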

We really don't want to tackle cluster upgrades right now (or even spend time thinking about it). I think we should assume that clusters created by early users won't be fully upgradable, and perhaps not upgradable at all. We should tag our early releases as beta with a warning about the possible lack of upgrades.

jefflill changed the title from "Helm Chart versions and cluster upgrading" to "Helm Chart versions, values and upgrading" on Feb 18, 2024
@jefflill (Collaborator, Author) commented Feb 18, 2024

OpenEBS has completely restructured their Helm charts since we built our v3.2.0 related charts. The new layout embeds cStor, Jiva, and NFS as subcharts of the main chart, whereas we deploy them as separate charts. This makes upgrading a bit of a mess.

After thinking about this, I believe we've been handling Helm charts wrong from the very beginning of this project. We've been focused on just getting NEONKUBE clusters deployed and haven't been thinking much about how we'll upgrade our code or existing clusters in the future (which made sense before a first beta release).

The problem is that we've been manually editing Helm chart values and even other things, like template files. This makes upgrading Helm charts tricky, because we need to diff these files to identify both the changes we've made and the upstream changes made to the official charts. We also pass some values explicitly to Helm when we install charts, which adds yet more complexity.

We should look into refactoring all of this:

  • Use official Helm charts without making any changes
  • Pass any custom values when installing the charts
  • Mark any Helm chart file changes we can't work around some other way with special comments, so they're easy to find during future upgrades.

This should make Helm chart upgrades dramatically easier.
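
For example, a sketch of what installing a pristine upstream chart with our overrides might look like (the values file name is illustrative; the repo URL is the public OpenEBS chart repo):

```sh
# Add the official upstream chart repo (no local edits to the chart):
helm repo add openebs https://openebs.github.io/charts
helm repo update

# Install using the pristine chart, passing all NEONKUBE-specific
# settings at install time instead of editing values.yaml in place:
helm install openebs openebs/openebs \
    --namespace openebs --create-namespace \
    --values neonkube-openebs-values.yaml
```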

NOTE: I ran into this with OpenEBS because their Helm charts have dramatically diverged from what we currently have. They've maintained Helm chart backwards compatibility for folks who just change values, but we get no benefit from that since our charts have diverged so much. I'm going to migrate OpenEBS to this new scheme now, since I essentially have to do the 3-way diff anyway and it looks like we're already passing most custom values to Helm at install time.
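
One way to structure that 3-way diff, assuming we can still identify the upstream chart version we originally vendored (the newer version number and the local paths below are illustrative):

```sh
# Pull the upstream chart version we originally started from and the
# current upstream version:
helm pull openebs/openebs --version 3.2.0  --untar --untardir upstream-old
helm pull openebs/openebs --version 3.10.0 --untar --untardir upstream-new

# Our local edits, relative to the upstream base we vendored:
diff -ru upstream-old/openebs vendored/openebs > our-changes.patch

# Upstream changes since then:
diff -ru upstream-old/openebs upstream-new/openebs > upstream-changes.patch
```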

@jefflill (Collaborator, Author) commented Mar 7, 2024

Here are some remaining upgrade-related tasks:

  • OpenEBS Mayastor is not currently supported. It works like cStor, so this should be straightforward.

  • OpenEBS service monitoring is not currently enabled. We'll probably want to use this (see the ServiceMonitor sketch after this list):

    https://github.com/openebs/monitoring
    https://github.com/openebs/monitoring/blob/161b5af7f7525c223b9165a13db2d6b667d08aad/docs/guide.md?plain=1

  • Grafana is currently very old (v5.x vs. v9.x). There are a ton of cool new features.

  • We should constrain the cStor operator, and perhaps other related components, to run on the storage nodes (see the YAML sketch after this list):

    • Add a node selector for the storage nodes
    • Set replicas to the number of storage nodes
    • localpv-provisioner should probably also be spread across the storage nodes when there are any
    • Use topologySpreadConstraints to ensure that the operator gets spread across the storage nodes
  • Review other components for potential topologySpreadConstraints use.

  • ClusterAdvice: We're currently using IsSmallCluster internally when computing advice. This seems somewhat arbitrary, since a user could deploy a 9-node cluster on huge VMs with huge disks, which probably shouldn't be considered resource constrained.

    We should review this usage and consider taking the actual cluster resources into account when making decisions.

  • We should be calculating and using the new ClusterAdvice.NodeSelector property everywhere.
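
For the OpenEBS monitoring item above, a hedged sketch of what enabling scraping via a Prometheus Operator ServiceMonitor might look like; the label selector and port name are assumptions about how the OpenEBS exporter services are labeled, not confirmed values:

```yaml
# Illustrative ServiceMonitor for OpenEBS metrics; assumes the Prometheus
# Operator CRDs are installed and that the exporter services carry these
# labels/port names (both are assumptions).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: openebs
  namespace: openebs
spec:
  selector:
    matchLabels:
      openebs.io/cas-type: cstor   # assumed service label
  endpoints:
    - port: exporter               # assumed metrics port name
      path: /metrics
```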
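
And for constraining the cStor operator to the storage nodes, a sketch of the relevant Deployment fragment (the node label and app label are hypothetical; NEONKUBE's actual storage-node label may differ):

```yaml
# Illustrative Deployment fragment: pin to storage nodes and spread
# replicas across them (label keys/values are hypothetical).
spec:
  replicas: 3                             # set to the number of storage nodes
  template:
    spec:
      nodeSelector:
        neonkube.io/node.role: storage    # hypothetical storage-node label
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: cstor-operator         # hypothetical component label
```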
