[DPE-9134]: add rollback compatibility #786

Open
skourta wants to merge 9 commits into 2/edge from dpe-9134-rollback-comp-matrix
skourta (Contributor) commented Jan 13, 2026

Description of issue or feature:

Rollbacks are not supported on OpenSearch.

Solution:

This pull request introduces a controlled rollback override mechanism for OpenSearch upgrades, improves upgrade safety by enforcing compatibility checks, and adds infrastructure for managing a compatibility matrix. The changes ensure that rollbacks are only allowed when explicitly supported, and provide a manual override action for exceptional cases, along with robust logging and error handling.

Rollback and upgrade safety enhancements:

  • Added a new force-refresh-start action that lets operators manually override the OpenSearch version on disk to enable a rollback, with warnings about potential data loss and downtime. This action is only available when a rollback is in progress and the unit is outdated.
  • Enforced compatibility checks during rollback: the charm now blocks unsupported rollbacks and instructs the operator to use the new override action if a rollback is incompatible. Clear status messages and logs are provided in these cases.
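
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the charm's code: `RollbackDecision`, `is_rollback`, `gate_rollback`, the `(from, to)` pair representation, and the status message wording are all hypothetical, and versions are assumed to be plain dotted integers.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class RollbackDecision:
    """Hypothetical result type: whether to proceed, plus a status message if blocked."""

    allowed: bool
    status_message: Optional[str] = None


def _parse(version: str) -> Tuple[int, ...]:
    # Assumption: versions are dotted integers such as "2.19.2".
    return tuple(int(part) for part in version.split("."))


def is_rollback(on_disk_version: str, target_version: str) -> bool:
    """A refresh is treated as a rollback when the target is older than what is on disk."""
    return _parse(target_version) < _parse(on_disk_version)


def gate_rollback(
    on_disk_version: str,
    target_version: str,
    supported_rollbacks: Set[Tuple[str, str]],
) -> RollbackDecision:
    """Allow upgrades and supported rollbacks; block everything else."""
    if not is_rollback(on_disk_version, target_version):
        return RollbackDecision(allowed=True)
    if (on_disk_version, target_version) in supported_rollbacks:
        return RollbackDecision(allowed=True)
    return RollbackDecision(
        allowed=False,
        status_message=(
            f"Rollback from {on_disk_version} to {target_version} is not supported. "
            "Run the force-refresh-start action to override (risk of data loss)."
        ),
    )
```

Returning a decision object rather than raising keeps the blocked case easy to surface as a unit status and a log line.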

Compatibility matrix management:

  • Introduced a compatibility matrix (compatibility_matrix.json) to record supported upgrade and rollback paths. The matrix is reconciled at startup and used to determine if a rollback is supported.
  • Added methods to read and write the compatibility matrix file, and updated the upgrade logic to use this matrix for compatibility checks.
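
As a rough sketch of what reading, writing, and reconciling such a file could look like: the schema below (a JSON object mapping a source version to the versions it may roll back to) and all function names are assumptions made for illustration; the actual layout of compatibility_matrix.json is not shown in this description.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical schema: {"<from_version>": ["<to_version>", ...]}
Matrix = Dict[str, List[str]]


def read_matrix(path: Path) -> Matrix:
    """Return the matrix stored on disk, or an empty one if the file does not exist yet."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return {}


def write_matrix(path: Path, matrix: Matrix) -> None:
    """Write to a temporary file in the same directory, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(matrix, tmp, indent=2, sort_keys=True)
    os.replace(tmp_name, path)


def reconcile_matrix(path: Path, bundled: Matrix) -> Matrix:
    """Merge the entries shipped with this revision into the persisted matrix."""
    merged = read_matrix(path)
    for source, targets in bundled.items():
        merged[source] = sorted(set(merged.get(source, [])) | set(targets))
    write_matrix(path, merged)
    return merged


def is_rollback_supported(matrix: Matrix, from_version: str, to_version: str) -> bool:
    """True only if the matrix explicitly lists this rollback path."""
    return to_version in matrix.get(from_version, [])
```

Treating a missing entry as "not supported" makes the check fail closed, which matches the stated goal of allowing rollbacks only when explicitly supported.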

OpenSearch upgrade/rollback mechanics:

  • Extended the upgrade event to accept an override_version flag, and implemented logic to call the OpenSearch override-version command when this flag is set, with error handling and logging.
  • Improved the upgrade reconciliation process to handle rollback and override scenarios, including releasing node locks and updating unit status as appropriate.
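
The override path might look something like the sketch below. It assumes the command is the `override-version` subcommand of the `opensearch-node` tool; the binary path, `run_override_version`, `run_upgrade_step`, `OverrideVersionError`, and the injectable `runner` are hypothetical names chosen for illustration. The real tool may also ask for interactive confirmation, which this sketch does not model.

```python
import logging
import subprocess
from typing import Callable, Sequence

logger = logging.getLogger(__name__)

# Assumption: location of the node tool; a snap or deb install will differ.
OPENSEARCH_NODE_BIN = "/usr/share/opensearch/bin/opensearch-node"

Runner = Callable[[Sequence[str]], object]


class OverrideVersionError(Exception):
    """Raised when the override-version command fails (hypothetical error type)."""


def _default_runner(cmd: Sequence[str]) -> object:
    # check=True turns a non-zero exit code into CalledProcessError.
    return subprocess.run(list(cmd), capture_output=True, text=True, check=True)


def run_override_version(runner: Runner = _default_runner) -> None:
    """Rewrite the on-disk version marker so an older OpenSearch will start."""
    cmd = [OPENSEARCH_NODE_BIN, "override-version"]
    logger.warning("Overriding on-disk OpenSearch version; data loss is possible")
    try:
        runner(cmd)
    except subprocess.CalledProcessError as exc:
        logger.error("override-version failed: %s", exc.stderr)
        raise OverrideVersionError("override-version command failed") from exc


def run_upgrade_step(
    override_version: bool,
    start_node: Callable[[], None],
    runner: Runner = _default_runner,
) -> None:
    """Run the override only when the event carries the flag, then start the node."""
    if override_version:
        run_override_version(runner=runner)
    start_node()
```

Passing the runner in as a parameter lets unit tests exercise both the success and failure paths without an OpenSearch installation.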

How was this change tested?

  • Manually
  • Unit tests
  • Integration tests

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

skourta marked this pull request as draft January 13, 2026 05:25
skourta changed the title from "[DPE-9134]: add new rollback compatibility" to "[DPE-9134]: add rollback compatibility" Jan 14, 2026
skourta marked this pull request as ready for review January 27, 2026 08:02
skourta marked this pull request as draft January 27, 2026 17:21
skourta marked this pull request as ready for review March 9, 2026 09:57
Copilot AI review requested due to automatic review settings March 9, 2026 09:57
Copilot AI left a comment

Pull request overview

This PR adds rollback-safety mechanics for OpenSearch upgrades by introducing a compatibility matrix, enforcing rollback compatibility checks, and providing a manual override action intended to help operators recover from otherwise-blocked rollback scenarios.

Changes:

  • Added rollback detection + compatibility gating logic, backed by a persisted compatibility matrix.
  • Introduced a new operator action (force-refresh-start) to force a rollback recovery path by overriding the on-disk OpenSearch version.
  • Updated upgrade/rollback flows to toggle additional cluster settings (including action.auto_create_index) and updated integration rollback recovery helper accordingly.
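
For context, toggling a dynamic setting such as action.auto_create_index goes through the cluster settings API (`PUT /_cluster/settings`). The sketch below only builds the request bodies. Which value the charm sets during an upgrade, and whether it uses the persistent or transient scope, is not stated here, so disabling under `persistent` is an assumption, and `cluster_settings_body` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def cluster_settings_body(auto_create_index: Optional[bool]) -> Dict[str, Any]:
    """Body for PUT /_cluster/settings; a None value serializes to null, which resets the setting."""
    return {"persistent": {"action.auto_create_index": auto_create_index}}


# Assumed direction: disable before the upgrade, reset to the default afterwards.
disable_body = json.dumps(cluster_settings_body(False))
restore_body = json.dumps(cluster_settings_body(None))
```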

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| tests/integration/upgrades/helpers.py | Updates rollback recovery helper to also restore action.auto_create_index. |
| src/upgrade.py | Adds rollback detection and compatibility-matrix-based rollback allowance checks. |
| src/machine_upgrade.py | Adds action constant and adjusts rollback/upgrade cluster-setting reset behavior. |
| src/charm.py | Wires the new action and adds rollback gating + operator messaging to upgrade reconciliation. |
| lib/charms/opensearch/v0/opensearch_distro.py | Adds compatibility matrix file path + read/write helpers and an override-version command wrapper. |
| lib/charms/opensearch/v0/opensearch_base_charm.py | Extends upgrade event payload to include override_version, reconciles the compatibility matrix, and adds auto-index setting toggles around upgrade. |
| actions.yaml | Defines the new force-refresh-start action and its parameters/documentation. |
