
Pass rollback window duration to upgrade watcher command #8177


Closed

Conversation

ycombinator
Contributor

@ycombinator ycombinator commented May 19, 2025

What does this PR do?

This PR enhances the upgrade process to invoke the Upgrade Watcher with an additional CLI option, --rollback-window. The option accepts any value that time.ParseDuration can parse; the code that invokes the Upgrade Watcher passes the value in seconds, e.g. 180s.

Why is it important?

The Upgrade Watcher will use the value of the --rollback-window CLI option to help ensure that the upgraded Agent can be rolled back manually within the specified rollback window duration. This functionality will be implemented in future PRs.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None; the Upgrade Watcher is invoked internally as part of the Agent upgrade process.

How to test this PR locally

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Contributor

mergify bot commented May 19, 2025

This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@ycombinator ycombinator force-pushed the upgrade-watcher-rollback-window-cli branch 3 times, most recently from b491de6 to 625abbb Compare May 19, 2025 22:35
@ycombinator ycombinator requested a review from pchila May 19, 2025 23:32
@ycombinator ycombinator added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 19, 2025
@ycombinator ycombinator marked this pull request as ready for review May 19, 2025 23:32
@ycombinator ycombinator requested a review from a team as a code owner May 19, 2025 23:32
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator ycombinator force-pushed the upgrade-watcher-rollback-window-cli branch from dbb9df9 to 625abbb Compare May 20, 2025 18:57
@ycombinator
Contributor Author

Some upgrade integration tests in CI are failing with errors that look related to the changes in this PR:

Error: Received unexpected error:
    failed to start agent upgrade to version "9.0.1": exit status 1
    Error initializing version information: reading package version from file "/opt/Elastic/Agent/data/elastic-agent-9.1.0-SNAPSHOT-fb8f43/package.version": open /opt/Elastic/Agent/data/elastic-agent-9.1.0-SNAPSHOT-fb8f43/package.version: no such file or directory
    Error: Failed trigger upgrade of daemon: watcher did not start in time
    context deadline exceeded
    os: process already finished

Moving PR back to draft while I investigate and fix.

@ycombinator ycombinator marked this pull request as draft May 20, 2025 23:14
@ycombinator ycombinator marked this pull request as ready for review May 21, 2025 05:39
@ycombinator ycombinator requested review from kaanyalti and removed request for michel-laterman May 21, 2025 16:16
@ycombinator ycombinator requested review from kaanyalti and pchila June 11, 2025 00:02
kaanyalti
kaanyalti previously approved these changes Jun 16, 2025
@ycombinator ycombinator force-pushed the upgrade-watcher-rollback-window-cli branch from da545a0 to 0711763 Compare June 27, 2025 23:48
@ycombinator ycombinator enabled auto-merge (squash) June 27, 2025 23:53
@ycombinator ycombinator requested review from pchila and kaanyalti June 27, 2025 23:53
@ycombinator
Contributor Author

@pchila @kaanyalti Thank you for your patience with this PR as I shifted my focus to other, more time-critical work (FIPS). This PR is ready for re-review now, when you get a chance.

@elasticmachine
Collaborator

elasticmachine commented Jun 28, 2025

💔 Build Failed

cc @ycombinator

@pchila
Member

pchila commented Jun 30, 2025

@pchila @kaanyalti Thank you for your patience with this PR as I shifted my focus to other, more time-critical work (FIPS). This PR is ready for re-review now, when you get a chance.

@ycombinator can we put this one on hold for the moment? The discussion around locking for the upgrade marker may affect the distribution of responsibilities (we may even want the elastic-agent main process to handle writing the rollback window, in which case we don't need to pass an extra parameter to the watcher).

@pchila pchila disabled auto-merge July 16, 2025 12:47
@pchila
Member

pchila commented Jul 16, 2025

Superseded by #8767

@pchila pchila closed this Jul 16, 2025
Development

Successfully merging this pull request may close these issues.

Elastic agent should add an available rollback entry in update marker if rollback_window is set
4 participants