feat: Add Reason Variable to Failovers #7451
Open: zawadzkidiana wants to merge 7 commits into cadence-workflow:master from zawadzkidiana:reasonUpdate
Changes: +96 −11
Signed-off-by: Diana Zawadzki <[email protected]>
- Add --reason flag to 'cadence domain failover' command with default value 'default maintenance'
- Add --reason flag to 'cadence admin cluster failover start' command
- Extend FailoverParams and FailoverActivityParams to propagate reason through workflow
- Update FailoverActivity to use FailoverDomain API when reason is provided, for proper tracking
- Add Reason field to FailoverDomainRequest type definition
- Maintain backward compatibility by falling back to UpdateDomain when no reason is provided

This allows operators to provide context for failover operations, which will be stored in the failover history for better operational transparency and debugging.

Signed-off-by: Diana Zawadzki <[email protected]>
- Add FailoverReason field to UpdateDomainRequest type
- Update ToUpdateDomainRequest mapper to include reason field
- Add Reason field to FailoverEvent struct for history storage
- Update NewFailoverEvent to accept and store reason parameter
- Pass reason through handleFailoverRequest to failover history
- Update admin failover tests to expect reason in workflow input

This completes the end-to-end flow for tracking failover reasons from the CLI through to database storage in the failover history.

Signed-off-by: Diana Zawadzki <[email protected]>
Regenerate .gen/go/shared/shared.go after IDL submodule update to include the reason field in FailoverDomainRequest from the updated thrift definition.

Signed-off-by: Diana Zawadzki <[email protected]>
- Remove reason field from admin cluster failover workflow (as per reviewer suggestion)
- Remove Reason from FailoverParams and FailoverActivityParams
- Revert FailoverActivity to use UpdateDomain (original implementation)
- Remove --reason flag from admin cluster failover start command
- Change --reason default value from "default maintenance" to empty string
- Update tests to remove reason from workflow input expectations

Only the domain failover command now has the --reason flag, allowing users to optionally provide context without enforcing a default.

Signed-off-by: Diana Zawadzki <[email protected]>
davidporter-id-au approved these changes (Nov 18, 2025)
timl3136 approved these changes (Nov 18, 2025)
timl3136 (Member) left a comment:
LGTM; maybe we can make the PR title more descriptive.
zawadzkidiana (Contributor, Author):
Re-running 1 failed Golang integration test with SQLite (pull_request); all other tests pass.
zawadzkidiana (Contributor, Author):
A GitHub outage is the likely cause of the SQLite integration test failure; will re-run the test once the problem is resolved.
What changed?
Why?
We needed better visibility into why failovers were happening. Right now, when you look at failover history, you can see when and where failovers happened, but not why. This makes it hard to debug issues or understand patterns. With this change, operators can provide context like "planned maintenance", "emergency DR", "testing", etc., and it gets stored with the failover event. This makes it much easier to go back and understand what was going on during production incidents.
How did you test it?
./cadence domain failover --help
The flow from CLI to database is tested through the existing test suite. The reason travels: CLI flag → FailoverDomainRequest → UpdateDomainRequest → FailoverEvent → stored in domain data.
Potential risks
Pretty low risk overall since it's backward compatible. The reason field is optional in all the types so old clients can still talk to new servers and vice versa. If you don't provide a reason it just stays empty.
The main thing to watch is someone passing in a very long reason string: it goes into domain data, which is stored as a map, so there may be size limits there. But that's more of an edge case than a real risk.
Also, if someone is relying on the exact format of the FailoverEvent JSON in domain data, this adds a new field. But since it's optional and has the omitempty tag, it shouldn't break anything.
Release notes
Added optional --reason flag to domain failover command. This lets you specify why a failover is happening and the reason gets stored in the failover history. No migration or schema changes needed.
Documentation Changes
We should probably update the CLI docs to mention the new --reason flag and show some examples of how to use it. The flag shows up in --help, but it would be good to have it in the actual documentation too.