[FLINK-37515] Basic support for Blue/Green deployments #969
I am concerned about the number of TODOs. Things like dealing with errors and timing conditions should be in place in the initial drop of code, unless we can mark this capability as beta or the like. WDYT @gyfora?
@davidradl thanks for your review/comments. I've cleaned up all the outdated/invalid TODOs, which pointed me to one missing unit test, now added. At this point the few remaining TODOs do not interfere with the main functionality; they are not critical and are open for discussion.
… BASIC is supported). Tests to confirm the deployments are deleted accurately after the specified deletion delay.
👍
When is this feature going to be released?
Hi @hansh0801, I'm trying to release it ASAP, but I also want to make sure the logic and contracts are as robust as possible to minimize changes later. It looks pretty stable in my current testing and I've reopened it for review. Thanks!
Cool! I'm also waiting for Blue/Green phase 2 :)
@hansh0801 indeed... Phase 2 will take a little longer due to its complexity, but I'm already working on it.
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
if (deploymentStatus == null) {
    deploymentStatus = new FlinkBlueGreenDeploymentStatus();
    return patchStatusUpdateControl(
                    bgDeployment,
                    deploymentStatus,
                    FlinkBlueGreenDeploymentState.INITIALIZING_BLUE,
                    null)
            .rescheduleAfter(100);
The BlueGreenDeployment could simply implement initStatus and this can be removed completely.
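A minimal sketch of what this suggestion could look like, assuming the resource class extends a base type that builds its status through an overridable initStatus() hook. Every class name below is a hypothetical stand-in rather than the operator's real type, and placing the initial state inside the status object is also an assumption. With such a hook the status is never null, so the null check in the controller would have nothing left to do.

```java
// Hypothetical stand-in for a custom-resource base class that creates its
// status up front through an overridable initStatus() hook.
abstract class ResourceWithStatusSketch<S> {
    private final S status;

    ResourceWithStatusSketch() {
        // The hook runs once at construction, so the status is never null afterwards.
        this.status = initStatus();
    }

    /** Subclasses return the status a freshly created resource starts with. */
    protected abstract S initStatus();

    S getStatus() {
        return status;
    }
}

// Hypothetical status type; the initial state name is taken from the snippet under review.
class BlueGreenStatusSketch {
    String state = "INITIALIZING_BLUE";
}

// Hypothetical resource type that supplies its own initial status.
class BlueGreenResourceSketch extends ResourceWithStatusSketch<BlueGreenStatusSketch> {
    @Override
    protected BlueGreenStatusSketch initStatus() {
        return new BlueGreenStatusSketch();
    }
}

class InitStatusDemo {
    public static void main(String[] args) {
        BlueGreenResourceSketch resource = new BlueGreenResourceSketch();
        // The controller can read the status directly, with no null branch.
        System.out.println(resource.getStatus().state);
    }
}
```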
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
private boolean hasSpecChanged(
        FlinkBlueGreenDeploymentSpec newSpec, FlinkBlueGreenDeploymentStatus deploymentStatus) {

    String lastReconciledSpec = deploymentStatus.getLastReconciledSpec();
    String newSpecSerialized = SpecUtils.serializeObject(newSpec, "spec");

    // TODO: in FLIP-504 check here the TransitionMode has not been changed

    return !lastReconciledSpec.equals(newSpecSerialized);
We may want to use the DiffBuilder / existing utilities to get the type of the spec change here and make sure we don't trigger deployments on non-upgrade changes.
I believe things like operator config changes should not go through the Blue/Green flow but should simply be applied to the currently active deployment.
Sure, let me get familiar with the DiffBuilder and see how to incorporate it.
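To make the idea concrete, here is a rough sketch of classifying a spec change before deciding whether to start a Blue/Green transition. It does not use the operator's DiffBuilder; the enum, the simplified spec, and the rule for which fields count as an upgrade are all hypothetical and only illustrate the shape of the check.

```java
// Hypothetical change categories, loosely modeled on the idea of a "diff type".
enum ChangeKindSketch { IGNORE, IN_PLACE, UPGRADE }

// Hypothetical, heavily simplified spec: two job-affecting fields and one
// operator-only setting. Real specs have many more fields.
final class SpecSketch {
    final String image;
    final int parallelism;
    final String operatorConfig;

    SpecSketch(String image, int parallelism, String operatorConfig) {
        this.image = image;
        this.parallelism = parallelism;
        this.operatorConfig = operatorConfig;
    }
}

final class SpecChangeClassifierSketch {
    /** Classifies the difference between the last reconciled spec and the new one. */
    static ChangeKindSketch classify(SpecSketch oldSpec, SpecSketch newSpec) {
        // Assumption: image and parallelism changes require a new deployment.
        boolean jobChanged =
                !oldSpec.image.equals(newSpec.image)
                        || oldSpec.parallelism != newSpec.parallelism;
        if (jobChanged) {
            return ChangeKindSketch.UPGRADE;
        }
        // Operator-only settings are applied to the currently active deployment.
        if (!oldSpec.operatorConfig.equals(newSpec.operatorConfig)) {
            return ChangeKindSketch.IN_PLACE;
        }
        return ChangeKindSketch.IGNORE;
    }

    /** Only upgrade-type changes should go through the Blue/Green flow. */
    static boolean shouldStartTransition(SpecSketch oldSpec, SpecSketch newSpec) {
        return classify(oldSpec, newSpec) == ChangeKindSketch.UPGRADE;
    }
}
```

Compared with comparing two serialized strings, a classification step like this lets the controller tell "nothing relevant changed" apart from "apply in place" and "start a transition".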
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
TRANSITIONING_TO_BLUE,

/** Identifies the system is transitioning from "Blue" to "Green". */
TRANSITIONING_TO_GREEN,
What state are we in during shutdown?
...tes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/utils/SpecUtils.java
    return objectMapper.writeValueAsString(wrapper);
} catch (JsonProcessingException e) {
    throw new RuntimeException(
            "Could not serialize " + wrapperKey + ", this indicates a bug...", e);
nit: should we include object.toString() in the error as well?
Yes, it could be done; I just kept the same existing pattern as the rest of the functions in this class.
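For illustration only, the nit could be addressed along these lines. This sketch replaces the real JSON mapper with a hypothetical Serializer hook so that it stays self-contained, and it caps the length of the included text (the 200-character limit is an arbitrary assumption) so that a large spec does not flood the logs.

```java
final class SerializeErrorSketch {
    /** Hypothetical serializer hook standing in for the real JSON mapper. */
    interface Serializer {
        String write(Object value) throws Exception;
    }

    // Arbitrary cap on how much of the object's text goes into the message.
    private static final int MAX_DETAIL_LENGTH = 200;

    static String serializeObject(Object object, String wrapperKey, Serializer serializer) {
        try {
            return serializer.write(object);
        } catch (Exception e) {
            // Include a bounded description of the offending object in the error.
            throw new RuntimeException(
                    "Could not serialize "
                            + wrapperKey
                            + " ("
                            + abbreviate(String.valueOf(object))
                            + "), this indicates a bug...",
                    e);
        }
    }

    /** Truncates long text so the error message stays readable. */
    static String abbreviate(String text) {
        return text.length() <= MAX_DETAIL_LENGTH
                ? text
                : text.substring(0, MAX_DETAIL_LENGTH) + "...";
    }
}
```

One caveat with this approach: toString() output can contain sensitive configuration values, so whether to include it is a judgment call.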
Hi @schongloo @davidradl @gyfora, I hope you're doing well. I've been following this PR, as blue-green deployments would solve a critical use case for our production needs. I noticed there have been no updates to this PR for a few weeks. I was wondering whether anything is blocking the release and whether there is something I or the OSS community could help with (testing, fixing any changes, etc.). Thanks for all the work so far! 🙏
@maheepm-lyft, @schongloo is currently on holiday, but once he is back we can try to merge this as soon as reasonably possible :) In the meantime you can help us by trying out the current version and by providing a code review for the PR; that can definitely speed things up.
@schongloo can you please re-open this PR against the
Great, thanks! It's taking us a bit of time to test this out in our stack, but we're working on it and I'll post any comments here.
Done, @gyfora!
What is the purpose of the change
This pull request adds basic support for Blue/Green deployments as outlined by FLIP-503 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677648).
Brief change log
FlinkBlueGreenDeploymentController controller, with the capability of managing the lifecycle of these deployments.
FlinkBlueGreenDeploymentController will manage this CRD and hide from the user the details of the actual Blue/Green (Active/StandBy) jobs.
FlinkDeployment controller.
Verifying this change
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors: no
Documentation
docs/content/docs