Background
cruise-control manages the distribution of replicas across brokers in a kafka cluster, according to proposals that it calculates for optimal resource utilization.
cruise-control also has the concept of "anomalies", which it seeks to remediate by specific manipulations of the proposals that it generates.
Illustrative anomalies are disk failure, broker failure, and goal violation.
Further, within goal violation, cruise-control seeks to detect brokers that are "slow", which is an additional acknowledgement of hardware-reality impinging on the abstraction of kafka brokers with perfectly idealized resources.
Finally, cruise-control is taking steps toward adding and removing kafka brokers as needed to provision the cluster correctly (#1494).
Problem
Currently, while cruise-control may put brokers in "timeout" for a variety of performance reasons, it has no power to eject them from the kafka cluster.
Proposed Solution
cruise-control should consider a "broker-ejection interface", similar to the provisioner interface under current development.
Specifically, cruise-control should allow users to supply custom classes that define how cruise-control removes a host from the cluster.
Since the exact steps are likely to differ wildly across kafka deployments, this is probably as specific as the interface can get without becoming inapplicable to some of them.
Of course, for deployments on standardized platforms like Azure, these custom classes may only need to be implemented once and can then be shared widely among similar users.
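To make the shape of the proposal concrete, here is a rough Java sketch of what such a pluggable interface could look like. Every name in it (`BrokerEjector`, `eject`, `RecordingBrokerEjector`) is hypothetical and not part of cruise-control today; it also assumes brokers are identified by integer ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical plug-in point; NOT an existing cruise-control interface. */
interface BrokerEjector {
  /** Remove the given brokers from the cluster in a deployment-specific way. */
  void eject(Set<Integer> brokerIds);
}

/** Illustrative implementation that only records which brokers it was asked to eject. */
class RecordingBrokerEjector implements BrokerEjector {
  private final List<Integer> ejected = new ArrayList<>();

  @Override
  public void eject(Set<Integer> brokerIds) {
    // A real deployment would call its own orchestration or cloud tooling here.
    ejected.addAll(brokerIds);
  }

  /** Returns a copy of the broker ids seen so far. */
  List<Integer> ejected() {
    return new ArrayList<>(ejected);
  }
}
```

The interface deliberately says nothing about how a host is removed, since that is the part expected to vary between deployments.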
For each code path in cruise-control that results in the automated removal of all replicas from a broker (for whatever reason), users should be able to toggle whether that path calls the broker-ejection interface.
This allows for maximum flexibility: some deployments may want to keep brokers in the cluster under all circumstances, others may always want to eject problem brokers, and still others will be somewhere in between.
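The per-path toggle could be as simple as a set of enabled paths. The sketch below is only an illustration: `EvacuationPath`, its values, and `EjectionPolicy` are made-up names, and the actual set of code paths in cruise-control that evacuate a broker would need to be enumerated separately.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical labels for code paths that end with a broker holding no replicas. */
enum EvacuationPath { BROKER_FAILURE, SLOW_BROKER, MANUAL_REMOVAL }

/** Hypothetical per-path switch deciding whether the ejection hook is called. */
class EjectionPolicy {
  private final Set<EvacuationPath> enabled;

  EjectionPolicy(Set<EvacuationPath> enabledPaths) {
    // Defensive copy; an empty set means "never eject".
    this.enabled = EnumSet.noneOf(EvacuationPath.class);
    this.enabled.addAll(enabledPaths);
  }

  /** True if the given path is configured to call the ejection hook. */
  boolean shouldEject(EvacuationPath path) {
    return enabled.contains(path);
  }
}
```

An empty set corresponds to deployments that keep brokers in the cluster under all circumstances, and `EnumSet.allOf(EvacuationPath.class)` to those that always eject.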
Probably, for maximum safety, cruise-control should call the ejection hook only after all replicas have been removed from the targeted brokers. Conceivably, users may want to eject prior to evacuation, but this seems unsafe to me, and therefore unlikely.
Finally, it's an open question to me whether cruise-control should require the interface to return an affirmative ejection-succeeded or ejection-failed signal. This would be nice in theory, but in practice it may be complicated to require many disparate systems to additionally report back to cruise-control.
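The last two points can be sketched together: call the hook only once the broker is empty, and let the hook report an outcome. This is one possible shape, not a settled design. `EjectionResult`, `ReportingBrokerEjector`, and `GuardedEjection` are hypothetical names, and the replica counts are assumed to be supplied by the caller. The `UNKNOWN` value is an assumption added here so that fire-and-forget implementations are not forced to check back in.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical outcome; UNKNOWN covers systems that cannot report back. */
enum EjectionResult { SUCCEEDED, FAILED, UNKNOWN }

/** Hypothetical hook variant that returns an outcome for a single broker. */
interface ReportingBrokerEjector {
  EjectionResult eject(int brokerId);
}

class GuardedEjection {
  /**
   * Calls the hook only when the broker is known to host zero replicas.
   * Returns an empty Optional when the hook was skipped.
   */
  static Optional<EjectionResult> ejectIfEmpty(
      int brokerId, Map<Integer, Integer> replicaCountByBroker, ReportingBrokerEjector ejector) {
    Integer replicas = replicaCountByBroker.get(brokerId);
    // Unknown brokers and brokers still hosting replicas are never ejected.
    if (replicas == null || replicas > 0) {
      return Optional.empty();
    }
    return Optional.of(ejector.eject(brokerId));
  }
}
```

Treating a missing replica count as "not safe to eject" keeps the sketch on the conservative side of the evacuate-then-eject ordering.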
@mgrubent Thanks for creating this issue and providing a detailed explanation!
I feel that we can make such an interface even more generic to support (1) removal, (2) addition, and (3) swap (when applicable) of resources from/to/in the cluster. Resources may include, but are not limited to
Can I ask the status of this issue? Is this issue handled by #1710?