Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement leaderelector.Interface in terms of the new leader elector client #5499

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

twz123
Copy link
Member

@twz123 twz123 commented Jan 31, 2025

Description

Leaderelector.LeasePool and K0sControllersLeaseCounter now use the new leader elector client internally. Deprecate all old interface methods in favor of CurrentStatus, which reports the current leader status along with a channel that is closed when the status changes. Callers should then call CurrentStatus again to get the new status.

This way, the leader elector doesn't have to keep track of callback methods, callers can't block each other, and there is no need to "deregister" when callers are no longer interested in leader status changes.

The old API is fully emulated via the new API, and can be easily removed once it's no longer being called.

As a PoC, convert the applier manager to the new API as the first component.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist:

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

This is a useful abstraction that runs a callback whose context gets
cancelled as soon as the lead gets lost.

Signed-off-by: Tom Wieczorek <[email protected]>
@twz123 twz123 force-pushed the lease-pool-via-new-client branch from 013e254 to ed5071e Compare January 31, 2025 19:19
@twz123 twz123 marked this pull request as ready for review January 31, 2025 19:52
@twz123 twz123 requested review from a team as code owners January 31, 2025 19:52
@twz123 twz123 requested review from kke and jnummelin January 31, 2025 19:52
Comment on lines +95 to +101
leaderelection.RunLeaderTasks(ctx, l.status.Peek, func(ctx context.Context) {
l.log.Info("acquired leader lease")
runCallbacks(l.acquiredLeaseCallbacks)
<-ctx.Done()
l.log.Infof("lost leader lease (%v)", context.Cause(ctx))
runCallbacks(l.lostLeaseCallbacks)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't seem to get this 🤔

Why are we running both acquired and lost callback at the same round? Is the ctx done when we lose the leadership? If so, maybe consider naming it somehow to reflect that, currently it looks too much like a "normal" context and thus hard to figure out the intention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ctx done when we lose the leadership?

Yes. This is the doc string on the RunLeaderTasks function:

// Runs the provided tasks function when the lead is taken. It continuously
// monitors the leader election status using statusFunc. When the lead is taken,
// the tasks function is called with a context that is cancelled either when the
// lead has been lost or ctx is done. After the tasks function returns, the
// process is repeated until ctx is done.

That function tries to encapsulate a common pattern: "Do something as long as you're holding the lease". When the context is done, this means that you should stop, no matter why. Could be because the lease got lost, could be because the parent context got cancelled for whichever reason (timeout, application shutdown, ...).

If so, maybe consider naming it somehow to reflect that, currently it looks too much like a "normal" context and thus hard to figure out the intention.

This indeed is is a normal context, there's nothing special about it. RunLeaderTasks just "adds" another cancellation cause to it (loosing the lease). Pretty much the same way as signal.NotifyContext can be used to "add" SIGINT/SIGTERM and friends as cancellation causes to a context.

That being said, do you have any suggestions on the naming, or how to structure it differently? We could add more explanatory comments, of course, and improve the RunLeaderTasks docstring. On the other hand, this exact function is sort of a compatibility layer anyways, as the plan is to completely replace the callbacks in the first place. Maybe that's why it's looking a bit weird. The new way would be to use RunLeaderTasks directly in the components that need it. The applier manager, for example, does this now:

leaderelection.RunLeaderTasks(ctx, m.LeaderElector.CurrentStatus, func(ctx context.Context) {
	wait.UntilWithContext(ctx, m.runWatchers, time.Minute)
})

This is kinda succinct and doesn't look that awkward, I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, do you have any suggestions on the naming, or how to structure it differently?

I think the structure is quite ok. Just bit hard, at least for me, to grasp the context usage. Since we're essentially using the context to tell that we've lost the lease, maybe something like leaderContext would make it bit more obvious.

@twz123 twz123 marked this pull request as draft February 13, 2025 10:41
Copy link
Contributor

The PR is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added Stale and removed Stale labels Mar 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants