Skip to content

Conversation

@davinci26
Copy link
Contributor

@davinci26 davinci26 commented Nov 1, 2025

Implements slow start functionality as described in gRFC A100

I still need to do some work on tests and do an e2e experiment with xDS to ensure that things are working as expected, but the core implementation is in a state that can be reviewed.

At this phase I'm mainly looking for feedback on the overall approach and design and in the test code because I am worried that I might be missing some test utility that would make testing with time easier.

Pending Items before this is ready to be merged:

  • Add more unit tests for slow start config cases
  • Manually verify with an xDS experiment that slow start is working as expected

RELEASE NOTES:

  • weightedroundrobin: Implements slow start functionality as described in gRFC A100

Implements slow start functionality as described in
[gRFC A100](grpc/proposal#498)

I still need to do some work on tests and do an e2e experiment with xDS to
ensure that things are working as expected, but the core implementation is
in a state that can be reviewed.

At this phase I'm mainly looking for feedback on the overall approach and design and
in the test code because I am worried that I might be missing some test utility that would
make testing with `time` easier.

Pending Items before this is ready to be merged:
- [ ] Add more unit tests for slow start config cases
- [ ] Manually verify with an xDS experiment that slow start is working as expected

Release Notes
* weightedroundrobin: Implements slow start functionality as described in
  [gRFC A100](grpc/proposal#498)

Signed-off-by: sotiris <[email protected]>
@codecov
Copy link

codecov bot commented Nov 1, 2025

Codecov Report

❌ Patch coverage is 64.28571% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.32%. Comparing base (363018c) to head (e8884a6).
⚠️ Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
balancer/weightedroundrobin/balancer.go 72.97% 5 Missing and 5 partials ⚠️
...xds/xdsclient/xdslbregistry/converter/converter.go 0.00% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8689      +/-   ##
==========================================
- Coverage   83.43%   83.32%   -0.12%     
==========================================
  Files         415      416       +1     
  Lines       32195    32342     +147     
==========================================
+ Hits        26863    26949      +86     
- Misses       3980     4016      +36     
- Partials     1352     1377      +25     
Files with missing lines Coverage Δ
...xds/xdsclient/xdslbregistry/converter/converter.go 70.00% <0.00%> (-3.34%) ⬇️
balancer/weightedroundrobin/balancer.go 83.94% <72.97%> (-1.61%) ⬇️

... and 34 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

go.mod Outdated
github.com/envoyproxy/go-control-plane/envoy v1.35.0
github.com/cncf/xds/go v0.0.0-20251014123835-2ee22ca58382
github.com/envoyproxy/go-control-plane v0.13.4
github.com/envoyproxy/go-control-plane/envoy v1.35.1-0.20251015221300-4138018a492b
Copy link
Contributor Author

@davinci26 davinci26 Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports a commit because the proto is not in a release yet, validate this is the correct version for all deps

Signed-off-by: sotiris <[email protected]>
Signed-off-by: sotiris <[email protected]>
@davinci26
Copy link
Contributor Author

Don't want to commit this code because it introduces a new dependencies but wanted to make sure the aggression math is reasonable so I plotted them over time (https://gist.github.com/davinci26/d29bf9633fe9afd3b4164656a8013eb4)

slow_start_aggression_plot

(Note the flatline at start is due to math.Max(timeSinceReady.Seconds(), 1.0))

Signed-off-by: sotiris <[email protected]>
Signed-off-by: sotiris <[email protected]>
Signed-off-by: sotiris <[email protected]>
@davinci26 davinci26 changed the title weightedroundrobin: Implements slow start [WiP] weightedroundrobin: Implements Slow Start (gRFC A100) [WiP] Nov 3, 2025
if slowStartConfigCfg := cswrrProto.GetSlowStartConfig(); slowStartConfigCfg != nil {
wrrLBCfg.SlowStartConfig = &weightedroundrobin.SlowStartConfig{
SlowStartWindow: internalserviceconfig.Duration(slowStartConfigCfg.GetSlowStartWindow().AsDuration()),
// SlowStartConfig uses a runtime value in the proto definition so we need to get either the default value or the runtime value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to call some attention here since this is a bit of subtle point, where I made a decision but maybe its worth clarifying in the gRFC

Signed-off-by: sotiris <[email protected]>
ew.stopORCAListener()
}
ew.mu.Lock()
ew.readySince = time.Time{} // Reset readySince when going non-READY.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation should be adjusted based on the convo grpc/proposal#498 (comment)

@easwars easwars self-assigned this Nov 13, 2025
@easwars easwars added the Type: Feature New features or improvements in behavior label Nov 13, 2025
@easwars
Copy link
Contributor

easwars commented Nov 13, 2025

Apologies for the delay in reviewing this PR. I'm starting on this today and hopefully will have comments soon. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants