Rewrite FailureDetector interface and implementations to also work with the multi-stage engine #15005

Open: 2 commits to be merged into base: master

yashmayya
Collaborator

  • The connection-based failure detector was added in #8491 (Add connection based FailureDetector) and doesn't currently support the multi-stage engine.
  • This patch rewrites the FailureDetector interface and its implementations so that they work with both engines.
  • If a multi-stage query fails to be dispatched to a server due to connectivity issues, the broker should remove that server from its routing table to prevent further query failures.
  • A server is removed from the broker routing table on failures in either QueryRouter (v1) or QueryDispatcher (v2). In the FailureDetector implementation that retries the connection with exponential backoff, a server is re-added to the routing table only if both connections succeed: the Netty channel used for v1 queries and the gRPC channel used for v2 queries.
  • The FailureDetector and its listener-based interface have been rewritten significantly to avoid multiple attempts to modify the broker routing table whenever a server is detected as healthy / unhealthy.
  • This patch also updates the single-stage engine's GrpcBrokerRequestHandler to make use of the FailureDetector.
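
For readers unfamiliar with the flow described above, here is a minimal, self-contained sketch of the idea. It is not the code in this patch: the class `RetryingFailureDetectorSketch`, its `Listener` interface, the two probe predicates, and all method names are hypothetical stand-ins. The sketch assumes a detector that notifies its listener once per health transition, re-adds a server only when both the v1 (Netty) and v2 (gRPC) probes succeed, and grows the retry delay exponentially.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch; names and signatures are assumptions, not Pinot's API. */
class RetryingFailureDetectorSketch {
  /** Stand-in for the routing table owner; called once per health transition. */
  interface Listener {
    void onUnhealthy(String serverId);
    void onHealthy(String serverId);
  }

  private final Predicate<String> nettyProbe; // assumed v1 (Netty) connectivity check
  private final Predicate<String> grpcProbe;  // assumed v2 (gRPC) connectivity check
  private final Listener listener;
  private final long initialDelayMs;
  private final double backoffFactor;
  // Unhealthy server -> number of failed retry attempts so far.
  private final Map<String, Integer> failedRetries = new ConcurrentHashMap<>();

  RetryingFailureDetectorSketch(Predicate<String> nettyProbe, Predicate<String> grpcProbe,
      Listener listener, long initialDelayMs, double backoffFactor) {
    this.nettyProbe = nettyProbe;
    this.grpcProbe = grpcProbe;
    this.listener = listener;
    this.initialDelayMs = initialDelayMs;
    this.backoffFactor = backoffFactor;
  }

  /** Called from either query path on a connectivity failure; notifies only on the first report. */
  void markServerUnhealthy(String serverId) {
    if (failedRetries.putIfAbsent(serverId, 0) == null) {
      listener.onUnhealthy(serverId);
    }
  }

  /** Runs one retry attempt; returns true only if the server was re-added. */
  boolean retry(String serverId) {
    if (!failedRetries.containsKey(serverId)) {
      return false; // server is not currently tracked as unhealthy
    }
    if (nettyProbe.test(serverId) && grpcProbe.test(serverId)) {
      // Both channels connect, so the server is considered healthy again.
      if (failedRetries.remove(serverId) != null) {
        listener.onHealthy(serverId);
        return true;
      }
      return false;
    }
    failedRetries.computeIfPresent(serverId, (id, count) -> count + 1);
    return false;
  }

  /** Delay before the next retry: initialDelayMs * backoffFactor^(failed retries). */
  long nextRetryDelayMs(String serverId) {
    int failures = failedRetries.getOrDefault(serverId, 0);
    return (long) (initialDelayMs * Math.pow(backoffFactor, failures));
  }
}
```

In this sketch a caller would schedule `retry` after `nextRetryDelayMs` has elapsed. Keeping a single map of unhealthy servers is what prevents a second failure report, from either engine, from triggering another routing table update.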

@yashmayya added the enhancement and multi-stage (Related to the multi-stage query engine) labels on Feb 6, 2025

codecov-commenter commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 47.95918% with 51 lines in your changes missing coverage. Please review.

Project coverage is 63.68%. Comparing base (59551e4) to head (e8c1cfa).
Report is 1680 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...roker/requesthandler/GrpcBrokerRequestHandler.java | 0.00% | 28 Missing ⚠️ |
| .../pinot/query/service/dispatch/QueryDispatcher.java | 50.00% | 8 Missing ⚠️ |
| ...requesthandler/MultiStageBrokerRequestHandler.java | 33.33% | 6 Missing ⚠️ |
| ...thandler/SingleConnectionBrokerRequestHandler.java | 28.57% | 5 Missing ⚠️ |
| ...e/pinot/broker/broker/helix/BaseBrokerStarter.java | 71.42% | 2 Missing ⚠️ |
| ...che/pinot/broker/routing/BrokerRoutingManager.java | 0.00% | 1 Missing ⚠️ |
| ...pache/pinot/common/utils/grpc/GrpcQueryClient.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15005      +/-   ##
============================================
+ Coverage     61.75%   63.68%   +1.93%     
- Complexity      207     1483    +1276     
============================================
  Files          2436     2716     +280     
  Lines        133233   152298   +19065     
  Branches      20636    23544    +2908     
============================================
+ Hits          82274    96988   +14714     
- Misses        44911    48013    +3102     
- Partials       6048     7297    +1249     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.65% <47.95%> (+1.94%) ⬆️ |
| java-21 | 63.56% <47.95%> (+1.94%) ⬆️ |
| skip-bytebuffers-false | 63.68% <47.95%> (+1.93%) ⬆️ |
| skip-bytebuffers-true | 63.55% <47.95%> (+35.82%) ⬆️ |
| temurin | 63.68% <47.95%> (+1.93%) ⬆️ |
| unittests | 63.67% <47.95%> (+1.93%) ⬆️ |
| unittests1 | 56.24% <80.43%> (+9.35%) ⬆️ |
| unittests2 | 33.95% <17.34%> (+6.22%) ⬆️ |


Comment on lines +27 to +29
* <p>
* This class doesn't currently implement any additional logic over BaseExponentialBackoffRetryFailureDetector and is
* retained for backward compatibility.
Collaborator Author

On second thought, we could probably remove this class altogether, since the class name is not exposed directly via the user configuration.

@yashmayya force-pushed the failure-detector-msqe branch from d5ecadd to d38daa6 on February 6, 2025 17:20
@yashmayya force-pushed the failure-detector-msqe branch from 67f10b0 to e8c1cfa on February 7, 2025 02:50
@yashmayya marked this pull request as ready for review on February 7, 2025 03:55