-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing MSE result holder config to minimize rehashing for high cardinality group by #14981
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14981 +/- ##
============================================
+ Coverage 61.75% 63.69% +1.94%
- Complexity 207 1480 +1273
============================================
Files 2436 2713 +277
Lines 133233 152195 +18962
Branches 20636 23533 +2897
============================================
+ Hits 82274 96943 +14669
- Misses 44911 47947 +3036
- Partials 6048 7305 +1257
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
public static GroupIdGenerator getGroupIdGenerator(ColumnDataType[] keyTypes, int numKeyColumns, | ||
int numGroupsLimit, int maxInitialResultHolderCapacity) { | ||
// Initial capacity is one more than expected to avoid rehashing if container is full. | ||
int initialCapacity = 1 + Math.min(maxInitialResultHolderCapacity, numGroupsLimit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash table resize would happen based on load factor though right? Adding 1 here might not do much. (default is usually 0.75)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, updated.
@@ -756,6 +757,8 @@ public static class Server { | |||
"pinot.server.query.executor.group.trim.size"; | |||
public static final String CONFIG_OF_QUERY_EXECUTOR_MAX_INITIAL_RESULT_HOLDER_CAPACITY = | |||
"pinot.server.query.executor.max.init.group.holder.capacity"; | |||
public static final String CONFIG_OF_QUERY_EXECUTOR_MSE_MAX_INITIAL_RESULT_HOLDER_CAPACITY = | |||
"pinot.server.query.executor.mse.max.init.group.holder.capacity"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could remove query.executor
fragment from this altogether. afaik query.executor refers to ServerQueryExecutorV1Impl and the corresponding interface, which are V1 Engine constructs. That would yield:
pinot.server.mse.max.init.group.holder.capacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated.
@@ -62,6 +62,7 @@ public static class AggregateOptions { | |||
public static final String GROUP_TRIM_SIZE = "group_trim_size"; | |||
|
|||
public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY = "max_initial_result_holder_capacity"; | |||
public static final String MSE_MAX_INITIAL_RESULT_HOLDER_CAPACITY = "mse_max_initial_result_holder_capacity"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang : do you have any recommendation for the hint name?
@@ -756,6 +757,8 @@ public static class Server { | |||
"pinot.server.query.executor.group.trim.size"; | |||
public static final String CONFIG_OF_QUERY_EXECUTOR_MAX_INITIAL_RESULT_HOLDER_CAPACITY = | |||
"pinot.server.query.executor.max.init.group.holder.capacity"; | |||
public static final String CONFIG_OF_QUERY_EXECUTOR_MSE_MAX_INITIAL_RESULT_HOLDER_CAPACITY = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove QUERY_EXECUTOR
from the var name too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
int maxInitialResultHolderCapacity = getMaxInitialResultHolderCapacity(opChainMetadata, nodeHint); | ||
Integer mseCapacity = getMSEMaxInitialResultHolderCapacity(opChainMetadata, nodeHint); | ||
if (mseCapacity != null) { | ||
maxInitialResultHolderCapacity = mseCapacity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the behavior we are implementing is:
- By default use the previous behavior
- If a user has explicitly set a hint for the mse initial capacity, or set a server config, then use that.
Can you also call it out in the Issue Description? We'll have to update Pinot Docs too later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
int maxInitialResultHolderCapacity = getMaxInitialResultHolderCapacity(opChainMetadata, nodeHint); | ||
Integer mseCapacity = getMSEMaxInitialResultHolderCapacity(opChainMetadata, nodeHint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: naming could be made slightly more precise (e.g. mseMaxInitialResultHolderCapacity
).
Also you could move this logic to a separate method in the class to keep this clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm. Was slightly confused by the fact that we have both query options as well as hints to control this config, but I see that other configs/hints also have that, so it's best to be consistent.
A new configuration
pinot.server.mse.max.init.group.holder.capacity
is introduced to control the size of result holders for MSE is necessary to avoid resizing and rehashing operations in use cases where grouping is needed on high-cardinality columns (e.g., UUIDs).It can also be set at the query level by using the query option
mse_max_initial_result_holder_capacity
.To preserve backward compatibility, if the aforementioned config is not set,
MultistagegroupByExecutor
will revert to the current behavior of reading the result holder size frompinot.server.query.executor.max.init.group.holder.capacity
.A simple query where it is necessary is
where a group by step occurs on
user_uuid
fortable_B
before the colocated join withtable_A
which has a high cardinality.More details in the following issue: #14685