Introducing MSE result holder config to minimize rehashing for high cardinality group by #14981

Merged (4 commits) on Feb 6, 2025

Contributor

@shauryachats shauryachats commented Feb 3, 2025

This PR introduces a new configuration, pinot.server.mse.max.init.group.holder.capacity, to control the initial size of result holders for the multi-stage engine (MSE). This is necessary to avoid resizing and rehashing operations in use cases that group on high-cardinality columns (e.g., UUIDs).
It can also be set at the query level by using the query option mse_max_initial_result_holder_capacity.

To preserve backward compatibility, if neither the new config nor the query option is set, MultistageGroupByExecutor falls back to the current behavior of reading the result holder size from pinot.server.query.executor.max.init.group.holder.capacity.

A simple query where this is necessary:

SELECT
  count(*)
FROM
  table_A
WHERE
  (
    user_uuid NOT IN (
      SELECT
        user_uuid
      FROM
        table_B
    )
  )
LIMIT
  100 option(useMultistageEngine=true, timeoutMs=120000, useColocatedJoin=true, maxRowsInJoin=40000000)

Here, a group-by step runs on user_uuid, a high-cardinality column, for table_B before the colocated join with table_A.

More details in the following issue: #14685

@codecov-commenter

codecov-commenter commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 73.91304% with 12 lines in your changes missing coverage. Please review.

Project coverage is 63.69%. Comparing base (59551e4) to head (0b663d0).
Report is 1677 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...va/org/apache/pinot/query/runtime/QueryRunner.java | 50.00% | 2 Missing and 3 partials ⚠️ |
| ...ry/runtime/operator/MultistageGroupByExecutor.java | 66.66% | 1 Missing and 3 partials ⚠️ |
| ...e/operator/groupby/OneLongKeyGroupIdGenerator.java | 0.00% | 2 Missing ⚠️ |
| ...time/operator/groupby/GroupIdGeneratorFactory.java | 87.50% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14981      +/-   ##
============================================
+ Coverage     61.75%   63.69%   +1.94%     
- Complexity      207     1480    +1273     
============================================
  Files          2436     2713     +277     
  Lines        133233   152195   +18962     
  Branches      20636    23533    +2897     
============================================
+ Hits          82274    96943   +14669     
- Misses        44911    47947    +3036     
- Partials       6048     7305    +1257     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.64% <73.91%> (+1.93%) ⬆️ |
| java-21 | 63.58% <73.91%> (+1.95%) ⬆️ |
| skip-bytebuffers-false | 63.68% <73.91%> (+1.93%) ⬆️ |
| skip-bytebuffers-true | 63.54% <73.91%> (+35.81%) ⬆️ |
| temurin | 63.69% <73.91%> (+1.94%) ⬆️ |
| unittests | 63.69% <73.91%> (+1.94%) ⬆️ |
| unittests1 | 56.22% <73.91%> (+9.33%) ⬆️ |
| unittests2 | 34.02% <0.00%> (+6.29%) ⬆️ |


public static GroupIdGenerator getGroupIdGenerator(ColumnDataType[] keyTypes, int numKeyColumns,
    int numGroupsLimit, int maxInitialResultHolderCapacity) {
  // Initial capacity is one more than expected to avoid rehashing if container is full.
  int initialCapacity = 1 + Math.min(maxInitialResultHolderCapacity, numGroupsLimit);
Contributor

@ankitsultana ankitsultana Feb 4, 2025

Hash table resize would happen based on load factor though right? Adding 1 here might not do much. (default is usually 0.75)

Contributor Author

Agreed, updated.
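
To illustrate the load-factor point raised in this thread, here is a minimal sketch based on the documented sizing behavior of java.util.HashMap (power-of-two table, resize once the entry count exceeds table size times load factor). It is not the group-id map Pinot actually uses, and the class and method names are hypothetical. It shows that requesting `expected + 1` slots still leaves the resize threshold below the expected entry count, while dividing by the load factor does not.

```java
// Hypothetical sketch: models java.util.HashMap sizing, not Pinot's group-id maps.
final class HashCapacitySketch {
  private static final int MAX_TABLE_SIZE = 1 << 30;

  /** Smallest power of two >= requested, which is how java.util.HashMap sizes its table. */
  static int tableSizeFor(int requested) {
    int size = 1;
    while (size < requested && size < MAX_TABLE_SIZE) {
      size <<= 1;
    }
    return size;
  }

  /** Number of entries the table can hold before a resize is triggered. */
  static int resizeThreshold(int requested, float loadFactor) {
    return (int) (tableSizeFor(requested) * loadFactor);
  }

  /** Requested capacity large enough that expectedEntries fit without a resize. */
  static int capacityFor(int expectedEntries, float loadFactor) {
    return (int) Math.min(MAX_TABLE_SIZE, (long) Math.ceil(expectedEntries / (double) loadFactor));
  }

  public static void main(String[] args) {
    int expected = 1000;
    float loadFactor = 0.75f;
    // Requesting expected + 1 slots: table of 1024, threshold 768, so 1000 entries force a resize.
    System.out.println(resizeThreshold(expected + 1, loadFactor)); // prints 768
    // Requesting expected / loadFactor slots: table of 2048, threshold 1536, so no resize.
    System.out.println(resizeThreshold(capacityFor(expected, loadFactor), loadFactor)); // prints 1536
  }
}
```

Other hash table implementations round and resize differently, so the exact numbers only hold for the HashMap model assumed here.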

@@ -756,6 +757,8 @@ public static class Server {
      "pinot.server.query.executor.group.trim.size";
  public static final String CONFIG_OF_QUERY_EXECUTOR_MAX_INITIAL_RESULT_HOLDER_CAPACITY =
      "pinot.server.query.executor.max.init.group.holder.capacity";
  public static final String CONFIG_OF_QUERY_EXECUTOR_MSE_MAX_INITIAL_RESULT_HOLDER_CAPACITY =
      "pinot.server.query.executor.mse.max.init.group.holder.capacity";
Contributor

I think we could remove the query.executor fragment from this altogether. AFAIK, query.executor refers to ServerQueryExecutorV1Impl and the corresponding interface, which are V1 engine constructs. That would yield:

pinot.server.mse.max.init.group.holder.capacity

Contributor Author

Good point, updated.

@@ -62,6 +62,7 @@ public static class AggregateOptions {
public static final String GROUP_TRIM_SIZE = "group_trim_size";

public static final String MAX_INITIAL_RESULT_HOLDER_CAPACITY = "max_initial_result_holder_capacity";
public static final String MSE_MAX_INITIAL_RESULT_HOLDER_CAPACITY = "mse_max_initial_result_holder_capacity";
Contributor

@Jackie-Jiang : do you have any recommendation for the hint name?

@@ -756,6 +757,8 @@ public static class Server {
      "pinot.server.query.executor.group.trim.size";
  public static final String CONFIG_OF_QUERY_EXECUTOR_MAX_INITIAL_RESULT_HOLDER_CAPACITY =
      "pinot.server.query.executor.max.init.group.holder.capacity";
  public static final String CONFIG_OF_QUERY_EXECUTOR_MSE_MAX_INITIAL_RESULT_HOLDER_CAPACITY =
Contributor

Remove QUERY_EXECUTOR from the var name too

Contributor Author

Addressed.

int maxInitialResultHolderCapacity = getMaxInitialResultHolderCapacity(opChainMetadata, nodeHint);
Integer mseCapacity = getMSEMaxInitialResultHolderCapacity(opChainMetadata, nodeHint);
if (mseCapacity != null) {
  maxInitialResultHolderCapacity = mseCapacity;
Contributor

I guess the behavior we are implementing is:

  1. By default use the previous behavior
  2. If a user has explicitly set a hint for the mse initial capacity, or set a server config, then use that.

Can you also call it out in the Issue Description? We'll have to update Pinot Docs too later.

Contributor Author

Addressed.
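
The two-step behavior described in this thread can be sketched as a small resolver. This is only an illustration: the class name, method name, and parameters are hypothetical, and the assumption that a query-level MSE value takes precedence over the MSE server config is mine rather than something the PR states. A null argument stands for "not set".

```java
// Hypothetical sketch of the fallback order; not the code merged in this PR.
final class ResultHolderCapacitySketch {

  /**
   * Picks the initial result holder capacity for an MSE group-by.
   *
   * @param mseQueryOption  value of mse_max_initial_result_holder_capacity, or null if unset
   * @param mseServerConfig value of pinot.server.mse.max.init.group.holder.capacity, or null if unset
   * @param legacyCapacity  value resolved the previous way (V1 option or server config)
   */
  static int resolve(Integer mseQueryOption, Integer mseServerConfig, int legacyCapacity) {
    if (mseQueryOption != null) {
      return mseQueryOption; // assumed: an explicit per-query setting wins
    }
    if (mseServerConfig != null) {
      return mseServerConfig; // explicit server-wide MSE setting
    }
    return legacyCapacity; // nothing MSE-specific set: keep the previous behavior
  }

  public static void main(String[] args) {
    System.out.println(resolve(null, null, 10_000));         // prints 10000
    System.out.println(resolve(null, 500_000, 10_000));      // prints 500000
    System.out.println(resolve(1_000_000, 500_000, 10_000)); // prints 1000000
  }
}
```

Using a nullable Integer for the MSE-specific values mirrors the null check in the snippet under review, which is what lets "unset" fall through to the legacy value.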

int maxInitialResultHolderCapacity = getMaxInitialResultHolderCapacity(opChainMetadata, nodeHint);
Integer mseCapacity = getMSEMaxInitialResultHolderCapacity(opChainMetadata, nodeHint);
Contributor

nit: naming could be made slightly more precise (e.g. mseMaxInitialResultHolderCapacity).

Also you could move this logic to a separate method in the class to keep this clean.

Contributor Author

Addressed.

Contributor

@ankitsultana ankitsultana left a comment

Lgtm. Was slightly confused by the fact that we have both query options as well as hints to control this config, but I see that other configs/hints also have that, so it's best to be consistent.

@ankitsultana ankitsultana merged commit 465c811 into apache:master Feb 6, 2025
21 checks passed
@shauryachats shauryachats deleted the group_id_gen branch February 6, 2025 21:42