Skip to content

Conversation

@yuancu
Copy link
Collaborator

@yuancu yuancu commented Oct 17, 2025

Description

The timechart command with limit parameter was incorrectly handling non-cumulative aggregations (max, min, earliest, latest, etc) in the "OTHER" category. Instead of applying the proper aggregation function, it was incorrectly summing all values from categories that exceeded the limit.

Example:
source=index | timechart limit=1 span=1d max(severityNumber) by severityText

  • Expected: OTHER category shows max(severityNumber) = 23 (the actual maximum value)
  • Actual: OTHER category shows max(severityNumber) = 276 (sum of all values)

Root Cause

The timechart implementation made an incorrect assumption that all aggregations are cumulative (like sum/count). For non-cumulative functions like max, min, earliest, and latest, the "OTHER" category should apply the same aggregation function rather than summing the values.

Solution

  1. Enhanced aggregation handling: Added buildAggCall() method that properly maps aggregation functions:
  • MIN, EARLIEST → relBuilder.min()
  • MAX, LATEST → relBuilder.max()
  • AVG → relBuilder.avg()
  • Others (sum, count, etc.) → relBuilder.sum() (existing behavior)
  1. Updated top category selection: Modified sorting logic to handle min aggregations correctly (ascending sort for min, descending for others)
  2. Fixed OTHER category calculation: Replaced hardcoded sum aggregations with dynamic function selection based on the original aggregation type

Limitation:

  • It still does not handle all aggregation functions. E.g. TAKE, STDDEV_SAMP, although the use case of them in timechart is rare.
  • For avg, the result for other is not accurate enough. This is because it is aggregating on an aggregated results, but the count of the original rows in each category in not kept.

Related Issues

Resolves #4582

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuancu yuancu added the bug Something isn't working label Oct 17, 2025
Signed-off-by: Yuanchun Shen <[email protected]>
Copy link
Collaborator

@qianheng-aws qianheng-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please resolve the conflict.

context.relBuilder.sort(context.relBuilder.desc(context.relBuilder.field("grand_total")));
RexNode sortField = context.relBuilder.field("grand_total");
sortField =
aggFunction == BuiltinFunctionName.MIN ? sortField : context.relBuilder.desc(sortField);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason of handling MIN separately? Will it meet our expected behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because limit is supposed to keep top categories, yet the expectation for top categories varies with the actual aggregation function -- for MIN and EARLIEST, the smallest ones are supposed to be of the top categories, while for most of the rest aggregations (SUM, MAX, AVG, COUNT, etc), the greatest ones are.

qianheng-aws
qianheng-aws previously approved these changes Oct 24, 2025

/** Helper method to get the function name for proper column naming */
private String getValueFunctionName(UnresolvedExpression aggregateFunction) {
private String getAggFieldAlias(UnresolvedExpression aggregateFunction) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAggFieldAlias lacks precision. How about change to getMeasureAlias?

count(balance) as cnt

balance: aggregate Field
count(): aggregate function
count(balance): measure
cnt: alias of measure

Do not forget to update the java doc either.

Copy link
Collaborator Author

@yuancu yuancu Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out! I updated the name as getMetricAlias as metric conforms to how it is named in other parts of the codebase.

Signed-off-by: Yuanchun Shen <[email protected]>
… was introduced by other PRs that interferes the earliest function or the representation of timestamp

Signed-off-by: Yuanchun Shen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] timechart command returns incorrect results for non-cumulative aggregations

3 participants