#314: Add support for sum of truncated values #324

Merged
4 commits merged into master from 314-add-support-for-sum-of-truncated-values on Mar 12, 2025

Conversation

@ABLL526 (Contributor) commented Mar 4, 2025

- Add support for sum of truncated values.
- Added the aggregatedTruncTotal and absAggregatedTruncTotal Measures.

Closes #314

Release notes:
- Added the aggregatedTruncTotal and absAggregatedTruncTotal Measures.
- Added the tests for these Measures.

Commits:
- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
@ABLL526 added the "enhancement" (New feature or request) and "Agent" (Issues touching the agent part of the project) labels on Mar 4, 2025
@ABLL526 self-assigned this on Mar 4, 2025
@ABLL526 linked an issue on Mar 4, 2025 that may be closed by this pull request

github-actions bot commented Mar 4, 2025

JaCoCo model module code coverage report - scala 2.13.11

Overall Project 56.51% 🍏

There is no coverage information present for the Files changed


github-actions bot commented Mar 4, 2025

JaCoCo agent module code coverage report - scala 2.13.11

Overall Project 78.98% -8.83% 🍏
Files changed 61.54%

File Coverage
MeasuresBuilder.scala 100% 🍏
Measure.scala 88.17% -31.12%


github-actions bot commented Mar 4, 2025

JaCoCo reader module code coverage report - scala 2.13.11

Overall Project 95.16% 🍏

There is no coverage information present for the Files changed


github-actions bot commented Mar 4, 2025

JaCoCo server module code coverage report - scala 2.13.11

Overall Project 68.39% 🍏

There is no coverage information present for the Files changed

@@ -44,6 +44,8 @@ object AtumMeasure {
DistinctRecordCount.measureName,
SumOfValuesOfColumn.measureName,
AbsSumOfValuesOfColumn.measureName,
SumOfTruncatedValuesOfColumn.measureName,
Collaborator:

val supportedMeasureNames is not used

Contributor (author):

I have only added the names; it is possibly dead code. I will comment out the val and rerun the tests.

@@ -117,6 +119,42 @@ object AtumMeasure {
def apply(measuredCol: String): AbsSumOfValuesOfColumn = AbsSumOfValuesOfColumn(measureName, measuredCol)
}

case class SumOfTruncatedValuesOfColumn private (measureName: String, measuredCol: String) extends AtumMeasure {
Collaborator:

Why have you decided to use the double-casting approach instead of the standard functions from the org.apache.spark.sql package to round values before summing them?

  /**
   * Returns the value of the column `e` rounded to 0 decimal places with HALF_UP round mode.
   *
   * @group math_funcs
   * @since 1.5.0
   */
  def round(e: Column): Column = round(e, 0)

  /**
   * Round the value of `e` to `scale` decimal places with HALF_UP round mode
   * if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
   *
   * @group math_funcs
   * @since 1.5.0
   */
  def round(e: Column, scale: Int): Column = withExpr { Round(e.expr, Literal(scale)) }

  /**
   * Returns the value of the column `e` rounded to 0 decimal places with HALF_EVEN round mode.
   *
   * @group math_funcs
   * @since 2.0.0
   */
  def bround(e: Column): Column = bround(e, 0)

  /**
   * Round the value of `e` to `scale` decimal places with HALF_EVEN round mode
   * if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
   *
   * @group math_funcs
   * @since 2.0.0
   */
  def bround(e: Column, scale: Int): Column = withExpr { BRound(e.expr, Literal(scale)) }

Contributor (author):

Thank you. It was mentioned in the issue that this method was used in ATUM, but let me change it accordingly, since I think the method you have mentioned is more correct.

Contributor (author):

Unfortunately, the round function does not give the behaviour we need for negative numbers; we need a truncation that simply removes the decimals. I have found another way that ensures proper functionality without resorting to a double cast.
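
For illustration, a minimal standalone sketch (data, object and app names are made up for this example) of why rounding is not the truncation needed for negative values:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, round}

object RoundVsTruncSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("round-vs-trunc").getOrCreate()
    import spark.implicits._

    val df = Seq(-1.7, -0.8, 1.7).toDF("value")

    // round() uses HALF_UP, so -1.7 becomes -2.0 and -0.8 becomes -1.0,
    // whereas truncation (simply dropping the decimals) should give -1 and 0.
    df.select(col("value"), round(col("value"))).show()

    spark.stop()
  }
}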

Collaborator:

Yup, a combination of floor and ceil should work fine:

scala> df.show
+-----+
|value|
+-----+
| -1.1|
|  1.1|
| -1.7|
|  1.7|
| -0.8|
|  0.8|
+-----+


scala> df.select(when(col("value") >= 0, floor(col("value"))).otherwise(ceil(col("value")))).show
+-------------------------------------------------------------+
|CASE WHEN (value >= 0) THEN FLOOR(value) ELSE CEIL(value) END|
+-------------------------------------------------------------+
|                                                           -1|
|                                                            1|
|                                                           -1|
|                                                            1|
|                                                            0|
|                                                            0|
+-------------------------------------------------------------+
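
Tying the suggestion above together, a minimal standalone sketch (data, column, object and app names are illustrative, not the actual Measure.scala code) of truncating towards zero with floor/ceil and then summing:

import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{ceil, col, floor, sum, when}

object TruncatedSumSketch {
  // Truncate towards zero: floor for non-negative values, ceil for negative ones.
  def truncate(c: Column): Column = when(c >= 0, floor(c)).otherwise(ceil(c))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("trunc-sum").getOrCreate()
    import spark.implicits._

    val df = Seq(-1.1, 1.1, -1.7, 1.7, -0.8, 0.8).toDF("value")

    // Truncated values are -1, 1, -1, 1, 0, 0, so the sum is 0.
    df.select(sum(truncate(col("value")))).show()

    spark.stop()
  }
}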

Commits:
- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
- Made amendments to the function to not include double casts.
}

override def measuredColumns: Seq[String] = Seq(measuredCol)
override val resultValueType: ResultValueType = ResultValueType.BigDecimalValue
Collaborator:

ResultValueType.LongValue?

Contributor (author):

Yes, I like that. Good catch. Thank you.

Commits:
- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
- Made amendments to the function to not include double casts.
- Changed the result from BigDecimal to LongValue.
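
As a side note on the result-type change: floor and ceil on a double column produce a long in Spark, and summing longs stays long, which is why a Long result fits naturally. A small sketch (illustrative data and names) showing the resulting schema:

import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{ceil, col, floor, sum, when}

object TruncatedSumTypeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("trunc-sum-type").getOrCreate()
    import spark.implicits._

    val truncated: Column =
      when(col("value") >= 0, floor(col("value"))).otherwise(ceil(col("value")))

    // The aggregated truncated total comes out as LongType.
    Seq(-1.7, 1.7, -0.8).toDF("value")
      .select(sum(truncated).as("truncated_total"))
      .printSchema()

    spark.stop()
  }
}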
@salamonpavel (Collaborator) left a comment:

LGTM

@ABLL526 merged commit 07e4b4f into master on Mar 12, 2025 (9 checks passed)
@ABLL526 deleted the 314-add-support-for-sum-of-truncated-values branch on March 12, 2025 at 15:24