#314: Add support for sum of truncated values #324
Conversation
- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
JaCoCo model module code coverage report - scala 2.13.11
JaCoCo agent module code coverage report - scala 2.13.11
JaCoCo reader module code coverage report - scala 2.13.11
JaCoCo server module code coverage report - scala 2.13.11
@@ -44,6 +44,8 @@ object AtumMeasure {
      DistinctRecordCount.measureName,
      SumOfValuesOfColumn.measureName,
      AbsSumOfValuesOfColumn.measureName,
      SumOfTruncatedValuesOfColumn.measureName,
val supportedMeasureNames is not used
I have only added the names there; it is possibly dead code. I will comment out the val and rerun the tests.
@@ -117,6 +119,42 @@ object AtumMeasure {
    def apply(measuredCol: String): AbsSumOfValuesOfColumn = AbsSumOfValuesOfColumn(measureName, measuredCol)
  }

  case class SumOfTruncatedValuesOfColumn private (measureName: String, measuredCol: String) extends AtumMeasure {
Why have you decided to use the double-casting approach instead of the standard functions from the org.apache.spark.sql package to round values before summing them?
/**
* Returns the value of the column `e` rounded to 0 decimal places with HALF_UP round mode.
*
* @group math_funcs
* @since 1.5.0
*/
def round(e: Column): Column = round(e, 0)
/**
* Round the value of `e` to `scale` decimal places with HALF_UP round mode
* if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
*
* @group math_funcs
* @since 1.5.0
*/
def round(e: Column, scale: Int): Column = withExpr { Round(e.expr, Literal(scale)) }
/**
* Returns the value of the column `e` rounded to 0 decimal places with HALF_EVEN round mode.
*
* @group math_funcs
* @since 2.0.0
*/
def bround(e: Column): Column = bround(e, 0)
/**
* Round the value of `e` to `scale` decimal places with HALF_EVEN round mode
* if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
*
* @group math_funcs
* @since 2.0.0
*/
def bround(e: Column, scale: Int): Column = withExpr { BRound(e.expr, Literal(scale)) }
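For illustration only (my sketch of the reviewer's suggestion, not code from this PR, assuming a DataFrame df with a numeric value column), rounding before summing with these standard functions would look roughly like this:

import org.apache.spark.sql.functions.{col, round, sum}
// Round each value with HALF_UP, then sum the rounded values.
val roundedSum = df.select(sum(round(col("value"))))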
Thank you. It was mentioned in the issue that this method was used in ATUM, but let me change it accordingly, since I think the method you have mentioned is more correct.
Unfortunately, the round function does not give the behaviour we need for negative numbers. We need a truncation function that simply removes the decimals. But I have found another way that works without resorting to a double cast.
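A quick illustration of the difference (my own, not from this PR): with HALF_UP rounding, -1.7 goes to -2, while truncation toward zero should give -1.

scala> BigDecimal(-1.7).setScale(0, BigDecimal.RoundingMode.HALF_UP)   // -2
scala> BigDecimal(-1.7).setScale(0, BigDecimal.RoundingMode.DOWN)      // -1 (truncation toward zero)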
Yup, a combination of floor and ceil should work fine:
scala> df.show
+-----+
|value|
+-----+
| -1.1|
| 1.1|
| -1.7|
| 1.7|
| -0.8|
| 0.8|
+-----+
scala> df.select(when(col("value") >= 0, floor(col("value"))).otherwise(ceil(col("value")))).show
+-------------------------------------------------------------+
|CASE WHEN (value >= 0) THEN FLOOR(value) ELSE CEIL(value) END|
+-------------------------------------------------------------+
| -1|
| 1|
| -1|
| 1|
| 0|
| 0|
+-------------------------------------------------------------+
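As a follow-up sketch (mine, not output from this PR), aggregating those truncated values gives exactly the two sums the new Measures are after; the results below are worked out by hand from the data above.

scala> val truncated = when(col("value") >= 0, floor(col("value"))).otherwise(ceil(col("value")))
scala> df.select(sum(truncated).as("truncSum"), sum(abs(truncated)).as("absTruncSum")).show
+--------+-----------+
|truncSum|absTruncSum|
+--------+-----------+
|       0|          4|
+--------+-----------+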
- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
- Made amendments to the function to not include double casts.
    }

    override def measuredColumns: Seq[String] = Seq(measuredCol)
    override val resultValueType: ResultValueType = ResultValueType.BigDecimalValue
ResultValueType.LongValue?
Yes, I like that. Good catch. Thank you.
- Added the aggregatedTruncTotal Measure and the absAggregatedTruncTotal Measure.
- Added the tests for these Measures.
- Made amendments to the function to not include double casts.
- Changed the result from BigDecimal to LongValue.
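For reference, here is a minimal standalone sketch of what the two computations boil down to after this thread (truncate toward zero via floor/ceil, sum, return a Long). The object and method names are illustrative; this is not the project's actual AtumMeasure API, and it assumes a plain numeric (e.g. Double) column.

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{abs, ceil, col, floor, sum, when}

object TruncatedSums {   // illustrative name, not part of the project
  // Truncate toward zero without a double cast: floor for non-negative values, ceil for negative ones.
  private def truncToZero(c: Column): Column =
    when(c >= 0, floor(c)).otherwise(ceil(c))

  // Sum of truncated values, returned as Long to match ResultValueType.LongValue.
  // Note: for an empty DataFrame the sum is null, which a real Measure would need to handle.
  def aggregatedTruncTotal(df: DataFrame, measuredCol: String): Long =
    df.select(sum(truncToZero(col(measuredCol)))).first().getLong(0)

  // Sum of the absolute values of the truncated values.
  def absAggregatedTruncTotal(df: DataFrame, measuredCol: String): Long =
    df.select(sum(abs(truncToZero(col(measuredCol))))).first().getLong(0)
}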
LGTM
- Add support for sum of truncated values.
- Added the aggregatedTruncTotal and absAggregatedTruncTotal Measures.

Closes #314

Release notes:
- Added the aggregatedTruncTotal and absAggregatedTruncTotal Measures.
- Added the tests for these Measures.