Make `SUM` and `AVG` Aggregate Type Coercion Explicit #7369

tustvold · 2023-08-22T11:34:19Z

Which issue does this PR close?

Closes #.

Rationale for this change

Currently, various accumulators contain internal logic to coerce inputs to supported types. This has a few issues:

- The cast is hidden from the plan
  - This prevents the optimizer from seeing it, potentially resulting in redundant casts
  - Hides the cast from the user, which makes it not immediately obvious what is occurring
- Complicates specializing the accumulators, as the input types aren't defined (Use Specialization Instead of ScalarValue Binary Operations #6842)
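
To make the specialization point concrete, here is a minimal, self-contained sketch. It is not DataFusion's code: `Value`, `HiddenCastSum`, and `TypedSum` are hypothetical names invented for illustration. The first accumulator coerces inside `update`, so nothing outside it can see the cast and its input type is open-ended. The second assumes the plan has already cast its input to `f64`, so it can be a plain typed loop.

```rust
/// Hypothetical dynamically typed scalar; stands in for a value whose
/// type is only known at runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int32(i32),
    Float32(f32),
    Float64(f64),
}

/// Accumulator that hides the coercion: every update inspects the type.
#[derive(Default)]
struct HiddenCastSum {
    sum: f64,
}

impl HiddenCastSum {
    fn update(&mut self, values: &[Value]) {
        for v in values {
            // The cast happens here, invisible to any planner or optimizer.
            self.sum += match *v {
                Value::Int32(x) => f64::from(x),
                Value::Float32(x) => f64::from(x),
                Value::Float64(x) => x,
            };
        }
    }
}

/// Accumulator with a defined input type: the caller (in this sketch,
/// standing in for an explicit CAST in the plan) converts beforehand.
#[derive(Default)]
struct TypedSum {
    sum: f64,
}

impl TypedSum {
    fn update(&mut self, values: &[f64]) {
        self.sum += values.iter().sum::<f64>();
    }
}
```

With a fixed input type, `TypedSum` has a single code path per update, which is the property that makes a specialized implementation straightforward to write.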

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

tustvold · 2023-08-22T11:45:15Z

datafusion/core/src/physical_plan/aggregates/mod.rs

@@ -1010,40 +1010,7 @@ fn aggregate_expressions(
        | AggregateMode::SinglePartitioned => Ok(aggr_expr
            .iter()
            .map(|agg| {
-                let pre_cast_type = if let Some(Sum {


We no longer need this custom logic; it is handled automatically by the optimizer.

alamb

Thank you @tustvold

I think this logic was left over from an earlier time when DataFusion did the type coercion at physical plan time (rather than during logical planning)

I went through it carefully and it makes much more sense to me than what is on master

alamb · 2023-08-22T13:26:44Z

datafusion/expr/src/type_coercion/aggregates.rs

-        DataType::Float64 | DataType::Float32 => Ok(DataType::Float64),
+        DataType::Int64 => Ok(DataType::Int64),
+        DataType::UInt64 => Ok(DataType::UInt64),
+        DataType::Float64 => Ok(DataType::Float64),


It looked like we lost support for Float32, but then I see the test below for it, so 👍 (presumably what happens is that the arguments are coerced first and arg_type refers to the post-coercion type)
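
The presumed two-step flow can be sketched as follows. This is an illustration under stated assumptions, not the code in `aggregates.rs`: the `DataType` enum is a reduced local stand-in, the function names `coerce_sum_arg`, `sum_return_type`, and `plan_sum` are hypothetical, and the widening rules for `Int32` and `UInt32` are assumed for the example. Only the three return-type arms mirror the diff above.

```rust
/// Hypothetical, reduced stand-in for a data type enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    UInt32,
    UInt64,
    Float32,
    Float64,
    Utf8,
}

/// Step 1: coerce the argument type to one the accumulator supports.
/// The widening rules here are assumptions made for this sketch.
fn coerce_sum_arg(input: DataType) -> Result<DataType, String> {
    match input {
        DataType::Int32 | DataType::Int64 => Ok(DataType::Int64),
        DataType::UInt32 | DataType::UInt64 => Ok(DataType::UInt64),
        DataType::Float32 | DataType::Float64 => Ok(DataType::Float64),
        other => Err(format!("SUM does not support {:?}", other)),
    }
}

/// Step 2: compute the return type. It only needs to handle
/// post-coercion types, so there is no Float32 arm.
fn sum_return_type(arg_type: DataType) -> Result<DataType, String> {
    match arg_type {
        DataType::Int64 => Ok(DataType::Int64),
        DataType::UInt64 => Ok(DataType::UInt64),
        DataType::Float64 => Ok(DataType::Float64),
        other => Err(format!("SUM input was not coerced, got {:?}", other)),
    }
}

/// Runs both steps in order, as a planner would.
fn plan_sum(input: DataType) -> Result<DataType, String> {
    let coerced = coerce_sum_arg(input)?;
    sum_return_type(coerced)
}
```

In this sketch, `plan_sum(DataType::Float32)` succeeds because the coercion step runs first, while calling `sum_return_type(DataType::Float32)` directly returns an error.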

alamb · 2023-08-22T13:30:21Z

datafusion/optimizer/tests/optimizer_integration.rs

@@ -70,7 +70,7 @@ fn subquery_filter_with_cast() -> Result<()> {
    \n  Inner Join:  Filter: CAST(test.col_int32 AS Float64) > __scalar_sq_1.AVG(test.col_int32)\
    \n    TableScan: test projection=[col_int32]\
    \n    SubqueryAlias: __scalar_sq_1\
-    \n      Aggregate: groupBy=[[]], aggr=[[AVG(test.col_int32)]]\
+    \n      Aggregate: groupBy=[[]], aggr=[[AVG(CAST(test.col_int32 AS Float64))]]\


It is actually nice to see the explicit cast, I think.

Make Aggregate Type Coercion Explicit

82636e3

github-actions bot added the logical-expr, physical-expr, optimizer, core, and sqllogictest labels Aug 22, 2023

Clippy

5191230

tustvold mentioned this pull request Aug 22, 2023

Specialize Avg and Sum accumulators (#6842) #7358

Merged

alamb changed the title ~~Make Aggregate Type Coercion Explicit~~ Make SUM and AVG Aggregate Type Coercion Explicit Aug 22, 2023

alamb approved these changes Aug 22, 2023

tustvold merged commit 870857a into apache:main Aug 22, 2023

viirya mentioned this pull request Aug 22, 2023

Remove redundant type check in Avg #7374

Merged
