Skip to content

Make SUM and AVG Aggregate Type Coercion Explicit #7369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 22, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently various accumulators contain internal logic to coerce inputs to supported types, this has a few issues:

  • The cast is hidden from the plan
    • This prevents the optimizer from seeing it, potentially resulting in redundant casts
    • Hides the cast from the user which makes it not immediately obvious what is occurring
  • Complicates specializing the accumulators as the input types aren't defined (Use Specialization Instead of ScalarValue Binary Operations #6842)

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 22, 2023
@@ -1010,40 +1010,7 @@ fn aggregate_expressions(
| AggregateMode::SinglePartitioned => Ok(aggr_expr
.iter()
.map(|agg| {
let pre_cast_type = if let Some(Sum {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need this custom logic, it is handled automatically by the optimizer

@alamb alamb changed the title Make Aggregate Type Coercion Explicit Make SUM and AVG Aggregate Type Coercion Explicit Aug 22, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tustvold

I think this logic was left over from an earlier time when DataFusion did the type coercion at physical plan time (rather than during logical planning)

I went through it carefully and it makes much more sense to me than what is on master

DataType::Float64 | DataType::Float32 => Ok(DataType::Float64),
DataType::Int64 => Ok(DataType::Int64),
DataType::UInt64 => Ok(DataType::UInt64),
DataType::Float64 => Ok(DataType::Float64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked like we lost support for Float32 -- but then I see the test below for it, so 👍 (presumably what happens is the arguments are coerced first and arg_type refers to the post coerced type)

@@ -70,7 +70,7 @@ fn subquery_filter_with_cast() -> Result<()> {
\n Inner Join: Filter: CAST(test.col_int32 AS Float64) > __scalar_sq_1.AVG(test.col_int32)\
\n TableScan: test projection=[col_int32]\
\n SubqueryAlias: __scalar_sq_1\
\n Aggregate: groupBy=[[]], aggr=[[AVG(test.col_int32)]]\
\n Aggregate: groupBy=[[]], aggr=[[AVG(CAST(test.col_int32 AS Float64))]]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is actually nice to see the explicit cast I think

@tustvold tustvold merged commit 870857a into apache:main Aug 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants