-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix(substrait): fix regressed edge case in renaming inner struct fields #15634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(substrait): fix regressed edge case in renaming inner struct fields #15634
Conversation
…and utf8view as logically equivalent
@@ -1061,7 +1061,7 @@ async fn roundtrip_literal_list() -> Result<()> { | |||
async fn roundtrip_literal_struct() -> Result<()> { | |||
let plan = generate_plan_from_sql( | |||
"SELECT STRUCT(1, true, CAST(NULL AS STRING)) FROM data", | |||
false, | |||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated, but it was passing with true so why not have that
@@ -1076,6 +1076,46 @@ async fn roundtrip_literal_struct() -> Result<()> { | |||
Ok(()) | |||
} | |||
|
|||
#[tokio::test] | |||
async fn roundtrip_literal_named_struct() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this uses a named_struct() func so it wasn't hitting the bug, but I kept it as an addition still
FYI @zhuqi-lucas @alamb @xudong963 from the earlier PR |
@@ -711,6 +711,9 @@ impl DFSchema { | |||
.zip(iter2) | |||
.all(|((t1, f1), (t2, f2))| t1 == t2 && Self::field_is_logically_equal(f1, f2)) | |||
} | |||
// Utf8 and Utf8View are logically equivalent | |||
(DataType::Utf8, DataType::Utf8View) => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Blizzara for fix, this new relax looks good to me, and if we want more relax type, we can add later.
…ce, including ignoring decimal precision/scale
// This test aims to hit a case where the struct column itself has the expected name, but its | ||
// inner field needs to be renamed. | ||
let plan = generate_plan_from_sql( | ||
"SELECT CAST((STRUCT(1)) AS Struct<\"int_field\"Int>) AS 'Struct({c0:1})' FROM data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this query is converting String --> StringView
(this seems like a good test in my opinion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there's no String -> StringView in this test. This is just the test the regression that happened in Substrait consumer, where we were no longer renaming the inner fields, as the "is the schema the same" check passed too easily.
The String -> StringView is tested by the slt tests added in #15239
Thanks @Blizzara and @zhuqi-lucas |
…ds (apache#15634) * add a failing test case for apache#15239 (comment) * fix invariants to use logical schema equality instead, and mark utf8 and utf8view as logically equivalent * fix logical equivalence to be strictly superset of semantic equivalence, including ignoring decimal precision/scale
Which issue does this PR close?
#15239 introduced a regression in Substrait consumer, where if a field was correctly named but contained struct fields which were not, those would not be renamed. See #15239 (comment)
However, the logic used in that PR feels wrong also on a more general level. The can_cast_types is a very general check, so has_equivalent_names_and_types should quite certainly not use it as a criteria.
Rationale for this change
What changes are included in this PR?
invariants
to uselogically_equivalent_names_and_types
instead, since that's meant to be a weaker comparisonThis does make the check in
invariants
tighter again. I'd argue that's a good thing, since the "can_cast_types" check is very relaxed and certainly doesn't fulfill the criteria in https://datafusion.apache.org/contributor-guide/specification/invariants.html#logical-schema-is-invariant-under-logical-optimization. That said, I'm not sure if the Utf8->Utf8View change fulfills that criteria either, but makes sense to me to allow it?Are these changes tested?
Added a new test to catch the regression, + existing tests for the invariants.
Are there any user-facing changes?