-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-51119][SQL][FOLLOW-UP] Add fallback to ResolveDefaultColumnsUtil existenceDefaultValues #49962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…il.existenceDefaultValues
6f13efb
to
64b62ca
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
Outdated
Show resolved
Hide resolved
log"with ${MDC(COLUMN_DATA_TYPE_SOURCE, field.dataType)}, " + | ||
log"falling back to full analysis.") | ||
|
||
field.getExistenceDefaultValue().map { text: String => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just the passed in defaultSQL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I guess I can re-use it or re-derive the redundant argument, i just changed to re-derive it as its a bit easier to get the method signature, but lmk if you prefer the first.
I re-ran the failing test and its green on the build link, but somehow still shows red on the PR. |
s" field name: ${field.name}, value: $defaultSQL") | ||
} | ||
// sanity check | ||
if (!literal.resolved) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think def analyze
already did this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea the original code for resolving existenceDefaultValues, which called analyze(), went through this check. It's even indicated as extra in the comments.
So I was just keeping the original behavior, do you suggest to remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, lets me know if its better
thanks, merging to master/4.0! |
…il existenceDefaultValues ### What changes were proposed in this pull request? The original change in #49840 was too optimistic and assumes that all column EXISTS_DEFAULT are already resolved and column folded. However, if there is bad EXISTS_DEFAULT metadata (an unresolved expression is persisted) it will break. Add fallback to use the original logic in that case. ### Why are the changes needed? There are some cases where bad EXISTS_DEFAULT metadata is persisted by external catalogs, due to some bugs such as #49942 or other problems. ### Does this PR introduce _any_ user-facing change? No, it should handle bad metadata better. ### How was this patch tested? Add unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes #49962 from szehon-ho/SPARK-51119-follow-2. Authored-by: Szehon Ho <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 4ffc398) Signed-off-by: Wenchen Fan <[email protected]>
…il existenceDefaultValues ### What changes were proposed in this pull request? The original change in apache#49840 was too optimistic and assumes that all column EXISTS_DEFAULT are already resolved and column folded. However, if there is bad EXISTS_DEFAULT metadata (an unresolved expression is persisted) it will break. Add fallback to use the original logic in that case. ### Why are the changes needed? There are some cases where bad EXISTS_DEFAULT metadata is persisted by external catalogs, due to some bugs such as apache#49942 or other problems. ### Does this PR introduce _any_ user-facing change? No, it should handle bad metadata better. ### How was this patch tested? Add unit test ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49962 from szehon-ho/SPARK-51119-follow-2. Authored-by: Szehon Ho <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…orrupt exists_default value ### What changes were proposed in this pull request? Add another fallback for broken (non-resolved) exists_default values for SPARK-51119 original fix. ### Why are the changes needed? #49962 added a fallback in case there were already broken (ie, non-resolved) persisted default values in catalogs. A broken one is something like 'current_database, current_user, current_timestamp' , these are non-deterministic and will bring wrong results in EXISTS_DEFAULT, where user expects the value resolved when they set the default. However, this fallback missed one case when the current_xxx is in a cast. This fixes it. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add to existing unit test in StructTypeSuite ### Was this patch authored or co-authored using generative AI tooling? No Closes #50685 from szehon-ho/SPARK-51119-follow-3. Authored-by: Szehon Ho <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…orrupt exists_default value ### What changes were proposed in this pull request? Add another fallback for broken (non-resolved) exists_default values for SPARK-51119 original fix. ### Why are the changes needed? #49962 added a fallback in case there were already broken (ie, non-resolved) persisted default values in catalogs. A broken one is something like 'current_database, current_user, current_timestamp' , these are non-deterministic and will bring wrong results in EXISTS_DEFAULT, where user expects the value resolved when they set the default. However, this fallback missed one case when the current_xxx is in a cast. This fixes it. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add to existing unit test in StructTypeSuite ### Was this patch authored or co-authored using generative AI tooling? No Closes #50685 from szehon-ho/SPARK-51119-follow-3. Authored-by: Szehon Ho <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 516859f) Signed-off-by: Wenchen Fan <[email protected]>
…orrupt exists_default value ### What changes were proposed in this pull request? Add another fallback for broken (non-resolved) exists_default values for SPARK-51119 original fix. ### Why are the changes needed? apache#49962 added a fallback in case there were already broken (ie, non-resolved) persisted default values in catalogs. A broken one is something like 'current_database, current_user, current_timestamp' , these are non-deterministic and will bring wrong results in EXISTS_DEFAULT, where user expects the value resolved when they set the default. However, this fallback missed one case when the current_xxx is in a cast. This fixes it. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add to existing unit test in StructTypeSuite ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#50685 from szehon-ho/SPARK-51119-follow-3. Authored-by: Szehon Ho <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
The original change in #49840 was too optimistic and assumes that all column EXISTS_DEFAULT are already resolved and column folded. However, if there is bad EXISTS_DEFAULT metadata (an unresolved expression is persisted) it will break. Add fallback to use the original logic in that case.
Why are the changes needed?
There are some cases where bad EXISTS_DEFAULT metadata is persisted by external catalogs, due to some bugs such as #49942 or other problems.
Does this PR introduce any user-facing change?
No, it should handle bad metadata better.
How was this patch tested?
Add unit test
Was this patch authored or co-authored using generative AI tooling?
No