[SPARK-53527][SQL] Improve fallback of analyzeExistenceDefaultValue #52274
Conversation
…or parsing corrupt exists_default value
Hi, @szehon-ho and @dtenedor.
SPARK-51119 is part of Apache Spark 4.0.0. We cannot make a follow-up for the released JIRA issue, because the Fixed Version of this PR should be 4.1.0 and 4.0.2.
Please file a new JIRA issue to proceed with this kind of bug fix.
BTW, it would be great if you could prefer a new JIRA issue instead of adding too many follow-ups to issues like SPARK-51119.
Yes, sure! I made a new JIRA: https://issues.apache.org/jira/browse/SPARK-53527. Thanks for the suggestion!

Thanks, merging to master/4.0!
Closes #52274 from szehon-ho/more_default_value_fallback.
Authored-by: Szehon Ho <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 2f305b6)
Signed-off-by: Wenchen Fan <[email protected]>

What changes were proposed in this pull request?

#49962 added a fallback for the case where broken (i.e., unresolved) persisted default values already exist in catalogs. A broken value is something like 'current_database', 'current_user', or 'current_timestamp': these expressions are non-deterministic and yield wrong results when left unresolved in EXISTS_DEFAULT, since the user expects the value to be resolved at the moment they set the default.

This PR adds yet another fallback for broken persisted default values, in this case ones that contain nested function calls.

Why are the changes needed?

Take the case where the EXISTS_DEFAULT is:

```
CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0'))
```

The current code, `Literal.fromSQL(defaultSQL)`, throws the exception below before ever reaching the fallback:

```
Caused by: java.lang.AssertionError: assertion failed: function arguments must be resolved.
  at scala.Predef$.assert(Predef.scala:279)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.$anonfun$expressionBuilder$1(FunctionRegistry.scala:1278)
  at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:251)
  at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:245)
  at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:317)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:325)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:317)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:586)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:586)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:579)
  at scala.collection.immutable.List.map(List.scala:251)
  at scala.collection.immutable.List.map(List.scala:79)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:768)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:579)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:556)
  at org.apache.spark.sql.catalyst.expressions.Literal$.fromSQL(literals.scala:317)
  at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.analyzeExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:393)
  at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:529)
  at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$getExistenceDefaultValues$1(ResolveDefaultColumnsUtil.scala:524)
  at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
  at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValues(ResolveDefaultColumnsUtil.scala:524)
  at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$existenceDefaultValues$2(ResolveDefaultColumnsUtil.scala:594)
  at scala.Option.getOrElse(Option.scala:201)
  at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.existenceDefaultValues(ResolveDefaultColumnsUtil.scala:592)
```
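For readers skimming the fix, here is a minimal sketch of the fallback shape being discussed; it is not the actual patch. `analyzeExistenceDefault` and `fullyAnalyze` are hypothetical stand-ins (the real logic lives in `ResolveDefaultColumnsUtil.analyzeExistenceDefaultValue`), while `Literal.fromSQL` and `CatalystSqlParser.parseExpression` are real Catalyst entry points. The thing to notice is that the failure above is a `java.lang.AssertionError`, not an `AnalysisException`, so the recovery guard has to be broader than analysis errors alone.

```scala
import scala.util.control.NonFatal

import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Hypothetical stand-in for the full analysis the real fallback performs.
def fullyAnalyze(parsed: Expression): Expression = parsed

// Sketch only: fast literal path first, broad fallback second.
def analyzeExistenceDefault(defaultSQL: String): Expression = {
  try {
    // Fast path: EXISTS_DEFAULT written by newer Spark versions is a
    // fully resolved literal, so Literal.fromSQL succeeds directly.
    Literal.fromSQL(defaultSQL)
  } catch {
    // Corrupt legacy values (e.g. nested, unresolved function calls) can
    // fail with AssertionError; NonFatal matches that, unlike a catch
    // limited to AnalysisException.
    case NonFatal(_) =>
      fullyAnalyze(CatalystSqlParser.parseExpression(defaultSQL))
  }
}
```

Whether the real patch widens the catch, pre-checks the parsed tree, or both is visible in the diff; the sketch only captures the try-then-fallback structure.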
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test in StructTypeSuite.
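As a rough illustration of what such a regression test could look like (the real test is in StructTypeSuite; the metadata keys and the `existenceDefaultValues` entry point are taken from the stack trace above, and the exact signatures here are assumptions):

```scala
import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns
import org.apache.spark.sql.types._

// A persisted EXISTS_DEFAULT of the corrupt, unresolved shape described
// above: nested function calls instead of a resolved literal.
val corrupt = "CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0'))"

val metadata = new MetadataBuilder()
  .putString(ResolveDefaultColumns.EXISTS_DEFAULT_COLUMN_METADATA_KEY, corrupt)
  .putString(ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY, corrupt)
  .build()
val schema = StructType(Seq(StructField("week", StringType, nullable = true, metadata)))

// Before this fix, the call below died with the AssertionError from
// Literal.fromSQL; with the fallback it should produce a default value.
val defaults = ResolveDefaultColumns.existenceDefaultValues(schema)
assert(defaults.length == 1)
```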
Was this patch authored or co-authored using generative AI tooling?
No