@szehon-ho szehon-ho (Member) commented Sep 8, 2025

What changes were proposed in this pull request?

#49962 added a fallback for catalogs that already contain broken (i.e., unresolved) persisted default values. A broken value is one containing an unresolved expression such as `current_database`, `current_user`, or `current_timestamp`; these are non-deterministic and produce wrong results in EXISTS_DEFAULT, where the user expects the value to have been resolved at the time they set the default.

This PR adds yet another fallback for broken default values, in this case ones that contain nested function calls.

Why are the changes needed?

Take the case where the EXISTS_DEFAULT is:

```
CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0'))
```

The current code `Literal.fromSQL(defaultSQL)` throws the following exception before reaching the fallback:

```
Caused by: java.lang.AssertionError: assertion failed: function arguments must be resolved.
	at scala.Predef$.assert(Predef.scala:279)
	at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.$anonfun$expressionBuilder$1(FunctionRegistry.scala:1278)
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction(FunctionRegistry.scala:251)
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistryBase.lookupFunction$(FunctionRegistry.scala:245)
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:317)
	at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:325)
	at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$fromSQL$1.applyOrElse(literals.scala:317)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:586)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:586)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:579)
	at scala.collection.immutable.List.map(List.scala:251)
	at scala.collection.immutable.List.map(List.scala:79)
	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:768)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:579)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:556)
	at org.apache.spark.sql.catalyst.expressions.Literal$.fromSQL(literals.scala:317)
	at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.analyzeExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:393)
	at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValue(ResolveDefaultColumnsUtil.scala:529)
	at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$getExistenceDefaultValues$1(ResolveDefaultColumnsUtil.scala:524)
	at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
	at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.getExistenceDefaultValues(ResolveDefaultColumnsUtil.scala:524)
	at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.$anonfun$existenceDefaultValues$2(ResolveDefaultColumnsUtil.scala:594)
	at scala.Option.getOrElse(Option.scala:201)
	at org.apache.spark.sql.catalyst.util.ResolveDefaultColumns$.existenceDefaultValues(ResolveDefaultColumnsUtil.scala:592)
```
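To make the shape of the fix concrete, here is a minimal sketch (in Python, not Spark's actual Scala code; all function names here are hypothetical stand-ins) of the guarded-fallback pattern the change applies: try the fast literal-parsing path first, and if it fails for any reason, including an `AssertionError`-style failure on unresolved nested function calls, fall back to fully analyzing the stored SQL text.

```python
def parse_literal(sql: str):
    """Fast path: accepts only simple literals (hypothetical stand-in
    for Literal.fromSQL)."""
    try:
        return int(sql)
    except ValueError:
        raise ValueError(f"not a simple literal: {sql}")

def full_analysis(sql: str):
    """Slow path: stand-in for parsing, resolving, and constant-folding
    the stored expression text."""
    return f"analyzed({sql})"

def analyze_existence_default(sql: str):
    # The key point of the fix: any failure in the fast path must trigger
    # the fallback, not just a known error class -- in the reported bug an
    # AssertionError escaped before the fallback was reached.
    try:
        return parse_literal(sql)
    except Exception:
        return full_analysis(sql)

print(analyze_existence_default("42"))
print(analyze_existence_default(
    "CONCAT(YEAR(CURRENT_DATE), LPAD(WEEKOFYEAR(CURRENT_DATE), 2, '0'))"))
```

The design point is that the fallback boundary must catch every failure mode of the fast path, since corrupt persisted values can fail in ways the fast path never anticipated.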

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test in StructTypeSuite.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 8, 2025
@dongjoon-hyun dongjoon-hyun (Member) left a comment


Hi, @szehon-ho and @dtenedor .

SPARK-51119 is part of Apache Spark 4.0.0. We cannot make a follow-up for a released JIRA issue, because the Fixed Version of this PR should be 4.1.0 and 4.0.2.

Please file a new JIRA issue to proceed with this kind of bug fix.

@dongjoon-hyun

BTW, it would be great if you could prefer a new JIRA issue over too many follow-ups like SPARK-51119.

```
$ git log --oneline | grep SPARK-51119
4731c85c556 [SPARK-51119][SQL][FOLLOW-UP] Fix missing fallback case for parsing corrupt exists_default value
2a075956dc4 [SPARK-51119][SQL][FOLLOW-UP] Add fallback to ResolveDefaultColumnsUtil existenceDefaultValues
3455b822188 [SPARK-51119][SQL][FOLLOW-UP] ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution
63cb5204079 [SPARK-51119][SQL][FOLLOW-UP] Readers on executors resolving EXISTS_DEFAULT should not call catalogs
f959716556d [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs
```

@szehon-ho szehon-ho changed the title [SPARK-51119][SQL][FOLLOW-UP] Fix yet another missing fallback case for parsing corrupt exists_default value [SPARK-53527][SQL] Improve fallback of analyzeExistenceDefaultValue Sep 8, 2025
@szehon-ho szehon-ho (Member, Author) commented Sep 8, 2025

Yes, sure! I made a new JIRA: https://issues.apache.org/jira/browse/SPARK-53527. Thanks for the suggestion!

@cloud-fan cloud-fan (Contributor)

thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 2f305b6 Sep 9, 2025
cloud-fan pushed a commit that referenced this pull request Sep 9, 2025
Closes #52274 from szehon-ho/more_default_value_fallback.

Authored-by: Szehon Ho <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 2f305b6)
Signed-off-by: Wenchen Fan <[email protected]>