Commit 83d5ff1

bersprockets authored and dongjoon-hyun committed
[SPARK-52873][SQL][TESTS][FOLLOWUP] Fix test for non-ansi mode
### What changes were proposed in this pull request?

In the `JoinSuite` test expecting `ignoreDuplicateKey=true`, don't include a query where a build-side key is a string used in a numeric expression.

### Why are the changes needed?

In non-ansi mode, casting of the string `t2.c1` for use in a numeric expression (`t2.c1 * 1000`) adds extra scaffolding around that expression in the build keys, but not in the condition. In the condition, `t2.c1 * 1000` is `(cast(c1#299 as double) * 1000.0)`, but in the build key it is `knownfloatingpointnormalized(normalizenanandzero((cast(c1#299 as double) * 1000.0)))`.

As a result, `t2.c1 * 1000` doesn't match between the build keys and the condition. Therefore the optimization is not performed, and since the test checks for the optimization, the test fails in non-ansi mode.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually ran the offending test both with and without `SPARK_ANSI_SQL_MODE=false`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52128 from bersprockets/broken_nonansi_test.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 20a6af7)
Signed-off-by: Dongjoon Hyun <[email protected]>
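The mismatch described above can be sketched with a toy expression tree. The case classes below are hypothetical stand-ins, not Spark's real Catalyst `Expression` hierarchy: they only illustrate that when the build-side key alone is wrapped in `knownfloatingpointnormalized`/`normalizenanandzero` nodes, a structural-equality comparison against the bare condition expression fails, which is why the `ignoreDuplicatedKey` optimization is skipped.

```scala
// Hypothetical sketch, NOT Spark's real expression classes: models why the
// build-side key and the join-condition expression fail to match in non-ansi
// mode once normalization wrappers are added to one side only.
object NormalizationMismatchDemo {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Double) extends Expr
  case class Cast(child: Expr, to: String) extends Expr
  case class Mul(left: Expr, right: Expr) extends Expr
  case class NormalizeNaNAndZero(child: Expr) extends Expr
  case class KnownFloatingPointNormalized(child: Expr) extends Expr

  // t2.c1 * 1000 as it appears in the join condition: a plain cast + multiply.
  val conditionExpr: Expr = Mul(Cast(Attr("c1"), "double"), Lit(1000.0))

  // The same expression as it appears in the build keys in non-ansi mode,
  // wrapped in the extra normalization scaffolding.
  val buildKeyExpr: Expr =
    KnownFloatingPointNormalized(NormalizeNaNAndZero(
      Mul(Cast(Attr("c1"), "double"), Lit(1000.0))))

  def main(args: Array[String]): Unit = {
    // Case-class equality is structural, so the wrapped and unwrapped trees
    // never compare equal; a duplicate-key check based on expression equality
    // therefore cannot match them, and the optimization is not applied.
    println(conditionExpr == buildKeyExpr) // false
    println(conditionExpr == Mul(Cast(Attr("c1"), "double"), Lit(1000.0))) // true
  }
}
```

This is also why the fix below switches the affected queries to long-typed key columns (`t1a`/`t2a`): with no string-to-double cast, no normalization wrappers are inserted, and the key and condition expressions match.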
1 parent f3c2d39 commit 83d5ff1

File tree

1 file changed (+10, -7 lines)

sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala

Lines changed: 10 additions & 7 deletions
```diff
@@ -1559,10 +1559,13 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
     spark.range(10).map(i => (i.toString, i + 1)).toDF("c1", "c2").write.saveAsTable("t1")
     spark.range(10).map(i => ((i % 5).toString, i % 3)).toDF("c1", "c2").write.saveAsTable("t2")
 
+    spark.range(10).map(i => (i, i + 1)).toDF("c1", "c2").write.saveAsTable("t1a")
+    spark.range(10).map(i => (i % 5, i % 3)).toDF("c1", "c2").write.saveAsTable("t2a")
+
     val semiExpected1 = Seq(Row("0"), Row("1"), Row("2"), Row("3"), Row("4"))
     val antiExpected1 = Seq(Row("5"), Row("6"), Row("7"), Row("8"), Row("9"))
-    val semiExpected2 = Seq(Row("0"))
-    val antiExpected2 = Seq.tabulate(9) { x => Row((x + 1).toString) }
+    val semiExpected2 = Seq(Row(0))
+    val antiExpected2 = Seq.tabulate(9) { x => Row(x + 1) }
 
     val semiJoinQueries = Seq(
       // No join condition, ignore duplicated key.
@@ -1587,18 +1590,18 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       // the same as t2.c2). In this case, ignoreDuplicatedKey should be false
       (
         s"""
-           |SELECT /*+ SHUFFLE_HASH(t2) */ t1.c1 FROM t1 LEFT SEMI JOIN t2
-           |ON CAST((t1.c2+10000)/1000 AS INT) = CAST((t2.c2+10000)/1000 AS INT)
-           |AND t2.c2 >= t1.c2 + 1
+           |SELECT /*+ SHUFFLE_HASH(t2a) */ t1a.c1 FROM t1a LEFT SEMI JOIN t2a
+           |ON CAST((t1a.c2+10000)/1000 AS INT) = CAST((t2a.c2+10000)/1000 AS INT)
+           |AND t2a.c2 >= t1a.c2 + 1
            |""".stripMargin,
         false, semiExpected2, antiExpected2),
       // SPARK-52873: Have a join condition that contains the same expression as the
      // build-side join key,and does not violate any other rules for the join condition.
       // In this case, ignoreDuplicatedKey should be true
       (
         s"""
-           |SELECT /*+ SHUFFLE_HASH(t2) */ t1.c1 FROM t1 LEFT SEMI JOIN t2
-           |ON t1.c1 * 10000 = t2.c1 * 1000 AND t2.c1 * 1000 >= t1.c1
+           |SELECT /*+ SHUFFLE_HASH(t2a) */ t1a.c1 FROM t1a LEFT SEMI JOIN t2a
+           |ON t1a.c1 * 10000 = t2a.c1 * 1000 AND t2a.c1 * 1000 >= t1a.c1
            |""".stripMargin,
         true, semiExpected2, antiExpected2)
     )
```
