Commit 83d5ff1

bersprockets authored and dongjoon-hyun committed
[SPARK-52873][SQL][TESTS][FOLLOWUP] Fix test for non-ansi mode
### What changes were proposed in this pull request?

In the `JoinSuite` test expecting `ignoreDuplicateKey=true`, don't include a query where a build-side key is a string used in a numeric expression.

### Why are the changes needed?

In non-ansi mode, casting of the string `t2.c1` for use in a numeric expression (`t2.c1 * 1000`) adds extra scaffolding around that expression in the build keys, but not in the condition. In the condition, `t2.c1 * 1000` is `(cast(c1#299 as double) * 1000.0)`, but in the build key it is `knownfloatingpointnormalized(normalizenanandzero((cast(c1#299 as double) * 1000.0)))`.

As a result, `t2.c1 * 1000` doesn't match between the build keys and the condition. Therefore the optimization is not performed, and since the test checks for the optimization, the test fails in non-ansi mode.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually ran the offending test both with and without `SPARK_ANSI_SQL_MODE=false`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52128 from bersprockets/broken_nonansi_test.

Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 20a6af7)
Signed-off-by: Dongjoon Hyun <[email protected]>
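The mismatch described above can be sketched with a toy expression tree. The case classes below are hypothetical stand-ins, not Spark's real Catalyst `Expression` hierarchy: they only illustrate that when the build-side key alone is wrapped in `knownfloatingpointnormalized`/`normalizenanandzero` nodes, a structural-equality comparison against the bare condition expression fails, which is why the `ignoreDuplicatedKey` optimization is skipped.

```scala
// Hypothetical sketch, NOT Spark's real expression classes: models why the
// build-side key and the join-condition expression fail to match in non-ansi
// mode once normalization wrappers are added to one side only.
object NormalizationMismatchDemo {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Double) extends Expr
  case class Cast(child: Expr, to: String) extends Expr
  case class Mul(left: Expr, right: Expr) extends Expr
  case class NormalizeNaNAndZero(child: Expr) extends Expr
  case class KnownFloatingPointNormalized(child: Expr) extends Expr

  // t2.c1 * 1000 as it appears in the join condition: a plain cast + multiply.
  val conditionExpr: Expr = Mul(Cast(Attr("c1"), "double"), Lit(1000.0))

  // The same expression as it appears in the build keys in non-ansi mode,
  // wrapped in the extra normalization scaffolding.
  val buildKeyExpr: Expr =
    KnownFloatingPointNormalized(NormalizeNaNAndZero(
      Mul(Cast(Attr("c1"), "double"), Lit(1000.0))))

  def main(args: Array[String]): Unit = {
    // Case-class equality is structural, so the wrapped and unwrapped trees
    // never compare equal; a duplicate-key check based on expression equality
    // therefore cannot match them, and the optimization is not applied.
    println(conditionExpr == buildKeyExpr) // false
    println(conditionExpr == Mul(Cast(Attr("c1"), "double"), Lit(1000.0))) // true
  }
}
```

This is also why the fix below switches the affected queries to long-typed key columns (`t1a`/`t2a`): with no string-to-double cast, no normalization wrappers are inserted, and the key and condition expressions match.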
1 parent f3c2d39 commit 83d5ff1

File tree

1 file changed (+10, -7 lines)

sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala

Lines changed: 10 additions & 7 deletions
```diff
@@ -1559,10 +1559,13 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
     spark.range(10).map(i => (i.toString, i + 1)).toDF("c1", "c2").write.saveAsTable("t1")
     spark.range(10).map(i => ((i % 5).toString, i % 3)).toDF("c1", "c2").write.saveAsTable("t2")
 
+    spark.range(10).map(i => (i, i + 1)).toDF("c1", "c2").write.saveAsTable("t1a")
+    spark.range(10).map(i => (i % 5, i % 3)).toDF("c1", "c2").write.saveAsTable("t2a")
+
     val semiExpected1 = Seq(Row("0"), Row("1"), Row("2"), Row("3"), Row("4"))
     val antiExpected1 = Seq(Row("5"), Row("6"), Row("7"), Row("8"), Row("9"))
-    val semiExpected2 = Seq(Row("0"))
-    val antiExpected2 = Seq.tabulate(9) { x => Row((x + 1).toString) }
+    val semiExpected2 = Seq(Row(0))
+    val antiExpected2 = Seq.tabulate(9) { x => Row(x + 1) }
 
     val semiJoinQueries = Seq(
       // No join condition, ignore duplicated key.
@@ -1587,18 +1590,18 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       // the same as t2.c2). In this case, ignoreDuplicatedKey should be false
       (
         s"""
-           |SELECT /*+ SHUFFLE_HASH(t2) */ t1.c1 FROM t1 LEFT SEMI JOIN t2
-           |ON CAST((t1.c2+10000)/1000 AS INT) = CAST((t2.c2+10000)/1000 AS INT)
-           |AND t2.c2 >= t1.c2 + 1
+           |SELECT /*+ SHUFFLE_HASH(t2a) */ t1a.c1 FROM t1a LEFT SEMI JOIN t2a
+           |ON CAST((t1a.c2+10000)/1000 AS INT) = CAST((t2a.c2+10000)/1000 AS INT)
+           |AND t2a.c2 >= t1a.c2 + 1
            |""".stripMargin,
         false, semiExpected2, antiExpected2),
       // SPARK-52873: Have a join condition that contains the same expression as the
      // build-side join key,and does not violate any other rules for the join condition.
       // In this case, ignoreDuplicatedKey should be true
       (
         s"""
-           |SELECT /*+ SHUFFLE_HASH(t2) */ t1.c1 FROM t1 LEFT SEMI JOIN t2
-           |ON t1.c1 * 10000 = t2.c1 * 1000 AND t2.c1 * 1000 >= t1.c1
+           |SELECT /*+ SHUFFLE_HASH(t2a) */ t1a.c1 FROM t1a LEFT SEMI JOIN t2a
+           |ON t1a.c1 * 10000 = t2a.c1 * 1000 AND t2a.c1 * 1000 >= t1a.c1
            |""".stripMargin,
         true, semiExpected2, antiExpected2)
     )
```
