Feat: Support arrays_overlap function #1312
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1312 +/- ##
=============================================
- Coverage 56.12% 39.09% -17.04%
- Complexity 976 2065 +1089
=============================================
Files 119 260 +141
Lines 11743 60269 +48526
Branches 2251 12834 +10583
=============================================
+ Hits 6591 23562 +16971
- Misses 4012 32226 +28214
- Partials 1140 4481 +3341
@@ -2300,6 +2300,12 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim
expr.children(1),
inputs,
(builder, binaryExpr) => builder.setArrayAppend(binaryExpr))
case ArraysOverlap(leftArrayExpr, rightArrayExpr) =>
Do we support this for all input data types?
Because the PR only contains basic tests, could you add a check to enable this expression only if CometConf.COMET_CAST_ALLOW_INCOMPATIBLE is enabled? We can remove this check in a future PR that adds comprehensive tests and demonstrates that we have Spark-compatible behavior for all supported data types.
Yes, that makes sense. The COMET_CAST_ALLOW_INCOMPATIBLE check has been added.
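The gating requested in this thread can be sketched in plain Scala. This is only an illustration of the pattern, not the code in QueryPlanSerde.scala: the Expr, Column, Conf, and toNative names are hypothetical stand-ins, the Boolean field stands in for the value read from CometConf.COMET_CAST_ALLOW_INCOMPATIBLE, and returning None is assumed to mean "fall back to Spark".

```scala
object GatedSerdeSketch {
  // Toy expression tree for illustration; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class ArraysOverlap(left: Expr, right: Expr) extends Expr

  // Hypothetical stand-in for the flag read from CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.
  final case class Conf(allowIncompatible: Boolean)

  // Returns Some(text describing the native expression) when the expression is
  // supported, or None to signal that the caller should fall back to Spark.
  def toNative(expr: Expr, conf: Conf): Option[String] = expr match {
    case Column(name) => Some(name)
    // Only serialize arrays_overlap when the user has opted in.
    case ArraysOverlap(l, r) if conf.allowIncompatible =>
      for {
        left <- toNative(l, conf)
        right <- toNative(r, conf)
      } yield s"array_has_any($left, $right)"
    // Flag disabled: report the expression as unsupported.
    case ArraysOverlap(_, _) => None
  }
}
```

Placing the flag in a pattern guard keeps the unsupported path explicit, so removing the check later only requires deleting the guard and the fallback case.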
@@ -2568,4 +2568,21 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
}
}
}

test("arrays_overlap") {
Could you add tests where one or both input expressions evaluate to null on some rows? This can be achieved with a CASE expression, as in some of the other array tests.
+1 for more coverage
A test case based on a CASE expression has been added.
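To make the null cases discussed in this thread concrete, here is a minimal plain-Scala model of the semantics that the Spark documentation describes for arrays_overlap: true if the arrays share a non-null element; otherwise null if both arrays are non-empty and either contains a null element; otherwise false. A null input array yields null. The object and method names are hypothetical, None is used to model SQL NULL, and the sketch does not touch Spark, Comet, or DataFusion.

```scala
object ArraysOverlapModel {
  // None models SQL NULL, both for a whole array and for a single element.
  def arraysOverlap[A](
      left: Option[Seq[Option[A]]],
      right: Option[Seq[Option[A]]]): Option[Boolean] =
    for {
      l <- left   // a NULL array on either side makes the result NULL
      r <- right
      result <- overlapNonNull(l, r)
    } yield result

  private def overlapNonNull[A](l: Seq[Option[A]], r: Seq[Option[A]]): Option[Boolean] = {
    // Collect only the non-null elements; a null never matches another null.
    val leftValues = l.collect { case Some(v) => v }.toSet
    val hasCommon = r.collect { case Some(v) => v }.exists(leftValues.contains)
    val hasNullElement = l.contains(None) || r.contains(None)
    if (hasCommon) Some(true)
    else if (l.nonEmpty && r.nonEmpty && hasNullElement) None
    else Some(false)
  }
}
```

Under this model, the first query in the test, arrays_overlap(array('a', null), array('b', null)), evaluates to null rather than false, and a CASE expression without an ELSE branch produces a null array, hence a null result, on rows where its condition does not hold.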
@erenavsarogullari could you rebase this PR when you have time? We can go ahead and merge this and then improve tests as part of #1269.
32b78fd to 86d8f57
86d8f57 to 96f776e
Thanks @andygrove and @kazuyukitanimura for the review. The PR has been rebased and the comments have been addressed. FYI, currently DataFusion does not support
Thanks @erenavsarogullari. LGTM.
checkSparkAnswerAndOperator(sql(
  "SELECT arrays_overlap(array('a', null), array('b', null)) from t1 where _1 is not null"))
checkSparkAnswerAndOperator(spark.sql(
  "SELECT arrays_overlap((CASE WHEN _2 =_3 THEN array(_6, _7) END), array(_6, _7)) FROM t1"));
I think _2 = _3 may always be true. The problem with makeParquetFileAllTypes is that, for each row, each column contains the same integer value cast to the column's type, so it is not ideal for tests like this. We can improve this as part of #1269.
Which issue does this PR close?
Related to Epic: #1042
arrays_overlap: select arrays_overlap(array('hello', '-', 'world'), array('hello')) => true
DataFusion's array_has_any has the same behavior as Spark's arrays_overlap function.
Spark: https://docs.databricks.com/en/sql/language-manual/functions/arrays_overlap.html
DataFusion: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#array-has-any
Rationale for this change
Defined under Epic: #1042
What changes are included in this PR?
planner.rs: Maps Spark's arrays_overlap function (a Spark physical expression) to the DataFusion array_has_any physical expression, with return type DataType::Boolean.
expr.proto: An arrays_overlap message has been added.
QueryPlanSerde.scala: An arrays_overlap pattern matching case has been added.
CometExpressionSuite.scala: A new UT has been added for the arrays_overlap function.
How are these changes tested?
A new UT has been added.