Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Move all array_* serde to new framework, use correct INCOMPAT config #1349

Merged
merged 10 commits into from
Jan 30, 2025

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Jan 29, 2025

Which issue does this PR close?

Part of #1345

Rationale for this change

This PR updates the recently added array_* expression handling to mark them all as incompatible for now because we do not have comprehensive testing or validation for which data types are supported

The check is now based on a new COMET_EXPR_ALLOW_INCOMPATIBLE config instead of COMET_CAST_ALLOW_INCOMPATIBLE, which I had previously (mistakenly) suggested.

Rather than duplicate the if INCOMPAT check and the associated error message, there is now a new marker trait IncompatExpr.

What changes are included in this PR?

  • New IncompExpr trait
  • Moved logic for array_* serde into separate classes
  • Moved some tests from CometExpressionSuite to CometArrayExpressionSuite

How are these changes tested?

Existing tests

@andygrove andygrove marked this pull request as ready for review January 29, 2025 00:23
Comment on lines 2385 to 2390
case _: ArrayRemove => toProto(CometArrayRemove)
case _: ArrayContains => toProto(CometArrayContains)
case _: ArrayAppend => toProto(CometArrayAppend)
case _: ArrayIntersect => toProto(CometArrayIntersect)
case _: ArrayJoin => toProto(CometArrayJoin)
case _: ArraysOverlap => toProto(CometArraysOverlap)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually I would like to move this to a Map[SparkExpr, CometExpr]

@@ -3490,3 +3437,6 @@ trait CometExpressionSerde {
inputs: Seq[Attribute],
binding: Boolean): Option[ExprOuterClass.Expr]
}

/** Marker trait for an expression that is not guaranteed to be 100% compatible with Spark */
trait IncompatExpr {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later, we could add a method to this trait to provide the reason why an expression is incompatible and this could be used to generate documentation.

@andygrove andygrove requested a review from comphead January 29, 2025 14:18
@codecov-commenter
Copy link

codecov-commenter commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 70.42254% with 21 lines in your changes missing coverage. Please review.

Project coverage is 39.16%. Comparing base (f09f8af) to head (23accfa).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
...src/main/scala/org/apache/comet/serde/arrays.scala 73.21% 13 Missing and 2 partials ⚠️
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 25.00% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1349       +/-   ##
=============================================
- Coverage     56.12%   39.16%   -16.96%     
- Complexity      976     2065     +1089     
=============================================
  Files           119      262      +143     
  Lines         11743    60323    +48580     
  Branches       2251    12836    +10585     
=============================================
+ Hits           6591    23627    +17036     
- Misses         4012    32223    +28211     
- Partials       1140     4473     +3333     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

conf("spark.comet.expression.allowIncompatible")
.doc(
"Comet is not currently fully compatible with Spark for all expressions. " +
"Set this config to true to allow them anyway. See compatibility guide " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Set this config to true to allow them anyway. See compatibility guide " +
"Set this config to true to allow them anyway. See Compatibility Guide " +

.doc(
"Comet is not currently fully compatible with Spark for all expressions. " +
"Set this config to true to allow them anyway. See compatibility guide " +
"for more information.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"for more information.")
"for more information. (docs/user-guide/compatibility.md)")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit to update this by referencing a new CometConf.COMPAT_GUIDE variable with the link to the published documentation.

I regenerated the compatibility guide as well.

@@ -64,6 +64,7 @@ Comet provides the following configuration settings.
| spark.comet.explain.native.enabled | When this setting is enabled, Comet will provide a tree representation of the native query plan before execution and again after execution, with metrics. | false |
| spark.comet.explain.verbose.enabled | When this setting is enabled, Comet will provide a verbose tree representation of the extended information. | false |
| spark.comet.explainFallback.enabled | When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
| spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. See compatibility guide for more information. | false |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. See compatibility guide for more information. | false |
| spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. See Compatibility Guide for more information. (docs/user-guide/compatibility.md) | false |

case _: IncompatExpr if !CometConf.COMET_EXPR_ALLOW_INCOMPATIBLE.get() =>
withInfo(
expr,
s"$expr is not fully compatible with Spark. To enable it anyway, set " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s"$expr is not fully compatible with Spark. To enable it anyway, set " +
s"$expr is not fully compatible with Spark, see details in Compatibility Guide. To enable it anyway, set " +

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit to update this by referencing a new CometConf.COMPAT_GUIDE variable with the link to the published documentation.

@@ -64,6 +64,7 @@ Comet provides the following configuration settings.
| spark.comet.explain.native.enabled | When this setting is enabled, Comet will provide a tree representation of the native query plan before execution and again after execution, with metrics. | false |
| spark.comet.explain.verbose.enabled | When this setting is enabled, Comet will provide a verbose tree representation of the extended information. | false |
| spark.comet.explainFallback.enabled | When this setting is enabled, Comet will provide logging explaining the reason(s) why a query stage cannot be executed natively. Set this to false to reduce the amount of logging. | false |
| spark.comet.expression.allowIncompatible | Comet is not currently fully compatible with Spark for all expressions. Set this config to true to allow them anyway. For more information, refer to the Comet Compatibility Guide (https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker but looks like the compatibility guide does not mention about the array related issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I had updated this locally but forgot that these changes have to go in compatibility-template.md instead of compatibility,md. 🤦‍♂️

I have added them now.

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending ci

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@andygrove andygrove merged commit 443b5db into apache:main Jan 30, 2025
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants