[SPARK-51067][SQL] Revert session level collation as object level collation will be used instead #49772
base: master
@cloud-fan, @stefankandic, please take a look. This is just a revert of PR #48962: we decided not to proceed with session level collations for now, and will follow up by applying object level collations to queries.
For the other audience, could you provide a link for this decision, @dejankrak-db?
The decision has since been made not to ship this functionality for now,
@dongjoon-hyun, there are 2 main reasons for this decision:
Therefore, it was decided to pause session level collation functionality for now. This PR partially reverts the unused parts of the original PR to keep the code cleaner going forward, while keeping the parts required to support object level collation resolution. I hope this clarifies the reasoning. I have also updated the PR description with this info, thanks!
@@ -47,7 +46,7 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] {
    if (isDDLCommand(plan)) {
      transformDDL(plan)
    } else {
      transformPlan(plan, sessionDefaultStringType)
Shall we remove the transformPlan method?
Can we also remove the hack in the apply method?
@stefankandic kindly helped refactor this code to remove all unnecessary or unused references. However, we still need transformPlan for DML statements, which use the default string type (now UTF8_BINARY), and the logic in the apply method is still needed to ensure correct results wherever the default string type is used.
This entire rule is useless now because there is no longer session collation. The DDL collation resolution is not implemented yet.
You can think of it as writing a new rule to resolve DDL commands, and it should be very different from the current form.
...talyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala
I'm good with removing this hacky feature. It's too fragile to use.
@cloud-fan, we actually agreed on fully removing the associated DEFAULT_COLLATION and defaultStringType from the code, which essentially removes the entire feature.
What changes were proposed in this pull request?
This PR is a partial revert of the original PR #48962 that introduced the resolution of default session level collation for DDL and DML queries.
The part that is reverted is the default collation resolution for DML queries, whereas the part that is kept is the default collation resolution for DDL queries, which is required to apply the object level collation that was introduced as part of PR #49084.
Why are the changes needed?
There were unresolved technical issues when attempting to merge the functionality from PR #48962 on the Delta side, due to its effect on DML queries. It was therefore decided to pause this functionality for now and partially revert the unused parts to keep the code cleaner going forward.
This is also in line with customer feedback, where object level collation is a much more requested feature. The focus is therefore on resolving object level collation for DDL queries instead: the collation can be specified per table or view when it is created or modified, and that default collation is propagated to subsequent queries on top of those entities.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests that cover the collations functionality, as well as some of the new dedicated tests.
Was this patch authored or co-authored using generative AI tooling?
No