Type Widening in ALTER TABLE CHANGE COLUMN #2645
Conversation
Commits updated: 872d7fd to 9a5e7dd.
lgtm
Nice! I left a bunch of test feedback.
val isEnabled = DeltaConfigs.ENABLE_TYPE_WIDENING.fromMetaData(metadata)
if (isEnabled && !isSupported(protocol)) {
  throw new IllegalStateException(
    s"Table property '${DeltaConfigs.ENABLE_TYPE_WIDENING.key}' is " +
I guess this should be using the error framework?
This should never happen unless there's a bug in the implementation, so I wouldn't give it an error class. We typically wouldn't want to document that error as a user-facing error.
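For context, a minimal self-contained sketch of the invariant being enforced here; the property key and the Protocol shape below are simplifications for illustration, not the actual Delta classes:

// Simplified model of the guard in the snippet above, not Delta's real code.
case class Protocol(readerWriterFeatureNames: Set[String])

object TypeWideningGuard {
  // Assumed key for DeltaConfigs.ENABLE_TYPE_WIDENING.
  val propertyKey = "delta.enableTypeWidening"
  val featureName = "typeWidening-dev"

  def isSupported(protocol: Protocol): Boolean =
    protocol.readerWriterFeatureNames.contains(featureName)

  // Internal invariant: the property may only be set when the protocol
  // supports the feature. A violation means a bug in the implementation,
  // hence a plain IllegalStateException instead of a user-facing error class.
  def validate(isEnabled: Boolean, protocol: Protocol): Unit = {
    if (isEnabled && !isSupported(protocol)) {
      throw new IllegalStateException(
        s"Table property '$propertyKey' is set on the table but the " +
        s"protocol does not support feature '$featureName'.")
    }
  }
}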
(Multiple review threads on spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSuite.scala, since resolved; most were outdated by later updates.)
test("row group skipping Short -> Int") { | ||
withSQLConf( | ||
SQLConf.FILES_MAX_PARTITION_BYTES.key -> 1024.toString) { |
How do we verify that this actually leads to row group skipping? FWIW, it looks like this should lead to regular min/max data skipping on the Delta log. The max partition bytes setting only affects how we read chunks, i.e., it'll split a Parquet file of >1024 bytes over multiple tasks (1024 bytes each). But if the file is still a single row group, then only one of the tasks will actually read it (the task that gets the "middle" byte of the row group), and the other tasks will simply ignore the row group. And with the current way of writing, each file has only one row group, so row group skipping is equivalent to Delta-level file skipping, which means it should never actually skip anything.
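To make that splitting behavior concrete, a hedged standalone snippet (paths and sizes are illustrative, not from the PR): a small maxPartitionBytes multiplies scan tasks without changing which task reads the row group.

import org.apache.spark.sql.SparkSession

// Illustration only: a single Parquet file read with a tiny
// maxPartitionBytes yields many scan tasks, but a row group is only
// processed by the task that owns its mid-point byte.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.conf.set("spark.sql.files.maxPartitionBytes", "1024")

// Write one file that is larger than 1024 bytes but holds a single row group.
spark.range(0, 100000).repartition(1)
  .write.mode("overwrite").parquet("/tmp/single_row_group") // hypothetical path

val df = spark.read.parquet("/tmp/single_row_group")
println(s"scan tasks: ${df.rdd.getNumPartitions}") // many ~1 KiB splits
println(s"rows read: ${df.count()}") // all rows, produced by exactly one task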
This test doesn't make a lot of sense here anymore. I added it back when I found the overflow issue in Parquet row group skipping and kept it around, but it's redundant with the test I added when I fixed the issue in Spark:
https://github.com/databricks/runtime/blob/a9d6a9d6191264ced1f0658ff6675b0f30e8e77f/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala#L1216
I'm removing it; I don't think it's worth the effort to get it to accurately cover an issue that the Spark test already covers a lot better.
@@ -625,6 +626,18 @@ object ManagedCommitTableFeature
  }
}

object TypeWideningTableFeature extends ReaderWriterFeature(name = "typeWidening-dev")
Since we are using -dev right now and it's not ready for users, is it behind the isTesting flag?
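A hedged sketch of what such a gate could look like; Spark's internal Utils.isTesting reads the spark.testing system property / SPARK_TESTING env var, which is reproduced inline here. This is hypothetical, not this PR's code:

// Hypothetical gate, not the actual implementation.
def isTesting: Boolean =
  sys.props.contains("spark.testing") || sys.env.contains("SPARK_TESTING")

def checkDevFeatureAllowed(featureName: String): Unit = {
  if (featureName.endsWith("-dev") && !isTesting) {
    throw new UnsupportedOperationException(
      s"Table feature '$featureName' is under development and is only " +
      "available in testing.")
  }
}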
LGTM.
Which Delta project/connector is this regarding?
Description
This change introduces the typeWidening Delta table feature, allowing users to widen the type of existing columns and fields in a Delta table using the ALTER TABLE CHANGE COLUMN TYPE or ALTER TABLE REPLACE COLUMNS commands.

The table feature is introduced as typeWidening-dev during implementation and is available in testing only.

For now, only byte -> short -> int changes are supported. Other changes will require support in the Spark Parquet reader that will be introduced in Spark 4.0.
Type widening feature request: #2622
Type Widening protocol RFC: #2624
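As a concrete sketch of the supported changes described above (illustrative only, using Spark's DataType objects; not the PR's actual implementation):

import org.apache.spark.sql.types.{ByteType, DataType, IntegerType, ShortType}

// Illustrative encoding of "only byte -> short -> int are supported for now".
val supportedWidenings: Set[(DataType, DataType)] = Set(
  (ByteType, ShortType),
  (ByteType, IntegerType),
  (ShortType, IntegerType))

def isSupportedTypeChange(from: DataType, to: DataType): Boolean =
  supportedWidenings.contains((from, to))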
How was this patch tested?
A new test suite DeltaTypeWideningSuite is created, containing:
- DeltaTypeWideningAlterTableTests: covers applying supported and unsupported type changes on partitioned columns, non-partitioned columns, and nested fields (see the hedged sketch after this list)
- DeltaTypeWideningTableFeatureTests: covers adding the typeWidening table feature
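A hedged sketch of the shape these tests could take, assuming the usual Spark SQL test helpers (QueryTest with SharedSparkSession, which provide test, withTable, sql, checkAnswer, and Row); names, the property key, and assertions are illustrative:

// Hypothetical test shape, not copied from DeltaTypeWideningSuite.
test("widen Short column to Int via ALTER TABLE CHANGE COLUMN") {
  withTable("t") {
    sql("CREATE TABLE t (a SMALLINT) USING delta " +
      "TBLPROPERTIES ('delta.enableTypeWidening' = 'true')") // assumed key
    sql("INSERT INTO t VALUES (1), (2)")
    sql("ALTER TABLE t CHANGE COLUMN a TYPE INT")
    // Values outside the old Short range must now be accepted.
    sql("INSERT INTO t VALUES (2147483647)")
    checkAnswer(sql("SELECT a FROM t"), Seq(Row(1), Row(2), Row(2147483647)))
  }
}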
Does this PR introduce any user-facing changes?
The table feature is available in testing only; there are no user-facing changes as of now.
The type widening table feature will introduce the following changes:
- Enabling typeWidening via a table property (see the hedged example below).

Note: both ALTER TABLE commands reuse the existing syntax for setting a table property and applying a type change; no new SQL syntax is being introduced by this feature.
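For illustration, a hedged example of the commands involved once the feature is out of dev. The property key 'delta.enableTypeWidening' is an assumption not confirmed by this PR's text, and both statements reuse existing ALTER TABLE syntax, per the note above:

// Illustrative usage; the property key is an assumption.
spark.sql("ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")

// Widen a single column in place:
spark.sql("ALTER TABLE t CHANGE COLUMN a TYPE INT")

// Or widen as part of a full schema replacement:
spark.sql("ALTER TABLE t REPLACE COLUMNS (a INT)")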