-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-3070: Add Variant logical type annotation to parquet-java #3072
base: master
Are you sure you want to change the base?
Conversation
Do we need to support writing and reading variant data? |
The Variant reading and writing are getting implemented in Iceberg and/or the engines themselves. I think later we can think of pulling the implementation to Parquet if needed. |
I think this is problematic if the spec lives in parquet and doesn't have a complete implementation per previously agreed upon guidelines for new parquet features. This probably warrants a discussion on the mailing list. CC @julienledem @rdblue @RussellSpitzer |
@aihuaxu I agree with @emkornfield that the It would also be great to drop some example parquet files in https://github.com/apache/parquet-testing, this will also help the adoption of other implementations, see apache/parquet-format#456 (comment) |
Usually we need two reference implementations for spec changes like this. I'm not sure if there is any chance to have another implementation ready in a timely manner. IMO, at least parquet-java should support basic roundtrip read and write. |
I see. Per guideline, we need to have the implementation in parquet-java and then another one. Do we usually include the implementation with this annotation change or should be separate?
|
I think it should be in one change. The parquet-format cannot be released without concrete PoC implementation in parquet-java. Without that release, separate changes may break CI and thus cannot be merged. |
@wgtmac With https://github.com/apache/parquet-java/pull/3117/files implementing encoding/decoding, should we consider merging this separately? |
I think at least it needs the conversion from/to thrift definition of the variant type. So we need to wait for the release of parquet-format 2.11.0. |
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
Show resolved
Hide resolved
9473e1f
to
e7c97e6
Compare
parquet-column/src/test/java/org/apache/parquet/parser/TestParquetParser.java
Outdated
Show resolved
Hide resolved
This reverts commit 12b4061.
b67c034
to
a683f6a
Compare
parquet-column/src/test/java/org/apache/parquet/schema/TestTypeBuilders.java
Show resolved
Hide resolved
@@ -516,14 +517,19 @@ public Optional<LogicalType> visit(LogicalTypeAnnotation.Float16LogicalTypeAnnot | |||
} | |||
|
|||
@Override | |||
public Optional<LogicalType> visit(LogicalTypeAnnotation.UnknownLogicalTypeAnnotation intervalLogicalType) { | |||
public Optional<LogicalType> visit(LogicalTypeAnnotation.UnknownLogicalTypeAnnotation unknownLogicalType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to a separate PR?
parquet-format-structures/src/main/java/org/apache/parquet/format/LogicalTypes.java
Outdated
Show resolved
Hide resolved
6d66fda
to
af4576a
Compare
af4576a
to
ba4bbdf
Compare
parquet-format-structures/src/main/java/org/apache/parquet/format/LogicalTypes.java
Outdated
Show resolved
Hide resolved
parquet-format-structures/src/test/java/org/apache/parquet/format/TestLogicalTypes.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public Optional<LogicalType> visit(LogicalTypeAnnotation.VariantLogicalTypeAnnotation variantLogicalType) { | ||
return of(LogicalTypes.VARIANT()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other cases create new instances of the logical type, but I agree with using the LogicalTypes
constant. This just doesn't need to be a method call. It would also be nice to follow up with a PR that converts all of these to the correct constant rather than creating new instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Rationale for this change
This is to add Variant logical type in parquet-java to be used by dependent projects.
What changes are included in this PR?
Variant logical type is added to LogicalTypeAnnotation.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. Variant logical type is available.
Closes #3070