Reading Complex Types from Parquet/Iceberg, Part 1
We now have new native_datafusion and native_iceberg_compat scans that use DataFusion's ParquetExec, which already supports complex types. These scans are not fully implemented yet, so the first step is to fix all tests that fail when either scan is made the default.
The goal is to complete this section for the 0.7.0 release, before the end of March.
Reduce code duplication between native_datafusion and native_iceberg_compat
Add Parquet reader metrics for both paths
Schema adapter handling of timestamps (including int96, timestamp_ntz)
Schema adapter handling of decimals (decimal128 config)
Fix all test failures with native_datafusion scan
Fix all test failures with native_iceberg_compat scan
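To illustrate why the timestamp item above needs dedicated handling, here is a minimal sketch of decoding a Parquet INT96 timestamp. It assumes the commonly used legacy layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and converts to microseconds since the Unix epoch. The function names `int96_to_micros` and `encode_int96` are hypothetical and are not taken from Comet or DataFusion; the sketch applies no time zone adjustment, which a real schema adapter would still have to decide on for timestamp versus timestamp_ntz columns.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
/// Number of microseconds in one day.
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical helper: decode a 12-byte INT96 value into microseconds
/// since the Unix epoch.
/// Assumed layout: bytes 0..8 hold nanoseconds within the day and
/// bytes 8..12 hold the Julian day number, both little-endian.
fn int96_to_micros(raw: [u8; 12]) -> i64 {
    let mut nanos_bytes = [0u8; 8];
    nanos_bytes.copy_from_slice(&raw[0..8]);
    let mut day_bytes = [0u8; 4];
    day_bytes.copy_from_slice(&raw[8..12]);

    let nanos_of_day = i64::from_le_bytes(nanos_bytes);
    let julian_day = i64::from(i32::from_le_bytes(day_bytes));

    // Whole days since the epoch, plus the sub-day part truncated to micros.
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * MICROS_PER_DAY + nanos_of_day / 1_000
}

/// Hypothetical helper for building test inputs: pack a Julian day and
/// nanoseconds-of-day into the assumed INT96 byte layout.
fn encode_int96(julian_day: i32, nanos_of_day: i64) -> [u8; 12] {
    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    raw[8..12].copy_from_slice(&julian_day.to_le_bytes());
    raw
}
```

For example, under these assumptions `int96_to_micros(encode_int96(2_440_589, 1_000_000_000))` yields 86_401_000_000, which corresponds to 1970-01-02 00:00:01.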
Reading Complex Types from Parquet/Iceberg, Part 2
Aiming for the 0.8.0 release.
Support reading complex types with native_datafusion scan
Array
Struct
Map
Support reading complex types with native_iceberg_compat scan
Array
Struct
Map
Reading Complex Types from Parquet/Iceberg, Part 3
These items may not be relevant to all users, but some environments need more work before the new ParquetExec scans can be used. Comet's current default native_comet scan is JVM-based and leverages Hadoop data source functionality that is not available in DataFusion.
Wrap Hadoop file readers in JNI so that we can call from Rust, to support use cases such as encryption
What is the problem the feature request solves?
We would like Comet to fully support complex types (arrays, structs, and maps). This issue is for tracking all of the individual issues.
Google doc: https://docs.google.com/document/d/1eiDFEScPjxBMahJW6lmBI8JjVlI6CwhiJgkTSsTvPVY/edit?usp=sharing
Supporting expressions that operate on complex types
to_json
Performance
Testing
Older / related issues:
Describe the potential solution
No response
Additional context
No response