I noticed that any desired TableMetadata could be obtained by populating the struct directly, but all of its fields are restricted to pub(crate) scope. I suspect the reason for this is safety, i.e. ensuring that creation goes through the builder pattern, where the relevant checks are performed in the call to build().
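The pattern suspected above can be sketched with the standard library alone. This is a hypothetical, heavily simplified model and not the crate's code: `Metadata`, `MetadataBuilder`, and the two checks in `build()` are invented for illustration, and module-private fields stand in for `pub(crate)` fields as seen from a downstream crate.

```rust
mod metadata {
    /// Hypothetical stand-in for table metadata (not an `iceberg` type).
    /// The fields are private to this module, so code outside it cannot
    /// construct the struct with a literal and must use the builder.
    #[derive(Debug)]
    pub struct Metadata {
        format_version: u8,
        location: String,
    }

    impl Metadata {
        pub fn format_version(&self) -> u8 {
            self.format_version
        }

        pub fn location(&self) -> &str {
            &self.location
        }
    }

    /// Hypothetical builder; every check lives in `build()`.
    #[derive(Default)]
    pub struct MetadataBuilder {
        format_version: Option<u8>,
        location: Option<String>,
    }

    impl MetadataBuilder {
        pub fn format_version(mut self, version: u8) -> Self {
            self.format_version = Some(version);
            self
        }

        pub fn location(mut self, location: impl Into<String>) -> Self {
            self.location = Some(location.into());
            self
        }

        /// Validates the collected values before handing out a `Metadata`.
        /// The checks below are illustrative assumptions only.
        pub fn build(self) -> Result<Metadata, String> {
            let format_version = self
                .format_version
                .ok_or_else(|| "format version is required".to_string())?;
            if format_version == 0 {
                return Err("format version must be at least 1".to_string());
            }
            let location = self
                .location
                .ok_or_else(|| "location is required".to_string())?;
            if location.is_empty() {
                return Err("location must not be empty".to_string());
            }
            Ok(Metadata {
                format_version,
                location,
            })
        }
    }
}
```

Outside the `metadata` module, writing `metadata::Metadata { .. }` directly does not compile, so every value a caller holds has passed the checks in `build()`. The trade-off is the one raised in this issue: anything the builder insists on doing cannot be bypassed either.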
Questions:
Would it be problematic to lift the restriction on the TableMetadata fields to be pub (see footnote 1), or to allow the creation of TableMetadata without reassigning field IDs?
If the above is not possible, is there an example of creating the Iceberg metadata file hierarchy in the correct way?
For extra context, we're currently constructing Iceberg metadata around pre-existing parquet files written by another system; there is no Iceberg catalog or prior metadata JSON. I noticed there is also a StaticTable; however, it requires either pre-existing JSON from FileIO or an input TableMetadata, and the second option brings us back to the issue above.
This reassignment leads to a mismatch between the field IDs shown in the table metadata JSON and those in the actual parquet file:
parquet schema
required group field_id=-1 arrow_schema {
optional binary field_id=2 cpu (String);
optional binary field_id=3 host1 (String);
optional int64 field_id=1 time (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
}
iceberg metadata JSON schema snippet
The reassigned IDs follow the order in which the fields appear in the parquet/arrow Schema, rather than the given field IDs.
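To make the mismatch concrete, here is a minimal, std-only sketch of order-based ID reassignment applied to the parquet schema above. It is a simplified model rather than the crate's actual code: `Field`, `reassign_in_order`, `mismatches`, and `parquet_fields` are hypothetical names, and starting the new IDs at 1 is an assumption.

```rust
/// Hypothetical, simplified stand-in for a schema field (not an `iceberg` type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
}

/// Assigns fresh IDs by position, ignoring the IDs already present.
/// Assumption: numbering starts at 1.
fn reassign_in_order(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .enumerate()
        .map(|(pos, f)| Field {
            id: pos as i32 + 1,
            name: f.name.clone(),
        })
        .collect()
}

/// Returns (name, original id, reassigned id) for every field whose ID changed.
fn mismatches(original: &[Field], reassigned: &[Field]) -> Vec<(String, i32, i32)> {
    original
        .iter()
        .zip(reassigned)
        .filter(|(o, r)| o.id != r.id)
        .map(|(o, r)| (o.name.clone(), o.id, r.id))
        .collect()
}

/// The fields in the order they appear in the parquet schema shown above.
fn parquet_fields() -> Vec<Field> {
    vec![
        Field { id: 2, name: "cpu".to_string() },
        Field { id: 3, name: "host1".to_string() },
        Field { id: 1, name: "time".to_string() },
    ]
}
```

Under this model, calling `mismatches(&parquet_fields(), &reassign_in_order(&parquet_fields()))` reports all three fields: cpu moves from ID 2 to 1, host1 from 3 to 2, and time from 1 to 3, so the metadata would no longer agree with the IDs embedded in the parquet file.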
This is in part a question and open for discussion.
When building TableMetadata through the TableMetadataBuilder, all options of building "from scratch" force a reassignment of field IDs:
- TableMetadataBuilder::new
- TableMetadataBuilder::from_table_creation, as this is a wrapper over TableMetadataBuilder::new using the TableCreation struct.
Footnotes
1. Considering this conflicts with the native Java implementation, I would also suspect it is problematic to do in the Rust version.