-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #305: ManifestEntry partition field schema should be dynamically … #307
base: main
Are you sure you want to change the base?
Fix #305: ManifestEntry partition field schema should be dynamically … #307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of comments plus you need to add the Apache license blurb at the top of the new files.
utils.go
Outdated
switch f.Type.(type) { | ||
case *StringType: | ||
sch = avro.NewPrimitiveSchema(avro.String, nil) | ||
case *Int32Type: | ||
sch = avro.NewPrimitiveSchema(avro.Int, nil) | ||
case *Int64Type: | ||
sch = avro.NewPrimitiveSchema(avro.Long, nil) | ||
case *BinaryType: | ||
sch = avro.NewPrimitiveSchema(avro.Bytes, nil) | ||
case *BooleanType: | ||
sch = avro.NewPrimitiveSchema(avro.Boolean, nil) | ||
case *Float32Type: | ||
sch = avro.NewPrimitiveSchema(avro.Float, nil) | ||
case *Float64Type: | ||
sch = avro.NewPrimitiveSchema(avro.Double, nil) | ||
case *DateType: | ||
sch = avro.NewPrimitiveSchema(avro.Int, avro.NewPrimitiveLogicalSchema(avro.Date)) | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing some types here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Do we agree that we only need to handle primitive/scalar types here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm aware, yes. You can't partition on a struct/list/map type to my knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you see missing ones.
return b.m | ||
} | ||
|
||
type manifestFileV2 struct { | ||
partitionType *StructType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be integrated with the FieldSummary
instead of being at the top level here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In either case, we should probably be populating this after reading in a manifest file too, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it belongs to FieldSummary
.
We also need to be able to do Avro
-> Iceberg
schema conversion to populate the field on read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done Avro
<--> Iceberg
schema conversions.
Can you please help me with the populate part ? Maybe point me to the right part of the code ?
… single generic Must function - create constants of non-parametric Avro schema to prevent wasting resources on rebuilding them each time - add license blurb to all new files
…etadata: https://iceberg.apache.org/spec/#manifests - fix a few bug in Metadata builder that triggers on table creation
I'm trying to create and maintain Iceberg tables from externally generated Parquet files with this lib. While writing the various Avro metadata files with the lib, I encountered many errors from ClickHouse, which complained about missing metadata on the Avro manifest files. As of the spec, Manifest files must have specific metadata to be valid: https://iceberg.apache.org/spec/#manifests I thus created the As of now, the table produced is queryable by ClickHouse. |
@arnaudbriche i'll take a look and see if I can condense and simplify this a bit for you, probably quicker than going back and forth like this |
Avro schemas for Iceberg metadata files are now built dynamically with code.
I know this changes a lot of code, but I did not find other less intrusive ways to fix the issue.