Description
Hello, I'm very interested in using this library, but I'm struggling to apply it to any parquet file other than the dremel example.
from struct2tensor import expression_impl
import struct2tensor as s2t
import pyarrow as pa
import pyarrow.parquet as pq
tbl = pa.table([pa.array([0, 1])], names=['a'])
pq.ParquetWriter('/tmp/test', tbl.schema).write_table(tbl)
filenames = ["/tmp/test"]
batch_size = 2
exp = s2t.expression_impl.parquet.create_expression_from_parquet_file(filenames)
ps = exp.project(['a'])
val = s2t.expression_impl.parquet.calculate_parquet_values([ps], exp,
                                                           filenames, batch_size)
for h in val:
    break
This segfaults with the following output:
2021-04-15 15:30:40.254237: E struct2tensor/kernels/parquet/parquet_reader.cc:198]
The repetition type of the root node was 0, but should be 2. There may be something wrong with your supplied parquet schema. We will treat it as a repeated field.
2021-04-15 15:31:46.428109: W tensorflow/core/framework/dataset.cc:477]
Input of ParquetDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
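For reference, here is how I read the first log message. This is only a sketch, and it assumes the numbers printed by parquet_reader.cc follow the Parquet format's FieldRepetitionType enum (REQUIRED = 0, OPTIONAL = 1, REPEATED = 2). The helper `describe_root_repetition` is a hypothetical name of mine and is not part of struct2tensor or pyarrow.

```python
from enum import IntEnum


class FieldRepetitionType(IntEnum):
    """Repetition codes as defined in the Parquet format's Thrift schema."""
    REQUIRED = 0
    OPTIONAL = 1
    REPEATED = 2


def describe_root_repetition(found: int, expected: int) -> str:
    """Hypothetical helper: turn the numeric codes from the log into names."""
    return (
        f"root node is {FieldRepetitionType(found).name}, "
        f"reader expects {FieldRepetitionType(expected).name}"
    )


# Codes taken from the log line above: "was 0, but should be 2".
print(describe_root_repetition(0, 2))
```

If that assumption holds, the file I wrote has a REQUIRED root node while the reader expects a REPEATED one.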
I also tried loading the dremel example file with pyarrow and writing it straight back out, and the rewritten file reproduces the same error.
How do you recommend writing parquet files so that struct2tensor can read them?
Thanks for your help!