Usage with pyarrow parquet #10

Open
tanguycdls opened this issue Apr 15, 2021 · 2 comments
@tanguycdls

Hello, I'm very interested in using the library; however, I'm struggling to apply it to any parquet file other than the dremel example.

from struct2tensor import expression_impl
import struct2tensor as s2t
import pyarrow as pa
import pyarrow.parquet as pq

# Write a single-column parquet file with pyarrow.
tbl = pa.table([pa.array([0, 1])], names=['a'])
with pq.ParquetWriter('/tmp/test', tbl.schema) as writer:
    writer.write_table(tbl)

filenames = ["/tmp/test"]
batch_size = 2

# Build an expression from the parquet schema, project column 'a',
# and pull a single batch of values.
exp = s2t.expression_impl.parquet.create_expression_from_parquet_file(filenames)
ps = exp.project(['a'])

val = s2t.expression_impl.parquet.calculate_parquet_values([ps], exp,
                                                            filenames, batch_size)
for h in val:
    break

This segfaults, after logging the following errors:
2021-04-15 15:30:40.254237: E struct2tensor/kernels/parquet/parquet_reader.cc:198]
The repetition type of the root node was 0, but should be 2. There may be something wrong with your supplied parquet schema. We will treat it as a repeated field.

2021-04-15 15:31:46.428109: W tensorflow/core/framework/dataset.cc:477]
Input of ParquetDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.

I also tried loading the dremel example file with pyarrow and writing it right back out, and I can reproduce the error with that file as well.

How do you advise saving the parquet files?

Thanks for your help!
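
For context, in the parquet format's FieldRepetitionType enum 0 is REQUIRED and 2 is REPEATED, so the first log line suggests the root node of the pyarrow-written schema is marked REQUIRED where struct2tensor expects REPEATED. A minimal way to check what pyarrow actually wrote is to print the physical parquet schema of the file from the reproduction above (a sketch reusing the /tmp/test path, not part of the original report):

import pyarrow.parquet as pq

# Print the physical parquet schema; each node is shown with its
# repetition type (required / optional / repeated).
print(pq.ParquetFile('/tmp/test').schema)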

@andylou2
Contributor

Hi Tanguy,

The dremel example was created with parquet's C++ API [1]. The last time I checked (~2 years ago), pyarrow's parquet writer/reader did not properly support structured data, but this could have changed.

Do you have the full stack trace? The errors you listed are not fatal errors.

[1] https://github.com/apache/parquet-cpp
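
One way to probe whether pyarrow's writer produces the repeated groups struct2tensor expects would be to write a list-typed column and inspect the resulting schema. This is only a sketch of that check, with an illustrative output path, not something from the thread:

import pyarrow as pa
import pyarrow.parquet as pq

# A column whose values are lists of ints; in the parquet schema this is
# encoded as a group containing a repeated field.
tbl = pa.table({'a': pa.array([[0, 1], [2]], type=pa.list_(pa.int64()))})
pq.write_table(tbl, '/tmp/test_repeated.parquet')

# Inspect how pyarrow encoded the repetition types.
print(pq.ParquetFile('/tmp/test_repeated.parquet').schema)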

@tanguycdls
Author

Hello, thanks for the answer!

It's actually a segfault with a core dump. I tried gdb, but I don't have the symbols and sources configured, so the backtrace isn't very clear to me:

#0  _PyErr_GetTopmostException (tstate=0x0) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/errors.c:98
#1  PyErr_SetObject (exception=0x55d759d71d00 <_PyExc_RuntimeError>, value=0x7f74d422b390) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/errors.c:98
#2  0x000055d759b15b4d in PyErr_SetString (exception=0x55d759d71d00 <_PyExc_RuntimeError>, string=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/errors.c:170
#3  0x00007f755e84cb78 in pybind11::detail::translate_exception(std::__exception_ptr::exception_ptr) () from /opt/conda/envs/model/lib/python3.7/site-packages/tensorflow/python/_pywrap_tfe.so
#4  0x00007f755e87ca1b in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /opt/conda/envs/model/lib/python3.7/site-packages/tensorflow/python/_pywrap_tfe.so
#5  0x000055d759b9c427 in _PyMethodDef_RawFastCallKeywords (method=<optimized out>, self=0x7f755ebeb210, args=0x55d75e4c1540, nargs=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Objects/call.c:693
#6  0x000055d759b9dad8 in _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, args=0x55d75e4c1540, func=0x7f755ebea960) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Objects/call.c:723
#7  call_function (pp_stack=0x7ffeaf5dd3c0, oparg=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:4568
#8  0x000055d759bc874a in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:3093
#9  0x000055d759b0baf2 in PyEval_EvalFrameEx (throwflag=0, f=0x55d75e4c1360) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:3930
#10 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=<optimized out>, kwcount=<optimized out>, kwstep=<optimized out>, defs=<optimized out>, defcount=<optimized out>, kwdefs=<optimized out>, closure=<optimized out>, name=<optimized out>, qualname=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:3930
#11 0x000055d759b3a030 in _PyFunction_FastCallKeywords (func=<optimized out>, stack=0x7f74e0065f68, nargs=1, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Objects/call.c:433
#12 0x000055d759b9d9c8 in call_function (pp_stack=0x7ffeaf5dd6c0, oparg=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:4616
#13 0x000055d759bc51d9 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:3139
#14 0x000055d759b39e94 in PyEval_EvalFrameEx (throwflag=0, f=0x7f74e0065de0) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:544
#15 function_code_fastcall (globals=0x7f755c7d1140, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Objects/call.c:283
#16 _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Objects/call.c:408
#17 0x000055d759b9d9c8 in call_function (pp_stack=0x7ffeaf5dd8a0, oparg=<optimized out>, kwnames=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:4616
#18 0x000055d759bc4544 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:3110
#19 0x000055d759b0cead in PyEval_EvalFrameEx (throwflag=0, f=0x7f74d4387050) at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:544
(More stack frames follow...)

On the Python side there is nothing except the log messages right before the crash.

I remember some conversations about pyarrow's ability to store such structured data, but I thought that had been resolved. parquet-cpp, however, now seems to live in the Arrow repository.

I'll try to see if I can understand the difference between the two formats!
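
A simple way to do that comparison is to round-trip the dremel example through pyarrow, as described above, and print both schemas. This is only a sketch; the input path below is a placeholder, not the actual location of the example file:

import pyarrow.parquet as pq

# Round-trip the dremel example through pyarrow and compare the schemas
# before and after; 'dremel_example.parquet' is a placeholder path.
original = pq.ParquetFile('dremel_example.parquet')
pq.write_table(original.read(), '/tmp/dremel_roundtrip.parquet')

print(original.schema)
print(pq.ParquetFile('/tmp/dremel_roundtrip.parquet').schema)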
