Usage with pyarrow parquet #10
Comments
Hi Tanguy, the dremel example was created with Parquet's C++ API [1]. The last time I checked (~2 years ago), pyarrow's Parquet writer/reader did not properly support structured data, but this could have changed. Do you have the full stack trace? The errors you listed are not fatal errors.
Hello, thanks for the answer! It's actually a segfault with a core dump:
On the Python side there is nothing except the log right before it. I remember some conversations about pyarrow's ability to store structured data, but I thought that had been resolved. parquet-cpp, however, now seems to live in the Arrow repo. I'll try to see if I can understand the difference between the two formats!
Hello, I'm very interested in using the library, but I'm struggling to apply it to a Parquet file other than the dremel example.
segfaults with the error:
2021-04-15 15:30:40.254237: E struct2tensor/kernels/parquet/parquet_reader.cc:198]
The repetition type of the root node was 0, but should be 2. There may be something wrong with your supplied parquet schema. We will treat it as a repeated field.
2021-04-15 15:31:46.428109: W tensorflow/core/framework/dataset.cc:477]
Input of ParquetDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
I also tried loading the dremel file with pyarrow and writing it straight back out, and I can reproduce the error with the rewritten file.
How would you advise saving the Parquet file?
Thanks for your help!