duckdb delta 1.1 Found unmasked nulls #84

Open
djouallah opened this issue Sep 10, 2024 · 6 comments

djouallah commented Sep 10, 2024

What is this? I am getting the same error when using DuckDB 1.0.0. Does the extension for DuckDB 1.0 get upgraded too?

IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: '/lakehouse/default/Tables/dbo/result/'): Hit error: 2 (ArrowError) with message (Invalid argument error: Found unmasked nulls for non-nullable StructArray field "predicate")

@djouallah changed the title from "duckdb delta 1.1 regression : Found unmasked nulls" to "duckdb delta 1.1 Found unmasked nulls" on Sep 10, 2024
@djouallah
Author

As a workaround, compact your Delta table; DuckDB should then be able to read it.

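from deltalake import DeltaTable

# 'xxxxxxxxxxxxxxxxxx' is the table path placeholder from the original post
dt = DeltaTable('xxxxxxxxxxxxxxxxxx', storage_options={"allow_unsafe_rename": "true"})

# Only run maintenance once the table has accumulated many files
if len(dt.file_uris()) >= 50:
    dt.optimize.compact()    # rewrite small files into larger ones
    dt.vacuum()              # note: defaults to dry_run=True, which only lists removable files
    dt.cleanup_metadata()    # delete expired transaction log files
    dt.create_checkpoint()   # write a checkpoint for the current table version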

@29antonioac

Commenting to confirm the issue still exists on 1.1.3 with extension version f71402e. For me it happens only sometimes, depending on the filter: filtering on the partition column is fine, but filtering on other columns makes the query fail.

In my case I'm reading from GCP. I'm not able to reproduce it locally, so I can't provide a minimal example 😢

@Tommel71

I am running into the same issue. Vacuuming didn't help for me.

@haoyunbaby

Same here. My workaround is to use dataset = DeltaTable(path, storage_options=storage_options).to_pyarrow_dataset()
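
One way this workaround might be wired in, as a rough sketch rather than a tested recipe: try the delta extension first and fall back to the pyarrow dataset only when the specific error from this issue shows up. The helper names (`read_with_fallback`, `scan_with_delta_extension`, `scan_with_pyarrow`) and the view name `delta_tbl` are made up for illustration, and the sketch assumes the `duckdb` and `deltalake` Python packages are installed and the delta extension can be loaded.

```python
ERROR_MARKER = "Found unmasked nulls"  # text taken from the error message in this issue


def read_with_fallback(primary, fallback, marker=ERROR_MARKER):
    """Call primary(); if it fails with the known error, call fallback() instead."""
    try:
        return primary()
    except Exception as exc:
        if marker not in str(exc):
            raise  # unrelated failure: do not hide it
        return fallback()


def scan_with_delta_extension(path):
    # Assumption: the duckdb package and its delta extension are available.
    import duckdb
    return duckdb.sql(f"SELECT * FROM delta_scan('{path}')").fetchall()


def scan_with_pyarrow(path, storage_options=None):
    # The workaround from this comment: let deltalake resolve the table files.
    import duckdb
    from deltalake import DeltaTable
    dataset = DeltaTable(path, storage_options=storage_options).to_pyarrow_dataset()
    con = duckdb.connect()
    con.register("delta_tbl", dataset)  # hypothetical view name
    return con.execute("SELECT * FROM delta_tbl").fetchall()
```

It could then be called as read_with_fallback(lambda: scan_with_delta_extension(path), lambda: scan_with_pyarrow(path)). Matching on the message text is brittle, but it keeps unrelated I/O errors from being silently swallowed.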

@santosh-d3vpl3x

Ran into this issue as well.

Weirdly enough, the issue didn't pop up when I changed one of the joins from an inner join to a left join.

@phillipleblanc

This Arrow PR should fix this issue: apache/arrow-rs#7436
