Fix wrong result with as_subchunks and num_subchunks with array indices #172

Merged (2 commits) on Feb 15, 2024

Conversation

asmeurer

Fixes #170

The ndindices strategy requires short_shapes; without it, the strategy rarely
generates useful boolean array indices.
The as_subchunks and num_subchunks logic was incorrect for multiple array
indices and for multidimensional boolean array indices.
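
To illustrate why multiple array indices need to be handled together, here is a minimal pure-Python sketch. It is not ndindex's actual implementation, and the function names `chunks_per_axis` and `chunks_jointly` are hypothetical. It assumes the failure mode is that each axis is mapped to chunks independently and the results are combined as a product, which can report chunks that the index never touches.

```python
from itertools import product


def true_coords(mask):
    """Return the (row, col) positions of the True entries in a 2-D mask."""
    return [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]


def chunks_per_axis(mask, chunk_shape):
    """Naive approach (assumed failure mode): map each axis to chunk
    numbers independently, then take the Cartesian product."""
    coords = true_coords(mask)
    rows = sorted({i // chunk_shape[0] for i, _ in coords})
    cols = sorted({j // chunk_shape[1] for _, j in coords})
    return list(product(rows, cols))


def chunks_jointly(mask, chunk_shape):
    """Map each selected element to its chunk, keeping the axes paired."""
    coords = true_coords(mask)
    return sorted({(i // chunk_shape[0], j // chunk_shape[1]) for i, j in coords})


mask = [[False, True], [True, True]]
per_axis = chunks_per_axis(mask, (1, 1))
joint = chunks_jointly(mask, (1, 1))
print(per_axis)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(joint)     # [(0, 1), (1, 0), (1, 1)]
```

With 1x1 chunks, the per-axis version also reports chunk (0, 0), which contains no selected element. Asking for the subindex of the mask on such a chunk is the kind of call that can end in "Indices do not intersect".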
@asmeurer

FYI @peytondmurray @ArvidJB, this bug affects versioned-hdf5:

>>> import h5py
>>> import numpy as np
>>> from versioned_hdf5 import VersionedHDF5File
>>> f = h5py.File('test.hdf5', 'w')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version1') as vf:
...     data = np.arange(4).reshape((2, 2))
...     d = vf.create_dataset('test', data=data, chunks=(1, 1))
>>> file.close()
>>> f.close()
>>> f = h5py.File('test.hdf5', 'a')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version2') as vf:
...     d = vf['test']
...     d[[[False, True], [True, True]]] = -1
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/aaronmeurer/Documents/versioned-hdf5/versioned_hdf5/wrappers.py", line 1228, in __setitem__
    self.dataset.__setitem__(index, value)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/aaronmeurer/Documents/versioned-hdf5/versioned_hdf5/wrappers.py", line 867, in __setitem__
    index = idx.as_subindex(c)
            ^^^^^^^^^^^^^^^^^^
  File "/Users/aaronmeurer/miniconda3/envs/versioned-hdf5/lib/python3.12/site-packages/ndindex/booleanarray.py", line 168, in as_subindex
    return Tuple(*self.array.nonzero()).as_subindex(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aaronmeurer/miniconda3/envs/versioned-hdf5/lib/python3.12/site-packages/ndindex/tuple.py", line 730, in as_subindex
    raise ValueError("Indices do not intersect")
ValueError: Indices do not intersect

With this PR:

>>> import h5py
>>> import numpy as np
>>> from versioned_hdf5 import VersionedHDF5File
>>> f = h5py.File('test.hdf5', 'w')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version1') as vf:
...     data = np.arange(4).reshape((2, 2))
...     d = vf.create_dataset('test', data=data, chunks=(1, 1))
>>> file.close()
>>> f.close()
>>> f = h5py.File('test.hdf5', 'a')
>>> file = VersionedHDF5File(f)
>>> with file.stage_version('version2') as vf:
...     d = vf['test']
...     d[[[False, True], [True, True]]] = -1
>>> file['version2']['test'][:]
array([[ 0, -1],
       [-1, -1]])

asmeurer changed the title from "Fix wrong result with as_subchunks and num_subchunks with integer arrays" to "Fix wrong result with as_subchunks and num_subchunks with array indices" on Feb 15, 2024
asmeurer merged commit 3c29c71 into Quansight-Labs:main on Feb 15, 2024
17 checks passed
Linked issue that merging this pull request may close: as_subchunks test failure