Add asynchronous load method #10327
@@ -490,6 +490,23 @@ def test_sub_array(self) -> None:
        assert isinstance(child.array, indexing.NumpyIndexingAdapter)
        assert isinstance(wrapped.array, indexing.LazilyIndexedArray)

    async def test_async_wrapper(self) -> None:
added new tests.
xarray/tests/test_async.py
Outdated
[
    ("sel", {"x": 2}),
    ("sel", {"x": [2, 3]}),
    ("sel", {"x": slice(2, 4)}),
new test
print("inside LazilyVectorizedIndexedArray.async_get_duck_array")
from xarray.backends.common import BackendArray

if isinstance(self.array, BackendArray):
I think this is a lot cleaner. In my previous refactor, I was trying hard to not depend on BackendArray but that's unavoidable now for async stuff AFAICT
xarray/core/indexing.py
Outdated
def get_duck_array():
    raise NotImplementedError

async def async_get_duck_array():
80/20 on this. Alternatively, we special case IndexingAdapter inside async_get_duck_array, which seems worse to me.
if isinstance(data, IndexingAdapter):
    # These wrap in-memory arrays, and async isn't needed
    return data.get_duck_array()
could be removed now that I added async_get_duck_array to the base class
Suggested change:
- if isinstance(data, IndexingAdapter):
-     # These wrap in-memory arrays, and async isn't needed
-     return data.get_duck_array()
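The design choice discussed here can be sketched roughly as follows. This is only an illustration, not xarray's actual code: InMemoryAdapter and load are hypothetical stand-ins, and the default async method is assumed to simply delegate to the synchronous one.

```python
import asyncio


class InMemoryAdapter:
    """Hypothetical stand-in for an adapter wrapping an in-memory array."""

    def __init__(self, array):
        self.array = array

    def get_duck_array(self):
        return self.array

    async def async_get_duck_array(self):
        # Assumed default: the data is already in memory, so the
        # synchronous path can be reused without any real awaiting.
        return self.get_duck_array()


async def load(data):
    # No isinstance() special case: every wrapper exposes the async method.
    return await data.async_get_duck_array()


result = asyncio.run(load(InMemoryAdapter([1, 2, 3])))
```

With a default like this on the base class, the caller can await every wrapper uniformly, which is why the isinstance check becomes redundant.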
chunks=(5, 5),
dtype="f4",
dimension_names=["x", "y"],
attributes={"add_offset": 1, "scale_factor": 2},
very important: now we test the decoding infra
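For context, the add_offset and scale_factor attributes follow the CF packing convention, where a decoded value is raw * scale_factor + add_offset. A minimal plain-Python sketch of what decoding should yield for these attributes (decode is a hypothetical helper for illustration, not part of xarray):

```python
def decode(raw_values, attrs):
    """Apply CF-style unpacking: raw * scale_factor + add_offset."""
    scale = attrs.get("scale_factor", 1)
    offset = attrs.get("add_offset", 0)
    return [value * scale + offset for value in raw_values]


attrs = {"add_offset": 1, "scale_factor": 2}
decoded = decode([0, 1, 2], attrs)
```

A test array carrying these attributes only round-trips correctly if the async path goes through the same decoding wrappers as the synchronous one.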
duck_array = await self.array.async_get_duck_array()
# ensure the array object is cached in-memory
self.array = as_indexable(duck_array)
return duck_array
Might need a deep copy here to match previous behavior
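One possible reading of this concern, sketched with hypothetical classes (FakeLazyArray and CachedArray are not xarray names): if the cached object is handed back directly, a caller that mutates it also mutates the cache. Returning a deep copy avoids that, at the cost of an extra copy per call.

```python
import asyncio
import copy


class FakeLazyArray:
    """Hypothetical lazy array that counts how often it is read."""

    def __init__(self, values):
        self.values = values
        self.reads = 0

    async def async_get_duck_array(self):
        self.reads += 1
        await asyncio.sleep(0)  # stand-in for real async I/O
        return self.values


class CachedArray:
    """Hypothetical cache wrapper, loosely modelled on the snippet above."""

    def __init__(self, array):
        self.array = array
        self._loaded = None

    async def async_get_duck_array(self):
        if self._loaded is None:
            # First call: fetch once and keep the result in memory.
            self._loaded = await self.array.async_get_duck_array()
        # A deep copy keeps callers from mutating the cached data.
        return copy.deepcopy(self._loaded)


async def main():
    source = FakeLazyArray([1, 2, 3])
    cached = CachedArray(source)
    first = await cached.async_get_duck_array()
    first[0] = 99  # mutates only the returned copy
    second = await cached.async_get_duck_array()
    return source.reads, second


reads, second = asyncio.run(main())
```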
Notes to self:
Adds an .async_load() method to Variable, which works by plumbing async get_duck_array all the way down until it finally gets to the async methods zarr v3 exposes. Needs a lot of refactoring before it could be merged, but it works.
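The plumbing described above can be sketched with toy classes. Everything here is a hypothetical stand-in (FakeAsyncStore, LazyBackendArray, LazyWrapper, FakeVariable); none of it is xarray's or zarr's real API. The point is only that each layer awaits the one beneath it, so several variables can be loaded concurrently.

```python
import asyncio


class FakeAsyncStore:
    """Hypothetical store exposing an async read, standing in for zarr v3."""

    def __init__(self, values):
        self.values = values

    async def getitem(self, key):
        await asyncio.sleep(0.01)  # simulate I/O latency
        return self.values[key]


class LazyBackendArray:
    """Bottom layer: the only place that talks to the store."""

    def __init__(self, store, key):
        self.store = store
        self.key = key

    async def async_get_duck_array(self):
        return await self.store.getitem(self.key)


class LazyWrapper:
    """Intermediate layer: just awaits whatever it wraps."""

    def __init__(self, array):
        self.array = array

    async def async_get_duck_array(self):
        return await self.array.async_get_duck_array()


class FakeVariable:
    def __init__(self, data):
        self._data = data

    async def async_load(self):
        # Replace the lazy wrapper with the loaded values.
        self._data = await self._data.async_get_duck_array()
        return self


async def main():
    variables = [
        FakeVariable(
            LazyWrapper(
                LazyBackendArray(FakeAsyncStore([i, i + 1, i + 2]), slice(0, 2))
            )
        )
        for i in range(3)
    ]
    # All three reads are in flight at the same time.
    await asyncio.gather(*(v.async_load() for v in variables))
    return [v._data for v in variables]


loaded = asyncio.run(main())
```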
whats-new.rst
api.rst
API:
Variable.load_async
DataArray.load_async
Dataset.load_async
DataTree.load_async
load_dataset?
load_dataarray?