Add asynchronous load method #10327

Draft: wants to merge 75 commits into base: main

TomNicholas (Member) commented May 16, 2025

Adds a .load_async() method to Variable, which works by plumbing an async counterpart of get_duck_array (async_get_duck_array) all the way down until it finally reaches the async methods that zarr v3 exposes.

Needs a lot of refactoring before it could be merged, but it works.

API:

  • Variable.load_async
  • DataArray.load_async
  • Dataset.load_async
  • DataTree.load_async
  • load_dataset?
  • load_dataarray?
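
The plumbing described above can be pictured with a toy model. This is only a sketch, not xarray's code: ToyLazyArray and ToyVariable are hypothetical stand-ins, and asyncio.sleep plays the role of the store's async read. It illustrates why having an async method at every layer lets several variables wait on the store at the same time.

```python
import asyncio
import time


class ToyLazyArray:
    """Hypothetical stand-in for a lazily indexed backend array."""

    def __init__(self, values, delay=0.05):
        self._values = list(values)
        self._delay = delay

    def get_duck_array(self):
        # Blocking path: simulate store latency synchronously.
        time.sleep(self._delay)
        return list(self._values)

    async def async_get_duck_array(self):
        # Async path: yield to the event loop while "waiting on the store".
        await asyncio.sleep(self._delay)
        return list(self._values)


class ToyVariable:
    """Hypothetical stand-in for Variable, holding lazy or in-memory data."""

    def __init__(self, data):
        self._data = data

    async def load_async(self):
        if isinstance(self._data, ToyLazyArray):
            # Replace the lazy wrapper with the loaded values.
            self._data = await self._data.async_get_duck_array()
        return self

    @property
    def values(self):
        return self._data


async def load_all(variables):
    # Issue every load at once so that the waits overlap.
    await asyncio.gather(*(v.load_async() for v in variables))
    return variables


variables = [ToyVariable(ToyLazyArray(range(i, i + 3))) for i in range(4)]
loaded = asyncio.run(load_all(variables))
loaded_values = [v.values for v in loaded]
```

With the blocking get_duck_array the four reads would run one after another; with asyncio.gather their simulated latencies overlap.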

@@ -490,6 +490,23 @@ def test_sub_array(self) -> None:
        assert isinstance(child.array, indexing.NumpyIndexingAdapter)
        assert isinstance(wrapped.array, indexing.LazilyIndexedArray)

    async def test_async_wrapper(self) -> None:
Contributor

added new tests.

[
    ("sel", {"x": 2}),
    ("sel", {"x": [2, 3]}),
    ("sel", {"x": slice(2, 4)}),
Contributor

new test

print("inside LazilyVectorizedIndexedArray.async_get_duck_array")
from xarray.backends.common import BackendArray

if isinstance(self.array, BackendArray):
Contributor

I think this is a lot cleaner. In my previous refactor, I was trying hard to not depend on BackendArray but that's unavoidable now for async stuff AFAICT

def get_duck_array():
    raise NotImplementedError

async def async_get_duck_array():
Contributor

80/20 on this. Alternatively, we special case IndexingAdapter inside async_get_duck_array which seems worse to me.

Comment on lines +160 to +162
if isinstance(data, IndexingAdapter):
    # These wrap in-memory arrays, and async isn't needed
    return data.get_duck_array()
Contributor

could be removed now that I added async_get_duck_array to the base class

Suggested change (delete these lines):
if isinstance(data, IndexingAdapter):
    # These wrap in-memory arrays, and async isn't needed
    return data.get_duck_array()
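
Why the special case becomes redundant can be sketched as follows. This is a minimal illustration under assumptions, not the PR's code: ToyIndexingAdapter and load are hypothetical names. Once the base class supplies a default async_get_duck_array that delegates to the synchronous method, callers no longer need an isinstance check.

```python
import asyncio


class ToyIndexingAdapter:
    """Hypothetical base class for wrappers around in-memory arrays."""

    def __init__(self, array):
        self.array = array

    def get_duck_array(self):
        return self.array

    async def async_get_duck_array(self):
        # In-memory data needs no I/O, so the async path just delegates.
        return self.get_duck_array()


async def load(data):
    # No isinstance special case: every wrapper answers the same async call.
    return await data.async_get_duck_array()


result = asyncio.run(load(ToyIndexingAdapter([1, 2, 3])))
```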

chunks=(5, 5),
dtype="f4",
dimension_names=["x", "y"],
attributes={"add_offset": 1, "scale_factor": 2},
Contributor

very important: now we test the decoding infra
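
For reference, the decoding that these attributes trigger follows the CF packing convention: decoded = stored * scale_factor + add_offset. The helper below is a hypothetical illustration of that arithmetic only, not xarray's decoding code.

```python
def apply_scale_offset(stored, scale_factor, add_offset):
    # CF convention for packed data: decoded = stored * scale_factor + add_offset
    return [value * scale_factor + add_offset for value in stored]


# Same attribute values as in the test array above.
decoded = apply_scale_offset([0, 1, 2], scale_factor=2, add_offset=1)
```

A test that only checks raw stored values would pass even if the async path skipped this step, which is why exercising it through the decoding layer matters.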

duck_array = await self.array.async_get_duck_array()
# ensure the array object is cached in-memory
self.array = as_indexable(duck_array)
return duck_array
Contributor

Might need a deep copy here to match previous behavior
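
What a deep copy would protect against can be sketched with a toy cache. Everything here is hypothetical (ToyBackend, ToyMemoryCachedArray), and whether the previous behavior copied at this exact point is an assumption taken from the comment above, not something verified. The sketch shows one read from the backend followed by copies handed to callers, so that mutating a result cannot corrupt the cached data.

```python
import asyncio
import copy


class ToyBackend:
    """Hypothetical backend that counts how often it is read."""

    def __init__(self, values):
        self._values = values
        self.reads = 0

    async def async_get_duck_array(self):
        self.reads += 1
        await asyncio.sleep(0)  # stand-in for async I/O
        return [row[:] for row in self._values]


class ToyMemoryCachedArray:
    """Hypothetical wrapper that caches the loaded array in memory."""

    def __init__(self, array):
        self.array = array
        self._cache = None

    async def async_get_duck_array(self):
        if self._cache is None:
            # First access: read once and keep the result in memory.
            self._cache = await self.array.async_get_duck_array()
        # Hand out a deep copy so callers cannot mutate the cached data.
        return copy.deepcopy(self._cache)


async def main():
    backend = ToyBackend([[1, 2], [3, 4]])
    cached = ToyMemoryCachedArray(backend)
    first = await cached.async_get_duck_array()
    first[0][0] = 99  # mutate the returned copy only
    second = await cached.async_get_duck_array()
    return backend.reads, second


reads, second = asyncio.run(main())
```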

TomNicholas (Member, Author) commented May 30, 2025

Notes to self:

  • Try to consolidate indexing tests with those in test_variable.py, potentially by defining a subclass of Variable that only implements async methods
  • Use create_test_data, write to a zarr (memory)store, and open lazily - this will help test decoding machinery.
  • Raise an informative error if you try to do orthogonal or vectorized (o/v) indexing with a version of zarr that's too old? Or just fall back to blocking in that case...
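
The "fall back to blocking" option in the last note could look roughly like the sketch below. All names are hypothetical (parse_version, read_with_fallback, ToyArray), and the version strings are placeholders rather than the real zarr requirement, which is not stated here.

```python
import asyncio


def parse_version(text):
    # Minimal parser for plain "X.Y.Z" strings; real code would use a
    # proper version-parsing library.
    return tuple(int(part) for part in text.split(".")[:3])


class ToyArray:
    """Hypothetical array that records which read path was used."""

    def __init__(self):
        self.path = None

    def get_duck_array(self):
        self.path = "blocking"
        return [1, 2, 3]

    async def async_get_duck_array(self):
        self.path = "async"
        return [1, 2, 3]


async def read_with_fallback(array, installed_version, min_async_version):
    if parse_version(installed_version) >= parse_version(min_async_version):
        return await array.async_get_duck_array()
    # Too old for async indexing: fall back to the blocking read.
    # Note that this call blocks the event loop while it runs.
    return array.get_duck_array()


old, new = ToyArray(), ToyArray()
# Placeholder versions, chosen only to exercise both branches.
asyncio.run(read_with_fallback(old, "2.18.0", "3.0.0"))
asyncio.run(read_with_fallback(new, "3.1.0", "3.0.0"))
paths = (old.path, new.path)
```

Raising an informative error instead would replace the final return with a raise; the trade-off is between a silent loss of concurrency and a hard failure on older installations.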

Labels: CI (Continuous Integration tools), dependencies (Pull requests that update a dependency file), enhancement, io, topic-backends, topic-documentation, topic-indexing, topic-NamedArray (Lightweight version of Variable), topic-zarr (Related to zarr storage library)
Development

Successfully merging this pull request may close these issues.

Add an asynchronous load method?
4 participants