Add methods for getting bytes + json to store abc #3638
Open: d-v-b wants to merge 13 commits into zarr-developers:main from d-v-b:feat/get_json
Commits (13, all by d-v-b)
021bd44 add store routines for getting bytes and json
61cf6d0 Merge branch 'main' into feat/get_json
7d26b8e check for FileNotFoundError when a key is missing
971c3e4 remove storepath methods
b7f7e38 Merge branch 'feat/get_json' of https://github.com/d-v-b/zarr-python …
d70a5e5 changelog
a213058 rename methods
38ff517 continue renaming / test refactoring
bdc4ef8 refactor new test functions
e8ca484 Merge branch 'main' into feat/get_json
0a97eb4 Merge branch 'main' into feat/get_json
5d2018d Merge branch 'main' into feat/get_json
bf5e635 Merge branch 'main' into feat/get_json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Add methods for reading stored objects as bytes and JSON-decoded bytes to store classes. |
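The pattern the changelog entry describes can be sketched with a toy store. This is a minimal illustration under stated assumptions, not the zarr implementation: `ToyStore` is hypothetical, it stores plain `bytes` rather than `Buffer` objects, and it omits the `prototype` and `byte_range` parameters. It shows a low-level `get` that returns `None` for a missing key, a `get_bytes` wrapper that converts that into `FileNotFoundError`, and a `get_json` wrapper that decodes the bytes. Slicing the raw bytes stands in for a partial (byte-range) read to show why a partial read of a JSON document usually fails to parse.

```python
from __future__ import annotations

import asyncio
import json


class ToyStore:
    """Hypothetical in-memory store used only for illustration."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def get(self, key: str) -> bytes | None:
        # Low-level read: a missing key yields None instead of raising.
        return self._data.get(key)

    async def get_bytes(self, key: str) -> bytes:
        # Convenience wrapper: turn the None sentinel into FileNotFoundError.
        value = await self.get(key)
        if value is None:
            raise FileNotFoundError(key)
        return value

    async def get_json(self, key: str) -> object:
        # Decode whatever get_bytes returns; invalid JSON raises JSONDecodeError.
        return json.loads(await self.get_bytes(key))


async def demo() -> dict[str, object]:
    store = ToyStore()
    raw = json.dumps({"zarr_format": 3, "node_type": "array"}).encode()
    await store.set("zarr.json", raw)

    doc = await store.get_json("zarr.json")

    try:
        await store.get_bytes("missing")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

    try:
        # Slicing stands in for a byte-range read of only part of the document.
        json.loads(raw[:5])
        partial_fails = False
    except json.JSONDecodeError:
        partial_fails = True

    return {"doc": doc, "missing_raises": missing_raises, "partial_fails": partial_fails}


result = asyncio.run(demo())
print(result)
```

Raising on a missing key keeps the return type a plain `bytes` (or parsed JSON value), so callers do not need a `None` check after every read.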
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,11 +1,14 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import asyncio | ||
| import json | ||
| from abc import ABC, abstractmethod | ||
| from asyncio import gather | ||
| from dataclasses import dataclass | ||
| from itertools import starmap | ||
| from typing import TYPE_CHECKING, Literal, Protocol, runtime_checkable | ||
|
|
||
| from zarr.core.sync import sync | ||
|
|
||
| if TYPE_CHECKING: | ||
| from collections.abc import AsyncGenerator, AsyncIterator, Iterable | ||
| from types import TracebackType | ||
|
|
@@ -206,6 +209,211 @@ async def get( | |
| """ | ||
| ... | ||
|
|
||
| async def get_bytes( | ||
| self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None | ||
| ) -> bytes: | ||
| """ | ||
| Retrieve raw bytes from the store asynchronously. | ||
|
|
||
| This is a convenience method that wraps ``get()`` and converts the result | ||
| to bytes. Use this when you need the raw byte content of a stored value. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| key : str | ||
| The key identifying the data to retrieve. | ||
| prototype : BufferPrototype | ||
| The buffer prototype to use for reading the data. | ||
| byte_range : ByteRequest, optional | ||
| If specified, only retrieve a portion of the stored data. | ||
| Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``. | ||
|
|
||
| Returns | ||
| ------- | ||
| bytes | ||
| The raw bytes stored at the given key. | ||
|
|
||
| Raises | ||
| ------ | ||
| FileNotFoundError | ||
| If the key does not exist in the store. | ||
|
|
||
| See Also | ||
| -------- | ||
| get : Lower-level method that returns a Buffer object. | ||
| get_bytes_sync : Synchronous version of this method. | ||
| get_json : Asynchronous method for retrieving and parsing JSON data. | ||
|
|
||
| Examples | ||
| -------- | ||
| >>> store = await MemoryStore.open() | ||
| >>> await store.set("data", Buffer.from_bytes(b"hello world")) | ||
| >>> data = await store.get_bytes("data", prototype=default_buffer_prototype()) | ||
| >>> print(data) | ||
| b'hello world' | ||
| """ | ||
| buffer = await self.get(key, prototype, byte_range) | ||
| if buffer is None: | ||
| raise FileNotFoundError(key) | ||
| return buffer.to_bytes() | ||
|
|
||
| def get_bytes_sync( | ||
| self, key: str = "", *, prototype: BufferPrototype, byte_range: ByteRequest | None = None | ||
| ) -> bytes: | ||
| """ | ||
| Retrieve raw bytes from the store synchronously. | ||
|
|
||
| This is a synchronous wrapper around ``get_bytes()``. It should only | ||
| be called from non-async code. For async contexts, use ``get_bytes()`` | ||
| instead. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| key : str, optional | ||
| The key identifying the data to retrieve. Defaults to an empty string. | ||
| prototype : BufferPrototype | ||
|
Contributor (review comment): We'll want to align this with what we do in the async version. |
||
| The buffer prototype to use for reading the data. | ||
| byte_range : ByteRequest, optional | ||
| If specified, only retrieve a portion of the stored data. | ||
| Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``. | ||
|
|
||
| Returns | ||
| ------- | ||
| bytes | ||
| The raw bytes stored at the given key. | ||
|
|
||
| Raises | ||
| ------ | ||
| FileNotFoundError | ||
| If the key does not exist in the store. | ||
|
|
||
| Warnings | ||
| -------- | ||
| Do not call this method from async functions. Use ``get_bytes()`` instead | ||
| to avoid blocking the event loop. | ||
|
|
||
| See Also | ||
| -------- | ||
| get_bytes : Asynchronous version of this method. | ||
| get_json_sync : Synchronous method for retrieving and parsing JSON data. | ||
|
|
||
| Examples | ||
| -------- | ||
| >>> store = MemoryStore() | ||
| >>> await store.set("data", Buffer.from_bytes(b"hello world")) | ||
| >>> data = store.get_bytes_sync("data", prototype=default_buffer_prototype()) | ||
| >>> print(data) | ||
| b'hello world' | ||
| """ | ||
|
|
||
| return sync(self.get_bytes(key, prototype=prototype, byte_range=byte_range)) | ||
|
|
||
| async def get_json( | ||
| self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None | ||
| ) -> Any: | ||
| """ | ||
| Retrieve and parse JSON data from the store asynchronously. | ||
|
|
||
| This is a convenience method that retrieves bytes from the store and | ||
| parses them as JSON. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| key : str | ||
| The key identifying the JSON data to retrieve. | ||
| prototype : BufferPrototype | ||
| The buffer prototype to use for reading the data. | ||
| byte_range : ByteRequest, optional | ||
| If specified, only retrieve a portion of the stored data. | ||
| Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``. | ||
| Note: Using byte ranges with JSON may result in invalid JSON. | ||
|
|
||
| Returns | ||
| ------- | ||
| Any | ||
| The parsed JSON data. This follows the behavior of ``json.loads()`` and | ||
| can be any JSON-serializable type: dict, list, str, int, float, bool, or None. | ||
|
|
||
| Raises | ||
| ------ | ||
| FileNotFoundError | ||
| If the key does not exist in the store. | ||
| json.JSONDecodeError | ||
| If the stored data is not valid JSON. | ||
|
|
||
| See Also | ||
| -------- | ||
| get_bytes : Method for retrieving raw bytes. | ||
| get_json_sync : Synchronous version of this method. | ||
|
|
||
| Examples | ||
| -------- | ||
| >>> store = await MemoryStore.open() | ||
| >>> metadata = {"zarr_format": 3, "node_type": "array"} | ||
| >>> await store.set("zarr.json", Buffer.from_bytes(json.dumps(metadata).encode())) | ||
| >>> data = await store.get_json("zarr.json", prototype=default_buffer_prototype()) | ||
| >>> print(data) | ||
| {'zarr_format': 3, 'node_type': 'array'} | ||
| """ | ||
|
|
||
| return json.loads(await self.get_bytes(key, prototype=prototype, byte_range=byte_range)) | ||
|
|
||
| def get_json_sync( | ||
| self, key: str = "", *, prototype: BufferPrototype, byte_range: ByteRequest | None = None | ||
| ) -> Any: | ||
| """ | ||
| Retrieve and parse JSON data from the store synchronously. | ||
|
|
||
| This is a synchronous wrapper around ``get_json()``. It should only | ||
| be called from non-async code. For async contexts, use ``get_json()`` | ||
| instead. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| key : str, optional | ||
| The key identifying the JSON data to retrieve. Defaults to an empty string. | ||
| prototype : BufferPrototype | ||
| The buffer prototype to use for reading the data. | ||
| byte_range : ByteRequest, optional | ||
| If specified, only retrieve a portion of the stored data. | ||
| Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``. | ||
| Note: Using byte ranges with JSON may result in invalid JSON. | ||
|
|
||
| Returns | ||
| ------- | ||
| Any | ||
| The parsed JSON data. This follows the behavior of ``json.loads()`` and | ||
| can be any JSON-serializable type: dict, list, str, int, float, bool, or None. | ||
|
|
||
| Raises | ||
| ------ | ||
| FileNotFoundError | ||
| If the key does not exist in the store. | ||
| json.JSONDecodeError | ||
| If the stored data is not valid JSON. | ||
|
|
||
| Warnings | ||
| -------- | ||
| Do not call this method from async functions. Use ``get_json()`` instead | ||
| to avoid blocking the event loop. | ||
|
|
||
| See Also | ||
| -------- | ||
| get_json : Asynchronous version of this method. | ||
| get_bytes_sync : Synchronous method for retrieving raw bytes without parsing. | ||
|
|
||
| Examples | ||
| -------- | ||
| >>> store = MemoryStore() | ||
| >>> metadata = {"zarr_format": 3, "node_type": "array"} | ||
| >>> await store.set("zarr.json", Buffer.from_bytes(json.dumps(metadata).encode())) | ||
| >>> data = store.get_json_sync("zarr.json", prototype=default_buffer_prototype()) | ||
| >>> print(data) | ||
| {'zarr_format': 3, 'node_type': 'array'} | ||
| """ | ||
|
|
||
| return sync(self.get_json(key, prototype=prototype, byte_range=byte_range)) | ||
|
|
||
| @abstractmethod | ||
| async def get_partial_values( | ||
| self, | ||
|
|
@@ -278,7 +486,7 @@ async def _set_many(self, values: Iterable[tuple[str, Buffer]]) -> None: | |
| """ | ||
| Insert multiple (key, value) pairs into storage. | ||
| """ | ||
| - await gather(*starmap(self.set, values)) | ||
| + await asyncio.gather(*starmap(self.set, values)) | ||
|
|
||
| @property | ||
| def supports_consolidated_metadata(self) -> bool: | ||
|
|
||
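Two mechanics in the diff above can be sketched in isolation: a synchronous method that delegates to its async counterpart, and `asyncio.gather(*starmap(...))` for setting many keys concurrently. The sketch below rests on assumptions: `ToyStore` and `run_sync` are hypothetical names, and `run_sync` is a simplified stand-in built on `asyncio.run`, not the `sync` helper from `zarr.core.sync`, whose internals are not shown in this diff. The stand-in refuses to run when an event loop is already active, which is the situation the docstring warnings describe.

```python
from __future__ import annotations

import asyncio
from itertools import starmap


def run_sync(coro):
    """Hypothetical stand-in for a sync helper: run a coroutine to completion."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(coro)
    # A loop is already running: blocking here would stall it, so refuse.
    coro.close()
    raise RuntimeError("called from a running event loop; await the async method instead")


class ToyStore:
    """Hypothetical in-memory store used only for illustration."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def set_many(self, values) -> None:
        # starmap(self.set, values) creates one coroutine per (key, value) pair;
        # asyncio.gather schedules them concurrently and waits for all of them.
        await asyncio.gather(*starmap(self.set, values))

    async def get_bytes(self, key: str) -> bytes:
        try:
            return self._data[key]
        except KeyError:
            raise FileNotFoundError(key) from None

    def get_bytes_sync(self, key: str) -> bytes:
        # Synchronous wrapper: build the coroutine, then block until it finishes.
        return run_sync(self.get_bytes(key))


store = ToyStore()
run_sync(store.set_many([("a", b"1"), ("b", b"2")]))
values = [store.get_bytes_sync("a"), store.get_bytes_sync("b")]
print(values)
```

Qualifying the call as `asyncio.gather` rather than importing `gather` directly keeps a single `import asyncio` at the top of the module, which matches the import change in the first hunk.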
since the output of this method is a `bytes` object (CPU memory), I don't see the value of specifying a `prototype` in this method. can anyone suggest a situation where specifying the `prototype` would be useful here? cc @TomAugspurger
I think I agree with you, that a `prototype` probably isn't needed. But I'd suggest including it for consistency with the other methods.