RFLeijenaar

Following #3433 (@d-v-b), I have implemented a registry for chunk key encodings. This allows users to subclass ChunkKeyEncoding and create their own implementation.

The scope of the ChunkKeyEncoding.from_dict() function is reduced to what you would expect it to do: build from a dict. I placed the parsing function in chunk_key_encodings.py because it is used in both array and v3 metadata construction. I wasn't sure where else to put it.
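
As a rough illustration of the split described above (not the code in this PR), the sketch below keeps `from_dict` limited to building from a dict and moves parsing into a separate function that looks up the class in a registry. It is self-contained and does not import zarr. The toy `ChunkKeyEncoding`, `_registry`, `register_chunk_key_encoding`, `parse_chunk_key_encoding`, and `PrefixedEncoding` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Union


@dataclass(frozen=True)
class ChunkKeyEncoding:
    """Toy base class standing in for the real one (assumption)."""

    name: ClassVar[str] = "default"
    separator: str = "/"

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ChunkKeyEncoding":
        # Only builds an instance from a dict; no registry lookup here.
        return cls(**data.get("configuration", {}))

    def encode_chunk_key(self, coords: tuple[int, ...]) -> str:
        return self.separator.join(["c", *map(str, coords)])


# Hypothetical registry: each `name` maps to exactly one class.
_registry: dict[str, type[ChunkKeyEncoding]] = {}


def register_chunk_key_encoding(cls: type[ChunkKeyEncoding]) -> type[ChunkKeyEncoding]:
    _registry[cls.name] = cls
    return cls


def parse_chunk_key_encoding(
    data: Union[ChunkKeyEncoding, dict[str, Any]],
) -> ChunkKeyEncoding:
    # Pass instances through; otherwise pick the class by `name` and delegate.
    if isinstance(data, ChunkKeyEncoding):
        return data
    try:
        cls = _registry[data["name"]]
    except KeyError as err:
        raise KeyError(f"unknown chunk key encoding: {data.get('name')!r}") from err
    return cls.from_dict(data)


register_chunk_key_encoding(ChunkKeyEncoding)


@register_chunk_key_encoding
@dataclass(frozen=True)
class PrefixedEncoding(ChunkKeyEncoding):
    """User-defined encoding with a made-up name."""

    name: ClassVar[str] = "prefixed"

    def encode_chunk_key(self, coords: tuple[int, ...]) -> str:
        return "chunks/" + self.separator.join(map(str, coords))


enc = parse_chunk_key_encoding({"name": "prefixed", "configuration": {"separator": "."}})
key = enc.encode_chunk_key((0, 1, 2))
```

Here the parser is the only place that touches the registry, so both array construction and metadata construction could call it with either a dict or an instance.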

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Sep 5, 2025

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 83.01887% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.86%. Comparing base (3d0e40e) to head (fe11bf8).

Files with missing lines Patch % Lines
src/zarr/registry.py 65.38% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3436      +/-   ##
==========================================
- Coverage   94.92%   94.86%   -0.06%     
==========================================
  Files          79       79              
  Lines        9491     9518      +27     
==========================================
+ Hits         9009     9029      +20     
- Misses        482      489       +7     
Files with missing lines Coverage Δ
src/zarr/core/array.py 97.44% <100.00%> (-0.01%) ⬇️
src/zarr/core/chunk_key_encodings.py 89.70% <100.00%> (+3.55%) ⬆️
src/zarr/core/config.py 83.33% <ø> (ø)
src/zarr/core/metadata/v3.py 90.15% <100.00%> (ø)
src/zarr/registry.py 85.20% <65.38%> (-3.61%) ⬇️

@RFLeijenaar
Author

RFLeijenaar commented Sep 5, 2025

Note that the current implementation does not use the global config. This means that it only supports a single implementation (registration) for each chunk key encoding 'type' as indicated by the name field. Like codecs (parse_codecs), it uses the name field to retrieve the correct class from the registry. This does mean that the chunk key encoding should always be registered with a qualname that matches the name field.

This currently doesn't work correctly with the entrypoint plugin mechanism, which uses the full class path as the key rather than the entrypoint name, e.name.

I will update the registry to bring it in line with how codecs are implemented, with a dict of registries, that map a chunk key encoding type (name) to a registry that contains possibly multiple implementations for that type.

its implementation indicated by qualname.

Set default chunk key encoding implementations for `default` and `v2`
in global config.
@d-v-b
Contributor

d-v-b commented Sep 5, 2025

I will update the registry to bring it in line with how codecs are implemented, with a dict of registries, that map a chunk key encoding type (name) to a registry that contains possibly multiple implementations for that type.

You don't need to copy the codec registry. A simple {name: class} mapping is fine for chunk key encodings.

@d-v-b
Contributor

d-v-b commented Sep 5, 2025

For context, the codec registry associates multiple codec classes with a single codec identifier because of a need specific to codecs (running the same codec algorithm on a CPU vs. a GPU). Chunk key encodings will never be run on specialized hardware, so we can use a simpler mapping.

@RFLeijenaar
Author

For context, the codec registry associates multiple codec classes with a single codec identifier because of a need specific to codecs (running the same codec algorithm on a CPU vs. a GPU). Chunk key encodings will never be run on specialized hardware, so we can use a simpler mapping.

Originally, I implemented it such that it would only allow one implementation, but there were some issues as noted in my other comment. That said, these can be resolved by always using the name field as the key, and for any entrypoint e setting the key to e.name.
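
A small sketch of the keying difference, using only the standard library. The entry point below is constructed by hand for illustration. Its name, the group `zarr.chunk_key_encoding`, and the use of `fractions:Fraction` as a loadable stand-in class are assumptions made for the example, not what zarr or any plugin actually declares.

```python
from importlib.metadata import EntryPoint

# Hypothetical entry point, as a plugin might declare it in its packaging metadata.
# `fractions:Fraction` is just a loadable stand-in for a real encoding class.
e = EntryPoint(
    name="my_encoding",
    value="fractions:Fraction",
    group="zarr.chunk_key_encoding",  # assumed group name
)

registry: dict[str, type] = {}

cls = e.load()

# Keying by the full class path does not match the metadata `name` field.
class_path_key = f"{cls.__module__}.{cls.__qualname__}"

# Keying by the entry point name keeps lookups by `name` working.
registry[e.name] = cls
```

With this keying, a lookup by the name stored in the array metadata finds the plugin class, while the class path never has to appear in the metadata.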

Regarding different implementations, it doesn't need to be on special hardware, right? One could also make their own implementation, possibly with bindings in a faster language. Not that I think that this would be common for chunk key encodings.

Let me know if you want me to revert to the simpler, single-implementation design.

This enables users to add additional fields to a custom ChunkKeyEncoding
without having to override __init__ and take care of the immutability of
the attrs.
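
One way to get the behaviour described in this commit message is a frozen dataclass base class. That is an assumption made for this sketch, since the message does not say how it is implemented. The toy `ChunkKeyEncoding` and the `ShardedEncoding` subclass with its `shard_depth` field are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import ClassVar


@dataclass(frozen=True)
class ChunkKeyEncoding:
    """Toy base class; assumed here to be a frozen dataclass."""

    name: ClassVar[str] = "default"
    separator: str = "/"


@dataclass(frozen=True)
class ShardedEncoding(ChunkKeyEncoding):
    """Hypothetical subclass that adds a field without writing __init__."""

    name: ClassVar[str] = "sharded"
    shard_depth: int = 1


# The generated __init__ accepts both the inherited and the new field.
enc = ShardedEncoding(separator=".", shard_depth=2)

try:
    enc.shard_depth = 3  # frozen: attribute assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The subclass gets its constructor and its immutability from the decorator, so it only has to declare the extra field.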