@K-Meech K-Meech commented Aug 26, 2025

Closes #2582

Prevents creating arrays or groups beneath an existing array with .create_array and .create_group, e.g. root.create_array(name='foo/bar') where foo is an existing array.

This required changes to the _save_metadata function in both zarr/core/array.py and zarr/core/group.py. As both used near-identical code, I refactored it into a common function in zarr/core/metadata/io.py (along with the _build_parents function that both relied upon). Happy to move this elsewhere if there is a more suitable location for it!

I tried to avoid looping over the parents multiple times in _save_metadata for the sake of performance (the path could potentially be deeply nested). Hence it loops once and creates two sets of awaitables: one to check whether an array exists at each parent location, and one to actually write the metadata there. Again, happy to update this if there's a simpler solution.
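For readers who want to picture the single-pass approach, here is a minimal sketch. It is not the code from this PR: MemoryStore, parent_paths and save_node are hypothetical names, and the store is a plain in-memory dict rather than a zarr store. It only illustrates building the "check" and "write" awaitables in one loop, running the checks first, and writing only if no parent is an array.

```python
import asyncio


class MemoryStore:
    """Hypothetical stand-in for a store: maps a node path to 'array' or 'group'."""

    def __init__(self, nodes=None):
        self.nodes = dict(nodes or {})

    async def node_type(self, path):
        # Returns 'array', 'group', or None when nothing exists at the path.
        return self.nodes.get(path)

    async def write_group(self, path):
        # Create an implicit group only if the path is still empty.
        self.nodes.setdefault(path, "group")


def parent_paths(path):
    """Split 'foo/bar/baz' into its ancestors: ['foo', 'foo/bar']."""
    parts = path.strip("/").split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


async def save_node(store, path, kind):
    parents = parent_paths(path)
    checks, writes = [], []
    # Single pass over the parents: build both sets of coroutines together.
    for parent in parents:
        checks.append(store.node_type(parent))
        writes.append(store.write_group(parent))

    # Run all existence checks concurrently before touching the store.
    found = await asyncio.gather(*checks)
    blocked = [p for p, t in zip(parents, found) if t == "array"]
    if blocked:
        # Close the never-awaited write coroutines so they emit no warnings.
        for w in writes:
            w.close()
        raise ValueError(f"cannot create {path!r}: {blocked[0]!r} is an array")

    # No parent is an array: create missing groups, then the node itself.
    await asyncio.gather(*writes)
    store.nodes[path] = kind
```

With a store that already holds an array at foo, asyncio.run(save_node(store, "foo/bar", "array")) raises instead of silently writing beneath the array, while a path with no conflicting parents gets its intermediate groups created.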

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Aug 26, 2025

codecov bot commented Aug 26, 2025

Codecov Report

❌ Patch coverage is 94.23077% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.72%. Comparing base (c9509ee) to head (8541cc8).

Files with missing lines        Patch %   Lines
src/zarr/storage/_common.py     76.92%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3407      +/-   ##
==========================================
+ Coverage   94.70%   94.72%   +0.01%     
==========================================
  Files          79       80       +1     
  Lines        9532     9548      +16     
==========================================
+ Hits         9027     9044      +17     
+ Misses        505      504       -1     
Files with missing lines        Coverage Δ
src/zarr/core/array.py          97.39% <100.00%> (-0.05%) ⬇️
src/zarr/core/group.py          95.01% <100.00%> (-0.03%) ⬇️
src/zarr/core/metadata/io.py    100.00% <100.00%> (ø)
src/zarr/storage/_common.py     93.12% <76.92%> (+0.68%) ⬆️

d-v-b commented Aug 26, 2025

conceptually I don't think we want a zarr.core.metadata.io module. So far, the zarr.core.metadata module has been exclusively for definitions of the metadata classes, which is distinct from the routines for saving metadata documents to disk.

right now I think a better location would be zarr.core.group -- would that be possible?

K-Meech commented Aug 26, 2025

Thanks @d-v-b - I think the only issue with this is it may introduce a circular import? zarr.core.group already imports multiple functions from zarr.core.array, and zarr.core.array would need to import these common metadata saving functions from zarr.core.group.

I could move those functions into zarr.core.array instead? _build_parents used to be in that file, and was imported by both zarr.core.array and zarr.core.group (although only by putting an import inside that function to avoid another circular import 😅 )

d-v-b commented Aug 26, 2025

very good points @K-Meech. Now I do think we should have zarr.core.metadata.io. But I don't think it should depend on AsyncArray or AsyncGroup. Looking at _build_parents, all it does is save metadata. I don't think this function needs to use AsyncGroup for that; we can just write the metadata out directly without that class.

In fact, we might not need the metadata-saving logic in _build_parents, since it's being called by another routine that's already creating metadata (save_metadata). IMO _build_parents should just decide where metadata documents need to be created, and then return a {name: GroupMetadata} dict with the names of the groups that need to be created by the calling function (e.g., save_metadata).
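A rough sketch of that shape, under stated assumptions: the GroupMetadata below is a hypothetical stand-in dataclass (not zarr's class), `existing` is an illustrative dict of already-present nodes, and the root group is left out for brevity. The point is only that the function decides what needs creating and leaves all writing to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupMetadata:
    """Hypothetical stand-in for zarr's GroupMetadata; only the format is modeled."""

    zarr_format: int = 3


def build_parents(path, existing, zarr_format=3):
    """Return {name: GroupMetadata} for each missing ancestor of `path`.

    `existing` maps paths that already hold a node to 'array' or 'group'.
    Nothing is written here; the caller decides how to persist the result.
    """
    parts = path.strip("/").split("/")
    missing = {}
    for i in range(1, len(parts)):
        parent = "/".join(parts[:i])
        kind = existing.get(parent)
        if kind == "array":
            # An array cannot contain child nodes, so refuse up front.
            raise ValueError(f"{parent!r} is an array and cannot have children")
        if kind is None:
            # No node here yet: the caller should create an implicit group.
            missing[parent] = GroupMetadata(zarr_format=zarr_format)
    return missing
```

Because the function is pure, it has no dependency on AsyncArray or AsyncGroup and can be tested without a store; a caller such as save_metadata would then write each returned document itself.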

I suspect all of this could be handled by create_hierarchy. In any case, that function should also be moved to zarr.core.metadata.io (i'm happy to do this in another PR if you don't want to)

edit: I mistakenly linked to the create_hierarchy method defined on the AsyncGroup class. I meant to link to the stand-alone function:

async def create_hierarchy(

K-Meech commented Aug 26, 2025

Thanks @d-v-b - sounds like a good plan. I'll look into removing the dependency on AsyncArray + AsyncGroup.
I'm away for the next few weeks, but once I'm back I can look into this, plus any other comments left on the PR in the meantime.

Successfully merging this pull request may close these issues.

Silent failure when creating an array where there is an existing node