@jhamman (Member) commented Aug 11, 2025

This PR implements support for the ZEP 8 URL syntax in Zarr Python.

Some examples of what now works:

import xarray as xr
import zarr

root = zarr.open_group('s3://bucket/data.zip|zip:|zarr3:')  # S3 → ZIP → Zarr v3
arr = zarr.create_array('memory:|zarr2:group/array', shape=(10,), dtype='i4')  # Memory → Zarr v2

# custom adapter for icechunk
ds = xr.open_zarr('s3://icechunk-public-data/v1/glad|icechunk:')  # icechunk (from xarray)

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

closes #2943
fixes #2831
xref: zarr-developers/zeps#48

cc @jbms

- Add comprehensive ZEP 8 URL parsing and resolution system
- Implement StoreAdapter ABC for extensible storage adapters
- Add built-in adapters for file, memory, S3, GCS, HTTPS schemes
- Support pipe-chained URLs like s3://bucket/data.zip|zip:|zarr3:
- Add URLSegment parsing with validation
- Integrate with zarr.open_group and zarr.open_array APIs
- Include demo script and comprehensive test suite
- Pass all existing tests + 35 new ZEP 8-specific tests
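
To make the pipe-chained syntax concrete, here is a minimal sketch of how such a URL could be split into segments. It is an illustration only, not the parser added in this PR: `Segment` and `parse_chain` are hypothetical names, and the rules (split on `|`, take the text before the first `:` as the scheme, drop a leading `//`) are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One link of a pipe-chained URL (hypothetical type, not zarr's URLSegment)."""

    scheme: str  # e.g. "s3", "zip", "zarr3"
    path: str    # text after the scheme; may be empty


def parse_chain(url: str) -> list[Segment]:
    """Split a pipe-chained URL into (scheme, path) segments."""
    segments = []
    for part in url.split("|"):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in {url!r}")
        scheme, sep, rest = part.partition(":")
        if not sep or not scheme:
            raise ValueError(f"segment {part!r} has no scheme")
        # Drop the "//" authority marker so "s3://bucket/key" yields "bucket/key".
        if rest.startswith("//"):
            rest = rest[2:]
        segments.append(Segment(scheme, rest))
    return segments


for seg in parse_chain("s3://bucket/data.zip|zip:|zarr3:"):
    print(seg)
```

Each segment would then be handed to whichever adapter claims its scheme, with the output of one adapter serving as the input store for the next.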
@github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Aug 11, 2025
@github-actions bot removed the "needs release notes" label on Aug 24, 2025

@classmethod
def get_supported_schemes(cls) -> list[str]:
    return ["s3"]

Historically, "s3a" has also existed. I think it was a Spark/Hadoop implementation detail (i.e., another codebase/driver for the same storage backend), so it might not be relevant any more, but I thought I would mention it.

https://stackoverflow.com/a/33356421/3821154

adapter_name = "gcs"

@classmethod
def get_supported_schemes(cls) -> list[str]:

It would be better, I think, to avoid supporting "gcs" in addition to "gs" since that will create implementation divergence.
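
As a rough sketch of that suggestion: the adapter could keep its internal name but advertise only the "gs" scheme, so that a lookup for "gcs://" simply finds nothing. `GCSAdapterSketch` and `build_scheme_table` are hypothetical stand-ins, not the classes in this PR.

```python
class GCSAdapterSketch:
    """Hypothetical stand-in for the GCS adapter under review."""

    adapter_name = "gcs"  # internal name; not necessarily a URL scheme

    @classmethod
    def get_supported_schemes(cls) -> list[str]:
        # Only the canonical "gs" scheme is advertised.
        return ["gs"]


def build_scheme_table(adapters):
    """Map each advertised scheme to its adapter, rejecting duplicates."""
    table = {}
    for adapter in adapters:
        for scheme in adapter.get_supported_schemes():
            if scheme in table:
                raise ValueError(f"scheme {scheme!r} is registered twice")
            table[scheme] = adapter
    return table


table = build_scheme_table([GCSAdapterSketch])
print(sorted(table))
```

With a single canonical scheme per backend, other implementations of the URL syntax have only one spelling to agree on.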

return ["http", "https"]


class S3Adapter(RemoteAdapter):

Neuroglancer also supports "s3+http://endpoint/path" and "s3+https://endpoint/path" syntax to allow custom s3 endpoints (rather than AWS) to be used. The plain "s3://" syntax always refers to the real AWS endpoints, not a custom s3-compatible server.
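
For illustration, the distinction described above could be handled along these lines. This is a sketch only: `split_s3_url` is a hypothetical helper, and the hostname in the usage line is a placeholder. It assumes that plain "s3://" means the default AWS endpoint (returned as `None`) and that "s3+http(s)://" carries an explicit endpoint before the bucket path.

```python
from typing import Optional
from urllib.parse import urlsplit


def split_s3_url(url: str) -> tuple[Optional[str], str]:
    """Return (endpoint_url, bucket_and_key) for an S3-style URL.

    endpoint_url is None for plain "s3://", meaning the default AWS endpoint.
    """
    parts = urlsplit(url)
    if parts.scheme == "s3":
        # "s3://bucket/key": the netloc is the bucket itself.
        return None, f"{parts.netloc}{parts.path}"
    if parts.scheme in ("s3+http", "s3+https"):
        # "s3+https://host/bucket/key": the netloc is the custom endpoint.
        transport = parts.scheme.split("+", 1)[1]
        return f"{transport}://{parts.netloc}", parts.path.lstrip("/")
    raise ValueError(f"not an S3-style URL: {url!r}")


print(split_s3_url("s3+https://minio.example.org:9000/bucket/data.zarr"))
```

The returned endpoint could then be passed to whatever S3 client the adapter uses, while the plain "s3://" form stays unambiguous.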

Successfully merging this pull request may close these issues.

  • Support ZEP 8 URL Syntax
  • Can't conveniently open zip store from path with zarr v3