Support for DataTree.to_netcdf to write to a file-like object or bytes #10570

@mjwillson

Is your feature request related to a problem?

I want to write a DataTree as a netcdf file to a remote filesystem, either directly or via writing to in-memory bytes first. Right now this seems to be impossible.

  • datatree.to_netcdf requires a filepath argument (no option for it to return bytes like dataset.to_netcdf())
  • Passing a file-like object (e.g. io.BytesIO()) as the filepath results in ValueError: cannot save to a group with the scipy.io.netcdf backend. I believe it selects the scipy backend because it thinks (incorrectly) that it's the only one capable of writing to a file-like object, but this backend can't handle the groups that DataTree needs.
  • datatree.to_netcdf(bytes_io, engine='h5netcdf') doesn't help (same error; the specified engine is ignored by _dataset_to_netcdf and scipy is selected instead).
  • If I hack around the above (xr.backends.api.WRITEABLE_STORES['scipy'] = xr.backends.api.WRITEABLE_STORES['h5netcdf']), the next obstacle is that H5NetCDFStore.open tries to read a magic number from the file-like object even when opening it in write mode: https://github.com/pydata/xarray/blob/main/xarray/backends/h5netcdf_.py#L166C9-L167C65
  • After hacking around that (xarray.backends.h5netcdf_.read_magic_number_from_file = lambda *a,**kw: b"\211HDF\r\n\032\n"), I'm finally able to get datatree.to_netcdf(bytes_io) to work.
  • Even then it still won't work with a file-like object that doesn't implement seek, but I believe this is a fundamental limitation of h5py.
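
To illustrate why sniffing a magic number cannot succeed on a fresh write target, here is a minimal stdlib-only sketch. `peek_magic` is a hypothetical helper written for this illustration, not xarray's `read_magic_number_from_file`, and its behavior on empty input is an assumption of the sketch rather than a description of what xarray does.

```python
import io

# The 8-byte HDF5 signature. The octal escapes used in the workaround above
# spell exactly the same bytes.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
assert HDF5_SIGNATURE == b"\211HDF\r\n\032\n"


def peek_magic(file_obj, count=8):
    """Hypothetical helper: read the first `count` bytes, then restore the position."""
    start = file_obj.tell()
    magic = file_obj.read(count)
    file_obj.seek(start)  # requires a seekable object
    return magic


# A fresh buffer intended as a write target has nothing to sniff.
empty_target = io.BytesIO()
magic_from_empty = peek_magic(empty_target)

# A buffer that already holds an HDF5 file starts with the signature.
existing = io.BytesIO(HDF5_SIGNATURE + b"rest of file")
magic_from_existing = peek_magic(existing)
```

Any format check based on the leading bytes only makes sense when reading; in write mode the buffer is empty, so there is nothing to compare against the signature.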

Describe the solution you'd like

datatree.to_netcdf()   # Returns bytes like dataset.to_netcdf()

or

bytes_io = io.BytesIO()
datatree.to_netcdf(bytes_io)

or

with fsspec.open('remote-filesystem://path/to/file.nc', 'wb') as file_like_object:
  datatree.to_netcdf(file_like_object)

(although the latter may not be possible for file-like objects that don't implement seek.)
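
Until something like this is supported, one possible stopgap is to go through a temporary local file, since writing to a path already works. The sketch below is stdlib-only: `to_bytes_via_tempfile` and `fake_writer` are hypothetical names for illustration, and `fake_writer` merely stands in for a call such as `datatree.to_netcdf(path)`.

```python
import os
import tempfile


def to_bytes_via_tempfile(write_to_path, suffix=".nc"):
    """Call `write_to_path(path)` on a temporary file path and return the file's bytes.

    `write_to_path` is any callable that writes a file at the given path,
    e.g. `lambda p: datatree.to_netcdf(p)` (assumed usage, not run here).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "tree" + suffix)
        write_to_path(path)
        with open(path, "rb") as f:
            return f.read()


def fake_writer(path):
    """Stand-in for datatree.to_netcdf(path): writes just the HDF5 signature."""
    with open(path, "wb") as f:
        f.write(b"\x89HDF\r\n\x1a\n")


payload = to_bytes_via_tempfile(fake_writer)
```

The resulting bytes can then be passed to `file_like_object.write(payload)`, which only needs `write` and not `seek`, at the cost of an extra copy through local disk.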

Describe alternatives you've considered

No response

Additional context

No response
