Description
What happened?
I'm getting a traceback when running the documentation example for opening a local reference file with engine='kerchunk':
https://docs.xarray.dev/en/stable/user-guide/io.html#kerchunk
What did you expect to happen?
I expected the output shown in the docs when this section was introduced (https://docs.xarray.dev/en/v2024.09.0/user-guide/io.html#kerchunk):
ds1
<xarray.Dataset> Size: 264B
Dimensions:  (x: 4, y: 5)
Coordinates:
  * x        (x) int64 32B 10 20 30 40
  * y        (y) datetime64[ns] 40B 2000-01-01 2000-01-02 ... 2000-01-05
    z        (x) object 32B ...
Data variables:
    foo      (x, y) float64 160B ...
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
import pandas as pd

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 5))},
    coords={
        "x": [10, 20, 30, 40],
        "y": pd.date_range("2000-01-01", periods=5),
        "z": ("x", list("abcd")),
    },
)
ds.to_netcdf("saved_on_disk.nc")

storage_options = {
    "target_protocol": "file",
}
ds1 = xr.open_dataset(
    "./combined.json",
    engine="kerchunk",
    storage_options=storage_options,
)
MCVE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[6], line 4
1 storage_options = {
2 "target_protocol": "file",
3 }
----> 4 ds1 = xr.open_dataset(
5 "./combined.json",
6 engine="kerchunk",
7 storage_options=storage_options,
8 )
File ~/GitHub/xarray/xarray/xarray/backends/api.py:687, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
675 decoders = _resolve_decoders_kwargs(
676 decode_cf,
677 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...) 683 decode_coords=decode_coords,
684 )
686 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 687 backend_ds = backend.open_dataset(
688 filename_or_obj,
689 drop_variables=drop_variables,
690 **decoders,
691 **kwargs,
692 )
693 ds = _dataset_from_backend_dataset(
694 backend_ds,
695 filename_or_obj,
(...) 705 **kwargs,
706 )
707 return ds
File ~/miniforge3/envs/xarray-docs/lib/python3.13/site-packages/kerchunk/xarray_backend.py:13, in KerchunkBackend.open_dataset(self, filename_or_obj, storage_options, open_dataset_options, **kw)
9 def open_dataset(
10 self, filename_or_obj, *, storage_options=None, open_dataset_options=None, **kw
11 ):
12 open_dataset_options = (open_dataset_options or {}) | kw
---> 13 ref_ds = open_reference_dataset(
14 filename_or_obj,
15 storage_options=storage_options,
16 open_dataset_options=open_dataset_options,
17 )
18 return ref_ds
File ~/miniforge3/envs/xarray-docs/lib/python3.13/site-packages/kerchunk/xarray_backend.py:45, in open_reference_dataset(filename_or_obj, storage_options, open_dataset_options)
42 if open_dataset_options is None:
43 open_dataset_options = {}
---> 45 store = refs_as_store(filename_or_obj, **storage_options)
47 return xr.open_zarr(
48 store, zarr_format=2, consolidated=False, **open_dataset_options
49 )
TypeError: refs_as_store() got an unexpected keyword argument 'target_protocol'
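The failing frame forwards the options with `**storage_options`, so every key has to match a keyword parameter of `refs_as_store`. Below is a minimal sketch of how to check which keys a callable would reject, using only `inspect` from the standard library. `rejected_kwargs` is an illustrative helper, and `fake_refs_as_store` is a hypothetical stand-in whose parameter names are assumptions, not kerchunk's confirmed signature.

```python
import inspect


def rejected_kwargs(func, options):
    """Return the keys of `options` that `func(**options)` would reject."""
    params = list(inspect.signature(func).parameters.values())
    # A **kwargs parameter accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {p.name for p in params if p.kind in keyword_kinds}
    return sorted(key for key in options if key not in accepted)


# Hypothetical stand-in: the parameter names are assumptions for illustration.
def fake_refs_as_store(refs, fs=None, remote_protocol=None, remote_options=None):
    return refs


storage_options = {"target_protocol": "file"}
print(rejected_kwargs(fake_refs_as_store, storage_options))

# Unpacking an unsupported key reproduces the same kind of TypeError.
try:
    fake_refs_as_store("./combined.json", **storage_options)
except TypeError as err:
    print(err)
```

With kerchunk installed, the same helper could be pointed at the real `refs_as_store` to see which `storage_options` keys the installed version accepts.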
Anything else we need to know?
I came across this while working on #10383.
Maybe @martindurant can shed some light on the traceback and the best solution.
Note that combined.json looks like this:
{
  "version": 1,
  "refs": {
    ".zgroup": "{\"zarr_format\":2}",
    "foo/.zarray": "{\"chunks\":[4,5],\"compressor\":null,\"dtype\":\"<f8\",\"fill_value\":\"NaN\",\"filters\":null,\"order\":\"C\",\"shape\":[4,5],\"zarr_format\":2}",
    "foo/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"x\",\"y\"],\"coordinates\":\"z\"}",
    "foo/0.0": ["saved_on_disk.h5", 8192, 160],
    "x/.zarray": "{\"chunks\":[4],\"compressor\":null,\"dtype\":\"<i8\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[4],\"zarr_format\":2}",
    "x/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"x\"]}",
    "x/0": ["saved_on_disk.h5", 8352, 32],
    "y/.zarray": "{\"chunks\":[5],\"compressor\":null,\"dtype\":\"<i8\",\"fill_value\":null,\"filters\":null,\"order\":\"C\",\"shape\":[5],\"zarr_format\":2}",
    "y/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"y\"],\"calendar\":\"proleptic_gregorian\",\"units\":\"days since 2000-01-01 00:00:00\"}",
    "y/0": ["saved_on_disk.h5", 8384, 40],
    "z/.zarray": "{\"chunks\":[4],\"compressor\":null,\"dtype\":\"|O\",\"fill_value\":null,\"filters\":[{\"allow_nan\":true,\"check_circular\":true,\"encoding\":\"utf-8\",\"ensure_ascii\":true,\"id\":\"json2\",\"indent\":null,\"separators\":[\",\",\":\"],\"skipkeys\":false,\"sort_keys\":true,\"strict\":true}],\"order\":\"C\",\"shape\":[4],\"zarr_format\":2}",
    "z/0": "[\"a\",\"b\",\"c\",\"d\",\"|O\",[4]]",
    "z/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"x\"]}"
  }
}
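For readers checking the reference file itself: entries of the form `[path, offset, length]` point at byte ranges in another file, while string values hold inline data or metadata. Below is a small sketch, assuming that layout, which lists the target files a reference set depends on. The `refs` dict is an abridged copy of the file above (metadata keys omitted), and `referenced_targets` is an illustrative helper name.

```python
# Abridged copy of the "refs" mapping from combined.json.
refs = {
    ".zgroup": '{"zarr_format":2}',
    "foo/0.0": ["saved_on_disk.h5", 8192, 160],
    "x/0": ["saved_on_disk.h5", 8352, 32],
    "y/0": ["saved_on_disk.h5", 8384, 40],
    "z/0": '["a","b","c","d","|O",[4]]',
}


def referenced_targets(refs):
    """Map each target file to the total number of bytes referenced in it."""
    totals = {}
    for value in refs.values():
        # Inline data and metadata are plain strings; only
        # [path, offset, length] lists point into another file.
        if isinstance(value, list) and len(value) == 3:
            path, _offset, length = value
            totals[path] = totals.get(path, 0) + length
    return totals


print(referenced_targets(refs))  # {'saved_on_disk.h5': 232}
```

Every byte range here points at saved_on_disk.h5, whereas the example above writes saved_on_disk.nc, so the referenced file presumably has to exist separately before any values can be read.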
Environment
INSTALLED VERSIONS
commit: None
python: 3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:44:30) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 24.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2
xarray: 2024.5.1.dev524+g0c254da2.d20250421
pandas: 2.2.3
numpy: 2.2.5
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.6.1
h5py: 3.12.1
zarr: 3.0.8
cftime: 1.6.4
nc_time_axis: None
iris: 3.12.0
bottleneck: 1.4.2
dask: 2025.3.0
distributed: 2025.3.0
matplotlib: 3.10.1
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2025.5.1
cupy: None
pint: None
sparse: 0.16.0
flox: None
numpy_groupies: None
setuptools: 78.1.1
pip: 25.0.1
conda: None
pytest: None
mypy: None
IPython: 9.1.0
sphinx: 8.2.3