Experiment with stacking for kerchunk #38

jsignell · 2024-03-11T20:03:11Z

This idea came out of a comment here: #34 (comment)

Conceptually it seems like it should be possible to read and stack kerchunk and zarr data contained in an item's assets or in the assets of a list of items. Not sure if this is the most elegant way 🤷

import pystac
import xarray as xr

url_1 = "https://gist.githubusercontent.com/clausmichele/28efa0007731044db3a7752da2164fe0/raw/1cba235038f0aa20e16675a863224a4f3ab79e4a/CERRA-20010101000000_20011231000000.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/6b78a70ef153c4c841401ec0b7d2b75f/raw/e0d2f307b1f8caef7ec19ae68b8100fb7d5f25dd/CERRA-20020101000000_20021231000000.json"

item_1 = pystac.read_file(url_1)
item_2 = pystac.read_file(url_2)
items = [item_1, item_2]

# These items don't set the media_type and roles that xpystac uses to recognize
# an asset as a kerchunk reference file, so set them first.
for item in items:
    for asset in item.assets.values():
        if asset.href.endswith(".json"):
            asset.media_type = "application/json"
            asset.roles = ["index"]

data = xr.open_dataset(items, engine="stac", stacking_library="xpystac", chunks={})
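The same tidy-up can also be sketched on the raw item JSON, before it is turned into a pystac object. This is only an illustration: `mark_kerchunk_assets` and the example item are made up for this sketch, and it assumes, as in the snippet above, that any asset whose href ends in `.json` is a kerchunk reference file. The media type and role values are the ones from the loop above; in STAC JSON the media type is stored under the `type` key.

```python
import copy


def mark_kerchunk_assets(item_dict):
    """Return a copy of a STAC item dict with its .json assets tagged as kerchunk references.

    Hypothetical helper for illustration; not part of xpystac or pystac.
    """
    patched = copy.deepcopy(item_dict)
    for asset in patched.get("assets", {}).values():
        # Assumption: a ".json" href means a kerchunk reference file.
        if asset.get("href", "").endswith(".json"):
            # pystac's `media_type` attribute is serialized as "type" in STAC JSON.
            asset["type"] = "application/json"
            asset["roles"] = ["index"]
    return patched


# Made-up, stripped-down item for illustration (not one of the CERRA items).
example_item = {
    "type": "Feature",
    "id": "example",
    "assets": {
        "reference": {"href": "https://example.com/refs/example.json"},
        "thumbnail": {"href": "https://example.com/thumb.png", "type": "image/png"},
    },
}

patched_item = mark_kerchunk_assets(example_item)
```

A complete item dict patched this way could then be handed to `pystac.Item.from_dict` instead of editing the assets after `pystac.read_file`.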

clausmichele · 2024-03-18T15:27:17Z

@jsignell you can use these new versions of the Items, which have the correct media type and the roles set to index:


url_1 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20010101000000_20011231000000_2.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20020101000000_20021231000000_2.json"

jsignell · 2024-04-05T15:50:30Z

Nice! Yeah it works well with those versions:

import pystac
import xarray as xr


url_1 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20010101000000_20011231000000_2.json"
url_2 = "https://gist.githubusercontent.com/clausmichele/b101fcf12f17c746b2c5db57ef43a650/raw/bd7c2c2d25a328d01b316ec9bbab2c7503c0e343/CERRA-20020101000000_20021231000000_2.json"

item_1 = pystac.read_file(url_1)
item_2 = pystac.read_file(url_2)
items = [item_1, item_2]

data = xr.open_dataset(items, engine="stac", stacking_library="xpystac", chunks={})
data

Since it's purely additive I don't see the harm in merging this once I write up some tests.

Experiment with native stacking for kerchunk

b1688a7

Merge branch 'main' into js/kerchunk-stacking

40f381b
