Convert all ndarrays to lists in to_item_collection #3

Closed
TomAugspurger opened this issue Sep 23, 2022 · 4 comments

Comments

@TomAugspurger
Collaborator

This currently raises a ValueError:

import planetary_computer
import adlfs
import pystac

collection = pystac.read_file("https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t")
asset = planetary_computer.sign(collection.assets["geoparquet-items"])

import dask_geopandas

ddf = dask_geopandas.read_parquet(asset.href, storage_options=asset.extra_fields["table:storage_options"])
df = ddf.head()

def fix(x):
    assets = {k: v for k, v in x.items() if v}
    return assets

df["assets"] = df.assets.apply(fix)

import stac_geoparquet
stac_geoparquet.stac_geoparquet.to_item_collection(df)

with

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [106], line 20
     17 df["assets"] = df.assets.apply(fix)
     19 import stac_geoparquet
---> 20 stac_geoparquet.stac_geoparquet.to_item_collection(df)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/stac_geoparquet/stac_geoparquet.py:119, in to_item_collection(df)
    114 for k in datelike:
    115     df2[k] = (
    116         df2[k].dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ").fillna("").replace({"": None})
    117     )
--> 119 return pystac.ItemCollection(
    120     [to_dict(record) for record in df2.to_dict(orient="records")]
    121 )

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item_collection.py:95, in ItemCollection.__init__(self, items, extra_fields, clone_items)
     92     else:
     93         return pystac.Item.from_dict(item_or_dict, preserve_dict=clone_items)
---> 95 self.items = list(map(map_item, items))
     96 self.extra_fields = extra_fields or {}

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item_collection.py:93, in ItemCollection.__init__.<locals>.map_item(item_or_dict)
     91     return item_or_dict.clone() if clone_items else item_or_dict
     92 else:
---> 93     return pystac.Item.from_dict(item_or_dict, preserve_dict=clone_items)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item.py:419, in Item.from_dict(cls, d, href, root, migrate, preserve_dict)
    416 d.pop("type")
    417 d.pop("stac_version")
--> 419 item = cls(
    420     id=id,
    421     geometry=geometry,
    422     bbox=bbox,
    423     datetime=datetime,
    424     properties=properties,
    425     stac_extensions=stac_extensions,
    426     collection=collection_id,
    427     extra_fields=d,
    428     assets={k: Asset.from_dict(v) for k, v in assets.items()},
    429 )
    431 has_self_link = False
    432 for link in links:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/pystac/item.py:113, in Item.__init__(self, id, geometry, bbox, datetime, properties, stac_extensions, href, collection, extra_fields, assets)
    100 def __init__(
    101     self,
    102     id: str,
   (...)
    111     assets: Optional[Dict[str, Asset]] = None,
    112 ):
--> 113     super().__init__(stac_extensions or [])
    115     self.id = id
    116     self.geometry = geometry

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
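
For context, the last frame of the traceback evaluates `stac_extensions or []`. A minimal sketch of why that fails, assuming `stac_extensions` came back from Parquet as a NumPy array with more than one element (the URLs below are placeholders, not values from the dataset):

```python
import numpy as np

# Assumption: stand-in for a "stac_extensions" value that was read back
# from Parquet as an ndarray rather than a list. The URLs are placeholders.
stac_extensions = np.array(
    ["https://example.com/ext-a", "https://example.com/ext-b"]
)

# `x or []` calls bool(x), which NumPy refuses for multi-element arrays.
try:
    value = stac_extensions or []
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The same expression works once the array is a plain Python list.
value = stac_extensions.tolist() or []
print(type(value), value)
```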

We should verify that all list-like objects (including those nested within dicts) are lists and not ndarrays.
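
One way this could look is a recursive helper applied to each record before it is handed to pystac. This is only a sketch, not the code in stac_geoparquet; the name `ndarrays_to_lists` and the sample record are made up for illustration:

```python
import numpy as np


def ndarrays_to_lists(obj):
    """Recursively replace ndarrays and NumPy scalars with plain Python objects.

    Hypothetical helper for illustration; not part of stac_geoparquet.
    """
    if isinstance(obj, np.ndarray):
        # tolist() handles nested numeric arrays, but object arrays may still
        # hold dicts or further arrays, so recurse into the result.
        return [ndarrays_to_lists(v) for v in obj.tolist()]
    if isinstance(obj, np.generic):
        # NumPy scalar (e.g. np.float64) to the matching Python scalar.
        return obj.item()
    if isinstance(obj, dict):
        return {k: ndarrays_to_lists(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [ndarrays_to_lists(v) for v in obj]
    return obj


# Made-up record shaped like one row of df.to_dict(orient="records").
record = {
    "id": "item-1",
    "stac_extensions": np.array(["ext-a", "ext-b"]),
    "bbox": np.array([1.0, 2.0, 3.0, 4.0]),
    "assets": {
        "data": {"href": "https://example.com/x.tif", "roles": np.array(["data"])}
    },
}

clean = ndarrays_to_lists(record)
print(clean)
```

Tuples are turned into lists here as well, which is what JSON serialization would do anyway.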

@martindurant

What's the status here? Is there a way to convert these geoparquet files to STAC collections (or each row to items)?

@TomAugspurger
Collaborator Author

TomAugspurger commented Feb 25, 2024

to_item_collection is the function for that. Depending on how the data was written, you might need to convert some ndarrays to Python lists first.
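
To find out whether a given file needs that conversion, one option is to scan the records for fields that still hold ndarrays. A rough sketch, assuming records shaped like the output of `df.to_dict(orient="records")`; the helper names and the sample data are hypothetical:

```python
import numpy as np


def contains_ndarray(value):
    """True if value is an ndarray or has one nested inside dicts/lists/tuples."""
    if isinstance(value, np.ndarray):
        return True
    if isinstance(value, dict):
        return any(contains_ndarray(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_ndarray(v) for v in value)
    return False


def fields_with_ndarrays(records):
    """Return the sorted top-level keys that hold an ndarray in any record."""
    found = set()
    for record in records:
        for key, value in record.items():
            if contains_ndarray(value):
                found.add(key)
    return sorted(found)


# Made-up records; in practice these would come from the DataFrame.
records = [
    {
        "id": "item-1",
        "stac_extensions": np.array(["ext-a", "ext-b"]),
        "assets": {"data": {"roles": np.array(["data"])}},
    },
    {
        "id": "item-2",
        "stac_extensions": ["ext-a"],
        "assets": {"data": {"roles": ["data"]}},
    },
]

flagged = fields_with_ndarrays(records)
print(flagged)
```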

@martindurant

OK, Intake 2 now supports reading from these, including multi-banding; but I don't like the format :) Here is my recursive cleaning method.

@TomAugspurger
Collaborator Author

This is mostly closed by #31

In [1]: import pystac_client, stac_geoparquet

In [2]: items = list(pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1").search(collections="aster-l1t", max_items=250).items_as_dicts())

In [3]: df = stac_geoparquet.stac_geoparquet.to_geodataframe(items, dtype_backend="pyarrow")

In [4]: type(stac_geoparquet.to_item_collection(df)[0].to_dict()['stac_extensions'])
Out[4]: list

2 participants