If I want to iterate over all the values in a large key-value store, is there an established pattern for doing that in tensorstore? My immediate application is iterating over large-scale precomputed annotations. I thought that using the KeyRange option on the .list method would enable chunking and therefore provide a route toward an efficient iterator; however, it takes about 5 minutes to run over a large dataset even with a pretty restrictive range (see example below).
import os

import numpy as np
import tensorstore as ts

source = "gs://neuroglancer-20191211_fafbv14_buhmann2019_li20190805"

# Read the precomputed annotation "info" file.
ts_spec = {
    "driver": "json",
    "kvstore": os.path.join(source, "info"),
}
info_ts = ts.open(ts_spec).result()
info = info_ts.read().result().item()
by_id_info = info["by_id"]

# Open the by_id index as a kvstore, sharded or unsharded as appropriate.
ts_spec = {
    "base": os.path.join(source, by_id_info["key"]),
}
if "sharding" in by_id_info:
    ts_spec["driver"] = "neuroglancer_uint64_sharded"
    ts_spec["metadata"] = by_id_info["sharding"]
else:
    ts_spec["driver"] = "neuroglancer_precomputed"
ts_by_id = ts.KvStore.open(ts_spec).result()

# Keys are big-endian uint64; list a narrow range around a known id.
start_bytes = np.ascontiguousarray(17317160 - 5000, dtype=">u8").tobytes()
end_bytes = np.ascontiguousarray(17317160 + 5000, dtype=">u8").tobytes()
key_range = ts.KvStore.KeyRange(
    inclusive_min=start_bytes, exclusive_max=end_bytes
)
keys = ts_by_id.list(range=key_range).result()
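For concreteness, here is a minimal sketch of the chunked iteration pattern I have in mind, assuming the keys are 8-byte big-endian uint64 strings (as encoded above) and that listing a range and then reading each key is the intended usage of the KvStore API; the function name iter_values and the chunk_size parameter are just for illustration.

import numpy as np
import tensorstore as ts

def iter_values(kv, lo, hi, chunk_size=10_000):
    # Yield (key, value) pairs for uint64 keys in [lo, hi), one chunk of
    # chunk_size ids at a time, so the full key space never has to be
    # listed in a single call.
    for start in range(lo, hi, chunk_size):
        stop = min(start + chunk_size, hi)
        rng = ts.KvStore.KeyRange(
            inclusive_min=np.ascontiguousarray(start, dtype=">u8").tobytes(),
            exclusive_max=np.ascontiguousarray(stop, dtype=">u8").tobytes(),
        )
        keys = kv.list(range=rng).result()
        # Issue the reads for this chunk concurrently, then resolve them.
        futures = [(k, kv.read(k)) for k in keys]
        for k, fut in futures:
            yield k, fut.result().value

Even with a pattern like this, the per-chunk .list calls seem to be the slow part, which is really what the question is about.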