
Efficient iteration over key-value store #224

@fcollman

If I want to iterate over all the values in a large key-value store, is there an established pattern for doing that in TensorStore? My immediate application is iterating over large-scale precomputed annotations. I thought that passing a KeyRange to the .list method might enable chunking and so provide a route toward an efficient iterator. However, listing takes 5 minutes on a large dataset even with a fairly restrictive range (see example below).

import os

import numpy as np
import tensorstore as ts

source = "gs://neuroglancer-20191211_fafbv14_buhmann2019_li20190805"

# Read the top-level "info" JSON file of the dataset.
ts_spec = {
    "driver": "json",
    "kvstore": os.path.join(source, "info"),
}
info_ts = ts.open(ts_spec).result()
info = info_ts.read().result().item()
by_id_info = info["by_id"]

# Build a kvstore spec for the "by_id" index, sharded or not.
ts_spec = {
    "base": os.path.join(source, by_id_info["key"])
}
if "sharding" in by_id_info:
    ts_spec["driver"] = "neuroglancer_uint64_sharded"
    ts_spec["metadata"] = by_id_info["sharding"]
else:
    ts_spec["driver"] = "neuroglancer_precomputed"

ts_by_id = ts.KvStore.open(ts_spec).result()

# Keys are 8-byte big-endian uint64 IDs; list a window of 10000 IDs.
start_bytes = np.ascontiguousarray(17317160 - 5000, dtype=">u8").tobytes()
end_bytes = np.ascontiguousarray(17317160 + 5000, dtype=">u8").tobytes()

key_range = ts.KvStore.KeyRange(
    inclusive_min=start_bytes, exclusive_max=end_bytes
)
keys = ts_by_id.list(range=key_range).result()
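
For reference, the chunking idea I had in mind could be sketched roughly as follows. This is only an illustration of how I would split an ID interval into consecutive key ranges, assuming keys are 8-byte big-endian uint64 IDs as in the example above. The helper name `iter_key_ranges` and the `chunk_size` parameter are hypothetical and not part of TensorStore.

```python
import struct


def iter_key_ranges(start_id, end_id, chunk_size):
    """Yield (inclusive_min, exclusive_max) byte keys covering [start_id, end_id).

    Hypothetical helper: IDs are encoded as 8-byte big-endian unsigned
    integers, matching the dtype ">u8" encoding used in the example.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for lo in range(start_id, end_id, chunk_size):
        # Clamp the last chunk so it does not run past end_id.
        hi = min(lo + chunk_size, end_id)
        yield struct.pack(">Q", lo), struct.pack(">Q", hi)
```

Each pair could then be passed as inclusive_min and exclusive_max to ts.KvStore.KeyRange, as in the example above. Whether listing chunk by chunk is actually any faster is exactly what I am unsure about.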
