Add Crawler #1

m-mohr · 2020-08-24T16:19:44Z

STAC Index plans to crawl all collections from STAC static catalogs and APIs.

We plan to use PySTAC for this, as it makes migrating from STAC 0.8 and 0.9 to 1.0 easy, validates data, and is planned to offer an easy way to get all collections once stac-utils/pystac#169 has been implemented.
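Until PySTAC offers this, the collection crawl can be pictured as a walk over `rel="child"` links. The following is a minimal, stdlib-only sketch and not PySTAC's API. `iter_collections`, `is_collection` and the `fetch` callable are hypothetical names; `fetch` stands in for an HTTP GET plus JSON decode. Treating a document without `"type"` but with `"extent"` as a collection is an assumed heuristic for pre-1.0 (0.8/0.9) catalogs.

```python
from typing import Callable, Iterator, List, Set
from urllib.parse import urljoin


def is_collection(obj: dict) -> bool:
    """Return True if a STAC JSON document looks like a collection."""
    # STAC 1.0 documents carry "type": "Collection" (or "Catalog").
    if "type" in obj:
        return obj["type"] == "Collection"
    # Assumed heuristic for 0.8/0.9 documents without "type": collections have "extent".
    return "extent" in obj


def iter_collections(root_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Depth-first walk over rel="child" links, yielding each collection once."""
    seen: Set[str] = set()
    stack: List[str] = [root_url]
    while stack:
        url = stack.pop()
        if url in seen:
            continue  # guards against cycles and duplicate links
        seen.add(url)
        obj = fetch(url)
        if is_collection(obj):
            yield obj
        for link in obj.get("links", []):
            if link.get("rel") == "child":
                # Child hrefs may be relative to the document that links to them.
                stack.append(urljoin(url, link["href"]))


# Demo: an in-memory catalog instead of HTTP requests (hypothetical URLs).
DOCS = {
    "https://example.com/catalog.json": {
        "type": "Catalog", "id": "root",
        "links": [{"rel": "child", "href": "a/collection.json"},
                  {"rel": "child", "href": "b/catalog.json"}],
    },
    "https://example.com/a/collection.json": {"type": "Collection", "id": "a", "links": []},
    "https://example.com/b/catalog.json": {
        "type": "Catalog", "id": "b",
        "links": [{"rel": "child", "href": "../c/collection.json"}],
    },
    # 0.9-style collection without a "type" field
    "https://example.com/c/collection.json": {"id": "c", "extent": {}, "links": []},
}

ids = sorted(c["id"] for c in iter_collections("https://example.com/catalog.json", DOCS.__getitem__))
print(ids)  # ['a', 'c']
```

The demo feeds an in-memory dict instead of doing HTTP; a real crawler would also need timeouts, retries and handling for API `/collections` endpoints, which this sketch leaves out.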

This also requires us to migrate to MongoDB, which is mostly compatible with nedb but will need some minor changes (e.g. checking timestamps, checking case-insensitive sorting, adding a schema, ...).
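For the timestamp part of the migration, one possible approach is to convert dates while importing the old nedb data. The sketch below assumes timestamps were persisted in nedb's serialized `{"$$date": <milliseconds>}` form; `revive_nedb_dates` and `sort_case_insensitive` are hypothetical helpers, and the second one only serves as a client-side reference ordering for checking sort results after the switch.

```python
from datetime import datetime, timezone
from typing import Any, List


def revive_nedb_dates(value: Any) -> Any:
    """Recursively turn {"$$date": <ms since epoch>} wrappers (assumed nedb
    serialization) into timezone-aware datetimes, which pymongo stores as BSON dates."""
    if isinstance(value, dict):
        if set(value) == {"$$date"}:
            return datetime.fromtimestamp(value["$$date"] / 1000, tz=timezone.utc)
        return {k: revive_nedb_dates(v) for k, v in value.items()}
    if isinstance(value, list):
        return [revive_nedb_dates(v) for v in value]
    return value


def sort_case_insensitive(docs: List[dict], field: str) -> List[dict]:
    """Client-side case-insensitive ordering, used as a rough reference."""
    return sorted(docs, key=lambda d: d.get(field, "").casefold())


# Demo with made-up documents.
doc = {"title": "x", "created": {"$$date": 0}, "history": [{"at": {"$$date": 1000}}]}
restored = revive_nedb_dates(doc)
print(restored["created"].isoformat())  # 1970-01-01T00:00:00+00:00

docs = [{"title": "beta"}, {"title": "Gamma"}, {"title": "alpha"}]
titles = [d["title"] for d in sort_case_insensitive(docs, "title")]
print(titles)  # ['alpha', 'beta', 'Gamma']
```

On the MongoDB side, case-insensitive sorting is usually done with a collation of strength 2 (in pymongo, for example, `cursor.collation(Collation(locale="en", strength=2))` with `Collation` from `pymongo.collation`). ICU collation and `str.casefold` can differ for some non-ASCII strings, so the Python ordering is only an approximation.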

m-mohr · 2021-01-07T19:38:23Z

Some ideas for faster crawling:

Don't crawl all items for APIs; instead, use the API to query for specific data. Static catalogs still need to be crawled.
Don't fetch an item per catalog, but one item per (1) root catalog and (2) collection. This gives fewer items for a first run; all remaining items can be crawled later.
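Both ideas can be sketched as small planning helpers. This is an illustrative sketch under assumptions: `api_search_url` and `plan_first_pass` are hypothetical names, the API side assumes a STAC API Item Search endpoint at `/search` that accepts the `collections` and `limit` query parameters, and the static side assumes item hrefs have already been grouped by root catalog and collection.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def api_search_url(api_root: str, collection_id: str, limit: int = 1) -> str:
    """Build an item-search URL asking for a small sample of one collection
    instead of paging through every item of the API."""
    query = urlencode({"collections": collection_id, "limit": limit})
    return f"{api_root.rstrip('/')}/search?{query}"


def plan_first_pass(root_items: List[str],
                    collection_items: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split item hrefs into a first pass (one item for the root catalog and one
    per collection) and a deferred list to crawl later."""
    first: List[str] = []
    deferred: List[str] = []
    for hrefs in [root_items, *collection_items.values()]:
        if hrefs:
            first.append(hrefs[0])       # one representative item per group
            deferred.extend(hrefs[1:])   # everything else waits for a later run
    return first, deferred


# Demo with hypothetical URLs and hrefs.
url = api_search_url("https://api.example.com/stac/", "sentinel-2-l2a")
print(url)  # https://api.example.com/stac/search?collections=sentinel-2-l2a&limit=1

first, deferred = plan_first_pass(["r1", "r2"], {"a": ["a1", "a2", "a3"], "b": []})
print(first, deferred)  # ['r1', 'a1'] ['r2', 'a2', 'a3']
```

Groups without any items (collection `b` above) simply contribute nothing to either list.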

m-mohr changed the title from "Add crawler" to "Add Crawler" on Aug 24, 2020

m-mohr self-assigned this Dec 31, 2020