Add Crawler #1

Open
m-mohr opened this issue Aug 24, 2020 · 1 comment
Contributor

m-mohr commented Aug 24, 2020

The plan is for STAC Index to crawl all collections from static STAC catalogs and STAC APIs.

We plan to use PySTAC for this because it makes migrating from STAC 0.8 and 0.9 to 1.0 easy, validates data, and should give us an easy way to get all collections once stac-utils/pystac#169 has been implemented.
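To make the "get all collections" step concrete, here is a minimal sketch of the traversal in plain Python. It deliberately does not use PySTAC (whose API for this depends on the linked issue); `iter_collections`, the injected `fetch` callable and the example.com URLs are all hypothetical names for illustration. It assumes collections can be recognized by `"type": "Collection"` (STAC 1.0) or by the presence of `extent` (0.8/0.9).

```python
from typing import Callable, Dict, Iterator, List, Set
from urllib.parse import urljoin

# Hypothetical in-memory stand-in for a static catalog (href -> JSON document).
DOCS: Dict[str, dict] = {
    "https://example.com/catalog.json": {
        "id": "root",
        "links": [
            {"rel": "child", "href": "./a/collection.json"},
            {"rel": "child", "href": "sub/catalog.json"},
        ],
    },
    "https://example.com/a/collection.json": {
        "type": "Collection", "id": "a", "extent": {},
        "links": [{"rel": "item", "href": "./item.json"}],
    },
    "https://example.com/sub/catalog.json": {
        "id": "sub",
        "links": [{"rel": "child", "href": "./b/collection.json"}],
    },
    "https://example.com/sub/b/collection.json": {
        "id": "b", "extent": {}, "links": [],  # 0.8/0.9 style: no "type" field
    },
}


def is_collection(doc: dict) -> bool:
    """Assumption: 1.0 collections carry "type", older ones only "extent"."""
    return doc.get("type") == "Collection" or "extent" in doc


def iter_collections(root_href: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every collection reachable from root_href via "child" links.

    Only "child" links are followed, so no item documents are fetched.
    """
    seen: Set[str] = set()
    stack: List[str] = [root_href]
    while stack:
        href = stack.pop()
        if href in seen:
            continue  # guard against cycles and duplicate links
        seen.add(href)
        doc = fetch(href)
        if is_collection(doc):
            yield doc
        for link in doc.get("links", []):
            if link.get("rel") == "child":
                stack.append(urljoin(href, link["href"]))


collection_ids = sorted(
    c["id"] for c in iter_collections("https://example.com/catalog.json", DOCS.__getitem__)
)
```

In a real crawler `fetch` would do an HTTP GET and JSON decode, and PySTAC would take over migration and validation of each document.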

This also requires us to migrate to MongoDB, which is mostly compatible with nedb but will need some minor changes (e.g. check timestamps, check case-insensitive sort, add schema, ...).
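For the case-insensitive sort and the schema, a rough sketch of the option documents MongoDB accepts, written as plain Python dicts. The field names (`title`, `url`, `created`, `updated`) are assumptions for illustration, not the actual STAC Index data model, and `expected_order` is a hypothetical helper for comparing sort results between the two databases.

```python
# Collation document: strength 2 compares base letters and accents but
# ignores case, which gives a case-insensitive sort.
CASE_INSENSITIVE = {"locale": "en", "strength": 2}

# $jsonSchema validator for a collection. Field names are hypothetical.
CATALOG_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["title", "url"],
        "properties": {
            "title": {"bsonType": "string"},
            "url": {"bsonType": "string"},
            # Timestamps stored as real BSON dates rather than strings.
            "created": {"bsonType": "date"},
            "updated": {"bsonType": "date"},
        },
    }
}


def expected_order(titles):
    """Reference ordering for plain ASCII titles, ignoring case.

    Useful as a check that a query sorted with the collation above returns
    the same order the old database did.
    """
    return sorted(titles, key=str.casefold)


ordered = expected_order(["landsat", "Sentinel", "CBERS", "aster"])
```

With PyMongo, the collation would typically be attached to the sorted query and the validator passed when creating the collection; neither call is shown here because it needs a running server.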

m-mohr changed the title from "Add crawler" to "Add Crawler" on Aug 24, 2020
m-mohr self-assigned this on Dec 31, 2020
Contributor Author

m-mohr commented Jan 7, 2021

Some ideas for faster crawling:

  • Don't crawl all items for APIs; instead, use the API to query for specific data. Static catalogs still need to be crawled.
  • Don't fetch an item for every catalog; fetch one item per (1) root catalog and (2) collection. That means fewer items for a first run, and all remaining items can be crawled later.
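The two ideas above could look roughly like this. It is only a sketch: `first_item_href`, `sample_static`, `api_sample_url`, the `fetch` callable and the example URLs are hypothetical, and the API URL assumes the usual `/collections/{id}/items` endpoint with a `limit` parameter.

```python
from typing import Callable, Dict, Optional, Set
from urllib.parse import quote, urljoin

Fetch = Callable[[str], dict]

# Hypothetical static catalog, keyed by absolute href.
DOCS: Dict[str, dict] = {
    "https://example.com/catalog.json": {"id": "root", "links": [
        {"rel": "child", "href": "a/collection.json"},
        {"rel": "child", "href": "sub/catalog.json"}]},
    "https://example.com/a/collection.json": {"id": "a", "extent": {}, "links": [
        {"rel": "item", "href": "items/1.json"},
        {"rel": "item", "href": "items/2.json"}]},
    "https://example.com/sub/catalog.json": {"id": "sub", "links": [
        {"rel": "child", "href": "b/collection.json"}]},
    "https://example.com/sub/b/collection.json": {"id": "b", "extent": {}, "links": [
        {"rel": "item", "href": "x.json"}]},
}


def is_collection(doc: dict) -> bool:
    return doc.get("type") == "Collection" or "extent" in doc


def first_item_href(href: str, fetch: Fetch, seen: Optional[Set[str]] = None) -> Optional[str]:
    """Depth-first search for the first "item" link at or below href."""
    seen = set() if seen is None else seen
    if href in seen:
        return None
    seen.add(href)
    links = fetch(href).get("links", [])
    for link in links:
        if link.get("rel") == "item":
            return urljoin(href, link["href"])
    for link in links:
        if link.get("rel") == "child":
            found = first_item_href(urljoin(href, link["href"]), fetch, seen)
            if found:
                return found
    return None


def sample_static(root_href: str, fetch: Fetch) -> Dict[str, Optional[str]]:
    """One item href for the root catalog and one per collection.

    Plain sub-catalogs are traversed but not sampled. Documents may be
    fetched more than once here; a real crawler would cache them.
    """
    samples = {root_href: first_item_href(root_href, fetch)}
    seen: Set[str] = set()
    stack = [root_href]
    while stack:
        href = stack.pop()
        if href in seen:
            continue
        seen.add(href)
        doc = fetch(href)
        if href != root_href and is_collection(doc):
            samples[href] = first_item_href(href, fetch)
        for link in doc.get("links", []):
            if link.get("rel") == "child":
                stack.append(urljoin(href, link["href"]))
    return samples


def api_sample_url(api_root: str, collection_id: str) -> str:
    """For APIs: ask for a single item instead of paging through all of them."""
    return f"{api_root.rstrip('/')}/collections/{quote(collection_id, safe='')}/items?limit=1"


samples = sample_static("https://example.com/catalog.json", DOCS.__getitem__)
```

The remaining items can then be picked up in a later, slower pass over the same links.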
