Cache configuration without threads #64

Open
juntyr opened this issue Feb 27, 2025 · 3 comments

@juntyr

juntyr commented Feb 27, 2025

In the ESiWACE3 Online Compression Laboratory, we use earthkit-regrid in an environment without threads (Python in the web browser using WebAssembly: Pyodide). At the moment, we need to patch a pinned version of the earthkit-regrid source to change the default cache setting to none, since the user cache requires threads.

However, we would like to move towards a less invasive config patch that would allow us to support any future version of earthkit-regrid.

  1. Is the config file format now sufficiently stable that providing a config file in the expected location that overrides the cache setting would be a forward-compatible solution?

  2. Alternatively, would it be possible to patch earthkit-regrid upstream to fall back to no caching or a new thread-less solution if threads are not available? I could implement such a solution if desired.

  3. Is there any other way this problem could be solved?
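A minimal sketch of the fallback described in option 2, assuming that an interpreter without thread support raises `RuntimeError` when a thread is started (which is what CPython does when it cannot create a thread). The helper names `threads_available` and `choose_cache_policy` and the policy strings are hypothetical illustrations, not part of earthkit-regrid:

```python
import threading


def threads_available() -> bool:
    """Return True if this interpreter can actually start a new thread."""
    probe = threading.Thread(target=lambda: None)
    try:
        probe.start()
    except RuntimeError:
        # CPython raises RuntimeError ("can't start new thread") when the
        # platform cannot create threads, e.g. a single-threaded
        # WebAssembly build.
        return False
    probe.join()
    return True


def choose_cache_policy(preferred: str = "user") -> str:
    """Hypothetical helper: keep the preferred policy only if threads work.

    The strings "user" and "off" are illustrative policy names, not a
    confirmed list of values accepted by earthkit-regrid.
    """
    return preferred if threads_available() else "off"


print(choose_cache_policy("user"))
```

Probing by actually starting a thread avoids relying on platform checks, so the same code would behave sensibly if a future Pyodide build gained thread support.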

@juntyr
Author

juntyr commented Mar 9, 2025

@sandorkertesz do you have some ideas?

@sandorkertesz
Collaborator

@juntyr , thank you for your question.

earthkit-regrid is undergoing heavy refactoring. The modified code is currently only available in the develop branch, with no definite release date. The changes will hopefully solve your problems related to the usage of the threaded cache. The first relevant feature is as follows:

  • Full configuration control, just like in earthkit-data, was added to earthkit-regrid. It is called "config" and not "settings", but it is the very same thing. It allows turning off the cache completely.

However, please note that if you turn off the cache the interpolation matrix is downloaded every single time you use it!

The possible solutions are as follows:

  1. If you only use a limited set of matrices, I can provide you with the instructions to create a local matrix inventory on disk by subsetting the remote inventory. You can then directly use it in interpolate() and no cache will be involved at all.
  2. The next version of earthkit-regrid will be able to use ECMWF's MIR interpolation system in interpolate(), where you would be able to choose between the "matrix" and "mir" interpolators/backends (the concrete interface is not yet finalised). Using MIR would not require the earthkit-regrid cache at all, and you would be able to perform any interpolation MIR is capable of (all grids, subareas, projections etc.). The question, of course, is whether it would be possible to install MIR in your environment. My colleagues are working on making its Python bindings and binaries/libs (as a wheel) available from PyPI.
  3. If these options do not work for you, we can look into making the thread usage in the cache optional. This requires some internal discussions and further considerations.

@juntyr
Author

juntyr commented Mar 9, 2025

Thanks for your reply!

I can see in the docs that the under-development version will support overriding the config using environment variables (https://github.com/ecmwf/earthkit-regrid/blob/develop/docs/guide/config.rst) - this would be a good option for me.

What I would appreciate is that, once that new version has been released, setting EARTHKIT_REGRID_CACHE_POLICY=off keeps overriding the cache policy in future versions as well.
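A minimal sketch of how this could look from Python, for example in a Pyodide startup script. The variable name and value are the ones mentioned above; whether they are honoured depends on the installed earthkit-regrid version, and the helper `disable_regrid_cache` is a hypothetical name used only for illustration:

```python
import os

# Name and value as discussed above; this is an assumption about the
# under-development config system, not a guarantee for any release.
CACHE_POLICY_VAR = "EARTHKIT_REGRID_CACHE_POLICY"


def disable_regrid_cache(env=os.environ) -> None:
    """Request the 'off' cache policy through the environment."""
    env[CACHE_POLICY_VAR] = "off"


disable_regrid_cache()

# Set the variable before importing the library, so that the override is
# in place whenever the configuration is first read (assumed behaviour):
# import earthkit.regrid
```

Setting the variable in the process environment means no source patch or config file needs to be shipped alongside the package.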
