
Improve Regridding performance #17

Open · mpiannucci opened this issue Apr 28, 2022 · 12 comments

@mpiannucci (Contributor)

The /image/tile API uses xESMF for regridding on the fly. This means the weights are recomputed every single time the /tile API is called, which is absolutely terrible for performance across the board.

The initial idea is to precompute the weights for every level, then only apply the grid, reproject, and clip each tile. We should probably just cache the regridded values, but I'm trying to keep things as idempotent as possible.
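Roughly what I have in mind, as a sketch only (the `target_grid` argument, the weights directory, and the helper name are placeholders, not existing code; newer xESMF can persist weights with `to_netcdf` and reload them with `reuse_weights`):

```python
import os

import xesmf as xe


def get_regridder(ds, target_grid, level, weights_dir="weights"):
    """Build the xESMF regridder for a zoom level, reusing on-disk weights when present."""
    os.makedirs(weights_dir, exist_ok=True)
    path = os.path.join(weights_dir, f"bilinear_level_{level}.nc")

    if os.path.exists(path):
        # Reload previously generated weights instead of recomputing them.
        return xe.Regridder(ds, target_grid, "bilinear", filename=path, reuse_weights=True)

    regridder = xe.Regridder(ds, target_grid, "bilinear")
    regridder.to_netcdf(path)  # persist the weights so later requests skip generation
    return regridder
```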

@abkfenris (Collaborator)

How about caching the weights on first access for a level?

@mpiannucci (Contributor, Author)

> How about caching the weights on first access for a level?

Yeah, I think that's a good compromise. I am worried about multiple tiles from a single level being requested concurrently, so we probably need a mutex in there to avoid clashing. I might try that tonight.
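Something like this is what I'm picturing for the lock (a sketch, using the hypothetical `get_regridder` helper from the earlier comment):

```python
import threading

_regridders = {}                  # level -> cached regridder
_locks = {}                       # level -> lock guarding that level's build
_locks_guard = threading.Lock()   # protects the _locks dict itself


def regridder_for_level(ds, target_grid, level):
    """Return the cached regridder for a level, building it at most once."""
    with _locks_guard:
        lock = _locks.setdefault(level, threading.Lock())

    with lock:  # concurrent tile requests for the same level wait here
        if level not in _regridders:
            # get_regridder is the hypothetical helper sketched above
            _regridders[level] = get_regridder(ds, target_grid, level)
        return _regridders[level]
```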

@mpiannucci (Contributor, Author)

In trying this last night, I found that xESMF is very slow to generate weights for the regridder once you get past zoom level one. It wasn't as bad when originally regridding to only the tile extents.

I need to rethink the best way to manage this. ncWMS seems to be able to do this very fast, so I will use that as inspiration.

@abkfenris (Collaborator)

Is it actually delaying calculation? It might need to have an explicitly instantiated dask cluster in scope before it will defer calculation.

@mpiannucci (Contributor, Author)

Not sure. I have another idea using just reprojection that I am going to test.

@jmunroe (Collaborator)

jmunroe commented Apr 29, 2022

I think @abkfenris is correct -- without a dask scheduler set up the calculation will not be lazy.
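For reference, a minimal way to put a distributed scheduler in scope before the regridding code runs (a local, in-process cluster here just for illustration):

```python
from dask.distributed import Client

# Creating a Client registers it as the default scheduler, so subsequent
# dask-backed xarray operations are deferred and executed on its workers.
client = Client(processes=False)  # threads-only local cluster; tune for a real deployment
print(client)
```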

mpiannucci mentioned this issue Apr 29, 2022
@mpiannucci (Contributor, Author)

We can get away with just reprojecting with rioxarray, and it's a lot faster than regridding with xESMF.
[Screenshot: Screen Shot 2022-04-29 at 10.23.51 AM]
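Roughly the reprojection path, as a sketch (the variable name, dimension names, and CRS codes are illustrative, not taken from the actual datasets, and `ds` is assumed to be an already-opened xarray Dataset):

```python
import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects
from rasterio.enums import Resampling

da = ds["sea_surface_temperature"]                      # placeholder variable name
da = da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")  # tell rioxarray which dims are spatial
da = da.rio.write_crs("EPSG:4326")                      # declare the source CRS
mercator = da.rio.reproject("EPSG:3857", resampling=Resampling.bilinear)
```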

@mpiannucci (Contributor, Author)

^^ But that doesn't solve the longer-term issues with lazy loading, because rioxarray currently needs the whole dataset in memory to reproject.

So it's good enough to get data to the zarr pyramid endpoints as a proof of concept, but it's not the long-term solution.

@abkfenris (Collaborator)

For checking if there are any dask clusters that have been implicitly created, distributed.client._global_client_index should give a dictionary of any dask clients currently in scope (via https://github.com/pangeo-forge/pangeo-forge-recipes/pull/350/files#r861207572).
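For quick inspection (this is a private attribute of distributed, as noted in the linked comment, so treat it as a debugging aid only and expect it to change between versions):

```python
import distributed.client

# Private internals -- just print whatever distributed has registered
# for implicitly created clients in this process.
print(distributed.client._global_client_index)
```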

@mpiannucci (Contributor, Author)

So in testing, the bottleneck for using xESMF is generating the weights, which cannot be done in parallel (with dask or otherwise) when using xESMF. The actual regridding is fast enough once the weights are generated.
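To illustrate the split (the dataset and variable names are placeholders): the slow part is constructing the regridder, while applying it afterwards is comparatively fast:

```python
import xesmf as xe

# Weight generation happens at construction time and runs serially -- this is the bottleneck.
regridder = xe.Regridder(ds_source, ds_target, "bilinear")

# Applying the regridder reuses the precomputed weights and is fast enough per tile.
tile = regridder(ds_source["sea_surface_temperature"])
```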

@jmunroe (Collaborator)

jmunroe commented Apr 30, 2022 via email

@abkfenris (Collaborator)

abkfenris commented May 1, 2022

While we could pre-compute weights ourselves, it would be nice not to have to define a single specific sidecar file. I'd still lean towards adding them to the cache.

How about adding a datasets/{dataset_id}/tree/cache route that fires off a background task to build up the weights and cache them? Then we could either try to generate them on the fly, or return an error with a message to hit that endpoint when the weights aren't cached.

Also, on the cache side of things, we might want to explore overriding the current xpublish.get_cache(). The current cache store is a dict, so it's confined to a single process (and we might want to explore running it under gunicorn to enable multiple accesses), but cachey.Cache's data store is pluggable. We could try something like redis_collections.Dict or another MutableMapping that could be swapped in its place.
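A rough sketch of what that route could look like (the exact path, the `build_all_weights` helper, and how xpublish would mount and prefix the router are all assumptions here, not existing code):

```python
import cachey
from fastapi import APIRouter, BackgroundTasks, Depends
from xpublish.dependencies import get_dataset

router = APIRouter()


def build_all_weights(dataset):
    """Placeholder: generate and cache the regridding weights for every pyramid level."""
    ...


@router.post("/datasets/{dataset_id}/tree/cache")
def warm_regridding_cache(
    dataset_id: str,
    background_tasks: BackgroundTasks,
    dataset=Depends(get_dataset),
):
    """Return immediately and build the weights in a background task."""
    background_tasks.add_task(build_all_weights, dataset)
    return {"status": "cache warming started"}


# On the cache store: cachey.Cache keeps entries in a plain MutableMapping (its
# .data attribute), so in principle it could be pointed at something shared
# across processes, e.g. a redis_collections.Dict, instead of the default dict.
shared_cache = cachey.Cache(available_bytes=1e9)
# shared_cache.data = redis_collections.Dict(key="xpublish-regrid-cache")
```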
