Improve initial NetCDF fieldlist scanning speed #198

Open
sandorkertesz opened this issue Sep 20, 2023 · 0 comments

sandorkertesz commented Sep 20, 2023

Sometimes reading a small NetCDF file (< 2 MB) as a fieldlist takes a very long time.

There is one particular example reported to be very slow to read: https://get.ecmwf.int/repository/test-data/earhtkit-data/test-data/htessel_points.nc

Reading it using earthkit-data's from_source() method takes 30 seconds, while extracting all the NumPy arrays using the netCDF4 package takes 0.02 seconds!
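The comparison can be reproduced roughly as follows. This is a minimal sketch, assuming the file has been downloaded locally; the `timed`, `read_with_earthkit`, `read_with_netcdf4` and `compare` helpers are illustrative names, not part of either library, and `len()` is used on the assumption that it triggers the full field scan.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def read_with_earthkit(path):
    # Imported lazily so that timed() is usable without earthkit-data installed.
    import earthkit.data

    ds = earthkit.data.from_source("file", path)
    # Assumption: asking for the length forces the fieldlist to be scanned.
    return len(ds)


def read_with_netcdf4(path):
    import netCDF4

    # Read every variable fully into memory as an array.
    with netCDF4.Dataset(path) as nc:
        return {name: var[:] for name, var in nc.variables.items()}


def compare(path):
    """Print the time each reader needs for the same file."""
    n_fields, t_ek = timed(read_with_earthkit, path)
    arrays, t_nc = timed(read_with_netcdf4, path)
    print(f"earthkit-data: {n_fields} fields in {t_ek:.2f} s")
    print(f"netCDF4: {len(arrays)} variables in {t_nc:.2f} s")
```

Calling `compare("htessel_points.nc")` on a local copy of the file prints both timings; the exact numbers will depend on the machine and library versions.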

Inspecting the file shows the following structure (not all variables are shown):
[image: screenshot of the file structure]

earthkit-data splits this data by variable, level and time, generating 403006 (!) fields, which obviously takes a lot of time. However, one can argue that this data should not be represented as a fieldlist at all, since each "field" contains only a single point.
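To illustrate how the field count grows, here is a minimal sketch that multiplies the sizes of all non-horizontal dimensions per variable. The function name, the dimension names (`lat`, `lon`, `time`, `level`) and the toy sizes are assumptions for illustration only; they are not taken from the file above or from earthkit-data's implementation.

```python
from math import prod


def count_fields(variables, horizontal_dims=("lat", "lon")):
    """Count the fields produced when every non-horizontal dimension is split.

    variables maps a variable name to a {dimension name: size} dict.
    """
    total = 0
    for dims in variables.values():
        # One field per combination of non-horizontal coordinate values.
        total += prod(
            size for name, size in dims.items() if name not in horizontal_dims
        )
    return total


# Toy dataset (made-up sizes): two variables defined at a single point.
toy = {
    "t": {"time": 1000, "level": 4, "lat": 1, "lon": 1},
    "q": {"time": 1000, "lat": 1, "lon": 1},
}
print(count_fields(toy))  # 5000
```

Even this small toy dataset yields thousands of single-point fields, so the per-field overhead dominates the read time.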

So the question is how to decide automatically whether a NetCDF dataset should be treated as a fieldlist.
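One possible criterion, sketched below, is to require that every variable has more than one point in its horizontal dimensions. This is only an illustration of the idea under assumed dimension names; the function names and the `min_points` threshold are hypothetical and not part of earthkit-data.

```python
from math import prod


def points_per_field(dims, horizontal_dims=("lat", "lon")):
    """Number of horizontal points one field of this variable would hold."""
    return prod(size for name, size in dims.items() if name in horizontal_dims)


def looks_like_fieldlist(variables, horizontal_dims=("lat", "lon"), min_points=2):
    """Return True if every variable has at least min_points points per field.

    variables maps a variable name to a {dimension name: size} dict.
    """
    sizes = [points_per_field(dims, horizontal_dims) for dims in variables.values()]
    # An empty dataset, or any single-point variable, is not treated as a fieldlist.
    return bool(sizes) and all(n >= min_points for n in sizes)


gridded = {"t": {"time": 2, "lat": 10, "lon": 20}}
point_data = {"t": {"time": 1000, "lat": 1, "lon": 1}}
print(looks_like_fieldlist(gridded))     # True
print(looks_like_fieldlist(point_data))  # False
```

A dataset rejected by such a check could still be read through another interface (for example as plain arrays) instead of being split into fields.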

@sandorkertesz sandorkertesz added the enhancement New feature or request label Sep 20, 2023
@sandorkertesz sandorkertesz self-assigned this Sep 20, 2023