Improve initial NetCDF fieldlist scanning speed #198

sandorkertesz · 2023-09-20T12:56:30Z

Sometimes reading a small NetCDF file (< 2 MB) as a fieldlist takes a very long time.

There is one particular example reported to be very slow to read: https://get.ecmwf.int/repository/test-data/earhtkit-data/test-data/htessel_points.nc

Reading it using earthkit-data's from_source() method takes 30 seconds, while extracting all the numpy arrays using the netCDF4 package takes 0.02 seconds!

Having inspected the file, we can see this structure (not all variables are shown):

earthkit-data splits this data by variable, level and time, and generates 403006 (!) fields out of it, which obviously takes a lot of time. However, we can argue that this data should not be represented as a fieldlist at all, since each "field" contains only a single point.
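
To illustrate how the field count grows, here is a minimal sketch of the arithmetic. It assumes each variable is split along every dimension that is not a horizontal one. The dimension names and sizes are made up for illustration and are not taken from htessel_points.nc, and `count_fields` is a hypothetical helper, not part of earthkit-data.

```python
from math import prod

# Assumed names for horizontal dimensions; real files may use others.
HORIZONTAL_DIMS = {"lat", "lon", "latitude", "longitude", "x", "y"}


def count_fields(variables):
    """Return {variable: (number of fields, points per field)}.

    `variables` maps a variable name to a dict of {dimension: size}.
    One field is counted per combination of non-horizontal indices.
    """
    result = {}
    for name, dims in variables.items():
        n_fields = prod(s for d, s in dims.items() if d not in HORIZONTAL_DIMS)
        n_points = prod(s for d, s in dims.items() if d in HORIZONTAL_DIMS)
        result[name] = (n_fields, n_points)
    return result


# Hypothetical point dataset: many time steps, a single grid point.
example = {
    "t2m": {"time": 1000, "lat": 1, "lon": 1},
    "swvl": {"time": 1000, "level": 4, "lat": 1, "lon": 1},
}
counts = count_fields(example)
total_fields = sum(n for n, _ in counts.values())
```

With these made-up sizes, two variables already yield several thousand one-point fields, which is the pattern described above.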

So the question is how to decide automatically whether a NetCDF dataset should be treated as a fieldlist.
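
One possible heuristic, sketched below under stated assumptions, is to treat a dataset as a fieldlist only if at least one variable has a horizontal slice with more than one point. The function name `looks_like_fieldlist`, the set of horizontal dimension names and the `min_points_per_field` threshold are all hypothetical and are not existing earthkit-data API.

```python
from math import prod

# Assumed names for horizontal dimensions; real files may use others.
HORIZONTAL_DIMS = frozenset({"lat", "lon", "latitude", "longitude", "x", "y"})


def looks_like_fieldlist(variables, min_points_per_field=2):
    """Guess whether a dataset is worth splitting into fields.

    `variables` maps a variable name to a dict of {dimension: size}.
    Returns True if any variable has horizontal dimensions whose
    combined size reaches `min_points_per_field`.
    """
    for dims in variables.values():
        horizontal = [s for d, s in dims.items() if d in HORIZONTAL_DIMS]
        if horizontal and prod(horizontal) >= min_points_per_field:
            return True
    return False


# Hypothetical inputs: a regular grid versus a single-point time series.
gridded = {"t": {"time": 4, "lat": 181, "lon": 360}}
single_point = {"t": {"time": 1000, "lat": 1, "lon": 1}}
```

Here `looks_like_fieldlist(gridded)` returns `True` and `looks_like_fieldlist(single_point)` returns `False`. A check like this only needs the dimension sizes, so it could run before any field objects are created.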

sandorkertesz added the enhancement New feature or request label Sep 20, 2023

sandorkertesz self-assigned this Sep 20, 2023