Support multiple array formats #13

sandorkertesz · 2024-05-03T09:12:51Z

Implements #9

I converted all the modules to be array-format agnostic while still allowing plain numbers as input:

- extreme
- score
- solar
- stats
- thermo
- wind

All the methods (with one exception, see below) work with the following array backends:

- numpy
- cupy
- torch

All the tests had to be heavily refactored.

Details

The code goes like this:

```
from earthkit.meteo.utils.array import array_namespace
...
def specific_humidity_from_vapour_pressure(e, p, eps=1e-4):
    ...

    xp = array_namespace(e, p)
    v = xp.asarray(p + (constants.epsilon - 1) * e)
    v[xp.asarray(p - e) < eps] = xp.nan
    return constants.epsilon * e / v
```

So the namespace is not passed as a kwarg but determined from the input arrays.
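
The dispatch idea can be illustrated with a small toy. This is not the PR's implementation (which builds on array-api-compat): `infer_namespace` and `scaled_difference` are hypothetical names, and the sketch assumes that inputs are either arrays or plain numbers and that numbers-only input falls back to numpy.

```python
import numbers
import sys

import numpy as np


def infer_namespace(*args):
    """Toy stand-in for array_namespace(): pick the array module from the inputs.

    Hypothetical helper for illustration only; it is not the earthkit-meteo code.
    Inputs are assumed to be arrays or plain numbers.
    """
    found = []
    for a in args:
        # Plain Python/numpy scalars carry no backend information, so skip them.
        if isinstance(a, numbers.Number):
            continue
        # Top-level package of the array type, e.g. "numpy" for np.ndarray.
        root = type(a).__module__.split(".")[0]
        module = sys.modules[root]
        if module not in found:
            found.append(module)
    if not found:
        return np  # assumption: numbers-only input is handled by numpy
    if len(found) > 1:
        raise TypeError("inputs come from different array libraries")
    return found[0]


def scaled_difference(a, b, factor=2.0):
    # The namespace comes from the inputs, not from a keyword argument.
    xp = infer_namespace(a, b)
    return xp.asarray(a - b) * factor
```

Calling `scaled_difference(np.array([3.0, 5.0]), 1.0)` and `scaled_difference(3.0, 1.0)` both go through the numpy namespace here, the second one because of the numbers-only fallback.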

The tests now work with multiple array formats and numbers too:

```
from earthkit.meteo.utils.testing import ARRAY_BACKENDS
...

@pytest.mark.parametrize(
    "vp, p, v_ref",
    [
        ([895.992614, 2862.662152, 10000], [700, 1000, 50], [0.008, 0.018, np.nan]),
        ([895.992614, 2862.662152, 100000], 700, [0.008, 0.0258354146, np.nan]),
        (895.992614, 700, 0.008),
        (100000, 700, np.nan),
    ],
)
@pytest.mark.parametrize("array_backend", ARRAY_BACKENDS)
def test_specific_humidity_from_vapour_pressure(vp, p, v_ref, array_backend):
    vp, p, v_ref = array_backend.asarray(vp, p, v_ref)
    p = p * 100
    q = thermo.array.specific_humidity_from_vapour_pressure(vp, p)

    assert array_backend.allclose(q, v_ref, equal_nan=True)
```
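
To make the role of `array_backend` in the test more concrete, here is a minimal sketch of what such a helper object could look like for numpy. `NumpyBackend` is a hypothetical class written for this illustration; the actual entries of `ARRAY_BACKENDS` may be structured differently.

```python
import numpy as np


class NumpyBackend:
    """Hypothetical test helper; the real ARRAY_BACKENDS entries may differ."""

    name = "numpy"

    def asarray(self, *data):
        # Convert each argument separately and return them in the same order,
        # so lists and plain numbers can be mixed in one call.
        converted = tuple(np.asarray(d, dtype=float) for d in data)
        return converted[0] if len(converted) == 1 else converted

    def allclose(self, *args, **kwargs):
        return bool(np.allclose(*args, **kwargs))


backend = NumpyBackend()
vp, p, v_ref = backend.asarray([895.992614, 100000], 700, [0.008, np.nan])
```

A torch or cupy variant would presumably expose the same two methods, which is what lets one parametrized test body run against every backend.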

Dependencies

array-api-compat is now a required dependency.

Outstanding problem with cupy

cupy.quantile() behaves differently from np.quantile() when the input array contains NaNs. For this reason the test test_quantiles_nan() is disabled for cupy. This problem needs further investigation.
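
For reference, the numpy side of this comparison can be reproduced as below. The snippet only shows how numpy treats NaNs; it makes no claim about what cupy returns for the same input.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

# np.quantile propagates NaN: a single NaN in the input makes the result NaN.
q_plain = np.quantile(data, 0.5)

# np.nanquantile ignores NaNs and works on the remaining values [1.0, 3.0].
q_ignoring_nan = np.nanquantile(data, 0.5)
```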

Problems

np.polynomial.polynomial.polyval() is used but is not part of the array API standard. torch has no polyval().

torch.sign() returns 0 for NaN. According to the array API standard it should return NaN. numpy behaves correctly.

The following methods are used but are not available in the array API standard. However, they are all available in numpy, cupy and torch:
- deg2rad
- atleast_1d
- fmax
- fmin

np.percentile() is not available in the array API standard. torch only has quantile(), which is also available in numpy and cupy, but quantile() is not in the array API standard either.

np.histogram2d() is not available in the array API standard. torch only has histogramdd(), which is also available in numpy and cupy, but histogramdd() is not in the array API standard either.

The following methods are only available in numpy:
- seterr

The following methods have a different name in the array API standard than in numpy:
- fabs() -> abs()
- power() -> pow()

On an array-api-compat namespace we can only call pow() and abs():
```
import numpy as np
import array_api_compat.numpy as xp
a = np.ones(10)
xp.power(a, 2) # this fails
xp.pow(a, 2) # this works
```

Solutions

Modifications
++++++++++++++++

Replaced calls of power and fabs with pow and abs, respectively.

Patches
+++++++++++++++
The namespace returned by array_api_compat.array_namespace() is patched using methods from utils.compute. See utils.array for details.

Note:

- utils.compute.percentile is implemented with quantile.
- utils.compute.histogram2d is implemented with histogramdd.
- utils.compute.seterr does nothing and returns an empty dict.
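
The three notes above can be sketched as follows. This is an illustration under stated assumptions, not the code in utils.compute: it is written directly against numpy so the results can be compared with np.percentile and np.histogram2d, and the function signatures are simplified.

```python
import numpy as np


def percentile(a, q, axis=None):
    # A percentile is a quantile with q rescaled from [0, 100] to [0, 1].
    return np.quantile(a, np.asarray(q) / 100.0, axis=axis)


def histogram2d(x, y, bins=10):
    # histogramdd expects an (N, D) sample; here D = 2.
    sample = np.stack([np.ravel(x), np.ravel(y)], axis=-1)
    hist, edges = np.histogramdd(sample, bins=bins)
    return hist, edges[0], edges[1]


def seterr(*args, **kwargs):
    # No-op placeholder for backends without floating-point error settings.
    return {}
```

In a backend-neutral version the `np.` calls would presumably go through the namespace of the input arrays instead.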

numpy:
- np.polynomial.polynomial.polyval is added as polyval

cupy:
- cupy.polynomial.polynomial.polyval is added as polyval
- utils.compute.seterr is added as seterr

torch:
- utils.compute.polyval is added as polyval
- utils.compute.percentile is added as percentile
- utils.compute.histogram2d is added as histogram2d
- utils.compute.seterr is added as seterr
- sign is modified to treat nans correctly

other namespaces (not tested, may not be needed):
- utils.compute.polyval is added as polyval
- utils.compute.percentile is added as percentile
- utils.compute.histogram2d is added as histogram2d
- utils.compute.seterr is added as seterr

With this we can call polyval, percentile and histogram2d on an array namespace:

```
def compute_wbpt(self, ept):
    xp = array_namespace(ept)
    t0 = 273.16
    x = ept / t0
    ...
    return ept - xp.exp(xp.polyval(x, a) / xp.polyval(x, b))

def sot(clim, ens, perc, eps=-1e4):
    xp = array_namespace(clim, ens, perc)
    clim = xp.asarray(clim)
    ens = xp.asarray(ens)
    ...
    qf = xp.percentile(ens, q=perc, axis=0)
```
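
One way to picture the patching is a thin wrapper that serves extra functions first and delegates everything else to the wrapped namespace. `PatchedNamespace` is a hypothetical class for illustration and may not match how utils.array does it; the `polyval` below is a generic Horner evaluation with the same argument order as np.polynomial.polynomial.polyval(x, c).

```python
import numpy as np


def polyval(x, c):
    """Evaluate c[0] + c[1]*x + c[2]*x**2 + ... with Horner's scheme.

    Uses only arithmetic operators, so it does not depend on a specific
    array library.
    """
    result = x * 0 + c[-1]
    for coeff in reversed(c[:-1]):
        result = result * x + coeff
    return result


class PatchedNamespace:
    """Hypothetical wrapper: extra functions first, then the wrapped namespace."""

    def __init__(self, xp, **extras):
        self._xp = xp
        self._extras = extras

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        extras = self.__dict__.get("_extras", {})
        if name in extras:
            return extras[name]
        return getattr(self.__dict__["_xp"], name)


xp = PatchedNamespace(np, polyval=polyval)
x = xp.asarray([0.0, 1.0, 2.0])
y = xp.polyval(x, [1.0, 2.0, 3.0])  # 1 + 2*x + 3*x**2
```

Calling code then only ever sees `xp`, so it does not need to know which functions are native and which were added.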

TODO

Handle np.seterr
Add JAX support?

codecov-commenter · 2024-05-03T10:01:23Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

sandorkertesz · 2024-09-19T08:21:47Z

Hi @corentincarton, @oiffrig, any thoughts on this?

HCookie · 2025-02-19T11:40:23Z

src/earthkit/meteo/utils/namespace/torch.py

+def sign(x, *args, **kwargs):
+    """Reimplement the sign function to handle NaNs.
+
+    The problem is that torch.sign returns 0 for NaNs, but the array API
+    standard requires NaNs to be propagated.
+    """
+    x = _xp.asarray(x)
+    r = _xp.sign(x, *args, **kwargs)
+    r = _xp.asarray(r)
+    r[_xp.isnan(x)] = _xp.nan
+    return r


From what we discussed, this seems like a good solution.

HCookie

Looks great.
I understand what you were saying about the tests now...

HCookie · 2025-02-19T11:42:22Z

src/earthkit/meteo/extreme/array/sot.py

    # avoid divided by zero warning
-    err = np.seterr(divide="ignore", invalid="ignore")
+    try:
+        err = xp.seterr(divide="ignore", invalid="ignore")


Is this specific to numpy?
Would it make sense to check whether the namespace is numpy, run the function only in that case, and skip it otherwise?

sandorkertesz · 2025-02-24

I created an empty implementation for the other backends, so now we can use it like this:

err = xp.seterr(divide="ignore", invalid="ignore")

But this could be misleading. The other option is an explicit check:

if array_api_compat.is_numpy_namespace(xp):
    err = xp.seterr(divide="ignore", invalid="ignore")
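
A sketch of the guarded variant, including the restore step, might look like this. To keep the example free of extra dependencies it checks for a `seterr` attribute instead of calling array_api_compat.is_numpy_namespace; `seterr_if_supported` and `safe_ratio` are hypothetical names, not functions from the PR.

```python
import types

import numpy as np


def seterr_if_supported(xp, **kwargs):
    """Call xp.seterr(**kwargs) when the namespace has it, else do nothing.

    Returns the previous settings (a dict) or None when nothing was changed.
    """
    seterr = getattr(xp, "seterr", None)
    if seterr is None:
        return None
    return seterr(**kwargs)


def safe_ratio(a, b, xp=np):
    old = seterr_if_supported(xp, divide="ignore", invalid="ignore")
    try:
        return a / b
    finally:
        # Restore the previous floating-point error handling, if it was changed.
        if old is not None:
            xp.seterr(**old)
```

Returning `None` rather than an empty dict makes it visible to the caller that no setting was touched, which addresses the concern that a silent no-op could be misleading.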

src/earthkit/meteo/solar/array/solar.py

src/earthkit/meteo/stats/array/quantiles.py

src/earthkit/meteo/wind/array/wind.py

tests/thermo/test_thermo.py

corentincarton · 2025-03-11T16:34:25Z

src/earthkit/meteo/stats/array/quantiles.py

    else:
-        qs = np.asarray(which)
+        qs = xp.asarray(which)

    if method == "numpy_bulk":


Shouldn't we change this api and rename the methods? This doesn't make sense anymore now that we have the Array API. @oiffrig what do you think?

oiffrig · 2025-03-12

Not sure what you mean. Happy to rename the methods to something that doesn't contain "numpy" (although we may want to keep a fallback). PProc uses the "sort" method, so it should be kept (it is more memory-efficient than numpy).

corentincarton · 2025-03-12

What I mean is: does it still make sense to call it numpy when the backend may not be numpy anymore? It could be confusing when users try to use the function with cupy arrays, for instance.

oiffrig · 2025-03-12

Yes, let's give official names that don't contain numpy. It may be worth keeping the numpy names for backwards compatibility, but they don't need to appear in the docs.
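
The backwards-compatible renaming discussed here could be handled with a small alias table. The new name "bulk" is purely hypothetical, since the thread does not decide on the final names; only "numpy_bulk" and "sort" appear in the discussion.

```python
# Hypothetical mapping: legacy name -> backend-neutral name.
_METHOD_ALIASES = {"numpy_bulk": "bulk"}

# Names that would appear in the documentation (assumed for this sketch).
_METHODS = {"sort", "bulk"}


def resolve_method(method):
    """Translate a legacy method name and validate the result."""
    method = _METHOD_ALIASES.get(method, method)
    if method not in _METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return method
```

With this, existing callers passing `method="numpy_bulk"` keep working while the docs only mention the neutral names.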

Support multiple array formats

a541dbc

sandorkertesz marked this pull request as draft May 3, 2024 09:12

Support multiple array formats

05ec0e3

sandorkertesz requested review from oiffrig and corentincarton May 3, 2024 12:16

sandorkertesz added 2 commits May 8, 2024 15:27

Merge branch 'develop' into feature/array-formats

13a16b6

Support multiple array formats

d088112

sandorkertesz added 21 commits September 19, 2024 14:54

Merge branch 'develop' into feature/array-formats

dd2d97b

Support array formats

af717a1

Merge branch 'develop' into feature/array-formats

eabe951

Support array formats

646e537

Convert mixing_ratio_from_vapour_pressure

726cd36

Convert thermo

2bc1060

Convert thermo

cab7802

Convert thermo

d8f74d9

Convert thermo

1c08286

Convert thermo

0bc2d85

Convert thermo

b900d93

Convert thermo

6f8e72c

Convert thermo

7517434

Convert solar

c248f9c

Convert wind

e755564

Convert wind

dfb4b9a

Update docs

8e0eb2a

Update windrose

9a862fd

Convert score

fa1aa4d

Convert extreme

d33c280

Convert stats

e94e27a

sandorkertesz added 3 commits February 13, 2025 15:56

Add compat methods

4ee0000

Fix sot

675c447

Namespace patch

8f3a261

sandorkertesz changed the title ~~WIP: Support multiple array formats~~ Support multiple array formats Feb 14, 2025

sandorkertesz marked this pull request as ready for review February 14, 2025 13:35

sandorkertesz requested a review from HCookie February 14, 2025 13:35

sandorkertesz added 7 commits February 14, 2025 13:44

Fix docs

0a13db3

Update docs

04908c5

Update docs

c0be3c5

Merge branch 'develop' into feature/array-formats

fa2bbae

Fix histogram2d

427846a

Update docs

25374b4

Merge branch 'develop' into feature/array-formats

bb28825

HCookie reviewed Feb 19, 2025

sandorkertesz requested a review from iainrussell February 21, 2025 11:19

sandorkertesz added 6 commits February 21, 2025 11:40

Remove print

f6e07bf

Seterr

7d63844

Add cupy support

22c3823

Add cupy support

fe4c828

Add cupy support

05c71ec

Fix pre-commit errors

1eecdd8

corentincarton reviewed Mar 11, 2025

sandorkertesz added 3 commits March 11, 2025 19:17

Reenable pytorch tests

bce8d93

Improve namespace code

8b8480f

Fix docstring

c37f01d
