Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple array formats #13

Open
wants to merge 44 commits into
base: develop
Choose a base branch
from

Conversation

sandorkertesz
Copy link
Collaborator

@sandorkertesz sandorkertesz commented May 3, 2024

Implements #9

I converted all the modules to be array format agnostic but still allowing numbers as input:

  • extreme
  • score
  • solar
  • stats
  • thermo
  • wind

All the methods (with one exception, see below) are working with the following array backends:

  • numpy
  • cupy
  • torch

All the tests had to be heavily refactored.

Details

The code goes like this:

from earthkit.meteo.utils.array import array_namespace
...
def specific_humidity_from_vapour_pressure(e, p, eps=1e-4):
   ....
 
    xp = array_namespace(e, p)
    v = xp.asarray(p + (constants.epsilon - 1) * e)
    v[xp.asarray(p - e) < eps] = xp.nan
    return constants.epsilon * e / v

So the namespace is not passed as a kwarg but determined from the input arrays.

The tests now work with multiple array formats and numbers too:

from earthkit.meteo.utils.testing import ARRAY_BACKENDS
...

@pytest.mark.parametrize(
    "vp, p, v_ref",
    [
        ([895.992614, 2862.662152, 10000], [700, 1000, 50], [0.008, 0.018, np.nan]),
        ([895.992614, 2862.662152, 100000], 700, [0.008, 0.0258354146, np.nan]),
        (895.992614, 700, 0.008),
        (100000, 700, np.nan),
    ],
)
@pytest.mark.parametrize("array_backend", ARRAY_BACKENDS)
def test_specific_humidity_from_vapour_pressure(vp, p, v_ref, array_backend):
    vp, p, v_ref = array_backend.asarray(vp, p, v_ref)
    p = p * 100
    q = thermo.array.specific_humidity_from_vapour_pressure(vp, p)

    assert array_backend.allclose(q, v_ref, equal_nan=True)

Dependencies

array-api-compat is now a required dependency.

Outstanding problem with cupy

cupy.quantile() behaves differently than np.quantile() when there are nans in the input array. For this reason the test test_quantiles_nan() is disabled for cupy! This problem has to be further investigated!

Problems

  • np.polynomial.polynomial.polyval() is used but not part of the array API standard. No polyval() is available in torch.

  • torch.sign() returns 0 for nan. According to the array API standard it should return nan. Numpy behaves correctly.

  • The following methods are used but not available in the array API standard. However, they are all available in numpy, cupy and torch.

    • deg2rad
    • atleast_1d
    • fmax
    • fmin
  • np.percentile() is not available in the array API standard. Torch only has quantile(), which is also in numpy/cupy. quantile() is not in the array API standard.

  • np.histogram2d() is not available in the array API standard. Torch only has histogramdd(), which is also in numpy/cupy. histogramdd() is not in the array API standard.

  • The following methods are only available in numpy:

    • seterr
  • The following methods have a different name in the array API standard than in numpy.

    • fabs() -> abs()
    • power() -> pow()

    On an array-api-compat namespace we can only call pow() and abs()!

    import numpy as np
    import array_api_compat.numpy as xp
    a = np.ones(10)
    xp.power(a, 2) # this fails
    xp.pow(a, 2) # this works

Solutions

Modifications
++++++++++++++++

  • Replaced calls of power and fabs with pow and abs, respectively.

Patches
+++++++++++++++
The namespace returned by array_api_compat.array_namespace() is patched using methods from utils.compute. See: utils.array for details.

Note:

  • utils.compute.percentile is implemented with quantile.
  • utils.compute.histogram2d is implemented with histogramdd.
  • utils.compute.seterr does nothing and returns an empty dict

numpy:
- np.polynomial.polynomial.polyval is added as polyval

cupy:
- cupy.polynomial.polynomial.polyval is added as polyval
- utils.compute.seterr is added as seterr

torch:
- utils.compute.polyval is added as polyval
- utils.compute.percentile is added as percentile
- utils.compute.histogram2d is added as histogram2d
- utils.compute.seterr is added as seterr
- sign is modified to treat nans correctly

other namespaces (not tested, maybe not needed)
- utils.compute.polyval is added as polyval
- utils.compute.percentile is added as percentile
- utils.compute.histogram2d is added as histogram2d
- utils.compute.seterr is added as seterr

With this we can call polyval, percentile and histogram2d on an array-namespace:

def compute_wbpt(self, ept):
    xp = array_namespace(ept)
    t0 = 273.16
    x = ept / t0
    ....
    return ept - xp.exp(xp.polyval(x, a) / xp.polyval(x, b))

def sot(clim, ens, perc, eps=-1e4):
    xp = array_namespace(clim, ens, perc)
    clim = xp.asarray(clim)
    ens = xp.asarray(ens)
    ...
    qf = xp.percentile(ens, q=perc, axis=0)

TODO

  • Handle np.seterr
  • Add JAX support?

@sandorkertesz sandorkertesz marked this pull request as draft May 3, 2024 09:12
@codecov-commenter
Copy link

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@sandorkertesz
Copy link
Collaborator Author

Hi @corentincarton, @oiffrig, any thought on this?

@sandorkertesz sandorkertesz changed the title WIP: Support multiple array formats Support multiple array formats Feb 14, 2025
@sandorkertesz sandorkertesz marked this pull request as ready for review February 14, 2025 13:35
Comment on lines +25 to +35
def sign(x, *args, **kwargs):
"""Reimplement the sign function to handle NaNs.

The problem is that torch.sign returns 0 for NaNs, but the array API
standard requires NaNs to be propagated.
"""
x = _xp.asarray(x)
r = _xp.sign(x, *args, **kwargs)
r = _xp.asarray(r)
r[_xp.isnan(x)] = _xp.nan
return r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what we discussed this seems like a good solution

Copy link
Member

@HCookie HCookie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.
I understand what your were saying about the tests now...

# avoid divided by zero warning
err = np.seterr(divide="ignore", invalid="ignore")
try:
err = xp.seterr(divide="ignore", invalid="ignore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this local to only numpy?
Could it make sense to check if numpy run func otherwise skip?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an empty implementation for the other backends so now we can use it just like this:

err = xp.seterr(divide="ignore", invalid="ignore")

But this could be misleading. So, the other solution is:

if array_api_compat.is_numpy_namespace(xp):
    err = xp.seterr(divide="ignore", invalid="ignore")

else:
qs = np.asarray(which)
qs = xp.asarray(which)

if method == "numpy_bulk":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we change this api and rename the methods? This doesn't make sense anymore now that we have the Array API. @oiffrig what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. Happy to rename the methods to something that doesn't contain "numpy" (although we may want to keep a fallback. PProc uses the "sort" method, it should be kept (more memory-efficient than numpy)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is: does it still make sense to call it numpy when the backend may not be numpy anymore? It could be confusing when users try to use the function with copy arrays for instance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's give official names that don't contain numpy (may be worth keeping the numpy names for backwards compatibility, don't need to appear in the docs though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants