Allow to skip NaNs in xclim.analog.spatial_analogs() #1415

Open
2 tasks done
SarahG-579462 opened this issue Jul 6, 2023 · 1 comment
Labels
API Interfacing and User Concerns enhancement New feature or request

Comments

@SarahG-579462
Contributor

SarahG-579462 commented Jul 6, 2023

Addressing a Problem?

When computing dissimilarity metrics, if any of the inputs contain NaN values, the resulting dissimilarity is NaN. This is often unreasonable, e.g. when there is only a single NaN.

Could the metrics compute a mask and drop the NaN values when skipna=True? This wouldn't make sense for every metric (e.g. those whose value depends on the array length), but it would be welcome for most.
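A minimal NumPy sketch of both points, using hypothetical random data and two toy metrics that are not part of xclim: a single NaN makes the metric NaN, dropping the affected sample gives a finite value again, and removing many samples barely moves a size-normalised statistic while shrinking an unnormalised one roughly in proportion to the samples removed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 500 samples of a single index for each site, shape (n, 1).
x = rng.normal(size=(500, 1))
y = rng.normal(size=(500, 1))


def mean_cross_distance(x, y):
    # Mean of |x_i - y_j| over all pairs: normalised by both sample sizes.
    return np.abs(x - y.T).mean()


def sum_cross_distance(x, y):
    # Unnormalised sum of |x_i - y_j|: grows with the number of samples.
    return np.abs(x - y.T).sum()


def rel(a, b):
    # Relative change of a with respect to b.
    return abs(a - b) / abs(b)


# One missing value makes the whole result NaN.
x_nan = x.copy()
x_nan[0, 0] = np.nan
print(mean_cross_distance(x_nan, y))  # nan

# Dropping the sample (row) that holds the NaN gives a finite value again.
x_drop = x_nan[~np.isnan(x_nan).any(axis=1)]
print(mean_cross_distance(x_drop, y))

# Removing 100 of 500 samples barely moves the normalised metric,
# but shrinks the unnormalised one by roughly a fifth.
x_short = x[100:]
print(rel(mean_cross_distance(x_short, y), mean_cross_distance(x, y)))
print(rel(sum_cross_distance(x_short, y), sum_cross_distance(x, y)))
```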

Potential Solution

Change the metric wrapper to accept a skipna argument, e.g. in _metric_overhead:

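import numpy as np

def _metric_overhead(x, y, skipna=False, **kwargs):
    if np.any(np.isnan(x)) or np.any(np.isnan(y)):
        if not skipna:
            return np.nan
        # Drop every sample (row) that has a NaN in any index. x and y may
        # hold different numbers of samples, so each is masked on its own
        # instead of with a joint elementwise mask.
        x = np.asarray(x)
        y = np.asarray(y)
        x = x[~np.isnan(x).reshape(len(x), -1).any(axis=1)]
        y = y[~np.isnan(y).reshape(len(y), -1).any(axis=1)]

    # [...]
    return func(x, y, **kwargs)  # func is the wrapped metric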
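For experimenting outside xclim, the same idea can be written as a standalone decorator. This is a sketch under assumptions: metric_with_skipna and mean_distance are hypothetical names, inputs are taken to be (n_samples, n_indices) arrays (1-D inputs treated as a single index), and whole samples are dropped independently from x and y rather than masked pairwise.

```python
import functools

import numpy as np


def metric_with_skipna(func):
    """Hypothetical decorator: let a 2-D metric optionally skip NaN samples."""

    @functools.wraps(func)
    def wrapper(x, y, skipna=False, **kwargs):
        # Inputs are (n_samples, n_indices); 1-D inputs become (n, 1).
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        if x.shape[1] != y.shape[1]:
            raise ValueError("x and y must have the same number of indices")
        if np.isnan(x).any() or np.isnan(y).any():
            if not skipna:
                return np.nan
            # Drop each sample (row) with a NaN in any index, separately for
            # x and y, since the two samples may differ in size.
            x = x[~np.isnan(x).any(axis=1)]
            y = y[~np.isnan(y).any(axis=1)]
        return func(x, y, **kwargs)

    return wrapper


@metric_with_skipna
def mean_distance(x, y):
    # Toy metric (hypothetical): Euclidean distance between the sample means.
    return float(np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)))


x = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
y = np.array([[1.0, 2.0], [3.0, 4.0]])
print(mean_distance(x, y))  # nan
print(mean_distance(x, y, skipna=True))  # sqrt(2): means (3, 4) vs (2, 3)
```

Dropping whole rows keeps both inputs two-dimensional with the same number of indices, which the metrics rely on. A real implementation would probably also want a minimum number of remaining samples, since a metric computed on an empty or nearly empty sample is meaningless.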

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
SarahG-579462 added the enhancement (New feature or request) and API Interfacing and User Concerns labels on Jul 6, 2023
@SarahG-579462
Contributor Author

SarahG-579462 commented Jul 6, 2023

After testing this on some indices, it doesn't seem very viable for Zech-Aslan: allowing NaNs appears to significantly change the CDF of the dissimilarity distribution, particularly for mixed indices. I may have had a bug, though, since the metric is fairly robust when points are randomly removed from two normal distributions, for example.
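The robustness check with two normal distributions can be sketched as follows. The statistic is a Zech-Aslan-style energy statistic with R(r) = -ln(r) on plain Euclidean distances; zech_aslan_sketch is a hypothetical name, and its distance scaling and normalisation may differ from xclim's own zech_aslan. For two samples from the same distribution it should stay near zero even after a few points are randomly dropped, while a shifted sample gives a clearly larger value.

```python
import numpy as np


def zech_aslan_sketch(x, y, dmin=1e-12):
    """Energy statistic with R(r) = -ln(r) for x of shape (n, d), y of shape (m, d)."""
    n, m = len(x), len(y)

    def dists(a, b):
        # All pairwise Euclidean distances, clipped so log() stays finite.
        d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
        return np.clip(d, dmin, None)

    # Within-sample terms use each unordered pair once (upper triangle).
    phi_x = -np.log(dists(x, x)[np.triu_indices(n, k=1)]).sum() / n**2
    phi_y = -np.log(dists(y, y)[np.triu_indices(m, k=1)]).sum() / m**2
    # Cross term: -(1/nm) * sum of R(|x_i - y_j|) = +(1/nm) * sum of ln(...).
    phi_xy = np.log(dists(x, y)).sum() / (n * m)
    return phi_x + phi_y + phi_xy


rng = np.random.default_rng(42)
x = rng.normal(size=(200, 2))
y_same = rng.normal(size=(200, 2))
y_shift = rng.normal(loc=3.0, size=(200, 2))

full = zech_aslan_sketch(x, y_same)
# Randomly drop 15 samples from x, as if they had been NaN and skipped.
keep = rng.choice(len(x), size=len(x) - 15, replace=False)
dropped = zech_aslan_sketch(x[keep], y_same)
shifted = zech_aslan_sketch(x, y_shift)
print(full, dropped, shifted)
```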

Screenshot from 2023-07-06 18-00-54
Screenshot from 2023-07-06 18-06-21

The fact that it only affects mixed indices may point to a dimensionality problem. Will investigate.

EDIT: this was an indexing bug. With the bug fixed, removing between 0 and 15 values (as NaNs) for First Fall Frost, over 8000 pairs of samples across NA in ERA5, changes the distribution a bit, but this could be because I'm massively undersampling the space.
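One way to put a number on how much the dissimilarity distribution changes is the largest vertical gap between the two empirical CDFs, i.e. the two-sample Kolmogorov-Smirnov statistic. A minimal NumPy sketch, where ecdf_max_gap is a hypothetical helper and the gamma-distributed values are synthetic stand-ins for real dissimilarities:

```python
import numpy as np


def ecdf_max_gap(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max over t of |F_a(t) - F_b(t)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The supremum is reached at one of the observed values, so checking
    # the pooled sample is enough.
    grid = np.concatenate([a, b])
    f_a = np.searchsorted(a, grid, side="right") / a.size
    f_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(f_a - f_b)))


rng = np.random.default_rng(0)
# Synthetic stand-ins for 8000 dissimilarity values without and with skipped NaNs.
baseline = rng.gamma(shape=2.0, size=8000)
perturbed = baseline * rng.normal(loc=1.0, scale=0.02, size=8000)
print(ecdf_max_gap(baseline, perturbed))
```

scipy.stats.ks_2samp computes the same statistic together with a p-value, which could help judge whether the observed change is larger than what sampling noise alone would produce.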

Screenshot from 2023-07-07 10-41-48

Projects
None yet
Development

No branches or pull requests

1 participant