When computing dissimilarity metrics, if any of the inputs have NaN values, the dissimilarity is given as NaN. This often doesn't make sense, e.g. if there's only one NaN.
Could the metrics mask out the NaN values before computing when skipna=True? This doesn't make sense for all metrics, e.g. those whose result depends on array length, but for most it would be welcome.
Potential Solution
Change metric to accept skipna, e.g. in _metric_overhead:
def _metric_overhead(x, y, skipna=False, **kwargs):
    if np.any(np.isnan(x)) or np.any(np.isnan(y)):
        if not skipna:
            return np.nan
        # Keep only positions where both inputs are finite
        # (assumes x and y have the same shape).
        mask = np.isfinite(x) & np.isfinite(y)
        x = x[mask]
        y = y[mask]
    [...]
    return func(x, y, **kwargs)
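For illustration, here is a minimal, self-contained sketch of how such a skipna option might look as a decorator. It is not the library's actual code. It assumes the two samples do not need to be paired, so rows containing NaN are dropped from each sample independently rather than through a joint mask as in the snippet above. The helper `_as_2d` and the toy metric `mean_gap` are hypothetical names used only for this example.

```python
import functools

import numpy as np


def _as_2d(a):
    """Hypothetical helper: view a sample as (n_points, n_dims)."""
    a = np.asarray(a, dtype=float)
    return a[:, np.newaxis] if a.ndim == 1 else a


def metric(func):
    """Sketch of a decorator adding NaN handling around a dissimilarity function."""

    @functools.wraps(func)
    def _metric_overhead(x, y, skipna=False, **kwargs):
        x, y = _as_2d(x), _as_2d(y)
        # A point is "bad" if any of its dimensions is NaN.
        bad_x = np.isnan(x).any(axis=1)
        bad_y = np.isnan(y).any(axis=1)
        if bad_x.any() or bad_y.any():
            if not skipna:
                return np.nan
            # Assumption: samples are unpaired, so each is filtered on its own.
            x, y = x[~bad_x], y[~bad_y]
            if len(x) == 0 or len(y) == 0:
                return np.nan  # nothing left to compare
        return func(x, y, **kwargs)

    return _metric_overhead


@metric
def mean_gap(x, y):
    """Toy metric (not from the library): absolute difference of sample means."""
    return float(np.abs(np.mean(x) - np.mean(y)))


print(mean_gap([1.0, np.nan, 3.0], [2.0, 3.0, 4.0]))
print(mean_gap([1.0, np.nan, 3.0], [2.0, 3.0, 4.0], skipna=True))
```

Dropping whole rows keeps multi-dimensional points intact, which matters for metrics computed on several indices at once. Metrics that depend on sample length would still need their own handling.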
Additional context
No response
Contribution
I would be willing/able to open a Pull Request to contribute this feature.
Code of Conduct
I agree to follow this project's Code of Conduct
After testing this on some indices, this doesn't seem very viable for Zech-Aslan. It seems like allowing for NaNs significantly changes the CDF of the dissimilarity distribution, particularly for mixed indices. It's possible I may have had a bug, though, as it is pretty robust for randomly removing points for two normal distributions, for example.
The fact it only applies to mixed indices is maybe a dimensionality problem... Will investigate.
EDIT: This was an indexing bug. With the bug fixed, removing 0 to 15 NaN values for First Fall Frost, on 8000 pairs of samples across NA, on ERA5, changes the distribution a bit, but this could be because I'm massively undersampling the space.
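A rough way to reproduce this kind of sensitivity check on synthetic data is sketched below. It uses a toy 1-D energy distance as a stand-in, not the library's Zech-Aslan implementation, and the function names, sample sizes, and distribution parameters are all arbitrary assumptions chosen for illustration.

```python
import numpy as np


def energy_distance_1d(x, y):
    """Toy 1-D energy distance; a stand-in for a real dissimilarity metric."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dxy = np.abs(x[:, None] - y[None, :]).mean()
    dxx = np.abs(x[:, None] - x[None, :]).mean()
    dyy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * dxy - dxx - dyy


def nan_sensitivity(n_pairs=200, n_points=30, n_missing=5, seed=0):
    """Return dissimilarities for random normal pairs, with and without gaps."""
    rng = np.random.default_rng(seed)
    full = np.empty(n_pairs)
    masked = np.empty(n_pairs)
    for i in range(n_pairs):
        x = rng.normal(0.0, 1.0, n_points)
        y = rng.normal(0.5, 1.0, n_points)
        full[i] = energy_distance_1d(x, y)
        # Set n_missing random points of x to NaN, then skip them.
        x_nan = x.copy()
        x_nan[rng.choice(n_points, size=n_missing, replace=False)] = np.nan
        masked[i] = energy_distance_1d(x_nan[~np.isnan(x_nan)], y)
    return full, masked


full, masked = nan_sensitivity()
# Compare a few quantiles of the two empirical distributions.
print(np.quantile(full, [0.1, 0.5, 0.9]))
print(np.quantile(masked, [0.1, 0.5, 0.9]))
```

Comparing quantiles (or full empirical CDFs) of the two arrays gives a quick sense of how much skipping NaN values shifts the dissimilarity distribution for a given number of missing points.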