Refactor missing methods #2058
Nice!
Benchmarking done with synthetic data of 36525x50 (100 years of daily data) with 4-year x 10 chunks. It starts at 15 keys in the dask graph. Test is:
Except for the old
Thus: no change for "ANY" and "PCT", a small improvement for "at least n" and a huge one for "WMO". @huard I realized that the WMO method had some issues:
Pull Request Checklist:
- This PR fixes #2000 (`xclim.core.missing` should be refactored to use a functional programming approach) and fixes #1820 (Performance hit when using dask to compute `missing_wmo`) and should please @tlogan2000.
- Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added

What kind of change does this PR introduce?
Refactor of the Missing objects. I tried to follow a more orthodox OOP approach. In the new way:
- Subclasses can implement:
  - `__init__`, to explicitly override the signature and document their options, but this method should not do anything.
  - `validate`, a static method, which returns False on invalid options (this is the same as before).
  - `is_missing`, which receives `null`, `count` and `freq`. It does the same as before.
  - `_validate_src_timestep`, to validate the `src_timestep` at call time. Only useful for `MissingWMO`, which is restricted to daily inputs.
- Objects are called through `__call__`, which is not meant to be overridden.
- `is_missing` no longer receives `null` as a `DataArrayResample` object, but as a normal `DataArray`. This allows a bit more flexibility, which I use to optimise `MissingWMO` by using `resample_map` on the `longest_run` condition. Benchmarking to come.
- A `MissingTwoSteps` subclass is used by `MissingPct` and `AtLeastNValid` (and `MissingWMO`, but not in a new way). This adds a `subfreq` option which can be used to divide the mask computation in two steps:
  - first at `subfreq`, using the given method;
  - then at `freq`, using the "any" method.

Does this PR introduce a breaking change?
Yes, `MissingBase` and all its children have been modified in breaking ways. However, these were not exposed in the public API. The convenience functions should work as they did before.
Some users, though, might have implemented custom missing methods. These will break, sorry. I hope the new way makes more sense.
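For anyone porting a custom method, here is a minimal sketch of the method layout described in this PR. It is a toy in pure Python, not xclim's code: `ToyMissingBase`, `ToyAtLeastNValid`, `period_of` and the dict-based inputs are hypothetical stand-ins (the real objects work on `DataArray`s), and only the division of work between `__init__`, `validate`, `is_missing` and `__call__` is meant to carry over.

```python
from collections import defaultdict
from datetime import date, timedelta


def period_of(day, freq):
    """Map a date to its period key; only "month" and "year" are handled here."""
    return (day.year, day.month) if freq == "month" else (day.year,)


class ToyMissingBase:
    """Hypothetical base class; it only mirrors the method layout in the text."""

    def __init__(self, **options):
        # The base class validates and stores the options once.
        if not self.validate(**options):
            raise ValueError(f"Invalid options: {options}")
        self.options = options

    @staticmethod
    def validate(**options):
        """Return False on invalid options."""
        return True

    def is_missing(self, null, count, freq):
        """Return {period: bool}. Subclasses implement the actual rule."""
        raise NotImplementedError

    def __call__(self, series, freq="month"):
        """Build the inputs and delegate to is_missing; not meant to be overridden."""
        # null is a flat mapping (not pre-grouped), so is_missing groups as it likes.
        null = {day: value is None for day, value in series}
        # count holds the number of timestamps seen in each period.
        count = defaultdict(int)
        for day in null:
            count[period_of(day, freq)] += 1
        return self.is_missing(null, dict(count), freq)


class ToyAtLeastNValid(ToyMissingBase):
    def __init__(self, n=20):
        """n: minimum number of valid values required in each period."""
        # Only redeclares the signature and documents the option.
        super().__init__(n=n)

    @staticmethod
    def validate(n):
        return n > 0

    def is_missing(self, null, count, freq):
        valid = dict.fromkeys(count, 0)
        for day, is_null in null.items():
            valid[period_of(day, freq)] += int(not is_null)
        return {period: valid[period] < self.options["n"] for period in count}


# January 2001 has 15 missing days (16 valid), February 2001 has none.
start = date(2001, 1, 1)
series = [(start + timedelta(days=i), None if i < 15 else 1.0) for i in range(59)]
# January has fewer than 20 valid values, so it is flagged; February is not.
print(ToyAtLeastNValid(n=20)(series, freq="month"))
```

In this layout a custom method only needs to supply `validate` and `is_missing`; the call sequence stays in the base class.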
Other information:
I have yet to run `mypy` and similar tools to see if I really fixed #2000.
Also, I'll add some benchmarking to see if my change impacted performance. In preliminary tests, `missing_wmo` ran at least 10x faster on a dataset of 100 years x 50 points. And it had 1000x fewer dask tasks.
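For context on what `missing_wmo` evaluates, the rule can be read as a two-step mask: flag months first, then take "any" over the target period. The sketch below is a pure-Python illustration under stated assumptions, not the xclim implementation. The thresholds (11 missing days or 5 consecutive missing days in a month) and the parameter names `nm` and `nc` reflect my understanding of the WMO criterion, and the helper names are hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    return max((sum(1 for _ in run) for value, run in groupby(flags) if value), default=0)


def wmo_monthly_mask(null, nm=11, nc=5):
    """Step 1: flag each month with >= nm missing days or >= nc consecutive ones.

    null: dict mapping date -> bool, assumed daily and without gaps.
    """
    months = {}
    for day in sorted(null):
        months.setdefault((day.year, day.month), []).append(null[day])
    return {m: sum(flags) >= nm or longest_run(flags) >= nc for m, flags in months.items()}


def any_by_year(monthly_mask):
    """Step 2: a year is missing if any of its months is missing."""
    years = {}
    for (year, _month), flag in monthly_mask.items():
        years[year] = years.get(year, False) or flag
    return years


# Two years of daily data: five consecutive gaps in March 2001,
# ten scattered gaps (every other day) in June 2002.
start = date(2001, 1, 1)
null = {start + timedelta(days=i): False for i in range(730)}
for d in range(1, 6):
    null[date(2001, 3, d)] = True
for d in range(1, 20, 2):
    null[date(2002, 6, d)] = True

print(any_by_year(wmo_monthly_mask(null)))
```

Only 2001 ends up flagged: the five-day run trips the consecutive-days condition, while ten scattered gaps stay under both thresholds.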