RFC: add quantile? #795
Comments
Thanks for the proposal @mdhaber. I don't quite have an opinion yet - I think it in part depends on the situation with methods (see below). Also - how much do you actually need this? I only count two instances of it being used in SciPy, one of which is a test case. I just did a grep, so I may be missing some dynamic usage perhaps. The one instance is pretty simple, no keyword usage:
Which ones? Could we get away with only a default method, and hence no
Related is our previous discussion on |
Personally, I don't need it very badly. You're right that
One wrench I'd like to throw into my own proposal: |
I did a bit of digging, with the disclaimer that my digging is incomplete, but I'll try to summarize my initial findings below.
Overview
I took a peek at which APIs were implemented across array libraries, spot-checked whether/how they were implemented, and did a search for which APIs were used in SciPy and sklearn.
median: Usage, Implementations
quantile: Usage, Implementations
percentile: Usage, Implementations
partition: Implementations
I reviewed
There are also uses of Most of these functions have something else that would make array API conversion challenging at the moment, but I don't think any have non-starters for array API support. I've omitted uses I don't think will get array API support any time soon (e.g. |
Ideally it would be nice to have support for weights and filtering missing values automatically. NumPy 2 recently added support for weighted percentiles/quantiles via the Adding support for nan value filtering is being discussed in #621. However, it is not necessarily easy to handle both weights and missing values (EDIT: actually it mostly means forcing the weight of nan entries to zero before computing the cumsum of weights of the sorted observations). There is a related pull request in NumPy here: In scikit-learn we maintain our own
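To make the EDIT above concrete, here is a minimal sketch of a weighted quantile that zeroes the weight of nan entries before accumulating weights over the sorted observations. It is plain Python on a 1-D sequence, uses the inverted-CDF convention, and the function name `weighted_quantile_inverted_cdf` is hypothetical; it is not the NumPy or scikit-learn implementation.

```python
import math


def weighted_quantile_inverted_cdf(values, weights, q):
    """Hypothetical helper: weighted inverted-CDF quantile of a 1-D sequence.

    NaN observations are ignored by forcing their weight to zero.
    Weights are assumed to be non-negative (not validated here).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")

    # Force the weight of nan entries to zero.
    pairs = [(v, 0.0 if math.isnan(v) else float(w))
             for v, w in zip(values, weights)]
    # Sort by value, placing nan entries last so they never affect the order.
    pairs.sort(key=lambda p: (math.isnan(p[0]),
                              0.0 if math.isnan(p[0]) else p[0]))

    total = sum(w for _, w in pairs)
    if total <= 0.0:
        return math.nan  # nothing but nan (or zero-weight) observations

    target = q * total
    cumulative = 0.0
    last = math.nan
    for v, w in pairs:
        cumulative += w
        if w > 0.0:
            last = v
            # First observation whose cumulative weight reaches q * total.
            if cumulative >= target:
                return v
    return last  # guards against rounding in the cumulative sum
```

With unit weights and no nan values this reduces to the ordinary inverted-CDF quantile, so the same code path covers both the weighted and the unweighted case.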
In my experience, quantiles are very important in practice and occur all over the place.
I could imagine letting the default method vary (for easier consensus), but each one must have a
I would also advocate for |
@rgommers you were concerned about methods. I wanted to check whether it was clear that these methods differ only in the choice of two numbers, as described in NumPy's If it helps, the methods I suggested initially can be parametrized by two floats in OTOH, at this point I'm leaning toward adding an implementation to SciPy in terms of array API functions, the key operations being
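As an illustration of the "two numbers" point, the following sketch assumes the (alpha, beta) parametrization given in the numpy.quantile documentation, where the 1-based virtual index is q * (n - alpha - beta + 1) + alpha. It is plain Python for a 1-D sequence without nan handling; the names `METHOD_PARAMS` and `quantile_alpha_beta` are made up for this example and are not part of any library.

```python
import math

# (alpha, beta) pairs for the continuous methods, as listed in the
# numpy.quantile documentation (assumed here, not re-derived).
METHOD_PARAMS = {
    "interpolated_inverted_cdf": (0.0, 1.0),
    "hazen": (0.5, 0.5),
    "weibull": (0.0, 0.0),
    "linear": (1.0, 1.0),
    "median_unbiased": (1.0 / 3.0, 1.0 / 3.0),
    "normal_unbiased": (3.0 / 8.0, 3.0 / 8.0),
}


def quantile_alpha_beta(data, q, alpha, beta):
    """Hypothetical helper: quantile of a non-empty 1-D sequence."""
    xs = sorted(data)
    n = len(xs)
    # 1-based virtual index into the sorted data.
    h = q * (n - alpha - beta + 1) + alpha
    # Clamp so that the index stays within the sample.
    h = min(max(h, 1.0), float(n))
    j = math.floor(h)  # integer part: lower order statistic (1-based)
    g = h - j          # fractional part: interpolation weight
    lower = xs[j - 1]
    upper = xs[min(j, n - 1)]
    return lower + g * (upper - lower)
```

For example, `quantile_alpha_beta([1, 2, 3, 4], 0.5, *METHOD_PARAMS["linear"])` interpolates halfway between the second and third order statistics. Every method in the table goes through the same sort-then-interpolate code path; only the two floats change.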
@mdhaber You might want to contact/coordinate with scikit-learn for that. Apart from NumPy 2.0 not yet being the minimum dependency, array API compatibility is by now one of the main motivations for having our own implementation of it; see the recent PR scikit-learn/scikit-learn#29034. All technical details aside, what is the right approach to move this RFC forward? Who is the decision-making body, and where are such decisions made?
That's fine if scikit-learn needs its own private function. We'll need something (public or private) in SciPy anyway. I opened scipy/scipy#22352 about that.
Linking Note with Dask users are often recommended to use something like |
I'm working on adding array API support in scipy.stats (scipy/scipy#20544), and one of the things I'll need is a quantile function. If there is some support for this idea, I'll convert this issue into a proper proposal.
Looks like there is already wide support:
numpy.quantile
torch.quantile
cupy.quantile
jax.numpy.quantile
dask.dataframe.DataFrame.quantile
tfp.stats.quantiles
xarray.DataArray.quantile
Previous discussions (not much):
There are many conventions for calculating quantiles. Only a few methods would be required by the standard, and if the choice of a default is too contentious, perhaps the array API can consider method to be a required keyword argument, and libraries would be welcome to keep their own default.
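A rough sketch of what "method as a required keyword argument" could look like: the standard-style function has no default for method, while a library wrapper keeps its own default. The signature, the two supported method names, and the wrapper name `library_quantile` are assumptions for illustration only; nothing here is taken from the array API standard.

```python
import math

# Hypothetical minimal set of methods a standard might require.
_SUPPORTED_METHODS = ("linear", "inverted_cdf")


def quantile(x, q, /, *, method):
    """Hypothetical standard-style signature: `method` has no default."""
    if method not in _SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    data = sorted(x)
    n = len(data)
    if n == 0:
        raise ValueError("x must not be empty")
    if method == "inverted_cdf":
        # Smallest order statistic whose empirical CDF reaches q.
        # (Floating-point rounding of q * n is not handled in this sketch.)
        k = max(math.ceil(q * n), 1)
        return data[k - 1]
    # "linear": interpolate between the two neighbouring order statistics.
    h = q * (n - 1)
    j = math.floor(h)
    g = h - j
    return data[j] + g * (data[min(j + 1, n - 1)] - data[j])


def library_quantile(x, q, method="linear"):
    """A library remains free to keep its own default on top of the above."""
    return quantile(x, q, method=method)
```

Calling `quantile([1, 2, 3, 4], 0.5)` without method raises a TypeError, which forces portable code to state the convention explicitly, whereas `library_quantile([1, 2, 3, 4], 0.5)` keeps working with the library's default.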