Improve type annotations in `sklearn.metrics._regression` #357

matkozak · 2025-03-24T13:23:49Z

Improve type annotations in scikit-learn's regression metrics to more accurately represent the relationship between input parameters and return types. Specifically:

- Add overloads to correctly type the relationship between the `multioutput` parameter and return types. Functions return an ndarray when `multioutput="raw_values"` and a float for the other `multioutput` options.
- Fix inconsistent float type annotations: use the built-in `float` for functions that explicitly convert their result with `float(result)`, and the `Float` type alias for functions that return NumPy floating-point types (`np.float64`, etc.).
- Correct the `d2_tweedie_score` return type: the docstring claims it returns "float or ndarray of floats", but no code path returns an ndarray.
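To illustrate the overload pattern described in the first point, here is a minimal sketch, not the actual stub code. `toy_mean_absolute_error` and the `Rows` alias are hypothetical names made up for this example, and `List[float]` stands in for an ndarray so the snippet runs without NumPy or scikit-learn installed.

```python
from typing import List, Literal, Sequence, overload

# Hypothetical alias: rows of samples, one column per output.
Rows = Sequence[Sequence[float]]


@overload
def toy_mean_absolute_error(
    y_true: Rows, y_pred: Rows, *, multioutput: Literal["raw_values"]
) -> List[float]: ...
@overload
def toy_mean_absolute_error(
    y_true: Rows, y_pred: Rows, *, multioutput: Literal["uniform_average"] = ...
) -> float: ...
def toy_mean_absolute_error(y_true, y_pred, *, multioutput="uniform_average"):
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    # Mean absolute error computed separately for each output column.
    per_output = [
        sum(abs(t[j] - p[j]) for t, p in zip(y_true, y_pred)) / n_rows
        for j in range(n_cols)
    ]
    if multioutput == "raw_values":
        # One value per output (an ndarray in the real metrics).
        return per_output
    # Explicit float() conversion, so the annotation is the built-in float.
    return float(sum(per_output) / n_cols)


y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[2.0, 2.0], [5.0, 4.0]]

# A type checker should pick the first overload here and infer List[float].
raw = toy_mean_absolute_error(y_true, y_pred, multioutput="raw_values")
# With the default, the second overload applies and the result is float.
avg = toy_mean_absolute_error(y_true, y_pred)
```

With a single `float | ndarray` return annotation, callers would have to narrow the result themselves. The overloads let the checker choose the return type from the literal value of `multioutput`.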

Various sklearn metrics return floats or ndarrays based on the value of the `multioutput` parameter. This commit adds overloads for the separate paths.

Various sklearn metrics return either a standard Python float or a NumPy floating-point scalar type. E.g.

```
>>> import numpy as np
>>> from sklearn.metrics import mean_absolute_error, median_absolute_error
>>> a = np.array([1,2,3])
>>> b = np.array([4,5,6])
>>> type(mean_absolute_error(a,b))
float
>>> type(median_absolute_error(a,b))
numpy.float64
```

This commit fixes the type annotations for the following functions:

- `mean_absolute_error`
- `mean_absolute_percentage_error`
- `mean_squared_error`
- `r2_score`
- `mean_tweedie_deviance`
- `d2_pinball_score`
- `d2_absolute_error_score`

The docs say float or ndarray, but there is no ndarray return path.

matkozak · 2025-03-25T09:47:13Z

@microsoft-github-policy-service agree

debonte · 2025-03-25T20:04:16Z

@matkozak, thanks for the contribution! Can you please add some unit tests for these overloads similar to https://github.com/microsoft/python-type-stubs/blob/main/tests/sklearn/preprocessing_tests.py?


matkozak added 3 commits March 24, 2025 12:52

Add path overloads in sklearn.metrics._regression

f21d3eb

Various sklearn metrics return floats or ndarrays based on the value of the `multioutput` parameter. This commit adds overloads for the separate paths.

Fix d2_tweedie_score return

bda81cc

The docs say float or ndarray, but there is no ndarray return path.

Merge branch 'main' into metrics

193bda6

debonte approved these changes May 30, 2025


Undo removed @overloads from merge

c35a82f

Originally these were not overloads, but now they are.

debonte merged commit 4cb0e74 into microsoft:main May 30, 2025
8 checks passed
