fix: Accept X and y as positional argument with as_dict=True in train_test_split #1570

amltarek · 2025-04-20T20:47:15Z

I fixed the issue by updating how keyword arguments are handled when as_dict=True. Now, if all datasets are passed as keywords, the function returns them directly as a dictionary without extra processing. This makes the behavior more intuitive and avoids redundancy. I also tested the fix through test_train_test_split.py to ensure it works correctly.

auguste-probabl

Can you add some tests? It would also help to demonstrate what the new behaviour looks like.

amltarek · 2025-04-22T16:45:25Z

@auguste-probabl
I added the tests and restored the function's documentation. Could you check it and let me know if there are any updates I can make?

github-actions · 2025-04-23T07:40:17Z

Documentation preview @ d6f9b44

nkapila6 · 2025-04-24T11:40:44Z

skore/src/skore/sklearn/train_test_split/train_test_split.py

    if y is not None:
        new_arrays.append(y)
-        keys += ["y"]
-
-    if as_dict and arrays:


Why do you remove this? If both are true, it will cause a conflict.

nkapila6 · 2025-04-24T11:46:13Z

skore/src/skore/sklearn/train_test_split/train_test_split.py

+
+            new_arrays = list(keyword_arrays.values())
+
+        if X is not None:


Why is this repeated inside and outside the if? It is redundant.

nkapila6 · 2025-04-24T11:46:44Z

skore/src/skore/sklearn/train_test_split/train_test_split.py

+        if X is not None:
+            new_arrays.append(X)
+            keys.append("X")
+        if y is not None:


Why is this repeated inside and outside the if? It is redundant.

nkapila6 · 2025-04-24T11:48:33Z

skore/src/skore/sklearn/train_test_split/train_test_split.py

@@ -167,21 +177,20 @@ class labels.
        stratify=stratify,
    )

-    if X is None:
-        X = arrays[0] if len(arrays) == 1 else arrays[-2]
+    if X is None and len(arrays) >= 1:


Incorrect value for case when len(arrays)>1?
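To see what the reviewer is asking, the removed lines can be run in isolation. The sketch below wraps them in a hypothetical helper, `pick_X` (the name is an illustration, not skore's code), and shows which positional array ends up as X for different argument counts. Only the removed lines are reproduced; the body under the new `len(arrays) >= 1` guard is not visible in the diff, so it is not modelled here.

```python
def pick_X(arrays, X=None):
    """Hypothetical wrapper around the two removed lines of the diff."""
    if X is None:
        # One positional array: it is X. Otherwise: the second-to-last array.
        X = arrays[0] if len(arrays) == 1 else arrays[-2]
    return X


one = pick_X(("a",))             # "a"
two = pick_X(("a", "b"))         # "a": X is the first of (X, y)
three = pick_X(("a", "b", "c"))  # "b": no longer the first array
```

With no positional arrays at all, `arrays[-2]` raises an `IndexError`, which is presumably what the added `len(arrays) >= 1` guard protects against.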

amltarek · 2025-04-24T14:11:41Z

@auguste-probabl I have implemented the changes you requested and tested them. Additionally, I modified another test, test_train_test_split_dict_kwargs(), because it was throwing an error when data was passed without keyword arguments while as_dict=True. This issue has now been fixed. I have also addressed all the changes requested by @nkapila6.

auguste-probabl · 2025-04-24T15:32:47Z

skore/tests/unit/sklearn/train_test_split/test_train_test_split.py

Can you add a test with

    arr1 = [[1]] * 20
    arr2 = [0] * 10 + [1] * 10
    train_test_split(arr2, z=arr1, as_dict=True)

This is tested through two functions, test_train_test_split_check_dict() and test_train_test_split_dict_kwargs().

It's a bit different: right now we test either all arguments passed by keyword, or all arguments passed by position. I'd like to also test the combination of both (one array passed by position, one array passed by keyword).
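As a rough illustration of the mixed case, the sketch below computes the dictionary keys such a call might be expected to return. `expected_keys` is a hypothetical helper, not part of skore; it assumes that positional arrays are named "X" then "y" (as the doctest comment in this PR states) and that keyword arrays keep their own names, with `_train` and `_test` suffixes as in the keys asserted in the tests.

```python
def expected_keys(*arrays, **keyword_arrays):
    """Hypothetical helper: keys a mixed positional/keyword call would produce."""
    if len(arrays) > 2:
        raise ValueError("This sketch only names up to two positional arrays.")
    # Assumption: the first positional array is X, the second is y.
    positional_names = ["X", "y"][: len(arrays)]
    names = positional_names + list(keyword_arrays)
    return [f"{name}_{part}" for name in names for part in ("train", "test")]


arr1 = [[1]] * 20
arr2 = [0] * 10 + [1] * 10
keys = expected_keys(arr2, z=arr1)
# keys == ["X_train", "X_test", "z_train", "z_test"]
```

A real test would call train_test_split itself and compare the keys of the returned dictionary against a list like this one.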

amltarek · 2025-04-24T16:05:50Z

I have removed all the duplicate functions. Can you check the code? @auguste-probabl

auguste-probabl · 2025-04-25T08:16:53Z

skore/tests/unit/sklearn/train_test_split/test_train_test_split.py

+    result = train_test_split(
+        X=X,
+        y=y,
+        sample_weights=weights,
+        test_size=0.2,
+        as_dict=True,
+        random_state=0,
+    )


This test feels a bit redundant, but

    train_test_split(X, y, sample_weights=weights, ...)

(i.e. a mix of positional and keyword arguments) would be interesting. See also my other comment.

auguste-probabl · 2025-04-25T08:21:52Z

@amltarek Can you resolve comments when it's clear that you have addressed them? It helps review your code.

auguste-probabl · 2025-04-25T08:22:30Z

Also please sign your commits

github-actions · 2025-04-25T08:27:50Z

Coverage Report for backend

File	Stmts	Miss	Cover	Missing
venv/lib/python3.12/site-packages/skore
__init__.py	22	0	100%
_config.py	28	0	100%
exceptions.py	4	4	0%	4–23
venv/lib/python3.12/site-packages/skore/persistence
__init__.py	0	0	100%
venv/lib/python3.12/site-packages/skore/persistence/item
__init__.py	55	1	98%	97
altair_chart_item.py	19	1	91%	14
item.py	22	1	95%	86
matplotlib_figure_item.py	36	1	95%	19
media_item.py	22	0	100%
numpy_array_item.py	27	1	94%	16
pandas_dataframe_item.py	29	1	94%	14
pandas_series_item.py	29	1	94%	14
pickle_item.py	22	0	100%
pillow_image_item.py	25	1	93%	15
plotly_figure_item.py	20	1	92%	14
polars_dataframe_item.py	27	1	94%	14
polars_series_item.py	22	1	92%	14
primitive_item.py	23	2	91%	13–15
sklearn_base_estimator_item.py	29	1	94%	15
venv/lib/python3.12/site-packages/skore/persistence/repository
__init__.py	2	0	100%
item_repository.py	59	5	91%	15–16, 202–203, 226
venv/lib/python3.12/site-packages/skore/persistence/storage
__init__.py	4	0	100%
abstract_storage.py	22	0	100%
disk_cache_storage.py	33	1	95%	44
in_memory_storage.py	20	0	100%
venv/lib/python3.12/site-packages/skore/project
__init__.py	2	0	100%
project.py	83	2	98%	280, 392
venv/lib/python3.12/site-packages/skore/sklearn
__init__.py	6	0	100%
_base.py	171	14	92%	45, 58, 126, 129, 182–191, 203–>209, 224, 227–228
find_ml_task.py	61	0	99%	136–>145
types.py	13	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/_comparison
__init__.py	5	0	100%
metrics_accessor.py	165	2	97%	163, 164–>166, 1278
report.py	67	1	97%	17, 249–>252
venv/lib/python3.12/site-packages/skore/sklearn/_cross_validation
__init__.py	5	0	100%
metrics_accessor.py	190	0	99%	153–>155, 155–>157
report.py	110	1	98%	23
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
__init__.py	7	0	100%
feature_importance_accessor.py	133	0	99%	483–>489, 569–>578
metrics_accessor.py	344	10	96%	174–183, 211–>220, 219, 249, 260–>262, 290, 317–321, 336, 371, 372–>374
report.py	148	1	98%	24, 253–>255
venv/lib/python3.12/site-packages/skore/sklearn/_plot
__init__.py	2	0	100%
base.py	6	0	100%
style.py	28	0	100%
utils.py	122	5	95%	51, 75–77, 81
venv/lib/python3.12/site-packages/skore/sklearn/_plot/metrics
__init__.py	4	0	100%
precision_recall_curve.py	173	1	99%	660
prediction_error.py	164	0	100%
roc_curve.py	176	1	99%	649
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	57	3	93%	16, 161, 177
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	17	1	90%	79
high_class_imbalance_warning.py	18	0	100%
random_state_unset_warning.py	12	1	88%	15
shuffle_true_warning.py	10	1	83%	46
stratify_is_set_warning.py	12	1	88%	15
time_based_column_warning.py	23	2	86%	17, 73
train_test_split_warning.py	4	0	100%
venv/lib/python3.12/site-packages/skore/utils
__init__.py	6	0	100%
_accessor.py	46	1	97%	102
_environment.py	27	0	97%	30–>35
_fixes.py	8	0	100%
_index.py	5	0	100%
_logger.py	22	4	85%	15–19
_measure_time.py	10	0	100%
_parallel.py	38	3	88%	23–33, 124
_patch.py	13	5	53%	21–37
_progress_bar.py	36	0	100%
_show_versions.py	33	0	100%
TOTAL	3191	84	96%

Tests	Skipped	Failures	Errors	Time
816	8 💤	0 ❌	0 🔥	53.579s ⏱️

amltarek · 2025-04-25T13:12:20Z

@auguste-probabl Can you check the updates?

auguste-probabl · 2025-04-25T13:15:16Z

skore/tests/unit/sklearn/train_test_split/test_train_test_split.py

+def test_empty_input():
+    """Tests that passing empty lists for X and y raises a ValueError."""
+    X = []
+    y = []
+    with pytest.raises(ValueError):
+        train_test_split(X, y)


I don't think this test is needed; this behaviour is not specific to our function, but rather to sklearn's.

auguste-probabl · 2025-04-25T13:15:37Z

skore/tests/unit/sklearn/train_test_split/test_train_test_split.py

+    assert "X_train" in result
+    assert "X_test" in result
+    assert "y_train" in result
+    assert "y_test" in result


No need for this

auguste-probabl · 2025-04-25T13:18:09Z

skore/src/skore/sklearn/train_test_split/train_test_split.py

+    >>> # When using positional arguments and as_dict=True
+    >>> # the first argument is assumed to be X, the second y
+    >>> train_test_split(
+    ...     [[1], [2], [3], [4]], [0, 1, 0, 1], as_dict=True, random_state=0


Suggested change

-    ...     [[1], [2], [3], [4]], [0, 1, 0, 1], as_dict=True, random_state=0
+    ...     [[1], [2], [3], [4]], [0, 1, 0, 1], as_dict=True

No need for random_state since we don't check the output arrays. You can also remove random_state in the previous doctest.

auguste-probabl · 2025-04-25T13:29:09Z

skore/tests/unit/sklearn/train_test_split/test_train_test_split.py

We should also test what happens with

    train_test_split(X, X=X)

I think there should be an error like

    X cannot be passed both by position and by keyword.

Same for y.
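One way the requested check could look is sketched below. `check_not_passed_twice` and `error_message` are hypothetical names, not skore functions, and the sketch assumes the first positional array is X and the second is y. The error text for X is the wording suggested above; the text for y is an assumed analogue.

```python
def check_not_passed_twice(arrays, X=None, y=None):
    """Hypothetical guard: reject X or y given both by position and by keyword."""
    # Assumption: the first positional array is X and the second is y.
    if X is not None and len(arrays) >= 1:
        raise ValueError("X cannot be passed both by position and by keyword.")
    if y is not None and len(arrays) >= 2:
        raise ValueError("y cannot be passed both by position and by keyword.")


def error_message(arrays, **kwargs):
    """Return the error text for a call, or None when the call is accepted."""
    try:
        check_not_passed_twice(arrays, **kwargs)
    except ValueError as exc:
        return str(exc)
    return None


data = [[1], [2], [3], [4]]
labels = [0, 1, 0, 1]
x_twice = error_message((data,), X=data)           # like train_test_split(X, X=X)
y_twice = error_message((data, labels), y=labels)  # like train_test_split(X, y, y=y)
mixed_ok = error_message((data,), y=labels)        # like train_test_split(X, y=y)
```

In a test suite, the same cases would typically be written with `pytest.raises(ValueError, match=...)` around the real train_test_split call.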

fixing as_dict=true (0ac8dec)

github-actions bot assigned amltarek Apr 20, 2025

amltarek mentioned this pull request Apr 20, 2025: Passing all datasets by keyword makes it annoying to use as_dict=True #1544 (Open)

auguste-probabl reviewed Apr 22, 2025: skore/src/skore/sklearn/train_test_split/train_test_split.py (resolved)

auguste-probabl requested changes Apr 22, 2025

amltarek and others added 2 commits April 22, 2025 18:40

tests and documentation added (e55e078)

Merge branch 'main' into fix-Passing-all-datasets-by-keyword-1544 (c3b3c13)

auguste-probabl reviewed Apr 23, 2025: skore/src/skore/sklearn/train_test_split/train_test_split.py (outdated, resolved)

auguste-probabl reviewed Apr 23, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (outdated, resolved)

glemaitre changed the title ~~fix:Passing all datasets by keyword makes it annoying to use as_dict=True #1544~~ fix: Passing all datasets by keyword makes it annoying to use as_dict=True #1544 Apr 23, 2025

glemaitre changed the title ~~fix: Passing all datasets by keyword makes it annoying to use as_dict=True #1544~~ fix: Passing all datasets by keyword makes it annoying to use as_dict=True Apr 23, 2025

glemaitre changed the title ~~fix: Passing all datasets by keyword makes it annoying to use as_dict=True~~ fix: Accept X and y as positional argument with as_dict=True in train_test_split Apr 23, 2025

nkapila6 suggested changes Apr 24, 2025

amltarek added 2 commits April 24, 2025 16:03

Accept X and y as positional argument with as_dict=True in train_test… …_split (d6cf049)

Merge branch 'fix-Passing-all-datasets-by-keyword-1544' of https://gi… …thub.com/amltarek/skore into fix-Passing-all-datasets-by-keyword-1544 (c83a16a)

auguste-probabl reviewed Apr 24, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (outdated, resolved)

auguste-probabl reviewed Apr 24, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (outdated, resolved)

auguste-probabl reviewed Apr 24, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (resolved)

auguste-probabl reviewed Apr 24, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (outdated, resolved)

auguste-probabl reviewed Apr 24, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (outdated, resolved)

auguste-probabl reviewed Apr 24, 2025: skore/tests/unit/sklearn/train_test_split/test_train_test_split.py (outdated, resolved)

duplicated functions deleted (5879f6b)

empty function (d6f9b44)

auguste-probabl reviewed Apr 25, 2025: skore/src/skore/sklearn/train_test_split/train_test_split.py (resolved)

amltarek and others added 2 commits April 25, 2025 16:11

docset example (68eac88)

Merge branch 'main' into fix-Passing-all-datasets-by-keyword-1544 (d0e521d)

auguste-probabl reviewed Apr 25, 2025
