
Conversation

@Taniya-Das (Member) commented Jun 18, 2025

Metadata

  • Reference Issue:
  • New Tests Added:
  • Documentation Updated:
  • Change Log Entry:

Details

The downloaded sparse data file is trimmed by removing some rows and columns to decrease the file size for testing.
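For illustration, a minimal sketch of what such a trimming step could look like, assuming the data is handled as a scipy sparse matrix; the file names and the helper are hypothetical, and the target shape matches the assertion quoted later in this thread:

```python
# Hypothetical sketch of the trimming step; file names are placeholders.
import scipy.sparse as sp

def trim_sparse(matrix: sp.spmatrix, n_rows: int, n_cols: int) -> sp.csr_matrix:
    """Keep only the first n_rows rows and n_cols columns to shrink the fixture."""
    return matrix.tocsr()[:n_rows, :n_cols]

full = sp.load_npz("full_dataset.npz")       # placeholder for the downloaded file
trimmed = trim_sparse(full, 600, 20000)      # shape asserted in the test below
sp.save_npz("trimmed_dataset.npz", trimmed)  # smaller file checked into the repo
```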

@Taniya-Das Taniya-Das changed the title Maint/to pytest test dataset openmldatasettestsparse Maint/to pytest test dataset sparse dataset Jun 18, 2025
@codecov-commenter commented Jun 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.73%. Comparing base (6103874) to head (8a75cb4).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1418      +/-   ##
===========================================
+ Coverage    53.71%   53.73%   +0.01%     
===========================================
  Files           38       38              
  Lines         5229     5229              
===========================================
+ Hits          2809     2810       +1     
+ Misses        2420     2419       -1     

☔ View full report in Codecov by Sentry.
@Taniya-Das Taniya-Das marked this pull request as draft June 19, 2025 08:05
@Taniya-Das Taniya-Das marked this pull request as ready for review June 19, 2025 10:17
@LennartPurucker (Contributor) left a comment


Minor comments

assert isinstance(X, pd.DataFrame)
assert isinstance(X.dtypes[0], pd.SparseDtype)
assert X.shape == (600, 20000)
@pytest.mark.production
Contributor

If we mock the test, do we still need the mark here? I think we can remove it as long as we no longer connect to any server.

Member Author

By default it tries to connect to the test server https://test.openml.org/ otherwise.
Since it is just a mock, I could point the mocked files at the test server instead. But that might not be a good thing, as these datasets don't exist on the test server.

Collaborator

We technically also have the @pytest.mark.server() marker for things that actually connect to the server. So it makes sense to update this for consistency: @pytest.mark.production just means a production configuration, and @pytest.mark.server means an actual network operation is performed (not everything is mocked).

Collaborator

So what I am saying is that @pytest.mark.production() is needed for any production server configuration, even if the test does not actually access the production server (it's about how URLs are formed internally). Otherwise we would end up with that race condition again. Either that, or modify the URLs and the files to use the test server constants -- but that's not particularly clear either, because the mocks are based on production data.
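To make the convention above concrete, a hedged sketch of how the two markers could be registered and applied; the test names and bodies are hypothetical, only the marker semantics come from this discussion:

```python
import pytest

# conftest.py-style registration so pytest does not warn about unknown markers.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "production: uses the production server configuration (URLs), even if fully mocked"
    )
    config.addinivalue_line(
        "markers", "server: performs an actual network operation (not everything is mocked)"
    )

@pytest.mark.production          # production URL formation, all responses mocked
def test_sparse_dataset_mocked(mock_sparse_categorical_395):
    ...

@pytest.mark.production
@pytest.mark.server              # really talks to a server
def test_sparse_dataset_live():
    ...
```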


description_file = base_path / "description.xml"
requests_mock.get(
"https://www.openml.org/api/v1/xml/data/395",
Contributor

maybe make the API base path a fixture as well

Member Author

OK, but it doesn't make much of a difference; the generic base is just test_files_directory / "mock_responses".
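For what it's worth, one possible shape for the suggested fixtures, as a sketch only; the fixture names and directory layout below are assumptions, while the URL and test_files_directory / "mock_responses" come from this thread:

```python
import pytest

@pytest.fixture
def api_base() -> str:
    # Base of the production XML API, as used in the mocked URL above.
    return "https://www.openml.org/api/v1/xml"

@pytest.fixture
def mock_responses_dir(test_files_directory):
    # Generic base for stored mock responses.
    return test_files_directory / "mock_responses"

@pytest.fixture
def mock_sparse_categorical_395(requests_mock, api_base, mock_responses_dir):
    base_path = mock_responses_dir / "datasets" / "395"  # hypothetical layout
    description_file = base_path / "description.xml"
    requests_mock.get(
        f"{api_base}/data/395",
        text=description_file.read_text(),
    )
```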

@Taniya-Das Taniya-Das marked this pull request as draft June 19, 2025 14:16
@Taniya-Das Taniya-Das marked this pull request as ready for review June 19, 2025 14:54
Comment on lines +346 to +355
def test_get_sparse_categorical_data_id_395(mock_sparse_categorical_395):

dataset = openml.datasets.get_dataset(395, download_data=True)
feature = dataset.features[3758]
assert isinstance(dataset, OpenMLDataset)
assert isinstance(feature, OpenMLDataFeature)
assert dataset.name == "re1.wc"
assert feature.name == "CLASS_LABEL"
assert feature.data_type == "nominal"
assert len(feature.nominal_values) == 25
Collaborator

It looks like this is the only test that uses mock_sparse_categorical_395, is that correct?
If so, we can remove the data file from the repository, and remove download_data=True, since it looks like we are only interested in accessing features.
On that note, we could also remove most of the features XML file and keep only the feature we are interested in analysing. Let me know if you have any questions about it.
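A hedged sketch of trimming the features XML down to the one feature of interest; the element and namespace names here are assumptions about the OpenML features format and should be checked against the real file:

```python
import xml.etree.ElementTree as ET

NS = {"oml": "http://openml.org/openml"}  # assumed namespace URI

def keep_single_feature(path: str, index: str = "3758") -> None:
    """Drop every <oml:feature> except the one with the given index."""
    tree = ET.parse(path)
    root = tree.getroot()
    for feature in list(root.findall("oml:feature", NS)):
        if feature.findtext("oml:index", namespaces=NS) != index:
            root.remove(feature)
    tree.write(path, xml_declaration=True, encoding="utf-8")
```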
