Submodule for datasets #41

Open: wants to merge 10 commits into main

mauro-milella
Member

This PR addresses issue #40.

This is what I have done so far:

  • define 4 loaders for different datasets (HuGaDB, NATOPS, Libras, Epilepsy);
  • give each loader a docstring explaining where to download the dataset and the meaning of each argument and keyword argument;
  • define a submodule, Datasets, which exports the loaders;
  • include the submodule in SoleData, and @reexport .Datasets;
  • move an old "load_arff_dataset" dispatch from SoleData into the .Datasets submodule: this dispatch is still widely used in tests, but it is essentially a messy version of the load_NATOPS loader.
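
As a rough illustration of the layout described in the list above, here is a minimal sketch of a submodule that exports a loader and is re-exported by its parent module. It assumes the third-party Reexport package is installed. The outer module name `SoleDataSketch`, the single `dirpath` argument, and the placeholder body are hypothetical and are not taken from this PR; only the names `Datasets` and `load_NATOPS` come from the description.

```julia
module SoleDataSketch  # hypothetical stand-in for the parent package

using Reexport  # third-party package that provides the @reexport macro

module Datasets

export load_NATOPS

"""
    load_NATOPS(dirpath)

Placeholder loader for illustration only. A real loader would read the
dataset files found in `dirpath`; this stub just echoes the path back.
"""
function load_NATOPS(dirpath::AbstractString)
    return (name = "NATOPS", dirpath = dirpath)
end

end # module Datasets

# Make everything exported by Datasets visible to users of the parent module.
@reexport using .Datasets

end # module SoleDataSketch

# Users only need the parent module to reach the loader.
using .SoleDataSketch
result = load_NATOPS("path/to/NATOPS")
```

With this arrangement, callers do not have to know that the loaders live in a submodule, which keeps existing `using` statements unchanged if loaders are later moved around.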

@mauro-milella mauro-milella linked an issue May 9, 2025 that may be closed by this pull request
@mauro-milella
Member Author

Some tests are not working, but #39 should fix them.

To test the loaders I propose, the corresponding datasets would need to be placed inside test/data. Can we do this?
This could be too large an overhead for certain datasets (e.g., HuGaDB is 220 MB).
Maybe we could crop them to just a few instances?

@alberto-paparella
Member

Dear @mauro-milella, yes, using only a smaller subset of the dataset (and warning the user about it) could be a good idea.

Successfully merging this pull request may close these issues.

Submodule for datasets