Submodule for datasets #41

mauro-milella · 2025-05-09T14:55:28Z

This PR addresses issue #40.

This is what I have done so far:

define 4 loaders for different datasets (HuGaDB, NATOPS, Libras, Epilepsy);
each loader has a docstring explaining where to download the dataset and the meaning of each argument and keyword argument;
define a submodule, Datasets, which exports the loaders;
include the submodule in SoleData, and @reexport using .Datasets;
move an old "load_arff_dataset" dispatch from SoleData into the .Datasets submodule: this dispatch is still widely used in tests, but it is essentially a messier version of the load_NATOPS loader.
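
The submodule-plus-reexport layout described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `SoleDataSketch` and `load_toy` are hypothetical names, the loader returns synthetic data instead of reading files, and the Reexport package is assumed to be installed in the active environment.

```julia
# Sketch only: `SoleDataSketch` and `load_toy` are hypothetical stand-ins,
# not the real SoleData module or its loaders.
module SoleDataSketch

using Reexport  # third-party package that provides @reexport

module Datasets

export load_toy

"""
    load_toy(; n_instances = 3)

Return a tiny synthetic `(X, y)` pair. A real loader would read files from disk.
"""
function load_toy(; n_instances::Integer = 3)
    X = [fill(Float64(i), 2) for i in 1:n_instances]    # one vector per instance
    y = [isodd(i) ? "a" : "b" for i in 1:n_instances]   # alternating class labels
    return X, y
end

end # module Datasets

# Re-export everything Datasets exports, so callers only need the parent module.
@reexport using .Datasets

end # module SoleDataSketch

using .SoleDataSketch

X, y = load_toy(n_instances = 4)  # reachable without naming the Datasets submodule
```

With this layout, a user who writes `using SoleDataSketch` gets the loaders directly, while the loader code stays isolated in its own submodule.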

mauro-milella · 2025-05-09T15:02:03Z

Some tests are not working, but #39 should fix them.

To test the loaders I propose, the datasets would have to be available inside test/data. Can we do this?
This could be too much overhead for certain datasets (e.g., HuGaDB is 220MB).
Maybe we could crop them to just a few instances?

alberto-paparella · 2025-05-13T12:34:39Z

Dear @mauro-milella, yes, using only a smaller subset of the dataset (and warning the user) could be a good idea.
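
One possible shape for such a cropping step, with a warning to the user, is sketched below. The helper `crop_instances` is hypothetical and not part of SoleData, and the sketch assumes a dataset is represented as a vector of instances `X` with a matching label vector `y`.

```julia
# Hypothetical helper, not part of SoleData: keep only the first `n` instances.
function crop_instances(X::AbstractVector, y::AbstractVector, n::Integer)
    length(X) == length(y) || throw(ArgumentError("X and y must have the same length"))
    n_kept = min(n, length(X))
    if n_kept < length(X)
        # Tell the user that they are not working with the full dataset.
        @warn "Using a cropped dataset" kept=n_kept total=length(X)
    end
    return X[1:n_kept], y[1:n_kept]
end

# Toy data standing in for a real dataset.
X = [rand(3) for _ in 1:10]
y = [isodd(i) ? "a" : "b" for i in 1:10]

X_small, y_small = crop_instances(X, y, 4)
```

Taking the first `n` instances is the simplest choice, but it can drop whole classes when a dataset is sorted by label, so a per-class selection may be preferable for test fixtures.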

mauro-milella added 10 commits May 9, 2025 15:45

[ADD] .Datasets submodule definition and exports

210030a

[ADD] loader for Epilepsy dataset

b6e34b7

[FIX] using moved from epilepsy-loader to datasets

90a2dbf

[ADD] loader for HuGaDB dataset

89b0fa7

[ADD] loader for Libras dataset

3363d2e

[ADD] loader for NATOPS dataset

02709fc

[MOVE] example-datasets.jl to datasets submodule

900530b

@reexport using .Datasets

602f508

using DataStructures: OrderedDict

d0facfb

[FIX] S<:AbstractString

f7538c5

mauro-milella requested review from ferdiu, giopaglia, alberto-paparella and Perro2110 May 9, 2025 14:55

mauro-milella linked an issue May 9, 2025 that may be closed by this pull request

Submodule for datasets #40

Open
