[FT] Relax upper bound on datasets dependency #910

@lewtun

Issue encountered

datasets 4.0.0 was released a month ago and provides several nice features, such as multiprocessing uploads for large datasets. It would be good to relax the upper bound datasets<=4.0.0 in lighteval so that the latest features can be used.
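
To illustrate what an upper bound like this permits, here is a stdlib-only Python sketch that checks plain MAJOR.MINOR.PATCH versions against a bound. The function names are hypothetical, the parser deliberately ignores PEP 440 details such as pre-releases (real resolvers rely on the packaging library), and the relaxed bound <5.0.0 is only an assumed example: the issue does not propose a specific new bound.

```python
# Simplified sketch: checks plain numeric versions ("MAJOR.MINOR.PATCH")
# against an upper bound. It ignores PEP 440 features such as pre-releases,
# post-releases and local versions; real resolvers use the `packaging` library.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '4.0.0' into (4, 0, 0). Assumes every component is an integer."""
    return tuple(int(part) for part in version.split("."))


def satisfies_upper_bound(version: str, bound: str, inclusive: bool) -> bool:
    """True if `version` passes `<=bound` (inclusive=True) or `<bound`."""
    v, b = parse_version(version), parse_version(bound)
    # Zero-pad so that e.g. "4.0" and "4.0.0" compare as equal.
    width = max(len(v), len(b))
    v = v + (0,) * (width - len(v))
    b = b + (0,) * (width - len(b))
    return v <= b if inclusive else v < b


if __name__ == "__main__":
    # Hypothetical candidate releases, used only to illustrate the bounds.
    candidates = ["3.6.0", "4.0.0", "4.1.0", "5.0.0"]
    for candidate in candidates:
        current = satisfies_upper_bound(candidate, "4.0.0", inclusive=True)
        relaxed = satisfies_upper_bound(candidate, "5.0.0", inclusive=False)
        print(f"{candidate}: <=4.0.0 -> {current}, <5.0.0 -> {relaxed}")
```

With the current pin, a hypothetical 4.1.0 release is rejected; with the assumed <5.0.0 pin it is accepted, while 5.0.0 itself stays excluded.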

Note that datasets>=4.0.0 drops support for dataset loading scripts, so some care is needed to ensure that existing benchmarks like livecodebench still work. For those benchmarks, the simplest fix would be to convert the datasets produced by the loading scripts to Parquet format and host them under the lighteval org. This could be done with, e.g.:

datasets-cli convert_to_parquet <dataset_id> --trust_remote_code
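
Before converting anything, it may help to find which benchmark datasets still depend on a loading script. The sketch below is a hypothetical helper (needs_parquet_conversion is not part of datasets or lighteval) that inspects a repo's file listing, which could for instance be fetched with huggingface_hub's list_repo_files(repo_id, repo_type="dataset"). It assumes the convention that a script-based dataset ships a top-level loading script named after the repo; treat it as a rough heuristic, not an exhaustive check.

```python
from pathlib import PurePosixPath


def needs_parquet_conversion(repo_id: str, repo_files: list[str]) -> bool:
    """Hypothetical heuristic: True if a Hub dataset repo still looks script-based.

    Assumes a script-based dataset ships a top-level loading script named
    after the repo (e.g. "script_bench.py" for "some-org/script_bench").
    Repos that have such a script but no Parquet files are flagged.
    """
    script_name = repo_id.split("/")[-1] + ".py"
    has_script = script_name in repo_files  # only a top-level path matches
    has_parquet = any(PurePosixPath(path).suffix == ".parquet" for path in repo_files)
    return has_script and not has_parquet


if __name__ == "__main__":
    # Hypothetical repo ids and file listings, for illustration only.
    listings = {
        "some-org/script_bench": ["README.md", "script_bench.py"],
        "some-org/parquet_bench": ["README.md", "default/test-00000-of-00001.parquet"],
    }
    for repo_id, files in listings.items():
        print(repo_id, "->", needs_parquet_conversion(repo_id, files))
```

Checking for Parquet files as well as the script avoids flagging repos that already carry converted data alongside a leftover script.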

cc @lhoestq for visibility, who has also done a lot of work to make sure popular eval datasets are already migrated to Parquet