Closed
Description
Issue encountered
`datasets` version 4.0.0 was released 1 month ago and provides several nice features like multi-processing uploads for large datasets. It would be good to relax the upper bound `datasets<=4.0.0` in `lighteval` so the latest features can be used.
Note that `datasets>=4.0.0` deprecates the use of dataset loading scripts, so some care is needed to ensure that existing benchmarks like `livecodebench` still work. For those benchmarks, the simplest thing would be to convert the script-based datasets to Parquet format and host them under the `lighteval` org. This could be done with e.g.
datasets-cli convert_to_parquet <dataset_id> --trust_remote_code
cc @lhoestq for visibility, who has also worked a lot to ensure popular eval datasets are already migrated to Parquet
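
To scope the migration, it may help to first find which benchmark datasets still ship a loading script. The sketch below is only an illustration: it assumes a loading script is a root-level `.py` file named after the dataset repo, and the helper `find_loading_script` and the repo listings are hypothetical. In practice the file lists could come from `huggingface_hub.list_repo_files(repo_id, repo_type="dataset")`.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def find_loading_script(repo_id: str, filenames: list[str]) -> str | None:
    """Return the root-level loading script of a dataset repo, or None.

    Assumption: a loading script is a file called `<dataset_name>.py`
    located at the root of the dataset repo.
    """
    dataset_name = repo_id.split("/")[-1]
    expected = f"{dataset_name}.py"
    for name in filenames:
        path = PurePosixPath(name)
        # Only root-level files count; helper modules in subfolders are ignored.
        if len(path.parts) == 1 and path.name == expected:
            return name
    return None


# Hypothetical repo listings, hard-coded so the example runs offline.
repos = {
    "example-org/code_bench": ["README.md", "code_bench.py"],
    "example-org/qa_bench": ["README.md", "data/test-00000-of-00001.parquet"],
}

# Repos that still rely on a script are candidates for Parquet conversion.
needs_conversion = [
    repo_id for repo_id, files in repos.items()
    if find_loading_script(repo_id, files) is not None
]
print(needs_conversion)
```

Each repo in `needs_conversion` would then be a candidate for the `convert_to_parquet` command above.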