
Releases: evaleval/every_eval_ever

v0.2.2

27 Mar 08:42
411f60b


What's Changed

  • fix: handle missing metadata in Inspect adapter by @mrshu in #53
  • added handling for Inspect eval str only sandbox args by @joeda in #55
  • Add metric identity and deterministic aggregate-instance linkage fields by @yananlong in #52
  • Update HELM leaderboards parser for schema v0.2.1 by @damian1996 in #56
  • fix: handle list[ChatMessage] input in inspect instance level adapter by @SirGankalot in #62
  • Add different file uuid for directory mode in inspect converter by @damian1996 in #61
  • Move non-core dependencies to optional extras by @j-chim in #71
  • Add eval_library into HELM leaderboards scraper by @damian1996 in #73
  • [New Feature] Moving validation from jsonschema -> pydantic + github action to regenerate datamodels and apply post-modification upon schema modification by @nelaturuharsha in #69
  • fix action uv.lock by @nelaturuharsha in #74
  • chore: regenerate Pydantic types by @github-actions[bot] in #80
  • Refactor code into every_eval_ever namespace and add modal CLI by @Erotemic in #57
  • Adapter fixes by @nelaturuharsha in #85

New Contributors

Full Changelog: v0.2.1...v0.2.2

v0.2.1

19 Feb 20:11
0ce59f1


What's Changed

Main Schema Changes

Breaking Changes

  • Added a mandatory EvalLibrary field to capture the evaluation library and related information.
  • Renamed interactions (and related uses of the term) to messages, for consistency with general terminology, based on feedback.
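The rename above can be handled with a small migration shim when updating stored records. This is a hypothetical sketch: the record layout is an assumption, and only the interactions-to-messages rename comes from these notes.

```python
def migrate_interactions(record: dict) -> dict:
    """Rename the 'interactions' key to 'messages' (v0.2.0 -> v0.2.1).

    The surrounding record shape is illustrative; only the key rename
    itself is taken from the release notes.
    """
    out = dict(record)
    if "interactions" in out:
        out["messages"] = out.pop("interactions")
    return out


# Usage: a v0.2.0-style record with the old key name.
old_record = {"interactions": [{"role": "user", "content": "Hi"}]}
new_record = migrate_interactions(old_record)
```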

Non-breaking Changes

  • Cleaned up ambiguous types that were breaking Parquet conversion (and the Hugging Face viewer).
  • Clarified the scope of additional_details, which is now available on judge_config, metric_config, source_metadata, detailed_evaluation_results, and eval_library.
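The scoping change can be pictured with a sketch of where additional_details now lives. Only the five field names listed above come from the release notes; every value inside them is an illustrative assumption, not the actual schema.

```python
# Hypothetical record showing each scope that now carries its own
# additional_details dict. The contents of each dict are made up for
# illustration; only the scope names come from the release notes.
record = {
    "metric_config": {"additional_details": {"normalization": "exact_match"}},
    "judge_config": {"additional_details": {"prompt_version": "v3"}},
    "source_metadata": {"additional_details": {}},
    "detailed_evaluation_results": [{"additional_details": {}}],
    "eval_library": {"additional_details": {}},
}

# Collect the top-level dict scopes that expose additional_details.
scopes_with_details = sorted(
    key for key, value in record.items()
    if isinstance(value, dict) and "additional_details" in value
)
```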

Other Changes

  • Added a post_codegen step to modify the generated datamodels, since code generation alone didn't capture additional constraints.

Full Changelog: v0.2.0...v0.2.1

v0.2.0 Update

17 Feb 13:57
18b7903


What's Changed

Main Schema Changes

Breaking Changes

  • Moves source_data into individual evaluation_results to allow different sources per metric/result

Non-breaking Changes

Multiple new properties were defined including:

  • Uncertainty measures on scores (e.g. Confidence Intervals) in score_details
  • A schema for LLM-as-a-Judge evaluations to the metric_config
  • A schema for 'agentic' properties to the generation config (agentic_eval_config in generation_config)
  • A top-level evaluation_timestamp
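Taken together with the breaking change above, a v0.2.0-shaped record might look like the following. This is a hedged sketch: the property locations (per-result source_data, score_details uncertainty, judge schema on metric_config, agentic_eval_config in generation_config, top-level evaluation_timestamp) come from these notes, but every inner field name and value is an illustrative assumption.

```python
from datetime import datetime, timezone

# Hypothetical v0.2.0-style record; inner field names such as
# "lower"/"upper" are assumptions, not the actual schema.
result = {
    # New top-level timestamp.
    "evaluation_timestamp": datetime(2025, 2, 17, tzinfo=timezone.utc).isoformat(),
    "evaluation_results": [
        {
            # source_data now sits inside each result (breaking change),
            # so different results can cite different sources.
            "source_data": {"name": "example-source"},
            "score_details": {
                "score": 0.81,
                # Uncertainty measure, e.g. a 95% confidence interval.
                "confidence_interval": {"lower": 0.78, "upper": 0.84, "level": 0.95},
            },
        }
    ],
    "metric_config": {
        # Sketch of the LLM-as-a-Judge schema slot.
        "llm_judge": {"judge_model": "example-judge", "rubric": "example rubric"},
    },
    "generation_config": {
        # Sketch of the agentic properties slot.
        "agentic_eval_config": {"max_turns": 10, "tools_enabled": True},
    },
}

score_details = result["evaluation_results"][0]["score_details"]
ci = score_details["confidence_interval"]
```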

Other Changes

  • Adds instance-level schema for storing individual results
  • Moves data from this repo to a dedicated Hugging Face dataset
  • New adapter for lm-eval-harness
  • Improved parsers and adapters for HELM and Inspect

Full Changelog: v0.1.0...v0.2.0

v0.1.0 Schema Update

16 Dec 14:07
4d807d0


This release bumps the schema version from v0.0.1 to v0.1.0 and introduces three updates:

  1. Moves evaluation_source under source_metadata
  2. Changes source_type enum values to 'documentation' and 'evaluation_run'
  3. Adds two additional common generation parameters: 'reasoning' and 'execution_command'
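The three updates can be sketched as a record fragment. The nesting of evaluation_source under source_metadata, the two source_type enum values, and the reasoning and execution_command parameter names come from the notes above; the container names and values around them are illustrative assumptions.

```python
# The two source_type values introduced in v0.1.0.
ALLOWED_SOURCE_TYPES = {"documentation", "evaluation_run"}

# Hypothetical v0.1.0-shaped fragment; only the moves/renames listed in
# the release notes are schema facts, the rest is illustrative.
record = {
    "source_metadata": {
        # evaluation_source now lives under source_metadata (update 1).
        "evaluation_source": {"name": "example-harness"},
        "source_type": "evaluation_run",  # update 2
    },
    # Assumed container for the common generation parameters (update 3).
    "generation_config": {
        "reasoning": True,
        "execution_command": "python run_eval.py",
    },
}
```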