Releases: evaleval/every_eval_ever
v0.2.2
What's Changed
- fix: handle missing metadata in Inspect adapter by @mrshu in #53
- added handling for Inspect eval str only sandbox args by @joeda in #55
- Add metric identity and deterministic aggregate-instance linkage fields by @yananlong in #52
- Update HELM leaderboards parser for schema v0.2.1 by @damian1996 in #56
- fix: handle list[ChatMessage] input in inspect instance level adapter by @SirGankalot in #62
- Add different file uuid for directory mode in inspect converter by @damian1996 in #61
- Move non-core dependencies to optional extras by @j-chim in #71
- Add eval_library into HELM leaderboards scraper by @damian1996 in #73
- [New Feature] Moving validation from jsonschema -> pydantic + github action to regenerate datamodels and apply post-modification upon schema modification by @nelaturuharsha in #69
- fix action uv.lock by @nelaturuharsha in #74
- chore: regenerate Pydantic types by @github-actions[bot] in #80
- Refactor code into every_eval_ever namespace and add modal CLI by @Erotemic in #57
- Adapter fixes by @nelaturuharsha in #85
New Contributors
- @joeda made their first contribution in #55
- @yananlong made their first contribution in #52
- @SirGankalot made their first contribution in #62
- @j-chim made their first contribution in #71
- @github-actions[bot] made their first contribution in #80
- @Erotemic made their first contribution in #57
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
Main Schema Changes
Breaking Changes
- Added a mandatory `EvalLibrary` to capture the evaluation library and related information.
- Renamed `interactions` (and related uses of the word 'interactions') to `messages`, based on feedback, for consistency with general terminology.
Non-breaking Changes
- Cleaned up ambiguous types that were breaking Parquet conversion (and the Hugging Face viewer).
- The scope of `additional_details` is replaced and is now clearer; it was added to `judge_config`, `metric_config`, `source_metadata`, `detailed_evaluation_results`, and `eval_library`.
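To illustrate the clarified `additional_details` scope, here is a minimal sketch of a record carrying scoped free-form details. The config names come from the release notes above; the values inside each `additional_details` mapping are illustrative, not taken from the actual schema.

```python
# Hypothetical sketch of where `additional_details` now lives (v0.2.1).
# The inner key/value pairs are illustrative examples only.
record = {
    "metric_config": {
        "additional_details": {"normalization": "exact_match"},
    },
    "judge_config": {
        "additional_details": {"judge_prompt_version": "v3"},
    },
    "source_metadata": {
        "additional_details": {"retrieved_via": "leaderboard scrape"},
    },
    "eval_library": {
        "additional_details": {"commit": "abc123"},
    },
}

# Each config scopes its own open-ended mapping, so extra information
# no longer collides with typed fields at a single top level.
scoped = [k for k, v in record.items() if "additional_details" in v]
```

Scoping the mapping per config keeps the typed schema strict while leaving one clearly labeled escape hatch in each place that needs it.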
Other Changes
- Added `post_codegen` to modify the generated datamodels, as they didn't capture additional constraints.
- Fix dead data/ links, point to HuggingFace datastore by @janbatzner in #49
- [Docs] Add PR Naming Convention Section to README.md by @andrewtran117 in #50
- Schema -> 0.2.1 by @nelaturuharsha in #51
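The jsonschema-to-pydantic move (#69) means invalid records fail at model construction rather than in a separate validation pass. A minimal sketch, assuming a hypothetical hand-written stand-in for the generated datamodels (the real models are regenerated from the schema by a GitHub Action):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


# Hypothetical stand-in: a mandatory `name` mirrors the new mandatory
# EvalLibrary; the real generated model will differ in its fields.
class EvalLibrary(BaseModel):
    name: str
    version: Optional[str] = None


# Valid construction succeeds and yields a typed object.
ok = EvalLibrary(name="lm-eval-harness", version="0.4.2")

# Omitting the mandatory field raises at construction time.
try:
    EvalLibrary(version="0.4.2")
    errors = []
except ValidationError as e:
    errors = e.errors()
```

With pydantic, validation and parsing are one step, so downstream adapter code can rely on attribute access instead of re-checking dict keys.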
Full Changelog: v0.2.0...v0.2.1
v0.2.0 Update
What's Changed
Main Schema Changes
Breaking Changes
- Moves `source_data` into individual `evaluation_results` to allow different sources per metric/result
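The `source_data` move can be sketched as a before/after pair. This is a hypothetical shape; the field contents and sibling keys are illustrative, not taken from the schema.

```python
# Before v0.2.0: one `source_data` for the whole record (illustrative).
before = {
    "source_data": {"url": "https://example.org/run-1"},
    "evaluation_results": [
        {"metric": "accuracy", "score": 0.81},
        {"metric": "f1", "score": 0.77},
    ],
}

# After v0.2.0: each result carries its own `source_data`, so metrics
# sourced from different runs or documents can coexist in one record.
after = {
    "evaluation_results": [
        {"metric": "accuracy", "score": 0.81,
         "source_data": {"url": "https://example.org/run-1"}},
        {"metric": "f1", "score": 0.77,
         "source_data": {"url": "https://example.org/run-2"}},
    ],
}
```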
Non-breaking Changes
Multiple new properties were defined, including:
- Uncertainty measures on scores (e.g. confidence intervals) in `score_details`
- A schema for LLM-as-a-Judge evaluations in the `metric_config`
- A schema for 'agentic' properties in the generation config (`agentic_eval_config` in `generation_config`)
- A top-level `evaluation_timestamp`
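The new properties above can be sketched in a single hypothetical record. Only the top-level names (`evaluation_timestamp`, `generation_config`, `agentic_eval_config`, `metric_config`, `score_details`) come from the notes; nested field names such as `confidence_interval` are illustrative assumptions.

```python
# Hypothetical sketch of the v0.2.0 additions; nested field names are
# illustrative, not taken from the actual schema.
result = {
    "evaluation_timestamp": "2024-11-01T12:00:00Z",  # new top-level field
    "generation_config": {
        "agentic_eval_config": {"max_turns": 10},    # 'agentic' properties
    },
    "metric_config": {
        "llm_judge": {"model": "judge-model-name"},  # LLM-as-a-Judge schema
    },
    "score_details": {
        "score": 0.81,
        # Uncertainty measure attached directly to the score.
        "confidence_interval": {"low": 0.78, "high": 0.84, "level": 0.95},
    },
}

ci = result["score_details"]["confidence_interval"]
```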
Other Changes
- Adds instance-level schema for storing individual results
- Moves data from this repo to a dedicated Hugging Face dataset
- New adapter for `lm-eval-harness`
- Improved parsers and adapters for HELM and Inspect
Full Changelog: v0.1.0...v0.2.0
v0.1.0 Schema Update
This release bumps the schema version from v0.0.1 to v0.1.0 and introduces three updates:
- Moves `evaluation_source` under `source_metadata`
- Changes `source_type` enum values to 'documentation' and 'evaluation_run'
- Adds two additional common generation parameters: `reasoning` and `execution_command`
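The three v0.1.0 updates can be sketched together in one hypothetical record; the field names come from the notes above, while the values are illustrative.

```python
# Hypothetical sketch of the v0.1.0 layout; values are illustrative.
record = {
    "source_metadata": {
        "evaluation_source": "HELM",          # now nested under source_metadata
        "source_type": "evaluation_run",      # or "documentation"
    },
    "generation_config": {
        "reasoning": True,                    # new common parameter
        "execution_command": "example-eval --model some-model",  # illustrative
    },
}
```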