
Releases: evaleval/every_eval_ever

v0.2.2

27 Mar 08:42
411f60b


What's Changed

  • fix: handle missing metadata in Inspect adapter by @mrshu in #53
  • added handling for Inspect eval str only sandbox args by @joeda in #55
  • Add metric identity and deterministic aggregate-instance linkage fields by @yananlong in #52
  • Update HELM leaderboards parser for schema v0.2.1 by @damian1996 in #56
  • fix: handle list[ChatMessage] input in inspect instance level adapter by @SirGankalot in #62
  • Add different file uuid for directory mode in inspect converter by @damian1996 in #61
  • Move non-core dependencies to optional extras by @j-chim in #71
  • Add eval_library into HELM leaderboards scraper by @damian1996 in #73
  • [New Feature] Moving validation from jsonschema -> pydantic + github action to regenerate datamodels and apply post-modification upon schema modification by @nelaturuharsha in #69
  • fix action uv.lock by @nelaturuharsha in #74
  • chore: regenerate Pydantic types by @github-actions[bot] in #80
  • Refactor code into every_eval_ever namespace and add modal CLI by @Erotemic in #57
  • Adapter fixes by @nelaturuharsha in #85

New Contributors

Full Changelog: v0.2.1...v0.2.2

v0.2.1

19 Feb 20:11
0ce59f1


What's Changed

Main Schema Changes

Breaking Changes

  • Added a mandatory EvalLibrary field to capture the evaluation library and related information.
  • Renamed interactions (and related uses of the term) to messages, for consistency with general terminology, based on feedback.
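The rename above can be handled with a small migration shim when updating stored records. This is a hypothetical sketch: the record layout is an assumption, and only the interactions-to-messages rename comes from these notes.

```python
def migrate_interactions(record: dict) -> dict:
    """Rename the 'interactions' key to 'messages' (v0.2.0 -> v0.2.1).

    The surrounding record shape is illustrative; only the key rename
    itself is taken from the release notes.
    """
    out = dict(record)
    if "interactions" in out:
        out["messages"] = out.pop("interactions")
    return out


# Usage: a v0.2.0-style record with the old key name.
old_record = {"interactions": [{"role": "user", "content": "Hi"}]}
new_record = migrate_interactions(old_record)
```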

Non-breaking Changes

  • Cleaned up ambiguous types that were breaking Parquet conversion (and the Hugging Face viewer).
  • Clarified the scope of additional_details, which is now available on judge_config, metric_config, source_metadata, detailed_evaluation_results, and eval_library.
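The scoping change can be pictured with a sketch of where additional_details now lives. Only the five field names listed above come from the release notes; every value inside them is an illustrative assumption, not the actual schema.

```python
# Hypothetical record showing each scope that now carries its own
# additional_details dict. The contents of each dict are made up for
# illustration; only the scope names come from the release notes.
record = {
    "metric_config": {"additional_details": {"normalization": "exact_match"}},
    "judge_config": {"additional_details": {"prompt_version": "v3"}},
    "source_metadata": {"additional_details": {}},
    "detailed_evaluation_results": [{"additional_details": {}}],
    "eval_library": {"additional_details": {}},
}

# Collect the top-level dict scopes that expose additional_details.
scopes_with_details = sorted(
    key for key, value in record.items()
    if isinstance(value, dict) and "additional_details" in value
)
```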

Other Changes

  • Added a post_codegen step to modify the generated datamodels, since code generation alone didn't capture additional constraints.

Full Changelog: v0.2.0...v0.2.1

v0.2.0 Update

17 Feb 13:57
18b7903


What's Changed

Main Schema Changes

Breaking Changes

  • Moves source_data into individual evaluation_results to allow different sources per metric/result

Non-breaking Changes

Multiple new properties were defined including:

  • Uncertainty measures on scores (e.g. Confidence Intervals) in score_details
  • A schema for LLM-as-a-Judge evaluations to the metric_config
  • A schema for 'agentic' properties to the generation config (agentic_eval_config in generation_config)
  • A top-level evaluation_timestamp
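Taken together with the breaking change above, a v0.2.0-shaped record might look like the following. This is a hedged sketch: the property locations (per-result source_data, score_details uncertainty, judge schema on metric_config, agentic_eval_config in generation_config, top-level evaluation_timestamp) come from these notes, but every inner field name and value is an illustrative assumption.

```python
from datetime import datetime, timezone

# Hypothetical v0.2.0-style record; inner field names such as
# "lower"/"upper" are assumptions, not the actual schema.
result = {
    # New top-level timestamp.
    "evaluation_timestamp": datetime(2025, 2, 17, tzinfo=timezone.utc).isoformat(),
    "evaluation_results": [
        {
            # source_data now sits inside each result (breaking change),
            # so different results can cite different sources.
            "source_data": {"name": "example-source"},
            "score_details": {
                "score": 0.81,
                # Uncertainty measure, e.g. a 95% confidence interval.
                "confidence_interval": {"lower": 0.78, "upper": 0.84, "level": 0.95},
            },
        }
    ],
    "metric_config": {
        # Sketch of the LLM-as-a-Judge schema slot.
        "llm_judge": {"judge_model": "example-judge", "rubric": "example rubric"},
    },
    "generation_config": {
        # Sketch of the agentic properties slot.
        "agentic_eval_config": {"max_turns": 10, "tools_enabled": True},
    },
}

score_details = result["evaluation_results"][0]["score_details"]
ci = score_details["confidence_interval"]
```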

Other Changes

  • Adds instance-level schema for storing individual results
  • Moves data from this repo to a dedicated Hugging Face dataset
  • New adapter for lm-eval-harness
  • Improved parsers and adapters for HELM and Inspect

Full Changelog: v0.1.0...v0.2.0

v0.1.0 Schema Update

16 Dec 14:07
4d807d0


This release bumps the schema version from v0.0.1 to v0.1.0 and introduces three updates:

  1. Moves evaluation_source under source_metadata
  2. Changes source_type enum values to 'documentation' and 'evaluation_run'
  3. Adds two additional common generation parameters: 'reasoning' and 'execution_command'
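The three updates can be sketched as a record fragment. The nesting of evaluation_source under source_metadata, the two source_type enum values, and the reasoning and execution_command parameter names come from the notes above; the container names and values around them are illustrative assumptions.

```python
# The two source_type values introduced in v0.1.0.
ALLOWED_SOURCE_TYPES = {"documentation", "evaluation_run"}

# Hypothetical v0.1.0-shaped fragment; only the moves/renames listed in
# the release notes are schema facts, the rest is illustrative.
record = {
    "source_metadata": {
        # evaluation_source now lives under source_metadata (update 1).
        "evaluation_source": {"name": "example-harness"},
        "source_type": "evaluation_run",  # update 2
    },
    # Assumed container for the common generation parameters (update 3).
    "generation_config": {
        "reasoning": True,
        "execution_command": "python run_eval.py",
    },
}
```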