@clefourrier clefourrier commented Aug 18, 2025

This PR does several things.

  1. Constrain the Metric object creation.
    We go from
    f1_score_macro = CorpusLevelMetric(
        metric_name="f1",
        sample_level_fn=GenerativePreparator().prepare,
        category=SamplingMethod.GENERATIVE,
        corpus_level_fn=CorpusLevelF1Score(average="macro").compute,
        higher_is_better=True,
    )

to

    f1_score_macro = CorpusLevelMetric(
        metric_name="f1",
        sample_level_fn=GenerativePreparator(),
        category=SamplingMethod.GENERATIVE,
        corpus_level_fn=CorpusLevelF1Score(average="macro"),
        higher_is_better=True,
    )
  • sample_level_fn must be an instance of a class deriving from either SampleLevelComputation or Preparator. The former must implement a compute method, the latter a prepare method.
  • corpus_level_fn is either a plain function (np.mean and so on) or a CorpusLevelComputation, which must implement a compute_corpus method.
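
The contract described in the two bullets above can be sketched as follows. This is only an illustration under assumptions, not the code from this PR: the class names SampleLevelComputation, Preparator and CorpusLevelComputation and the method names compute, prepare and compute_corpus come from the description, while ToyMetric, ExactMatch, PairPreparator, Accuracy and every method signature are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Union


class SampleLevelComputation(ABC):
    """Scores a single sample (signature is an assumption)."""

    @abstractmethod
    def compute(self, prediction: str, reference: str) -> float: ...


class Preparator(ABC):
    """Prepares a single sample for a later corpus-level computation."""

    @abstractmethod
    def prepare(self, prediction: str, reference: str) -> tuple: ...


class CorpusLevelComputation(ABC):
    """Aggregates all per-sample items into one score."""

    @abstractmethod
    def compute_corpus(self, items: list) -> float: ...


# Hypothetical concrete classes, used only to exercise the contract.
class ExactMatch(SampleLevelComputation):
    def compute(self, prediction, reference):
        return float(prediction.strip() == reference.strip())


class PairPreparator(Preparator):
    def prepare(self, prediction, reference):
        return (prediction, reference)


class Accuracy(CorpusLevelComputation):
    def compute_corpus(self, items):
        return mean(float(p == r) for p, r in items)


@dataclass
class ToyMetric:
    """Hypothetical stand-in for a metric object that enforces the contract."""

    metric_name: str
    sample_level_fn: Union[SampleLevelComputation, Preparator]
    corpus_level_fn: Union[CorpusLevelComputation, Callable]
    higher_is_better: bool = True

    def __post_init__(self):
        # Reject bound methods such as ExactMatch().compute: an instance is required.
        if not isinstance(self.sample_level_fn, (SampleLevelComputation, Preparator)):
            raise TypeError("sample_level_fn must be a SampleLevelComputation or Preparator instance")
        # Accept either a CorpusLevelComputation instance or any plain callable.
        if not (isinstance(self.corpus_level_fn, CorpusLevelComputation) or callable(self.corpus_level_fn)):
            raise TypeError("corpus_level_fn must be a CorpusLevelComputation or a callable")
```

With this sketch, `ToyMetric("em", ExactMatch(), mean)` is accepted, whereas passing the bound method `ExactMatch().compute` (the old style) raises a TypeError.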
  2. All metrics with a parametrizable sample_level_fn can now be parametrized from the CLI call, for example "lighteval|math_500@k=1|0|0" (users can also pass normalization function names, provided they are correctly defined in the normalization file). Corpus-level parametrization is not supported, but could probably be added if we choose another symbol. This parametrization of Metrics.MyMetric relies on a trick: making the enum callable.

  3. The metric list has been simplified to remove duplicate metrics that differed only in some parameters.

  4. All tasks have therefore been changed to use the new metric names.

  5. The test suite has been updated.
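
To illustrate the CLI parametrization and the callable-enum trick mentioned above, here is a minimal sketch. It is not the implementation from this PR: PassAtK, ToyMetric, parse_task_spec, the comma separator for multiple parameters, and the way the overridden metric is renamed are all assumptions made for the example. Only the "@k=1" syntax and the idea of calling a Metrics enum member come from the description.

```python
from dataclasses import dataclass, replace
from enum import Enum


@dataclass(frozen=True)
class PassAtK:
    """Hypothetical parametrizable sample-level computation."""

    k: int = 1


@dataclass(frozen=True)
class ToyMetric:
    """Hypothetical metric holder with a default sample-level function."""

    metric_name: str
    sample_level_fn: PassAtK


class Metrics(Enum):
    pass_at_k = ToyMetric("pass_at_k", PassAtK())

    def __call__(self, **params):
        # Calling a member returns a copy of its default metric with the
        # sample-level parameters overridden; the member itself is unchanged.
        metric = self.value
        new_fn = replace(metric.sample_level_fn, **params)
        suffix = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
        new_name = f"{metric.metric_name}:{suffix}" if suffix else metric.metric_name
        return replace(metric, metric_name=new_name, sample_level_fn=new_fn)


def parse_task_spec(spec: str):
    """Split a spec such as 'lighteval|math_500@k=1|0|0' (layout assumed)."""
    suite, task, *rest = spec.split("|")
    name, _, raw_params = task.partition("@")
    params = {}
    # Assumption: several parameters would be comma-separated.
    for pair in filter(None, raw_params.split(",")):
        key, _, value = pair.partition("=")
        params[key] = int(value) if value.isdigit() else value
    return suite, name, params, rest


suite, name, params, rest = parse_task_spec("lighteval|math_500@k=1|0|0")
custom_metric = Metrics.pass_at_k(**params)
```

Defining `__call__` in the enum body makes each member callable without affecting `Metrics(value)` lookups, which is what lets a name like Metrics.MyMetric double as a factory for parametrized variants.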

… all old evals to the new format and figure out how to provide sane defaults
@clefourrier clefourrier marked this pull request as draft August 18, 2025 15:35
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@clefourrier clefourrier marked this pull request as ready for review August 20, 2025 09:45
@clefourrier clefourrier requested review from NathanHB and lewtun August 20, 2025 10:25

@NathanHB NathanHB left a comment

First round of review: a few changes to make, but overall so much better. Great PR!

@clefourrier clefourrier requested a review from NathanHB August 20, 2025 14:34
@clefourrier clefourrier merged commit 52d3d33 into main Aug 25, 2025
5 checks passed
3 participants