Benchmark metadata content: what fields and categories should we include? #2

@arishofmann

This issue is for collecting ideas on what metadata should be part of the BenchmarkCards, beyond what the schema currently covers. Open for everyone to contribute.

Two things came up in the last meeting that are currently missing from the cards:

Capabilities vs risks categorization

The cards currently include risk mappings but don't classify benchmarks at a higher level into capabilities vs risks. This was flagged as important for the EvalCard frontend, where people want to see benchmarks organized by what they actually measure. A few starting points were mentioned in the meeting (a field sketch follows the list):

  • The Eval Factsheets categorization from the Meta paper
  • The IBM capabilities taxonomy already in AI Atlas Nexus
  • The clustering approach from the survey paper that groups benchmarks by what they measure
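
To make the discussion concrete, here is a minimal sketch of what such a top-level field could look like, assuming a dataclass-style schema. The field names, allowed values, and taxonomy labels are placeholders for discussion, not part of the current BenchmarkCard schema:

```python
# Hypothetical sketch of a top-level categorization field.
# Names and values below are placeholders, not the current schema.
from dataclasses import dataclass
from typing import Literal

@dataclass
class BenchmarkCategorization:
    # High-level split: does the benchmark measure a capability or a risk?
    kind: Literal["capability", "risk"]
    # Finer-grained label from an agreed taxonomy,
    # e.g. "code generation" or "harmful content".
    category: str
    # Which taxonomy the category comes from (Eval Factsheets, IBM
    # capabilities taxonomy, survey clustering, ...), so that values
    # stay comparable across cards.
    taxonomy: str = "unspecified"
```

Recording the source taxonomy explicitly would avoid silently mixing labels from different schemes when cards are aggregated in the frontend.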

Domain taxonomy

Some benchmarks are domain-specific (medical, legal, code, etc.). HF dataset cards sometimes carry this information, but not consistently. Anna's BenchNavigator dataset might already cover a lot of this; a sketch of a possible field follows.
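
As a strawman, a domain field could be a controlled vocabulary validated at card-creation time. The value list below is made up for illustration and would need to be aligned with a real taxonomy (e.g. whatever BenchNavigator uses):

```python
# Illustrative domain vocabulary; the entries are placeholders and
# would need to be replaced by an agreed taxonomy.
DOMAINS = {"general", "medical", "legal", "code", "finance", "education"}

def validate_domains(domains: list[str]) -> list[str]:
    # Reject labels outside the agreed vocabulary so cards stay queryable.
    unknown = [d for d in domains if d not in DOMAINS]
    if unknown:
        raise ValueError(f"Unknown domain labels: {unknown}")
    return domains

card_metadata = {"domains": validate_domains(["medical"])}
```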

How to contribute

If you have ideas for fields, categories, or taxonomies that should be included, please comment here. Help is welcome on:

  • Reviewing what's already in the BenchmarkCard schema and mapping it against what's missing
  • Exploring what's feasible to populate automatically from existing sources (Anna's dataset, HF metadata, IBM taxonomies); see the sketch after this list
  • Proposing additional metadata fields that would be useful
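
As one feasibility probe, a few lines with the huggingface_hub library show what an existing HF dataset card already exposes. This is a sketch, assuming huggingface_hub is installed; the repo id is a placeholder, not a specific benchmark:

```python
# Sketch of pulling existing metadata from a Hugging Face dataset card.
# Requires `pip install huggingface_hub`; the repo id is hypothetical.
from huggingface_hub import DatasetCard

card = DatasetCard.load("some-org/some-benchmark")  # placeholder repo id
data = card.data.to_dict()

# Fields like task_categories or language are only sometimes filled in,
# which is exactly the consistency gap noted above.
print(data.get("task_categories"))
print(data.get("language"))
```

Running something like this across a sample of benchmark datasets would show which proposed fields can be auto-populated and which need manual curation.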
