Benchmark metadata content: what fields and categories should we include? #2

This issue collects ideas on what metadata the BenchmarkCards should include beyond what the schema currently covers. Everyone is welcome to contribute.

The last meeting identified two things that are currently missing from the cards:

## Capabilities vs risks categorization

The cards currently include risk mappings, but they don't classify benchmarks at a higher level into capabilities vs risks. This was flagged as important for the EvalCard frontend, where people want to see benchmarks organized by what they actually measure. A few starting points were mentioned in the meeting:

- The Eval Factsheets categorization from the Meta paper
- The IBM capabilities taxonomy already in AI Atlas Nexus
- The clustering approach from the survey paper that groups benchmarks by what they measure

## Domain taxonomy

Some benchmarks are domain-specific (medical, legal, code, etc.). Hugging Face (HF) dataset cards sometimes include this information, but not consistently. Anna's BenchNavigator dataset might already cover much of it.

## How to contribute

If you have ideas for fields, categories, or taxonomies that should be included, please comment here. Help is welcome on:

- Reviewing what's already in the BenchmarkCard schema and identifying what's missing
- Exploring what's feasible to populate automatically from existing sources (Anna's dataset, HF metadata, IBM taxonomies)
- Proposing additional metadata fields that would be useful
