CapaBench: Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents

🔥 News

[2025.2.19] We have released the GitHub Page.

📖 Overview

Modular architectures in Large Language Model (LLM) agents integrate components like planning, reasoning, and reflection, yet quantifying their individual contributions remains challenging. We introduce CapaBench, a Shapley Value-based evaluation framework that systematically measures capability modules' marginal impacts. With 1,000+ multi-domain task scenarios, CapaBench enables combinatorial analysis through module substitution and interaction testing.
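As a rough illustration of the Shapley Value idea above, the sketch below computes exact Shapley values for a small set of capability modules. It is not the CapaBench implementation. The function name `shapley_values`, the module names, and the success rates in `example_rates` are all hypothetical, and it assumes that the value of a module subset is the task success rate observed when exactly those modules are substituted with the model under test.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    modules: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each module under the coalition value function `value`."""
    n = len(modules)
    phi = {m: 0.0 for m in modules}
    for m in modules:
        others = [x for x in modules if x != m]
        for k in range(n):
            # Weight of a coalition of size k that does not contain m.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal impact of adding module m to coalition s.
                phi[m] += weight * (value(s | {m}) - value(s))
    return phi


# Hypothetical success rates for every combination of substituted modules
# (made-up numbers for illustration only, not results from the paper).
example_rates = {
    frozenset(): 0.20,
    frozenset({"planning"}): 0.35,
    frozenset({"reasoning"}): 0.40,
    frozenset({"reflection"}): 0.22,
    frozenset({"planning", "reasoning"}): 0.60,
    frozenset({"planning", "reflection"}): 0.38,
    frozenset({"reasoning", "reflection"}): 0.45,
    frozenset({"planning", "reasoning", "reflection"}): 0.66,
}

example_modules = ["planning", "reasoning", "reflection"]
example_phi = shapley_values(example_modules, lambda s: example_rates[s])

for name, contribution in example_phi.items():
    print(f"{name}: {contribution:.4f}")
```

The contributions sum to the gap between the all-modules success rate and the no-modules baseline, which is what lets a gain be attributed to individual modules. Exact computation needs all 2^n module combinations, so it is only practical for a small number of modules.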

📊 Data

Part of CapaBench is open-source. We also release the full evaluation results for the models reported in the paper.

The remaining part of CapaBench is not open-source. For each benchmark, we provide 5 problems and 1 trajectory per problem as examples.

📝 How to Evaluate

Part of CapaBench is open-source, and evaluation instructions for it are coming soon!

📑 Citation

If you find our work useful, please cite us!

@misc{yang2025whosmvpgametheoreticevaluation,
      title={Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents}, 
      author={Yingxuan Yang and Bo Huang and Siyuan Qi and Chao Feng and Haoyi Hu and Yuxuan Zhu and Jinbo Hu and Haoran Zhao and Ziyi He and Xiao Liu and Zongyu Wang and Lin Qiu and Xuezhi Cao and Xunliang Cai and Yong Yu and Weinan Zhang},
      year={2025},
      eprint={2502.00510},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.00510}, 
}

📧 Contact Us

If you have any questions, please feel free to contact us by email at [email protected] or [email protected].
