CapaBench: Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents
- [2025.2.19] We have released the GitHub Page.
Modular architectures in Large Language Model (LLM) agents integrate components like planning, reasoning, and reflection, yet quantifying their individual contributions remains challenging. We introduce CapaBench, a Shapley Value-based evaluation framework that systematically measures capability modules' marginal impacts. With 1,000+ multi-domain task scenarios, CapaBench enables combinatorial analysis through module substitution and interaction testing.
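For intuition, here is a minimal, illustrative sketch of the attribution idea: treat each capability module as a player in a cooperative game and compute its exact Shapley value from the scores of all module combinations. The module names, toy coalition scores, and the `shapley_values` helper below are hypothetical placeholders rather than the repository's actual API; in CapaBench the coalition scores come from running the agent on the benchmark tasks with each module substitution.

```python
from itertools import combinations
from math import factorial

# Hypothetical module set; the actual CapaBench modules may differ.
MODULES = ["planning", "reasoning", "reflection"]


def shapley_values(value_fn, modules=MODULES):
    """Exact Shapley value of each module by enumerating all coalitions.

    `value_fn(coalition)` should return the agent's benchmark score when the
    modules in `coalition` are swapped to the model under test and the
    remaining modules keep a fixed default model.
    """
    n = len(modules)
    phi = {m: 0.0 for m in modules}
    for m in modules:
        others = [x for x in modules if x != m]
        for k in range(n):  # coalition sizes 0 .. n-1 among the other modules
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {m}) - value_fn(coalition)
                phi[m] += weight * marginal
    return phi


if __name__ == "__main__":
    # Toy coalition scores (task accuracy), purely for illustration.
    toy_scores = {
        frozenset(): 0.20,
        frozenset({"planning"}): 0.35,
        frozenset({"reasoning"}): 0.30,
        frozenset({"reflection"}): 0.22,
        frozenset({"planning", "reasoning"}): 0.50,
        frozenset({"planning", "reflection"}): 0.40,
        frozenset({"reasoning", "reflection"}): 0.33,
        frozenset({"planning", "reasoning", "reflection"}): 0.60,
    }
    print(shapley_values(lambda s: toy_scores[frozenset(s)]))
```

The weight `k!(n-k-1)!/n!` is the standard Shapley coefficient; the resulting values sum to the score gap between the full module set and the empty set, which is what makes the attribution additive across modules.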
- Part of CapaBench is open-source; we also release the full evaluation results of the models reported in the paper.
- The remaining part of CapaBench is not open-source; for each benchmark, we provide 5 example problems with 1 trajectory per problem.
- Additional open-source components are coming soon!
If you find our work useful, please cite us!
@misc{yang2025whosmvpgametheoreticevaluation,
  title={Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents},
  author={Yingxuan Yang and Bo Huang and Siyuan Qi and Chao Feng and Haoyi Hu and Yuxuan Zhu and Jinbo Hu and Haoran Zhao and Ziyi He and Xiao Liu and Zongyu Wang and Lin Qiu and Xuezhi Cao and Xunliang Cai and Yong Yu and Weinan Zhang},
  year={2025},
  eprint={2502.00510},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2502.00510},
}
If you have any questions, please feel free to contact us via email: [email protected] or [email protected].