Hello ClawArena Team,
I am interested in evaluating some agents on the ClawArena benchmark and would like to clarify a few details about the evaluation ecosystem:
- Leaderboard Existence: Is there currently a public leaderboard where results are officially tracked and ranked? If so, could you provide the link to it?
- Submission Policy: Is the benchmark open for external submissions? If it is, what is the standard procedure (e.g., via a specific repository, a web portal, or by opening a Pull Request with results)?
- Evaluation Protocol: Are there specific requirements or scripts we should use to ensure our local results are valid for a potential leaderboard entry?
Kind regards,
Breesiu