Hello ClawArena Team,
Thank you for your excellent work on this project. I am reaching out to ask about a few details regarding the evaluation setup:
- 12-scenario subset: Regarding the subset mentioned in Section 3.1 of the paper and the leaderboard image in README.md: which 12 specific samples constitute this subset, and is there a script or configuration file in the codebase that reproduces it?
- Dataset discrepancy: I noticed there are only 62 evaluation entries under data/clawarena/opencode/workspaces, while the paper states there are 64. Could you clarify whether some samples were deprecated, or whether I am looking in the wrong location?
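For reference, here is how I arrived at the count of 62. This is only a sketch of my method, assuming each top-level entry in the workspaces directory corresponds to one evaluation sample (the function name and layout assumption are mine, not from your codebase):

```python
from pathlib import Path

def count_workspaces(root: str) -> int:
    """Count top-level entries under a workspaces directory.

    Assumes one entry per evaluation sample; adjust if the repo
    nests samples differently.
    """
    return sum(1 for _ in Path(root).iterdir())
```

Running `count_workspaces("data/clawarena/opencode/workspaces")` on my checkout returns 62, hence the question above.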
Thank you for your time and assistance.