Hi authors, wonderful work!
I notice you have validated the world state model's performance on three datasets:
- AgentRewardBench: already open-sourced
- OSWorld-full trajectories
- Prof/Office trajectories
Since we are trying to make a fair comparison, it would help us a lot if you could release the last two evaluation datasets.