Could you share the evaluation scripts for unified vision-language models?

Great benchmarking work!

Would it be possible to release the evaluation and inference scripts for the recent unified vision-language models (e.g., Show-o and Anole)? They would be very helpful for reproducing your results and building on your work. Thanks!