Great benchmarking work!
Would it be possible to release the evaluation/inference scripts for the recent unified vision-language models (e.g., Show-o, Anole)? This would be very helpful for reproducing results and building on your work. Thanks!