Hi, thanks for your great work on UniPic2.0! I have a few questions regarding the training resources and time costs of the different stages:
For UniPic2-MetaQuery (Stage 1), when training the connector (with the MLLM and DiT frozen), how many computational resources (e.g., number and type of GPUs, total GPU hours) were used, and approximately how many days did it take?
For UniPic2-MetaQuery (Stage 2), when jointly fine-tuning the connector and SD3.5M-Kontext (the DiT), how many resources were used, and how long did this stage take?
Any details on the approximate training setup for both stages (e.g., number/type of GPUs, training days) would be very helpful.
Thanks!