Similar to #9, I'm also confused about why SigLIP2 isn't used in the code.
The paper says, 'Image understanding is performed using a SigLIP2 encoder to extract rich visual features, which are subsequently passed to an LLM for autoregressive text generation.'
However, the code in image2text.py uses VAE+MAR instead:
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/scripts/image2text.py#L64
Also, the loss calculation does not involve SigLIP2:
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/src/models/skywork_unipic_dev.py#L334
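For reference, here is a minimal sketch of the image-understanding path the paper seems to describe (SigLIP2 visual features projected into an LLM for autoregressive captioning). The checkpoint names, projector, and wiring below are my assumptions for illustration only, not the repo's actual code:

```python
# Sketch of the pipeline described in the paper: SigLIP2 patch features are
# projected into the LLM embedding space and decoded autoregressively.
# Checkpoint names and the projector are assumptions, not UniPic's code.
import torch
from torch import nn
from transformers import AutoProcessor, AutoModel, AutoModelForCausalLM, AutoTokenizer

vision_ckpt = "google/siglip2-base-patch16-224"   # assumed checkpoint name
llm_ckpt = "Qwen/Qwen2.5-1.5B"                    # assumed LLM backbone

processor = AutoProcessor.from_pretrained(vision_ckpt)
vision = AutoModel.from_pretrained(vision_ckpt).vision_model
tokenizer = AutoTokenizer.from_pretrained(llm_ckpt)
llm = AutoModelForCausalLM.from_pretrained(llm_ckpt)

# Linear projector mapping vision features into the LLM embedding space.
projector = nn.Linear(vision.config.hidden_size, llm.config.hidden_size)

@torch.no_grad()
def caption(image, prompt="Describe the image:"):
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    patch_feats = vision(pixel_values=pixel_values).last_hidden_state   # [1, N, D_vision]
    visual_embeds = projector(patch_feats)                              # [1, N, D_llm]

    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_embeds = llm.get_input_embeddings()(prompt_ids)

    # Prepend visual tokens to the text prompt and generate the caption.
    inputs_embeds = torch.cat([visual_embeds, prompt_embeds], dim=1)
    out = llm.generate(inputs_embeds=inputs_embeds, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

In contrast, the linked image2text.py and the loss at skywork_unipic_dev.py#L334 appear to go through VAE+MAR rather than anything like the above. Could you clarify which design the released code actually implements?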