
SigLIP2 in UniPic-1 #22

@wavinflaghxm

Description


Like #9, I'm confused about why SigLIP2 isn't used anywhere in the code.

The paper says, 'Image understanding is performed using a SigLIP2 encoder to extract rich visual features, which are subsequently passed to an LLM for autoregressive text generation.'

However, `image2text.py` uses VAE+MAR for image understanding instead:
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/scripts/image2text.py#L64

The loss computation also does not involve SigLIP2:
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/src/models/skywork_unipic_dev.py#L334
