ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (ECCV 2024)
This repository contains the evaluation dataset of image descriptions described in the paper. The dataset consists of paragraph-length descriptions of MS-COCO images, each paired with human-annotated judgments of the description's correctness relative to the corresponding image and of its creativity. The dataset is provided in standard JSON format.
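As a starting point, here is a minimal sketch of loading the annotations in Python; the file name ("descriptions.json") and the field names used below are illustrative assumptions, not the dataset's actual schema — inspect the JSON file itself for the real keys.

    import json

    # Load the annotation file. "descriptions.json" is a placeholder name;
    # substitute the JSON file shipped with this repository.
    with open("descriptions.json", "r", encoding="utf-8") as f:
        records = json.load(f)

    # Iterate over the records, assuming a list of per-description entries.
    # The keys "image_id" and "correctness" are hypothetical examples;
    # check the file for the actual field names.
    for record in records:
        print(record.get("image_id"), record.get("correctness"))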
The underlying images were selected from MS-COCO. The descriptions were generated by LLaVA, whose underlying LLM is based on Meta's LLaMA v1 (see the applicable license agreement).
See CONTRIBUTING for more information.
This dataset is licensed under the CC BY-NC 4.0 license; see the LICENSE file.
@inproceedings{yan-eccv2024-vigor,
    title = "ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling",
    author = "Yan, Siming and
      Bai, Min and
      Chen, Weifeng and
      Zhou, Xiong and
      Huang, Qixing and
      Li, Erran",
    booktitle = "Proceedings of the European Conference on Computer Vision (ECCV 2024)",
    year = "2024",
    url = "https://arxiv.org/abs/2402.06118",
}