amazon-science/vigor

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (ECCV 2024)

This repository contains the evaluation dataset of image descriptions described in the paper. The dataset consists of paragraph-length descriptions of MS-COCO images, along with human-annotated judgments of each description's correctness relative to its corresponding image and of its creativity. The dataset is provided in standard JSON format.
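Since the dataset is plain JSON, it can be inspected with the standard library. The record structure below (`image_id`, `description`, `correctness`, `creativity`) is a hypothetical illustration of how such annotations might be laid out, not the dataset's actual schema — a minimal loading sketch:

```python
import json

# Hypothetical sample in the dataset's general shape; the real
# field names and value ranges may differ from those shown here.
sample = """
[
  {"image_id": 139,
   "description": "A living room with a couch and a television.",
   "correctness": 1,
   "creativity": 0},
  {"image_id": 285,
   "description": "A bear standing on a rocky outcrop at dusk.",
   "correctness": 0,
   "creativity": 1}
]
"""

records = json.loads(sample)

# Keep only descriptions judged correct by the annotators.
correct = [r for r in records if r["correctness"] == 1]
print(len(correct))  # 1
```

In practice one would call `json.load()` on the dataset file shipped in this repository instead of parsing an inline string.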

Attributions

The underlying images were selected from MS-COCO. The model used to generate the descriptions is LLaVA, whose underlying LLM is based on Meta's LLaMA v1 (see the applicable license agreement).

Security

See CONTRIBUTING for more information.

License

This dataset is licensed under the CC-BY-NC-4.0 License. See the LICENSE file.

Citation (BibTeX)

@inproceedings{yan-eccv2024-vigor,
    title = "ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling",
    author = "Yan, Siming  and
      Bai, Min and 
      Chen, Weifeng and 
      Zhou, Xiong and 
      Huang, Qixing and 
      Li, Erran",
    booktitle = "Proceedings of the European Conference on Computer Vision 2024",
    year = "2024",
    url = "https://arxiv.org/abs/2402.06118",
}
