Official Repository of the paper: Personalized Visual Instruction Tuning
- 🚀🚀 The PVIT-3M dataset has been released on Hugging Face (a minimal download sketch is included after the release list below).
- Our paper is now available at: https://arxiv.org/abs/2410.07113.
- Release the PVIT-3M dataset here.
- Release scripts for generating the PVIT dataset.
- Release our code for training.
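For convenience, below is a minimal sketch of how the released data could be pulled from the Hugging Face Hub with `huggingface_hub`. The repository id `Sterzhang/PVIT-3M` and the local directory are assumptions; please check the dataset page on Hugging Face for the exact id.

```python
# Minimal sketch for downloading the PVIT-3M data from Hugging Face.
# Assumption: the dataset lives under the repo id "Sterzhang/PVIT-3M";
# verify the exact id on the Hugging Face Hub before running.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Sterzhang/PVIT-3M",   # assumed dataset repo id
    repo_type="dataset",
    local_dir="./PVIT-3M",         # any local directory works
)
print(f"Dataset files downloaded to: {local_path}")
```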
Recent advancements in multimodal large language models (MLLMs) have demonstrated significant progress; however, these models exhibit a notable limitation, which we refer to as “face blindness”. Specifically, they can engage in general conversations but fail to conduct personalized dialogues targeting specific individuals. This deficiency hinders the application of MLLMs in personalized settings, such as tailored visual assistants on mobile devices or domestic robots that need to recognize family members. In this paper, we introduce Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals within an image and engage in personalized and coherent dialogues. Our approach involves the development of a sophisticated pipeline that autonomously generates training data containing personalized conversations. This pipeline leverages the capabilities of various visual experts, image generation models, and (multi-modal) large language models. To evaluate the personalized potential of MLLMs, we present a benchmark called P-Bench, which encompasses various question types with different levels of difficulty. The experiments demonstrate a substantial enhancement in personalized performance after fine-tuning with our curated dataset.
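To make "personalized conversations" concrete, here is an illustrative sketch of what a single personalized training sample might contain: a reference image identifying the target individual, that individual's name, a scene image, and a dialogue that refers to the individual by name. All field names and values below are assumptions for illustration only, not the released PVIT-3M schema.

```python
# Illustrative (non-official) sketch of a personalized visual instruction sample.
# Field names are assumptions; consult the released PVIT-3M files for the actual schema.
sample = {
    "individual": {
        "name": "Alice",                      # personalized concept introduced to the model
        "reference_image": "alice_face.jpg",  # photo/crop identifying the individual
    },
    "scene_image": "party_scene.jpg",         # image in which the individual appears
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is Alice doing in this picture?"},
        {"from": "gpt", "value": "Alice is standing by the table, cutting the birthday cake."},
    ],
}
```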
If you find our work useful, please cite using this BibTeX:
@misc{pi2024personalizedvisualinstructiontuning,
      title={Personalized Visual Instruction Tuning},
      author={Renjie Pi and Jianshu Zhang and Tianyang Han and Jipeng Zhang and Rui Pan and Tong Zhang},
      year={2024},
      eprint={2410.07113},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.07113},
}