This repository collects research on the hallucination problem of Multimodal Large Language Models (MLLMs), including papers and their code/datasets.
In addition, we have extracted the name or the category of each paper's core solution so you can read in a targeted manner; once enough papers have accumulated, we plan to re-summarize them into a more reasonable classification.
If you find any interesting papers that are not included, please feel free to contact me. We will continue to update this repository!
🔷 citation >= 20 | ⭐ citation >= 50 | 🔥 citation >= 100
Here are some works that evaluate the hallucination performance of MLLMs, including some popular benchmarks. Most of these works fine-tune models on their benchmark datasets, which can reduce the likelihood of hallucination without sacrificing performance on other benchmarks. Some papers have also designed clever ways to construct such datasets.
Here are some labels that represent the core points of the papers, corresponding to mitigation methods from different angles. You can read the surveys mentioned earlier to further understand these categories:
`data.`: data improvement (most benchmarks) | `vis.`: vision enhancement | `align.`: multimodal alignment | `dec.`: decoding optimization | `post.`: post-process | `other.`: other kinds
Here are some papers that are not directly related to MLLM hallucination, but may offer unexpected inspiration.