Awesome MLLM Hallucination

A comprehensive collection of resources focused on addressing and understanding hallucination phenomena in MLLMs.

This repository collects research on the hallucination problem of Multimodal Large Language Models (MLLMs), including papers, code, and datasets.

✈️ The main aspects covered are surveys, benchmarks, hallucination mitigation methods, and some interesting papers that are not directly related to the topic. Since some papers are very recent and their conference acceptance status cannot yet be confirmed, venues are currently marked according to the acceptance status that Google Scholar reports for each article.

In addition, we have extracted each paper's benchmark name or the category of its core solution so that you can read in a targeted manner; once enough papers have accumulated, we plan to re-summarize them into a more reasonable classification. 🎆

If you find interesting papers that are not included, please feel free to contact me. We will continue to update this repository! ☀️

🔷 citations >= 20   |   ⭐ citations >= 50   |   🔥 citations >= 100

Contents

- Papers
  - Surveys
  - Benchmarks
  - Hallucination Mitigation Methods
  - Others

Papers

Surveys

| Number | Title | Venue | Paper | Repo | Citation |
|---|---|---|---|---|---|
| 1 | A Survey of Hallucination in "Large" Foundation Models | arXiv (23.09) | arXiv | ➖ | ⭐ |
| 2 | A Survey on Hallucination in Large Vision-Language Models | arXiv (24.02) | arXiv | ➖ | ➖ |

Benchmarks

Here are works that evaluate the hallucination performance of MLLMs, including some popular benchmarks. Most of these works also provide fine-tuning on their benchmark dataset, which can reduce hallucination without sacrificing performance on other benchmarks, and some papers design clever ways to construct such datasets. A minimal scoring sketch for POPE-style yes/no probes is shown after the table.

| Number | Title | Venue | Paper | Repo | Citation | Benchmark Name |
|---|---|---|---|---|---|---|
| 1 | Evaluating Object Hallucination in Large Vision-Language Models | EMNLP (2023) | arXiv | ➖ | 🔥 | POPE |
| 2 | MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | arXiv (23.06) | arXiv | GitHub Page | 🔥 | MME (comprehensive) |
| 3 | MMBench: Is Your Multi-modal Model an All-around Player? | arXiv (23.07) | arXiv | ➖ | 🔥 | MMBench (comprehensive) |
| 4 | Evaluation and Analysis of Hallucination in Large Vision-Language Models | arXiv (23.08) | arXiv | GitHub Page | 🔷 | HaELM |
| 5 | Aligning Large Multimodal Models with Factually Augmented RLHF | arXiv (23.09) | arXiv | GitHub Page | 🔷 | MMHAL-BENCH |
| 6 | HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models | arXiv (23.10) | arXiv | Google Drive | ➖ | HALLUSIONBENCH |
| 7 | Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | arXiv (23.10) | arXiv | ➖ | ➖ | NOPE |
| 8 | HallE-Switch: Controlling Object Hallucination in Large Vision Language Models | arXiv (23.10) | arXiv | GitHub Page | ➖ | CCEval |
| 9 | Ferret: Refer and Ground Anything Anywhere at Any Granularity | arXiv (23.10) | arXiv | GitHub Page | 🔷 | Ferret-Bench (considers the refer-and-ground capability) |
| 10 | Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges | arXiv (23.11) | arXiv | GitHub Page | 🔷 | Bingo |
| 11 | AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation | arXiv (23.11) | arXiv | GitHub Page | ➖ | AMBER |
| 12 | Faithscore: Evaluating Hallucinations in Large Vision-Language Models | arXiv (23.11) | arXiv | GitHub Page | ➖ | Faithscore (metric) |
| 13 | Mitigating Hallucination in Visual Language Models with Visual Supervision | arXiv (23.11) | arXiv | ➖ | ➖ | RAHBench |
| 14 | Mitigating Open-Vocabulary Caption Hallucinations | arXiv (23.12) | arXiv | GitHub Page | ➖ | OpenCHAIR |
| 15 | RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | arXiv (23.12) | arXiv | GitHub Page | ➖ | MHumanEval |
| 16 | CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning | NeurIPS (2023) Workshop | arXiv | ➖ | ➖ | CIEM (and CIT for mitigation) |
| 17 | Mitigating Hallucination in Large Multimodal Models via Robust Instruction Tuning | ICLR (2024) | arXiv | ➖ | 🔷 | GAVIE |
| 18 | Detecting and Preventing Hallucinations in Large Vision Language Models | AAAI (2024) | arXiv | GitHub Page | 🔷 | M-HalDetect |
| 19 | Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites | MMM (2024) | arXiv | GitHub Page | ➖ | FGHE/FOHE (an upgraded version of POPE) |
| 20 | Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models | AAAI-ReLM Workshop (2024) | arXiv | ➖ | ➖ | MSG-MCQ |
| 21 | Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | arXiv (24.01) | arXiv | ➖ | ➖ | MMVP |
| 22 | Visual Hallucinations of Multi-modal Large Language Models | arXiv (24.02) | arXiv | GitHub Page | ➖ | Two benchmarks generated by VHTest |
| 23 | Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | arXiv (24.02) | arXiv | ➖ | ➖ | Hal-Eval (a new category: event hallucination) |
| 24 | GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | arXiv (24.02) | arXiv | GitHub Page | ➖ | GenCeption (no need for high-quality annotations) |
| 25 | How Easy Is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | arXiv (24.02) | arXiv | ➖ | ➖ | MAD-Bench (a new category: visual confusion) |
| 26 | Unified Hallucination Detection for Multimodal Large Language Models | arXiv (24.02) | arXiv | GitHub Page | ➖ | MHaluBench |
| 27 | The Instinctive Bias: Spurious Images Lead to Hallucination in MLLMs | arXiv (24.02) | arXiv | GitHub Page | ➖ | CorrelationQA |
| 28 | Definition, Quantification, and Prescriptive Remediations | arXiv (24.03) | arXiv | ➖ | ➖ | VHILT |
| 29 | EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models | arXiv (24.03) | arXiv | GitHub Page | ➖ | EgoThink |
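
Many of the object-hallucination benchmarks above (POPE and its descendants such as NOPE, CIEM, and FGHE/FOHE) probe a model with yes/no questions about whether an object appears in an image and then score the binary answers. The sketch below shows how such polling results are typically scored (accuracy, precision, recall, F1, and the "yes" ratio); the record format and the answer-parsing heuristic are illustrative assumptions, not the official evaluation code of any listed benchmark.

```python
# Minimal sketch of scoring a POPE-style yes/no object-hallucination probe.
# The (label, reply) record format and the parsing heuristic are assumptions
# for illustration only.
from typing import Iterable, Tuple


def parse_answer(text: str) -> bool:
    """Map a free-form model reply to a yes/no decision (simple heuristic)."""
    return "yes" in text.strip().lower()


def pope_style_scores(pairs: Iterable[Tuple[bool, str]]) -> dict:
    """pairs: (label, model_reply), where label=True means the object is present."""
    tp = fp = tn = fn = 0
    yes_count = total = 0
    for label, reply in pairs:
        pred = parse_answer(reply)
        yes_count += pred
        total += 1
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1  # hallucination: answered "yes" for an absent object
        elif not pred and label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": yes_count / total,  # a high yes-ratio often signals over-affirmation
    }


if __name__ == "__main__":
    demo = [(True, "Yes, there is a dog."), (False, "Yes."), (False, "No, I don't see one.")]
    print(pope_style_scores(demo))
```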

Hallucination Mitigation Methods

Here are labels that summarize the core idea of each paper, corresponding to mitigation methods from different angles; you can read the surveys mentioned earlier to better understand these categories. A minimal sketch of the decoding-optimization (dec.) idea follows the table.
data.: data improvement (most benchmarks)   |   vis.: vision enhancement   |   align.: multimodal alignment   |   dec.: decoding optimization   |   post.: post-process   |   other.: other kinds

| Number | Title | Venue | Paper | Repo | Citation | Core |
|---|---|---|---|---|---|---|
| 1 | VCoder: Versatile Vision Encoders for Multimodal Large Language Models | CVPR (2024) | arXiv | GitHub Page | ➖ | vis. |
| 2 | Ferret: Refer and Ground Anything Anywhere at Any Granularity | arXiv (23.10) | arXiv | GitHub Page | 🔷 | vis. |
| 3 | Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model | arXiv (23.10) | arXiv | ➖ | ➖ | vis. |
| 4 | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | arXiv (23.11) | arXiv | GitHub Page | 🔷 | vis. |
| 5 | Mitigating Hallucination in Visual Language Models with Visual Supervision | arXiv (23.11) | arXiv | ➖ | ➖ | vis. (with SAM -> in-context) |
| 6 | LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | arXiv (23.11) | arXiv | GitHub Page | ➖ | vis. |
| 7 | DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | arXiv (24.02) | arXiv | GitHub Page | ➖ | vis. |
| 8 | LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images | arXiv (24.03) | arXiv | GitHub Page | ➖ | vis. |
| 9 | Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models | arXiv (23.08) | arXiv | GitHub Page | 🔷 | vis. align. |
| 10 | GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | arXiv (24.02) | arXiv | GitHub Page | ➖ | vis. align. |
| 11 | Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training | arXiv (23.08) | arXiv | ➖ | 🔷 | align. |
| 12 | Hallucination Augmented Contrastive Learning for Multimodal Large Language Model | arXiv (23.12) | arXiv | GitHub Page | ➖ | align. |
| 13 | OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation | CVPR (2024) | arXiv | GitHub Page | ➖ | dec. |
| 14 | Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (VCD) | arXiv (23.11) | arXiv | GitHub Page | ➖ | dec. |
| 15 | Seeing Is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding | arXiv (24.02) | arXiv | ➖ | ➖ | dec. |
| 16 | IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding | arXiv (24.02) | arXiv | ➖ | ➖ | dec. |
| 17 | HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding | arXiv (24.03) | arXiv | GitHub Page | ➖ | dec. |
| 18 | Woodpecker: Hallucination Correction for Multimodal Large Language Models | arXiv (23.10) | arXiv | GitHub Page | 🔷 | post. |
| 19 | Analyzing and Mitigating Object Hallucination in Large Vision-Language Models (LURE) | arXiv (23.10) | arXiv | GitHub Page | 🔷 | post. |
| 20 | Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models | arXiv (24.01) | arXiv | ➖ | ➖ | post. (correct with tools) |
| 21 | VIGC: Visual Instruction Generation and Correction | arXiv (23.08) | arXiv | GitHub Page | ➖ | other. (iterative generation) |
| 22 | Can We Edit Multimodal Large Language Models? | EMNLP (2023) | arXiv | GitHub Page | ➖ | other. (model editing) |
| 23 | HALO: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models | arXiv (23.08) | arXiv | GitHub Page | ➖ | other. (knowledge injection and teacher-student approaches) |
| 24 | VOLCANO: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision | arXiv (23.11) | arXiv | GitHub Page | ➖ | other. (self-feedback as visual cues -> in-context) |
| 25 | Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (HA-DPO) | arXiv (23.11) | arXiv | GitHub Page | ➖ | other. (trained to favor the non-hallucinating response as a preference-selection task) |
| 26 | SILKIE: Preference Distillation for Large Visual Language Models | arXiv (23.12) | arXiv | GitHub Page | ➖ | other. (preference distillation) |
| 27 | Mitigating Open-Vocabulary Caption Hallucinations (MOCHa) | arXiv (23.12) | arXiv | GitHub Page | ➖ | other. (multi-objective RL) |
| 28 | Less Is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective | arXiv (24.02) | arXiv | GitHub Page | ➖ | other. (selective EOS supervision; data filtering) |
| 29 | Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models | arXiv (24.02) | arXiv | GitHub Page | ➖ | other. (logical closed loops [answer verification]) |
| 30 | EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models | arXiv (24.02) | arXiv | ➖ | ➖ | other. (unlearning) |
| 31 | Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models | arXiv (24.02) | arXiv | ➖ | ➖ | other. (CoT) |
| 32 | All in a Single Image: Large Multimodal Models Are In-Image Learners | arXiv (24.02) | arXiv | GitHub Page | ➖ | other. (in-image learning mechanism) |
| 33 | Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance (MARINE) | arXiv (24.02) | arXiv | ➖ | ➖ | other. (classifier-free guidance) |
| 34 | Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models | arXiv (24.02) | arXiv | GitHub Page | ➖ | other. (suppress the misleading token '\n') |
| 35 | Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective | arXiv (24.03) | arXiv | ➖ | ➖ | other. (inconsistency for number hallucination) |
| 36 | Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation | arXiv (24.04) | arXiv | GitHub Page | ➖ | other. (CAG) |
| 37 | Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining | WACV (2024) | arXiv | GitHub Page | ➖ | data. |
| 38 | Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning | arXiv (24.04) | arXiv | ➖ | ➖ | data. |
| 39 | TextSquare: Scaling up Text-Centric Visual Instruction Tuning | arXiv (24.04) | arXiv | ➖ | ➖ | data. |
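
The dec. entries above (e.g., VCD-style visual contrastive decoding) all intervene at inference time by reshaping next-token logits. Below is a minimal, hedged sketch of that contrastive idea: logits conditioned on the clean image are contrasted with logits conditioned on a distorted copy, so tokens driven by language priors rather than visual evidence are down-weighted. How the two logit vectors are produced is model-specific and omitted; the function name, alpha, and beta are illustrative choices, not taken from any listed paper's code.

```python
# Minimal sketch of the contrastive-decoding idea behind the "dec." category.
# Inputs are next-token logits for the same text prefix, computed once with the
# original image and once with a degraded/distorted copy of it (assumed given).
import torch


def contrastive_next_token_logits(logits_clean: torch.Tensor,
                                  logits_noisy: torch.Tensor,
                                  alpha: float = 1.0,
                                  beta: float = 0.1) -> torch.Tensor:
    """logits_clean / logits_noisy: [vocab_size] next-token logits for one prefix."""
    # Amplify what the clean image supports and the distorted image does not.
    contrasted = (1 + alpha) * logits_clean - alpha * logits_noisy
    # Adaptive plausibility constraint: keep only tokens that are reasonably
    # probable under the clean-image distribution; mask out everything else.
    probs_clean = torch.softmax(logits_clean, dim=-1)
    keep = probs_clean >= beta * probs_clean.max()
    return contrasted.masked_fill(~keep, float("-inf"))


if __name__ == "__main__":
    vocab_size = 8
    clean, noisy = torch.randn(vocab_size), torch.randn(vocab_size)
    next_token = torch.argmax(contrastive_next_token_logits(clean, noisy))
    print(int(next_token))
```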

Others

Here are some papers that are not directly related to MLLM hallucination but may offer unexpected inspiration.

| Number | Title | Venue | Paper | Repo | Citation |
|---|---|---|---|---|---|
| 1 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | ICML (2022) | arXiv | GitHub Page | 🔥 |
| 2 | Locating and Editing Factual Associations in GPT | NeurIPS (2022) | arXiv | GitHub Page | 🔥 |
| 3 | Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances | COLING (2022) | arXiv | GitHub Page | ➖ |
| 4 | Hallucination Improves the Performance of Unsupervised Visual Representation Learning | ICCV (2023) | arXiv | ➖ | ➖ |
| 5 | Direct Preference Optimization: Your Language Model Is Secretly a Reward Model | NeurIPS (2023) | arXiv | ➖ | 🔥 |
| 6 | A Survey on Multimodal Large Language Models | arXiv (23.06) | arXiv | GitHub Page | 🔥 |
| 7 | Recognize Anything: A Strong Image Tagging Model | arXiv (23.06) | arXiv | GitHub Page | ⭐ |
| 8 | RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | arXiv (23.09) | arXiv | ➖ | 🔥 |
| 9 | Cognitive Mirage: A Review of Hallucinations in Large Language Models | arXiv (23.09) | arXiv | GitHub Page | 🔷 |
| 10 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | arXiv (23.09) | arXiv | ➖ | 🔥 |
| 11 | Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | arXiv (23.11) | arXiv | GitHub Page | 🔷 |
| 12 | Polos: Multimodal Metric Learning from Human Feedback for Image Captioning | CVPR (2024) | arXiv | GitHub Page | ➖ |
| 13 | Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections | arXiv (24.02) | arXiv | GitHub Page | ➖ |
