This repository collects research on the hallucination problem of Multimodal Large Language Models (MLLMs), including papers and their code/datasets.
In addition, we have extracted the name or the category of each paper's core solution so you can read in a targeted manner; once enough papers have accumulated, we plan to re-summarize them into a more reasonable classification.
If you find any interesting papers that are not included, please feel free to contact me. We will continue to update this repository!
🔷 citation >= 20 | ⭐ citation >= 50 | 🔥 citation >= 100
Here are some works that evaluate the hallucination performance of MLLMs, including some popular benchmarks. Most of these works fine-tune models on their benchmark datasets, which can reduce the likelihood of hallucination without sacrificing performance on other benchmarks. Some papers have also designed clever ways to construct such datasets.
Here are some labels that represent the core points of the papers, corresponding to mitigation methods from different angles. You can read the surveys mentioned earlier to further understand these categories:
`data.`: data improvement (most benchmarks) | `vis.`: vision enhancement | `align.`: multimodal alignment | `dec.`: decoding optimization | `post.`: post-process | `other.`: other kinds
Here are some papers that are not directly related to MLLM hallucination, but may offer unexpected inspiration.