Last Update: 2025.01.26; all code was implemented by my team, including me.
This is an UNOFFICIAL code implementation of the paper:
"SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation (2024.12)"
- Only the SEED-LLaMA model (a discrete MLLM) is implemented.
- The paper's results are reproduced after training for only 1 iteration (the authors trained for 3 iterations).
- The five SILMM steps are implemented (a minimal data-flow sketch follows this list):
  1. Compositional Prompt Generation
  2. Diverse Image Generation
  3. Decompositional Self-Questioning
  4. VQA-based Self-Feedback
  5. Learning from Self-Feedback
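The sketch below illustrates how one self-improvement iteration could chain these five steps. It is a minimal sketch under my own assumptions: the callable names (`gen_prompts`, `gen_images`, `decompose`, `answer_vqa`), the score averaging, and the best-vs-worst pairing are illustrative placeholders, not the authors' or this repo's actual API.

```python
# Hedged sketch of one SILMM-style self-improvement iteration.
# All callables here are HYPOTHETICAL stand-ins for the MLLM's capabilities;
# they are not real functions from the paper or this repository.
from typing import Callable, List, Tuple

def vqa_score(image, questions: List[str],
              answer_vqa: Callable[[object, str], float]) -> float:
    # Step 4: average the model's own "yes"-style confidence over all
    # decomposed questions to get a per-image alignment score.
    return sum(answer_vqa(image, q) for q in questions) / max(len(questions), 1)

def silmm_iteration(gen_prompts: Callable[[int], List[str]],      # step 1
                    gen_images: Callable[[str, int], List[object]],  # step 2
                    decompose: Callable[[str], List[str]],        # step 3
                    answer_vqa: Callable[[object, str], float],   # step 4
                    n_prompts: int = 8, n_images: int = 4
                    ) -> List[Tuple[str, object, object]]:
    pairs = []  # (prompt, preferred image, rejected image) for step 5
    for prompt in gen_prompts(n_prompts):           # compositional prompts
        images = gen_images(prompt, n_images)       # diverse candidates
        questions = decompose(prompt)               # self-questioning
        scores = [vqa_score(im, questions, answer_vqa) for im in images]
        ranked = sorted(zip(scores, images), key=lambda p: p[0])
        if ranked[-1][0] > ranked[0][0]:            # keep only pairs with a real gap
            pairs.append((prompt, ranked[-1][1], ranked[0][1]))
    # Step 5: the returned pairs would feed a DPO-style preference update.
    return pairs

# Dummy stand-ins, just to show the data flow end-to-end:
pairs = silmm_iteration(
    gen_prompts=lambda n: [f"a red cube left of a blue ball #{i}" for i in range(n)],
    gen_images=lambda p, n: [f"<image {i} for '{p}'>" for i in range(n)],
    decompose=lambda p: ["Is there a red cube?", "Is there a blue ball?"],
    answer_vqa=lambda im, q: (hash((im, q)) % 100) / 100,  # fake VQA confidence
)
```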
For the base model setup, please refer to the original SEED-LLaMA repository.