SILMM-Implementation

Last Update: 2025.01.26. All code was implemented by my team, myself included.

This is an UNOFFICIAL code implementation of the paper "SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation".

📍 Notice

  1. Only the SEED-LLaMA model (a discrete-token MLLM) is implemented.
  2. The paper's results were reproduced after training for only 1 iteration. (The authors trained for 3 iterations.)

📍 5-Step Pipeline (code implementation)

  1. Compositional Prompt Generation
  2. Diverse Image Generation
  3. Decompositional Self-Questioning
  4. VQA-based Self-Feedback
  5. Learning from Self-Feedback
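The five steps above can be sketched as a single preference-pair construction loop. This is a minimal illustrative sketch, not the repo's actual API: every function name and the stubbed prompt/image/VQA logic below are placeholders. Step 5 would then fine-tune the MLLM on the resulting pairs (e.g. with a DPO-style preference objective).

```python
# Hypothetical sketch of the SILMM self-improvement loop.
# All functions are placeholder stubs, not the repo's real components.
import random
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    image: str
    score: float


def generate_compositional_prompts(n):
    # Step 1: the MLLM composes prompts mixing attributes, objects, relations.
    attrs, objs = ["red", "small"], ["cube", "ball"]
    return [
        f"a {random.choice(attrs)} {random.choice(objs)} next to a {random.choice(objs)}"
        for _ in range(n)
    ]


def generate_images(prompt, k):
    # Step 2: sample k diverse images per prompt (placeholder string IDs here).
    return [f"img_{abs(hash(prompt)) % 1000}_{i}" for i in range(k)]


def decompose_into_questions(prompt):
    # Step 3: decompositional self-questioning — split the prompt
    # into simple yes/no checks, one per compositional element.
    return [f"Does the image show {part.strip()}?" for part in prompt.split("next to")]


def vqa_score(image, questions):
    # Step 4: VQA-based self-feedback — fraction of questions answered "yes".
    random.seed(image)  # deterministic stand-in for a real VQA answer model
    return sum(random.random() > 0.5 for _ in questions) / len(questions)


def build_preference_pairs(prompts, k=4):
    # Steps 2–4 per prompt: score each image via self-questioning + VQA,
    # then keep (best, worst) as a preference pair for Step 5.
    pairs = []
    for p in prompts:
        questions = decompose_into_questions(p)
        scored = [Sample(p, img, vqa_score(img, questions)) for img in generate_images(p, k)]
        scored.sort(key=lambda s: s.score, reverse=True)
        if scored[0].score > scored[-1].score:  # skip prompts where all images tie
            pairs.append((scored[0], scored[-1]))
    return pairs


pairs = build_preference_pairs(generate_compositional_prompts(8))
```

Each pair couples a higher- and a lower-scoring image for the same prompt, which is exactly the format a preference-optimization step consumes.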

📍 Baseline Model

Please refer to the original SEED-LLaMA repository.
