Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
"Meta Chameleon is a family of models that can combine text and images as input and output any combination of text and images with a single unified architecture for both encoding and decoding. While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale. The possibilities are endless—imagine generating creative captions for images or using a mix of text prompts and images to create an entirely new scene."
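To make the "tokenization for text and images" point concrete, here is a toy sketch of how text tokens and quantized image codes could share one token sequence. Everything in it is an assumption for illustration: the vocabulary sizes, the `BOI`/`EOI` marker ids, and the helper names `to_unified` / `from_unified` are hypothetical and are not taken from Chameleon or from llama.cpp.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical sizes, chosen only for this example.
TEXT_VOCAB_SIZE = 1000       # text ids occupy [0, 1000)
IMAGE_CODEBOOK_SIZE = 256    # image codes occupy [1000, 1256) after offsetting
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # hypothetical begin-of-image marker
EOI = BOI + 1                                # hypothetical end-of-image marker


@dataclass
class Text:
    ids: List[int]    # already-tokenized text ids


@dataclass
class Image:
    codes: List[int]  # already-quantized image codes


Segment = Union[Text, Image]


def to_unified(segments: List[Segment]) -> List[int]:
    """Flatten interleaved text/image segments into one token sequence."""
    out: List[int] = []
    for seg in segments:
        if isinstance(seg, Text):
            out.extend(seg.ids)
        else:
            out.append(BOI)
            # Shift image codes past the text vocabulary so ids never collide.
            out.extend(TEXT_VOCAB_SIZE + c for c in seg.codes)
            out.append(EOI)
    return out


def from_unified(tokens: List[int]) -> List[Segment]:
    """Split a unified token sequence back into text and image segments."""
    segments: List[Segment] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == BOI:
            end = tokens.index(EOI, i)  # raises ValueError if the image is unterminated
            segments.append(Image([t - TEXT_VOCAB_SIZE for t in tokens[i + 1:end]]))
            i = end + 1
        else:
            j = i
            while j < len(tokens) and tokens[j] != BOI:
                j += 1
            segments.append(Text(tokens[i:j]))
            i = j
    return segments


prompt = [Text([5, 6]), Image([0, 255]), Text([7])]
tokens = to_unified(prompt)
```

Because input and output are both plain token sequences in this sketch, any mix of text and image segments goes through the same path, which is the property the description above attributes to the unified architecture.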
Motivation
This would be a great addition to llama.cpp!
The image features look interesting, but the model can also handle plain Text -> Text as well as many other input/output combinations:
- Text -> Text
- Image -> Image
- Text -> Image
- Image -> Text
- Image -> Text + Image
- Text + Image -> Text
- Text -> Text + Image
- Text + Image -> Text + Image