Feature Request: Support for Meta Chameleon 7B and 34B #7995

Closed
@arch-btw

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

"Meta Chameleon is a family of models that can combine text and images as input and output any combination of text and images with a single unified architecture for both encoding and decoding. While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale. The possibilities are endless—imagine generating creative captions for images or using a mix of text prompts and images to create an entirely new scene."
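To make the "tokenization for text and images" idea concrete, here is a minimal toy sketch of how mixed text and image input could be flattened into one token sequence for a single model. Everything in it is an assumption for illustration only: the vocabulary sizes, the `BOI`/`EOI` sentinel ids, and the `Text`, `Image`, and `to_unified_sequence` names are hypothetical and are not taken from Chameleon's actual tokenizer or from llama.cpp.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical vocabulary layout (NOT Chameleon's real one):
# text ids occupy [0, 1000), image codebook ids are shifted to [1000, 1512).
TEXT_VOCAB_SIZE = 1000
IMAGE_CODEBOOK_SIZE = 512
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # hypothetical begin-of-image sentinel
EOI = BOI + 1                                # hypothetical end-of-image sentinel


@dataclass
class Text:
    token_ids: List[int]  # ids in [0, TEXT_VOCAB_SIZE)


@dataclass
class Image:
    codes: List[int]  # discrete codebook indices in [0, IMAGE_CODEBOOK_SIZE)


def to_unified_sequence(segments: List[Union[Text, Image]]) -> List[int]:
    """Flatten interleaved text/image segments into one id sequence."""
    seq: List[int] = []
    for seg in segments:
        if isinstance(seg, Text):
            seq.extend(seg.token_ids)
        else:
            # Wrap the image codes in sentinels and shift them past the text ids.
            seq.append(BOI)
            seq.extend(TEXT_VOCAB_SIZE + c for c in seg.codes)
            seq.append(EOI)
    return seq


prompt = [Text([5, 17]), Image([0, 3, 511]), Text([42])]
print(to_unified_sequence(prompt))
# [5, 17, 1512, 1000, 1003, 1511, 1513, 42]
```

Because both modalities end up in one id space, the same next-token decoder could in principle emit text ids, image ids, or a mix of both, which is what makes the input/output combinations in the Motivation section possible.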

Motivation

This would be a great addition to llama.cpp!

The image features look interesting, but the model can also do plain Text -> Text, along with many other input/output combinations:

  • Text -> Text
  • Image -> Image
  • Text -> Image
  • Image -> Text
  • Image -> Text + Image
  • Text + Image -> Text
  • Text -> Text + Image
  • Text + Image -> Text + Image

Attached video: chameleon.mp4
