
Question regarding bit allocation #3

@Ali-Flt

Description

Hi,

In your paper you mention that you allocate 2, 3, or 4 bits to each layer of the model using a criterion. But in Fig. 1(d): Construct LUT and Query&Add, the binary weights are shown as 8-bit. This has confused me a bit. Was the figure drawn with 8-bit weights in mind rather than <= 4-bit weights, or am I misunderstanding the flow?
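To make sure I'm reading the bit-allocation part correctly, here is a tiny NumPy sketch of what I understand a q-bit weight to be (just my reading of binary-coded quantization; the shapes and names are my own, not from your code):

```python
import numpy as np

def bcq_reconstruct(binary_planes, alphas):
    """Reconstruct W_hat = sum_i alpha_i * B_i from q binary planes.

    binary_planes: (q, out_dim, in_dim), entries in {-1, +1}
    alphas:        (q, out_dim), one scale per bit plane and output row
    q is the per-layer bit width (2, 3, or 4, as I understand the paper).
    """
    return np.einsum('qr,qrc->rc', alphas, binary_planes)
```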

Another way I tried to interpret Fig. 1 is that the FP16 Shift and Query&Add blocks run once for every bit of W. For instance, if a weight W has been allocated 3 bits, the ShiftAddLLM block runs 3 times, once per bit of W. In this interpretation, each bit of the 8-bit binary weights in Fig. 1(d) corresponds to one of the activation (x) values. A sketch of this reading follows below.
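Here is a rough NumPy sketch of that second interpretation, so you can point out where I go wrong. I'm assuming a LUT group size of 8 (which I'm guessing is where the 8 bits in Fig. 1(d) come from); the function and variable names are mine, not from your repository:

```python
import numpy as np

GROUP = 8  # assumed LUT group size; my guess for the "8" shown in Fig. 1(d)

def build_lut(x_group):
    """Precompute all 2^GROUP signed partial sums of one activation group."""
    lut = np.zeros(1 << GROUP, dtype=x_group.dtype)
    for key in range(1 << GROUP):
        signs = np.array([1.0 if (key >> j) & 1 else -1.0 for j in range(GROUP)])
        lut[key] = signs @ x_group
    return lut

def query_add_matvec(binary_planes, alphas, x):
    """y = sum_i alpha_i * (B_i @ x), computed group-by-group via LUT queries.

    binary_planes: (q, out_dim, in_dim), entries in {-1, +1}
    alphas:        (q, out_dim) per-bit scales
    x:             (in_dim,) FP16/FP32 activations
    """
    q, out_dim, in_dim = binary_planes.shape
    assert in_dim % GROUP == 0
    n_groups = in_dim // GROUP
    # One LUT per group of 8 activations, shared by all rows and all bit planes.
    luts = [build_lut(x[g * GROUP:(g + 1) * GROUP]) for g in range(n_groups)]

    y = np.zeros(out_dim, dtype=x.dtype)
    for i in range(q):                       # one Query&Add pass per bit of W
        for r in range(out_dim):
            acc = 0.0
            for g in range(n_groups):
                bits = binary_planes[i, r, g * GROUP:(g + 1) * GROUP]
                key = sum(1 << j for j in range(GROUP) if bits[j] > 0)
                acc += luts[g][key]          # query the precomputed partial sum
            y[r] += alphas[i, r] * acc       # add, scaled by this bit's alpha
    return y

# Sanity check of my interpretation against a dense reference:
# W_hat = np.einsum('qr,qrc->rc', alphas, binary_planes)
# assert np.allclose(query_add_matvec(binary_planes, alphas, x), W_hat @ x)
```

Under this reading, the number of Query&Add passes equals the allocated bit width q, while the 8 in Fig. 1(d) is just the LUT group size and is independent of the weight precision.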

Could you please elaborate more on how the bit allocation maps to the ShiftAddLLM architecture?
