
Investigate different options for splitting up target model matrices #243

@danbraunai


Oli mentioned this in a convo earlier: "Couldn't we just combine the up_proj and gate_proj weight matrices in a target model SwiGLU, decompose them as a single matrix, and then later split it up?" This is similar to the choice we have for attention: decompose QKV as a single matrix (or in pairs: QK/QV/KV), decompose Q, K, and V separately, or even decompose each head separately.
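To make the options concrete, here is a minimal NumPy sketch of the splits mentioned above. It is only an illustration under stated assumptions: the sizes, the variable names (`W_gate`, `W_up`, `W_gate_up`, `W_qkv`), the column-wise concatenation, and the head-contiguous column layout are all hypothetical and not taken from our target models or from simple_stories_train.

```python
import numpy as np

def silu(x):
    # SiLU / swish activation used in the SwiGLU gate.
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_model, d_mlp, n_heads = 8, 16, 2  # hypothetical sizes
d_head = d_model // n_heads

# Weights stored as (d_in, d_out) so that `x @ W` applies the projection.
W_gate = rng.normal(size=(d_model, d_mlp))
W_up = rng.normal(size=(d_model, d_mlp))

def swiglu_separate(x):
    # Option A: gate_proj and up_proj are two separate target matrices.
    return silu(x @ W_gate) * (x @ W_up)

# Option B: one combined (d_model, 2 * d_mlp) matrix, split after the matmul.
W_gate_up = np.concatenate([W_gate, W_up], axis=1)

def swiglu_combined(x):
    gate, up = np.split(x @ W_gate_up, 2, axis=-1)
    return silu(gate) * up

x = rng.normal(size=(4, d_model))
same_output = np.allclose(swiglu_separate(x), swiglu_combined(x))

# The same choice for attention: a fused QKV matrix, split into Q/K/V,
# and optionally split further into one matrix per head.
W_qkv = rng.normal(size=(d_model, 3 * d_model))
W_q, W_k, W_v = np.split(W_qkv, 3, axis=1)  # each (d_model, d_model)

# Assumes head h owns columns [h * d_head, (h + 1) * d_head) of W_q.
W_q_heads = W_q.reshape(d_model, n_heads, d_head).transpose(1, 0, 2)
```

All of these parameterisations compute the same function; they differ only in which matrices are handed to the decomposition as a unit.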

I never really appreciated the optionality we have here. I think we can utilise this optionality in order to test the method. SPD + clustering should arrive at the same results regardless of the way you split these things up. If it doesn't, finding out where and why it doesn't would be helpful in debugging the methods.

This is probably more of a future investigation once we have some confidence that our decomp method (+ clustering) is reasonable.

The easiest way to manage this would probably be to just train new target models with the different architectures using https://github.com/goodfire-ai/simple_stories_train/tree/dev .


    Labels

    Priority: nInU (not Important & not Urgent)
    exp/investigation (A research investigation or experiment)
